Sunday, January 31, 2016

How to Save Bitcoin's Node Network from Centralization - CoinDesk

Jameson Lopp is a software engineer at BitGo, creator of statoshi.info and founder of bitcoinsig.com. He enjoys building web services and is intrigued by problems of scale. 

In this feature, Lopp examines the causes of the diminishing number of bitcoin nodes across the network and discusses what it might take to reverse the trend.

Decentralization is, I would argue, the most important property of the bitcoin network. Without it, many of bitcoin's other properties, such as its ability to facilitate transactions without a third party or provide a permissionless platform for innovation, would be compromised.

There are many facets that contribute to bitcoin's decentralization, the most important of which is the network of nodes that comprise bitcoin's infrastructure by holding copies of the blockchain and sharing block and transaction data across the network.

And yet, despite their importance, the number of nodes has been dwindling for years, arguably centralizing the network.

I've been writing about the decline in node counts for a couple of years and have been monitoring my nodes with the Statoshi software I released in 2014. Because the performance of nodes and the bitcoin network in general has become a hot topic in recent scalability debates, I hope to shed some light on a few points that haven't received much attention.

In the early days of bitcoin, the only way to participate on the network was by running a full node. Over the years, the ecosystem has flourished and now there are many wallet options for users to choose from. Most wallets today are either lightweight clients that query full nodes for data, or they are hosted by third parties and thus do not require the user to run a full node.

As a result, most new users are opting against running a full node, while some existing node operators have chosen to shut theirs down. How many nodes does bitcoin really need?

Depending upon your perspective, you could reach several conclusions:

  • One: Since bitcoin is trustless, the only node that matters is the node that you run.
  • Hundreds: Or enough to make it infeasible for any single entity to shut down a significant portion of the network due to geographic and jurisdictional diversity.
  • Thousands: Or enough to support high demand from SPV clients for connection slots. SPV clients are not necessarily just wallets, but can also be peer-to-peer apps like Lighthouse.
  • On the opposite end of the spectrum, we can never have too many nodes or decentralize the network too much.

    That said, how should we react to the fact that fewer than 1% of bitcoin users run a full node?

    When I asked Bitcoin Core developer Pieter Wuille several years ago about the importance of node counts, he had this to say:

    "What full nodes do is make sure the network is honest. And this is not so much a question of how many there are, it's more about how hard it is to run one."

    Pieter is one of the most prolific bitcoin developers in terms of code and features added to the protocol; he knows what he's talking about. Pieter is also the author of Segregated Witness, which will hopefully provide us with a path to implement various scalability solutions for bitcoin.

    Because bitcoin has become popular enough that we are running into the 1MB hard cap on block sizes, there is a great deal of contention about how to scale the network to support more users without adversely affecting bitcoin's decentralization.

    Block size debate

    One argument that comes up often during the block size debate concerns the cost of running a node. The theory is that higher costs (such as additional computational resource requirements to validate and relay larger blocks) will result in fewer nodes, and vice versa.

    Developer Paul Sztorc introduced the concept of CONOP (cost of node-option) in his excellent post, Measuring Decentralization. He argues that lower costs should result in more people undertaking actions that are beneficial to them. This argument makes sense to me if you assume that the cost of operating a node is the only variable at play.

    Later in this post we'll discuss other factors that likely affect CONOP.

    After observing and participating in scalability debates over the past year, I find myself continually coming back to the same problem: there are no defined minimum resource requirements for running a node.

    As a result, there is no target for bitcoin developers to take into account when discussing protocol changes that would increase the resource requirements of running a full node. If a minimum specification is to be developed, it should probably be based upon the hardware currently being used to run full nodes.

    An ARM-based device such as a Raspberry Pi or ODROID+ appears to represent the current minimum viable spec for running a node. It can keep up with 1MB blocks, though it takes two weeks to perform the initial blockchain sync (to block 390,000) due to its low-powered CPU.

    You can buy a Bitseed for $170 or a Bitcoin Mini for $140. If you're tech-savvy, you can build your own Raspberry Pi node for $100, or you can build a fairly powerful node for about $200 that should be able to perform well for several years.

    Another overlooked problem when debating the acceptable cost of running a node is that we have never defined the target user base for running a full node.

    Demographic polls that have been conducted over the years continue to indicate that most bitcoin users are technically oriented Caucasian males under the age of 30, but this is a reflection of most early technology adopters.

    There seems to be a general sentiment in the community that in order for bitcoin to succeed long-term, we need to find a way to bring it to the masses.

    Still, as the following chart from BitNodes shows, nodes are heavily concentrated in North America and Western Europe.

    [Chart: geographic distribution of bitcoin nodes, via BitNodes]

    Who do we want running a full node? The naive answer would be "everyone," but clearly that's not feasible since Internet access is not yet ubiquitous.

    I suspect that reliable, affordable broadband Internet access is a major reason for the current geographic distribution of nodes.

    Gavin Andresen once said:

    "Most ordinary folks should NOT be running a full node. We need full nodes that are always on, have more than eight connections, and have a high-bandwidth connection to the Internet."

    Data collected with Statoshi shows that a highly connected node needs on average 200 Kb/s downstream and 1.5 Mb/s upstream, though usage is much spikier and can easily see peaks of 2 Mb/s downstream and 40 Mb/s upstream.

    According to Akamai's State of the Internet report, the average available bandwidth is 5 Mb/s, but their list only covers a quarter of the world.
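    To put the Statoshi figures in perspective, the short Python sketch below converts the average rates into monthly transfer volumes and compares the upstream peak with the Akamai average. It assumes constant rates, a 30-day month and decimal units, none of which come from the measurements themselves; real usage is spikier, so treat the results as rough orders of magnitude rather than a measured budget.

```python
# Convert the Statoshi bandwidth figures quoted above into monthly volumes.
# Assumptions (not from the article): constant rates, a 30-day month,
# decimal units (1 Mb = 10**6 bits, 1 GB = 10**9 bytes).

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds


def monthly_gigabytes(rate_mbps: float) -> float:
    """Gigabytes transferred in a 30-day month at a constant rate in Mb/s."""
    bits = rate_mbps * 1_000_000 * SECONDS_PER_MONTH
    return bits / 8 / 1_000_000_000


AVG_DOWN_MBPS = 0.2    # 200 Kb/s average downstream
AVG_UP_MBPS = 1.5      # 1.5 Mb/s average upstream
PEAK_UP_MBPS = 40.0    # observed upstream peak
AKAMAI_AVG_MBPS = 5.0  # average bandwidth from Akamai's report

down_gb = monthly_gigabytes(AVG_DOWN_MBPS)
up_gb = monthly_gigabytes(AVG_UP_MBPS)
peak_ratio = PEAK_UP_MBPS / AKAMAI_AVG_MBPS

print(f"Downstream: {down_gb:.1f} GB/month")  # Downstream: 64.8 GB/month
print(f"Upstream: {up_gb:.1f} GB/month")      # Upstream: 486.0 GB/month
print(f"Peak upstream is {peak_ratio:.0f}x the 5 Mb/s average")  # Peak upstream is 8x the 5 Mb/s average
```

    Sustained, these averages add up to roughly 550 GB of traffic a month, most of it upstream, before accounting for the peaks.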

    Estimates show that as of 2014 only 60% of the global population is using the Internet.

    A minimum node specification

    A well-designed minimum specification should set targets for the performance characteristics desired for a node, the resources required to meet those performance targets and a cost to obtain hardware that meets the performance targets.

    I'd recommend that it incorporate logic similar to that developed by Jonas Nick, Greg Sanders, and Mark Friedenbach for block size validation costs. Their approach is well thought-out, though a min spec would need to be more complex because it would have additional dimensions.

    For example, a min spec might look something like this:

  • Target hardware resource cost: $200
  • Target worst case time to validate a block: 10 seconds
  • Minimum network I/O: 2 Mb/s
  • Minimum disk I/O: 2 Mb/s
  • Minimum CPU: 5,000 MIPS
  • Minimum RAM: 1 GB

    Jean-Paul Kogelman gave a great example of how an established minimum specification would help with decision-making during scalability debates by examining recent changes to the cost of transaction signature verification.

    In versions of Bitcoin Core before 0.12, OpenSSL was used to verify signatures. As of 0.12, signatures are verified with secp256k1, which is approximately five times faster than OpenSSL. As a result, transaction (and thus block) verification should become nearly five times faster.

    Since this should drop the worst case time to verify a block by nearly 80%, the minimum specification then gives us a simple binary choice:

  • Adjust the minimum resource requirements downward appropriately
  • Adjust other parameters such as number of signature operations per transaction and number of transactions per block upward appropriately to bring us back in line with the minimum performance targets.

    When protocol changes with a performance impact are proposed, a minimum specification, if one is available, should make it clear how the changes affect it. As technology progresses and the cost of computational resources drops, it should also be clear how the resource requirements can be increased without raising the cost of operating a node.

    Thus, the appropriate options for responding to changes should be less controversial than what we've experienced with the block size debate. If, for example, it is clear that node operators who are running hardware on minimum requirements will not be adversely affected by increasing the allowable signature operations per block to match the performance gain from secp256k1, it should not be controversial to increase it.
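    As a rough illustration of how such a decision could be made mechanically, the Python sketch below encodes the example minimum specification as data and works through the secp256k1 speedup. It is a toy model built on assumptions that are not from the article: worst-case block validation time is taken to be dominated by signature checks and to scale linearly with the per-block signature-operation limit, the hypothetical `MAX_SIGOPS = 20_000` is used as an illustrative baseline, min-spec hardware is assumed to hit the 10-second target exactly at that baseline, and option 1 assumes validation time scales inversely with CPU throughput.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MinSpec:
    """The example minimum specification above, as data (illustrative values)."""
    hardware_cost_usd: int = 200
    worst_case_validation_s: float = 10.0
    network_io_mbps: float = 2.0
    disk_io_mbps: float = 2.0
    cpu_mips: int = 5_000
    ram_gb: float = 1.0


def worst_case_seconds(max_sigops: int, seconds_per_sigop: float) -> float:
    # Simplifying assumption: worst-case block validation is dominated by
    # signature checks, so it scales linearly with the per-block sigop limit.
    return max_sigops * seconds_per_sigop


spec = MinSpec()
MAX_SIGOPS = 20_000   # hypothetical baseline per-block signature-operation limit
SPEEDUP = 5.0         # secp256k1 vs. OpenSSL, per the article

# Hypothetical calibration: min-spec hardware hits the 10 s target exactly
# at the baseline limit when verifying with OpenSSL.
openssl_cost = spec.worst_case_validation_s / MAX_SIGOPS
secp_cost = openssl_cost / SPEEDUP

new_worst = worst_case_seconds(MAX_SIGOPS, secp_cost)
drop = 1 - new_worst / spec.worst_case_validation_s
print(f"Worst case with secp256k1: {new_worst:.1f} s ({drop:.0%} lower)")
# Worst case with secp256k1: 2.0 s (80% lower)

# Option 1: keep the protocol limits and lower the hardware requirement
# (assumes validation time scales inversely with CPU throughput).
lower_hw = replace(spec, cpu_mips=round(spec.cpu_mips / SPEEDUP))
print(f"Option 1: minimum CPU drops to {lower_hw.cpu_mips} MIPS")
# Option 1: minimum CPU drops to 1000 MIPS

# Option 2: keep the hardware and raise the sigop limit until the
# worst case is back at the 10 s target.
raised_limit = round(spec.worst_case_validation_s / secp_cost)
print(f"Option 2: sigop limit rises to {raised_limit} per block")
# Option 2: sigop limit rises to 100000 per block
```

    Keeping the spec as an immutable record and deriving each option with `replace` makes the trade-off explicit: either the hardware target moves or the protocol parameter moves, while the performance target stays fixed.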

    Cost versus benefit

    I find it to be an admirable goal to try to keep node operation costs low and accessible to the average user.

    On the other hand, if we keep the resource requirements of nodes at the level of whatever the latest Raspberry Pi model on a (global average) residential Internet connection can handle, I'm not sure how helpful it will be if the demand for inclusion in blocks results in transaction fees that price out more users.

    Stated differently, if the cost of using the network rises to the point of excluding the average user from making transactions on bitcoin's blockchain, then they probably aren't going to care that they can run a node at trivial cost. Think of it as a balance between the cost of transaction verification and the cost of transacting.

    Layer-two networks (such as Lightning Network and 21's micropayment network) can certainly play a role in easing the burden here, but remember that even users of layer-two networks will eventually need to settle against bitcoin's blockchain.

    There are numerous costs to run a node, such as:

  • Initial learning curve (time cost)
  • Installation, configuration and initial sync cost (time, bandwidth, CPU)
  • Ongoing running costs (bandwidth, CPU, RAM, disk)
  • Maintenance costs (time to perform troubleshooting and upgrades)

    Learning enough to see the value of bitcoin can take weeks or months. Figuring out how to run a node can take a few hours – I'm fairly certain most people never even make it to the port forwarding step.

    Initial sync can take anywhere from several hours to several weeks, depending upon the machine's specs. I'd subjectively peg maintenance costs at one hour per month in a worst-case scenario.

    Thus far we've examined the cost of running a node from a variety of perspectives. It's sensible to theorize that higher costs will result in fewer nodes and lower costs will result in more nodes, but what if cost isn't the only factor?

    BitPay CEO Stephen Pair succinctly stated:

    "There are only as many nodes on the Bitcoin network as there is demand to perform independent and trustless validation of transactions."

    I think that Pair and Sztorc are both correct, and thus the node count is a function of the demand for trustless transaction validation versus the cost of running a node. As such, I'd posit that node count also depends upon the value being stored and transacted by bitcoin users.

    While some claim that running a node today is purely altruistic, there are incentives for doing so:

  • Investment: If you're highly invested in bitcoin, you may wish to support the network in order to protect that investment.
  • Performance: It is orders of magnitude faster to query a local copy of the blockchain as opposed to querying blockchain data services over the Internet.
  • Permissionlessness and censorship resistance: By receiving and sending transactions from your own node, no one has the power to stop you from doing so.
  • Privacy: If you're querying other nodes or services about blockchain data, they can use those queries to try to deanonymize you.
  • Trustlessness: Owning a copy of the ledger that you have validated yourself means you don't have to trust a third party to be honest about the state of the ledger.

    It is my perspective that, instead of aiming for every individual to run a node, the goal should be for anyone with a nontrivial amount of value in bitcoin to run a node. Those who have the most value at risk have the greatest incentive to expend resources to protect their assets by operating in a trustless manner.

    We've seen BTCC recently deploy 100 nodes and we know many other bitcoin businesses run their own nodes. I myself oversee the operation of multiple mainnet and testnet nodes on behalf of BitGo and also run several nodes personally because I have a great deal of resources invested in bitcoin and desire to support the network.

    If a user only owns $100 worth of bitcoin, then it doesn't make much sense for them to run a full node unless the time and resource cost to run a node is on the order of a few minutes and a few pennies.

    In order to get perspectives from bitcoin users regarding their decision to run or not run a full node, I ran a survey and collected more than 500 responses. This is clearly not a rigorously conducted scientific poll, but hopefully it's better than nothing.

    You can view the high level analytics here and the raw data is available here.

    Some key takeaways from this survey:

  • 24% of those surveyed used to run a full node but don't any longer
  • 42% of non-operators don't see any incentive to run a node
  • 44% or more of node operators use their node for their own direct benefit
  • 57% of users are willing to dedicate over 100 KB/s of upstream bandwidth to a node
  • 58% of users are not willing to pay more than $10/month to run a node
  • 81% of node operators run a node at home

    The most surprising result was that there appears to be no correlation between a user's investment in bitcoin and their interest in running a node.

    This may have been too vague a question, however, since it didn't ask for specific monetary amounts.

    I still believe that any entity (especially a business) that transacts or stores significant amounts of value is more incentivized to run a node.

    Conclusions

    Recall the often cited theory that higher costs will result in fewer nodes.

    This may not be a valid assumption, since higher transaction volume may be a result of higher adoption and thus more entities willing to run full nodes.

    Yes, the cost will be higher and may very well rise over the $10 a month threshold that the average user is (currently) willing to pay, but if the utility of the bitcoin network continues to increase and more entities are transacting large amounts of value, they will have greater incentive to pay higher costs to operate in a trustless manner.

    On the other hand, we should also keep in mind that there is little use participating in a decentralized system when the validation cost is low but the transaction cost is extremely high due to contention for block space.

    If we're approaching the block size debate from a resource usage standpoint, it seems to me that someone is going to be excluded either way. Not raising the block size will exclude some users from sending transactions while raising the block size will exclude some users from running nodes.

    There are many variables at play, and we should strive to keep them in balance so that we can grow the ecosystem while keeping it decentralized.

    In order of decreasing priority, I recommend that bitcoin developers:

  • Determine a minimum resource specification for running a full node, with target performance characteristics such as the worst case time to validate a block.
  • Focus on increasing the transaction volume that the bitcoin network can support, thereby increasing its utility and the number of users (and use cases) it can service. As a result, there should be more entities performing high value storage and transfer that will be incentivized to run their own nodes.
  • Focus on making it easier to run a node from a learning curve standpoint. This should also occur naturally as bitcoin builds a longer history and reputation.
  • Make it easier to run a node from a computational resources standpoint. Enabling a node to instantly run in SPV mode while syncing the blockchain in the background would be a nice first step. Bootstrapping a node from UTXO commitments would be a giant leap forward.
  • Investigate directly financially incentivizing node operation, such as by providing data services in exchange for fees.
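    The SPV mode mentioned in the recommendations above rests on Merkle proofs: a lightweight client that holds only block headers can check that a transaction is included in a block by hashing it together with a short branch of sibling hashes and comparing the result with the Merkle root in the header. The Python sketch below shows that check using Bitcoin's double SHA-256 and its rule of duplicating the last hash on levels with an odd count. It works on raw 32-byte hashes in internal byte order with hypothetical placeholder transactions; it does not parse real blocks, fetch branches from peers, or validate headers and proof-of-work, all of which a real SPV client would also need.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute the Merkle root of a list of transaction hashes."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:  # odd count: duplicate the last hash, as Bitcoin does
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_branch(leaves: list[bytes], index: int) -> list[bytes]:
    """Sibling hashes from leaf `index` up to (but excluding) the root."""
    branch, level, i = [], list(leaves), index
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        branch.append(level[i ^ 1])  # sibling is the other member of the pair
        level = [dsha256(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
        i //= 2
    return branch


def verify_branch(leaf: bytes, branch: list[bytes], index: int, root: bytes) -> bool:
    """What an SPV client does: rebuild the root from a leaf and its branch."""
    h = leaf
    for sibling in branch:
        # The low bit of the index says whether we are the right or left child.
        h = dsha256(sibling + h) if index & 1 else dsha256(h + sibling)
        index >>= 1
    return h == root


# Hypothetical placeholder "transactions" standing in for real txids.
txids = [dsha256(f"tx{n}".encode()) for n in range(5)]
root = merkle_root(txids)
proof = merkle_branch(txids, 3)
print(verify_branch(txids[3], proof, 3, root))  # True
print(verify_branch(txids[2], proof, 3, root))  # False
```

    The proof for one transaction in this five-transaction block is only three hashes long; branch size grows with the logarithm of the number of transactions, which is why SPV clients can verify inclusion so cheaply while a full node does the heavier work of validating everything.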

    If we can keep the cost of running a node from increasing at a rate faster than the value of running a node, we should be able to keep the network infrastructure decentralized even while increasing the burden placed upon node operators.

    The demographics of node operators will likely continue to change, but I encourage bitcoin users to embrace changes to the ecosystem so long as the fundamental property of decentralization remains intact.
