Testing full validation syncing performance of 7 Bitcoin node implementations.
As I’ve noted many times in the past, backing your bitcoin wallet with a fully validating node gives you the strongest security model and privacy model that is available to Bitcoin users. Three years ago I started running an annual comprehensive comparison of various implementations to see how well they performed full blockchain validation. Now it's time to see what has changed over the past year!
The computer I use as a baseline is high-end but uses off-the-shelf hardware. I bought this PC at the beginning of 2018. It cost about $2,000 at the time. I'm using a gigabit internet connection to ensure that it is not a bottleneck.
Note that no Bitcoin implementation strictly fully validates the entire chain history by default. As a performance improvement, most of them don’t validate signatures before a certain point in time. This is considered safe because those blocks and transactions are buried under so much proof of work. Creating a blockchain that contained invalid transactions before that point would cost so much in mining resources that it would fundamentally break certain security assumptions upon which the network operates.
For the purposes of these tests I need to control as many variables as possible; some implementations may skip signature checking for a longer period of time in the blockchain than others. As such, the tests I'm running do not use the default settings - I change one setting to force the checking of all transaction signatures and I often tweak other settings in order to make use of the higher number of CPU cores and amount of RAM on my machine.
The amount of data in the Bitcoin blockchain is relentlessly increasing with every block that is added, thus it's a never-ending struggle for node implementations to continue optimizing their code in order to prevent the initial sync time for a new node from becoming obscenely long. After all, if it becomes unreasonably expensive or time-consuming to start running a new node, most people who are interested in doing so will choose not to, which weakens the overall robustness of the network.
Last year's test was for syncing to block 655,000 while this year's is syncing to block 705,000. This is a data increase of 20% from 307.8GB to 369.5GB. As such, we should expect implementations that have made no performance changes to take about 20% longer to sync than 1 year ago.
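The 20% figure can be reproduced from the two blockchain sizes quoted above. This is a minimal sketch; the helper names are my own, and the linear scaling in `expected_sync_hours` is the same simplifying assumption the text makes (sync time proportional to data size).

```python
# Blockchain sizes reported in the text, in GB.
SIZE_AT_655K_GB = 307.8
SIZE_AT_705K_GB = 369.5


def growth_fraction(old_size: float, new_size: float) -> float:
    """Fractional increase from old_size to new_size."""
    return (new_size - old_size) / old_size


def expected_sync_hours(last_year_hours: float, growth: float) -> float:
    """Naive estimate that assumes sync time scales linearly with data size."""
    return last_year_hours * (1 + growth)


growth = growth_fraction(SIZE_AT_655K_GB, SIZE_AT_705K_GB)
print(f"data growth: {growth:.1%}")
print(f"a 10 hour sync would become {expected_sync_hours(10, growth):.1f} hours")
```

An implementation that beats this estimate has effectively become faster per byte of chain data; one that exceeds it has hit a new bottleneck somewhere.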
What's the absolute best case syncing time we could expect if you had limitless bandwidth and disk I/O? Since you have to perform over 2.08 billion ECDSA verification operations in order to reach block 705,000 and it takes my machine about 4,600 nanoseconds per operation via libsecp256k1... it would take my machine 2.66 hours to verify the entire blockchain if bandwidth and disk I/O were not bottlenecks. Note that last year it took my machine 7,000 nanoseconds per ECDSA verify operation; the secp256k1 library keeps being further optimized.
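The best-case estimate is simple arithmetic, sketched below with the figures from the paragraph above. The function name is my own, and the calculation assumes the verifications run back to back with no other overhead, as the text's estimate does.

```python
# Figures quoted in the text.
ECDSA_VERIFICATIONS = 2.08e9  # signature checks needed to reach block 705,000
NS_PER_VERIFY_THIS_YEAR = 4_600
NS_PER_VERIFY_LAST_YEAR = 7_000


def verify_hours(operations: float, ns_per_op: float) -> float:
    """Hours needed to run all verifications one after another."""
    seconds = operations * ns_per_op / 1e9
    return seconds / 3600


this_year = verify_hours(ECDSA_VERIFICATIONS, NS_PER_VERIFY_THIS_YEAR)
at_old_speed = verify_hours(ECDSA_VERIFICATIONS, NS_PER_VERIFY_LAST_YEAR)
print(f"at 4,600 ns per verify: {this_year:.2f} hours")
print(f"at 7,000 ns per verify: {at_old_speed:.2f} hours")
```

The second line shows what the same workload would have cost at last year's verification speed, which gives a feel for how much the library optimizations alone are worth.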
On to the results!
Bcoin 2.2.0
On initial start bcoin spent a long time just trying to connect to peers it had saved locally from last year's run. For some reason it didn't start syncing even though it had connected to 7 of the 8 desired outbound peers. After 20 minutes I killed bcoin, deleted the hosts.json file, and tried syncing again. It began downloading blocks immediately without waiting to have 8 peers connected.
On its first run, downloading blocks from publicly available peers:
- Reached block 655,000 in 40 hours 30 min. 85% slower than last year.
- Reached block 705,000 in 47 hours. This is 114% longer than last year!
Something weird's happening on the public network that I'll investigate and discuss in a later post. Syncing from a node on my local network did much better.
- Reached block 655,000 in 24 hours 9 min. 10% slower than last year.
- Reached block 705,000 in 27 hours 46 min. 27% longer than last year.
In terms of disk I/O, bcoin 2.2.0 syncing to height 705,000 used:
- 90 MB disk reads
- 3.5 TB disk writes
Why did bcoin take more than the expected 20% longer to sync this year? A few theories:
- bcoin's utxo cache is limited in comparison to other nodes due to nodejs heap size, thus it ends up doing far more disk writes.
- perhaps bcoin's p2p networking logic isn't as optimized when only downloading from a single peer on my local network.
Bitcoin Core 0.22
My first run syncing from publicly available peers took exactly 8 hours, which was 40% longer than last year and far worse than the expected 20% increase!
Syncing from a node on my local network:
- Reached block 655,000 in 5 hours 29 min. 3% faster than last year.
- Reached block 705,000 in 6 hours 34 min. 15.5% longer than last year.
The full sync used:
- 10GB RAM
- 140MB disk reads
- 393GB disk writes
btcd v0.22.0-beta
btcd took only 24% longer to sync 20% more data, so its developers seem to be treading water lately. I do believe there is a lot of room left for improvement, though, especially in terms of UTXO caching. Due to the amount of disk I/O (13 TB of disk writes), I expect sync time for btcd on a spinning disk would be far longer.
Gocoin 1.9.9
I made the following changes to get maximum performance according to the documentation:
- Installed secp256k1 library and built gocoin with sipasec.go
- Disabled wallet functionality via
Gocoin still has the best dashboard of any node implementation.
Gocoin even has some ECDSA benchmark tests that you can run quite easily. I ran them several times and got a result of each ECDSA verification taking ~7,000 nanoseconds. Then I updated and recompiled libsecp256k1 and the benchmark tests showed each verification taking ~4,600 nanoseconds.
However, after 20 minutes of syncing... the node crashed! It received corrupt block data from a peer. To be specific, the number of transactions in the block did not match the "transaction length" field in the block data.
I contacted the software maintainer and within a matter of minutes they issued a one-line patch, which I applied manually before rebuilding and starting a new sync.
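In Bitcoin's block serialization, the transaction count is stored as a CompactSize integer immediately after the 80-byte header. The sketch below shows the kind of consistency check that would catch the corrupt block described above. It is not Gocoin's actual code: `check_tx_count` and its `parsed_tx_count` argument are hypothetical, and the transaction parsing itself is assumed to happen elsewhere.

```python
import struct

HEADER_SIZE = 80  # a Bitcoin block header is always 80 bytes


def read_compact_size(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a CompactSize integer and return (value, offset after it)."""
    first = data[offset]
    if first < 0xFD:
        return first, offset + 1
    if first == 0xFD:
        return struct.unpack_from("<H", data, offset + 1)[0], offset + 3
    if first == 0xFE:
        return struct.unpack_from("<I", data, offset + 1)[0], offset + 5
    return struct.unpack_from("<Q", data, offset + 1)[0], offset + 9


def check_tx_count(raw_block: bytes, parsed_tx_count: int) -> None:
    """Raise if the declared count differs from the number of parsed transactions."""
    declared, _ = read_compact_size(raw_block, HEADER_SIZE)
    if declared != parsed_tx_count:
        raise ValueError(
            f"block declares {declared} transactions but {parsed_tx_count} were parsed"
        )


# Toy block: an all-zero header followed by a declared count of 2.
toy_block = bytes(HEADER_SIZE) + b"\x02"
check_tx_count(toy_block, 2)  # passes silently
```

Rejecting the block and disconnecting the peer is a friendlier failure mode than crashing, since a single misbehaving peer can otherwise halt the whole sync.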
- Reached block 655,000 after 6 hours 45 min. 12% faster than last year.
- Reached block 705,000 in 8 hours 6 min. 6% longer than last year.
- 14 GB RAM
- 701 GB download bandwidth
- 22 MB disk reads
- 298 GB disk writes
It's particularly noteworthy that Gocoin does not seem to be suffering from the public network bottleneck issues I'm seeing in other implementations this year. My initial guess is that this is for 3 reasons:
- Gocoin makes 20 peer connections rather than 8
- It allocates a larger cache for downloading future blocks that have not yet been verified - it usually keeps 2500 to 5000 blocks in this cache, while other implementations use more like ~1,000 blocks.
- Seems like it may be downloading the same block from multiple peers and storing whichever block gets downloaded first.
Libbitcoin Node 3.6.0
Libbitcoin Node wasn’t too bad, though I had to build from source since there isn't a recent precompiled release. So I had to clone the git repository, check out the “version3” branch and run the install.sh script to compile the node and its dependencies.
Libbitcoin Node takes 27% longer to sync to chain tip this year than it did last year.
My libbitcoin config:
peer = <local network node IP address>:8333
# I set this to the number of virtual cores since default is physical cores
cores = 12
checkpoint = 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f:0
cache_capacity = 100000
block_latency_seconds = 5
During the course of syncing, Libbitcoin Node used:
- 4.5 TB disk reads
- 62 TB disk writes
- 6 GB RAM
- 700 GB download bandwidth
Seems there are probably more caching improvements that could be made. While it was using all the CPU cores, they were only hovering around 30%. The bottleneck is disk I/O, as I was seeing disk reads hovering in the 150 MB/s to 220 MB/s range.
Mako
Mako is a new implementation by Chris Jeffrey that's written in C. In order to create a production-optimized build:
I manually enabled script verification threading by uncommenting this line.
Next I built the project via: cmake . -DCMAKE_BUILD_TYPE=Release
And ran it with
CPU usage was only around 50%. The bottleneck here is clearly disk I/O, as I observed mako constantly writing 100 MB/s. This is because mako has not yet implemented a UTXO cache and thus has to write UTXO set changes to disk every time it processes a transaction.
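To illustrate why a UTXO cache cuts disk writes, here is a toy simulation. It does not reflect mako's (or any node's) real storage code: `CountingDisk`, both sync functions, and the synthetic workload are assumptions made for illustration. The idea is that an output created and spent before the cache is flushed never needs to reach disk at all.

```python
class CountingDisk:
    """Stand-in for a database that only counts the entries written to it."""

    def __init__(self) -> None:
        self.entries_written = 0

    def write(self, changes: dict) -> None:
        self.entries_written += len(changes)


def sync_without_cache(transactions, disk) -> None:
    """Write every UTXO change to disk as each transaction is processed."""
    for changes in transactions:
        disk.write(changes)


def sync_with_cache(transactions, disk, max_entries: int) -> None:
    """Buffer changes in memory and flush only when the buffer fills up."""
    pending = {}
    for changes in transactions:
        for outpoint, coin in changes.items():
            created_in_cache = pending.get(outpoint) is not None
            if coin is None and created_in_cache:
                # Created and spent before any flush: nothing to persist.
                del pending[outpoint]
            else:
                pending[outpoint] = coin  # None marks a spent output
        if len(pending) >= max_entries:
            disk.write(pending)
            pending = {}
    disk.write(pending)  # final flush


def chained_transactions(n: int) -> list:
    """Synthetic workload: each transaction spends the previous one's output."""
    txs = [{"tx0:0": "coin"}]
    for i in range(1, n):
        txs.append({f"tx{i - 1}:0": None, f"tx{i}:0": "coin"})
    return txs


workload = chained_transactions(1000)
uncached, cached = CountingDisk(), CountingDisk()
sync_without_cache(workload, uncached)
sync_with_cache(workload, cached, max_entries=100)
print(uncached.entries_written, cached.entries_written)
```

This workload is deliberately the best case for a cache, so the gap is exaggerated. Real chains contain many long-lived outputs, but short-lived outputs are common enough that the effect on write volume is still large.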
Mako also started causing high load / 10 second pauses on my machine after block 400,000 or so. This seems to be some kind of OS / file system flushing issue.
During the course of syncing, Mako used:
- 10 GB disk reads
- 6 TB disk writes
Parity Bitcoin
On October 15th 2021, the Parity Bitcoin README was updated to note:
THIS IS UNMAINTAINED HISTORICAL SOFTWARE
As such, I'll no longer be testing this implementation.
Stratis 3.0.6
Last year's Stratis sync was a failure because their UTXO database did not actually delete UTXOs and thus you needed several terabytes of disk space to complete syncing. Since then they have released a completely new implementation in a different GitHub repository.
My Stratis bitcoin.conf:
7.5 GB RAM
13.5 GB disk reads
1 TB disk writes
It froze at height 362314 and then I had to force kill the process. I was able to run the process again and it resumed syncing. Unfortunately it only ran another 2 days before crashing. I'm marking this as "incomplete."
warpd
I made a note to check out warpd when I heard about it earlier this year. Unfortunately it looks like they never got to the point of actually storing and syncing blockchain data; at the moment it's a barebones network protocol shell that you can run to interact with a peer node one message at a time.
Disk Resource Usage
Reading and writing data to disk is one of the slowest operations you can ask a computer to perform. As such, node implementations that keep more data in fast-access RAM will likely outperform those that have to interact with disks. It's worth noting that the figures for some implementations, like Bcoin, may be underreported because they spawn child processes whose resource usage doesn't get aggregated up to the master process.
Total time to sync to block 705,000, fastest first:
- Bitcoin Core v22: 6 hours, 53 minutes
- Gocoin 1.9.9: 8 hours, 6 minutes
- Mako 3d8a5180: 1 day, 10 minutes
- Bcoin 2.2.0: 1 day, 3 hours, 47 minutes
- Libbitcoin Node 3.2.0: 1 day, 22 hours, 41 minutes
- BTCD v0.22.0-beta: 4 days, 7 hours, 15 minutes
- Stratis 3.0.6: Did not complete; definitely over a week
Rankings are unchanged from last year apart from the new entrant, Mako.
Delta vs Last Year's Tests
We can see that most implementations are taking longer to sync, which is to be expected given the append-only nature of blockchains. Remember that the total size of the blockchain has grown by 20% since my last round of tests, thus we would expect that an implementation with no new performance improvements or bottlenecks should take ~20% longer to sync.
- Bitcoin Core v22: +1 hour, 5 minutes (15.5% longer)
- Gocoin 1.9.9: +1 hour, 21 minutes (15.5% longer)
- BTCD v0.22.0-beta: +19 hours, 44 minutes (24% longer)
- Bcoin 2.2.0: +3 hours, 37 minutes (27% longer)
- Libbitcoin Node 3.2.0: +9 hours, 11 minutes (33% longer)
As we can see, Bitcoin Core and Gocoin are the only implementations whose sync time grew by less than the ~20% growth in blockchain data, which means their syncing performance has improved since last year's tests. I think this is mostly attributable to secp256k1 library optimizations.
Exact Comparisons Are Difficult
While I ran each implementation on the same hardware to keep those variables static, there are other factors that come into play.
- There’s no guarantee that my ISP was performing exactly the same throughout the duration of all the syncs.
- Some implementations may have connected to peers with more upstream bandwidth than other implementations. This could be random or it could be due to some implementations having better network management logic.
- Not all implementations have caching; even when configurable cache options are available it’s not always the same type of caching.
- Not all nodes perform the same indexing functions. For example, Libbitcoin Node always indexes all transactions by hash — it’s inherent to the database structure. Thus this full node sync is more properly comparable to Bitcoin Core with the transaction indexing option enabled.
- Your mileage may vary due to any number of other variables such as operating system and file system performance.
- As I mentioned a few times, something odd is happening with publicly available nodes that seems to be slowing down some of the implementations. That investigation will require a separate article.
Update: I ended up doing more research on the bandwidth made available by peers on the network and published my results here:
Gocoin & Libbitcoin Node do well even with the current state of the publicly available nodes on the network because they are bandwidth greedy. They'll download a block from multiple peers and take the one that arrives first. Other nodes would likely benefit from adopting this approach, as bandwidth can be a bottleneck, especially given the unpredictability of peers on the network.
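The "bandwidth greedy" approach can be sketched as a race between peers. This is an illustration only, not code from Gocoin or Libbitcoin Node: `fetch_block` is a hypothetical stand-in that simulates network latency with a sleep, and the peer names and delays are made up.

```python
import asyncio


async def fetch_block(peer: str, height: int, delay: float) -> tuple[str, bytes]:
    """Hypothetical peer request; the sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return peer, f"block-{height}".encode()


async def fetch_from_fastest(height: int, peers: dict[str, float]) -> tuple[str, bytes]:
    """Request the same block from every peer and keep the first response."""
    tasks = [
        asyncio.create_task(fetch_block(peer, height, delay))
        for peer, delay in peers.items()
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the slower downloads are no longer needed
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


# Made-up peers mapped to simulated response times in seconds.
peers = {"slow-peer": 0.3, "fast-peer": 0.01, "medium-peer": 0.1}
winner, block = asyncio.run(fetch_from_fastest(705_000, peers))
print(winner, block)
```

The trade-off is redundant traffic: whatever the slower peers had already sent is thrown away. That may partly explain why Gocoin and Libbitcoin Node each reported roughly 700 GB of download bandwidth for a chain of about 370 GB.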
Given that the strongest security model a user can obtain in a public permissionless crypto asset network is to fully validate the entire history themselves, I think it’s important that we keep track of the resources required to do so.
We know that due to the nature of blockchains, the amount of data that needs to be validated for a new node that is syncing from scratch will relentlessly continue to increase over time. Thus far the tests I run are on the same hardware each year, but on the bright side we do know that hardware performance per dollar will also continue to increase each year.
It's important that we ensure the resource requirements for syncing a node do not outpace the hardware performance that is available at a reasonable cost. If they do, then larger and larger swaths of the populace will be priced out of self sovereignty in these systems.