The Challenges of Optimizing Unspent Output Selection
When we talk about bitcoins being stored on the block chain, most people speak in terms of “address balances” which makes sense from an accounting perspective, but this is not how coins are actually represented in the data structure of the block chain. Instead of an address “containing coins,” “coins” are actually stored as “unspent transaction outputs” or UTXOs for short. A UTXO can be associated with a bitcoin address, though you can also have many different UTXOs that are associated with the same bitcoin address. The “address balance” is the sum of all of the values of UTXOs associated with the address. To learn more about UTXOs from a non-technical viewpoint, I recommend reading Richard Gendel Brown’s “Welcome to Bitcoin Island.” For a technical overview, check out the developer guide.
When Bitcoin wallet software creates a new transaction, it has a fair amount of flexibility in how it can arrange the internal data of the transaction. This is because a user instructs the wallet: “send X bitcoins to address Y” and the wallet needs to comb through the user’s UTXOs in order to find enough outputs with an aggregate value that reaches the target of X bitcoins. Unfortunately there is not a simple, straightforward way to go about selecting UTXOs because there are several considerations to be made.
The naive approach would be to simply look for the smallest output that is larger than the amount you want to spend and use it, otherwise start adding the next largest outputs until you have enough outputs to meet the spend target. However, this leads to fracturing of outputs until the wallet becomes littered with unspendable “dust.” This can be both a performance problem and an irritant to end users if they ever try to sweep a wallet that only has a tiny balance remaining — in a wallet that has been used for many transactions it can result in having so many UTXOs that it’s not feasible to spend them all in a single transaction.
Although a bitcoin transaction can contain hundreds of inputs and hundreds of outputs, this comes at a cost. The more inputs and outputs a transaction has, the larger the data size of the transaction. At time of writing, Bitcoin nodes will reject transactions that are larger than 100KB in size. Also, because the size of blocks is limited (currently 1 MB) you are competing with other people to get your transaction confirmed by miners adding it to a block. If you broadcast a large data size transaction with no fee or an insignificant fee, you risk nodes refusing to relay it and you risk miners choosing to deprioritize confirming it in favor of transactions with higher fees per bytes of data. For reference, I track a number of metrics around transactions and fees with Statoshi — you can see here that the average fee at time of writing is currently ~30 satoshis per byte of transaction data.
If we’re going to optimize the algorithm by which a wallet selects UTXOs for use in transaction creation, we have to decide what our goals are and thus for which transaction attributes we want to optimize. Four broad goals for which I can envision optimizing UTXO selection are as follows:
- Preventing block chain bloat:
A) The creation of small change outputs should be avoided when possible. They have performance costs both for the wallet and for everyone who ingests the block chain. They can also be more expensive for the end user to spend due to dust rules and market competition around fees.
B) Consolidation of small UTXOs: it is natural that over the course of time a wallet will end up having control over many small UTXOs. In order to decrease the size of the wallet’s UTXO set and thus the entire block chain’s UTXO set, it is preferable to spend many very small UTXOs in a single transaction in order to remove them from the UTXO set. - Supporting High Transaction Volume:
A) For high volume wallets that send more than one transaction every few blocks it’s important to keep plenty of confirmed UTXOs available. Otherwise the wallet will have to spend unconfirmed UTXOs, which makes a transaction impossible to confirm until the parent transaction is first confirmed. Thus you may want to create multiple change outputs if the wallet doesn’t have many confirmed UTXOs. - Privacy:
A) UTXOs should be picked non-deterministically and as few different public keys as possible should be involved. When the wallet selects a UTXO, any other UTXOs assigned to the same public key should be preferred in selection.
B) Small inputs that are significantly smaller than the size of the change and don’t share their public key with a larger input shouldn’t be used, as they increase the transaction fee and decrease privacy.
C) If a change output has to be created, it would be preferable to create a change output that is of the same magnitude as the payment value. This helps obscure which output went to a recipient and which was change back to the sender.
D) Change outputs should be inserted at a random position for transactions with multiple outputs, not always first or last. - Minimizing transaction fees:
A) A transaction input set should be preferred when it costs less to send. Which means that the transaction should be as simple and small in data size as possible, having few inputs and outputs. Prioritize only using a single UTXO as the transaction’s input.
B) Prioritizing selection of UTXOs by coin age can also come into play at this point — by sending coins that are old enough a transaction will be eligible to be determined as “high priority” by miners and be confirmed with no fee attached.
C) If spending a UTXO costs more (via additional fee requirements) than what it contributes to the transaction’s output value, it should not be added.
How do real world wallets work? Well, the Bitcoin Core wallet uses some fairly complex logic:
- If any single UTXO matches the Target (spend value) exactly, it will be used.
- If the sum of all UTXOs smaller than the Target happens to match the Target exactly, they will be used. (This prevents bugs while sweeping a wallet.)
- If the sum of all UTXO smaller than the Target doesn’t exceed the target, the smallest UTXO that is larger than the Target will be used.
- Else Bitcoin Core does 1000 rounds of randomly combining unspent transaction outputs until their sum is greater than or equal to the Target. If it happens to find an exact match, it stops early and uses that. The reasoning for this step is noted in the code: “the randomness serves no real security purpose but is just needed to prevent degenerate behavior.”
- Otherwise it finally settles for the smaller of either the smallest UTXO greater than the Target or the smallest combination of UTXO it discovered in Step 4.
- When constructing the final transaction, if any change outputs are small enough to be considered “dust,” instead of creating the outputs Bitcoin Core instead leaves them out and donates the value to the miner as part of the transaction fee.
Bitcoin Core’s logic is optimized to minimize the size of change outputs, that is, goal 1A listed above. Is this the “best” or “correct” way to select UTXOs? It depends upon your perspective — if you value privacy, minimizing transaction fees, or UTXO consolidation then it is suboptimal.
It’s clear that there is no “one size fits all” solution to this problem, and in fact the three broad optimization goals outlined above tend to be in direct opposition. The answer for wallet authors, given the incompatible goals, may be to give the end user more control over the wallet’s UTXO selection algorithm. It should be an advanced feature for power users, but I can imagine a simple drop down where the user selects between optimizing for “Fees / Performance / Privacy.”
These implementation choices are up to wallet engineers and I’m calling on them to be good stewards of the block chain. Remember that we are writing data to a shared resource; we might as well do so prudently. At time of writing there are 17.6M UTXOs that require 596MB of data to store. If Bitcoin continues to increase in popularity, these numbers must necessarily rise. But by being thoughtful block chain engineers we can at least prevent the UTXO set from growing faster than is needed.