Merkle Trees & SPV Proofs

A block header has room for exactly one 32-byte commitment to its transactions — the Merkle root. But a block can hold thousands of transactions. How can 32 bytes faithfully stand in for all of them, in a way that lets anyone prove a specific transaction is inside without downloading the rest? The answer is a Merkle tree, and it is the quiet workhorse that makes lightweight (“SPV”) wallets possible. This page builds the tree from hash functions up, then shows the proof that powers it.

Building the tree

Take every transaction in the block, hash each one, then repeatedly pair and hash until a single hash remains:

                      ROOT = H(H_AB , H_CD)        ← goes in the block header
                     /                    \
            H_AB = H(H_A,H_B)      H_CD = H(H_C,H_D)
             /          \            /          \
        H_A=H(txA)  H_B=H(txB)  H_C=H(txC)  H_D=H(txD)
           |           |           |           |
          txA         txB         txC         txD

Each node is the hash of its two children; the leaves are the (double-SHA-256) hashes of the individual transactions. The single hash at the top is the Merkle root. Because hashing is an avalanche function, changing any single transaction changes its leaf, which changes its parent, and so on all the way up to the root — and the root sits in the header, which sits under the block’s proof of work. The transactions are cryptographically welded to the block.

The payoff: a Merkle proof in O(log n)

The real magic isn’t the root — it’s the proof of inclusion, also called an authentication path or Merkle branch. To convince someone that txC is in a block, you don’t hand them all the transactions. You hand them just the sibling hashes along the path from txC to the root:

To prove txC is in the block, provide:  H_D  and  H_AB

Verifier recomputes:
    H_C  = H(txC)               (they hash the tx themselves)
    H_CD = H(H_C , H_D)         (using provided H_D)
    ROOT = H(H_AB , H_CD)       (using provided H_AB)

If ROOT == the header's Merkle root  →  txC really is in this block.

For a block of n transactions, the path has only about log₂(n) hashes. A block with a million transactions would need a proof of roughly 20 hashes — under a kilobyte — instead of the entire multi-megabyte block. That logarithmic scaling is the entire reason Merkle trees exist here: it makes “prove membership” absurdly cheap.

Block size (txs)	Hashes in a Merkle proof
16	4
1,024	10
1,048,576	20

SPV: verifying without the whole chain

This is what lets a phone wallet work. A full node stores and validates every block in full. An SPV (Simplified Payment Verification, from Section 8 of Satoshi’s whitepaper) client does not. Instead it downloads only the 80-byte headers of every block — a few hundred megabytes for all of history — which lets it follow the most-work chain and check each block’s proof of work. When it needs to know whether a specific payment confirmed, it asks a full node for a Merkle proof and verifies the transaction against the header it already trusts:

SPV client holds:  every block HEADER (incl. trusted Merkle roots + PoW)
SPV client asks:   "give me the Merkle proof for my transaction txC"
SPV client checks: recompute path → does it match the header's root?
Result:            "yes, txC is buried in a block with real work on top"

Two sharp edges worth knowing

The duplicate-last-hash quirk

Merkle trees pair nodes two at a time, but block transaction counts are usually odd. Bitcoin’s rule when a level has an odd number of nodes is to duplicate the last hash and pair it with itself:

   level with 3 nodes:   H_A   H_B   H_C
   pad by duplicating:    H_A   H_B   H_C  (H_C)
   pairs:                (H_A,H_B)   (H_C,H_C)

It’s a small implementation detail, but it has bitten Bitcoin once. Because a node could also be formed from a duplicated child, an attacker could craft a different transaction list that produced the same Merkle root — a vulnerability cataloged as CVE-2012-2459. A malformed block could be made to look valid to a careless verifier while differing internally. It was fixed by having nodes detect and reject blocks containing such duplicated structures. The lesson is general: the security of a commitment depends on there being exactly one way to produce a given root, and padding rules must not quietly break that.

The Merkle root is only as good as the header it sits in

A Merkle proof tells you a transaction matches a root. It is only meaningful if you independently trust that root — i.e. that it came from a real header with real proof of work buried under it. Feed an SPV client fake headers and its proofs become worthless; this is why SPV clients follow the most-work header chain, not just any headers offered to them.

The thread

How does this help untrusting strangers agree on one ledger? The Merkle tree lets a 32-byte header commit to thousands of transactions, and lets anyone prove a payment is in a block by exchanging a handful of hashes instead of the whole block. A stranger with only the headers can confirm “my money landed, and there’s real work piled on top” — verifying inclusion in the shared ledger without trusting the messenger and without storing the ledger. It is how the smallest device on the network can still participate in the same truth as the largest.

Check your understanding

How is the Merkle root constructed from a block’s transactions, and why does changing one transaction change the root?
What exactly is in a Merkle proof, and why does its size grow only as log₂(n)?
What does an SPV client download and store, and what does a Merkle proof let it verify?
State clearly what SPV proves and what it does not — and why that’s a trust trade-off.
What is the duplicate-last-hash rule, and what problem (CVE-2012-2459) arose from it?