
Privacy & Deanonymization

The single most expensive misconception in Bitcoin is “it’s anonymous.” It is pseudonymous: every transaction is public, permanent, and globally searchable, and the entire field of chain analysis exists to link those pseudonyms back to real people. A senior engineer who understands the heuristics below understands both why surveillance works and why the countermeasures are shaped the way they are.

The threat model: a permanent, public graph

Every transaction is a node in a graph; every input/output edge connects coins through time. Analysts don’t need to break any cryptography — they mine structure and metadata:

addr X ──┐                ┌── addr P (payment?)
addr Y ──┼─► [ TX ] ──────┤
addr Z ──┘                └── addr Q (change?)
(inputs)                      (outputs)

Their job is to cluster addresses into real-world identities and to label clusters using off-chain data (exchange KYC records, public donation addresses, leaked databases, web-scraped addresses, IP logs from transaction relay). The heuristics below are how the clustering happens.

Heuristic 1 — Common-Input-Ownership (the big one)

If a transaction has multiple inputs, assume they’re all controlled by the same entity.

inputs: [coin A] + [coin B] + [coin C] → one signer had the keys to all three

This is usually true, because to sign a transaction you need the private keys for every input. One spend that combines coins from five of your past addresses collapses all five into one cluster. This heuristic is the workhorse of every chain-analysis firm — and exactly what dusting attacks (see UTXO set vs chain) try to trigger. CoinJoin exists specifically to break it.
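Common-input-ownership clustering amounts to a union-find over addresses. The Python sketch below assumes a simplified, hypothetical transaction shape (a dict with an `"inputs"` list of the addresses being spent). Real tooling would parse raw transactions first. It would also have to recognize CoinJoins, which this sketch blindly merges into one false cluster.

```python
# Hypothetical input shape: each transaction is {"inputs": [address, ...]}.


class UnionFind:
    """Disjoint sets over addresses; each set is one presumed owner."""

    def __init__(self):
        self.parent = {}

    def find(self, addr):
        # Unseen addresses start as their own singleton cluster.
        self.parent.setdefault(addr, addr)
        # Path halving: point nodes at their grandparent while walking up.
        while self.parent[addr] != addr:
            self.parent[addr] = self.parent[self.parent[addr]]
            addr = self.parent[addr]
        return addr

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_by_common_inputs(transactions):
    """Apply common-input-ownership: all inputs of one tx share an owner."""
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            uf.find(addr)  # register even single-input spenders
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)
    clusters = {}
    for addr in list(uf.parent):
        clusters.setdefault(uf.find(addr), set()).add(addr)
    return list(clusters.values())


if __name__ == "__main__":
    txs = [
        {"inputs": ["A", "B"]},  # A and B are co-spent
        {"inputs": ["B", "C"]},  # C joins the cluster through B
        {"inputs": ["D"]},       # a single input links nothing new
    ]
    print(sorted(sorted(c) for c in cluster_by_common_inputs(txs)))
    # [['A', 'B', 'C'], ['D']]
```

Each co-spend only ever merges clusters. A single transaction that spends a coin from an unrelated cluster is enough to join two identities permanently.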

Heuristic 2 — Change-output detection

A typical payment has two outputs: the payment and the change coming back to you. If an analyst can pick out which output is change, they’ve found a fresh address that still belongs to the sender — extending the cluster forward in time. Tell-tale signs of the change output:

  • The round-number tell. You pay 0.5 BTC exactly; the other output is 0.4173829 BTC. The precise, un-round one is almost certainly change.
  • Address-type fingerprinting. If inputs are Taproot (bc1p…) and one output is Taproot while the other is legacy, the matching-type output is likely your wallet’s change.
  • Address reuse. If one output address has been seen before in the sender’s cluster, it’s change.
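  • Unnecessary-input heuristic. Wallets don’t add inputs they don’t need, so if a subset of the inputs would already have covered an output, that output is probably change. With inputs of 2 and 3 BTC and outputs of 4 and 1 BTC, either input alone could have paid 1 BTC, so the 1 BTC output is likely change and the 4 BTC output the payment.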
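The four signals can be combined into a simple per-output score. This is a hedged sketch, not any chain-analysis firm’s actual method. The `Output` type, the script-type labels, the three-decimal rounding threshold and the equal weighting are all illustrative assumptions. The unnecessary-input check uses the simplified rule “smaller than every input” and ignores fees.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Output:
    address: str       # placeholder identifier, not a real address
    script_type: str   # illustrative labels such as "p2tr" or "p2pkh"
    value: Decimal     # amount in BTC


def is_round(value: Decimal, places: int = 3) -> bool:
    """True if the amount needs no more than `places` decimal digits."""
    return value == value.quantize(Decimal(1).scaleb(-places))


def change_scores(input_values, input_script_type, outputs, known_cluster_addrs):
    """Return one score per output; a higher score means 'more likely change'."""
    scores = []
    for out in outputs:
        score = 0
        # Round-number tell: the un-round output is more likely change.
        if not is_round(out.value):
            score += 1
        # Address-type fingerprint: change tends to match the inputs' type.
        if out.script_type == input_script_type:
            score += 1
        # Address reuse: the address is already in the sender's cluster.
        if out.address in known_cluster_addrs:
            score += 1
        # Unnecessary input (simplified, fees ignored): if every single input
        # could have paid this output on its own, spending several inputs was
        # unnecessary for it, so it is probably change.
        if len(input_values) > 1 and out.value < min(input_values):
            score += 1
        scores.append(score)
    return scores


if __name__ == "__main__":
    outputs = [
        Output("payee-addr", "p2pkh", Decimal("0.5")),
        Output("fresh-addr", "p2tr", Decimal("0.4173829")),
    ]
    scores = change_scores([Decimal("0.6"), Decimal("0.32")], "p2tr", outputs, set())
    print(scores)  # [0, 2]
```

With the 0.5 BTC payment from the round-number example, the un-round output that matches the Taproot inputs scores highest and would be flagged as change.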

Heuristic 3 — Amount, timing, and network metadata

  • Amount correlation. A distinctive amount (say 1.337 BTC) appearing as an input shortly after appearing as an output elsewhere links the two, even across several hops (“peeling chains”).
  • Timing & round-trips. Funds that leave an exchange and return in a recognizable pattern betray the owner.
  • Network-level leaks. The node that first broadcasts a transaction is often near its originator. Without Tor or Dandelion-style protections, your IP can be tied to your transactions however clean your on-chain behavior is. Privacy has to hold at both the on-chain and the network layer.
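Amount and timing correlation can be sketched as a windowed match between amounts leaving one place and amounts arriving elsewhere. The `(txid, unix_time, amount)` tuples, the one-hour window and the fee tolerance below are assumptions for illustration. The technique only works well for distinctive amounts, since a round value like 1 BTC would match countless unrelated transactions.

```python
from decimal import Decimal


def correlate_amounts(sent, received, max_fee=Decimal("0.0005"), window_s=3600):
    """Pair outgoing and incoming amounts that look like the same funds.

    `sent` and `received` are lists of (txid, unix_time, amount) tuples, a
    hypothetical shape. A pair matches when the received amount is at most
    `max_fee` below the sent amount and arrives within `window_s` seconds.
    """
    matches = []
    for s_id, s_time, s_amt in sent:
        for r_id, r_time, r_amt in received:
            in_window = 0 <= r_time - s_time <= window_s
            fee_sized_gap = Decimal(0) <= s_amt - r_amt <= max_fee
            if in_window and fee_sized_gap:
                matches.append((s_id, r_id))
    return matches


if __name__ == "__main__":
    sent = [("w1", 1_000, Decimal("1.337")), ("w2", 1_100, Decimal("0.5"))]
    received = [("d1", 1_900, Decimal("1.3368")), ("d2", 9_000, Decimal("0.5"))]
    print(correlate_amounts(sent, received))  # [('w1', 'd1')]
```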

Defense 1 — CoinJoin: break common-input-ownership

A CoinJoin is a single transaction in which many people contribute inputs and receive equal-valued outputs, so the common-input-ownership heuristic produces a false cluster.

inputs: Alice, Bob, Carol, Dave   outputs: 0.1 BTC ▸ ?
                                           0.1 BTC ▸ ?
                                           0.1 BTC ▸ ?   ← which output belongs
                                           0.1 BTC ▸ ?     to which input?

With equal-sized outputs, an observer cannot tell which output belongs to which input; the linkage is hidden in a combinatorial haystack. The bigger the anonymity set (the number of participants receiving equal outputs), the stronger the privacy. CoinJoin is collaborative and trustless: no participant can steal another’s coins, because each signs only their own input, and only after checking that the final transaction pays them their output. Without every participant’s signature, the transaction is invalid.
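In a simplified, amount-only view, an output’s anonymity set is the number of outputs that share its exact value. The sketch below computes that. It ignores other fingerprints (script types, timing, network data) that real analysis would also use, and the example values are made up.

```python
from collections import Counter
from decimal import Decimal


def anonymity_sets(output_values):
    """For each output, count how many outputs share its exact value.

    Amount-only model: other fingerprints are ignored, so the real
    anonymity set can be smaller than the number returned here.
    """
    counts = Counter(output_values)
    return [counts[v] for v in output_values]


if __name__ == "__main__":
    # Four equal 0.1 BTC outputs plus two made-up unequal amounts.
    outputs = [Decimal("0.1")] * 4 + [Decimal("0.0213"), Decimal("0.0571")]
    print(anonymity_sets(outputs))  # [4, 4, 4, 4, 1, 1]
```

Each equal 0.1 BTC output hides among four candidates. Any output with a unique amount has a set of 1 and gets no cover from the CoinJoin, which is why equal amounts are essential.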

Defense 2 — PayJoin (P2EP): poison the heuristics

A PayJoin is a payment where the receiver also contributes an input. It looks like an ordinary transaction, but it quietly breaks two assumptions at once:

  • Common-input-ownership becomes false — the inputs are owned by two parties (payer and payee), yet an analyst will wrongly cluster them as one.
  • The amounts are misleading — because the receiver added value, the on-chain output amounts no longer correspond to the actual payment amount.

inputs:  [payer's coin] + [RECEIVER's coin]       ← analyst assumes one owner: WRONG
outputs: [payment + receiver's input] + [change]  ← on-chain amount ≠ real payment: WRONG

PayJoin’s beauty is that it’s steganographic — it’s indistinguishable from a normal payment, so it degrades the entire chain-analysis dataset, not just the privacy of the people using it. If enough payments might be PayJoins, analysts can no longer trust the common-input heuristic at all.
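A worked PayJoin with made-up numbers shows both broken assumptions. The sketch assumes one input per party and that the payer covers the whole fee. Real PayJoin protocols may arrange inputs and fees differently. The owner labels are ground truth that an on-chain observer never sees.

```python
from decimal import Decimal


def build_payjoin(payer_input, receiver_input, payment, fee):
    """Return (inputs, outputs) of a two-party PayJoin as (owner, BTC) pairs."""
    inputs = [("payer", payer_input), ("receiver", receiver_input)]
    outputs = [
        ("receiver", payment + receiver_input),  # payment plus receiver's own coin
        ("payer", payer_input - payment - fee),  # payer's change
    ]
    return inputs, outputs


if __name__ == "__main__":
    ins, outs = build_payjoin(Decimal("0.5"), Decimal("0.2"), Decimal("0.1"), Decimal("0.001"))
    # Common-input-ownership would merge these inputs, but they have 2 owners.
    print(len({owner for owner, _ in ins}))  # 2
    # No on-chain output equals the real 0.1 BTC payment.
    print([value for _, value in outs])  # [Decimal('0.3'), Decimal('0.399')]
    print(any(value == Decimal("0.1") for _, value in outs))  # False
```

An analyst who reads the transaction the usual way sees one owner and a 0.3 BTC or 0.399 BTC payment. Both readings are wrong.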

You cannot make Bitcoin perfectly anonymous, but you can dramatically raise the cost of surveilling you:

DO                                      AVOID
────────────────────────────────────    ────────────────────────────────────────
fresh address per receipt               address reuse
coin control (label/freeze UTXOs)       blindly merging all coins
CoinJoin / PayJoin where appropriate    consolidating right after KYC withdrawal
broadcast over Tor                      leaking your IP at broadcast
keep KYC and non-KYC coins separate     co-spending them in one input set
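Coin control and KYC separation can be enforced at coin-selection time. This hypothetical sketch attaches a label to each UTXO and skips frozen ones. It selects largest-first from a single label only, and refuses rather than mixing pools when that label can’t cover the target. Fees are ignored. The `Utxo` type and its labels are assumptions, not any wallet’s API.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Utxo:
    outpoint: str       # placeholder "txid:vout" identifier
    value: Decimal      # amount in BTC
    label: str          # user-assigned label, e.g. "kyc" or "non-kyc"
    frozen: bool = False


def select_coins(utxos, target, label):
    """Pick unfrozen coins with one label covering `target`, or return None.

    Never looks outside the requested label, so KYC and non-KYC coins are
    never co-spent in one input set.
    """
    pool = sorted(
        (u for u in utxos if u.label == label and not u.frozen),
        key=lambda u: u.value,
        reverse=True,  # largest-first keeps the input count low
    )
    chosen, total = [], Decimal(0)
    for u in pool:
        if total >= target:
            break
        chosen.append(u)
        total += u.value
    return chosen if total >= target else None


if __name__ == "__main__":
    wallet = [
        Utxo("a:0", Decimal("0.3"), "kyc"),
        Utxo("b:0", Decimal("0.2"), "non-kyc"),
        Utxo("c:1", Decimal("0.25"), "kyc", frozen=True),
        Utxo("d:0", Decimal("0.1"), "kyc"),
    ]
    picked = select_coins(wallet, Decimal("0.35"), "kyc")
    print([u.outpoint for u in picked])  # ['a:0', 'd:0']
    print(select_coins(wallet, Decimal("0.5"), "kyc"))  # None
```

Largest-first is a deliberate choice here. Every extra input is one more address that common-input-ownership merges into your cluster.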

How does this help untrusting strangers agree on one ledger? The very property that lets strangers verify the ledger without trust — everything is public — is the same property that makes privacy hard. Bitcoin’s transparency is load-bearing for consensus and corrosive for anonymity; privacy techniques are the ongoing effort to reclaim confidentiality without asking anyone to trust a hidden ledger. It’s the central tension of an open, verifiable money.

  1. Why is “pseudonymous, not anonymous” the precise description? What’s permanent and public?
  2. State the common-input-ownership heuristic and explain why it’s usually correct.
  3. Give three independent signals an analyst uses to identify the change output.
  4. How does CoinJoin defeat common-input-ownership, and why are equal output amounts essential?
  5. What two heuristics does PayJoin break at once, and why does its being indistinguishable from a normal payment help everyone’s privacy, not just the participants’?