
Hash Functions (SHA-256)

If you learn only one cryptographic primitive deeply, make it this one. Hashing is the single most-used building block in Bitcoin: it links blocks, identifies transactions, builds Merkle trees, forms addresses, and is the “work” in Proof of Work. Get this page solid and the rest of the course gets dramatically easier.

A cryptographic hash function takes an input of any size and produces an output of fixed size, called the digest (or just “the hash”).

any data (1 byte or 1 terabyte) ──► H(...) ──► fixed-size fingerprint

Bitcoin uses SHA-256 (Secure Hash Algorithm, 256-bit), part of the SHA-2 family published by NIST in 2001. Its output is always 256 bits = 32 bytes = 64 hexadecimal characters, no matter the input.

Think of a hash as a digital fingerprint: a short, fixed-length identifier that stands in for a much larger piece of data.
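As a minimal sketch of the "any size in, fixed size out" idea, the snippet below uses Python's standard `hashlib` module. The sample inputs are arbitrary choices made for illustration, not values from the course.

```python
import hashlib

# Arbitrary sample inputs of very different sizes (illustrative only).
samples = [b"", b"a", b"hello", b"x" * 1_000_000]

for data in samples:
    digest = hashlib.sha256(data).digest()         # raw digest bytes
    hex_digest = hashlib.sha256(data).hexdigest()  # same value as hex text
    # Every digest is 32 bytes (64 hex characters), whatever the input size.
    print(f"{len(data):>9} bytes in -> {len(digest)} bytes out: {hex_digest[:16]}...")
```

Running it twice should print identical lines, which is the determinism property in action.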

The properties that make it useful (and why Bitcoin needs each)


A cryptographic hash isn’t just any function that shrinks data. It must have a specific set of properties. For each one, note the Bitcoin job it enables.

Deterministic: The same input always produces the same output. H("hello") is the same on every computer, forever. → Why Bitcoin needs it: every node must independently compute the same block hash and the same transaction ID, or they could never agree on one ledger.

Fixed-size output: Any input → exactly 256 bits. → Why: uniform, compact identifiers. A 1-byte transaction and a huge one both get a tidy 32-byte ID.

Fast to compute: Computing H(x) is cheap. → Why: nodes verify enormous numbers of hashes when validating blocks and transactions; it has to be efficient.

Preimage resistance (one-way): Given an output h, it is computationally infeasible to find any input m with H(m) = h. You cannot run the function backwards. → Why: lets Bitcoin commit to data and hide secrets. Addresses, for example, are hashes of public keys; you can publish the hash without exposing what produced it.

Second-preimage resistance: Given a specific input m1, it’s infeasible to find a different input m2 ≠ m1 with the same hash. → Why: you can’t take an existing transaction and craft a different one that shares its ID.

Collision resistance: It’s infeasible to find any two different inputs m1 ≠ m2 with H(m1) = H(m2). → Why: this is what makes the blockchain tamper-evident. If you change even one bit of a past block, its hash changes, which breaks the link to the next block (Part 3). Nobody can secretly swap data while keeping the same fingerprint.

Avalanche effect: Changing the input even slightly, by as little as a single bit, flips roughly half the output bits, in a way that looks completely random and unpredictable. → Why: it makes tampering glaringly obvious, and it makes mining a fair lottery (Part 4): you can’t “nudge” the input toward a desired hash; you can only keep trying.
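The claim that a one-bit change flips roughly half the output bits can be checked directly. The following is a rough sketch using Python's `hashlib`; the helper names `sha256_int` and `flip_bit` are made up for this illustration, and the input is an arbitrary example.

```python
import hashlib

def sha256_int(data: bytes) -> int:
    """Return the SHA-256 digest of data as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit flipped."""
    flipped = bytearray(data)
    flipped[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(flipped)

original = b"hello"
tampered = flip_bit(original, 0)  # differs from original in a single bit

# XOR leaves a 1 wherever the two digests disagree; count those bits.
differing_bits = bin(sha256_int(original) ^ sha256_int(tampered)).count("1")
print(f"{differing_bits} of 256 output bits changed")
```

The exact count varies with the input and the bit you flip, but it should land somewhere near 128 rather than near 1.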

The reason “infeasible” above is not hand-waving: a 256-bit output has 2²⁵⁶ possible values. That’s about 1.16 × 10⁷⁷, a number comparable to the estimated count of atoms in the observable universe. To find a preimage by brute force, you’d expect to try on the order of 2²⁵⁶ inputs. Finding a collision is easier because of the birthday paradox, but it still takes on the order of 2¹²⁸ attempts. No amount of current or foreseeable computing power gets close to either figure. This astronomical size, together with the absence of any known practical shortcut attack on SHA-256, is why one-wayness and collision resistance hold in practice.
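To put numbers on "infeasible", here is a back-of-the-envelope calculation in Python. The attacker speed of 10²¹ hashes per second is purely a hypothetical assumption chosen for illustration; substitute any rate you like and the conclusion barely moves.

```python
# Size of the SHA-256 output space.
outputs = 2 ** 256
print(f"2^256 ~ {outputs:.3e}")  # 1.158e+77

# ASSUMPTION: a hypothetical attacker testing 10^21 hashes per second.
hashes_per_second = 10 ** 21
seconds_per_year = 60 * 60 * 24 * 365

# Brute-force preimage search: on the order of 2^256 attempts.
years_preimage = outputs // (hashes_per_second * seconds_per_year)
print(f"preimage search:  about {years_preimage:.1e} years")

# Brute-force collision search: on the order of 2^128 attempts (birthday bound).
collision_work = 2 ** 128
years_collision = collision_work // (hashes_per_second * seconds_per_year)
print(f"collision search: about {years_collision:.1e} years")
```

Even the "easier" collision search comes out at billions of years under this assumed rate.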

Your Mac has SHA-256 built in (on most Linux systems, the equivalent command is sha256sum). Open a terminal and run:

echo -n "hello" | shasum -a 256

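You should get the following digest (shasum also prints a trailing - to show that the input came from standard input):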

2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

Now change just the capitalization of the first letter:

echo -n "Hello" | shasum -a 256

The output is completely different — not “a little different,” totally unrelated-looking. That’s the avalanche effect with your own eyes.
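The -n flag in those commands matters: without it, echo appends a newline, and that extra byte is hashed too. Below is a small Python sketch of the same experiment using `hashlib`; the helper name `sha256_hex` is invented for this example.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Hash the UTF-8 bytes of text and return the digest as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

lower = sha256_hex("hello")           # same bytes as: echo -n "hello"
upper = sha256_hex("Hello")           # same bytes as: echo -n "Hello"
with_newline = sha256_hex("hello\n")  # what echo without -n would feed in

print(lower)  # 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
print(upper)
print(with_newline)

# Count the hex positions where the first two digests disagree.
changed = sum(a != b for a, b in zip(lower, upper))
print(f"{changed} of 64 hex digits differ between 'hello' and 'Hello'")
```

If a digest you compute does not match a published one, a stray trailing newline is one of the first things to check.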

Type below and watch the digest recompute instantly. Change a single character and notice how many hex digits flip (shown in the caption, and highlighted where they changed) — that’s the avalanche effect, live. Everything is computed locally in your browser.

Where Bitcoin uses hashing (a preview map)


You’ll meet all of these later; here’s the map so the pieces connect:

| Use | What gets hashed | Covered in |
| --- | --- | --- |
| Transaction IDs (txid) | the whole transaction | Part 2 |
| Merkle root | all transactions in a block, tree-hashed | Part 1 (Merkle) / Part 3 |
| Block hash / Proof of Work | the block header | Parts 3–4 |
| Addresses | RIPEMD160(SHA256(public key)) | Parts 1 & 7 |

Bitcoin often hashes things twice: SHA256(SHA256(x)). You’ll see this for block hashes, txids, and Merkle nodes. A commonly cited rationale is defense against length-extension attacks, a known weakness of the Merkle–Damgård construction that SHA-256 is built on: given H(m) and the length of m, an attacker can compute the hash of m followed by chosen extra data without knowing m itself. For now just remember: when Bitcoin says “the hash,” it usually means double SHA-256. We’ll point it out each time it matters.
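Double hashing is simple to express in code. The sketch below uses Python's `hashlib`; the function name `sha256d` is a common informal label, used here as an assumption rather than an official API name.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256: hash the data, then hash the resulting 32-byte digest."""
    first_pass = hashlib.sha256(data).digest()  # raw bytes, not hex text
    return hashlib.sha256(first_pass).digest()

print(hashlib.sha256(b"hello").hexdigest())  # single SHA-256
print(sha256d(b"hello").hex())               # double SHA-256, a different value
```

Note that the second pass hashes the raw 32 bytes of the first digest. Hashing the 64-character hex string instead is an easy mistake and gives a different result.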

  1. What does it mean that a hash is deterministic, and why is that essential for nodes to agree?
  2. Explain preimage resistance vs collision resistance in your own words. Which one makes the blockchain tamper-evident, and why?
  3. What is the avalanche effect, and what two Bitcoin behaviors does it enable?
  4. Roughly how many possible SHA-256 outputs are there, and why does that make brute-forcing a preimage infeasible?
  5. Using the live tool above, change one character in the input. In one sentence, describe what happened to the digest.