A junior engineer picks SHA-256 to hash passwords because “it’s secure — 256 bits!” A senior engineer picks bcrypt. A cryptographer picks Argon2id. They are all hashing, but they are solving completely different problems. The junior’s passwords will be cracked in hours. The senior’s will hold for years. The cryptographer’s will hold for decades.
The difference is not bit length. It is what the algorithm was designed to resist. SHA-256 resists collision attacks. bcrypt resists brute-force on CPUs. scrypt resists GPUs. Argon2id resists everything we know how to throw at it — CPUs, GPUs, FPGAs, and ASICs.
This post opens the hood on the four most widely used hash functions. You will understand the internal mechanics, the specific attack vectors each one defends against, and the precise scenarios where each is the right choice. If you want to see these algorithms applied in a production FastAPI authentication system, the FastAPI Auth series builds on the concepts covered here.
What Hashing Actually Is
A hash function takes arbitrary-length input and produces a fixed-length output (the digest) with three properties:
- Deterministic: Same input always produces the same output
- One-way: Given the output, you cannot recover the input
- Avalanche effect: A single-bit input change flips ~50% of output bits
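The three properties above can be observed directly with Python's standard `hashlib`. This is a small sketch rather than a proof; the helper name `bit_difference` and the sample inputs are illustrative assumptions.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg1 = b"abc"  # last byte 0x63 = 01100011
msg2 = b"abb"  # last byte 0x62 = 01100010 (one bit flipped)

d1 = hashlib.sha256(msg1).digest()
d2 = hashlib.sha256(msg2).digest()

# Deterministic: hashing the same input again gives the same digest
assert hashlib.sha256(msg1).digest() == d1

# Fixed length: 32 bytes regardless of input size
assert len(d1) == len(hashlib.sha256(b"x" * 10_000).digest()) == 32

# Avalanche: a one-bit input change flips roughly half of the 256 output bits
changed = bit_difference(d1, d2)
print(f"{changed} of 256 bits differ ({changed / 256:.0%})")
```

One-wayness cannot be demonstrated by running code; it is a claim about the absence of any efficient inversion method.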
But there are two fundamentally different categories of hash functions, and confusing them is the root of most hashing mistakes:
| Category | Designed For | Speed Goal | Examples |
|---|---|---|---|
| Cryptographic hash | Data integrity, signatures, checksums | As fast as possible | SHA-256, SHA-3, BLAKE3 |
| Password hash | Credential storage | As slow as safely tolerable | bcrypt, scrypt, Argon2id |
A cryptographic hash that processes 1 GB/sec is a feature. A password hash that processes 1 GB/sec is a vulnerability.
SHA-256: The Speed Demon
What It Is
SHA-256 (Secure Hash Algorithm, 256-bit) is part of the SHA-2 family designed by the NSA and published by NIST in 2001. It produces a 256-bit (32-byte) digest from any input.
Where It Is Used
- File integrity: Verifying downloads, Docker image digests, Git object IDs (in repositories that use Git's SHA-256 object format; Git's default is still SHA-1)
- Digital signatures: TLS certificates, JWT signing (HS256)
- Blockchain: Bitcoin mining, Merkle trees
- HMAC: Message authentication codes (HMAC-SHA256)
- API key hashing: Hashing high-entropy secrets for storage (as we did in the FastAPI Auth series)
- Content addressing: CAS systems, deduplication
How It Works Internally
SHA-256 processes input in 512-bit (64-byte) blocks through 64 rounds of bitwise operations. Here is the pipeline:
Step 1: Padding
The message is padded to a multiple of 512 bits. Append a 1 bit, then zeros, then the original message length as a 64-bit integer.
```
Original: "abc" = 01100001 01100010 01100011

Padded:   01100001 01100010 01100011 1 000...000 00000000 00011000
          ├── message ──────────────┤   ├─ zeros ─┤ ├─ length ──┤
Total: exactly 512 bits
```
Step 2: Initialize Hash Values
Eight 32-bit words (H0 through H7) are initialized to the fractional parts of the square roots of the first 8 primes:
```python
# Initial hash values (first 32 bits of fractional parts of sqrt(2..19))
H = [
    0x6a09e667,  # sqrt(2)
    0xbb67ae85,  # sqrt(3)
    0x3c6ef372,  # sqrt(5)
    0xa54ff53a,  # sqrt(7)
    0x510e527f,  # sqrt(11)
    0x9b05688c,  # sqrt(13)
    0x1f83d9ab,  # sqrt(17)
    0x5be0cd19,  # sqrt(19)
]
```
These are not arbitrary: they are derived from mathematical constants to prevent the designers from inserting a backdoor (“nothing-up-my-sleeve numbers”).
Step 3: Message Schedule
Each 512-bit block is expanded into 64 32-bit words. The first 16 words come directly from the block. Words 16-63 are generated by mixing earlier words:
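```python
def right_rotate(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

# Message schedule expansion
# W is a 64-entry list; W[0..15] already hold the block's sixteen 32-bit words
for i in range(16, 64):
    s0 = right_rotate(W[i-15], 7) ^ right_rotate(W[i-15], 18) ^ (W[i-15] >> 3)
    s1 = right_rotate(W[i-2], 17) ^ right_rotate(W[i-2], 19) ^ (W[i-2] >> 10)
    W[i] = (W[i-16] + s0 + W[i-7] + s1) & 0xFFFFFFFF
```
Step 4: Compression (64 Rounds)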
Each round mixes the eight working variables using bitwise operations (AND, XOR, NOT, rotations) and addition with round constants:
```python
# One compression round (simplified)
def sha256_round(a, b, c, d, e, f, g, h, K_i, W_i):
    S1 = right_rotate(e, 6) ^ right_rotate(e, 11) ^ right_rotate(e, 25)
    ch = (e & f) ^ (~e & g)  # "choice": e picks between f and g
    temp1 = (h + S1 + ch + K_i + W_i) & 0xFFFFFFFF

    S0 = right_rotate(a, 2) ^ right_rotate(a, 13) ^ right_rotate(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)  # "majority": 2-of-3 vote
    temp2 = (S0 + maj) & 0xFFFFFFFF

    # Shift registers and inject new values
    return (temp1 + temp2) & 0xFFFFFFFF, a, b, c, (d + temp1) & 0xFFFFFFFF, e, f, g
```
After all 64 rounds, the working variables are added to the current hash values, and the process repeats for the next block.
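Putting the four steps together gives a compact pure-Python SHA-256. This is a teaching sketch, far slower than `hashlib` and not meant for production use. The helper names (`first_primes`, `icbrt`, `rotr`, `sha256_pure`) are illustrative. The round constants `K` are the first 32 bits of the fractional parts of the cube roots of the first 64 primes, computed here with exact integer arithmetic.

```python
import hashlib
import math

MASK = 0xFFFFFFFF

def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(n):
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
H_INIT = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]  # sqrt fractions
K = [icbrt(p << 96) & MASK for p in PRIMES]                # cbrt fractions

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def sha256_pure(message: bytes) -> bytes:
    # Step 1: pad with 0x80, zeros, and the 64-bit big-endian bit length
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += (len(message) * 8).to_bytes(8, "big")

    h = list(H_INIT)  # Step 2
    for offset in range(0, len(padded), 64):
        block = padded[offset:offset + 64]
        # Step 3: message schedule
        w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
        for i in range(16, 64):
            s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
            s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
            w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
        # Step 4: 64 compression rounds
        a, b, c, d, e, f, g, hh = h
        for i in range(64):
            S1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            ch = (e & f) ^ (~e & g)
            t1 = (hh + S1 + ch + K[i] + w[i]) & MASK
            S0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t2 = (S0 + maj) & MASK
            a, b, c, d, e, f, g, hh = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
        # Add the working variables back into the running hash
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    return b"".join(x.to_bytes(4, "big") for x in h)

# Cross-check against the standard library
assert sha256_pure(b"abc") == hashlib.sha256(b"abc").digest()
```

Comparing against `hashlib` on inputs around the 55/56/64-byte boundaries is a quick way to confirm the padding logic.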
Performance
SHA-256 is designed for speed:
| Platform | Throughput | Operations/sec (single hash) |
|---|---|---|
| Modern CPU (single core) | ~500 MB/s | ~15 million |
| GPU (RTX 4090) | ~20 GB/s | ~10 billion |
| ASIC (Bitcoin miner) | — | ~100 trillion |
This speed is why SHA-256 is catastrophically bad for passwords. An 8-character alphanumeric password has ~48 bits of entropy (62^8 ≈ 2.2 × 10^14 candidates). At 10 billion hashes/sec on a single GPU, the entire keyspace is exhausted in about six hours.
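The arithmetic behind that estimate can be sketched as follows. The function name `exhaustive_search_seconds` is an illustrative assumption, and the hash rate is the rough GPU figure from the table above rather than a measured value.

```python
import math

def exhaustive_search_seconds(alphabet_size: int, length: int, hashes_per_second: float) -> float:
    """Worst-case time to try every password of the given length."""
    return alphabet_size ** length / hashes_per_second

ALPHANUMERIC = 26 + 26 + 10  # a-z, A-Z, 0-9
keyspace = ALPHANUMERIC ** 8
entropy_bits = math.log2(keyspace)

gpu_rate = 10e9  # ~10 billion SHA-256 hashes/sec (assumed, from the table above)
seconds = exhaustive_search_seconds(ALPHANUMERIC, 8, gpu_rate)

print(f"keyspace: {keyspace:.2e} candidates (~{entropy_bits:.1f} bits)")
print(f"worst case at {gpu_rate:.0e} H/s: {seconds / 3600:.1f} hours")
```

With these inputs the worst case works out to roughly six hours on one GPU, and an attacker finds the average password in half that time.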
When to Use SHA-256
- Checksums and data integrity verification
- HMAC for message authentication
- Content-addressable storage
- Hashing high-entropy secrets (API keys, tokens) where brute-force is infeasible
- Digital signatures and certificate chains
- Merkle trees and blockchain applications
Never use for: Passwords, low-entropy secrets, or anything a human chose.
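For the high-entropy case, a minimal sketch of API key storage might look like this. The function names are hypothetical; the approach assumes the key is generated by the server from a cryptographically secure random source, which is what makes a fast hash acceptable.

```python
import hashlib
import hmac
import secrets

def generate_api_key() -> tuple[str, str]:
    """Return (plaintext_key, stored_digest). Only the digest is persisted."""
    key = secrets.token_urlsafe(32)  # 32 random bytes = 256 bits of entropy
    digest = hashlib.sha256(key.encode()).hexdigest()
    return key, digest

def verify_api_key(presented_key: str, stored_digest: str) -> bool:
    """Hash the presented key and compare in constant time."""
    candidate = hashlib.sha256(presented_key.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)

key, stored = generate_api_key()
assert verify_api_key(key, stored)
assert not verify_api_key("guess", stored)
```

The plaintext key is shown to the user once and never stored; brute-forcing a 256-bit random value is infeasible regardless of hash speed.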
SHA-3 (Keccak): The Backup Plan
SHA-3 is not a replacement for SHA-2 — it is an insurance policy. After theoretical weaknesses were found in SHA-1, NIST ran a competition (2007–2012) to select a fundamentally different hash algorithm. Keccak won.
How It Differs From SHA-2
SHA-2 uses the Merkle-Damgard construction: it processes blocks sequentially, feeding each block’s output into the next. SHA-3 uses the sponge construction instead.
The sponge has two phases:
- Absorb: Input blocks are XORed into the state and permuted
- Squeeze: Output blocks are extracted from the state
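The internal state is 1600 bits, much larger than SHA-256’s 256-bit state, and only part of it is ever exposed as output. Because the digest does not reveal the full state, an attacker cannot resume hashing from it, which makes length-extension attacks structurally impossible (SHA-256 is vulnerable to them without HMAC wrapping).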
SHA-3 vs SHA-2
| Property | SHA-256 | SHA3-256 |
|---|---|---|
| Construction | Merkle-Damgard | Sponge |
| Internal state | 256 bits | 1600 bits |
| Length extension | Vulnerable | Immune |
| Speed (software) | Faster | ~30% slower |
| Speed (hardware) | Optimized everywhere | Less hardware support |
| Security margin | Good (no practical attacks) | Larger (newer design) |
When to use SHA-3: When you need defense in depth (dual hashing), length-extension resistance without HMAC, or when regulatory compliance mandates it. For most applications, SHA-256 remains the practical choice.
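Both families are available in Python's standard library. The sketch below shows the two options side by side; the key and message values are placeholders for illustration.

```python
import hashlib
import hmac

key = b"server-side-secret"        # placeholder key for illustration
message = b"amount=100&to=alice"   # placeholder message

# SHA-256 with a secret prefix is open to length extension,
# so authenticate messages with HMAC instead of sha256(key + message)
tag_sha2 = hmac.new(key, message, hashlib.sha256).digest()

# SHA3-256 has been part of hashlib since Python 3.6
digest_sha3 = hashlib.sha3_256(message).digest()
digest_sha2 = hashlib.sha256(message).digest()

# Same output size, unrelated outputs
assert len(digest_sha2) == len(digest_sha3) == 32
assert digest_sha2 != digest_sha3

# Verify a tag with a constant-time comparison
expected = hmac.new(key, message, hashlib.sha256).digest()
assert hmac.compare_digest(tag_sha2, expected)
```

Switching an integrity check from SHA-256 to SHA3-256 is usually a one-line change, which is part of why SHA-3 works well as a fallback.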
BLAKE3: The Modern Alternative
BLAKE3 (2020) deserves mention as one of the fastest cryptographic hash functions available. It is based on BLAKE2 (itself derived from the SHA-3 finalist BLAKE) with a Merkle tree structure that enables parallelism:
| Function | Speed (single core) | Speed (8 cores) | Output size |
|---|---|---|---|
| SHA-256 | 500 MB/s | 500 MB/s (not parallelizable) | 256 bits |
| SHA3-256 | 350 MB/s | 350 MB/s (not parallelizable) | 256 bits |
| BLAKE3 | 1.2 GB/s | 6+ GB/s | 256 bits (extendable) |
BLAKE3 is not yet a NIST standard, which limits adoption in regulated environments. For everything else — file hashing, content addressing, key derivation inputs — it is the fastest secure option.
bcrypt: The Battle-Tested Password Hash
The Origin Story
In 1999, Niels Provos and David Mazières published bcrypt, based on the Blowfish cipher. The key insight: make the hash function intentionally slow, and make the slowness configurable.
How It Works
bcrypt is built on EksBlowfish (Expensive Key Schedule Blowfish), a modified Blowfish cipher with a deliberately expensive key setup phase.
Step 1: Key Setup — The Expensive Part
```python
# Simplified bcrypt key derivation
# (pseudocode: the Blowfish helpers are not defined here)
def eks_blowfish_setup(cost, salt, password):
    # Initialize Blowfish state with digits of pi
    state = initial_blowfish_state()

    # Expand key with salt and password
    state = expand_key(state, salt, password)

    # THE EXPENSIVE PART: repeat 2^cost times
    zero_salt = bytes(16)  # the loop uses an all-zero salt
    for _ in range(2 ** cost):
        state = expand_key(state, zero_salt, password)
        state = expand_key(state, zero_salt, salt)

    return state
```
The cost parameter (also called “work factor” or “rounds”) controls the number of iterations. Each increment doubles the computation time:
| Cost | Iterations | Time (modern CPU) | Use Case |
|---|---|---|---|
| 10 | 1,024 | ~75ms | Development, low-security |
| 12 | 4,096 | ~300ms | Recommended minimum |
| 14 | 16,384 | ~1.2s | High-security applications |
| 16 | 65,536 | ~5s | Archival, offline systems |
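Because each cost step doubles the work, one timing measurement is enough to extrapolate the rest. The sketch below assumes perfect doubling and uses hypothetical helper names; real hardware should still be benchmarked directly.

```python
import math

def bcrypt_iterations(cost: int) -> int:
    """Number of key-expansion iterations for a given cost factor."""
    return 2 ** cost

def estimate_seconds(cost: int, measured_cost: int, measured_seconds: float) -> float:
    """Extrapolate hash time from one measurement, assuming time doubles per step."""
    return measured_seconds * 2 ** (cost - measured_cost)

def max_cost_within(budget_seconds: float, measured_cost: int, measured_seconds: float) -> int:
    """Largest cost whose estimated time fits inside the latency budget."""
    extra_steps = math.floor(math.log2(budget_seconds / measured_seconds))
    return measured_cost + extra_steps

# Suppose cost 12 was measured at 0.25 s on the target machine (assumed figure)
print(bcrypt_iterations(12))                 # 4096
print(estimate_seconds(14, 12, 0.25))        # 1.0
print(max_cost_within(1.0, 12, 0.25))        # 14
```

A common policy is to pick the highest cost that stays inside the login latency budget and to revisit the choice when hardware is upgraded.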
Step 2: Encryption
After the expensive key setup, bcrypt encrypts the string "OrpheanBeholderScryDoubt" 64 times using the derived key state. The result is the hash.
Step 3: Output Format
```
$2b$12$WApznUPhDubN0oeveSXHp.GM/eCDjTMFaEWbGGlrVPbBevMOW/BRy
 │  │  │                     │
 │  │  │                     └── Hash (31 chars, base64)
 │  │  └── Salt (22 chars, base64 = 128 bits)
 │  └── Cost factor (12 = 2^12 iterations)
 └── Algorithm version ($2b$ = current)
```
The salt and cost are embedded in the output. No separate salt storage is needed: the verification function (bcrypt.checkpw() in Python, password_verify() in PHP) extracts them automatically.
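The layout is simple enough to split by hand, which helps when auditing the cost factors stored in a database. A minimal sketch, with `BcryptHash` and `parse_bcrypt` as illustrative names:

```python
from typing import NamedTuple

class BcryptHash(NamedTuple):
    version: str
    cost: int
    salt: str      # 22 chars in bcrypt's base64 alphabet
    checksum: str  # 31 chars

def parse_bcrypt(encoded: str) -> BcryptHash:
    """Split a bcrypt hash string into version, cost, salt, and checksum."""
    _, version, cost, rest = encoded.split("$")
    if len(rest) != 53:
        raise ValueError("expected 22 salt chars + 31 hash chars")
    return BcryptHash(version, int(cost), rest[:22], rest[22:])

parsed = parse_bcrypt("$2b$12$WApznUPhDubN0oeveSXHp.GM/eCDjTMFaEWbGGlrVPbBevMOW/BRy")
print(parsed.version, parsed.cost, 2 ** parsed.cost)
```

In application code, leave verification to the bcrypt library itself; parsing like this is only useful for reporting, for example finding accounts still stored at a low cost.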
What bcrypt Resists
bcrypt’s 4KB Blowfish state provides some GPU resistance — GPU architectures prefer small, uniform memory access patterns, and bcrypt’s key-dependent memory accesses create cache pressure. But it is not truly memory-hard: a determined attacker with FPGAs or ASICs can parallelize bcrypt efficiently.
bcrypt’s Limitations
- 72-byte password limit: bcrypt truncates passwords longer than 72 bytes. Long passphrases lose entropy silently. Workaround: pre-hash with SHA-256 (but encode as base64 to avoid null bytes).
- Not memory-hard: The 4KB state is too small to resist modern GPU/ASIC attacks effectively.
- No parallelism tuning: You can only tune CPU cost, not memory or thread count.
```python
# bcrypt in Python
import bcrypt

# Hash
password = b"correct horse battery staple"
salt = bcrypt.gensalt(rounds=12)  # cost factor 12
hashed = bcrypt.hashpw(password, salt)
# b'$2b$12$LJ3m4ys3Lg...'

# Verify
bcrypt.checkpw(password, hashed)  # True
```
scrypt: Memory-Hard Hashing
The Problem scrypt Solves
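By 2009, it was clear that attackers with highly parallel hardware (GPUs, FPGAs, ASICs) could attack CPU-bound hashes such as bcrypt far more cheaply than defenders could compute them. Colin Percival designed scrypt with a specific goal: make the algorithm require large amounts of memory, so that attacking it requires not just computation but expensive RAM.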
How It Works
scrypt builds on PBKDF2-HMAC-SHA256 but adds a memory-hard mixing step called ROMix:
Step 1: Generate a Large Memory Buffer
```python
# Simplified ROMix
# (pseudocode: block_mix and integerify are not defined here)
def romix(block, N):
    # Phase 1: Fill memory with N sequential blocks
    V = [None] * N
    X = block
    for i in range(N):
        V[i] = X
        X = block_mix(X)

    # Phase 2: Random reads from the buffer
    for i in range(N):
        j = integerify(X) % N    # Index depends on current state
        X = block_mix(X ^ V[j])  # Requires reading V[j] from memory

    return X
```
The critical insight is in Phase 2: the index j depends on the current value of X, which depends on the password. An attacker cannot predict which memory locations will be needed without computing the entire chain. Skipping the memory means recomputing from scratch, trading memory for time at a steep ratio.
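To make the two phases concrete, here is a runnable toy version. It is an assumption-laden sketch: SHA-256 stands in for scrypt's real BlockMix (which is built on Salsa20/8), so the output is not compatible with scrypt and the function must not be used to store passwords.

```python
import hashlib

def block_mix(block: bytes) -> bytes:
    """Stand-in mixing function (real scrypt uses Salsa20/8-based BlockMix)."""
    return hashlib.sha256(block).digest()

def integerify(block: bytes) -> int:
    """Interpret the last 8 bytes of the block as a little-endian integer."""
    return int.from_bytes(block[-8:], "little")

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_romix(seed: bytes, n: int) -> bytes:
    # Phase 1: fill the table with n sequentially derived blocks
    table = []
    x = seed
    for _ in range(n):
        table.append(x)
        x = block_mix(x)

    # Phase 2: n lookups whose positions depend on the evolving state
    for _ in range(n):
        j = integerify(x) % n
        x = block_mix(xor_bytes(x, table[j]))
    return x

seed = hashlib.sha256(b"password" + b"salt").digest()  # 32-byte starting block
out = toy_romix(seed, n=1024)
print(out.hex())
```

Deleting entries from `table` to save memory forces the missing blocks to be recomputed from the seed on every lookup, which is the time-memory tradeoff the design relies on.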
Parameters:
| Parameter | Controls | Typical Value |
|---|---|---|
| N | Memory size (blocks), must be a power of 2 | 2^14 (16,384) to 2^20 |
| r | Block size multiplier | 8 |
| p | Parallelism | 1 |
| Memory used | 128 * N * r bytes | 16 MB to 1 GB |
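The memory formula in the last row is worth computing before choosing parameters, because Python's `hashlib.scrypt` refuses to run when the requirement exceeds its `maxmem` limit. A sketch under those assumptions, with `scrypt_memory_bytes` and `scrypt_hash` as hypothetical helper names:

```python
import hashlib

def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate working memory of one scrypt evaluation: 128 * N * r bytes."""
    return 128 * n * r

def scrypt_hash(password: bytes, salt: bytes, n: int = 2**14, r: int = 8, p: int = 1) -> bytes:
    # Leave headroom above the 128 * N * r estimate, capped below 2 GiB
    # because hashlib limits maxmem to a C int
    maxmem = min(2 * scrypt_memory_bytes(n, r), 2**31 - 1)
    return hashlib.scrypt(password, salt=salt, n=n, r=r, p=p, maxmem=maxmem, dklen=32)

for exponent in (14, 17, 20):
    mib = scrypt_memory_bytes(2**exponent, 8) / 2**20
    print(f"N=2^{exponent}, r=8: {mib:.0f} MiB")
```

With r=8 this prints 16, 128, and 1024 MiB, matching the range in the table.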
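```python
# scrypt in Python
import hashlib
import os

salt = os.urandom(16)  # fresh random salt per password; store it with the hash

hashed = hashlib.scrypt(
    password=b"correct horse battery staple",
    salt=salt,
    n=2**14,   # N = 16384 (memory cost)
    r=8,       # block size
    p=1,       # parallelism
    dklen=32,  # output length
)
```
What scrypt Resists
Because every guess needs its own 128 * N * r bytes of RAM, scrypt makes massively parallel GPU and ASIC attacks far more expensive than they are against bcrypt.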
scrypt’s Limitations
- Side-channel vulnerability: The data-dependent memory access pattern leaks information through cache timing. An attacker sharing the same hardware (cloud VMs, shared hosting) can potentially extract the password.
- All-or-nothing memory: scrypt has a single memory parameter. If you want high memory, you also get high latency. There is no way to tune memory and time independently.
- Limited parallelism control: The p parameter adds parallelism but does not improve memory-hardness per thread.
Argon2id: The State of the Art
The Password Hashing Competition
In 2013, recognizing that no existing algorithm fully addressed GPUs, FPGAs, ASICs, and side-channel attacks simultaneously, a group of cryptographers organized the Password Hashing Competition (PHC). Twenty-four candidates were submitted. In 2015, Argon2 won.
Argon2 comes in three variants:
| Variant | Memory Access Pattern | Resists | Use Case |
|---|---|---|---|
| Argon2d | Data-dependent | GPU/ASIC (strongest) | Cryptocurrency, no side-channel risk |
| Argon2i | Data-independent | Side-channel (strongest) | Environments with shared hardware |
| Argon2id | Hybrid (Argon2i for the first half of the first pass, Argon2d afterwards) | Both GPU/ASIC and side-channel | General-purpose password hashing |
Argon2id is the recommended variant for virtually all applications.
How Argon2id Works
Argon2id fills a large memory array in two phases using the Blake2b hash function as its compression core:
Step 1: Initialize the Memory Matrix
The memory is organized as a matrix of 1 KB blocks arranged in rows (lanes) and columns (segments):
```
Memory Matrix (m = memory_cost blocks):
┌──────────┬───────┬───────┬───────┬───────┐
│ Lane 0   │ Seg 0 │ Seg 1 │ Seg 2 │ Seg 3 │
├──────────┼───────┼───────┼───────┼───────┤
│ Lane 1   │ Seg 0 │ Seg 1 │ Seg 2 │ Seg 3 │
├──────────┼───────┼───────┼───────┼───────┤
│ Lane p-1 │ Seg 0 │ Seg 1 │ Seg 2 │ Seg 3 │
└──────────┴───────┴───────┴───────┴───────┘
Each cell = 1 KB block
Lanes can be computed in parallel (p = parallelism)
```
Step 2: Fill Passes (The Hybrid Strategy)
```python
# Simplified Argon2id filling
# (pseudocode: the helper functions are not defined here)
def fill_memory(password, salt, memory_cost, time_cost, parallelism):
    # Initialize memory matrix
    matrix = allocate(memory_cost * 1024)  # memory_cost KB

    # Initial block derivation from password + salt via Blake2b
    H0 = blake2b(parallelism, time_cost, memory_cost, password, salt, ...)

    for pass_number in range(time_cost):
        for segment in range(4):
            for lane in range(parallelism):  # Parallelizable
                for block_index in segment_range(segment):
                    if pass_number == 0 and segment < 2:
                        # FIRST HALF of the first pass: Argon2i mode (data-independent indexing)
                        # Reference position generated from counters, NOT from memory
                        ref_lane, ref_index = generate_addresses_independent(pass_number, lane, block_index)
                    else:
                        # Everything after that: Argon2d mode (data-dependent indexing)
                        # Reference position derived from the previous block's content
                        ref_lane, ref_index = derive_from_previous_block(matrix[lane][block_index - 1])

                    # Compress: mix the previous block with the referenced block
                    matrix[lane][block_index] = compress(
                        matrix[lane][block_index - 1],
                        matrix[ref_lane][ref_index],
                    )

    # Final: XOR the last block of each lane
    return xor_last_blocks(matrix)
```
Why the hybrid matters:
- The first half (Argon2i mode) uses data-independent addressing. An attacker with cache-timing side channels learns nothing — the memory access pattern is deterministic from the public parameters alone.
- The second half (Argon2d mode) uses data-dependent addressing. The reference block depends on the previous block’s content, which depends on the password. This makes it impossible to skip memory — you cannot predict which blocks will be needed without computing the entire chain.
By splitting passes, Argon2id gets side-channel resistance from the first half and maximum memory-hardness from the second half.
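The switching rule itself is small enough to state as code. This sketch follows the description in RFC 9106 (data-independent addressing for the first two of four slices in the first pass, data-dependent everywhere else); the function name and the string labels are illustrative.

```python
SLICES_PER_PASS = 4  # each pass over memory is divided into 4 slices

def addressing_mode(pass_number: int, slice_index: int) -> str:
    """Which indexing rule Argon2id applies at a given position."""
    if pass_number == 0 and slice_index < SLICES_PER_PASS // 2:
        return "data-independent"  # Argon2i-style
    return "data-dependent"        # Argon2d-style

for t in range(3):
    print(t, [addressing_mode(t, s) for s in range(SLICES_PER_PASS)])
```

With time_cost=3, only 2 of the 12 slice visits use data-independent addressing, which is why Argon2id keeps most of Argon2d's tradeoff resistance.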
The Compression Function
Argon2’s compression function processes two 1 KB input blocks into one 1 KB output block using a modified Blake2b round applied to the block viewed as an 8x8 matrix of 128-bit registers (128 64-bit words in total):
```python
# Argon2 compression (conceptual pseudocode: helpers are not defined here)
def compress(prev_block, ref_block):
    R = prev_block ^ ref_block  # XOR the two 1 KB inputs
    Z = copy(R)                 # permute a copy; R is kept for the feed-forward

    # Apply Blake2b-like mixing to the 8 rows, then the 8 columns,
    # of the 8x8 matrix of 128-bit registers
    for row in range(8):
        blake2b_mix(row_of(Z, row))
    for col in range(8):
        blake2b_mix(column_of(Z, col))

    return Z ^ R  # Feed-forward XOR with the pre-permutation value
```
Parameters
| Parameter | Controls | Recommended |
|---|---|---|
| memory_cost (m) | Memory in KB | 65536 (64 MB), or as much as you can afford |
| time_cost (t) | Number of passes | 3 |
| parallelism (p) | Lanes (threads) | 4 |
| hash_length | Output size | 32 bytes |
Tuning strategy: Max out memory first (the strongest defense), then increase time_cost to hit your latency budget. On a login endpoint targeting 500ms response time, 64 MB / 3 passes / 4 threads is a solid starting point.
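The second half of that strategy, raising time_cost until the latency budget is reached, can be sketched as a small search loop. Everything here is hypothetical scaffolding: `measure` is a callback you would implement by timing one real hash with memory already fixed at its maximum, and the stand-in below simply pretends each pass costs 150 ms.

```python
from typing import Callable

def tune_time_cost(measure: Callable[[int], float], budget_seconds: float, max_time_cost: int = 10) -> int:
    """Return the largest time_cost (at least 1) whose measured latency fits the budget."""
    best = 1
    for t in range(1, max_time_cost + 1):
        if measure(t) > budget_seconds:
            break
        best = t
    return best

def fake_measure(time_cost: int) -> float:
    """Stand-in for a real timing run: assume 150 ms per pass."""
    return 0.150 * time_cost

print(tune_time_cost(fake_measure, budget_seconds=0.5))  # 3
```

If even time_cost=1 exceeds the budget, the function still returns 1, which is the signal to reduce memory_cost instead.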
```python
# Argon2id in Python
from argon2 import PasswordHasher

ph = PasswordHasher(
    memory_cost=65536,  # 64 MB
    time_cost=3,        # 3 passes
    parallelism=4,      # 4 threads
    hash_len=32,
    salt_len=16,
)

# Hash
hashed = ph.hash("correct horse battery staple")
# $argon2id$v=19$m=65536,t=3,p=4$c29tZXNhbHQ$RdescudvJCsgt...

# Verify
ph.verify(hashed, "correct horse battery staple")  # True

# Check if rehash needed (parameters changed)
ph.check_needs_rehash(hashed)  # True if your defaults changed
```
Argon2id Output Format
```
$argon2id$v=19$m=65536,t=3,p=4$c29tZXNhbHQ$RdescudvJCsgt3meg8GnUFI5qhXM
 │        │    │                │           │
 │        │    │                │           └── Hash (base64)
 │        │    │                └── Salt (base64)
 │        │    └── Parameters: memory=64MB, time=3, parallelism=4
 │        └── Version 19 (0x13)
 └── Algorithm variant (argon2id)
```
Like bcrypt, all parameters are embedded in the output string. Verification extracts them automatically, and you can upgrade parameters transparently on login.
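Because the parameters travel with the hash, deciding whether a stored hash is outdated only requires string parsing. The following is a minimal standard-library sketch with hypothetical function names; in practice a library helper such as `check_needs_rehash` does this for you.

```python
import base64

def parse_argon2(encoded: str) -> dict:
    """Split a PHC-format Argon2 string into its fields."""
    _, variant, version, params, salt_b64, hash_b64 = encoded.split("$")
    fields = dict(item.split("=") for item in params.split(","))
    return {
        "variant": variant,
        "version": int(version.split("=")[1]),
        "memory_kib": int(fields["m"]),
        "time_cost": int(fields["t"]),
        "parallelism": int(fields["p"]),
        # The encoded form strips base64 padding, so restore it before decoding
        "salt": base64.b64decode(salt_b64 + "=" * (-len(salt_b64) % 4)),
        "hash_b64": hash_b64,
    }

def needs_rehash(encoded: str, memory_kib: int, time_cost: int, parallelism: int) -> bool:
    """True when the stored variant or parameters differ from the current policy."""
    info = parse_argon2(encoded)
    stored = (info["memory_kib"], info["time_cost"], info["parallelism"])
    return info["variant"] != "argon2id" or stored != (memory_kib, time_cost, parallelism)

example = "$argon2id$v=19$m=65536,t=3,p=4$c29tZXNhbHQ$RdescudvJCsgt3meg8GnUFI5qhXM"
info = parse_argon2(example)
print(info["variant"], info["memory_kib"], info["time_cost"], info["parallelism"])
```

Running `needs_rehash` after each successful login is what allows a policy change, such as doubling memory_cost, to roll out gradually without forcing password resets.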
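What Argon2id Resists
Argon2id combines a tunable time cost (against CPU brute force), memory-hardness (against GPUs, FPGAs, and ASICs), and data-independent addressing in the first half of the first pass (against cache-timing side channels).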
The Complete Comparison
Security Properties
| Property | SHA-256 | bcrypt | scrypt | Argon2id |
|---|---|---|---|---|
| Preimage resistance | 256-bit | N/A (not a goal) | N/A | N/A |
| Collision resistance | 128-bit | N/A | N/A | N/A |
| CPU brute-force resistance | None (fast) | Strong (cost factor) | Strong (N parameter) | Strong (time_cost) |
| GPU resistance | None | Moderate (4KB state) | Strong (memory-hard) | Strong (memory-hard) |
| FPGA/ASIC resistance | None | Weak | Moderate-Strong | Strong |
| Side-channel resistance | N/A | N/A | Weak (data-dependent) | Strong (hybrid mode) |
| Time-memory tradeoff resistance | N/A | N/A | Moderate | Strong |
Performance Characteristics
| Property | SHA-256 | bcrypt (cost=12) | scrypt (N=2^14, r=8) | Argon2id (64MB, t=3, p=4) |
|---|---|---|---|---|
| Hashes/sec (CPU) | ~15,000,000 | ~3 | ~2 | ~2 |
| Hashes/sec (GPU) | ~10,000,000,000 | ~50,000 | ~1,000 | ~500 |
| Memory per hash | ~0 | 4 KB | 16 MB | 64 MB |
| Parallelism tunable | N/A | No | Limited | Yes |
| Output size | Fixed 256-bit | Fixed 184-bit | Variable | Variable |
Attack Cost Comparison
For an 8-character alphanumeric password (~48 bits of entropy), using an RTX 4090 GPU:
| Algorithm | Time to Exhaust | Cost (AWS equivalent) |
|---|---|---|
| SHA-256 | ~3 days | ~$50 |
| bcrypt (cost=12) | ~30 years | ~$2,000,000 |
| scrypt (N=2^14, r=8) | ~300 years | ~$20,000,000 |
| Argon2id (64MB, t=3, p=4) | ~600 years | ~$40,000,000+ |
These numbers are estimates based on published benchmarks. The key takeaway is not the exact numbers but the orders of magnitude separating general-purpose and password-specific hashes.
The Decision Framework
Quick Reference
| Scenario | Use | Why |
|---|---|---|
| Password storage (new project) | Argon2id | Best-in-class defense against all known attacks |
| Password storage (existing bcrypt) | bcrypt → Argon2id migration | Transparent rehash on login |
| File checksums | SHA-256 or BLAKE3 | Speed matters, not brute-force resistance |
| HMAC/JWT signing | HMAC-SHA256 | Standard, fast, well-supported |
| API key storage | SHA-256 | High-entropy input, speed is safe |
| Key derivation (encryption keys from passwords) | Argon2id or scrypt | Memory-hardness protects the derived key |
| Content-addressable storage | BLAKE3 | Fastest secure hash, parallelizable |
| Digital signatures | SHA-256 or SHA-384 | NIST-standardized, regulatory compliance |
| Deduplication | BLAKE3 or SHA-256 | Speed + collision resistance |
Implementation Patterns
Transparent Upgrade Strategy
The best password hashing systems upgrade algorithms transparently on login:
```python
from passlib.context import CryptContext

pwd_context = CryptContext(
    schemes=["argon2", "bcrypt", "scrypt"],
    default="argon2",
    argon2__memory_cost=65536,
    argon2__time_cost=3,
    argon2__parallelism=4,
    bcrypt__rounds=12,
    deprecated=["bcrypt", "scrypt"],
)

def hash_password(password: str) -> str:
    return pwd_context.hash(password)

def verify_and_upgrade(password: str, hashed: str) -> tuple[bool, str | None]:
    """Verify password and return upgraded hash if needed.

    Returns:
        (is_valid, new_hash): new_hash is non-None when the stored hash
        uses a deprecated algorithm or outdated parameters.
    """
    if not pwd_context.verify(password, hashed):
        return False, None

    if pwd_context.needs_update(hashed):
        return True, pwd_context.hash(password)

    return True, None

# In the login handler:
is_valid, new_hash = verify_and_upgrade(request.password, user.hashed_password)
if not is_valid:
    raise InvalidCredentials()
if new_hash:
    user.hashed_password = new_hash
    await db.commit()  # Transparent upgrade
```
Pre-Hashing for bcrypt’s 72-Byte Limit
```python
import base64
import hashlib

def prehash_for_bcrypt(password: str) -> bytes:
    """SHA-256 pre-hash to bypass bcrypt's 72-byte limit.

    Base64-encode the digest to avoid null bytes (which bcrypt
    treats as a terminator in some implementations).
    """
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return base64.b64encode(digest)
```
Benchmarking Your Hardware
Always benchmark on your production hardware. Hash times vary dramatically between machines:
```python
import time
from argon2 import PasswordHasher

def benchmark_argon2(memory_cost: int, time_cost: int, parallelism: int, iterations: int = 10):
    ph = PasswordHasher(memory_cost=memory_cost, time_cost=time_cost, parallelism=parallelism)
    start = time.perf_counter()
    for _ in range(iterations):
        ph.hash("benchmark_password")
    elapsed = (time.perf_counter() - start) / iterations
    print(f"m={memory_cost}, t={time_cost}, p={parallelism}: {elapsed*1000:.0f}ms per hash")

# Find the sweet spot for your latency budget
benchmark_argon2(65536, 3, 4)    # 64 MB, 3 passes
benchmark_argon2(131072, 2, 4)   # 128 MB, 2 passes
benchmark_argon2(262144, 1, 4)   # 256 MB, 1 pass
```
Target: 200-500ms per hash for interactive login. 1-2 seconds for high-security applications. Higher memory is always better than more passes: memory-hardness is the strongest defense.
Key Takeaways
- SHA-256 is for data, not passwords. Its speed is a feature for checksums and a vulnerability for credentials.
- bcrypt is still safe but showing its age. The 72-byte limit and 4KB memory footprint are real constraints. If you are using it, stay at cost 12+ and plan an Argon2id migration.
- scrypt added memory-hardness but introduced side-channel risk. It remains a good choice for key derivation (it was originally designed to derive encryption keys for the Tarsnap backup service) but Argon2id is better for interactive password hashing.
- Argon2id is the right default for new projects. It won the PHC, resists all four attack classes, and gives you independent knobs for memory, time, and parallelism.
- Max out memory first, then tune time. Memory-hardness is the strongest defense because memory cannot be parallelized cheaply. The time parameter is secondary.
- Always support transparent upgrades. Algorithms evolve. The system that can seamlessly rehash on login is the one that stays secure across decades.
Hash functions are not interchangeable tools from the same drawer. They are purpose-built instruments designed for specific threat models. Pick the right one, configure it properly, and build your system so you can change it when the threat landscape shifts.