
Hashing Deep Dive: SHA, bcrypt, scrypt, and Argon2id From the Inside Out

Not all hashes are created equal. Here is how SHA-256, bcrypt, scrypt, and Argon2id actually work under the hood — the math, the memory, the attacks they resist, and exactly when to use each one.

Tin Dang
Layered cross-section diagram showing the internal rounds and memory structures of different hash algorithms

A junior engineer picks SHA-256 to hash passwords because “it’s secure — 256 bits!” A senior engineer picks bcrypt. A cryptographer picks Argon2id. They are all hashing, but they are solving completely different problems. The junior’s passwords will be cracked in hours. The senior’s will hold for years. The cryptographer’s will hold for decades.

The difference is not bit length. It is what the algorithm was designed to resist. SHA-256 resists collision attacks. bcrypt resists brute-force on CPUs. scrypt resists GPUs. Argon2id resists everything we know how to throw at it — CPUs, GPUs, FPGAs, and ASICs.

This post opens the hood on the four most widely used hash functions. You will understand the internal mechanics, the specific attack vectors each one defends against, and the precise scenarios where each is the right choice. If you want to see these algorithms applied in a production FastAPI authentication system, the FastAPI Auth series builds on the concepts covered here.

What Hashing Actually Is

A hash function takes arbitrary-length input and produces a fixed-length output (the digest) with three properties:

  1. Deterministic: Same input always produces the same output
  2. One-way: Given the output, you cannot recover the input
  3. Avalanche effect: A single-bit input change flips ~50% of output bits
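
The determinism and avalanche properties are easy to observe with Python's standard `hashlib`. The snippet below is a minimal sketch: the message and the helper name `bit_difference` are illustrative, and the exact number of changed bits depends on the input, though it should land near half of the 256 output bits.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg = b"hello world"
flipped = bytes([msg[0] ^ 0x01]) + msg[1:]  # flip the lowest bit of the first byte

d1 = hashlib.sha256(msg).digest()
d2 = hashlib.sha256(flipped).digest()

same_again = hashlib.sha256(msg).digest() == d1  # deterministic: same input, same digest
changed_bits = bit_difference(d1, d2)            # avalanche: expected to be near 128

print(f"digest bits: {len(d1) * 8}, repeatable: {same_again}, bits changed: {changed_bits}")
```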

But there are two fundamentally different categories of hash functions, and confusing them is the root of most hashing mistakes:

| Category | Designed For | Speed Goal | Examples |
| --- | --- | --- | --- |
| Cryptographic hash | Data integrity, signatures, checksums | As fast as possible | SHA-256, SHA-3, BLAKE3 |
| Password hash | Credential storage | As slow as safely tolerable | bcrypt, scrypt, Argon2id |

A cryptographic hash that processes 1 GB/sec is a feature. A password hash that processes 1 GB/sec is a vulnerability.
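
To make the gap concrete, the sketch below times one fast hash against one deliberately slow hash using only the standard library. It uses `hashlib.scrypt` as the slow example because it ships with Python; the helper `time_call` and the repeat counts are arbitrary choices, and absolute timings will vary by machine.

```python
import hashlib
import os
import time

def time_call(fn, repeats: int) -> float:
    """Average wall-clock seconds per call of fn over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

password = b"correct horse battery staple"
salt = os.urandom(16)

# General-purpose hash: built to be as fast as possible.
fast = time_call(lambda: hashlib.sha256(salt + password).digest(), repeats=1000)

# Password hash: built to be slow and memory-hungry (16 MB with these parameters).
slow = time_call(
    lambda: hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32),
    repeats=3,
)

print(f"SHA-256: {fast * 1e6:.1f} us per hash")
print(f"scrypt:  {slow * 1e3:.1f} ms per hash ({slow / fast:,.0f}x slower)")
```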

SHA-256: The Speed Demon

What It Is

SHA-256 (Secure Hash Algorithm, 256-bit) is part of the SHA-2 family designed by the NSA and published by NIST in 2001. It produces a 256-bit (32-byte) digest from any input.

Where It Is Used

  • File integrity: Verifying downloads, Docker image digests, Git object IDs (in repositories that use Git's SHA-256 object format; the default is still SHA-1)
  • Digital signatures: TLS certificates, JWT signing (RS256, ES256)
  • Blockchain: Bitcoin mining, Merkle trees
  • HMAC: Message authentication codes (HMAC-SHA256), including JWT HS256
  • API key hashing: Hashing high-entropy secrets for storage (as we did in the FastAPI Auth series)
  • Content addressing: CAS systems, deduplication

How It Works Internally

SHA-256 processes input in 512-bit (64-byte) blocks through 64 rounds of bitwise operations. Here is the pipeline:

Step 1: Padding

The message is padded to a multiple of 512 bits. Append a 1 bit, then zeros, then the original message length as a 64-bit integer.

Original: "abc" = 01100001 01100010 01100011
Padded: 01100001 01100010 01100011 1 000...000 00000000 00011000
├── message ──────────────┤ ├─ zeros ─┤ ├─ length ──┤
Total: exactly 512 bits
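
The padding rule can be sketched in a few lines of Python. The function name `sha256_pad` is illustrative, and the sketch assumes byte-aligned messages (the standard also covers arbitrary bit lengths).

```python
def sha256_pad(message: bytes) -> bytes:
    """Pad a byte-aligned message the way SHA-256 does before block processing."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                  # the single 1 bit, followed by seven 0 bits
    zero_bytes = (56 - len(padded) % 64) % 64   # stop 8 bytes short of a 64-byte boundary
    padded += b"\x00" * zero_bytes
    padded += bit_length.to_bytes(8, "big")     # original length as a 64-bit big-endian integer
    return padded

padded = sha256_pad(b"abc")
print(len(padded) * 8)     # 512
print(padded[:4].hex())    # 61626380
print(padded[-8:].hex())   # 0000000000000018
```

A 55-byte message still fits in one block, while a 56-byte message spills into a second block because the length field no longer fits.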

Step 2: Initialize Hash Values

Eight 32-bit words (H0 through H7) are initialized to the fractional parts of the square roots of the first 8 primes:

# Initial hash values (first 32 bits of fractional parts of sqrt(2..19))
H = [
0x6a09e667, # sqrt(2)
0xbb67ae85, # sqrt(3)
0x3c6ef372, # sqrt(5)
0xa54ff53a, # sqrt(7)
0x510e527f, # sqrt(11)
0x9b05688c, # sqrt(13)
0x1f83d9ab, # sqrt(17)
0x5be0cd19, # sqrt(19)
]

These are not arbitrary — they are derived from mathematical constants to prevent the designers from inserting a backdoor (“nothing-up-my-sleeve numbers”).
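
Anyone can re-derive these constants, which is the point of a nothing-up-my-sleeve number. The sketch below uses exact integer arithmetic to avoid floating-point rounding; the helper name `frac_sqrt_bits` is illustrative.

```python
import math

FIRST_EIGHT_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]

def frac_sqrt_bits(p: int) -> int:
    """First 32 bits of the fractional part of sqrt(p), via exact integer math."""
    # isqrt(p * 2**64) equals floor(sqrt(p) * 2**32); the mask drops the integer part.
    return math.isqrt(p << 64) & 0xFFFFFFFF

derived = [frac_sqrt_bits(p) for p in FIRST_EIGHT_PRIMES]
print([f"0x{h:08x}" for h in derived])
```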

Step 3: Message Schedule

Each 512-bit block is expanded into 64 32-bit words. The first 16 words come directly from the block. Words 16-63 are generated by mixing earlier words:

# Message schedule expansion
for i in range(16, 64):
    s0 = right_rotate(W[i-15], 7) ^ right_rotate(W[i-15], 18) ^ (W[i-15] >> 3)
    s1 = right_rotate(W[i-2], 17) ^ right_rotate(W[i-2], 19) ^ (W[i-2] >> 10)
    W[i] = (W[i-16] + s0 + W[i-7] + s1) & 0xFFFFFFFF

Step 4: Compression — 64 Rounds

Each round mixes the eight working variables using bitwise operations (AND, XOR, NOT, rotations) and addition with round constants:

# One compression round (simplified)
def sha256_round(a, b, c, d, e, f, g, h, K_i, W_i):
    S1 = right_rotate(e, 6) ^ right_rotate(e, 11) ^ right_rotate(e, 25)
    ch = (e & f) ^ (~e & g)  # "choice": e picks between f and g
    temp1 = (h + S1 + ch + K_i + W_i) & 0xFFFFFFFF
    S0 = right_rotate(a, 2) ^ right_rotate(a, 13) ^ right_rotate(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)  # "majority": 2-of-3 vote
    temp2 = (S0 + maj) & 0xFFFFFFFF
    # Shift registers and inject new values
    return (temp1 + temp2) & 0xFFFFFFFF, a, b, c, (d + temp1) & 0xFFFFFFFF, e, f, g

After all 64 rounds, the working variables are added to the current hash values, and the process repeats for the next block.
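
Putting the four steps together gives a complete, if slow, SHA-256. The sketch below is for study only and should never replace `hashlib` in real code. The helper names (`first_primes`, `icbrt`, `rotr`) are illustrative, and the round constants are derived from the cube roots of the first 64 primes, a detail taken from the SHA-2 standard rather than from the text above.

```python
import hashlib
import math

MASK = 0xFFFFFFFF

def first_primes(count: int) -> list[int]:
    primes: list[int] = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def icbrt(x: int) -> int:
    """Integer cube root: the largest r with r**3 <= x (binary search)."""
    lo, hi = 0, 1 << (x.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo

PRIMES = first_primes(64)
H_INIT = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]  # fractional bits of square roots
K = [icbrt(p << 96) & MASK for p in PRIMES]                # fractional bits of cube roots

def rotr(x: int, n: int) -> int:
    return ((x >> n) | (x << (32 - n))) & MASK

def sha256(message: bytes) -> bytes:
    # Step 1: padding
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += (len(message) * 8).to_bytes(8, "big")
    # Step 2: initial hash values
    state = list(H_INIT)
    for offset in range(0, len(padded), 64):
        block = padded[offset:offset + 64]
        # Step 3: message schedule
        w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
        for i in range(16, 64):
            s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
            s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
            w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)
        # Step 4: 64 compression rounds
        a, b, c, d, e, f, g, h = state
        for i in range(64):
            S1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            ch = (e & f) ^ (~e & g)
            temp1 = (h + S1 + ch + K[i] + w[i]) & MASK
            S0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            temp2 = (S0 + maj) & MASK
            a, b, c, d, e, f, g, h = (temp1 + temp2) & MASK, a, b, c, (d + temp1) & MASK, e, f, g
        # Add the working variables back into the running hash values
        state = [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]
    return b"".join(x.to_bytes(4, "big") for x in state)

print(sha256(b"abc").hex() == hashlib.sha256(b"abc").hexdigest())
```

Comparing against `hashlib` on a few inputs, including ones that span several blocks, is a quick way to confirm that each step was transcribed correctly.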

Performance

SHA-256 is designed for speed:

| Platform | Throughput | Hashes/sec |
| --- | --- | --- |
| Modern CPU (single core) | ~500 MB/s | ~15 million |
| GPU (RTX 4090) | ~20 GB/s | ~10 billion |
| ASIC (Bitcoin miner) | n/a | ~100 trillion |

This speed is why SHA-256 is catastrophically bad for passwords. An 8-character alphanumeric password has about 47.6 bits of entropy (62^8 ≈ 2.2 × 10^14 candidates). At 10 billion hashes/sec on a single GPU, that keyspace is exhausted in roughly 6 hours.
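
The arithmetic behind that estimate is short enough to check directly. This is a worst-case sketch that assumes the attacker tries every candidate; the function name is illustrative and the hash rate is the GPU figure from the table above.

```python
import math

def exhaust_time_seconds(alphabet_size: int, length: int, hashes_per_second: float) -> float:
    """Worst-case seconds to try every password of the given length."""
    return alphabet_size ** length / hashes_per_second

keyspace = 62 ** 8                    # a-z, A-Z, 0-9 over 8 positions
entropy_bits = math.log2(keyspace)
hours = exhaust_time_seconds(62, 8, 10e9) / 3600

print(f"{keyspace:,} candidates, {entropy_bits:.1f} bits, {hours:.1f} hours to exhaust")
```

On average an attacker finds the password after searching half the keyspace, so the expected time is half the worst case.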

When to Use SHA-256

  • Checksums and data integrity verification
  • HMAC for message authentication
  • Content-addressable storage
  • Hashing high-entropy secrets (API keys, tokens) where brute-force is infeasible
  • Digital signatures and certificate chains
  • Merkle trees and blockchain applications

Never use for: Passwords, low-entropy secrets, or anything a human chose.
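
For the checksum use case, a typical pattern is to stream the file through the hash in chunks so that memory use stays constant. A minimal sketch follows; the function name and the 1 MB chunk size are arbitrary choices.

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

The result can be compared against a published checksum to verify a download.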

SHA-3 (Keccak): The Backup Plan

SHA-3 is not a replacement for SHA-2 — it is an insurance policy. After theoretical weaknesses were found in SHA-1, NIST ran a competition (2007–2012) to select a fundamentally different hash algorithm. Keccak won.

How It Differs From SHA-2

SHA-2 uses the Merkle-Damgård construction: it processes blocks sequentially, feeding each block’s output into the next. SHA-3 uses the sponge construction, which has two phases:

  1. Absorb: Input blocks are XORed into the state and permuted
  2. Squeeze: Output blocks are extracted from the state
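
The squeeze phase is easiest to see with SHAKE256, the extendable-output function from the same Keccak family, which is available in Python's `hashlib`. In this sketch the input string is arbitrary; asking for more output simply continues the same stream.

```python
import hashlib

data = b"sponge demo"

short = hashlib.shake_256(data).digest(16)   # squeeze 16 bytes
longer = hashlib.shake_256(data).digest(64)  # squeeze 64 bytes from the same absorbed input

print(longer[:16] == short)                  # True: extra squeezing extends the same stream

# SHA3-256 squeezes a fixed 32 bytes and uses a different padding suffix,
# so its digest is unrelated to the SHAKE256 output.
fixed = hashlib.sha3_256(data).digest()
print(len(fixed))                            # 32
```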

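The internal state is 1600 bits, much larger than SHA-256’s 256-bit state, and a SHA3-256 digest exposes only 256 of those bits. Because an attacker never sees the full state, length-extension attacks are structurally impossible. SHA-256, by contrast, outputs its entire state as the digest and is vulnerable to them unless wrapped in HMAC.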
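
In practice this means a keyed SHA-256 tag should go through HMAC rather than a bare hash of key plus message. The sketch below uses the standard `hmac` module; the key, message, and `verify` helper are illustrative values and names.

```python
import hashlib
import hmac

key = b"server-side secret key"      # illustrative value only
message = b"amount=100&to=alice"

# Risky: with SHA-256, hash(key + message) can be extended without knowing the key.
naive_tag = hashlib.sha256(key + message).hexdigest()

# Safe: HMAC runs two nested keyed passes, which blocks length extension.
hmac_tag = hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(tag: str, msg: bytes) -> bool:
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)  # constant-time comparison

print(verify(hmac_tag, message))                   # True
print(verify(hmac_tag, message + b"&to=mallory"))  # False
```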

SHA-3 vs SHA-2

| Property | SHA-256 | SHA3-256 |
| --- | --- | --- |
| Construction | Merkle-Damgård | Sponge |
| Internal state | 256 bits | 1600 bits |
| Length extension | Vulnerable | Immune |
| Speed (software) | Faster | ~30% slower |
| Speed (hardware) | Optimized everywhere | Less hardware support |
| Security margin | Good (no practical attacks) | Larger (newer design) |

When to use SHA-3: When you need defense in depth (dual hashing), length-extension resistance without HMAC, or when regulatory compliance mandates it. For most applications, SHA-256 remains the practical choice.

BLAKE3: The Modern Alternative

BLAKE3 (2020) deserves mention as one of the fastest cryptographic hash functions in common use. It is based on BLAKE2, itself derived from the SHA-3 finalist BLAKE, and adds a Merkle tree structure that enables parallelism:

| Function | Speed (single core) | Speed (8 cores) | Output size |
| --- | --- | --- | --- |
| SHA-256 | 500 MB/s | 500 MB/s (not parallelizable) | 256 bits |
| SHA3-256 | 350 MB/s | 350 MB/s (not parallelizable) | 256 bits |
| BLAKE3 | 1.2 GB/s | 6+ GB/s | 256 bits (extendable) |

BLAKE3 is not yet a NIST standard, which limits adoption in regulated environments. For everything else — file hashing, content addressing, key derivation inputs — it is the fastest secure option.

bcrypt: The Battle-Tested Password Hash

The Origin Story

In 1999, Niels Provos and David Mazières published bcrypt, based on the Blowfish cipher. The key insight: make the hash function intentionally slow, and make the slowness configurable.

How It Works

bcrypt is built on EksBlowfish (Expensive Key Schedule Blowfish), a modified Blowfish cipher with a deliberately expensive key setup phase.

Step 1: Key Setup — The Expensive Part

# Simplified bcrypt key derivation
def eks_blowfish_setup(cost, salt, password):
    # Initialize Blowfish state with digits of pi
    state = initial_blowfish_state()
    # Expand key with salt and password
    state = expand_key(state, salt, password)
    # THE EXPENSIVE PART: repeat 2^cost times
    # (ZERO_SALT is an all-zero 128-bit salt; the real salt is fed in as the key)
    for _ in range(2 ** cost):
        state = expand_key(state, ZERO_SALT, password)
        state = expand_key(state, ZERO_SALT, salt)
    return state

The cost parameter (also called “work factor” or “rounds”) controls the number of iterations. Each increment doubles the computation time:

| Cost | Iterations | Time (modern CPU) | Use Case |
| --- | --- | --- | --- |
| 10 | 1,024 | ~100ms | Development, low-security |
| 12 | 4,096 | ~300ms | Recommended minimum |
| 14 | 16,384 | ~1.2s | High-security applications |
| 16 | 65,536 | ~5s | Archival, offline systems |

Step 2: Encryption

After the expensive key setup, bcrypt encrypts the string "OrpheanBeholderScryDoubt" 64 times using the derived key state. The result is the hash.

Step 3: Output Format

$2b$12$WApznUPhDubN0oeveSXHp.GM/eCDjTMFaEWbGGlrVPbBevMOW/BRy
 │  │  │                     │
 │  │  │                     └── Hash (31 chars, base64)
 │  │  └── Salt (22 chars, base64 = 128 bits)
 │  └── Cost factor (12 = 2^12 iterations)
 └── Algorithm version ($2b$ = current)

The salt and cost are embedded in the output, so no separate salt storage is needed. Verification functions such as bcrypt.checkpw() in Python or password_verify() in PHP extract them automatically.
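
Because the format is positional, the fields can be pulled apart with plain string handling. The sketch below is illustrative only (the names `BcryptParts` and `parse_bcrypt` are not part of any library) and relies on the fixed widths of 22 salt characters and 31 hash characters described above.

```python
from typing import NamedTuple

class BcryptParts(NamedTuple):
    version: str
    cost: int
    salt: str    # 22 characters in bcrypt's base64 alphabet
    digest: str  # 31 characters in bcrypt's base64 alphabet

def parse_bcrypt(encoded: str) -> BcryptParts:
    """Split a bcrypt hash string into version, cost, salt, and digest."""
    _, version, cost, rest = encoded.split("$")
    if len(rest) != 53:
        raise ValueError("expected 22 salt characters followed by 31 digest characters")
    return BcryptParts(version, int(cost), rest[:22], rest[22:])

parts = parse_bcrypt("$2b$12$WApznUPhDubN0oeveSXHp.GM/eCDjTMFaEWbGGlrVPbBevMOW/BRy")
print(parts.version, parts.cost, 2 ** parts.cost)  # 2b 12 4096
```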

What bcrypt Resists

bcrypt’s 4KB Blowfish state provides some GPU resistance — GPU architectures prefer small, uniform memory access patterns, and bcrypt’s key-dependent memory accesses create cache pressure. But it is not truly memory-hard: a determined attacker with FPGAs or ASICs can parallelize bcrypt efficiently.

bcrypt’s Limitations

  1. 72-byte password limit: bcrypt truncates passwords longer than 72 bytes. Long passphrases lose entropy silently. Workaround: pre-hash with SHA-256 (but encode as base64 to avoid null bytes).

  2. Not memory-hard: The 4KB state is too small to resist modern GPU/ASIC attacks effectively.

  3. No parallelism tuning: You can only tune CPU cost, not memory or thread count.

# bcrypt in Python
import bcrypt
# Hash
password = b"correct horse battery staple"
salt = bcrypt.gensalt(rounds=12) # cost factor 12
hashed = bcrypt.hashpw(password, salt)
# b'$2b$12$LJ3m4ys3Lg...'
# Verify
bcrypt.checkpw(password, hashed) # True

scrypt: Memory-Hard Hashing

The Problem scrypt Solves

By 2009, it was clear that compute-bound password hashes could be attacked cheaply with parallel hardware such as GPUs, FPGAs, and custom ASICs. Colin Percival designed scrypt with a specific goal: make the algorithm require large amounts of memory, so that attacking it requires not just computation but expensive RAM.

How It Works

scrypt builds on PBKDF2-HMAC-SHA256 but adds a memory-hard mixing step called ROMix:

Step 1: Generate a Large Memory Buffer

# Simplified ROMix
def romix(block, N):
    # Phase 1: Fill memory with N sequential blocks
    V = [None] * N
    X = block
    for i in range(N):
        V[i] = X
        X = block_mix(X)
    # Phase 2: Pseudo-random reads from the buffer
    for i in range(N):
        j = integerify(X) % N        # Index depends on current state
        X = block_mix(xor(X, V[j]))  # Requires reading V[j] from memory
    return X

The critical insight is in Phase 2: the index j depends on the current value of X, which depends on the password. An attacker cannot predict which memory locations will be needed without computing the entire chain. Skipping the memory means recomputing from scratch — trading memory for time at a steep ratio.
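
The two-phase structure can be run end to end with a toy version. This sketch is not scrypt: it swaps the real Salsa20/8-based BlockMix for SHA-256 (the stand-in `H`), works on 32-byte blocks, and hashes the input once to normalize its length. It only illustrates the fill-then-random-read pattern.

```python
import hashlib

def H(block: bytes) -> bytes:
    """Stand-in mixing function; real scrypt uses a Salsa20/8-based BlockMix."""
    return hashlib.sha256(block).digest()

def integerify(block: bytes) -> int:
    """Interpret the block as a little-endian integer."""
    return int.from_bytes(block, "little")

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_romix(block: bytes, n: int) -> bytes:
    x = H(block)                  # normalize the input to one 32-byte block
    v = []
    for _ in range(n):            # Phase 1: fill the buffer sequentially
        v.append(x)
        x = H(x)
    for _ in range(n):            # Phase 2: reads whose index depends on the state
        j = integerify(x) % n
        x = H(xor_bytes(x, v[j]))
    return x

print(toy_romix(b"password-derived block", 1024).hex())
```

Dropping entries from `v` to save memory forces the caller to recompute them from the start of the chain whenever Phase 2 asks for one, which is the time-memory tradeoff the design relies on.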

Parameters:

| Parameter | Controls | Typical Value |
| --- | --- | --- |
| N | Memory size in blocks (must be a power of 2) | 2^14 (16,384) to 2^20 |
| r | Block size multiplier | 8 |
| p | Parallelism | 1 |
| Memory used | 128 * N * r bytes | 16 MB to 1 GB |
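
The memory formula in the last row is worth checking before choosing parameters. A small sketch, with an illustrative function name:

```python
def scrypt_memory_bytes(n: int, r: int) -> int:
    """Size of the ROMix buffer: N blocks of 128 * r bytes each."""
    return 128 * n * r

for exponent in (14, 17, 20):
    mib = scrypt_memory_bytes(2 ** exponent, r=8) / 2 ** 20
    print(f"N=2^{exponent}, r=8 -> {mib:.0f} MiB")
```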

# scrypt in Python
import hashlib
import os

salt = os.urandom(16)  # fresh random salt per password; store it next to the hash
hashed = hashlib.scrypt(
    b"correct horse battery staple",
    salt=salt,
    n=2**14,   # N = 16384 (memory cost)
    r=8,       # block size
    p=1,       # parallelism
    dklen=32,  # output length
)
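
Unlike bcrypt, `hashlib.scrypt` returns only the raw derived key, so the application has to store the salt and parameters itself and compare in constant time on login. The sketch below shows one way to do that; the function names and the `SCRYPT_PARAMS` dictionary are illustrative, not part of the standard library.

```python
import hashlib
import hmac
import os

SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}

def scrypt_hash(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key); both must be stored with the user record."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def scrypt_verify(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)

salt, key = scrypt_hash("correct horse battery staple")
print(scrypt_verify("correct horse battery staple", salt, key))  # True
print(scrypt_verify("wrong password", salt, key))                # False
```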

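What scrypt Resists

Because every guess needs the full N-block buffer, an attacker cannot run thousands of guesses in parallel on a GPU or ASIC without also paying for the memory behind each one. That makes scrypt strong against GPUs and moderately to strongly resistant to FPGAs and ASICs. It does not protect against cache-timing side channels.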

scrypt’s Limitations

  1. Side-channel vulnerability: The data-dependent memory access pattern leaks information through cache timing. An attacker sharing the same hardware (cloud VMs, shared hosting) can potentially extract the password.

  2. All-or-nothing memory: scrypt has a single memory parameter. If you want high memory, you also get high latency. There is no way to tune memory and time independently.

  3. Limited parallelism control: The p parameter adds parallelism but does not improve memory-hardness per thread.

Argon2id: The State of the Art

The Password Hashing Competition

In 2013, recognizing that no existing algorithm fully addressed GPUs, FPGAs, ASICs, and side-channel attacks simultaneously, a group of cryptographers organized the Password Hashing Competition (PHC). Twenty-four candidates were submitted. In 2015, Argon2 won.

Argon2 comes in three variants:

| Variant | Memory Access Pattern | Resists | Use Case |
| --- | --- | --- | --- |
| Argon2d | Data-dependent | GPU/ASIC (strongest) | Cryptocurrency, no side-channel risk |
| Argon2i | Data-independent | Side-channel (strongest) | Environments with shared hardware |
| Argon2id | Hybrid (Argon2i for the first half of the first pass, Argon2d afterwards) | Both GPU/ASIC and side-channel | General-purpose password hashing |

Argon2id is the recommended variant for virtually all applications.

How Argon2id Works

Argon2id fills a large memory array in two phases using the Blake2b hash function as its compression core:

Step 1: Initialize the Memory Matrix

The memory is organized as a matrix of 1 KB blocks. Each row is a lane, and each lane is split into four equal segments:

Memory Matrix (m = memory_cost blocks):
┌──────────┬───────┬───────┬───────┬───────┐
│ Lane 0   │ Seg 0 │ Seg 1 │ Seg 2 │ Seg 3 │
├──────────┼───────┼───────┼───────┼───────┤
│ Lane 1   │ Seg 0 │ Seg 1 │ Seg 2 │ Seg 3 │
├──────────┼───────┼───────┼───────┼───────┤
│ Lane p-1 │ Seg 0 │ Seg 1 │ Seg 2 │ Seg 3 │
└──────────┴───────┴───────┴───────┴───────┘
Each segment holds memory_cost / (4 * p) blocks of 1 KB
Lanes can be computed in parallel (p = parallelism)

Step 2: Fill Passes — The Hybrid Strategy

# Simplified Argon2id filling (pseudocode, not runnable)
def fill_memory(password, salt, memory_cost, time_cost, parallelism):
    # Initialize memory matrix
    matrix = allocate(memory_cost * 1024)  # memory_cost KB
    # Initial block derivation from password + salt via BLAKE2b
    H0 = blake2b(parallelism || time_cost || memory_cost || password || salt || ...)
    for pass_number in range(time_cost):
        for segment in range(4):
            for lane in range(parallelism):  # Parallelizable
                for block_index in segment_range(segment):
                    if pass_number == 0 and segment < 2:
                        # FIRST HALF OF FIRST PASS: Argon2i mode (data-independent indexing)
                        # Reference position generated from counters, NOT from memory
                        ref_lane, ref_index = generate_addresses_independent(pass_number, lane, block_index)
                    else:
                        # EVERYTHING AFTER: Argon2d mode (data-dependent indexing)
                        # Reference position derived from previous block's content
                        ref_lane, ref_index = derive_from_previous_block(matrix[lane][block_index - 1])
                    # Compress: mix previous block with referenced block
                    matrix[lane][block_index] = compress(
                        matrix[lane][block_index - 1],
                        matrix[ref_lane][ref_index],
                    )
    # Final: XOR the last block of each lane
    return xor_last_blocks(matrix)

Why the hybrid matters:

  • The first half of the first pass (Argon2i mode) uses data-independent addressing. An attacker with cache-timing side channels learns nothing, because the memory access pattern is deterministic from the public parameters alone.
  • Everything after that (Argon2d mode) uses data-dependent addressing. The reference block depends on the previous block’s content, which depends on the password. This makes skipping memory expensive: you cannot predict which blocks will be needed without computing the entire chain.

By splitting the first pass this way, Argon2id gets side-channel resistance from the data-independent portion and strong resistance to time-memory tradeoffs from the data-dependent portion.
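
The switch between the two modes is a simple rule on the pass and slice counters. The sketch below shows only that rule, assuming the four slices per pass used by Argon2; the function name and labels are illustrative.

```python
SLICES_PER_PASS = 4  # Argon2 divides each pass into 4 slices

def addressing_mode(pass_number: int, slice_index: int) -> str:
    """Return which addressing mode Argon2id uses for a given pass and slice."""
    if pass_number == 0 and slice_index < SLICES_PER_PASS // 2:
        return "data-independent (Argon2i)"
    return "data-dependent (Argon2d)"

for t in range(3):
    print(t, [addressing_mode(t, s) for s in range(SLICES_PER_PASS)])
```

With three passes, only 2 of the 12 slices run in data-independent mode, which is why the hybrid keeps most of Argon2d's tradeoff resistance.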

The Compression Function

Argon2’s compression function processes two 1 KB input blocks into one 1 KB output block. It views the block as an 8x8 matrix of 16-byte registers (128 64-bit words) and applies a modified BLAKE2b round to each row and then each column:

# Argon2 compression (conceptual pseudocode)
def compress(prev_block, ref_block):
    R = prev_block ^ ref_block  # XOR the two 1 KB inputs
    Z = copy(R)
    # Apply BLAKE2b-like mixing to the 8 rows, then the 8 columns,
    # of the 8x8 register matrix
    for row in range(8):
        blake2b_mix(row_registers(Z, row))
    for col in range(8):
        blake2b_mix(column_registers(Z, col))
    return Z ^ R  # Feed-forward XOR with the pre-mixing value

Parameters

| Parameter | Controls | Recommended |
| --- | --- | --- |
| memory_cost (m) | Memory in KB | 65536 (64 MB), or as much as you can afford |
| time_cost (t) | Number of passes | 3 |
| parallelism (p) | Lanes (threads) | 4 |
| hash_length | Output size | 32 bytes |

Tuning strategy: Max out memory first (the strongest defense), then increase time_cost to hit your latency budget. On a login endpoint targeting 500ms response time, 64 MB / 3 passes / 4 threads is a solid starting point.
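
Memory cost also caps how many logins a server can hash at once, so it helps to do the capacity arithmetic before raising it. This is a rough sketch: the function name and the 1 GB budget are hypothetical, and it ignores everything else the process needs memory for.

```python
def max_concurrent_hashes(ram_budget_mib: int, memory_cost_kib: int) -> int:
    """How many Argon2id hashes fit in a RAM budget at the same time."""
    return (ram_budget_mib * 1024) // memory_cost_kib

# With 1 GB reserved for password hashing:
print(max_concurrent_hashes(1024, 65536))   # 16 simultaneous hashes at 64 MB each
print(max_concurrent_hashes(1024, 262144))  # 4 simultaneous hashes at 256 MB each
```

If the expected login concurrency exceeds that number, either queue hashing work or lower memory_cost.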

# Argon2id in Python
from argon2 import PasswordHasher
ph = PasswordHasher(
    memory_cost=65536,  # 64 MB
    time_cost=3,        # 3 passes
    parallelism=4,      # 4 threads
    hash_len=32,
    salt_len=16,
)
# Hash
hashed = ph.hash("correct horse battery staple")
# $argon2id$v=19$m=65536,t=3,p=4$c29tZXNhbHQ$RdescudvJCsgt...
# Verify
ph.verify(hashed, "correct horse battery staple") # True
# Check if rehash needed (parameters changed)
ph.check_needs_rehash(hashed) # True if your defaults changed

Argon2id Output Format

$argon2id$v=19$m=65536,t=3,p=4$c29tZXNhbHQ$RdescudvJCsgt3meg8GnUFI5qhXM
 │        │    │               │           │
 │        │    │               │           └── Hash (base64)
 │        │    │               └── Salt (base64)
 │        │    └── Parameters: memory=64MB, time=3, parallelism=4
 │        └── Version 19 (0x13)
 └── Algorithm variant (argon2id)

Like bcrypt, all parameters are embedded in the output string. Verification extracts them automatically, and you can upgrade parameters transparently on login.
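
The encoded string can be parsed with the standard library alone, which is handy for auditing which parameters stored hashes were created with. This is a sketch: `Argon2Parts` and `parse_argon2` are illustrative names, and in application code the library's own verify and rehash checks should do this work.

```python
import base64
from typing import NamedTuple

class Argon2Parts(NamedTuple):
    variant: str
    version: int
    memory_kib: int
    time_cost: int
    parallelism: int
    salt: bytes
    digest: bytes

def _b64decode_unpadded(text: str) -> bytes:
    """Argon2 strings omit base64 padding, so restore it before decoding."""
    return base64.b64decode(text + "=" * (-len(text) % 4))

def parse_argon2(encoded: str) -> Argon2Parts:
    _, variant, version, params, salt, digest = encoded.split("$")
    fields = dict(item.split("=") for item in params.split(","))
    return Argon2Parts(
        variant=variant,
        version=int(version.removeprefix("v=")),
        memory_kib=int(fields["m"]),
        time_cost=int(fields["t"]),
        parallelism=int(fields["p"]),
        salt=_b64decode_unpadded(salt),
        digest=_b64decode_unpadded(digest),
    )

parts = parse_argon2("$argon2id$v=19$m=65536,t=3,p=4$c29tZXNhbHQ$RdescudvJCsgt3meg8GnUFI5qhXM")
print(parts.variant, parts.version, parts.memory_kib, parts.time_cost, parts.parallelism)
print(parts.salt)  # b'somesalt'
```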

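What Argon2id Resists

Argon2id is memory-hard, so GPU, FPGA, and ASIC attackers must pay for memory on every parallel guess. Its data-independent first half-pass limits what cache-timing side channels can reveal, and its data-dependent remainder resists time-memory tradeoffs. The time_cost parameter adds conventional CPU brute-force resistance on top.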

The Complete Comparison

Security Properties

| Property | SHA-256 | bcrypt | scrypt | Argon2id |
| --- | --- | --- | --- | --- |
| Preimage resistance | 256-bit | N/A (not a goal) | N/A | N/A |
| Collision resistance | 128-bit | N/A | N/A | N/A |
| CPU brute-force resistance | None (fast) | Strong (cost factor) | Strong (N parameter) | Strong (time_cost) |
| GPU resistance | None | Moderate (4KB state) | Strong (memory-hard) | Strong (memory-hard) |
| FPGA/ASIC resistance | None | Weak | Moderate-Strong | Strong |
| Side-channel resistance | N/A | N/A | Weak (data-dependent) | Strong (hybrid mode) |
| Time-memory tradeoff resistance | N/A | N/A | Moderate | Strong |

Performance Characteristics

| Property | SHA-256 | bcrypt (cost=12) | scrypt (N=2^14, r=8) | Argon2id (64MB, t=3, p=4) |
| --- | --- | --- | --- | --- |
| Hashes/sec (CPU) | ~15,000,000 | ~3 | ~2 | ~2 |
| Hashes/sec (GPU) | ~10,000,000,000 | ~50,000 | ~1,000 | ~500 |
| Memory per hash | ~0 | 4 KB | 16 MB | 64 MB |
| Parallelism tunable | N/A | No | Limited | Yes |
| Output size | Fixed 256-bit | Fixed 184-bit | Variable | Variable |

Attack Cost Comparison

For an 8-character alphanumeric password (~48 bits of entropy), using an RTX 4090 GPU:

| Algorithm | Time to Exhaust | Cost (AWS equivalent) |
| --- | --- | --- |
| SHA-256 | ~6 hours | ~$50 |
| bcrypt (cost=12) | ~140 years | ~$2,000,000 |
| scrypt (N=2^14, r=8) | ~6,900 years | ~$20,000,000 |
| Argon2id (64MB, t=3, p=4) | ~14,000 years | ~$40,000,000+ |

These numbers are estimates based on published benchmarks. The key takeaway is not the exact numbers but the orders of magnitude separating general-purpose and password-specific hashes.

The Decision Framework

Quick Reference

| Scenario | Use | Why |
| --- | --- | --- |
| Password storage (new project) | Argon2id | Best-in-class defense against all known attacks |
| Password storage (existing bcrypt) | bcrypt → Argon2id migration | Transparent rehash on login |
| File checksums | SHA-256 or BLAKE3 | Speed matters, not brute-force resistance |
| HMAC/JWT signing | HMAC-SHA256 | Standard, fast, well-supported |
| API key storage | SHA-256 | High-entropy input, speed is safe |
| Key derivation (encryption keys from passwords) | Argon2id or scrypt | Memory-hardness protects the derived key |
| Content-addressable storage | BLAKE3 | Fastest secure hash, parallelizable |
| Digital signatures | SHA-256 or SHA-384 | NIST-standardized, regulatory compliance |
| Deduplication | BLAKE3 or SHA-256 | Speed + collision resistance |

Implementation Patterns

Transparent Upgrade Strategy

The best password hashing systems upgrade algorithms transparently on login:

src/core/security.py
from passlib.context import CryptContext

pwd_context = CryptContext(
    schemes=["argon2", "bcrypt", "scrypt"],
    default="argon2",
    argon2__memory_cost=65536,
    argon2__time_cost=3,
    argon2__parallelism=4,
    bcrypt__rounds=12,
    deprecated=["bcrypt", "scrypt"],
)


def hash_password(password: str) -> str:
    return pwd_context.hash(password)


def verify_and_upgrade(password: str, hashed: str) -> tuple[bool, str | None]:
    """Verify password and return upgraded hash if needed.

    Returns:
        (is_valid, new_hash): new_hash is non-None when the stored hash
        uses a deprecated algorithm or outdated parameters.
    """
    if not pwd_context.verify(password, hashed):
        return False, None
    if pwd_context.needs_update(hashed):
        return True, pwd_context.hash(password)
    return True, None


# In the login handler (fragment; runs inside an async function):
is_valid, new_hash = verify_and_upgrade(request.password, user.hashed_password)
if not is_valid:
    raise InvalidCredentials()
if new_hash:
    user.hashed_password = new_hash
    await db.commit()  # Transparent upgrade

Pre-Hashing for bcrypt’s 72-Byte Limit

import base64
import hashlib

def prehash_for_bcrypt(password: str) -> bytes:
    """SHA-256 pre-hash to bypass bcrypt's 72-byte limit.

    Base64-encode the digest to avoid null bytes (which bcrypt
    treats as a terminator in some implementations).
    """
    digest = hashlib.sha256(password.encode("utf-8")).digest()
    return base64.b64encode(digest)

Benchmarking Your Hardware

Always benchmark on your production hardware. Hash times vary dramatically between machines:

import time

from argon2 import PasswordHasher


def benchmark_argon2(memory_cost: int, time_cost: int, parallelism: int, iterations: int = 10):
    ph = PasswordHasher(memory_cost=memory_cost, time_cost=time_cost, parallelism=parallelism)
    start = time.perf_counter()
    for _ in range(iterations):
        ph.hash("benchmark_password")
    elapsed = (time.perf_counter() - start) / iterations
    print(f"m={memory_cost}, t={time_cost}, p={parallelism}: {elapsed*1000:.0f}ms per hash")


# Find the sweet spot for your latency budget
benchmark_argon2(65536, 3, 4)   # 64 MB, 3 passes
benchmark_argon2(131072, 2, 4)  # 128 MB, 2 passes
benchmark_argon2(262144, 1, 4)  # 256 MB, 1 pass

Target: 200-500ms per hash for interactive login, or 1-2 seconds for high-security applications. At equal latency, prefer more memory over more passes, since memory-hardness is the strongest defense.

Key Takeaways

  1. SHA-256 is for data, not passwords. Its speed is a feature for checksums and a vulnerability for credentials.

  2. bcrypt is still safe but showing its age. The 72-byte limit and 4KB memory footprint are real constraints. If you are using it, stay at cost 12+ and plan an Argon2id migration.

  3. scrypt added memory-hardness but introduced side-channel risk. It remains a good choice for key derivation (it was originally designed for the Tarsnap backup service) but Argon2id is better for interactive password hashing.

  4. Argon2id is the right default for new projects. It won the PHC, resists all four attack classes (GPU, FPGA, ASIC, and side-channel), and gives you independent knobs for memory, time, and parallelism.

  5. Max out memory first, then tune time. Memory-hardness is the strongest defense because an attacker must replicate the memory for every parallel guess, and memory is expensive to scale. The time parameter is secondary.

  6. Always support transparent upgrades. Algorithms evolve. The system that can seamlessly rehash on login is the one that stays secure across decades.

Hash functions are not interchangeable tools from the same drawer. They are purpose-built instruments designed for specific threat models. Pick the right one, configure it properly, and build your system so you can change it when the threat landscape shifts.
