Hashing Algorithms
Map arbitrary data to fixed-size digests -- used for integrity, identity, distribution, and authentication across every layer of a system.
Map arbitrary data to fixed-size digests — used for integrity, identity, distribution, and authentication across every layer of a system.
Key Properties
| Property | Cryptographic | Non-Cryptographic |
|---|---|---|
| Preimage Resistance | ✅ Hard to reverse | ❌ Not required |
| Collision Resistance | ✅ ~2^128 work for 256-bit | ⚠️ Theoretical, not verified |
| Speed | Medium (SHA-256: 500 MB/s) | Very fast (MurmurHash: 5+ GB/s) |
| Output Determinism | ✅ Always same for input | ✅ Always same for input |
| Avalanche Effect | ✅ Strict (1 bit → 50% output bits) | ✅ Good (1 bit → 30-40% output bits) |
| Use Cases | Signing, integrity, identity | Sharding, hash tables, Bloom filters |
Algorithm Comparison
| Algorithm | Output | Speed | Collision Resist | Status | Primary Use |
|---|---|---|---|---|---|
| MD5 | 128-bit | Fast | ❌ Broken (collisions found 2004) | Deprecated | Legacy checksums only |
| SHA-1 | 160-bit | Fast | ❌ Broken (collisions found 2017) | Sunset (Git migrating away) | Legacy Git, deprecated in TLS 1.3 |
| SHA-256 | 256-bit | Medium | ✅ NIST standard (no known attacks) | Current standard | TLS certs, JWT, Bitcoin, Git (future) |
| SHA-3 (Keccak) | 256/512-bit | Medium | ✅ NIST 2015 standard (sponge construction) | Post-quantum preferred | Future migration, post-quantum readiness |
| bcrypt | 192-bit | Tunable slow | ✅ Adaptive; salted | Recommended | Password storage (legacy systems) |
| Argon2id | Variable | Tunable | ✅ Memory-hard; resistant to GPU/ASIC attacks | OWASP best practice 2024 | Password storage (modern standard) |
| MurmurHash3 | 32/128-bit | Very fast (3-5 GB/s) | Non-cryptographic | Production standard | Cassandra, Bloom filters, hash tables |
| xxHash | 32/64/128-bit | Fastest (10+ GB/s) | Non-cryptographic | Production standard | File checksums, high-speed hash tables |
| CRC32 | 32-bit | Fast (hardware accelerated) | ❌ Error detection only | Standard | Storage integrity, network packets |
Decision Guide
Choosing a Hash Algorithm:
For Security/Authentication:
- Password storage: Use Argon2id (OWASP 2024 recommendation: m=46 MiB, t=1, p=1)
- JWT signing / TLS certs: Use SHA-256 (standard for ECDSA and RSA-PSS)
- Future-proofing: Use SHA-3-256 for post-quantum resilience
For Data Distribution (Sharding/Load Balancing):
- Partition keys: Use MurmurHash3 (Cassandra standard; 3-5x faster than MD5)
- Hash tables: Use xxHash for speed (10+ GB/s) or CityHash for simplicity
For Error Detection:
- Network packets / storage checksums: Use CRC32 (hardware-accelerated, lightweight)
For Content Addressing (Deduplication):
- Git: Currently SHA-1 (migrating to SHA-256 by 2026 in Git 3.0)
- IPFS / content-addressed storage: SHA-256 (collision-free guarantee)
Real Systems Using Hashing
| System | Hash Used | Purpose | Configuration | Benefit |
|---|---|---|---|---|
| Bitcoin Blockchain | SHA-256 (double) | Proof of work; block integrity | Header: 1024 bits (version, prev_hash, merkle_root, timestamp, target, nonce); double hashed; difficulty adjusts every 2,016 blocks (~2 weeks) to maintain 10-min block time | Immutable ledger; 51% attack requires controlling >50% of hash rate (~exahash/sec) |
| Git (v2.45+) | SHA-1 → SHA-256 | Content-addressable objects (commits, trees, blobs) | Currently SHA-1 for compatibility; a future Git major release defaults to SHA-256; transitional repos support both | Collision resistance; Git 3.0 forces ecosystem adaptation (GitHub support pending) |
| TLS 1.3 Certificates | SHA-256 (ECDSA/RSA-PSS) | Chain signature verification | Certificate signature: RSA-PSS with SHA-256 (OID 0x0804); ECDSA-SHA256 for EC keys; 256-bit hash prevents second-preimage attacks on 2048-bit RSA | Industry standard; Firefox, Chrome, Safari all require SHA-256 minimum |
| Argon2id (OWASP) | Custom KDF with BLAKE2b | Password hashing resistant to GPU cracking | m=46 MiB (memory), t=1 (time cost), p=1 (parallelism); or m=19 MiB, t=2, p=1 for resource-constrained environments; 16 bytes salt | Tunable cost; ~100ms per hash on modern CPU; GPU/ASIC resistant due to memory-hard computation |
| Cassandra Partition Keys | MurmurHash3 (128-bit, x64, upper 64 bits) | Token generation for consistent hashing | Maps to 64-bit ring: -2^63 to +2^63-1; token_range determines replication factor ownership; 3-5x faster than RandomPartitioner | Uniform distribution across nodes; deterministic ring; enables stream-based replica balancing |
| JWT Signing (RS256/HS256) | SHA-256 with HMAC or RSA | Token integrity; tamper detection | Header.Payload.Signature format; shared secret (HMAC) or public key (RSA); signature covers first two parts | Stateless auth; impossible to forge without key; widely supported (OpenID Connect standard) |
How Real Systems Use This
Bitcoin: Double SHA-256 Proof of Work
Bitcoin uses SHA-256(SHA-256(block_header)) to create an immutable ledger. The block header is 1024 bits: version (32-bit), previous block hash (256-bit), Merkle root of transactions (256-bit), timestamp (32-bit), difficulty target (32-bit), and nonce (32-bit). Miners compete to find a nonce that makes the double-hashed result fall below the difficulty target. The difficulty adjusts every 2,016 blocks (~two weeks) to maintain a 10-minute average block time. At current hashrate (~520 exahash/sec as of 2025), finding a valid nonce requires ~10^19 hash operations, making a 51% attack cost-prohibitive. Nodes verify blocks in O(1) by recomputing the double hash; the double hashing reduces certain cryptographic attacks and reinforces integrity.
Git: SHA-1 → SHA-256 Migration
Git uses SHA-1 to create content-addressable object IDs (commits, trees, blobs are identified by their hash). A commit SHA-1 is the hash of: parent commit, tree object, author metadata, and message. For decades this was secure due to Git’s use case (trusted internal repositories), but in 2017 Google published the first SHA-1 collision. Git 2.45 (released 2024) added SHA-256 support with interoperability; Git 3.0 (targeting late 2026) will make SHA-256 default for new repos. The challenge: GitHub, GitLab, and other forges still don’t support SHA-256 repos, creating a chicken-and-egg problem. The migration plan includes hybrid repos that can contain both SHA-1 and SHA-256 objects, allowing gradual transition without breaking existing workflows.
TLS 1.3 Certificate Chain Signatures
Modern TLS certificates use SHA-256 with RSA-PSS or ECDSA for signatures. A certificate contains: public key, subject name, issuer name, validity dates, extensions, and a signature from the Certificate Authority. The signature algorithm (e.g., sha256WithRSAEncryption, OID 1.2.840.113549.1.1.11) specifies how the hash is computed. For RSA, the CA computes SHA-256(certificate_body), then encrypts with its private key. When verifying, clients recompute SHA-256(certificate_body), decrypt the signature with the CA’s public key, and compare. TLS 1.3 deprecates SHA-1 signatures and all major browsers require SHA-256 minimum, ensuring 256-bit collision resistance. The hash length matches the public key strength (256-bit hash for 2048-bit RSA).
Argon2id: GPU-Resistant Password Hashing
Argon2id is a memory-hard key derivation function designed to resist GPU and ASIC attacks. OWASP 2024 recommends: m=46 MiB (memory), t=1 (time cost in iterations), p=1 (parallelism degree). This takes ~100 milliseconds on a modern CPU to hash a password, making dictionary attacks prohibitively slow. The “memory-hard” aspect forces attackers to allocate 46 MB per hash attempt; even with GPUs (thousands of parallel threads), memory bandwidth becomes the bottleneck. Parameters are tunable: for resource-constrained environments, m=19 MiB with t=2 provides equivalent security with lower memory overhead. Argon2id combines data-dependent memory access (Argon2i, resistant to side-channel attacks) with data-independent access (Argon2d, faster), making it resistant to both timing and GPU attacks.
Cassandra: MurmurHash3 Partition Key Distribution
Cassandra uses MurmurHash3 (128-bit x64, truncated to upper 64 bits) to hash partition keys and assign them to nodes on a consistent hash ring. The hash value becomes a “token” ranging from -2^63 to +2^63-1. Each node owns a token range; the partition key hash determines which node(s) hold the data. MurmurHash3 is 3-5x faster than the legacy RandomPartitioner (cryptographic MD5) because Cassandra doesn’t need cryptographic guarantees — it just needs uniform distribution for load balancing. The deterministic hash enables stream-based token reassignment during node additions: a new node can replicate data by scanning nodes in increasing token order, rather than rebalancing the entire ring. With replication factor = 3, data is placed on the token-owning node plus the next 2 nodes clockwise.
JWT Signing: HMAC-SHA256 and RS256
JSON Web Tokens (JWT) use SHA-256 for signature generation. The header specifies the algorithm (e.g., HS256 for HMAC-SHA256, RS256 for RSA with SHA-256). For HMAC-SHA256, the server has a shared secret key (e.g., 256-bit random); to create a token: signature = HMAC-SHA256(header.payload, secret). The token is base64(header).base64(payload).base64(signature). Clients verify by recomputing the HMAC with the shared secret and comparing. For RS256, the server signs with a private key and clients verify using the corresponding public key (no shared secret required). The signature covers only the header and payload, allowing clients to extract and validate claims (user ID, expiration, permissions) without contacting the server. This is the foundation of stateless authentication in OAuth 2.0 and OpenID Connect.
Implementation
Simple Hash Function (Educational)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
def djb2_hash(data, table_size):
"""
Simple hash function using DJB2 algorithm.
Demonstrates basic hash properties: fast, uniform distribution.
Args:
data: string or bytes to hash
table_size: size of hash table
Returns:
hash value in range [0, table_size)
Speed: ~5-10 MB/s (educational; production uses MurmurHash3 at 5 GB/s)
"""
if isinstance(data, str):
data = data.encode()
hash_value = 5381
for byte in data:
# Bitwise operations: left-shift + addition for mixing
hash_value = ((hash_value << 5) + hash_value) + byte
hash_value &= 0xFFFFFFFF # Keep 32-bit
return hash_value % table_size
# Example usage
print(djb2_hash("apple", 100)) # Output: reproducible hash in [0, 100)
print(djb2_hash("banana", 100))
Consistent Hashing with MurmurHash
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
import hashlib
class ConsistentHashRing:
"""
Consistent hashing for distributed systems.
Used in: Cassandra (with MurmurHash3), Amazon DynamoDB, Memcached.
Minimizes key redistribution when nodes are added/removed.
"""
def __init__(self, nodes=None, virtual_nodes=150):
"""
Args:
nodes: list of node identifiers (e.g., server IPs)
virtual_nodes: number of virtual nodes per physical node
(higher = better distribution, more memory)
"""
self.virtual_nodes = virtual_nodes
self.ring = {} # token -> node_id
self.sorted_tokens = []
if nodes:
for node in nodes:
self.add_node(node)
def _hash(self, key):
"""Hash key to 64-bit token (mimics MurmurHash3)."""
h = hashlib.md5(str(key).encode()).hexdigest()
# Use first 16 hex chars (64 bits)
return int(h[:16], 16)
def add_node(self, node):
"""Add node and its virtual nodes to ring."""
for i in range(self.virtual_nodes):
virtual_key = f"{node}:{i}"
token = self._hash(virtual_key)
self.ring[token] = node
self._recompute_sorted_tokens()
def remove_node(self, node):
"""Remove node and its virtual nodes from ring."""
for i in range(self.virtual_nodes):
virtual_key = f"{node}:{i}"
token = self._hash(virtual_key)
del self.ring[token]
self._recompute_sorted_tokens()
def _recompute_sorted_tokens(self):
"""Maintain sorted list of tokens for binary search."""
self.sorted_tokens = sorted(self.ring.keys())
def get_node(self, key):
"""
Find node responsible for key using consistent hashing.
Returns:
node_id, or None if ring is empty
"""
if not self.ring:
return None
token = self._hash(key)
# Find first token >= key's token (clockwise on ring)
import bisect
idx = bisect.bisect_left(self.sorted_tokens, token)
if idx == len(self.sorted_tokens):
idx = 0 # Wrap around to first node
return self.ring[self.sorted_tokens[idx]]
# Example: 3 nodes, partition requests
ring = ConsistentHashRing(nodes=["server_a", "server_b", "server_c"])
keys = ["user:123", "session:456", "cache:789"]
for key in keys:
node = ring.get_node(key)
print(f"{key} → {node}")
print("\n--- Adding server_d (minimal redistribution) ---")
ring.add_node("server_d")
for key in keys:
node = ring.get_node(key)
print(f"{key} → {node}")
HMAC-SHA256 for Message Authentication
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
import hmac
import hashlib
import base64
def sign_jwt_header_payload(header_b64, payload_b64, secret):
"""
Create HMAC-SHA256 signature for JWT.
Args:
header_b64: base64-encoded header
payload_b64: base64-encoded payload
secret: shared secret (typically 256-bit random)
Returns:
signature as base64-encoded string
"""
# Message to sign is header.payload (without signature yet)
message = f"{header_b64}.{payload_b64}".encode()
# HMAC-SHA256
signature_bytes = hmac.new(
secret.encode(), message, hashlib.sha256
).digest()
# Base64 encode (URL-safe, no padding)
signature_b64 = base64.urlsafe_b64encode(signature_bytes).decode().rstrip("=")
return signature_b64
def verify_jwt_signature(header_b64, payload_b64, signature_b64, secret):
"""
Verify JWT signature.
Args:
header_b64, payload_b64, signature_b64: JWT parts
secret: shared secret
Returns:
True if signature is valid, False otherwise
"""
# Recompute signature
expected_signature = sign_jwt_header_payload(header_b64, payload_b64, secret)
# Constant-time comparison to prevent timing attacks
return hmac.compare_digest(signature_b64, expected_signature)
# Example usage
import json
header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "user123", "exp": 1234567890}
header_b64 = base64.urlsafe_b64encode(json.dumps(header).encode()).decode().rstrip("=")
payload_b64 = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode().rstrip("=")
secret = "my_super_secret_key_min_32_chars"
token_signature = sign_jwt_header_payload(header_b64, payload_b64, secret)
jwt_token = f"{header_b64}.{payload_b64}.{token_signature}"
print(f"JWT Token: {jwt_token}")
# Verify
is_valid = verify_jwt_signature(header_b64, payload_b64, token_signature, secret)
print(f"Signature valid: {is_valid}")
References
- FIPS 202 — SHA-3 Standard (NIST 2015) — Keccak sponge construction; post-quantum preferred algorithm.
- RFC 9106 — Argon2 Password Hashing (IETF 2021) — Memory-hard KDF; OWASP 2024 recommended parameters (m=46 MiB, t=1, p=1).
- MurmurHash3 Algorithm — Production hash function used by Cassandra, Kafka, and Bloom filters; 3-5x faster than MD5.
- OWASP Password Storage Cheat Sheet — Argon2id recommendations and secure storage best practices.
- Bitcoin: Double SHA-256 Proof of Work — Block header hashing; difficulty adjustment mechanism; 51% attack cost analysis.
- Git Hash Function Transition Documentation — SHA-1 → SHA-256 migration timeline; Git 3.0 targeting late 2026.
- TLS 1.3 Certificate Chain Signatures — SHA-256 with ECDSA/RSA-PSS; certificate verification flow.
- Cassandra Partitioner Documentation — MurmurHash3 for token ring and consistent hashing.
- ByteByteGo — Hashing Algorithms Explained — Visual walkthroughs of SHA-256, bcrypt, and consistent hashing.
- Wikipedia: Cryptographic Hash Function — Avalanche effect, collision resistance, and security properties.