Course → Module 3: Storage, Databases & Caching

The Problem with Modulo Hashing

Distributed systems often need to assign data to nodes. The simplest approach is modulo hashing: take the hash of a key, divide by the number of nodes, and use the remainder as the node assignment. node = hash(key) % N.

This works well as long as N never changes. The moment you add or remove a node, nearly every key maps to a different node. If you go from 4 nodes to 5, the formula changes from hash(key) % 4 to hash(key) % 5. For most keys, the remainder is different. In a cache cluster, this means almost every cached item is now on the wrong node. The cache hit rate drops to nearly zero. Every request hits the database. You just turned a scaling event into a cascading failure.

Modulo hashing redistributes nearly all keys when the node count changes. For a system with K keys going from N to N+1 nodes, roughly N/(N+1) of all keys move. With 4 nodes, adding a 5th moves about 80% of keys.
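You can verify this empirically. The sketch below (key names and count are illustrative) hashes 100,000 keys with a stable hash and counts how many land on a different node when the cluster grows from 4 to 5:

```python
# Sketch: measure how many keys change nodes when a modulo-hashed
# cluster grows from 4 to 5 nodes.
import hashlib

def node_for(key: str, n: int) -> int:
    # Use a stable hash: Python's built-in hash() is salted per process
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return h % n

keys = [f"key-{i}" for i in range(100_000)]
moved = sum(1 for k in keys if node_for(k, 4) != node_for(k, 5))
print(f"{moved / len(keys):.1%} of keys moved")  # roughly 80%
```

The 80% figure is no accident: a key stays put only when hash(key) gives the same remainder mod 4 and mod 5, which happens for just 4 of every 20 hash values.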

Consistent Hashing

Consistent hashing solves this by mapping both nodes and keys onto the same circular hash space (a "hash ring"). Each node is placed at a position on the ring determined by hashing its identifier. Each key is placed on the ring by hashing its value. The key is assigned to the first node encountered when walking clockwise from the key's position.

When a node is added, it takes over a segment of the ring from its clockwise neighbor. Only the keys in that segment move. When a node is removed, its segment is absorbed by the next clockwise node. Only the keys assigned to the removed node need to move.

Consistent hashing guarantees that when the number of nodes grows from N to N+1, only about K/(N+1) keys need to be remapped on average, where K is the total number of keys. That is exactly the share of the ring the new node takes over, and it is the theoretical minimum: any scheme that balances load must move at least that many keys.
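The clockwise-walk lookup is typically implemented as a binary search over sorted ring positions. Here is a minimal sketch (class and function names are my own, not from any particular library), without virtual nodes yet:

```python
# Minimal hash ring sketch: nodes and keys share one hash space;
# a key belongs to the first node clockwise from its position.
import bisect
import hashlib

def ring_pos(s: str) -> int:
    # Map any string to a point on a 64-bit ring
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes):
        entries = sorted((ring_pos(n), n) for n in nodes)
        self._positions = [p for p, _ in entries]
        self._nodes = [n for _, n in entries]

    def node_for(self, key: str) -> str:
        # First ring position at or after the key; wrap past the end to index 0
        i = bisect.bisect_right(self._positions, ring_pos(key))
        return self._nodes[i % len(self._nodes)]

ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.node_for("user:42"))
```

The defining property: adding a node can only reassign keys to that new node; every other key keeps its old assignment.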

```mermaid
graph TD
    subgraph "Hash Ring"
        direction TB
        A["Node A<br/>position: 0°"]
        B["Node B<br/>position: 90°"]
        C["Node C<br/>position: 180°"]
        D["Node D<br/>position: 270°"]
    end
    K1["Key 1 (45°)"] -.-> B
    K2["Key 2 (120°)"] -.-> C
    K3["Key 3 (200°)"] -.-> D
    K4["Key 4 (350°)"] -.-> A
    style A fill:#222221,stroke:#c8a882,color:#ede9e3
    style B fill:#222221,stroke:#6b8f71,color:#ede9e3
    style C fill:#222221,stroke:#c8a882,color:#ede9e3
    style D fill:#222221,stroke:#6b8f71,color:#ede9e3
```

In this ring, Key 1 at position 45 degrees walks clockwise and hits Node B at 90 degrees. Key 4 at position 350 degrees wraps around and hits Node A at 0 degrees. Each key belongs to the next node clockwise.

Key Redistribution: The Numbers

The difference between modulo hashing and consistent hashing becomes dramatic when nodes change.

With modulo hashing, the problem gets worse as you scale. Going from 99 to 100 nodes moves 99% of keys. With consistent hashing, the same change moves only 1%. The gap widens at every scale.

Comparison Table

| Dimension | Modulo Hashing | Consistent Hashing |
| --- | --- | --- |
| Key assignment | hash(key) % N | Next clockwise node on hash ring |
| Keys moved on add/remove | ~N/(N+1) of all keys | ~K/N keys (theoretical minimum) |
| Load distribution | Uniform (when N is stable) | Uneven without virtual nodes |
| Implementation complexity | Trivial | Moderate (ring, sorted map) |
| Node heterogeneity | Not supported | Supported via virtual nodes |
| Cache impact on resize | Near-total cache miss storm | Minimal disruption |
| Used by | Simple hash tables | DynamoDB, Cassandra, Memcached, CDNs |

The Problem with Basic Consistent Hashing

With only one position per node on the ring, load distribution is uneven. If Node A sits at 0 degrees and Node B sits at 10 degrees, Node B owns a tiny 10-degree arc while Node A owns a massive 350-degree arc. Node A gets roughly 35x more traffic than Node B. Random placement does not guarantee even spacing.

Removing a node makes this worse. The entire load of the removed node shifts to a single neighbor, potentially doubling that neighbor's load instantly.

Virtual Nodes

Virtual nodes (vnodes) solve load imbalance by giving each physical node multiple positions on the ring. Instead of hashing a node once, you hash it with different suffixes: hash("NodeA-1"), hash("NodeA-2"), hash("NodeA-3"), and so on. Each virtual node is a separate point on the ring, but they all map back to the same physical server.

Virtual nodes spread each physical node across many positions on the hash ring. With 100-200 virtual nodes per physical node, the standard deviation of load distribution drops to 5-10% of the mean.
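A rough way to see the effect is to build a ring with many replicas per node and measure how evenly keys land. The sketch below (node names, vnode count, and key count are all illustrative) reports the relative standard deviation of per-node key counts:

```python
# Sketch: each physical node gets many ring positions ("vnodes");
# measure how evenly 40,000 keys spread across 4 nodes.
import bisect
import hashlib
from collections import Counter
from statistics import mean, pstdev

def ring_pos(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

def build_ring(nodes, vnodes=150):
    # One ring entry per (node, replica) pair; all map back to the physical node
    entries = sorted((ring_pos(f"{n}-{i}"), n) for n in nodes for i in range(vnodes))
    return [p for p, _ in entries], [n for _, n in entries]

positions, owners = build_ring(["a", "b", "c", "d"], vnodes=150)

def node_for(key: str) -> str:
    i = bisect.bisect_right(positions, ring_pos(key))
    return owners[i % len(owners)]

counts = Counter(node_for(f"key-{i}") for i in range(40_000))
spread = pstdev(counts.values()) / mean(counts.values())
print(dict(counts), f"relative stddev ~ {spread:.1%}")
```

With 150 vnodes per node the imbalance shrinks roughly with the square root of the vnode count, landing in the single-digit-percent range described above.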

```mermaid
graph TD
    subgraph "Hash Ring with Virtual Nodes"
        direction TB
        A1["A-v1<br/>15°"]
        B1["B-v1<br/>45°"]
        C1["C-v1<br/>80°"]
        A2["A-v2<br/>130°"]
        B2["B-v2<br/>170°"]
        C2["C-v2<br/>210°"]
        A3["A-v3<br/>260°"]
        B3["B-v3<br/>300°"]
        C3["C-v3<br/>340°"]
    end
    PA["Physical Node A"] --- A1
    PA --- A2
    PA --- A3
    PB["Physical Node B"] --- B1
    PB --- B2
    PB --- B3
    PC["Physical Node C"] --- C1
    PC --- C2
    PC --- C3
    style PA fill:#222221,stroke:#c8a882,color:#ede9e3
    style PB fill:#222221,stroke:#6b8f71,color:#ede9e3
    style PC fill:#222221,stroke:#8a8478,color:#ede9e3
```

With 3 virtual nodes per physical node, each physical node owns 3 arcs on the ring instead of 1. The arcs are spread across the ring, so the load is more evenly distributed. When Node B is removed, its three arcs are absorbed by three different nodes, not just one. This prevents any single node from being overwhelmed.

Virtual Nodes Enable Heterogeneous Clusters

Not all servers are equal. A machine with 64 GB of RAM can handle more data than one with 16 GB. Virtual nodes let you assign more vnodes to more powerful machines. If Server A has 4x the capacity of Server B, give Server A 200 virtual nodes and Server B 50. Server A will own roughly 4x as much of the ring.

This is how DynamoDB and Cassandra handle mixed hardware in production clusters. The vnode count per node is a configuration parameter that can be adjusted without changing the hashing algorithm.
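Capacity weighting falls out of the same mechanism: just vary the replica count per node. A sketch with the 200/50 split from above (server names and key counts are hypothetical):

```python
# Sketch: vnode count proportional to capacity, so server-a
# (4x the capacity of server-b) owns ~4x as much of the ring.
import bisect
import hashlib
from collections import Counter

def ring_pos(s: str) -> int:
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

weights = {"server-a": 200, "server-b": 50}  # vnodes scaled to capacity
entries = sorted(
    (ring_pos(f"{node}-{i}"), node)
    for node, count in weights.items()
    for i in range(count)
)
positions = [p for p, _ in entries]
owners = [n for _, n in entries]

def node_for(key: str) -> str:
    i = bisect.bisect_right(positions, ring_pos(key))
    return owners[i % len(owners)]

counts = Counter(node_for(f"key-{i}") for i in range(50_000))
# The a:b key ratio tracks the 200:50 vnode ratio, i.e. close to 4:1
print(dict(counts), "ratio:", round(counts["server-a"] / counts["server-b"], 1))
```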

Systems Thinking Lens

Modulo hashing is a brittle system with tight coupling between node count and key assignment. Any change to one variable (node count) causes a cascade through the entire system (mass key migration). Consistent hashing decouples these variables. The hash ring acts as an abstraction layer that absorbs changes locally instead of propagating them globally.

Virtual nodes add a second layer of decoupling: physical capacity from ring position. This makes the system adaptive. You can scale heterogeneously, handle failures gracefully, and rebalance without coordination storms.

Assignment

You have 4 cache nodes (A, B, C, D) arranged on a consistent hash ring at positions 0, 90, 180, and 270 degrees. Each node has 1,000 keys assigned to it (4,000 keys total, uniformly distributed).

  1. Draw the hash ring with the 4 nodes and label the arcs each node owns.
  2. A 5th node E is added at position 135 degrees. Which node loses keys? How many keys move, approximately?
  3. Now calculate the same scenario with modulo hashing: 4,000 keys, going from hash(key) % 4 to hash(key) % 5. How many keys move?
  4. If Node C (at 180 degrees) fails and is removed, what happens to its keys? Which node absorbs them? Why is this a problem, and how do virtual nodes fix it?