
Two Tools, Different Philosophies

Redis and Memcached are both in-memory key-value stores. Both deliver sub-millisecond latency. Both are open-source and battle-tested at massive scale. But they were built with fundamentally different design goals, and those differences matter when you are choosing one for a real system.

Memcached was built to do one thing well: cache simple key-value pairs with maximum throughput. Redis was built as a data structure server that happens to be excellent at caching. That difference in ambition shapes everything from data model to persistence to operational complexity.

Redis: The Data Structure Server

Redis stores more than strings. Its native data types include strings, hashes, lists, sets, sorted sets, bitmaps, hyperloglogs, and streams. Each type comes with atomic operations. You can increment a counter, push to a list, add to a sorted set by score, or compute set intersections, all in a single command without client-side logic.

This matters because it reduces round trips and eliminates a class of race conditions. Instead of fetching a value, modifying it in your application, and writing it back (a read-modify-write cycle that can interleave with other clients), you issue a single ZINCRBY or HINCRBY command. The operation is atomic on the server. No locks needed.
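To make the race concrete, here is a minimal in-process sketch (not a Redis client): one lock plays the role of the server executing commands one at a time, which is what makes HINCRBY atomic on a real Redis server. The class and names are illustrative.

```python
import threading

class TinyHashStore:
    """In-process stand-in for a Redis hash. The single lock mimics the
    server executing commands serially."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def hincrby(self, key, field, amount):
        # The whole read-modify-write happens under the lock, so it is
        # atomic from the callers' point of view, like HINCRBY.
        with self._lock:
            bucket = self._data.setdefault(key, {})
            bucket[field] = bucket.get(field, 0) + amount
            return bucket[field]

store = TinyHashStore()

def worker():
    for _ in range(10_000):
        store.hincrby("page:home", "views", 1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()

print(store.hincrby("page:home", "views", 0))  # 40000: no lost updates
```

Doing the same increment as separate GET and SET calls from four concurrent clients would lose updates whenever two clients read the same old value.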

Redis also offers persistence through two mechanisms. RDB (Redis Database) takes point-in-time snapshots at configurable intervals. AOF (Append Only File) logs every write operation and can replay them on restart. You can use both together: RDB for fast recovery, AOF for minimal data loss. With AOF set to fsync every second, you lose at most one second of writes on a crash.
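A toy illustration of the append-only-file idea, stripped down to a SET-only log with none of the real AOF's fsync policies or log rewriting (all names here are illustrative):

```python
import os, tempfile

class AppendOnlyStore:
    """Toy AOF: every SET is appended to a log file; a fresh instance
    rebuilds its state by replaying the log. Values must not contain
    tabs or newlines in this simplified format."""

    def __init__(self, aof_path):
        self.aof_path = aof_path
        self.data = {}
        if os.path.exists(aof_path):            # replay on startup
            with open(aof_path) as f:
                for line in f:
                    key, _, value = line.rstrip("\n").partition("\t")
                    self.data[key] = value      # later entries win

    def set(self, key, value):
        self.data[key] = value
        with open(self.aof_path, "a") as f:     # append-only write log
            f.write(f"{key}\t{value}\n")

path = os.path.join(tempfile.mkdtemp(), "toy.aof")
store = AppendOnlyStore(path)
store.set("user:1", "alice")
store.set("user:1", "alicia")       # overwrite; replay preserves order

recovered = AppendOnlyStore(path)   # simulate a crash and restart
print(recovered.data["user:1"])     # alicia
```

The durability window the text describes comes from when the log is flushed to disk: fsync on every write is safest and slowest, fsync every second bounds loss to about one second.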

Additional features include pub/sub messaging, Lua scripting for server-side logic, transactions with MULTI/EXEC, and Redis Streams for event log workloads similar to a lightweight Kafka.
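The pub/sub model can be sketched in-process. Like real Redis pub/sub, delivery here is fire-and-forget: a message published while nobody is subscribed is simply dropped (the class and method names are illustrative, not a Redis API):

```python
from collections import defaultdict
from queue import Queue

class TinyPubSub:
    """Minimal in-process pub/sub with fire-and-forget semantics."""

    def __init__(self):
        self._channels = defaultdict(list)

    def subscribe(self, channel):
        q = Queue()                       # each subscriber gets its own inbox
        self._channels[channel].append(q)
        return q

    def publish(self, channel, message):
        subscribers = self._channels.get(channel, [])
        for q in subscribers:
            q.put(message)
        return len(subscribers)           # like Redis, report receiver count

bus = TinyPubSub()
bus.publish("orders", "lost")             # no subscribers yet: dropped
inbox = bus.subscribe("orders")
delivered = bus.publish("orders", "order:42 created")
print(delivered, inbox.get_nowait())      # 1 order:42 created
```

This is exactly why the text contrasts pub/sub with Streams: Streams persist messages so a consumer that connects late can still read them.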

Memcached: The Pure Cache

Memcached stores strings. That is it. Every value is a blob of bytes up to 1MB. There are no data structures, no persistence, no scripting, no pub/sub. When a Memcached server restarts, all data is gone.

This simplicity is a feature. Memcached is multi-threaded by default, spreading concurrent reads and writes across all available CPU cores. Redis, historically single-threaded for command execution (though Redis 6+ introduced I/O threading for network operations), cannot match Memcached's single-node throughput when the workload is many concurrent clients issuing simple GET/SET operations across multiple cores.

Memcached's memory management is also predictable. Its slab allocator pre-divides memory into fixed-size chunks, which avoids external fragmentation at the cost of some wasted space inside each chunk. Redis uses jemalloc and can fragment under certain workloads (many small keys created and deleted), though in practice this is manageable with activedefrag.
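A rough sketch of the slab sizing logic, assuming a 96-byte smallest class and the default 1.25 growth factor (real Memcached derives its classes from the -n and -f options, so exact sizes vary by configuration):

```python
def slab_classes(min_chunk=96, growth=1.25, page_size=1_048_576):
    """Chunk sizes grow by a fixed factor until one chunk fills a 1 MB
    page. Defaults here are illustrative approximations."""
    sizes, size = [], min_chunk
    while size < page_size:
        sizes.append(size)
        size = int(size * growth)
    sizes.append(page_size)
    return sizes

def slab_for(value_size, sizes):
    """An item goes into the smallest class whose chunk fits it. The gap
    between item size and chunk size is deliberate internal waste that
    buys freedom from external fragmentation."""
    for chunk in sizes:
        if value_size <= chunk:
            return chunk
    raise ValueError("item larger than max slab chunk")

sizes = slab_classes()
print(slab_for(100, sizes))  # 120: a 100-byte item lands in the 120-byte class
```

This is also why the 1 MB default item limit exists: an item must fit inside a single slab page.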

Feature Comparison

| Dimension | Redis | Memcached |
|---|---|---|
| Data types | Strings, hashes, lists, sets, sorted sets, streams, bitmaps, HyperLogLog | Strings only |
| Max value size | 512 MB | 1 MB (default) |
| Threading | Single-threaded command execution (I/O threads since v6) | Multi-threaded |
| Persistence | RDB snapshots + AOF log | None |
| Eviction policies | 8 policies (LRU, LFU, random, TTL-based, noeviction) | LRU only |
| Replication | Built-in leader-follower | None (client-side consistent hashing) |
| Pub/Sub | Yes | No |
| Scripting | Lua scripts, Redis Functions | No |
| Cluster mode | Redis Cluster (automatic sharding) | Client-side sharding only |
| Memory efficiency | Variable (depends on data structures used) | Predictable (slab allocator) |
| Typical use cases | Session store, leaderboard, rate limiter, message broker, cache | Simple object caching, HTML fragment caching |

Performance: Throughput and Latency

Benchmarks are always context-dependent, but industry testing gives us useful baselines. On a standard Linux server, Redis handles roughly 1.2 to 1.5 million operations per second for simple GET/SET workloads using pipelining. Memcached, with its multi-threaded architecture, achieves 1.0 to 1.2 million operations per second on similar hardware, with an advantage in highly concurrent multi-core scenarios.
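The role pipelining plays in those throughput figures falls out of simple arithmetic. A back-of-the-envelope model (every number here is illustrative, not a benchmark result):

```python
def max_ops_per_sec(rtt_ms, batch=1, server_ops_per_sec=1_500_000):
    """Upper bound for a single client connection: each round trip costs
    rtt_ms and carries `batch` pipelined commands, capped by what the
    server itself can execute."""
    ops = batch * 1000.0 / rtt_ms          # round trips per second * batch
    return min(ops, server_ops_per_sec)

# Without pipelining, a 0.1 ms network round trip caps one connection:
print(round(max_ops_per_sec(0.1, batch=1)))     # 10000
# Batching 100 commands per round trip amortizes the same RTT:
print(round(max_ops_per_sec(0.1, batch=100)))   # 1000000
```

The same logic explains why unpipelined single-connection benchmarks measure the network more than the server.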

For latency, both deliver sub-millisecond response times. Redis typically achieves around 0.1 to 0.15ms for simple GETs. Memcached is in the same range, around 0.15 to 0.25ms. At the P99 level under heavy concurrent load, Memcached's multi-threading can show lower tail latencies because it does not serialize all commands through a single thread.

The practical takeaway: for simple caching of string values with high concurrency, Memcached may edge ahead. For anything requiring data structures, persistence, or atomic operations on complex types, Redis is the only option.

Eviction Policies

When memory is full, the cache must decide what to remove. Memcached uses LRU (Least Recently Used) and that is your only choice. Redis provides eight eviction policies:

| Policy | Behavior | When to use |
|---|---|---|
| allkeys-lru | Evict least recently used key from all keys | General-purpose caching (most common) |
| allkeys-lfu | Evict least frequently used key from all keys | When access frequency matters more than recency |
| volatile-lru | LRU among keys with TTL set | Mix of cache (with TTL) and persistent keys (no TTL) |
| volatile-lfu | LFU among keys with TTL set | Same as above, frequency-weighted |
| volatile-ttl | Evict keys with shortest remaining TTL first | When expiring-soon data is least valuable |
| allkeys-random | Random eviction from all keys | Uniform access patterns (rare) |
| volatile-random | Random eviction from keys with TTL | Rarely useful in practice |
| noeviction | Return error on writes when memory is full | When data loss is unacceptable (persistent data store) |

LRU assumes that recently accessed data will be accessed again. LFU assumes that frequently accessed data will be accessed again. For most web applications, allkeys-lru is the correct default. Switch to allkeys-lfu when you have a small set of permanently hot keys mixed with large amounts of infrequently scanned data.
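For intuition, here is textbook LRU in miniature. Note that real Redis approximates LRU by sampling a handful of keys per eviction rather than tracking exact recency order; this sketch is the exact version:

```python
from collections import OrderedDict

class LRUCache:
    """allkeys-lru in miniature: on overflow, evict the key that was
    touched longest ago."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()          # insertion order = recency order

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)         # a read counts as "recent"
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")                  # "a" is now more recent than "b"
cache.set("c", 3)               # over capacity: evicts "b"
print(cache.get("b"), cache.get("a"))  # None 1
```

An LFU variant would track an access counter per key instead of recency, which is why it protects permanently hot keys from being displaced by a one-off scan.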

Cache Stampede: The Thundering Herd

A cache stampede happens when a popular cache key expires and many concurrent requests simultaneously discover the cache miss. All of them hit the database at once, potentially overwhelming it. If the database slows down or crashes, the problem cascades: more requests time out, more retries stack up, and the system spirals.

```mermaid
sequenceDiagram
    participant R1 as Request 1
    participant R2 as Request 2
    participant RN as Request N
    participant Cache as Cache
    participant DB as Database
    Note over Cache: Popular key expires at T=0
    R1->>Cache: GET product:123
    Cache-->>R1: MISS
    R2->>Cache: GET product:123
    Cache-->>R2: MISS
    RN->>Cache: GET product:123
    Cache-->>RN: MISS
    R1->>DB: SELECT * FROM products WHERE id=123
    R2->>DB: SELECT * FROM products WHERE id=123
    RN->>DB: SELECT * FROM products WHERE id=123
    Note over DB: 50,000 identical queries hit simultaneously
    DB-->>R1: Result (slow, DB overloaded)
```

Three proven prevention techniques:

1. Distributed locking. When a cache miss occurs, the first request acquires a lock (using Redis SET key NX EX) and fetches from the database. All other requests wait for the lock to release, then read from the refreshed cache. Only one database query executes.

2. Probabilistic early expiration (XFetch). Instead of all keys expiring at exactly their TTL, each request computes a probability of refreshing the cache before expiration. As the TTL approaches zero, the probability increases. In practice, some request will refresh the key slightly before it expires, so the cache never actually goes empty.

3. Staggered TTL with jitter. Set TTLs as base_ttl + random(0, jitter_range). If 10,000 keys all have a 60-second TTL, they all expire at the same second. With jitter of 10 seconds, they expire spread across a 10-second window. This does not prevent stampede on a single key, but it prevents cache avalanche across many keys.
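Techniques 2 and 3 can be sketched together. The refresh test below follows the published XFetch formula (refresh when `now - delta * beta * log(rand())` reaches the expiry time, where delta is how long the last recompute took); the beta value and helper names are illustrative:

```python
import math
import random

def should_refresh(now, expiry, delta, beta=1.0, rng=random.random):
    """XFetch early-recompute test. The -log(rand) term is always
    positive, so a refresh becomes increasingly likely as expiry
    approaches and is certain once expiry has passed."""
    return now - delta * beta * math.log(rng()) >= expiry

def jittered_ttl(base_ttl, jitter_range, rng=random.random):
    """Technique 3: spread expirations across a window so correlated
    keys do not all expire in the same second."""
    return base_ttl + rng() * jitter_range

# At the expiry time a refresh always fires...
print(should_refresh(now=60.0, expiry=60.0, delta=2.0))  # True
# ...while a full minute early it almost never does (delta = 2 s recompute).
early = sum(should_refresh(0.0, 60.0, 2.0) for _ in range(10_000))
print(early < 100)  # True: only a vanishing fraction refresh that early
```

In combination: jitter prevents many keys from expiring together, and probabilistic early refresh keeps any single hot key from ever being observed as expired.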

```mermaid
sequenceDiagram
    participant R1 as Request 1
    participant R2 as Request 2
    participant RN as Request N
    participant Cache as Cache
    participant Lock as Lock (Redis)
    participant DB as Database
    Note over Cache: Popular key expires at T=0
    R1->>Cache: GET product:123
    Cache-->>R1: MISS
    R1->>Lock: SET lock:product:123 NX EX 5
    Lock-->>R1: OK (lock acquired)
    R2->>Cache: GET product:123
    Cache-->>R2: MISS
    R2->>Lock: SET lock:product:123 NX EX 5
    Lock-->>R2: FAIL (already locked)
    R2->>R2: Wait / return stale
    R1->>DB: SELECT * FROM products
    DB-->>R1: Result
    R1->>Cache: SET product:123 (fresh data)
    R1->>Lock: DEL lock:product:123
    RN->>Cache: GET product:123
    Cache-->>RN: HIT (fresh data)
```
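The locking flow can be simulated in-process. Here a threading.Lock stands in for the Redis `SET ... NX EX` lock, and a double-check after acquiring the lock keeps the database query count at exactly one (names, timings, and the wait strategy are all illustrative):

```python
import threading
import time

db_calls = 0
cache = {}
lock = threading.Lock()                  # stand-in for SET lock:key NX EX

def get_product(key, wait_timeout=1.0):
    """On a miss, the first request takes the lock and queries the
    'database'; everyone else waits and re-reads the refreshed cache."""
    global db_calls
    if key in cache:
        return cache[key]
    if lock.acquire(blocking=False):     # like SET ... NX: one winner
        try:
            if key in cache:             # double-check: maybe refreshed already
                return cache[key]
            db_calls += 1                # the single real database query
            time.sleep(0.05)             # simulate query latency
            cache[key] = f"row-for-{key}"
            return cache[key]
        finally:
            lock.release()               # like DEL lock:key
    # Losers: poll for the winner's result instead of hitting the DB.
    deadline = time.time() + wait_timeout
    while time.time() < deadline:
        if key in cache:
            return cache[key]
        time.sleep(0.01)
    return None                          # or serve stale data here

threads = [threading.Thread(target=get_product, args=("product:123",))
           for _ in range(50)]
for t in threads: t.start()
for t in threads: t.join()
print(db_calls)  # 1: fifty concurrent misses, one database query
```

A real distributed version needs the lock TTL (the `EX 5`) so a crashed winner cannot leave the key permanently locked, and the waiters need a fallback, typically serving stale data, for the case where the refresh takes longer than they are willing to wait.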

When to Choose Which

Choose Memcached when your workload is simple key-value caching with high concurrency, you do not need persistence, and you want predictable memory behavior. Memcached shines as a pure caching layer in front of a relational database where the application handles all data structure logic.

Choose Redis for everything else. If you need sorted sets for leaderboards, pub/sub for real-time features, streams for event logs, persistence for session stores, or Lua scripts for atomic multi-step operations, Redis is the answer. Its ecosystem is larger, its feature set is richer, and it can serve as both cache and lightweight data store.

In practice, Redis has become the default choice for most new systems. Memcached retains a strong position in legacy architectures and in environments where its multi-threaded performance on simple operations justifies the feature trade-off.

Assignment

It is midnight. The cache TTL for your most popular product page expires. Your e-commerce site has 50,000 active users. Within one second, 50,000 requests discover the cache is empty and all hit the database simultaneously.

  1. Describe what happens to the database, the application servers, and the user experience in this scenario. Be specific about the failure cascade.
  2. Design a fix using at least two of the three prevention techniques discussed (locking, probabilistic early expiry, staggered TTL). Explain how they work together.
  3. What if the database query for this product takes 2 seconds? How does that change your locking strategy? What do the 49,999 waiting requests do during those 2 seconds?