Module 7: Real-World Case Studies I

The Problem: Personalized Timelines at Scale

When a user opens Twitter (now X), they see a feed of tweets from people they follow, ranked and ordered. This looks simple. It is not. At peak, Twitter handled 300,000 timeline read requests per second while ingesting around 4,600 new tweets per second (12,000 at peak). The system must assemble a personalized timeline for each of those 300K requests in under 200 milliseconds.

The core design question is deceptively simple: when User A posts a tweet, how does it appear in User B's feed? There are exactly two strategies, and Twitter has tried both.

Strategy 1: Fan-Out on Read (Pull Model)

When User B opens their timeline, the system queries all accounts B follows, fetches their recent tweets, merges them, ranks them, and returns the result. Every timeline request triggers a fresh computation.

```mermaid
sequenceDiagram
    participant U as User B (Reader)
    participant API as API Server
    participant FS as Follow Service
    participant TS as Tweet Store
    participant R as Ranker
    U->>API: GET /timeline
    API->>FS: Get accounts B follows
    FS-->>API: [User A, User C, User D, ...]
    API->>TS: Fetch recent tweets from each
    TS-->>API: Tweets (unsorted)
    API->>R: Rank and merge
    R-->>API: Sorted timeline
    API-->>U: Timeline response
```

This approach is simple and storage-efficient. You store each tweet once. But the read path is expensive. If User B follows 500 accounts, that is 500 lookups plus a merge-sort on every single request. At 300K requests per second, this does not scale.
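The pull model can be sketched in a few lines. This is a minimal illustration, not Twitter's implementation: plain dicts stand in for the Follow Service and Tweet Store, and all names are hypothetical.

```python
import heapq

# Toy stand-ins for the Follow Service and Tweet Store.
FOLLOWS = {"B": ["A", "C", "D"]}
TWEETS = {
    "A": [(103, "A", "third"), (101, "A", "first")],  # (tweet_id, author, text), newest first
    "C": [(102, "C", "hello")],
    "D": [],
}

def timeline_fanout_on_read(user, limit=20):
    """Pull model: query every followed account at read time, then merge."""
    per_author = [TWEETS.get(a, []) for a in FOLLOWS.get(user, [])]
    # Each per-author list is already sorted newest-first by tweet ID,
    # so a k-way merge yields the combined timeline in one pass.
    merged = heapq.merge(*per_author, key=lambda t: -t[0])
    return list(merged)[:limit]

print(timeline_fanout_on_read("B"))
# [(103, 'A', 'third'), (102, 'C', 'hello'), (101, 'C'... merged newest-first
```

The cost structure is visible in the code: one lookup per followed account plus a merge, repeated on every single read.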

Strategy 2: Fan-Out on Write (Push Model)

When User A posts a tweet, the system immediately writes a copy of that tweet's ID into the timeline cache of every follower. User B's timeline is pre-computed and sitting in memory (Redis) before B even opens the app.

```mermaid
sequenceDiagram
    participant A as User A (Writer)
    participant API as API Server
    participant FS as Fan-out Service
    participant TL as Timeline Cache (Redis)
    participant B as User B (Reader)
    A->>API: POST /tweet
    API->>FS: Fan-out to all followers
    FS->>TL: Write tweet ref to Follower 1 timeline
    FS->>TL: Write tweet ref to Follower 2 timeline
    FS->>TL: Write tweet ref to Follower N timeline
    Note over FS,TL: N writes per tweet
    B->>API: GET /timeline
    API->>TL: Read pre-computed timeline
    TL-->>API: Sorted tweet refs
    API-->>B: Timeline response
```

This is what Twitter actually adopted. Each user's home timeline is stored as a Redis list of tweet IDs, replicated across 3 machines in a datacenter. The read path becomes a single Redis lookup. Fast. Predictable. Easy to cache.

The cost is write amplification. User A posts one tweet. If A has 5,000 followers, that single tweet generates 5,000 timeline writes. Twitter reported that this amplification inflated their 4,600 tweets/sec into 345,000 timeline writes per second.
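A sketch of the push model makes the amplification concrete. Again this is illustrative only: an in-memory dict stands in for the Redis timeline cache, and the length cap is an assumption (Twitter reportedly kept on the order of 800 entries per cached home timeline).

```python
from collections import defaultdict

FOLLOWERS = {"A": ["B", "E", "F"]}   # who follows whom
TIMELINES = defaultdict(list)        # stand-in for the Redis timeline cache

def post_tweet(author, tweet_id):
    """Push model: one tweet becomes one write per follower."""
    writes = 0
    for follower in FOLLOWERS.get(author, []):
        TIMELINES[follower].insert(0, tweet_id)          # newest first
        TIMELINES[follower] = TIMELINES[follower][:800]  # cap cached timeline length
        writes += 1
    return writes

print(post_tweet("A", 104))  # 3 followers -> 3 timeline writes
print(TIMELINES["B"])        # [104] — B's timeline is ready before B reads it

# The amplification Twitter reported: ~4,600 tweets/sec at an average of
# ~75 followers per tweet works out to roughly 345,000 timeline writes/sec.
```

In production these writes go to Redis lists (LPUSH plus LTRIM has the same shape as the insert-and-cap above), but the arithmetic is the point: write volume scales with follower count, not tweet count.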

The Celebrity Problem

Fan-out on write works beautifully until a celebrity posts. A user with 10 million followers generates 10 million timeline writes for a single tweet. At Twitter's scale, multiple celebrities tweet every second. The fan-out queue backs up. Timelines go stale. Users complain their feed is delayed.

This is not a theoretical problem. Twitter engineers confirmed that tweets from high-follower accounts could take more than 5 seconds to propagate to all followers. For breaking news from a journalist with millions of followers, that delay is unacceptable.

| Strategy | Read Cost | Write Cost | Latency | Storage | Best For |
|---|---|---|---|---|---|
| Fan-out on Read | High (N queries per request) | Low (1 write) | High read latency | Low | Users with many followers |
| Fan-out on Write | Low (1 cache read) | High (N writes per tweet) | Low read latency | High | Users with few followers |
| Hybrid | Medium | Medium | Low read latency | Medium | Production systems at scale |

Key insight: Fan-out on write trades storage for latency. Fan-out on read trades latency for storage. The celebrity problem forces you to do both.

The Hybrid Approach

Twitter's actual production system uses a hybrid. Regular users (say, under 10,000 followers) use fan-out on write. Their tweets are pushed to follower timelines immediately. High-follower accounts are flagged. Their tweets are not fanned out. Instead, when User B requests their timeline, the system reads B's pre-computed timeline from Redis, then fetches recent tweets from any high-follower accounts B follows, merges them in, and returns the result.
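The hybrid read path is a small merge. A minimal sketch, with dicts standing in for the Redis timeline cache and the celebrity tweet store; all names are hypothetical.

```python
import heapq

# B's pre-computed timeline (regular users' tweets, newest first)
# and a per-celebrity recent-tweet store queried at read time.
PRECOMPUTED = {"B": [(120, "C"), (115, "D")]}   # (tweet_id, author)
CELEB_TWEETS = {"A": [(130, "A"), (110, "A")]}
CELEBS_FOLLOWED = {"B": ["A"]}

def hybrid_timeline(user, limit=20):
    """Read path: cached timeline plus a live pull from celebrity accounts."""
    sources = [PRECOMPUTED.get(user, [])]
    for celeb in CELEBS_FOLLOWED.get(user, []):
        sources.append(CELEB_TWEETS.get(celeb, []))
    # Every source is newest-first by tweet ID; the merge preserves that order.
    return list(heapq.merge(*sources, key=lambda t: -t[0]))[:limit]

print(hybrid_timeline("B"))
# [(130, 'A'), (120, 'C'), (115, 'D'), (110, 'A')]
```

The extra read-time work is proportional to the number of celebrities B follows, which is typically small, so the read path stays cheap while celebrity tweets skip the fan-out entirely.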

```mermaid
flowchart TD
    A[User A posts tweet] --> B{Followers > threshold?}
    B -->|No: Regular user| C[Fan-out on Write]
    C --> D[Write tweet ref to each follower timeline in Redis]
    B -->|Yes: Celebrity| E[Store tweet only]
    E --> F[No fan-out]
    G[User B requests timeline] --> H[Read pre-computed timeline from Redis]
    H --> I[Fetch recent tweets from celebrities B follows]
    I --> J[Merge and rank]
    J --> K[Return timeline to User B]
```

The threshold is not fixed. Twitter uses a dynamic cutoff based on follower count and tweet frequency. An account with 5 million followers who tweets once a day is treated differently than one with 500,000 followers who tweets 50 times a day. The fan-out cost depends on both.
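One way to express that dynamic cutoff: budget the daily write cost rather than the raw follower count. The budget figure below is purely hypothetical, chosen to separate the two example accounts.

```python
def fanout_cost_per_day(followers, tweets_per_day):
    """Daily timeline writes this account generates under pure push."""
    return followers * tweets_per_day

def should_fan_out(followers, tweets_per_day, budget=10_000_000):
    """Push only while the daily write cost stays under a (hypothetical) budget."""
    return fanout_cost_per_day(followers, tweets_per_day) <= budget

# 5M followers tweeting once a day costs 5M writes/day;
# 500K followers tweeting 50 times a day costs 25M writes/day.
print(should_fan_out(5_000_000, 1))   # True  — cheaper account, keep pushing
print(should_fan_out(500_000, 50))    # False — fan out on read instead
```

Under a cost-based cutoff, the "smaller" account with 500K followers is the one flagged as a celebrity, exactly the inversion the follower-count-only rule would miss.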

Cursor-Based Pagination

Timelines are paginated. The naive approach is offset-based: "give me tweets 20-40." This breaks when new tweets arrive between page requests: the tweet at offset 20 has shifted to offset 21 by the time the user scrolls, so items get duplicated or skipped.

Cursor-based pagination solves this. Instead of "give me page 2," the client says "give me 20 tweets older than tweet ID 1892347." The cursor is a pointer to the last item seen. New tweets arriving at the top of the timeline do not shift the cursor position. The user sees a consistent stream regardless of how fast new content arrives.
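A minimal sketch of a cursor over a newest-first ID list (the function and variable names are hypothetical):

```python
TIMELINE = [130, 120, 115, 110, 104, 101]   # newest-first tweet IDs

def page(cursor=None, limit=3):
    """Fetch up to `limit` tweets strictly older than `cursor`."""
    if cursor is None:
        start = 0
    else:
        # IDs descend, so find the first ID smaller than the cursor.
        start = next((i for i, t in enumerate(TIMELINE) if t < cursor),
                     len(TIMELINE))
    items = TIMELINE[start:start + limit]
    next_cursor = items[-1] if items else None
    return items, next_cursor

items, cur = page()           # ([130, 120, 115], 115)
items2, _ = page(cursor=cur)  # ([110, 104, 101], 101)
# Prepending a new tweet at the top shifts nothing: the cursor still
# points at tweet 115, so page 2 is identical either way.
```

Compare the offset version: after a new tweet arrives, "tweets 3-6" would re-serve tweet 115. The cursor version cannot, because "older than 115" is stable no matter what lands above it.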

Storage Architecture

The timeline lives in Redis for fast reads, but the tweets themselves are stored in a persistent datastore. Twitter uses a MySQL cluster (sharded by user ID) for tweet storage. The Redis timeline contains only tweet IDs, not full tweet objects. When the client requests a timeline, the system reads the list of IDs from Redis, then hydrates them by fetching the full tweet data from the persistent store (with another cache layer in front).
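The hydration step looks roughly like this. A toy sketch: dicts stand in for Redis, the tweet cache, and the MySQL-backed tweet store, and all names are invented for illustration.

```python
# Stand-ins: the timeline cache holds ordered IDs; the store holds full rows.
TIMELINE_IDS = {"B": [130, 120, 115]}
TWEET_STORE = {130: {"author": "A", "text": "breaking"},
               120: {"author": "C", "text": "hello"},
               115: {"author": "D", "text": "lunch"}}
TWEET_CACHE = {}

def hydrate_timeline(user):
    """Read IDs from the timeline cache, then fill in full tweet objects."""
    tweets = []
    for tid in TIMELINE_IDS.get(user, []):
        tweet = TWEET_CACHE.get(tid)
        if tweet is None:           # cache miss: fall through to the store
            tweet = TWEET_STORE[tid]
            TWEET_CACHE[tid] = tweet
        tweets.append({"id": tid, **tweet})
    return tweets

print([t["id"] for t in hydrate_timeline("B")])  # [130, 120, 115]
```

Note that ordering comes entirely from the ID list; the store is consulted only for content, which is what makes the timelines rebuildable after a cache loss.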

This separation matters. Redis holds the ordering and personalization. The persistent store holds the source of truth. If a Redis node fails, timelines can be recomputed from the tweet store. Slow, but recoverable.

Assignment

A user with 10 million followers posts a tweet. If you used pure fan-out on write, that single tweet would generate 10 million timeline writes.

  1. Calculate how long this fan-out would take if each write takes 1ms and you have 100 fan-out workers running in parallel.
  2. Design a hybrid system. Define your threshold for "celebrity" accounts. What data do you use to set this threshold?
  3. Draw the sequence diagram for a timeline request where User B follows 3 regular users and 2 celebrities. Show every system interaction.
  4. Your Redis cluster holding timelines loses a node. 5% of users have lost their pre-computed timelines. How do you recover? What is the user experience during recovery?