
From Numbers to Architecture

You have your requirements from Session 6.1. You have your scale estimates from Session 6.2. Now you translate both into a diagram. The high-level design (HLD) is the skeleton of your system. Every box represents a component. Every arrow represents data flow. Every component exists because a requirement or a scale constraint demands it.

The HLD is not a wish list of technologies. It is a structural argument. Every box in your diagram should answer one question: what problem does it solve that the adjacent boxes cannot solve on their own? If you cannot answer that question, the box does not belong.

The Standard Components

Most large-scale systems share a common set of building blocks. You do not include all of them in every design. You include the ones your requirements demand.

| Component | Purpose | Include When |
|---|---|---|
| Client (Mobile/Web) | User interface, input/output | Always. Every system has users. |
| CDN | Serve static assets close to users | Media-heavy systems, global user base |
| Load Balancer | Distribute traffic across servers | More than one application server (almost always) |
| API Gateway | Rate limiting, auth, routing, protocol translation | Microservices architecture, public APIs |
| Application Servers | Business logic | Always. This is your service tier. |
| Cache | Reduce database load, improve latency | Read-heavy systems, hot data patterns |
| Database | Persistent storage | Always. Data must survive restarts. |
| Message Queue | Decouple producers from consumers, handle spikes | Async workflows, write spikes, cross-service communication |
| Object Storage | Store large files (images, video, backups) | Media uploads, file sharing, backups |
| Notification Service | Push notifications, email, SMS | Systems that need to alert offline users |

High-Level Design: Chat Application

Building on the WhatsApp example from Sessions 6.1 and 6.2, here is the HLD for a one-to-one chat system handling 500M DAU.

flowchart TB
    subgraph Clients
        M["Mobile App"]
        W["Web App"]
    end
    subgraph Edge
        LB["Load Balancer"]
    end
    subgraph Services
        CS["Chat Service<br/>(WebSocket)"]
        PS["Presence Service"]
        NS["Notification Service"]
        US["User Service"]
    end
    subgraph Storage
        MQ["Message Queue<br/>(Kafka)"]
        DB["Message Store<br/>(Cassandra)"]
        Cache["Session Cache<br/>(Redis)"]
        UDB["User DB<br/>(PostgreSQL)"]
    end
    M & W -->|"WebSocket"| LB
    LB --> CS
    CS -->|"Publish message"| MQ
    MQ -->|"Persist"| DB
    CS -->|"Check online?"| PS
    PS --> Cache
    CS -->|"User offline"| NS
    CS -->|"Auth/profile"| US
    US --> UDB
    style M fill:#222221,stroke:#c8a882,color:#ede9e3
    style W fill:#222221,stroke:#c8a882,color:#ede9e3
    style LB fill:#222221,stroke:#6b8f71,color:#ede9e3
    style CS fill:#191918,stroke:#c8a882,color:#ede9e3
    style PS fill:#191918,stroke:#6b8f71,color:#ede9e3
    style NS fill:#191918,stroke:#c47a5a,color:#ede9e3
    style US fill:#191918,stroke:#8a8478,color:#ede9e3
    style MQ fill:#222221,stroke:#c47a5a,color:#ede9e3
    style DB fill:#222221,stroke:#c8a882,color:#ede9e3
    style Cache fill:#222221,stroke:#6b8f71,color:#ede9e3
    style UDB fill:#222221,stroke:#8a8478,color:#ede9e3

Justifying Each Component

Every box in the diagram above has a reason.

WebSocket connections through Load Balancer. Chat requires real-time bidirectional communication. HTTP polling would generate 500M+ unnecessary requests per minute. WebSockets maintain a persistent connection. The load balancer distributes these connections across Chat Service instances.
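The gap between polling and persistent connections is easy to confirm with back-of-envelope arithmetic. A sketch, assuming each user polls once per minute (an illustrative rate, not a measured one):

```python
# Back-of-envelope: request volume if 500M DAU polled over HTTP.
# The once-per-minute poll rate is an illustrative assumption.
dau = 500_000_000
polls_per_user_per_minute = 1

requests_per_minute = dau * polls_per_user_per_minute   # 500M requests/min
requests_per_second = requests_per_minute / 60          # ~8.3M requests/s
```

Even at this modest poll rate, almost all of those requests return "no new messages," which is exactly the waste a persistent WebSocket avoids.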

Chat Service. Handles the core message flow: receive message from sender, determine if recipient is online, deliver or queue for later delivery. This is the heart of the system.

Message Queue (Kafka). At 231K writes/second, writing directly to the database from the Chat Service would create a bottleneck. The queue absorbs write spikes and decouples message ingestion from persistence. If the database is slow for a few seconds, messages queue up instead of being dropped.
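The decoupling the queue provides can be sketched in-process. Here a bounded `queue.Queue` stands in for Kafka and a deliberately slow worker stands in for the database writer; the names and sizes are illustrative assumptions:

```python
# Minimal sketch of queue-based decoupling: ingestion returns fast even
# when persistence is slow. queue.Queue stands in for Kafka here.
import queue
import threading
import time

message_queue = queue.Queue(maxsize=10_000)  # bounded buffer absorbs spikes
persisted = []

def chat_service_send(message):
    """Ingestion path: enqueue and return immediately."""
    message_queue.put(message)  # blocks only if the buffer is full

def persistence_worker():
    """Consumer path: drain the queue into the (slow) message store."""
    while True:
        msg = message_queue.get()
        if msg is None:  # sentinel: shut down
            break
        time.sleep(0.001)  # simulate a slow database write
        persisted.append(msg)
        message_queue.task_done()

worker = threading.Thread(target=persistence_worker, daemon=True)
worker.start()

# A burst of writes returns quickly even though persistence lags behind.
for i in range(100):
    chat_service_send({"msg_id": i, "content": "hello"})

message_queue.join()   # wait until the backlog is drained
message_queue.put(None)
```

The key property is that the sender's latency depends on the queue, not on the database: the backlog grows during a spike and drains afterward.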

Message Store (Cassandra). At nearly 2 PB/year with 3x replication, you need a database designed for high write throughput and horizontal scaling. Cassandra handles this well. We chose it over PostgreSQL for messages because relational features (joins, transactions) are not needed for message storage.

Session Cache (Redis). The Presence Service needs to know instantly whether a user is online and which Chat Service instance holds their WebSocket connection. Redis provides sub-millisecond lookups for this mapping.
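The presence mapping itself is small: user ID to server instance, expiring if heartbeats stop. A sketch with a plain dict standing in for Redis (in Redis this would be a key written with a TTL, e.g. `SET ... EX 30`); the TTL value and names are illustrative assumptions:

```python
# Sketch of the presence lookup. A dict stands in for Redis; in production
# the TTL would be enforced by Redis key expiry rather than checked here.
import time

PRESENCE_TTL_SECONDS = 30  # heartbeats refresh this; silence means offline
_presence = {}  # user_id -> (chat_server_instance, last_heartbeat)

def heartbeat(user_id, chat_server):
    """Called when a user connects or pings over the WebSocket."""
    _presence[user_id] = (chat_server, time.time())

def lookup(user_id):
    """Return the Chat Service instance holding the connection, or None."""
    entry = _presence.get(user_id)
    if entry is None:
        return None
    server, last_seen = entry
    if time.time() - last_seen > PRESENCE_TTL_SECONDS:
        del _presence[user_id]  # stale entry: treat as offline
        return None
    return server

heartbeat("user_id_456", "chat-server-7")
```

A `lookup` hit tells the Chat Service which instance to route the message to; a miss routes the message to the Notification Service instead.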

Notification Service. When the recipient is offline, the message must still be delivered eventually. The Notification Service handles push notifications (APNs, FCM) to wake the recipient's device.

User Service and PostgreSQL. User profiles, authentication, and contact lists are relational data with low write volume. PostgreSQL is the right fit: ACID transactions for account operations, and the scale is manageable (user profile reads are cacheable).
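The ACID property matters when one account operation touches several rows. A sketch of the all-or-nothing behavior, using the stdlib `sqlite3` module in place of PostgreSQL; table and column names are illustrative assumptions:

```python
# Sketch of an atomic account operation: create a user and a contact entry
# together, or not at all. sqlite3 stands in for PostgreSQL here.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE contacts (
    owner_id TEXT REFERENCES users(id),
    contact_id TEXT REFERENCES users(id))""")

def create_user_with_contact(user_id, name, contact_id):
    """Either both rows are written or neither."""
    try:
        with conn:  # transaction: commits on success, rolls back on error
            conn.execute("INSERT INTO users VALUES (?, ?)", (user_id, name))
            conn.execute("INSERT INTO contacts VALUES (?, ?)",
                         (user_id, contact_id))
    except sqlite3.IntegrityError:
        pass  # duplicate id: the whole operation was rolled back

create_user_with_contact("user_id_123", "Alice", "user_id_456")
create_user_with_contact("user_id_123", "Alice again", "user_id_789")  # rejected

user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The second call fails on the duplicate primary key, and the transaction guarantees the contact row from that call never appears either.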

Defining API Contracts

After the HLD diagram, define the key APIs. You do not need every endpoint, just the ones that serve the core flow.

Send message (WebSocket frame):

{
  "action": "send_message",
  "to": "user_id_456",
  "content": "Hello",
  "timestamp": 1711929600,
  "client_msg_id": "uuid-abc-123"
}

Receive message (WebSocket frame):

{
  "action": "new_message",
  "from": "user_id_123",
  "content": "Hello",
  "timestamp": 1711929600,
  "msg_id": "server-generated-id",
  "client_msg_id": "uuid-abc-123"
}

Fetch message history (REST):

GET /api/v1/conversations/{conversation_id}/messages?before={msg_id}&limit=50
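The `before` parameter is a cursor: the client pages backward through history by passing the oldest message ID it has seen. A sketch of the server-side logic over an in-memory list (handler and field names are illustrative assumptions, not a real framework):

```python
# Sketch of cursor-based pagination for message history. An in-memory list
# stands in for Cassandra; msg_ids are sequential for illustration.
messages = [{"msg_id": i, "content": f"msg {i}"} for i in range(1, 201)]
# stored in ascending msg_id order; the newest message has the highest id

def fetch_messages(before=None, limit=50):
    """Return up to `limit` messages older than `before`, newest first."""
    if before is None:
        window = messages
    else:
        window = [m for m in messages if m["msg_id"] < before]
    page = window[-limit:]          # the `limit` newest messages in range
    return list(reversed(page))     # newest first, like a chat scrollback

first_page = fetch_messages(limit=50)  # msg_id 200 down to 151
second_page = fetch_messages(before=first_page[-1]["msg_id"], limit=50)
```

Cursor pagination avoids the drift of offset-based paging: new messages arriving between requests cannot shift the window, because each page is anchored to a concrete message ID.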

Notice the client_msg_id in the send message payload. This is an idempotency key. If the client sends the same message twice (due to a network retry), the server can deduplicate using this ID. This small detail shows the interviewer you think about real-world reliability.
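The deduplication logic can be sketched as a seen-set keyed on sender and `client_msg_id`. This in-memory version is an illustration only; a real system would bound or expire the set (for example, a Redis key with a TTL):

```python
# Sketch of server-side deduplication using the client_msg_id idempotency
# key. The unbounded in-memory set is a simplifying assumption.
delivered = []
_seen = set()  # (sender_id, client_msg_id) pairs already processed

def handle_send(sender_id, frame):
    """Process a send_message frame at most once per client_msg_id."""
    key = (sender_id, frame["client_msg_id"])
    if key in _seen:
        # Ack again so the client stops retrying, but deliver nothing.
        return {"status": "duplicate", "client_msg_id": key[1]}
    _seen.add(key)
    delivered.append(frame)
    return {"status": "delivered", "client_msg_id": key[1]}

frame = {"action": "send_message", "to": "user_id_456",
         "content": "Hello", "client_msg_id": "uuid-abc-123"}
first = handle_send("user_id_123", frame)
retry = handle_send("user_id_123", frame)  # network retry: same frame again
```

Note that the duplicate is still acknowledged; otherwise the client would keep retrying forever. Idempotency means the retry is safe, not rejected.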

HLD Presentation Tips

Draw top to bottom or left to right. Clients at the top (or left), storage at the bottom (or right). Data flows in one direction. This is the convention interviewers expect.

Label the arrows. An arrow without a label is ambiguous. Does data flow from service A to service B via HTTP, gRPC, or a message queue? Label it.

Start simple, then elaborate. Draw the minimal HLD first (client, server, database). Then add components one by one as you explain why each is needed. This builds a narrative. It is far more effective than presenting a complex diagram all at once and then trying to explain it.

Separate the read path from the write path. In many systems, reads and writes follow different paths through the architecture. Making this explicit shows depth. For the chat system: the write path goes Client to Chat Service to Kafka to Cassandra. The read path for message history goes Client to Chat Service to Cassandra (or Cache).

Assignment

Draw the high-level design for WhatsApp based on your requirements from Session 6.1 and your estimates from Session 6.2. Your diagram should include:

  1. All components from client to storage
  2. Labeled arrows showing data flow and protocols
  3. A one-sentence justification for each component
  4. At least two API contracts for the core flow (send message, receive message, or fetch history)

Compare your diagram with the one in this session. What did you include that we did not? What did you leave out? Both differences are worth examining.