Step 3: High-Level Design
Session 6.3 · ~5 min read
From Numbers to Architecture
You have your requirements from Session 6.1. You have your scale estimates from Session 6.2. Now you translate both into a diagram. The high-level design (HLD) is the skeleton of your system. Every box represents a component. Every arrow represents data flow. Every component exists because a requirement or a scale constraint demands it.
The HLD is not a wish list of technologies. It is a structural argument.
Every box in your diagram should answer: what problem does this solve that the adjacent box cannot? If you cannot answer that question, the box does not belong.
The Standard Components
Most large-scale systems share a common set of building blocks. You do not include all of them in every design. You include the ones your requirements demand.
| Component | Purpose | Include When |
|---|---|---|
| Client (Mobile/Web) | User interface, input/output | Always. Every system has users. |
| CDN | Serve static assets close to users | Media-heavy systems, global user base |
| Load Balancer | Distribute traffic across servers | More than one application server (almost always) |
| API Gateway | Rate limiting, auth, routing, protocol translation | Microservices architecture, public APIs |
| Application Servers | Business logic | Always. This is your service tier. |
| Cache | Reduce database load, improve latency | Read-heavy systems, hot data patterns |
| Database | Persistent storage | Always. Data must survive restarts. |
| Message Queue | Decouple producers from consumers, handle spikes | Async workflows, write spikes, cross-service communication |
| Object Storage | Store large files (images, video, backups) | Media uploads, file sharing, backups |
| Notification Service | Push notifications, email, SMS | Systems that need to alert offline users |
High-Level Design: Chat Application
Building on the WhatsApp example from Sessions 6.1 and 6.2, here is the HLD for a one-to-one chat system handling 500M DAU.
```mermaid
graph TB
    subgraph Clients
        M["Mobile Client"]
        W["Web Client"]
    end
    LB["Load Balancer"]
    subgraph Services
        CS["Chat Service<br/>(WebSocket)"]
        PS["Presence Service"]
        NS["Notification Service"]
        US["User Service"]
    end
    subgraph Storage
        MQ["Message Queue<br/>(Kafka)"]
        DB["Message Store<br/>(Cassandra)"]
        Cache["Session Cache<br/>(Redis)"]
        UDB["User DB<br/>(PostgreSQL)"]
    end

    M & W -->|"WebSocket"| LB
    LB --> CS
    CS -->|"Publish message"| MQ
    MQ -->|"Persist"| DB
    CS -->|"Check online?"| PS
    PS --> Cache
    CS -->|"User offline"| NS
    CS -->|"Auth/profile"| US
    US --> UDB

    style M fill:#222221,stroke:#c8a882,color:#ede9e3
    style W fill:#222221,stroke:#c8a882,color:#ede9e3
    style LB fill:#222221,stroke:#6b8f71,color:#ede9e3
    style CS fill:#191918,stroke:#c8a882,color:#ede9e3
    style PS fill:#191918,stroke:#6b8f71,color:#ede9e3
    style NS fill:#191918,stroke:#c47a5a,color:#ede9e3
    style US fill:#191918,stroke:#8a8478,color:#ede9e3
    style MQ fill:#222221,stroke:#c47a5a,color:#ede9e3
    style DB fill:#222221,stroke:#c8a882,color:#ede9e3
    style Cache fill:#222221,stroke:#6b8f71,color:#ede9e3
    style UDB fill:#222221,stroke:#8a8478,color:#ede9e3
```
Justifying Each Component
Every box in the diagram above has a reason.
WebSocket connections through Load Balancer. Chat requires real-time bidirectional communication. HTTP polling would generate 500M+ unnecessary requests per minute. WebSockets maintain a persistent connection. The load balancer distributes these connections across Chat Service instances.
Chat Service. Handles the core message flow: receive message from sender, determine if recipient is online, deliver or queue for later delivery. This is the heart of the system.
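This deliver-or-queue decision can be sketched in a few lines. The presence map, delivery list, and notification list are hypothetical stand-ins for the real Presence Service, WebSocket push, and Notification Service.

```python
# Minimal sketch of the Chat Service's core routing decision. The `presence`
# dict, `deliveries` list, and `notifications` list are stand-ins for the
# real Presence Service lookup, WebSocket delivery, and push notification.

def handle_message(msg: dict, presence: dict, deliveries: list, notifications: list) -> str:
    """Route a message: deliver live if the recipient is online, else notify."""
    recipient = msg["to"]
    if recipient in presence:
        # Recipient holds an open WebSocket: push via the server that owns it.
        deliveries.append((presence[recipient], msg))
        return "delivered"
    # Recipient is offline: hand off to the Notification Service.
    notifications.append(recipient)
    return "queued_for_notification"
```

In the real system both branches would also publish the message to Kafka for persistence; the sketch isolates only the online/offline fork.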
Message Queue (Kafka). At 231K writes/second, writing directly to the database from the Chat Service would create a bottleneck. The queue absorbs write spikes and decouples message ingestion from persistence. If the database is slow for a few seconds, messages queue up instead of being dropped.
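The decoupling can be illustrated with a toy in-memory queue: the producer absorbs a burst instantly, while the persister drains at its own pace. Kafka plays this role durably and at scale; `queue.Queue` only demonstrates the shape of the pattern.

```python
from queue import Queue

# Toy illustration of why a queue absorbs write spikes: ingestion enqueues a
# burst without waiting on the database, and persistence drains in batches at
# whatever rate the store can sustain. Kafka fills this role in the real design.

buffer: Queue = Queue()

def ingest(messages) -> None:
    """Chat Service side: accept a burst without blocking on the database."""
    for m in messages:
        buffer.put(m)  # O(1); never stalls on a slow store

def persist_batch(store: list, batch_size: int = 2) -> list:
    """Database side: drain up to batch_size queued messages into the store."""
    written = []
    while not buffer.empty() and len(written) < batch_size:
        written.append(buffer.get())
    store.extend(written)
    return written
```

If the store slows down, `persist_batch` simply runs less often and the backlog sits in the buffer instead of causing dropped writes.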
Message Store (Cassandra). At nearly 2 PB/year with 3x replication, you need a database designed for high write throughput and horizontal scaling. Cassandra handles this well. We chose it over PostgreSQL for messages because relational features (joins, transactions) are not needed for message storage.
Session Cache (Redis). The Presence Service needs to know instantly whether a user is online and which Chat Service instance holds their WebSocket connection. Redis provides sub-millisecond lookups for this mapping.
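The mapping the cache holds is small: user ID to the Chat Service instance owning that user's socket. A plain dict stands in for Redis in this sketch; in production these entries would carry a TTL so a crashed connection eventually expires instead of lingering as a stale "online" record.

```python
# Sketch of the presence mapping kept in the Session Cache. A dict stands in
# for Redis; production entries would use a TTL to expire stale connections.

presence: dict[str, str] = {}

def on_connect(user_id: str, chat_server: str) -> None:
    """Record which Chat Service instance holds this user's WebSocket."""
    presence[user_id] = chat_server

def on_disconnect(user_id: str) -> None:
    """Remove the mapping when the socket closes cleanly."""
    presence.pop(user_id, None)

def locate(user_id: str):
    """Return the instance that can deliver to this user, or None if offline."""
    return presence.get(user_id)
```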
Notification Service. When the recipient is offline, the message must still be delivered eventually. The Notification Service handles push notifications (APNs, FCM) to wake the recipient's device.
User Service and PostgreSQL. User profiles, authentication, and contact lists are relational data with low write volume. PostgreSQL is the right fit: ACID transactions for account operations, and the scale is manageable (user profile reads are cacheable).
Defining API Contracts
After the HLD diagram, define the key APIs. You do not need every endpoint, just the ones that serve the core flow.
Send message (WebSocket frame):
```json
{
  "action": "send_message",
  "to": "user_id_456",
  "content": "Hello",
  "timestamp": 1711929600,
  "client_msg_id": "uuid-abc-123"
}
```
Receive message (WebSocket frame):
```json
{
  "action": "new_message",
  "from": "user_id_123",
  "content": "Hello",
  "timestamp": 1711929600,
  "msg_id": "server-generated-id",
  "client_msg_id": "uuid-abc-123"
}
```
Fetch message history (REST):
```
GET /api/v1/conversations/{conversation_id}/messages?before={msg_id}&limit=50
```
Notice the client_msg_id in the send message payload. This is an idempotency key. If the client sends the same message twice (due to a network retry), the server can deduplicate using this ID. This small detail shows the interviewer you think about real-world reliability.
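Server-side deduplication on this key is only a membership check. A plain set stands in here for what would be a TTL'd cache entry in production; scoping the key to the sender keeps two users' coincidentally equal IDs from colliding.

```python
# Sketch of dedup keyed on (sender, client_msg_id). A set stands in for a
# TTL'd cache entry; in production old keys would expire after the retry window.

seen: set = set()

def accept_message(sender: str, payload: dict) -> bool:
    """Return True for a new message, False for a retried duplicate."""
    key = (sender, payload["client_msg_id"])  # scope the key to the sender
    if key in seen:
        return False  # duplicate from a network retry: ack, but don't re-store
    seen.add(key)
    return True
```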
HLD Presentation Tips
Draw top to bottom (or left to right). Clients at the top, storage at the bottom, with data flowing downward between them. This is the convention interviewers expect.
Label the arrows. An arrow without a label is ambiguous. Does data flow from service A to service B via HTTP, gRPC, or a message queue? Label it.
Start simple, then elaborate. Draw the minimal HLD first (client, server, database). Then add components one by one as you explain why each is needed. This builds a narrative. It is far more effective than presenting a complex diagram all at once and then trying to explain it.
Separate the read path from the write path. In many systems, reads and writes follow different paths through the architecture. Making this explicit shows depth. For the chat system: the write path goes Client to Chat Service to Kafka to Cassandra. The read path for message history goes Client to Chat Service to Cassandra (or Cache).
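The read path's `before` cursor from the history endpoint can be sketched as simple cursor pagination. This assumes messages carry monotonically increasing IDs; the real system would push this filtering into the Cassandra query rather than scanning in application code.

```python
# Sketch of `before`-cursor pagination behind the message history endpoint.
# Assumes msg_id values increase monotonically; a real implementation would
# express this as a range query against the message store, not a list scan.

def fetch_history(messages: list, before=None, limit: int = 50) -> list:
    """Return up to `limit` messages older than the cursor, newest first."""
    older = [m for m in messages if before is None or m["msg_id"] < before]
    older.sort(key=lambda m: m["msg_id"], reverse=True)
    return older[:limit]
```

The client pages backward by passing the smallest `msg_id` from the previous page as the next `before` cursor.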
Further Reading
- System Design Interview, Vol. 1 by Alex Xu, Chapter 12: Design a Chat System
- Facebook Engineering: Building Mobile-First Infrastructure for Messenger
- InfoQ: The WhatsApp Architecture Facebook Bought for $19 Billion
- Martin Fowler: Richardson Maturity Model (API design levels)
Assignment
Draw the high-level design for WhatsApp based on your requirements from Session 6.1 and your estimates from Session 6.2. Your diagram should include:
- All components from client to storage
- Labeled arrows showing data flow and protocols
- A one-sentence justification for each component
- At least two API contracts for the core flow (send message, receive message, or fetch history)
Compare your diagram with the one in this session. What did you include that we did not? What did you leave out? Both differences are worth examining.