Chat Application
Session 7.4 · ~5 min read
What Makes Chat Different
A chat application looks simple on the surface: User A types a message, User B receives it. But the engineering challenges are unique. Unlike a web application where the client initiates every interaction, a chat system must push messages to clients the moment they arrive. The server cannot wait for the client to ask. It must deliver proactively.
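WhatsApp handles approximately 100 billion messages per day across 2 billion users. It is also known for scaling with a very small engineering team: around the time of its 2014 acquisition, a few dozen engineers supported hundreds of millions of users. That efficiency comes from making sharp architectural choices and sticking with them.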
Key insight: A chat system is a delivery guarantee problem. The hard part is not sending the message. It is knowing whether the message arrived.
Connection Model: WebSockets
HTTP is request-response. The client asks, the server answers. For chat, you need a persistent, bidirectional channel. WebSockets provide exactly this. After an initial HTTP handshake, the connection is upgraded to the WebSocket protocol, a full-duplex channel over the same TCP connection. Both sides can send data at any time without waiting for the other.
WhatsApp's original architecture used persistent TCP connections managed by Erlang processes. Each connected device is served by its own process on a frontend server. No connection pooling. No multiplexing. One connection, one process. This sounds wasteful until you realize that Erlang processes are extremely lightweight (roughly 2 KB each) and the BEAM VM can handle millions of them on a single machine.
For a system design interview, WebSockets are the standard answer. The connection is established when the app opens and maintained as long as the user is active. A heartbeat mechanism detects dropped connections.
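The heartbeat check can be sketched as a small bookkeeping structure. This is a minimal illustration, not a production design: the `HeartbeatMonitor` name, the 30-second timeout, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat per connection and flags stale ones."""

    timeout_s: float = 30.0  # assumed value; tune per deployment
    last_seen: dict = field(default_factory=dict)  # conn_id -> last heartbeat time

    def beat(self, conn_id: str, now: float) -> None:
        # Called whenever a heartbeat arrives on the connection.
        self.last_seen[conn_id] = now

    def sweep(self, now: float) -> list:
        # Return connections silent for longer than the timeout and forget
        # them; the caller is expected to close the underlying sockets.
        dead = [c for c, t in self.last_seen.items() if now - t > self.timeout_s]
        for c in dead:
            del self.last_seen[c]
        return dead


monitor = HeartbeatMonitor(timeout_s=30.0)
monitor.beat("alice-phone", now=0.0)
monitor.beat("bob-phone", now=0.0)
monitor.beat("alice-phone", now=25.0)  # Alice keeps pinging, Bob goes silent
stale = monitor.sweep(now=40.0)        # only Bob has exceeded the timeout
```

A real server would run the sweep on a timer and would usually treat any inbound frame, not only explicit pings, as proof of life.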
High-Level Architecture
Each chat server maintains WebSocket connections to a set of users. When User A sends a message to User B, Chat Server 1 publishes the message to a message queue. Chat Server 2 (where User B is connected) consumes the message and pushes it down User B's WebSocket. If User B is offline, the message is stored and a push notification is sent instead.
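The routing step described above can be sketched with an in-memory stand-in. The `Router` class, the server IDs, and the `"stored+push"` return value are hypothetical names chosen for illustration; a real system would use a shared session registry and a durable message queue.

```python
from collections import defaultdict, deque


class Router:
    """In-memory stand-in for a session registry plus per-server queues."""

    def __init__(self) -> None:
        self.sessions = {}                # user_id -> chat server holding the socket
        self.queues = defaultdict(deque)  # server_id -> messages to push
        self.offline = []                 # (recipient, text) awaiting store + push

    def connect(self, user_id: str, server_id: str) -> None:
        self.sessions[user_id] = server_id

    def disconnect(self, user_id: str) -> None:
        self.sessions.pop(user_id, None)

    def route(self, sender: str, recipient: str, text: str) -> str:
        server = self.sessions.get(recipient)
        if server is None:
            # Recipient has no live connection: store and notify instead.
            self.offline.append((recipient, text))
            return "stored+push"
        # Recipient is online: enqueue for the server that owns the socket.
        self.queues[server].append((sender, recipient, text))
        return server


router = Router()
router.connect("A", "chat-1")
router.connect("B", "chat-2")
first = router.route("A", "B", "hi")             # B is online on chat-2
router.disconnect("B")
second = router.route("A", "B", "still there?")  # B is now offline
```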
Message Delivery: Online Path
When both users are online and connected to the system, message delivery is straightforward but involves multiple guarantee checks.
This flow is where the two checkmarks come from. The first (sent) confirms the server received and stored the message. The second (delivered) confirms the recipient's device received it. WhatsApp adds a third state: read, triggered when the recipient actually opens the conversation. Each state is a separate acknowledgment flowing back through the system.
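One way to model these states is as an ordered enum whose value only moves forward. This is a sketch under an assumption not stated in the text: acknowledgments may arrive late, duplicated, or out of order, and a later state implies the earlier ones. The `Status` and `apply_ack` names are illustrative.

```python
from enum import IntEnum


class Status(IntEnum):
    SENT = 1       # server received and stored the message (first checkmark)
    DELIVERED = 2  # recipient's device acknowledged it (second checkmark)
    READ = 3       # recipient opened the conversation


def apply_ack(current: Status, ack: Status) -> Status:
    # The status never moves backwards, so a stale or repeated ack is harmless.
    return max(current, ack)


status = Status.SENT
for ack in (Status.READ, Status.DELIVERED, Status.DELIVERED):
    status = apply_ack(status, ack)
```

Making the update idempotent means the sender's client can apply every acknowledgment it receives without tracking which ones it has already seen.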
Message Delivery: Offline Path
The harder case. User B is not connected. Their phone is off, out of range, or the app is backgrounded. The message must still arrive eventually. This is store-and-forward.
When the system looks up User B's connection and finds none, two things ensure eventual delivery. First, the message has already been persisted in the message store (Cassandra) and remains tagged as undelivered. Second, a push notification is sent through APNs (Apple) or FCM (Google) to wake the device and alert the user.
When User B comes back online and establishes a WebSocket connection, the chat server queries the message store for all undelivered messages for User B, sorted by timestamp, and pushes them down the connection. Once User B's device acknowledges receipt, the messages are marked as delivered.
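The reconnect path can be sketched as follows. `MessageStore` is an in-memory stand-in for the real message store, and `send` is a hypothetical callback that returns `True` once the device acknowledges a message; both are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StoredMessage:
    msg_id: str
    recipient: str
    sent_at: float
    text: str
    delivered: bool = False


class MessageStore:
    """In-memory stand-in for the persistent message store."""

    def __init__(self) -> None:
        self.rows = {}  # msg_id -> StoredMessage

    def save(self, msg: StoredMessage) -> None:
        self.rows[msg.msg_id] = msg

    def undelivered_for(self, user: str) -> list:
        pending = [m for m in self.rows.values()
                   if m.recipient == user and not m.delivered]
        return sorted(pending, key=lambda m: m.sent_at)

    def mark_delivered(self, msg_id: str) -> None:
        self.rows[msg_id].delivered = True


def on_reconnect(store: MessageStore, user: str, send) -> None:
    """Push the backlog in timestamp order; mark only acknowledged messages."""
    for msg in store.undelivered_for(user):
        if not send(msg):
            break  # connection dropped again; the rest waits for the next reconnect
        store.mark_delivered(msg.msg_id)


store = MessageStore()
store.save(StoredMessage("m2", "B", sent_at=20.0, text="second"))
store.save(StoredMessage("m1", "B", sent_at=10.0, text="first"))

received = []

def send(msg: StoredMessage) -> bool:
    received.append(msg.text)
    return True  # pretend the device acknowledged immediately

on_reconnect(store, "B", send)
```

Marking a message delivered only after the acknowledgment gives at-least-once delivery, so the client should deduplicate by message ID.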
Storage Decisions
| Data Type | Store | Why |
|---|---|---|
| Messages (text) | Cassandra | Write-heavy, time-ordered, partitioned by conversation ID. Cassandra's LSM-tree storage is optimized for sequential writes. |
| Media (images, video, voice) | Object storage (S3) | Large binary blobs. Store the file in S3, store the S3 URL in the message record. |
| User profiles | MySQL / PostgreSQL | Relational data with consistency needs. Read-heavy, low write volume. |
| Online/presence status | Redis | Ephemeral data. Needs fast reads and writes. TTL-based expiration handles disconnects automatically. |
| Group membership | MySQL / PostgreSQL | Relational. Groups have members, admins, settings. Consistency matters. |
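| Undelivered message queue | Cassandra (keyed by recipient) | Query: "all undelivered messages for user B, ordered by timestamp." Because the main message table is partitioned by conversation ID, serving this efficiently needs a table or index keyed by recipient rather than a filter on a delivered flag. |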
Cassandra is a common choice for message storage because of its write performance and natural time-series partitioning. Messages within a conversation are stored in a single partition, ordered by timestamp, so reading a conversation is a sequential scan of one partition. Note that this is the standard interview design rather than a description of WhatsApp's production stack, which WhatsApp engineers have publicly described as Erlang-based with Mnesia for storage.
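The partitioning idea can be illustrated without a database. The sketch below imitates one partition per conversation with rows kept sorted by timestamp; `ConversationLog` and its methods are hypothetical names, and this is a model of the access pattern, not Cassandra client code.

```python
import bisect
from collections import defaultdict


class ConversationLog:
    """One sorted list per conversation, imitating partition key + clustering order."""

    def __init__(self) -> None:
        # conversation_id -> rows kept sorted by (timestamp, msg_id, text)
        self.partitions = defaultdict(list)

    def write(self, conversation_id: str, ts: int, msg_id: str, text: str) -> None:
        bisect.insort(self.partitions[conversation_id], (ts, msg_id, text))

    def read_since(self, conversation_id: str, since_ts: int) -> list:
        rows = self.partitions[conversation_id]
        # (since_ts,) sorts before every row with the same timestamp,
        # so this finds the first row with ts >= since_ts.
        start = bisect.bisect_left(rows, (since_ts,))
        return rows[start:]


log = ConversationLog()
log.write("a:b", 30, "m3", "third")
log.write("a:b", 10, "m1", "first")
log.write("a:b", 20, "m2", "second")
log.write("a:c", 15, "m9", "other chat")
recent = log.read_since("a:b", 20)
```

Reading a conversation touches exactly one partition and returns rows already in time order, which is the property the storage choice relies on.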
Group Chat: The Fan-Out Problem Returns
One-to-one chat is point-to-point delivery. Group chat is one-to-many. When User A sends a message to a group of 100 members, the system must deliver 99 copies (everyone except the sender). This is the same fan-out problem from Session 7.2, but at message-level granularity.
The approach: the chat server publishes the message to a group topic in the message queue. Each member's chat server subscribes to that topic. When the message arrives, each server pushes it to the connected members. Offline members get store-and-forward treatment as before.
WhatsApp caps groups at 1,024 members. This is not an arbitrary limit. It is a fan-out constraint. A message to a 1,024-member group generates 1,023 delivery operations. At WhatsApp's message volume, larger groups would create unsustainable write amplification.
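The fan-out arithmetic can be made concrete with a small sketch. The `fan_out` function and the `sessions` mapping (user ID to chat server) are illustrative assumptions; the point is that every non-sender member produces exactly one delivery operation, either a live push or a store-and-forward entry.

```python
from collections import defaultdict


def fan_out(members, sender, sessions):
    """Group recipients by the chat server holding their connection.

    Returns (per_server, offline): one batch of recipients per server,
    plus the members that need store-and-forward.
    """
    per_server = defaultdict(list)
    offline = []
    for user in members:
        if user == sender:
            continue  # the sender does not receive their own message
        server = sessions.get(user)
        if server is None:
            offline.append(user)
        else:
            per_server[server].append(user)
    return dict(per_server), offline


members = ["A", "B", "C", "D"]
sessions = {"A": "chat-1", "B": "chat-2", "C": "chat-2"}  # D is offline
per_server, offline = fan_out(members, "A", sessions)

# A full 1,024-member group with nobody connected: 1,023 deliveries to make.
big_group = [f"user-{i}" for i in range(1024)]
_, big_offline = fan_out(big_group, "user-0", {})
```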
End-to-End Encryption
WhatsApp uses the Signal Protocol for end-to-end encryption. The key principle: the server never sees plaintext messages. Messages are encrypted on the sender's device and decrypted only on the recipient's device. The server stores and forwards ciphertext.
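The protocol uses X3DH (Extended Triple Diffie-Hellman) to establish a shared secret between two devices, and the Double Ratchet algorithm to derive a new encryption key for every single message. Even if an attacker compromises one message key, they cannot decrypt earlier messages, a property called forward secrecy. Because the ratchet keeps mixing in fresh Diffie-Hellman exchanges, the session also recovers after a compromise, so later messages become unreadable to the attacker again (post-compromise security).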
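The per-message key derivation can be sketched with the standard library. This is a simplified illustration of only the symmetric-key chain inside the Double Ratchet: it omits the Diffie-Hellman ratchet, header handling, and the actual encryption, and the starting secret is a placeholder standing in for the X3DH output. The use of HMAC-SHA256 with the single-byte inputs 0x01 and 0x02 follows the recommendation in the Double Ratchet specification.

```python
import hashlib
import hmac


def ratchet_step(chain_key: bytes) -> tuple:
    """Derive (message_key, next_chain_key) from the current chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


# Placeholder secret for the demo; a real session starts from the X3DH output.
chain_key = hashlib.sha256(b"demo shared secret").digest()

message_keys = []
for _ in range(3):
    message_key, chain_key = ratchet_step(chain_key)  # old chain key is discarded
    message_keys.append(message_key)
```

Because HMAC is one-way, someone who obtains the current chain key cannot run the chain backwards to recover earlier message keys. They can run it forwards, which is why the full protocol also mixes in new Diffie-Hellman outputs.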
For system design purposes, encryption adds two constraints. First, the server cannot index or search message content (it is ciphertext), so search has to run on the client or rely on specialized encrypted indexes. Second, multi-device support is complex because each device has its own key pair and its own session. Sending a message to a user with 3 devices means encrypting the message 3 times, once for each device.
Presence and Typing Indicators
The "online" indicator and "typing..." status are presence features. They are ephemeral, high-frequency, and tolerant of staleness. Nobody cares if the "online" status is 5 seconds stale.
Presence is stored in Redis with a TTL. When a user's WebSocket sends a heartbeat, the TTL is refreshed. When the heartbeat stops (disconnect), the key expires automatically. Typing indicators are even more transient. They are sent as fire-and-forget messages through the WebSocket. No persistence. No retry. If the indicator is lost, the worst case is the recipient does not see "typing..." for a few seconds.
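The TTL behaviour and the fire-and-forget indicator can be sketched in memory. `PresenceStore` imitates a key with an expiry (in Redis this would correspond to something like `SET presence:<user> 1 EX 30`); the 30-second TTL, the explicit `now` argument, and the `push` callback are assumptions made to keep the example deterministic.

```python
class PresenceStore:
    """In-memory imitation of a key-value store with per-key TTL."""

    def __init__(self, ttl_s: float = 30.0) -> None:
        self.ttl_s = ttl_s
        self.expires_at = {}  # user -> time at which the presence key expires

    def heartbeat(self, user: str, now: float) -> None:
        # Each heartbeat pushes the expiry forward by one TTL.
        self.expires_at[user] = now + self.ttl_s

    def is_online(self, user: str, now: float) -> bool:
        deadline = self.expires_at.get(user)
        if deadline is None or now >= deadline:
            self.expires_at.pop(user, None)  # lazily drop the expired key
            return False
        return True


def send_typing(presence, sender, recipient, now, push) -> bool:
    # Fire-and-forget: nothing is stored and nothing is retried.
    if not presence.is_online(recipient, now):
        return False
    push(recipient, {"type": "typing", "from": sender})
    return True


presence = PresenceStore(ttl_s=30.0)
presence.heartbeat("B", now=0.0)

pushed = []
sent_early = send_typing(presence, "A", "B", now=10.0,
                         push=lambda user, event: pushed.append((user, event)))
sent_late = send_typing(presence, "A", "B", now=45.0,
                        push=lambda user, event: pushed.append((user, event)))
```

No explicit "user went offline" write is needed: once heartbeats stop, the key simply ages out.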
Further Reading
- How WhatsApp Handles 100 Billion Messages Per Day, ByteByteGo
- 8 Reasons Why WhatsApp Was Able to Support 50 Billion Messages a Day with Only 32 Engineers, System Design One
- Why Signal Protocol is Well-Designed, Praetorian
- Design WhatsApp System Design Interview, AlgoMaster
Assignment
User A sends a message to User B. User B's phone is off. Walk through the complete journey:
- Where is the message stored after User A sends it? What acknowledgment does User A see?
- How does the system know User B is offline? What specific mechanism detects this?
- What happens when User B turns on their phone and opens the app? Describe every step from WebSocket establishment to message display.
- How does User A know the message was delivered? Trace the delivery receipt back through every component.
- Now add end-to-end encryption. At which points in the flow is the message plaintext vs. ciphertext? Where are the encryption and decryption operations?