More Than a Message Queue

Apache Kafka is not a traditional message broker. It is a distributed, append-only commit log. Messages are not deleted after consumption. They are retained for a configurable period (or indefinitely with log compaction). This single design decision changes everything about how you build event-driven systems.

Kafka was originally built at LinkedIn to handle the firehose of activity data: page views, searches, profile updates. The engineering team published benchmarks showing 2 million writes per second on three commodity machines. Today, Kafka powers event streaming at Netflix, Uber, Airbnb, and thousands of other organizations processing trillions of messages per day.

Core Architecture

Kafka's storage model is a partitioned, replicated, append-only log. Producers write to the end of a partition. Consumers read from any offset. Messages are never modified in place. This sequential I/O pattern is why Kafka saturates disk throughput far more efficiently than random-access message brokers.

Four concepts define Kafka's architecture:

Brokers are the servers that store data and serve client requests. A Kafka cluster typically runs 3 or more brokers for fault tolerance. Each broker stores a subset of the total data.

Topics are named categories of messages. An "orders" topic holds order events. A "payments" topic holds payment events. Topics are logical groupings. The physical storage is handled by partitions.

Partitions are the unit of parallelism. Each topic is split into one or more partitions. Each partition is an ordered, immutable sequence of messages. Messages within a single partition are strictly ordered. Messages across partitions have no ordering guarantee. When a producer sends a message, it goes to one partition (determined by a partition key, round-robin, or custom logic).
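
The key-to-partition mapping can be sketched as hash-mod-partition-count. This is a simplified stand-in: Kafka's default partitioner murmur2-hashes the key bytes, and `hashlib.md5` here is purely illustrative.

```python
import hashlib

def pick_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition via hash(key) % partitions.
    (Kafka's default partitioner uses murmur2; md5 stands in here
    purely for illustration.)"""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always maps to the same partition, which is what
# makes per-key ordering possible.
assert pick_partition("order-1042", 6) == pick_partition("order-1042", 6)
```

Because the mapping depends on the partition count, increasing the number of partitions later reshuffles which keys land where.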

Consumer groups enable parallel consumption. Each consumer in a group is assigned one or more partitions. A partition is assigned to exactly one consumer within a group. If a topic has 6 partitions and a consumer group has 3 consumers, each consumer reads from 2 partitions. If you add a 7th consumer, it sits idle because there are only 6 partitions.
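
The assignment arithmetic can be sketched with a simple round-robin helper. This is illustrative only: Kafka's real assignors (range, round-robin, cooperative sticky) run inside the group coordinator protocol.

```python
def assign_partitions(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Distribute partitions across consumers round-robin.
    Each partition goes to exactly one consumer; consumers beyond
    the partition count receive an empty assignment (they idle)."""
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 6 partitions, 3 consumers: two partitions each.
three = assign_partitions(list(range(6)), ["c1", "c2", "c3"])
assert all(len(v) == 2 for v in three.values())

# 6 partitions, 7 consumers: one consumer gets nothing.
seven = assign_partitions(list(range(6)), [f"c{i}" for i in range(7)])
assert sum(len(v) == 0 for v in seven.values()) == 1
```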

```mermaid
graph TB
    subgraph "Topic: orders"
        P0[Partition 0]
        P1[Partition 1]
        P2[Partition 2]
    end
    subgraph "Consumer Group A"
        CA1[Consumer A1] -.-> P0
        CA2[Consumer A2] -.-> P1
        CA3[Consumer A3] -.-> P2
    end
    subgraph "Consumer Group B"
        CB1[Consumer B1] -.-> P0
        CB1 -.-> P1
        CB2[Consumer B2] -.-> P2
    end
    style P0 fill:#222221,stroke:#c8a882,color:#ede9e3
    style P1 fill:#222221,stroke:#c8a882,color:#ede9e3
    style P2 fill:#222221,stroke:#c8a882,color:#ede9e3
    style CA1 fill:#191918,stroke:#6b8f71,color:#ede9e3
    style CA2 fill:#191918,stroke:#6b8f71,color:#ede9e3
    style CA3 fill:#191918,stroke:#6b8f71,color:#ede9e3
    style CB1 fill:#191918,stroke:#c47a5a,color:#ede9e3
    style CB2 fill:#191918,stroke:#c47a5a,color:#ede9e3
```

In this diagram, Consumer Group A has 3 consumers, each reading one partition. Consumer Group B has 2 consumers, so one consumer handles two partitions. Both groups read the same data independently. This is how Kafka combines pub/sub (multiple consumer groups) with queue semantics (competing consumers within a group).

Partition Count and Throughput

More partitions means more parallelism. A topic with 1 partition can only be read by 1 consumer per group. A topic with 64 partitions can be read by up to 64 consumers per group. But more partitions also means more open file handles, longer leader election times during broker failures, and higher end-to-end latency for ordered processing.

On a 3-broker cluster, write throughput scales with partition count, but only up to a point. The approximate figures below are derived from LinkedIn's published benchmarks and Confluent's performance testing guides. Actual results depend on message size, replication factor, acknowledgment settings, and hardware.

Notice that throughput does not scale linearly. Going from 1 to 4 partitions gives roughly 3.5x improvement. Going from 16 to 64 gives only 1.5x. The bottleneck shifts from partition parallelism to network I/O, disk bandwidth, and replication overhead. Choosing the right partition count is about matching your expected consumer parallelism, not maximizing the number.

Ordering Guarantees

Kafka guarantees ordering within a partition but not across partitions. If you need all events for a specific order to be processed in sequence, use the order ID as the partition key. All events for that order will land in the same partition and be read in order. Events for different orders can be processed in parallel across partitions.

This is a common source of bugs. If you use a random partition key (or round-robin), two events for the same entity can end up in different partitions and be processed out of order by different consumers.
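
A toy simulation of this failure mode, with partitioning and consumption heavily simplified: route two events for the same order either by key or round-robin, then note how they land across two partitions.

```python
def route(events, key_fn, num_partitions=2):
    """Append each event to the partition chosen by key_fn(event)."""
    partitions = [[] for _ in range(num_partitions)]
    for ev in events:
        partitions[key_fn(ev) % num_partitions].append(ev)
    return partitions

events = [("order-7", "created"), ("order-7", "cancelled")]

# Keyed by order ID: both events share one partition, so any single
# consumer sees them in order.
keyed = route(events, key_fn=lambda ev: hash(ev[0]))
assert any(p == events for p in keyed)

# Round-robin by arrival index: the events split across partitions,
# so independent consumers may process "cancelled" before "created".
counter = iter(range(len(events)))
rr = route(events, key_fn=lambda ev: next(counter))
assert all(len(p) == 1 for p in rr)
```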

Log Compaction

Standard retention deletes messages after a time period (7 days by default). Log compaction is different: it keeps the latest value for each key and removes older values. This turns a Kafka topic into a key-value snapshot.

Example: a "user-profiles" topic. Every time a user updates their profile, a new message is produced with the user ID as the key. Log compaction ensures the topic always contains the latest profile for every user, even if the original message was written months ago. New consumers can read the compacted topic to rebuild the full current state.
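
Compaction itself can be sketched as "keep the newest record per key." Real brokers compact segment files in the background; this in-memory version only illustrates the semantics.

```python
def compact(log):
    """Keep only the latest value per key, preserving the order in
    which the surviving records appear in the log."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records supersede earlier ones
    # Rebuild as a log ordered by each surviving record's offset.
    return [(k, v) for k, (off, v) in sorted(latest.items(), key=lambda kv: kv[1][0])]

log = [
    ("user-1", {"name": "Ada"}),
    ("user-2", {"name": "Grace"}),
    ("user-1", {"name": "Ada L."}),  # supersedes the first record
]
assert compact(log) == [("user-2", {"name": "Grace"}), ("user-1", {"name": "Ada L."})]
```

A new consumer reading the compacted log still sees one record per key and can rebuild the full current state.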

Delivery Guarantees

| Guarantee | How It Works | When to Use | Risk |
|---|---|---|---|
| At-most-once | Consumer commits offset before processing. If it crashes mid-processing, the message is skipped. | Metrics, logging where losing a few messages is acceptable | Message loss |
| At-least-once | Consumer commits offset after processing. If it crashes after processing but before committing, the message is reprocessed. | Most workloads. Make consumers idempotent to handle duplicates. | Duplicate processing |
| Exactly-once | Idempotent producer (sequence numbers per partition) + transactional APIs. Consume-transform-produce as an atomic unit. | Financial transactions, inventory updates, any workflow where duplicates cause real harm | Higher latency, more complexity |
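
The at-least-once row deserves a sketch: make the handler idempotent so that a redelivered message has no effect. The message shape and ID scheme here are assumptions for illustration.

```python
processed_ids = set()
account_balance = {"total": 0}

def handle(message) -> bool:
    """Idempotent handler: a message ID seen before is a duplicate
    from an at-least-once redelivery and is skipped."""
    if message["id"] in processed_ids:
        return False  # duplicate: no effect
    account_balance["total"] += message["amount"]
    processed_ids.add(message["id"])
    return True

# Simulate a crash after processing but before the offset commit:
# the same message is delivered twice.
msg = {"id": "payment-001", "amount": 100}
handle(msg)
handle(msg)  # redelivery is a no-op
assert account_balance["total"] == 100
```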

Exactly-Once Semantics: How It Works

Kafka's exactly-once support (introduced in KIP-98, Kafka 0.11) has two components:

Idempotent producers. Each producer gets a unique producer ID (PID). Each message to a partition carries a sequence number. The broker accepts a message only if its sequence number is exactly one greater than the last committed sequence for that PID and partition: duplicates from retries are silently discarded, and out-of-order gaps are rejected with an error.
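
A sketch of that broker-side bookkeeping, simplified to a single in-memory partition (the real broker persists this state alongside the log):

```python
class PartitionLog:
    """Track the last accepted sequence number per producer ID and
    reject anything that is not the next expected message."""
    def __init__(self):
        self.messages = []
        self.last_seq = {}  # producer_id -> last accepted sequence

    def append(self, producer_id: str, seq: int, payload: str) -> str:
        expected = self.last_seq.get(producer_id, -1) + 1
        if seq < expected:
            return "duplicate"      # retry of an already-committed message
        if seq > expected:
            return "out_of_order"   # gap: the broker rejects this with an error
        self.messages.append(payload)
        self.last_seq[producer_id] = seq
        return "ok"

log = PartitionLog()
assert log.append("pid-1", 0, "a") == "ok"
assert log.append("pid-1", 0, "a") == "duplicate"  # producer retry, discarded
assert log.append("pid-1", 1, "b") == "ok"
assert log.messages == ["a", "b"]  # no duplicate made it into the log
```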

Transactions. A producer can start a transaction, write to multiple partitions, commit consumer offsets, and then commit or abort the entire transaction atomically. Consumers configured with isolation.level=read_committed only see messages from committed transactions. This enables consume-transform-produce pipelines where the input offset commit and the output writes are atomic.
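
The visibility rule of read_committed can be simulated by tagging each record with its transaction ID and filtering. This is a sketch of the semantics only, not the broker's actual transaction index.

```python
def read_committed(records, committed_txns):
    """Filter a partition's records the way an
    isolation.level=read_committed consumer would: only messages from
    committed transactions (or outside any transaction) are visible."""
    return [
        value for txn_id, value in records
        if txn_id is None or txn_id in committed_txns
    ]

records = [
    ("txn-1", "debit account A"),
    ("txn-2", "debit account B"),   # txn-2 was aborted
    (None, "untransacted metric"),  # non-transactional message
]
visible = read_committed(records, committed_txns={"txn-1"})
assert visible == ["debit account A", "untransacted metric"]
```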

Kafka as Event Store

Because Kafka retains messages (unlike traditional queues that delete after consumption), it can serve as an event store. The full history of state changes is preserved. Any new service can replay the topic from the beginning to rebuild its own view of the data. This is the foundation of event sourcing architectures built on Kafka.

```mermaid
graph LR
    subgraph "Kafka Cluster"
        T1[orders topic]
        T2[payments topic]
        T3[inventory topic]
    end
    OS[Order Service] -->|produce| T1
    PS[Payment Service] -->|produce| T2
    IS[Inventory Service] -->|produce| T3
    T1 -->|consume| Analytics[Analytics Service]
    T2 -->|consume| Analytics
    T3 -->|consume| Analytics
    T1 -->|replay from offset 0| NewService[New Reporting Service]
    style T1 fill:#222221,stroke:#c8a882,color:#ede9e3
    style T2 fill:#222221,stroke:#c8a882,color:#ede9e3
    style T3 fill:#222221,stroke:#c8a882,color:#ede9e3
    style NewService fill:#191918,stroke:#6b8f71,color:#ede9e3
```

The new reporting service joins months after launch. It resets its consumer offset to zero and replays the entire order history to build its reports. No backfill scripts needed. The data is already in Kafka.
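
That replay can be sketched with the topic as an in-memory list of events; the event field names are assumptions for illustration.

```python
def rebuild_state(topic_log):
    """Replay an 'orders' topic from offset 0 to reconstruct the
    current status of every order -- what a new consumer does after
    seeking its offset back to the beginning."""
    state = {}
    for event in topic_log:  # offset 0 .. end, in order
        state[event["order_id"]] = event["status"]
    return state

orders_topic = [
    {"order_id": "o-1", "status": "created"},
    {"order_id": "o-2", "status": "created"},
    {"order_id": "o-1", "status": "shipped"},  # later event wins
]
assert rebuild_state(orders_topic) == {"o-1": "shipped", "o-2": "created"}
```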

Systems Thinking Lens

Kafka's partition model creates a reinforcing loop: more partitions enable more consumers, which enables higher throughput, which attracts more use cases, which demands more partitions. Without governance, this leads to thousands of topics with unclear ownership. The balancing force is operational cost. Each partition consumes memory, file handles, and replication bandwidth. The systems thinker asks: what is the minimum partition count that serves our current parallelism needs, with room to grow but not room to waste?

Assignment

You are designing the Kafka infrastructure for an e-commerce platform. Define topics, partition counts, and partition keys for the following event streams. Justify each decision.

  1. Order events (order.created, order.updated, order.cancelled). The platform processes 50,000 orders per day. Three downstream services consume order events: fulfillment, analytics, and notifications.
  2. Payment events (payment.initiated, payment.completed, payment.failed). Payments must be processed in order per transaction. The system handles 100,000 payment events per day.
  3. Inventory events (stock.updated, stock.reserved, stock.released). 200,000 events per day across 50,000 SKUs. The warehouse service must see all events for a given SKU in order.

For each topic, specify: (a) the partition key, (b) the number of partitions and why, (c) the delivery guarantee (at-least-once or exactly-once) and why, (d) the retention policy (time-based or log compaction) and why.