Course → Module 3: Storage, Databases & Caching

Why Replicate?

A single database server is a single point of failure. If it goes down, your application goes down. Replication copies data from one server to others, providing two benefits: fault tolerance (if one server dies, another has the data) and read scaling (distribute read queries across multiple copies).

Database replication is the process of maintaining copies of the same data on multiple database servers. The source of writes is called the primary (or leader). The copies are called replicas (or followers). Replication can be synchronous or asynchronous, and can follow either a single-primary or a multi-primary topology.

Replication and sharding solve different problems. Sharding splits data to handle more writes and store more data. Replication copies data to handle more reads and survive failures. Most production databases use both: shard the data across multiple primaries, then replicate each shard to one or more replicas.

Primary-Replica Replication

The most common replication topology. All writes go to a single primary server. The primary streams changes (via a write-ahead log, binary log, or oplog) to one or more replica servers. Replicas apply these changes and serve read queries.

```mermaid
graph TD
    App["Application"] --> W["Writes"]
    App --> R["Reads"]
    W --> Primary[("Primary<br/>(read/write)")]
    Primary -->|"WAL stream"| R1[("Replica 1<br/>(read-only)")]
    Primary -->|"WAL stream"| R2[("Replica 2<br/>(read-only)")]
    Primary -->|"WAL stream"| R3[("Replica 3<br/>(read-only)")]
    R --> R1
    R --> R2
    R --> R3
```

This topology is straightforward. There is no write conflict because only one server accepts writes. Replicas are read-only, which means you can add replicas to scale read throughput linearly. Three replicas handle roughly three times the read queries of a single server.
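The read/write split described above is typically implemented in a small routing layer in front of the database driver. A minimal sketch, assuming hypothetical connection objects that expose an `execute` method:

```python
import itertools

class ReplicatedRouter:
    """Send all writes to the primary; spread reads across replicas.

    Hypothetical sketch: 'primary' and 'replicas' stand in for real
    driver connections, and 'execute' for whatever query API they expose.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin iterator over the read-only replicas.
        self._replicas = itertools.cycle(replicas)

    def write(self, query):
        # Only the primary accepts writes, so there are no write conflicts.
        return self.primary.execute(query)

    def read(self, query):
        # Each read goes to the next replica in round-robin order,
        # scaling read throughput roughly linearly with replica count.
        return next(self._replicas).execute(query)
```

Real drivers and proxies (e.g. connection poolers) offer more sophisticated routing, but the core idea is exactly this split.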

The downside is write scalability. All writes funnel through one server. If your write throughput exceeds what one machine can handle, primary-replica replication alone will not help. You need sharding for write scaling.

Failover is the other concern. If the primary dies, one of the replicas must be promoted to primary. This can be automated (most managed databases do this) but involves a brief window where writes are unavailable. The promoted replica may also be slightly behind the old primary, meaning some recent writes could be lost in asynchronous replication setups.

Multi-Primary Replication

Multi-primary replication (also called multi-leader or active-active) allows writes on multiple servers. Each primary replicates its changes to the others. This enables write availability across regions but introduces write conflicts that must be detected and resolved.

Multi-primary replication is used when you need to accept writes in multiple geographic regions. A user in Jakarta writes to the Jakarta primary. A user in Singapore writes to the Singapore primary. Each primary sends its changes to the other. Reads are served locally, and writes do not need to cross the ocean before being acknowledged.

The fundamental problem is conflicts. If user A updates their email on the Jakarta primary and user B updates the same email field on the Singapore primary at the same time, which write wins? Common conflict resolution strategies include last-write-wins (keep the write with the latest timestamp), application-defined merge logic, and conflict-free replicated data types (CRDTs).

```mermaid
graph LR
    subgraph "Primary-Replica"
        P1[("Primary")] -->|"one-way"| R1a[("Replica")]
        P1 -->|"one-way"| R1b[("Replica")]
    end
    subgraph "Multi-Primary"
        P2[("Primary A<br/>Jakarta")] <-->|"bi-directional"| P3[("Primary B<br/>Singapore")]
        P2 -->|"one-way"| R2[("Replica")]
        P3 -->|"one-way"| R3[("Replica")]
    end
```
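Last-write-wins, the simplest conflict resolution strategy, keeps whichever version carries the later timestamp. A minimal sketch (the record format here is hypothetical):

```python
def last_write_wins(local, remote):
    """Resolve a conflicting update by keeping the newer version.

    Each version is a dict carrying a 'timestamp' (from a wall clock or
    hybrid logical clock) and the written 'value'.  LWW is simple but
    silently discards the losing write: user B's update can vanish
    without any error being raised anywhere.
    """
    return local if local["timestamp"] >= remote["timestamp"] else remote

# Jakarta and Singapore both updated the same field concurrently.
jakarta   = {"timestamp": 1700000001.2, "value": "a@example.com"}
singapore = {"timestamp": 1700000001.7, "value": "b@example.com"}

winner = last_write_wins(jakarta, singapore)  # Singapore wrote later
```

The silent data loss is why systems that cannot tolerate dropped writes reach for merge logic or CRDTs instead.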

Synchronous vs. Asynchronous Replication

When the primary commits a write, it can wait for replicas to confirm before acknowledging the client (synchronous), or it can acknowledge immediately and let replicas catch up later (asynchronous). This is a fundamental tradeoff between durability and latency.

| Dimension | Synchronous | Asynchronous | Semi-Synchronous |
| --- | --- | --- | --- |
| Write latency | High (waits for replica ACK) | Low (immediate ACK from primary) | Medium (waits for 1 replica) |
| Data durability | Strong (data on multiple nodes) | Weak (data may exist only on primary) | Moderate (data on at least 2 nodes) |
| Risk of data loss on failure | None (all replicas confirmed) | Possible (unsynced writes lost) | Minimal (1 replica confirmed) |
| Impact of slow replica | Blocks all writes | No impact on writes | Blocks if the sync replica is slow |
| Throughput | Limited by slowest replica | Limited only by primary | Limited by one replica |
| Common use | Financial systems, compliance | Most web applications | PostgreSQL (sync standby), MySQL Group Replication |

Most production systems use asynchronous replication because the latency penalty of synchronous replication is too high, especially across data centers. A write that takes 2ms on the primary might take 50-200ms if it must wait for a replica in another region. Semi-synchronous replication is a common compromise: wait for at least one replica to confirm, then acknowledge the client.
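In PostgreSQL, for example, semi-synchronous behavior is expressed by requiring commit confirmation from a quorum of named standbys. An illustrative configuration fragment (the standby names are hypothetical):

```ini
# postgresql.conf -- wait for any one of the two named standbys
# to confirm before acknowledging a commit to the client.
synchronous_standby_names = 'ANY 1 (replica_sby1, replica_sby2)'
synchronous_commit = on
```

With `ANY 1`, a single slow standby does not block writes as long as the other one keeps up.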

Replication Lag

Replication lag is the delay between a write committed on the primary and that write becoming visible on a replica. In asynchronous replication, lag is typically under a second but can spike to minutes during high write load, long-running queries on replicas, or network congestion.

Replication lag causes stale reads. A user updates their profile on the primary, then immediately loads the profile page. If the read is routed to a replica that has not yet received the update, the user sees old data. This is the "read-your-own-writes" problem.

Lag is not constant. During normal operation, it might be 10-50 milliseconds. During a traffic spike that doubles write throughput, the replica's single-threaded apply process falls behind. Long-running analytical queries on replicas can also cause lag by competing for I/O and CPU resources.

As an illustration, consider typical replication lag over a 24-hour period. During off-peak hours (midnight to 6 AM), lag stays under 20 ms. During peak load (late morning and evening), it can spike above a 200 ms threshold. These spikes correlate with write volume: when the primary processes more writes per second, the replica falls further behind.

Handling Replication Lag

Several strategies mitigate the effects of replication lag:

  - Read-your-own-writes routing: after a user writes, route that user's reads to the primary for a short window, so they always see their own changes.
  - Monotonic reads: pin each session to a single replica so a user never sees data move backward in time between requests.
  - Lag-aware routing: monitor each replica's lag and stop sending reads to replicas that fall behind a threshold.
  - Wait-for-catch-up: the write returns the primary's log position (LSN or GTID); subsequent reads wait until the replica has replayed past that position.
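The first of these, pinning a user's reads to the primary for a short window after their own write, can be sketched as follows (connection objects and the `execute` method are hypothetical stand-ins for a real driver):

```python
import time

class SessionAwareRouter:
    """Route a user's reads to the primary shortly after they write.

    'pin_seconds' should comfortably exceed your typical replication
    lag; 'clock' is injectable so the behavior can be tested.
    """

    def __init__(self, primary, replica, pin_seconds=1.0, clock=time.monotonic):
        self.primary = primary
        self.replica = replica
        self.pin_seconds = pin_seconds
        self.clock = clock
        self._last_write = {}  # user_id -> time of that user's last write

    def write(self, user_id, query):
        self._last_write[user_id] = self.clock()
        return self.primary.execute(query)

    def read(self, user_id, query):
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            # Recent writer: read from the primary to guarantee
            # read-your-own-writes consistency.
            return self.primary.execute(query)
        # Everyone else tolerates a little replica lag.
        return self.replica.execute(query)
```

The tradeoff is extra read load on the primary: every recent writer bypasses the replicas for the pin window.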

Replication Strategy Comparison

| Dimension | Primary-Replica | Multi-Primary |
| --- | --- | --- |
| Write target | Single primary only | Any primary |
| Conflict potential | None (single writer) | Yes (concurrent writes on different primaries) |
| Write availability | Lost during failover | Available in all regions |
| Write latency | Low if client is near primary | Low (write to nearest primary) |
| Complexity | Low | High (conflict resolution logic) |
| Consistency | Strong (reads from primary) | Eventual (cross-region convergence) |
| Use case | Most applications | Multi-region, offline-capable, collaborative editing |

Systems Thinking Lens

Replication creates a fundamental balancing loop between consistency and performance. Adding replicas increases read throughput (reinforcing loop), but each replica introduces replication lag (balancing loop). The faster you write, the further behind replicas fall, and the more stale reads your users experience.

The leverage point is not the replication strategy itself but the read/write ratio of your workload. If your application is 95% reads and 5% writes, primary-replica replication with asynchronous replication works well. The lag affects only a small fraction of operations. If your application is 50% writes, replication lag becomes a dominant concern, and you may need synchronous replication or architectural changes (like event sourcing) that decouple reads from writes entirely.

Multi-primary replication has a hidden reinforcing loop: the more primaries you add, the more conflict resolution logic you need, the more edge cases emerge, the more engineering time you spend debugging subtle data inconsistencies instead of building features.

Assignment

Your application has a primary database server in Jakarta and an asynchronous replica in Surabaya. A user in Jakarta updates their profile (changes their display name). One second later, the same user loads their profile page. The read is routed to the Surabaya replica for load balancing.

  1. What will the user likely see? Explain why, referencing replication lag and the asynchronous replication model.
  2. Describe the "read-your-own-writes" problem in your own words. Why is it particularly frustrating for the user who just made the change?
  3. Propose three different solutions to fix this problem. For each solution, explain the mechanism and the tradeoff (what it costs in terms of complexity, latency, or infrastructure).
  4. Under what circumstances would you choose synchronous replication instead? What would that do to the user's write latency, given the network distance between Jakarta and Surabaya?