Course → Module 8: Real-World Case Studies II

The Problem

Two users are editing the same document at the same time. User A types "hello" at position 10. User B, at the same moment, deletes characters 5 through 8. Both operations are valid locally. But when they arrive at the server, the positions no longer mean the same thing. User A's "position 10" assumed the original document. After User B's deletion, position 10 refers to a different location in the text.

Without a conflict resolution mechanism, concurrent edits produce garbled text, lost characters, or duplicated content. The core challenge of collaborative editing is not networking or storage. It is making concurrent operations converge to the same result on every device, regardless of the order they arrive.

Key insight: Collaboration is a conflict resolution problem. The algorithm determines whose keystrokes survive and how positions shift when edits overlap. Get it wrong, and users see different versions of the same document.

High-Level Architecture

graph TD U1[User A
Browser] -->|WebSocket| GW[Gateway Service] U2[User B
Browser] -->|WebSocket| GW U3[User C
Browser] -->|WebSocket| GW GW --> OT[OT / CRDT Engine] OT --> DS[Document Store] OT --> PS[Presence Service] DS --> DB[(Document DB
Snapshots + Op Log)] PS --> GW GW -->|Broadcast transformed ops| U1 GW -->|Broadcast transformed ops| U2 GW -->|Broadcast transformed ops| U3

Each user connects to the server via a persistent WebSocket. Edits are represented as operations (insert, delete, format) and sent to the server. The OT/CRDT engine transforms these operations to account for concurrency, applies them to the canonical document state, and broadcasts the transformed operations to all other connected users. The presence service tracks cursor positions, selections, and user activity for the colored cursors visible in the editor.

Operational Transformation (OT)

OT is the algorithm Google Docs uses. It works by transforming operations against each other so that they produce consistent results regardless of execution order.

Consider this scenario. The document contains "ABCDEF" (6 characters). User A inserts "X" at position 2. User B deletes the character at position 4 (the letter "D"). Both edits happen simultaneously.

sequenceDiagram participant A as User A participant S as Server participant B as User B Note over A,B: Document: "ABCDEF" A->>S: Insert "X" at position 2 B->>S: Delete at position 4 Note over S: Server receives A first S->>S: Apply A: "ABXCDEF" S->>S: Transform B against A:
B's position 4 shifts to 5
(insertion before position 4) S->>S: Apply transformed B: delete at 5
"ABXCEF" S->>A: Transformed B: delete at position 5 S->>B: Transformed A: insert "X" at position 2 Note over A,B: Both see: "ABXCEF"

The transformation logic is straightforward in this case. User A inserted a character before User B's target position. So User B's position must shift right by 1. Without this transformation, User B would delete the character at position 4 in the modified document, which is now "C" instead of "D." The wrong character disappears.

OT requires a central server to establish a canonical operation order. Every operation carries a revision number. The server maintains a linear history of applied operations. When an operation arrives that was generated against an older revision, the server transforms it against all operations that have been applied since that revision. This is called the transformation pipeline.

CRDTs: The Decentralized Alternative

Conflict-free Replicated Data Types take a fundamentally different approach. Instead of transforming operations, CRDTs embed enough metadata in the data structure itself to guarantee that all replicas converge automatically, without a central server.

In a CRDT-based text editor, every character has a unique, globally ordered identifier. This identifier is not a simple position number. It is a tuple of (site ID, logical clock, fractional position) that establishes a total order across all characters inserted by all users. When User A inserts between characters with IDs 1.0 and 2.0, the new character gets an ID like 1.5. This ID never changes, even as other characters are inserted or deleted around it.

Because every character has a stable, unique ID, operations never conflict. An insert specifies "place this character after ID X." A delete specifies "remove the character with ID Y." These operations commute: the final result is the same regardless of the order they are applied.

OT vs. CRDT Comparison

Dimension Operational Transformation CRDTs
Central server required? Yes. Server establishes canonical order. No. Replicas converge independently.
Complexity Transform functions for every operation pair. Correctness proofs are notoriously difficult. Data structure design is complex upfront, but operations are simpler.
Bandwidth Low. Only operations are transmitted. Higher. Each character carries metadata (IDs, tombstones).
Offline support Difficult. Offline edits require complex rebasing against server history. Natural. Offline edits merge automatically on reconnect.
Undo Complex. Must invert and re-transform operations. Complex. Must track causal history per operation.
Used by Google Docs, Microsoft Office Online Figma, Yjs, Automerge
Memory overhead Low (operation log) Higher (per-character metadata, tombstones for deleted characters)

The Delta-Based Document Model

Rather than storing the full document text, collaborative editors typically store the document as a sequence of deltas (operations). The initial document is a snapshot. Every edit after that is a delta applied to the snapshot. To reconstruct the current document, replay all deltas from the last snapshot.

Periodic snapshots compress the history. Instead of replaying 10,000 operations from the beginning of the document, the system saves a snapshot every N operations and only replays from the most recent snapshot. This bounds the cost of opening a document regardless of its edit history.

WebSocket for Real-Time Sync

Collaborative editing requires bidirectional, low-latency communication. HTTP request-response is too slow: each edit would require a new connection. WebSocket maintains a persistent, full-duplex connection between the browser and the server.

The server broadcasts transformed operations to all connected clients within milliseconds. Cursor positions, selections, and typing indicators also flow through this channel. The result is the "live" feeling of seeing other users' cursors move in real time.

Connection management adds complexity. If a user's WebSocket drops (network change, laptop sleep), the client must reconnect, fetch any operations it missed, and apply them to its local state. The server maintains a per-document operation log that clients can replay from their last known revision.

Offline Mode and Reconciliation

A user edits a document on a plane with no internet. They make 50 edits over two hours. When they land and reconnect, those 50 operations must be integrated with whatever changes other users made during the same period.

With OT, this is expensive. Each offline operation must be transformed against every server operation that happened in the interim. If 200 operations occurred on the server, each of the 50 offline operations must be transformed against all 200, creating a cascade of transformations.

With CRDTs, offline reconciliation is straightforward. The client sends its operations. Because CRDT operations commute (order does not matter), the server simply applies them. No transformation needed. This is the primary reason newer collaboration tools like Figma adopted CRDTs over OT.

Further Reading

Assignment

Two users are editing the same document simultaneously. The document currently reads "The quick brown fox." User A inserts "very " at position 4 (before "quick"). User B inserts "old " at position 4 (also before "quick"), at the same time.

  1. Without any conflict resolution, what does each user see after applying both operations locally? Why do they see different results?
  2. With OT, the server receives User A's operation first. How does it transform User B's operation? What does the final document read?
  3. If this system used CRDTs instead, how would the character IDs prevent the conflict? What metadata ensures a consistent ordering?
  4. User A goes offline for 30 minutes and makes 20 edits. Meanwhile, 100 operations occur on the server. Compare the reconciliation cost for OT vs. CRDT when User A reconnects.