Ride-Hailing
Session 7.5 · ~5 min read
The Core Problem
A ride-hailing platform connects riders who need a car with drivers who have one. That sentence sounds simple. The engineering behind it is not. At Uber's scale, the system handles roughly 31 million trips per day, with 2 million online drivers sending GPS coordinates every 4 seconds. That produces around 500,000 location writes per second at peak.
The fundamental challenge is real-time spatial matching: given a rider's location, find the nearest available drivers within seconds. This is a search problem over constantly moving data points on a spherical surface.
Ride-hailing is a real-time matching problem built on top of a geospatial indexing problem. The matching algorithm is only as good as the spatial index that feeds it.
Geospatial Indexing
You cannot query "find all drivers within 2 km" by computing the distance from the rider to every single driver. With 2 million online drivers, that brute-force approach would take too long and consume too many resources. Instead, the system partitions the Earth's surface into a grid of cells, assigns each driver to a cell based on their current coordinates, and only searches nearby cells when a match request arrives.
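To see the per-pair cost the grid avoids, here is a minimal sketch of the great-circle (haversine) distance you would otherwise compute against every driver — the function name and radius constant are my own:

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two (lat, lng) points, in kilometers."""
    R = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))
```

Each call involves several trigonometric operations; doing this 2 million times per ride request is the brute force the cell index is designed to avoid.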
Three indexing systems dominate this space.
| Method | Grid Shape | Hierarchy | Edge Distortion | Used By |
|---|---|---|---|---|
| Geohash | Rectangles | Z-order curve, prefix-based | High near poles and cell boundaries | Redis GEO, Elasticsearch |
| Google S2 | Quadrilaterals (on sphere) | Hilbert curve on cube faces | Low (projects onto sphere directly) | Google Maps, early Uber |
| Uber H3 | Hexagons | Icosahedron-based, 16 resolutions | Minimal (equidistant neighbors) | Uber (post-2018), spatial analytics |
Geohash is the simplest. It encodes latitude and longitude into a single string by interleaving bits. Longer strings mean smaller cells. The problem: two points on opposite sides of a cell boundary can have completely different geohash prefixes, making boundary queries tricky.
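The bit-interleaving scheme is compact enough to sketch. This is a straightforward encoder following the standard geohash algorithm (binary search over latitude and longitude ranges, base-32 output); the second assertion in the usage note illustrates the boundary problem:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash_encode(lat, lng, precision=9):
    """Encode (lat, lng) as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lng_range = [-180.0, 180.0]
    bits = []
    even = True  # bits alternate: longitude first, then latitude
    while len(bits) < precision * 5:
        rng = lng_range if even else lat_range
        val = lng if even else lat
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid   # narrow to upper half
        else:
            bits.append(0)
            rng[1] = mid   # narrow to lower half
        even = not even
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = (idx << 1) | b
        chars.append(BASE32[idx])
    return "".join(chars)
```

Two points a few hundred meters apart but on opposite sides of the equator, for example, already differ in their very first character — a prefix-based range query would never group them together.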
Google S2 projects the Earth onto a cube, then uses a Hilbert curve to map 2D regions to 1D ranges. This preserves spatial locality better than geohash. Uber originally used S2 for their dispatch system.
Uber later developed H3, a hexagonal grid. Hexagons have a useful property: every neighbor is equidistant from the center. With squares or rectangles, diagonal neighbors are farther away than edge neighbors. In a city where people are constantly moving in all directions, hexagons minimize the quantization error. H3 supports 16 resolution levels, from continent-sized hexagons down to roughly 1 square meter.
The indexing pipeline has two flows. On the write path, each driver location update (lat, lng, timestamp) is converted to an H3 cell ID at resolution 9 and written to an in-memory store keyed cell_id → driver_list. On the read path, a ride request is converted to the rider's H3 cell, a k-ring fetches the neighboring cells, drivers are pulled from all of those cells, and candidates are ranked by distance and ETA.
The "k-ring" operation is critical. Given a cell, it returns all cells within k hops. For finding nearby drivers, you start with k=1 (the 6 immediate neighbors plus the center cell), fetch all drivers in those 7 cells, and expand the ring if you need more candidates.
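H3 implements k-ring on its own 64-bit cell IDs; the geometry is easier to see on a toy axial-coordinate hex grid. This sketch (coordinates and direction table follow the common axial-hex convention, not H3's actual encoding) walks the rings outward from a center cell:

```python
# Axial hex coordinates (q, r); the six neighbor directions.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]

def k_ring(center, k):
    """Return all hex cells within k hops of `center`, center included."""
    cells = {center}
    for radius in range(1, k + 1):
        # start `radius` steps out in one direction, then walk the ring
        q = center[0] + HEX_DIRS[4][0] * radius
        r = center[1] + HEX_DIRS[4][1] * radius
        for d in range(6):
            for _ in range(radius):
                cells.add((q, r))
                q, r = q + HEX_DIRS[d][0], r + HEX_DIRS[d][1]
    return cells
```

The cell counts follow 1 + 3k(k+1): 7 cells at k=1, 19 at k=2, 37 at k=3 — which is why expanding the ring by one hop is a cheap way to widen the candidate search.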
High-Level Architecture
Riders and drivers talk to an API Gateway. The Gateway routes driver location updates to the Location Service (in-memory, sharded), ride requests to the Matching Service, and trip operations to the Trip Service. The Matching Service queries the Location Service for nearby drivers and creates trips through the Trip Service. The Trip Service writes to the Trip DB (PostgreSQL) and publishes events to a Kafka queue, which fans out to the Pricing Service, the Notification Service, and Analytics. Driver accept/reject responses flow back through the Gateway.
The Location Service is the hottest component. It must absorb 500K writes/sec and serve spatial queries with low latency. This service stores driver locations in memory, sharded by geographic region. It does not use a traditional database for the live location data. Persistent storage (for trip history, analytics) comes later via Kafka.
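The core data structure is simpler than the throughput numbers suggest. A minimal single-shard sketch (class and method names are my own; the real service adds sharding, TTLs, and replication) keeps a forward and reverse index so a driver's update atomically moves them between cells:

```python
from collections import defaultdict

class LocationStore:
    """Toy in-memory index: cell_id -> set of driver_ids."""

    def __init__(self):
        self.cell_to_drivers = defaultdict(set)
        self.driver_to_cell = {}

    def update(self, driver_id, cell_id):
        """Record a driver's latest cell, removing them from the old one."""
        old = self.driver_to_cell.get(driver_id)
        if old is not None and old != cell_id:
            self.cell_to_drivers[old].discard(driver_id)
        self.cell_to_drivers[cell_id].add(driver_id)
        self.driver_to_cell[driver_id] = cell_id

    def drivers_in(self, cell_ids):
        """Union of drivers across a set of cells (e.g. a k-ring)."""
        out = set()
        for c in cell_ids:
            out |= self.cell_to_drivers[c]
        return out
```

Because each update touches at most two small sets, the per-write cost stays constant regardless of how many drivers are online — the scaling problem is purely one of sharding the cells across nodes.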
The Matching Service receives a ride request, queries the Location Service for nearby available drivers, ranks them by estimated time of arrival, and dispatches a request to the best candidate. If the driver declines, the system moves to the next candidate.
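The dispatch cascade reduces to a small loop. In this sketch, candidates are (driver_id, eta) pairs and `offer` is a callback standing in for the push-notification round trip — both names are illustrative, not the real service's API:

```python
def dispatch(candidates, offer):
    """Offer the trip to candidates in ETA order; return the first acceptor.

    `candidates`: iterable of (driver_id, eta_minutes) pairs.
    `offer`: callable(driver_id) -> bool, True if the driver accepts.
    Returns the accepting driver_id, or None if everyone declines.
    """
    for driver_id, _eta in sorted(candidates, key=lambda c: c[1]):
        if offer(driver_id):
            return driver_id
    return None
```

In production the offer step has a timeout (a non-response counts as a decline), and the candidate list may be refreshed mid-cascade since drivers keep moving.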
Trip State Machine
Every trip follows a strict lifecycle. State machines prevent invalid transitions (you cannot complete a trip that was never started) and make the system easier to reason about during failures.
Each state transition triggers events. "DriverAssigned" triggers a push notification to the rider. "InProgress" starts the fare meter. "Completed" triggers payment processing and receipt generation. The state machine is the backbone that coordinates all downstream services.
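A transition table makes the "cannot complete a trip that was never started" guarantee mechanical. This sketch assumes a simplified lifecycle (Requested, DriverAssigned, InProgress, Completed, Cancelled); a production system has more states, such as driver arrival and payment:

```python
# Assumed simplified lifecycle; only the three quoted states are from the text.
TRANSITIONS = {
    "Requested":      {"DriverAssigned", "Cancelled"},
    "DriverAssigned": {"InProgress", "Cancelled"},
    "InProgress":     {"Completed"},
    "Completed":      set(),   # terminal
    "Cancelled":      set(),   # terminal
}

class Trip:
    def __init__(self):
        self.state = "Requested"

    def transition(self, new_state):
        """Move to `new_state`, rejecting any transition not in the table."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
```

Downstream services subscribe to the transition events rather than polling trip rows, which is what makes the state machine the coordination backbone.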
Write Load: The Numbers
Consider the scale calculation for driver location updates.
With 50,000 drivers updating every 3 seconds: 50,000 / 3 = ~16,667 writes per second. At Uber's global scale of 2 million drivers updating every 4 seconds: 2,000,000 / 4 = 500,000 writes per second. No single relational database handles this. The solution is an in-memory store (like Redis or a custom service), sharded by H3 cell ranges, with asynchronous persistence to durable storage.
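The arithmetic above, plus the sharding consequence, fits in two lines (the 30,000 writes/sec single-node ceiling is the rough PostgreSQL figure quoted later in the assignment, used here as an assumption):

```python
import math

def location_write_qps(drivers, update_interval_s):
    """Steady-state write rate: one update per driver per interval."""
    return drivers / update_interval_s

def shards_needed(qps, per_node_qps):
    """Minimum node count if each node absorbs `per_node_qps` writes/sec."""
    return math.ceil(qps / per_node_qps)
```

At global scale, even an optimistic 30K writes/sec per node implies 17+ shards — before accounting for headroom, replication, or hot-spot cities.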
Surge Pricing as a Reinforcing Loop
Surge pricing connects directly to the reinforcing loops from Session 0.4. When demand exceeds supply in a region, prices increase. Higher prices attract more drivers to that region (supply rises) and discourage some riders (demand falls). Both effects are balancing loops: the price signal pushes the region back toward equilibrium.
But there is a reinforcing loop hiding underneath. When surge pricing activates, drivers from neighboring regions migrate toward the surge zone. This creates undersupply in those neighboring regions, which triggers surge pricing there, which causes further driver migration. The surge can propagate outward like a wave. Uber mitigates this with predictive positioning, incentivizing drivers to stay in areas where demand is expected to rise.
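The wave is easy to demonstrate in a toy one-dimensional model. This sketch (region layout, demand numbers, and the per-tick migration rule are all invented for illustration) shows a single demand spike recruiting drivers from its neighbors and thereby spreading the surge:

```python
def surge_step(supply, demand, pull=3):
    """One tick: each surging region pulls `pull` drivers from each calm neighbor.

    Regions sit on a line; region i surges when demand[i] > supply[i].
    Returns (new_supply, number_of_surging_regions_before_migration).
    """
    surging = [d > s for s, d in zip(supply, demand)]
    new_supply = supply[:]
    for i, hot in enumerate(surging):
        if not hot:
            continue
        for j in (i - 1, i + 1):  # neighbors on the line
            if 0 <= j < len(supply) and not surging[j]:
                moved = min(pull, new_supply[j])
                new_supply[j] -= moved
                new_supply[i] += moved
    return new_supply, sum(surging)
```

Starting from balanced supply with one demand spike, a single surging region becomes three after one migration tick: the neighbors that lost drivers now surge themselves.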
Further Reading
- H3: Uber's Hexagonal Hierarchical Spatial Index (Uber Engineering Blog)
- Geospatial Indexing Explained: A Comparison of Geohash, S2, and H3 (Ben Feifke)
- How Uber Finds Nearby Drivers at 1 Million Requests per Second (Ajit Singh)
- How Uber Serves over 150 Million Reads per Second from Integrated Cache (ByteByteGo)
Assignment
Your city has 50,000 active drivers. Each driver sends a GPS update every 3 seconds.
- Calculate the write QPS for the location service.
- A relational database like PostgreSQL maxes out at roughly 10,000-30,000 writes/sec on a single node. Can it handle this load? What alternatives would you use?
- A rider requests a ride. Design the flow to find the 5 nearest available drivers. Which geospatial index would you use and why? What is the k-ring radius you would start with?
- Draw a causal loop diagram showing how surge pricing propagates across neighboring regions. Identify the reinforcing and balancing loops.