Load Balancing Algorithms
Session 2.3 · ~5 min read
The Algorithm Decides
A load balancer receives a request. It has five healthy servers behind it. Which one gets the request?
That decision is made by a load balancing algorithm. The algorithm you choose determines how traffic is distributed, how well your system handles heterogeneous servers, and how it behaves under uneven load. The right algorithm depends on your traffic pattern, your server fleet, and what assumptions you can safely make.
Round-Robin
Round-robin distributes requests sequentially across all servers in a fixed rotation. Server 1, then Server 2, then Server 3, then back to Server 1. Every server gets the same number of requests.
Round-robin is the simplest algorithm and the default in most load balancers, including NGINX. It works well when all servers have identical hardware specs and all requests take roughly the same time to process.
It fails when either of those assumptions breaks. If one server is slower than the others, it still receives the same number of requests, creating a bottleneck. If some requests are expensive (a complex database query) and others are cheap (serving a cached response), round-robin ignores this completely. It distributes requests evenly, not load evenly.
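The rotation itself is trivial to express. A minimal sketch in Python (the server names are placeholders, not from the original text):

```python
from itertools import cycle

# Hypothetical server pool for illustration.
servers = ["server-1", "server-2", "server-3"]

# cycle() yields the list in a fixed rotation, one server per request.
rotation = cycle(servers)

# The first six requests land on each server exactly twice,
# regardless of how expensive any individual request is.
first_six = [next(rotation) for _ in range(6)]
```

Note that the rotation has no input other than the server list: it cannot react to a slow server or an expensive request.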
Weighted Round-Robin
Weighted round-robin assigns each server a weight proportional to its capacity. A server with weight 3 receives three requests for every one request sent to a server with weight 1.
This solves the heterogeneous server problem. If Server A has 8 CPU cores and Server B has 4, you give Server A a weight of 2 and Server B a weight of 1. The load balancer sends twice as many requests to Server A.
The weights are static. You configure them once and they do not change unless you update the configuration. This means weighted round-robin cannot adapt to runtime conditions like one server running a memory-intensive background job.
```mermaid
graph LR
    LB -->|"Requests 1, 2, 3"| A["Server A<br/>weight: 3"]
    LB -->|"Request 4"| B["Server B<br/>weight: 1"]
    LB -->|"Requests 5, 6"| C["Server C<br/>weight: 2"]
    LB -->|"Requests 7, 8, 9"| A
    LB -->|"Request 10"| B
```
In this example, for every 6 requests, Server A gets 3, Server C gets 2, and Server B gets 1. The distribution matches their configured capacity.
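One way to produce this distribution is the "smooth" weighted round-robin used by NGINX, which interleaves servers rather than sending bursts to the heaviest one. A sketch, using the weights from the example above:

```python
def smooth_weighted_rr(weights: dict[str, int], n: int) -> list[str]:
    """Smooth weighted round-robin (the variant NGINX uses).

    Each pick: add every server's weight to its running score,
    choose the highest score, then subtract the total weight
    from the winner. Over a full cycle, each server is picked
    in proportion to its weight, without long bursts.
    """
    current = {s: 0 for s in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for s, w in weights.items():
            current[s] += w
        best = max(current, key=current.get)
        current[best] -= total
        picks.append(best)
    return picks

# Weights from the diagram: A=3, B=1, C=2. Over any 6 picks,
# A is chosen 3 times, C twice, and B once.
order = smooth_weighted_rr({"A": 3, "B": 1, "C": 2}, 6)
```

The "smoothing" matters in practice: a naive expansion (A, A, A, C, C, B) would send three consecutive requests to Server A, while this variant spreads them out.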
Least Connections
Least connections sends each new request to the server with the fewest active connections at that moment. It is a dynamic algorithm that adapts to real-time load.
This is the right choice when requests have highly variable processing times. A server handling a long-running request will have that connection counted against it, so new requests will be routed elsewhere. Servers that finish requests quickly will naturally receive more.
Least connections is particularly effective for WebSocket applications, database connection pools, and any workload where some requests hold connections open much longer than others.
The weakness is that it still assumes all servers have equal capacity. A small server with 2 active connections and a large server with 2 active connections are treated identically, even though the large server could handle 10 more. This is solved by weighted least connections, which divides the connection count by the server's weight before comparing.
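The weighted variant is a one-line change to the selection rule: compare each server's connection count divided by its weight, rather than the raw count. A sketch with a hypothetical snapshot of server state:

```python
# Hypothetical snapshot: both servers have 2 active connections,
# but "large" has four times the configured capacity.
servers = {
    "small": {"connections": 2, "weight": 1},
    "large": {"connections": 2, "weight": 4},
}

def pick_weighted_least_connections(servers: dict) -> str:
    """Route to the server with the lowest connections-to-weight ratio."""
    return min(servers, key=lambda s: servers[s]["connections"] / servers[s]["weight"])

# Plain least connections would treat the two servers as a tie;
# the weighted ratio (2/4 = 0.5 vs 2/1 = 2.0) prefers "large".
choice = pick_weighted_least_connections(servers)
```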
IP Hash
IP hash computes a hash of the client's IP address and uses the result to deterministically select a server. The same client IP always maps to the same server.
IP hash provides session affinity without cookies or shared session stores. If a user's session data is stored in memory on Server B, IP hash ensures that user's requests always go to Server B.
The tradeoff is uneven distribution. IP hash does not guarantee equal traffic across servers. A single corporate office behind a NAT gateway might have thousands of users sharing one IP address, all routed to the same server. If a server goes down, all clients mapped to it are remapped, which invalidates their sessions.
IP hash is also disrupted by adding or removing servers. When the number of servers changes, the hash function redistributes most clients to different servers. Consistent hashing (covered in Session 3.6) solves this by minimizing remapping when the server pool changes.
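The core mechanism is just a stable hash modulo the pool size. A sketch (MD5 is chosen here for illustration; real load balancers use their own hash functions):

```python
import hashlib

def ip_hash(client_ip: str, servers: list[str]) -> str:
    """Deterministically map a client IP to one server in the pool."""
    digest = hashlib.md5(client_ip.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]

servers = ["server-1", "server-2", "server-3"]

# The same client IP always lands on the same server:
assert ip_hash("203.0.113.7", servers) == ip_hash("203.0.113.7", servers)
```

The `% len(servers)` step is exactly why changing the pool size is disruptive: almost every hash value maps to a different index once the modulus changes, which is the problem consistent hashing addresses.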
Random
Random selection picks a server at random for each request. With a large number of requests, the distribution approaches uniform.
Random selection is rarely the primary algorithm, but it appears in two important contexts. First, as a tiebreaker when other algorithms produce equal candidates. Second, in the "power of two choices" algorithm: pick two servers at random, then send the request to whichever has fewer active connections. This achieves near-optimal load distribution with minimal coordination, and it is used in systems such as the Envoy proxy.
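The two-choice variant described above fits in a few lines. A sketch, assuming a snapshot of per-server connection counts is available:

```python
import random

def power_of_two_choices(connections: dict[str, int]) -> str:
    """Sample two distinct servers uniformly at random, then route
    to whichever has fewer active connections. Only two servers are
    probed per request, yet load stays close to optimal."""
    a, b = random.sample(list(connections), 2)
    return a if connections[a] <= connections[b] else b

# Hypothetical connection counts.
connections = {"s1": 9, "s2": 1, "s3": 5}
choice = power_of_two_choices(connections)
```

The appeal is operational: a full least-connections scan needs an up-to-date view of every server, while this needs fresh state for only the two sampled candidates.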
Comparison Table
| Algorithm | How It Works | Best For | Fails When |
|---|---|---|---|
| Round-robin | Sequential rotation through server list | Identical servers, uniform request cost | Servers differ in capacity or requests differ in cost |
| Weighted round-robin | Round-robin proportional to configured weights | Heterogeneous server fleet with known capacity ratios | Runtime conditions change (weights are static) |
| Least connections | Route to server with fewest active connections | Variable request durations, long-lived connections | Servers have unequal capacity (use weighted variant) |
| IP hash | Hash client IP to deterministically select server | Session affinity without shared session store | Uneven IP distribution (NAT), server pool changes |
| Random | Pick a server at random | Tiebreaking, "power of two choices" variant | Small request volumes (randomness needs large N to converge) |
Choosing an Algorithm
Start with these questions:
- Are all servers identical? If yes, round-robin or least connections. If no, use a weighted variant.
- Do requests vary in processing time? If yes, least connections outperforms round-robin. If no, round-robin is simpler and sufficient.
- Do you need session affinity? If yes, IP hash or sticky sessions at the load balancer level. If no, connection-based algorithms are more efficient.
- Is the server pool stable? If servers are frequently added and removed (autoscaling), IP hash will cause frequent remapping. Least connections adapts naturally.
Systems Thinking Lens
The load balancing algorithm is a control mechanism in a feedback loop. The system measures some signal (connection count, server order, client IP) and uses it to make a routing decision. The quality of that signal determines the quality of the decision.
Round-robin uses no signal at all. It is open-loop control: distribute evenly and hope for the best. Least connections closes the loop by measuring actual server state. IP hash measures a different signal entirely, the client's identity rather than server load, in service of a different goal: client-server affinity.
When you see a system with uneven load despite a load balancer, the first question is: what signal is the algorithm using, and does that signal actually correlate with server load? If the algorithm is round-robin and one server is overloaded, the algorithm literally cannot see the problem. Switching to least connections adds the feedback the system needs to self-correct.
Further Reading
- NGINX, Using nginx as HTTP Load Balancer. Official documentation covering round-robin, least-connected, and IP hash configuration.
- NGINX, HTTP Load Balancing. Extended guide covering weighted methods, health checks, and session persistence.
- AWS, What Is Load Balancing?. Overview of load balancing concepts and algorithm types.
- Sam Rose, Load Balancing. Interactive visual explanation of load balancing algorithms with animated simulations.
- Wikipedia, Load Balancing (Computing). Comprehensive reference covering all major algorithms and their formal properties.
Assignment
You have three servers behind a load balancer:
- Server A: 8 cores, 32 GB RAM
- Server B: 8 cores, 32 GB RAM
- Server C: 16 cores, 64 GB RAM (twice the capacity of A or B)
Answer the following:
- Which load balancing algorithm would you choose for this fleet? What weights would you assign?
- If you used plain round-robin instead, what would happen to Server A and Server B during peak traffic? What would happen to Server C?
- Now suppose requests vary wildly in cost: some take 10ms and others take 5 seconds. Does your answer to question 1 change? If so, what algorithm would you use instead, and why?