Load Balancing Algorithms
Session 2.3 · ~5 min read
The Algorithm Decides
A load balancer receives a request. It has five healthy servers behind it. Which one gets the request?
That decision is made by a load balancing algorithm. The algorithm you choose determines how traffic is distributed, how well your system handles heterogeneous servers, and how it behaves under uneven load. The right algorithm depends on your traffic pattern, your server fleet, and what assumptions you can safely make.
Round-Robin
Round-robin distributes requests sequentially across all servers in a fixed rotation. Server 1, then Server 2, then Server 3, then back to Server 1. Every server gets the same number of requests.
Round-robin is the simplest algorithm and the default in most load balancers, including NGINX. It works well when all servers have identical hardware specs and all requests take roughly the same time to process.
It fails when either of those assumptions breaks. If one server is slower than the others, it still receives the same number of requests, creating a bottleneck. If some requests are expensive (a complex database query) and others are cheap (serving a cached response), round-robin ignores this completely. It distributes requests evenly, not load evenly.
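The rotation itself is trivial to express. A minimal sketch in Python (the server names are placeholders, not from the original text):

```python
from itertools import cycle

# Hypothetical server pool for illustration.
servers = ["server-1", "server-2", "server-3"]

# cycle() yields the list in a fixed rotation, one server per request.
rotation = cycle(servers)

# The first six requests land on each server exactly twice,
# regardless of how expensive any individual request is.
first_six = [next(rotation) for _ in range(6)]
```

Note that the rotation has no input other than the server list: it cannot react to a slow server or an expensive request.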
Weighted Round-Robin
Weighted round-robin assigns each server a weight proportional to its capacity. A server with weight 3 receives three requests for every one request sent to a server with weight 1.
This solves the heterogeneous server problem. If Server A has 8 CPU cores and Server B has 4, you give Server A a weight of 2 and Server B a weight of 1. The load balancer sends twice as many requests to Server A.
The weights are static. You configure them once and they do not change unless you update the configuration. This means weighted round-robin cannot adapt to runtime conditions like one server running a memory-intensive background job.
```mermaid
graph LR
    LB -->|"Requests 1, 2, 3"| A["Server A<br/>weight: 3"]
    LB -->|"Request 4"| B["Server B<br/>weight: 1"]
    LB -->|"Requests 5, 6"| C["Server C<br/>weight: 2"]
    LB -->|"Requests 7, 8, 9"| A
    LB -->|"Request 10"| B
```
In this example, for every 6 requests, Server A gets 3, Server C gets 2, and Server B gets 1. The distribution matches their configured capacity.
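One way to produce this distribution is the "smooth" weighted round-robin used by NGINX, which interleaves servers rather than sending bursts to the heaviest one. A sketch, using the weights from the example above:

```python
def smooth_weighted_rr(weights: dict[str, int], n: int) -> list[str]:
    """Smooth weighted round-robin (the variant NGINX uses).

    Each pick: add every server's weight to its running score,
    choose the highest score, then subtract the total weight
    from the winner. Over a full cycle, each server is picked
    in proportion to its weight, without long bursts.
    """
    current = {s: 0 for s in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for s, w in weights.items():
            current[s] += w
        best = max(current, key=current.get)
        current[best] -= total
        picks.append(best)
    return picks

# Weights from the diagram: A=3, B=1, C=2. Over any 6 picks,
# A is chosen 3 times, C twice, and B once.
order = smooth_weighted_rr({"A": 3, "B": 1, "C": 2}, 6)
```

The "smoothing" matters in practice: a naive expansion (A, A, A, C, C, B) would send three consecutive requests to Server A, while this variant spreads them out.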
Least Connections
Least connections sends each new request to the server with the fewest active connections at that moment. It is a dynamic algorithm that adapts to real-time load.
This is the right choice when requests have highly variable processing times. A server handling a long-running request will have that connection counted against it, so new requests will be routed elsewhere. Servers that finish requests quickly will naturally receive more.
Least connections is particularly effective for WebSocket applications, database connection pools, and any workload where some requests hold connections open much longer than others.
The weakness is that it still assumes all servers have equal capacity. A small server with 2 active connections and a large server with 2 active connections are treated identically, even though the large server could handle 10 more. This is solved by weighted least connections, which divides the connection count by the server's weight before comparing.
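The weighted variant is a one-line change to the selection rule: compare each server's connection count divided by its weight, rather than the raw count. A sketch with a hypothetical snapshot of server state:

```python
# Hypothetical snapshot: both servers have 2 active connections,
# but "large" has four times the configured capacity.
servers = {
    "small": {"connections": 2, "weight": 1},
    "large": {"connections": 2, "weight": 4},
}

def pick_weighted_least_connections(servers: dict) -> str:
    """Route to the server with the lowest connections-to-weight ratio."""
    return min(servers, key=lambda s: servers[s]["connections"] / servers[s]["weight"])

# Plain least connections would treat the two servers as a tie;
# the weighted ratio (2/4 = 0.5 vs 2/1 = 2.0) prefers "large".
choice = pick_weighted_least_connections(servers)
```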
IP Hash
IP hash computes a hash of the client's IP address and uses the result to deterministically select a server. The same client IP always maps to the same server.
IP hash provides session affinity without cookies or shared session stores. If a user's session data is stored in memory on Server B, IP hash ensures that user's requests always go to Server B.
The tradeoff is uneven distribution. IP hash does not guarantee equal traffic across servers. A single corporate office behind a NAT gateway might have thousands of users sharing one IP address, all routed to the same server. If a server goes down, all clients mapped to it are remapped, which invalidates their sessions.
IP hash is also disrupted by adding or removing servers. When the number of servers changes, the hash function redistributes most clients to different servers. Consistent hashing (covered in Session 3.6) solves this by minimizing remapping when the server pool changes.
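The core mechanism is just a stable hash modulo the pool size. A sketch (MD5 is chosen here for illustration; real load balancers use their own hash functions):

```python
import hashlib

def ip_hash(client_ip: str, servers: list[str]) -> str:
    """Deterministically map a client IP to one server in the pool."""
    digest = hashlib.md5(client_ip.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]

servers = ["server-1", "server-2", "server-3"]

# The same client IP always lands on the same server:
assert ip_hash("203.0.113.7", servers) == ip_hash("203.0.113.7", servers)
```

The `% len(servers)` step is exactly why changing the pool size is disruptive: almost every hash value maps to a different index once the modulus changes, which is the problem consistent hashing addresses.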
Random
Random selection picks a server at random for each request. With a large number of requests, the distribution approaches uniform.
Random selection is rarely the primary algorithm, but it appears in two important contexts. First, as a tiebreaker when other algorithms produce equal candidates. Second, in the "power of two choices" algorithm: pick two servers at random, then send the request to whichever has fewer active connections. This achieves near-optimal load distribution with minimal coordination, and it is used in systems such as the Envoy proxy.
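The two-choice variant described above fits in a few lines. A sketch, assuming a snapshot of per-server connection counts is available:

```python
import random

def power_of_two_choices(connections: dict[str, int]) -> str:
    """Sample two distinct servers uniformly at random, then route
    to whichever has fewer active connections. Only two servers are
    probed per request, yet load stays close to optimal."""
    a, b = random.sample(list(connections), 2)
    return a if connections[a] <= connections[b] else b

# Hypothetical connection counts.
connections = {"s1": 9, "s2": 1, "s3": 5}
choice = power_of_two_choices(connections)
```

The appeal is operational: a full least-connections scan needs an up-to-date view of every server, while this needs fresh state for only the two sampled candidates.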
Comparison Table
| Algorithm | How It Works | Best For | Fails When |
|---|---|---|---|
| Round-robin | Sequential rotation through server list | Identical servers, uniform request cost | Servers differ in capacity or requests differ in cost |
| Weighted round-robin | Round-robin proportional to configured weights | Heterogeneous server fleet with known capacity ratios | Runtime conditions change (weights are static) |
| Least connections | Route to server with fewest active connections | Variable request durations, long-lived connections | Servers have unequal capacity (use weighted variant) |
| IP hash | Hash client IP to deterministically select server | Session affinity without shared session store | Uneven IP distribution (NAT), server pool changes |
| Random | Pick a server at random | Tiebreaking, "power of two choices" variant | Small request volumes (randomness needs large N to converge) |
Choosing an Algorithm
Start with these questions:
- Are all servers identical? If yes, round-robin or least connections. If no, use a weighted variant.
- Do requests vary in processing time? If yes, least connections outperforms round-robin. If no, round-robin is simpler and sufficient.
- Do you need session affinity? If yes, IP hash or sticky sessions at the load balancer level. If no, connection-based algorithms are more efficient.
- Is the server pool stable? If servers are frequently added and removed (autoscaling), IP hash will cause frequent remapping. Least connections adapts naturally.
Systems Thinking Lens
The load balancing algorithm is a control mechanism in a feedback loop. The system measures some signal (connection count, server order, client IP) and uses it to make a routing decision. The quality of that signal determines the quality of the decision.
Round-robin uses no signal at all. It is open-loop control: distribute evenly and hope for the best. Least connections closes the loop by measuring actual server state. IP hash measures a different signal entirely, the client's identity rather than server load, in service of a different goal: client-server affinity.
When you see a system with uneven load despite a load balancer, the first question is: what signal is the algorithm using, and does that signal actually correlate with server load? If the algorithm is round-robin and one server is overloaded, the algorithm literally cannot see the problem. Switching to least connections adds the feedback the system needs to self-correct.
Further Reading
- NGINX, Using nginx as HTTP Load Balancer. Official documentation covering round-robin, least-connected, and IP hash configuration.
- NGINX, HTTP Load Balancing. Extended guide covering weighted methods, health checks, and session persistence.
- AWS, What Is Load Balancing?. Overview of load balancing concepts and algorithm types.
- Sam Rose, Load Balancing. Interactive visual explanation of load balancing algorithms with animated simulations.
- Wikipedia, Load Balancing (Computing). Comprehensive reference covering all major algorithms and their formal properties.
Assignment
You have three servers behind a load balancer:
- Server A: 8 cores, 32 GB RAM
- Server B: 8 cores, 32 GB RAM
- Server C: 16 cores, 64 GB RAM (twice the capacity of A or B)
Answer the following:
- Which load balancing algorithm would you choose for this fleet? What weights would you assign?
- If you used plain round-robin instead, what would happen to Server A and Server B during peak traffic? What would happen to Server C?
- Now suppose requests vary wildly in cost: some take 10ms and others take 5 seconds. Does your answer to question 1 change? If so, what algorithm would you use instead, and why?