What Load Balancers Do

A load balancer sits between clients and a pool of servers. It receives incoming requests and forwards each one to a server that can handle it. The goals are straightforward: distribute traffic evenly, detect and route around failed servers, and allow the backend to scale without clients needing to know about it.
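The three goals above can be illustrated with a toy round-robin balancer. This is a minimal sketch, not how any real load balancer is implemented: the class name `RoundRobinBalancer`, its methods, and the server names are all hypothetical, and health status is set by hand instead of by real health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: rotates through servers, skipping unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        # In a real system a failed health check would trigger this.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Look at each server at most once per call so we never loop forever.
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_round = [lb.pick() for _ in range(3)]    # each server gets one request
lb.mark_down("app-2")
after_failure = [lb.pick() for _ in range(4)]  # app-2 no longer receives traffic
```

Clients only ever call `pick()`; they never learn which servers exist or which ones are down, which is what lets the backend pool change without client involvement.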

But not all load balancers work the same way. The two major categories, defined by where they operate in the network stack, are Layer 4 (transport) and Layer 7 (application). In AWS terminology, these map to the Network Load Balancer (NLB) and the Application Load Balancer (ALB).

The OSI Model Context

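To understand the difference, you need to know two layers of the OSI model: Layer 4 (transport), where TCP and UDP operate, and Layer 7 (application), where protocols such as HTTP operate.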

The layer at which a load balancer operates determines what it can see and what routing decisions it can make.

graph TB
    Client[Client] --> L7["Layer 7 (ALB)<br/>Reads HTTP headers, paths, cookies"]
    Client --> L4["Layer 4 (NLB)<br/>Reads TCP/UDP: IP + port only"]
    L7 -->|"/api/*"| Backend1[API Servers]
    L7 -->|"/static/*"| Backend2[Static Servers]
    L4 -->|"TCP forward"| Backend3[Server Pool]

Application Load Balancer (ALB)

ALB operates at Layer 7. It understands HTTP and HTTPS, can inspect request content, and makes routing decisions based on URL paths, hostnames, HTTP headers, query strings, and source IP.
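To make content-based routing concrete, here is a small sketch of first-match rule evaluation on host and path. It only illustrates the idea and is not the ALB's actual rule engine: the `Rule` type, the `route` function, the target group names, and the hostnames are hypothetical, and wildcard matching is borrowed from Python's `fnmatch` module.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Rule:
    target_group: str
    path_pattern: str = "*"   # "*" matches any path
    host_pattern: str = "*"   # "*" matches any hostname


def route(rules, host, path, default="default-servers"):
    """Return the target group of the first rule matching both host and path."""
    for rule in rules:
        if fnmatchcase(host, rule.host_pattern) and fnmatchcase(path, rule.path_pattern):
            return rule.target_group
    return default


# Hypothetical rule set, evaluated top to bottom.
RULES = [
    Rule("api-servers", path_pattern="/api/*"),
    Rule("static-servers", path_pattern="/static/*"),
    Rule("admin-servers", host_pattern="admin.example.com"),
]

api_target = route(RULES, "www.example.com", "/api/v1/users")
static_target = route(RULES, "www.example.com", "/static/logo.png")
fallback_target = route(RULES, "www.example.com", "/")
```

Every input to `route` comes from the parsed HTTP request, which is why this kind of decision is only possible at Layer 7.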

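ALB is the default choice for web applications. It supports HTTP, HTTPS, WebSocket, and gRPC traffic, TLS termination (offloading), and cookie-based sticky sessions.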

The tradeoff is latency. Because the ALB reads and parses the full HTTP request before making a routing decision, it adds processing time. For most web applications, this overhead is negligible (single-digit milliseconds). For ultra-low-latency workloads, it matters.

Network Load Balancer (NLB)

NLB operates at Layer 4. It routes TCP and UDP traffic based on IP addresses and port numbers without inspecting the application-layer content. It is designed for extreme throughput and ultra-low latency.

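NLB is built for raw performance. It supports TCP, UDP, and TLS listeners, a static IP address per Availability Zone, native client IP preservation, and PrivateLink.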

The tradeoff is intelligence. NLB cannot route based on URL paths, HTTP headers, or cookies. It sees packets, not requests. If you need content-aware routing, NLB cannot do it alone.
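A Layer 4 decision can be sketched as a hash over connection metadata. The version below is a simplification under stated assumptions: the function name `pick_target` and the target names are hypothetical, and SHA-256 over a five-tuple stands in for whatever flow-hash algorithm a real NLB uses.

```python
import hashlib


def pick_target(targets, protocol, src_ip, src_port, dst_ip, dst_port):
    """Choose a target from connection metadata only (no HTTP content)."""
    key = f"{protocol}|{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer and map it onto the pool.
    index = int.from_bytes(digest[:8], "big") % len(targets)
    return targets[index]


TARGETS = ["server-a", "server-b", "server-c"]  # hypothetical pool

# Packets from the same flow always hash to the same target.
first = pick_target(TARGETS, "tcp", "203.0.113.7", 51000, "198.51.100.10", 443)
second = pick_target(TARGETS, "tcp", "203.0.113.7", 51000, "198.51.100.10", 443)
```

Notice that the function has no parameter for a URL path, header, or cookie. Nothing in its inputs could express a rule like "send `/api/*` to the API servers".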

Comparison Table

| Dimension | ALB (Layer 7) | NLB (Layer 4) |
| --- | --- | --- |
| OSI Layer | Layer 7 (Application) | Layer 4 (Transport) |
| Protocols | HTTP, HTTPS, WebSocket, gRPC | TCP, UDP, TLS |
| Routing intelligence | Path, host, header, query string, source IP | IP address and port only |
| SSL/TLS handling | Terminates and re-encrypts (offloading) | Passthrough or terminate |
| Static IP | No (DNS-based, IP can change) | Yes (one per AZ) |
| Latency | Low (single-digit ms) | Ultra-low (sub-ms) |
| Throughput | High | Extreme (millions of requests/sec) |
| Source IP preservation | Via X-Forwarded-For header | Native (client IP visible directly) |
| Sticky sessions | Cookie-based | Source IP-based |
| PrivateLink | Not supported | Supported |
| Cost model | Per hour + LCU (capacity units) | Per hour + NLCU (capacity units) |
| Best for | Web apps, REST APIs, microservices | IoT, gaming, real-time, non-HTTP protocols |
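The "Source IP preservation" row affects backend code. Below is a hedged sketch of how a backend might recover the client address in each case. The function name `client_ip` and the `trusted_proxy_hops` parameter are hypothetical, and the sketch assumes each trusted proxy appends the address it saw to the right of any existing `X-Forwarded-For` value.

```python
def client_ip(peer_ip, headers, trusted_proxy_hops=1):
    """Best-effort client IP for a request that may have crossed proxies.

    peer_ip            -- address of the TCP peer as seen by this server
    headers            -- dict of HTTP request headers
    trusted_proxy_hops -- number of proxies we trust to have appended an
                          address (0 when the client IP is preserved natively)
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    forwarded = normalized.get("x-forwarded-for", "")
    hops = [part.strip() for part in forwarded.split(",") if part.strip()]
    if trusted_proxy_hops <= 0 or len(hops) < trusted_proxy_hops:
        # No trusted header information: the TCP peer is the best answer.
        return peer_ip
    # Count from the right: entries further left were supplied by the client
    # and can be spoofed.
    return hops[-trusted_proxy_hops]


# Behind a Layer 7 balancer: the TCP peer is the balancer, so read the header.
behind_l7 = client_ip("10.0.1.5", {"X-Forwarded-For": "1.2.3.4, 203.0.113.7"})

# Behind a Layer 4 balancer that preserves the source IP: use the TCP peer.
behind_l4 = client_ip("203.0.113.7", {}, trusted_proxy_hops=0)
```

Counting from the right instead of taking the first entry is a deliberate choice: a client can put anything it likes in the header before the request reaches the load balancer.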

When to Use Which

The decision usually comes down to what the load balancer needs to understand about the traffic.

If your load balancer needs to inspect HTTP content to make routing decisions (path-based routing, host-based routing, header inspection), use ALB. If it just needs to forward packets as fast as possible, use NLB.

A common production pattern combines both: an NLB as the external entry point (for static IPs and PrivateLink) that forwards to an ALB for content-based routing. This gives you the best of both layers, at the cost of additional hops and complexity.
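The decision rule and the combined pattern can be written down as a small helper. This is only a sketch of the reasoning in this section, not an official selection guide: the function name and its three boolean parameters are hypothetical, and real decisions involve more factors (cost, protocol, latency budget).

```python
def choose_load_balancer(needs_http_routing, needs_static_ip=False, needs_privatelink=False):
    """Map requirements to "ALB", "NLB", or the combined "NLB -> ALB" pattern."""
    needs_l4_features = needs_static_ip or needs_privatelink
    if needs_http_routing and needs_l4_features:
        # Static IPs or PrivateLink at the edge, content-based routing behind it.
        return "NLB -> ALB"
    if needs_http_routing:
        return "ALB"
    return "NLB"


content_routing_only = choose_load_balancer(needs_http_routing=True)
raw_forwarding = choose_load_balancer(needs_http_routing=False, needs_static_ip=True)
both = choose_load_balancer(needs_http_routing=True, needs_static_ip=True)
```

The combined result is the only branch that adds an extra hop, which is the cost mentioned above.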

Systems Thinking Lens

The choice of load balancer is not a local decision. It propagates through the system. Choosing ALB means your backend servers do not need to handle TLS termination, saving CPU. But it also means you depend on ALB's connection limits and request processing latency. Choosing NLB means your servers see the real client IP natively, but your application code must handle routing logic that ALB would have done for you.

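Every capability you move into the load balancer is work you remove from your application, but it is also processing you add to every request that passes through the load balancer. Every capability you keep in the application keeps the load balancer simple and fast, but your servers must implement it and pay for it in CPU and code. This is a tradeoff, not a best practice.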

Assignment

For each scenario below, choose the appropriate load balancer type (ALB or NLB) and explain your reasoning:

  1. A REST API where /api/v1/* routes to the backend service cluster and /static/* routes to a CDN origin server. The team wants SSL offloading so backend servers do not handle TLS.
  2. An IoT platform that accepts 1 million persistent TCP connections from embedded sensors. Each sensor sends a 64-byte payload every 30 seconds. Partners need to whitelist specific IP addresses in their firewalls.
  3. A microservices application with 12 services. The team wants the load balancer to handle SSL termination for all HTTPS traffic and authenticate users via an OIDC provider before requests reach any backend service.

For each answer, identify what would go wrong if you chose the other load balancer type instead.