Course → Module 2: Scalability, Load Balancing & API Design

The Front Door Problem

As your system grows from one service to many, clients face a problem. Which service handles authentication? Which one enforces rate limits? Where does SSL terminate? If you have 12 microservices, do clients need to know about all 12 endpoints?

The answer is no. You place a single component at the front door that handles cross-cutting concerns and routes requests to the right backend. This component is the API Gateway.

What an API Gateway Does

An API Gateway is a server that acts as the single entry point for all client requests. It receives API calls, applies policies (authentication, rate limiting, transformation), routes them to the appropriate backend service, and returns the response.

The API Gateway pattern was formalized by Chris Richardson on microservices.io and has become a standard component in microservices architectures. Major implementations include AWS API Gateway, Kong, Apigee, and Azure API Management.

An API Gateway typically handles five responsibilities:

1. Request Routing

The gateway maps incoming requests to backend services. A request to /users/123 goes to the User Service. A request to /orders/456 goes to the Order Service. The client sends everything to one hostname, and the gateway figures out where it goes.

This decouples clients from the internal service topology. You can split, merge, or relocate backend services without changing any client code. The gateway absorbs the complexity.
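Prefix-based routing of this kind fits in a few lines. The sketch below is illustrative only; the route table and service addresses are invented, and a real gateway would match on method, host, and version as well:

```python
# Minimal prefix-based request routing, as an API gateway might do it.
# The routes and backend addresses here are made up for illustration.

ROUTES = {
    "/users": "user-service:8001",
    "/orders": "order-service:8002",
    "/payments": "payment-service:8003",
}

def route(path: str) -> str:
    """Map an incoming path to a backend address, longest prefix first."""
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return ROUTES[prefix]
    raise LookupError(f"no backend for {path}")

print(route("/users/123"))   # -> user-service:8001
print(route("/orders/456"))  # -> order-service:8002
```

Matching longest prefix first matters once routes nest (e.g. `/orders` vs. `/orders/exports`): the more specific rule must win.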

2. Authentication and Authorization

Instead of every service implementing its own authentication logic, the gateway verifies identity once at the edge. It validates JWT tokens, checks API keys, or integrates with identity providers (OAuth 2.0, OpenID Connect). The backend services receive pre-authenticated requests with the user identity attached as a header.

This centralizes security logic. One place to patch, one place to audit, one place to update when the authentication scheme changes.
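To make the edge check concrete, here is a minimal sketch of HS256 JWT verification using only the standard library. It is a teaching sketch, not a production validator: it checks only the signature, whereas a real gateway would also validate `exp`, `iss`, `aud`, and the declared algorithm (in practice you would use a maintained JWT library):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(s: str) -> bytes:
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def sign_jwt(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 JWT (included so the example is self-contained)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

def verify_jwt(token: str, secret: bytes) -> dict:
    """What the gateway does at the edge: verify the signature, return claims."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise PermissionError("invalid signature")  # gateway returns 401
    return json.loads(b64url_decode(payload_b64))
```

Once `verify_jwt` succeeds, the gateway can forward the request with the claims attached as a header (e.g. `X-User-Id`), so backends never touch raw credentials.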

3. Rate Limiting

The gateway enforces rate limits before requests reach backend services. A free-tier client gets 100 requests per minute. An enterprise client gets 10,000. Requests that exceed the limit receive a 429 Too Many Requests response without consuming any backend resources.

Rate limiting at the gateway is more efficient than rate limiting at individual services because it protects all services from abuse through a single checkpoint.
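One common implementation of these per-client limits is a token bucket. The sketch below keeps state in memory for a single gateway process; a clustered deployment would typically back this with a shared store such as Redis. The tier numbers are taken from the example above:

```python
import time

class TokenBucket:
    """Per-client token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full: allow an initial burst
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller responds 429 Too Many Requests

# Illustrative tiers, one bucket per client:
free_tier = TokenBucket(rate=100 / 60, capacity=100)          # ~100 req/min
enterprise_tier = TokenBucket(rate=10_000 / 60, capacity=10_000)
```

The capacity controls burst size and the rate controls the sustained average, which is why a token bucket behaves better than a hard per-minute counter at window boundaries.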

4. Request and Response Transformation

The gateway can modify requests and responses in transit. Common transformations include:

  • Injecting, renaming, or stripping headers (for example, attaching a correlation ID or the authenticated user identity)
  • Rewriting paths, such as mapping an external /api/v2 prefix to an internal route
  • Translating protocols, such as accepting REST at the edge while calling gRPC services internally
  • Aggregating responses from several backend services into a single payload
  • Converting payload formats, such as XML to JSON
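A small sketch of the request-side half, with illustrative header names (the `X-User-Id` and `X-Request-Id` conventions are common but not standardized):

```python
import uuid

def transform_request(path: str, headers: dict, user_id: str) -> tuple[str, dict]:
    """Sketch of gateway-side request transformation before forwarding:
    rewrite the external versioned path, drop the raw credential, and
    attach identity and tracing headers for the backend."""
    internal_path = path.replace("/api/v2", "", 1)  # /api/v2/orders -> /orders
    fwd = dict(headers)
    fwd.pop("Authorization", None)            # identity was verified at the edge
    fwd["X-User-Id"] = user_id                # pre-authenticated identity
    fwd["X-Request-Id"] = str(uuid.uuid4())   # correlation ID for tracing
    return internal_path, fwd
```

The backend sees a clean internal path and trusted identity headers, and never needs to parse the client's token itself.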

5. Monitoring and Logging

Because every request passes through the gateway, it is the natural place to collect metrics: request counts, latency distributions, error rates, and traffic patterns per service, per client, and per endpoint. This data feeds into dashboards, alerts, and capacity planning.
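The kind of per-endpoint accounting this implies can be sketched in a few lines. In production this role is usually filled by an existing metrics library (e.g. a Prometheus client) rather than hand-rolled code:

```python
from collections import defaultdict
import statistics

class Metrics:
    """Per-endpoint request counts, error rates, and latency summaries."""

    def __init__(self):
        self.latencies = defaultdict(list)   # endpoint -> latency samples (ms)
        self.errors = defaultdict(int)       # endpoint -> 5xx count

    def record(self, endpoint: str, status: int, latency_ms: float) -> None:
        self.latencies[endpoint].append(latency_ms)
        if status >= 500:
            self.errors[endpoint] += 1

    def summary(self, endpoint: str) -> dict:
        samples = self.latencies[endpoint]
        return {
            "count": len(samples),
            "error_rate": self.errors[endpoint] / len(samples),
            "p50_ms": statistics.median(samples),
        }
```

Because the gateway records every request in one place, a dashboard built on these numbers covers all backend services without instrumenting each one.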

API Gateway vs. Load Balancer

API gateways and load balancers are complementary, not competing. They solve different problems and typically coexist in the same architecture.

| Responsibility | API Gateway | Load Balancer |
|---|---|---|
| Primary purpose | API management and policy enforcement | Traffic distribution across server instances |
| OSI layer | Layer 7 (application) | Layer 4 or Layer 7 |
| Routing logic | API-aware: paths, versions, client identity | Server-aware: health, capacity, connection count |
| Authentication | Yes (JWT, API keys, OAuth) | No (or limited to ALB OIDC) |
| Rate limiting | Yes (per client, per endpoint) | No |
| Request transformation | Yes (headers, body, protocol) | No |
| Health checks | Usually delegates to LB | Yes (active and passive) |
| SSL termination | Yes | Yes (ALB and NLB) |
| Scaling backend | Not its job | Primary job |

The short version: the API Gateway decides what to do with the request. The load balancer decides which server handles it.

Request Flow

In a typical production setup, the request passes through multiple components in sequence. Each one applies a specific transformation or routing decision.

```mermaid
graph LR
    C[Client] --> GW["API Gateway<br/>Auth, rate limit,<br/>transform, route"]
    GW --> LB["Load Balancer<br/>Distribute to<br/>healthy instance"]
    LB --> S1["Service Instance 1"]
    LB --> S2["Service Instance 2"]
    LB --> S3["Service Instance 3"]
```

The client sends a request to the API Gateway. The gateway validates the API key, checks the rate limit, determines which backend service should handle the request, and forwards it. The request then hits a load balancer (one per service or a shared one with path-based routing), which selects a healthy instance of that service. The instance processes the request and sends the response back through the same chain.
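The load balancer's step in that chain, picking a healthy instance, can be sketched as round-robin with health flags. The health flags here are set by hand for illustration; a real load balancer maintains them through active and passive health checks:

```python
import itertools

class RoundRobinBalancer:
    """Pick the next healthy instance in rotation."""

    def __init__(self, instances: list[str]):
        self.instances = instances
        self.healthy = {i: True for i in instances}   # updated by health checks
        self._cycle = itertools.cycle(instances)

    def pick(self) -> str:
        # Skip unhealthy instances; give up after one full rotation.
        for _ in range(len(self.instances)):
            instance = next(self._cycle)
            if self.healthy[instance]:
                return instance
        raise RuntimeError("no healthy instances")
```

When an instance's health flag flips to False, traffic flows around it with no client-visible change, which is exactly the failover behavior the chain above relies on.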

In some architectures, the API Gateway itself sits behind a load balancer (or an NLB for static IPs), and a CDN sits in front of everything for cached content. The full chain can look like this:

```mermaid
graph LR
    C[Client] --> CDN[CDN]
    CDN --> NLB["NLB<br/>(static IP)"]
    NLB --> GW["API Gateway"]
    GW --> ALB["ALB<br/>(path routing)"]
    ALB --> SVC["Service<br/>Instances"]
```

Each hop adds latency. Each component adds operational cost. The systems thinker asks: does each component in this chain justify its existence? If your API Gateway already does path-based routing, do you also need an ALB? If your CDN already terminates SSL, does the gateway need to do it again?

Common API Gateway Products

| Product | Type | Notable Strength |
|---|---|---|
| AWS API Gateway | Managed (serverless) | Native Lambda integration, usage plans, no infrastructure to manage |
| Kong | Open-source / Enterprise | Plugin ecosystem, runs on NGINX, highly extensible |
| Apigee | Managed (Google Cloud) | API analytics, developer portal, monetization |
| Azure API Management | Managed (Azure) | Policy engine, multi-region, developer portal |
| Envoy + custom control plane | Self-managed | Service mesh integration, fine-grained traffic control |

The Gateway Bloat Problem

A warning from Microsoft's microservices architecture guide: as teams add features to the API Gateway, it tends to grow into a monolith of its own. Every team wants their custom routing rule, their special header, their exception to the rate limit.

The solution is the Backends for Frontends (BFF) pattern: instead of one gateway for all clients, you deploy separate gateways for the mobile app, the web app, and the admin dashboard. Each gateway is tailored to its client's needs and maintained by the team that owns that client. This prevents the single gateway from becoming a coordination bottleneck across teams.

Systems Thinking Lens

The API Gateway is a leverage point in the system. Because every request passes through it, a small change at the gateway has a large effect across all services. Adding authentication at the gateway secures every service at once. A misconfigured rate limit at the gateway blocks every client at once.

This is high leverage, which also means high risk. The gateway is a single point of failure for the entire API surface. If it goes down, everything goes down. This is why production gateways are themselves deployed behind load balancers, across multiple availability zones, with health checks and automatic failover.

The feedback loop is clear: more services lead to more routing rules in the gateway, which leads to more complexity, which leads to more risk of misconfiguration, which leads to more outages. The balancing force is decomposition: splitting into multiple gateways or using infrastructure-as-code to keep configuration auditable and version-controlled.

Assignment

Draw the complete request flow for the following scenario. Label every component and describe what each one does to the request as it passes through.

Scenario: A mobile app sends a POST /api/v2/orders request to create a new order. The system has the following components:

  • A CDN (for static assets, not relevant for this POST request)
  • An NLB providing a static IP entry point
  • An API Gateway handling authentication, rate limiting, and version routing
  • An ALB distributing traffic across Order Service instances
  • Three instances of the Order Service

For each component in the chain, answer:

  1. What does this component check or modify on the request?
  2. Under what condition would this component reject the request (return an error) instead of forwarding it?
  3. If this component were removed, what would break or degrade?