API Gateway
Session 2.4 · ~5 min read
The Front Door Problem
As your system grows from one service to many, clients face a problem. Which service handles authentication? Which one enforces rate limits? Where does SSL terminate? If you have 12 microservices, do clients need to know about all 12 endpoints?
The answer is no. You place a single component at the front door that handles cross-cutting concerns and routes requests to the right backend. This component is the API Gateway.
What an API Gateway Does
An API Gateway is a server that acts as the single entry point for all client requests. It receives API calls, applies policies (authentication, rate limiting, transformation), routes them to the appropriate backend service, and returns the response.
The API Gateway pattern was formalized by Chris Richardson on microservices.io and has become a standard component in microservices architectures. Major implementations include AWS API Gateway, Kong, Apigee, and Azure API Management.
An API Gateway typically handles five responsibilities:
1. Request Routing
The gateway maps incoming requests to backend services. A request to /users/123 goes to the User Service. A request to /orders/456 goes to the Order Service. The client sends everything to one hostname, and the gateway figures out where it goes.
This decouples clients from the internal service topology. You can split, merge, or relocate backend services without changing any client code. The gateway absorbs the complexity.
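The mapping above can be sketched as a small prefix-to-service table. This is an illustration only (real gateways such as Kong or AWS API Gateway express routes in configuration, not application code), and the service names are hypothetical:

```python
# Minimal path-prefix router: maps an incoming request path to a backend service.
ROUTES = {
    "/users": "user-service",    # /users/123 -> User Service
    "/orders": "order-service",  # /orders/456 -> Order Service
}

def route(path: str) -> str:
    """Return the backend service for a path, matching the longest prefix first."""
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return ROUTES[prefix]
    raise LookupError(f"no route for {path}")
```

Because the table lives in one place, splitting the Order Service into two backends is a one-line routing change rather than a client release.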
2. Authentication and Authorization
Instead of every service implementing its own authentication logic, the gateway verifies identity once at the edge. It validates JWT tokens, checks API keys, or integrates with identity providers (OAuth 2.0, OpenID Connect). The backend services receive pre-authenticated requests with the user identity attached as a header.
This centralizes security logic. One place to patch, one place to audit, one place to update when the authentication scheme changes.
3. Rate Limiting
The gateway enforces rate limits before requests reach backend services. A free-tier client gets 100 requests per minute. An enterprise client gets 10,000. Requests that exceed the limit receive a 429 Too Many Requests response without consuming any backend resources.
Rate limiting at the gateway is more efficient than rate limiting at individual services because it protects all services from abuse through a single checkpoint.
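One common way to implement the per-client limits above is a token bucket: tokens refill at a steady rate, each request spends one, and an empty bucket means a 429. A minimal in-memory sketch (real gateways typically keep these counters in shared storage such as Redis so that all gateway instances see the same limits):

```python
import time

class TokenBucket:
    """Per-client token bucket: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill based on elapsed time, then try to spend one token.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller responds 429 Too Many Requests
```

A free-tier client would get a bucket with `rate=100/60` (100 requests per minute); an enterprise client gets a larger one.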
4. Request and Response Transformation
The gateway can modify requests and responses in transit. Common transformations include:
- Adding headers (correlation IDs, authentication context)
- Rewriting URL paths (versioning: /v2/users maps to the same service as /v1/users but with a different backend path)
- Aggregating responses from multiple services into a single response
- Protocol translation (accepting REST from clients but calling gRPC services internally)
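Two of these transformations, path rewriting and header injection, can be sketched in a few lines. The internal path and header names here are illustrative, not a standard:

```python
import uuid

def transform(path: str, headers: dict) -> tuple[str, dict]:
    """Rewrite a versioned public path to an internal one and attach a correlation ID."""
    if path.startswith("/v2/"):
        # Public /v2/users maps to a (hypothetical) internal path.
        path = "/internal/" + path[len("/v2/"):]
    headers = dict(headers)  # never mutate the caller's headers
    headers.setdefault("X-Correlation-ID", str(uuid.uuid4()))
    return path, headers
```

The correlation ID is attached once at the edge and propagated by every downstream service, which is what makes cross-service request tracing possible.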
5. Monitoring and Logging
Because every request passes through the gateway, it is the natural place to collect metrics. Request counts, latency distributions, error rates, and traffic patterns per service, per client, per endpoint. This data feeds into dashboards, alerts, and capacity planning.
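The gateway-side bookkeeping amounts to a few counters keyed by service and endpoint. A toy sketch (a real deployment would export these to a metrics system such as Prometheus rather than hold them in memory):

```python
from collections import defaultdict

class Metrics:
    """Gateway-side counters: requests, errors, and latencies per (service, endpoint)."""

    def __init__(self):
        self.requests = defaultdict(int)
        self.errors = defaultdict(int)
        self.latencies_ms = defaultdict(list)

    def record(self, service: str, endpoint: str, status: int, latency_ms: float):
        key = (service, endpoint)
        self.requests[key] += 1
        self.latencies_ms[key].append(latency_ms)
        if status >= 500:
            self.errors[key] += 1
```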
API Gateway vs. Load Balancer
API gateways and load balancers are complementary, not competing. They solve different problems and typically coexist in the same architecture.
| Responsibility | API Gateway | Load Balancer |
|---|---|---|
| Primary purpose | API management and policy enforcement | Traffic distribution across server instances |
| OSI layer | Layer 7 (application) | Layer 4 or Layer 7 |
| Routing logic | API-aware: paths, versions, client identity | Server-aware: health, capacity, connection count |
| Authentication | Yes (JWT, API keys, OAuth) | No (or limited to ALB OIDC) |
| Rate limiting | Yes (per client, per endpoint) | No |
| Request transformation | Yes (headers, body, protocol) | No |
| Health checks | Usually delegates to LB | Yes (active and passive) |
| SSL termination | Yes | Yes (ALB and NLB) |
| Scaling backend | Not its job | Primary job |
The short version: the API Gateway decides what to do with the request. The load balancer decides which server handles it.
Request Flow
In a typical production setup, the request passes through multiple components in sequence. Each one applies a specific transformation or routing decision.
```mermaid
graph LR
    Client --> GW["API Gateway<br>Auth, rate limit,<br>transform, route"]
    GW --> LB["Load Balancer<br>Distribute to<br>healthy instance"]
    LB --> S1["Service Instance 1"]
    LB --> S2["Service Instance 2"]
    LB --> S3["Service Instance 3"]
```
The client sends a request to the API Gateway. The gateway validates the API key, checks the rate limit, determines which backend service should handle the request, and forwards it. The request then hits a load balancer (one per service or a shared one with path-based routing), which selects a healthy instance of that service. The instance processes the request and sends the response back through the same chain.
In some architectures, the API Gateway itself sits behind a load balancer (or an NLB for static IPs), and a CDN sits in front of everything for cached content. The full chain can look like this:
```mermaid
graph LR
    Client --> CDN["CDN"]
    CDN --> NLB["NLB<br>(static IP)"]
    NLB --> GW["API Gateway"]
    GW --> ALB["ALB<br>(path routing)"]
    ALB --> SVC["Service<br>Instances"]
```
Each hop adds latency. Each component adds operational cost. The systems thinker asks: does each component in this chain justify its existence? If your API Gateway already does path-based routing, do you also need an ALB? If your CDN already terminates SSL, does the gateway need to do it again?
Common API Gateway Products
| Product | Type | Notable Strength |
|---|---|---|
| AWS API Gateway | Managed (serverless) | Native Lambda integration, usage plans, no infrastructure to manage |
| Kong | Open-source / Enterprise | Plugin ecosystem, runs on NGINX, highly extensible |
| Apigee | Managed (Google Cloud) | API analytics, developer portal, monetization |
| Azure API Management | Managed (Azure) | Policy engine, multi-region, developer portal |
| Envoy + custom control plane | Self-managed | Service mesh integration, fine-grained traffic control |
The Gateway Bloat Problem
A warning from Microsoft's microservices architecture guide: as teams add features to the API Gateway, it tends to grow into a monolith of its own. Every team wants their custom routing rule, their special header, their exception to the rate limit.
The solution is the Backends for Frontends (BFF) pattern: instead of one gateway for all clients, you deploy separate gateways for the mobile app, the web app, and the admin dashboard. Each gateway is tailored to its client's needs and maintained by the team that owns that client. This prevents the single gateway from becoming a coordination bottleneck across teams.
Systems Thinking Lens
The API Gateway is a leverage point in the system. Because every request passes through it, a small change at the gateway has a large effect across all services. Adding authentication at the gateway secures every service at once. A misconfigured rate limit at the gateway blocks every client at once.
This is high leverage, which also means high risk. The gateway is a single point of failure for the entire API surface. If it goes down, everything goes down. This is why production gateways are themselves deployed behind load balancers, across multiple availability zones, with health checks and automatic failover.
The feedback loop is clear: more services lead to more routing rules in the gateway, which leads to more complexity, which leads to more risk of misconfiguration, which leads to more outages. The balancing force is decomposition: splitting into multiple gateways or using infrastructure-as-code to keep configuration auditable and version-controlled.
Further Reading
- Chris Richardson, Pattern: API Gateway / Backends for Frontends. The canonical pattern definition from microservices.io.
- AWS, API Gateway Pattern. AWS Prescriptive Guidance on implementing the pattern in cloud architectures.
- Microsoft, The API Gateway Pattern vs. Direct Client-to-Microservice Communication. Thorough comparison with architecture diagrams and tradeoff analysis.
- Kong, API Gateway vs Load Balancer. Practical comparison of when to use each and how they complement each other.
- Microsoft, API Gateways in Microservices. Azure Architecture Center guide covering gateway design considerations.
Assignment
Draw the complete request flow for the following scenario. Label every component and describe what each one does to the request as it passes through.
Scenario: A mobile app sends a POST /api/v2/orders request to create a new order. The system has the following components:
- A CDN (for static assets, not relevant for this POST request)
- An NLB providing a static IP entry point
- An API Gateway handling authentication, rate limiting, and version routing
- An ALB distributing traffic across Order Service instances
- Three instances of the Order Service
For each component in the chain, answer:
- What does this component check or modify on the request?
- Under what condition would this component reject the request (return an error) instead of forwarding it?
- If this component were removed, what would break or degrade?