Module 1: Architectural Foundations & Core Concepts
Systems Thinking × System Design · 10 sessions · ~50 min read
The Foundation of Networked Computing
Nearly every application you use today follows the same structural pattern: one process asks for something, another process provides it. The asker is the client. The provider is the server. This separation of roles is the client-server model, and it has shaped how we build software since the 1960s.
Your browser is a client. When you type a URL and press Enter, it sends a request to a server. The server processes that request, retrieves or computes the appropriate data, and sends back a response. That exchange, from request to response, is the fundamental unit of interaction in networked systems.
Client-Server Model: An architecture in which a client initiates requests and a server fulfills them. The client and server are separate processes, often running on separate machines, communicating over a network using a defined protocol.
The Request-Response Cycle
Every HTTP interaction follows a predictable sequence. The client opens a connection to the server, sends a request containing a method (GET, POST, PUT, DELETE, etc.), a target resource (the URL path), headers (metadata about the request), and optionally a body (the data payload). The server reads the request, performs the work, and sends back a response containing a status code (200 OK, 404 Not Found, 500 Internal Server Error, etc.), headers, and optionally a body.
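To make the message anatomy concrete, here is a minimal sketch that assembles the raw bytes of an HTTP/1.1 request from the parts named above (method, target, headers, body). The host and payload are hypothetical; a real client would also open a TCP connection and send these bytes.

```python
# Build the raw bytes of an HTTP/1.1 request: request line, headers,
# a blank line, then the optional body. Host and body are hypothetical.

def build_request(method: str, path: str, host: str, body: str = "") -> bytes:
    lines = [
        f"{method} {path} HTTP/1.1",            # method + target resource
        f"Host: {host}",                        # required header in HTTP/1.1
        f"Content-Length: {len(body.encode())}",
        "Connection: close",
    ]
    # Headers end with a blank line; whatever follows is the body.
    return ("\r\n".join(lines) + "\r\n\r\n" + body).encode()

raw = build_request("POST", "/users", "api.example.com", body='{"name":"Alice"}')
print(raw.decode())
```

The same structure applies in reverse to the response: a status line, headers, a blank line, and an optional body.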
This cycle repeats for every interaction. Loading a single web page might trigger dozens of request-response cycles: one for the HTML document, several for CSS and JavaScript files, more for images, and additional ones for API calls that fetch dynamic data.
The key property of HTTP, as defined in RFC 7230, is that each request-response pair is an independent transaction. The server does not inherently remember anything about previous requests from the same client. This is what we mean by statelessness.
Stateless vs. Stateful Interactions
HTTP is a stateless protocol by design. Each request carries all the information the server needs to fulfill it. The server does not retain any memory of previous interactions between requests. This simplifies server implementation enormously: any server in a cluster can handle any request, because no request depends on what happened before.
But users expect continuity. When you log in to an application, you expect to stay logged in as you navigate between pages. When you add items to a shopping cart, you expect them to still be there when you visit the checkout page. These expectations require state, meaning data that persists across multiple request-response cycles.
This creates a tension. The protocol is stateless, but the application needs to be stateful. Resolving this tension is one of the core challenges in web application architecture.
| Dimension | Stateless | Stateful |
|---|---|---|
| Server memory | No request context retained between calls | Server tracks client context across requests |
| Scalability | Easy. Any server can handle any request. | Harder. Client must reach the same server, or state must be shared. |
| Reliability | Server crash loses nothing. Client retries freely. | Server crash may lose session data. |
| Complexity | Simpler server logic | Requires session storage, replication, or sticky routing |
| Example | DNS lookup, static file serving, REST API call | Shopping cart, user login session, WebSocket connection |
| Bandwidth | May be higher (client resends context each time) | Lower per-request (context stored on server) |
Session Management Strategies
When an application needs state on top of a stateless protocol, it must use a session management strategy. There are several approaches, each with distinct trade-offs.
1. Server-Side Sessions with Cookies
The most traditional approach. When a user logs in, the server creates a session object in memory (or in a database) and assigns it a unique session ID. This ID is sent to the client as a cookie via the Set-Cookie header, as specified in RFC 6265. On every subsequent request, the browser automatically includes this cookie. The server looks up the session ID, retrieves the stored state, and processes the request with full context.
The advantage is simplicity and security: the actual session data never leaves the server. The disadvantage is that the server must store and manage session data. In a multi-server deployment, either all servers must share a session store (e.g., Redis), or the load balancer must route the same client to the same server (sticky sessions).
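A minimal sketch of this flow, assuming an in-memory dictionary as the session store (a real deployment would use Redis or a database, as noted above). The cookie attributes shown are standard, but the usernames and store layout are illustrative only.

```python
import secrets

# In-memory session store keyed by opaque session IDs.
# Sketch only: a multi-server deployment would back this with Redis.
SESSIONS: dict[str, dict] = {}

def log_in(username: str) -> str:
    """Create a session and return the Set-Cookie header value."""
    session_id = secrets.token_urlsafe(32)        # unguessable opaque ID
    SESSIONS[session_id] = {"user": username}     # state stays on the server
    return f"session_id={session_id}; HttpOnly; Secure; SameSite=Lax"

def handle_request(cookie_header: str) -> str:
    """Look up the session named by the Cookie header the browser sent back."""
    session_id = cookie_header.removeprefix("session_id=")
    session = SESSIONS.get(session_id)
    return session["user"] if session else "anonymous"

set_cookie = log_in("alice")
cookie = set_cookie.split(";")[0]                 # the browser echoes this part back
print(handle_request(cookie))                     # -> alice
```

Note that the client only ever holds the opaque ID; the actual state lives in `SESSIONS`, which is exactly why every server needs access to that store.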
2. Token-Based Authentication (JWT)
JSON Web Tokens take a different approach. Instead of storing state on the server, the server encodes the session data (user ID, permissions, expiration time) into a signed token and hands it to the client. The client includes this token in the Authorization header of every request. The server validates the token's signature and extracts the session data without touching any session store.
This approach scales well because any server can validate the token independently. But it introduces new challenges: tokens cannot be easily revoked once issued, and they increase the size of every request. If a token is stolen, the attacker has access until it expires.
3. Client-Side Storage
For non-sensitive state, applications can store data directly in the client using localStorage, sessionStorage, or cookies. Shopping cart contents, user preferences, and UI state are common candidates. This eliminates server-side storage entirely for that data, but the client cannot be trusted. Any data stored client-side can be inspected and modified by the user.
4. Hybrid Approaches
Most production systems combine strategies. A JWT handles authentication (who you are), server-side sessions handle authorization context (what you can do right now), and client-side storage handles UI preferences. The choice depends on security requirements, scalability needs, and how much trust you place in the client.
Systems Thinking Lens
The client-server model is a system with feedback loops. Server load affects response time. Response time affects user behavior. User behavior affects request volume, which feeds back into server load. A slow server causes users to retry, which increases load, which makes the server slower. This is a reinforcing feedback loop, the same concept from Session 0.4.
Session management adds another dimension. Stateful servers introduce coupling between the client and a specific server instance. This coupling constrains how you scale, how you handle failures, and how you deploy updates. Every architectural decision in this space creates downstream constraints on the rest of the system.
Understanding these trade-offs is not about memorizing which approach is "best." It is about recognizing that each choice shifts the balance of complexity, security, and scalability in predictable ways.
Further Reading
- RFC 7230: HTTP/1.1 Message Syntax and Routing. The specification that defines HTTP's request-response model and its stateless nature.
- RFC 6265: HTTP State Management Mechanism. The specification for cookies, the primary mechanism for adding state to HTTP.
- MDN: Using HTTP Cookies. A practical guide to how cookies work in browsers, including security attributes.
- Wikipedia: Client-Server Model. Historical context and variations of the architecture.
Assignment
Open your browser's developer tools (F12 or Ctrl+Shift+I) and go to the Network tab. Navigate to any website you use regularly. Watch the requests appear.
- Identify 3 different requests in the list. For each one, note the method (GET/POST), the URL, and the status code.
- Click on each request and look at the Request Headers. Does the request carry a Cookie header? Does it carry an Authorization header? If it carries neither, it is a stateless request. If it carries either, it is transporting session state.
- For requests that carry session state, determine the strategy: is it a session cookie (short opaque string) or a JWT (long Base64-encoded string with two dots)?

Write down your findings. You have just traced how a real application manages the tension between a stateless protocol and stateful user experience.
The Network Stack
Before a client can talk to a server, several layers of networking infrastructure must cooperate. Every request you send travels through a stack of protocols, each handling a different concern: addressing, routing, reliability, and application logic. Understanding these layers is essential for diagnosing performance issues and making informed architectural decisions.
This session covers the protocols and components that make networked communication possible: IP, DNS, TCP, UDP, HTTP/HTTPS, WebSocket, and proxies.
IP: Addressing and Routing
The Internet Protocol (IP) is responsible for one thing: getting a packet from one machine to another. Every device on a network has an IP address. IPv4 addresses are 32-bit numbers written as four octets (e.g., 192.168.1.1), giving roughly 4.3 billion possible addresses. IPv6 addresses are 128-bit, written in hexadecimal groups (e.g., 2001:0db8::1), providing a vastly larger address space.
IP is a best-effort protocol. It routes packets toward their destination, but it does not guarantee delivery, ordering, or integrity. Those guarantees are the job of the transport layer above it.
DNS: Translating Names to Addresses
Humans use domain names (google.com, hibranwar.com). Machines use IP addresses. The Domain Name System, defined in RFC 1035, bridges that gap. DNS is a distributed, hierarchical database that maps domain names to IP addresses.
When you type a URL into your browser, a DNS resolution process begins. It follows a chain of servers, each responsible for a different level of the domain hierarchy.
The recursive resolver (often provided by your ISP or a service like Cloudflare's 1.1.1.1) does the heavy lifting. It contacts root servers, TLD servers, and authoritative servers on your behalf. Results are cached at multiple levels, so most lookups resolve quickly from cache rather than traversing the full chain.
DNS Caching: Every DNS response includes a Time-To-Live (TTL) value. Resolvers, operating systems, and browsers all cache results for the TTL duration. This reduces latency and load on DNS servers, but means changes to DNS records take time to propagate.
TCP vs. UDP: The Transport Layer
Once DNS resolves the destination IP, the transport layer takes over. Two protocols dominate this layer: TCP and UDP. They represent fundamentally different trade-offs between reliability and speed.
TCP (RFC 793) provides a reliable, ordered, error-checked byte stream. Before any data flows, TCP establishes a connection using a three-way handshake (SYN, SYN-ACK, ACK). It guarantees that every byte arrives, in order, with no corruption. If a packet is lost, TCP retransmits it. This reliability comes at a cost: latency from the handshake, overhead from acknowledgments, and reduced throughput from congestion control.
UDP (RFC 768) is the opposite. No connection setup. No delivery guarantee. No ordering. No retransmission. A UDP packet is sent and forgotten. The entire UDP header is only 8 bytes, compared to TCP's minimum of 20 bytes. What you lose in reliability, you gain in speed and simplicity.
| Property | TCP | UDP |
|---|---|---|
| Connection | Connection-oriented (three-way handshake) | Connectionless |
| Reliability | Guaranteed delivery, retransmission on loss | No delivery guarantee |
| Ordering | Bytes arrive in order | No ordering guarantee |
| Header size | 20+ bytes | 8 bytes |
| Latency | Higher (handshake + acknowledgments) | Lower (fire and forget) |
| Flow control | Yes (sliding window) | No |
| Use cases | Web browsing, email, file transfer, database queries | Video streaming, voice calls, DNS lookups, online gaming |
The choice between TCP and UDP depends on whether your application can tolerate packet loss. A bank transfer cannot lose data, so it uses TCP. A video call can tolerate a dropped frame (you will not even notice), so it uses UDP. Waiting for retransmission in a live video call would cause stuttering, which is worse than losing a single frame.
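The 8-byte figure from the table is worth seeing directly. Per RFC 768, a UDP header is exactly four 16-bit fields; the sketch below packs one with `struct` (the ports and payload are hypothetical, and the checksum is left as zero for simplicity).

```python
import struct

# The entire UDP header (RFC 768) is four 16-bit fields: source port,
# destination port, length (header + payload), and checksum.
def udp_header(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    length = 8 + len(payload)                  # length field covers header + data
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

header = udp_header(53000, 53, b"dns-query-bytes")
print(len(header))                             # -> 8: all the overhead UDP ever adds
```

TCP's header, by contrast, starts at 20 bytes and must also carry sequence numbers, acknowledgment numbers, and window sizes to deliver its guarantees.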
HTTP and HTTPS
HTTP (Hypertext Transfer Protocol) is the application-layer protocol that powers the web. It runs on top of TCP (or, in HTTP/3, on top of QUIC, which runs on UDP). HTTP defines how clients and servers structure their messages: request methods, headers, status codes, and bodies.
HTTPS is HTTP with TLS (Transport Layer Security) encryption. The client and server perform a TLS handshake to establish an encrypted channel before any HTTP data flows. This protects data from eavesdropping and tampering in transit. Every production web application should use HTTPS.
WebSocket
HTTP follows a strict request-response pattern: the client asks, the server answers. But some applications need the server to push data to the client without being asked. Chat applications, live dashboards, multiplayer games, and real-time collaboration tools all require this capability.
WebSocket solves this. It begins as a standard HTTP request with an Upgrade header. If the server agrees, the connection is upgraded to a persistent, full-duplex channel. Both client and server can send messages at any time, in either direction, without the overhead of establishing new connections.
WebSocket: A protocol that provides full-duplex communication over a single, long-lived TCP connection. It begins with an HTTP handshake and then upgrades to a persistent channel where both parties can send messages independently.
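The handshake itself is small enough to sketch. Per RFC 6455, the server proves it understood the upgrade request by hashing the client's Sec-WebSocket-Key together with a fixed GUID and echoing the result in Sec-WebSocket-Accept. The example key below is the one from the RFC itself.

```python
import base64, hashlib

# RFC 6455 handshake: the server concatenates the client's key with this
# fixed GUID, SHA-1 hashes it, and returns the Base64 digest.
GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(client_key: str) -> str:
    digest = hashlib.sha1((client_key + GUID).encode()).digest()
    return base64.b64encode(digest).decode()

# Sample key taken from RFC 6455's own handshake example:
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # -> s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Once the client verifies this value in the 101 Switching Protocols response, both sides stop speaking HTTP and the persistent channel is open.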
Forward and Reverse Proxies
A proxy is an intermediary that sits between a client and a server, forwarding requests and responses. There are two types, and they serve very different purposes.
A forward proxy sits in front of clients. The client sends its request to the proxy, and the proxy forwards it to the destination server. The server sees the proxy's IP address, not the client's. Forward proxies are used for privacy, access control (blocking certain websites), and caching. Corporate networks often route all employee traffic through a forward proxy.
A reverse proxy sits in front of servers. The client sends its request to the proxy, thinking it is the actual server. The proxy forwards the request to one of several backend servers. Reverse proxies are used for load balancing, SSL termination, caching, and security (hiding the true server infrastructure). Nginx, HAProxy, and Cloudflare are common reverse proxies.
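The core load-balancing decision a reverse proxy makes can be reduced to a few lines. This is a toy round-robin picker, not a real proxy (no actual networking, and the backend addresses are hypothetical), but it shows the shape of the logic Nginx or HAProxy applies per request.

```python
import itertools

# Clients see one address; each incoming request is forwarded to the
# next backend in rotation. Backend IPs below are hypothetical.
class RoundRobinProxy:
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)   # endless rotation over backends

    def forward(self, request: str) -> str:
        backend = next(self._cycle)               # pick the next server in turn
        return f"{backend} handles {request}"

proxy = RoundRobinProxy(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
for req in ["GET /a", "GET /b", "GET /c", "GET /d"]:
    print(proxy.forward(req))                     # fourth request wraps to 10.0.0.1
```

Real proxies layer more on this junction point, such as health checks that skip dead backends and weighted rotations for uneven hardware, but the routing decision stays this simple at heart.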
In systems thinking terms, a reverse proxy is a leverage point. It sits at a junction where many connections converge, making it an ideal place to implement cross-cutting concerns: rate limiting, authentication, logging, compression, and caching. Changing behavior at the proxy affects every request flowing through the system without touching any backend server.
Further Reading
- RFC 1035: Domain Names, Implementation and Specification. The foundational specification for the Domain Name System.
- RFC 793: Transmission Control Protocol. The original TCP specification.
- RFC 768: User Datagram Protocol. The UDP specification. Three pages. The shortest RFC you will ever read.
- Wikipedia: Domain Name System. A thorough overview of DNS architecture, record types, and resolution process.
- MDN: WebSockets API. Practical documentation on using WebSocket in web applications.
Assignment
Answer this question in 3 sentences:
Why does a video call use UDP but a bank transfer uses TCP?
Your answer should reference at least two specific properties from the TCP vs. UDP comparison table (e.g., reliability, ordering, latency). Think about what happens in each scenario when a packet is lost. Which is worse: a brief glitch in video, or a missing digit in a transaction amount?
Two Approaches to API Communication
Once a client and server can communicate over HTTP (Sessions 1.1 and 1.2), the next question is: how should they structure that communication? What format should the request take? How should the server expose its data and operations?
This is the domain of API protocols. Two dominate modern system design: REST and GraphQL. They solve the same fundamental problem, allowing a client to read and write data on a server, but they make very different design choices about how to do it.
REST: Representational State Transfer
REST is not a protocol. It is an architectural style, defined by Roy Fielding in his 2000 doctoral dissertation. Fielding was one of the principal authors of the HTTP specification, and REST describes the design principles that made HTTP successful.
REST is built on six constraints. Four of them form the uniform interface, which is the defining feature of REST:
REST Uniform Interface: (1) Resources are identified by URIs. (2) Resources are manipulated through representations (JSON, XML, HTML). (3) Messages are self-descriptive, containing all information needed to process them. (4) Hypermedia drives application state (HATEOAS), meaning the server's responses include links to related resources and available actions.
In practice, most REST APIs follow a predictable pattern. Resources map to URL paths. HTTP methods map to operations.
| HTTP Method | Operation | Example | Idempotent? |
|---|---|---|---|
| GET | Read | GET /users/42 | Yes |
| POST | Create | POST /users | No |
| PUT | Replace | PUT /users/42 | Yes |
| PATCH | Partial update | PATCH /users/42 | No (by convention) |
| DELETE | Delete | DELETE /users/42 | Yes |
REST inherits HTTP's statelessness. Each request must contain all the information the server needs. No session context is assumed. This makes REST APIs easy to cache (GET responses can be cached by any HTTP cache), easy to scale (any server can handle any request), and easy to reason about (each endpoint has a clear, predictable behavior).
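The "Idempotent?" column in the table above has a concrete meaning: retrying an idempotent request leaves the system in the same state as sending it once. A minimal sketch with an in-memory store (the user records are hypothetical):

```python
# Idempotency in practice: repeating a PUT leaves the store unchanged,
# while repeating a POST creates a new resource each time.
users: dict[int, dict] = {}
next_id = 1

def post_user(data: dict) -> int:
    """POST /users: always creates a new resource with a fresh ID."""
    global next_id
    users[next_id] = data
    next_id += 1
    return next_id - 1

def put_user(user_id: int, data: dict) -> None:
    """PUT /users/{id}: full replacement; safe to retry on a timeout."""
    users[user_id] = data

post_user({"name": "Alice"})
post_user({"name": "Alice"})          # duplicate POST: two resources now exist
put_user(1, {"name": "Alicia"})
put_user(1, {"name": "Alicia"})       # duplicate PUT: state is identical
print(len(users))                     # -> 2
```

This is why clients can blindly retry a failed PUT or DELETE, but retrying a POST risks duplicates unless the API adds its own idempotency keys.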
The Over-Fetching and Under-Fetching Problem
REST's simplicity comes with a cost. Each endpoint returns a fixed data structure. If you request GET /users/42, you get the entire user object: name, email, avatar, bio, preferences, creation date, and every other field. If you only needed the name and avatar, you still receive everything else. This is over-fetching.
Conversely, if you need a user's profile along with their 5 most recent posts and each post's comment count, you might need three separate requests: one for the user, one for the posts, one for the comment counts. This is under-fetching, and it means multiple round trips, each adding latency.
On a desktop with a fast connection, this is tolerable. On a mobile device over a cellular network, where each round trip adds 100-300ms of latency, it becomes a real performance problem.
GraphQL: Query What You Need
GraphQL was developed at Facebook in 2012 and open-sourced in 2015. The GraphQL specification describes it as a query language for APIs and a runtime for executing those queries against your data.
The core idea: the client specifies exactly what data it needs, and the server returns exactly that. No more, no less.
A GraphQL API exposes a single endpoint (typically POST /graphql). Instead of choosing between many endpoints, the client sends a query that describes the shape of the data it wants:
```graphql
# GraphQL query
{
  user(id: 42) {
    name
    avatar
    posts(limit: 5) {
      title
      commentCount
    }
  }
}
```
This single request replaces the three REST calls from the previous example. The server returns a JSON response that mirrors the query structure exactly:
```json
{
  "data": {
    "user": {
      "name": "Alice",
      "avatar": "/images/alice.jpg",
      "posts": [
        { "title": "On Feedback Loops", "commentCount": 12 },
        { "title": "Scaling Lessons", "commentCount": 7 }
      ]
    }
  }
}
```
GraphQL: A query language for APIs where the client defines the structure of the response. Uses a strongly-typed schema to describe available data. All requests go to a single endpoint. Eliminates over-fetching and under-fetching by letting clients request exactly the fields they need.
REST vs. GraphQL: A Comparison
| Dimension | REST | GraphQL |
|---|---|---|
| Endpoints | Multiple (one per resource) | Single (/graphql) |
| Data fetching | Server decides what to return | Client decides what to return |
| Over-fetching | Common (fixed response shapes) | Eliminated (client specifies fields) |
| Under-fetching | Common (multiple round trips) | Eliminated (nested queries in one request) |
| Caching | Straightforward (HTTP caching by URL) | Complex (all requests hit same URL with POST) |
| Versioning | Typically URL-based (/v1/, /v2/) | Schema evolution (add fields, deprecate old ones) |
| Type system | None built-in (OpenAPI/Swagger is optional) | Strongly typed schema (required) |
| Error handling | HTTP status codes | Typically 200; errors in response body |
| Learning curve | Low (uses standard HTTP conventions) | Higher (new query language, schema design) |
| Best suited for | Simple CRUD, public APIs, resource-oriented services | Complex data relationships, mobile clients, varied consumer needs |
When REST Shines
REST is the better choice when your data model is simple and resource-oriented. If your API mostly serves CRUD operations (create, read, update, delete) on well-defined entities, REST's predictable URL structure and HTTP caching are hard to beat. Public APIs favor REST because it is universally understood, requires no specialized client libraries, and works with any HTTP client.
REST also excels when caching matters. Because each resource has its own URL, HTTP caches (browsers, CDNs, reverse proxies) can cache responses efficiently. A GET /users/42 response can be cached and reused by any client requesting the same resource.
When GraphQL Shines
GraphQL is the better choice when clients have diverse data needs. A mobile app might need a compact subset of user data. A desktop dashboard might need the full object with related entities. A third-party integration might need a completely different combination. With REST, you either build custom endpoints for each consumer or force everyone to over-fetch.
GraphQL also wins when the data model is deeply relational. If fetching a page requires combining data from users, posts, comments, reactions, and notifications, a single GraphQL query can traverse those relationships in one round trip. The equivalent REST implementation would require either multiple sequential requests or a custom aggregate endpoint.
The Trade-Off in Systems Terms
From a systems thinking perspective, REST and GraphQL shift complexity to different parts of the system. REST puts complexity on the client (which must orchestrate multiple requests and handle over-fetched data). GraphQL puts complexity on the server (which must resolve arbitrary query shapes and protect against expensive queries). Neither eliminates complexity. They relocate it.
This is a recurring theme in system design: there is no free lunch. Every architectural choice is a transfer of burden from one component to another. The skill is in choosing which component is best equipped to handle that burden given your specific constraints.
Further Reading
- Roy Fielding, Chapter 5: Representational State Transfer (REST), from Architectural Styles and the Design of Network-based Software Architectures (2000). The original definition of REST.
- GraphQL Specification. The official, normative specification for the GraphQL query language and execution semantics.
- GraphQL: Learn. The official getting-started guide with interactive examples.
- REST API Tutorial. A comprehensive guide to REST constraints, best practices, and common patterns.
- Wikipedia: REST. Historical context, Fielding's constraints, and the evolution of REST in practice.
Assignment
You are building a mobile app for a social platform. The user profile screen shows:
- User name and avatar
- Bio (optional, only 40% of users have one)
- Follower count
- 5 most recent posts (title and timestamp only)
Answer these questions:
- If you use a REST API, how many requests would the client need to make? What data would be over-fetched?
- If you use GraphQL, write the query (or describe its structure) that would fetch this screen's data in a single request.
- Which protocol would reduce bandwidth usage for this mobile client, and why?
Consider: the mobile app is used on cellular networks where every kilobyte and every round trip counts. Your choice should reflect that constraint.
Beyond REST: Why Other Protocols Exist
Session 1.3 covered REST and GraphQL, both of which operate over HTTP/1.1 or HTTP/2 using text-based formats like JSON. They work well for many scenarios, but they carry overhead that becomes painful at scale or under specific constraints. Two protocols address these gaps directly: gRPC for high-performance service-to-service communication, and WebSocket for persistent, bidirectional real-time connections.
Understanding when to reach for each one is a core system design skill. The wrong protocol choice can introduce unnecessary latency, complexity, or resource consumption that compounds as the system grows.
gRPC: Binary, Typed, and Fast
gRPC is an open-source remote procedure call framework originally developed at Google. It uses HTTP/2 as its transport layer and Protocol Buffers (protobuf) as its serialization format. Both choices are deliberate.
Protocol Buffers are a language-neutral, platform-neutral mechanism for serializing structured data. You define your data schema in a .proto file, and the protobuf compiler generates strongly typed code in your target language. The binary encoding is significantly smaller and faster to parse than JSON.
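Real protobuf requires the schema compiler, so as a stand-in, the sketch below packs the same record with a fixed binary layout via `struct` to show why binary encodings are smaller than JSON. The record fields are hypothetical, and actual protobuf wire format differs (it uses varints and field tags), but the size gap is the point.

```python
import json, struct

# One record, two encodings. Real protobuf generates this packing code
# from a .proto schema; struct is used here only to illustrate the idea.
record = {"user_id": 42, "score": 9000, "active": True}

as_json = json.dumps(record).encode()
# Fixed layout: two unsigned 32-bit ints and one byte, network byte order.
as_binary = struct.pack("!IIB", record["user_id"], record["score"], record["active"])

print(len(as_json), len(as_binary))   # the binary form is a fraction of the size
```

JSON spends bytes repeating field names in every message; a schema-driven binary format pays that cost once, at compile time, which is exactly the trade gRPC makes.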
HTTP/2 brings multiplexing (multiple requests over a single TCP connection), header compression, and binary framing. gRPC exploits all of these. A single connection between two services can carry hundreds of concurrent RPCs without the application-level head-of-line blocking that plagues HTTP/1.1.
The Four gRPC Communication Patterns
gRPC supports four interaction modes, each suited to different scenarios:
| Pattern | Description | Use Case |
|---|---|---|
| Unary | Client sends one request, server sends one response | Standard API call (fetch user profile) |
| Server streaming | Client sends one request, server sends a stream of responses | Downloading large datasets, log tailing |
| Client streaming | Client sends a stream of messages, server sends one response | Uploading telemetry data in batches |
| Bidirectional streaming | Both sides send streams of messages independently | Real-time collaboration between services |
The strongly typed contract means both sides agree on the exact shape of every message at compile time. No runtime surprises from a missing field or a string where you expected an integer. This is a significant advantage in large systems where dozens of teams own different services.
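The four modes in the table map naturally onto plain functions and generators, which is roughly how generated gRPC stubs feel to use in Python. This is a simulation with hypothetical message types, not actual gRPC code (which would be generated from a .proto file).

```python
from typing import Iterator

# The four gRPC interaction modes, mimicked with functions and generators.
# Service and message names below are hypothetical.

def unary(request: str) -> str:                        # one in, one out
    return f"profile for {request}"

def server_streaming(request: str) -> Iterator[str]:   # one in, many out
    for chunk in range(3):
        yield f"{request} chunk {chunk}"               # e.g. log tailing

def client_streaming(requests: Iterator[str]) -> str:  # many in, one out
    return f"stored {sum(1 for _ in requests)} telemetry batches"

def bidirectional(requests: Iterator[str]) -> Iterator[str]:  # streams both ways
    for msg in requests:
        yield f"ack: {msg}"                            # e.g. real-time collaboration

print(unary("user-42"))
print(list(server_streaming("logs")))
print(client_streaming(iter(["batch-1", "batch-2"])))
print(list(bidirectional(iter(["edit-1", "edit-2"]))))
```

The difference from these toys is that in gRPC each stream flows over a multiplexed HTTP/2 connection, and every message is a typed protobuf, checked at compile time.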
WebSocket: Persistent and Bidirectional
WebSocket, standardized as RFC 6455 in 2011, solves a different problem. HTTP is inherently request-response: the client asks, the server answers, the connection is done. For applications that need the server to push data to the client without being asked, HTTP requires workarounds like long polling or server-sent events.
WebSocket replaces this with a persistent, full-duplex connection. The connection starts as a standard HTTP request with an Upgrade header. If the server agrees, the protocol switches from HTTP to WebSocket, and both sides can send messages at any time over the same TCP connection.
WebSocket provides full-duplex communication channels over a single TCP connection. After an initial HTTP handshake, the connection stays open. Either side can send messages at any time, with minimal framing overhead (as little as 2 bytes per frame).
The low per-message overhead makes WebSocket ideal for high-frequency, small-payload scenarios: chat messages, live price updates, multiplayer game state, collaborative editing cursors.
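The "as little as 2 bytes" claim can be verified by constructing a frame. Per RFC 6455, a final text frame with a payload under 126 bytes needs one byte for the FIN bit and opcode, and one byte for the length; server-to-client frames are unmasked, so that is the entire header.

```python
# A minimal server-to-client WebSocket text frame (RFC 6455): for payloads
# under 126 bytes the entire framing overhead is two bytes.
def text_frame(payload: bytes) -> bytes:
    assert len(payload) < 126                  # sketch covers the small-payload case only
    fin_and_opcode = 0x81                      # FIN bit set, opcode 0x1 (text)
    # Server-to-client frames are unmasked, so byte 2 is just the length.
    return bytes([fin_and_opcode, len(payload)]) + payload

frame = text_frame(b"hi")
print(frame.hex())                             # -> 81026869
print(len(frame) - len(b"hi"))                 # -> 2 bytes of overhead
```

Compare that with an HTTP request, where even a minimal set of headers adds hundreds of bytes per message; at chat-message frequency, the difference compounds quickly.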
How They Differ in Practice
The gRPC call is request-response, even though the underlying HTTP/2 connection persists. Each RPC is a discrete unit with a defined start and end. WebSocket, by contrast, opens a channel that stays alive. Messages flow in both directions without the request-response framing.
Protocol Comparison Table
The following table compares gRPC, WebSocket, and REST across key dimensions relevant to system design decisions.
| Dimension | REST (HTTP/JSON) | gRPC | WebSocket |
|---|---|---|---|
| Transport | HTTP/1.1 or HTTP/2 | HTTP/2 (required) | TCP (after HTTP upgrade) |
| Data format | JSON (text) | Protocol Buffers (binary) | Any (text or binary frames) |
| Type safety | None (runtime validation) | Strong (compile-time from .proto) | None (application-defined) |
| Communication | Request-response | Unary + 3 streaming modes | Full-duplex, persistent |
| Browser support | Native | Requires gRPC-Web proxy | Native (all modern browsers) |
| Payload size | Larger (verbose JSON) | Smallest (binary encoding) | Depends on application |
| Best for | Public APIs, CRUD, web frontends | Internal service-to-service | Real-time client-server push |
| Tooling | Curl, Postman, any HTTP client | grpcurl, generated clients | Browser DevTools, wscat |
Where Each Protocol Fits
gRPC dominates internal service communication in large distributed systems. When Service A calls Service B 10,000 times per second, the difference between JSON parsing and protobuf deserialization is measurable. The strict contract prevents the subtle breaking changes that plague loosely typed JSON APIs across team boundaries. Companies like Google, Netflix, and Stripe use gRPC extensively for backend-to-backend traffic.
WebSocket dominates scenarios where the server needs to push data to clients without waiting for a request. Chat applications, live dashboards, multiplayer games, collaborative editors. Any feature where a user expects to see updates the moment they happen, without refreshing.
REST remains the default for public-facing APIs, CRUD operations, and any scenario where simplicity, cacheability, and broad client compatibility matter more than raw performance.
Systems Thinking Lens
Protocol choice is a leverage point in the system. Choosing gRPC for internal communication reduces serialization overhead across every service boundary. That reduction compounds: fewer CPU cycles per request means fewer instances needed, which means lower cost, which means more budget for features. The feedback loop runs through infrastructure cost, team velocity, and product capability.
Conversely, choosing WebSocket where simple polling would suffice introduces connection management complexity, memory overhead for open connections, and operational burden for connection state during deploys. The protocol that feels more "advanced" can make the overall system worse if the problem did not require it.
The right question is never "which protocol is best?" It is "what does this specific interaction need, and what are the second-order effects of this choice on the rest of the system?"
Further Reading
- Introduction to gRPC (grpc.io). Official overview of gRPC concepts, HTTP/2 transport, and Protocol Buffers integration.
- Core concepts, architecture and lifecycle (grpc.io). Detailed explanation of the four gRPC communication patterns and connection lifecycle.
- RFC 6455: The WebSocket Protocol (IETF). The formal specification for WebSocket, including the handshake, framing, and closing procedures.
- WebSocket API (MDN Web Docs). Practical guide to using WebSocket in browser-based applications.
- Protocol Buffers Overview (protobuf.dev). Official documentation for defining and using Protocol Buffer schemas.
Assignment
Match each use case below to the most appropriate protocol (REST, gRPC, or WebSocket). Write one or two sentences justifying each choice.
- Live chat application where users see messages instantly as they arrive.
- Microservice-to-microservice communication in a payment processing pipeline handling 50,000 transactions per second.
- Stock ticker dashboard displaying real-time price updates for 200 symbols.
- Mobile app API for a food delivery service where users browse restaurants, place orders, and track delivery.
For each answer, consider: What is the communication pattern? Who initiates data flow? How critical is latency? Does the client need a persistent connection, or is request-response sufficient?
Two Ways to Build Software
Every application starts as a single thing. One codebase, one deployment, one process. At some point, teams face a decision: keep it together or break it apart. This is the monolith-versus-microservices question, and it is one of the most consequential architectural decisions you will make.
The answer is rarely obvious, and the industry has swung between extremes. Understanding the real tradeoffs, not the marketing, is what matters.
The Monolith
A monolithic architecture deploys the entire application as a single unit. All modules, whether they handle user authentication, payment processing, notifications, or reporting, live in one codebase and run in one process (or a set of identical processes behind a load balancer).
Monolith: A software architecture where all components are packaged and deployed as a single unit. Function calls between modules happen in-process, not over the network.
Monoliths are not inherently bad. They have real structural advantages:
- Simple deployment. One artifact to build, test, and ship. No coordination between services.
- Easy debugging. A stack trace shows you the full call path. No distributed tracing needed.
- In-process communication. A function call takes nanoseconds. A network call takes milliseconds. That is a six-order-of-magnitude difference.
- Single database. Transactions are straightforward. ACID guarantees come free.
- Lower operational overhead. One thing to monitor, one thing to scale, one thing to secure.
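The in-process advantage is easy to see for yourself. This sketch times a plain function call; the comparison figure for a network round trip is a typical same-region value, not something the code measures.

```python
import timeit

def get_price(item_id: int) -> float:
    """Stands in for a module boundary inside a monolith."""
    return 9.99

# Time one million in-process calls through that boundary.
n = 1_000_000
total = timeit.timeit(lambda: get_price(42), number=n)
per_call_ns = total / n * 1e9

print(f"in-process call: ~{per_call_ns:.0f} ns")
# A same-region network round trip is typically on the order of
# 0.5-2 ms, i.e. thousands to millions of times slower than this.
```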
The problems emerge as the team and codebase grow. A change to the payment module requires redeploying the entire application. A memory leak in the reporting module crashes the authentication module. A team of 80 engineers stepping on each other in the same repository slows everyone down.
Microservices
A microservice architecture decomposes the application into small, independently deployable services. Each service owns a specific business capability, runs in its own process, and communicates with other services over the network (typically HTTP/REST or gRPC).
Microservices: An architectural style where the application is composed of loosely coupled, independently deployable services, each responsible for a specific business capability and maintaining its own data store.
The advantages are real but come with costs:
- Independent deployment. Ship the payment service without touching authentication. Faster release cycles per team.
- Team autonomy. Each team owns a service end-to-end. They choose their own language, framework, and release schedule.
- Fault isolation. If the reporting service crashes, the payment service keeps running.
- Targeted scaling. Scale the service that needs it, not the entire application.
The costs:
- Network complexity. Every function call that was in-process is now a network call. Latency, retries, timeouts, circuit breakers.
- Data consistency. No cross-service transactions. You need sagas, eventual consistency, or careful domain boundaries.
- Operational overhead. Dozens or hundreds of services to deploy, monitor, log, and trace. You need CI/CD pipelines, container orchestration, service meshes, distributed tracing.
- Integration testing. Testing the interaction between services is harder than testing a monolith's internal calls.
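The "latency, retries, timeouts" cost is concrete: every cross-service call needs failure handling that an in-process call gets for free. A minimal retry-with-backoff sketch, with a hypothetical flaky dependency standing in for a remote service:

```python
import time

def call_with_retries(fn, attempts=3, base_delay=0.1):
    """Retry a flaky remote call with exponential backoff.

    In a monolith this would be a plain function call; across a
    service boundary every caller needs logic like this.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                              # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt)  # back off: 0.1s, 0.2s, ...

# Hypothetical flaky dependency: fails twice, then succeeds.
calls = {"count": 0}
def flaky_inventory_service():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("inventory service unreachable")
    return {"sku": "book-42", "in_stock": True}

result = call_with_retries(flaky_inventory_service)
print(result)  # succeeds on the third attempt
```

Production systems layer circuit breakers and timeouts on top of this; the point is that the logic exists at all, multiplied across every service boundary.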
Comparison Across Dimensions
| Dimension | Monolith | Microservices |
|---|---|---|
| Deployment | Single artifact, all-or-nothing | Independent per service |
| Codebase | One repository (usually) | Many repositories or monorepo |
| Communication | In-process function calls | Network calls (HTTP, gRPC, messaging) |
| Data management | Single shared database | Database per service |
| Scaling | Scale entire application | Scale individual services |
| Debugging | Single stack trace, single log | Distributed tracing across services |
| Team structure | Teams share one codebase | Teams own individual services |
| Fault isolation | One failure can crash everything | Failures contained to one service |
| Time to first deploy | Fast (minimal infrastructure) | Slow (needs orchestration, CI/CD) |
Conway's Law
In his 1968 paper "How Do Committees Invent?", Melvin Conway observed that "any organization that designs a system will produce a design whose structure is a copy of the organization's communication structure." This observation, later named Conway's Law, explains why architecture decisions and team decisions are inseparable.
Conway's Law: The architecture of a system mirrors the communication structure of the organization that builds it. A company with four teams will produce a system with four major components, regardless of what the optimal architecture might be.
Conway's Law works in both directions. If you have a single team, a monolith is natural and effective. As Martin Fowler notes, "a dozen or two people can have deep and informal communications, so Conway's Law indicates they will create a monolith, and that is fine." If you have 15 teams and force them into one monolith, they will create implicit service boundaries anyway through code ownership conventions, module walls, and meeting schedules.
The Inverse Conway Maneuver deliberately structures teams to encourage the desired architecture. Want microservices? Create small, autonomous teams organized around business capabilities. Want a well-structured monolith? Keep the team small and communicating tightly.
When to Start With What
Martin Fowler's "Monolith First" argument is straightforward: almost all successful microservice stories started with a monolith that grew too big and was broken up. Almost all cases where systems were built as microservices from scratch ended up in serious trouble.
The reasoning is practical. Early in a product's life, you do not know where the real boundaries are. You do not know which features will matter, which will be thrown away, and where the load will concentrate. A monolith lets you discover these boundaries cheaply. Refactoring a function boundary inside a monolith is an afternoon of work. Redrawing a service boundary in a microservice system means migrating data, rewriting APIs, and coordinating multiple teams.
The signal that you might need to break apart is usually organizational, not technical. When the team is large enough that people are blocked by each other. When deployment frequency drops because too many changes are coupled. When a single module's scaling needs are dramatically different from the rest. These are structural pressures that microservices address.
The Modular Monolith: A Middle Path
A modular monolith maintains strict module boundaries inside a single deployment. Modules communicate through defined internal APIs, not by reaching into each other's database tables or internal classes. The code is structured as if it could be split into services, but it runs as one process.
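The discipline can be shown in miniature. In this sketch, each "module" exposes a narrow public API and keeps its data private, so the boundary could later become a service boundary. The module and method names are illustrative, not a framework.

```python
# A modular monolith in miniature: one process, strict boundaries.

class BillingModule:
    def __init__(self):
        self._invoices = {}            # private: no other module touches this

    def charge(self, order_id: str, amount: float) -> str:
        invoice_id = f"inv-{order_id}"
        self._invoices[invoice_id] = amount
        return invoice_id

class OrderModule:
    def __init__(self, billing: BillingModule):
        self._billing = billing        # depends on the API, not the tables

    def place_order(self, order_id: str, amount: float) -> dict:
        invoice_id = self._billing.charge(order_id, amount)
        return {"order_id": order_id, "invoice": invoice_id}

billing = BillingModule()
orders = OrderModule(billing)
print(orders.place_order("o-1", 25.0))
# {'order_id': 'o-1', 'invoice': 'inv-o-1'}
```

Splitting `BillingModule` into a service later means replacing one in-process call with a network call behind the same interface, rather than untangling shared tables.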
This approach captures the deployment simplicity of a monolith while maintaining the clear boundaries that make a future migration to microservices feasible. Shopify famously operates a modular monolith that serves millions of merchants, demonstrating that you do not need microservices to reach enormous scale.
Further Reading
- Martin Fowler, "Monolith First" (martinfowler.com). The argument for starting with a monolith and extracting services only when the need is proven.
- Martin Fowler, "Microservice Trade-Offs" (martinfowler.com). A balanced analysis of what you gain and what you pay when adopting microservices.
- Martin Fowler, "Conway's Law" (martinfowler.com). How organizational structure shapes system architecture, and vice versa.
- Sam Newman, Building Microservices, 2nd Edition (O'Reilly, 2021). The standard reference for microservice architecture, including decomposition strategies and data management patterns.
- Melvin Conway, "How Do Committees Invent?" (1968). The original paper that introduced Conway's Law.
Assignment
This assignment has two parts.
Part 1: Write a 3-sentence argument FOR starting with a monolith for a new startup building a food delivery app. Address why a monolith is the right choice at this stage.
Part 2: Write a 3-sentence argument for WHEN to break the monolith apart. Identify the specific signals (team size, deployment pain, scaling bottlenecks) that would trigger the transition to microservices.
In both parts, reference Conway's Law. How does the team's current structure support your recommendation?
Patterns as Reusable Decisions
An architectural pattern is a proven structural arrangement for organizing a software system. It is not a library or a framework. It is a set of decisions about how components are arranged, how they communicate, and where responsibilities live. Choosing a pattern means choosing which tradeoffs you accept.
This session surveys five major patterns. Each one solves a different class of problem. Most real production systems combine two or more of them.
Multi-Tier Architecture
Multi-tier (often called n-tier) separates the system into horizontal layers, each handling a distinct concern. The classic version has three tiers: presentation, business logic, and data. Each tier communicates only with its immediate neighbor.
Multi-tier architecture organizes a system into horizontal layers of responsibility. Each layer provides services to the layer above it and consumes services from the layer below it. Communication between non-adjacent layers is prohibited.
This pattern works because separation of concerns reduces cognitive load. Frontend developers work in the presentation tier without touching database queries. Backend developers implement business rules without worrying about HTML rendering. Database administrators optimize storage without breaking application logic.
The limitation is rigidity. Every request must traverse all tiers, even when only one tier does meaningful work. Adding a new feature often requires changes in every layer. And horizontal scaling is uneven: the data tier typically becomes the bottleneck while the presentation tier sits idle.
Event-Driven Architecture
Event-driven architecture (EDA) structures the system around the production, detection, and reaction to events. Instead of components calling each other directly, they emit events. Other components subscribe to the events they care about and react independently.
Event-driven architecture decouples producers from consumers through asynchronous events. A producer publishes an event to a broker or bus without knowing which consumers will process it. Consumers subscribe to event types and react independently.
Consider an order service that publishes an "Order Placed" event, with four downstream services reacting to it independently. The order service does not know or care how many consumers exist. Adding a fifth consumer (say, a fraud detection service) requires zero changes to the producer.
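That decoupling can be sketched with an in-memory event bus. A real system would use a broker such as Kafka or RabbitMQ; the consumer handlers below are illustrative stand-ins.

```python
from collections import defaultdict

# Minimal in-memory event bus: the producer publishes without
# knowing who (or how many) will react.
subscribers = defaultdict(list)

def subscribe(event_type, handler):
    subscribers[event_type].append(handler)

def publish(event_type, payload):
    for handler in subscribers[event_type]:
        handler(payload)               # each consumer reacts independently

log = []
subscribe("order.placed", lambda e: log.append(f"inventory reserved for {e['order_id']}"))
subscribe("order.placed", lambda e: log.append(f"receipt emailed for {e['order_id']}"))
subscribe("order.placed", lambda e: log.append(f"analytics recorded {e['order_id']}"))

# The producer emits one event; adding a fourth consumer
# (e.g. fraud detection) requires no change to this line.
publish("order.placed", {"order_id": "o-123", "total": 42.0})
print(log)
```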
EDA comes in two topologies, as described in Mark Richards' Software Architecture Patterns:
- Broker topology: Events flow through a lightweight message broker. No central orchestrator. Each consumer decides what to do with each event. Good for simple, decoupled flows.
- Mediator topology: A central mediator receives events and orchestrates a sequence of steps across consumers. Good for complex workflows that require ordering and error handling.
The strength of EDA is loose coupling and extensibility. The weakness is that the overall flow of the system becomes harder to trace. When something goes wrong, you cannot follow a single stack trace. You follow events through a broker, across services, through time.
Microservices Architecture
Session 1.5 covered this in detail. As an architectural pattern, microservices decompose the system vertically by business capability rather than horizontally by technical layer. Each service owns its own data, logic, and interface for a specific domain.
The pattern's defining characteristic is independent deployability. If a service can only be deployed alongside other services, it is not a microservice. It is a distributed monolith, which inherits the costs of both architectures and the benefits of neither.
Microservices are often combined with event-driven architecture. Services communicate through events rather than direct API calls, which further reduces coupling and improves resilience.
Cell-Based Architecture
Cell-based architecture, documented in detail by WSO2's reference architecture, organizes the system into self-contained cells. Each cell is a complete unit that includes its own services, data stores, and communication gateway. Cells interact with each other through well-defined, versioned APIs at the cell boundary.
Cell-based architecture groups related services into autonomous cells, each with its own gateway, data stores, and internal communication. Cells are the unit of deployment, scaling, and failure isolation. Think of each cell as a small, self-sufficient system.
The cell boundary acts as a blast radius. If everything inside Cell A fails, Cell B is unaffected because it communicates only through the cell gateway, which can implement circuit breakers, retries, and fallbacks.
This pattern is particularly useful for very large systems where even microservices become hard to manage. Instead of managing 300 individual services, you manage 20 cells, each containing 10-20 services. The cell provides an intermediate level of organization between individual services and the overall system.
Serverless Architecture
Serverless architecture delegates all infrastructure management to a cloud provider. You write functions that execute in response to events (HTTP requests, message queue entries, file uploads, timers). The provider handles provisioning, scaling, and deprovisioning.
Serverless architecture runs application code in ephemeral, event-triggered functions managed entirely by a cloud provider. There are no servers to provision, patch, or scale. You pay only for the compute time consumed during execution.
Serverless excels at variable, unpredictable workloads. An image processing function that runs 10 times on Monday and 10,000 times on Friday costs proportionally. There is no idle capacity to pay for. The operational burden approaches zero for small to medium workloads.
The constraints are real. Functions have execution time limits (15 minutes on AWS Lambda). Cold starts add latency when a function has not been invoked recently. State must be stored externally. Vendor lock-in is significant because your function code is deeply coupled to the provider's event model, IAM system, and supporting services.
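The programming model follows from those constraints: a function receives an event, does its work, and returns, holding no state between invocations. The `(event, context)` signature below matches AWS Lambda's Python runtime; the event shape assumes an API Gateway proxy integration, and the resize logic is a placeholder.

```python
import json

# A Lambda-style handler: stateless, event-in/response-out.
def handler(event, context=None):
    # All state must arrive with the event or come from external
    # stores; nothing survives between invocations.
    body = json.loads(event.get("body") or "{}")
    width = body.get("width", 100)

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"resized_to": width}),
    }

# Local invocation with a fake API Gateway event:
resp = handler({"body": json.dumps({"width": 640})})
print(resp["statusCode"], resp["body"])
```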
Pattern Comparison
| Pattern | Strengths | Weaknesses | Best Fit |
|---|---|---|---|
| Multi-tier | Clear separation, well understood, easy to staff | Rigid layering, uneven scaling, all-tier changes | Traditional web apps, enterprise CRUD systems |
| Event-driven | Loose coupling, extensible, async by default | Hard to trace, eventual consistency, debugging complexity | Systems with many independent reactions to the same trigger |
| Microservices | Independent deploy, team autonomy, targeted scaling | Network overhead, data consistency, operational burden | Large teams, complex domains, varying scale per capability |
| Cell-based | Blast radius control, organizational grouping, versioned boundaries | Complex to design initially, over-engineering risk for small systems | Very large systems with hundreds of services needing organizational structure |
| Serverless | Zero ops, pay-per-use, automatic scaling | Cold starts, execution limits, vendor lock-in, stateless | Variable workloads, event processing, glue logic, prototypes |
Combining Patterns
Production systems rarely use a single pattern in isolation. A typical e-commerce platform might use multi-tier for its web frontend, microservices for its backend capabilities, event-driven communication between those services, and serverless for image resizing and email sending. The patterns are not competing alternatives. They are complementary tools applied at different levels of the system.
The question is not "which pattern should we use?" It is "which pattern fits this specific part of the system, given our team size, operational maturity, and the constraints of the problem?"
Systems Thinking Lens
Each pattern creates a different feedback structure. Multi-tier creates long feedback loops (changes ripple through all layers). Event-driven creates many short, independent loops (each consumer reacts on its own). Microservices create team-level loops (each team iterates independently). Cell-based creates nested loops (within cells and between cells).
The pattern you choose determines where delays accumulate, where failures propagate, and where teams can move independently. These are structural decisions with compounding effects over months and years. A pattern that feels efficient today can create organizational gridlock in two years if the feedback loops it creates do not match how your teams actually work.
Further Reading
- Mark Richards, Software Architecture Patterns, Chapter 2: Event-Driven Architecture (O'Reilly). Clear explanation of broker and mediator topologies with practical examples.
- WSO2, Cell-Based Reference Architecture (GitHub). The foundational document describing cell-based architecture, its principles, and implementation guidance.
- Microsoft, Event-Driven Architecture Style (Azure Architecture Center). Practical guidance on implementing event-driven systems with cloud services.
- AWS, Serverless Architectures with AWS Lambda (AWS Whitepaper). Comprehensive guide to serverless patterns, anti-patterns, and operational considerations.
- Martin Fowler, "Microservice Trade-Offs" (martinfowler.com). Balanced discussion of what microservices cost versus what they provide.
Assignment
Think about an application you work on or use frequently. It could be your company's product, an open-source project, or a well-known service like Grab, Gojek, or Spotify.
- Draw a box diagram of the major components. Keep it high-level: 5-10 boxes representing services, databases, queues, and external systems. Draw arrows showing how they communicate.
- Identify the primary pattern. Does it follow multi-tier, event-driven, microservices, cell-based, serverless, or a combination? What evidence in the diagram supports your answer?
- Find the pattern boundary. Most systems combine patterns. Identify at least one place where the system switches from one pattern to another (for example, request-response for the API layer but event-driven for background processing).
If you are not sure about the internals, make reasonable assumptions based on what you observe as a user. Where do you see real-time updates (event-driven)? Where do you see request-response behavior (multi-tier or microservices)? Where do you see background processing that does not block the user (serverless or event-driven)?
What Is Three-Tier Architecture?
Three-tier architecture is the most widely deployed pattern for web applications. It divides an application into three logical layers, each with a distinct responsibility: presenting information to the user, processing business logic, and storing data. Each tier can be developed, deployed, and scaled independently.
This separation is not just organizational convenience. It enforces boundaries that prevent the presentation layer from directly querying the database, or the data layer from handling user interface concerns. When those boundaries are respected, teams can work on each tier without stepping on each other, and failures in one tier do not automatically cascade into the others.
Three-tier architecture separates an application into three layers: Presentation (user interface), Application (business logic), and Data (persistent storage). Each tier communicates only with its adjacent tier.
The Three Tiers
Presentation Tier
The presentation tier is what the user sees and interacts with. In a web application, this is the HTML, CSS, and JavaScript that runs in the browser. In a mobile app, it is the native UI. This tier collects user input, sends requests to the application tier, and renders responses.
The presentation tier should contain zero business logic. It does not validate whether a user has sufficient funds for a purchase. It does not calculate shipping costs. It renders what the application tier tells it to render. When presentation code starts making business decisions, you get logic scattered across tiers, which makes bugs harder to trace and changes harder to deploy.
Application Tier
The application tier (also called the logic tier or middle tier) is where the actual work happens. It receives requests from the presentation tier, applies business rules, orchestrates data operations, and returns results. Authentication checks, order processing, inventory validation, and pricing calculations all belong here.
This tier is typically the most complex and the most frequently changed. New features, rule changes, and integrations with external services all happen in this layer. Because it sits between the other two tiers, it also serves as a translation layer: converting user-facing requests into database queries, and database results into user-facing responses.
Data Tier
The data tier is responsible for persistent storage and retrieval. Relational databases, NoSQL stores, object storage, and caching layers all live here. The data tier receives structured queries from the application tier and returns results. It does not know what the data means in a business context. It stores rows and returns rows.
A well-designed data tier handles its own concerns: indexing, replication, backup, and query optimization. The application tier should not need to know whether the database is running on a single node or a cluster of replicas. That abstraction is the data tier's job.
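The three tiers map directly onto three layers in code, each talking only to its neighbor: the presentation layer never touches storage, and the repository never formats responses. A minimal sketch with illustrative names (the 10 percent tax rule is an arbitrary example):

```python
# Data tier: storage and retrieval, no business meaning.
class BookRepository:
    def __init__(self):
        self._rows = {1: {"title": "SICP", "price": 45.0}}

    def find(self, book_id):
        return self._rows.get(book_id)

# Application tier: business rules and orchestration.
class CatalogService:
    def __init__(self, repo: BookRepository):
        self._repo = repo

    def get_listing(self, book_id):
        book = self._repo.find(book_id)
        if book is None:
            raise KeyError("book not found")
        # The business rule lives here, not in the UI or the database.
        return {**book, "price_with_tax": round(book["price"] * 1.1, 2)}

# Presentation tier: rendering only, zero business logic.
def render(listing):
    return f"{listing['title']}: ${listing['price_with_tax']}"

service = CatalogService(BookRepository())
print(render(service.get_listing(1)))   # SICP: $49.5
```

Swapping the dict for a real database changes only `BookRepository`; swapping HTML for JSON changes only `render`. That containment is the point of the pattern.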
Tier Responsibilities, Scaling, and AWS Services
| Tier | Responsibility | Scaling Strategy | AWS Services |
|---|---|---|---|
| Presentation | Render UI, collect user input, display responses | CDN distribution, edge caching, static asset hosting | Amazon CloudFront, S3 (static hosting), Amplify |
| Application | Business logic, authentication, request orchestration | Horizontal scaling behind load balancer, auto-scaling groups | EC2 + ALB, Elastic Beanstalk, ECS, EKS |
| Data | Persistent storage, queries, caching | Read replicas, sharding, caching layer in front of DB | RDS, DynamoDB, ElastiCache, Aurora |
Classic AWS Implementation
A standard three-tier deployment on AWS looks like this: CloudFront serves static assets from S3 for the presentation tier. An Application Load Balancer distributes traffic across EC2 instances (or ECS containers) running the application tier. Amazon RDS or DynamoDB handles the data tier, with ElastiCache for frequently accessed data.
Each tier lives in its own subnet within a VPC. The presentation tier sits in public subnets. The application tier runs in private subnets, accessible only through the load balancer. The data tier runs in isolated private subnets, accessible only from the application tier. This network segmentation limits blast radius when something goes wrong.
Serverless Variant
The serverless variant replaces managed servers with fully managed services. Amazon API Gateway replaces the load balancer and handles request routing. AWS Lambda replaces EC2 instances for compute. DynamoDB or Aurora Serverless replaces traditional database instances.
The advantage is operational: no servers to patch, no capacity to pre-provision, and costs that scale to zero when there is no traffic. The disadvantage is cold start latency on Lambda functions and the constraints of execution time limits (15 minutes maximum per invocation). For request-response workloads with variable traffic, the serverless variant is often the most cost-effective option.
Kubernetes Variant
Organizations that need container orchestration, portability across cloud providers, or fine-grained control over their runtime environment often choose Amazon EKS (Elastic Kubernetes Service) for the application tier. Each tier runs as a set of Kubernetes pods, scaled by the Horizontal Pod Autoscaler based on CPU, memory, or custom metrics.
The Kubernetes variant adds operational complexity. You manage node groups, pod scheduling, service meshes, and Kubernetes upgrades. In return, you get portability (the same manifests can run on GKE or AKS), a rich ecosystem of observability tools, and the ability to run sidecar containers for logging, tracing, or security proxies alongside your application code.
When Three-Tier Breaks Down
Three-tier architecture works well for the majority of web applications. It breaks down when the application tier becomes a monolithic bottleneck that every request must pass through. If your application has fundamentally different workloads (real-time chat, batch processing, and CRUD operations), forcing them all through a single application tier creates coupling and scaling problems.
At that point, you start splitting the application tier into separate services, which is the beginning of microservices. But start with three-tier. It is simpler, easier to reason about, and sufficient for most applications until they reach significant scale.
Further Reading
- AWS, Three-Tier Architecture Overview. Official AWS whitepaper on the three-tier pattern and its serverless evolution.
- AWS, Serverless Multi-Tier Architectures with API Gateway and Lambda. Detailed reference architecture for serverless three-tier systems.
- Multitier Architecture, Wikipedia. General overview of N-tier architecture with history and variations.
- Aalok Trivedi, "Building a 3-Tier Web Application Architecture with AWS". Practical walkthrough with VPC, subnets, and security group configuration.
Assignment
Design a three-tier architecture for an online bookstore. The store allows users to browse a catalog, search for books, add items to a cart, and check out.
- Draw three boxes (Presentation, Application, Data) and label what runs in each tier. Be specific: name the technologies or AWS services you would use.
- Write one scaling strategy per tier. For example: how does the presentation tier handle a traffic spike during a book launch? How does the data tier handle a growing catalog?
- Identify one risk in your design. What happens if the application tier goes down? What is the user experience?
Why Serialization Matters
Every time data moves between systems, it must be converted from an in-memory structure to a byte stream (serialization) and back again (deserialization). This happens on every API call, every database write, every message pushed to a queue. The format you choose for that conversion affects payload size, parsing speed, debugging ease, and cross-language compatibility.
Choosing a serialization format is not a cosmetic decision. At high throughput, the difference between a verbose text format and a compact binary format translates directly into bandwidth costs, latency, and CPU utilization.
Serialization Formats Compared
JSON (JavaScript Object Notation)
JSON is the default format for web APIs. It is human-readable, natively supported in every major programming language, and requires no schema definition to use. You can inspect a JSON payload in a browser, log it as a string, and debug it with your eyes. That convenience comes at a cost: JSON is verbose. Field names are repeated in every object, numbers are stored as text, and there is no built-in support for binary data.
XML (Extensible Markup Language)
XML dominated the enterprise integration era (SOAP, WSDL, XSLT). It supports namespaces, attributes, and complex nested structures. It also carries significant overhead: closing tags, verbose syntax, and larger payloads than JSON for equivalent data. XML is still used in legacy systems, document formats (DOCX, SVG), and configuration files (Maven POM, Android manifests). For new API design, it has been largely replaced by JSON.
Protocol Buffers (Protobuf)
Developed by Google, Protobuf is a binary serialization format that requires a schema definition (a .proto file). Fields are identified by numeric tags rather than string names, which makes payloads compact. Protobuf is 3 to 7 times faster than JSON for serialization and deserialization, and payloads are typically 30 to 50 percent smaller. The tradeoff: you cannot read a Protobuf message without the schema, and both client and server must agree on the schema at compile time.
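The size difference is easy to demonstrate with the standard library. This sketch uses `struct` as a stand-in for a schema-driven binary format; it is not the real Protobuf wire format, which additionally uses field tags and varint encoding.

```python
import json
import struct

record = {"user_id": 123456, "score": 98.5, "active": True}

as_json = json.dumps(record).encode()
# Binary: a fixed layout both sides agree on (the "schema"):
# little-endian unsigned 32-bit int, 64-bit float, 1-byte bool.
as_binary = struct.pack("<Id?", record["user_id"], record["score"], record["active"])

print(len(as_json), "bytes as JSON")      # field names repeated as text
print(len(as_binary), "bytes as binary")  # 13 bytes: 4 + 8 + 1

# Decoding requires the schema; the bytes alone are unreadable.
user_id, score, active = struct.unpack("<Id?", as_binary)
```

The binary payload carries no field names at all, which is exactly the readability-for-compactness trade the section describes.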
Apache Avro
Avro is a binary format developed within the Apache Hadoop ecosystem. Like Protobuf, it uses a schema, but the schema is included with the data (or stored in a schema registry). This makes Avro particularly strong for data pipelines where producers and consumers may not share a codebase. Avro files are often the most compact, but serialization and deserialization can use more memory than Protobuf.
Format Comparison
| Feature | JSON | XML | Protobuf | Avro |
|---|---|---|---|---|
| Encoding | Text | Text | Binary | Binary |
| Human-readable | Yes | Yes | No | No |
| Schema required | No (optional via JSON Schema) | No (optional via XSD) | Yes (.proto file) | Yes (JSON schema) |
| Payload size | Large | Largest | Small | Smallest |
| Serialization speed | Moderate | Slow | Fast | Fast |
| Schema evolution | Manual, fragile | Supported via XSD versioning | Good (field numbers are stable) | Excellent (schema registry) |
| Best use case | Public APIs, web frontends | Legacy systems, document formats | Internal microservices (gRPC) | Data pipelines, Kafka, Hadoop |
Web Sessions: Maintaining State in a Stateless Protocol
HTTP is stateless. Every request is independent. The server does not inherently remember who you are between requests. Session management is the set of techniques used to associate a sequence of requests with a single user. There are three dominant approaches.
Cookie-Based Server Sessions
The server generates a random session ID, stores session data (user ID, permissions, cart contents) in server-side storage (memory, Redis, a database), and sends the session ID to the client as an HTTP cookie. On every subsequent request, the browser automatically includes the cookie. The server looks up the session ID and retrieves the associated data.
This approach is simple and secure when implemented correctly. The cookie itself contains no sensitive data, just an opaque identifier. The server controls the session lifecycle: it can invalidate a session immediately by deleting the server-side record. The limitation is that session storage must be shared across all application servers, which requires sticky sessions or a centralized store like Redis.
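The mechanism fits in a few lines. In this sketch a dict stands in for Redis or a database table; the function names are illustrative.

```python
import secrets

# Server-side session store in miniature. The cookie carries only
# an opaque ID; all real data stays on the server, so revocation
# is just a delete.
sessions = {}

def create_session(user_id: str) -> str:
    session_id = secrets.token_urlsafe(32)   # unguessable opaque ID
    sessions[session_id] = {"user_id": user_id, "cart": []}
    return session_id                        # sent in a Set-Cookie header

def get_session(session_id: str):
    return sessions.get(session_id)          # looked up on every request

def revoke_session(session_id: str) -> None:
    sessions.pop(session_id, None)           # immediate: next request fails

sid = create_session("user-42")
assert get_session(sid)["user_id"] == "user-42"
revoke_session(sid)
assert get_session(sid) is None              # logged out everywhere, instantly
```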
JWT (JSON Web Token)
With JWT, the server encodes the user's identity and claims into a signed token and sends it to the client. The client stores the token (typically in localStorage or an HTTP-only cookie) and includes it in the Authorization header on each request. The server verifies the token's signature without querying any external store.
JWTs are stateless: no server-side storage is needed. This makes horizontal scaling straightforward because any server instance can verify the token. The tradeoff is that you cannot revoke a JWT before it expires without maintaining a blocklist, which reintroduces server-side state. JWTs also tend to be larger than session cookies (a typical JWT is 800 to 2000 bytes).
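What "stateless verification" means can be shown by building an HS256-signed token from the standard library: the server checks a signature, not a store. This is a didactic sketch; a production system would use a maintained library such as PyJWT, and the secret here is illustrative only.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret-key"                  # illustrative only

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign(claims: dict) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = header + b"." + payload
    sig = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + sig).decode()

def verify(token: str) -> dict:
    header, payload, sig = token.encode().split(b".")
    expected = b64url(hmac.new(SECRET, header + b"." + payload, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    padded = payload + b"=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

token = sign({"sub": "user-42", "role": "admin"})
print(verify(token))
```

Any server holding `SECRET` can run `verify` without touching shared storage, which is why JWTs scale horizontally so easily, and also why revoking one early requires reintroducing state.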
Server-Side Sessions with External Store
A hybrid approach uses cookies to carry a session ID while storing session data in a fast external store like Redis or Memcached. This combines the security of server-side control (immediate revocation) with the scalability of a shared store. Most production web frameworks (Express with connect-redis, Django with Redis backend, Spring Session) support this pattern out of the box.
Session Strategy Comparison
| Feature | Cookie + Server Session | JWT | Cookie + External Store (Redis) |
|---|---|---|---|
| State location | Server memory or DB | Client (token) | External cache (Redis) |
| Scalability | Requires sticky sessions or shared store | Stateless, any server can verify | Scales with Redis cluster |
| Revocation | Immediate (delete session) | Difficult (wait for expiry or maintain blocklist) | Immediate (delete from Redis) |
| Payload size | Small (session ID only) | Large (800+ bytes) | Small (session ID only) |
| Cross-domain | Limited by cookie scope | Works across domains via headers | Limited by cookie scope |
| Best for | Simple single-server apps | APIs, microservices, mobile clients | Production web apps at scale |
Choosing the Right Combination
The serialization format and session strategy are independent choices, but they interact. An API that uses Protobuf for internal service communication might still use JSON for its public-facing endpoints and JWTs for authentication. A data pipeline might use Avro for Kafka messages while the web frontend that triggers the pipeline uses cookie-based sessions and JSON.
The principle is the same in both cases: match the tool to the constraint. Human-readable formats for debugging and external APIs. Binary formats for throughput-sensitive internal paths. Server-side sessions when you need revocation. JWTs when you need stateless verification across services.
Further Reading
- RFC 7519: JSON Web Token (JWT). The official specification for JWT structure, claims, and validation.
- Protocol Buffers Documentation. Google's official Protobuf developer guide with language tutorials and best practices.
- Apache Avro Documentation. Specification and getting-started guide for Avro serialization.
- Okta, "A Comparison of Cookies and Tokens for Secure Authentication". Practical comparison of cookie and token-based session strategies.
- Stytch, "JWTs vs. Sessions: Which Is Right for You?". Clear breakdown of when to use each session approach.
Assignment
An API serves 10,000 requests per second. The average JSON response payload is 2 KB. Switching from JSON to Protobuf would reduce payload size by 40%.
- Calculate the bandwidth saved per hour after switching to Protobuf. Show your work.
- If bandwidth costs $0.09 per GB (AWS data transfer pricing), how much would you save per month (30 days)?
- What non-bandwidth costs would you incur to make this switch? Think about developer time, tooling, debugging difficulty, and client compatibility.
Hint: 10,000 req/s * 2 KB * 0.40 savings = bandwidth saved per second. Convert to GB/hour.
The Language of Distributed Systems
Before you can design systems that scale, you need a shared vocabulary. These terms appear in every system design discussion, every architecture review, and every post-mortem. They are not abstract academic concepts. Each one describes a concrete property that either exists in your system or does not.
This session defines the core terms, gives you a practical example for each, and introduces the CAP theorem, which formalizes the tradeoffs between three of them.
Core Terms
| Term | Definition | One-Line Example |
|---|---|---|
| Scalability | The ability of a system to handle increased load by adding resources | Adding more web servers behind a load balancer to serve more users |
| Availability | The proportion of time a system is operational and accessible | 99.99% availability means less than 53 minutes of downtime per year |
| Consistency | All nodes in a distributed system return the same data at the same time | After updating your profile photo, every server shows the new photo immediately |
| Fault Tolerance | The ability to continue operating correctly when components fail | A database cluster continues serving reads when one replica crashes |
| SPOF (Single Point of Failure) | A component whose failure brings down the entire system | A single database server with no replicas: if it dies, everything stops |
| Partition Tolerance | The system continues to operate despite network splits between nodes | Two data centers lose connectivity but both keep serving requests |
Scalability: Vertical vs. Horizontal
Vertical scaling (scaling up) means adding more resources to a single machine: more CPU, more RAM, faster disks. Horizontal scaling (scaling out) means adding more machines to distribute the load.
Vertical scaling is simpler. You upgrade the server and your application code does not change. But every machine has a ceiling. You cannot add infinite RAM. You cannot buy a CPU with 10,000 cores. And while you are upgrading, the machine is typically offline.
Horizontal scaling has no theoretical ceiling, but it introduces complexity. Your application must handle multiple instances, shared state, load distribution, and network communication between nodes. Most production systems use a combination: scale vertically until it becomes cost-ineffective, then scale horizontally.
Availability: Measuring Uptime
Availability is expressed as a percentage, commonly referred to by the number of nines:
| Availability | Downtime per Year | Downtime per Month |
|---|---|---|
| 99% (two nines) | 3.65 days | 7.3 hours |
| 99.9% (three nines) | 8.76 hours | 43.8 minutes |
| 99.99% (four nines) | 52.6 minutes | 4.38 minutes |
| 99.999% (five nines) | 5.26 minutes | 26.3 seconds |
Each additional nine is exponentially harder and more expensive to achieve. Moving from 99.9% to 99.99% often requires redundant infrastructure across multiple availability zones, automated failover, and rigorous testing of failure scenarios. Most consumer web applications target three or four nines. Financial trading systems and emergency services aim for five.
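The downtime figures in the table follow from simple arithmetic, which is worth being able to do on demand in a design review. A minimal sketch:

```python
def downtime_per_year_minutes(availability_pct: float) -> float:
    """Minutes of allowed downtime per year at a given availability."""
    minutes_per_year = 365 * 24 * 60  # 525,600
    return minutes_per_year * (1 - availability_pct / 100)

# Four nines: downtime_per_year_minutes(99.99) is roughly 52.6 minutes,
# matching the table above.
```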
Consistency: Strong vs. Eventual
Strong consistency guarantees that after a write completes, every subsequent read returns the updated value. If you transfer $100 from account A to account B, strong consistency means no reader will ever see the money in both accounts or neither account. The system behaves as if there is a single copy of the data.
Eventual consistency allows replicas to diverge temporarily. After a write, some replicas may return stale data for a period of time. Eventually, all replicas converge to the same value. The "eventually" part can range from milliseconds to seconds, depending on the system.
Strong consistency is easier to reason about but harder to scale. It often requires coordination between nodes (locks, consensus protocols), which adds latency. Eventual consistency scales better because replicas can operate independently, but your application logic must handle stale reads gracefully.
Social media feeds use eventual consistency. If you post a photo and your friend sees it two seconds later, nobody notices. Banking transactions use strong consistency. If the balance is wrong even briefly, the consequences are severe.
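The stale-read window is easier to see in a toy model. The sketch below is not how any real database replicates; it simply makes the divergence observable: writes land on one replica immediately and reach the others only when `replicate()` runs (standing in for an anti-entropy or replication step).

```python
class Replica:
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy model of asynchronous replication across n replicas."""
    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []
    def write(self, key, value):
        self.replicas[0].data[key] = value  # acknowledged immediately
        self.pending.append((key, value))   # queued for the other replicas
    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)
    def replicate(self):
        """Propagate pending writes; after this, replicas converge."""
        for key, value in self.pending:
            for r in self.replicas[1:]:
                r.data[key] = value
        self.pending.clear()
```

A read routed to replica 1 between `write` and `replicate` returns the old value; this is the stale read your application logic must tolerate.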
Fault Tolerance and Single Points of Failure
A fault-tolerant system is designed to continue operating when things break. Hardware fails. Networks partition. Disks corrupt. Software crashes. The question is not whether failures happen but what the system does when they happen.
The first step in designing for fault tolerance is identifying every Single Point of Failure (SPOF). A SPOF is any component that, if it fails, takes the entire system down. Common SPOFs include:
- A single database server with no replicas
- A single load balancer with no failover
- A single DNS provider
- An application that depends on one external API with no fallback
The remedy for a SPOF is redundancy: run multiple instances, in multiple locations, with automatic failover. This does not eliminate failure. It reduces the probability that a single failure becomes a system-wide outage.
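The "reduces the probability" claim can be quantified. If each instance is down with probability p and failures are independent, the whole redundant group is down with probability p^N. A sketch, with the independence assumption stated explicitly because it is the part that breaks in practice:

```python
def outage_probability(instance_failure_prob: float, n_instances: int) -> float:
    """Probability that every redundant instance is down at once.

    Assumes independent failures -- a simplification, since correlated
    failures (shared power feed, same bad deploy, one availability zone)
    violate independence and dominate real outages.
    """
    return instance_failure_prob ** n_instances
```

With p = 0.01, going from one instance to three drops the outage probability from 1 in 100 to 1 in a million, which is why redundancy is the default remedy, and why correlated failure modes are what you hunt for next.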
The CAP Theorem
The CAP theorem, proposed by Eric Brewer in 2000 and formally proven by Seth Gilbert and Nancy Lynch in 2002, states that a distributed data store can guarantee at most two of three properties simultaneously: Consistency, Availability, and Partition Tolerance.
The three properties are:

- Consistency: all nodes see the same data
- Availability: every request gets a response
- Partition Tolerance: the system works despite network splits

Taking the properties two at a time gives three system classes: CP systems (MongoDB, HBase, Redis Cluster), AP systems (Cassandra, DynamoDB, CouchDB), and CA systems (a single-node RDBMS, not viable in distributed systems).
In practice, partition tolerance is not optional. Networks fail. Packets get lost. Data centers lose connectivity. Any distributed system must tolerate partitions. The real choice is between consistency and availability during a partition:
- CP systems (Consistency + Partition Tolerance): When a network partition occurs, the system refuses to serve requests that might return stale data. It sacrifices availability to maintain consistency. Example: MongoDB in its default configuration will reject writes to a minority partition.
- AP systems (Availability + Partition Tolerance): When a partition occurs, the system continues serving requests, but different nodes may return different data. It sacrifices consistency to remain available. Example: Cassandra continues accepting writes on both sides of a partition and reconciles later.
- CA systems (Consistency + Availability): This combination requires no partitions, which means a single node or a network that never fails. It does not exist in real distributed systems. A single PostgreSQL server is "CA" only because it is not distributed.
The CAP theorem does not say you must permanently give up consistency or availability. It says that during a network partition, you must choose which one to sacrifice. When the network is healthy, you can have both.
Putting It Together
These terms are not independent. They interact. A system that prioritizes strong consistency may sacrifice availability during partitions. A system designed for high availability may accept eventual consistency. Eliminating SPOFs improves fault tolerance, which improves availability. Horizontal scaling enables higher throughput but makes strong consistency harder to achieve.
Understanding these terms and their tradeoffs is the foundation of every design decision you will make in the rest of this course. When someone says "we need 99.99% availability," you should immediately think about what that costs in terms of consistency, complexity, and infrastructure.
Further Reading
- Seth Gilbert and Nancy Lynch, "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services" (2002). The formal proof of the CAP theorem.
- CAP Theorem, Wikipedia. Comprehensive overview with history, formal definition, and system classification.
- IBM, "What Is the CAP Theorem?". Accessible explanation with real-world database examples.
- Martin Kleppmann, "Please Stop Calling Databases CP or AP" (2015). An important critique of oversimplified CAP classifications.
Assignment
Without looking at the session content, define each of the following terms in your own words. One or two sentences each.
- Scalability
- Availability
- Consistency
- Fault Tolerance
- Single Point of Failure
- Partition Tolerance
After writing your definitions, compare them to the table at the top of this session. Where did your understanding differ? Which term was hardest to define precisely?
Bonus: Pick a service you use daily (Gmail, Spotify, Grab). Based on its behavior, would you classify it as a CP or AP system? What evidence supports your classification?
Origin and Purpose
The Twelve-Factor App methodology was written by Adam Wiggins, co-founder of Heroku, and published in 2011. It emerged from observing hundreds of applications deployed on Heroku's platform and distilling the patterns that separated applications that scaled cleanly from those that broke under pressure.
The methodology is not tied to any language, framework, or cloud provider. It describes twelve principles for building software-as-a-service applications that are portable, resilient, and deployable on modern cloud platforms. Fifteen years later, these principles remain the baseline for cloud-native application design.
The original document lives at 12factor.net and is worth reading in full. This session summarizes each factor, explains why it matters, and identifies the most common way teams violate it.
The Twelve Factors
| # | Factor | Principle | Common Violation |
|---|---|---|---|
| 1 | Codebase | One codebase tracked in version control, many deploys | Separate repos for staging and production with copy-pasted code |
| 2 | Dependencies | Explicitly declare and isolate dependencies | Relying on system-level packages that are not in the dependency manifest |
| 3 | Config | Store configuration in the environment | Hardcoding database URLs, API keys, or feature flags in source code |
| 4 | Backing Services | Treat backing services as attached resources | Assuming the database is on localhost and will always be there |
| 5 | Build, Release, Run | Strictly separate build and run stages | SSHing into production to edit code or apply patches directly |
| 6 | Processes | Execute the app as one or more stateless processes | Storing user sessions in local memory instead of an external store |
| 7 | Port Binding | Export services via port binding | Requiring an external web server (Apache, IIS) to be pre-installed |
| 8 | Concurrency | Scale out via the process model | Running everything in a single monolithic process with threads only |
| 9 | Disposability | Maximize robustness with fast startup and graceful shutdown | Processes that take minutes to start or lose in-flight work on shutdown |
| 10 | Dev/Prod Parity | Keep development, staging, and production as similar as possible | Using SQLite in development but PostgreSQL in production |
| 11 | Logs | Treat logs as event streams | Writing logs to local files on disk instead of stdout |
| 12 | Admin Processes | Run admin/management tasks as one-off processes | Running database migrations by manually connecting to production |
Deep Dive: The Factors That Trip People Up
Factor 3: Config
Configuration is everything that varies between deploys: database credentials, API keys, feature flags, third-party service URLs. The twelve-factor app stores these in environment variables, not in code.
This sounds obvious, but violations are everywhere. A config.py file with DATABASE_URL = "postgres://prod-server:5432/mydb" committed to the repo. A .env file checked into version control. An application that reads from a YAML file baked into the Docker image.
The test is simple: could you open-source the codebase right now without exposing any credentials or environment-specific values? If not, your config is not properly externalized.
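A minimal sketch of externalized config in Python, assuming nothing beyond the standard library. The variable names (`DATABASE_URL`, `DEBUG`) are conventional examples; the point is that required values fail loudly at startup rather than silently falling back to a baked-in default.

```python
import os

def require_env(name: str) -> str:
    """Fail fast at startup if a required variable is missing."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value

def load_config() -> dict:
    return {
        # Required: no hardcoded fallback, so nothing leaks into the repo.
        "database_url": require_env("DATABASE_URL"),
        # Optional settings may carry safe defaults.
        "debug": os.environ.get("DEBUG", "false").lower() == "true",
    }
```

This codebase passes the open-source test: there is nothing in it to redact, because every deploy-specific value arrives from the environment.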
Factor 6: Processes
Twelve-factor processes are stateless and share-nothing. Any data that needs to persist must be stored in a backing service (database, cache, object store). This means no sticky sessions, no in-memory caches that cannot be lost, and no local file storage that other processes need to read.
This factor is what makes horizontal scaling possible. If each process is stateless, you can add or remove instances at will. A load balancer can send any request to any instance. If an instance crashes, no data is lost because there was no data on that instance to begin with.
The most common violation is storing session data in process memory. It works fine with a single server. The moment you add a second server behind a load balancer, users lose their sessions when requests hit a different instance.
Factor 9: Disposability
Processes should start fast and shut down gracefully. Fast startup means new instances can be spun up quickly in response to load. Graceful shutdown means the process finishes in-flight requests, releases resources, and exits cleanly when it receives a SIGTERM signal.
This factor matters because cloud platforms routinely start and stop instances. Auto-scaling groups add and remove instances based on load. Kubernetes reschedules pods across nodes. Spot instances can be terminated with 30 seconds notice. If your process takes five minutes to start or drops connections on shutdown, these operations cause user-facing errors.
Factor 10: Dev/Prod Parity
The gap between development and production environments should be as small as possible. This means the same operating system, the same database engine (not just the same type), the same message queue, and the same cache. Docker and containerization have made this dramatically easier. You define your stack once in a Dockerfile and docker-compose.yml, and every developer runs the same environment.
The classic violation is using in-memory substitutes during development. H2 instead of PostgreSQL. A local directory instead of S3. A synchronous function call instead of a message queue. These substitutions hide bugs that only appear in production, where the real services behave differently.
How the Factors Connect
The twelve factors are not independent. They reinforce each other. Stateless processes (Factor 6) only work if config is externalized (Factor 3) and backing services are treated as attachable resources (Factor 4). Fast startup (Factor 9) requires that dependencies are explicitly declared and isolated (Factor 2) so that the environment can be set up predictably. Dev/prod parity (Factor 10) is easier when config lives in environment variables (Factor 3) rather than in environment-specific files.
When teams violate one factor, the violations tend to cascade. If config is hardcoded, dev/prod parity breaks. If processes are stateful, horizontal scaling fails. If the build and run stages are not separated, you end up patching production directly, which violates disposability because you cannot recreate the environment from scratch.
Twelve-Factor in 2026
The original methodology was written before Docker (2013), Kubernetes (2014), and the serverless movement (2015+). Some factors, like port binding, feel obvious now because modern frameworks default to self-contained HTTP servers. Others, like treating logs as event streams, are baked into platform expectations (CloudWatch, Datadog, and ELK all assume log streams, not log files).
The methodology was open-sourced to evolve with the community. New considerations, such as health check endpoints, circuit breakers, and observability, extend the original twelve factors but do not replace them. The foundation remains solid.
Further Reading
- Adam Wiggins, The Twelve-Factor App. The original reference. Read each factor page; they are short and precise.
- Twelve-Factor App Methodology, Wikipedia. Background, history, and adoption context.
- IBM Developer, "Creating Cloud-Native Applications: 12-Factor Applications". Practical application of all twelve factors in a Java context.
- Pradeep Loganathan, "12 Factor App: The Complete Guide to Building Cloud-Native Applications". Comprehensive walkthrough with modern examples.
Assignment
Pick any application you have built or worked on. It can be a side project, a work codebase, or even a tutorial project. Score it on each of the twelve factors using this scale:
- 0 = Factor is violated (e.g., config is hardcoded, logs go to local files)
- 1 = Partially followed (e.g., most config is externalized but some secrets are in code)
- 2 = Fully followed
Create a table with columns: Factor, Score (0-2), Evidence (one sentence explaining your score).
- What is your total score out of 24?
- Which factor has the lowest score? What would it take to fix it?
- Which factor was the hardest to evaluate? Why?
Most applications score between 10 and 16 on their first assessment. A perfect 24 is rare. The point is not to achieve perfection but to identify where the gaps are and what risks they create.