Module 1: Architectural Foundations & Core Concepts
Systems Thinking × System Design · 10 sessions · ~50 min read
The Foundation of Networked Computing
Nearly every application you use today follows the same structural pattern: one process asks for something, another process provides it. The asker is the client. The provider is the server. This separation of roles is the client-server model, and it has shaped how we build software since the 1960s.
Your browser is a client. When you type a URL and press Enter, it sends a request to a server. The server processes that request, retrieves or computes the appropriate data, and sends back a response. That exchange, from request to response, is the fundamental unit of interaction in networked systems.
Client-Server Model: An architecture in which a client initiates requests and a server fulfills them. The client and server are separate processes, often running on separate machines, communicating over a network using a defined protocol.
The Request-Response Cycle
Every HTTP interaction follows a predictable sequence. The client opens a connection to the server, sends a request containing a method (GET, POST, PUT, DELETE, etc.), a target resource (the URL path), headers (metadata about the request), and optionally a body (the data payload). The server reads the request, performs the work, and sends back a response containing a status code (200 OK, 404 Not Found, 500 Internal Server Error, etc.), headers, and optionally a body.
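To make the message anatomy concrete, here is a minimal sketch that assembles the raw bytes of an HTTP/1.1 request from the parts named above (method, target, headers, body). The host and payload are hypothetical; a real client would also open a TCP connection and send these bytes.

```python
# Build the raw bytes of an HTTP/1.1 request: request line, headers,
# a blank line, then the optional body. Host and body are hypothetical.

def build_request(method: str, path: str, host: str, body: str = "") -> bytes:
    lines = [
        f"{method} {path} HTTP/1.1",            # method + target resource
        f"Host: {host}",                        # required header in HTTP/1.1
        f"Content-Length: {len(body.encode())}",
        "Connection: close",
    ]
    # Headers end with a blank line; whatever follows is the body.
    return ("\r\n".join(lines) + "\r\n\r\n" + body).encode()

raw = build_request("POST", "/users", "api.example.com", body='{"name":"Alice"}')
print(raw.decode())
```

The same structure applies in reverse to the response: a status line, headers, a blank line, and an optional body.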
This cycle repeats for every interaction. Loading a single web page might trigger dozens of request-response cycles: one for the HTML document, several for CSS and JavaScript files, more for images, and additional ones for API calls that fetch dynamic data.
The key property of HTTP, as defined in RFC 7230, is that each request-response pair is an independent transaction. The server does not inherently remember anything about previous requests from the same client. This is what we mean by statelessness.
Stateless vs. Stateful Interactions
HTTP is a stateless protocol by design. Each request carries all the information the server needs to fulfill it. The server does not retain any memory of previous interactions between requests. This simplifies server implementation enormously: any server in a cluster can handle any request, because no request depends on what happened before.
But users expect continuity. When you log in to an application, you expect to stay logged in as you navigate between pages. When you add items to a shopping cart, you expect them to still be there when you visit the checkout page. These expectations require state, meaning data that persists across multiple request-response cycles.
This creates a tension. The protocol is stateless, but the application needs to be stateful. Resolving this tension is one of the core challenges in web application architecture.
| Dimension | Stateless | Stateful |
|---|---|---|
| Server memory | No request context retained between calls | Server tracks client context across requests |
| Scalability | Easy. Any server can handle any request. | Harder. Client must reach the same server, or state must be shared. |
| Reliability | Server crash loses nothing. Client retries freely. | Server crash may lose session data. |
| Complexity | Simpler server logic | Requires session storage, replication, or sticky routing |
| Example | DNS lookup, static file serving, REST API call | Shopping cart, user login session, WebSocket connection |
| Bandwidth | May be higher (client resends context each time) | Lower per-request (context stored on server) |
Session Management Strategies
When an application needs state on top of a stateless protocol, it must use a session management strategy. There are several approaches, each with distinct trade-offs.
1. Server-Side Sessions with Cookies
The most traditional approach. When a user logs in, the server creates a session object in memory (or in a database) and assigns it a unique session ID. This ID is sent to the client as a cookie via the Set-Cookie header, as specified in RFC 6265. On every subsequent request, the browser automatically includes this cookie. The server looks up the session ID, retrieves the stored state, and processes the request with full context.
The advantage is simplicity and security: the actual session data never leaves the server. The disadvantage is that the server must store and manage session data. In a multi-server deployment, either all servers must share a session store (e.g., Redis), or the load balancer must route the same client to the same server (sticky sessions).
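A minimal sketch of this flow, assuming an in-memory dictionary as the session store (a real deployment would use Redis or a database, as noted above). The cookie attributes shown are standard, but the usernames and store layout are illustrative only.

```python
import secrets

# In-memory session store keyed by opaque session IDs.
# Sketch only: a multi-server deployment would back this with Redis.
SESSIONS: dict[str, dict] = {}

def log_in(username: str) -> str:
    """Create a session and return the Set-Cookie header value."""
    session_id = secrets.token_urlsafe(32)        # unguessable opaque ID
    SESSIONS[session_id] = {"user": username}     # state stays on the server
    return f"session_id={session_id}; HttpOnly; Secure; SameSite=Lax"

def handle_request(cookie_header: str) -> str:
    """Look up the session named by the Cookie header the browser sent back."""
    session_id = cookie_header.removeprefix("session_id=")
    session = SESSIONS.get(session_id)
    return session["user"] if session else "anonymous"

set_cookie = log_in("alice")
cookie = set_cookie.split(";")[0]                 # the browser echoes this part back
print(handle_request(cookie))                     # -> alice
```

Note that the client only ever holds the opaque ID; the actual state lives in `SESSIONS`, which is exactly why every server needs access to that store.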
2. Token-Based Authentication (JWT)
JSON Web Tokens take a different approach. Instead of storing state on the server, the server encodes the session data (user ID, permissions, expiration time) into a signed token and hands it to the client. The client includes this token in the Authorization header of every request. The server validates the token's signature and extracts the session data without touching any session store.
This approach scales well because any server can validate the token independently. But it introduces new challenges: tokens cannot be easily revoked once issued, and they increase the size of every request. If a token is stolen, the attacker has access until it expires.
3. Client-Side Storage
For non-sensitive state, applications can store data directly in the client using localStorage, sessionStorage, or cookies. Shopping cart contents, user preferences, and UI state are common candidates. This eliminates server-side storage entirely for that data, but the client cannot be trusted. Any data stored client-side can be inspected and modified by the user.
4. Hybrid Approaches
Most production systems combine strategies. A JWT handles authentication (who you are), server-side sessions handle authorization context (what you can do right now), and client-side storage handles UI preferences. The choice depends on security requirements, scalability needs, and how much trust you place in the client.
Systems Thinking Lens
The client-server model is a system with feedback loops. Server load affects response time. Response time affects user behavior. User behavior affects request volume, which feeds back into server load. A slow server causes users to retry, which increases load, which makes the server slower. This is a reinforcing feedback loop, the same concept from Session 0.4.
Session management adds another dimension. Stateful servers introduce coupling between the client and a specific server instance. This coupling constrains how you scale, how you handle failures, and how you deploy updates. Every architectural decision in this space creates downstream constraints on the rest of the system.
Understanding these trade-offs is not about memorizing which approach is "best." It is about recognizing that each choice shifts the balance of complexity, security, and scalability in predictable ways.
Further Reading
- RFC 7230: HTTP/1.1 Message Syntax and Routing. The specification that defines HTTP's request-response model and its stateless nature.
- RFC 6265: HTTP State Management Mechanism. The specification for cookies, the primary mechanism for adding state to HTTP.
- MDN: Using HTTP Cookies. A practical guide to how cookies work in browsers, including security attributes.
- Wikipedia: Client-Server Model. Historical context and variations of the architecture.
Assignment
Open your browser's developer tools (F12 or Ctrl+Shift+I) and go to the Network tab. Navigate to any website you use regularly. Watch the requests appear.
- Identify 3 different requests in the list. For each one, note the method (GET/POST), the URL, and the status code.
- Click on each request and look at the Request Headers. Does the request carry a Cookie header? Does it carry an Authorization header? If it carries neither, it is a stateless request. If it carries either, it is transporting session state.
- For requests that carry session state, determine the strategy: is it a session cookie (short opaque string) or a JWT (long Base64-encoded string with two dots)?

Write down your findings. You have just traced how a real application manages the tension between a stateless protocol and stateful user experience.
The Network Stack
Before a client can talk to a server, several layers of networking infrastructure must cooperate. Every request you send travels through a stack of protocols, each handling a different concern: addressing, routing, reliability, and application logic. Understanding these layers is essential for diagnosing performance issues and making informed architectural decisions.
This session covers the protocols and components that make networked communication possible: IP, DNS, TCP, UDP, HTTP/HTTPS, WebSocket, and proxies.
IP: Addressing and Routing
The Internet Protocol (IP) is responsible for one thing: getting a packet from one machine to another. Every device on a network has an IP address. IPv4 addresses are 32-bit numbers written as four octets (e.g., 192.168.1.1), giving roughly 4.3 billion possible addresses. IPv6 addresses are 128-bit, written in hexadecimal groups (e.g., 2001:0db8::1), providing a vastly larger address space.
IP is a best-effort protocol. It routes packets toward their destination, but it does not guarantee delivery, ordering, or integrity. Those guarantees are the job of the transport layer above it.
DNS: Translating Names to Addresses
Humans use domain names (google.com, hibranwar.com). Machines use IP addresses. The Domain Name System, defined in RFC 1035, bridges that gap. DNS is a distributed, hierarchical database that maps domain names to IP addresses.
When you type a URL into your browser, a DNS resolution process begins. It follows a chain of servers, each responsible for a different level of the domain hierarchy.
The recursive resolver (often provided by your ISP or a service like Cloudflare's 1.1.1.1) does the heavy lifting. It contacts root servers, TLD servers, and authoritative servers on your behalf. Results are cached at multiple levels, so most lookups resolve quickly from cache rather than traversing the full chain.
DNS Caching: Every DNS response includes a Time-To-Live (TTL) value. Resolvers, operating systems, and browsers all cache results for the TTL duration. This reduces latency and load on DNS servers, but means changes to DNS records take time to propagate.
TCP vs. UDP: The Transport Layer
Once DNS resolves the destination IP, the transport layer takes over. Two protocols dominate this layer: TCP and UDP. They represent fundamentally different trade-offs between reliability and speed.
TCP (RFC 793) provides a reliable, ordered, error-checked byte stream. Before any data flows, TCP establishes a connection using a three-way handshake (SYN, SYN-ACK, ACK). It guarantees that every byte arrives, in order, with no corruption. If a packet is lost, TCP retransmits it. This reliability comes at a cost: latency from the handshake, overhead from acknowledgments, and reduced throughput from congestion control.
UDP (RFC 768) is the opposite. No connection setup. No delivery guarantee. No ordering. No retransmission. A UDP packet is sent and forgotten. The entire UDP header is only 8 bytes, compared to TCP's minimum of 20 bytes. What you lose in reliability, you gain in speed and simplicity.
| Property | TCP | UDP |
|---|---|---|
| Connection | Connection-oriented (three-way handshake) | Connectionless |
| Reliability | Guaranteed delivery, retransmission on loss | No delivery guarantee |
| Ordering | Bytes arrive in order | No ordering guarantee |
| Header size | 20+ bytes | 8 bytes |
| Latency | Higher (handshake + acknowledgments) | Lower (fire and forget) |
| Flow control | Yes (sliding window) | No |
| Use cases | Web browsing, email, file transfer, database queries | Video streaming, voice calls, DNS lookups, online gaming |
The choice between TCP and UDP depends on whether your application can tolerate packet loss. A bank transfer cannot lose data, so it uses TCP. A video call can tolerate a dropped frame (you will not even notice), so it uses UDP. Waiting for retransmission in a live video call would cause stuttering, which is worse than losing a single frame.
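The 8-byte figure from the table is worth seeing directly. Per RFC 768, a UDP header is exactly four 16-bit fields; the sketch below packs one with `struct` (the ports and payload are hypothetical, and the checksum is left as zero for simplicity).

```python
import struct

# The entire UDP header (RFC 768) is four 16-bit fields: source port,
# destination port, length (header + payload), and checksum.
def udp_header(src_port: int, dst_port: int, payload: bytes, checksum: int = 0) -> bytes:
    length = 8 + len(payload)                  # length field covers header + data
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)

header = udp_header(53000, 53, b"dns-query-bytes")
print(len(header))                             # -> 8: all the overhead UDP ever adds
```

TCP's header, by contrast, starts at 20 bytes and must also carry sequence numbers, acknowledgment numbers, and window sizes to deliver its guarantees.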
HTTP and HTTPS
HTTP (Hypertext Transfer Protocol) is the application-layer protocol that powers the web. It runs on top of TCP (or, in HTTP/3, on top of QUIC, which runs on UDP). HTTP defines how clients and servers structure their messages: request methods, headers, status codes, and bodies.
HTTPS is HTTP with TLS (Transport Layer Security) encryption. The client and server perform a TLS handshake to establish an encrypted channel before any HTTP data flows. This protects data from eavesdropping and tampering in transit. Every production web application should use HTTPS.
WebSocket
HTTP follows a strict request-response pattern: the client asks, the server answers. But some applications need the server to push data to the client without being asked. Chat applications, live dashboards, multiplayer games, and real-time collaboration tools all require this capability.
WebSocket solves this. It begins as a standard HTTP request with an Upgrade header. If the server agrees, the connection is upgraded to a persistent, full-duplex channel. Both client and server can send messages at any time, in either direction, without the overhead of establishing new connections.
WebSocket: A protocol that provides full-duplex communication over a single, long-lived TCP connection. It begins with an HTTP handshake and then upgrades to a persistent channel where both parties can send messages independently.
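The handshake itself is small enough to sketch. Per RFC 6455, the server proves it understood the upgrade request by hashing the client's Sec-WebSocket-Key together with a fixed GUID and echoing the result in Sec-WebSocket-Accept. The example key below is the one from the RFC itself.

```python
import base64, hashlib

# RFC 6455 handshake: the server concatenates the client's key with this
# fixed GUID, SHA-1 hashes it, and returns the Base64 digest.
GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def accept_key(client_key: str) -> str:
    digest = hashlib.sha1((client_key + GUID).encode()).digest()
    return base64.b64encode(digest).decode()

# Sample key taken from RFC 6455's own handshake example:
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # -> s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Once the client verifies this value in the 101 Switching Protocols response, both sides stop speaking HTTP and the persistent channel is open.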
Forward and Reverse Proxies
A proxy is an intermediary that sits between a client and a server, forwarding requests and responses. There are two types, and they serve very different purposes.
A forward proxy sits in front of clients. The client sends its request to the proxy, and the proxy forwards it to the destination server. The server sees the proxy's IP address, not the client's. Forward proxies are used for privacy, access control (blocking certain websites), and caching. Corporate networks often route all employee traffic through a forward proxy.
A reverse proxy sits in front of servers. The client sends its request to the proxy, thinking it is the actual server. The proxy forwards the request to one of several backend servers. Reverse proxies are used for load balancing, SSL termination, caching, and security (hiding the true server infrastructure). Nginx, HAProxy, and Cloudflare are common reverse proxies.
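The core load-balancing decision a reverse proxy makes can be reduced to a few lines. This is a toy round-robin picker, not a real proxy (no actual networking, and the backend addresses are hypothetical), but it shows the shape of the logic Nginx or HAProxy applies per request.

```python
import itertools

# Clients see one address; each incoming request is forwarded to the
# next backend in rotation. Backend IPs below are hypothetical.
class RoundRobinProxy:
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)   # endless rotation over backends

    def forward(self, request: str) -> str:
        backend = next(self._cycle)               # pick the next server in turn
        return f"{backend} handles {request}"

proxy = RoundRobinProxy(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
for req in ["GET /a", "GET /b", "GET /c", "GET /d"]:
    print(proxy.forward(req))                     # fourth request wraps to 10.0.0.1
```

Real proxies layer more on this junction point, such as health checks that skip dead backends and weighted rotations for uneven hardware, but the routing decision stays this simple at heart.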
In systems thinking terms, a reverse proxy is a leverage point. It sits at a junction where many connections converge, making it an ideal place to implement cross-cutting concerns: rate limiting, authentication, logging, compression, and caching. Changing behavior at the proxy affects every request flowing through the system without touching any backend server.
Further Reading
- RFC 1035: Domain Names, Implementation and Specification. The foundational specification for the Domain Name System.
- RFC 793: Transmission Control Protocol. The original TCP specification.
- RFC 768: User Datagram Protocol. The UDP specification. Three pages. The shortest RFC you will ever read.
- Wikipedia: Domain Name System. A thorough overview of DNS architecture, record types, and resolution process.
- MDN: WebSockets API. Practical documentation on using WebSocket in web applications.
Assignment
Answer this question in 3 sentences:
Why does a video call use UDP but a bank transfer uses TCP?
Your answer should reference at least two specific properties from the TCP vs. UDP comparison table (e.g., reliability, ordering, latency). Think about what happens in each scenario when a packet is lost. Which is worse: a brief glitch in video, or a missing digit in a transaction amount?
Two Approaches to API Communication
Once a client and server can communicate over HTTP (Sessions 1.1 and 1.2), the next question is: how should they structure that communication? What format should the request take? How should the server expose its data and operations?
This is the domain of API protocols. Two dominate modern system design: REST and GraphQL. They solve the same fundamental problem, allowing a client to read and write data on a server, but they make very different design choices about how to do it.
REST: Representational State Transfer
REST is not a protocol. It is an architectural style, defined by Roy Fielding in his 2000 doctoral dissertation. Fielding was one of the principal authors of the HTTP specification, and REST describes the design principles that made HTTP successful.
REST is built on six constraints. Four of them form the uniform interface, which is the defining feature of REST:
REST Uniform Interface: (1) Resources are identified by URIs. (2) Resources are manipulated through representations (JSON, XML, HTML). (3) Messages are self-descriptive, containing all information needed to process them. (4) Hypermedia drives application state (HATEOAS), meaning the server's responses include links to related resources and available actions.
In practice, most REST APIs follow a predictable pattern. Resources map to URL paths. HTTP methods map to operations.
| HTTP Method | Operation | Example | Idempotent? |
|---|---|---|---|
| GET | Read | GET /users/42 | Yes |
| POST | Create | POST /users | No |
| PUT | Replace | PUT /users/42 | Yes |
| PATCH | Partial update | PATCH /users/42 | No (by convention) |
| DELETE | Delete | DELETE /users/42 | Yes |
REST inherits HTTP's statelessness. Each request must contain all the information the server needs. No session context is assumed. This makes REST APIs easy to cache (GET responses can be cached by any HTTP cache), easy to scale (any server can handle any request), and easy to reason about (each endpoint has a clear, predictable behavior).
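The "Idempotent?" column in the table above has a concrete meaning: retrying an idempotent request leaves the system in the same state as sending it once. A minimal sketch with an in-memory store (the user records are hypothetical):

```python
# Idempotency in practice: repeating a PUT leaves the store unchanged,
# while repeating a POST creates a new resource each time.
users: dict[int, dict] = {}
next_id = 1

def post_user(data: dict) -> int:
    """POST /users: always creates a new resource with a fresh ID."""
    global next_id
    users[next_id] = data
    next_id += 1
    return next_id - 1

def put_user(user_id: int, data: dict) -> None:
    """PUT /users/{id}: full replacement; safe to retry on a timeout."""
    users[user_id] = data

post_user({"name": "Alice"})
post_user({"name": "Alice"})          # duplicate POST: two resources now exist
put_user(1, {"name": "Alicia"})
put_user(1, {"name": "Alicia"})       # duplicate PUT: state is identical
print(len(users))                     # -> 2
```

This is why clients can blindly retry a failed PUT or DELETE, but retrying a POST risks duplicates unless the API adds its own idempotency keys.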
The Over-Fetching and Under-Fetching Problem
REST's simplicity comes with a cost. Each endpoint returns a fixed data structure. If you request GET /users/42, you get the entire user object: name, email, avatar, bio, preferences, creation date, and every other field. If you only needed the name and avatar, you still receive everything else. This is over-fetching.
Conversely, if you need a user's profile along with their 5 most recent posts and each post's comment count, you might need three separate requests: one for the user, one for the posts, one for the comment counts. This is under-fetching, and it means multiple round trips, each adding latency.
On a desktop with a fast connection, this is tolerable. On a mobile device over a cellular network, where each round trip adds 100-300ms of latency, it becomes a real performance problem.
GraphQL: Query What You Need
GraphQL was developed at Facebook in 2012 and open-sourced in 2015. The GraphQL specification describes it as a query language for APIs and a runtime for executing those queries against your data.
The core idea: the client specifies exactly what data it needs, and the server returns exactly that. No more, no less.
A GraphQL API exposes a single endpoint (typically POST /graphql). Instead of choosing between many endpoints, the client sends a query that describes the shape of the data it wants:
```graphql
# GraphQL query
{
  user(id: 42) {
    name
    avatar
    posts(limit: 5) {
      title
      commentCount
    }
  }
}
```
This single request replaces the three REST calls from the previous example. The server returns a JSON response that mirrors the query structure exactly:
```json
{
  "data": {
    "user": {
      "name": "Alice",
      "avatar": "/images/alice.jpg",
      "posts": [
        { "title": "On Feedback Loops", "commentCount": 12 },
        { "title": "Scaling Lessons", "commentCount": 7 }
      ]
    }
  }
}
```
GraphQL: A query language for APIs where the client defines the structure of the response. Uses a strongly-typed schema to describe available data. All requests go to a single endpoint. Eliminates over-fetching and under-fetching by letting clients request exactly the fields they need.
REST vs. GraphQL: A Comparison
| Dimension | REST | GraphQL |
|---|---|---|
| Endpoints | Multiple (one per resource) | Single (/graphql) |
| Data fetching | Server decides what to return | Client decides what to return |
| Over-fetching | Common (fixed response shapes) | Eliminated (client specifies fields) |
| Under-fetching | Common (multiple round trips) | Eliminated (nested queries in one request) |
| Caching | Straightforward (HTTP caching by URL) | Complex (all requests hit same URL with POST) |
| Versioning | Typically URL-based (/v1/, /v2/) | Schema evolution (add fields, deprecate old ones) |
| Type system | None built-in (OpenAPI/Swagger is optional) | Strongly typed schema (required) |
| Error handling | HTTP status codes | Typically 200; errors in response body |
| Learning curve | Low (uses standard HTTP conventions) | Higher (new query language, schema design) |
| Best suited for | Simple CRUD, public APIs, resource-oriented services | Complex data relationships, mobile clients, varied consumer needs |
When REST Shines
REST is the better choice when your data model is simple and resource-oriented. If your API mostly serves CRUD operations (create, read, update, delete) on well-defined entities, REST's predictable URL structure and HTTP caching are hard to beat. Public APIs favor REST because it is universally understood, requires no specialized client libraries, and works with any HTTP client.
REST also excels when caching matters. Because each resource has its own URL, HTTP caches (browsers, CDNs, reverse proxies) can cache responses efficiently. A GET /users/42 response can be cached and reused by any client requesting the same resource.
When GraphQL Shines
GraphQL is the better choice when clients have diverse data needs. A mobile app might need a compact subset of user data. A desktop dashboard might need the full object with related entities. A third-party integration might need a completely different combination. With REST, you either build custom endpoints for each consumer or force everyone to over-fetch.
GraphQL also wins when the data model is deeply relational. If fetching a page requires combining data from users, posts, comments, reactions, and notifications, a single GraphQL query can traverse those relationships in one round trip. The equivalent REST implementation would require either multiple sequential requests or a custom aggregate endpoint.
The Trade-Off in Systems Terms
From a systems thinking perspective, REST and GraphQL shift complexity to different parts of the system. REST puts complexity on the client (which must orchestrate multiple requests and handle over-fetched data). GraphQL puts complexity on the server (which must resolve arbitrary query shapes and protect against expensive queries). Neither eliminates complexity. They relocate it.
This is a recurring theme in system design: there is no free lunch. Every architectural choice is a transfer of burden from one component to another. The skill is in choosing which component is best equipped to handle that burden given your specific constraints.
Further Reading
- Roy Fielding, Chapter 5: Representational State Transfer (REST), from Architectural Styles and the Design of Network-based Software Architectures (2000). The original definition of REST.
- GraphQL Specification. The official, normative specification for the GraphQL query language and execution semantics.
- GraphQL: Learn. The official getting-started guide with interactive examples.
- REST API Tutorial. A comprehensive guide to REST constraints, best practices, and common patterns.
- Wikipedia: REST. Historical context, Fielding's constraints, and the evolution of REST in practice.
Assignment
You are building a mobile app for a social platform. The user profile screen shows:
- User name and avatar
- Bio (optional, only 40% of users have one)
- Follower count
- 5 most recent posts (title and timestamp only)
Answer these questions:
- If you use a REST API, how many requests would the client need to make? What data would be over-fetched?
- If you use GraphQL, write the query (or describe its structure) that would fetch this screen's data in a single request.
- Which protocol would reduce bandwidth usage for this mobile client, and why?
Consider: the mobile app is used on cellular networks where every kilobyte and every round trip counts. Your choice should reflect that constraint.
Beyond REST: Why Other Protocols Exist
Session 1.3 covered REST and GraphQL, both of which operate over HTTP/1.1 or HTTP/2 using text-based formats like JSON. They work well for many scenarios, but they carry overhead that becomes painful at scale or under specific constraints. Two protocols address these gaps directly: gRPC for high-performance service-to-service communication, and WebSocket for persistent, bidirectional real-time connections.
Understanding when to reach for each one is a core system design skill. The wrong protocol choice can introduce unnecessary latency, complexity, or resource consumption that compounds as the system grows.
gRPC: Binary, Typed, and Fast
gRPC is an open-source remote procedure call framework originally developed at Google. It uses HTTP/2 as its transport layer and Protocol Buffers (protobuf) as its serialization format. Both choices are deliberate.
Protocol Buffers are a language-neutral, platform-neutral mechanism for serializing structured data. You define your data schema in a .proto file, and the protobuf compiler generates strongly typed code in your target language. The binary encoding is significantly smaller and faster to parse than JSON.
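Real protobuf requires the schema compiler, so as a stand-in, the sketch below packs the same record with a fixed binary layout via `struct` to show why binary encodings are smaller than JSON. The record fields are hypothetical, and actual protobuf wire format differs (it uses varints and field tags), but the size gap is the point.

```python
import json, struct

# One record, two encodings. Real protobuf generates this packing code
# from a .proto schema; struct is used here only to illustrate the idea.
record = {"user_id": 42, "score": 9000, "active": True}

as_json = json.dumps(record).encode()
# Fixed layout: two unsigned 32-bit ints and one byte, network byte order.
as_binary = struct.pack("!IIB", record["user_id"], record["score"], record["active"])

print(len(as_json), len(as_binary))   # the binary form is a fraction of the size
```

JSON spends bytes repeating field names in every message; a schema-driven binary format pays that cost once, at compile time, which is exactly the trade gRPC makes.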
HTTP/2 brings multiplexing (multiple requests over a single TCP connection), header compression, and binary framing. gRPC exploits all of these. A single connection between two services can carry hundreds of concurrent RPCs without the application-level head-of-line blocking that plagues HTTP/1.1.
The Four gRPC Communication Patterns
gRPC supports four interaction modes, each suited to different scenarios:
| Pattern | Description | Use Case |
|---|---|---|
| Unary | Client sends one request, server sends one response | Standard API call (fetch user profile) |
| Server streaming | Client sends one request, server sends a stream of responses | Downloading large datasets, log tailing |
| Client streaming | Client sends a stream of messages, server sends one response | Uploading telemetry data in batches |
| Bidirectional streaming | Both sides send streams of messages independently | Real-time collaboration between services |
The strongly typed contract means both sides agree on the exact shape of every message at compile time. No runtime surprises from a missing field or a string where you expected an integer. This is a significant advantage in large systems where dozens of teams own different services.
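The four modes in the table map naturally onto plain functions and generators, which is roughly how generated gRPC stubs feel to use in Python. This is a simulation with hypothetical message types, not actual gRPC code (which would be generated from a .proto file).

```python
from typing import Iterator

# The four gRPC interaction modes, mimicked with functions and generators.
# Service and message names below are hypothetical.

def unary(request: str) -> str:                        # one in, one out
    return f"profile for {request}"

def server_streaming(request: str) -> Iterator[str]:   # one in, many out
    for chunk in range(3):
        yield f"{request} chunk {chunk}"               # e.g. log tailing

def client_streaming(requests: Iterator[str]) -> str:  # many in, one out
    return f"stored {sum(1 for _ in requests)} telemetry batches"

def bidirectional(requests: Iterator[str]) -> Iterator[str]:  # streams both ways
    for msg in requests:
        yield f"ack: {msg}"                            # e.g. real-time collaboration

print(unary("user-42"))
print(list(server_streaming("logs")))
print(client_streaming(iter(["batch-1", "batch-2"])))
print(list(bidirectional(iter(["edit-1", "edit-2"]))))
```

The difference from these toys is that in gRPC each stream flows over a multiplexed HTTP/2 connection, and every message is a typed protobuf, checked at compile time.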
WebSocket: Persistent and Bidirectional
WebSocket, standardized as RFC 6455 in 2011, solves a different problem. HTTP is inherently request-response: the client asks, the server answers, the connection is done. For applications that need the server to push data to the client without being asked, HTTP requires workarounds like long polling or server-sent events.
WebSocket replaces this with a persistent, full-duplex connection. The connection starts as a standard HTTP request with an Upgrade header. If the server agrees, the protocol switches from HTTP to WebSocket, and both sides can send messages at any time over the same TCP connection.
WebSocket provides full-duplex communication channels over a single TCP connection. After an initial HTTP handshake, the connection stays open. Either side can send messages at any time, with minimal framing overhead (as little as 2 bytes per frame).
The low per-message overhead makes WebSocket ideal for high-frequency, small-payload scenarios: chat messages, live price updates, multiplayer game state, collaborative editing cursors.
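The "as little as 2 bytes" claim can be verified by constructing a frame. Per RFC 6455, a final text frame with a payload under 126 bytes needs one byte for the FIN bit and opcode, and one byte for the length; server-to-client frames are unmasked, so that is the entire header.

```python
# A minimal server-to-client WebSocket text frame (RFC 6455): for payloads
# under 126 bytes the entire framing overhead is two bytes.
def text_frame(payload: bytes) -> bytes:
    assert len(payload) < 126                  # sketch covers the small-payload case only
    fin_and_opcode = 0x81                      # FIN bit set, opcode 0x1 (text)
    # Server-to-client frames are unmasked, so byte 2 is just the length.
    return bytes([fin_and_opcode, len(payload)]) + payload

frame = text_frame(b"hi")
print(frame.hex())                             # -> 81026869
print(len(frame) - len(b"hi"))                 # -> 2 bytes of overhead
```

Compare that with an HTTP request, where even a minimal set of headers adds hundreds of bytes per message; at chat-message frequency, the difference compounds quickly.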
How They Differ in Practice
The gRPC call is request-response, even though the underlying HTTP/2 connection persists. Each RPC is a discrete unit with a defined start and end. WebSocket, by contrast, opens a channel that stays alive. Messages flow in both directions without the request-response framing.
Protocol Comparison Table
The following table compares gRPC, WebSocket, and REST across key dimensions relevant to system design decisions.
| Dimension | REST (HTTP/JSON) | gRPC | WebSocket |
|---|---|---|---|
| Transport | HTTP/1.1 or HTTP/2 | HTTP/2 (required) | TCP (after HTTP upgrade) |
| Data format | JSON (text) | Protocol Buffers (binary) | Any (text or binary frames) |
| Type safety | None (runtime validation) | Strong (compile-time from .proto) | None (application-defined) |
| Communication | Request-response | Unary + 3 streaming modes | Full-duplex, persistent |
| Browser support | Native | Requires gRPC-Web proxy | Native (all modern browsers) |
| Payload size | Larger (verbose JSON) | Smallest (binary encoding) | Depends on application |
| Best for | Public APIs, CRUD, web frontends | Internal service-to-service | Real-time client-server push |
| Tooling | Curl, Postman, any HTTP client | grpcurl, generated clients | Browser DevTools, wscat |
Where Each Protocol Fits
gRPC dominates internal service communication in large distributed systems. When Service A calls Service B 10,000 times per second, the difference between JSON parsing and protobuf deserialization is measurable. The strict contract prevents the subtle breaking changes that plague loosely typed JSON APIs across team boundaries. Companies like Google, Netflix, and Stripe use gRPC extensively for backend-to-backend traffic.
WebSocket dominates scenarios where the server needs to push data to clients without waiting for a request. Chat applications, live dashboards, multiplayer games, collaborative editors. Any feature where a user expects to see updates the moment they happen, without refreshing.
REST remains the default for public-facing APIs, CRUD operations, and any scenario where simplicity, cacheability, and broad client compatibility matter more than raw performance.
Systems Thinking Lens
Protocol choice is a leverage point in the system. Choosing gRPC for internal communication reduces serialization overhead across every service boundary. That reduction compounds: fewer CPU cycles per request means fewer instances needed, which means lower cost, which means more budget for features. The feedback loop runs through infrastructure cost, team velocity, and product capability.
Conversely, choosing WebSocket where simple polling would suffice introduces connection management complexity, memory overhead for open connections, and operational burden for connection state during deploys. The protocol that feels more "advanced" can make the overall system worse if the problem did not require it.
The right question is never "which protocol is best?" It is "what does this specific interaction need, and what are the second-order effects of this choice on the rest of the system?"
Further Reading
- Introduction to gRPC (grpc.io). Official overview of gRPC concepts, HTTP/2 transport, and Protocol Buffers integration.
- Core concepts, architecture and lifecycle (grpc.io). Detailed explanation of the four gRPC communication patterns and connection lifecycle.
- RFC 6455: The WebSocket Protocol (IETF). The formal specification for WebSocket, including the handshake, framing, and closing procedures.
- WebSocket API (MDN Web Docs). Practical guide to using WebSocket in browser-based applications.
- Protocol Buffers Overview (protobuf.dev). Official documentation for defining and using Protocol Buffer schemas.
Assignment
Match each use case below to the most appropriate protocol (REST, gRPC, or WebSocket). Write one or two sentences justifying each choice.
- Live chat application where users see messages instantly as they arrive.
- Microservice-to-microservice communication in a payment processing pipeline handling 50,000 transactions per second.
- Stock ticker dashboard displaying real-time price updates for 200 symbols.
- Mobile app API for a food delivery service where users browse restaurants, place orders, and track delivery.
For each answer, consider: What is the communication pattern? Who initiates data flow? How critical is latency? Does the client need a persistent connection, or is request-response sufficient?
Two Ways to Build Software
Every application starts as a single thing. One codebase, one deployment, one process. At some point, teams face a decision: keep it together or break it apart. This is the monolith-versus-microservices question, and it is one of the most consequential architectural decisions you will make.
The answer is rarely obvious, and the industry has swung between extremes. Understanding the real tradeoffs, not the marketing, is what matters.
The Monolith
A monolithic architecture deploys the entire application as a single unit. All modules, whether they handle user authentication, payment processing, notifications, or reporting, live in one codebase and run in one process (or a set of identical processes behind a load balancer).
Monolith: A software architecture where all components are packaged and deployed as a single unit. Function calls between modules happen in-process, not over the network.
Monoliths are not inherently bad. They have real structural advantages:
- Simple deployment. One artifact to build, test, and ship. No coordination between services.
- Easy debugging. A stack trace shows you the full call path. No distributed tracing needed.
- In-process communication. A function call takes nanoseconds. A network call takes milliseconds. That is a six-order-of-magnitude difference.
- Single database. Transactions are straightforward. ACID guarantees come free.
- Lower operational overhead. One thing to monitor, one thing to scale, one thing to secure.
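The in-process advantage is easy to see for yourself. This sketch times a plain function call; the comparison figure for a network round trip is a typical same-region value, not something the code measures.

```python
import timeit

def get_price(item_id: int) -> float:
    """Stands in for a module boundary inside a monolith."""
    return 9.99

# Time one million in-process calls through that boundary.
n = 1_000_000
total = timeit.timeit(lambda: get_price(42), number=n)
per_call_ns = total / n * 1e9

print(f"in-process call: ~{per_call_ns:.0f} ns")
# A same-region network round trip is typically on the order of
# 0.5-2 ms, i.e. thousands to millions of times slower than this.
```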
The problems emerge as the team and codebase grow. A change to the payment module requires redeploying the entire application. A memory leak in the reporting module crashes the authentication module. A team of 80 engineers stepping on each other in the same repository slows everyone down.
Microservices
A microservice architecture decomposes the application into small, independently deployable services. Each service owns a specific business capability, runs in its own process, and communicates with other services over the network (typically HTTP/REST or gRPC).
Microservices: An architectural style where the application is composed of loosely coupled, independently deployable services, each responsible for a specific business capability and maintaining its own data store.
The advantages are real but come with costs:
- Independent deployment. Ship the payment service without touching authentication. Faster release cycles per team.
- Team autonomy. Each team owns a service end-to-end. They choose their own language, framework, and release schedule.
- Fault isolation. If the reporting service crashes, the payment service keeps running.
- Targeted scaling. Scale the service that needs it, not the entire application.
The costs:
- Network complexity. Every function call that was in-process is now a network call. Latency, retries, timeouts, circuit breakers.
- Data consistency. No cross-service transactions. You need sagas, eventual consistency, or careful domain boundaries.
- Operational overhead. Dozens or hundreds of services to deploy, monitor, log, and trace. You need CI/CD pipelines, container orchestration, service meshes, distributed tracing.
- Integration testing. Testing the interaction between services is harder than testing a monolith's internal calls.
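The "latency, retries, timeouts" cost is concrete: every cross-service call needs failure handling that an in-process call gets for free. A minimal retry-with-backoff sketch, with a hypothetical flaky dependency standing in for a remote service:

```python
import time

def call_with_retries(fn, attempts=3, base_delay=0.1):
    """Retry a flaky remote call with exponential backoff.

    In a monolith this would be a plain function call; across a
    service boundary every caller needs logic like this.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                              # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt)  # back off: 0.1s, 0.2s, ...

# Hypothetical flaky dependency: fails twice, then succeeds.
calls = {"count": 0}
def flaky_inventory_service():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("inventory service unreachable")
    return {"sku": "book-42", "in_stock": True}

result = call_with_retries(flaky_inventory_service)
print(result)  # succeeds on the third attempt
```

Production systems layer circuit breakers and timeouts on top of this; the point is that the logic exists at all, multiplied across every service boundary.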
Comparison Across Dimensions
| Dimension | Monolith | Microservices |
|---|---|---|
| Deployment | Single artifact, all-or-nothing | Independent per service |
| Codebase | One repository (usually) | Many repositories or monorepo |
| Communication | In-process function calls | Network calls (HTTP, gRPC, messaging) |
| Data management | Single shared database | Database per service |
| Scaling | Scale entire application | Scale individual services |
| Debugging | Single stack trace, single log | Distributed tracing across services |
| Team structure | Teams share one codebase | Teams own individual services |
| Fault isolation | One failure can crash everything | Failures contained to one service |
| Time to first deploy | Fast (minimal infrastructure) | Slow (needs orchestration, CI/CD) |
Conway's Law
In his 1968 paper "How Do Committees Invent?", Melvin Conway observed that "any organization that designs a system will produce a design whose structure is a copy of the organization's communication structure." This observation, later named Conway's Law, explains why architecture decisions and team decisions are inseparable.
Conway's Law: The architecture of a system mirrors the communication structure of the organization that builds it. A company with four teams will produce a system with four major components, regardless of what the optimal architecture might be.
Conway's Law works in both directions. If you have a single team, a monolith is natural and effective. As Martin Fowler notes, "a dozen or two people can have deep and informal communications, so Conway's Law indicates they will create a monolith, and that is fine." If you have 15 teams and force them into one monolith, they will create implicit service boundaries anyway through code ownership conventions, module walls, and meeting schedules.
The Inverse Conway Maneuver deliberately structures teams to encourage the desired architecture. Want microservices? Create small, autonomous teams organized around business capabilities. Want a well-structured monolith? Keep the team small and communicating tightly.
When to Start With What
Martin Fowler's "Monolith First" argument is straightforward: almost all successful microservice stories started with a monolith that grew too big and was broken up. Almost all cases where systems were built as microservices from scratch ended up in serious trouble.
The reasoning is practical. Early in a product's life, you do not know where the real boundaries are. You do not know which features will matter, which will be thrown away, and where the load will concentrate. A monolith lets you discover these boundaries cheaply. Refactoring a function boundary inside a monolith is an afternoon of work. Redrawing a service boundary in a microservice system means migrating data, rewriting APIs, and coordinating multiple teams.
The signal that you might need to break apart is usually organizational, not technical. When the team is large enough that people are blocked by each other. When deployment frequency drops because too many changes are coupled. When a single module's scaling needs are dramatically different from the rest. These are structural pressures that microservices address.
The Modular Monolith: A Middle Path
A modular monolith maintains strict module boundaries inside a single deployment. Modules communicate through defined internal APIs, not by reaching into each other's database tables or internal classes. The code is structured as if it could be split into services, but it runs as one process.
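The discipline can be shown in miniature. In this sketch, each "module" exposes a narrow public API and keeps its data private, so the boundary could later become a service boundary. The module and method names are illustrative, not a framework.

```python
# A modular monolith in miniature: one process, strict boundaries.

class BillingModule:
    def __init__(self):
        self._invoices = {}            # private: no other module touches this

    def charge(self, order_id: str, amount: float) -> str:
        invoice_id = f"inv-{order_id}"
        self._invoices[invoice_id] = amount
        return invoice_id

class OrderModule:
    def __init__(self, billing: BillingModule):
        self._billing = billing        # depends on the API, not the tables

    def place_order(self, order_id: str, amount: float) -> dict:
        invoice_id = self._billing.charge(order_id, amount)
        return {"order_id": order_id, "invoice": invoice_id}

billing = BillingModule()
orders = OrderModule(billing)
print(orders.place_order("o-1", 25.0))
# {'order_id': 'o-1', 'invoice': 'inv-o-1'}
```

Splitting `BillingModule` into a service later means replacing one in-process call with a network call behind the same interface, rather than untangling shared tables.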
This approach captures the deployment simplicity of a monolith while maintaining the clear boundaries that make a future migration to microservices feasible. Shopify famously operates a modular monolith that serves millions of merchants, demonstrating that you do not need microservices to reach enormous scale.
Further Reading
- Martin Fowler, "Monolith First" (martinfowler.com). The argument for starting with a monolith and extracting services only when the need is proven.
- Martin Fowler, "Microservice Trade-Offs" (martinfowler.com). A balanced analysis of what you gain and what you pay when adopting microservices.
- Martin Fowler, "Conway's Law" (martinfowler.com). How organizational structure shapes system architecture, and vice versa.
- Sam Newman, Building Microservices, 2nd Edition (O'Reilly, 2021). The standard reference for microservice architecture, including decomposition strategies and data management patterns.
- Melvin Conway, "How Do Committees Invent?" (1968). The original paper that introduced Conway's Law.
Assignment
This assignment has two parts.
Part 1: Write a 3-sentence argument FOR starting with a monolith for a new startup building a food delivery app. Address why a monolith is the right choice at this stage.
Part 2: Write a 3-sentence argument for WHEN to break the monolith apart. Identify the specific signals (team size, deployment pain, scaling bottlenecks) that would trigger the transition to microservices.
In both parts, reference Conway's Law. How does the team's current structure support your recommendation?
Patterns as Reusable Decisions
An architectural pattern is a proven structural arrangement for organizing a software system. It is not a library or a framework. It is a set of decisions about how components are arranged, how they communicate, and where responsibilities live. Choosing a pattern means choosing which tradeoffs you accept.
This session surveys five major patterns. Each one solves a different class of problem. Most real production systems combine two or more of them.
Multi-Tier Architecture
Multi-tier (often called n-tier) separates the system into horizontal layers, each handling a distinct concern. The classic version has three tiers: presentation, business logic, and data. Each tier communicates only with its immediate neighbor.
Multi-tier architecture organizes a system into horizontal layers of responsibility. Each layer provides services to the layer above it and consumes services from the layer below it. Communication between non-adjacent layers is prohibited.
This pattern works because separation of concerns reduces cognitive load. Frontend developers work in the presentation tier without touching database queries. Backend developers implement business rules without worrying about HTML rendering. Database administrators optimize storage without breaking application logic.
The limitation is rigidity. Every request must traverse all tiers, even when only one tier does meaningful work. Adding a new feature often requires changes in every layer. And horizontal scaling is uneven: the data tier typically becomes the bottleneck while the presentation tier sits idle.
Event-Driven Architecture
Event-driven architecture (EDA) structures the system around the production, detection, and reaction to events. Instead of components calling each other directly, they emit events. Other components subscribe to the events they care about and react independently.
Event-driven architecture decouples producers from consumers through asynchronous events. A producer publishes an event to a broker or bus without knowing which consumers will process it. Consumers subscribe to event types and react independently.
Consider an order service that publishes an "Order Placed" event, with four downstream services reacting to it independently. The order service does not know or care how many consumers exist. Adding a fifth consumer (say, a fraud detection service) requires zero changes to the producer.
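That decoupling can be sketched with an in-memory event bus. A real system would use a broker such as Kafka or RabbitMQ; the consumer handlers below are illustrative stand-ins.

```python
from collections import defaultdict

# Minimal in-memory event bus: the producer publishes without
# knowing who (or how many) will react.
subscribers = defaultdict(list)

def subscribe(event_type, handler):
    subscribers[event_type].append(handler)

def publish(event_type, payload):
    for handler in subscribers[event_type]:
        handler(payload)               # each consumer reacts independently

log = []
subscribe("order.placed", lambda e: log.append(f"inventory reserved for {e['order_id']}"))
subscribe("order.placed", lambda e: log.append(f"receipt emailed for {e['order_id']}"))
subscribe("order.placed", lambda e: log.append(f"analytics recorded {e['order_id']}"))

# The producer emits one event; adding a fourth consumer
# (e.g. fraud detection) requires no change to this line.
publish("order.placed", {"order_id": "o-123", "total": 42.0})
print(log)
```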
EDA comes in two topologies, as described in Mark Richards' Software Architecture Patterns:
- Broker topology: Events flow through a lightweight message broker. No central orchestrator. Each consumer decides what to do with each event. Good for simple, decoupled flows.
- Mediator topology: A central mediator receives events and orchestrates a sequence of steps across consumers. Good for complex workflows that require ordering and error handling.
The strength of EDA is loose coupling and extensibility. The weakness is that the overall flow of the system becomes harder to trace. When something goes wrong, you cannot follow a single stack trace. You follow events through a broker, across services, through time.
Microservices Architecture
Session 1.5 covered this in detail. As an architectural pattern, microservices decompose the system vertically by business capability rather than horizontally by technical layer. Each service owns its own data, logic, and interface for a specific domain.
The pattern's defining characteristic is independent deployability. If a service can only be deployed alongside other services, it is not a microservice. It is a distributed monolith, which inherits the costs of both architectures and the benefits of neither.
Microservices are often combined with event-driven architecture. Services communicate through events rather than direct API calls, which further reduces coupling and improves resilience.
Cell-Based Architecture
Cell-based architecture, documented in detail by WSO2's reference architecture, organizes the system into self-contained cells. Each cell is a complete unit that includes its own services, data stores, and communication gateway. Cells interact with each other through well-defined, versioned APIs at the cell boundary.
Cell-based architecture groups related services into autonomous cells, each with its own gateway, data stores, and internal communication. Cells are the unit of deployment, scaling, and failure isolation. Think of each cell as a small, self-sufficient system.
The cell boundary acts as a blast radius. If everything inside Cell A fails, Cell B is unaffected because it communicates only through the cell gateway, which can implement circuit breakers, retries, and fallbacks.
This pattern is particularly useful for very large systems where even microservices become hard to manage. Instead of managing 300 individual services, you manage 20 cells, each containing 10-20 services. The cell provides an intermediate level of organization between individual services and the overall system.
Serverless Architecture
Serverless architecture delegates all infrastructure management to a cloud provider. You write functions that execute in response to events (HTTP requests, message queue entries, file uploads, timers). The provider handles provisioning, scaling, and deprovisioning.
Serverless architecture runs application code in ephemeral, event-triggered functions managed entirely by a cloud provider. There are no servers to provision, patch, or scale. You pay only for the compute time consumed during execution.
Serverless excels at variable, unpredictable workloads. An image processing function that runs 10 times on Monday and 10,000 times on Friday costs proportionally. There is no idle capacity to pay for. The operational burden approaches zero for small to medium workloads.
The constraints are real. Functions have execution time limits (15 minutes on AWS Lambda). Cold starts add latency when a function has not been invoked recently. State must be stored externally. Vendor lock-in is significant because your function code is deeply coupled to the provider's event model, IAM system, and supporting services.
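The programming model follows from those constraints: a function receives an event, does its work, and returns, holding no state between invocations. The `(event, context)` signature below matches AWS Lambda's Python runtime; the event shape assumes an API Gateway proxy integration, and the resize logic is a placeholder.

```python
import json

# A Lambda-style handler: stateless, event-in/response-out.
def handler(event, context=None):
    # All state must arrive with the event or come from external
    # stores; nothing survives between invocations.
    body = json.loads(event.get("body") or "{}")
    width = body.get("width", 100)

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"resized_to": width}),
    }

# Local invocation with a fake API Gateway event:
resp = handler({"body": json.dumps({"width": 640})})
print(resp["statusCode"], resp["body"])
```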
Pattern Comparison
| Pattern | Strengths | Weaknesses | Best Fit |
|---|---|---|---|
| Multi-tier | Clear separation, well understood, easy to staff | Rigid layering, uneven scaling, all-tier changes | Traditional web apps, enterprise CRUD systems |
| Event-driven | Loose coupling, extensible, async by default | Hard to trace, eventual consistency, debugging complexity | Systems with many independent reactions to the same trigger |
| Microservices | Independent deploy, team autonomy, targeted scaling | Network overhead, data consistency, operational burden | Large teams, complex domains, varying scale per capability |
| Cell-based | Blast radius control, organizational grouping, versioned boundaries | Complex to design initially, over-engineering risk for small systems | Very large systems with hundreds of services needing organizational structure |
| Serverless | Zero ops, pay-per-use, automatic scaling | Cold starts, execution limits, vendor lock-in, stateless | Variable workloads, event processing, glue logic, prototypes |
Combining Patterns
Production systems rarely use a single pattern in isolation. A typical e-commerce platform might use multi-tier for its web frontend, microservices for its backend capabilities, event-driven communication between those services, and serverless for image resizing and email sending. The patterns are not competing alternatives. They are complementary tools applied at different levels of the system.
The question is not "which pattern should we use?" It is "which pattern fits this specific part of the system, given our team size, operational maturity, and the constraints of the problem?"
Systems Thinking Lens
Each pattern creates a different feedback structure. Multi-tier creates long feedback loops (changes ripple through all layers). Event-driven creates many short, independent loops (each consumer reacts on its own). Microservices create team-level loops (each team iterates independently). Cell-based creates nested loops (within cells and between cells).
The pattern you choose determines where delays accumulate, where failures propagate, and where teams can move independently. These are structural decisions with compounding effects over months and years. A pattern that feels efficient today can create organizational gridlock in two years if the feedback loops it creates do not match how your teams actually work.
Further Reading
- Mark Richards, Software Architecture Patterns, Chapter 2: Event-Driven Architecture (O'Reilly). Clear explanation of broker and mediator topologies with practical examples.
- WSO2, Cell-Based Reference Architecture (GitHub). The foundational document describing cell-based architecture, its principles, and implementation guidance.
- Microsoft, Event-Driven Architecture Style (Azure Architecture Center). Practical guidance on implementing event-driven systems with cloud services.
- AWS, Serverless Architectures with AWS Lambda (AWS Whitepaper). Comprehensive guide to serverless patterns, anti-patterns, and operational considerations.
- Martin Fowler, "Microservice Trade-Offs" (martinfowler.com). Balanced discussion of what microservices cost versus what they provide.
Assignment
Think about an application you work on or use frequently. It could be your company's product, an open-source project, or a well-known service like Grab, Gojek, or Spotify.
- Draw a box diagram of the major components. Keep it high-level: 5-10 boxes representing services, databases, queues, and external systems. Draw arrows showing how they communicate.
- Identify the primary pattern. Does it follow multi-tier, event-driven, microservices, cell-based, serverless, or a combination? What evidence in the diagram supports your answer?
- Find the pattern boundary. Most systems combine patterns. Identify at least one place where the system switches from one pattern to another (for example, request-response for the API layer but event-driven for background processing).
If you are not sure about the internals, make reasonable assumptions based on what you observe as a user. Where do you see real-time updates (event-driven)? Where do you see request-response behavior (multi-tier or microservices)? Where do you see background processing that does not block the user (serverless or event-driven)?
What Is Three-Tier Architecture?
Three-tier architecture is the most widely deployed pattern for web applications. It divides an application into three logical layers, each with a distinct responsibility: presenting information to the user, processing business logic, and storing data. Each tier can be developed, deployed, and scaled independently.
This separation is not just organizational convenience. It enforces boundaries that prevent the presentation layer from directly querying the database, or the data layer from handling user interface concerns. When those boundaries are respected, teams can work on each tier without stepping on each other, and failures in one tier do not automatically cascade into the others.
Three-tier architecture separates an application into three layers: Presentation (user interface), Application (business logic), and Data (persistent storage). Each tier communicates only with its adjacent tier.
The Three Tiers
Presentation Tier
The presentation tier is what the user sees and interacts with. In a web application, this is the HTML, CSS, and JavaScript that runs in the browser. In a mobile app, it is the native UI. This tier collects user input, sends requests to the application tier, and renders responses.
The presentation tier should contain zero business logic. It does not validate whether a user has sufficient funds for a purchase. It does not calculate shipping costs. It renders what the application tier tells it to render. When presentation code starts making business decisions, you get logic scattered across tiers, which makes bugs harder to trace and changes harder to deploy.
Application Tier
The application tier (also called the logic tier or middle tier) is where the actual work happens. It receives requests from the presentation tier, applies business rules, orchestrates data operations, and returns results. Authentication checks, order processing, inventory validation, and pricing calculations all belong here.
This tier is typically the most complex and the most frequently changed. New features, rule changes, and integrations with external services all happen in this layer. Because it sits between the other two tiers, it also serves as a translation layer: converting user-facing requests into database queries, and database results into user-facing responses.
Data Tier
The data tier is responsible for persistent storage and retrieval. Relational databases, NoSQL stores, object storage, and caching layers all live here. The data tier receives structured queries from the application tier and returns results. It does not know what the data means in a business context. It stores rows and returns rows.
A well-designed data tier handles its own concerns: indexing, replication, backup, and query optimization. The application tier should not need to know whether the database is running on a single node or a cluster of replicas. That abstraction is the data tier's job.
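The three tiers map directly onto three layers in code, each talking only to its neighbor: the presentation layer never touches storage, and the repository never formats responses. A minimal sketch with illustrative names (the 10 percent tax rule is an arbitrary example):

```python
# Data tier: storage and retrieval, no business meaning.
class BookRepository:
    def __init__(self):
        self._rows = {1: {"title": "SICP", "price": 45.0}}

    def find(self, book_id):
        return self._rows.get(book_id)

# Application tier: business rules and orchestration.
class CatalogService:
    def __init__(self, repo: BookRepository):
        self._repo = repo

    def get_listing(self, book_id):
        book = self._repo.find(book_id)
        if book is None:
            raise KeyError("book not found")
        # The business rule lives here, not in the UI or the database.
        return {**book, "price_with_tax": round(book["price"] * 1.1, 2)}

# Presentation tier: rendering only, zero business logic.
def render(listing):
    return f"{listing['title']}: ${listing['price_with_tax']}"

service = CatalogService(BookRepository())
print(render(service.get_listing(1)))   # SICP: $49.5
```

Swapping the dict for a real database changes only `BookRepository`; swapping HTML for JSON changes only `render`. That containment is the point of the pattern.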
Tier Responsibilities, Scaling, and AWS Services
| Tier | Responsibility | Scaling Strategy | AWS Services |
|---|---|---|---|
| Presentation | Render UI, collect user input, display responses | CDN distribution, edge caching, static asset hosting | Amazon CloudFront, S3 (static hosting), Amplify |
| Application | Business logic, authentication, request orchestration | Horizontal scaling behind load balancer, auto-scaling groups | EC2 + ALB, Elastic Beanstalk, ECS, EKS |
| Data | Persistent storage, queries, caching | Read replicas, sharding, caching layer in front of DB | RDS, DynamoDB, ElastiCache, Aurora |
Classic AWS Implementation
A standard three-tier deployment on AWS looks like this: CloudFront serves static assets from S3 for the presentation tier. An Application Load Balancer distributes traffic across EC2 instances (or ECS containers) running the application tier. Amazon RDS or DynamoDB handles the data tier, with ElastiCache for frequently accessed data.
Each tier lives in its own subnet within a VPC. The presentation tier sits in public subnets. The application tier runs in private subnets, accessible only through the load balancer. The data tier runs in isolated private subnets, accessible only from the application tier. This network segmentation limits blast radius when something goes wrong.
Serverless Variant
The serverless variant replaces managed servers with fully managed services. Amazon API Gateway replaces the load balancer and handles request routing. AWS Lambda replaces EC2 instances for compute. DynamoDB or Aurora Serverless replaces traditional database instances.
The advantage is operational: no servers to patch, no capacity to pre-provision, and costs that scale to zero when there is no traffic. The disadvantage is cold start latency on Lambda functions and the constraints of execution time limits (15 minutes maximum per invocation). For request-response workloads with variable traffic, the serverless variant is often the most cost-effective option.
Kubernetes Variant
Organizations that need container orchestration, portability across cloud providers, or fine-grained control over their runtime environment often choose Amazon EKS (Elastic Kubernetes Service) for the application tier. Each tier runs as a set of Kubernetes pods, scaled by the Horizontal Pod Autoscaler based on CPU, memory, or custom metrics.
The Kubernetes variant adds operational complexity. You manage node groups, pod scheduling, service meshes, and Kubernetes upgrades. In return, you get portability (the same manifests can run on GKE or AKS), a rich ecosystem of observability tools, and the ability to run sidecar containers for logging, tracing, or security proxies alongside your application code.
When Three-Tier Breaks Down
Three-tier architecture works well for the majority of web applications. It breaks down when the application tier becomes a monolithic bottleneck that every request must pass through. If your application has fundamentally different workloads (real-time chat, batch processing, and CRUD operations), forcing them all through a single application tier creates coupling and scaling problems.
At that point, you start splitting the application tier into separate services, which is the beginning of microservices. But start with three-tier. It is simpler, easier to reason about, and sufficient for most applications until they reach significant scale.
Further Reading
- AWS, Three-Tier Architecture Overview. Official AWS whitepaper on the three-tier pattern and its serverless evolution.
- AWS, Serverless Multi-Tier Architectures with API Gateway and Lambda. Detailed reference architecture for serverless three-tier systems.
- Multitier Architecture, Wikipedia. General overview of N-tier architecture with history and variations.
- Aalok Trivedi, "Building a 3-Tier Web Application Architecture with AWS". Practical walkthrough with VPC, subnets, and security group configuration.
Assignment
Design a three-tier architecture for an online bookstore. The store allows users to browse a catalog, search for books, add items to a cart, and check out.
- Draw three boxes (Presentation, Application, Data) and label what runs in each tier. Be specific: name the technologies or AWS services you would use.
- Write one scaling strategy per tier. For example: how does the presentation tier handle a traffic spike during a book launch? How does the data tier handle a growing catalog?
- Identify one risk in your design. What happens if the application tier goes down? What is the user experience?
Why Serialization Matters
Every time data moves between systems, it must be converted from an in-memory structure to a byte stream (serialization) and back again (deserialization). This happens on every API call, every database write, every message pushed to a queue. The format you choose for that conversion affects payload size, parsing speed, debugging ease, and cross-language compatibility.
Choosing a serialization format is not a cosmetic decision. At high throughput, the difference between a verbose text format and a compact binary format translates directly into bandwidth costs, latency, and CPU utilization.
Serialization Formats Compared
JSON (JavaScript Object Notation)
JSON is the default format for web APIs. It is human-readable, natively supported in every major programming language, and requires no schema definition to use. You can inspect a JSON payload in a browser, log it as a string, and debug it with your eyes. That convenience comes at a cost: JSON is verbose. Field names are repeated in every object, numbers are stored as text, and there is no built-in support for binary data.
XML (Extensible Markup Language)
XML dominated the enterprise integration era (SOAP, WSDL, XSLT). It supports namespaces, attributes, and complex nested structures. It also carries significant overhead: closing tags, verbose syntax, and larger payloads than JSON for equivalent data. XML is still used in legacy systems, document formats (DOCX, SVG), and configuration files (Maven POM, Android manifests). For new API design, it has been largely replaced by JSON.
Protocol Buffers (Protobuf)
Developed by Google, Protobuf is a binary serialization format that requires a schema definition (a .proto file). Fields are identified by numeric tags rather than string names, which makes payloads compact. Protobuf is 3 to 7 times faster than JSON for serialization and deserialization, and payloads are typically 30 to 50 percent smaller. The tradeoff: you cannot read a Protobuf message without the schema, and both client and server must agree on the schema at compile time.
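The size difference is easy to demonstrate with the standard library. This sketch uses `struct` as a stand-in for a schema-driven binary format; it is not the real Protobuf wire format, which additionally uses field tags and varint encoding.

```python
import json
import struct

record = {"user_id": 123456, "score": 98.5, "active": True}

as_json = json.dumps(record).encode()
# Binary: a fixed layout both sides agree on (the "schema"):
# little-endian unsigned 32-bit int, 64-bit float, 1-byte bool.
as_binary = struct.pack("<Id?", record["user_id"], record["score"], record["active"])

print(len(as_json), "bytes as JSON")      # field names repeated as text
print(len(as_binary), "bytes as binary")  # 13 bytes: 4 + 8 + 1

# Decoding requires the schema; the bytes alone are unreadable.
user_id, score, active = struct.unpack("<Id?", as_binary)
```

The binary payload carries no field names at all, which is exactly the readability-for-compactness trade the section describes.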
Apache Avro
Avro is a binary format developed within the Apache Hadoop ecosystem. Like Protobuf, it uses a schema, but the schema is included with the data (or stored in a schema registry). This makes Avro particularly strong for data pipelines where producers and consumers may not share a codebase. Avro files are often the most compact, but serialization and deserialization can use more memory than Protobuf.
Format Comparison
| Feature | JSON | XML | Protobuf | Avro |
|---|---|---|---|---|
| Encoding | Text | Text | Binary | Binary |
| Human-readable | Yes | Yes | No | No |
| Schema required | No (optional via JSON Schema) | No (optional via XSD) | Yes (.proto file) | Yes (JSON schema) |
| Payload size | Large | Largest | Small | Smallest |
| Serialization speed | Moderate | Slow | Fast | Fast |
| Schema evolution | Manual, fragile | Supported via XSD versioning | Good (field numbers are stable) | Excellent (schema registry) |
| Best use case | Public APIs, web frontends | Legacy systems, document formats | Internal microservices (gRPC) | Data pipelines, Kafka, Hadoop |
Web Sessions: Maintaining State in a Stateless Protocol
HTTP is stateless. Every request is independent. The server does not inherently remember who you are between requests. Session management is the set of techniques used to associate a sequence of requests with a single user. There are three dominant approaches.
Cookie-Based Server Sessions
The server generates a random session ID, stores session data (user ID, permissions, cart contents) in server-side storage (memory, Redis, a database), and sends the session ID to the client as an HTTP cookie. On every subsequent request, the browser automatically includes the cookie. The server looks up the session ID and retrieves the associated data.
This approach is simple and secure when implemented correctly. The cookie itself contains no sensitive data, just an opaque identifier. The server controls the session lifecycle: it can invalidate a session immediately by deleting the server-side record. The limitation is that session storage must be shared across all application servers, which requires sticky sessions or a centralized store like Redis.
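The mechanism fits in a few lines. In this sketch a dict stands in for Redis or a database table; the function names are illustrative.

```python
import secrets

# Server-side session store in miniature. The cookie carries only
# an opaque ID; all real data stays on the server, so revocation
# is just a delete.
sessions = {}

def create_session(user_id: str) -> str:
    session_id = secrets.token_urlsafe(32)   # unguessable opaque ID
    sessions[session_id] = {"user_id": user_id, "cart": []}
    return session_id                        # sent in a Set-Cookie header

def get_session(session_id: str):
    return sessions.get(session_id)          # looked up on every request

def revoke_session(session_id: str) -> None:
    sessions.pop(session_id, None)           # immediate: next request fails

sid = create_session("user-42")
assert get_session(sid)["user_id"] == "user-42"
revoke_session(sid)
assert get_session(sid) is None              # logged out everywhere, instantly
```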
JWT (JSON Web Token)
With JWT, the server encodes the user's identity and claims into a signed token and sends it to the client. The client stores the token (typically in localStorage or an HTTP-only cookie) and includes it in the Authorization header on each request. The server verifies the token's signature without querying any external store.
JWTs are stateless: no server-side storage is needed. This makes horizontal scaling straightforward because any server instance can verify the token. The tradeoff is that you cannot revoke a JWT before it expires without maintaining a blocklist, which reintroduces server-side state. JWTs also tend to be larger than session cookies (a typical JWT is 800 to 2000 bytes).
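What "stateless verification" means can be shown by building an HS256-signed token from the standard library: the server checks a signature, not a store. This is a didactic sketch; a production system would use a maintained library such as PyJWT, and the secret here is illustrative only.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret-key"                  # illustrative only

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign(claims: dict) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signing_input = header + b"." + payload
    sig = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + sig).decode()

def verify(token: str) -> dict:
    header, payload, sig = token.encode().split(b".")
    expected = b64url(hmac.new(SECRET, header + b"." + payload, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    padded = payload + b"=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

token = sign({"sub": "user-42", "role": "admin"})
print(verify(token))
```

Any server holding `SECRET` can run `verify` without touching shared storage, which is why JWTs scale horizontally so easily, and also why revoking one early requires reintroducing state.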
Server-Side Sessions with External Store
A hybrid approach uses cookies to carry a session ID while storing session data in a fast external store like Redis or Memcached. This combines the security of server-side control (immediate revocation) with the scalability of a shared store. Most production web frameworks (Express with connect-redis, Django with Redis backend, Spring Session) support this pattern out of the box.
Session Strategy Comparison
| Feature | Cookie + Server Session | JWT | Cookie + External Store (Redis) |
|---|---|---|---|
| State location | Server memory or DB | Client (token) | External cache (Redis) |
| Scalability | Requires sticky sessions or shared store | Stateless, any server can verify | Scales with Redis cluster |
| Revocation | Immediate (delete session) | Difficult (wait for expiry or maintain blocklist) | Immediate (delete from Redis) |
| Payload size | Small (session ID only) | Large (800+ bytes) | Small (session ID only) |
| Cross-domain | Limited by cookie scope | Works across domains via headers | Limited by cookie scope |
| Best for | Simple single-server apps | APIs, microservices, mobile clients | Production web apps at scale |
Choosing the Right Combination
The serialization format and session strategy are independent choices, but they interact. An API that uses Protobuf for internal service communication might still use JSON for its public-facing endpoints and JWTs for authentication. A data pipeline might use Avro for Kafka messages while the web frontend that triggers the pipeline uses cookie-based sessions and JSON.
The principle is the same in both cases: match the tool to the constraint. Human-readable formats for debugging and external APIs. Binary formats for throughput-sensitive internal paths. Server-side sessions when you need revocation. JWTs when you need stateless verification across services.
Further Reading
- RFC 7519: JSON Web Token (JWT). The official specification for JWT structure, claims, and validation.
- Protocol Buffers Documentation. Google's official Protobuf developer guide with language tutorials and best practices.
- Apache Avro Documentation. Specification and getting-started guide for Avro serialization.
- Okta, "A Comparison of Cookies and Tokens for Secure Authentication". Practical comparison of cookie and token-based session strategies.
- Stytch, "JWTs vs. Sessions: Which Is Right for You?". Clear breakdown of when to use each session approach.
Assignment
An API serves 10,000 requests per second. The average JSON response payload is 2 KB. Switching from JSON to Protobuf would reduce payload size by 40%.
- Calculate the bandwidth saved per hour after switching to Protobuf. Show your work.
- If bandwidth costs $0.09 per GB (AWS data transfer pricing), how much would you save per month (30 days)?
- What non-bandwidth costs would you incur to make this switch? Think about developer time, tooling, debugging difficulty, and client compatibility.
Hint: 10,000 req/s * 2 KB * 0.40 savings = bandwidth saved per second. Convert to GB/hour.
The Language of Distributed Systems
Before you can design systems that scale, you need a shared vocabulary. These terms appear in every system design discussion, every architecture review, and every post-mortem. They are not abstract academic concepts. Each one describes a concrete property that either exists in your system or does not.
This session defines the core terms, gives you a practical example for each, and introduces the CAP theorem, which formalizes the tradeoffs between three of them.
Core Terms
| Term | Definition | One-Line Example |
|---|---|---|
| Scalability | The ability of a system to handle increased load by adding resources | Adding more web servers behind a load balancer to serve more users |
| Availability | The proportion of time a system is operational and accessible | 99.99% availability means less than 53 minutes of downtime per year |
| Consistency | All nodes in a distributed system return the same data at the same time | After updating your profile photo, every server shows the new photo immediately |
| Fault Tolerance | The ability to continue operating correctly when components fail | A database cluster continues serving reads when one replica crashes |
| SPOF (Single Point of Failure) | A component whose failure brings down the entire system | A single database server with no replicas: if it dies, everything stops |
| Partition Tolerance | The system continues to operate despite network splits between nodes | Two data centers lose connectivity but both keep serving requests |
Scalability: Vertical vs. Horizontal
Vertical scaling (scaling up) means adding more resources to a single machine: more CPU, more RAM, faster disks. Horizontal scaling (scaling out) means adding more machines to distribute the load.
Vertical scaling is simpler. You upgrade the server and your application code does not change. But every machine has a ceiling. You cannot add infinite RAM. You cannot buy a CPU with 10,000 cores. And while you are upgrading, the machine is typically offline.
Horizontal scaling has no theoretical ceiling, but it introduces complexity. Your application must handle multiple instances, shared state, load distribution, and network communication between nodes. Most production systems use a combination: scale vertically until it becomes cost-ineffective, then scale horizontally.
Availability: Measuring Uptime
Availability is expressed as a percentage, commonly referred to by the number of nines:
| Availability | Downtime per Year | Downtime per Month |
|---|---|---|
| 99% (two nines) | 3.65 days | 7.3 hours |
| 99.9% (three nines) | 8.76 hours | 43.8 minutes |
| 99.99% (four nines) | 52.6 minutes | 4.38 minutes |
| 99.999% (five nines) | 5.26 minutes | 26.3 seconds |
Each additional nine is exponentially harder and more expensive to achieve. Moving from 99.9% to 99.99% often requires redundant infrastructure across multiple availability zones, automated failover, and rigorous testing of failure scenarios. Most consumer web applications target three or four nines. Financial trading systems and emergency services aim for five.
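The downtime figures in the table follow from simple arithmetic, which is worth being able to do on demand in a design review. A minimal sketch:

```python
def downtime_per_year_minutes(availability_pct: float) -> float:
    """Minutes of allowed downtime per year at a given availability."""
    minutes_per_year = 365 * 24 * 60  # 525,600
    return minutes_per_year * (1 - availability_pct / 100)

# Four nines: downtime_per_year_minutes(99.99) is roughly 52.6 minutes,
# matching the table above.
```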
Consistency: Strong vs. Eventual
Strong consistency guarantees that after a write completes, every subsequent read returns the updated value. If you transfer $100 from account A to account B, strong consistency means no reader will ever see the money in both accounts or neither account. The system behaves as if there is a single copy of the data.
Eventual consistency allows replicas to diverge temporarily. After a write, some replicas may return stale data for a period of time. Eventually, all replicas converge to the same value. The "eventually" part can range from milliseconds to seconds, depending on the system.
Strong consistency is easier to reason about but harder to scale. It often requires coordination between nodes (locks, consensus protocols), which adds latency. Eventual consistency scales better because replicas can operate independently, but your application logic must handle stale reads gracefully.
Social media feeds use eventual consistency. If you post a photo and your friend sees it two seconds later, nobody notices. Banking transactions use strong consistency. If the balance is wrong even briefly, the consequences are severe.
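The stale-read window is easier to see in a toy model. The sketch below is not how any real database replicates; it simply makes the divergence observable: writes land on one replica immediately and reach the others only when `replicate()` runs (standing in for an anti-entropy or replication step).

```python
class Replica:
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy model of asynchronous replication across n replicas."""
    def __init__(self, n=3):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []
    def write(self, key, value):
        self.replicas[0].data[key] = value  # acknowledged immediately
        self.pending.append((key, value))   # queued for the other replicas
    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)
    def replicate(self):
        """Propagate pending writes; after this, replicas converge."""
        for key, value in self.pending:
            for r in self.replicas[1:]:
                r.data[key] = value
        self.pending.clear()
```

A read routed to replica 1 between `write` and `replicate` returns the old value; this is the stale read your application logic must tolerate.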
Fault Tolerance and Single Points of Failure
A fault-tolerant system is designed to continue operating when things break. Hardware fails. Networks partition. Disks corrupt. Software crashes. The question is not whether failures happen but what the system does when they happen.
The first step in designing for fault tolerance is identifying every Single Point of Failure (SPOF). A SPOF is any component that, if it fails, takes the entire system down. Common SPOFs include:
- A single database server with no replicas
- A single load balancer with no failover
- A single DNS provider
- An application that depends on one external API with no fallback
The remedy for a SPOF is redundancy: run multiple instances, in multiple locations, with automatic failover. This does not eliminate failure. It reduces the probability that a single failure becomes a system-wide outage.
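The "reduces the probability" claim can be quantified. If each instance is down with probability p and failures are independent, the whole redundant group is down with probability p^N. A sketch, with the independence assumption stated explicitly because it is the part that breaks in practice:

```python
def outage_probability(instance_failure_prob: float, n_instances: int) -> float:
    """Probability that every redundant instance is down at once.

    Assumes independent failures -- a simplification, since correlated
    failures (shared power feed, same bad deploy, one availability zone)
    violate independence and dominate real outages.
    """
    return instance_failure_prob ** n_instances
```

With p = 0.01, going from one instance to three drops the outage probability from 1 in 100 to 1 in a million, which is why redundancy is the default remedy, and why correlated failure modes are what you hunt for next.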
The CAP Theorem
The CAP theorem, proposed by Eric Brewer in 2000 and formally proven by Seth Gilbert and Nancy Lynch in 2002, states that a distributed data store can guarantee at most two of three properties simultaneously: Consistency, Availability, and Partition Tolerance.
The three properties are:

- Consistency: all nodes see the same data
- Availability: every request gets a response
- Partition Tolerance: the system works despite network splits

Taking the properties two at a time gives three system classes: CP systems (MongoDB, HBase, Redis Cluster), AP systems (Cassandra, DynamoDB, CouchDB), and CA systems (a single-node RDBMS, not viable in distributed systems).
In practice, partition tolerance is not optional. Networks fail. Packets get lost. Data centers lose connectivity. Any distributed system must tolerate partitions. The real choice is between consistency and availability during a partition:
- CP systems (Consistency + Partition Tolerance): When a network partition occurs, the system refuses to serve requests that might return stale data. It sacrifices availability to maintain consistency. Example: MongoDB in its default configuration will reject writes to a minority partition.
- AP systems (Availability + Partition Tolerance): When a partition occurs, the system continues serving requests, but different nodes may return different data. It sacrifices consistency to remain available. Example: Cassandra continues accepting writes on both sides of a partition and reconciles later.
- CA systems (Consistency + Availability): This combination requires no partitions, which means a single node or a network that never fails. It does not exist in real distributed systems. A single PostgreSQL server is "CA" only because it is not distributed.
The CAP theorem does not say you must permanently give up consistency or availability. It says that during a network partition, you must choose which one to sacrifice. When the network is healthy, you can have both.
Putting It Together
These terms are not independent. They interact. A system that prioritizes strong consistency may sacrifice availability during partitions. A system designed for high availability may accept eventual consistency. Eliminating SPOFs improves fault tolerance, which improves availability. Horizontal scaling enables higher throughput but makes strong consistency harder to achieve.
Understanding these terms and their tradeoffs is the foundation of every design decision you will make in the rest of this course. When someone says "we need 99.99% availability," you should immediately think about what that costs in terms of consistency, complexity, and infrastructure.
Further Reading
- Seth Gilbert and Nancy Lynch, "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services" (2002). The formal proof of the CAP theorem.
- CAP Theorem, Wikipedia. Comprehensive overview with history, formal definition, and system classification.
- IBM, "What Is the CAP Theorem?". Accessible explanation with real-world database examples.
- Martin Kleppmann, "Please Stop Calling Databases CP or AP" (2015). An important critique of oversimplified CAP classifications.
Assignment
Without looking at the session content, define each of the following terms in your own words. One or two sentences each.
- Scalability
- Availability
- Consistency
- Fault Tolerance
- Single Point of Failure
- Partition Tolerance
After writing your definitions, compare them to the table at the top of this session. Where did your understanding differ? Which term was hardest to define precisely?
Bonus: Pick a service you use daily (Gmail, Spotify, Grab). Based on its behavior, would you classify it as a CP or AP system? What evidence supports your classification?
Origin and Purpose
The Twelve-Factor App methodology was written by Adam Wiggins, co-founder of Heroku, and published in 2011. It emerged from observing hundreds of applications deployed on Heroku's platform and distilling the patterns that separated applications that scaled cleanly from those that broke under pressure.
The methodology is not tied to any language, framework, or cloud provider. It describes twelve principles for building software-as-a-service applications that are portable, resilient, and deployable on modern cloud platforms. Fifteen years later, these principles remain the baseline for cloud-native application design.
The original document lives at 12factor.net and is worth reading in full. This session summarizes each factor, explains why it matters, and identifies the most common way teams violate it.
The Twelve Factors
| # | Factor | Principle | Common Violation |
|---|---|---|---|
| 1 | Codebase | One codebase tracked in version control, many deploys | Separate repos for staging and production with copy-pasted code |
| 2 | Dependencies | Explicitly declare and isolate dependencies | Relying on system-level packages that are not in the dependency manifest |
| 3 | Config | Store configuration in the environment | Hardcoding database URLs, API keys, or feature flags in source code |
| 4 | Backing Services | Treat backing services as attached resources | Assuming the database is on localhost and will always be there |
| 5 | Build, Release, Run | Strictly separate build and run stages | SSHing into production to edit code or apply patches directly |
| 6 | Processes | Execute the app as one or more stateless processes | Storing user sessions in local memory instead of an external store |
| 7 | Port Binding | Export services via port binding | Requiring an external web server (Apache, IIS) to be pre-installed |
| 8 | Concurrency | Scale out via the process model | Running everything in a single monolithic process with threads only |
| 9 | Disposability | Maximize robustness with fast startup and graceful shutdown | Processes that take minutes to start or lose in-flight work on shutdown |
| 10 | Dev/Prod Parity | Keep development, staging, and production as similar as possible | Using SQLite in development but PostgreSQL in production |
| 11 | Logs | Treat logs as event streams | Writing logs to local files on disk instead of stdout |
| 12 | Admin Processes | Run admin/management tasks as one-off processes | Running database migrations by manually connecting to production |
Deep Dive: The Factors That Trip People Up
Factor 3: Config
Configuration is everything that varies between deploys: database credentials, API keys, feature flags, third-party service URLs. The twelve-factor app stores these in environment variables, not in code.
This sounds obvious, but violations are everywhere. A config.py file with DATABASE_URL = "postgres://prod-server:5432/mydb" committed to the repo. A .env file checked into version control. An application that reads from a YAML file baked into the Docker image.
The test is simple: could you open-source the codebase right now without exposing any credentials or environment-specific values? If not, your config is not properly externalized.
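A minimal sketch of externalized config in Python, assuming nothing beyond the standard library. The variable names (`DATABASE_URL`, `DEBUG`) are conventional examples; the point is that required values fail loudly at startup rather than silently falling back to a baked-in default.

```python
import os

def require_env(name: str) -> str:
    """Fail fast at startup if a required variable is missing."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value

def load_config() -> dict:
    return {
        # Required: no hardcoded fallback, so nothing leaks into the repo.
        "database_url": require_env("DATABASE_URL"),
        # Optional settings may carry safe defaults.
        "debug": os.environ.get("DEBUG", "false").lower() == "true",
    }
```

This codebase passes the open-source test: there is nothing in it to redact, because every deploy-specific value arrives from the environment.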
Factor 6: Processes
Twelve-factor processes are stateless and share-nothing. Any data that needs to persist must be stored in a backing service (database, cache, object store). This means no sticky sessions, no in-memory caches that cannot be lost, and no local file storage that other processes need to read.
This factor is what makes horizontal scaling possible. If each process is stateless, you can add or remove instances at will. A load balancer can send any request to any instance. If an instance crashes, no data is lost because there was no data on that instance to begin with.
The most common violation is storing session data in process memory. It works fine with a single server. The moment you add a second server behind a load balancer, users lose their sessions when requests hit a different instance.
Factor 9: Disposability
Processes should start fast and shut down gracefully. Fast startup means new instances can be spun up quickly in response to load. Graceful shutdown means the process finishes in-flight requests, releases resources, and exits cleanly when it receives a SIGTERM signal.
This factor matters because cloud platforms routinely start and stop instances. Auto-scaling groups add and remove instances based on load. Kubernetes reschedules pods across nodes. Spot instances can be terminated with 30 seconds notice. If your process takes five minutes to start or drops connections on shutdown, these operations cause user-facing errors.
Factor 10: Dev/Prod Parity
The gap between development and production environments should be as small as possible. This means the same operating system, the same database engine (not just the same type), the same message queue, and the same cache. Docker and containerization have made this dramatically easier. You define your stack once in a Dockerfile and docker-compose.yml, and every developer runs the same environment.
The classic violation is using in-memory substitutes during development. H2 instead of PostgreSQL. A local directory instead of S3. A synchronous function call instead of a message queue. These substitutions hide bugs that only appear in production, where the real services behave differently.
How the Factors Connect
The twelve factors are not independent. They reinforce each other. Stateless processes (Factor 6) only work if config is externalized (Factor 3) and backing services are treated as attachable resources (Factor 4). Fast startup (Factor 9) requires that dependencies are explicitly declared and isolated (Factor 2) so that the environment can be set up predictably. Dev/prod parity (Factor 10) is easier when config lives in environment variables (Factor 3) rather than in environment-specific files.
When teams violate one factor, the violations tend to cascade. If config is hardcoded, dev/prod parity breaks. If processes are stateful, horizontal scaling fails. If the build and run stages are not separated, you end up patching production directly, which violates disposability because you cannot recreate the environment from scratch.
Twelve-Factor in 2026
The original methodology was written before Docker (2013), Kubernetes (2014), and the serverless movement (2015+). Some factors, like port binding, feel obvious now because modern frameworks default to self-contained HTTP servers. Others, like treating logs as event streams, are baked into platform expectations (CloudWatch, Datadog, and ELK all assume log streams, not log files).
The methodology was open-sourced to evolve with the community. New considerations, such as health check endpoints, circuit breakers, and observability, extend the original twelve factors but do not replace them. The foundation remains solid.
Further Reading
- Adam Wiggins, The Twelve-Factor App. The original reference. Read each factor page; they are short and precise.
- Twelve-Factor App Methodology, Wikipedia. Background, history, and adoption context.
- IBM Developer, "Creating Cloud-Native Applications: 12-Factor Applications". Practical application of all twelve factors in a Java context.
- Pradeep Loganathan, "12 Factor App: The Complete Guide to Building Cloud-Native Applications". Comprehensive walkthrough with modern examples.
Assignment
Pick any application you have built or worked on. It can be a side project, a work codebase, or even a tutorial project. Score it on each of the twelve factors using this scale:
- 0 = Factor is violated (e.g., config is hardcoded, logs go to local files)
- 1 = Partially followed (e.g., most config is externalized but some secrets are in code)
- 2 = Fully followed
Create a table with columns: Factor, Score (0-2), Evidence (one sentence explaining your score).
- What is your total score out of 24?
- Which factor has the lowest score? What would it take to fix it?
- Which factor was the hardest to evaluate? Why?
Most applications score between 10 and 16 on their first assessment. A perfect 24 is rare. The point is not to achieve perfection but to identify where the gaps are and what risks they create.