
Why Serialization Matters

Every time data moves between systems, it must be converted from an in-memory structure to a byte stream (serialization) and back again (deserialization). This happens on every API call, every database write, every message pushed to a queue. The format you choose for that conversion affects payload size, parsing speed, debugging ease, and cross-language compatibility.

Choosing a serialization format is not a cosmetic decision. At high throughput, the difference between a verbose text format and a compact binary format translates directly into bandwidth costs, latency, and CPU utilization.

Serialization Formats Compared

JSON (JavaScript Object Notation)

JSON is the default format for web APIs. It is human-readable, supported by the standard library or a mature package in every major programming language, and requires no schema definition to use. You can inspect a JSON payload in a browser, log it as a string, and debug it by eye. That convenience comes at a cost: JSON is verbose. Field names are repeated in every object, numbers are stored as text, and there is no binary type, so raw bytes must be text-encoded (typically as Base64), which inflates them by about a third.
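To make the verbosity concrete, here is a minimal Python sketch using only the standard library. The records and the 30-byte `blob` are hypothetical sample data. It serializes three identically shaped objects, counts how many bytes go to repeated field names, and then shows the Base64 detour that binary data has to take.

```python
import base64
import json

# Hypothetical sample: three objects with the same shape.
records = [
    {"user_id": 1001, "username": "alice", "active": True},
    {"user_id": 1002, "username": "bob", "active": False},
    {"user_id": 1003, "username": "carol", "active": True},
]

payload = json.dumps(records, separators=(",", ":"))  # compact: no spaces
# Every object repeats every field name, quotes included.
name_bytes = sum(len(json.dumps(name)) for record in records for name in record)
print(len(payload), "bytes total,", name_bytes, "of them field names")  # 150 bytes total, 81 of them field names

# JSON has no binary type, so raw bytes are usually Base64-encoded into a string,
# which turns every 3 bytes into 4 characters.
blob = bytes(range(30))  # hypothetical 30-byte binary value
encoded = base64.b64encode(blob).decode("ascii")
print(len(blob), "->", len(encoded))  # 30 -> 40
```

More than half of this compact payload is field names, and that share grows with every additional object of the same shape.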

XML (Extensible Markup Language)

XML dominated the enterprise integration era (SOAP, WSDL, XSLT). It supports namespaces, attributes, and complex nested structures. It also carries significant overhead: closing tags, verbose syntax, and larger payloads than JSON for equivalent data. XML is still used in legacy systems, document formats (DOCX, SVG), and configuration files (Maven POM, Android manifests). For new API design, it has been largely replaced by JSON.
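For comparison, the following sketch encodes one of the same hypothetical records as XML with the standard library's `xml.etree.ElementTree`. It maps each field to a child element, which is only one of several reasonable mappings (attributes would be shorter); the point is that every value is wrapped in an opening and a closing tag.

```python
import json
import xml.etree.ElementTree as ET

record = {"user_id": 1001, "username": "alice", "active": True}  # hypothetical sample

json_text = json.dumps(record, separators=(",", ":"))

# One child element per field: each value sits between an opening and closing tag.
root = ET.Element("user")
for name, value in record.items():
    child = ET.SubElement(root, name)
    child.text = str(value).lower() if isinstance(value, bool) else str(value)
xml_text = ET.tostring(root, encoding="unicode")  # str output, no XML declaration

print(xml_text)
print(len(json_text.encode("utf-8")), len(xml_text.encode("utf-8")))  # 49 83
```

For this record the XML form is 83 bytes against 49 for compact JSON, and the closing tags account for much of the difference.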

Protocol Buffers (Protobuf)

Developed by Google, Protobuf is a binary serialization format that requires a schema definition (a .proto file). Fields are identified by numeric tags rather than string names, which makes payloads compact. Published benchmarks commonly report Protobuf serializing and deserializing several times faster than JSON (figures of 3 to 7 times are often cited), with payloads typically 30 to 50 percent smaller; actual gains depend on the data and the libraries involved. The tradeoff: you cannot read a Protobuf message without its schema, and client and server must share compatible schema definitions, usually compiled into generated code ahead of time.
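The core of Protobuf's compactness is its wire format: each field is written as a varint key, `(field_number << 3) | wire_type`, followed by the value. The sketch below hand-encodes a message for a hypothetical schema, `message User { int64 user_id = 1; string username = 2; bool active = 3; }`, to show the idea. It is not the official `protobuf` library, which you would normally use through `protoc`-generated classes, and it handles only non-negative integers.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)

VARINT, LEN = 0, 2  # the two Protobuf wire types this sketch uses

def key(field_number: int, wire_type: int) -> bytes:
    # Each field is prefixed by its number and wire type, never by its name.
    return encode_varint((field_number << 3) | wire_type)

def encode_user(user_id: int, username: str, active: bool) -> bytes:
    """Encode the hypothetical User message. Real proto3 encoders skip fields
    holding default values (0, "", false); this sketch always writes all three."""
    name = username.encode("utf-8")
    return (
        key(1, VARINT) + encode_varint(user_id)
        + key(2, LEN) + encode_varint(len(name)) + name
        + key(3, VARINT) + encode_varint(int(active))
    )

print(key(1, VARINT).hex() + encode_varint(150).hex())  # 089601
print(encode_user(1001, "alice", True).hex())  # 08e9071205616c6963651801
```

The 12-byte result carries the same record that takes 49 bytes as compact JSON, because the field names never appear on the wire; a reader maps field numbers back to names using the schema.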

Apache Avro

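Avro is a binary format developed within the Apache Hadoop ecosystem. Like Protobuf, it uses a schema, but the schema travels with the data: it is embedded in the header of Avro container files, or, in streaming systems such as Kafka, kept in a schema registry and referenced from each message. This makes Avro particularly strong for data pipelines where producers and consumers may not share a codebase. Because the binary encoding carries no field names or tags, only values in schema order, Avro data is often the most compact of these formats. The tradeoff is that a reader cannot decode anything without the writer's schema, and serialization and deserialization can use more memory than Protobuf, depending on the implementation.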
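Avro's binary encoding contains no field names and no tags, only values written in schema order. Per the Avro specification, `long` values use zigzag varints (so small negative numbers stay short), strings are a length followed by UTF-8 bytes, and booleans take one byte. This hand-written sketch follows those rules for a hypothetical `User` record schema and supports only those three types; real code would use a library such as the official `avro` package or `fastavro`.

```python
def zigzag(n: int) -> int:
    """64-bit zigzag: 0->0, -1->1, 1->2, -2->3, so small magnitudes stay small."""
    return (n << 1) ^ (n >> 63)

def encode_varint(value: int) -> bytes:
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)

def encode_long(n: int) -> bytes:
    return encode_varint(zigzag(n))

def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return encode_long(len(data)) + data  # length prefix, then raw UTF-8

def encode_boolean(b: bool) -> bytes:
    return b"\x01" if b else b"\x00"

# Hypothetical writer schema (Avro schemas are themselves written in JSON).
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "username", "type": "string"},
        {"name": "active", "type": "boolean"},
    ],
}

ENCODERS = {"long": encode_long, "string": encode_string, "boolean": encode_boolean}

def encode_record(schema: dict, record: dict) -> bytes:
    # Fields are concatenated in schema order: no names, no tags.
    return b"".join(ENCODERS[f["type"]](record[f["name"]]) for f in schema["fields"])

user = {"user_id": 1001, "username": "alice", "active": True}
print(encode_record(USER_SCHEMA, user).hex())  # d20f0a616c69636501
```

The 9 bytes produced here exclude the schema itself, which the reader still needs, either from the container file header or from a registry lookup.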

Format Comparison

| Feature | JSON | XML | Protobuf | Avro |
|---|---|---|---|---|
| Encoding | Text | Text | Binary | Binary |
| Human-readable | Yes | Yes | No | No |
| Schema required | No (optional via JSON Schema) | No (optional via XSD) | Yes (.proto file) | Yes (schema written in JSON) |
| Payload size | Large | Largest | Small | Smallest |
| Serialization speed | Moderate | Slow | Fast | Fast |
| Schema evolution | Manual, fragile | Supported via XSD versioning | Good (field numbers are stable) | Excellent (reader/writer schema resolution, compatibility enforced by a schema registry) |
| Best use case | Public APIs, web frontends | Legacy systems, document formats | Internal microservices (gRPC) | Data pipelines, Kafka, Hadoop |

Web Sessions: Maintaining State in a Stateless Protocol

HTTP is stateless. Every request is independent. The server does not inherently remember who you are between requests. Session management is the set of techniques used to associate a sequence of requests with a single user. There are three dominant approaches.

Cookie-Based Server Sessions

The server generates a random session ID, stores session data (user ID, permissions, cart contents) in server-side storage (memory, Redis, a database), and sends the session ID to the client as an HTTP cookie. On every subsequent request, the browser automatically includes the cookie. The server looks up the session ID and retrieves the associated data.

This approach is simple and secure when implemented correctly: the session ID must be long and cryptographically random, and the cookie should be marked HttpOnly, Secure, and SameSite. The cookie itself contains no sensitive data, just an opaque identifier. The server controls the session lifecycle: it can invalidate a session immediately by deleting the server-side record. The limitation is that every application server must be able to find the session data, so a multi-server deployment needs either sticky sessions (routing each user to the same server) or a centralized store like Redis.
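The lifecycle described above (issue a random ID, set it as a cookie, look it up on each request, delete it to revoke) fits in a few functions. This is a minimal single-process sketch using Python's `secrets` and `http.cookies`; the in-memory `SESSIONS` dict and the cookie name `session_id` are hypothetical choices, and a real framework would handle the HTTP plumbing.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

# In-memory session store: works for one process only. Data is lost on restart
# and is invisible to other application servers.
SESSIONS: dict = {}

def create_session(user_id: int) -> str:
    """Store a new session and return the Set-Cookie header value for it."""
    session_id = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    SESSIONS[session_id] = {"user_id": user_id, "cart": []}
    # The cookie carries only the opaque ID. HttpOnly hides it from JavaScript,
    # Secure limits it to HTTPS, SameSite restricts cross-site sending.
    return f"session_id={session_id}; Path=/; HttpOnly; Secure; SameSite=Lax"

def _session_id(cookie_header: str) -> Optional[str]:
    """Extract the session ID from a request's Cookie header, if present."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return morsel.value if morsel is not None else None

def load_session(cookie_header: str) -> Optional[dict]:
    session_id = _session_id(cookie_header)
    return SESSIONS.get(session_id) if session_id else None

def destroy_session(cookie_header: str) -> None:
    """Revocation: deleting the server-side record ends the session at once."""
    session_id = _session_id(cookie_header)
    if session_id:
        SESSIONS.pop(session_id, None)

# Simulated round trip: the browser echoes back the name=value pair it was given.
set_cookie = create_session(user_id=42)
request_cookie = set_cookie.split(";")[0]
print(load_session(request_cookie))  # {'user_id': 42, 'cart': []}
destroy_session(request_cookie)
print(load_session(request_cookie))  # None
```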

JWT (JSON Web Token)

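With JWT, the server encodes the user's identity and claims into a signed token and sends it to the client. The signature prevents tampering, but the payload is only Base64url-encoded, not encrypted, so anyone holding the token can read its claims. The client stores the token either in localStorage, in which case client code sends it in the Authorization header (Authorization: Bearer <token>), or in an HttpOnly cookie, which the browser attaches to requests automatically. The server verifies the token's signature without querying any external store.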

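JWTs are stateless: no server-side storage is needed. This makes horizontal scaling straightforward because any server instance holding the verification key can check the token. The tradeoff is that you cannot revoke a JWT before it expires without maintaining a blocklist, which reintroduces server-side state; short expiry times limit the exposure. JWTs are also larger than session cookies: size depends on the claims and signing algorithm, and tokens with many claims or RSA signatures commonly reach 800 to 2,000 bytes, versus a few dozen bytes for a session ID.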
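To show what "verify the signature without an external store" means, here is a minimal HS256 (HMAC-SHA256) sketch built from the standard library. The `SECRET` key, the claim names, and the 15-minute default lifetime are hypothetical; production code should use a maintained library such as PyJWT and load keys from a secrets manager rather than hand-rolling token handling.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"hypothetical-secret-use-a-long-random-key"  # hypothetical signing key

def b64url_encode(data: bytes) -> str:
    """Unpadded base64url, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def _sign(signing_input: bytes) -> bytes:
    return hmac.new(SECRET, signing_input, hashlib.sha256).digest()

def create_jwt(claims: dict, ttl_seconds: int = 900) -> str:
    """Build header.payload.signature with an expiry claim (exp, Unix seconds)."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = dict(claims, exp=int(time.time()) + ttl_seconds)
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + b64url_encode(_sign(signing_input.encode("ascii")))

def verify_jwt(token: str) -> dict:
    """Return the claims if alg, signature, and expiry check out; else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")  # never let the token pick the algorithm
    expected = _sign(f"{header_b64}.{payload_b64}".encode("ascii"))
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("exp", 0) <= time.time():
        raise ValueError("token expired")
    return claims

token = create_jwt({"sub": "user-42", "role": "editor"})
print(verify_jwt(token)["sub"])  # user-42
# No key is needed to read the payload; the signature only detects tampering.
print(json.loads(b64url_decode(token.split(".")[1]))["role"])  # editor
```

Verification touches only the token and the key, which is why any instance can do it; the same property is why a stolen token stays valid until `exp` unless a blocklist is consulted.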

```mermaid
sequenceDiagram
    participant Client
    participant Server
    participant Auth as Auth Service
    Client->>Server: POST /login (credentials)
    Server->>Auth: Validate credentials
    Auth-->>Server: User verified
    Server->>Server: Create JWT (header.payload.signature)
    Server-->>Client: 200 OK + JWT token
    Client->>Server: GET /api/data (Authorization: Bearer JWT)
    Server->>Server: Verify JWT signature
    Server-->>Client: 200 OK + data
```

Server-Side Sessions with External Store

A hybrid approach uses cookies to carry a session ID while storing session data in a fast external store like Redis or Memcached. This combines the security of server-side control (immediate revocation) with the scalability of a shared store: any application server can handle any request, so sticky sessions are unnecessary. Most production web frameworks support this pattern directly or through widely used packages (Express with express-session and connect-redis, Django's cache-backed sessions with a Redis cache, Spring Session).
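The key property of this pattern is that application servers hold no session state of their own. The sketch below simulates that with two `AppServer` instances sharing one store. `TTLStore` is a hypothetical in-process stand-in for Redis that mimics per-key expiry, not Redis's API or durability; a real deployment would talk to Redis over the network, for instance with `SET` plus an `EX` expiry and `DEL`.

```python
import secrets
import time
from typing import Optional

class TTLStore:
    """Hypothetical stand-in for Redis: one key-value map shared by every
    app server, with per-key expiry."""

    def __init__(self) -> None:
        self._data = {}  # key -> (expires_at, value)

    def set(self, key: str, value: dict, ttl_seconds: float) -> None:
        self._data[key] = (time.monotonic() + ttl_seconds, value)

    def get(self, key: str) -> Optional[dict]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired: drop lazily on read
            return None
        return value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

class AppServer:
    """One application instance. It keeps no session state of its own."""

    def __init__(self, name: str, store: TTLStore) -> None:
        self.name = name
        self.store = store

    def login(self, user_id: int) -> str:
        session_id = secrets.token_urlsafe(32)
        self.store.set(f"session:{session_id}", {"user_id": user_id}, ttl_seconds=1800)
        return session_id  # would be sent to the browser in a cookie

    def handle(self, session_id: str) -> str:
        session = self.store.get(f"session:{session_id}")
        if session is not None:
            return f"{self.name}: user {session['user_id']}"
        return f"{self.name}: 401"

    def logout(self, session_id: str) -> None:
        self.store.delete(f"session:{session_id}")

shared = TTLStore()
server_a, server_b = AppServer("A", shared), AppServer("B", shared)
sid = server_a.login(user_id=42)
print(server_b.handle(sid))  # B: user 42  (no sticky routing needed)
server_b.logout(sid)
print(server_a.handle(sid))  # A: 401  (revoked for every instance at once)
```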

Session Strategy Comparison

| Feature | Cookie + Server Session | JWT | Cookie + External Store (Redis) |
|---|---|---|---|
| State location | Server memory or DB | Client (token) | External cache (Redis) |
| Scalability | Requires sticky sessions or shared store | Stateless, any server can verify | Scales with Redis cluster |
| Revocation | Immediate (delete session) | Difficult (wait for expiry or maintain blocklist) | Immediate (delete from Redis) |
| Payload size | Small (session ID only) | Large (often 800+ bytes with many claims) | Small (session ID only) |
| Cross-domain | Limited by cookie scope | Works across domains via headers | Limited by cookie scope |
| Best for | Simple single-server apps | APIs, microservices, mobile clients | Production web apps at scale |

Choosing the Right Combination

The serialization format and session strategy are independent choices, but they interact. An API that uses Protobuf for internal service communication might still use JSON for its public-facing endpoints and JWTs for authentication. A data pipeline might use Avro for Kafka messages while the web frontend that triggers the pipeline uses cookie-based sessions and JSON.

The principle is the same in both cases: match the tool to the constraint. Human-readable formats for debugging and external APIs. Binary formats for throughput-sensitive internal paths. Server-side sessions when you need revocation. JWTs when you need stateless verification across services.


Assignment

An API serves 10,000 requests per second. The average JSON response payload is 2 KB. Switching from JSON to Protobuf would reduce payload size by 40%. Use decimal units throughout (1 KB = 1,000 bytes, 1 GB = 1,000,000 KB).

  1. Calculate the bandwidth saved per hour after switching to Protobuf. Show your work.
  2. If bandwidth costs $0.09 per GB (AWS data transfer pricing), how much would you save per month (30 days)?
  3. What non-bandwidth costs would you incur to make this switch? Think about developer time, tooling, debugging difficulty, and client compatibility.

Hint: 10,000 req/s * 2 KB * 0.40 savings = bandwidth saved per second. Convert to GB/hour.
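Once you have worked parts 1 and 2 by hand, a small helper like the one below can check the arithmetic. It is a sketch under stated assumptions: decimal units (1 GB = 1,000,000 KB), a flat price per GB with no tiered pricing, and a hypothetical function name.

```python
def bandwidth_savings(requests_per_second: float, payload_kb: float, reduction: float,
                      price_per_gb: float, days: int = 30) -> tuple:
    """Return (GB saved per hour, GB saved over `days`, dollars saved over `days`).

    Assumes decimal units (1 GB = 1,000,000 KB) and a flat price per GB.
    """
    kb_saved_per_second = requests_per_second * payload_kb * reduction
    gb_saved_per_hour = kb_saved_per_second * 3600 / 1_000_000
    gb_saved_per_period = gb_saved_per_hour * 24 * days
    return gb_saved_per_hour, gb_saved_per_period, gb_saved_per_period * price_per_gb

# Plug in the assignment's figures to compare against your hand calculation.
per_hour_gb, per_month_gb, per_month_usd = bandwidth_savings(10_000, 2, 0.40, 0.09)
```

Part 3 has no formula: the developer time, tooling, debugging, and client-compatibility costs it asks about are what you weigh against the number this produces.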