CDN as Geographic Horizontal Scaling

A Content Delivery Network is, at its core, a system for putting copies of your content closer to your users. Instead of every request traveling to a single origin server (which might be thousands of kilometers away), requests are served by edge servers distributed across dozens or hundreds of locations worldwide.

This is horizontal scaling applied geographically. Rather than making one server faster, you add more servers in more places. The result is lower latency, reduced origin load, and better availability during traffic spikes.

The physics are straightforward. Light in fiber travels at roughly 200,000 km/s. A round trip from Jakarta to a server in Virginia (about 16,000 km each way) takes at least 160 ms from propagation delay alone, before any processing happens. Put an edge server in Singapore (about 900 km away), and that round trip drops to under 10 ms. For a page that requires 20 sequential round trips to fully load, that is a floor of roughly 3.2 seconds versus under 0.2 seconds.
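The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that counts propagation delay only; the function names are illustrative, and real round trips add routing detours, queuing, and server processing time on top of this floor.

```python
FIBER_SPEED_KM_PER_S = 200_000  # approximate speed of light in optical fiber


def min_round_trip_ms(one_way_km: float) -> float:
    """Lower bound on round-trip time from propagation delay alone."""
    return 2 * one_way_km / FIBER_SPEED_KM_PER_S * 1000


def page_load_floor_ms(one_way_km: float, round_trips: int) -> float:
    """Propagation delay accumulated over sequential round trips."""
    return min_round_trip_ms(one_way_km) * round_trips


if __name__ == "__main__":
    # Distances are the approximate figures used in the text.
    for label, km in [("Virginia origin", 16_000), ("Singapore edge", 900)]:
        print(
            f"{label}: {min_round_trip_ms(km):.0f} ms per round trip, "
            f"{page_load_floor_ms(km, 20):.0f} ms over 20 round trips"
        )
```

The gap grows linearly with the number of sequential round trips, which is why latency matters most for pages with long request chains.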

How a CDN Request Works

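When a user requests a resource, the request is steered to a nearby CDN edge server. This happens either through geo-aware DNS, which returns the address of an edge close to the user, or through anycast routing, in which many edges announce the same IP address and the network delivers packets to the closest one. The edge server checks its local cache. If the content is there and still valid, it serves it directly. This is a cache hit. If the content is missing or expired, the edge server fetches it from the origin, caches it, and serves it. This is a cache miss.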

graph TD
    User["User (Jakarta)"] --> DNS["DNS Resolution"]
    DNS --> Edge["CDN Edge Server (Singapore)"]
    Edge --> CacheCheck{"Cache Hit?"}
    CacheCheck -->|Yes| Serve["Serve from Edge (~10ms)"]
    CacheCheck -->|No| Shield["Origin Shield (if configured)"]
    Shield --> ShieldCheck{"Shield Cache Hit?"}
    ShieldCheck -->|Yes| ServeShield["Serve from Shield"]
    ShieldCheck -->|No| Origin["Origin Server"]
    Origin --> Shield
    ServeShield --> Edge
    Shield --> Edge
    Edge --> User
    style User fill:#2a2a2a,stroke:#c8a882,color:#ede9e3
    style DNS fill:#2a2a2a,stroke:#c8a882,color:#ede9e3
    style Edge fill:#2a2a2a,stroke:#6b8f71,color:#ede9e3
    style CacheCheck fill:#2a2a2a,stroke:#c8a882,color:#ede9e3
    style Serve fill:#2a2a2a,stroke:#6b8f71,color:#ede9e3
    style Shield fill:#2a2a2a,stroke:#c8a882,color:#ede9e3
    style ShieldCheck fill:#2a2a2a,stroke:#c8a882,color:#ede9e3
    style ServeShield fill:#2a2a2a,stroke:#6b8f71,color:#ede9e3
    style Origin fill:#2a2a2a,stroke:#c8a882,color:#ede9e3
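The edge-side hit/miss decision can be sketched as a small pull-through cache. This is a toy model, not how any particular CDN is implemented: the `EdgeCache` class, its fixed TTL, and the injectable `clock` parameter are all assumptions made for illustration, and the origin shield layer is left out.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy pull-through cache: serve from memory while fresh, else ask the origin."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        # url -> (body, expiry time on the clock's scale)
        self._store: Dict[str, Tuple[bytes, float]] = {}

    def get(self, url: str) -> Tuple[bytes, str]:
        """Return the body plus "HIT" or "MISS", mimicking a cache status header."""
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[1]:
            return entry[0], "HIT"
        # Missing or expired: fetch from the origin, store, then serve.
        body = self._fetch(url)
        self._store[url] = (body, now + self._ttl)
        return body, "MISS"
```

The first request for a URL is a miss, later requests within the TTL are hits, and the first request after expiry goes back to the origin.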

Cache-Control Headers

The origin server controls caching behavior through the Cache-Control HTTP header. This header tells both browsers and CDN edge servers how long to cache a response and under what conditions.

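The most important directives:

  • max-age=N: the response is fresh for N seconds in any cache (browser or CDN).
  • s-maxage=N: overrides max-age for shared caches such as CDN edges.
  • public: any cache may store the response.
  • private: only the user's browser may store the response; shared caches must not.
  • no-cache: caches may store the response but must revalidate with the origin before each reuse.
  • no-store: caches must not store the response at all.
  • stale-while-revalidate=N: a cache may serve a stale response for up to N seconds past expiry while it revalidates in the background.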

stale-while-revalidate is one of the most valuable CDN directives. It eliminates the latency spike that occurs when cached content expires. Instead of making the first user after expiry wait for a full origin fetch, it serves stale content instantly and refreshes in the background. For most content, slightly stale is far better than slow.
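To make the timing concrete, here is a hedged sketch of how a shared cache might classify a stored response by its age. The function names and return strings are hypothetical, the parser handles only simple `name` and `name=seconds` directives, and real caches apply many more rules than this.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[int]]:
    """Parse 'a, b=1' into {'a': None, 'b': 1}. Quoted values are not handled."""
    directives: Dict[str, Optional[int]] = {}
    for part in header.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name] = int(value) if value.isdigit() else None
    return directives


def _seconds(directives: Dict[str, Optional[int]], name: str) -> int:
    value = directives.get(name)
    return value if isinstance(value, int) else 0


def cache_decision(age_seconds: int, header: str) -> str:
    """Classify a stored response as seen by a shared cache such as a CDN edge."""
    d = parse_cache_control(header)
    if "no-store" in d or "no-cache" in d:
        return "fetch-from-origin"
    # For shared caches, s-maxage takes precedence over max-age.
    fresh_for = _seconds(d, "s-maxage") if "s-maxage" in d else _seconds(d, "max-age")
    stale_window = _seconds(d, "stale-while-revalidate")
    if age_seconds < fresh_for:
        return "serve-fresh"
    if age_seconds < fresh_for + stale_window:
        return "serve-stale-and-revalidate"
    return "fetch-from-origin"
```

With `max-age=60, stale-while-revalidate=300`, a response that is 120 seconds old is served immediately while a background refresh runs; only after 360 seconds does a user have to wait for the origin.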

Cache Invalidation Strategies

Phil Karlton famously said there are only two hard things in computer science: cache invalidation and naming things. In CDN operations, cache invalidation is the act of removing or refreshing content before its TTL expires.

| Strategy | How It Works | Speed | Cost | Best For |
| --- | --- | --- | --- | --- |
| TTL Expiry | Content expires naturally based on Cache-Control headers | No action needed | Zero | Content that can tolerate staleness (images, CSS, JS with hashed filenames) |
| Purge | Explicitly delete a specific URL from all edge caches | Seconds to minutes | API call per URL | Urgent corrections, legal takedowns, breaking news |
| Purge by Tag/Surrogate Key | Tag responses with categories, purge all content matching a tag | Seconds | One API call per tag | CMS content updates (purge all "product-123" pages) |
| Stale-While-Revalidate | Serve stale, refresh in background | Transparent | Zero (built into response) | Content where freshness matters but not to the millisecond |
| Versioned URLs | Change the URL when content changes (e.g., style.a1b2c3.css) | Instant (new URL, no cache to invalidate) | Build pipeline change | Static assets (CSS, JS, images) |
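The difference between purging a single URL and purging by tag can be illustrated with an in-memory model. This is a sketch only: `TaggedCache` and its methods are hypothetical and do not correspond to any CDN provider's API, which would expose purging through its own HTTP endpoints.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    """Toy cache that indexes stored URLs by tag (surrogate key)."""

    def __init__(self) -> None:
        self._bodies: Dict[str, bytes] = {}
        self._urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def put(self, url: str, body: bytes, tags: Iterable[str]) -> None:
        self._bodies[url] = body
        for tag in tags:
            self._urls_by_tag[tag].add(url)

    def get(self, url: str) -> Optional[bytes]:
        return self._bodies.get(url)

    def purge_url(self, url: str) -> None:
        """Purge one URL: one call per URL to invalidate."""
        self._bodies.pop(url, None)

    def purge_tag(self, tag: str) -> int:
        """Purge every URL stored under a tag; returns how many were removed."""
        purged = 0
        for url in self._urls_by_tag.pop(tag, set()):
            if self._bodies.pop(url, None) is not None:
                purged += 1
        return purged
```

A single `purge_tag("product-123")` call removes the product page, its listing entry, and any other response tagged with that key, without the caller needing to know every affected URL.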

Versioned URLs deserve special attention. If you hash the file contents into the filename (bundle.8f3a2c.js), you can set max-age=31536000 (one year) because the URL itself changes whenever the content changes. There is nothing to invalidate. This is the most reliable caching strategy for static assets and the one you should use by default.
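Build tools normally generate hashed filenames for you, but the idea fits in a few lines. The sketch below assumes SHA-256 and an 8-character digest prefix, both arbitrary choices; `versioned_name` is a hypothetical helper, and the `immutable` directive in the example header is an optional addition that is not required for the strategy to work.

```python
import hashlib
from pathlib import PurePosixPath

# One-year lifetime, safe because the URL changes whenever the content does.
LONG_LIVED_CACHE_CONTROL = "public, max-age=31536000, immutable"


def versioned_name(filename: str, content: bytes, digest_chars: int = 8) -> str:
    """Insert a content hash before the file extension: bundle.js -> bundle.<hash>.js"""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```

Identical content always produces the same name, so unchanged files stay cached across deploys, while any byte-level change produces a new URL that no cache has seen before.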

Origin Shielding

Without origin shielding, every edge server that experiences a cache miss fetches directly from the origin. If you have 200 edge locations and a popular asset expires on all of them at the same moment, the origin can receive up to 200 concurrent requests for the same file. This is called a "thundering herd" or "cache stampede."

Origin shielding adds an intermediate cache layer. All edge servers route their cache misses through a single shield server (or a small number of them). The shield checks its own cache first. If it has the content, it serves the edge servers directly. If not, only the shield fetches from the origin. This collapses 200 origin requests into 1.

Origin shielding reduces origin load by acting as a funnel between edge servers and the origin. It is especially valuable for origins with limited capacity (shared hosting, legacy systems) or content that expires frequently across many edge locations simultaneously.
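The funnel effect can be demonstrated with a small simulation. This is a simplified sketch: the class and function names are hypothetical, the edges request the asset one after another rather than concurrently, and nothing ever expires. A real shield also needs request coalescing so that simultaneous misses wait on a single origin fetch.

```python
from typing import Callable, Dict


class CountingOrigin:
    """Stand-in origin that counts how many requests reach it."""

    def __init__(self) -> None:
        self.requests = 0

    def fetch(self, url: str) -> bytes:
        self.requests += 1
        return f"content of {url}".encode()


class Cache:
    """Minimal cache layer that forwards misses to an upstream fetch function."""

    def __init__(self, upstream: Callable[[str], bytes]) -> None:
        self._upstream = upstream
        self._store: Dict[str, bytes] = {}

    def fetch(self, url: str) -> bytes:
        if url not in self._store:
            self._store[url] = self._upstream(url)
        return self._store[url]


def origin_requests(edge_count: int, use_shield: bool) -> int:
    """Count origin fetches when every edge has a cold cache for the same asset."""
    origin = CountingOrigin()
    upstream = Cache(origin.fetch).fetch if use_shield else origin.fetch
    edges = [Cache(upstream) for _ in range(edge_count)]
    for edge in edges:
        edge.fetch("/popular.jpg")
    return origin.requests
```

Calling `origin_requests(200, use_shield=False)` reports 200 origin fetches, while `origin_requests(200, use_shield=True)` reports 1.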

Push vs. Pull CDN

In a pull CDN, the edge server fetches content from the origin on demand (on first request or after cache expiry). This is the default model used by Cloudflare, Fastly, and most CDN providers. You do not upload anything to the CDN. It pulls what it needs, when it needs it.

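In a push CDN, you upload content to the CDN's storage explicitly. The CDN serves directly from its own storage rather than proxying to your origin. AWS CloudFront with S3 as the origin approximates this model: you upload files to S3 and run no origin server of your own, although CloudFront edge locations still fetch each object from S3 on first request. You control exactly what is on the CDN and when.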

Pull CDNs are simpler to operate: point DNS, configure caching headers, and you are done. Push CDNs give you more control and are better for large static assets (video, software downloads) where you want to guarantee the content is available on every edge before users request it.

What to Cache and What Not To

Not everything belongs in a CDN cache. The general rule: cache anything that is the same for all users, and do not cache anything that is personalized or sensitive.
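One way to encode this rule is a conservative check that an edge could run before storing a response. The policy below is an assumption made for illustration, not a description of how any specific CDN behaves: it treats any request carrying a Cookie or Authorization header as personalized, which is stricter than many real configurations, and `edge_may_cache` is a hypothetical name.

```python
from typing import Mapping


def edge_may_cache(
    method: str,
    request_headers: Mapping[str, str],
    response_cache_control: str,
) -> bool:
    """Return True only for responses that look identical for all users."""
    headers = {name.lower() for name in request_headers}
    directives = {
        part.strip().split("=")[0].lower()
        for part in response_cache_control.split(",")
        if part.strip()
    }
    if method.upper() not in {"GET", "HEAD"}:
        return False  # writes and other unsafe methods are never cached
    if "authorization" in headers or "cookie" in headers:
        return False  # request is tied to a user, so treat the response as personalized
    if directives & {"private", "no-store"}:
        return False  # origin explicitly forbids shared caching
    return True
```

Under this policy a public product page requested without cookies is cacheable, while the same URL requested with a session cookie bypasses the edge cache.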

Measuring CDN Performance

The key metric is cache hit ratio: the percentage of requests served from edge cache without touching the origin. A well-configured CDN should achieve 85% or higher for a typical content site. If your ratio is below 60%, something is wrong: usually too many unique URLs (for example, query strings that vary per request), TTLs that are too short, or headers that prevent caching.

Monitor X-Cache or the equivalent header in your CDN's responses (CloudFront and Fastly send X-Cache; Cloudflare sends CF-Cache-Status). These headers tell you whether each request was a hit, miss, or stale revalidation. Use this data to identify caching gaps and optimize your header configuration.
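Given a sample of cache status header values from your logs, the ratio is straightforward to compute. This sketch assumes each value begins with "HIT" for an edge hit, which is a simplification: the exact value format differs between providers, so the matching rule would need adjusting for yours. The thresholds in `assess` are the 85% and 60% figures from the text, and both function names are hypothetical.

```python
from typing import Iterable


def cache_hit_ratio(cache_status_values: Iterable[str]) -> float:
    """Fraction of responses whose cache status value starts with 'HIT'."""
    total = 0
    hits = 0
    for value in cache_status_values:
        total += 1
        if value.strip().lower().startswith("hit"):
            hits += 1
    return hits / total if total else 0.0


def assess(ratio: float) -> str:
    """Apply the rule-of-thumb thresholds for a typical content site."""
    if ratio >= 0.85:
        return "healthy"
    if ratio < 0.60:
        return "investigate"
    return "room to improve"
```

Computing the ratio per URL pattern or per content type, rather than site-wide, makes it easier to see which responses are dragging the number down.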

Assignment

You are deploying a web application with the following user distribution:

  • 60% Indonesia
  • 30% Malaysia
  • 10% global (scattered)

Your origin server is in Singapore. The application serves a mix of static assets (images, CSS, JS), public product pages, and authenticated user dashboards.

Design the CDN strategy:

  1. Edge node placement. Where would you place edge nodes (or choose PoP locations from a provider like Cloudflare or AWS CloudFront)? Justify each location based on the user distribution.
  2. Caching rules. For each content type below, specify the Cache-Control header you would set and explain why:
    • Static assets (CSS, JS with hashed filenames)
    • Product images uploaded by sellers
    • Public product listing pages
    • Authenticated user dashboard
    • API response for product search
  3. Origin shielding. Would you enable origin shielding? If yes, where would you place the shield server and why?
  4. Invalidation plan. A seller updates a product image. Describe step by step how the new image reaches users. Which invalidation strategy do you use?