Capstone: Combined Systems Analysis
Session 9.10 · ~5 min read
The Point of All of This
You have studied systems thinking principles, architectural patterns, databases, caching, reliability, distributed systems, design frameworks, and real-world case studies across nine modules. This final session asks you to do what systems thinkers do: connect things that appear separate.
Every system you designed in Modules 7 and 8 was treated as standalone. The URL shortener existed in isolation. The ride-hailing platform did not share infrastructure with the payment system. The notification service was drawn as a box inside one architecture, not as a shared platform serving many. In reality, large organizations run dozens of these systems simultaneously, and they interact in ways that create emergent behavior, both beneficial and dangerous.
This capstone session combines two case studies from Modules 7-8, maps their interactions through causal loop diagrams (Module 0), identifies shared infrastructure, and locates the leverage points and failure cascades that emerge when systems connect.
Choosing Two Systems
For this walkthrough, we combine the ride-hailing system (Session 7.5) with the payment system that underlies every transaction. In a real organization like Uber or Grab, these are separate engineering teams with separate codebases, separate databases, and separate on-call rotations. But they share infrastructure: the API gateway, the notification service, the identity system, and the observability platform.
The same exercise works with any pair. E-commerce (7.6) and notification (7.7). Video streaming (7.3) and search engine (8.1). Chat (7.4) and collaborative editor (8.4). The goal is not the specific pair. The goal is the practice of seeing across boundaries.
Mapping Shared Infrastructure
When two systems coexist in an organization, they share more infrastructure than you might expect. The table below maps shared components between ride-hailing and payments.
| Shared Component | Ride-Hailing Usage | Payment Usage | Failure Impact |
|---|---|---|---|
| API Gateway | Routes rider/driver requests, handles rate limiting | Routes charge/refund requests, enforces auth | Gateway outage blocks both ride requests and payments simultaneously |
| Identity Service | Authenticates riders and drivers | Authenticates payment tokens and merchant accounts | Auth failure prevents rides from starting and payments from processing |
| Notification Service | Sends ride status updates, ETA, driver arrival | Sends payment receipts, refund confirmations | Notification backlog delays ride updates and payment confirmations |
| Observability Platform | Traces ride matching latency, monitors dispatch SLOs | Traces payment processing latency, monitors charge success rates | Observability outage blinds both teams during incidents |
| Kafka Cluster | Events: ride.requested, ride.matched, ride.completed | Events: payment.initiated, payment.completed, payment.failed | Kafka lag delays ride completion and payment settlement |
| Redis Cluster | Caches driver locations, surge pricing multipliers | Caches user payment methods, idempotency keys | Redis failure causes stale driver locations and duplicate charges |
Six shared components. Each is a potential coupling point. Each creates a path through which a failure in one system can propagate to the other.
Combined Causal Loop Diagram
Session 0.8 introduced causal loop diagrams (CLDs) as a tool for mapping feedback relationships. We now apply that tool across system boundaries. The diagram below shows how ride-hailing and payment systems interact through reinforcing and balancing loops.
Read the diagram carefully. Ride demand increases matching requests, which increases payment load. Payment latency increases ride completion time, which degrades user experience, which reduces demand. This is a balancing loop: the system naturally slows itself down when overloaded. But there is also a reinforcing loop through surge pricing: high demand triggers surge pricing, which attracts more drivers, which increases matching capacity, which increases payment load further.
The Kafka cluster appears as a shared bottleneck. Both ride events and payment events flow through it. Consumer lag in Kafka increases both payment latency and notification delay, creating two separate paths to degraded user experience.
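The balancing loop described above can be made concrete with a toy discrete-time simulation. This is a sketch, not a model of any real system: the capacity of 80 requests per step and every coefficient are invented purely to show the loop's shape. Demand overshoots capacity, latency rises, experience degrades, and demand settles toward an equilibrium below its unconstrained level.

```python
# Toy simulation of the balancing loop: demand -> matching requests ->
# payment load -> payment latency -> degraded experience -> reduced demand.
# All numbers are invented for illustration, not measured values.

def simulate(steps=20, base_demand=100.0):
    demand = base_demand
    history = []
    for _ in range(steps):
        matching_requests = demand              # each ride request needs a match
        payment_load = matching_requests        # each completed ride triggers a charge
        # latency grows once load exceeds a nominal capacity of 80 req/step
        payment_latency = max(0.0, (payment_load - 80.0) / 80.0)
        experience = 1.0 - min(1.0, payment_latency)  # 1.0 = perfect, 0.0 = unusable
        # balancing link: degraded experience suppresses next-step demand
        demand = base_demand * (0.5 + 0.5 * experience)
        history.append(round(demand, 1))
    return history

print(simulate())
```

Running this shows damped oscillation toward roughly 92 requests per step: the hallmark of a balancing loop. A reinforcing loop, by contrast, would have a multiplier greater than one and run away instead of settling.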
Identifying Leverage Points
Session 0.6 introduced Donella Meadows' concept of leverage points: places in a system where a small intervention produces large effects. In a combined system, leverage points often sit at shared infrastructure boundaries.
| Leverage Point | Type | Intervention | Impact |
|---|---|---|---|
| API Gateway rate limiting | Balancing loop (flow control) | Per-service rate limits prevent one system from consuming all gateway capacity | Prevents ride-hailing traffic spikes from starving payment requests |
| Kafka partition isolation | Buffer (decoupling) | Separate Kafka topics and consumer groups for ride events vs payment events | Consumer lag in ride events does not affect payment processing |
| Circuit breaker on payment service | Balancing loop (damage control) | When payment latency exceeds threshold, queue charges instead of blocking ride completion | Rides complete even when payment is slow; charges settle asynchronously |
| Redis cluster isolation | Buffer (resource separation) | Separate Redis clusters for location data and payment data | Location cache eviction does not affect payment idempotency |
| Notification priority queues | Information flow (prioritization) | Ride status updates get higher priority than payment receipts | Users see "driver arriving" in real time even when receipt delivery is delayed |
Notice that three of these five leverage points are about isolation: preventing one system's load from affecting another. This is the central insight of combined systems analysis. When systems share infrastructure, the most powerful interventions are usually at the boundaries, not inside either system.
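The first leverage point, per-service rate limits at the gateway, is commonly implemented as one token bucket per upstream service. The sketch below illustrates the idea; the service names, rates, and capacities are assumptions chosen for this example, not values from the case studies.

```python
import time

class TokenBucket:
    """Minimal token bucket: refills `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per upstream service, so a ride-hailing traffic spike
# cannot consume the gateway capacity reserved for payments.
limits = {
    "ride-hailing": TokenBucket(rate=500, capacity=1000),  # illustrative numbers
    "payments": TokenBucket(rate=200, capacity=400),
}

def handle(service):
    if not limits[service].allow():
        return 429  # Too Many Requests: shed load rather than queue it
    return 200      # forward to the upstream service
```

The key design choice is rejecting excess requests (429) instead of queueing them: queues at the gateway are exactly how one system's backlog starves another's capacity.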
Failure Cascades
A failure cascade occurs when a failure in one component propagates through shared infrastructure to cause failures in apparently unrelated components. In the combined ride-hailing + payment system, two cascades stand out.
Cascade 1: Payment gateway timeout. The external payment processor (Stripe, Adyen) experiences elevated latency. Payment service threads block waiting for responses. The thread pool exhausts. Payment service stops responding. The API gateway's connection pool to the payment service fills up. The gateway starts rejecting all requests, including ride-hailing requests that have nothing to do with payments. Riders cannot request rides because the gateway is overwhelmed by backed-up payment connections.
Cascade 2: Kafka cluster degradation. A Kafka broker loses a disk. Partition rebalancing causes temporary consumer lag across all topics. Ride completion events are delayed, which delays payment initiation. Payment events queue up behind the ride events. The notification service, also consuming from Kafka, falls behind. Users see no ride status updates and no payment confirmations. Support ticket volume spikes. The support system, which also uses the shared notification service, adds more load to the already-lagging notification pipeline.
Both cascades follow the same pattern: a single failure crosses a shared boundary and amplifies through feedback loops. The CLD makes these paths visible before they happen in production.
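The circuit-breaker leverage point is the intervention that would arrest Cascade 1: instead of blocking ride completion on a struggling processor, charges are deferred for asynchronous settlement. The sketch below is one possible shape for that breaker; the threshold, method names, and recovery policy are assumptions for illustration.

```python
from collections import deque

class PaymentCircuitBreaker:
    """Opens after `threshold` consecutive failed charges. While open,
    charges are queued for later settlement instead of blocking the
    caller, so rides complete even when the payment processor is slow."""

    def __init__(self, charge_fn, threshold=5):
        self.charge_fn = charge_fn    # call to the external payment processor
        self.threshold = threshold
        self.failures = 0
        self.open = False
        self.deferred = deque()       # charges to settle once the processor recovers

    def charge(self, ride_id, amount):
        if self.open:
            self.deferred.append((ride_id, amount))
            return "queued"           # ride completes; payment settles later
        try:
            self.charge_fn(ride_id, amount)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open = True      # stop calling the processor entirely
            self.deferred.append((ride_id, amount))
            return "queued"
        self.failures = 0
        return "charged"

    def try_recover(self):
        """Half-open probe: retry one deferred charge; close on success."""
        if not self.deferred:
            self.open = False
            return
        ride_id, amount = self.deferred[0]
        try:
            self.charge_fn(ride_id, amount)
        except Exception:
            return                    # still unhealthy; stay open
        self.deferred.popleft()
        self.open = False
        self.failures = 0
```

Note what the breaker protects: not the payment service itself, but the callers upstream of it. Threads return immediately instead of blocking, so the gateway's connection pool never fills and the cascade never reaches ride-hailing.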
Systems thinking is not a module you complete. It is a lens you keep. Every system you design from here forward will be shaped by how you see connections.
Applying This to Any System Pair
The process is repeatable. For any two systems in the same organization:
- List shared infrastructure. API gateway, databases, caches, message brokers, identity services, observability platforms, CDNs.
- Draw the CLD. Map how load in one system creates load in shared components, and how that affects the other system. Mark reinforcing loops (R) and balancing loops (B).
- Identify leverage points. Look for places where isolation, rate limiting, circuit breaking, or priority ordering can prevent cross-system interference.
- Trace failure cascades. For each shared component, ask: "If this fails, what happens to System A? What happens to System B? How does A's failure mode affect B through shared dependencies?"
- Design interventions. For each cascade, identify the cheapest intervention that breaks the propagation chain.
This is systems thinking applied to system design. It is the skill that separates engineers who build individual services from architects who build organizations.
Further Reading
- Donella Meadows, Leverage Points: Places to Intervene in a System. The foundational essay on leverage points in complex systems.
- Sustainability Methods, System Thinking and Causal Loop Diagrams. Comprehensive guide to constructing and interpreting CLDs.
- Creately, Causal Loop Diagram: How to Visualize and Analyze System Dynamics. Practical tutorial on CLD notation, construction, and analysis.
- Nature Scientific Reports, Using Network Analysis to Identify Leverage Points Based on Causal Loop Diagrams. Research on formal methods for locating leverage points in CLDs.
Assignment
This is the capstone assignment for the entire course.
- Pick two systems from Modules 7-8. Choose systems that would plausibly coexist in the same organization. Suggestions: e-commerce (7.6) + notification (7.7), chat (7.4) + collaborative editor (8.4), video streaming (7.3) + search engine (8.1), ride-hailing (7.5) + ticketing (8.2).
- Map shared infrastructure. Create a table like the one above. Identify at least 5 shared components. For each, describe how both systems use it and what happens when it fails.
- Draw a combined CLD. Map the causal relationships between the two systems. Include at least one reinforcing loop and one balancing loop. Use Mermaid, a whiteboard, or paper. Label every arrow with + or − to show polarity.
- Identify 3 leverage points. For each, describe the intervention, its type (isolation, flow control, prioritization, information flow), and the expected impact.
- Trace 2 failure cascades. For each, describe the initiating failure, the propagation path through shared infrastructure, the impact on both systems, and the intervention that would break the cascade.
This assignment synthesizes material from every module in the course: feedback loops (Module 0), architectural patterns (Module 1), scaling (Module 2), databases and caching (Module 3), reliability (Module 4), distributed systems (Module 5), design methodology (Module 6), and case studies (Modules 7-8). If you can complete it thoroughly, you have internalized the core skill this course teaches: seeing systems as connected wholes, not isolated parts.