Capstone: Combined Systems Analysis
Session 9.10 · ~5 min read
The Point of All of This
You have studied systems thinking principles, architectural patterns, databases, caching, reliability, distributed systems, design frameworks, and real-world case studies across nine modules. This final session asks you to do what systems thinkers do: connect things that appear separate.
Every system you designed in Modules 7 and 8 was treated as standalone. The URL shortener existed in isolation. The ride-hailing platform did not share infrastructure with the payment system. The notification service was drawn as a box inside one architecture, not as a shared platform serving many. In reality, large organizations run dozens of these systems simultaneously, and they interact in ways that create emergent behavior, both beneficial and dangerous.
This capstone session combines two case studies from Modules 7-8, maps their interactions through causal loop diagrams (Module 0), identifies shared infrastructure, and locates the leverage points and failure cascades that emerge when systems connect.
Choosing Two Systems
For this walkthrough, we combine the ride-hailing system (Session 7.5) with the payment system that underlies every transaction. In a real organization like Uber or Grab, these are separate engineering teams with separate codebases, separate databases, and separate on-call rotations. But they share infrastructure: the API gateway, the notification service, the identity system, and the observability platform.
The same exercise works with any pair. E-commerce (7.6) and notification (7.7). Video streaming (7.3) and search engine (8.1). Chat (7.4) and collaborative editor (8.4). The goal is not the specific pair. The goal is the practice of seeing across boundaries.
Mapping Shared Infrastructure
When two systems coexist in an organization, they share more infrastructure than you might expect. The table below maps shared components between ride-hailing and payments.
| Shared Component | Ride-Hailing Usage | Payment Usage | Failure Impact |
|---|---|---|---|
| API Gateway | Routes rider/driver requests, handles rate limiting | Routes charge/refund requests, enforces auth | Gateway outage blocks both ride requests and payments simultaneously |
| Identity Service | Authenticates riders and drivers | Authenticates payment tokens and merchant accounts | Auth failure prevents rides from starting and payments from processing |
| Notification Service | Sends ride status updates, ETA, driver arrival | Sends payment receipts, refund confirmations | Notification backlog delays ride updates and payment confirmations |
| Observability Platform | Traces ride matching latency, monitors dispatch SLOs | Traces payment processing latency, monitors charge success rates | Observability outage blinds both teams during incidents |
| Kafka Cluster | Events: ride.requested, ride.matched, ride.completed | Events: payment.initiated, payment.completed, payment.failed | Kafka lag delays ride completion and payment settlement |
| Redis Cluster | Caches driver locations, surge pricing multipliers | Caches user payment methods, idempotency keys | Redis failure causes stale driver locations and duplicate charges |
Six shared components. Each is a potential coupling point. Each creates a path through which a failure in one system can propagate to the other.
Combined Causal Loop Diagram
Session 0.8 introduced causal loop diagrams (CLDs) as a tool for mapping feedback relationships. We now apply that tool across system boundaries. The diagram below shows how ride-hailing and payment systems interact through reinforcing and balancing loops.
Read the diagram carefully. Ride demand increases matching requests, which increases payment load. Payment latency increases ride completion time, which degrades user experience, which reduces demand. This is a balancing loop: the system naturally slows itself down when overloaded. But there is also a reinforcing loop through surge pricing: high demand triggers surge pricing, which attracts more drivers, which increases matching capacity, which increases payment load further.
The Kafka cluster appears as a shared bottleneck. Both ride events and payment events flow through it. Consumer lag in Kafka increases both payment latency and notification delay, creating two separate paths to degraded user experience.
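The balancing loop described above can be made concrete with a toy discrete-time simulation. This is a sketch, not a model of any real system: the capacity of 80 requests per step and every coefficient are invented purely to show the loop's shape. Demand overshoots capacity, latency rises, experience degrades, and demand settles toward an equilibrium below its unconstrained level.

```python
# Toy simulation of the balancing loop: demand -> matching requests ->
# payment load -> payment latency -> degraded experience -> reduced demand.
# All numbers are invented for illustration, not measured values.

def simulate(steps=20, base_demand=100.0):
    demand = base_demand
    history = []
    for _ in range(steps):
        matching_requests = demand              # each ride request needs a match
        payment_load = matching_requests        # each completed ride triggers a charge
        # latency grows once load exceeds a nominal capacity of 80 req/step
        payment_latency = max(0.0, (payment_load - 80.0) / 80.0)
        experience = 1.0 - min(1.0, payment_latency)  # 1.0 = perfect, 0.0 = unusable
        # balancing link: degraded experience suppresses next-step demand
        demand = base_demand * (0.5 + 0.5 * experience)
        history.append(round(demand, 1))
    return history

print(simulate())
```

Running this shows damped oscillation toward roughly 92 requests per step: the hallmark of a balancing loop. A reinforcing loop, by contrast, would have a multiplier greater than one and run away instead of settling.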
Identifying Leverage Points
Session 0.6 introduced Donella Meadows' concept of leverage points: places in a system where a small intervention produces large effects. In a combined system, leverage points often sit at shared infrastructure boundaries.
| Leverage Point | Type | Intervention | Impact |
|---|---|---|---|
| API Gateway rate limiting | Balancing loop (flow control) | Per-service rate limits prevent one system from consuming all gateway capacity | Prevents ride-hailing traffic spikes from starving payment requests |
| Kafka partition isolation | Buffer (decoupling) | Separate Kafka topics and consumer groups for ride events vs payment events | Consumer lag in ride events does not affect payment processing |
| Circuit breaker on payment service | Balancing loop (damage control) | When payment latency exceeds threshold, queue charges instead of blocking ride completion | Rides complete even when payment is slow; charges settle asynchronously |
| Redis cluster isolation | Buffer (resource separation) | Separate Redis clusters for location data and payment data | Location cache eviction does not affect payment idempotency |
| Notification priority queues | Information flow (prioritization) | Ride status updates get higher priority than payment receipts | Users see "driver arriving" in real time even when receipt delivery is delayed |
Notice that three of these five leverage points are about isolation: preventing one system's load from affecting another. This is the central insight of combined systems analysis. When systems share infrastructure, the most powerful interventions are usually at the boundaries, not inside either system.
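The first leverage point, per-service rate limits at the gateway, is commonly implemented as one token bucket per upstream service. The sketch below illustrates the idea; the service names, rates, and capacities are assumptions chosen for this example, not values from the case studies.

```python
import time

class TokenBucket:
    """Minimal token bucket: refills `rate` tokens/sec up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# One bucket per upstream service, so a ride-hailing traffic spike
# cannot consume the gateway capacity reserved for payments.
limits = {
    "ride-hailing": TokenBucket(rate=500, capacity=1000),  # illustrative numbers
    "payments": TokenBucket(rate=200, capacity=400),
}

def handle(service):
    if not limits[service].allow():
        return 429  # Too Many Requests: shed load rather than queue it
    return 200      # forward to the upstream service
```

The key design choice is rejecting excess requests (429) instead of queueing them: queues at the gateway are exactly how one system's backlog starves another's capacity.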
Failure Cascades
A failure cascade occurs when a failure in one component propagates through shared infrastructure to cause failures in apparently unrelated components. In the combined ride-hailing + payment system, two cascades stand out.
Cascade 1: Payment gateway timeout. The external payment processor (Stripe, Adyen) experiences elevated latency. Payment service threads block waiting for responses. The thread pool exhausts. Payment service stops responding. The API gateway's connection pool to the payment service fills up. The gateway starts rejecting all requests, including ride-hailing requests that have nothing to do with payments. Riders cannot request rides because the gateway is overwhelmed by backed-up payment connections.
Cascade 2: Kafka cluster degradation. A Kafka broker loses a disk. Partition rebalancing causes temporary consumer lag across all topics. Ride completion events are delayed, which delays payment initiation. Payment events queue up behind the ride events. The notification service, also consuming from Kafka, falls behind. Users see no ride status updates and no payment confirmations. Support ticket volume spikes. The support system, which also uses the shared notification service, adds more load to the already-lagging notification pipeline.
Both cascades follow the same pattern: a single failure crosses a shared boundary and amplifies through feedback loops. The CLD makes these paths visible before they happen in production.
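The circuit-breaker leverage point is the intervention that would arrest Cascade 1: instead of blocking ride completion on a struggling processor, charges are deferred for asynchronous settlement. The sketch below is one possible shape for that breaker; the threshold, method names, and recovery policy are assumptions for illustration.

```python
from collections import deque

class PaymentCircuitBreaker:
    """Opens after `threshold` consecutive failed charges. While open,
    charges are queued for later settlement instead of blocking the
    caller, so rides complete even when the payment processor is slow."""

    def __init__(self, charge_fn, threshold=5):
        self.charge_fn = charge_fn    # call to the external payment processor
        self.threshold = threshold
        self.failures = 0
        self.open = False
        self.deferred = deque()       # charges to settle once the processor recovers

    def charge(self, ride_id, amount):
        if self.open:
            self.deferred.append((ride_id, amount))
            return "queued"           # ride completes; payment settles later
        try:
            self.charge_fn(ride_id, amount)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open = True      # stop calling the processor entirely
            self.deferred.append((ride_id, amount))
            return "queued"
        self.failures = 0
        return "charged"

    def try_recover(self):
        """Half-open probe: retry one deferred charge; close on success."""
        if not self.deferred:
            self.open = False
            return
        ride_id, amount = self.deferred[0]
        try:
            self.charge_fn(ride_id, amount)
        except Exception:
            return                    # still unhealthy; stay open
        self.deferred.popleft()
        self.open = False
        self.failures = 0
```

Note what the breaker protects: not the payment service itself, but the callers upstream of it. Threads return immediately instead of blocking, so the gateway's connection pool never fills and the cascade never reaches ride-hailing.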
Systems thinking is not a module you complete. It is a lens you keep. Every system you design from here forward will be shaped by how you see connections.
Applying This to Any System Pair
The process is repeatable. For any two systems in the same organization:
- List shared infrastructure. API gateway, databases, caches, message brokers, identity services, observability platforms, CDNs.
- Draw the CLD. Map how load in one system creates load in shared components, and how that affects the other system. Mark reinforcing loops (R) and balancing loops (B).
- Identify leverage points. Look for places where isolation, rate limiting, circuit breaking, or priority ordering can prevent cross-system interference.
- Trace failure cascades. For each shared component, ask: "If this fails, what happens to System A? What happens to System B? How does A's failure mode affect B through shared dependencies?"
- Design interventions. For each cascade, identify the cheapest intervention that breaks the propagation chain.
This is systems thinking applied to system design. It is the skill that separates engineers who build individual services from architects who build organizations.
Further Reading
- Donella Meadows, Leverage Points: Places to Intervene in a System. The foundational essay on leverage points in complex systems.
- Sustainability Methods, System Thinking and Causal Loop Diagrams. Comprehensive guide to constructing and interpreting CLDs.
- Creately, Causal Loop Diagram: How to Visualize and Analyze System Dynamics. Practical tutorial on CLD notation, construction, and analysis.
- Nature Scientific Reports, Using Network Analysis to Identify Leverage Points Based on Causal Loop Diagrams. Research on formal methods for locating leverage points in CLDs.
Assignment
This is the capstone assignment for the entire course.
- Pick two systems from Modules 7-8. Choose systems that would plausibly coexist in the same organization. Suggestions: e-commerce (7.6) + notification (7.7), chat (7.4) + collaborative editor (8.4), video streaming (7.3) + search engine (8.1), ride-hailing (7.5) + ticketing (8.2).
- Map shared infrastructure. Create a table like the one above. Identify at least 5 shared components. For each, describe how both systems use it and what happens when it fails.
- Draw a combined CLD. Map the causal relationships between the two systems. Include at least one reinforcing loop and one balancing loop. Use Mermaid, a whiteboard, or paper. Label every arrow with + or − to show polarity.
- Identify 3 leverage points. For each, describe the intervention, its type (isolation, flow control, prioritization, information flow), and the expected impact.
- Trace 2 failure cascades. For each, describe the initiating failure, the propagation path through shared infrastructure, the impact on both systems, and the intervention that would break the cascade.
This assignment synthesizes material from every module in the course: feedback loops (Module 0), architectural patterns (Module 1), scaling (Module 2), databases and caching (Module 3), reliability (Module 4), distributed systems (Module 5), design methodology (Module 6), and case studies (Modules 7-8). If you can complete it thoroughly, you have internalized the core skill this course teaches: seeing systems as connected wholes, not isolated parts.