SQL vs. NoSQL
Session 3.1 · ~5 min read
Two Models, Two Philosophies
Every application stores data. The question is how that data is organized, queried, and scaled. For decades, relational databases (SQL) dominated. Then, around 2009, a wave of distributed systems gave rise to NoSQL. These are not competing religions. They are engineering tradeoffs. The right choice depends on your data shape, your access patterns, and your consistency requirements.
Relational Databases (SQL)
A relational database stores data in tables with predefined schemas, enforces relationships through foreign keys, and guarantees ACID properties: Atomicity, Consistency, Isolation, Durability. Every write either fully succeeds or fully rolls back.
Relational databases were designed for correctness. A banking system transferring money between accounts needs a guarantee: debit one account and credit the other, or do neither. ACID transactions provide that guarantee. The schema enforces structure. You define columns and types upfront. If you try to insert a string into an integer column, the database rejects it. This strictness prevents entire categories of bugs.
SQL databases excel at complex queries. Joins allow you to combine data from multiple tables in a single query. An e-commerce system can answer "show me all orders from customers in Jakarta who bought more than 3 items last month" with one SQL statement. Try doing that efficiently in a key-value store.
The cost of this power is rigidity. Schema changes (adding a column, changing a type) require migrations. On large tables with billions of rows, migrations can take hours and lock the table. Horizontal scaling is difficult because joins across machines are expensive. Most relational databases scale vertically first, then add read replicas for read-heavy workloads.
Major relational databases: PostgreSQL, MySQL, Oracle, SQL Server, SQLite.
NoSQL Databases
NoSQL databases abandon the rigid table-and-schema model in favor of flexible data structures (documents, key-value pairs, wide columns, or graphs). Most trade strong consistency for horizontal scalability and eventual consistency.
NoSQL emerged to solve problems that relational databases handle poorly: massive write throughput across distributed clusters, highly variable data shapes, and workloads where horizontal scaling matters more than multi-table joins.
A document database like MongoDB stores JSON-like documents. Each document can have different fields. You do not need to run ALTER TABLE to add a new attribute. You just start writing documents with the new field. This flexibility accelerates early development and works well for data that does not fit neatly into rows and columns.
Most NoSQL databases achieve horizontal scalability by partitioning data across nodes. Cassandra, for example, distributes data using consistent hashing. Adding a node automatically rebalances part of the data. This makes scaling to hundreds of nodes straightforward, something relational databases struggle with.
The tradeoff is consistency. Most NoSQL systems offer eventual consistency by default: after a write, replicas will converge to the same state, but a read immediately after a write might return stale data. Some NoSQL databases (like MongoDB with majority write concern, or DynamoDB with strong reads) can provide stronger guarantees at the cost of latency.
Throughput Comparison
Benchmark numbers depend heavily on hardware, configuration, data size, and workload shape. The following chart uses approximate figures from YCSB (Yahoo Cloud Serving Benchmark) studies and vendor benchmarks to illustrate relative throughput ranges. These are order-of-magnitude comparisons, not absolute truths for your specific workload.
Redis dominates because it is an in-memory store. Cassandra's write throughput is high because its LSM-tree storage engine is optimized for sequential writes. PostgreSQL's numbers are lower on raw throughput but it handles complex queries that the others cannot. These numbers shift dramatically with indexing, replication, and query complexity.
Comparison Across Dimensions
| Dimension | SQL (Relational) | NoSQL |
|---|---|---|
| Data model | Fixed schema, tables with rows and columns | Flexible: documents, key-value, wide-column, graph |
| Schema | Schema-on-write (enforced at insert time) | Schema-on-read (interpreted at query time) |
| Query language | SQL (standardized, declarative) | Varies per database (MongoDB query API, CQL, Cypher) |
| Joins | Native, efficient multi-table joins | Generally unsupported or expensive |
| Transactions | Full ACID across multiple tables | Limited (single-document or single-partition in most) |
| Consistency | Strong consistency by default | Eventual consistency by default (tunable in some) |
| Scaling | Primarily vertical; horizontal via read replicas or sharding | Designed for horizontal scaling from the start |
| Schema evolution | ALTER TABLE migrations (can be slow on large tables) | Add fields freely; old documents still valid |
| Best for | Transactional systems, complex queries, data integrity | High write throughput, variable schemas, massive scale |
Polyglot Persistence
Polyglot persistence means using different database technologies for different parts of the same system, matching each data store to the access pattern it handles best.
Most real-world systems do not pick one database for everything. An e-commerce platform might use PostgreSQL for orders and inventory (transactions matter), Redis for session storage and caching (speed matters), Elasticsearch for product search (full-text indexing matters), and a graph database for recommendations (relationship traversal matters).
This is not over-engineering. It is recognizing that no single database excels at every workload. The cost is operational complexity: more systems to monitor, more failure modes, more data synchronization concerns.
Orders, Users, Payments
(ACID transactions)"] API --> Redis["Redis
Sessions, Cache
(sub-ms latency)"] API --> ES["Elasticsearch
Product Search
(full-text indexing)"] API --> Mongo["MongoDB
Product Catalog
(flexible schema)"] API --> Neo4j["Neo4j
Recommendations
(graph traversal)"] PG --- |"Source of truth"| Sync[Change Data Capture] Sync --> ES Sync --> Redis
The diagram above shows a polyglot architecture. PostgreSQL serves as the source of truth for transactional data. Change data capture (CDC) streams changes to Elasticsearch for search indexing and to Redis for cache warming. Each database handles the workload it was designed for.
The ACID vs. BASE Spectrum
SQL databases follow ACID. Most NoSQL databases follow BASE: Basically Available, Soft state, Eventually consistent. These are not binary categories. Many databases sit on a spectrum.
MongoDB now supports multi-document ACID transactions (since version 4.0). DynamoDB offers strongly consistent reads. CockroachDB is a distributed SQL database that provides ACID guarantees across multiple nodes. The boundaries between SQL and NoSQL are blurring. What matters is understanding the default behavior and what guarantees you are paying for in terms of latency and throughput.
Systems Thinking Lens
Choosing a database is a leverage point. Get it right early and the system scales naturally. Get it wrong and you spend months migrating under pressure. The feedback loop is delayed: you pick a database in month one, but the pain of a bad choice shows up in month twelve when traffic grows 10x.
Polyglot persistence creates a balancing loop. Each additional database solves one problem but introduces operational overhead. At some point, the cost of managing five different databases outweighs the performance benefit. The systems thinker asks: where is the point of diminishing returns for our team size and operational maturity?
Further Reading
- Martin Kleppmann, Designing Data-Intensive Applications (O'Reilly, 2017), Chapters 2 and 3. The definitive treatment of data models, storage engines, and the SQL vs. NoSQL landscape.
- Wikipedia, NoSQL. Comprehensive overview of the NoSQL movement, history, and taxonomy.
- Rick Houlihan, Amazon DynamoDB Deep Dive (AWS re:Invent 2018). Masterclass on NoSQL data modeling and access pattern design from the DynamoDB team.
- Percona, PostgreSQL Replication Guide. Practical guide to scaling PostgreSQL with replicas.
- benchANT, Database Performance Ranking. Independent benchmarks comparing database throughput and cost across cloud providers.
Assignment
You are designing a system that stores four types of data. For each one, choose SQL or NoSQL (and name a specific database). Justify your choice in 2-3 sentences, explaining which properties of the workload drove your decision.
- User profiles with fields that vary by user type (individual, business, admin). Some users have 5 fields, others have 50.
- Chat messages in a messaging app with 10 million daily active users. Messages are write-heavy, read by time range, and rarely updated.
- Product catalog for an e-commerce site with 2 million SKUs. Users search by name, category, price range, and attributes that vary by product type.
- Social graph for a platform where users follow each other. Key queries: "who does user X follow?", "who follows user X?", "mutual friends between X and Y."
For each choice, also state the consistency model you would use (strong or eventual) and why.