Chapters

This textbook is organized into 16 chapters covering 254 concepts. The material spans six database paradigms, distributed systems fundamentals, ACID and NewSQL transactions, high-availability architecture, vector search, LLM embeddings, and the ATAM-based database selection framework.

Chapter Overview

1. The ATAM Method: Introduces the CMU SEI Architecture Tradeoff Analysis Method, utility trees, quality attribute scenarios, and the structured process for making and documenting architectural decisions.
2. Database Foundations: Covers the core concepts underlying all database systems: data models, schemas, query languages, indexes, storage engines, and workload characterization (OLTP, OLAP, HTAP).
3. Relational Databases: Explores the relational model, SQL, normalization, indexing, concurrency control (locking and MVCC), write-ahead logging, and canonical products PostgreSQL and MySQL.
4. Analytical Databases: Covers columnar storage, MPP architectures, data warehousing (Inmon vs. Kimball), OLAP cubes, ETL pipelines, and analytical products including Snowflake and BigQuery.
5. Key-Value Stores: Examines the key-value data model, hash-based storage, TTL, cache eviction policies, caching patterns, and products Redis and DynamoDB.
6. Column-Family Databases: Covers wide-column data models, LSM trees, compaction, Bloom filters, partition and clustering keys, and products Apache Cassandra and HBase.
7. Document Databases: Explores document and JSON storage models, embedded documents, aggregation pipelines, schema flexibility, and products MongoDB and Couchbase.
8. Graph Databases: Introduces property graph and RDF models, Cypher and SPARQL, traversal algorithms, distributed graph databases including TigerGraph, and knowledge graphs.
9. CAP Theorem and Consensus Protocols: Covers the CAP theorem, PACELC model, the full spectrum of consistency models from eventual to linearizable, and foundational consensus protocols Paxos and Raft.
10. ACID Transactions: Examines ACID properties, the four SQL-standard transaction isolation levels, concurrency anomalies, two-phase locking, optimistic concurrency control, and distributed sagas.
11. Distributed Scaling and Replication: Covers horizontal scaling, sharding strategies, replication topologies (single-leader, multi-leader, leaderless), quorum reads/writes, leader election, and coordination tools ZooKeeper and etcd.
12. Distributed ACID and NewSQL: Explores two-phase commit, distributed sagas, compensating transactions, and NewSQL databases including Google Spanner, CockroachDB, and YugabyteDB.
13. High Availability Architecture: Covers five-nines SLA targets, failure domains, active-active and active-passive clustering, failover, chaos engineering, MTBF/MTTR, and geographic redundancy.
14. Vector Search as a Database Feature: Introduces vector embeddings, similarity metrics, ANN indexes (HNSW, IVF), pgvector, semantic search, and hybrid search across database paradigms.
15. LLM-Generated Embeddings: Covers transformer architecture, tokenization, pooling strategies, embedding model selection, production embedding pipelines, cost at scale, and re-embedding migration.
16. Database Selection and Polyglot Persistence: Brings all prior chapters together through ATAM-based database selection frameworks, scoring matrices, polyglot persistence patterns, migration planning, and capstone decision-making exercises.

How to Use This Textbook

Each chapter builds on the concepts introduced in prior chapters. Chapters 1–2 establish the decision framework and database fundamentals. Chapters 3–8 survey the six major database paradigms. Chapters 9–13 cover distributed systems theory and practice. Chapters 14–15 address vector search and AI-generated embeddings. Chapter 16 integrates everything into a structured selection process.

Readers with strong distributed systems backgrounds may skim Chapters 9–12; practitioners already familiar with a specific database paradigm may read that chapter as a review before focusing on the ATAM-based selection content.

Note: Each chapter includes a list of the concepts it covers and links to its prerequisite chapters. The chapter order respects all concept dependencies: a concept always appears after its prerequisites.