Concept Taxonomy: The Right Database

Total Categories: 16 Total Concepts: 254

Categories

1. ATAM — Architecture Tradeoff Analysis Method

TaxonomyID: ATAM Concepts: 22 (8.7%)

Concepts related to the CMU SEI ATAM process: utility trees, quality attribute scenarios, sensitivity and tradeoff points, architectural risks, stakeholder roles, and ATAM output artifacts.

2. FOUND — Foundation Concepts

TaxonomyID: FOUND Concepts: 18 (7.1%)

Core database and systems concepts that underpin all paradigms: DBMS, data models, schemas, query languages, indexes, storage engines, workload characterization, OLTP/OLAP/HTAP, impedance mismatch, and operational complexity.

3. REL — Relational Databases

TaxonomyID: REL Concepts: 20 (7.9%)

Relational model, SQL, normalization, joins, B-tree indexes, MVCC, locking, write-ahead logging, stored procedures, and canonical products (PostgreSQL, MySQL).

4. ANAL — Analytical Databases

TaxonomyID: ANAL Concepts: 15 (5.9%)

Columnar storage, MPP, data warehousing, star/snowflake schemas, OLAP cubes, ETL pipelines, Inmon vs. Kimball architectures, and analytical products and formats (Snowflake, BigQuery, Parquet).

5. KV — Key-Value Stores

TaxonomyID: KV Concepts: 12 (4.7%)

Key-value data model, hash table storage, TTL, cache eviction policies, caching patterns (read-through, write-through, write-behind), hot key problem, and products (Redis, DynamoDB).

6. COL — Column-Family Databases

TaxonomyID: COL Concepts: 12 (4.7%)

Column-family and wide-row data models, LSM trees, compaction, Bloom filters, partition/clustering keys, write-optimized storage, read/write amplification, and products (Cassandra, HBase).

7. GRAPH — Graph Databases

TaxonomyID: GRAPH Concepts: 15 (5.9%)

Property graph and RDF models, Cypher and SPARQL query languages, traversal algorithms, distributed graph databases, graph partitioning, knowledge graphs, and products (Neo4j, TigerGraph, Amazon Neptune).

8. DOC — Document Databases

TaxonomyID: DOC Concepts: 12 (4.7%)

Document and JSON storage models, BSON, embedded vs. referenced documents, aggregation pipelines, schema flexibility, text search, change streams, and products (MongoDB, Couchbase).

9. DIST — Distributed Consistency

TaxonomyID: DIST Concepts: 20 (7.9%)

CAP theorem, PACELC model, consistency models (eventual, strong, causal, linearizable, serializable), BASE semantics, conflict resolution, vector clocks, and gossip protocols.

10. ACID — ACID Transactions

TaxonomyID: ACID Concepts: 18 (7.1%)

ACID properties, transaction isolation levels (read uncommitted through serializable), dirty/phantom/non-repeatable reads, two-phase locking, MVCC, optimistic concurrency, sagas, rollback, and commit protocols.

11. NACID — Distributed ACID and NewSQL

TaxonomyID: NACID Concepts: 16 (6.3%)

Two-phase commit, three-phase commit, distributed sagas (orchestration and choreography), compensating transactions, NewSQL databases, consensus protocols (Paxos, Raft, ZAB), and products (Spanner, CockroachDB, YugabyteDB).

12. SCALE — Distributed Scaling

TaxonomyID: SCALE Concepts: 18 (7.1%)

Horizontal and vertical scaling, sharding strategies (range, hash, directory, geographic), replication topologies (single-leader, multi-leader, leaderless), quorum reads/writes, replication lag, split-brain, leader election, and coordination (ZooKeeper, etcd).

13. HA — High Availability

TaxonomyID: HA Concepts: 15 (5.9%)

Five-nines SLA, SLA decomposition, failure domains, single points of failure, active-active vs. active-passive clustering, failover, chaos engineering, MTBF/MTTR, geographic redundancy, multi-region deployment, and the circuit breaker pattern.

14. VEC — Vector Search and Embeddings

TaxonomyID: VEC Concepts: 14 (5.5%)

Vector embeddings, similarity metrics (cosine, dot product, Euclidean), ANN indexes (HNSW, IVF, flat), pgvector, semantic and hybrid search, ANN recall vs. speed tradeoffs, and native vector search as a database feature.

15. LLM — LLM Embeddings

TaxonomyID: LLM Concepts: 15 (5.9%)

Large language models, transformer architecture, tokenization, attention mechanism, pooling strategies (CLS, mean), embedding model selection, OpenAI Embeddings API, Sentence Transformers, self-hosted models, embedding pipelines, cost at scale, re-embedding migration, multimodal embeddings, and model versioning.

16. SEL — Database Selection Framework

TaxonomyID: SEL Concepts: 12 (4.7%)

Polyglot persistence, database selection frameworks, scoring matrices, total cost of ownership, vendor lock-in risk, migration planning, schema migration, multi-model databases, operational runbooks, team expertise, deprecation risk, and data access pattern analysis.
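
The category and concept totals, and the per-category percentages listed above, can be cross-checked with a few lines of arithmetic. The following is a minimal sketch: the counts are copied from the entries above, while the names `counts`, `total`, and `share` are illustrative and not part of the taxonomy itself.

```python
# Concept counts per TaxonomyID, copied from the category entries above.
counts = {
    "ATAM": 22, "FOUND": 18, "REL": 20, "ANAL": 15,
    "KV": 12, "COL": 12, "GRAPH": 15, "DOC": 12,
    "DIST": 20, "ACID": 18, "NACID": 16, "SCALE": 18,
    "HA": 15, "VEC": 14, "LLM": 15, "SEL": 12,
}

# Total number of concepts across all categories.
total = sum(counts.values())


def share(taxonomy_id: str) -> float:
    """Return a category's share of all concepts, as a percentage
    rounded to one decimal place (hypothetical helper)."""
    return round(100 * counts[taxonomy_id] / total, 1)


if __name__ == "__main__":
    print(f"categories: {len(counts)}, concepts: {total}")
    for tid, n in counts.items():
        print(f"{tid:6s} {n:3d} ({share(tid)}%)")
```

Running the script prints one line per category, which makes it easy to spot a count or percentage that drifts out of sync when concepts are added or moved between categories.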