Quiz: Modern Databases and Lakehouses¶

Test your fluency across the modern database categories and the warehouse, lake, and lakehouse patterns.

1. The acronym OLTP refers to:¶

Online Library and Text Processing
Online Transaction Processing — short, frequent reads and writes that keep the business running
Open Logical Table Protocol
Offline Long-Term Persistence

Show Answer

The correct answer is B. OLTP (Online Transaction Processing) describes workloads with many short, concurrent reads and writes — the workloads that run the day-to-day business (order entry, banking transactions, inventory updates). OLAP (Online Analytical Processing) describes the opposite workload pattern — large analytical queries over historical data. The two have very different storage and indexing requirements.

Concept Tested: OLTP vs OLAP

2. In a star schema, what does the fact table contain?¶

Numerical measurements (such as sales amounts) and foreign keys to dimension tables
Descriptive attributes about customers, products, and dates
Only metadata about the database
Encrypted historical archives

Show Answer

The correct answer is A. The fact table holds the quantitative measurements of business events (sales amount, units sold, profit) along with foreign keys to surrounding dimension tables that provide descriptive context (which customer, which product, which date). Dimensions are the "by what" columns; facts are the "how much" columns.

Concept Tested: Fact Table

3. A "Slowly Changing Dimension" handles which problem?¶

How to make queries run faster on small dimension tables
How to compress dimension tables for cold storage
How to track changes to dimension attributes over time, such as a customer changing address
How to convert relational tables to JSON documents

Show Answer

The correct answer is C. Slowly Changing Dimensions (SCD) address how to track attribute changes that happen infrequently — a customer moves, a product is reclassified. Type 1 overwrites; Type 2 keeps history with effective dates; Type 3 keeps a limited prior value. Choosing the right SCD type for each attribute is a core dimensional-modeling decision.

Concept Tested: Slowly Changing Dimension

4. Which database category is best suited for a use case requiring extremely fast, simple lookups by exact key — for example, a session cache for a high-traffic web application?¶

Relational database
Graph database
Key-value store
Data warehouse

Show Answer

The correct answer is C. Key-value stores like Redis and DynamoDB excel at extremely fast, simple lookups by an exact key. Session caches, leaderboards, and rate-limiter counters all fit naturally. Relational databases handle richer queries but are slower for pure key lookup; graph databases optimize for relationship traversal; data warehouses optimize for analytical queries.

Concept Tested: Key-Value Store

5. A document database like MongoDB is most appropriate when:¶

The data has a fixed schema enforced rigidly across all rows
Queries primarily traverse many-to-many relationships
The system requires Serializable isolation across millions of small transactions
Data is naturally hierarchical and varies in shape across records, such as product catalogs with category-specific attributes

Show Answer

The correct answer is D. Document databases store records as JSON-like documents, naturally fitting hierarchical data with variable structure — product catalogs with category-specific attributes, content-management systems, user-profile data. Rigid schemas favor relational; many-to-many traversals favor graph; strict isolation across millions of transactions tends toward relational with mature concurrency control.

Concept Tested: Document Database

6. The CAP theorem states that a distributed data store can simultaneously guarantee at most two of which three properties?¶

Concurrency, Auditability, Performance
Consistency, Availability, Partition tolerance
Compression, Atomicity, Persistence
Caching, Aggregation, Pipelining

Show Answer

The correct answer is B. The CAP theorem (Brewer) holds that in the presence of a network partition, a distributed data store must choose between consistency (every read sees the latest write) and availability (every request gets a response). Since partitions are inevitable in real distributed systems, the practical choice is between CP and AP behavior during partitions. The other lists are not the CAP properties.

Concept Tested: CAP Theorem

7. A retail company uses a relational database for transactions, a search engine for product search, a cache for sessions, and a graph database for product recommendations. This architectural pattern is called:¶

Polyglot persistence
Database normalization
Star schema
Two-phase commit

Show Answer

The correct answer is A. Polyglot persistence is the deliberate use of multiple database technologies, each chosen for the workload it serves best. The tradeoff is operational complexity (more systems to run, more integration work) against fit-for-purpose performance and developer experience. Normalization, star schemas, and two-phase commit are unrelated concepts.

Concept Tested: Polyglot Persistence

8. What is the most accurate description of a data lakehouse?¶

A traditional relational warehouse with no support for unstructured data
A small subset of a data lake limited to one department's use
An architecture that combines the open, low-cost storage of a data lake with the schema, ACID guarantees, and SQL performance traditionally associated with a warehouse
A physical building where data engineers work

Show Answer

The correct answer is C. A lakehouse (Delta Lake, Iceberg, Hudi) layers schema, transactions, and SQL performance on top of object-storage data lakes, eliminating the historical split between cheap-but-messy lakes and expensive-but-rigorous warehouses. The result is one system serving both raw and curated data with consistent governance and good query performance.

Concept Tested: Data Lakehouse

Relational database with many JOINs
Graph database
Key-value store
Columnar warehouse

Show Answer

The correct answer is B. Graph databases (Neo4j, Amazon Neptune, TigerGraph) are designed for traversing connected data efficiently — friend-of-friend queries that take many JOINs in a relational system can run in milliseconds in a graph database. Relational tables struggle with deep, variable-depth traversals; key-value stores have no concept of relationships; columnar warehouses optimize for analytical scans, not graph traversals.

Concept Tested: Database Selection

10. An analytics team selects Snowflake for a data warehouse over a traditional row-store database. Which property of columnar storage most directly drives this choice?¶

Columnar storage scans only the columns the query needs and compresses better, dramatically speeding analytical queries that aggregate a few columns over many rows
Columnar storage eliminates the need for any SQL knowledge
Columnar storage is required for transactional integrity
Columnar storage is the only format that supports JSON

Show Answer

The correct answer is A. Columnar storage organizes data by column rather than by row, which means analytical queries that touch a few columns over many rows scan vastly less data and compress far better. This is why columnar systems (Snowflake, Redshift, BigQuery) are dominant for OLAP workloads. Row stores remain better for OLTP. The other options misstate columnar storage's purpose.

Concept Tested: Columnar Storage

Quiz: Modern Databases and Lakehouses¶

1. The acronym OLTP refers to:¶

2. In a star schema, what does the fact table contain?¶

3. A "Slowly Changing Dimension" handles which problem?¶

4. Which database category is best suited for a use case requiring extremely fast, simple lookups by exact key — for example, a session cache for a high-traffic web application?¶

5. A document database like MongoDB is most appropriate when:¶

6. The CAP theorem states that a distributed data store can simultaneously guarantee at most two of which three properties?¶

7. A retail company uses a relational database for transactions, a search engine for product search, a cache for sessions, and a graph database for product recommendations. This architectural pattern is called:¶

8. What is the most accurate description of a data lakehouse?¶

9. A team needs to store and query connections between millions of users in a social network — friend-of-friend queries, shortest paths, community detection. Which database category is best fit?¶

10. An analytics team selects Snowflake for a data warehouse over a traditional row-store database. Which property of columnar storage most directly drives this choice?¶