Quiz: Enterprise Knowledge Graphs — Core Patterns

Test your understanding of canonical entities, hub-and-spoke vs. federated architectures, graph ETL pipelines, schema governance, ontologies, and billion-edge scaling patterns.

1. According to the chapter, which three properties define a well-defined entity in an enterprise knowledge graph?

A. Stable identity, canonical properties, and governed provenance
B. Hash key, primary index, and partition key
C. Embedding vector, similarity score, and cluster label
D. SQL primary key, foreign key, and unique constraint

Show Answer

The correct answer is A. A well-defined entity needs a globally unique persistent identifier (stable identity), a defined set of attributes that mean the same thing across systems (canonical properties), and a recorded history of where its data came from (governed provenance). The other options describe storage and indexing concepts that do not address what makes an entity semantically well-defined.

Concept Tested: Entity
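
The three properties named in the answer can be pictured as fields on a record. The following is a minimal sketch, not the chapter's data model: the class names `Entity` and `ProvenanceRecord`, the `set_property` helper, and the sample value `"Acme Corp"` are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    """Where one property value came from (hypothetical shape)."""
    source_system: str
    source_id: str
    property_name: str
    recorded_at: datetime


@dataclass
class Entity:
    entity_id: str                                   # stable identity: assigned once, never reused
    entity_type: str
    properties: dict = field(default_factory=dict)   # canonical properties
    provenance: list = field(default_factory=list)   # governed provenance

    def set_property(self, name, value, source_system, source_id):
        """Update a canonical property and record which source supplied it."""
        self.properties[name] = value
        self.provenance.append(
            ProvenanceRecord(source_system, source_id, name, datetime.now(timezone.utc))
        )


customer = Entity(entity_id="ENT-00441872", entity_type="Customer")
customer.set_property("legal_name", "Acme Corp", source_system="legacy_crm", source_id="8821-B")
```

Every write goes through `set_property`, so a property value never exists without a matching provenance entry.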

2. Which best describes a hub-and-spoke graph architecture?

A. Each domain system exposes its own graph API, and a query layer routes queries across them
B. Every node in the graph is replicated to every storage shard
C. All domain systems write canonical entity data into a central knowledge graph hub that maintains the authoritative copy
D. A vector store holds embeddings while a relational database holds entities

Show Answer

The correct answer is C. Hub-and-spoke centralizes the authoritative graph in a hub that domain systems ingest into. Option A describes federated architecture. Option B describes replication, not architecture. Option D describes a vector+SQL combination unrelated to the hub-and-spoke pattern.

Concept Tested: Hub-and-Spoke Graph Architecture

3. What is schema drift in a production knowledge graph?

A. The gradual, usually undocumented change in source-system schemas that accumulates over time and silently corrupts the graph
B. A planned migration from one graph database vendor to another
C. The use of multiple schema versions in a single ETL pipeline
D. The intentional rotation of edge types to balance shard load

Show Answer

The correct answer is A. Schema drift is the gradual, often undocumented divergence between the declared schema and what source systems actually produce — new fields added, edge types quietly renamed, and so on. The other options describe deliberate engineering activities, not the silent drift the chapter warns against.

Concept Tested: Schema Drift
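
One way to make drift visible is to compare each incoming record against the declared schema and report the difference. This is a rough sketch under assumed names: `DECLARED_SCHEMA`, `detect_drift`, and the `Supplier` fields are illustrative and do not come from the chapter.

```python
# Declared schema: the property names each node type is supposed to carry.
# The node type and field names here are hypothetical.
DECLARED_SCHEMA = {"Supplier": {"supplier_id", "name", "country"}}


def detect_drift(node_type, record, declared=DECLARED_SCHEMA):
    """Return fields the source sent that are undeclared, and declared fields it omitted."""
    expected = declared[node_type]
    observed = set(record)
    return {
        "unexpected": sorted(observed - expected),
        "missing": sorted(expected - observed),
    }


# A source system quietly renamed `country` to `region`.
report = detect_drift("Supplier", {"supplier_id": "S-1", "name": "Acme", "region": "EU"})
```

Here `report` lists `region` as unexpected and `country` as missing, which is the kind of silent rename the answer describes.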

4. Why does the chapter recommend setting `valid_to` timestamps on edges instead of simply deleting them when relationships end?

A. Because deleting edges is impossible in most graph databases
B. Because historical relationships are often as valuable as current ones for decision trace analysis, and `valid_to` lets queries filter to current state while preserving history
C. Because deletion violates the closed world assumption
D. Because timestamps reduce the storage footprint of the edge

Show Answer

The correct answer is B. Setting `valid_to` preserves the historical relationship for later decision-trace queries while still allowing current-state queries to exclude expired edges. Deletion is supported in graph databases (A is wrong). The closed world assumption is unrelated (C). Adding a timestamp increases, not decreases, edge size (D).

Concept Tested: Stale Edge Detection
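
The difference between ending an edge and deleting it can be sketched in a few lines. Only `valid_to` comes from the chapter; the `Edge` class, the `valid_from` field, the helper functions, and the sample IDs are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Edge:
    source: str
    edge_type: str
    target: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the relationship is still active


def end_relationship(edge, ended_on):
    """Close the edge instead of deleting it, so history stays queryable."""
    edge.valid_to = ended_on


def edges_as_of(edges, when):
    """Return the edges that were valid on the given date."""
    return [
        e for e in edges
        if e.valid_from <= when and (e.valid_to is None or when < e.valid_to)
    ]


edges = [
    Edge("SUP-1", "SUPPLIES", "PROD-A", date(2022, 1, 1)),
    Edge("SUP-2", "SUPPLIES", "PROD-A", date(2023, 6, 1)),
]
end_relationship(edges[0], date(2024, 3, 1))

current = edges_as_of(edges, date(2024, 6, 1))      # only SUP-2 remains
historical = edges_as_of(edges, date(2023, 1, 1))   # SUP-1 is still visible
```

The same filter serves both the current-state query and the decision-trace query; only the date passed in changes.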

5. A finance team wants to trace every approval that authorized exceptions to a specific revenue policy last quarter. Which domain graph is the most directly relevant?

A. Operational log graph
B. Product catalog graph
C. Finance data graph
D. HR data graph

Show Answer

The correct answer is C. The finance data graph captures accounts, transactions, purchase orders, invoices, and approval chains — exactly what an exception-approval trace requires. The operational log graph (A) captures infrastructure events. The product catalog (B) holds SKU hierarchies. The HR graph (D) is needed to look up approvers, but the primary subgraph for an approval trace is finance.

Concept Tested: Finance Data Graph

6. An ingestion engineer is designing the stage that converts source records like `cust_id: 8821-B` from a legacy CRM into a canonical entity ID like `ENT-00441872`. Which stage of the graph ingestion pipeline is this?

A. Extract
B. Resolve
C. Transform
D. Validate

Show Answer

The correct answer is B. Resolve is the stage that maps source-system IDs to canonical entity IDs using the entity resolution index. Extract (A) pulls raw records. Transform (C) converts to graph node/edge format after resolution. Validate (D) checks schema conformance afterward. ID resolution is its own dedicated stage.

Concept Tested: Graph Ingestion Pattern
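
The stage order described in the answer (extract, resolve, transform, validate) can be sketched as four small functions. The function bodies, the `RESOLUTION_INDEX` lookup table, the `ALLOWED_NODE_TYPES` check, and the sample name are assumptions; only the stage names and the two example IDs come from the quiz.

```python
# Hypothetical entity resolution index: (source system, source ID) -> canonical ID.
RESOLUTION_INDEX = {("legacy_crm", "8821-B"): "ENT-00441872"}
ALLOWED_NODE_TYPES = {"Customer"}


def extract():
    """Extract: pull raw records from the source system (stubbed here)."""
    return [{"system": "legacy_crm", "cust_id": "8821-B", "name": "Acme Corp"}]


def resolve(record):
    """Resolve: map the source-system ID to the canonical entity ID."""
    key = (record["system"], record["cust_id"])
    return {**record, "entity_id": RESOLUTION_INDEX[key]}


def transform(record):
    """Transform: convert the resolved record into graph node format."""
    return {
        "id": record["entity_id"],
        "type": "Customer",
        "properties": {"name": record["name"]},
    }


def validate(node):
    """Validate: check schema conformance before the node is loaded."""
    if node["type"] not in ALLOWED_NODE_TYPES:
        raise ValueError(f"unknown node type: {node['type']}")
    return node


nodes = [validate(transform(resolve(r))) for r in extract()]
```

This sketch raises `KeyError` for a source ID that is not in the index. A real pipeline would need a policy for unmatched records, which the quiz does not describe.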

7. An LLM agent is observed making product recommendations that route purchase orders to suppliers who no longer carry the affected product lines. Which operational failure mode is the most likely root cause?

A. Insufficient graph replication causing read-replica lag
B. Failure to detect and mark stale edges, so the graph still shows supplier-product relationships that no longer exist
C. Excessive graph sharding causing cross-shard query slowdowns
D. Missing provenance metadata on the supplier nodes

Show Answer

The correct answer is B. The LLM is following edges that no longer reflect reality — the classic signature of stale-edge detection failure. Replication lag (A) would cause short-term inconsistency, not persistent wrong-supplier recommendations. Sharding issues (C) cause latency, not wrong answers. Missing provenance (D) is a trust problem but does not by itself cause the wrong edges to appear in traversal.

Concept Tested: Stale Edge Detection

8. How does an ontology differ from a taxonomy?

A. An ontology only expresses parent-child hierarchy; a taxonomy expresses richer relationships
B. An ontology is a JSON file format; a taxonomy is a YAML file format
C. An ontology defines types, the relationships between types, the constraints on those relationships, and rules for inference; a taxonomy is a hierarchical classification expressing only the is-a relationship
D. An ontology is always stored in RDF; a taxonomy is always stored in an LPG

Show Answer

The correct answer is C. Taxonomies express only the is-a hierarchy, while ontologies define types, typed relationships, constraints, and inference rules. Option A reverses the two. Option B invents a file-format distinction. Option D incorrectly ties each formalism to a specific storage model.

Concept Tested: Taxonomy vs Ontology
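
A small sketch can show where a taxonomy stops and an ontology continues. The type names, the `SUPPLIES` relationship, and both helper functions are hypothetical, and inference rules are left out entirely; the sketch covers only the is-a hierarchy plus one typed relationship constraint.

```python
# Taxonomy: nothing but is-a links (child type -> parent type).
TAXONOMY = {
    "Supplier": "Organization",
    "Customer": "Organization",
    "Organization": "Thing",
    "Product": "Thing",
}


def is_a(type_name, ancestor):
    """Walk the is-a chain upward; this is all a taxonomy can answer."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = TAXONOMY.get(type_name)
    return False


# Ontology layer: typed relationships with constraints on their endpoints.
# relationship name -> (required source type, required target type)
ONTOLOGY_RELATIONS = {"SUPPLIES": ("Supplier", "Product")}


def edge_allowed(source_type, relation, target_type):
    """Check an edge against the relationship's endpoint constraints."""
    required_source, required_target = ONTOLOGY_RELATIONS[relation]
    return is_a(source_type, required_source) and is_a(target_type, required_target)
```

With these definitions, `is_a("Supplier", "Thing")` holds from the taxonomy alone, while rejecting a `Customer` as the source of a `SUPPLIES` edge needs the ontology layer.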

9. Why does graph sharding require minimizing cross-shard edges, in contrast to relational sharding?

A. Because graph databases cannot physically store edges that span shards
B. Because every cross-shard edge traversal requires a network hop, which dramatically increases query latency on the traversal-heavy workloads graphs are built for
C. Because cross-shard edges violate the closed world assumption
D. Because cross-shard edges break the directionality of the edge

Show Answer

The correct answer is B. Graph queries are traversal-based, so every cross-shard edge becomes a network hop and slows down the multi-hop queries the graph exists to serve. Cross-shard edges are physically storable (A). The closed world assumption (C) is unrelated. Directionality (D) is preserved regardless of shard placement.

Concept Tested: Graph Sharding
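
Counting cross-shard edges under two placements shows why placement matters. The toy graph, the node names, and both shard assignments below are invented for this sketch; `cross_shard_edges` is a hypothetical helper, not a graph database API.

```python
def cross_shard_edges(edges, shard_of):
    """Count edges whose endpoints live on different shards (one network hop each)."""
    return sum(1 for a, b in edges if shard_of[a] != shard_of[b])


# Two small supplier -> product -> transaction clusters.
edges = [
    ("s1", "p1"), ("s1", "p2"), ("p1", "t1"),   # cluster around supplier s1
    ("s2", "p3"), ("p3", "t2"),                 # cluster around supplier s2
]

# Placement that ignores connectivity, as a hash on node ID might.
scattered = {"s1": 0, "p1": 1, "p2": 0, "t1": 1, "s2": 1, "p3": 0, "t2": 1}

# Placement that keeps each tightly connected cluster on one shard.
cluster_aware = {"s1": 0, "p1": 0, "p2": 0, "t1": 0, "s2": 1, "p3": 1, "t2": 1}

scattered_hops = cross_shard_edges(edges, scattered)          # 3
cluster_aware_hops = cross_shard_edges(edges, cluster_aware)  # 0
```

In this toy case the scattered placement turns three of five edges into network hops, while the cluster-aware placement leaves every traversal local to one shard.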

10. A platform-business graph has 8 billion supplier-product-transaction edges. Sub-second multi-hop traversal performance is critical, but data freshness is also important and write throughput is heavy. Which combination of techniques should the team prioritize, based on the chapter?

A. Eliminate sharding entirely so every query stays on a single node
B. Direct LLM retrieval reads to graph replicas (keeping the write path uncongested) and shard so that tightly connected node clusters live on the same shard (minimizing cross-shard traversal)
C. Convert the graph to RDF triples to take advantage of the open world assumption
D. Drop all property indexes and rely solely on full-graph scans

Show Answer

The correct answer is B. The chapter pairs graph replication for read scaling (so LLM retrieval does not contend with writes) with cluster-aware sharding to minimize cross-shard traversal. Eliminating sharding (A) is infeasible at 8 billion edges. Switching to RDF (C) contradicts the entire enterprise-LPG argument. Dropping property indexes (D) would make range queries catastrophically slow.

Concept Tested: Graph Replication
