Quiz: Data Engineering and Infrastructure
Test your understanding of data mesh architecture, data products and contracts, SLAs and observability, streaming and orchestration tools, multi-model integration, change data feeds, graph-derived features, and the silent-data-failure footgun that threatens context graph trustworthiness.
1. Which three data flows does the chapter say a context graph data engineering stack must support?
- North-south, east-west, internal
- The write flow (capturing source events into graph mutations), the read flow (retrieving subgraphs and serving context to AI), and the governance flow (tracking lineage, enforcing quality, producing audit records)
- Ingest, process, archive
- Production, staging, development
Show Answer
The correct answer is B. The chapter names exactly these three flows. The other options name unrelated infrastructure categories.
Concept Tested: Context Graph Observability
2. In a data mesh for context graphs, which set of three layers does the chapter describe?
- Domain data products owned by domain teams, a central context graph platform owned by the platform team, and federated governance (often the Center of Excellence) that sets and enforces cross-domain standards
- Frontend, middleware, backend
- Cache, database, archive
- Public, private, partner
Show Answer
The correct answer is A. The chapter names exactly these three layers and their distinct responsibilities. The other options describe unrelated architecture categories.
Concept Tested: Data Mesh
3. A data contract for a context graph data product specifies four things, according to the chapter. Which set names them correctly?
- Schema, freshness SLA, completeness guarantee, and quality assertions
- Vendor, price, support hours, renewal date
- CPU, RAM, disk, network
- Author, version, license, copyright
Show Answer
The correct answer is A. The chapter lists these four contract elements. The other options describe contractual or system attributes that are not data-contract specifications.
Concept Tested: Data Contract
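To make the four contract elements from question 3 concrete, here is a minimal Python sketch of how such a contract might be expressed and checked. The `DataContract` class, its field names, and the `updated_at` timestamp field are hypothetical illustrations, not structures defined in the chapter.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class DataContract:
    """Hypothetical container for the four contract elements."""
    schema: dict[str, type]          # required field name -> expected type
    freshness_sla: timedelta         # maximum allowed age of the newest record
    min_completeness: float          # fraction of expected records that must arrive
    assertions: list[Callable[[dict], bool]] = field(default_factory=list)

    def violations(self, records: list[dict], expected_count: int,
                   now: datetime) -> list[str]:
        problems = []
        # 1. Schema: every required field is present with the expected type.
        for i, rec in enumerate(records):
            for name, expected_type in self.schema.items():
                if not isinstance(rec.get(name), expected_type):
                    problems.append(f"schema: record {i} field {name!r}")
        # 2. Freshness: the newest record must be within the SLA window.
        stamps = [r["updated_at"] for r in records
                  if isinstance(r.get("updated_at"), datetime)]
        if not stamps or now - max(stamps) > self.freshness_sla:
            problems.append("freshness: newest record is older than the SLA")
        # 3. Completeness: enough of the expected records were delivered.
        if expected_count and len(records) / expected_count < self.min_completeness:
            problems.append("completeness: too few records delivered")
        # 4. Quality assertions: each record passes every custom check.
        for i, rec in enumerate(records):
            for check in self.assertions:
                if not check(rec):
                    problems.append(f"quality: record {i} failed {check.__name__}")
        return problems
```

A producer could run `violations()` at the end of each pipeline execution and refuse to publish when the list is non-empty; whether a real platform blocks or merely alerts is a design choice this sketch does not settle.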
4. A streaming pipeline ingests a decision-trace event that references an entity not yet present in the graph because the events arrived out of order. According to the chapter, what is the correct handling?
- Silently drop the event
- Either buffer the event until all referenced entities are present, or apply the update speculatively and reconcile when the referenced entity arrives; the streaming processing platform must keep the graph internally consistent at all times, even under concurrent multi-stream updates
- Reject every out-of-order event
- Switch the entire pipeline to batch mode
Show Answer
The correct answer is B. The chapter prescribes buffer-or-reconcile to preserve consistency. The other options either lose data (A, C) or abandon the streaming pattern entirely (D).
Concept Tested: Streaming Graph Update
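The buffering branch of the answer to question 4 can be sketched as a small in-memory writer. This is an illustration under simplifying assumptions: the `BufferingGraphWriter` class and the event shape (`id`, `refs`) are hypothetical, it runs single-threaded, and it does not cover the speculative apply-and-reconcile alternative.

```python
from collections import defaultdict


class BufferingGraphWriter:
    """Toy writer that parks events until every referenced entity exists."""

    def __init__(self):
        self.entities = set()               # entity IDs already in the graph
        self.applied = []                   # event IDs, in the order they were applied
        self._waiting = defaultdict(list)   # missing entity ID -> parked events

    def add_entity(self, entity_id):
        self.entities.add(entity_id)
        # Re-check every event that was waiting on this entity.
        for event in self._waiting.pop(entity_id, []):
            self.ingest(event)

    def ingest(self, event):
        missing = [ref for ref in event["refs"] if ref not in self.entities]
        if missing:
            # Park on the first missing reference; the event is re-checked
            # (and possibly re-parked) when that entity arrives.
            self._waiting[missing[0]].append(event)
            return False
        self.applied.append(event["id"])
        return True
```

Because an event is only applied once all of its references resolve, the graph never contains a dangling edge, at the cost of added latency for out-of-order events.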
5. The chapter identifies four types of system states distinguishable by combined infrastructure, data quality, and AI output monitoring. Which state does it call out as the most dangerous, and why?
- Infrastructure failure — because dashboards turn red
- Fully operational — because complacency sets in
- AI model drift — because models are expensive to retrain
- Silent data failure — infrastructure healthy (green dashboards), data quality degraded (stale sources, missing fields, broken entity links), and AI output silently wrong; without data-quality monitoring this can persist for weeks before users notice
Show Answer
The correct answer is D. The chapter explicitly calls silent data failure the most dangerous state and explains exactly why. The other options describe states that are either visible on dashboards or less dangerous.
Concept Tested: Data Observability
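One way to picture why three monitoring layers are needed is a tiny classifier over three health signals. Only the silent-data-failure case below follows the description in question 5; the function name `classify_state`, the boolean inputs, and the mapping of the other three labels are assumptions made for illustration.

```python
def classify_state(infra_ok: bool, data_ok: bool, ai_output_ok: bool) -> str:
    """Map three monitoring signals to a coarse system state (illustrative only)."""
    if not infra_ok:
        # Visible on ordinary dashboards, so it is noticed quickly.
        return "infrastructure failure"
    if not data_ok:
        # Dashboards are green, but stale or broken data degrades AI output.
        return "silent data failure"
    if not ai_output_ok:
        # Assumed label: infrastructure and data look healthy, output does not.
        return "AI model drift"
    return "fully operational"
```

With infrastructure monitoring alone, the second and fourth branches are indistinguishable, which is the gap the question describes.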
6. Why does the chapter recommend change data feeds (reading the source database transaction log) over polling-based extraction for context graph freshness?
- Because polling is faster
- Completeness (the transaction log captures every write, including ones that would be overwritten between poll intervals), low source-system impact (read-only, non-blocking), and order preservation (events arrive in the source's exact processing order — essential for correctly reconstructing decision histories)
- Because CDC eliminates the need for governance
- Because polling requires a graph database
Show Answer
The correct answer is B. The chapter lists exactly these three CDC advantages. The other options misstate the trade-off.
Concept Tested: Change Data Feed
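The completeness and ordering advantages named in question 6 can be seen in a small simulation. The in-memory `transaction_log` and both helper functions are hypothetical stand-ins; a real change data feed reads the database's own log rather than a Python list.

```python
# Hypothetical write history for one row: (sequence number, key, value).
transaction_log = [
    (1, "order-7", "pending"),
    (2, "order-7", "escalated"),
    (3, "order-7", "approved"),
]


def poll(log, poll_after_seq):
    """Simulate polling: each poll sees only the latest value at that moment."""
    snapshots = []
    for cutoff in poll_after_seq:
        state = {}
        for seq, key, value in log:
            if seq <= cutoff:
                state[key] = value   # later writes overwrite earlier ones
        snapshots.append(state)
    return snapshots


def read_change_feed(log, from_seq=0):
    """Simulate a change data feed: every write, in commit order."""
    return [(seq, key, value) for seq, key, value in sorted(log) if seq > from_seq]


# Polling after writes 1 and 3 never observes the intermediate "escalated" state.
polled_values = [snap["order-7"] for snap in poll(transaction_log, [1, 3])]
# The feed returns all three writes in order.
feed_values = [value for _, _, value in read_change_feed(transaction_log)]
```

Here the polled view loses the escalation step entirely, which is exactly the kind of gap that would corrupt a reconstructed decision history.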
7. A context graph deployment needs to store full contract PDFs, transcripts, and policy documents alongside the decision-trace metadata. According to the chapter, which integration pattern handles this best?
- Store the documents as base64 strings on decision-trace nodes
- Discard the documents and rely on summaries
- Document store integration: the graph holds decision-trace metadata and a document reference (ID, title, sections); the document store holds the full content; document-level access control is applied separately from graph-traversal access; and both systems share a consistent entity-linking ID so a traversal can fetch the document without a second identifier mapping
- A single relational table containing both graph nodes and document blobs
Show Answer
The correct answer is C. The chapter prescribes this exact pattern, including the access-control rationale. The other options either bloat graph nodes (A), lose information (B), or undo the benefits of specialized storage (D).
Concept Tested: Document Store Integration
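The pattern in question 7 can be sketched with a toy document store and a graph node that carries only a reference. All names here (`DocumentStore`, `document_ref`, the role strings, the sample IDs) are hypothetical and chosen for illustration.

```python
class DocumentStore:
    """Toy document store with its own, document-level access control."""

    def __init__(self):
        self._docs = {}  # document ID -> (full content, roles allowed to read it)

    def put(self, doc_id, content, allowed_roles):
        self._docs[doc_id] = (content, frozenset(allowed_roles))

    def get(self, doc_id, role):
        content, allowed_roles = self._docs[doc_id]
        if role not in allowed_roles:
            raise PermissionError(f"role {role!r} may not read {doc_id!r}")
        return content


# The graph node keeps only metadata plus a lightweight document reference.
graph = {
    "decision-42": {
        "outcome": "exception approved",
        "document_ref": {
            "id": "doc-17",  # the same ID is the key in the document store
            "title": "Master services agreement",
            "sections": ["4.2"],
        },
    }
}


def fetch_document_for_decision(graph, store, decision_id, role):
    """Follow the shared ID from a graph node straight to the document store."""
    ref = graph[decision_id]["document_ref"]
    return store.get(ref["id"], role)
```

A caller who may traverse the graph but lacks the document role still sees the title and section list on the node, while the full content stays behind the document store's own check.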
8. A team trains a model to predict which decisions will require escalation. They need features as they existed when each historical decision was made, not as they look today. According to the chapter, what makes the context graph particularly well-suited to serve as the training feature store?
- Because the context graph has bitemporal versioning (from Chapter 13), it can return point-in-time correct features for any historical date — answering "what did the precedent graph look like on this specific date for this specific entity?" — a query flat feature stores cannot answer without enormous complexity
- Because the context graph is faster than a relational database
- Because the context graph caches embeddings
- Because the context graph is owned by the data science team
Show Answer
The correct answer is A. The chapter explicitly cites bitemporal versioning as the property that makes the knowledge-graph-as-feature-store pattern uniquely valuable for training. The other options misstate the property.
Concept Tested: Knowledge Graph as Feature Store
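A point-in-time lookup of the kind question 8 describes can be sketched over a list of versioned feature values. This is a simplified illustration, not the data model from Chapter 13: the `FeatureVersion` fields and the `feature_as_of` helper are hypothetical, and each version records only when a value became valid and when the graph learned of it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FeatureVersion:
    entity_id: str
    name: str
    value: float
    valid_from: date    # when the value became true in the business world
    recorded_on: date   # when the graph learned about it


def feature_as_of(versions, entity_id, name, valid_on, known_on) -> Optional[float]:
    """Return the value valid on `valid_on`, using only facts recorded by `known_on`."""
    candidates = [
        v for v in versions
        if v.entity_id == entity_id and v.name == name
        and v.valid_from <= valid_on and v.recorded_on <= known_on
    ]
    if not candidates:
        return None
    # Latest valid_from wins; ties go to the most recently recorded correction.
    best = max(candidates, key=lambda v: (v.valid_from, v.recorded_on))
    return best.value
```

Setting `known_on` to the decision date keeps later corrections out of the training features, which avoids leaking information the decision-maker could not have had.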
9. The chapter recommends graph-based recommendation over pure embedding-based similarity for context graph applications. What advantage does it cite?
- Embeddings are mathematically incorrect
- Graph-based recommendations are inherently explainable — "these three precedents are recommended because they involved the same customer, the same exception type, and were approved by the same authority" is a traceable explanation; "these three decisions have cosine similarity 0.87 in the embedding space" is not
- Graph databases are always faster than vector indexes
- Embeddings are not allowed under GDPR
Show Answer
The correct answer is B. The chapter cites explainability as the key advantage. The other options misstate technical or legal claims.
Concept Tested: Graph-Based Recommendation
10. A context graph team observes user-reported low confidence in retrieval results, despite all infrastructure dashboards showing green. Which observability gap is the most likely culprit, and what is the structural fix?
- The team is missing data quality layer monitoring (semantic correctness — required fields populated, entity references resolving, source freshness, schema conformance) and AI output layer monitoring (retrieval relevance, coverage, user feedback); the structural fix is to add automated data-quality assertions that run on every pipeline execution and continuous AI-output monitoring, not just infrastructure dashboards
- The graph database is broken
- The LLM weights need retraining
- The vector index needs more dimensions
Show Answer
The correct answer is A. The chapter calls this exact gap silent data failure and prescribes adding data-quality and AI-output layers atop infrastructure monitoring. The other options jump to conclusions inconsistent with the green infrastructure signal.
Concept Tested: Data SLA