Chapter 21: Data Engineering and Infrastructure
Welcome to Chapter 21!
The previous chapters established what context graphs do and how organizations adopt them. This chapter examines the infrastructure underneath — the data engineering patterns, streaming platforms, orchestration tools, and observability systems that keep context graphs accurate, fresh, and trustworthy at scale. Nexus has seen context graphs fail not because the graph was wrong but because the data feeding it was stale, missing, or inconsistent. Infrastructure is not glamorous, but it is what separates a working system from a broken one. Let's trace the pipeline.
The Data Engineering Foundation
A context graph is only as good as the data flowing into it. The most sophisticated graph schema and retrieval algorithm cannot compensate for source data that is stale, incomplete, or inconsistent. Data engineering for context graphs is therefore not a supporting activity — it is a load-bearing layer of the architecture.
Context graphs impose specific data engineering requirements that differ from traditional data warehousing. In a data warehouse, the primary design goal is analytical query performance: how quickly can I aggregate millions of rows along multiple dimensions? In a context graph, the primary goals are freshness (how recently was this decision trace updated?), completeness (does this trace include all the context elements that existed at decision time?), and traversal efficiency (how quickly can I retrieve a subgraph of related decisions?). These goals shape the entire data engineering stack.
Before examining specific tools and patterns, it is useful to understand the three data flows that a context graph data engineering stack must support:
The write flow — capturing decision events, policy updates, and entity changes from source systems and converting them into graph node and edge insertions or updates. Write flow performance determines how quickly new information appears in the graph.
The read flow — retrieving subgraphs, traversing precedent chains, and serving context to AI inference pipelines. Read flow performance determines the latency of context graph queries as seen by end users and AI agents.
The governance flow — tracking data lineage, enforcing data quality constraints, detecting anomalies, and producing audit records that satisfy compliance requirements. Governance flow correctness determines whether the context graph is trustworthy enough for regulated use cases.
Data Mesh Architecture
The Data Mesh is an organizational and architectural approach to data infrastructure that addresses one of the fundamental scaling problems of centralized data platforms: as the number of data sources grows, the central data team becomes a bottleneck for onboarding new sources, maintaining pipelines, and ensuring data quality.
A data mesh decentralizes data ownership: each domain team (finance, sales, engineering, customer success) owns and operates the data pipelines for their domain, following organization-wide standards enforced by a central platform team. The central platform team provides the infrastructure and the standards; the domain teams provide the domain expertise and the operational responsibility.
For context graph deployments, the data mesh model is a natural fit. Context graph data sources are domain-specific — financial decisions live with the finance domain, sales decisions live with the sales domain — and the people who best understand the quality requirements and exception patterns for that data are in the domain team. A centralized data engineering team trying to build and maintain pipelines for every domain's decision data will produce generic pipelines that miss domain-specific nuances.
The data mesh model for context graphs involves three layers:
Domain data products — each domain team produces a well-defined data product (described in the next section) that represents their decisions and entities in a format the context graph can ingest. The domain team owns the quality and freshness of their data product.
Central graph platform — the central platform team operates the context graph infrastructure: the graph database, the query layer, the AI integration, and the observability stack. The platform team defines the schema standards and ingestion protocols that all domain data products must conform to.
Federated governance — a cross-domain governance function, typically the Center of Excellence described in the previous chapter, sets and enforces the standards that span domains: data retention policies, access control requirements, audit trail standards. The governance function does not own any specific data product — it owns the rules that all products must follow.
Nexus Thinks About Data Mesh Trade-offs
Data mesh is not a free lunch. Decentralizing data ownership requires domain teams to develop data engineering competence they may not currently have. The organizational transformation to data mesh can take longer than the technical transformation. For smaller organizations (fewer than 500 employees), a centralized data team with strong domain partnerships is often more practical than a full data mesh. Data mesh scales well for organizations where more than five domains each generate substantial decision data.
Data Products and Data Contracts
A Data Product is a self-contained, discoverable, and trustworthy data asset that a domain team produces for consumption by other teams or systems. The term "product" is intentional — it implies that the producing team has the same obligations to consumers that a product team has to customers: clear documentation, stable interfaces, quality guarantees, and responsive support.
In the context of context graph data engineering, each domain team's data product is a structured stream or batch of decision events, entity updates, and policy changes that conforms to the context graph's ingestion schema. The data product is not raw database records — it is a curated, validated, and documented dataset that the context graph platform can ingest without requiring domain-specific transformation logic.
A Data Contract is the formal agreement between the data product producer and its consumers. A data contract for a context graph source specifies:
Schema. The exact structure of the data product's records, including field names, data types, required versus optional fields, and the meaning of each field. Schema changes require advance notice and a migration plan.
Freshness SLA. How recently the data product's records reflect the state of the source system. A decision event should appear in the data product within a specified time window (e.g., within 15 minutes of the decision being recorded in the source system).
Completeness guarantee. The proportion of source events that are expected to appear in the data product. A completeness guarantee of 99.9% means that no more than 0.1% of source events are expected to be missing due to pipeline failures.
Quality assertions. Specific validation rules that the data product guarantees to satisfy: no null values in required fields, no future-dated event timestamps, no entity references that do not resolve to a known entity in the entity registry.
Data contracts are typically implemented as code — a schema definition in a schema registry plus a set of automated quality checks that run against each data product batch or stream before it is accepted by the central platform.
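As a rough illustration of a contract implemented as code, the sketch below checks a batch of records against the schema and quality assertions listed above. The field names (`decision_id`, `decision_type`, `event_timestamp`, `entity_id`), the `KNOWN_ENTITIES` registry, and the `validate_batch` helper are hypothetical; a real deployment would likely hold the schema in a schema registry rather than in a Python dict.

```python
from datetime import datetime, timezone

# Hypothetical contract: required fields and their expected Python types.
REQUIRED_FIELDS = {
    "decision_id": str,
    "decision_type": str,
    "event_timestamp": datetime,
    "entity_id": str,
}

# Hypothetical stand-in for the entity registry.
KNOWN_ENTITIES = {"cust-001", "cust-002"}


def validate_record(record, now):
    """Return a list of contract violations for one record (empty if valid)."""
    violations = []
    for field, expected_type in REQUIRED_FIELDS.items():
        value = record.get(field)
        if value is None:
            violations.append(f"{field}: missing or null")
        elif not isinstance(value, expected_type):
            violations.append(f"{field}: expected {expected_type.__name__}")
    ts = record.get("event_timestamp")
    if isinstance(ts, datetime) and ts > now:
        violations.append("event_timestamp: future-dated")
    entity = record.get("entity_id")
    if isinstance(entity, str) and entity not in KNOWN_ENTITIES:
        violations.append("entity_id: does not resolve to a known entity")
    return violations


def validate_batch(records, now):
    """Split a batch into accepted records and (record, violations) rejects."""
    accepted, rejected = [], []
    for record in records:
        violations = validate_record(record, now)
        if violations:
            rejected.append((record, violations))
        else:
            accepted.append(record)
    return accepted, rejected


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
batch = [
    {"decision_id": "d1", "decision_type": "exception",
     "event_timestamp": datetime(2024, 1, 1, 11, 50, tzinfo=timezone.utc),
     "entity_id": "cust-001"},
    {"decision_id": "d2", "decision_type": None,
     "event_timestamp": datetime(2024, 1, 2, tzinfo=timezone.utc),
     "entity_id": "cust-999"},
]
accepted, rejected = validate_batch(batch, now)
```

Here the first record is accepted and the second is rejected with three violations. Returning the reasons alongside each rejected record, rather than silently dropping it, gives the producing domain team something concrete to act on.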
#### Diagram: Data Mesh for Context Graphs
Sim Spec:
- Type: vis-network diagram
- sim-id: data-mesh-context-graph
- Library: vis-network
- Status: not started
- Bloom Level: Analysis
- Bloom Verb: analyze
- Learning Objective: Analyze how a data mesh architecture distributes ownership of context graph data sources while maintaining central platform governance
- Instructional Rationale: A multi-layer network diagram makes the domain-platform-governance relationship explicit, helping learners understand how decentralization and standardization coexist
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Data Mesh for Context Graphs</title>
<script src="https://unpkg.com/vis-network/standalone/umd/vis-network.min.js"></script>
<style>
body { margin: 0; background: #0d1117; color: #e6edf3; font-family: sans-serif; }
#network { width: 100%; height: 520px; border: 1px solid #30363d; }
</style>
</head>
<body>
<div id="network"></div>
<script>
const nodes = new vis.DataSet([
// Domain teams
{ id: 1, label: "Finance\nDomain Team", color: { background: "#1f6feb", border: "#58a6ff" }, font: { color: "#e6edf3" }, shape: "box", x: -360, y: -160 },
{ id: 2, label: "Sales\nDomain Team", color: { background: "#1f6feb", border: "#58a6ff" }, font: { color: "#e6edf3" }, shape: "box", x: -360, y: 0 },
{ id: 3, label: "Engineering\nDomain Team", color: { background: "#1f6feb", border: "#58a6ff" }, font: { color: "#e6edf3" }, shape: "box", x: -360, y: 160 },
// Data products
{ id: 4, label: "Finance\nData Product", color: { background: "#312d00", border: "#d29922" }, font: { color: "#e6edf3" }, shape: "ellipse", x: -120, y: -160 },
{ id: 5, label: "Sales\nData Product", color: { background: "#312d00", border: "#d29922" }, font: { color: "#e6edf3" }, shape: "ellipse", x: -120, y: 0 },
{ id: 6, label: "Engineering\nData Product", color: { background: "#312d00", border: "#d29922" }, font: { color: "#e6edf3" }, shape: "ellipse", x: -120, y: 160 },
// Central platform
{ id: 7, label: "Context Graph\nPlatform", color: { background: "#8250df", border: "#d2a8ff" }, font: { color: "#e6edf3", size: 15 }, shape: "box", x: 120, y: 0 },
// Governance
{ id: 8, label: "Federated\nGovernance\n(CoE)", color: { background: "#3d0d0d", border: "#f85149" }, font: { color: "#e6edf3" }, shape: "box", x: 120, y: -240 },
// Consumers
{ id: 9, label: "AI Inference\nLayer", color: { background: "#1a4321", border: "#3fb950" }, font: { color: "#e6edf3" }, shape: "box", x: 360, y: -80 },
{ id: 10, label: "Decision\nUsers", color: { background: "#1a4321", border: "#3fb950" }, font: { color: "#e6edf3" }, shape: "box", x: 360, y: 80 },
// Data contracts
{ id: 11, label: "Data Contract\n(Schema + SLA)", color: { background: "#0d1117", border: "#30363d" }, font: { color: "#8b949e", size: 11 }, shape: "text", x: -240, y: -290 },
]);
const edges = new vis.DataSet([
{ from: 1, to: 4, arrows: "to", label: "produces" },
{ from: 2, to: 5, arrows: "to", label: "produces" },
{ from: 3, to: 6, arrows: "to", label: "produces" },
{ from: 4, to: 7, arrows: "to", color: "#d29922" },
{ from: 5, to: 7, arrows: "to", color: "#d29922" },
{ from: 6, to: 7, arrows: "to", color: "#d29922" },
{ from: 7, to: 9, arrows: "to" },
{ from: 7, to: 10, arrows: "to" },
{ from: 8, to: 4, arrows: "to", color: "#f85149", dashes: true, label: "enforces standards" },
{ from: 8, to: 5, arrows: "to", color: "#f85149", dashes: true },
{ from: 8, to: 6, arrows: "to", color: "#f85149", dashes: true },
{ from: 11, to: 4, dashes: true, color: "#30363d" },
]);
const container = document.getElementById("network");
new vis.Network(container, { nodes, edges }, {
physics: false,
interaction: { dragNodes: true },
edges: { smooth: { type: "curvedCW", roundness: 0.2 }, font: { size: 11, color: "#8b949e" } }
});
</script>
</body>
</html>
Data SLA and Data Observability
A Data SLA (Service Level Agreement) specifies the performance commitments that a data product or data pipeline must meet. For context graph data engineering, relevant SLAs include:
Freshness SLA: The maximum acceptable lag between an event occurring in the source system and that event being queryable in the context graph. Freshness requirements vary by workflow: real-time incident response workflows may require a freshness SLA of 60 seconds, while quarterly compliance reporting workflows may tolerate 24-hour lag.
Availability SLA: The proportion of time that the data ingestion pipeline is operational. An availability SLA of 99.5% allows for approximately 44 hours of downtime per year — acceptable for batch-oriented workflows, too loose for real-time decision support.
Latency SLA: The maximum time required to execute a context graph query and return results. A latency SLA of 500 milliseconds for a single-hop query and 2 seconds for a multi-hop traversal is typical for interactive decision support workflows.
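The downtime figure quoted for an availability SLA follows from simple arithmetic. The helper below is a small sketch of that calculation; the function name `allowed_downtime_hours` is an illustrative choice, and the year is taken as 365 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime_hours(availability_pct):
    """Hours of downtime per year permitted by an availability SLA."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


for pct in (99.0, 99.5, 99.9):
    print(f"{pct}% -> {allowed_downtime_hours(pct):.1f} hours/year")
```

For 99.5% this gives 43.8 hours, which rounds to the 44 hours cited above.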
Data SLAs are enforced through Data Observability — the practice of continuously monitoring data pipelines and data quality to detect and alert on SLA violations before they affect end users. Data observability for context graphs includes monitoring the following dimensions:
Freshness monitoring. Tracking the timestamp of the most recent event ingested for each data source and alerting when the lag exceeds the freshness SLA. Freshness monitors must account for expected low-activity periods (weekends, overnight) where low event volume is normal rather than indicative of a pipeline failure.
Volume monitoring. Tracking the number of events ingested per time window for each data source and alerting when the volume drops significantly below the historical baseline. A sudden drop in event volume often indicates a pipeline failure or a schema change in the source system that broke the parser.
Schema monitoring. Detecting changes in the structure of incoming data products — new fields, removed fields, changed data types — that may indicate a schema evolution in the source system that was not communicated through the data contract process.
Semantic monitoring. Validating that the content of incoming data is semantically correct — that entity references resolve to known entities, that timestamps are in valid ranges, that enumerated field values are within the allowed set. Semantic violations that pass schema validation are the most dangerous category because they produce incorrect graph data that is difficult to detect downstream.
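The freshness and volume monitors described above can be sketched in a few lines. This is a minimal illustration, not a production monitor: the function names, the `quiet_period` flag, and the 50% `drop_ratio` threshold are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def freshness_violation(last_event_at, now, sla, quiet_period=False):
    """True if lag since the last ingested event exceeds the freshness SLA.

    quiet_period suppresses the alert during expected low-activity windows.
    """
    if quiet_period:
        return False
    return (now - last_event_at) > sla


def volume_violation(current_count, baseline_counts, drop_ratio=0.5):
    """True if the current window's volume is below drop_ratio * baseline mean."""
    if not baseline_counts:
        return False  # no history yet, nothing to compare against
    baseline = sum(baseline_counts) / len(baseline_counts)
    return current_count < drop_ratio * baseline


now = datetime(2024, 1, 3, 10, 0, tzinfo=timezone.utc)
last_event = now - timedelta(minutes=20)
stale = freshness_violation(last_event, now, sla=timedelta(minutes=15))
low_volume = volume_violation(40, [100, 110, 90])
```

Both checks fire in this example: the lag is 20 minutes against a 15-minute SLA, and 40 events is less than half the baseline mean of 100. A real monitor would probably derive the quiet-period flag from a calendar or from seasonal baselines rather than take it as an argument.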
SQL Transformation Tools
SQL Transformation Tools are the data engineering infrastructure for transforming raw source data into the structured format required by the context graph's ingestion layer. The term "SQL" is used broadly here — these tools use SQL as their transformation language but add capabilities for dependency management, incremental processing, and data quality testing that raw SQL lacks.
These tools operate on a declarative model: the data engineer defines the desired output structure (a table or view representing the transformation result) and the tool manages the execution order, incremental updates, and dependency graph among transformations. A transformation that depends on three upstream transformations will be re-executed automatically when any of those upstream transformations produces new output.
For context graph data engineering, SQL transformation tools serve two primary functions:
Source system normalization. Enterprise source systems often have idiosyncratic data models: date fields stored as strings, entity IDs that differ between systems (the customer ID in the CRM is different from the customer ID in the billing system), and denormalized records that must be split into multiple graph entities. SQL transformations normalize these source system quirks into the consistent format the context graph ingestion layer expects.
Decision event reconstruction. Some source systems do not emit discrete decision events; they maintain a current state that is overwritten when a decision changes it. Converting current-state records into decision events requires a transformation that computes the difference between the current state and the previous state (a snapshot-comparison form of change data capture) and reconstructs the decision event from that difference. Because only the net change between two snapshots is visible, any intermediate changes that occur between snapshots are lost, so snapshot frequency bounds the fidelity of the reconstructed history. SQL transformation tools with incremental processing capabilities are well-suited to this pattern.
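The reconstruction step can be illustrated outside SQL with a plain snapshot diff. The sketch below assumes each snapshot is a dict keyed by record ID. The `reconstruct_events` name, the event shape, and the sample purchase-order records are hypothetical.

```python
def reconstruct_events(previous, current, observed_at):
    """Diff two state snapshots (dicts keyed by record ID) into change events."""
    events = []
    for record_id, new_state in current.items():
        old_state = previous.get(record_id)
        if old_state is None:
            events.append({"record_id": record_id, "kind": "created",
                           "changes": dict(new_state), "observed_at": observed_at})
            continue
        # Keep only the fields whose value differs between the two snapshots.
        changes = {
            field: {"from": old_state.get(field), "to": value}
            for field, value in new_state.items()
            if old_state.get(field) != value
        }
        if changes:
            events.append({"record_id": record_id, "kind": "updated",
                           "changes": changes, "observed_at": observed_at})
    return events


previous = {"po-17": {"status": "pending", "approver": None}}
current = {
    "po-17": {"status": "approved", "approver": "j.doe"},
    "po-18": {"status": "pending", "approver": None},
}
events = reconstruct_events(previous, current, observed_at="2024-01-03T10:00:00Z")
```

This sketch ignores deleted records. Note also that `observed_at` is the time the snapshot was taken, not the time the decision was made, so the reconstructed timestamp is only as precise as the snapshot interval.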
A data quality testing layer is an essential complement to SQL transformation tools. These tools allow data engineers to write assertions against transformation outputs — "the decision_type field should never be null," "the event_timestamp should always be before the ingestion_timestamp" — and to run those assertions automatically on every transformation execution. Failed assertions are surfaced as test failures, not as silent data quality issues that propagate into the context graph.
Workflow Orchestration Tools
Workflow Orchestration Tools manage the scheduling, dependency resolution, and failure handling for data engineering pipelines. A context graph data engineering stack typically involves dozens of interdependent processes: source system extraction, transformation, quality validation, graph ingestion, and post-ingestion verification. Orchestration tools ensure these processes execute in the correct order, with appropriate parallelization, and with automatic retry and alerting on failure.
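To make dependency resolution and retry concrete, here is a toy runner built on the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is a sketch of the idea rather than a model of any particular orchestration product. The task names mirror the stages listed above, and `run_pipeline` and `flaky_transform` are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, max_attempts=3):
    """Run tasks in dependency order, retrying each up to max_attempts times.

    tasks: name -> zero-argument callable
    dependencies: name -> set of upstream task names
    Returns the list of task names in the order they completed.
    """
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                if attempt == max_attempts:
                    # A real orchestrator would alert here; this sketch just stops.
                    raise RuntimeError(f"{name} failed after {attempt} attempts") from exc
    return completed


attempts = {"transform": 0}


def flaky_transform():
    """Fails on the first call, succeeds on the second."""
    attempts["transform"] += 1
    if attempts["transform"] < 2:
        raise ValueError("transient failure")


tasks = {
    "extract": lambda: None,
    "transform": flaky_transform,
    "validate": lambda: None,
    "ingest": lambda: None,
    "verify": lambda: None,
}
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "ingest": {"validate"},
    "verify": {"ingest"},
}
order = run_pipeline(tasks, dependencies)
```

The runner executes tasks one at a time. Parallelizing independent branches, backing off between retries, and persisting run state are left out to keep the example short.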
For context graph data engineering, orchestration tools must support two execution patterns:
Batch orchestration — the execution of a complete pipeline on a schedule (hourly, daily, weekly) that processes all data accumulated since the previous batch run. Batch orchestration is appropriate for data sources that are not time-critical and where the overhead of maintaining a streaming connection to the source system is not justified.
Event-driven orchestration — the execution of pipeline steps in response to events: a new data product file appearing in a storage bucket, a message appearing in an event queue, or a webhook from a source system indicating that a new decision event is available. Event-driven orchestration minimizes freshness lag by processing events as they arrive rather than waiting for the next batch window.
Most context graph data engineering stacks use both patterns: event-driven orchestration for real-time decision events from high-priority workflows, and batch orchestration for lower-priority data sources and for the periodic maintenance tasks (quality audits, freshness checks, obsolescence marking) that do not need to execute in real time.
Graph Batch Processing
Graph Batch Processing refers to the execution of large-scale graph computations on the accumulated context graph data — computations that are too expensive to run in real time but that produce valuable outputs for planning, analysis, or model training. Examples of batch graph computations include:
Precedent similarity clustering. Grouping decision traces by semantic similarity to identify recurring exception patterns that can be converted into formal policies. A batch job that runs weekly and produces a ranked list of "exception clusters that appear more than N times and have no corresponding formal policy" gives the governance team a data-driven list of policy gaps to address.
Staleness detection. Identifying decision traces that reference policies or precedents that have since been superseded, so those traces can be flagged as historically accurate but no longer policy-compliant. This is inherently a batch computation because it requires joining every trace against the current policy timeline — an operation that cannot be performed cheaply in real time.
Graph statistics computation. Computing graph-level metrics — degree distribution, clustering coefficient, average path length between related decision types — that characterize the structure of the organizational knowledge and help identify areas where the graph is sparse or unusually dense. Sparse areas may indicate workflows that are not being captured; unusually dense areas may indicate over-specificity in the schema.
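Of the batch computations above, staleness detection is the simplest to sketch. The example assumes each trace cites exactly one policy and that the policy timeline records when each policy was superseded. The data layout and the `find_stale_traces` helper are hypothetical.

```python
from datetime import date

# Hypothetical policy timeline: policy_id -> (effective_from, superseded_on or None)
policies = {
    "refund-v1": (date(2022, 1, 1), date(2023, 6, 1)),
    "refund-v2": (date(2023, 6, 1), None),
}

# Hypothetical decision traces, each citing the policy applied at decision time.
traces = [
    {"trace_id": "t1", "policy_id": "refund-v1", "decided_on": date(2023, 2, 10)},
    {"trace_id": "t2", "policy_id": "refund-v2", "decided_on": date(2023, 9, 5)},
]


def find_stale_traces(traces, policies, as_of):
    """Return IDs of traces whose cited policy was superseded on or before as_of."""
    stale = []
    for trace in traces:
        _, superseded_on = policies[trace["policy_id"]]
        if superseded_on is not None and superseded_on <= as_of:
            stale.append(trace["trace_id"])
    return stale


stale_ids = find_stale_traces(traces, policies, as_of=date(2024, 1, 1))
```

Trace `t1` is flagged because `refund-v1` was superseded in June 2023. The trace stays historically accurate, so a batch job would typically mark it rather than delete it.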
Streaming Graph Update
Streaming Graph Update refers to the pattern of applying incremental graph changes (new nodes, new edges, modified properties) in near-real time as events flow through the ingestion pipeline. Streaming updates require a streaming processing platform that can handle high event throughput with low latency and provide exactly-once processing semantics, or at-least-once delivery combined with idempotent graph writes.
The critical challenge in streaming graph updates is consistency: the graph must remain internally consistent at all times, even as it is being updated by multiple concurrent streams. A decision trace that references an entity that has not yet been ingested into the graph — because the entity event and the decision event arrived in a different order than they were generated — will fail to resolve the entity reference. The streaming processing platform must either buffer events until all referenced entities are present, or apply the update speculatively and reconcile it once the referenced entity arrives.
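The buffering option can be sketched as follows. This is a minimal in-memory illustration under the assumption that each decision event references a single entity. The `BufferingIngester` class and its method names are hypothetical.

```python
from collections import defaultdict


class BufferingIngester:
    """Holds decision events until the entity they reference has been ingested."""

    def __init__(self):
        self.entities = set()
        self.applied = []                 # decision IDs written to the graph, in order
        self.pending = defaultdict(list)  # entity_id -> buffered decision events

    def on_entity(self, entity_id):
        self.entities.add(entity_id)
        # Flush any decisions that were waiting on this entity.
        for event in self.pending.pop(entity_id, []):
            self.applied.append(event["decision_id"])

    def on_decision(self, event):
        if event["entity_id"] in self.entities:
            self.applied.append(event["decision_id"])
        else:
            self.pending[event["entity_id"]].append(event)


ingester = BufferingIngester()
ingester.on_decision({"decision_id": "d1", "entity_id": "cust-7"})  # arrives early
ingester.on_decision({"decision_id": "d2", "entity_id": "cust-1"})  # also early
ingester.on_entity("cust-1")
ingester.on_entity("cust-7")
```

Both decisions are applied only after their entities arrive, so no dangling reference is ever written. A production version would also need a timeout or dead-letter path for events whose entity never shows up, and a bound on buffer size.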
Event Streaming Platforms
Event Streaming Platforms are the infrastructure for high-throughput, low-latency event ingestion and distribution. For context graph data engineering, the event streaming platform is the central bus through which decision events, entity updates, and policy changes flow from source systems to the context graph.
An event streaming platform provides three capabilities that raw message queues do not:
Retention and replay. Events are stored for a configurable retention period (days to weeks) and can be re-consumed from any point in the retention window. If the context graph's ingestion pipeline fails and recovers after 4 hours, the pipeline can replay all events from the last successfully processed event rather than having to request a full re-extraction from the source system.
Multiple consumer isolation. Multiple independent consumers can read from the same event stream without interfering with each other. The context graph's graph ingestion consumer, the audit log consumer, and the training data preparation consumer all read the same event stream independently — each maintains its own position in the stream and processes events at its own rate.
Partitioning for ordered processing. Events can be partitioned by a key (e.g., entity ID or workflow ID) so that all events for the same entity or workflow are processed in order by the same consumer. This is essential for decision traces, where the order of events (original decision, amendment, appeal, override) must be preserved to reconstruct the correct decision history.
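The three capabilities can be demonstrated with a toy in-memory log. This is a teaching sketch only and does not mirror the API of any real streaming platform. The `EventLog` class, its methods, and the consumer names are hypothetical.

```python
import zlib


class EventLog:
    """Toy in-memory log with keyed partitions and per-consumer offsets."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self.offsets = {}  # (consumer, partition) -> next index to read

    def partition_for(self, key):
        # Stable hash so the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        self.partitions[self.partition_for(key)].append((key, value))

    def poll(self, consumer, partition):
        """Return unread events for this consumer and advance its offset."""
        start = self.offsets.get((consumer, partition), 0)
        events = self.partitions[partition][start:]
        self.offsets[(consumer, partition)] = start + len(events)
        return events

    def seek(self, consumer, partition, offset):
        """Move a consumer's offset, e.g. to replay after a failure."""
        self.offsets[(consumer, partition)] = offset


log = EventLog(num_partitions=4)
for step in ("decision", "amendment", "appeal", "override"):
    log.append("case-42", step)

p = log.partition_for("case-42")
graph_view = [value for _, value in log.poll("graph-ingest", p)]
audit_view = [value for _, value in log.poll("audit-log", p)]  # independent offset
log.seek("graph-ingest", p, 0)                                 # rewind and replay
replayed = [value for _, value in log.poll("graph-ingest", p)]
```

Because every event for `case-42` lands in one partition, both consumers see the four steps in their original order, and rewinding the ingestion consumer replays them without touching the audit consumer's position.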
Multi-Model Database Integration
A Multi-Model Database is a database system that can store and query data in multiple models — graph, document, relational, key-value — within a single system. For context graph deployments, multi-model databases reduce the operational complexity of managing separate specialized systems for graph data, document storage, and key-value caching.
Two adjacent integration patterns — Document Store Integration and Time-Series Integration — extend the context graph's data coverage beyond the native graph model.
Document Store Integration
Document Store Integration connects the context graph to a document-oriented storage system that holds the full content of documents referenced by decision traces. A decision trace might reference a contract document, a compliance policy PDF, or a meeting transcript. The decision trace in the graph holds the metadata — document ID, title, creation date, relevant sections — while the full document content lives in the document store.
This separation is important for two reasons. First, graph databases are optimized for traversal, not for storing and retrieving large unstructured documents. Second, document content often contains sensitive information that should be accessible only to specific users; separating it from the graph allows access control to be applied at the document level without restricting graph traversal access to the decision trace metadata.
Integrating the document store with the context graph requires a consistent entity linking model: the same document entity is referenced by the same ID in both systems, so a query that traverses the graph to a document reference can immediately fetch the document content from the document store without a secondary lookup through a separate identifier mapping.
Time-Series Integration
Time-Series Integration connects the context graph to a time-series database that holds quantitative measurements over time — system performance metrics, market prices, operational statistics. Decision traces often need to reference the state of key metrics at the time a decision was made: "this exception was approved because the inventory level was below the reorder threshold on that date."
A time-series database can answer "what was the value of metric X at time T?" efficiently — it is specifically optimized for this query pattern. The context graph holds the decision trace and its temporal metadata; the time-series database holds the metric history that provides quantitative context for decisions. Integration between the two systems allows the context graph's retrieval layer to enrich decision traces with the quantitative context that was present at decision time.
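The "value of metric X at time T" lookup amounts to a last-known-value search over sorted samples. The sketch below uses the standard library's `bisect` module to show the idea. The `MetricSeries` class and the inventory figures are hypothetical, and a real time-series database would do this with its own indexes.

```python
from bisect import bisect_right
from datetime import datetime


class MetricSeries:
    """Sorted (timestamp, value) samples with last-known-value lookup."""

    def __init__(self, samples):
        ordered = sorted(samples)
        self.timestamps = [ts for ts, _ in ordered]
        self.values = [value for _, value in ordered]

    def value_at(self, when):
        """Return the most recent value recorded at or before `when`, else None."""
        index = bisect_right(self.timestamps, when)
        return self.values[index - 1] if index else None


inventory = MetricSeries([
    (datetime(2024, 3, 1), 120),
    (datetime(2024, 3, 8), 45),
    (datetime(2024, 3, 15), 200),
])
decision_time = datetime(2024, 3, 10)
level_at_decision = inventory.value_at(decision_time)
```

Here `level_at_decision` is 45, the reading from March 8, which is the value a retrieval layer would attach to a decision trace dated March 10.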
#### Diagram: Multi-Model Integration Architecture
Sim Spec:
- Type: vis-network diagram
- sim-id: multi-model-integration-architecture
- Library: vis-network
- Status: not started
- Bloom Level: Analysis
- Bloom Verb: differentiate
- Learning Objective: Differentiate the roles of graph, document, time-series, and vector storage in a complete context graph infrastructure stack
- Instructional Rationale: Showing the four storage models as complementary rather than competing clarifies why context graph deployments require a multi-model approach rather than a single database
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Multi-Model Integration</title>
<script src="https://unpkg.com/vis-network/standalone/umd/vis-network.min.js"></script>
<style>
body { margin: 0; background: #0d1117; color: #e6edf3; font-family: sans-serif; }
#network { width: 100%; height: 500px; border: 1px solid #30363d; }
</style>
</head>
<body>
<div id="network"></div>
<script>
const nodes = new vis.DataSet([
{ id: 1, label: "Context Graph\nQuery Layer", color: { background: "#8250df", border: "#d2a8ff" }, font: { color: "#e6edf3", size: 15 }, shape: "box", x: 0, y: 0, fixed: true },
{ id: 2, label: "Graph Database\n(Decisions, Entities,\nRelationships)", color: { background: "#1f6feb", border: "#58a6ff" }, font: { color: "#e6edf3" }, shape: "box", x: -280, y: -160, fixed: true },
{ id: 3, label: "Document Store\n(Contracts, Policies,\nTranscripts)", color: { background: "#1f6feb", border: "#58a6ff" }, font: { color: "#e6edf3" }, shape: "box", x: 280, y: -160, fixed: true },
{ id: 4, label: "Time-Series DB\n(Metrics, Prices,\nPerformance)", color: { background: "#1f6feb", border: "#58a6ff" }, font: { color: "#e6edf3" }, shape: "box", x: -280, y: 160, fixed: true },
{ id: 5, label: "Vector Index\n(Semantic Search,\nEmbeddings)", color: { background: "#1f6feb", border: "#58a6ff" }, font: { color: "#e6edf3" }, shape: "box", x: 280, y: 160, fixed: true },
{ id: 6, label: "Traversal\nQueries", color: { background: "#0d1117", border: "#30363d" }, font: { color: "#8b949e", size: 12 }, shape: "text", x: -140, y: -280 },
{ id: 7, label: "Full Document\nRetrieval", color: { background: "#0d1117", border: "#30363d" }, font: { color: "#8b949e", size: 12 }, shape: "text", x: 140, y: -280 },
{ id: 8, label: "Point-in-time\nMetric Lookup", color: { background: "#0d1117", border: "#30363d" }, font: { color: "#8b949e", size: 12 }, shape: "text", x: -140, y: 280 },
{ id: 9, label: "Semantic\nSimilarity", color: { background: "#0d1117", border: "#30363d" }, font: { color: "#8b949e", size: 12 }, shape: "text", x: 140, y: 280 },
]);
const edges = new vis.DataSet([
{ from: 1, to: 2, arrows: "both", label: "primary\nstore" },
{ from: 1, to: 3, arrows: "both", label: "doc\nref" },
{ from: 1, to: 4, arrows: "both", label: "metric\ncontext" },
{ from: 1, to: 5, arrows: "both", label: "semantic\nretrieval" },
{ from: 6, to: 2, dashes: true, color: "#30363d" },
{ from: 7, to: 3, dashes: true, color: "#30363d" },
{ from: 8, to: 4, dashes: true, color: "#30363d" },
{ from: 9, to: 5, dashes: true, color: "#30363d" },
]);
const container = document.getElementById("network");
new vis.Network(container, { nodes, edges }, {
physics: false,
interaction: { dragNodes: false },
edges: { smooth: { type: "curvedCW", roundness: 0.25 }, font: { size: 11, color: "#8b949e" } }
});
</script>
</body>
</html>
Feature Engineering from Graphs
Feature Engineering from Graphs is the process of converting graph-structured data into numerical feature vectors that can be used as inputs to machine learning models. Graph data contains rich structural information (node centrality, neighborhood density, path lengths between entities, community membership) that flat tabular datasets cannot represent directly. Extracting these structural features and combining them with node-level attributes produces richer feature sets than either graph structure or node attributes alone.
Several categories of graph-derived features are particularly valuable for context graph applications:
Centrality features. The centrality of a decision node (how many other nodes reference it, how many chains pass through it) is a proxy for its importance as a precedent. A highly central decision has influenced many subsequent decisions; a central policy node has governed many decisions. Centrality features can be computed as node-level features that machine learning models can use to weight the importance of retrieved precedents.
Temporal features. The age of a decision trace, the recency of its last reference in a subsequent decision, and the rate at which it has been cited over time are temporal features that capture the relevance evolution of a precedent. A precedent that was heavily cited two years ago but has not been referenced in six months may reflect a policy or practice that has become obsolete, even if no formal obsolescence has been recorded.
Community features. Community detection algorithms partition graph nodes into groups with dense internal connections and sparse external connections. In a context graph, communities often correspond to functional areas (finance exception decisions, procurement approval decisions) or organizational units. Community membership is a useful feature for retrieval ranking — a decision from the same community as the current decision is more likely to be relevant than a decision from a distant community.
Path features. The shortest path length between two decision nodes measures their structural distance in the knowledge graph. Two decisions separated by two hops (for example, both linked to a common policy node) are more closely related than two decisions separated by five hops. Path length features help retrieval models distinguish between decisions that appear semantically similar but are structurally distant.
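Two of these feature families, centrality and path length, can be computed with nothing more than the standard library. The sketch below uses in-degree as a simple centrality proxy and breadth-first search for hop distance. The edge list and node names are hypothetical, and a graph library or the graph database's own algorithms would normally do this at scale.

```python
from collections import deque

# Hypothetical citation edges: (citing node, cited node).
edges = [
    ("d2", "d1"), ("d3", "d1"), ("d4", "d1"),
    ("d4", "policy-a"), ("d5", "policy-a"),
]


def in_degree(edges):
    """Count how many nodes reference each node (a simple centrality proxy)."""
    counts = {}
    for _, cited in edges:
        counts[cited] = counts.get(cited, 0) + 1
    return counts


def shortest_path_length(edges, source, target):
    """Breadth-first search over the undirected view; None if unreachable."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


centrality = in_degree(edges)
hops_shared_policy = shortest_path_length(edges, "d4", "d5")
hops_distant = shortest_path_length(edges, "d2", "d5")
```

In this toy graph, `d1` is cited three times, `d4` and `d5` are two hops apart through their shared policy, and `d2` and `d5` are four hops apart.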
Knowledge Graph as Feature Store
A Knowledge Graph as Feature Store is an architectural pattern where a context graph serves as the source of features for machine learning model training and inference. Rather than maintaining a separate feature store (a specialized database for ML feature serving), the context graph's rich node and edge attributes — combined with graph-derived features — provide a comprehensive feature source for models that reason about organizational decisions.
This pattern has a compelling property: because the context graph already maintains temporal versioning (bitemporal modeling, as described in Chapter 13), it can provide point-in-time correct features for training. When training a model to predict which decisions will require escalation, you need to know what features were present at the time each historical decision was made, not what the features look like today; using today's values leaks information from the future into the training set. A context graph with proper temporal versioning can answer "what did the precedent graph look like on this specific date for this specific entity?", a query that feature stores without built-in point-in-time support can answer only with considerable added complexity.
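A point-in-time feature query can be sketched as a filter on two time axes: when a relationship held, and when the graph learned about it. The edge layout, the field names (`valid_from`, `valid_to`, `recorded_on`), and `precedent_count_as_of` are assumptions for illustration and are not taken from the bitemporal model in Chapter 13.

```python
from datetime import date

# Hypothetical bitemporal edges: when the relationship held (valid_from/valid_to)
# and when the graph learned about it (recorded_on).
precedent_edges = [
    {"entity": "cust-1", "precedent": "p1", "valid_from": date(2023, 1, 1),
     "valid_to": None, "recorded_on": date(2023, 1, 2)},
    {"entity": "cust-1", "precedent": "p2", "valid_from": date(2023, 3, 1),
     "valid_to": date(2023, 6, 1), "recorded_on": date(2023, 3, 1)},
    {"entity": "cust-1", "precedent": "p3", "valid_from": date(2023, 4, 1),
     "valid_to": None, "recorded_on": date(2023, 8, 15)},  # backfilled late
]


def precedent_count_as_of(edges, entity, as_of):
    """Count precedents valid at `as_of` AND already recorded by `as_of`."""
    return sum(
        1 for e in edges
        if e["entity"] == entity
        and e["recorded_on"] <= as_of
        and e["valid_from"] <= as_of
        and (e["valid_to"] is None or as_of < e["valid_to"])
    )


feature_may = precedent_count_as_of(precedent_edges, "cust-1", date(2023, 5, 1))
feature_jul = precedent_count_as_of(precedent_edges, "cust-1", date(2023, 7, 1))
feature_sep = precedent_count_as_of(precedent_edges, "cust-1", date(2023, 9, 1))
```

For May 1 the count is 2, not 3: `p3` was valid in the real world by then but was only recorded in August, so a decision-maker in May could not have seen it. Filtering on valid time alone would include it and leak later knowledge into the training feature.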
Graph-Based Recommendation and Personalization
Graph-Based Recommendation uses the structure of a knowledge graph to generate recommendations — of relevant precedents, of applicable policies, of subject matter experts — based on graph proximity rather than statistical correlation alone. In a context graph, graph-based recommendation surfaces "decisions similar to the current one that were resolved in a specific way," where similarity is defined by shared entities, shared policies, and structural proximity in the graph.
Graph-based recommendation has an advantage over pure embedding-based similarity for context graph applications: it is inherently explainable. "These three precedent decisions are recommended because they involved the same customer, the same exception type, and were approved by the same authority" is a traceable explanation. "These three decisions have a cosine similarity of 0.87 in the embedding space" is not.
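A rough sketch of why this kind of recommendation is traceable: if similarity is computed from shared attributes, the list of shared attributes is itself the explanation. The attribute names (`customer`, `exception_type`, `approver`) and the `recommend_precedents` helper are illustrative assumptions, not a prescribed schema.

```python
EXPLAINABLE_KEYS = ("customer", "exception_type", "approver")  # hypothetical attributes

def recommend_precedents(current, precedents, top_k=3):
    """Rank precedents by number of shared attributes and keep the reasons."""
    results = []
    for p in precedents:
        shared = sorted(
            k for k in EXPLAINABLE_KEYS
            if current.get(k) is not None and current.get(k) == p.get(k)
        )
        if shared:
            results.append((p["id"], len(shared), shared))
    # Most shared attributes first; ties broken by id for a stable order.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results[:top_k]

current = {"customer": "acme", "exception_type": "discount", "approver": "vp_sales"}
precedents = [
    {"id": "d101", "customer": "acme", "exception_type": "discount", "approver": "vp_sales"},
    {"id": "d102", "customer": "acme", "exception_type": "refund", "approver": "vp_sales"},
    {"id": "d103", "customer": "globex", "exception_type": "refund", "approver": "cfo"},
]

recs = recommend_precedents(current, precedents)
for decision_id, score, reasons in recs:
    print(f"{decision_id}: shares {', '.join(reasons)}")
```

Each recommendation carries its own justification, so the user interface can show the reasons directly rather than a bare similarity score.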
Personalization Graph¶
A Personalization Graph is a context graph extension that captures individual user preferences, expertise areas, and decision patterns to tailor context graph outputs to the specific user. A personalization graph links user nodes to the types of decisions they most frequently handle, the precedents they most frequently cite, and the experts they most frequently consult — building a structured representation of each user's decision-making context.
Personalization in context graph retrieval improves relevance in two ways. First, it filters retrieved precedents to those from workflow areas the user is responsible for, reducing noise from irrelevant organizational contexts. Second, it ranks retrieved precedents by how closely they match the decision patterns this specific user has followed in the past — surfacing the precedents that are most consistent with their existing judgment rather than the most statistically common precedents overall.
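The two steps, filter then rank, might look like the following sketch. The field names (`area`, `outcome`) and the use of past outcome labels as a stand-in for "decision patterns" are simplifying assumptions made for illustration.

```python
from collections import Counter

def personalize(precedents, user_areas, user_history):
    """Filter to the user's workflow areas, then rank by the user's past patterns."""
    pattern_counts = Counter(user_history)  # how often the user chose each outcome
    in_scope = [p for p in precedents if p["area"] in user_areas]
    # Precedents whose outcome the user has chosen most often come first.
    return sorted(in_scope, key=lambda p: (-pattern_counts[p["outcome"]], p["id"]))

precedents = [
    {"id": "d1", "area": "billing", "outcome": "approve"},
    {"id": "d2", "area": "billing", "outcome": "escalate"},
    {"id": "d3", "area": "legal", "outcome": "escalate"},
]

ranked = personalize(
    precedents,
    user_areas={"billing"},
    user_history=["escalate", "escalate", "approve"],
)
ranked_ids = [p["id"] for p in ranked]
```

One design caution: ranking by consistency with a user's past judgment can reinforce existing habits, so a real system might blend this score with an organization-wide relevance score rather than use it alone.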
Session Graph and User Journey Graph¶
A Session Graph is a temporary subgraph that captures the decisions, queries, and context retrievals that occur within a single user session — a bounded period of work on a specific decision or set of related decisions. Session graphs enable context graph systems to maintain state across multiple queries within a session: "the user asked about three related precedents in this session, so subsequent queries should be interpreted in that context."
A User Journey Graph extends the session graph concept to the full lifecycle of a user's interaction with the context graph system over time: which workflows they use most frequently, which types of exceptions they encounter most often, how their querying patterns evolve as they become more experienced with the system. User journey graphs are a source of implicit feedback for system improvement — users who stop querying for certain types of precedents may have developed internal expertise (positive signal) or may have stopped trusting the results (negative signal that requires investigation).
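As a minimal sketch of session state, the class below records which nodes each query retrieved so that later queries can be interpreted against them. The `SessionGraph` class and its method names are hypothetical; a real implementation would likely store these edges in the graph database with a time-to-live.

```python
class SessionGraph:
    """Temporary per-session record of queries and the nodes they retrieved."""

    def __init__(self):
        self.edges = []  # (query_text, retrieved_node_id) pairs

    def record(self, query, node_ids):
        self.edges.extend((query, node_id) for node_id in node_ids)

    def context_nodes(self):
        """Distinct nodes seen so far in the session, in first-seen order."""
        return list(dict.fromkeys(node_id for _, node_id in self.edges))

session = SessionGraph()
session.record("discount exceptions for acme", ["d101", "d102"])
session.record("who approved d101", ["vp_sales", "d101"])

context = session.context_nodes()
```

Persisting summaries of many such sessions per user, rather than discarding them, is one way the user journey graph described above could be assembled.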
Change Data Feed¶
A Change Data Feed (CDF) is a pattern for capturing changes to data in a source system and delivering those changes as a stream of events to downstream systems. Change data feeds are the most important integration pattern for context graph freshness because they enable real-time or near-real-time synchronization between source systems and the context graph without requiring the source system to implement explicit event publishing.
Change data feeds work by monitoring the transaction log of the source database — the internal record that the database maintains of every write operation. Each write operation (insert, update, delete) appears as an event in the transaction log; a change data feed consumer reads these log events and converts them into the format expected by the context graph ingestion layer.
Three properties make change data feeds particularly valuable for context graph data engineering:
Completeness. Because the transaction log captures every database write, a change data feed provides complete coverage of source system changes — unlike polling-based approaches that can miss changes that occur and are subsequently overwritten between polling intervals.
Low source system impact. Reading the transaction log has minimal impact on the source system's performance, unlike query-based approaches that execute SELECT statements against production tables. Transaction log reading is a read-only, non-blocking operation.
Order preservation. Transaction logs preserve the exact order in which changes were committed. As long as the delivery path keeps that order (for example, by routing all events for one entity through the same stream partition), change data feed events reach the context graph ingestion layer in the order the source system processed them, which is essential for correct reconstruction of decision histories.
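To make the completeness and ordering properties concrete, here is a sketch of a consumer that applies change events to a dictionary standing in for the graph store. The event fields (`lsn`, `op`, `key`, `after`) are assumed for illustration and do not reflect any particular connector's format; `lsn` stands for a log sequence number.

```python
def apply_change_events(graph, events):
    """Apply insert/update/delete events in log order; return rejects and a trace."""
    dead_letters, trace = [], []
    # Defensive sort: the consumer re-establishes log order before applying.
    for event in sorted(events, key=lambda e: e["lsn"]):
        op, key = event.get("op"), event.get("key")
        if key is not None and op in ("insert", "update"):
            graph.setdefault(key, {}).update(event.get("after", {}))
            trace.append((event["lsn"], key, dict(graph[key])))
        elif key is not None and op == "delete":
            graph.pop(key, None)
            trace.append((event["lsn"], key, None))
        else:
            dead_letters.append(event)  # unrecognized shape: set aside for review
    return dead_letters, trace

events = [
    {"lsn": 1, "op": "insert", "key": "decision_7", "after": {"status": "pending"}},
    {"lsn": 3, "op": "update", "key": "decision_7", "after": {"status": "approved"}},
    {"lsn": 2, "op": "update", "key": "decision_7", "after": {"status": "escalated"}},
    {"lsn": 4, "op": "truncate"},
]

graph = {}
rejected, trace = apply_change_events(graph, events)
statuses = [state["status"] for _, _, state in trace if state]
```

A poller that sampled the table only after the third write would see just `approved`; the log-based trace also retains the intermediate `escalated` state, which matters when reconstructing how a decision unfolded.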
#### Diagram: Change Data Feed Pipeline
Sim Spec:
- Type: vis-network diagram
- sim-id: change-data-feed-pipeline
- Library: vis-network
- Status: not started
- Bloom Level: Application
- Bloom Verb: construct
- Learning Objective: Construct a change data feed pipeline that connects source system transaction logs to context graph ingestion in real time
- Instructional Rationale: Tracing the path from source database write to graph node update illustrates the end-to-end data flow that enables context graph freshness
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Change Data Feed Pipeline</title>
<script src="https://unpkg.com/vis-network/standalone/umd/vis-network.min.js"></script>
<style>
body { margin: 0; background: #0d1117; color: #e6edf3; font-family: sans-serif; }
#network { width: 100%; height: 460px; border: 1px solid #30363d; }
</style>
</head>
<body>
<div id="network"></div>
<script>
const nodes = new vis.DataSet([
{ id: 1, label: "Source System\n(CRM / ERP / etc)", color: { background: "#3d0d0d", border: "#f85149" }, font: { color: "#e6edf3" }, shape: "box", x: -480, y: 0, fixed: true },
{ id: 2, label: "Transaction\nLog", color: { background: "#312d00", border: "#d29922" }, font: { color: "#e6edf3" }, shape: "database", x: -320, y: 0, fixed: true },
{ id: 3, label: "CDC\nConnector", color: { background: "#1f6feb", border: "#58a6ff" }, font: { color: "#e6edf3" }, shape: "box", x: -160, y: 0, fixed: true },
{ id: 4, label: "Event\nStream", color: { background: "#8250df", border: "#d2a8ff" }, font: { color: "#e6edf3" }, shape: "box", x: 0, y: 0, fixed: true },
{ id: 5, label: "Schema\nValidation", color: { background: "#1f6feb", border: "#58a6ff" }, font: { color: "#e6edf3" }, shape: "box", x: 160, y: -80, fixed: true },
{ id: 6, label: "Dead Letter\nQueue", color: { background: "#3d0d0d", border: "#f85149" }, font: { color: "#e6edf3" }, shape: "box", x: 160, y: 80, fixed: true },
{ id: 7, label: "Graph\nIngestion", color: { background: "#1f6feb", border: "#58a6ff" }, font: { color: "#e6edf3" }, shape: "box", x: 320, y: -80, fixed: true },
{ id: 8, label: "Context\nGraph DB", color: { background: "#1a4321", border: "#3fb950" }, font: { color: "#e6edf3", size: 15 }, shape: "box", x: 480, y: -80, fixed: true },
{ id: 9, label: "Observability\nAlerts", color: { background: "#312d00", border: "#d29922" }, font: { color: "#e6edf3" }, shape: "box", x: 320, y: 80, fixed: true },
]);
const edges = new vis.DataSet([
{ from: 1, to: 2, arrows: "to", label: "writes" },
{ from: 2, to: 3, arrows: "to", label: "reads log" },
{ from: 3, to: 4, arrows: "to", label: "publishes events" },
{ from: 4, to: 5, arrows: "to" },
{ from: 5, to: 7, arrows: "to", label: "valid" },
{ from: 5, to: 6, arrows: "to", label: "invalid", color: "#f85149" },
{ from: 7, to: 8, arrows: "to", label: "upsert" },
{ from: 4, to: 9, arrows: "to", dashes: true, label: "metrics" },
]);
const container = document.getElementById("network");
new vis.Network(container, { nodes, edges }, {
physics: false,
interaction: { dragNodes: false },
edges: { smooth: { type: "curvedCW", roundness: 0.2 }, font: { size: 11, color: "#8b949e" } }
});
</script>
</body>
</html>
Context Graph Observability¶
Context Graph Observability is the practice of instrumenting the context graph system to provide continuous visibility into its operational health, data quality, and AI performance. Observability for context graphs goes beyond traditional database monitoring — it must cover not just system performance metrics but also data semantic quality and AI output quality.
A complete context graph observability stack monitors three layers.
Infrastructure layer. Traditional database and pipeline monitoring: query latency, throughput, error rate, storage utilization, replication lag. Infrastructure layer monitoring is necessary but not sufficient — a system can be operationally healthy at the infrastructure layer while producing incorrect outputs due to data quality issues.
Data quality layer. Monitoring of the semantic correctness of graph contents: the proportion of decision traces with all required fields populated, the proportion of entity references that resolve correctly, the freshness of each data source, and the volume of events flowing through each ingestion pipeline. Data quality layer monitoring requires application-level instrumentation — the context graph application must actively check and report on data quality, not just system metrics.
AI output layer. Monitoring of the quality of context graph outputs as experienced by AI agents and end users: retrieval relevance scores (do retrieved precedents have high relevance to the query?), coverage rates (what proportion of queries return at least one relevant result?), and feedback scores (do users rate retrieved context as helpful?). AI output layer monitoring closes the loop between the data engineering work and the actual value delivered to users.
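The data quality layer is the one most often left uninstrumented, so a small sketch may help. It computes the four measures named above over a batch of decision traces. The required-field list, the use of `policy_id` as the entity reference, and the six-hour freshness threshold are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta

REQUIRED_FIELDS = ("decision_id", "policy_id", "decided_at")  # hypothetical schema

def data_quality_metrics(traces, known_entities, now):
    """Compute completeness, reference resolution, staleness, and volume."""
    total = len(traces)
    complete = sum(all(t.get(f) is not None for f in REQUIRED_FIELDS) for t in traces)
    resolved = sum(t.get("policy_id") in known_entities for t in traces)
    newest = max((t["decided_at"] for t in traces if t.get("decided_at")), default=None)
    return {
        "completeness": complete / total if total else 0.0,
        "reference_resolution": resolved / total if total else 0.0,
        "staleness": (now - newest) if newest else None,
        "event_volume": total,
    }

now = datetime(2024, 5, 1, 12, 0)
traces = [
    {"decision_id": "d1", "policy_id": "p1", "decided_at": datetime(2024, 5, 1, 11, 0)},
    {"decision_id": "d2", "policy_id": "p9", "decided_at": datetime(2024, 5, 1, 9, 0)},
    {"decision_id": "d3", "policy_id": None, "decided_at": datetime(2024, 4, 30, 12, 0)},
    {"decision_id": "d4", "policy_id": "p1", "decided_at": datetime(2024, 5, 1, 10, 0)},
]

metrics = data_quality_metrics(traces, known_entities={"p1", "p2"}, now=now)
is_stale = metrics["staleness"] > timedelta(hours=6)
```

Note that `d2` counts as complete but unresolved: its `policy_id` is populated yet points at nothing, which is exactly the kind of defect infrastructure metrics cannot see.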
These three monitoring layers together provide the observability required to distinguish between four types of system states:
| System State | Infrastructure | Data Quality | AI Output |
|---|---|---|---|
| Fully operational | Healthy | Healthy | Healthy |
| Silent data failure | Healthy | Degraded | Degraded |
| AI model drift | Healthy | Healthy | Degraded |
| Infrastructure failure | Degraded | Unknown | Unknown |
The most dangerous state is silent data failure: the infrastructure is operationally healthy (all monitoring dashboards show green), but the data quality is degraded (stale sources, missing fields, incorrect entity links), causing AI output quality to degrade silently. Without data quality layer monitoring, this state can persist for weeks before users notice and report the problem.
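The table can be read as a decision procedure that checks layers from the bottom up. The sketch below encodes it directly; the string labels are illustrative, and treating any data quality degradation as silent data failure (even before AI output visibly degrades) is an assumption that goes slightly beyond the table.

```python
def classify_state(infrastructure, data_quality, ai_output):
    """Map per-layer health ('healthy', 'degraded', 'unknown') to a system state."""
    if infrastructure != "healthy":
        # Upper layers cannot be trusted when the layer beneath them is failing.
        return "infrastructure failure"
    if data_quality != "healthy":
        return "silent data failure"
    if ai_output != "healthy":
        return "AI model drift"
    return "fully operational"

state = classify_state("healthy", "degraded", "degraded")
```

Without a data quality signal, the second and third branches collapse into one: degraded AI output with green infrastructure dashboards could be either drift or bad data, and the team has no way to tell which.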
The Silent Data Failure Footgun
Silent data failure is a footgun pattern with all three defining properties: it is silent (no system alerts, no error messages — just quietly wrong outputs), it is easy to trigger (any pipeline that lacks data quality assertions will eventually experience it), and it causes delayed and invisible damage (users lose trust in the system gradually rather than experiencing a single visible failure they can report). The fix is structural: data quality assertions must run automatically on every pipeline execution, not as a periodic manual audit.
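One way to make the fix structural is to wrap each pipeline step so that it cannot succeed unless its assertions pass. The following is a sketch under that assumption; `run_with_assertions`, `DataQualityError`, and the two example checks are hypothetical names, not part of any specific orchestration tool.

```python
class DataQualityError(Exception):
    """Raised when a pipeline batch violates one or more quality assertions."""

def run_with_assertions(load_batch, assertions):
    """Run one pipeline step, then fail loudly if any assertion is violated."""
    batch = load_batch()
    failures = [name for name, check in assertions.items() if not check(batch)]
    if failures:
        raise DataQualityError("failed: " + ", ".join(sorted(failures)))
    return batch

assertions = {
    "non_empty": lambda rows: len(rows) > 0,
    "has_decision_id": lambda rows: all(r.get("decision_id") for r in rows),
}

good = run_with_assertions(lambda: [{"decision_id": "d1"}], assertions)

try:
    run_with_assertions(lambda: [{"decision_id": None}], assertions)
    outcome = "passed"
except DataQualityError as exc:
    outcome = str(exc)
```

Because the exception surfaces through the same path as any other pipeline error, a data quality problem becomes a visible, alertable failure instead of a quiet one.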
Summary¶
This chapter covered the data engineering infrastructure that makes context graphs operational at scale. Seven interconnected architectural layers work together to provide accurate, fresh, and observable context graph data.
The Data Mesh decentralizes data ownership to domain teams while the central platform team provides shared infrastructure and governance standards enforced by a Center of Excellence. Data Products and Data Contracts formalize the interface between domain teams and the central platform, with explicit schema, freshness, completeness, and quality guarantees. Data SLAs and Data Observability define the performance commitments and monitoring practices that detect SLA violations before they affect users.
SQL Transformation Tools normalize source system data into context graph ingestion format with automated quality assertions. Workflow Orchestration Tools manage the dependency graph, scheduling, and failure handling for both batch and event-driven pipelines. Event Streaming Platforms provide the high-throughput, ordered, replayable event bus that enables near-real-time graph updates and multiple independent consumer patterns.
Multi-Model Database Integration — connecting graph, document, time-series, and vector storage — provides the full data coverage that context graph retrieval requires. Change Data Feeds enable real-time synchronization from source systems by reading database transaction logs rather than polling production tables.
Feature Engineering from Graphs converts graph structure into ML model inputs: centrality, temporal, community, and path features. The Knowledge Graph as Feature Store pattern leverages temporal versioning to provide point-in-time correct features for training. Graph-Based Recommendation, Personalization Graphs, Session Graphs, and User Journey Graphs extend the context graph's capabilities into adaptive, user-aware retrieval.
Finally, Context Graph Observability, which monitors infrastructure, data quality, and AI output quality simultaneously, guards against silent data failure, the failure mode that poses the most dangerous threat to context graph trustworthiness.
Chapter 21: Complete!
You now understand the full data engineering stack that keeps context graphs accurate, fresh, and observable. The final chapter — Chapter 22 — explores security and vector search: how to protect sensitive decision data with fine-grained access control, how to combine vector similarity search with graph traversal for the most powerful retrieval patterns, and how to evaluate and monitor the security posture of a production context graph deployment. The last piece of the architecture. Let's trace the security boundary!