Quiz: Modeling Patterns and Data Loading
Test your understanding of graph modeling patterns, anti-patterns, data loading strategies, and schema evolution.
1. What is a subgraph?
- A. A graph below sea level
- B. A portion of a larger graph containing a subset of nodes and edges, often extracted for focused analysis
- C. A type of submarine
- D. A backup copy
The correct answer is B. A subgraph is a portion of a larger graph containing subsets of nodes and edges, extracted for analysis, visualization, or processing. For example, extracting customers who purchased in the last month creates a focused subgraph for analyzing recent buying patterns without processing the entire customer base.
Concept Tested: Subgraphs
See: Graph Patterns
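A minimal Cypher sketch of extracting such a subgraph, assuming a hypothetical model with Person and Order nodes, PLACED edges, and an order date stored as a Cypher date value:

```cypher
// Pull the subgraph of customers with a purchase in the last month.
// Assumes o.date is a Cypher date (an assumption for this sketch).
MATCH (p:Person)-[r:PLACED]->(o:Order)
WHERE o.date >= date() - duration({months: 1})
RETURN p, r, o
```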
2. What is time-based modeling and why is it important?
- A. Setting database clocks
- B. Techniques for representing temporal aspects like valid times, transaction times, and time-varying relationships in graph data
- C. Measuring query speed
- D. Scheduling backups
The correct answer is B. Time-based modeling represents temporal aspects of data: when facts are valid (valid-time), when they were recorded (transaction-time), and how relationships change over time. For example, modeling job history: (Person)-[:WORKED_AT {start: "2018-01", end: "2022-06"}]->(Company) captures employment duration. This enables historical queries and temporal analysis.
Concept Tested: Time-Based Modeling
See: Temporal Patterns
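Expanding the job-history example into a runnable sketch (hypothetical Person/Company labels, with start and end months stored as sortable "YYYY-MM" strings):

```cypher
// Record a job with its validity interval on the relationship itself.
MERGE (p:Person {name: 'Ada'})
MERGE (c:Company {name: 'Acme'})
MERGE (p)-[:WORKED_AT {start: '2018-01', end: '2022-06'}]->(c);

// Temporal query: who worked at Acme at any point during 2020?
MATCH (p:Person)-[w:WORKED_AT]->(:Company {name: 'Acme'})
WHERE w.start <= '2020-12' AND (w.end IS NULL OR w.end >= '2020-01')
RETURN p.name, w.start, w.end;
```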
3. What is an ETL pipeline in the context of graph databases?
- A. A type of graph algorithm
- B. Extract, Transform, Load processes that move data from sources, convert formats, and load into graph databases
- C. A visualization tool
- D. A backup strategy
The correct answer is B. ETL (Extract, Transform, Load) pipelines extract data from source systems (relational databases, APIs, files), transform it to graph format (mapping tables to nodes, foreign keys to edges), and load it into the graph database. For example: extract customer data from CRM, transform to graph format, load as Person nodes and relationship edges.
Concept Tested: ETL Pipelines
See: Data Loading
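A sketch of the load step only (extraction and transformation happen outside the database), assuming the CRM rows have already been transformed into maps and passed in as a hypothetical $rows parameter:

```cypher
// Load phase of an ETL pipeline: upsert each transformed CRM row
// as a Person node keyed on the source system's customer_id.
UNWIND $rows AS row
MERGE (p:Person {id: row.customer_id})
SET p.name = row.name, p.email = row.email
```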
4. What makes bulk loading different from incremental loading?
- A. They are the same
- B. Bulk loading imports large volumes in single operations for initial population, while incremental loading adds data in small batches continuously
- C. Bulk loading is always slower
- D. Incremental loading only works with small databases
The correct answer is B. Bulk loading imports large data volumes (millions of records) in optimized single operations, ideal for initial database population. Incremental loading adds new data in small batches or continuously as it arrives, updating the graph with daily transactions, new users, or streaming data. Both are important for different lifecycle stages.
Concept Tested: Bulk Loading, Incremental Loading
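A sketch of both styles in Neo4j-flavored Cypher (the CALL ... IN TRANSACTIONS batching syntax assumes Neo4j 4.4 or later; file, parameter, and property names are hypothetical):

```cypher
// Bulk load: initial population, committed in batches of 10,000 rows.
LOAD CSV WITH HEADERS FROM 'file:///customers.csv' AS row
CALL {
  WITH row
  CREATE (:Person {id: row.id, name: row.name})
} IN TRANSACTIONS OF 10000 ROWS;

// Incremental load: MERGE keeps the nightly delta batch idempotent.
UNWIND $newCustomers AS row
MERGE (p:Person {id: row.id})
SET p.name = row.name;
```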
5. Why should you avoid creating supernodes in graph models?
- A. Supernodes are good for performance
- B. Nodes with millions of connections create performance bottlenecks because traversals must process all of their edges
- C. Supernodes use too much disk space
- D. Databases cannot store supernodes
The correct answer is B. Supernodes (nodes with millions of connections) create performance bottlenecks because any traversal from that node must process all its edges. For example, connecting all US customers to a single "USA" node means every query touching that node processes millions of edges. The solution is hierarchical modeling, such as City→State→Country, which distributes connections across intermediate nodes (see the sketch below).
Concept Tested: Supernodes, Anti-Patterns
See: Anti-Patterns
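A sketch of the hierarchical fix, using hypothetical City/State/Country labels:

```cypher
// Attach people to cities, not directly to a 'USA' supernode;
// country-level queries roll up through the IN hierarchy instead.
MERGE (usa:Country {code: 'USA'})
MERGE (oh:State {code: 'OH'})-[:IN]->(usa)
MERGE (cle:City {name: 'Cleveland'})-[:IN]->(oh)
MERGE (p:Person {id: 42})
MERGE (p)-[:LIVES_IN]->(cle)
```

City- and state-level queries no longer touch the country node at all, and the country node's direct degree drops from millions of customers to roughly fifty states.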
6. What is schema evolution and why does it matter?
- A. Schemas cannot change
- B. The process of modifying database schemas over time while preserving existing data and maintaining compatibility
- C. Deleting old schemas
- D. A type of graph algorithm
The correct answer is B. Schema evolution is modifying database schemas over time (adding node types, edge types, properties) while preserving existing data. Graph databases support additive evolution gracefully: adding Address nodes and LIVES_AT edges doesn't disrupt existing Person nodes. Schema-optional modeling makes evolution easier than in rigid relational schemas, which require formal migrations.
Concept Tested: Schema Evolution
See: Schema Management
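A sketch of additive evolution (the Address properties here are hypothetical):

```cypher
// A new node type and edge type added alongside existing data;
// existing Person nodes and the queries that use them are untouched.
MATCH (p:Person {id: 42})
MERGE (a:Address {street: '10 Main St', city: 'Cleveland'})
MERGE (p)-[:LIVES_AT {since: '2023-05'}]->(a)
```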
7. Given a scenario where you need to migrate customer and order data from relational tables to a graph, how would you structure the transformation?
- A. Copy tables directly without changes
- B. Convert the Customer and Order tables to node types and foreign keys to edges: (Customer)-[:PLACED]->(Order); map the Order_Items join table to (Order)-[:CONTAINS]->(Product) edges
- C. Delete the relational data
- D. Don't migrate
The correct answer is B. Relational-to-graph migration follows a standard mapping: tables → node types, foreign keys → edges, join tables → edges or intermediate nodes. The Customer and Order tables become nodes, the customer_id foreign key becomes a (Customer)-[:PLACED]->(Order) edge, and the Order_Items join table becomes (Order)-[:CONTAINS {quantity: 2}]->(Product) with quantity as an edge property.
Concept Tested: Data Migration, ETL Pipelines
See: Migration Patterns
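A sketch of the load side of that transformation, assuming the extracted relational rows arrive as hypothetical $orders and $orderItems parameters:

```cypher
// customer_id foreign key  →  PLACED edge
UNWIND $orders AS row
MATCH (c:Customer {id: row.customer_id}), (o:Order {id: row.order_id})
MERGE (c)-[:PLACED]->(o);

// Order_Items join-table row  →  CONTAINS edge, quantity as a property
UNWIND $orderItems AS row
MATCH (o:Order {id: row.order_id}), (p:Product {id: row.product_id})
MERGE (o)-[r:CONTAINS]->(p)
SET r.quantity = row.quantity;
```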
8. What are time trees and when are they useful?
- A. Trees that grow over time
- B. Hierarchical graph structures organizing time-based data (year→month→day→hour) enabling efficient temporal queries
- C. A scheduling algorithm
- D. A type of index
The correct answer is B. Time trees organize temporal data in hierarchical structures: Year→Month→Day→Hour nodes connected by edges. Events connect to appropriate time nodes, enabling efficient queries like "all events in March 2024" without scanning all timestamps. This pattern is common for event logging, IoT data, and historical analysis.
Concept Tested: Time Trees, Time-Based Modeling
See: Temporal Patterns
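A sketch of building one branch of a time tree and querying through it (labels and property names are hypothetical):

```cypher
// Build (or reuse) the path for 15 March 2024 and attach an event to it.
MERGE (y:Year {value: 2024})
MERGE (m:Month {value: 3})-[:IN]->(y)
MERGE (d:Day {value: 15})-[:IN]->(m)
MERGE (e:Event {id: 'evt-1'})
MERGE (e)-[:OCCURRED_ON]->(d);

// "All events in March 2024" walks the tree; no timestamp scan needed.
MATCH (e:Event)-[:OCCURRED_ON]->(:Day)-[:IN]->(:Month {value: 3})-[:IN]->(:Year {value: 2024})
RETURN e;
```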
9. How does CSV import typically work for graph databases?
- A. CSV cannot be imported
- B. Mapping CSV columns to node properties and relationship properties, with separate files for nodes and edges
- C. CSV files replace the graph
- D. Only Excel files work
The correct answer is B. CSV import maps columns to graph elements: one CSV for nodes (columns become properties), another for edges (source ID, target ID, edge type, properties). For example, customers.csv creates Person nodes with properties from columns, while orders.csv creates PURCHASED edges referencing customer and product IDs. Most graph databases provide optimized CSV import tools.
Concept Tested: CSV Import, Data Loading
See: Data Loading Methods
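A sketch of the two-file pattern using Neo4j's LOAD CSV (file names, headers, and labels are hypothetical):

```cypher
// customers.csv (id,name,email): one row per Person node.
LOAD CSV WITH HEADERS FROM 'file:///customers.csv' AS row
CREATE (:Person {id: row.id, name: row.name, email: row.email});

// purchases.csv (customer_id,product_id,date): one row per PURCHASED edge.
LOAD CSV WITH HEADERS FROM 'file:///purchases.csv' AS row
MATCH (p:Person {id: row.customer_id}), (pr:Product {id: row.product_id})
CREATE (p)-[:PURCHASED {date: row.date}]->(pr);
```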
10. Why is data migration from relational to graph often valuable despite the effort?
- A. It's not valuable
- B. Relationship-heavy queries become exponentially faster, multi-hop traversals become practical, and schema flexibility enables agile development
- C. It only works for small datasets
- D. Relational databases are always better
The correct answer is B. Migration to graphs yields dramatic benefits for relationship-heavy applications: queries requiring multiple self-joins in relational systems (friends-of-friends, supply chain paths, fraud rings) become simple, efficient traversals; multi-hop analysis becomes practical; and schema flexibility supports agile development. For connected-data use cases, performance improvements of 100x to 1,000x are common, justifying the migration effort.
Concept Tested: Data Migration, Tradeoff Analysis
See: Migration Decisions
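For a flavor of the query-side payoff, a friends-of-friends lookup that would take chained self-joins in SQL is a single variable-length pattern in Cypher (the FRIEND edge type here is hypothetical):

```cypher
// People two to three hops away from person 42, excluding person 42.
MATCH (me:Person {id: 42})-[:FRIEND*2..3]-(fof:Person)
WHERE fof <> me
RETURN DISTINCT fof.name
```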
Quiz Complete!
Questions: 10
Cognitive Levels: Remember (2), Understand (4), Apply (2), Analyze (2)
Concepts Covered: Subgraphs, Anti-Patterns, Supernodes, Time-Based Modeling, Time Trees, Schema Evolution, ETL Pipelines, Data Loading, Bulk Loading, Incremental Loading, Data Migration, CSV Import
Next Steps:
- Review Chapter Content for modeling best practices
- Practice designing data migration strategies
- Continue to Chapter 10: Commerce, Supply Chain, and IT