Introduction to Graph Databases

Credits: 3 Length: 14 Weeks Level: Undergraduate (Junior/Senior) or Graduate Introductory Level Prerequisites:

Prior coursework in databases or data modeling (recommended)
Basic programming knowledge (Python, JavaScript, or similar)
Familiarity with data structures (arrays, hash maps, trees)

Course Overview

This course introduces students to graph databases as powerful tools for representing, querying, and analyzing highly connected information. Students learn why traditional relational databases struggle with modern, relationship-heavy data and how Labeled Property Graph (LPG) databases treat relationships as first-class citizens with relationaship types, attributes, directionality, and semantics.

We begin by contrasting the architectural foundations of RDBMS vs. NoSQL systems, explore the design motivations behind graph data models, and introduce the formal elements of LPGs: nodes, edges, properties, labels, and schema options. Students then gain hands-on experience modeling and querying real-world graphs using languages such as openCypher, or GSQL (depending on instructor preference).

The course emphasizes building real applications: social networks, recommendation engines, fraud detection pipelines, supply-chain models, knowledge graphs, bill-of-materials (BOM), and healthcare data modeling. Students practice evaluating when to choose graph data models, how to optimize them, how to measure performance, and how to design graph schemas aligned with real business domains.

The capstone project involves building an end-to-end graph application using an LPG graph database.

Sample Outline (14 Weeks)

Week 1 – Introduction to Graph Thinking

Why data modeling matters in our AI-driven world
The importance of world-models
Knowledge representation strategy
Six major representations of data
RDBMS vs. OLAP vs. NoSQL
When graphs outperform tables
Edges are a first class citizen
LPG: The most maintainable information model
Case Study: Neo4j
Timeline of Graph Database

Week 2 – NoSQL and the Rise of Graphs

Key-value, document, wide-column, and graph stores
Tradeoff analysis (model precision, flexibility, scaling)
Representations of knowledge
The Knowledge Triangle
Single server graphs
Distributed graphs
Case Study: TigerGraph

Week 3 – Labeled Property Graph (LPG) Information Model

Nodes, edges, labels, properties
Representing metadata
Open vs. Closed World Models
Schema-optional vs. schema-enforced modeling
Tools to view graph data models
Adding rules to graphs
Validating documents
Validating graphs

Week 4 – Query Languages for Graphs

openCypher
GSQL - using the map-reduce pattern on a distributed cluster
Accumulators - keeping queries short
Path patterns, hops, aggregations
GQL - the emerging standard for advanced query languages

Week 5 – Index-Free Adjacency & Performance

Traversal fundamentals
Constant-time neighbor access
Cost comparison: joins vs. traversals
Hop count
MicroSim: Chart: Comparing Multi-hop performance on RDBMS vs. Graph
Degree of a node
Indegree
Outdegree
Edges per node ratios
Indexes
Vector indexes
Graph metrics
Statistical Query Tuning

Week 6 – Benchmarking Techniques

Why benchmarking is critical to promoting graphs
Graph benchmarking is difficult
Synthetic benchmarks
Single node benchmarks
Multi-node benchmarks
Predicting the future value of insights
LDBC SNB benchmark
Graph 500 rankings
Measuring graph performance
Query latency, throughput, and scalability
Case Study - six degrees of separation
Case Study - Graph 500 rankings
Case Study - Healthcare Operations

Friend graphs
Modeling human resources
Case Study: The Org Chart and Skill Management
Diagram: Org Chart Models
Influence graphs
Modeling with edges as first-class citizens
Extending your model
Adding discussions
Adding natural language language processing
Adding products to your graph
Adding sentiment to your graph
Detecting bad fake accounts
Case Study: Assigning Tasks from the Backlog

Week 8 – Knowledge Representation with Concept Graphs**

Concept dependency graphs
Curriculum graphs
Ontology-connected graph structures
The Simple Knowledge Organization System (SKOS)
Preferred Labels and Alternate Labels
The Acronym List
The Glossary
The Controlled Vocabulary
The Taxonomy
The Ontology
Modeling Enterprise Knowledge
Modeling Department Knowledge
Modeling Project Knowledge
Case Study - Extracting Acton Items from Call Transcripts
Modeling Personal Knowledge
Notetaking
Personal Knowledge Graphs
Knowledge Capture
Tacit Knowledge and Codifiable Knowledge,
Enterprise Knowledge Management

Week 9 - Graph Algorithms

Search
Breath First Search (BFS)
Depth First Search (DFS)
A-Star (A*)
Pathfinding
Traveling Salesman
PageRank
Community detection
Graph Neural Networks
Data Science toolkits

Week 9 – Graph Modeling Patterns**

Subgraphs
Supernode vs. anti-pattern nodes
Hyperedges, multi-edges
Time-based modeling patterns
Time Trees
Modeling Internet of Things Events
Modeling Rules and Decision Trees
Bitemporal Graph Models (Advanced Topic)
Graph quality metrics

Weeks 10 and 11 – Industry Reference Data Models

Web storefront graph model
Product catalogs
Bill-of-Materials (BOM) and complex parts
Supply chain modeling
Modeling financial transactions
Fraud detection graphs
Highly Regulated Industries
Anti-Money Laundering (AML)
Know Your Customer (KYC)
Account-network traversal
Provider/patient graphs
Electronic health record modeling
IT asset and dependency graphs
Graph analytics vs. transactional graph queries
Graph embeddings (introduction)

Weeks 12, 13 and 14 – Capstone Projects and Presentations

Students present a full graph application
Modeling choices, data loading, queries, and performance measurements

Sample of Concepts Covered**

NoSQL Databases
Six Representations of Data
RDBMS vs. Graph Databases
OLAP vs. OLTP Workloads
Key-Value Stores
Document Databases
Graph Stores
Tradeoff Analysis / CAP
Representations of Knowledge
Concept Graphs
Index-Free Adjacency
Performance & Benchmarking
Web Storefront Modeling
Learning Management System Modeling
Curriculum & Course Dependency Graphs
Healthcare Data Graphs
IT Asset & Dependency Graphs
Financial Transaction Graphs
Fraud Detection Graphs
Complex Parts & BOM Graphs
Supply Chain Models
Graph Modeling Anti-Patterns
Graph Algorithms
openCypher / GSQL querying
Data loading pipelines
Best practices for graph schema design

Topics Not Covered

How neural networks work
Deep learning
Complex statistics
Details of how other databases work

Learning Objectives

Organized by Bloom’s Taxonomy – 2001 Revision

Below are the learning objectives grouped by Remember → Understand → Apply → Analyze → Evaluate → Create.

1. Remember (Factual Knowledge)

Students will be able to:

Define key terms such as node, edge, property, label, schema-optional, and index-free adjacency.
List the major categories of NoSQL systems.
Identify the components of an LPG information model.
Recall common graph query languages (openCypher, GSQL, Gremlin).

2. Understand (Conceptual Knowledge)

Students will be able to:

Explain why traditional RDBMS systems struggle with highly connected data.
Describe the tradeoffs among key-value, document, and graph stores.
Summarize how graph queries locate patterns more naturally than SQL joins.
Explain how concept dependency graphs represent knowledge structures.
Compare various real-world graph models (social, supply chain, healthcare, etc.).

3. Apply (Procedural Knowledge)

Students will be able to:

Construct simple LPG models using nodes, edges, and properties.
Write openCypher or GSQL queries to retrieve and aggregate graph data.
Load data sets into a graph database using CSV or ETL pipelines.
Implement graph traversal queries that compute multi-hop patterns.
Use performance measurement tools to benchmark graph workloads.

4. Analyze (Breakdown & Structure)

Students will be able to:

Differentiate between good and bad graph modeling choices.
Decompose a domain into entities, relationships, and multi-edge structures.
Examine performance logs to identify bottlenecks in graph queries.
Analyze alternative graph schema representations for a given domain.
Map complex business processes into multi-layered graph models (e.g., supply chain, IT dependency graph).

5. Evaluate (Judgment & Critique)

Students will be able to:

Justify when a graph database is more appropriate than an RDBMS or document store.
Evaluate competing graph schema designs for clarity, scalability, and performance.
Critique query patterns for correctness, efficiency, and maintainability.
Assess the appropriateness of chosen benchmarks and workload profiles.
Defend the modeling decisions used in their capstone project.

6. Create (Synthesis & Design)

Students will be able to:

Design a complete LPG schema for a complex domain (healthcare, finance, supply chain, etc.).
Develop multi-step graph queries supporting application requirements.
Create an end-to-end graph system including ETL, schema, queries, and visualizations.
Build and present a capstone graph application grounded in real-world data.
Propose design improvements using graph algorithms or structural pattern refinements.

Introduction to Graph Databases

Course Overview

Sample Outline (14 Weeks)

Week 1 – Introduction to Graph Thinking

Week 2 – NoSQL and the Rise of Graphs

Week 3 – Labeled Property Graph (LPG) Information Model

Week 4 – Query Languages for Graphs

Week 5 – Index-Free Adjacency & Performance

Week 6 – Benchmarking Techniques

Week 7 – Modeling Social Networks and Language

Week 8 – Knowledge Representation with Concept Graphs**

Week 9 - Graph Algorithms

Week 9 – Graph Modeling Patterns**

Weeks 10 and 11 – Industry Reference Data Models

Weeks 12, 13 and 14 – Capstone Projects and Presentations

Sample of Concepts Covered**

Topics Not Covered

Learning Objectives

1. Remember (Factual Knowledge)

2. Understand (Conceptual Knowledge)

3. Apply (Procedural Knowledge)

4. Analyze (Breakdown & Structure)

5. Evaluate (Judgment & Critique)

6. Create (Synthesis & Design)