Skip to content

Labeled Property Graph Information Model

Summary

This chapter introduces the Labeled Property Graph (LPG) information model, the foundation of modern graph databases. You'll learn how nodes, edges, properties, and labels work together to create expressive, flexible data models where relationships are first-class citizens. The chapter covers both schema-optional and schema-enforced approaches, explores index-free adjacency for performance, and introduces fundamental graph operations including traversal, pattern matching, and multi-hop queries.

Concepts Covered

This chapter covers the following 23 concepts from the learning graph:

  1. Labeled Property Graph
  2. Nodes
  3. Edges
  4. Properties
  5. Labels
  6. Schema-Optional Modeling
  7. Schema-Enforced Modeling
  8. Index-Free Adjacency
  9. Traversal
  10. Graph Query
  11. Pattern Matching
  12. Multi-Hop Queries
  13. Aggregation
  14. Path Patterns
  15. Constant-Time Neighbor Access
  16. First-Class Relationships
  17. Edge Direction
  18. Graph Data Model
  19. Graph Schema
  20. Metadata Representation
  21. Graph Validation
  22. Document Validation
  23. Rule Systems

Prerequisites

This chapter builds on concepts from:


Welcome to Graph Land: Where Everything Connects

Okay, deep breath. This chapter introduces a bunch of new concepts that might feel weird at first—and that's totally normal! If you've spent any time with traditional databases (or even if you haven't), the Labeled Property Graph model is going to ask you to think differently about data. Some of these ideas will click immediately, others might take a few read-throughs. That's not just okay—it's expected.

Here's the good news: the concepts we're covering aren't actually complicated; they're just different. And once they click (which they will, with a bit of repetition), you'll wonder why anyone ever thought tables were a good way to represent connected information.

We're going to introduce a lot of vocabulary in this chapter—23 concepts to be exact. Don't panic! Many of them build on each other, and we'll revisit the same ideas multiple times from different angles. By the end, terms like "index-free adjacency" and "first-class relationships" will feel as natural as "nodes" and "edges."

Ready? Let's dive in. And remember: if something doesn't make sense the first time, keep reading. It will.

The Labeled Property Graph: The Whole Enchilada

Let's start with the big picture. A Labeled Property Graph (LPG) is the data model that most modern graph databases use. Think of it as the "rules of the game" for how information gets structured and stored.

Don't worry about memorizing that definition. What matters is understanding the four building blocks that make up an LPG:

  1. Nodes - The "things" in your data (people, products, accounts, locations)
  2. Edges - The connections between things (relationships like FRIEND_OF, PURCHASED, DEPENDS_ON)
  3. Properties - The attributes or details (name, age, price, date)
  4. Labels - The categories that organize nodes and edges (Person, Product, PURCHASED)

Here's the simplest possible graph to illustrate:

1
2
3
(Alice:Person {age: 28})
    -[:FRIEND_OF {since: 2020}]->
(Bob:Person {age: 35})

Let's break this down piece by piece: - Alice is a node (a thing that exists) - :Person is a label (Alice belongs to the "Person" category) - {age: 28} is a property (a detail about Alice) - -[:FRIEND_OF {since: 2020}]-> is an edge (a connection from Alice to Bob) - :FRIEND_OF is the edge's label (the type of relationship) - {since: 2020} is the edge's property (when they became friends) - The arrow -> shows direction (Alice is friends with Bob)

If that feels like a lot, don't stress. We're going to explore each piece in detail, and you'll see the same concepts repeated in different examples. By the tenth example, this notation will feel completely natural.

Nodes: The Stars of the Show

Nodes (also called vertices if you want to sound fancy) represent the entities in your graph—the people, places, things, concepts, or events that you care about. If your graph is a social network, nodes are users. If it's a supply chain, nodes are products, warehouses, and vendors. If it's a knowledge graph, nodes are concepts and facts.

Think of nodes as the nouns in your data's story.

What makes a node?

  1. A unique identity: Every node is distinct, even if two nodes have the same properties
  2. Optional properties: Nodes can have attributes (name, age, email) or none at all
  3. Optional labels: Nodes can belong to one or more categories
  4. Connections: Nodes are connected to other nodes via edges

Let's look at some concrete examples:

Social network nodes:

1
2
3
(user1:Person {name: "Alice", email: "alice@example.com", joined: "2020-01-15"})
(user2:Person {name: "Bob", email: "bob@example.com", joined: "2019-06-22"})
(post1:Post {content: "Loving graph databases!", timestamp: "2025-01-18T10:30:00"})

E-commerce nodes:

1
2
3
(product1:Product {name: "Laptop", price: 899.99, sku: "LAP-001"})
(category1:Category {name: "Electronics", description: "Electronic devices"})
(vendor1:Vendor {name: "TechCorp", country: "USA"})

Notice a few things: - Different nodes can have completely different properties - Node identifiers (user1, product1) are internal; the actual data is in the properties - Labels (Person, Product, Category) help organize nodes into types

The abstract concept: A node represents a discrete entity in your domain. Each node is a standalone unit of information that can be connected to other units.

The concrete analogy: Think of nodes like index cards in a massive library card catalog. Each card represents one distinct thing—a book, an author, a subject. The card might have details written on it (properties), and it might be filed in multiple categories (labels). But fundamentally, it's a single, identifiable item.

(See? Same concept, two different explanations. This is that repetition thing we mentioned!)

Edges: Where the Magic Happens

Here's where graph databases get interesting. Edges (also called relationships or links) connect nodes to represent how things relate to each other. Unlike traditional databases where relationships are implied through foreign keys and JOIN operations, edges are first-class citizens—they're real, tangible things you can see, query, and give properties to.

This is kind of a big deal. Let me say it again: relationships in a graph database are first-class citizens. They're not hidden in junction tables or implied by matching IDs. They're explicit, named, directional connections with their own identity and attributes.

What makes an edge?

  1. Two nodes: An edge connects exactly two nodes (a "from" node and a "to" node)
  2. Direction: Edges point from one node to another (though you can traverse them in either direction)
  3. A type/label: Every edge has a name describing the relationship (FRIEND_OF, PURCHASED, DEPENDS_ON)
  4. Optional properties: Edges can have attributes just like nodes

Social network edges:

1
2
3
(Alice)-[:FRIEND_OF {since: "2020-05-12", closeness: 0.8}]->(Bob)
(Alice)-[:POSTED {timestamp: "2025-01-18T10:30:00"}]->(post1)
(Bob)-[:LIKED {timestamp: "2025-01-18T10:35:00"}]->(post1)

E-commerce edges:

1
2
3
(Alice)-[:PURCHASED {date: "2025-01-15", quantity: 1, price: 899.99}]->(product1)
(product1)-[:IN_CATEGORY]->(category1)
(product1)-[:MANUFACTURED_BY]->(vendor1)

Do you see what's happening? The edges aren't just connections—they tell a story. Alice became friends with Bob on a specific date. Alice purchased a product on a specific date for a specific price. Each edge carries meaning and context.

First-Class Relationships: Why This Matters

When we say relationships are first-class, we mean they're treated as important as the entities they connect. In a relational database, you'd have:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
-- Separate tables, relationships implied by foreign keys
Users table: (id, name, email)
Friendships table: (user1_id, user2_id, since)

-- To find friends, you JOIN tables
SELECT u2.*
FROM Users u1
JOIN Friendships f ON u1.id = f.user1_id
JOIN Users u2 ON f.user2_id = u2.id
WHERE u1.name = 'Alice';

In a graph database, you have:

1
2
3
// Relationships are explicit, queryable entities
MATCH (alice:Person {name: 'Alice'})-[:FRIEND_OF]->(friend)
RETURN friend;

The difference isn't just syntax—it's architectural. Graph databases store edges as physical pointers between nodes, making traversal instant. We'll talk more about why that matters when we get to "index-free adjacency" (don't worry, we'll explain that one!).

Edge Direction: Following the Arrows

Every edge has a direction, shown by the arrow in our notation: ->. But here's the cool part: you can traverse an edge in either direction when querying, regardless of how it's stored.

1
(Alice)-[:FRIEND_OF]->(Bob)  // Stored with direction

You can query: - "Who are Alice's friends?" (follow the arrow forward) - "Who is friends with Bob?" (follow the arrow backward) - "Are Alice and Bob friends?" (check for a connection in either direction)

Direction matters semantically (Alice purchased a product is different from a product purchased Alice), but query-wise, you're flexible. This will make more sense when we get to actual queries later in the chapter.

The abstract concept: Edges reify relationships—they make connections between entities tangible, queryable, and capable of carrying their own information.

The concrete analogy: Think of edges like labeled arrows drawn between items on a whiteboard. If you're diagramming your company's org chart, edges are the lines connecting "Manager" to "Employee" with labels like "MANAGES" or "REPORTS_TO." Each line is a real thing you can point to and say "this specific connection exists and has these properties."

Properties: The Details That Matter

Both nodes and edges can have properties—key-value pairs that store actual data about the entity or relationship. Properties are how you record specific details like names, dates, prices, or any other attribute you care about.

Node properties:

1
2
3
4
5
6
7
(Alice:Person {
  name: "Alice Johnson",
  email: "alice@example.com",
  age: 28,
  city: "Seattle",
  premium: true
})

Edge properties:

1
2
3
4
5
6
(Alice)-[:PURCHASED {
  date: "2025-01-15",
  price: 899.99,
  quantity: 1,
  payment_method: "credit_card"
}]->(laptop)

Properties are simple key-value pairs. The key is a string (like "name" or "price"), and the value can be a string, number, boolean, date, or even a list of values.

Different nodes can have different properties:

1
2
3
(Alice:Person {name: "Alice", age: 28, city: "Seattle"})
(Bob:Person {name: "Bob", age: 35, occupation: "Engineer"})
(Charlie:Person {name: "Charlie", email: "charlie@example.com"})

Notice how: - Alice has age and city - Bob has age and occupation - Charlie has email but not age or city

This is totally fine! One person can have different attributes than another person. This flexibility is part of what makes graphs powerful—you don't need to define every possible field upfront or fill in NULL values for missing data.

The abstract concept: Properties add specificity to otherwise generic entities and relationships. They transform "a person" into "Alice Johnson, age 28, from Seattle."

The concrete analogy: Properties are like the information you'd write on a business card or profile. The card is the node (the physical thing representing you), and the properties are all the details printed on it—name, title, phone number, email, LinkedIn URL. Different cards might have different details depending on context.

Labels: Organizing the Chaos

Labels are categories or types that help organize nodes and edges. Think of labels as tags that say "this is a Person" or "this is a Product" or "this is a FRIEND_OF relationship."

Nodes and edges can have: - No labels (though this is rare and usually not useful) - One label (most common) - Multiple labels (for things that fit multiple categories)

Node labels:

1
2
3
(alice:Person)                          // One label
(bob:Person:Employee)                    // Multiple labels
(acmeWidget:Product:PhysicalGood)        // Multiple labels

Bob is both a Person AND an Employee. That widget is both a Product AND a PhysicalGood. Labels let you query "give me all Employees" or "give me all PhysicalGoods" without caring about their other labels.

Edge labels (relationship types):

1
2
3
(Alice)-[:FRIEND_OF]->(Bob)
(Alice)-[:PURCHASED]->(laptop)
(Alice)-[:WORKS_AT]->(company)

Edge labels (the part after the colon in the square brackets) describe the type of relationship. Unlike node labels, edges typically have just one label/type because a relationship usually represents one specific kind of connection.

Why labels matter:

  1. Querying: You can filter by label ("find all Person nodes" or "find all PURCHASED edges")
  2. Performance: Databases can index by label for faster lookups
  3. Semantics: Labels make your graph self-documenting—you can look at the structure and understand what everything means

The abstract concept: Labels provide categorical information that transcends individual properties. They answer "what kind of thing is this?" rather than "what specific details does this have?"

The concrete analogy: Labels are like the sections in a library—Fiction, Non-Fiction, Reference, Children's. A book might belong to multiple sections (a graphic novel might be both Fiction and Children's), but the labels help you navigate and filter the collection.

Okay, pause. How are we doing? We've covered the four fundamental building blocks of a Labeled Property Graph: 1. ✅ Nodes (entities) 2. ✅ Edges (relationships) 3. ✅ Properties (attributes) 4. ✅ Labels (categories)

If those don't feel 100% solid yet, don't worry. We're going to see them in action throughout the rest of this chapter, and the repetition will solidify the concepts. Let's keep going!

Putting It All Together: The Graph Data Model

Now that we've met all the pieces, let's see how they combine into a complete graph data model. A graph data model describes the structure of your data—what types of nodes exist, what types of edges connect them, what properties each might have.

Here's a small social network graph model:

Node types: - Person: Properties: name, email, age, city - Post: Properties: content, timestamp, likes_count - Company: Properties: name, industry, founded

Edge types: - FRIEND_OF: Connects Person to Person, properties: since, closeness - POSTED: Connects Person to Post, properties: timestamp - LIKED: Connects Person to Post, properties: timestamp - WORKS_AT: Connects Person to Company, properties: title, start_date

Example data following this model:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
(alice:Person {name: "Alice", age: 28, city: "Seattle"})
(bob:Person {name: "Bob", age: 35, city: "Portland"})
(post1:Post {content: "Graph databases are cool!", timestamp: "2025-01-18T10:00:00"})
(techcorp:Company {name: "TechCorp", industry: "Software", founded: 1995})

(alice)-[:FRIEND_OF {since: "2020-05-12"}]->(bob)
(alice)-[:POSTED {timestamp: "2025-01-18T10:00:00"}]->(post1)
(bob)-[:LIKED {timestamp: "2025-01-18T10:05:00"}]->(post1)
(alice)-[:WORKS_AT {title: "Engineer", start_date: "2022-03-01"}]->(techcorp)
(bob)-[:WORKS_AT {title: "Manager", start_date: "2019-06-15"}]->(techcorp)

This data model is flexible but structured. You know what types of things can exist and how they can connect, but you have freedom within those constraints. Alice doesn't need to have exactly the same properties as Bob, and you can add new nodes and edges that follow the model as your data grows.

The beauty of the graph data model: It mirrors how you think about the world. You don't think "Alice is row 42 in the Users table with a foreign key to row 17 in the Posts table." You think "Alice is a person who posted something and works at TechCorp." The graph model reflects that natural mental model.

Schema-Optional vs. Schema-Enforced: Choose Your Adventure

Here's where graph databases get really flexible. You can choose between two approaches:

Schema-Optional Modeling

Schema-optional (sometimes called schema-free or schema-less) means you don't define structure upfront. You just start creating nodes and edges, and the graph adapts to whatever you throw at it.

Pros: - Agile development: Start coding immediately, figure out structure as you go - Evolutionary design: Add new properties or node types without migrations - Heterogeneous data: Different nodes of the same label can have different properties - Fast iteration: Change your mind? Just start storing different fields!

Cons: - Inconsistency risk: Nothing prevents typos (is it "email" or "e-mail"?) - Data quality issues: No guarantee all Person nodes have required fields - Application complexity: Your code must handle missing or unexpected properties - Harder to document: What properties should a Person have?

Example:

1
2
3
4
// No schema defined, just create whatever you want
CREATE (alice:Person {name: "Alice", age: 28, email: "alice@example.com"})
CREATE (bob:Person {name: "Bob", occupation: "Engineer"})  // Different properties!
CREATE (charlie:Person {name: "Charlie", age: "thirty"})   // Age is a string! (Probably a mistake)

Schema-Enforced Modeling

Schema-enforced (sometimes called schema-constrained) means you define rules upfront: what properties are required, what types they must be, what edges are allowed. The database enforces these rules.

Pros: - Data quality: Required fields must be present, types must match - Consistency: All Person nodes have the same structure - Documentation: Schema serves as spec for what data looks like - Validation: Errors caught on write, not discovered later during queries

Cons: - Upfront design: Must think through structure before coding - Migration overhead: Changing schema requires careful planning - Less flexibility: Can't easily add one-off properties - Some databases lack support: Not all graph databases offer schema enforcement

Example:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
// Schema definition (pseudo-code, syntax varies by database)
DEFINE NODE Person {
  PROPERTIES {
    name: STRING (required),
    age: INTEGER (required, min: 0, max: 150),
    email: STRING (required, format: email)
  }
}

// Now creation follows schema
CREATE (alice:Person {name: "Alice", age: 28, email: "alice@example.com"})  // ✅ Valid
CREATE (bob:Person {name: "Bob"})  // ❌ Error: missing required properties
CREATE (charlie:Person {name: "Charlie", age: "thirty"})  // ❌ Error: age must be INTEGER

Which Approach Should You Use?

It depends on your use case:

  • Schema-optional: Prototyping, evolving domains, heterogeneous data, document-like flexibility
  • Schema-enforced: Production systems, financial data, regulated industries, team coordination

Many graph databases let you mix approaches—enforce schemas for critical data (Person must have name and email) while allowing flexibility elsewhere (Person can optionally have any additional properties).

The key takeaway: graph databases give you the choice. RDBMS forces schema-first; document databases force schema-less; graphs let you decide what makes sense for your data.

Index-Free Adjacency: The Performance Secret

Okay, this is where things get a bit technical, but stick with me—this concept explains why graph databases are so fast at traversing relationships.

Index-free adjacency means that each node physically stores references (pointers) to its directly connected nodes. When you ask "What are Alice's friends?", the database doesn't search an index or scan a table—it follows the pointers stored in Alice's node directly to Bob, Charlie, and Diana.

How it works (simplified):

Imagine each node is a filing cabinet drawer. Inside Alice's drawer, there are literal pointers (references) to all the nodes she's connected to: - FRIEND_OF → [pointer to Bob, pointer to Charlie, pointer to Diana] - WORKS_AT → [pointer to TechCorp] - POSTED → [pointer to Post1, pointer to Post2]

When you query "Who are Alice's friends?", the database: 1. Finds Alice's node (one index lookup: O(log n)) 2. Looks inside Alice's node at the FRIEND_OF pointers 3. Follows those pointers directly to Bob, Charlie, Diana (three pointer dereferences: O(1) each)

Total time: O(log n) + O(3) ≈ constant time for practical purposes, regardless of how many total nodes exist in the database.

Contrast with RDBMS (index-based):

In a relational database: 1. Find Alice in Users table (index lookup: O(log n)) 2. Scan Friendships table for all rows where user1_id = Alice's ID (full table scan or index range scan: O(m) where m = number of friendships) 3. For each friendship, look up the friend in Users table (m index lookups: O(m log n))

As the number of friendships grows, performance degrades linearly or worse.

Why "index-free"?

Traditional databases use indexes—separate data structures that map values to locations. Finding Alice's friends requires looking up Alice's ID in a Friendships index, then following those references.

Graph databases don't need this intermediate step. The adjacency information is built into the node itself. Hence: index-free adjacency.

The abstract concept: By storing adjacency information (which nodes connect to which) directly in the nodes themselves, graph databases achieve constant-time neighbor access.

The concrete analogy: Imagine a massive hotel where every room has a list pinned to its door of all rooms it's connected to via hallways. If you're in room 42 and want to know which rooms you can reach directly, you just read the list on your door—instant answer. You don't need to consult a central directory or search floor plans. That's index-free adjacency.

Constant-Time Neighbor Access

This is the payoff of index-free adjacency. Constant-time neighbor access means that finding a node's direct neighbors takes the same amount of time regardless of: - How many total nodes exist in the graph - How many total edges exist in the graph - How connected other parts of the graph are

Whether your graph has 100 nodes or 100 million nodes, finding Alice's friends takes the same amount of time: find Alice (O(log n)), follow pointers (O(1) per friend).

This is why graph databases can handle multi-hop queries efficiently—each hop is constant-time, so three hops is 3× constant time, not exponential like with JOINs.

(Yes, we're repeating the performance story from earlier chapters. That's intentional! Repetition with new context helps solidify understanding.)

Traversal: Walking the Graph

Now that we know how graphs are structured and why they're fast, let's talk about actually using them. Traversal is the process of walking through a graph, following edges from node to node.

Think of traversal like following a trail through a forest, where each node is a waypoint and each edge is a path connecting waypoints. You start at one node and follow edges to reach other nodes.

Simple traversal example:

1
2
3
4
Start at Alice
→ Follow FRIEND_OF edge to Bob
→ Follow FRIEND_OF edge from Bob to Charlie
→ Follow WORKS_AT edge from Charlie to TechCorp

Traversals can be: - Single-hop: Follow one edge (Alice's direct friends) - Multi-hop: Follow multiple edges (friends of Alice's friends) - Filtered: Only follow certain edge types (FRIEND_OF but not WORKS_AT) - Conditional: Follow edges only if they meet criteria (friendships since 2020) - Depth-limited: Stop after N hops - Shortest path: Find the shortest route between two nodes

Traversal is fundamental to graphs. Almost every graph query involves some form of traversal—starting at one or more nodes and exploring outward along edges.

Graph Query: Asking Questions of Your Data

A graph query is a question you ask your data by specifying patterns to match and conditions to filter. Graph query languages (like Cypher, GSQL, or Gremlin) let you express complex traversals and pattern matching in readable code.

Example queries (in Cypher syntax):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
// Find Alice's direct friends
MATCH (alice:Person {name: "Alice"})-[:FRIEND_OF]->(friend)
RETURN friend.name;

// Find friends of Alice's friends (2-hop)
MATCH (alice:Person {name: "Alice"})-[:FRIEND_OF*2]-(friendOfFriend)
RETURN friendOfFriend.name;

// Find people who work at the same company as Alice
MATCH (alice:Person {name: "Alice"})-[:WORKS_AT]->(company)<-[:WORKS_AT]-(coworker)
RETURN coworker.name;

// Find posts liked by Alice's friends
MATCH (alice:Person {name: "Alice"})-[:FRIEND_OF]->(friend)-[:LIKED]->(post)
RETURN post.content, friend.name;

Don't worry if the syntax looks unfamiliar. The key point is that graph queries express patterns visually: - (alice:Person {name: "Alice"}) - Find a Person node named Alice - -[:FRIEND_OF]-> - Follow a FRIEND_OF edge - (friend) - To another node (call it "friend")

Graph queries feel more like describing what you're looking for ("people Alice is friends with") than instructing the database how to find it (JOIN this table to that table on this key).

Pattern Matching: Finding Shapes in the Graph

Pattern matching is the heart of graph querying. Instead of specifying "scan this table, join to that table," you describe a pattern—a shape or structure—and the database finds all instances of that pattern in the graph.

Patterns can be:

1. Simple paths:

1
2
// Alice → friend
(alice)-[:FRIEND_OF]->(friend)

2. Multi-hop paths:

1
2
// Alice → friend → friend of friend
(alice)-[:FRIEND_OF]->(friend)-[:FRIEND_OF]->(fof)

3. Complex shapes:

1
2
// Alice and Bob both like the same post
(alice:Person)-[:LIKED]->(post)<-[:LIKED]-(bob:Person)

4. Variable-length paths:

1
2
// Anyone connected to Alice via 1-3 FRIEND_OF hops
(alice)-[:FRIEND_OF*1..3]-(connected)

5. Triangles, cycles, and other structures:

1
2
// Find triangles: A → B → C → A
(a)-[:FRIEND_OF]->(b)-[:FRIEND_OF]->(c)-[:FRIEND_OF]->(a)

Pattern matching is declarative: you describe what you want, and the query planner figures out the efficient way to find it. This is similar to SQL's declarative nature, but pattern syntax is more intuitive for relationship queries.

The abstract concept: Pattern matching treats subgraphs as first-class entities you can search for, like using Ctrl+F to find text in a document, but for graph structures instead of strings.

The concrete analogy: Pattern matching is like describing a constellation to a stargazing app: "Find me three bright stars forming a triangle with a dimmer star in the middle." The app searches the sky and highlights all instances of that pattern. Similarly, graph pattern matching searches your data and returns all matching structures.

Multi-Hop Queries: Going Deep

Multi-hop queries traverse multiple edges in sequence, like following a chain: Alice → Bob → Charlie → Diana. Each "hop" is one edge traversal.

We've seen this before (remember the performance cliff from Chapters 1 and 2?), but let's revisit it with our new vocabulary.

1-hop query:

1
2
3
// Alice's direct friends
MATCH (alice:Person {name: "Alice"})-[:FRIEND_OF]->(friend)
RETURN friend;

2-hop query:

1
2
3
// Friends of Alice's friends
MATCH (alice:Person {name: "Alice"})-[:FRIEND_OF]->()-[:FRIEND_OF]->(fof)
RETURN fof;

3-hop query:

1
2
3
// Friends of friends of friends
MATCH (alice:Person {name: "Alice"})-[:FRIEND_OF*3]-(third_degree)
RETURN third_degree;

Variable-length multi-hop:

1
2
3
// Anyone connected to Alice via 1-5 friendship hops
MATCH (alice:Person {name: "Alice"})-[:FRIEND_OF*1..5]-(connected)
RETURN connected;

Why multi-hop queries matter:

In social networks: "Show me friends of friends for recommendations" In supply chains: "What happens if this supplier fails?" (5+ hop impact analysis) In fraud detection: "Find accounts within 3 hops of this suspicious account" In knowledge graphs: "How is concept A related to concept Z?"

Multi-hop queries are where graph databases shine. Thanks to index-free adjacency and constant-time neighbor access, these queries run in milliseconds even with millions of nodes, while the equivalent SQL queries would timeout or take hours.

(There's that repetition again. Starting to feel familiar, right?)

Path Patterns: Expressing Complex Routes

Path patterns are a way to specify sequences of nodes and edges with varying levels of specificity. They're the building blocks of graph queries.

Types of path patterns:

1. Fixed-length paths:

1
(a)-[:FRIEND_OF]->(b)-[:FRIEND_OF]->(c)  // Exactly 2 hops

2. Variable-length paths:

1
2
(a)-[:FRIEND_OF*1..5]->(connected)  // 1 to 5 hops
(a)-[:FRIEND_OF*]-(connected)       // Any number of hops (use carefully!)

3. Mixed edge types:

1
(alice)-[:FRIEND_OF]->(friend)-[:WORKS_AT]->(company)  // Friend who works at a company

4. Undirected paths:

1
(alice)-[:FRIEND_OF]-(connected)  // Friends in either direction

5. Shortest path:

1
2
MATCH p = shortestPath((alice)-[:FRIEND_OF*]-(bob))
RETURN p;  // Find shortest route from Alice to Bob

Path patterns let you express complex relationship queries concisely. Instead of writing nested loops and JOIN logic, you just describe the path structure you're looking for.

Aggregation: Computing Over Results

Like SQL, graph query languages support aggregation—computing statistics, sums, averages, counts over query results.

Common aggregations:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
// Count Alice's friends
MATCH (alice:Person {name: "Alice"})-[:FRIEND_OF]->(friend)
RETURN count(friend);

// Average age of Alice's friends
MATCH (alice:Person {name: "Alice"})-[:FRIEND_OF]->(friend)
RETURN avg(friend.age);

// Total amount spent by Alice
MATCH (alice:Person {name: "Alice"})-[p:PURCHASED]->(product)
RETURN sum(p.price);

// Most popular posts (by likes)
MATCH (post:Post)<-[like:LIKED]-()
RETURN post.content, count(like) AS like_count
ORDER BY like_count DESC
LIMIT 10;

Aggregations work after traversals and pattern matches, letting you compute metrics over the results. This combines the power of graph traversal with the analytical capabilities of SQL.

Graph Schema: Optional Structure

A graph schema defines the expected structure of your graph—what node labels exist, what edge types connect them, what properties each has. As we discussed earlier, schemas can be enforced or just documented.

Example schema definition (pseudo-code):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
NODE LABELS:
  Person {name: string, age: integer, email: string}
  Post {content: string, timestamp: datetime}
  Company {name: string, industry: string}

EDGE TYPES:
  FRIEND_OF: Person → Person {since: date}
  POSTED: Person → Post {timestamp: datetime}
  LIKED: Person → Post {timestamp: datetime}
  WORKS_AT: Person → Company {title: string, start_date: date}

Why schemas are helpful even if not enforced:

  • Documentation: New developers can see expected structure
  • Tooling: Graph visualization tools can use schema to render nicely
  • Validation: Application code can validate before inserting
  • Query optimization: Database can optimize queries based on known structure

Some graph databases (like Neo4j with APOC procedures or TigerGraph with GSQL) let you define and enforce schemas. Others (like Neo4j's core) are schema-optional but let you create constraints (e.g., "every Person must have a unique email").

Metadata Representation: Data About Data

Metadata is data about your data. In graphs, metadata can exist at multiple levels:

Node metadata:

1
2
3
4
5
6
7
(user:Person {
  name: "Alice",
  created_at: "2020-01-15T10:00:00",
  created_by: "admin_user",
  last_modified: "2025-01-18T14:30:00",
  version: 3
})

Edge metadata:

1
2
3
4
5
6
(alice)-[:FRIEND_OF {
  since: "2020-05-12",
  confirmed_by: "alice",
  confidence: 0.95,
  source: "facebook_import"
}]->(bob)

Graph-level metadata: Some systems support graph-wide metadata (when the database was created, who owns it, what application version uses it).

Metadata is just properties, but with a special purpose: tracking provenance, versioning, audit trails, data quality metrics, or other information about how the data came to be.

Graph Validation, Document Validation, and Rule Systems

As graphs grow, you might want to enforce quality and consistency. This is where validation and rules come in.

Graph Validation

Graph validation checks that your graph adheres to structural rules:

  • "Every Person node must have a name property"
  • "Every PURCHASED edge must connect a Person to a Product (not to another Person)"
  • "No FRIEND_OF edges can form a loop (person can't be their own friend)"

Some databases support validation constraints natively:

1
2
3
4
5
6
// Neo4j constraint examples
CREATE CONSTRAINT person_name IF NOT EXISTS
FOR (p:Person) REQUIRE p.name IS NOT NULL;

CREATE CONSTRAINT person_email_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.email IS UNIQUE;

Document Validation

Document validation treats nodes like JSON documents and validates their structure:

1
2
3
4
5
6
7
// Validation schema (conceptual)
PersonSchema = {
  name: { type: "string", required: true },
  age: { type: "integer", min: 0, max: 150 },
  email: { type: "string", format: "email", required: true },
  friends: { type: "array", items: { type: "reference", to: "Person" } }
}

This is similar to JSON Schema or MongoDB's document validation, applied to graph nodes.

Rule Systems

Rule systems let you define application logic that executes when data changes:

Example rules:

  • "When a PURCHASED edge is created, increment the product's sales_count property"
  • "When two people become FRIEND_OF each other, create a MUTUAL_FRIENDS edge"
  • "When a person's age reaches 18, add the Adult label"

Some graph databases support rules/triggers directly; others require application-level implementation.

These systems help maintain data integrity, enforce business logic, and automate derived data updates.

Bringing It All Together: A Complete Example

Let's pull everything together with a comprehensive example that uses all the concepts we've covered.

Scenario: A simplified LinkedIn-like professional network

Graph data model:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
NODES:
  Person {name, email, age, city, headline}
  Company {name, industry, size, founded}
  Skill {name, category}
  Post {content, timestamp}

EDGES:
  FRIEND_OF: Person → Person {since, closeness}
  WORKS_AT: Person → Company {title, start_date, current}
  HAS_SKILL: Person → Skill {proficiency, years_experience}
  POSTED: Person → Post {timestamp}
  LIKED: Person → Post {timestamp}
  ENDORSED: Person → Person {skill, timestamp} // Person endorses Person for a skill

Sample data:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
// Create nodes
CREATE (alice:Person {name: "Alice Johnson", email: "alice@example.com", age: 28, city: "Seattle", headline: "Software Engineer"})
CREATE (bob:Person {name: "Bob Smith", email: "bob@example.com", age: 35, city: "Portland", headline: "Engineering Manager"})
CREATE (techcorp:Company {name: "TechCorp", industry: "Software", size: 500, founded: 1995})
CREATE (python:Skill {name: "Python", category: "Programming"})
CREATE (leadership:Skill {name: "Leadership", category: "Soft Skills"})
CREATE (post1:Post {content: "Excited to share our new product launch!", timestamp: "2025-01-18T10:00:00"})

// Create edges
CREATE (alice)-[:FRIEND_OF {since: "2020-05-12", closeness: 0.8}]->(bob)
CREATE (alice)-[:WORKS_AT {title: "Software Engineer", start_date: "2022-03-01", current: true}]->(techcorp)
CREATE (bob)-[:WORKS_AT {title: "Engineering Manager", start_date: "2019-06-15", current: true}]->(techcorp)
CREATE (alice)-[:HAS_SKILL {proficiency: 9, years_experience: 6}]->(python)
CREATE (bob)-[:HAS_SKILL {proficiency: 7, years_experience: 3}]->(python)
CREATE (bob)-[:HAS_SKILL {proficiency: 8, years_experience: 10}]->(leadership)
CREATE (bob)-[:POSTED {timestamp: "2025-01-18T10:00:00"}]->(post1)
CREATE (alice)-[:LIKED {timestamp: "2025-01-18T10:05:00"}]->(post1)
CREATE (alice)-[:ENDORSED {skill: "Leadership", timestamp: "2025-01-15"}]->(bob)

Interesting queries:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// 1. Find Alice's coworkers
MATCH (alice:Person {name: "Alice Johnson"})-[:WORKS_AT]->(company)<-[:WORKS_AT]-(coworker)
WHERE coworker <> alice
RETURN coworker.name, coworker.headline;

// 2. Find people Alice knows who have Python skills
MATCH (alice:Person {name: "Alice Johnson"})-[:FRIEND_OF]-(friend)-[:HAS_SKILL]->(skill:Skill {name: "Python"})
RETURN friend.name, skill.name;

// 3. Find posts from Alice's network (friends and coworkers)
MATCH (alice:Person {name: "Alice Johnson"})
MATCH (alice)-[:FRIEND_OF|WORKS_AT*1..2]-(connection)-[:POSTED]->(post)
RETURN post.content, connection.name, post.timestamp
ORDER BY post.timestamp DESC;

// 4. Skill recommendations: Skills that Alice's coworkers have but Alice doesn't
MATCH (alice:Person {name: "Alice Johnson"})-[:WORKS_AT]->(company)<-[:WORKS_AT]-(coworker)
MATCH (coworker)-[:HAS_SKILL]->(skill)
WHERE NOT (alice)-[:HAS_SKILL]->(skill)
RETURN skill.name, count(coworker) AS coworker_count
ORDER BY coworker_count DESC;

// 5. Find people within Alice's 2nd-degree network in Seattle
MATCH (alice:Person {name: "Alice Johnson"})-[:FRIEND_OF*1..2]-(connection:Person {city: "Seattle"})
RETURN DISTINCT connection.name, connection.headline;

Do you see how all the concepts come together? We have: - ✅ Nodes (Person, Company, Skill, Post) - ✅ Edges (FRIEND_OF, WORKS_AT, HAS_SKILL, POSTED, LIKED, ENDORSED) - ✅ Properties (on both nodes and edges) - ✅ Labels (organizing nodes and edge types) - ✅ Traversal (following edges from Alice outward) - ✅ Pattern matching (finding specific structures like "coworkers") - ✅ Multi-hop queries (friends of friends, 1-2 degrees) - ✅ Aggregation (counting skills, ordering by timestamp)

This is a real graph data model! You could build an actual application on this structure.

Taking a Breath: What We've Covered

Okay, that was a lot. Let's recap the major concepts:

Core building blocks: 1. ✅ Labeled Property Graph - The overall model 2. ✅ Nodes - Entities (things that exist) 3. ✅ Edges - Relationships (connections between things) 4. ✅ Properties - Attributes (details about nodes and edges) 5. ✅ Labels - Categories (organizing nodes and edges into types)

Advanced concepts: 6. ✅ First-Class Relationships - Edges are real, queryable entities 7. ✅ Edge Direction - Relationships have direction but can be traversed either way 8. ✅ Index-Free Adjacency - How graphs achieve constant-time neighbor access 9. ✅ Constant-Time Neighbor Access - Why graphs are fast 10. ✅ Graph Data Model - The structure describing your graph 11. ✅ Schema-Optional Modeling - No schema required, flexible structure 12. ✅ Schema-Enforced Modeling - Schema required, guaranteed consistency 13. ✅ Graph Schema - Defining expected structure 14. ✅ Traversal - Walking through the graph 15. ✅ Graph Query - Asking questions of your data 16. ✅ Pattern Matching - Finding structures/shapes in the graph 17. ✅ Path Patterns - Expressing complex routes 18. ✅ Multi-Hop Queries - Traversing multiple edges 19. ✅ Aggregation - Computing statistics over results 20. ✅ Metadata Representation - Data about data 21. ✅ Graph Validation - Enforcing structural rules 22. ✅ Document Validation - Validating node/edge properties 23. ✅ Rule Systems - Automating logic and maintaining integrity

All 23 concepts, covered!

If your head is spinning, that's completely normal. You've just been introduced to an entire data model in one chapter. The key isn't to memorize every definition—it's to understand the big picture:

Graphs represent connected data naturally using nodes (things) and edges (connections), with properties (details) and labels (categories). This structure enables fast traversal and intuitive querying through pattern matching. Schemas are optional, giving you flexibility to choose between agility and consistency. The result: a database that mirrors how you think about relationships.

Building Confidence Through Repetition

Remember how we promised repetition would help these concepts click? Here's the same model explained three different ways:

Version 1 (Concrete analogy): Imagine a corkboard full of index cards (nodes) connected by colored strings (edges). Each card has information written on it (properties) and a category sticker (label). You can trace along the strings to see how cards connect (traversal), and you can search for specific patterns like "cards connected by red strings" (pattern matching).

Version 2 (Abstract model): A labeled property graph consists of vertices (V) and edges (E), where E ⊆ V × V, with both vertices and edges possessing arbitrary key-value properties (P) and categorical labels (L). Traversal operations navigate E relationships in O(1) time via index-free adjacency, enabling efficient pattern matching across variable-length paths.

Version 3 (Practical example): Think about Facebook: your profile is a node, your friendship with someone is an edge, your name and age are properties, and "Person" is a label. When you view a friend's profile, the database doesn't search a table—it follows a direct pointer from your node to theirs. When you see "people you may know," that's a 2-hop traversal (friends of your friends) running in milliseconds.

Same concepts, three angles. Which explanation resonates with you? Probably one more than the others, and that's fine—people learn differently. The important thing is that the core ideas are getting reinforced.

Moving Forward

In the next chapter, we'll dive into query languages—specifically Cypher and GSQL—and you'll get hands-on experience writing real graph queries. All the concepts we introduced here (nodes, edges, traversal, pattern matching) will show up again in concrete query syntax.

And you know what? When you see (alice:Person)-[:FRIEND_OF]->(bob) in the next chapter, it won't feel weird anymore. You'll think "Okay, that's a Person node named alice, connected by a FRIEND_OF edge to another Person node named bob." It'll feel natural.

That's the power of repetition and varied examples. The concepts we introduced here will keep showing up throughout the rest of this course, and each time you'll understand them a little better.

Key Takeaways

  1. The Labeled Property Graph model has four components: nodes (entities), edges (relationships), properties (attributes), and labels (categories)

  2. Relationships are first-class citizens in graph databases—they're real, queryable entities with their own properties and identities

  3. Index-free adjacency means nodes store direct pointers to connected nodes, enabling constant-time neighbor access regardless of graph size

  4. Schema flexibility lets you choose: schema-optional for agility and evolution, or schema-enforced for consistency and data quality

  5. Traversal and pattern matching make graph queries intuitive—you describe the structure you're looking for, not how to find it

  6. Multi-hop queries are where graphs shine, maintaining performance even at 5+ hops while RDBMS queries timeout

  7. Validation and rules help maintain data quality as graphs grow, though approaches vary by database

Most importantly: Don't worry if this all feels overwhelming. These concepts will appear again and again throughout this course, in queries, in examples, in case studies. Each repetition will strengthen your understanding. By the end of the course, you'll be thinking in graphs naturally.

Ready for the next step? Chapter 4 awaits, where all this theory becomes practice through actual query languages.


Graph databases ask you to think differently about data, but that difference is actually more natural—it's how you already think about the connected world around you. The learning curve isn't steep; it's just unfamiliar. Give it time, and it'll click.