Performance, Metrics, and Benchmarking
Summary
This chapter explores the performance characteristics that make graph databases excel at relationship-heavy workloads. You'll learn about index-free adjacency's constant-time neighbor access, understand key performance metrics like hop count and node degree, and master indexing strategies including vector indexes and composite indexes. The chapter covers industry-standard benchmarks like LDBC SNB and Graph 500, teaching you how to measure query latency, throughput, and scalability while comparing traversal costs against traditional join operations.
Concepts Covered
This chapter covers the following 20 concepts from the learning graph:
- Hop Count
- Degree of Node
- Indegree
- Outdegree
- Edge-to-Node Ratio
- Graph Indexes
- Vector Indexes
- Full-Text Search
- Composite Indexes
- Graph Metrics
- Performance Benchmarking
- Synthetic Benchmarks
- Single-Node Benchmarks
- Multi-Node Benchmarks
- LDBC SNB Benchmark
- Graph 500
- Query Cost Analysis
- Join Operations
- Traversal Cost
- Scalability
Prerequisites
This chapter builds on concepts from:
Show Me the Numbers: Why Skepticism Makes You a Better Engineer
Let's get something straight: If your colleagues are skeptical when you propose using a graph database, that's a good sign. It means you work with real engineers who don't just chase the latest trend or trust vendor marketing. Skepticism is the foundation of good engineering.
You've probably heard the claims: "Graph databases are 1000× faster for connected data!" "Traversals that take minutes in SQL run in milliseconds!" "This will revolutionize our data layer!"
And you should absolutely, positively, 100% question those claims.
Here's the thing though: good engineers don't just stay skeptical—they test. They measure. They benchmark. They gather data. And then they make informed decisions based on evidence, not hype.
This chapter is about becoming that engineer. The one who walks into a meeting with actual benchmark results. The one who can say, "I tested this on our data model, and here's what I found." The one whose opinion carries weight because it's backed by numbers, not vendor promises.
When you propose a graph database (or any technology) and someone says, "Prove it," you'll be able to say, "I did. Here are the benchmarks." That's how you build a reputation as someone who evaluates objectively, measures rigorously, and makes data-driven decisions.
Your skeptical colleagues will respect that. And honestly? You should be skeptical too. Trust, but verify. Or better yet: verify, then trust.
Ready to learn how to benchmark graph databases properly? Let's dive into the metrics that matter, the tests that prove (or disprove) the claims, and how to build your reputation as the engineer who brings receipts.
Graph Metrics: The Numbers That Tell the Story
Before we can benchmark anything, we need to understand what to measure. Graph databases have some unique performance characteristics tied to graph structure. Let's break down the key metrics.
Hop Count: Measuring Relationship Distance
Hop count is the number of edges you traverse to get from one node to another. It's one of the most important performance metrics for graph databases.
Why it matters: In graph databases, query performance is heavily influenced by how many hops you need to traverse. Remember the performance cliff from Chapter 1? That's hop count in action.
Examples:
- 1-hop: Alice's direct friends → (alice)-[:FRIEND_OF]->(friend)
- 2-hop: Friends of friends → (alice)-[:FRIEND_OF]->()-[:FRIEND_OF]->(fof)
- 3-hop: Friends of friends of friends → (alice)-[:FRIEND_OF*3]->(connection)
Performance relationship:
- Graph databases: Linear or near-linear scaling with hop count
- RDBMS with JOINs: Exponential degradation with hop count
Measuring hop count in queries:
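Here's a minimal sketch, assuming `Person` nodes connected by `FRIEND_OF` relationships: run the same query at increasing hop depths and time each one.

```cypher
// 1 hop: direct friends
MATCH (alice:Person {name: "Alice"})-[:FRIEND_OF]->(friend)
RETURN count(friend);

// 3 hops: widen the variable-length pattern and re-measure
MATCH (alice:Person {name: "Alice"})-[:FRIEND_OF*3]->(connection)
RETURN count(DISTINCT connection);
```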
Benchmark insight: When you benchmark, always measure performance across different hop counts (1, 2, 3, 4, 5). This shows whether the database maintains linear performance or hits the exponential cliff.
Degree of Node: Measuring Connectivity
Degree is the number of relationships connected to a node. High-degree nodes (hubs) can significantly impact query performance.
Why it matters: Traversing from a high-degree node can be expensive if you're following all relationships. A celebrity with 10 million followers is a very different query than a regular user with 200 friends.
Types of degree:
Degree (total):
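A minimal sketch, assuming a `Person` node identified by name:

```cypher
// Count all relationships touching the node, regardless of direction
MATCH (p:Person {name: "Alice"})-[r]-()
RETURN count(r) AS degree;
```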
Use case: Understanding overall connectivity, finding hub nodes, identifying influencers
Indegree: Incoming Relationships
Indegree is the number of incoming relationships (edges pointing to the node).
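A sketch (same assumptions as above), counting only incoming relationships:

```cypher
MATCH (p:Person {name: "Alice"})<-[r]-()
RETURN count(r) AS indegree;
```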
Examples:
- Social media: Follower count (indegree) vs following count (outdegree)
- Citation graphs: How many papers cite this paper (indegree)
- Supply chain: How many suppliers feed into this factory (indegree)
Benchmark insight: High-indegree nodes can slow down certain queries. When benchmarking, test queries that traverse from high-indegree nodes to see if performance degrades.
Outdegree: Outgoing Relationships
Outdegree is the number of outgoing relationships (edges pointing from the node).
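A sketch (same assumptions), counting only outgoing relationships:

```cypher
MATCH (p:Person {name: "Alice"})-[r]->()
RETURN count(r) AS outdegree;
```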
Examples:
- Social media: Following count
- Citation graphs: How many papers this paper cites
- Supply chain: How many customers this supplier serves
Performance consideration: Expanding from a high-outdegree node (e.g., a user who follows 50,000 people) can be slow if you need to process all relationships.
Degree Distribution: The Shape of Your Graph
Real-world graphs often follow a power-law distribution: most nodes have low degree, a few nodes have very high degree (the "hubs").
Why this matters for benchmarking:
- Queries that hit low-degree nodes perform differently than queries hitting hubs
- Benchmarks should test both average-degree and high-degree scenarios
- Your synthetic test data should match the degree distribution of your real data
Finding high-degree nodes:
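A sketch that ranks `Person` nodes (an illustrative label) by total degree:

```cypher
MATCH (p:Person)-[r]-()
RETURN p.name AS name, count(r) AS degree
ORDER BY degree DESC
LIMIT 10;
```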
Benchmark best practice: When generating synthetic data, make sure degree distribution matches production. If your production graph has power-law distribution, your benchmark should too.
Edge-to-Node Ratio: Graph Density
Edge-to-node ratio measures how densely connected your graph is.
Formula: E/N = (number of edges) / (number of nodes)
Examples:
- Sparse graph: E/N = 2 (average node has 2 connections)
- Medium density: E/N = 10 (average node has 10 connections)
- Dense graph: E/N = 100 (average node has 100 connections)
Why it matters: Dense graphs behave differently than sparse graphs. Variable-length path queries on dense graphs can explode combinatorially.
Measuring your graph:
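A sketch that computes the ratio directly (note that this scans every node and relationship, so expect it to be slow on very large graphs):

```cypher
MATCH (n)
WITH count(n) AS nodeCount
MATCH ()-[r]->()
WITH nodeCount, count(r) AS edgeCount
RETURN nodeCount,
       edgeCount,
       toFloat(edgeCount) / nodeCount AS edgeToNodeRatio;
```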
Benchmark insight: Always report edge-to-node ratio when publishing benchmark results. A benchmark on a sparse graph (E/N = 2) vs dense graph (E/N = 50) tells very different stories.
Graph Indexes: Making Queries Fast
Indexes are critical for graph database performance. You can't benchmark properly without understanding how indexes work.
Graph Indexes: Finding Starting Points
Graph indexes help you find nodes quickly to start your traversal. Without indexes, you're scanning every node—slow and painful.
Why they matter: Most Cypher queries start with a MATCH that finds specific nodes:
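For example (the `Person` label and name value are illustrative):

```cypher
MATCH (p:Person {name: "Alice"})
RETURN p;
```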
That {name: "Alice"} lookup needs an index, or the database scans all Person nodes.
Creating indexes in Neo4j:
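A sketch of typical index DDL, using Neo4j 4.4+ syntax; the index names, labels, and properties are illustrative:

```cypher
// Index on a single property (speeds up the name lookup above)
CREATE INDEX person_name IF NOT EXISTS
FOR (p:Person) ON (p.name);

// Uniqueness constraint (also backed by an index)
CREATE CONSTRAINT person_email_unique IF NOT EXISTS
FOR (p:Person) REQUIRE p.email IS UNIQUE;
```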
Benchmark impact:
- Without index: Query scans all nodes → O(n) time
- With index: Query uses index seek → O(log n) time

On a 10-million-node graph:
- Without index: ~5 seconds
- With index: ~5 milliseconds
That's a 1000× difference!
Benchmark best practice: When comparing graph databases to RDBMS, make sure both have appropriate indexes. An unfair comparison (indexed RDBMS vs non-indexed graph DB) proves nothing.
Vector Indexes: Similarity Search at Scale
Vector indexes enable similarity search and nearest-neighbor queries, crucial for recommendation systems and AI applications.
What they do: Store high-dimensional vectors (embeddings) and enable fast similarity search.
Example use case:
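A sketch using Neo4j 5's vector index support; the index name, `Product` label, embedding dimensions, and `$queryEmbedding` parameter are all illustrative:

```cypher
// Create a vector index over a node property that holds an embedding
CREATE VECTOR INDEX product_embeddings IF NOT EXISTS
FOR (p:Product) ON (p.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 384,
  `vector.similarity_function`: 'cosine'
}};

// Find the 10 most similar products to a query embedding
CALL db.index.vector.queryNodes('product_embeddings', 10, $queryEmbedding)
YIELD node, score
RETURN node.name, score;
```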
Why this matters for graph databases: Modern applications combine graph relationships with vector similarity. You might:
- Find friends who like similar content (graph + vectors)
- Recommend products based on purchase patterns AND product similarity
- Detect fraud rings using network structure AND behavioral embeddings
Benchmark consideration: If your use case involves similarity search, benchmark vector index performance alongside graph traversal. Many graph databases now support vector indexes natively (Neo4j, Amazon Neptune, TigerGraph).
Full-Text Search: Finding Nodes by Content
Full-text search indexes allow searching node properties using text matching, wildcards, and fuzzy search.
Creating full-text index:
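A sketch, assuming `Ticket` nodes with `subject` and `body` properties (Neo4j 4.3+ syntax):

```cypher
CREATE FULLTEXT INDEX ticketText IF NOT EXISTS
FOR (t:Ticket) ON EACH [t.subject, t.body];
```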
Using full-text search:
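A sketch of querying that index with the built-in `db.index.fulltext.queryNodes` procedure; the fuzzy search term matches the support-ticket scenario below:

```cypher
CALL db.index.fulltext.queryNodes("ticketText", "defect~")
YIELD node, score
RETURN node.subject, score
ORDER BY score DESC
LIMIT 10;
```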
Benchmark scenario: If your application needs text search (e.g., "find all customers who mentioned 'defect' in support tickets"), include full-text queries in your benchmark suite.
Composite Indexes: Multi-Property Lookups
Composite indexes index multiple properties together, enabling fast lookups on property combinations.
Example:
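A sketch, assuming queries frequently filter `Person` nodes on both `city` and `age` (values are illustrative):

```cypher
// Composite index over two properties
CREATE INDEX person_city_age IF NOT EXISTS
FOR (p:Person) ON (p.city, p.age);

// A query that can use both parts of the index
MATCH (p:Person)
WHERE p.city = "Berlin" AND p.age = 34
RETURN p.name;
```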
Why composite indexes matter: Single-property indexes only help when filtering on that property. If you filter on city AND age, a composite index can use both properties for the lookup.
Benchmark insight: When benchmarking, test realistic query patterns. If your application frequently filters on multiple properties, composite indexes can dramatically improve performance.
Performance Benchmarking: Measuring What Matters
Now we get to the heart of this chapter: actually measuring performance. This is where you stop trusting vendor claims and start generating your own data.
Why Benchmark?
Good reasons to benchmark:
1. Verify vendor claims: Is it really 1000× faster? Let's find out.
2. Compare alternatives: RDBMS vs Graph DB vs Document DB: which fits your data?
3. Capacity planning: How many QPS can this system handle?
4. Optimize queries: Which query pattern is fastest?
5. Build credibility: Walk into meetings with data, not opinions
Bad reasons to benchmark:
1. ❌ "I want graphs to win" (confirmation bias)
2. ❌ "I need to justify a decision I already made" (motivated reasoning)
3. ❌ "The vendor showed me a benchmark" (trusting without verifying)
The mindset: A good benchmark should be designed to find the truth, even if that truth is "graph databases aren't right for this use case."
What to Measure
Key performance metrics:
- Query latency: How long does one query take?
  - Median (p50): Typical case
  - 95th percentile (p95): Most users' experience
  - 99th percentile (p99): Worst typical case
  - Max: Absolute worst case
- Query throughput: How many queries per second (QPS) can the system handle?
  - Under light load (10 concurrent users)
  - Under realistic load (100 concurrent users)
  - Under stress (1000+ concurrent users)
- Scalability: How does performance change as data grows?
  - 1 million nodes
  - 10 million nodes
  - 100 million nodes
  - 1 billion nodes
- Resource usage:
  - CPU utilization
  - Memory consumption
  - Disk I/O
  - Network bandwidth (for distributed systems)
Measuring latency in Cypher:
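A sketch: prefixing a query with `PROFILE` executes it and reports db hits and timing per operator (the pattern below is illustrative):

```cypher
PROFILE
MATCH (alice:Person {name: "Alice"})-[:FRIEND_OF*2]->(fof)
RETURN count(DISTINCT fof);
```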
External benchmarking: For accurate measurements, use external tools that measure end-to-end latency including network overhead:
- Apache JMeter
- Gatling
- Custom scripts with time measurements
Synthetic Benchmarks: Controlled Testing
Synthetic benchmarks use generated data and predefined queries to test specific performance characteristics.
Advantages:
- Controlled: You control data size, structure, and query patterns
- Reproducible: Others can replicate your results
- Scalable: Easy to generate 1M, 10M, 100M node datasets
- Comparable: Industry-standard benchmarks let you compare across systems

Disadvantages:
- Artificial: May not reflect your real workload
- Optimizable: Vendors can tune for specific benchmarks
- Misleading: High synthetic performance doesn't guarantee real-world performance

When to use synthetic benchmarks:
- Comparing different graph databases
- Testing specific features (shortest path, variable-length paths)
- Capacity planning (how does it scale?)
- Proving/disproving specific performance claims
Example synthetic workload:
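One way to sketch this is as a weighted query mix; the percentages, labels (`Person`, `Post`), relationship types (`FRIEND_OF`, `POSTED`), and parameters are illustrative:

```cypher
// 60%: point lookup of a person by an indexed property
MATCH (p:Person {id: $personId}) RETURN p;

// 25%: 2-hop friends-of-friends count
MATCH (p:Person {id: $personId})-[:FRIEND_OF*2]->(fof)
RETURN count(DISTINCT fof);

// 10%: recent posts by direct friends
MATCH (p:Person {id: $personId})-[:FRIEND_OF]->(:Person)-[:POSTED]->(post:Post)
RETURN post ORDER BY post.createdAt DESC LIMIT 20;

// 5%: shortest path between two people
MATCH path = shortestPath(
  (a:Person {id: $personA})-[:FRIEND_OF*..6]-(b:Person {id: $personB}))
RETURN length(path);
```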
Single-Node Benchmarks: Testing One Server
Single-node benchmarks test performance on a single database server (not distributed).
What they measure:
- Raw query performance
- Index efficiency
- Memory management
- Single-machine limits

Example setup:
- Hardware: 64GB RAM, 16 CPU cores, SSD storage
- Database: Neo4j Community Edition, single instance
- Dataset: LDBC SNB SF10 (10GB scale factor)
- Workload: Interactive queries (1-7 hops)
Why single-node benchmarks matter: Most applications start with a single database server. Understanding single-node performance helps you know when you need to scale out.
Benchmark best practice: Always specify hardware specs when publishing results. "Graph DB is faster" means nothing without context. "Graph DB on 64GB machine vs PostgreSQL on 64GB machine" is actionable data.
Multi-Node Benchmarks: Testing Distributed Systems
Multi-node benchmarks test performance on distributed database clusters.
What they measure:
- Horizontal scalability (add more servers = more performance?)
- Network overhead
- Distributed query coordination
- Fault tolerance

Example setup:
- Cluster: 10 nodes × 64GB RAM each
- Database: TigerGraph distributed cluster
- Dataset: Graph 500 scale 30 (~1 billion vertices, ~17 billion edges)
- Workload: BFS, PageRank, Connected Components
Key metrics:
- Speedup: How much faster is 10 nodes vs 1 node?
  - Linear speedup: 10× faster (ideal)
  - Sub-linear: 5× faster (realistic)
  - Worse: 2× faster (poor scaling)
- Efficiency: Speedup / Number of nodes
  - 100% efficiency: 10 nodes = 10× faster
  - 50% efficiency: 10 nodes = 5× faster
  - 20% efficiency: 10 nodes = 2× faster (not good)
Why multi-node benchmarks matter: If you're considering a distributed graph database for billion-node graphs, you need to know if it actually scales or just adds operational complexity.
Industry-Standard Benchmarks: The Tests Everyone Uses
Instead of creating benchmarks from scratch, you can use industry-standard benchmarks that have been carefully designed and widely adopted.
LDBC SNB: The Social Network Benchmark
LDBC Social Network Benchmark (SNB) is the gold standard for graph database benchmarking. It simulates a social network like Facebook or LinkedIn.
What it includes:
- Data generator: Creates synthetic social network data at various scales
- Workloads:
  - Interactive: Short, frequent queries (OLTP-style)
  - Business Intelligence (BI): Complex analytical queries (OLAP-style)
  - Graph Analytics: Algorithm benchmarks
Scale factors (roughly the size of the generated data in GB):
- SF1: ~1GB of data
- SF10: ~10GB of data
- SF100: ~100GB of data
- SF1000: ~1TB of data
Example queries:
- IC1: Find friends and recent posts (1-2 hops)
- IC2: Find recent messages from friends (1 hop + filtering)
- IC3: Find friends who know specific people in specific locations (2-3 hops)
- IC13: Find the shortest path between two people
Why LDBC SNB is valuable:
1. Realistic: Based on actual social network patterns
2. Comprehensive: Tests many query types
3. Scalable: Multiple scale factors
4. Comparable: Published results from Neo4j, TigerGraph, Amazon Neptune, etc.
5. Audited: Results can be officially audited for fairness
Running LDBC SNB (at a high level):
1. Build the official LDBC data generator and generate a dataset at your chosen scale factor.
2. Load the generated data into the database under test, then create the schema and indexes the workload expects.
3. Run the LDBC interactive driver, which replays the query mix and reports latency and throughput.

The data generator, workload driver, and reference implementations are published on the LDBC GitHub organization.
Benchmark credibility: If you publish LDBC SNB results, people will take them seriously. It's the industry standard.
Graph 500: The Supercomputer Benchmark
Graph 500 is a benchmark for very large-scale graph processing, originally designed for supercomputers and HPC systems.
What it measures:
- Breadth-First Search (BFS): Starting from a source vertex, visit all reachable vertices
- Single-Source Shortest Paths (SSSP): Find shortest paths from one vertex to all others
- Performance metric: Traversed Edges Per Second (TEPS)
Scale:
- Scale 20: ~1 million vertices, ~17 million edges
- Scale 30: ~1 billion vertices, ~17 billion edges
- Scale 40: ~1 trillion vertices, ~17 trillion edges
Example problem: Generate a scale-30 graph (~1 billion vertices, ~17 billion edges), run BFS from 64 randomly chosen source vertices, validate each resulting BFS tree, and report the harmonic mean of traversed edges per second (TEPS) across the runs.
Why Graph 500 matters: It shows the absolute limits of graph processing. Top supercomputers achieve billions to trillions of TEPS. Graph databases achieve millions of TEPS, but with much more query flexibility.
When to use Graph 500:
- Testing distributed graph databases
- Comparing graph algorithms (BFS, PageRank, etc.)
- Understanding theoretical limits
- Not useful for typical CRUD operations
Benchmark insight: Graph 500 shows that specialized HPC systems are faster for pure graph algorithms, but graph databases offer rich query languages, ACID transactions, and operational features that HPC systems don't.
Query Cost Analysis: Understanding Performance Trade-offs
Let's get quantitative about why graph databases outperform RDBMS for relationship queries.
Join Operations: The RDBMS Approach
Join operations are how RDBMS systems traverse relationships. Each hop requires a JOIN.
Example: 3-hop friend query in SQL
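Here's a sketch of what that looks like, assuming illustrative `people(id, name)` and `friendships(person_id, friend_id)` tables:

```sql
-- 3 hops = 3 self-joins on the friendships table
SELECT DISTINCT f3.friend_id
FROM people p
JOIN friendships f1 ON f1.person_id = p.id
JOIN friendships f2 ON f2.person_id = f1.friend_id
JOIN friendships f3 ON f3.person_id = f2.friend_id
WHERE p.name = 'Alice'
  AND f3.friend_id <> p.id;
```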
Cost analysis:
Per JOIN operation:
1. Find rows in the first table: O(log n) with an index, O(n) without
2. For each row, look up matching rows in the second table: O(m log k), where m = rows from the first table and k = rows in the second table
3. Build the result set: O(m × r), where r = average matches per row

For 3 hops:
- First JOIN: O(n₁ log n₂)
- Second JOIN: O(intermediate₁ log n₃)
- Third JOIN: O(intermediate₂ log n₄)

Problem: Intermediate result sets grow exponentially with hops. On a social network:
- 1 hop: 200 friends (200 rows)
- 2 hops: 200 × 200 = 40,000 rows (intermediate set)
- 3 hops: 40,000 × 200 = 8,000,000 rows (intermediate set)

The database must:
- Build intermediate result sets in memory (or spill to disk, which is even slower)
- Perform nested loop joins or hash joins (both expensive)
- Filter duplicates at the end
Why it's slow: The work grows exponentially with hops, even with indexes.
Traversal Cost: The Graph Database Approach
Traversal in graph databases follows pointers directly from node to node.
Same 3-hop query in Cypher:
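As a sketch, using the same `FRIEND_OF` model as the SQL example:

```cypher
MATCH (alice:Person {name: "Alice"})-[:FRIEND_OF*3]->(fof)
RETURN DISTINCT fof;
```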
Cost analysis:
Per hop:
1. Find starting node: O(log n) with index
2. Read relationship pointers: O(1) per relationship (index-free adjacency)
3. Follow pointer to next node: O(1) per hop
4. Check filters: O(1) per node

For 3 hops with 200 friends per person:
- Find Alice: O(log n) ≈ 20 operations (with index on 1M nodes)
- Traverse 1st hop: 200 pointer lookups = 200 operations
- Traverse 2nd hop: 200 × 200 = 40,000 operations
- Traverse 3rd hop: 40,000 × 200 = 8,000,000 operations
Wait, that's the same number! Why is it faster?
The difference:
1. No intermediate result sets: Graph DBs don't build 40,000-row tables in memory
2. Pointer lookups vs index lookups: O(1) pointer dereference vs O(log n) index seek
3. Locality of reference: Graph data stored together, better cache performance
4. No JOIN overhead: No hash tables, no nested loops, no sort-merge

In practice:
- RDBMS 3-hop: 3-10 seconds (with indexes)
- Graph DB 3-hop: 10-50 milliseconds
That's 100-1000× faster.
Why? Constant factors matter. O(1) pointer lookup vs O(log n) index lookup, multiplied by millions of operations, creates huge differences.
Comparative Benchmark: RDBMS vs Graph DB
Let's look at real benchmark numbers (these are representative of published results):
Dataset: Social network, 10 million people, 100 million friendships
Query: Find friends up to N hops away
| Hops | RDBMS (PostgreSQL) | Graph DB (Neo4j) | Speedup |
|---|---|---|---|
| 1 | 12 ms | 5 ms | 2.4× |
| 2 | 185 ms | 7 ms | 26× |
| 3 | 3,400 ms | 11 ms | 309× |
| 4 | 58,000 ms | 14 ms | 4,142× |
| 5 | timeout (>10 min) | 18 ms | >33,000× |
Observations:
1. 1 hop: Graph DB only ~2× faster (both are fast)
2. 2-3 hops: Graph DB 10-300× faster (noticeable difference)
3. 4+ hops: Graph DB thousands of times faster (RDBMS becomes unusable)
The performance cliff: Around 2-3 hops, RDBMS performance falls off a cliff. Graph databases maintain near-constant performance.
Benchmark credibility: These numbers match published LDBC results and vendor benchmarks. You can replicate them yourself.
Scalability: How Systems Grow
Scalability measures how performance changes as data size or load increases.
Types of Scalability
Vertical scaling (scale up):
- Add more resources to one server (CPU, RAM, SSD)
- Limits: Single-machine limits (~1TB RAM, ~100 CPU cores)
- Cost: Expensive (high-end servers cost $$$$)

Horizontal scaling (scale out):
- Add more servers to a cluster
- Limits: Coordination overhead, network bandwidth
- Cost: Cheaper per node, but operational complexity

Graph databases support both:
- Single-node: Neo4j, Amazon Neptune (serverless), Memgraph
- Distributed: TigerGraph, Neo4j Aura Enterprise, JanusGraph, Dgraph
Measuring Scalability
Ideal scaling:
- 2× data → same query time (constant performance)
- 2× servers → 2× throughput (linear horizontal scaling)

Real-world scaling:
- 2× data → 1.5× query time (sub-linear degradation, acceptable)
- 2× servers → 1.6× throughput (80% efficiency, good)
Scalability benchmark:
Good scalability looks like this: each 10× increase in data size adds only a small, roughly constant increment to query latency, because the only data-size-dependent cost is the initial O(log n) index lookup.
Log-scale growth: Query time grows logarithmically with data size (due to index lookups). This is excellent scalability.
Poor scalability example:
Poor scalability looks like this: each 10× increase in data size makes queries roughly 10× slower, typically because the database is scanning instead of seeking.
Linear growth: Query time grows linearly or worse with data size. This doesn't scale.
Benchmark Best Practices: Doing It Right
Now that you understand the metrics and benchmarks, let's talk about how to run fair, credible benchmarks.
1. Match the hardware:
   - Same CPU, RAM, SSD across all systems tested
   - Document specs: "64GB RAM, 16 cores, 1TB NVMe SSD"
2. Use appropriate indexes:
   - Don't compare an indexed Graph DB to a non-indexed RDBMS
   - Create indexes that match query patterns
   - Report which indexes you created
3. Warm up the cache:
   - Run queries multiple times before measuring
   - Report cold-cache and warm-cache performance separately
4. Measure realistic workloads:
   - Don't just test best-case queries
   - Include complex queries that stress the system
   - Test at realistic concurrency levels
5. Report distributions, not just averages:
   - Median (p50), 95th percentile (p95), 99th percentile (p99)
   - Max latency matters for user experience
6. Test at realistic data sizes:
   - If production will have 100M nodes, test at 100M nodes
   - Synthetic benchmarks at 1M nodes don't predict 100M-node performance
7. Document everything:
   - Database versions
   - Configuration settings (cache size, query timeout, etc.)
   - Data model (schema, indexes)
   - Query text
   - Hardware specs
   - Methodology (how you generated load, measured latency, etc.)
8. Share your results:
   - Publish to GitHub
   - Write a blog post
   - Present at meetups
   - Let others replicate and verify
Building Your Reputation: The Data-Driven Engineer
Here's why all this benchmarking knowledge matters for your career:
Scenario 1: The Skeptical Tech Lead
You propose using a graph database for the new recommendation engine.
Tech Lead: "Graph databases are just hype. SQL works fine."
You (without benchmarks): "But... the vendor said it's 1000× faster..."
Result: Dismissed as naive.
You (with benchmarks): "I benchmarked our data model—10M users, 100M relationships. Here's what I found:"
| Query | PostgreSQL | Neo4j | Speedup |
|---|---|---|---|
| 2-hop friends | 180 ms | 7 ms | 25× |
| 3-hop recommendations | 3.2 sec | 11 ms | 290× |
"I ran LDBC SNB at SF10 scale. Graph DB stays under 50ms for all queries. PostgreSQL timeouts at 3 hops. Here's the GitHub repo with my test scripts."
Result: Tech lead respects you. You get to run a proof-of-concept.
Scenario 2: The Budget Discussion
CFO: "Why do we need to spend $50k/year on a graph database?"
You (without data): "It's faster...?"
Result: Budget denied.
You (with data): "Our current recommendation engine runs overnight batch jobs—12-hour runtime. Users see stale recommendations. I benchmarked a graph database: same recommendations in 50 milliseconds, real-time. We can retire 20 batch processing servers ($80k/year cost). ROI is positive year one."
Result: Budget approved. You're a hero.
Scenario 3: The Conference Talk
You: "I compared graph databases to relational databases for social network queries. Here are my LDBC SNB results, reproduced on identical hardware. You can replicate my benchmarks using this GitHub repo."
Result: Your talk gets accepted. Your blog post gets shared. Recruiters message you. Your reputation as a rigorous, data-driven engineer grows.
The pattern: Benchmarking isn't just about measuring databases. It's about building credibility. Engineers who bring data to discussions are respected. Engineers who trust vendor marketing are not.
Your reputation is built on moments like these. When you can say, "I tested it, here are the numbers," you become someone whose opinion matters.
Running Your Own Benchmarks: A Practical Guide
Ready to actually run some benchmarks? Here's how to get started.
Step 1: Define Your Questions
Don't benchmark randomly. Start with specific questions:
Good questions:
- "Can a graph database handle our 3-hop recommendation query in under 100ms?"
- "How does Neo4j compare to PostgreSQL for our specific data model?"
- "Will query performance degrade as we grow from 10M to 100M nodes?"

Bad questions:
- "Which database is faster?" (too vague)
- "Is Neo4j good?" (not measurable)
Step 2: Model Your Data
Create a representative data model:
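Here's a sketch of a small retail-style model, assuming `Customer` and `Product` nodes connected by `PURCHASED` relationships; all labels, property names, and values are illustrative:

```cypher
// Uniqueness constraints double as lookup indexes
CREATE CONSTRAINT customer_id_unique IF NOT EXISTS
FOR (c:Customer) REQUIRE c.customerId IS UNIQUE;

CREATE CONSTRAINT product_id_unique IF NOT EXISTS
FOR (p:Product) REQUIRE p.productId IS UNIQUE;

// Example data shape
CREATE (c:Customer {customerId: 1, name: "Alice", city: "Berlin"})
CREATE (p:Product {productId: 100, name: "Espresso Machine"})
CREATE (c)-[:PURCHASED {date: date("2024-01-15")}]->(p);
```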
Scale it: Use LDBC data generator or write scripts to generate millions of nodes.
Step 3: Write Benchmark Queries
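Here's a sketch of four queries that match the comparison table in Step 6; they reuse the illustrative `Customer`/`Product` model above, with `$customerId` as a parameter:

```cypher
// Q1: Find customer (indexed point lookup)
MATCH (c:Customer {customerId: $customerId})
RETURN c;

// Q2: Customer purchases (1 hop)
MATCH (c:Customer {customerId: $customerId})-[:PURCHASED]->(p:Product)
RETURN p.name;

// Q3: Similar customers (2 hops: people who bought the same products)
MATCH (c:Customer {customerId: $customerId})-[:PURCHASED]->(:Product)
      <-[:PURCHASED]-(other:Customer)
RETURN other.customerId, count(*) AS sharedPurchases
ORDER BY sharedPurchases DESC
LIMIT 10;

// Q4: Recommendations (3 hops: products those similar customers bought)
MATCH (c:Customer {customerId: $customerId})-[:PURCHASED]->(:Product)
      <-[:PURCHASED]-(:Customer)-[:PURCHASED]->(rec:Product)
WHERE NOT (c)-[:PURCHASED]->(rec)
RETURN rec.name, count(*) AS score
ORDER BY score DESC
LIMIT 10;
```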
Step 4: Measure Performance
Use Python with the Neo4j driver:
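Here's a minimal sketch using the official `neo4j` Python driver; the connection details, credentials, and query text are illustrative and should be adapted to your environment:

```python
import time
import statistics
from neo4j import GraphDatabase

URI = "bolt://localhost:7687"   # adjust for your environment
AUTH = ("neo4j", "password")    # illustrative credentials

QUERY = """
MATCH (c:Customer {customerId: $customerId})-[:PURCHASED]->(:Product)
      <-[:PURCHASED]-(:Customer)-[:PURCHASED]->(rec:Product)
RETURN rec.name, count(*) AS score
ORDER BY score DESC LIMIT 10
"""

def benchmark(query, params, runs=100, warmup=10):
    """Run a query repeatedly and return latency percentiles in milliseconds."""
    latencies = []
    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        with driver.session() as session:
            for i in range(warmup + runs):
                start = time.perf_counter()
                session.run(query, params).consume()   # force full execution
                elapsed_ms = (time.perf_counter() - start) * 1000
                if i >= warmup:                        # discard warm-up runs
                    latencies.append(elapsed_ms)
    latencies.sort()
    return {
        "p50": statistics.median(latencies),
        "p95": latencies[int(len(latencies) * 0.95) - 1],
        "p99": latencies[int(len(latencies) * 0.99) - 1],
        "max": latencies[-1],
    }

if __name__ == "__main__":
    for metric, value in benchmark(QUERY, {"customerId": 1}).items():
        print(f"{metric}: {value:.1f} ms")
```

Note that this measures end-to-end latency from the client, including driver and network overhead, which is usually what your users experience.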
Step 5: Compare Alternatives
Run the same queries on PostgreSQL:
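Here are the same four queries as a SQL sketch, assuming illustrative `customers`, `products`, and `purchases(customer_id, product_id)` tables with indexes on the key and join columns:

```sql
-- Q1: Find customer (indexed point lookup)
SELECT * FROM customers WHERE customer_id = 1;

-- Q2: Customer purchases (1 hop = 1 join)
SELECT pr.name
FROM purchases pu
JOIN products pr ON pr.product_id = pu.product_id
WHERE pu.customer_id = 1;

-- Q3: Similar customers (2 hops = 2 joins through shared products)
SELECT pu2.customer_id, COUNT(*) AS shared_purchases
FROM purchases pu1
JOIN purchases pu2 ON pu2.product_id = pu1.product_id
WHERE pu1.customer_id = 1
  AND pu2.customer_id <> 1
GROUP BY pu2.customer_id
ORDER BY shared_purchases DESC
LIMIT 10;

-- Q4: Recommendations (3 hops = 3 joins, excluding already-purchased items)
SELECT pr.name, COUNT(*) AS score
FROM purchases pu1
JOIN purchases pu2 ON pu2.product_id = pu1.product_id
JOIN purchases pu3 ON pu3.customer_id = pu2.customer_id
JOIN products pr  ON pr.product_id = pu3.product_id
WHERE pu1.customer_id = 1
  AND pu2.customer_id <> 1
  AND pu3.product_id NOT IN (
        SELECT product_id FROM purchases WHERE customer_id = 1)
GROUP BY pr.name
ORDER BY score DESC
LIMIT 10;
```

You can time these with the same style of Python harness (swapping in a PostgreSQL client such as `psycopg2`) or interactively with `\timing` in `psql`.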
Step 6: Analyze and Report
Create a comparison table:
| Query | PostgreSQL P50 | Neo4j P50 | Speedup |
|---|---|---|---|
| Find customer | 2.1 ms | 1.8 ms | 1.2× |
| Customer purchases (1-hop) | 8.5 ms | 3.2 ms | 2.7× |
| Similar customers (2-hop) | 145 ms | 6.8 ms | 21× |
| Recommendations (3-hop) | 2,800 ms | 12 ms | 233× |
Visualize the results:
- Graph showing latency vs hop count
- Bar chart comparing databases
- Line graph showing scalability (latency vs data size)
Step 7: Share Your Findings
Write it up:
- Blog post
- GitHub repo with scripts
- Internal tech doc

Include:
- Methodology (how you tested)
- Data model (schema)
- Hardware specs
- Database versions and configuration
- Full results (not just cherry-picked wins)
- Limitations (what you didn't test)
Be honest: If RDBMS won on some queries, say so. Credibility comes from honesty, not cherry-picking.
Key Takeaways
- Skepticism is good engineering: Question claims, test assumptions, measure reality
- Hop count predicts performance: Graph DBs scale linearly, RDBMS hits exponential cliff at 2-3 hops
- Degree matters: High-degree nodes (hubs) can slow queries—test both average and extreme cases
- Indexes are critical: Don't benchmark without appropriate indexes on both systems
- Use standard benchmarks: LDBC SNB and Graph 500 provide credible, comparable results
- Join operations are expensive: O(n log n) per hop with growing intermediate sets
- Traversal is constant-time: O(1) pointer lookups via index-free adjacency
- Measure distributions, not averages: P50, P95, P99 tell the real story
- Test at realistic scale: 1M node benchmarks don't predict 100M node performance
- Document everything: Reproducibility = credibility
- Build your reputation: Data-driven engineers earn respect
- Share your results: Transparency builds trust
The bottom line: Don't trust vendor claims. Don't trust this textbook. Run your own benchmarks. Measure your data, your queries, your workload. Make decisions based on evidence, not marketing.
When you walk into a meeting with benchmark results—your own data, reproducible methodology, honest reporting—people listen. That's how you build a reputation as an engineer who makes smart, data-driven decisions.
And that's worth more than any technology choice.
Remember: The best engineers aren't the ones who know the right answer. They're the ones who know how to find the right answer through rigorous testing and measurement. Go benchmark something.