1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123 | Type: workflow
Purpose: Demonstrate the challenges of evolving a relational schema when adding new, heterogeneous entity types with different attributes
Scenario: An IT asset database initially tracks only servers. Business requirements expand to include network switches, storage arrays, and IoT devices—each with unique attributes.
Visual style: Horizontal timeline with schema diagrams at each stage, showing table structures evolving
Stage 1: "Initial schema - Servers only"
Date: January 2020
Servers table:
- server_id (PK)
- hostname
- ip_address
- cpu_count
- ram_gb
- os_version
Note: Clean, simple schema for homogeneous entity type
Stage 2: "Requirement: Add network switches"
Date: June 2020
Problem: Switches have different attributes (port_count, vlan_support, switch_type) that don't apply to servers
Two possible approaches shown as decision branches:
Approach A: "Single table with NULLs" (selected)
Infrastructure table:
- device_id (PK)
- device_type (server|switch)
- hostname
- ip_address
- cpu_count (NULL for switches)
- ram_gb (NULL for switches)
- os_version (NULL for switches)
- port_count (NULL for servers)
- vlan_support (NULL for servers)
- switch_type (NULL for servers)
Issues: Many NULL values, sparse table, unclear which columns apply to which device types
Approach B: "Separate tables" (not chosen)
Servers table (original) + Switches table (new)
Issues: Querying all infrastructure requires UNION, can't easily add shared attributes
Stage 3: "Requirement: Add storage arrays"
Date: December 2020
Infrastructure table grows:
- ... (all previous columns)
- storage_capacity_tb (NULL for servers and switches)
- raid_level (NULL for servers and switches)
- disk_count (NULL for servers and switches)
Note: Table becoming increasingly sparse, with ~60% NULL values across all rows
Stage 4: "Requirement: Add IoT sensors"
Date: June 2021
Infrastructure table grows further:
- ... (all previous columns)
- sensor_type (NULL for servers, switches, storage)
- battery_level (NULL for servers, switches, storage)
- last_reading_timestamp (NULL for servers, switches, storage)
Migration challenge: ALTER TABLE on 100,000-row table takes 4 hours, requires maintenance window
Note: NULL values now 75% of table content, queries becoming complex with device_type filtering
Stage 5: "Crisis: Performance degradation"
Date: January 2022
Problems identified:
- Query performance declining due to table size and sparsity
- Indexes on device-type-specific columns ineffective (too many NULLs)
- Application logic complicated with device type conditionals
- Adding new device types requires coordinated schema changes and application deployments
Decision: Consider alternative architectures
Stage 6: "Solution: Refactor to graph database"
Date: June 2022
Graph model:
- Device nodes with common properties (id, hostname, ip_address)
- Node labels by type: :Server, :Switch, :StorageArray, :IoTSensor
- Type-specific properties stored directly on nodes without NULL padding
- New device types added without schema migration
Result:
- 75% reduction in NULL values
- Query performance improvement (type-specific queries faster)
- New device types deployable without database migrations
- Schema flexibility maintained while keeping type safety
Interactive elements:
- Hover over each stage to see code examples (CREATE TABLE, ALTER TABLE statements)
- Click decision points to see detailed pros/cons analysis
- Toggle view: "Show actual queries" displays SQL at each stage, showing increasing complexity
- Metrics panel: Shows NULL percentage, query times, schema change deployment time at each stage
Annotations:
- "Each ALTER TABLE requires testing, migration scripts, coordination with application teams"
- "EAV (Entity-Attribute-Value) pattern could help but sacrifices query performance"
- "This is why heterogeneous IT infrastructure struggles in relational schemas"
- "Graph databases excel at heterogeneous, evolving schemas"
Color coding:
- Blue: Clean schema stages
- Yellow: Growing complexity warnings
- Red: Crisis/performance problems
- Green: Solution stage
Swimlanes:
- Database schema layer
- Application impact layer
- Operations/deployment layer
Implementation: Horizontal timeline with expandable stages, interactive SQL code examples, metrics visualization showing degradation over time
|