Taxonomy and Data Formats
Summary
This chapter explores how to add taxonomy information to your learning graph and convert it to various formats for visualization and processing. You'll learn about the TaxonomyID field in CSV files and the process of adding taxonomy categorization to existing concept graphs. The chapter provides comprehensive coverage of the vis-network JSON format, including its schema structure with metadata, groups, nodes, and edges sections.
You'll learn about Dublin Core metadata standards and how to properly populate metadata fields including title, description, creator, date, version, format, and license. The chapter also covers color coding strategies for visualizations and font color selection for readability. Finally, you'll be introduced to Python scripting for learning graph processing, including key scripts like analyze-graph.py and csv-to-json.py.
Concepts Covered
This chapter covers the following 22 concepts from the learning graph:
- TaxonomyID Field in CSV
- Adding Taxonomy to Graph
- vis-network JSON Format
- JSON Schema for Learning Graphs
- Metadata Section in JSON
- Groups Section in JSON
- Nodes Section in JSON
- Edges Section in JSON
- Dublin Core Metadata
- Title Metadata Field
- Description Metadata Field
- Creator Metadata Field
- Date Metadata Field
- Version Metadata Field
- Format Metadata Field
- License Metadata Field
- Color Coding in Visualizations
- Font Colors for Readability
- Python
- Python Scripts for Processing
- analyze-graph.py Script
- csv-to-json.py Script
Prerequisites
This chapter builds on concepts from:
Introduction to Data Formats for Learning Graphs
Learning graphs exist as data structures that must be stored, processed, and visualized effectively. While the conceptual model of a learning graph—concepts connected by dependency relationships—is straightforward, implementing that model requires careful attention to data formats and transformation pipelines. This chapter explores the complete data workflow from CSV-based graph authoring through JSON conversion to interactive visualization.
You'll learn how taxonomy information enriches your learning graph with categorical structure, enabling color-coded visualizations and category-based filtering. The chapter provides comprehensive coverage of the vis-network JSON format, which serves as the intermediate representation for browser-based graph visualization. Understanding JSON schema design, metadata standards, and color coding strategies will enable you to create professional, accessible learning graph visualizations.
The chapter culminates with practical Python scripting for learning graph processing. You'll explore the implementation details of scripts that validate, transform, and analyze your learning graph data, empowering you to customize the toolchain for your specific needs.
The TaxonomyID Field in CSV Format
The learning graph CSV format introduced in Chapter 5 includes four essential columns: ConceptID, ConceptLabel, Dependencies, and TaxonomyID. While the first three columns define graph structure, the TaxonomyID column provides categorical metadata that enhances both organization and visualization.
A TaxonomyID is a short (3-5 letter) abbreviation representing a conceptual category or domain. Examples include:
- FOUND: Foundational concepts
- TOOL: Tools and technologies
- IMPL: Implementation techniques
- ARCH: Architecture and design
- EVAL: Evaluation and assessment
The TaxonomyID field serves multiple purposes in the learning graph ecosystem:
- Visual grouping: Concepts with the same TaxonomyID display in the same color in visualizations
- Filtering: Users can filter graph views to show only specific categories
- Balance analysis: Distribution reports identify over- or under-represented categories
- Conceptual organization: Related concepts cluster naturally during authoring
In the CSV format, TaxonomyID appears as the fourth column:
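As an illustration, here is a minimal sketch of reading a four-column learning graph CSV with Python's standard csv module. The sample rows and the read_rows helper are hypothetical; only the column names and the pipe-delimited Dependencies convention come from this chapter.

```python
import csv
import io

# Hypothetical sample rows; the concept names are illustrative only.
SAMPLE_CSV = """ConceptID,ConceptLabel,Dependencies,TaxonomyID
1,Graph Basics,,FOUND
2,Nodes and Edges,1,FOUND
3,Graph Query Tools,1|2,TOOL
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE_CSV)
for row in rows:
    # TaxonomyID is the fourth column of every row.
    print(row["ConceptID"], row["ConceptLabel"], row["TaxonomyID"])
```

Reading by header name rather than by column position keeps the code working even if extra columns are appended later.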
Adding Taxonomy to Existing Graphs
If you created a learning graph without TaxonomyID information, you can add it retroactively using a multi-step process:
- Identify natural categories: Review your concept list and identify 5-10 logical groupings based on topic similarity, complexity level, or knowledge domain
- Design TaxonomyID abbreviations: Create distinctive, memorable 3-5 letter codes for each category
- Add TaxonomyID column to CSV: Insert a new column header "TaxonomyID" as the fourth column
- Categorize concepts: Assign each concept to its most appropriate category
- Validate distribution: Run taxonomy-distribution.py to check for balanced categorization
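The balance check in the last step can be sketched as follows, assuming the TaxonomyID column has already been read into a list. The function name and sample values are hypothetical, and the real taxonomy-distribution.py may compute and report its statistics differently.

```python
from collections import Counter


def taxonomy_distribution(taxonomy_ids):
    """Return {TaxonomyID: (count, percent)} ordered by descending count."""
    counts = Counter(taxonomy_ids)
    total = sum(counts.values())
    return {
        tax_id: (count, round(100 * count / total, 1))
        for tax_id, count in counts.most_common()
    }


# Hypothetical TaxonomyID values from a very small graph.
ids = ["FOUND", "FOUND", "TOOL", "IMPL", "FOUND", "TOOL"]
for tax_id, (count, percent) in taxonomy_distribution(ids).items():
    print(f"{tax_id}: {count} concepts ({percent}%)")
```

A category that holds a very large or very small share of the concepts is a candidate for splitting or merging.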
The add-taxonomy.py helper script can semi-automate this process by suggesting categories based on concept labels using keyword matching:
The script prompts for taxonomy rules (keyword → TaxonomyID mappings) and applies them systematically, flagging ambiguous cases for manual review.
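The keyword-matching idea can be sketched as below. This is not the add-taxonomy.py implementation; the suggest_taxonomy function, the rule table, and the status strings are assumptions made for illustration.

```python
def suggest_taxonomy(label, rules):
    """Return (taxonomy_id, status) for a concept label.

    status is "matched" when exactly one category matches, "ambiguous" when
    keywords from several categories match, and "unmatched" when none do.
    """
    lowered = label.lower()
    hits = sorted({tax_id for keyword, tax_id in rules.items() if keyword in lowered})
    if len(hits) == 1:
        return hits[0], "matched"
    if hits:
        return None, "ambiguous"
    return None, "unmatched"


# Hypothetical keyword -> TaxonomyID rules.
RULES = {"script": "TOOL", "python": "TOOL", "schema": "ARCH", "definition": "FOUND"}

for label in ["Python Scripts for Processing", "JSON Schema Definition", "Color Coding"]:
    print(label, "->", suggest_taxonomy(label, RULES))
```

Labels reported as ambiguous or unmatched are the ones that would need manual review.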
Diagram: Adding Taxonomy to CSV Workflow Diagram
MicroSim Generator Recommendations:
- mermaid-generator (94/100) - Flowchart with decision diamonds and process boxes is core Mermaid strength
- microsim-p5 (75/100) - Custom flowchart rendering possible with manual layout and interaction
- vis-network (45/100) - Can represent workflow as directed graph but less intuitive than flowchart
vis-network JSON Format
The vis-network JavaScript library provides powerful, interactive graph visualization in web browsers. To leverage vis-network for learning graph visualization, you must convert your CSV data into the vis-network JSON format—a structured representation that defines nodes, edges, visual styling, and metadata.
The vis-network format organizes graph data into four primary sections:
- metadata: Information about the graph itself (title, creator, date, etc.)
- groups: Visual styling definitions for each TaxonomyID category
- nodes: Array of concept objects with id, label, and group properties
- edges: Array of dependency objects with from and to properties
This hierarchical structure separates content (what concepts exist) from presentation (how concepts should be displayed), following best practices for data interchange formats.
JSON Schema for Learning Graphs
A JSON schema defines the expected structure, data types, and constraints for JSON documents. For learning graphs, the schema ensures that generated JSON files conform to vis-network requirements and include all necessary metadata.
The learning graph JSON schema specifies:
Top-level structure:
{ "metadata": {...}, "groups": {...}, "nodes": [...], "edges": [...] }
Data type constraints:
- metadata: Object with string values for title, description, etc.
- groups: Object with group names as keys, styling objects as values
- nodes: Array of objects, each with required id (number), label (string), group (string)
- edges: Array of objects, each with required from (number), to (number)
Validation rules:
- All node IDs must be unique within the nodes array
- All edge from and to values must reference existing node IDs
- All node group values must have corresponding entries in the groups object
- Metadata fields should follow Dublin Core standards (covered later in this chapter)
The csv-to-json.py script implements this schema validation automatically, rejecting CSV data that would produce invalid JSON and providing detailed error messages for corrections.
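The three structural rules above can be checked with a few lines of Python. This is a sketch under the assumption that the graph has already been loaded into a dict; validate_graph and the sample data are hypothetical, and the messages produced by csv-to-json.py may differ.

```python
def validate_graph(graph):
    """Return a list of error messages; an empty list means the checks passed."""
    errors = []
    ids = [node["id"] for node in graph["nodes"]]
    known = set(ids)
    # Rule 1: node IDs must be unique.
    if len(ids) != len(known):
        errors.append("duplicate node id")
    # Rule 3: every node group needs an entry in the groups object.
    for node in graph["nodes"]:
        if node["group"] not in graph["groups"]:
            errors.append(f"node {node['id']} uses undefined group {node['group']}")
    # Rule 2: both ends of every edge must reference an existing node.
    for edge in graph["edges"]:
        for end in ("from", "to"):
            if edge[end] not in known:
                errors.append(f"edge {end}={edge[end]} references a missing node")
    return errors


# Hypothetical graph with one violation of each rule.
BAD = {
    "groups": {"FOUND": {}},
    "nodes": [
        {"id": 1, "label": "A", "group": "FOUND"},
        {"id": 1, "label": "B", "group": "TOOL"},
    ],
    "edges": [{"from": 1, "to": 9}],
}
for message in validate_graph(BAD):
    print(message)
```

Collecting all errors before reporting, rather than stopping at the first one, lets an author fix the CSV in a single pass.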
Diagram: Learning Graph JSON Schema Diagram
MicroSim Generator Recommendations:
- mermaid-generator (92/100) - Tree/hierarchical diagrams with nested structures well-supported
- microsim-p5 (70/100) - Custom tree layout requires recursive positioning algorithms
- vis-network (65/100) - Can display hierarchical graphs with physics-based layouts
Metadata Section in JSON
The metadata section contains descriptive information about the learning graph as a whole, following Dublin Core metadata standards. This section enables proper attribution, versioning, and documentation of your learning graph dataset.
Example metadata section:
While metadata doesn't affect graph visualization directly, it provides essential context for:
- Attribution: Identifying who created or maintains the learning graph
- Versioning: Tracking changes over time and ensuring correct versions are used
- Documentation: Describing the graph's purpose, scope, and educational context
- Licensing: Clarifying usage rights and redistribution terms
Groups Section in JSON
The groups section defines visual styling for each TaxonomyID category, enabling consistent color-coded visualization across the learning graph. Each group specifies:
- color: Background color for nodes in this category
- font: Text color and size for labels
- shape: Node shape (circle, box, diamond, etc.)
Example groups section:
Consistent group styling creates visual coherence and aids comprehension by allowing users to quickly identify concept categories by color.
Nodes Section in JSON
The nodes section contains an array of concept objects representing the vertices of your learning graph. Each node object requires three properties:
- id: Unique numeric identifier (matches ConceptID from CSV)
- label: Human-readable concept name (matches ConceptLabel from CSV)
- group: TaxonomyID category for visual styling
Example nodes section:
The nodes array typically contains 150-250 objects for a comprehensive learning graph. vis-network uses this array to render graph vertices, applying styling from the groups section based on each node's group property.
Edges Section in JSON
The edges section contains an array of dependency relationship objects representing the directed edges of your learning graph. Each edge object requires two properties:
- from: Node ID of the prerequisite concept
- to: Node ID of the dependent concept
Example edges section:
The edges array defines the directed acyclic graph structure. vis-network renders these as arrows pointing from prerequisite to dependent concepts, creating the visual flow of the learning progression.
For a 200-concept learning graph with an average of 3 dependencies per concept, expect approximately 600 edge objects in this array.
Diagram: CSV to JSON Conversion Mapping Diagram
MicroSim Generator Recommendations:
- mermaid-generator (90/100) - Data flow diagrams with transformation steps supported via flowchart syntax
- microsim-p5 (78/100) - Custom visualization with tables and arrows achievable with careful layout
- chartjs-generator (20/100) - Not designed for data transformation diagrams
Dublin Core Metadata Standard
Dublin Core is an internationally recognized metadata standard (ISO 15836) for describing digital resources. Originally developed for library catalog systems, Dublin Core provides a simple yet powerful vocabulary for resource description that translates well to learning graph documentation.
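The metadata fields most relevant to learning graphs are listed below. Title, Description, Creator, Date, and Format are among the fifteen core Dublin Core elements. Version and License are not core elements (License appears in the extended DCMI Metadata Terms as a refinement of Rights), but this chapter treats them alongside the core set: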
| Element | Purpose | Example |
|---|---|---|
| Title | Name of the resource | "Graph Databases Learning Graph" |
| Description | Summary of content and scope | "200-concept graph covering Neo4j..." |
| Creator | Primary author or maintainer | "Dr. Jane Smith" |
| Date | Creation or modification date | "2024-09-15" (ISO 8601) |
| Version | Version number | "1.2.0" (semantic versioning) |
| Format | File format specification | "vis-network JSON v9.1" |
| License | Usage rights | "CC-BY-4.0" or "MIT" |
Using Dublin Core metadata ensures your learning graphs are properly documented, discoverable, and interoperable with academic and educational resource repositories.
Title Metadata Field
The title field provides the primary name for your learning graph. Effective titles are:
- Descriptive: Clearly indicate the subject matter
- Specific: Distinguish from other learning graphs
- Concise: Typically 5-10 words maximum
Examples of effective titles:
- "Introduction to Graph Databases Learning Graph"
- "Python Programming Fundamentals Concept Map"
- "ITIL Service Management Dependency Graph"
Avoid generic titles like "Learning Graph" or "Course Concepts" that provide no information about content.
Description Metadata Field
The description field offers a 1-3 sentence summary of the learning graph's scope, audience, and purpose:
Effective descriptions answer:
- What: Topic and scope
- Who: Target audience and prerequisites
- How many: Number of concepts
- When/Where: Course duration or context
Creator Metadata Field
The creator field identifies the primary author or team responsible for developing the learning graph:
For multiple creators, use a semicolon-separated list.
Proper attribution ensures:
- Academic credit for intellectual work
- Contact information for questions or collaborations
- Provenance tracking in educational repositories
Date Metadata Field
The date field records when the learning graph was created or last significantly updated. Use ISO 8601 format (YYYY-MM-DD) for unambiguous, machine-parseable dates:
For resources with multiple relevant dates, use qualified Dublin Core:
Accurate dating enables versioning, change tracking, and temporal queries in learning resource repositories.
Version Metadata Field
The version field tracks revisions using semantic versioning (MAJOR.MINOR.PATCH):
Version numbering conventions:
- MAJOR: Increment for incompatible changes (e.g., restructuring categories, removing concepts)
- MINOR: Increment for backwards-compatible additions (e.g., adding concepts, refining dependencies)
- PATCH: Increment for corrections (e.g., fixing typos, correcting metadata)
Examples:
- 1.0.0: Initial release
- 1.1.0: Added 15 new concepts on advanced topics
- 1.1.1: Fixed typo in concept label
- 2.0.0: Restructured taxonomy from 8 to 12 categories (breaking change)
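The increment rules can be sketched as a small helper. bump_version is a hypothetical function written for this illustration; it is not part of the learning graph toolkit described in this chapter.

```python
def bump_version(version, part):
    """Increment one part of a MAJOR.MINOR.PATCH string and reset lower parts."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part}")


print(bump_version("1.0.0", "minor"))
print(bump_version("1.1.0", "patch"))
print(bump_version("1.1.1", "major"))
```

Resetting the lower-order parts to zero on a MAJOR or MINOR bump matches the example sequence listed above.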
Format Metadata Field
The format field specifies the file format and version:
For learning graphs, useful format specifications include:
- Technical format: "vis-network JSON v9.1"
- MIME type: "application/json"
- Schema version: "Learning Graph Schema v2.0"
Explicit format declaration enables:
- Validation against correct schemas
- Compatibility checking with visualization tools
- Automated format conversion pipelines
License Metadata Field
The license field clarifies usage rights using standard license identifiers:
Common licenses for educational resources:
| License | Meaning | Usage Rights |
|---|---|---|
| CC-BY-4.0 | Attribution required | Commercial and derivative works allowed |
| CC-BY-SA-4.0 | Attribution + Share-Alike | Derivatives must use same license |
| CC-BY-NC-4.0 | Attribution + Non-Commercial | No commercial use |
| MIT | Permissive open source | Minimal restrictions |
| All Rights Reserved | Traditional copyright | No use without permission |
Clear licensing enables:
- Legal sharing and remixing
- Inclusion in open educational resource repositories
- Compliance with institutional policies
Diagram: Dublin Core Metadata Field Reference Card
MicroSim Generator Recommendations:
- markdown table (best) - A static reference card doesn't require interactivity, so a markdown table is the simplest option
- microsim-p5 (85/100) - If interactivity needed, p5.js with DOM elements supports card grid layout
- chartjs-generator (15/100) - Not designed for reference card layouts or metadata display
Color Coding in Visualizations
Color coding transforms abstract graph data into intuitive visual representations where patterns emerge naturally. For learning graphs, color serves as a primary visual variable encoding taxonomy categories, enabling users to identify concept domains at a glance.
Effective color coding schemes for learning graphs follow several design principles:
Color Palette Selection
Choose colors that are:
- Distinctive: Easily distinguished from one another
- Meaningful: Associate naturally with category semantics when possible
- Accessible: Visible to users with color vision deficiencies
- Consistent: Use the same colors across all visualizations
Recommended palette strategies:
Rainbow gradient (for sequential categories):
- FOUND: Red (#FF6B6B)
- BASIC: Orange (#FFA94D)
- ARCH: Yellow (#FFD43B)
- IMPL: Light Green (#8CE99A)
- DATA: Green (#51CF66)
- TOOL: Light Blue (#74C0FC)
- QUAL: Blue (#4C6EF5)
- ADV: Purple (#9775FA)
Categorical palette (for non-sequential categories):
Use palettes designed for categorical data with maximum perceptual distance:
- ColorBrewer qualitative schemes (Set1, Set2, Set3)
- Tableau categorical palettes
- Okabe-Ito colorblind-safe palette
Font Colors for Readability
Node label text must be readable against the background color. The W3C Web Content Accessibility Guidelines (WCAG) specify minimum contrast ratios:
- Normal text: 4.5:1 contrast ratio (AA level)
- Large text (18pt+): 3:1 contrast ratio (AA level)
- Enhanced (AAA level): 7:1 for normal, 4.5:1 for large
General rules for font color selection:
| Background Lightness | Recommended Font Color | Hex Code |
|---|---|---|
| Dark (L < 50%) | White or very light gray | #FFFFFF or #F8F9FA |
| Light (L > 50%) | Black or very dark gray | #000000 or #212529 |
| Medium (L ≈ 50%) | Test both; choose higher contrast | Depends on specific color |
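The csv-to-json.py script can calculate optimal font colors automatically from the relative luminance of the background color, a weighted sum of its red, green, and blue channels.
As a rule of thumb, if luminance > 0.5, use black text; otherwise, use white text. For mid-range backgrounds, compare the contrast ratio of each candidate font color against the background and keep the higher one.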
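A minimal sketch of this calculation, using the relative luminance and contrast ratio definitions from WCAG 2.x. The function names are hypothetical, and this is not necessarily how csv-to-json.py implements it. Rather than a fixed threshold, the sketch compares black and white text directly and keeps whichever gives the higher contrast.

```python
def relative_luminance(hex_color):
    """WCAG 2.x relative luminance of an sRGB color given as '#RRGGBB'."""
    digits = hex_color.lstrip("#")
    channels = [int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma encoding before applying the channel weights.
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    red, green, blue = linear
    return 0.2126 * red + 0.7152 * green + 0.0722 * blue


def contrast_ratio(color_a, color_b):
    """WCAG contrast ratio between two colors, from 1.0 up to 21.0."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def best_font_color(background):
    """Pick black or white text, whichever contrasts more with the background."""
    black = contrast_ratio(background, "#000000")
    white = contrast_ratio(background, "#FFFFFF")
    return "#000000" if black >= white else "#FFFFFF"


# Two colors from the rainbow palette above plus a dark gray.
for background in ["#FFD43B", "#FF6B6B", "#212529"]:
    ratio = contrast_ratio(background, best_font_color(background))
    print(background, best_font_color(background), round(ratio, 2))
```

The printed ratio can be compared against the 4.5:1 and 3:1 thresholds listed earlier to confirm that the chosen pairing meets the AA level.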
Diagram: Color Accessibility Checker MicroSim
MicroSim Generator Recommendations:
- microsim-p5 (95/100) - Interactive color pickers, contrast calculation, and live preview are p5.js + DOM strengths
- chartjs-generator (25/100) - Not designed for color accessibility checking tools
- vis-network (10/100) - Not applicable to color contrast validation interfaces
Python for Learning Graph Processing
Python serves as the primary scripting language for learning graph validation, transformation, and analysis. Its rich ecosystem of libraries for data processing (csv, json, pandas) and graph analysis (networkx) makes it ideal for implementing the learning graph toolchain.
The learning graph workflow uses Python for three main tasks:
- Validation: Checking structural integrity and quality metrics
- Transformation: Converting between formats (CSV → JSON)
- Analysis: Generating quality reports and distribution statistics
Python scripts follow consistent patterns:
Command-line interface:
CSV reading with error handling:
JSON writing with formatting:
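The CSV reading and JSON writing patterns might look like the following sketch, which uses only the standard library. The load_rows and write_json helpers are hypothetical names, and the demo writes to a temporary directory so that no real files are touched.

```python
import csv
import json
import sys
import tempfile
from pathlib import Path

REQUIRED_COLUMNS = {"ConceptID", "ConceptLabel", "Dependencies", "TaxonomyID"}


def load_rows(csv_path):
    """Read the learning graph CSV, raising ValueError on a malformed header."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return list(reader)


def write_json(data, json_path):
    """Write data as indented UTF-8 JSON."""
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2, ensure_ascii=False)


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "learning-graph.csv"
    csv_path.write_text(
        "ConceptID,ConceptLabel,Dependencies,TaxonomyID\n1,Graph Basics,,FOUND\n",
        encoding="utf-8",
    )
    try:
        rows = load_rows(csv_path)
    except (OSError, ValueError) as err:
        # Report the problem and stop instead of printing a traceback.
        sys.exit(f"Error reading {csv_path}: {err}")
    json_path = Path(tmp) / "learning-graph.json"
    write_json({"nodes": rows}, json_path)
    round_trip = json.loads(json_path.read_text(encoding="utf-8"))

print(f"Wrote {len(round_trip['nodes'])} node(s)")
```

Checking the header before processing any rows gives an early, specific error when a column such as TaxonomyID is missing.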
Python Scripts for Processing
The learning graph toolkit includes three core Python scripts, each focused on a specific processing task:
| Script | Input | Output | Purpose |
|---|---|---|---|
| analyze-graph.py | learning-graph.csv | quality-metrics.md | Validate structure, calculate quality score |
| csv-to-json.py | learning-graph.csv | learning-graph.json | Convert to vis-network format |
| taxonomy-distribution.py | learning-graph.csv | taxonomy-distribution.md | Analyze category balance |
All scripts follow similar architectural patterns:
- Argument parsing: Accept input/output filenames via command line
- File reading: Load CSV data with error handling
- Data validation: Check format, detect errors
- Processing: Perform core transformation or analysis
- Output generation: Write results to file
- Status reporting: Print summary to console
This consistency makes scripts easy to understand, maintain, and extend.
analyze-graph.py Script Implementation
The analyze-graph.py script performs comprehensive learning graph validation and quality analysis. Its implementation illustrates key graph algorithms and quality metric calculations.
Core functionality:
- CSV parsing: Reads four-column format, creates graph data structure
- Dependency parsing: Splits pipe-delimited dependencies into integer lists
- Graph construction: Builds adjacency list representation for traversal
- Cycle detection: DFS-based algorithm with three-color marking
- Connectivity analysis: Identifies disconnected components
- Metric calculation: Computes indegree, outdegree, chain lengths
- Quality scoring: Aggregates metrics into overall score
- Report generation: Outputs formatted Markdown
Key implementation details:
Cycle detection using DFS:
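A sketch of three-color DFS cycle detection, assuming the graph is held as a dict that maps each concept ID to the list of IDs it depends on. The find_cycle name and this representation are assumptions; analyze-graph.py may structure its traversal differently. The recursive form is adequate for graphs of a few hundred concepts.

```python
WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished


def find_cycle(graph):
    """Return one cycle as a list of node IDs, or None if the graph is acyclic."""
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path, so the path loops back.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        color[node] = BLACK
        path.pop()
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical dependency maps: the first is a valid DAG, the second is circular.
print(find_cycle({1: [], 2: [1], 3: [1, 2]}))
print(find_cycle({1: [3], 2: [1], 3: [2]}))
```

Reaching a gray node means a dependency leads back to a concept still being explored, which is exactly a circular prerequisite chain. Black nodes are fully explored and never need revisiting.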
Quality score calculation:
csv-to-json.py Script Implementation
The csv-to-json.py script transforms CSV learning graphs into vis-network JSON format. Its implementation demonstrates data format conversion and JSON schema construction.
Core functionality:
- CSV reading: Parses four-column format
- Nodes array construction: Creates objects with id, label, group
- Edges array construction: Parses dependencies, creates from/to objects
- Groups object construction: Defines color schemes for each TaxonomyID
- Metadata population: Adds Dublin Core fields
- JSON serialization: Outputs formatted vis-network JSON
Key implementation details:
Node creation:
Edge creation from dependencies:
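Node and edge construction can be sketched together, assuming each CSV row is a dict keyed by the four column names and that dependencies are pipe-delimited as described above. build_nodes_and_edges is a hypothetical helper, not the script's actual function.

```python
def build_nodes_and_edges(rows):
    """Convert CSV row dicts into vis-network style node and edge lists."""
    nodes, edges = [], []
    for row in rows:
        concept_id = int(row["ConceptID"])
        nodes.append({
            "id": concept_id,
            "label": row["ConceptLabel"].strip(),
            "group": row["TaxonomyID"].strip(),
        })
        # Each dependency is a prerequisite, so the edge runs from it to this concept.
        for dep in row["Dependencies"].split("|"):
            if dep.strip():
                edges.append({"from": int(dep), "to": concept_id})
    return nodes, edges


# Hypothetical rows as csv.DictReader would produce them.
sample_rows = [
    {"ConceptID": "1", "ConceptLabel": "Graph Basics", "Dependencies": "", "TaxonomyID": "FOUND"},
    {"ConceptID": "2", "ConceptLabel": "Nodes and Edges", "Dependencies": "1", "TaxonomyID": "FOUND"},
    {"ConceptID": "3", "ConceptLabel": "Graph Query Tools", "Dependencies": "1|2", "TaxonomyID": "TOOL"},
]
nodes, edges = build_nodes_and_edges(sample_rows)
print(nodes)
print(edges)
```

Skipping empty dependency strings handles concepts with no prerequisites, whose Dependencies cell is blank.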
Groups generation with color palette:
Complete JSON structure assembly:
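Group generation and final assembly can be sketched as follows. The palette reuses the rainbow colors listed earlier in this chapter and the metadata values come from the Dublin Core table. The function names, the fixed black font color, and the box shape are assumptions made for illustration.

```python
import json

# Rainbow palette from the color coding section of this chapter.
PALETTE = ["#FF6B6B", "#FFA94D", "#FFD43B", "#8CE99A",
           "#51CF66", "#74C0FC", "#4C6EF5", "#9775FA"]


def build_groups(nodes):
    """Assign one palette color to each distinct TaxonomyID, in sorted order."""
    tax_ids = sorted({node["group"] for node in nodes})
    return {
        tax_id: {
            "color": PALETTE[index % len(PALETTE)],
            "font": {"color": "#000000"},  # assumed; choose per background in practice
            "shape": "box",                # assumed default shape
        }
        for index, tax_id in enumerate(tax_ids)
    }


def assemble_graph(metadata, nodes, edges):
    """Combine the four sections into one JSON-serializable dict."""
    return {
        "metadata": metadata,
        "groups": build_groups(nodes),
        "nodes": nodes,
        "edges": edges,
    }


# Example values taken from the Dublin Core table in this chapter.
metadata = {
    "title": "Graph Databases Learning Graph",
    "creator": "Dr. Jane Smith",
    "date": "2024-09-15",
    "version": "1.2.0",
    "format": "vis-network JSON v9.1",
    "license": "CC-BY-4.0",
}
nodes = [
    {"id": 1, "label": "Graph Basics", "group": "FOUND"},
    {"id": 2, "label": "Graph Query Tools", "group": "TOOL"},
]
edges = [{"from": 1, "to": 2}]

graph = assemble_graph(metadata, nodes, edges)
print(json.dumps(graph, indent=2))
```

Sorting the TaxonomyIDs before assigning colors keeps the color of each category stable between runs, which supports the consistency principle discussed in the color coding section.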
Diagram: Python Learning Graph Processing Pipeline
MicroSim Generator Recommendations:
- mermaid-generator (93/100) - Pipeline flowcharts with sequential stages and decision points well-supported
- microsim-p5 (72/100) - Custom flowchart rendering with manual stage positioning and arrows
- vis-network (70/100) - Can model pipeline as directed graph with custom node shapes
Summary and Next Steps
This chapter provided comprehensive coverage of data formats and processing pipelines for learning graphs. You learned how the TaxonomyID field enables categorical organization and color-coded visualization, how the vis-network JSON format structures graph data for web-based visualization, and how Dublin Core metadata standards ensure proper documentation.
The Python scripting coverage demonstrated practical implementation patterns for graph validation, format conversion, and analysis. These scripts form a reusable toolkit that processes learning graph data from authoring through quality validation to visualization-ready JSON.
Key takeaways:
- TaxonomyID is the fourth column in learning graph CSV, providing categorical metadata
- vis-network JSON has four sections: metadata, groups, nodes, edges
- Dublin Core metadata ensures proper attribution, versioning, and licensing
- Color accessibility matters: Use WCAG contrast ratios for readable text
- Python scripts automate processing: Validation, conversion, and analysis in consistent pipelines
- Data flows CSV → validation → JSON → visualization: Each stage builds on the previous
With validated learning graphs converted to visualization-ready JSON format, you're prepared to deploy interactive graph viewers that enable students and instructors to explore concept dependencies visually. The next chapters will cover visualization implementation, chapter structure generation, and content creation workflows that transform your learning graph into a complete intelligent textbook.