Taxonomy Distribution Report
Overview
- Total Concepts: 480
- Number of Taxonomies: 14
- Average Concepts per Taxonomy: 34.3
Distribution Summary
| Category | TaxonomyID | Count | Percentage | Status |
|---|---|---|---|---|
| Pathways and Disease | PATH | 53 | 11.0% | ✅ |
| Graph Theory | GRTH | 52 | 10.8% | ✅ |
| Graph Databases | GRDB | 45 | 9.4% | ✅ |
| Knowledge Graphs | KNOW | 40 | 8.3% | ✅ |
| Sequence Analysis | SEQA | 34 | 7.1% | ✅ |
| Phylogenetics | PHYL | 34 | 7.1% | ✅ |
| Structural Bioinformatics | STRU | 34 | 7.1% | ✅ |
| Tools and Capstone | TOOL | 31 | 6.5% | ✅ |
| Foundation Concepts | FOUND | 30 | 6.2% | ✅ |
| Protein Interactions | PPIS | 29 | 6.0% | ✅ |
| Genomics | GENO | 29 | 6.0% | ✅ |
| Transcriptomics | TRNS | 29 | 6.0% | ✅ |
| Biological Databases | DBAS | 25 | 5.2% | ✅ |
| Data Formats | DFMT | 15 | 3.1% | ✅ |
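The Percentage column and the Overview figures can be re-derived from the Count column. The following is a minimal sketch, not the generator script's actual code; the variable names are illustrative and the counts are copied from the table above.

```python
# Counts copied from the Distribution Summary table in this report.
counts = {
    "PATH": 53, "GRTH": 52, "GRDB": 45, "KNOW": 40, "SEQA": 34,
    "PHYL": 34, "STRU": 34, "TOOL": 31, "FOUND": 30, "PPIS": 29,
    "GENO": 29, "TRNS": 29, "DBAS": 25, "DFMT": 15,
}

total = sum(counts.values())        # total number of concepts
average = total / len(counts)       # average concepts per taxonomy

# Share of each taxonomy as a percentage of all concepts.
percentages = {tid: 100 * n / total for tid, n in counts.items()}

print(f"Total: {total}, average per taxonomy: {average:.1f}")
for tid, pct in percentages.items():
    print(f"{tid}: {counts[tid]} concepts ({pct:.1f}%)")
```

Keeping the unrounded shares in `percentages` and rounding only when printing avoids compounding rounding error if the values are reused later.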
Visual Distribution
[Chart: distribution of concepts across the 14 taxonomies]
Balance Analysis
✅ No Over-Represented Categories
All 14 categories are under the 30% threshold. The largest, Pathways and Disease (PATH), accounts for 11.0% of concepts.
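The balance check can be sketched as a filter over category shares. The 30% threshold comes from this report. The function name `over_represented` and the choice to flag a share of exactly 30% are assumptions, since the generator script's logic is not shown here.

```python
# Counts copied from the Distribution Summary table in this report.
counts = {
    "PATH": 53, "GRTH": 52, "GRDB": 45, "KNOW": 40, "SEQA": 34,
    "PHYL": 34, "STRU": 34, "TOOL": 31, "FOUND": 30, "PPIS": 29,
    "GENO": 29, "TRNS": 29, "DBAS": 25, "DFMT": 15,
}

# The 30% figure is stated in the report; flagging a share of exactly
# 30% (>= rather than >) is an assumption made for this sketch.
THRESHOLD_PCT = 30.0


def over_represented(counts, threshold_pct=THRESHOLD_PCT):
    """Return {taxonomy_id: share} for categories at or above the threshold."""
    total = sum(counts.values())
    shares = {tid: 100 * n / total for tid, n in counts.items()}
    return {tid: pct for tid, pct in shares.items() if pct >= threshold_pct}


flagged = over_represented(counts)
print("Over-represented:", flagged or "none")
```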
Category Details
Pathways and Disease (PATH)
Count: 53 concepts (11.0%)
Concepts:
- Metabolic Network
- Metabolite
- Enzyme
- Enzyme Kinetics
- Metabolic Pathway
- Bipartite Metabolic Graph
- KEGG Pathways
- Reactome Pathways
- BioCyc Pathways
- Flux Balance Analysis
- Constraint-Based Modeling
- Stoichiometric Matrix
- Objective Function
- Metabolic Flux
- Genome-Scale Model
- ...and 38 more
Graph Theory (GRTH)
Count: 52 concepts (10.8%)
Concepts:
- Graph Theory
- Nodes and Edges
- Directed Graphs
- Undirected Graphs
- Weighted Graphs
- Bipartite Graphs
- Labeled Property Graph
- Multigraph
- Hypergraph
- Subgraph
- Graph Properties
- Degree Distribution
- In-Degree
- Out-Degree
- Clustering Coefficient
- ...and 37 more
Graph Databases (GRDB)
Count: 45 concepts (9.4%)
Concepts:
- Graph Database
- Relational Database
- Graph vs Relational Model
- Neo4j
- Memgraph
- Cypher Query Language
- GQL Query Language
- MATCH Clause
- WHERE Clause
- RETURN Clause
- CREATE Clause
- MERGE Clause
- Graph Pattern Matching
- Variable-Length Paths
- Path Queries
- ...and 30 more
Knowledge Graphs (KNOW)
Count: 40 concepts (8.3%)
Concepts:
- Knowledge Graph
- Biomedical Knowledge Graph
- Gene Ontology
- GO Molecular Function
- GO Biological Process
- GO Cellular Component
- GO Term Enrichment
- Disease Ontology
- Human Phenotype Ontology
- Ontology Structure
- Ontology Reasoning
- Semantic Similarity
- Heterogeneous Data
- Data Integration
- Schema Mapping
- ...and 25 more
Sequence Analysis (SEQA)
Count: 34 concepts (7.1%)
Concepts:
- Sequence Alignment
- Pairwise Alignment
- Global Alignment
- Local Alignment
- Smith-Waterman Algorithm
- Needleman-Wunsch Algorithm
- Dynamic Programming
- Scoring Matrices
- BLOSUM Matrix
- PAM Matrix
- Substitution Model
- Gap Penalties
- Affine Gap Penalty
- BLAST
- BLAST E-Value
- ...and 19 more
Phylogenetics (PHYL)
Count: 34 concepts (7.1%)
Concepts:
- Phylogenetic Tree
- Phylogenetics
- Molecular Phylogenetics
- Distance Matrix
- Neighbor-Joining Method
- UPGMA Method
- Maximum Parsimony
- Maximum Likelihood Method
- Bayesian Inference
- Markov Chain Monte Carlo
- Bootstrap Analysis
- Branch Support Values
- Molecular Clock
- Substitution Rate
- Trees as DAGs
- ...and 19 more
Structural Bioinformatics (STRU)
Count: 34 concepts (7.1%)
Concepts:
- Primary Structure
- Secondary Structure
- Alpha Helix
- Beta Sheet
- Tertiary Structure
- Quaternary Structure
- Protein Folding
- Protein Folding Problem
- Homology Modeling
- Threading
- Ab Initio Prediction
- AlphaFold
- AlphaFold Database
- Protein Contact Map
- Contact Map as Graph
- ...and 19 more
Tools and Capstone (TOOL)
Count: 31 concepts (6.5%)
Concepts:
- Python for Bioinformatics
- Biopython
- NetworkX
- Pandas for Bioinformatics
- Scikit-Learn
- Jupyter Notebooks
- Matplotlib
- Seaborn
- Neo4j Python Driver
- Cytoscape API
- Data Wrangling
- Reproducible Analysis
- Version Control for Science
- Workflow Managers
- Conda Environments
- ...and 16 more
Foundation Concepts (FOUND)
Count: 30 concepts (6.2%)
Concepts:
- Bioinformatics
- Computational Biology
- Central Dogma
- DNA Structure
- RNA Structure
- Protein Structure
- Amino Acids
- Nucleotides
- Codons
- Gene
- Genome
- Transcription
- Translation
- Gene Expression
- Sequence Data
- ...and 15 more
Protein Interactions (PPIS)
Count: 29 concepts (6.0%)
Concepts:
- Protein Interaction Network
- Interactome
- Yeast Two-Hybrid
- Co-Immunoprecipitation
- Affinity Purification MS
- Cross-Linking Mass Spec
- PPI Confidence Scoring
- Binary vs Complex PPIs
- Network Hubs
- Network Bottlenecks
- Network Modules
- Hub-and-Spoke Topology
- Date Hubs vs Party Hubs
- Essential Proteins
- Protein Complex Detection
- ...and 14 more
Genomics (GENO)
Count: 29 concepts (6.0%)
Concepts:
- Genome Assembly
- De Bruijn Graph
- K-mer
- K-mer Spectrum
- Contig
- Scaffold
- N50 Metric
- Assembly Quality Metrics
- Reference Genome
- Reference Bias
- Pangenome
- Pangenome Graph
- Variation Graph
- VG Toolkit
- Read Mapping to Graphs
- ...and 14 more
Transcriptomics (TRNS)
Count: 29 concepts (6.0%)
Concepts:
- Transcriptome
- RNA-Seq Pipeline
- Read Quality Trimming
- Read Alignment
- Transcript Quantification
- Differential Expression
- Fold Change
- Statistical Testing for DE
- False Discovery Rate
- Transcription Factor
- Promoter Region
- Enhancer Region
- Cis-Regulatory Element
- Operon
- Gene Regulatory Network
- ...and 14 more
Biological Databases (DBAS)
Count: 25 concepts (5.2%)
Concepts:
- Biological Databases
- NCBI
- GenBank Database
- UniProt
- Swiss-Prot
- TrEMBL
- Protein Data Bank
- Ensembl
- KEGG Database
- Reactome Database
- BioGRID Database
- STRING Database
- IntAct Database
- COSMIC Database
- Gene Ontology Database
- ...and 10 more
Data Formats (DFMT)
Count: 15 concepts (3.1%)
Concepts:
- FASTA Format
- FASTQ Format
- GenBank Format
- GFF3 Format
- OWL Format
- PDB File Format
- VCF Format
- SAM and BAM Format
- BED Format
- SBML Format
- BioPAX Format
- CSV for Bioinformatics
- JSON for Bioinformatics
- Data Format Conversion
- Data Quality Control
Recommendations
- ✅ Excellent balance: Category shares range from 3.1% (DFMT) to 11.0% (PATH), a spread of 7.9 percentage points
- ✅ No MISC category: All 480 concepts are assigned to a specific taxonomy, which indicates good categorization specificity
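The spread figure is consistent with the difference between the largest and smallest category shares. The sketch below assumes that definition, which this report does not state explicitly, and also checks for a catch-all `MISC` taxonomy; the variable names are illustrative.

```python
# Counts copied from the Distribution Summary table in this report.
counts = {
    "PATH": 53, "GRTH": 52, "GRDB": 45, "KNOW": 40, "SEQA": 34,
    "PHYL": 34, "STRU": 34, "TOOL": 31, "FOUND": 30, "PPIS": 29,
    "GENO": 29, "TRNS": 29, "DBAS": 25, "DFMT": 15,
}

total = sum(counts.values())
shares = {tid: 100 * n / total for tid, n in counts.items()}

# Assumed definition: spread = largest share minus smallest share,
# expressed in percentage points.
largest = max(shares, key=shares.get)
smallest = min(shares, key=shares.get)
spread = shares[largest] - shares[smallest]

# A catch-all MISC taxonomy would show up as a key in the counts.
has_misc = "MISC" in counts

print(f"Largest: {largest} ({shares[largest]:.1f}%)")
print(f"Smallest: {smallest} ({shares[smallest]:.1f}%)")
print(f"Spread: {spread:.1f} percentage points, MISC present: {has_misc}")
```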
Educational Use Recommendations
- Use taxonomy categories for color-coding in graph visualizations
- Design curriculum modules based on taxonomy groupings
- Create filtered views for focused learning paths
- Use categories for assessment organization
- Enable navigation by topic area in interactive tools
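As one way to approach the color-coding suggestion above, each TaxonomyID can be mapped to a distinct hex color by spacing hues evenly around the color wheel. This is only a sketch: the helper name `taxonomy_colors` and the saturation and brightness values are arbitrary assumptions, not part of the report tooling.

```python
import colorsys

# TaxonomyIDs copied from the Distribution Summary table in this report.
taxonomy_ids = [
    "PATH", "GRTH", "GRDB", "KNOW", "SEQA", "PHYL", "STRU",
    "TOOL", "FOUND", "PPIS", "GENO", "TRNS", "DBAS", "DFMT",
]


def taxonomy_colors(ids, saturation=0.6, value=0.9):
    """Map each ID to a hex color with evenly spaced hues (hypothetical helper)."""
    colors = {}
    for i, tid in enumerate(ids):
        hue = i / len(ids)  # evenly spaced hue in [0, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        colors[tid] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colors


colors = taxonomy_colors(taxonomy_ids)
for tid, hex_color in colors.items():
    print(tid, hex_color)
```

The resulting dictionary can be passed to whichever visualization tool is in use, for example as a per-node color lookup keyed on TaxonomyID.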
Report generated by learning-graph-reports/taxonomy_distribution.py