Taxonomy Distribution Report

Overview

Total Concepts: 475
Number of Taxonomies: 15
Average Concepts per Taxonomy: 31.7

Distribution Summary

Category	TaxonomyID	Count	Percentage	Status
OBS	OBS	65	13.7%	✅
Foundation Concepts - Prerequisites	FOUND	50	10.5%	✅
OPT	OPT	45	9.5%	✅
RAG	RAG	40	8.4%	✅
BUDG	BUDG	35	7.4%	✅
ROUT	ROUT	30	6.3%	✅
HARN	HARN	28	5.9%	✅
ECON	ECON	27	5.7%	✅
ANTH	ANTH	25	5.3%	✅
OAI	OAI	25	5.3%	✅
SKIL	SKIL	25	5.3%	✅
AB	AB	25	5.3%	✅
GOOG	GOOG	20	4.2%	✅
CAP	CAP	20	4.2%	✅
PRIV	PRIV	15	3.2%	✅
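
The Percentage column is consistent with each category's count divided by the total concept count, rounded to one decimal place. A minimal sketch of that arithmetic follows; the `percentages` helper is a hypothetical name for illustration and is not taken from taxonomy_distribution.py.

```python
# Counts copied from the Distribution Summary table, keyed by TaxonomyID.
counts = {
    "OBS": 65, "FOUND": 50, "OPT": 45, "RAG": 40, "BUDG": 35,
    "ROUT": 30, "HARN": 28, "ECON": 27, "ANTH": 25, "OAI": 25,
    "SKIL": 25, "AB": 25, "GOOG": 20, "CAP": 20, "PRIV": 15,
}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Hypothetical helper: each category's share of the total, in percent."""
    total = sum(counts.values())
    return {tid: round(100 * n / total, 1) for tid, n in counts.items()}


shares = percentages(counts)
total_concepts = sum(counts.values())
average_per_taxonomy = round(total_concepts / len(counts), 1)
```

With the table's counts, `total_concepts` and `average_per_taxonomy` reproduce the Overview figures (475 and 31.7).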

Visual Distribution

OBS                       ██████  65 ( 13.7%)
Foundation Concepts - Pre █████  50 ( 10.5%)
OPT                       ████  45 (  9.5%)
RAG                       ████  40 (  8.4%)
BUDG                      ███  35 (  7.4%)
ROUT                      ███  30 (  6.3%)
HARN                      ██  28 (  5.9%)
ECON                      ██  27 (  5.7%)
ANTH                      ██  25 (  5.3%)
OAI                       ██  25 (  5.3%)
SKIL                      ██  25 (  5.3%)
AB                        ██  25 (  5.3%)
GOOG                      ██  20 (  4.2%)
CAP                       ██  20 (  4.2%)
PRIV                      █  15 (  3.2%)
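
The chart above can be reproduced with a short sketch. It assumes labels are truncated and padded to 25 characters and that each block stands for 10 concepts, rounded down; both rules are inferred from the chart itself, not confirmed against taxonomy_distribution.py, and `render_bars` is a hypothetical name.

```python
def render_bars(counts: dict[str, int], label_width: int = 25,
                per_block: int = 10) -> list[str]:
    """Hypothetical helper: render one text bar per category."""
    total = sum(counts.values())
    lines = []
    for name, n in counts.items():
        # Truncate long names, then pad so every bar starts in the same column.
        label = name[:label_width].ljust(label_width)
        # Assumption: one block per 10 concepts, using floor division.
        bar = "█" * (n // per_block)
        pct = 100 * n / total
        lines.append(f"{label} {bar}  {n} ({pct:5.1f}%)")
    return lines


# Category names and counts copied from the Distribution Summary table.
counts = {
    "OBS": 65, "Foundation Concepts - Prerequisites": 50, "OPT": 45,
    "RAG": 40, "BUDG": 35, "ROUT": 30, "HARN": 28, "ECON": 27,
    "ANTH": 25, "OAI": 25, "SKIL": 25, "AB": 25, "GOOG": 20,
    "CAP": 20, "PRIV": 15,
}
lines = render_bars(counts)
print("\n".join(lines))
```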

Balance Analysis

✅ No Over-Represented Categories

All 15 categories are under the 30% threshold. The largest, OBS, accounts for 13.7% of concepts.
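
The check behind this result can be sketched as a filter over category shares. The 30% threshold comes from the report; the strict `>` comparison and the `over_represented` name are assumptions made for illustration.

```python
OVER_REPRESENTED_THRESHOLD = 30.0  # percent, as stated in the report


def over_represented(counts: dict[str, int],
                     threshold: float = OVER_REPRESENTED_THRESHOLD) -> list[str]:
    """Hypothetical helper: IDs of categories whose share exceeds the threshold."""
    total = sum(counts.values())
    # Assumption: a category is flagged only when strictly above the threshold.
    return [tid for tid, n in counts.items() if 100 * n / total > threshold]


# Counts copied from the Distribution Summary table, keyed by TaxonomyID.
counts = {
    "OBS": 65, "FOUND": 50, "OPT": 45, "RAG": 40, "BUDG": 35,
    "ROUT": 30, "HARN": 28, "ECON": 27, "ANTH": 25, "OAI": 25,
    "SKIL": 25, "AB": 25, "GOOG": 20, "CAP": 20, "PRIV": 15,
}
flagged = over_represented(counts)
```

For the counts in this report, `flagged` is an empty list, which matches the result stated above.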

Category Details

OBS (OBS)

Count: 65 concepts (13.7%)

Concepts:

1. Structured Logging
2. Log Schema Design
3. Log Line
4. JSON Log Format
5. Log Field
6. Required Log Field
7. Optional Log Field
8. Model Field
9. Prompt Hash
10. Input Token Field
11. Output Token Field
12. Cached Token Field
13. Latency Field
14. Cost Field
15. Feature Tag
...and 50 more

Foundation Concepts - Prerequisites (FOUND)

Count: 50 concepts (10.5%)

Concepts:

1. Generative AI
2. Large Language Model
3. Foundation Model
4. Transformer Architecture
5. Autoregressive Generation
6. Token
7. Input Token
8. Output Token
9. Cached Token
10. Reasoning Token
11. Token Count
12. Tokenizer
13. Byte Pair Encoding
14. SentencePiece
15. Vocabulary Size
...and 35 more

OPT (OPT)

Count: 45 concepts (9.5%)

Concepts:

1. Prompt Engineering
2. System Prompt Hygiene
3. Instruction Compression
4. Few-Shot Example
5. Few-Shot Pruning
6. Zero-Shot Prompting
7. Chain Of Thought
8. Dead Context
9. Redundant Instruction
10. Verbose Boilerplate
11. Prompt Template
12. Template Versioning
13. Prompt Variable
14. Variable Interpolation
15. Prompt Compression Tool
...and 30 more

RAG (RAG)

Count: 40 concepts (8.4%)

Concepts:

1. Retrieval Augmented Generation
2. Embedding
3. Vector Database
4. Chunking
5. Chunk Size
6. Chunk Overlap
7. Top-K Retrieval
8. Reranker
9. Cross-Encoder Reranker
10. Retrieval Score
11. Context Injection
12. Retrieved Context Bloat
13. Context Pruning
14. Hybrid Retrieval
15. BM25 Retrieval
...and 25 more

BUDG (BUDG)

Count: 35 concepts (7.4%)

Concepts:

1. Agent Budget Policy
2. Per-Session Token Budget
3. Per-Session Tool Call Budget
4. Loop Iteration Limit
5. Wall Clock Limit
6. Cost Cap
7. Graceful Degradation
8. Budget Exhaustion Handling
9. Runaway Detection
10. Circuit Breaker Pattern
11. Tool Call Throttling
12. Subtask Budget Allocation
13. Budget Audit Log
14. Budget Reporting
15. Per-Engineer Budget
...and 20 more

ROUT (ROUT)

Count: 30 concepts (6.3%)

Concepts:

1. Model Routing
2. Cheap-First Cascade
3. Escalation Trigger
4. Confidence Threshold
5. Quality Gate
6. Fallback Model
7. Cross-Vendor Routing
8. Task Classifier
9. Difficulty Estimation
10. Routing Policy
11. Routing Cost Savings
12. Routing Quality Risk
13. Per-Task Model Selection
14. Vendor Lock-In Risk
15. Vendor-Neutral Abstraction
...and 15 more

HARN (HARN)

Count: 28 concepts (5.9%)

Concepts:

1. AI Coding Harness
2. Agentic Loop
3. Tool Use Loop
4. Claude Code
5. Claude Code Session
6. Claude Code Hooks
7. OpenAI Codex CLI
8. Codex Session
9. Google Antigravity
10. Antigravity Workspace
11. Harness System Prompt
12. Harness Token Overhead
13. Session Token Accumulation
14. Per-Session Token Cost
15. Conversation Compaction
...and 13 more

ECON (ECON)

Count: 27 concepts (5.7%)

Concepts:

1. Per-Million-Token Price
2. Input Token Price
3. Output Token Price
4. Cached Input Price
5. Output Premium
6. Unit Economics
7. Cost Per Request
8. Cost Per Feature
9. Cost Per User
10. Cost Per Outcome
11. Cost Attribution
12. Token Budget
13. Monthly Token Spend
14. Forecasting Token Cost
15. Cost-Quality Tradeoff
...and 12 more

ANTH (ANTH)

Count: 25 concepts (5.3%)

Concepts:

1. Anthropic API
2. Claude Messages API
3. Claude Model Family
4. Claude Opus
5. Claude Sonnet
6. Claude Haiku
7. Anthropic SDK
8. API Key Management
9. Anthropic Prompt Caching
10. Cache Control Parameter
11. Cache Breakpoint
12. Cache TTL
13. Cache Read Tokens
14. Cache Write Tokens
15. Extended Thinking
...and 10 more

OAI (OAI)

Count: 25 concepts (5.3%)

Concepts:

1. OpenAI API
2. Chat Completions API
3. OpenAI Responses API
4. OpenAI Model Family
5. GPT Model Series
6. OpenAI O Series
7. Reasoning Model
8. OpenAI SDK
9. Function Calling
10. Tool Choice Parameter
11. JSON Mode
12. Structured Outputs
13. Response Format
14. OpenAI Streaming
15. OpenAI Batch API
...and 10 more

SKIL (SKIL)

Count: 25 concepts (5.3%)

Concepts:

1. Skill
2. Skill Description
3. Skill Body
4. Skill Trigger
5. Skill Invocation
6. Skill Frontmatter
7. Skill Bundle
8. Bundled Script
9. Skill Asset File
10. Lazy Skill Loading
11. Eager Skill Listing
12. Task Decomposition
13. Task-Skill Binding
14. Skill Selection
15. Skill Misfire
...and 10 more

AB (AB)

Count: 25 concepts (5.3%)

Concepts:

1. A/B Testing
2. Hypothesis
3. Null Hypothesis
4. Control Group
5. Treatment Group
6. Traffic Split
7. Random Assignment
8. Stratified Assignment
9. Primary Metric
10. Guardrail Metric
11. Quality Metric
12. Cost Metric
13. Latency Metric
14. Satisfaction Metric
15. Sample Size Calculation
...and 10 more

GOOG (GOOG)

Count: 20 concepts (4.2%)

Concepts:

1. Google Gemini API
2. Gemini Model Family
3. Gemini Pro
4. Gemini Flash
5. Gemini Ultra
6. Gemini SDK
7. Long Context Window
8. One Million Context
9. Gemini Function Calling
10. Gemini Tool Config
11. Gemini Streaming
12. Gemini Batch Mode
13. Gemini Caching
14. Vertex AI
15. Google AI Studio
...and 5 more

CAP (CAP)

Count: 20 concepts (4.2%)

Concepts:

1. Baseline Cost Measurement
2. Optimization Hypothesis
3. Quality Regression Detection
4. Before-After Report
5. Optimization Backlog
6. Cost Reduction Target
7. Pilot Rollout
8. Canary Deployment
9. Token Dashboard Project
10. Vendor-Neutral Logging Project
11. Skill Refactor Project
12. Budget Policy Document
13. Engineering Manager Review
14. Cost Reduction Postmortem
15. Reproducible Benchmark
...and 5 more

PRIV (PRIV)

Count: 15 concepts (3.2%)

Concepts:

1. Data Privacy
2. PII Detection
3. Sensitive Field Redaction
4. Compliance Risk
5. GDPR
6. HIPAA
7. SOC2 Audit
8. Data Residency
9. Vendor Data Retention
10. Opt-Out Of Training
11. Logging Privacy Risk
12. Hashing Sensitive Strings
13. Tokenized Identifier
14. Audit Trail
15. Anonymization Strategy

Recommendations

✅ Excellent balance: Categories are evenly distributed (spread: 10.5 percentage points, from OBS at 13.7% down to PRIV at 3.2%)
✅ No MISC category: All 475 concepts are assigned to a specific taxonomy, which indicates good categorization specificity
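
The spread figure is consistent with the difference between the largest and smallest category shares. A minimal sketch under that assumption follows; the report does not define the metric explicitly, and the `spread` helper is a hypothetical name.

```python
def spread(counts: dict[str, int]) -> float:
    """Hypothetical helper: largest share minus smallest share, in percentage points."""
    total = sum(counts.values())
    shares = [100 * n / total for n in counts.values()]
    return round(max(shares) - min(shares), 1)


# Counts copied from the Distribution Summary table, keyed by TaxonomyID.
counts = {
    "OBS": 65, "FOUND": 50, "OPT": 45, "RAG": 40, "BUDG": 35,
    "ROUT": 30, "HARN": 28, "ECON": 27, "ANTH": 25, "OAI": 25,
    "SKIL": 25, "AB": 25, "GOOG": 20, "CAP": 20, "PRIV": 15,
}
balance_spread = spread(counts)
has_misc = "MISC" in counts
```

For the counts in this report, `balance_spread` matches the 10.5 figure quoted above and `has_misc` is false.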

Educational Use Recommendations

Use taxonomy categories for color-coding in graph visualizations
Design curriculum modules based on taxonomy groupings
Create filtered views for focused learning paths
Use categories for assessment organization
Enable navigation by topic area in interactive tools

Report generated by learning-graph-reports/taxonomy_distribution.py