Taxonomy Distribution Report
Overview
- Total Concepts: 300
- Number of Taxonomies: 15
- Average Concepts per Taxonomy: 20.0
Distribution Summary
| Category | TaxonomyID | Count | Percentage | Status |
|---|---|---|---|---|
| Statistics | STATS | 30 | 10.0% | OK |
| Advanced Regression | ADVR | 30 | 10.0% | OK |
| Visualization | VIZ | 25 | 8.3% | OK |
| Regression | REGR | 25 | 8.3% | OK |
| Model Evaluation | EVAL | 25 | 8.3% | OK |
| Data Structures | DSTRC | 20 | 6.7% | OK |
| Data Cleaning | CLEAN | 20 | 6.7% | OK |
| Machine Learning | ML | 20 | 6.7% | OK |
| Neural Networks | NN | 20 | 6.7% | OK |
| PyTorch | TORCH | 20 | 6.7% | OK |
| Foundation Concepts | FOUND | 19 | 6.3% | OK |
| Python Environment | PYENV | 15 | 5.0% | OK |
| NumPy Computing | NUMPY | 15 | 5.0% | OK |
| Best Practices | BEST | 11 | 3.7% | OK |
| Projects | PROJ | 5 | 1.7% | Under |
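The Count and Percentage columns, along with the Overview totals, can be cross-checked in a few lines of Python. This is a minimal sketch, assuming each percentage is the count divided by the total and rounded to one decimal place; the names `counts` and `percentage` are illustrative and are not taken from taxonomy-distribution.py.

```python
# Counts copied from the Distribution Summary table in this report.
counts = {
    "STATS": 30, "ADVR": 30, "VIZ": 25, "REGR": 25, "EVAL": 25,
    "DSTRC": 20, "CLEAN": 20, "ML": 20, "NN": 20, "TORCH": 20,
    "FOUND": 19, "PYENV": 15, "NUMPY": 15, "BEST": 11, "PROJ": 5,
}

total = sum(counts.values())   # Total Concepts
average = total / len(counts)  # Average Concepts per Taxonomy


def percentage(count: int, total: int) -> float:
    """Share of all concepts in percent, rounded to one decimal (assumed rule)."""
    return round(100 * count / total, 1)


percentages = {tid: percentage(n, total) for tid, n in counts.items()}

# Print one line per taxonomy: ID, count, percentage.
for tid, pct in percentages.items():
    print(f"{tid:<6} {counts[tid]:>3} {pct:>5.1f}%")
```

Under that rounding assumption, the computed values agree with the table, including the Overview figures of 300 concepts across 15 taxonomies.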
Visual Distribution
[Chart not recoverable: 15 rows, one per taxonomy]
Balance Analysis
No Over-Represented Categories
All categories are under the 30% threshold; the largest, Statistics and Advanced Regression, hold 10.0% each. Good balance!
Under-Represented Categories (<3%)
- Projects (PROJ): 5 concepts (1.7%)
- Note: Small categories are acceptable for specialized topics
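The two thresholds used in this analysis (30% for over-representation, 3% for under-representation) amount to a simple classification rule. The sketch below is an assumption about how the Status column could be derived: the label "Over" and the strictness of the comparisons are hypothetical, since the report only shows "OK" and "Under".

```python
OVER_THRESHOLD = 30.0   # percent, from "under the 30% threshold"
UNDER_THRESHOLD = 3.0   # percent, from "Under-Represented Categories (<3%)"


def status(pct: float) -> str:
    """Classify a category by its share of all concepts."""
    if pct > OVER_THRESHOLD:
        return "Over"   # hypothetical label; no category in this report reaches it
    if pct < UNDER_THRESHOLD:
        return "Under"
    return "OK"


# A few percentages taken from the Distribution Summary table.
examples = {"STATS": 10.0, "BEST": 3.7, "PROJ": 1.7}
for tid, pct in examples.items():
    print(tid, status(pct))
```

With these rules, only PROJ falls below the 3% line, which agrees with the single "Under" row in the table.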
Category Details
Statistics (STATS)
Count: 30 concepts (10.0%)
Concepts:
- Descriptive Statistics
- Mean
- Median
- Mode
- Range
- Variance
- Standard Deviation
- Quartiles
- Percentiles
- Interquartile Range
- Skewness
- Kurtosis
- Distribution
- Normal Distribution
- Probability
- ...and 15 more
Advanced Regression (ADVR)
Count: 30 concepts (10.0%)
Concepts:
- Multiple Linear Regression
- Multiple Predictors
- Multicollinearity
- Variance Inflation Factor
- Feature Selection
- Forward Selection
- Backward Elimination
- Stepwise Selection
- Categorical Variables
- Dummy Variables
- One-Hot Encoding
- Interaction Terms
- Polynomial Features
- Feature Engineering
- Feature Importance
- ...and 15 more
Visualization (VIZ)
Count: 25 concepts (8.3%)
Concepts:
- Data Visualization
- Matplotlib Library
- Figure
- Axes
- Plot Function
- Line Plot
- Scatter Plot
- Bar Chart
- Histogram
- Box Plot
- Pie Chart
- Subplot
- Figure Size
- Title
- Axis Labels
- ...and 10 more
Regression (REGR)
Count: 25 concepts (8.3%)
Concepts:
- Regression Analysis
- Linear Regression
- Simple Linear Regression
- Regression Line
- Slope
- Intercept
- Least Squares Method
- Residuals
- Sum of Squared Errors
- Ordinary Least Squares
- Regression Coefficients
- Coefficient Interpretation
- Prediction
- Fitted Values
- Regression Equation
- ...and 10 more
Model Evaluation (EVAL)
Count: 25 concepts (8.3%)
Concepts:
- Model Performance
- Training Data
- Testing Data
- Train Test Split
- Validation Data
- R-Squared
- Adjusted R-Squared
- Mean Squared Error
- Root Mean Squared Error
- Mean Absolute Error
- Residual Analysis
- Residual Plot
- Overfitting
- Underfitting
- Bias
- ...and 10 more
Data Structures (DSTRC)
Count: 20 concepts (6.7%)
Concepts:
- Lists
- Dictionaries
- Tuples
- Arrays
- Pandas Library
- DataFrame
- Series
- Index
- Column
- Row
- Data Loading
- CSV Files
- Read CSV
- Data Inspection
- Head Method
- ...and 5 more
Data Cleaning (CLEAN)
Count: 20 concepts (6.7%)
Concepts:
- Missing Values
- NaN
- Null Detection
- Dropna Method
- Fillna Method
- Imputation
- Data Type Conversion
- Duplicate Detection
- Duplicate Removal
- Outliers
- Outlier Detection
- Data Validation
- String Cleaning
- Column Renaming
- Data Filtering
- ...and 5 more
Machine Learning (ML)
Count: 20 concepts (6.7%)
Concepts:
- Machine Learning
- Supervised Learning
- Unsupervised Learning
- Classification
- Clustering
- Training Process
- Learning Algorithm
- Model Training
- Generalization
- Training Error
- Test Error
- Prediction Error
- Loss Function
- Cost Function
- Optimization
- ...and 5 more
Neural Networks (NN)
Count: 20 concepts (6.7%)
Concepts:
- Neural Networks
- Artificial Neuron
- Perceptron
- Activation Function
- Sigmoid Function
- ReLU Function
- Input Layer
- Hidden Layer
- Output Layer
- Weights
- Biases
- Forward Propagation
- Backpropagation
- Deep Learning
- Network Architecture
- ...and 5 more
PyTorch (TORCH)
Count: 20 concepts (6.7%)
Concepts:
- PyTorch Library
- Tensors
- Tensor Operations
- Autograd
- Automatic Differentiation
- Computational Graph
- Neural Network Module
- Sequential Model
- Linear Layer
- Loss Functions PyTorch
- Optimizer
- SGD Optimizer
- Adam Optimizer
- Training Loop
- Model Evaluation PyTorch
- ...and 5 more
Foundation Concepts (FOUND)
Count: 19 concepts (6.3%)
Concepts:
- Data Science
- Python Programming
- Data
- Variables
- Data Types
- Numerical Data
- Categorical Data
- Ordinal Data
- Nominal Data
- Measurement Scales
- Independent Variable
- Dependent Variable
- Dataset
- Observation
- Feature
- ...and 4 more
Python Environment (PYENV)
Count: 15 concepts (5.0%)
Concepts:
- Jupyter Notebooks
- Python Installation
- Package Management
- Pip
- Conda Environment
- Virtual Environment
- IDE Setup
- VS Code
- Notebook Cells
- Code Cell
- Markdown Cell
- Cell Execution
- Kernel
- Import Statement
- Python Libraries
NumPy Computing (NUMPY)
Count: 15 concepts (5.0%)
Concepts:
- NumPy Library
- NumPy Array
- Array Creation
- Array Shape
- Array Indexing
- Array Slicing
- Broadcasting
- Vectorized Operations
- Element-wise Operations
- Matrix Operations
- Dot Product
- Matrix Multiplication
- Transpose
- Linear Algebra
- Computational Efficiency
Best Practices (BEST)
Count: 11 concepts (3.7%)
Concepts:
- Documentation
- Explainable AI
- Model Interpretability
- Feature Importance Analysis
- SHAP Values
- Model Documentation
- Reproducibility
- Random Seed
- Version Control
- Git
- Data Ethics
Projects (PROJ)
Count: 5 concepts (1.7%)
Concepts:
- Capstone Project
- End-to-End Pipeline
- Model Deployment
- Results Communication
- Data-Driven Decisions
Recommendations
- Excellent balance: Categories are evenly distributed (spread: 8.3 percentage points between the largest and smallest category)
- MISC category minimal: No concepts fall into a catch-all category, which indicates good categorization specificity
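The spread figure of 8.3% quoted in the recommendations is consistent with the gap between the largest and smallest category shares. The sketch below reproduces it under that assumption; the report does not define "spread" explicitly, so the definition used here (maximum share minus minimum share) is a guess that happens to match.

```python
# Counts copied from the Distribution Summary table in this report.
counts = {
    "STATS": 30, "ADVR": 30, "VIZ": 25, "REGR": 25, "EVAL": 25,
    "DSTRC": 20, "CLEAN": 20, "ML": 20, "NN": 20, "TORCH": 20,
    "FOUND": 19, "PYENV": 15, "NUMPY": 15, "BEST": 11, "PROJ": 5,
}

total = sum(counts.values())
shares = [100 * n / total for n in counts.values()]  # percent of all concepts

# Assumed definition: spread = largest share minus smallest share.
spread = round(max(shares) - min(shares), 1)
print(f"spread: {spread}%")
```

Here the largest share is 10.0% (STATS and ADVR) and the smallest is about 1.7% (PROJ), so the difference rounds to 8.3.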
Educational Use Recommendations
- Use taxonomy categories for color-coding in graph visualizations
- Design curriculum modules based on taxonomy groupings
- Create filtered views for focused learning paths
- Use categories for assessment organization
- Enable navigation by topic area in interactive tools
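As an illustration of the color-coding suggestion, the sketch below assigns each TaxonomyID a distinct hex color by spacing hues evenly around the color wheel. The function name `make_palette` and the saturation and brightness values are hypothetical choices for this example; any palette with 15 distinguishable colors would serve the same purpose.

```python
import colorsys

# TaxonomyIDs copied from the Distribution Summary table in this report.
taxonomy_ids = [
    "STATS", "ADVR", "VIZ", "REGR", "EVAL", "DSTRC", "CLEAN", "ML",
    "NN", "TORCH", "FOUND", "PYENV", "NUMPY", "BEST", "PROJ",
]


def make_palette(ids, saturation=0.6, value=0.9):
    """Map each ID to a hex color with evenly spaced hues (illustrative choice)."""
    palette = {}
    for i, tid in enumerate(ids):
        hue = i / len(ids)  # fraction of the way around the color wheel
        r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
        palette[tid] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return palette


palette = make_palette(taxonomy_ids)
for tid, color in palette.items():
    print(tid, color)
```

The resulting mapping could then be passed to whatever graph tool renders the concepts, with each node colored by its TaxonomyID.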
Report generated by taxonomy-distribution.py