Taxonomy Distribution Report

Overview

  • Total Concepts: 300
  • Number of Taxonomies: 15
  • Average Concepts per Taxonomy: 20.0

Distribution Summary

Category             TaxonomyID  Count  Percentage  Status
Statistics           STATS          30       10.0%  OK
Advanced Regression  ADVR           30       10.0%  OK
Visualization        VIZ            25        8.3%  OK
Regression           REGR           25        8.3%  OK
Model Evaluation     EVAL           25        8.3%  OK
Data Structures      DSTRC          20        6.7%  OK
Data Cleaning        CLEAN          20        6.7%  OK
Machine Learning     ML             20        6.7%  OK
Neural Networks      NN             20        6.7%  OK
PyTorch              TORCH          20        6.7%  OK
Foundation Concepts  FOUND          19        6.3%  OK
Python Environment   PYENV          15        5.0%  OK
NumPy Computing      NUMPY          15        5.0%  OK
Best Practices       BEST           11        3.7%  OK
Projects             PROJ            5        1.7%  Under
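
The Percentage column and the Overview figures follow directly from the counts. A minimal sketch of that arithmetic, assuming each percentage is the count divided by the total and rounded to one decimal place (the `counts` dictionary and the variable names are illustrative and are not taken from taxonomy-distribution.py):

```python
# Counts copied from the Distribution Summary table.
counts = {
    "STATS": 30, "ADVR": 30, "VIZ": 25, "REGR": 25, "EVAL": 25,
    "DSTRC": 20, "CLEAN": 20, "ML": 20, "NN": 20, "TORCH": 20,
    "FOUND": 19, "PYENV": 15, "NUMPY": 15, "BEST": 11, "PROJ": 5,
}

total = sum(counts.values())            # Total Concepts
num_taxonomies = len(counts)            # Number of Taxonomies
average = total / num_taxonomies        # Average Concepts per Taxonomy

# Assumed rounding rule: one decimal place, as displayed in the report.
percentages = {tid: round(100 * n / total, 1) for tid, n in counts.items()}

print(f"Total Concepts: {total}")
print(f"Number of Taxonomies: {num_taxonomies}")
print(f"Average Concepts per Taxonomy: {average:.1f}")
for tid, pct in percentages.items():
    print(f"{tid:<7}{counts[tid]:>3}  {pct:>5.1f}%")
```

Under these assumptions the totals come out to 300 concepts across 15 taxonomies, matching the Overview.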

Visual Distribution

STATS  #####  30 ( 10.0%)
ADVR   #####  30 ( 10.0%)
VIZ    ####  25 (  8.3%)
REGR   ####  25 (  8.3%)
EVAL   ####  25 (  8.3%)
DSTRC  ###  20 (  6.7%)
CLEAN  ###  20 (  6.7%)
ML     ###  20 (  6.7%)
NN     ###  20 (  6.7%)
TORCH  ###  20 (  6.7%)
FOUND  ###  19 (  6.3%)
PYENV  ##  15 (  5.0%)
NUMPY  ##  15 (  5.0%)
BEST   #  11 (  3.7%)
PROJ      5 (  1.7%)
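
The bar lengths above are consistent with one `#` per two full percentage points, which is why PROJ at 1.7% shows no bar at all. The sketch below reproduces that layout under this assumed scale; the function name `render_bar_row` and the `points_per_hash` parameter are hypothetical, and the actual script may compute its bars differently.

```python
def render_bar_row(taxonomy_id, count, total, points_per_hash=2.0):
    """Format one row of the text bar chart.

    Assumption: one '#' is drawn for every full `points_per_hash`
    percentage points, with the remainder truncated.
    """
    pct = 100 * count / total
    bar = "#" * int(pct // points_per_hash)
    return f"{taxonomy_id:<7}{bar} {count:>3} ({pct:5.1f}%)"


# A few rows from the report, with total = 300 concepts.
for tid, n in [("STATS", 30), ("VIZ", 25), ("BEST", 11), ("PROJ", 5)]:
    print(render_bar_row(tid, n, 300))
```

Because the remainder is truncated, categories below 2% render with an empty bar, so the count and percentage columns remain the reliable figures for small categories.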

Balance Analysis

No Over-Represented Categories

All categories are under the 30% threshold; the largest (STATS and ADVR) hold 10.0% each. Good balance!

Under-Represented Categories (<3%)

  • Projects (PROJ): 5 concepts (1.7%)
  • Note: Small categories are acceptable for specialized topics
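
The Status column and the two balance checks can be expressed as a simple threshold test. This is a sketch only: the 30% and 3% limits come from the report, while the `classify` helper, the "Over" label, and the strict comparisons at the boundaries are assumptions.

```python
OVER_THRESHOLD = 30.0   # percent; stated in the report
UNDER_THRESHOLD = 3.0   # percent; stated in the report

counts = {
    "STATS": 30, "ADVR": 30, "VIZ": 25, "REGR": 25, "EVAL": 25,
    "DSTRC": 20, "CLEAN": 20, "ML": 20, "NN": 20, "TORCH": 20,
    "FOUND": 19, "PYENV": 15, "NUMPY": 15, "BEST": 11, "PROJ": 5,
}


def classify(pct):
    """Return a status label for a percentage share (labels partly assumed)."""
    if pct > OVER_THRESHOLD:
        return "Over"    # hypothetical label; the report shows no such case
    if pct < UNDER_THRESHOLD:
        return "Under"
    return "OK"


total = sum(counts.values())
status = {tid: classify(100 * n / total) for tid, n in counts.items()}

over = [tid for tid, s in status.items() if s == "Over"]
under = [tid for tid, s in status.items() if s == "Under"]
print("Over-represented:", over)
print("Under-represented:", under)
```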

Category Details

Statistics (STATS)

Count: 30 concepts (10.0%)

Concepts:

    1. Descriptive Statistics
    2. Mean
    3. Median
    4. Mode
    5. Range
    6. Variance
    7. Standard Deviation
    8. Quartiles
    9. Percentiles
    10. Interquartile Range
    11. Skewness
    12. Kurtosis
    13. Distribution
    14. Normal Distribution
    15. Probability
  • ...and 15 more

Advanced Regression (ADVR)

Count: 30 concepts (10.0%)

Concepts:

    1. Multiple Linear Regression
    2. Multiple Predictors
    3. Multicollinearity
    4. Variance Inflation Factor
    5. Feature Selection
    6. Forward Selection
    7. Backward Elimination
    8. Stepwise Selection
    9. Categorical Variables
    10. Dummy Variables
    11. One-Hot Encoding
    12. Interaction Terms
    13. Polynomial Features
    14. Feature Engineering
    15. Feature Importance
  • ...and 15 more

Visualization (VIZ)

Count: 25 concepts (8.3%)

Concepts:

    1. Data Visualization
    2. Matplotlib Library
    3. Figure
    4. Axes
    5. Plot Function
    6. Line Plot
    7. Scatter Plot
    8. Bar Chart
    9. Histogram
    10. Box Plot
    11. Pie Chart
    12. Subplot
    13. Figure Size
    14. Title
    15. Axis Labels
  • ...and 10 more

Regression (REGR)

Count: 25 concepts (8.3%)

Concepts:

    1. Regression Analysis
    2. Linear Regression
    3. Simple Linear Regression
    4. Regression Line
    5. Slope
    6. Intercept
    7. Least Squares Method
    8. Residuals
    9. Sum of Squared Errors
    10. Ordinary Least Squares
    11. Regression Coefficients
    12. Coefficient Interpretation
    13. Prediction
    14. Fitted Values
    15. Regression Equation
  • ...and 10 more

Model Evaluation (EVAL)

Count: 25 concepts (8.3%)

Concepts:

    1. Model Performance
    2. Training Data
    3. Testing Data
    4. Train Test Split
    5. Validation Data
    6. R-Squared
    7. Adjusted R-Squared
    8. Mean Squared Error
    9. Root Mean Squared Error
    10. Mean Absolute Error
    11. Residual Analysis
    12. Residual Plot
    13. Overfitting
    14. Underfitting
    15. Bias
  • ...and 10 more

Data Structures (DSTRC)

Count: 20 concepts (6.7%)

Concepts:

    1. Lists
    2. Dictionaries
    3. Tuples
    4. Arrays
    5. Pandas Library
    6. DataFrame
    7. Series
    8. Index
    9. Column
    10. Row
    11. Data Loading
    12. CSV Files
    13. Read CSV
    14. Data Inspection
    15. Head Method
  • ...and 5 more

Data Cleaning (CLEAN)

Count: 20 concepts (6.7%)

Concepts:

    1. Missing Values
    2. NaN
    3. Null Detection
    4. Dropna Method
    5. Fillna Method
    6. Imputation
    7. Data Type Conversion
    8. Duplicate Detection
    9. Duplicate Removal
    10. Outliers
    11. Outlier Detection
    12. Data Validation
    13. String Cleaning
    14. Column Renaming
    15. Data Filtering
  • ...and 5 more

Machine Learning (ML)

Count: 20 concepts (6.7%)

Concepts:

    1. Machine Learning
    2. Supervised Learning
    3. Unsupervised Learning
    4. Classification
    5. Clustering
    6. Training Process
    7. Learning Algorithm
    8. Model Training
    9. Generalization
    10. Training Error
    11. Test Error
    12. Prediction Error
    13. Loss Function
    14. Cost Function
    15. Optimization
  • ...and 5 more

Neural Networks (NN)

Count: 20 concepts (6.7%)

Concepts:

    1. Neural Networks
    2. Artificial Neuron
    3. Perceptron
    4. Activation Function
    5. Sigmoid Function
    6. ReLU Function
    7. Input Layer
    8. Hidden Layer
    9. Output Layer
    10. Weights
    11. Biases
    12. Forward Propagation
    13. Backpropagation
    14. Deep Learning
    15. Network Architecture
  • ...and 5 more

PyTorch (TORCH)

Count: 20 concepts (6.7%)

Concepts:

    1. PyTorch Library
    2. Tensors
    3. Tensor Operations
    4. Autograd
    5. Automatic Differentiation
    6. Computational Graph
    7. Neural Network Module
    8. Sequential Model
    9. Linear Layer
    10. Loss Functions PyTorch
    11. Optimizer
    12. SGD Optimizer
    13. Adam Optimizer
    14. Training Loop
    15. Model Evaluation PyTorch
  • ...and 5 more

Foundation Concepts (FOUND)

Count: 19 concepts (6.3%)

Concepts:

    1. Data Science
    2. Python Programming
    3. Data
    4. Variables
    5. Data Types
    6. Numerical Data
    7. Categorical Data
    8. Ordinal Data
    9. Nominal Data
    10. Measurement Scales
    11. Independent Variable
    12. Dependent Variable
    13. Dataset
    14. Observation
    15. Feature
  • ...and 4 more

Python Environment (PYENV)

Count: 15 concepts (5.0%)

Concepts:

    1. Jupyter Notebooks
    2. Python Installation
    3. Package Management
    4. Pip
    5. Conda Environment
    6. Virtual Environment
    7. IDE Setup
    8. VS Code
    9. Notebook Cells
    10. Code Cell
    11. Markdown Cell
    12. Cell Execution
    13. Kernel
    14. Import Statement
    15. Python Libraries

NumPy Computing (NUMPY)

Count: 15 concepts (5.0%)

Concepts:

    1. NumPy Library
    2. NumPy Array
    3. Array Creation
    4. Array Shape
    5. Array Indexing
    6. Array Slicing
    7. Broadcasting
    8. Vectorized Operations
    9. Element-wise Operations
    10. Matrix Operations
    11. Dot Product
    12. Matrix Multiplication
    13. Transpose
    14. Linear Algebra
    15. Computational Efficiency

Best Practices (BEST)

Count: 11 concepts (3.7%)

Concepts:

    1. Documentation
    2. Explainable AI
    3. Model Interpretability
    4. Feature Importance Analysis
    5. SHAP Values
    6. Model Documentation
    7. Reproducibility
    8. Random Seed
    9. Version Control
    10. Git
    11. Data Ethics

Projects (PROJ)

Count: 5 concepts (1.7%)

Concepts:

    1. Capstone Project
    2. End-to-End Pipeline
    3. Model Deployment
    4. Results Communication
    5. Data-Driven Decisions

Recommendations

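  • Excellent balance: Categories are evenly distributed (spread between the largest and smallest share: 8.3 percentage points, from 10.0% down to 1.7%)
  • MISC category minimal: No MISC taxonomy appears in the distribution, which indicates good categorization specificity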
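
The spread figure is consistent with the difference between the largest and smallest category shares. A short sketch of that calculation, assuming this definition of spread (the report does not state how the script defines it, and the cutoff for the "Excellent balance" verdict is not given, so none is applied here):

```python
counts = {
    "STATS": 30, "ADVR": 30, "VIZ": 25, "REGR": 25, "EVAL": 25,
    "DSTRC": 20, "CLEAN": 20, "ML": 20, "NN": 20, "TORCH": 20,
    "FOUND": 19, "PYENV": 15, "NUMPY": 15, "BEST": 11, "PROJ": 5,
}

total = sum(counts.values())
shares = [100 * n / total for n in counts.values()]

# Assumed definition: largest share minus smallest share, in percentage points.
spread = round(max(shares) - min(shares), 1)

# The MISC check reduces to a lookup; a missing key counts as zero concepts.
misc_count = counts.get("MISC", 0)

print(f"Spread: {spread} percentage points")
print(f"MISC concepts: {misc_count}")
```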

Educational Use Recommendations

  • Use taxonomy categories for color-coding in graph visualizations
  • Design curriculum modules based on taxonomy groupings
  • Create filtered views for focused learning paths
  • Use categories for assessment organization
  • Enable navigation by topic area in interactive tools
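
As one way to approach the color-coding suggestion, each TaxonomyID can be mapped to a distinct color by spacing hues evenly around the color wheel. This is an illustrative sketch using only the standard library; the `taxonomy_palette` function and the saturation and brightness values are assumptions, and any graph tool that accepts hex colors could consume the result.

```python
import colorsys

taxonomy_ids = [
    "STATS", "ADVR", "VIZ", "REGR", "EVAL", "DSTRC", "CLEAN", "ML",
    "NN", "TORCH", "FOUND", "PYENV", "NUMPY", "BEST", "PROJ",
]


def taxonomy_palette(ids, saturation=0.6, value=0.9):
    """Map each taxonomy ID to a hex color with an evenly spaced hue."""
    palette = {}
    for i, tid in enumerate(ids):
        r, g, b = colorsys.hsv_to_rgb(i / len(ids), saturation, value)
        palette[tid] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return palette


palette = taxonomy_palette(taxonomy_ids)
for tid, color in palette.items():
    print(f"{tid:<7}{color}")
```

The same dictionary can also drive filtered views, since looking up a concept's TaxonomyID gives both its group and its display color.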

Report generated by taxonomy-distribution.py