Concept Taxonomy
This document defines the categorical taxonomy for organizing the 300 concepts in the learning graph.
Taxonomy Categories
| Category Name | TaxonomyID | Description |
|---|---|---|
| Foundation Concepts | FOUND | Core data science concepts, definitions, and fundamental terminology |
| Python Environment | PYENV | Python installation, setup, IDE, package management, and development tools |
| Data Structures | DSTRC | Python and pandas data structures including DataFrames, lists, arrays |
| Data Cleaning | CLEAN | Data preprocessing, handling missing values, validation, and transformation |
| Visualization | VIZ | Data visualization concepts, matplotlib and seaborn, plotting techniques |
| Statistics | STATS | Statistical concepts, measures, distributions, and probability |
| Regression | REGR | Linear regression, model fitting, coefficients, and assumptions |
| Model Evaluation | EVAL | Model performance metrics, validation, overfitting, cross-validation |
| Advanced Regression | ADVR | Multiple regression, feature selection, regularization techniques |
| NumPy Computing | NUMPY | NumPy arrays, matrix operations, linear algebra, vectorization |
| Machine Learning | ML | Machine learning concepts, training, optimization, gradient descent |
| Neural Networks | NN | Neural network architecture, layers, activation functions, deep learning |
| PyTorch | TORCH | PyTorch library, tensors, autograd, training loops |
| Best Practices | BEST | Explainability, reproducibility, ethics, documentation, version control |
| Projects | PROJ | Capstone projects, end-to-end pipelines, deployment, communication |
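The snippet below is a minimal sketch of how the TaxonomyIDs in the table might be used. It stores the table as a dictionary, then checks a list of concept assignments by counting concepts per category and flagging any ID that is not in the table. It assumes each concept is available as a `(label, TaxonomyID)` pair. The learning graph's actual file format is not specified in this document, and `check_assignments` and the sample concept labels are hypothetical.

```python
from collections import Counter

# TaxonomyID -> category name, copied from the Taxonomy Categories table above.
TAXONOMY = {
    "FOUND": "Foundation Concepts",
    "PYENV": "Python Environment",
    "DSTRC": "Data Structures",
    "CLEAN": "Data Cleaning",
    "VIZ": "Visualization",
    "STATS": "Statistics",
    "REGR": "Regression",
    "EVAL": "Model Evaluation",
    "ADVR": "Advanced Regression",
    "NUMPY": "NumPy Computing",
    "ML": "Machine Learning",
    "NN": "Neural Networks",
    "TORCH": "PyTorch",
    "BEST": "Best Practices",
    "PROJ": "Projects",
}


def check_assignments(concepts):
    """Hypothetical validator for (concept_label, taxonomy_id) pairs.

    Returns a Counter of concepts per known TaxonomyID and a list of
    concept labels whose TaxonomyID does not appear in TAXONOMY.
    """
    unknown = [label for label, tid in concepts if tid not in TAXONOMY]
    counts = Counter(tid for _, tid in concepts if tid in TAXONOMY)
    return counts, unknown


if __name__ == "__main__":
    # Hypothetical sample; "MLX" is a deliberate typo that should be flagged.
    sample = [
        ("Data Science", "FOUND"),
        ("DataFrame", "DSTRC"),
        ("Tensor", "TORCH"),
        ("Gradient Descent", "MLX"),
    ]
    counts, unknown = check_assignments(sample)
    print(sum(counts.values()))  # 3
    print(unknown)               # ['Gradient Descent']
```

If each concept carries exactly one TaxonomyID, running this over the full concept list should yield counts that sum to 300 and an empty `unknown` list.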
Category Descriptions
FOUND - Foundation Concepts
Core concepts that introduce data science terminology and fundamental ideas. These are typically the first concepts students encounter, and the concepts in every other category build on them.
PYENV - Python Environment
Concepts related to setting up and managing the Python development environment, including installation, package managers, virtual environments, and IDEs.
DSTRC - Data Structures
Python's built-in data structures (lists, dictionaries, tuples) and pandas structures (DataFrame, Series). These are essential for data manipulation.
CLEAN - Data Cleaning
Techniques for preparing data for analysis including handling missing values, detecting outliers, removing duplicates, and transforming data.
VIZ - Visualization
Data visualization concepts using matplotlib and seaborn, including various plot types, customization, and best practices for visual communication.
STATS - Statistics
Statistical foundations including descriptive statistics, probability, distributions, correlation, and hypothesis testing.
REGR - Regression
Linear regression concepts from simple to multiple regression, including model fitting, interpretation, and assumptions.
EVAL - Model Evaluation
Techniques for assessing model performance, including metrics, train/test splits, cross-validation, and understanding overfitting/underfitting.
ADVR - Advanced Regression
Advanced modeling techniques including multiple regression, feature engineering, regularization (Ridge, Lasso), and non-linear models.
NUMPY - NumPy Computing
NumPy library concepts including array operations, broadcasting, vectorization, and linear algebra for efficient computation.
ML - Machine Learning
Introduction to machine learning paradigms, supervised/unsupervised learning, optimization algorithms, and gradient descent.
NN - Neural Networks
Artificial neural network concepts including architecture, activation functions, forward/backward propagation, and deep learning.
TORCH - PyTorch
PyTorch-specific concepts including tensors, autograd, neural network modules, optimizers, and training workflows.
BEST - Best Practices
Professional practices including explainability, reproducibility, documentation, version control, and ethical considerations.
PROJ - Projects
Applied concepts for real-world projects including end-to-end pipelines, model deployment, and communicating results.