Concept Taxonomy

This document defines the taxonomy used to categorize the 300 concepts in the learning graph.

Taxonomy Categories

| Category Name | TaxonomyID | Description |
|---|---|---|
| Foundation Concepts | FOUND | Core data science concepts, definitions, and fundamental terminology |
| Python Environment | PYENV | Python installation, setup, IDE, package management, and development tools |
| Data Structures | DSTRC | Python and pandas data structures including DataFrames, lists, arrays |
| Data Cleaning | CLEAN | Data preprocessing, handling missing values, validation, and transformation |
| Visualization | VIZ | Data visualization concepts, matplotlib, plotting techniques |
| Statistics | STATS | Statistical concepts, measures, distributions, and probability |
| Regression | REGR | Linear regression, model fitting, coefficients, and assumptions |
| Model Evaluation | EVAL | Model performance metrics, validation, overfitting, cross-validation |
| Advanced Regression | ADVR | Multiple regression, feature selection, regularization techniques |
| NumPy Computing | NUMPY | NumPy arrays, matrix operations, linear algebra, vectorization |
| Machine Learning | ML | Machine learning concepts, training, optimization, gradient descent |
| Neural Networks | NN | Neural network architecture, layers, activation functions, deep learning |
| PyTorch | TORCH | PyTorch library, tensors, autograd, training loops |
| Best Practices | BEST | Explainability, reproducibility, ethics, documentation, version control |
| Projects | PROJ | Capstone projects, end-to-end pipelines, deployment, communication |
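
The table above can also be kept as a simple lookup, which makes it easy to check that every concept in the learning graph carries a known TaxonomyID. The following is a minimal sketch: the `TAXONOMY` dictionary mirrors the table, while the `find_unknown_ids` helper, the `(label, taxonomy_id)` input shape, and the sample rows are hypothetical and not part of the learning graph itself.

```python
# TaxonomyID -> Category Name, copied from the table above.
TAXONOMY = {
    "FOUND": "Foundation Concepts",
    "PYENV": "Python Environment",
    "DSTRC": "Data Structures",
    "CLEAN": "Data Cleaning",
    "VIZ": "Visualization",
    "STATS": "Statistics",
    "REGR": "Regression",
    "EVAL": "Model Evaluation",
    "ADVR": "Advanced Regression",
    "NUMPY": "NumPy Computing",
    "ML": "Machine Learning",
    "NN": "Neural Networks",
    "TORCH": "PyTorch",
    "BEST": "Best Practices",
    "PROJ": "Projects",
}


def find_unknown_ids(concepts):
    """Return the (label, taxonomy_id) pairs whose ID is not in TAXONOMY.

    `concepts` is assumed to be an iterable of (label, taxonomy_id) pairs;
    the real concept list may be stored differently.
    """
    return [(label, tid) for label, tid in concepts if tid not in TAXONOMY]


# Hypothetical sample rows, not taken from the real concept list.
sample = [("DataFrame", "DSTRC"), ("Histogram", "VIZ"), ("Mystery Concept", "XYZ")]
unknown = find_unknown_ids(sample)
print(unknown)
```

A check like this could run whenever the concept list is edited, so that a mistyped TaxonomyID is caught before it reaches the graph.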

Category Descriptions

FOUND - Foundation Concepts

Core concepts that introduce data science terminology and fundamental ideas. These are typically the first concepts students encounter and form the basis for all other learning.

PYENV - Python Environment

Concepts related to setting up and managing the Python development environment, including installation, package managers, virtual environments, and IDEs.

DSTRC - Data Structures

Python native data structures (lists, dictionaries, tuples) and pandas structures (DataFrame, Series). These are essential for data manipulation.

CLEAN - Data Cleaning

Techniques for preparing data for analysis including handling missing values, detecting outliers, removing duplicates, and transforming data.
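
As a rough illustration of two techniques in this category, the sketch below removes exact duplicate records and fills a missing value with the mean of the observed values. It uses only the Python standard library to stay self-contained; the records and the `height_cm` field are made up for the example, and mean imputation is just one of several possible strategies.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"id": 1, "height_cm": 170.0},
    {"id": 2, "height_cm": None},
    {"id": 2, "height_cm": None},  # exact duplicate of the previous row
    {"id": 3, "height_cm": 180.0},
]

# Remove exact duplicates while preserving the original order.
seen = set()
deduped = []
for row in rows:
    key = tuple(sorted(row.items()))  # hashable fingerprint of the record
    if key not in seen:
        seen.add(key)
        deduped.append(row)

# Fill missing heights with the mean of the observed heights.
observed = [r["height_cm"] for r in deduped if r["height_cm"] is not None]
fill_value = mean(observed)
cleaned = [
    {**r, "height_cm": fill_value if r["height_cm"] is None else r["height_cm"]}
    for r in deduped
]
print(cleaned)
```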

VIZ - Visualization

Data visualization concepts using matplotlib and seaborn, including various plot types, customization, and best practices for visual communication.

STATS - Statistics

Statistical foundations including descriptive statistics, probability, distributions, correlation, and hypothesis testing.

REGR - Regression

Linear regression concepts from simple to multiple regression, including model fitting, interpretation, and assumptions.

EVAL - Model Evaluation

Techniques for assessing model performance, including metrics, train/test splits, cross-validation, and understanding overfitting/underfitting.
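
To make the idea of cross-validation concrete, here is a minimal sketch of how k-fold splitting partitions sample indices so that each sample is held out exactly once. The `k_fold_indices` function is a hypothetical helper written for this illustration; it uses contiguous folds without shuffling, which a library implementation would typically make configurable.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each fold's test indices are disjoint from its training indices, which is what keeps the performance estimate from being inflated by data the model has already seen.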

ADVR - Advanced Regression

Advanced modeling techniques including multiple regression, feature engineering, regularization (Ridge, Lasso), and non-linear models.

NUMPY - NumPy Computing

NumPy library concepts including array operations, broadcasting, vectorization, and linear algebra for efficient computation.

ML - Machine Learning

Introduction to machine learning paradigms, supervised/unsupervised learning, optimization algorithms, and gradient descent.
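
Gradient descent, named above, can be sketched in a few lines: repeatedly step against the gradient until the parameter settles near a minimum. The example below is an illustrative assumption rather than course material; the `gradient_descent` helper, the quadratic objective, the learning rate, and the step count are all chosen only for demonstration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a one-dimensional function given its gradient `grad`."""
    x = x0
    for _ in range(steps):
        # Move opposite to the gradient, scaled by the learning rate.
        x = x - learning_rate * grad(x)
    return x


# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # close to 3, the minimizer of f
```

With this objective, each step shrinks the distance to the minimum by a constant factor, so the estimate converges quickly; a learning rate that is too large would instead make the iterates diverge.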

NN - Neural Networks

Artificial neural network concepts including architecture, activation functions, forward/backward propagation, and deep learning.

TORCH - PyTorch

PyTorch-specific concepts including tensors, autograd, neural network modules, optimizers, and training workflows.

BEST - Best Practices

Professional practices including explainability, reproducibility, documentation, version control, and ethical considerations.

PROJ - Projects

Applied concepts for real-world projects including end-to-end pipelines, model deployment, and communicating results.