Course Chapters - Table of Contents

This course provides a comprehensive introduction to data science using Python, progressing from foundational concepts to advanced machine learning techniques. Each chapter includes interactive MicroSims to reinforce learning through hands-on experience.

Course Structure Overview

Duration: 10 weeks
Target Audience: Advanced high school students and college freshmen
Prerequisites: Basic algebra and introductory programming experience

Chapter Progression

Foundation Phase (Weeks 1-3)

Chapter 0: Setup

  • Python environment and Jupyter notebooks setup
  • Conda virtual environment configuration
  • Required package installation
  • Development environment best practices

Chapter 1: Foundations of Data Science

  • Introduction to data science and its applications
  • Setting up Python environment and Jupyter notebooks
  • First MicroSim: Exploring sample datasets
  • Basic data types and structures in Python
  • Understanding the data science workflow
  • Data science roles and career paths
  • Ethics and best practices in data science

Chapter 2: Data Exploration and Visualization

  • Loading and examining datasets with pandas
  • Creating basic plots with matplotlib
  • MicroSim: Interactive data visualization
  • Identifying patterns in data through visual exploration
  • Data profiling and quality assessment
  • Handling missing values and outliers
  • Exploratory data analysis techniques

Chapter 3: Data Visualization Techniques

  • Principles of effective data visualization
  • Matplotlib fundamentals and customization
  • Plotly for interactive visualizations
  • Statistical plots and distributions
  • Time series visualization
  • Multi-dimensional data representation
  • MicroSim: Visualization parameter explorer

Statistical Foundation Phase (Weeks 4-5)

Chapter 4: Statistical Foundations

  • Descriptive statistics and summary measures
  • Understanding distributions and variability
  • MicroSim: Statistical parameter exploration
  • Introduction to probability concepts
  • Central limit theorem and sampling distributions
  • Hypothesis testing fundamentals
  • Correlation vs. causation

Chapter 5: Simple Linear Regression

  • Mathematical foundations of linear regression
  • Implementing regression from scratch
  • MicroSim: Interactive regression line fitting
  • Interpreting coefficients and model output
  • Assumptions of linear regression
  • Residual analysis and diagnostics
  • Making predictions with linear models
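
As a taste of the "from scratch" topic above, here is a minimal sketch of closed-form least-squares fitting in plain Python. The function names `fit_line` and `predict` and the toy data are illustrative assumptions, not taken from the course materials.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) that minimize squared error for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


# Hypothetical toy data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
```

Because the toy points lie exactly on a line, the fit recovers that line; with noisy data the same formulas give the best-fitting line in the least-squares sense.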

Model Development Phase (Weeks 6-8)

Chapter 6: Model Evaluation and Validation

  • Measuring model performance (R², MSE, MAE)
  • Training and testing data splits
  • MicroSim: Cross-validation simulation
  • Understanding overfitting and underfitting
  • Bias-variance trade-off
  • Model selection criteria
  • Performance metrics for different problem types
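
The three regression metrics named above can be sketched in a few lines of plain Python. The function names and sample values below are hypothetical, chosen only to illustrate the definitions.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


# Hypothetical observed values and model predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

scores = {
    "mse": mse(y_true, y_pred),
    "mae": mae(y_true, y_pred),
    "r2": r_squared(y_true, y_pred),
}
```

MSE penalizes large errors more heavily than MAE, while R² reports the fraction of variance in the observed values that the predictions account for.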

Chapter 7: Multiple Linear Regression

  • Extending to multiple predictor variables
  • Feature selection and engineering
  • MicroSim: Multi-dimensional regression explorer
  • Handling categorical variables
  • Interaction effects and polynomial terms
  • Multicollinearity detection and treatment
  • Model interpretation in multiple dimensions

Chapter 8: Introduction to NumPy and Advanced Computation

  • NumPy arrays and vectorized operations
  • Matrix operations for regression
  • MicroSim: Linear algebra visualization
  • Computational efficiency in data science
  • Broadcasting and array manipulation
  • Mathematical functions and statistics
  • Integration with pandas and matplotlib

Advanced Modeling Phase (Weeks 9-10)

Chapter 9: Non-linear Models and Feature Engineering

  • Polynomial regression and feature transformation
  • Understanding non-linear relationships
  • MicroSim: Polynomial degree explorer
  • Feature engineering techniques
  • Basis functions and kernel methods
  • Model complexity and interpretation trade-offs
  • When to use non-linear approaches

Chapter 10: Regularization Techniques

  • Ridge and Lasso regularization
  • MicroSim: Bias-variance trade-off explorer
  • Model selection strategies
  • Cross-validation for hyperparameter tuning
  • Elastic Net and other regularization methods
  • Feature selection through regularization
  • Preventing overfitting in complex models

Machine Learning Phase (Advanced Topics)

Chapter 11: Introduction to Machine Learning

  • Supervised vs. unsupervised learning
  • Classification and regression problems
  • Decision trees and ensemble methods
  • MicroSim: Algorithm comparison explorer
  • Feature importance and selection
  • Model interpretability techniques
  • Introduction to scikit-learn

Chapter 12: Neural Networks and Deep Learning

  • Neural networks and deep learning concepts
  • Perceptrons and multi-layer networks
  • Activation functions and backpropagation
  • MicroSim: Neural network playground
  • Training neural networks
  • Common architectures and applications
  • When to use neural networks vs. traditional methods

Chapter 13: Introduction to Machine Learning with PyTorch

  • Building simple networks with PyTorch
  • Tensors and automatic differentiation
  • Creating and training models
  • MicroSim: PyTorch model builder
  • Comparing traditional and deep learning approaches
  • GPU acceleration and optimization
  • Model saving and deployment

Chapter 14: Advanced Model Evaluation

  • Comprehensive performance metrics
  • ROC curves and AUC analysis
  • Confusion matrices and classification reports
  • MicroSim: Metric comparison explorer
  • Statistical significance testing
  • Model comparison techniques
  • Reporting and communicating results

Chapter 15: Capstone Project and Model Deployment

  • End-to-end data science project planning
  • Model interpretation and communication
  • MicroSim: Model comparison dashboard
  • Best practices and ethical considerations
  • Model deployment strategies
  • Documentation and reproducibility
  • Presenting data science findings

Special Topics

Matplotlib vs Plotly Comparison

A detailed comparison of Matplotlib and Plotly for AI-generated plots and animations, covering the pros and cons of each library for different use cases.

Learning Methodology

Each chapter incorporates:

  • Interactive MicroSims for hands-on parameter exploration
  • Real-world datasets and practical applications
  • Progressive complexity building from simple to advanced concepts
  • Explainable AI focus emphasizing model interpretability
  • Code examples with complete implementations

Course Philosophy

This course emphasizes the balance between model explainability and predictive accuracy, guiding students to identify the simplest effective solutions to data-driven problems. Interactive simulations help make abstract mathematical concepts concrete and intuitive.