Course Chapters - Table of Contents
This course provides a comprehensive introduction to data science using Python, progressing from foundational concepts to advanced machine learning techniques. Each chapter includes interactive MicroSims to reinforce learning through hands-on experience.
Course Structure Overview
Duration: 10 weeks
Target Audience: Advanced high school students and college freshmen
Prerequisites: Basic algebra and introductory programming experience
Chapter Progression
Foundation Phase (Weeks 1-3)
Chapter 0: Setup
- Python environment and Jupyter notebooks setup
- Conda virtual environment configuration
- Required package installation
- Development environment best practices
Chapter 1: Foundations of Data Science
- Introduction to data science and its applications
- Setting up the Python environment and Jupyter notebooks
- First MicroSim: Exploring sample datasets
- Basic data types and structures in Python
- Understanding the data science workflow
- Data science roles and career paths
- Ethics and best practices in data science
Chapter 2: Data Exploration and Visualization
- Loading and examining datasets with pandas
- Creating basic plots with matplotlib
- MicroSim: Interactive data visualization
- Identifying patterns in data through visual exploration
- Data profiling and quality assessment
- Handling missing values and outliers
- Exploratory data analysis techniques
Chapter 3: Data Visualization Techniques
- Principles of effective data visualization
- Matplotlib fundamentals and customization
- Plotly for interactive visualizations
- Statistical plots and distributions
- Time series visualization
- Multi-dimensional data representation
- MicroSim: Visualization parameter explorer
Statistical Foundation Phase (Weeks 4-5)
Chapter 4: Statistical Foundations
- Descriptive statistics and summary measures
- Understanding distributions and variability
- MicroSim: Statistical parameter exploration
- Introduction to probability concepts
- Central limit theorem and sampling distributions
- Hypothesis testing fundamentals
- Correlation vs. causation
Chapter 5: Simple Linear Regression
- Mathematical foundations of linear regression
- Implementing regression from scratch
- MicroSim: Interactive regression line fitting
- Interpreting coefficients and model output
- Assumptions of linear regression
- Residual analysis and diagnostics
- Making predictions with linear models
Model Development Phase (Weeks 6-8)
Chapter 6: Model Evaluation and Validation
- Measuring model performance (R², MSE, MAE)
- Training and testing data splits
- MicroSim: Cross-validation simulation
- Understanding overfitting and underfitting
- Bias-variance trade-off
- Model selection criteria
- Performance metrics for different problem types
Chapter 7: Multiple Linear Regression
- Extending to multiple predictor variables
- Feature selection and engineering
- MicroSim: Multi-dimensional regression explorer
- Handling categorical variables
- Interaction effects and polynomial terms
- Multicollinearity detection and treatment
- Model interpretation in multiple dimensions
Chapter 8: Introduction to NumPy and Advanced Computation
- NumPy arrays and vectorized operations
- Matrix operations for regression
- MicroSim: Linear algebra visualization
- Computational efficiency in data science
- Broadcasting and array manipulation
- Mathematical functions and statistics
- Integration with pandas and matplotlib
Advanced Modeling Phase (Weeks 9-10)
Chapter 9: Non-linear Models and Feature Engineering
- Polynomial regression and feature transformation
- Understanding non-linear relationships
- MicroSim: Polynomial degree explorer
- Feature engineering techniques
- Basis functions and kernel methods
- Model complexity and interpretation trade-offs
- When to use non-linear approaches
Chapter 10: Regularization Techniques
- Ridge and Lasso regularization
- MicroSim: Bias-variance trade-off explorer
- Model selection strategies
- Cross-validation for hyperparameter tuning
- Elastic Net and other regularization methods
- Feature selection through regularization
- Preventing overfitting in complex models
Machine Learning Phase (Advanced Topics)
Chapter 11: Introduction to Machine Learning
- Supervised vs. unsupervised learning
- Classification and regression problems
- Decision trees and ensemble methods
- MicroSim: Algorithm comparison explorer
- Feature importance and selection
- Model interpretability techniques
- Introduction to scikit-learn
Chapter 12: Neural Networks and Deep Learning
- Neural networks and deep learning concepts
- Perceptrons and multi-layer networks
- Activation functions and backpropagation
- MicroSim: Neural network playground
- Training neural networks
- Common architectures and applications
- When to use neural networks vs. traditional methods
Chapter 13: Introduction to Machine Learning with PyTorch
- Building simple networks with PyTorch
- Tensors and automatic differentiation
- Creating and training models
- MicroSim: PyTorch model builder
- Comparing traditional and deep learning approaches
- GPU acceleration and optimization
- Model saving and deployment
Chapter 14: Advanced Model Evaluation
- Comprehensive performance metrics
- ROC curves and AUC analysis
- Confusion matrices and classification reports
- MicroSim: Metric comparison explorer
- Statistical significance testing
- Model comparison techniques
- Reporting and communicating results
Chapter 15: Capstone Project and Model Deployment
- End-to-end data science project planning
- Model interpretation and communication
- MicroSim: Model comparison dashboard
- Best practices and ethical considerations
- Model deployment strategies
- Documentation and reproducibility
- Presenting data science findings
Special Topics
Matplotlib vs. Plotly Comparison
A detailed comparison of the two visualization libraries for AI-generated plots and animations, including the pros and cons of each for different use cases.
Learning Methodology
Each chapter incorporates:
- Interactive MicroSims for hands-on parameter exploration
- Real-world datasets and practical applications
- Progressive complexity building from simple to advanced concepts
- Explainable AI focus emphasizing model interpretability
- Code examples with complete implementations
Course Philosophy
This course emphasizes the balance between model explainability and predictive accuracy, guiding students to identify the simplest effective solutions to data-driven problems. The integration of interactive simulations ensures abstract mathematical concepts become concrete and intuitive.