Course Description for Introduction to Data Science with Python
Course Duration: 10 weeks
Target Audience: Advanced high school students and college freshmen
Prerequisites: Basic algebra and introductory programming experience recommended
Course Overview
This introductory course provides students with foundational knowledge and practical skills in data science using Python. Through hands-on experience with interactive simulations (MicroSims) and real-world datasets, students will develop competency in data analysis, visualization, and predictive modeling. The course emphasizes the balance between model explainability and predictive accuracy, guiding students to identify the simplest effective solution to a data-driven problem.
Learning Objectives
By the end of this course, students will be able to:
Remember (Knowledge)
- Recall fundamental data science terminology and concepts
- Identify key Python libraries for data science (NumPy, pandas, matplotlib, PyTorch)
- Recognize different types of data and measurement scales
- List the steps in the data science workflow
Understand (Comprehension)
- Explain the relationship between independent and dependent variables
- Describe how linear regression models make predictions
- Interpret basic statistical measures and visualizations
- Summarize the trade-offs between model complexity and interpretability
Apply (Application)
- Implement basic data cleaning and preprocessing techniques
- Create visualizations using Python libraries
- Build simple linear regression models
- Execute standard data science workflows on new datasets
Analyze (Analysis)
- Examine datasets to identify patterns and relationships
- Compare different modeling approaches for the same problem
- Distinguish between correlation and causation in data relationships
- Evaluate model performance using appropriate metrics
Evaluate (Evaluation)
- Assess the quality and reliability of data sources
- Critique model assumptions and limitations
- Judge the appropriateness of different models for specific problems
- Validate model performance and identify potential overfitting
Create (Synthesis)
- Design experiments to test hypotheses using data
- Construct predictive models for real-world scenarios
- Develop data-driven solutions to complex problems
- Generate original insights from exploratory data analysis
Sample Weekly Schedule
Week 1: Foundations of Data Science
- Introduction to data science and its applications
- Setting up Python environment and Jupyter notebooks
- First MicroSim: Exploring sample datasets
- Basic data types and structures in Python
Week 2: Data Exploration and Visualization
- Loading and examining datasets with pandas
- Creating basic plots with matplotlib
- MicroSim: Interactive data visualization
- Identifying patterns in data through visual exploration
Week 3: Statistical Foundations
- Descriptive statistics and summary measures
- Understanding distributions and variability
- MicroSim: Statistical parameter exploration
- Introduction to probability concepts
Week 4: Simple Linear Regression
- Mathematical foundations of linear regression
- Implementing regression from scratch
- MicroSim: Interactive regression line fitting
- Interpreting coefficients and model output
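As an illustration of what "implementing regression from scratch" in Week 4 might look like, here is a minimal sketch of ordinary least squares for one predictor, using only built-in Python. The function name `fit_simple_linear_regression` and the study-hours data are made up for this example and are not taken from the course materials.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: exam score = 50 + 2 * hours studied (no noise).
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

slope, intercept = fit_simple_linear_regression(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
```

Read this way, the slope is the predicted change in score for one additional hour of study, and the intercept is the predicted score at zero hours.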
Week 5: Model Evaluation and Validation
- Measuring model performance (R², MSE, MAE)
- Training and testing data splits
- MicroSim: Cross-validation simulation
- Understanding overfitting and underfitting
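The three metrics named in Week 5 can each be computed in a few lines. The sketch below uses plain Python and hypothetical helper names (`mse`, `mae`, `r2`); in practice a library such as scikit-learn provides equivalents, and the numbers here are made up for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R squared: 1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical held-out values and a model's predictions for them.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

print("MSE:", mse(actual, predicted))
print("MAE:", mae(actual, predicted))
print("R^2:", r2(actual, predicted))
```

Computing these on data the model did not see during fitting is what makes them useful for spotting overfitting: a model that scores well on training data but poorly on the test split has likely memorized noise.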
Week 6: Multiple Linear Regression
- Extending to multiple predictor variables
- Feature selection and engineering
- MicroSim: Multi-dimensional regression explorer
- Handling categorical variables
Week 7: Introduction to NumPy and Advanced Computation
- NumPy arrays and vectorized operations
- Matrix operations for regression
- MicroSim: Linear algebra visualization
- Computational efficiency in data science
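To make "matrix operations for regression" concrete, the following sketch fits a line by solving a least-squares problem on a design matrix with NumPy. The data are invented, and `np.linalg.lstsq` is one reasonable choice among several; the course may present the normal equations or another method instead.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

# Design matrix: a column of ones (intercept) next to the predictor column.
X = np.column_stack([np.ones_like(x), x])

# Solve min ||X @ coef - y||^2 in one vectorized call.
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef

# Predictions for all rows at once, with no explicit Python loop.
y_hat = X @ coef
print("intercept:", intercept, "slope:", slope)
```

The same code extends to multiple predictors by adding columns to `X`, which is one reason the matrix formulation is worth learning after the scalar version.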
Week 8: Non-linear Models and Regularization
- Polynomial regression and feature transformation
- Ridge and Lasso regularization
- MicroSim: Bias-variance trade-off explorer
- Model selection strategies
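As a rough illustration of the regularization topic in Week 8, this sketch implements closed-form Ridge regression with NumPy. It assumes the common convention of centering the data so the intercept is not penalized; the function name `ridge_fit`, the parameter name `alpha`, and the data are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression on centered data.

    Solves (Xc^T Xc + alpha * I) w = Xc^T yc, then recovers the
    (unpenalized) intercept from the column means.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean

    n_features = X.shape[1]
    weights = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    intercept = y_mean - x_mean @ weights
    return weights, intercept


# Hypothetical data generated from y = 1 + 2x.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

w_plain, b_plain = ridge_fit(X, y, alpha=0.0)   # ordinary least squares
w_ridge, b_ridge = ridge_fit(X, y, alpha=5.0)   # penalized fit
print("alpha=0:", w_plain, b_plain)
print("alpha=5:", w_ridge, b_ridge)
```

With `alpha=0` the fit matches ordinary least squares; raising `alpha` shrinks the slope toward zero, trading some bias for lower variance.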
Week 9: Introduction to Machine Learning with PyTorch
- Neural networks and deep learning concepts
- Building simple networks with PyTorch
- MicroSim: Neural network playground
- Comparing traditional and deep learning approaches
Week 10: Capstone Project and Model Deployment
- End-to-end data science project
- Model interpretation and communication
- MicroSim: Model comparison dashboard
- Best practices and ethical considerations
Assessment Methods
Formative Assessment (60%)
- Weekly MicroSim exercises and reflections (30%)
- Homework assignments applying concepts to new datasets (20%)
- Peer review activities and collaborative problem-solving (10%)
Summative Assessment (40%)
- Midterm project: Complete data analysis report (15%)
- Final capstone project: Original predictive modeling solution (20%)
- Final examination covering theoretical concepts (5%)
Required Materials
- Computer with Python 3.8+ installed
- Access to interactive online textbook with MicroSims
- Jupyter Notebook environment
- Required Python packages: pandas, NumPy, matplotlib, scikit-learn, PyTorch
Key Learning Principles
Interactive Learning: Each week features hands-on MicroSims that let students manipulate parameters and observe the results in real time, reinforcing theoretical concepts through experiential learning.
Scaffolded Complexity: The course progresses systematically from simple linear relationships to complex neural networks, ensuring students build confidence before tackling advanced topics.
Explainable AI Focus: Throughout the course, the emphasis is on understanding and interpreting models rather than simply achieving high accuracy, preparing students for ethical and transparent data science practice.
Real-world Applications: All examples and projects use authentic datasets and scenarios, helping students connect academic learning to practical problem-solving.
Course Philosophy
This course is built on the principle that effective data science requires both technical competence and critical thinking. Students will learn not just how to build predictive models, but when to use them, how to interpret their results, and how to communicate findings to diverse audiences. The integration of interactive simulations ensures that abstract mathematical concepts become concrete and intuitive, while the progression from simple to complex models helps students appreciate the value of parsimony in modeling.
By the end of this course, students will have developed both the technical skills and analytical mindset necessary for success in advanced data science coursework or entry-level positions in data-driven fields.