Concept List for Introduction to Data Science with Python
This document contains 300 concepts for the learning graph. Each concept is numbered with a unique ConceptID.
Foundational Concepts (1-20)
- Data Science
- Python Programming
- Jupyter Notebooks
- Data
- Variables
- Data Types
- Numerical Data
- Categorical Data
- Ordinal Data
- Nominal Data
- Measurement Scales
- Independent Variable
- Dependent Variable
- Dataset
- Observation
- Feature
- Target Variable
- Data Science Workflow
- Problem Definition
- Data Collection
Python Environment (21-35)
- Python Installation
- Package Management
- Pip
- Conda Environment
- Virtual Environment
- IDE Setup
- VS Code
- Notebook Cells
- Code Cell
- Markdown Cell
- Cell Execution
- Kernel
- Import Statement
- Python Libraries
- Documentation
Data Structures (36-55)
- Lists
- Dictionaries
- Tuples
- Arrays
- Pandas Library
- DataFrame
- Series
- Index
- Column
- Row
- Data Loading
- CSV Files
- Read CSV
- Data Inspection
- Head Method
- Tail Method
- Shape Attribute
- Info Method
- Describe Method
- Data Selection
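Several concepts in this group (DataFrame, Series, Head Method, Shape Attribute, Describe Method, Data Selection) can be seen in one short pandas sketch; the column names and values are invented for illustration:

```python
import pandas as pd

# A toy dataset (hypothetical values, for illustration only)
df = pd.DataFrame({
    "height_cm": [160, 172, 181, 168],
    "weight_kg": [55.0, 70.5, 80.2, 62.3],
})

first_rows = df.head(2)       # first two rows (data inspection)
n_rows, n_cols = df.shape     # dimensions of the DataFrame
heights = df["height_cm"]     # selecting a column yields a Series
summary = df.describe()       # count, mean, std, quartiles per column
```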
Data Cleaning (56-75)
- Missing Values
- NaN
- Null Detection
- Dropna Method
- Fillna Method
- Imputation
- Data Type Conversion
- Duplicate Detection
- Duplicate Removal
- Outliers
- Outlier Detection
- Data Validation
- String Cleaning
- Column Renaming
- Data Filtering
- Boolean Indexing
- Query Method
- Data Transformation
- Feature Scaling
- Normalization
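A minimal cleaning sketch tying together null detection, duplicate removal, imputation, and boolean indexing; the raw data here is hypothetical:

```python
import pandas as pd
import numpy as np

# Hypothetical raw data with one missing value and one duplicate row
raw = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Bergen", "Tromso"],
    "temp_c": [3.0, 3.0, np.nan, -5.0],
})

missing_per_col = raw.isna().sum()       # null detection per column
no_dupes = raw.drop_duplicates()         # duplicate removal
# Simple mean imputation for the missing temperature
filled = no_dupes.fillna(no_dupes["temp_c"].mean())
cold = filled[filled["temp_c"] < 0]      # boolean indexing (data filtering)
```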
Visualization Basics (76-100)
- Data Visualization
- Matplotlib Library
- Figure
- Axes
- Plot Function
- Line Plot
- Scatter Plot
- Bar Chart
- Histogram
- Box Plot
- Pie Chart
- Subplot
- Figure Size
- Title
- Axis Labels
- Legend
- Color
- Markers
- Line Styles
- Grid
- Annotations
- Save Figure
- Plot Customization
- Seaborn Library
- Statistical Plots
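The Matplotlib concepts above (Figure, Axes, Scatter Plot, Title, Axis Labels, Legend, Grid, Save Figure) fit in one small sketch; the data and output filename are illustrative choices:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]   # hypothetical measurements

fig, ax = plt.subplots(figsize=(6, 4))   # figure and axes, with figure size
ax.scatter(x, y, color="tab:blue", marker="o", label="observations")
ax.set_title("Scatter plot example")
ax.set_xlabel("x value")
ax.set_ylabel("y value")
ax.legend()
ax.grid(True)
fig.savefig("scatter_example.png")       # save figure to disk
plt.close(fig)
```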
Statistics Foundations (101-130)
- Descriptive Statistics
- Mean
- Median
- Mode
- Range
- Variance
- Standard Deviation
- Quartiles
- Percentiles
- Interquartile Range
- Skewness
- Kurtosis
- Distribution
- Normal Distribution
- Probability
- Random Variables
- Expected Value
- Sample
- Population
- Sampling
- Central Limit Theorem
- Confidence Interval
- Hypothesis Testing
- P-Value
- Statistical Significance
- Correlation
- Covariance
- Pearson Correlation
- Spearman Correlation
- Correlation Matrix
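Many of the descriptive-statistics concepts above can be computed directly with NumPy; the exam scores and study hours below are invented sample data:

```python
import numpy as np

# Hypothetical sample of exam scores
scores = np.array([55, 61, 68, 70, 72, 75, 80, 88, 91, 95])

mean = scores.mean()
median = np.median(scores)
std = scores.std(ddof=1)                  # sample standard deviation
q1, q3 = np.percentile(scores, [25, 75])  # quartiles
iqr = q3 - q1                             # interquartile range

# Pearson correlation between scores and hypothetical hours studied
hours = np.array([2, 3, 4, 4, 5, 5, 6, 8, 9, 10])
r = np.corrcoef(hours, scores)[0, 1]
```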
Linear Regression (131-155)
- Regression Analysis
- Linear Regression
- Simple Linear Regression
- Regression Line
- Slope
- Intercept
- Least Squares Method
- Residuals
- Sum of Squared Errors
- Ordinary Least Squares
- Regression Coefficients
- Coefficient Interpretation
- Prediction
- Fitted Values
- Regression Equation
- Line of Best Fit
- Assumptions of Regression
- Linearity Assumption
- Homoscedasticity
- Independence Assumption
- Normality of Residuals
- Scikit-learn Library
- LinearRegression Class
- Fit Method
- Predict Method
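A minimal scikit-learn sketch of simple linear regression, covering the LinearRegression class, the fit and predict methods, slope, intercept, fitted values, and residuals; the data is synthetic, generated near the line y = 2x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical observations scattered around y = 2x + 1
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

model = LinearRegression()
model.fit(X, y)                  # ordinary least squares fit

slope = model.coef_[0]           # estimated regression coefficient
intercept = model.intercept_     # estimated intercept
y_hat = model.predict(X)         # fitted values
residuals = y - y_hat            # residuals
```

With an intercept in the model, the residuals of an OLS fit sum to zero (up to floating-point error).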
Model Evaluation (156-180)
- Model Performance
- Training Data
- Testing Data
- Train Test Split
- Validation Data
- R-Squared
- Adjusted R-Squared
- Mean Squared Error
- Root Mean Squared Error
- Mean Absolute Error
- Residual Analysis
- Residual Plot
- Overfitting
- Underfitting
- Bias
- Model Variance
- Bias-Variance Tradeoff
- Model Complexity
- Cross-Validation
- K-Fold Cross-Validation
- Leave-One-Out Cross-Validation
- Holdout Method
- Model Selection
- Hyperparameters
- Model Comparison
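The evaluation workflow above (train/test split, MSE, RMSE, R-squared on held-out data) can be sketched on synthetic data; all numbers here are generated for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)   # fixed seed for reproducibility
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=100)  # noisy line (synthetic)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)   # mean squared error
rmse = float(np.sqrt(mse))                 # root mean squared error
r2 = r2_score(y_test, y_pred)              # R-squared on the test set
```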
Multiple Regression (181-195)
- Multiple Linear Regression
- Multiple Predictors
- Multicollinearity
- Variance Inflation Factor
- Feature Selection
- Forward Selection
- Backward Elimination
- Stepwise Selection
- Categorical Variables
- Dummy Variables
- One-Hot Encoding
- Interaction Terms
- Polynomial Features
- Feature Engineering
- Feature Importance
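One way the dummy-variable concepts above appear in practice is `pandas.get_dummies`; the toy dataset is invented for illustration:

```python
import pandas as pd

# Hypothetical dataset with one categorical predictor
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "price": [10.0, 30.0, 20.0, 11.0],
})

# One-hot encoding: each category becomes its own indicator column.
# drop_first=True drops one level to avoid perfect multicollinearity
# (the "dummy variable trap") in a regression with an intercept.
encoded = pd.get_dummies(df, columns=["size"], drop_first=True)
```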
NumPy (196-210)
- NumPy Library
- NumPy Array
- Array Creation
- Array Shape
- Array Indexing
- Array Slicing
- Broadcasting
- Vectorized Operations
- Element-wise Operations
- Matrix Operations
- Dot Product
- Matrix Multiplication
- Transpose
- Linear Algebra
- Computational Efficiency
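A short sketch of the NumPy concepts above, with small arrays chosen purely for illustration:

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])             # shape (2, 3)
row_means = a.mean(axis=1, keepdims=True)   # shape (2, 1)

# Broadcasting: the (2, 1) column stretches across the (2, 3) array,
# so each row is centered by its own mean without an explicit loop.
centered = a - row_means

b = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])   # shape (3, 2)
product = a @ b              # matrix multiplication, shape (2, 2)
dot = a[0] @ b[:, 0]         # dot product of two vectors
t = a.T                      # transpose, shape (3, 2)
```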
Non-linear Models (211-225)
- Non-linear Regression
- Polynomial Regression
- Degree of Polynomial
- Curve Fitting
- Transformation
- Log Transformation
- Feature Transformation
- Model Flexibility
- Regularization
- Ridge Regression
- Lasso Regression
- Elastic Net
- Regularization Parameter
- Lambda Parameter
- Shrinkage
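Polynomial features and ridge regularization can be combined in one scikit-learn pipeline; the quadratic data below is synthetic, and alpha=1.0 is an illustrative choice of regularization parameter:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=60)   # quadratic signal plus noise

# Degree-2 polynomial features with L2 (ridge) regularization;
# larger alpha means more shrinkage of the coefficients.
model = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
model.fit(X, y)
r2 = model.score(X, y)
```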
Machine Learning Intro (226-245)
- Machine Learning
- Supervised Learning
- Unsupervised Learning
- Classification
- Clustering
- Training Process
- Learning Algorithm
- Model Training
- Generalization
- Training Error
- Test Error
- Prediction Error
- Loss Function
- Cost Function
- Optimization
- Gradient Descent
- Learning Rate
- Convergence
- Local Minimum
- Global Minimum
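Gradient descent, the learning rate, and convergence to a global minimum can be illustrated on a one-dimensional loss; the loss function, learning rate, and iteration count here are illustrative choices:

```python
# Gradient descent on the loss L(w) = (w - 3)^2, whose global minimum is w = 3.
def gradient(w):
    return 2.0 * (w - 3.0)   # dL/dw

w = 0.0                # initial guess
learning_rate = 0.1
for step in range(100):
    w = w - learning_rate * gradient(w)   # gradient descent update rule

# w converges toward the minimizer 3.0
```

Because the loss is convex, the only local minimum is the global one; on non-convex losses the same update rule can get stuck in a local minimum.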
Neural Networks (246-265)
- Neural Networks
- Artificial Neuron
- Perceptron
- Activation Function
- Sigmoid Function
- ReLU Function
- Input Layer
- Hidden Layer
- Output Layer
- Weights
- Biases
- Forward Propagation
- Backpropagation
- Deep Learning
- Network Architecture
- Epochs
- Batch Size
- Mini-batch
- Stochastic Gradient Descent
- Vanishing Gradient
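Forward propagation through a tiny network (input layer, one ReLU hidden layer, sigmoid output) can be written in plain NumPy; the weights below are fixed illustrative values, whereas in practice they would be learned by backpropagation:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny network: 2 inputs -> 3 hidden units (ReLU) -> 1 output (sigmoid).
W1 = np.array([[0.5, -0.2, 0.1],
               [0.3, 0.8, -0.5]])      # weights, shape (2, 3)
b1 = np.array([0.0, 0.1, -0.1])        # biases for the hidden layer
W2 = np.array([[1.0], [-1.0], [0.5]])  # weights, shape (3, 1)
b2 = np.array([0.2])                   # bias for the output layer

x = np.array([[1.0, 2.0]])          # one input example
hidden = relu(x @ W1 + b1)          # forward propagation, hidden layer
output = sigmoid(hidden @ W2 + b2)  # output layer activation
```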
PyTorch (266-285)
- PyTorch Library
- Tensors
- Tensor Operations
- Autograd
- Automatic Differentiation
- Computational Graph
- Neural Network Module
- Sequential Model
- Linear Layer
- Loss Functions PyTorch
- Optimizer
- SGD Optimizer
- Adam Optimizer
- Training Loop
- Model Evaluation PyTorch
- GPU Computing
- CUDA
- Model Saving
- Model Loading
- Transfer Learning
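A minimal PyTorch training-loop sketch covering tensors, a Sequential model with a linear layer, an MSE loss, the SGD optimizer, and autograd-driven backpropagation; the synthetic data and hyperparameters are illustrative choices:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)   # random seed for reproducibility

# Synthetic data: y = 2x + 1 with a little noise
X = torch.linspace(0, 1, 64).unsqueeze(1)
y = 2.0 * X + 1.0 + 0.05 * torch.randn_like(X)

model = nn.Sequential(nn.Linear(1, 1))   # one linear layer
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(500):           # training loop
    optimizer.zero_grad()          # clear accumulated gradients
    loss = loss_fn(model(X), y)    # forward pass and loss
    loss.backward()                # backpropagation via autograd
    optimizer.step()               # parameter update

final_loss = loss.item()
```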
Advanced Topics (286-295)
- Explainable AI
- Model Interpretability
- Feature Importance Analysis
- SHAP Values
- Model Documentation
- Reproducibility
- Random Seed
- Version Control
- Git
- Data Ethics
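Reproducibility via a random seed, listed above, takes one line in NumPy:

```python
import numpy as np

# Setting a random seed makes "random" results reproducible:
# the same seed always yields the same sequence of draws.
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)

sample1 = rng1.normal(size=5)
sample2 = rng2.normal(size=5)
# sample1 and sample2 are identical because the seeds match
```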
Projects and Applications (296-300)
- Capstone Project
- End-to-End Pipeline
- Model Deployment
- Results Communication
- Data-Driven Decisions