
Concept List for Introduction to Data Science with Python

This document contains 300 concepts for the learning graph. Each concept is numbered with a unique ConceptID.

Foundational Concepts (1-20)

  1. Data Science
  2. Python Programming
  3. Jupyter Notebooks
  4. Data
  5. Variables
  6. Data Types
  7. Numerical Data
  8. Categorical Data
  9. Ordinal Data
  10. Nominal Data
  11. Measurement Scales
  12. Independent Variable
  13. Dependent Variable
  14. Dataset
  15. Observation
  16. Feature
  17. Target Variable
  18. Data Science Workflow
  19. Problem Definition
  20. Data Collection

Python Environment (21-35)

  21. Python Installation
  22. Package Management
  23. Pip
  24. Conda Environment
  25. Virtual Environment
  26. IDE Setup
  27. VS Code
  28. Notebook Cells
  29. Code Cell
  30. Markdown Cell
  31. Cell Execution
  32. Kernel
  33. Import Statement
  34. Python Libraries
  35. Documentation

Data Structures (36-55)

  36. Lists
  37. Dictionaries
  38. Tuples
  39. Arrays
  40. Pandas Library
  41. DataFrame
  42. Series
  43. Index
  44. Column
  45. Row
  46. Data Loading
  47. CSV Files
  48. Read CSV
  49. Data Inspection
  50. Head Method
  51. Tail Method
  52. Shape Attribute
  53. Info Method
  54. Describe Method
  55. Data Selection

Data Cleaning (56-75)

  56. Missing Values
  57. NaN
  58. Null Detection
  59. Dropna Method
  60. Fillna Method
  61. Imputation
  62. Data Type Conversion
  63. Duplicate Detection
  64. Duplicate Removal
  65. Outliers
  66. Outlier Detection
  67. Data Validation
  68. String Cleaning
  69. Column Renaming
  70. Data Filtering
  71. Boolean Indexing
  72. Query Method
  73. Data Transformation
  74. Feature Scaling
  75. Normalization

Visualization Basics (76-100)

  76. Data Visualization
  77. Matplotlib Library
  78. Figure
  79. Axes
  80. Plot Function
  81. Line Plot
  82. Scatter Plot
  83. Bar Chart
  84. Histogram
  85. Box Plot
  86. Pie Chart
  87. Subplot
  88. Figure Size
  89. Title
  90. Axis Labels
  91. Legend
  92. Color
  93. Markers
  94. Line Styles
  95. Grid
  96. Annotations
  97. Save Figure
  98. Plot Customization
  99. Seaborn Library
  100. Statistical Plots

Statistics Foundations (101-130)

  101. Descriptive Statistics
  102. Mean
  103. Median
  104. Mode
  105. Range
  106. Variance
  107. Standard Deviation
  108. Quartiles
  109. Percentiles
  110. Interquartile Range
  111. Skewness
  112. Kurtosis
  113. Distribution
  114. Normal Distribution
  115. Probability
  116. Random Variables
  117. Expected Value
  118. Sample
  119. Population
  120. Sampling
  121. Central Limit Theorem
  122. Confidence Interval
  123. Hypothesis Testing
  124. P-Value
  125. Statistical Significance
  126. Correlation
  127. Covariance
  128. Pearson Correlation
  129. Spearman Correlation
  130. Correlation Matrix
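To make several of the descriptive-statistics concepts above concrete (mean, median, mode, range, variance, standard deviation, quartiles and interquartile range), here is a minimal sketch using only Python's standard-library `statistics` module. The sample values are made up for illustration. Quartile conventions differ between tools: `statistics.quantiles` defaults to its "exclusive" method, so pandas or NumPy, which default to linear interpolation, may report different quartiles for the same data.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)          # arithmetic mean
median = statistics.median(data)      # middle value (average of the two middle values here)
mode = statistics.mode(data)          # most frequent value
value_range = max(data) - min(data)   # range: largest minus smallest value

pop_var = statistics.pvariance(data)  # population variance (divides by n)
samp_var = statistics.variance(data)  # sample variance (divides by n - 1)
pop_sd = statistics.pstdev(data)      # population standard deviation

# With n=4, quantiles returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                         # interquartile range

print(mean, median, mode, value_range, pop_var, samp_var, pop_sd, iqr)
```

The gap between `pop_var` and `samp_var` shows why it matters whether the data is treated as a whole population or as a sample drawn from one.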

Linear Regression (131-155)

  131. Regression Analysis
  132. Linear Regression
  133. Simple Linear Regression
  134. Regression Line
  135. Slope
  136. Intercept
  137. Least Squares Method
  138. Residuals
  139. Sum of Squared Errors
  140. Ordinary Least Squares
  141. Regression Coefficients
  142. Coefficient Interpretation
  143. Prediction
  144. Fitted Values
  145. Regression Equation
  146. Line of Best Fit
  147. Assumptions of Regression
  148. Linearity Assumption
  149. Homoscedasticity
  150. Independence Assumption
  151. Normality of Residuals
  152. Scikit-learn Library
  153. LinearRegression Class
  154. Fit Method
  155. Predict Method
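The least-squares idea behind simple linear regression can be sketched in plain Python. With the closed-form ordinary least squares solution, the slope is the co-variation of x and y divided by the variation of x, and the intercept makes the line pass through the point of means. The data below is hypothetical, and `fit_simple_ols` is an illustrative helper, not a library function; in practice scikit-learn's `LinearRegression` class, through its `fit` and `predict` methods, computes the same ordinary least squares fit.

```python
# Hypothetical toy data: x is the independent variable, y the dependent variable.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_simple_ols(x, y):
    """Return (slope, intercept) that minimize the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # slope = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_simple_ols(xs, ys)

# Fitted values, residuals and the sum of squared errors (SSE).
fitted = [intercept + slope * xi for xi in xs]
residuals = [yi - fi for yi, fi in zip(ys, fitted)]
sse = sum(r ** 2 for r in residuals)

print(f"y = {intercept:.2f} + {slope:.2f} * x, SSE = {sse:.2f}")
```

For this toy data the fit works out to a slope of 0.6 and an intercept of 2.2, and the residuals sum to zero (up to rounding), as they always do for an OLS line that includes an intercept.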

Model Evaluation (156-180)

  156. Model Performance
  157. Training Data
  158. Testing Data
  159. Train Test Split
  160. Validation Data
  161. R-Squared
  162. Adjusted R-Squared
  163. Mean Squared Error
  164. Root Mean Squared Error
  165. Mean Absolute Error
  166. Residual Analysis
  167. Residual Plot
  168. Overfitting
  169. Underfitting
  170. Bias
  171. Variance
  172. Bias-Variance Tradeoff
  173. Model Complexity
  174. Cross-Validation
  175. K-Fold Cross-Validation
  176. Leave One Out CV
  177. Holdout Method
  178. Model Selection
  179. Hyperparameters
  180. Model Comparison
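The evaluation metrics and the k-fold idea listed above can be sketched without any library. `regression_metrics` and `k_fold_indices` are hypothetical helpers written for illustration, and the predictions are assumed to come from a least-squares line (slope 0.6, intercept 2.2) fitted to a small toy dataset. The fold split assigns contiguous, unshuffled blocks of indices, similar in spirit to scikit-learn's `KFold` without shuffling; setting `k` equal to the number of samples gives leave-one-out cross-validation.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MSE, RMSE, MAE, R-squared) for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e ** 2 for e in errors)        # sum of squared errors
    mse = sse / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - sse / sst
    return mse, rmse, mae, r2


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every index is in exactly one test fold."""
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


# Hypothetical observed values and predictions from a fitted line y = 2.2 + 0.6x.
y_true = [2.0, 4.0, 5.0, 4.0, 5.0]
y_pred = [2.8, 3.4, 4.0, 4.6, 5.2]
mse, rmse, mae, r2 = regression_metrics(y_true, y_pred)

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print(len(train_idx), "train /", len(test_idx), "test")
```

For these values the sketch gives an MSE of 0.48, an MAE of 0.64 and an R-squared of 0.6 (up to floating-point rounding). Ten samples split into three folds produce test folds of sizes 4, 3 and 3.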

Multiple Regression (181-195)

  181. Multiple Linear Regression
  182. Multiple Predictors
  183. Multicollinearity
  184. Variance Inflation Factor
  185. Feature Selection
  186. Forward Selection
  187. Backward Elimination
  188. Stepwise Selection
  189. Categorical Variables
  190. Dummy Variables
  191. One-Hot Encoding
  192. Interaction Terms
  193. Polynomial Features
  194. Feature Engineering
  195. Feature Importance

NumPy (196-210)

  196. NumPy Library
  197. NumPy Array
  198. Array Creation
  199. Array Shape
  200. Array Indexing
  201. Array Slicing
  202. Broadcasting
  203. Vectorized Operations
  204. Element-wise Operations
  205. Matrix Operations
  206. Dot Product
  207. Matrix Multiplication
  208. Transpose
  209. Linear Algebra
  210. Computational Efficiency

Non-linear Models (211-225)

  211. Non-linear Regression
  212. Polynomial Regression
  213. Degree of Polynomial
  214. Curve Fitting
  215. Transformation
  216. Log Transformation
  217. Feature Transformation
  218. Model Flexibility
  219. Regularization
  220. Ridge Regression
  221. Lasso Regression
  222. Elastic Net
  223. Regularization Parameter
  224. Lambda Parameter
  225. Shrinkage

Machine Learning Intro (226-245)

  226. Machine Learning
  227. Supervised Learning
  228. Unsupervised Learning
  229. Classification
  230. Clustering
  231. Training Process
  232. Learning Algorithm
  233. Model Training
  234. Generalization
  235. Training Error
  236. Test Error
  237. Prediction Error
  238. Loss Function
  239. Cost Function
  240. Optimization
  241. Gradient Descent
  242. Learning Rate
  243. Convergence
  244. Local Minimum
  245. Global Minimum
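Gradient descent, the learning rate and convergence can be illustrated by minimizing a mean-squared-error loss for a one-feature linear model. This is a minimal sketch under stated assumptions: the data is a hypothetical toy set whose exact least-squares solution is slope 0.6 and intercept 2.2, `mse_loss` and `gradient_descent` are illustrative helpers, and convergence is declared when the squared gradient norm drops below a small tolerance. Because this loss is convex, the local minimum the procedure reaches is also the global minimum.

```python
# Hypothetical toy data; the exact least-squares minimizer is w = 0.6, b = 2.2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def mse_loss(w, b, x, y):
    """Mean squared error (the cost function) of the model y_hat = w * x + b."""
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


def gradient_descent(x, y, learning_rate=0.05, max_iters=10_000, tol=1e-12):
    """Minimize mse_loss with full-batch gradient descent.

    Returns (w, b, iterations_used).
    """
    w, b = 0.0, 0.0
    n = len(x)
    for step in range(1, max_iters + 1):
        errors = [w * xi + b - yi for xi, yi in zip(x, y)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = 2.0 / n * sum(e * xi for e, xi in zip(errors, x))
        grad_b = 2.0 / n * sum(errors)
        # Move against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        # A near-zero gradient means we are (numerically) at the minimum.
        if grad_w * grad_w + grad_b * grad_b < tol:
            break
    return w, b, step


w, b, iterations = gradient_descent(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f} after {iterations} iterations, "
      f"loss = {mse_loss(w, b, xs, ys):.4f}")
```

A learning rate that is too large makes the updates overshoot and diverge instead of converging; one that is too small still converges but needs many more iterations.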

Neural Networks (246-265)

  246. Neural Networks
  247. Artificial Neuron
  248. Perceptron
  249. Activation Function
  250. Sigmoid Function
  251. ReLU Function
  252. Input Layer
  253. Hidden Layer
  254. Output Layer
  255. Weights
  256. Biases
  257. Forward Propagation
  258. Backpropagation
  259. Deep Learning
  260. Network Architecture
  261. Epochs
  262. Batch Size
  263. Mini-batch
  264. Stochastic Gradient
  265. Vanishing Gradient
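Forward propagation through a tiny network with an input layer, one hidden layer and an output layer can be written in a few lines of plain Python. The weights and biases below are hand-picked, hypothetical values (a trained network would learn them through backpropagation); the hidden layer uses ReLU and the single output neuron uses the sigmoid function. `dense` and `forward` are illustrative helpers, not part of any library.

```python
import math


def sigmoid(z):
    """Sigmoid activation. Naive form: math.exp overflows for very negative z."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """ReLU activation: passes positive values, zeroes out negatives."""
    return max(0.0, z)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: each neuron computes activation(w . x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical architecture: 2 inputs -> 2 hidden neurons (ReLU) -> 1 output (sigmoid).
W_hidden = [[0.5, -1.0], [1.0, 1.0]]  # one row of weights per hidden neuron
b_hidden = [0.0, -1.0]
W_out = [[1.0, -1.0]]
b_out = [0.0]


def forward(x):
    """Forward propagation: input layer -> hidden layer -> output layer."""
    hidden = dense(x, W_hidden, b_hidden, relu)
    return dense(hidden, W_out, b_out, sigmoid)[0]


print(forward([2.0, 1.0]))  # a value between 0 and 1
```

For the input `[2.0, 1.0]` the hidden activations are 0 and 2, so the output is `sigmoid(-2)`, roughly 0.12.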

PyTorch (266-285)

  266. PyTorch Library
  267. Tensors
  268. Tensor Operations
  269. Autograd
  270. Automatic Differentiation
  271. Computational Graph
  272. Neural Network Module
  273. Sequential Model
  274. Linear Layer
  275. Loss Functions PyTorch
  276. Optimizer
  277. SGD Optimizer
  278. Adam Optimizer
  279. Training Loop
  280. Model Evaluation PyTorch
  281. GPU Computing
  282. CUDA
  283. Model Saving
  284. Model Loading
  285. Transfer Learning

Advanced Topics (286-295)

  286. Explainable AI
  287. Model Interpretability
  288. Feature Importance Analysis
  289. SHAP Values
  290. Model Documentation
  291. Reproducibility
  292. Random Seed
  293. Version Control
  294. Git
  295. Data Ethics

Projects and Applications (296-300)

  296. Capstone Project
  297. End-to-End Pipeline
  298. Model Deployment
  299. Results Communication
  300. Data-Driven Decisions