Data Matrix Structure Visualizer
Description
This MicroSim helps students understand the fundamental structure of data matrices used in machine learning and data science. A data matrix organizes observations (samples) in rows and measurements (features) in columns. This layout is the standard input format for most ML algorithms.
Key Features:
- Real Dataset Examples: Explore Iris (botanical measurements), MNIST (image pixels), and Housing (socioeconomic factors)
- Heat Map Visualization: Colors indicate relative values across the matrix
- Row Highlighting: Click any row to highlight it as a feature vector representing one sample
- Column Highlighting: Click column headers to see how one feature varies across all samples
- Dimension Annotations: Clear labels showing n (samples) and d (features)
- Cell Hover: View individual values and their context
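The page does not say how the heat map assigns colors. The sketch below is a minimal illustration and makes several assumptions:

- Every cell is scaled to [0, 1] using the global minimum and maximum of the matrix.
- The scaled value is then blended from blue (low) to red (high).
- The helper names `heat_intensities` and `to_rgb` are hypothetical.
- The MicroSim's actual p5.js color scheme may differ.

The sample values are the first three rows of the Iris dataset.

```python
# Hypothetical sketch: global min-max scaling for heat-map colors.
# This is an assumption; the MicroSim's real color mapping may differ.

def heat_intensities(matrix):
    """Return a same-shaped matrix with every cell scaled to [0, 1]."""
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All cells are equal: use a neutral mid-level intensity everywhere.
        return [[0.5 for _ in row] for row in matrix]
    return [[(v - lo) / span for v in row] for row in matrix]


def to_rgb(t):
    """Blend from blue (t = 0) to red (t = 1) as an (r, g, b) tuple of 0-255 ints."""
    return (round(255 * t), 0, round(255 * (1 - t)))


# First three Iris samples (cm).
X = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
]
H = heat_intensities(X)
print(H[0][0])          # 5.1 is the largest value -> 1.0
print(to_rgb(H[0][3]))  # 0.2 is the smallest value -> (0, 0, 255)
```

Because this scaling is global, a feature with large raw values dominates the colors. Scaling each column separately is an alternative when features have very different ranges.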
Matrix Structure Concepts
Rows as Samples
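Each row in a data matrix represents a single sample (also called an observation, instance, or data point). A row contains all feature values for one sample, forming a feature vector:
\[
\mathbf{x}_i = \begin{bmatrix} x_{i1} & x_{i2} & \cdots & x_{id} \end{bmatrix}
\]
where \(x_{ij}\) is the value of feature \(j\) for sample \(i\).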
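In code, extracting a feature vector is just indexing a row. This is a minimal Python sketch with the following assumptions:

- The matrix is stored as a list of rows.
- `feature_vector` is a hypothetical helper.
- The values are the first three rows of the Iris dataset: sepal length, sepal width, petal length and petal width, in cm.

Python indexes from 0, while math notation usually numbers samples from 1.

```python
# First three Iris samples: sepal length, sepal width, petal length, petal width (cm).
X = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
]


def feature_vector(matrix, i):
    """Return a copy of row i (0-based): every feature value of one sample."""
    return list(matrix[i])


second = feature_vector(X, 1)  # index 1 is the 2nd sample (x_2 when counting from 1)
print(second)       # [4.9, 3.0, 1.4, 0.2]
print(len(second))  # 4, one entry per feature (d = 4)
```

Returning a copy means that later edits to the vector do not silently change the data matrix.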
Columns as Features
Each column represents a single feature (also called an attribute, variable, or dimension). A column shows how one measurement varies across all samples.
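A column is gathered by taking the same position j from every row. This minimal sketch uses the same list-of-rows assumption. `feature_column` and the `FEATURES` name list are hypothetical labels chosen for illustration.

```python
# First three Iris samples (cm).
X = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
]
# Hypothetical column labels for the four Iris measurements.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def feature_column(matrix, j):
    """Return column j: one feature's value for every sample, in row order."""
    return [row[j] for row in matrix]


sepal_length = feature_column(X, FEATURES.index("sepal_length"))
print(sepal_length)                          # [5.1, 4.9, 4.7]
print(min(sepal_length), max(sepal_length))  # 4.7 5.1
```

A column has n entries (one per sample), whereas a row has d entries (one per feature).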
Matrix Notation
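The full data matrix \(\mathbf{X}\) with n samples and d features:
\[
\mathbf{X} = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1d} \\
x_{21} & x_{22} & \cdots & x_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nd}
\end{bmatrix} \in \mathbb{R}^{n \times d}
\]
Row \(i\) of \(\mathbf{X}\) is the feature vector of sample \(i\), and column \(j\) holds feature \(j\) for every sample.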
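The shape n × d and the entry x_ij map directly onto code. This is a minimal sketch that assumes a list-of-rows representation. The `shape` helper is hypothetical; libraries such as NumPy expose the same idea through an array's `shape` attribute.

```python
def shape(matrix):
    """Return (n, d) and check that every row has the same number of features."""
    n = len(matrix)
    d = len(matrix[0]) if n > 0 else 0
    if any(len(row) != d for row in matrix):
        raise ValueError("rows differ in length, so this is not a valid data matrix")
    return n, d


# First three Iris samples (cm).
X = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [4.7, 3.2, 1.3, 0.2],
]
n, d = shape(X)
print(n, d)     # 3 4
# Entry x_34 (sample 3, feature 4, counting from 1) sits at X[2][3] in 0-based Python.
print(X[2][3])  # 0.2
```

Applied to the full Iris dataset, the same check would report (150, 4).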
Example Datasets
Iris Dataset
- Samples: 150 flower specimens
- Features: 4 measurements (sepal length/width, petal length/width)
- Use case: Classification of flower species
MNIST Digit
- Samples: 1 handwritten digit image
- Features: 784 pixel intensities (28x28 grid)
- Use case: Image classification
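The 784 features come from flattening the 28 × 28 pixel grid into one row. This minimal sketch makes three assumptions:

- Pixels are read in row-major order, so the pixel at row r, column c becomes feature r · 28 + c. This is the common convention.
- The image is a synthetic placeholder with a single bright pixel, not real MNIST data.
- `flatten` and `unflatten` are hypothetical helpers.

```python
SIZE = 28  # MNIST images are 28 x 28 pixels


def flatten(image):
    """Row-major flatten: pixel (r, c) becomes feature index r * SIZE + c."""
    return [pixel for row in image for pixel in row]


def unflatten(vector, size=SIZE):
    """Rebuild the size x size grid from a row-major feature vector."""
    return [vector[r * size:(r + 1) * size] for r in range(size)]


# Synthetic placeholder image (NOT real MNIST data): one bright pixel on black.
image = [[0] * SIZE for _ in range(SIZE)]
image[14][7] = 255

x = flatten(image)
print(len(x))               # 784
print(x.index(255))         # 14 * 28 + 7 = 399
print(unflatten(x) == image)  # True
```

Stacking many such vectors as rows gives an n × 784 data matrix. The single image in this MicroSim is the case n = 1.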
Housing Dataset
- Samples: 506 Boston neighborhoods
- Features: 13 socioeconomic indicators
- Use case: Price prediction (regression)
Lesson Plan
Learning Objectives
After using this MicroSim, students will be able to:
- Identify rows as samples and columns as features in a data matrix
- Extract a feature vector for a specific sample
- Compare feature values across different samples
- Recognize the dimensions n (samples) and d (features)
Guided Exploration (5-7 minutes)
- Start with Iris: Observe the 4-feature structure representing flower measurements
- Click a Row: See how one flower's measurements form a feature vector
- Click a Column: Observe how sepal length varies across different flowers
- Switch to MNIST: Notice the dramatic increase in features, since every pixel is a separate feature
- Explore Housing: See how socioeconomic features describe neighborhoods
Key Discussion Points
- Feature Selection: Not all columns contribute equally to predictions
- Sample Size: More rows generally improve model reliability
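- Curse of Dimensionality: When d is large relative to n, the samples cover the feature space only sparsely, which makes overfitting more likely
- Data Normalization: Features may be measured on very different scales, so they are often rescaled (for example, standardized) before training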
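One common response to the normalization point is z-score standardization. It subtracts each column's mean and divides by that column's standard deviation, so every feature ends up with mean 0 and standard deviation 1.

This minimal sketch uses the population standard deviation. `standardize_columns` is a hypothetical helper. The two-column data are made-up values on deliberately different scales, not taken from any of the datasets above.

```python
import math


def standardize_columns(matrix):
    """Z-score each column: (value - column mean) / column population std."""
    n, d = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(d)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in matrix) / n)
        for j in range(d)
    ]
    # A constant column (std 0) carries no information; map it to 0.0.
    return [
        [(row[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(d)]
        for row in matrix
    ]


# Made-up values: column 0 is in the thousands, column 1 is single digits.
X = [
    [1200, 2],
    [1500, 3],
    [1800, 4],
]
Z = standardize_columns(X)
for row in Z:
    print([round(v, 4) for v in row])
```

After standardization both columns span the same range. Neither one dominates a distance-based method purely because of its units.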
Assessment Questions
- If the Iris dataset has 150 samples and 4 features, what is the shape of the data matrix?
- In the MNIST dataset, why does a single image have 784 features?
- Which dimension (n or d) grows when you collect more data points?
- What does it mean to extract "row 3" from a data matrix?
References
- Chapter 2: Matrices and Matrix Operations - Matrix fundamentals
- Chapter 9: Machine Learning Foundations - Data representation in ML
- UCI Machine Learning Repository - Source for example datasets