
Scree Plot and Component Selection

Run the Scree Plot MicroSim Fullscreen

Edit the MicroSim in the p5.js Editor

About This MicroSim

This visualization teaches the critical skill of selecting the optimal number of principal components in PCA. Choosing the right number of components balances dimensionality reduction against information preservation.

The key decision in PCA is: How many components \(k\) should we keep?

\[\text{Keep } k \text{ components if } \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i} \geq \text{threshold}\]
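
In code, this rule reduces to a cumulative sum over the sorted eigenvalues. A minimal sketch in Python/NumPy, using illustrative eigenvalues rather than the MicroSim's data:

```python
import numpy as np

def choose_k(eigvals, threshold=0.95):
    """Smallest k whose cumulative explained-variance ratio meets the threshold."""
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratios, threshold) + 1)

eigvals = np.array([5.0, 2.5, 1.5, 0.6, 0.4])  # sorted in descending order
print(choose_k(eigvals))                        # -> 4 (first k reaching 95%)
```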

Key Features

  • Dual Panel View: Scree plot (left) and cumulative variance (right) side by side
  • Multiple Datasets: Compare different eigenvalue decay patterns
  • Three Selection Methods: Elbow method, variance threshold, and Kaiser criterion
  • Interactive Threshold: Drag to adjust target variance level
  • Reconstruction Error Display: Visualize information loss at different k values

Selection Methods

Method               Rule                                         Best For
Elbow Method         Find the sharp bend in the scree plot        Data with clear structure
Variance Threshold   Keep the smallest k explaining, e.g., 95%    When error tolerance is known
Kaiser Criterion     Keep components with eigenvalue > 1          Standardized data
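
The variance-threshold rule is sketched in code above, and the Kaiser rule in the Mathematical Background section below; the elbow can be automated as well. One common heuristic keeps everything before the largest gap between consecutive eigenvalues; a minimal sketch follows (an assumption for illustration: the MicroSim's elbow detector may use a different rule):

```python
import numpy as np

def elbow_k(eigvals):
    """Largest-gap elbow heuristic: keep all components before the
    biggest drop between consecutive eigenvalues."""
    gaps = -np.diff(eigvals)         # size of each successive drop
    return int(np.argmax(gaps) + 1)  # number of components kept

print(elbow_k([4.0, 3.0, 2.5, 0.3, 0.2, 0.1]))  # -> 3
```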

How to Use

  1. Select a Dataset: Different patterns show when each method works best
  2. Observe the Scree Plot: Look for the "elbow" where the steep initial drop in eigenvalues levels off
  3. Check Cumulative Variance: See how much variance each component adds
  4. Drag the Threshold Line: Interactively adjust your target variance
  5. Compare Methods: The summary box shows suggested k from each method
  6. Toggle Reconstruction Error: Visualize the trade-off between k and information loss

Understanding the Visualization

Left Panel: Scree Plot

  • Blue bars: Eigenvalues of selected components
  • Gray bars: Eigenvalues of discarded components
  • Orange circle: Detected elbow point
  • Red dashed line: Kaiser criterion (eigenvalue = 1)
  • Connecting line: Helps visualize the "elbow"

Right Panel: Cumulative Variance

  • Blue line: Running sum of variance explained
  • Blue shading: Variance captured by selected components
  • Green dashed line: Target variance threshold
  • Green dot: Point where threshold is achieved
  • Draggable handle: Adjust threshold interactively

Dataset Patterns

Synthetic (Elbow at k=3)

  • Clear elbow makes selection straightforward
  • All three methods roughly agree
  • Ideal case for the elbow method

Gradual Decay

  • No sharp elbow visible
  • Variance threshold method works better
  • Common in real-world data

Uniform

  • No clear structure in eigenvalues
  • All components roughly equal importance
  • Dimensionality reduction may not be appropriate

Two Groups

  • Clear separation between important and noise components
  • Strong agreement between methods
  • Common pattern in data with distinct signal and noise

Learning Objectives

After using this MicroSim, students will be able to:

  • Interpret scree plots to identify natural dimensionality
  • Apply the elbow method to select number of components
  • Set and justify variance thresholds for component selection
  • Understand Kaiser criterion and when to apply it
  • Recognize when dimensionality reduction is appropriate
  • Evaluate trade-offs between compression and reconstruction quality

Mathematical Background

Variance Explained

For eigenvalues \(\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d\):

Individual variance: \(\frac{\lambda_i}{\sum_{j=1}^{d} \lambda_j}\)

Cumulative variance: \(\frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{d} \lambda_j}\)
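
For example, with eigenvalues \((4, 2, 1, 1)\), the first component explains \(4/8 = 50\%\) of the total variance, and the first two together explain \((4+2)/8 = 75\%\).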

Reconstruction Error

The Frobenius norm reconstruction error equals the sum of discarded eigenvalues:

\[\|X - X_k\|_F^2 = \sum_{i=k+1}^{d} \lambda_i\]
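
This identity is easy to check numerically. A minimal sketch on synthetic data (note: the identity is exact when the \(\lambda_i\) are eigenvalues of the scatter matrix \(X^\top X\); for covariance eigenvalues it holds up to the factor \(n\)):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) * [3.0, 2.0, 1.0, 0.5, 0.3, 0.1]
X -= X.mean(axis=0)                         # center the data

# Eigendecomposition of the scatter matrix X^T X, sorted descending
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

k = 3
W = eigvecs[:, :k]                          # top-k principal directions
X_k = X @ W @ W.T                           # rank-k reconstruction

print(np.linalg.norm(X - X_k, "fro") ** 2)  # reconstruction error...
print(eigvals[k:].sum())                    # ...equals the sum of discarded eigenvalues
```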

Kaiser Criterion

For standardized data (correlation matrix), keep components where:

\[\lambda_i \geq 1\]

This threshold is the average variance per original variable: each standardized variable has variance 1, so the correlation matrix has trace \(d\) and its eigenvalues average exactly 1.
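
A minimal sketch of the rule on synthetic data (illustrative, not the MicroSim's internals):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
X[:, 1] = X[:, 0] + 0.3 * rng.normal(size=300)  # make two columns correlated

R = np.corrcoef(X, rowvar=False)                # correlation matrix
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

k = int((eigvals >= 1.0).sum())                 # Kaiser: keep eigenvalue >= 1
print(eigvals.round(2), "-> keep k =", k)
```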

Selection Guidelines

Use Elbow Method When:

  • Scree plot shows clear bend
  • Data has distinct signal vs. noise components
  • You want a data-driven selection

Use Variance Threshold When:

  • You have a specific accuracy requirement
  • Domain knowledge suggests acceptable error level
  • Elbow is not clearly visible

Use Kaiser Criterion When:

  • Data is standardized (mean 0, variance 1)
  • Working with correlation matrix
  • You want components that explain more than one original variable

Common Patterns and Interpretations

Sharp Initial Drop

  • First few components capture most variance
  • Good candidate for significant dimensionality reduction
  • Clear separation between signal and noise

Gradual Decay (Exponential)

  • Information distributed across many components
  • May need more components to preserve structure
  • Consider domain-specific thresholds

Nearly Flat

  • No dominant directions in data
  • PCA may not be the right technique
  • Consider other approaches or full dimensionality

Practical Tips

  1. Always visualize: Don't rely solely on numbers
  2. Compare methods: Agreement suggests robust choice
  3. Consider the task: Classification may need fewer components than reconstruction
  4. Cross-validate: Test downstream performance at different k values (see the sketch after this list)
  5. Be conservative: When uncertain, keep more components
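
For tip 4, a minimal cross-validation sketch using scikit-learn; the digits dataset and logistic-regression classifier here are stand-ins, not part of the MicroSim:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)

# Score a downstream classifier at several candidate k values
for k in [5, 10, 20, 30, 40]:
    pipe = make_pipeline(PCA(n_components=k),
                         LogisticRegression(max_iter=2000))
    score = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"k={k:2d}: mean CV accuracy = {score:.3f}")
```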

Lesson Plan

Introduction (5 minutes)

  • Ask: "After computing PCA, how do we decide how many components to use?"
  • Introduce the bias-variance trade-off in choosing k

Demonstration (10 minutes)

  1. Start with Synthetic dataset showing clear elbow
  2. Walk through each selection method
  3. Show how the summary box compares suggestions
  4. Demonstrate dragging the threshold line

Exploration (10 minutes)

Have students:

  1. Switch to Gradual Decay and note that the elbow is less clear
  2. Find what k achieves 90% variance for each dataset
  3. Identify which dataset is hardest to choose k for
  4. Predict reconstruction quality at different k values

Assessment Questions

  1. Why might different selection methods suggest different k values?
  2. When would you choose a higher k than the elbow suggests?
  3. What does it mean if eigenvalues are nearly uniform?
  4. How does the variance threshold relate to reconstruction error?

Applications

  • Image Processing: Choose k for face recognition (eigenfaces)
  • Genomics: Select significant gene expression components
  • Finance: Identify market factors from many correlated assets
  • Signal Processing: Separate signal from noise components
  • Machine Learning: Feature selection for model training

References

  • Chapter 9: Dimensionality Reduction (Component Selection section)
  • Cattell, R. B. (1966). "The Scree Test for the Number of Factors." Multivariate Behavioral Research.
  • Kaiser, H. F. (1960). "The Application of Electronic Computers to Factor Analysis." Educational and Psychological Measurement.