Regularization Geometry Visualizer
Run the Regularization Geometry Visualizer Fullscreen
Edit the MicroSim with the p5.js editor
About This MicroSim
This visualization demonstrates the geometric interpretation of regularization in machine learning. It shows how constraint regions (the L1 diamond or the L2 circle) intersect with contours of the loss function, helping students understand why L1 regularization produces sparse solutions while L2 produces smooth shrinkage.
Learning Objective: Understand how L1 and L2 regularization constrain weights geometrically and why L1 produces sparsity.
How to Use
- Drag the OLS Point: Click and drag the red OLS (Ordinary Least Squares) solution point to different locations
- Adjust Regularization Strength: Use the alpha slider to change the constraint region size
- Toggle L1/L2: Click the buttons to switch between Lasso (L1) and Ridge (L2) regularization
- Show Regularization Path: Enable to see how the solution moves as alpha changes
- Animate: Watch the regularized solution evolve as constraint strength increases
Key Concepts Demonstrated
L2 Regularization (Ridge)
- Constraint region is a circle
- Solution is found where loss contour is tangent to circle
- All weights shrink toward zero (by a common factor when the loss contours are circular)
- Weights become small but rarely exactly zero
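As a minimal numeric sketch of this shrinkage (assuming circular loss contours, i.e. an orthonormal design, so the ridge solution reduces to scaling the OLS weights by \(1/(1+\alpha)\); the OLS values below are made up for illustration):

```python
import numpy as np

theta_ols = np.array([3.0, 0.5])  # hypothetical OLS solution for two weights

for alpha in [0.0, 0.5, 1.0, 4.0]:
    theta_ridge = theta_ols / (1.0 + alpha)  # uniform shrinkage under circular contours
    print(f"alpha={alpha}: ridge={theta_ridge}")  # both weights shrink; neither is exactly zero
```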
L1 Regularization (Lasso)
- Constraint region is a diamond (rotated square)
- Solution often hits corners of the diamond
- Corners lie on the axes where one or more weights equal zero
- This produces sparse solutions with exact zeros
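Under the same circular-contour assumption, the lasso solution is the soft-thresholding of the OLS weights, which is exactly where the zeros come from; a minimal sketch using the same hypothetical OLS point as above:

```python
import numpy as np

def soft_threshold(theta, alpha):
    """Shrink each weight toward zero by alpha and clip to exactly zero."""
    return np.sign(theta) * np.maximum(np.abs(theta) - alpha, 0.0)

theta_ols = np.array([3.0, 0.5])  # hypothetical OLS solution

for alpha in [0.0, 0.5, 1.0, 4.0]:
    print(f"alpha={alpha}: lasso={soft_threshold(theta_ols, alpha)}")
# The smaller weight becomes exactly zero once alpha reaches 0.5 -- the corner of the diamond.
```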
Why L1 Produces Sparsity
The key insight is geometric:
- Elliptical loss contours typically first touch the diamond constraint at a corner
- Corners of the diamond correspond to sparse solutions (weights on axes)
- For L2, the smooth circle has no corners, so solutions rarely hit the axes exactly
Mathematical Interpretation
Constrained vs Penalized Form
The visualization shows the constrained form:
- L2: \(\|\theta\|_2 \leq t\) (inside circle)
- L1: \(\|\theta\|_1 \leq t\) (inside diamond)
This is equivalent to the penalized form used in practice, where \(\alpha\) controls the tradeoff.
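Concretely, the penalized forms add the norm to the loss, and each constraint radius \(t\) corresponds to some value of \(\alpha\):

\[
\hat{\theta}_{\text{ridge}} = \arg\min_{\theta}\; L(\theta) + \alpha\,\|\theta\|_2^2,
\qquad
\hat{\theta}_{\text{lasso}} = \arg\min_{\theta}\; L(\theta) + \alpha\,\|\theta\|_1
\]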
The Regularization Path
As \(\alpha\) increases (constraint tightens):
- L2: Solution moves smoothly toward origin along a curved path
- L1: Solution moves toward axes, with weights becoming exactly zero
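A sketch of this contrast with scikit-learn (the data and the alpha grid below are synthetic and arbitrary, chosen only to make the paths visible):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, 0.5]) + rng.normal(scale=0.1, size=100)  # true weights (3.0, 0.5)

for alpha in [0.01, 0.1, 0.5, 1.0, 2.0]:
    lasso = Lasso(alpha=alpha).fit(X, y)
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha}: lasso={lasso.coef_.round(3)}, ridge={ridge.coef_.round(3)}")
# The lasso coefficient on the weaker feature hits exactly 0 at moderate alpha,
# while the ridge coefficients only shrink gradually and never reach exactly zero.
```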
Embedding This MicroSim
This MicroSim can be embedded in another web page with an iframe that points to the fullscreen version linked above.
Lesson Plan
Grade Level
Undergraduate machine learning or statistics course
Duration
20-25 minutes
Prerequisites
- Understanding of least squares regression
- Familiarity with norms (L1 and L2)
- Basic optimization concepts
Learning Activities
- Exploration (5 min):
    - Start with L2 regularization
    - Drag the OLS solution to various positions
    - Observe how the regularized solution moves along the circle boundary
    - Note that both weights shrink but neither becomes exactly zero
- L1 Discovery (5 min):
    - Switch to L1 regularization
    - Observe that the constraint region is now a diamond
    - Drag the OLS solution and watch the regularized solution
    - Notice when the solution "snaps" to an axis (sparse solution)
- Comparative Analysis (5 min):
    - Enable "Show Regularization Path"
    - Click "Animate" to watch the full path
    - Compare paths for L1 vs L2
    - Identify when L1 produces exact zeros
- Critical Thinking (5 min):
    - Why do the corners of the diamond lie on the axes?
    - What OLS positions produce sparse solutions most easily?
    - How does ellipse orientation affect the result?
- Application Discussion (5 min):
    - When would you prefer L1 over L2?
    - Feature selection implications
    - Real-world examples (gene selection, image compression)
Discussion Questions
- Why is the L1 constraint region diamond-shaped?
- For what OLS solution positions will L1 definitely produce sparsity?
- How does the regularization path differ between L1 and L2?
- Why might L2 be preferred when all features are believed to be relevant?
- What happens when the OLS solution is already close to the origin?
Assessment Ideas
- Predict whether L1 or L2 will produce a sparse solution for a given OLS position
- Explain geometrically why L1 promotes sparsity
- Calculate the regularized solution for simple cases
- Compare elastic net (combination of L1 and L2) behavior
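For the "simple cases" item, one worked example under the circular-contour assumption used above: starting from \(\theta_{\text{OLS}} = (3, 0.5)\) with \(\alpha = 1\),

\[
\theta_{\text{ridge}} = \frac{\theta_{\text{OLS}}}{1+\alpha} = (1.5,\ 0.25),
\qquad
\theta_{\text{lasso}} = \operatorname{sign}(\theta_{\text{OLS}})\,\max\!\big(|\theta_{\text{OLS}}| - \alpha,\ 0\big) = (2,\ 0).
\]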
Connections to Machine Learning
Ridge Regression (L2)
- Used when multicollinearity is present
- Keeps all features with reduced magnitudes
- Closed-form solution exists
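A minimal sketch of that closed form, \(\hat{\theta} = (X^\top X + \alpha I)^{-1} X^\top y\), on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.0]) + rng.normal(scale=0.1, size=100)

alpha = 1.0
# Ridge closed form: solve (X^T X + alpha*I) theta = X^T y
theta_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
print(theta_ridge)  # all coefficients are shrunk, but none is exactly zero
```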
Lasso Regression (L1)
- Automatic feature selection
- Produces interpretable sparse models
- No closed-form solution (requires iterative methods)
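One such iterative method is cyclic coordinate descent, which repeatedly applies the soft-thresholding operator one coefficient at a time; a minimal illustrative sketch for the objective \(\tfrac{1}{2}\|y - X\theta\|_2^2 + \alpha\|\theta\|_1\) (not a tuned solver):

```python
import numpy as np

def soft_threshold(z, alpha):
    return np.sign(z) * np.maximum(np.abs(z) - alpha, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Cyclic coordinate descent for 0.5*||y - X@theta||^2 + alpha*||theta||_1."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        for j in range(X.shape[1]):
            # Residual with feature j's current contribution added back in
            residual_j = y - X @ theta + X[:, j] * theta[j]
            # Coordinate-wise minimizer: soft-threshold the correlation, then rescale
            theta[j] = soft_threshold(X[:, j] @ residual_j, alpha) / (X[:, j] @ X[:, j])
    return theta
```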
Elastic Net
- Combines L1 and L2 penalties
- Benefits of both sparsity and grouped selection
- Useful when features are correlated
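In scikit-learn this mix is controlled by ElasticNet's l1_ratio parameter (1.0 is a pure L1 penalty, 0.0 pure L2); a small usage sketch on synthetic data with two nearly identical features:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
X[:, 3] = X[:, 2] + 0.01 * rng.normal(size=100)   # make features 2 and 3 highly correlated
y = 2.0 * X[:, 0] + X[:, 2] + rng.normal(scale=0.1, size=100)

model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(model.coef_)  # the correlated pair tends to be kept (and shrunk) together
```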
References
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer. Chapter 3.
- Visual explanation of regularization - Terence Parr
- Why L1 norm for sparse models - Cross Validated
- Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.