Regularization Geometry Visualizer

Run the Regularization Geometry Visualizer Fullscreen

Edit the MicroSim with the p5.js editor

About This MicroSim

This visualization demonstrates the geometric interpretation of regularization in machine learning. By showing how constraint regions (L1 diamond or L2 circle) intersect with loss function contours, students can understand why L1 regularization produces sparse solutions while L2 produces smooth shrinkage.

Learning Objective: Understand how L1 and L2 regularization constrain weights geometrically and why L1 produces sparsity.

How to Use

  1. Drag the OLS Point: Click and drag the red OLS (Ordinary Least Squares) solution point to different locations
  2. Adjust Regularization Strength: Use the alpha slider to change the constraint region size
  3. Toggle L1/L2: Click the buttons to switch between Lasso (L1) and Ridge (L2) regularization
  4. Show Regularization Path: Enable to see how the solution moves as alpha changes
  5. Animate: Watch the regularized solution evolve as constraint strength increases

Key Concepts Demonstrated

L2 Regularization (Ridge)

\[\min_\theta \|y - X\theta\|_2^2 + \alpha \|\theta\|_2^2\]
  • Constraint region is a circle
  • Solution is found where loss contour is tangent to circle
  • All weights shrink proportionally toward zero
  • Weights become small but rarely exactly zero
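
This shrinkage behavior can be reproduced outside the MicroSim. Below is a minimal sketch using scikit-learn on a small synthetic dataset (the dataset and alpha values are illustrative assumptions, not part of the visualization):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data: 5 features, only 2 of them actually informative
X, y = make_regression(n_samples=100, n_features=5, n_informative=2,
                       noise=10.0, random_state=0)

for alpha in [0.1, 1.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    # Every coefficient shrinks toward zero as alpha grows,
    # but none of them becomes exactly zero
    print(alpha, np.round(ridge.coef_, 3))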

L1 Regularization (Lasso)

\[\min_\theta \|y - X\theta\|_2^2 + \alpha \|\theta\|_1\]
  • Constraint region is a diamond (rotated square)
  • Solution often hits corners of the diamond
  • Corners lie on the axes where one or more weights equal zero
  • This produces sparse solutions with exact zeros
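
For comparison, the same sketch with Lasso (same assumed synthetic data) shows coefficients being driven to exactly zero as alpha grows:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=5, n_informative=2,
                       noise=10.0, random_state=0)

for alpha in [0.1, 1.0, 10.0]:
    lasso = Lasso(alpha=alpha).fit(X, y)
    # Coefficients on the uninformative features hit exactly 0.0 (sparsity)
    print(alpha, np.round(lasso.coef_, 3))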

Why L1 Produces Sparsity

The key insight is geometric:

  • Elliptical loss contours typically first touch the diamond constraint at a corner
  • Corners of the diamond correspond to sparse solutions (weights on axes)
  • For L2, the smooth circle has no corners, so solutions rarely hit the axes exactly
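
The same contrast can be seen algebraically in the special case of an orthonormal design (\(X^\top X = I\)), where both penalized problems have closed-form solutions: L2 rescales every OLS weight by the same factor, while L1 applies soft-thresholding and therefore maps small weights to exactly zero. A minimal numpy sketch of that special case (the example weights are made up):

import numpy as np

def ridge_orthonormal(theta_ols, alpha):
    # L2, orthonormal X: every weight is shrunk by the same factor,
    # so nonzero weights stay nonzero
    return theta_ols / (1.0 + alpha)

def lasso_orthonormal(theta_ols, alpha):
    # L1, orthonormal X: soft-thresholding -- weights smaller than
    # alpha/2 in magnitude become exactly zero
    return np.sign(theta_ols) * np.maximum(np.abs(theta_ols) - alpha / 2.0, 0.0)

theta_ols = np.array([2.0, 0.3])          # hypothetical OLS solution (w1, w2)
print(ridge_orthonormal(theta_ols, 1.0))  # [1.0, 0.15] -> both shrink
print(lasso_orthonormal(theta_ols, 1.0))  # [1.5, 0.0 ] -> w2 is exactly zero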

Mathematical Interpretation

Constrained vs Penalized Form

The visualization shows the constrained form:

  • L2: \(\|\theta\|_2 \leq t\) (inside circle)
  • L1: \(\|\theta\|_1 \leq t\) (inside diamond)

This is equivalent to the penalized form used in practice: each constraint radius \(t\) corresponds to a penalty weight \(\alpha\) that controls the same tradeoff (larger \(\alpha\) corresponds to a smaller \(t\), i.e. a tighter constraint).
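
For L1, for example, the two formulations read (the penalized problem being the Lagrangian form of the constrained one):

\[\min_\theta \|y - X\theta\|_2^2 \ \text{ s.t. } \ \|\theta\|_1 \leq t \quad\Longleftrightarrow\quad \min_\theta \|y - X\theta\|_2^2 + \alpha \|\theta\|_1\]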

The Regularization Path

As \(\alpha\) increases (constraint tightens):

  • L2: Solution moves smoothly toward origin along a curved path
  • L1: Solution moves toward axes, with weights becoming exactly zero
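
scikit-learn can compute the L1 path directly; a minimal sketch (the synthetic data and alpha grid are illustrative assumptions) counting how many weights are exactly zero at each alpha:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

X, y = make_regression(n_samples=100, n_features=5, n_informative=2,
                       noise=10.0, random_state=0)

# coefs has shape (n_features, n_alphas): one column of weights per alpha
alphas, coefs, _ = lasso_path(X, y, alphas=np.logspace(1, -2, 20))

for alpha, weights in zip(alphas, coefs.T):
    print(f"alpha={alpha:8.3f}  exact zeros={np.sum(weights == 0.0)}")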

Embedding This MicroSim

<iframe src="https://dmccreary.github.io/linear-algebra/sims/regularization-geometry/main.html"
        height="532px"
        width="100%"
        scrolling="no">
</iframe>

Lesson Plan

Grade Level

Undergraduate machine learning or statistics course

Duration

20-25 minutes

Prerequisites

  • Understanding of least squares regression
  • Familiarity with norms (L1 and L2)
  • Basic optimization concepts

Learning Activities

  1. Exploration (5 min):
     • Start with L2 regularization
     • Drag the OLS solution to various positions
     • Observe how the regularized solution moves along the circle boundary
     • Note that both weights shrink but neither becomes exactly zero

  2. L1 Discovery (5 min):
     • Switch to L1 regularization
     • Observe that the constraint region is now a diamond
     • Drag the OLS solution and watch the regularized solution
     • Notice when the solution "snaps" to an axis (sparse solution)

  3. Comparative Analysis (5 min):
     • Enable "Show Regularization Path"
     • Click "Animate" to watch the full path
     • Compare paths for L1 vs L2
     • Identify when L1 produces exact zeros

  4. Critical Thinking (5 min):
     • Why do the corners of the diamond lie on the axes?
     • What OLS positions produce sparse solutions most easily?
     • How does ellipse orientation affect the result?

  5. Application Discussion (5 min):
     • When would you prefer L1 over L2?
     • Feature selection implications
     • Real-world examples (gene selection, image compression)

Discussion Questions

  1. Why is the L1 constraint region diamond-shaped?
  2. For what OLS solution positions will L1 definitely produce sparsity?
  3. How does the regularization path differ between L1 and L2?
  4. Why might L2 be preferred when all features are believed to be relevant?
  5. What happens when the OLS solution is already close to the origin?

Assessment Ideas

  • Predict whether L1 or L2 will produce a sparse solution for a given OLS position
  • Explain geometrically why L1 promotes sparsity
  • Calculate the regularized solution for simple cases
  • Compare elastic net (combination of L1 and L2) behavior

Connections to Machine Learning

Ridge Regression (L2)

  • Used when multicollinearity is present
  • Keeps all features with reduced magnitudes
  • Closed-form solution exists
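
The closed form is \(\hat{\theta} = (X^\top X + \alpha I)^{-1} X^\top y\). A small numpy sketch (the synthetic data is an assumption for the demo):

import numpy as np

def ridge_closed_form(X, y, alpha):
    # Solves min ||y - X theta||^2 + alpha ||theta||^2 via the
    # regularized normal equations (X^T X + alpha I) theta = X^T y
    A = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
print(ridge_closed_form(X, y, alpha=1.0))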

Lasso Regression (L1)

  • Automatic feature selection
  • Produces interpretable sparse models
  • No closed-form solution (requires iterative methods)
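
Solvers typically use coordinate descent, cycling through the weights and soft-thresholding each one. A hedged sketch of that iteration (fixed iteration count, no convergence check; production solvers such as scikit-learn's add those details):

import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iters=200):
    # Minimizes ||y - X theta||^2 + alpha ||theta||_1 by updating
    # one coordinate at a time with a soft-threshold step
    n_features = X.shape[1]
    theta = np.zeros(n_features)
    col_norms_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_iters):
        for j in range(n_features):
            # Residual with coordinate j removed from the current fit
            r_j = y - X @ theta + X[:, j] * theta[j]
            rho = X[:, j] @ r_j
            theta[j] = soft_threshold(rho, alpha / 2.0) / col_norms_sq[j]
    return theta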

Elastic Net

  • Combines L1 and L2 penalties
  • Benefits of both sparsity and grouped selection
  • Useful when features are correlated
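
A minimal scikit-learn sketch (same assumed synthetic data as above); l1_ratio mixes the two penalties, with 1.0 being pure Lasso and values near 0 approaching Ridge:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=100, n_features=5, n_informative=2,
                       noise=10.0, random_state=0)

for l1_ratio in [0.1, 0.5, 1.0]:
    model = ElasticNet(alpha=1.0, l1_ratio=l1_ratio).fit(X, y)
    # More L1 in the mix -> more coefficients at exactly zero
    print(l1_ratio, np.round(model.coef_, 3))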
