Linear Regression Interactive Visualizer

Run the Linear Regression Visualizer Fullscreen

Edit the MicroSim in the p5.js Editor

About This MicroSim

This visualization demonstrates linear regression as an optimization problem where we find the line that best fits the data by minimizing the sum of squared errors (residuals).

Key Features

  • Draggable Data Points: Click and drag any point to see how it affects the regression line
  • Manual Parameter Control: Adjust the slope (w) and intercept (b) sliders to explore the loss landscape
  • Fit OLS Button: Instantly compute the optimal parameters using Ordinary Least Squares
  • Residual Visualization: See the vertical distances from each point to the fitted line
  • Loss Surface Heatmap: Visualize the Mean Squared Error as a function of w and b
  • Real-time Statistics: View current and optimal parameters, loss values, and R-squared
  • Add Noise: Introduce random noise to see how it affects the fit

The Linear Regression Problem

Given data points \((x_i, y_i)\) for \(i = 1, \ldots, n\), we want to find the line:

\[\hat{y} = wx + b\]

that minimizes the sum of squared residuals:

\[L(w, b) = \sum_{i=1}^{n} (y_i - (wx_i + b))^2\]
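
For concreteness, here is a minimal Python sketch (using made-up toy data rather than the MicroSim's actual points, which it generates at runtime) that evaluates this loss for candidate values of w and b:

```python
import numpy as np

# Hypothetical toy data standing in for the draggable points in the MicroSim.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def squared_error_loss(w, b, x, y):
    """Sum of squared residuals: L(w, b) = sum_i (y_i - (w*x_i + b))^2."""
    residuals = y - (w * x + b)
    return np.sum(residuals ** 2)

print(squared_error_loss(2.0, 0.0, x, y))  # a good candidate line: small loss
print(squared_error_loss(1.0, 1.0, x, y))  # a worse candidate line: much larger loss
```

Dividing this loss by n gives the Mean Squared Error shown on the heatmap.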

The Normal Equations

The optimal parameters satisfy the normal equations:

\[(X^T X)\beta = X^T y\]

where:

\[X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad \beta = \begin{bmatrix} b \\ w \end{bmatrix}\]
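
A short sketch, reusing the toy data above, that builds X and solves the normal equations directly with NumPy (in practice a least-squares solver such as np.linalg.lstsq is preferred for numerical stability):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with a column of ones for the intercept: rows are [1, x_i].
X = np.column_stack([np.ones_like(x), x])

# Solve (X^T X) beta = X^T y for beta = [b, w].
beta = np.linalg.solve(X.T @ X, X.T @ y)
b, w = beta
print(f"w = {w:.4f}, b = {b:.4f}")
```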

Closed-Form Solution

The optimal slope and intercept are:

\[w^* = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}\]
\[b^* = \bar{y} - w^*\bar{x}\]
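
Running the same toy data through these summation formulas gives an identical answer to the normal-equation solution; a minimal sketch:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Closed-form OLS slope and intercept from the summation formulas above.
w_star = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
b_star = np.mean(y) - w_star * np.mean(x)
print(f"w* = {w_star:.4f}, b* = {b_star:.4f}")  # matches the normal-equation solution
```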

How to Use

Exploring the Loss Surface

  1. Move the sliders for w (slope) and b (intercept)
  2. Watch the red dot move on the loss surface heatmap
  3. Notice how the loss increases as you move away from the optimal point (green dot)
  4. The bowl shape of the loss surface shows this is a convex optimization problem

Understanding Residuals

  1. Enable the Residuals checkbox to see the vertical error lines
  2. Each residual shows the distance from a data point to the fitted line
  3. The small squares visualize the squared error being minimized
  4. Notice that residuals are vertical (not perpendicular to the line)

Interactive Exploration

  1. Drag data points to change the dataset
  2. Watch the optimal line (green dashed) update instantly
  3. Use Fit OLS to snap to the optimal solution
  4. Compare your manual fit to the computed optimal

Visual Elements

| Element | Color | Meaning |
|---|---|---|
| Data points | Blue | Observed data \((x_i, y_i)\) |
| Fitted line | Red | Current line \(\hat{y} = wx + b\) |
| Optimal line | Green dashed | OLS solution |
| Residuals | Green dashed | Vertical errors \(y_i - \hat{y}_i\) |
| Loss surface | Blue-White-Red | MSE as a function of \((w, b)\) |
| Current position | Red dot | Current \((w, b)\) on the loss surface |
| Optimal position | Green dot | Optimal \((w^*, b^*)\) |

Learning Objectives

After using this MicroSim, students will be able to:

  • Explain why we minimize squared errors (not absolute errors)
  • Interpret the loss surface as a function of model parameters
  • Understand why the OLS solution is at the minimum of the loss surface
  • Derive and apply the normal equations
  • Calculate and interpret the R-squared goodness-of-fit measure
  • Recognize that vertical residuals differ from perpendicular distances

Key Insights

Why Squared Errors?

  1. Differentiable: Allows calculus-based optimization
  2. Penalizes large errors: Outliers have a significant impact on the fit (see the sketch after this list)
  3. Unique minimum: Convex loss surface guarantees global optimum
  4. Statistical properties: OLS is the maximum likelihood estimate under a Gaussian noise assumption
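
To make point 2 concrete, here is a small sketch (same toy data as above) that refits OLS after turning one point into an outlier; because the loss squares each residual, a single extreme point pulls the fit noticeably:

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form OLS slope and intercept (same formulas as above)."""
    n = len(x)
    w = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
    return w, np.mean(y) - w * np.mean(x)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(ols_fit(x, y))           # clean data: slope near 2

y_outlier = y.copy()
y_outlier[-1] = 30.0           # drag one point far off the trend
print(ols_fit(x, y_outlier))   # slope jumps to about 6
```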

Why Vertical Residuals?

Linear regression minimizes errors in y (the dependent variable), not perpendicular distance to the line. This makes sense when:

  • x is known precisely (independent variable)
  • y has measurement error (dependent variable)

For errors in both variables, use Total Least Squares instead.
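
For contrast, here is a hedged sketch of one common Total Least Squares approach for a 2-D line fit: pass the line through the centroid along the first principal direction of the centered data, computed with an SVD (toy data reused from above).

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Center the data and take the direction of greatest variance (first right
# singular vector); the TLS line passes through the centroid along it.
centered = np.column_stack([x - x.mean(), y - y.mean()])
_, _, vt = np.linalg.svd(centered, full_matrices=False)
direction = vt[0]
w_tls = direction[1] / direction[0]   # breaks down if the TLS line is vertical
b_tls = y.mean() - w_tls * x.mean()
print(f"TLS: w = {w_tls:.4f}, b = {b_tls:.4f}")  # close to, but not equal to, the OLS fit
```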

The R-Squared Statistic

\[R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}\]

  • \(R^2 = 1\): Perfect fit (all points on line)
  • \(R^2 = 0\): Model no better than predicting the mean
  • \(R^2 < 0\): Model worse than predicting the mean (possible when you set w and b by hand; the OLS fit itself never drops below zero)
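
A minimal sketch computing \(R^2\) for an arbitrary line on the toy data; note that a badly chosen manual line really does produce a negative value:

```python
import numpy as np

def r_squared(x, y, w, b):
    """R^2 = 1 - SS_res / SS_tot for the line y_hat = w*x + b."""
    y_hat = w * x + b
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(r_squared(x, y, 1.96, 0.14))   # near-optimal line: R^2 close to 1
print(r_squared(x, y, -1.0, 10.0))   # line worse than the mean: R^2 is negative
```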

Lesson Plan

Introduction (5 minutes)

Pose the question: "Given scattered data, how do we find the 'best' line through it?"

Exploration Phase (10 minutes)

  1. Let students drag sliders to find their best fit manually
  2. Discuss what makes a line "good" or "bad"
  3. Reveal the loss surface: where does your solution sit?
  4. Click "Fit OLS" to see the optimal solution

Mathematical Foundation (10 minutes)

  1. Introduce the loss function \(L(w, b)\)
  2. Show why the minimum is where partial derivatives equal zero
  3. Derive the normal equations
  4. Connect to matrix form \(X^T X \beta = X^T y\)

Interactive Experiments (10 minutes)

  1. Drag an outlier: How much does one point affect the fit?
  2. Add noise: How does noise affect R-squared?
  3. Change slope: What happens to the loss surface?

Discussion Questions

  1. Why do we square the residuals instead of using absolute values?
  2. What would happen if we minimized perpendicular distance instead?
  3. How would you extend this to multiple independent variables?
  4. When might linear regression be inappropriate?

Connections to Linear Algebra

Linear regression connects to several key concepts:

  • Least Squares: Finding the best approximation when Ax = b has no solution
  • Projection: The fitted values \(\hat{y} = X\beta\) are the projection of y onto Col(X)
  • Normal Equations: \(X^T X \beta = X^T y\) ensures the residual is orthogonal to Col(X)
  • Pseudoinverse: \(\beta = (X^T X)^{-1} X^T y = X^+ y\)
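
A quick numerical check of these identities on the toy data (a sketch, assuming X has full column rank so the pseudoinverse equals \((X^T X)^{-1} X^T\)):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X = np.column_stack([np.ones_like(x), x])

beta = np.linalg.pinv(X) @ y    # beta = X^+ y
y_hat = X @ beta                # projection of y onto Col(X)
residual = y - y_hat

print(beta)                     # [b, w], same as the normal-equation solution
print(X.T @ residual)           # approximately [0, 0]: residual orthogonal to Col(X)
```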
