# Momentum Dynamics Visualizer
Run the Momentum Visualizer Fullscreen
Edit the MicroSim with the p5.js editor
## About This MicroSim
This visualization compares three optimization methods on an ill-conditioned quadratic function:
- SGD (blue): Standard gradient descent with characteristic zig-zag oscillation
- Momentum (green): Accumulates velocity to smooth the path
- Nesterov (orange): "Looks ahead" before computing the gradient
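A minimal sketch of such an ill-conditioned objective, assuming a simple axis-aligned quadratic (the MicroSim's actual coefficients are not specified here, so the curvature ratio below is illustrative):

```javascript
// Illustrative ill-conditioned quadratic: f(x, y) = 0.5*(x^2 + 25*y^2).
// The ratio of the two curvatures (25 here) is the condition number;
// the larger it is, the more elongated the contours.
const A = 1, B = 25;
function f(x, y)    { return 0.5 * (A * x * x + B * y * y); }
function grad(x, y) { return [A * x, B * y]; }  // steep along y, shallow along x
```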
## How to Use
- Adjust Momentum (β): Control velocity decay (0 = no momentum, 0.99 = high momentum)
- Adjust Learning Rate: Control step size
- Step/Run: Execute optimization steps manually or automatically
- Velocity Vectors: Toggle arrows showing accumulated velocity direction and magnitude
- Click: Click anywhere on the plot to set a new starting point
## Key Observations
- SGD shows characteristic zig-zag pattern on elongated contours
- Momentum builds up speed in consistent directions, reducing oscillation
- Nesterov typically converges slightly faster than classical momentum
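The zig-zag in the first observation can be reproduced with a one-dimensional sketch: along the steep eigen-direction with curvature λ, an SGD step multiplies the coordinate by (1 − αλ), which is negative whenever αλ > 1, so the iterate flips sign every step (the values below are illustrative):

```javascript
// Why SGD zig-zags on elongated contours: along the steep axis the
// coordinate is scaled by (1 - alpha*lambda) each step; here that
// factor is 1 - 0.07*25 = -0.75, so the sign alternates.
const alpha = 0.07, lambda = 25;   // illustrative step size and curvature
let y = 1;
const signs = [];
for (let k = 0; k < 4; k++) {
  y *= (1 - alpha * lambda);       // one SGD step along this eigen-direction
  signs.push(Math.sign(y));
}
console.log(signs);                // alternates: [-1, 1, -1, 1]
```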
## The Momentum Update
Classical momentum maintains a velocity vector:
$$\mathbf{v}_{k+1} = \beta \mathbf{v}_k + \nabla f(\mathbf{x}_k)$$

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha \mathbf{v}_{k+1}$$
The velocity accumulates in directions where the gradient points consistently, while components that alternate sign largely cancel, damping the oscillation.
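The two update equations translate directly into code; this is a generic sketch (the function and parameter names are illustrative, not the MicroSim's internals):

```javascript
// One classical momentum step, matching the update above:
//   v_{k+1} = beta * v_k + grad f(x_k)
//   x_{k+1} = x_k - alpha * v_{k+1}
// gradF is any function mapping a point (array) to its gradient (array).
function momentumStep(x, v, gradF, alpha, beta) {
  const g = gradF(x);
  const vNext = v.map((vi, i) => beta * vi + g[i]);
  const xNext = x.map((xi, i) => xi - alpha * vNext[i]);
  return [xNext, vNext];
}
```

With β = 0 the velocity is just the current gradient, so the step reduces to plain SGD, which is why the β = 0 slider setting in the MicroSim shows pure gradient descent.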
## Lesson Plan

### Learning Objectives
- Understand how momentum accumulates and dampens oscillations
- Compare SGD, Momentum, and Nesterov acceleration
- Visualize velocity vectors during optimization
### Suggested Activities
- No Momentum: Set β=0 and observe pure SGD behavior
- High Momentum: Set β=0.95 and watch smooth acceleration
- Compare Methods: Run all three simultaneously and count iterations to convergence
- Velocity Vectors: Enable velocity display to see how momentum builds
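The "Compare Methods" activity can also be approximated offline. This sketch counts iterations until the gradient norm falls below a tolerance; the quadratic coefficients, starting point, and hyperparameters are illustrative, not the MicroSim's values:

```javascript
// Run one optimizer on an ill-conditioned quadratic and count iterations
// until convergence (gradient norm below tol).
function runMethod(update, alpha, beta, tol = 1e-6, maxIter = 10000) {
  const gradF = ([x, y]) => [x, 25 * y];  // grad of 0.5*(x^2 + 25*y^2)
  let x = [8, 2], v = [0, 0];
  for (let k = 1; k <= maxIter; k++) {
    const [xn, vn] = update(x, v, gradF, alpha, beta);
    x = xn; v = vn;
    const g = gradF(x);
    if (Math.hypot(g[0], g[1]) < tol) return k;
  }
  return maxIter;
}

// Plain SGD: step along the negative gradient; velocity is unused.
const sgd = (x, v, gradF, alpha) => {
  const g = gradF(x);
  return [x.map((xi, i) => xi - alpha * g[i]), v];
};

// Classical momentum, as in the update equations above.
const momentum = (x, v, gradF, alpha, beta) => {
  const g = gradF(x);
  const vn = v.map((vi, i) => beta * vi + g[i]);
  return [x.map((xi, i) => xi - alpha * vn[i]), vn];
};

// Nesterov: evaluate the gradient at the look-ahead point x - alpha*beta*v.
const nesterov = (x, v, gradF, alpha, beta) => {
  const g = gradF(x.map((xi, i) => xi - alpha * beta * v[i]));
  const vn = v.map((vi, i) => beta * vi + g[i]);
  return [x.map((xi, i) => xi - alpha * vn[i]), vn];
};

console.log("SGD:",      runMethod(sgd, 0.05, 0));
console.log("Momentum:", runMethod(momentum, 0.05, 0.9));
console.log("Nesterov:", runMethod(nesterov, 0.05, 0.9));
```

Iteration counts depend strongly on α and β, so treat the printed numbers as one sample point rather than a general ranking.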
## References
- Sutskever et al. (2013), *On the importance of initialization and momentum in deep learning*, ICML
- Wikipedia: Momentum (gradient descent)