# Momentum Dynamics Visualizer
Run the Momentum Visualizer Fullscreen
Edit the MicroSim with the p5.js editor
## About This MicroSim
This visualization compares three optimization methods on an ill-conditioned quadratic function:
- SGD (blue): Standard gradient descent with characteristic zig-zag oscillation
- Momentum (green): Accumulates velocity to smooth the path
- Nesterov (orange): "Looks ahead" before computing the gradient
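A minimal sketch of such an ill-conditioned objective, assuming a simple axis-aligned quadratic (the MicroSim's actual coefficients are not specified here, so the curvature ratio below is illustrative):

```javascript
// Illustrative ill-conditioned quadratic: f(x, y) = 0.5*(x^2 + 25*y^2).
// The ratio of the two curvatures (25 here) is the condition number;
// the larger it is, the more elongated the contours.
const A = 1, B = 25;
function f(x, y)    { return 0.5 * (A * x * x + B * y * y); }
function grad(x, y) { return [A * x, B * y]; }  // steep along y, shallow along x
```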
## How to Use
- Adjust Momentum (β): Control velocity decay (0 = no momentum, 0.99 = high momentum)
- Adjust Learning Rate: Control step size
- Step/Run: Execute optimization steps manually or automatically
- Velocity Vectors: Toggle arrows showing accumulated velocity direction and magnitude
- Click: Click anywhere on the plot to set a new starting point
## Key Observations
- SGD shows characteristic zig-zag pattern on elongated contours
- Momentum builds up speed in consistent directions, reducing oscillation
- Nesterov typically converges slightly faster than classical momentum
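The zig-zag in the first observation can be reproduced with a one-dimensional sketch: along the steep eigen-direction with curvature λ, an SGD step multiplies the coordinate by (1 − αλ), which is negative whenever αλ > 1, so the iterate flips sign every step (the values below are illustrative):

```javascript
// Why SGD zig-zags on elongated contours: along the steep axis the
// coordinate is scaled by (1 - alpha*lambda) each step; here that
// factor is 1 - 0.07*25 = -0.75, so the sign alternates.
const alpha = 0.07, lambda = 25;   // illustrative step size and curvature
let y = 1;
const signs = [];
for (let k = 0; k < 4; k++) {
  y *= (1 - alpha * lambda);       // one SGD step along this eigen-direction
  signs.push(Math.sign(y));
}
console.log(signs);                // alternates: [-1, 1, -1, 1]
```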
## The Momentum Update
Classical momentum maintains a velocity vector:
$$\mathbf{v}_{k+1} = \beta \mathbf{v}_k + \nabla f(\mathbf{x}_k)$$

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha \mathbf{v}_{k+1}$$
The velocity accumulates in directions where the gradient points consistently, while components that alternate sign largely cancel, damping the oscillation.
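The two update equations translate directly into code; this is a generic sketch (the function and parameter names are illustrative, not the MicroSim's internals):

```javascript
// One classical momentum step, matching the update above:
//   v_{k+1} = beta * v_k + grad f(x_k)
//   x_{k+1} = x_k - alpha * v_{k+1}
// gradF is any function mapping a point (array) to its gradient (array).
function momentumStep(x, v, gradF, alpha, beta) {
  const g = gradF(x);
  const vNext = v.map((vi, i) => beta * vi + g[i]);
  const xNext = x.map((xi, i) => xi - alpha * vNext[i]);
  return [xNext, vNext];
}
```

With β = 0 the velocity is just the current gradient, so the step reduces to plain SGD, which is why the β = 0 slider setting in the MicroSim shows pure gradient descent.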
## Lesson Plan

### Learning Objectives
- Understand how momentum accumulates and dampens oscillations
- Compare SGD, Momentum, and Nesterov acceleration
- Visualize velocity vectors during optimization
### Suggested Activities
- No Momentum: Set β=0 and observe pure SGD behavior
- High Momentum: Set β=0.95 and watch smooth acceleration
- Compare Methods: Run all three simultaneously and count iterations to convergence
- Velocity Vectors: Enable velocity display to see how momentum builds
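The "Compare Methods" activity can also be approximated offline. This sketch counts iterations until the gradient norm falls below a tolerance; the quadratic coefficients, starting point, and hyperparameters are illustrative, not the MicroSim's values:

```javascript
// Run one optimizer on an ill-conditioned quadratic and count iterations
// until convergence (gradient norm below tol).
function runMethod(update, alpha, beta, tol = 1e-6, maxIter = 10000) {
  const gradF = ([x, y]) => [x, 25 * y];  // grad of 0.5*(x^2 + 25*y^2)
  let x = [8, 2], v = [0, 0];
  for (let k = 1; k <= maxIter; k++) {
    const [xn, vn] = update(x, v, gradF, alpha, beta);
    x = xn; v = vn;
    const g = gradF(x);
    if (Math.hypot(g[0], g[1]) < tol) return k;
  }
  return maxIter;
}

// Plain SGD: step along the negative gradient; velocity is unused.
const sgd = (x, v, gradF, alpha) => {
  const g = gradF(x);
  return [x.map((xi, i) => xi - alpha * g[i]), v];
};

// Classical momentum, as in the update equations above.
const momentum = (x, v, gradF, alpha, beta) => {
  const g = gradF(x);
  const vn = v.map((vi, i) => beta * vi + g[i]);
  return [x.map((xi, i) => xi - alpha * vn[i]), vn];
};

// Nesterov: evaluate the gradient at the look-ahead point x - alpha*beta*v.
const nesterov = (x, v, gradF, alpha, beta) => {
  const g = gradF(x.map((xi, i) => xi - alpha * beta * v[i]));
  const vn = v.map((vi, i) => beta * vi + g[i]);
  return [x.map((xi, i) => xi - alpha * vn[i]), vn];
};

console.log("SGD:",      runMethod(sgd, 0.05, 0));
console.log("Momentum:", runMethod(momentum, 0.05, 0.9));
console.log("Nesterov:", runMethod(nesterov, 0.05, 0.9));
```

Iteration counts depend strongly on α and β, so treat the printed numbers as one sample point rather than a general ranking.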
## References
- Sutskever et al. (2013), *On the importance of initialization and momentum in deep learning*, ICML
- Wikipedia: Momentum (gradient descent)