Newton vs Gradient Descent Comparison
Edit the MicroSim with the p5.js editor
About This MicroSim
This visualization compares the convergence behavior of Gradient Descent (blue) and Newton's Method (orange) on quadratic functions with varying condition numbers. The condition number, the ratio of the Hessian's largest eigenvalue to its smallest, controls how "elongated" the contour ellipses are.
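As a concrete example (the MicroSim's exact quadratic may be parameterized differently), a condition number \(\kappa\) can be produced by the family

\[f(x, y) = \tfrac{1}{2}\left(x^2 + \kappa\, y^2\right), \qquad \mathbf{H} = \begin{pmatrix} 1 & 0 \\ 0 & \kappa \end{pmatrix}, \qquad \kappa = \frac{\lambda_{\max}}{\lambda_{\min}}.\]

With \(\kappa = 1\) the contours are circles; with \(\kappa = 100\) each contour is an ellipse about ten times longer along one axis than the other, since contour axis lengths scale as \(1/\sqrt{\lambda}\).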
How to Use
- Adjust Condition Number: Use the slider to change from 1 (circular contours) to 100 (highly elongated)
- Adjust Learning Rate: Control the step size for gradient descent
- Step/Run: Execute optimization steps manually or automatically
- Click: Click anywhere on the plot to set a new starting point
- Reset: Return to initial state
Key Observations
- Newton's Method converges in exactly one step for quadratic functions, regardless of condition number
- Gradient Descent shows its characteristic zig-zag oscillation on ill-conditioned problems (see the sketch after this list)
- Higher condition numbers force smaller learning rates to keep gradient descent stable
- Newton's Method is condition-number invariant: its update is unchanged by any invertible linear change of variables
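A minimal numerical sketch of these observations, assuming a diagonal quadratic with eigenvalues 1 and \(\kappa\) (the MicroSim itself is written in p5.js; the names and constants below are illustrative only):

```python
import numpy as np

# Compare gradient descent and a single Newton step on
# f(x) = 0.5 * x^T H x with H = diag(1, kappa).
kappa = 50.0
H = np.diag([1.0, kappa])
x0 = np.array([8.0, 8.0])

def grad(x):
    return H @ x  # gradient of the quadratic is H x

# Gradient descent: stable only for learning rates below 2 / lambda_max = 2 / kappa
lr = 1.9 / kappa
x = x0.copy()
gd_steps = 0
while np.linalg.norm(grad(x)) > 1e-6 and gd_steps < 10_000:
    x = x - lr * grad(x)
    gd_steps += 1

# Newton's method: one full step lands exactly on the minimizer of a quadratic
x_newton = x0 - np.linalg.solve(H, grad(x0))

print(f"gradient descent took {gd_steps} iterations")
print(f"one Newton step lands at {x_newton} (the exact minimum is the origin)")
```

Raising `kappa` forces `lr` down (otherwise the iteration diverges along the steep axis) and increases the gradient-descent iteration count, while the Newton step is exact every time.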
Why Newton Converges Faster
Newton's method uses the inverse of the Hessian, evaluated at the current iterate, to rescale the gradient:
\[\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{H}^{-1} \nabla f(\mathbf{x}_k)\]
This effectively transforms the problem into one with circular contours, eliminating the zig-zag behavior of gradient descent on elongated landscapes.
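For a quadratic objective \(f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\top \mathbf{H}\mathbf{x} + \mathbf{b}^\top \mathbf{x}\), the one-step convergence seen in the MicroSim can be verified directly:

\[\nabla f(\mathbf{x}) = \mathbf{H}\mathbf{x} + \mathbf{b}, \qquad \mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{H}^{-1}(\mathbf{H}\mathbf{x}_k + \mathbf{b}) = -\mathbf{H}^{-1}\mathbf{b} = \mathbf{x}^*,\]

so a single full Newton step lands exactly on the minimizer \(\mathbf{x}^*\), no matter how ill-conditioned \(\mathbf{H}\) is.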
Lesson Plan
Learning Objectives
- Compare convergence rates of first and second-order methods
- Understand the impact of condition number on optimization
- Visualize why Newton's method is condition-number invariant
Suggested Activities
- Low Condition Number: Set condition number to 1 and observe both methods
- High Condition Number: Increase to 50-100 and watch gradient descent struggle
- Learning Rate Tuning: Find the largest stable learning rate for each condition number, and compare it with the bound given after this list
- Iteration Counting: Compare how many iterations each method needs
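For the diagonal quadratic sketched above (an assumption about the MicroSim's exact objective), the classical stability result for gradient descent on quadratics gives a target for the learning-rate activity:

\[0 < \eta < \frac{2}{\lambda_{\max}} = \frac{2}{\kappa}, \qquad \eta_{\text{best}} = \frac{2}{\lambda_{\min} + \lambda_{\max}} = \frac{2}{1 + \kappa},\]

so the largest stable step found experimentally should shrink roughly in proportion to \(1/\kappa\).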
References
- Boyd & Vandenberghe, Convex Optimization, Chapter 9
- Wikipedia: Newton's Method in Optimization