# Activation Functions
## About This MicroSim

This visualization compares common neural network activation functions, showing both their shape and derivative behavior—crucial for understanding gradient flow during backpropagation.
## Activation Functions Included

| Function | Formula | Range | Key Property |
|---|---|---|---|
| ReLU | max(0, x) | [0, ∞) | Efficient, sparse |
| Sigmoid | 1/(1+e⁻ˣ) | (0, 1) | Probability output |
| Tanh | (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ) | (-1, 1) | Zero-centered |
| Leaky ReLU | max(0.1x, x) | (-∞, ∞) | No dead neurons |
| Softplus | log(1+eˣ) | (0, ∞) | Smooth ReLU |
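Below is a minimal sketch, not the MicroSim's actual source, of how the five functions and their derivatives in the table above could be computed in plain JavaScript. The object name `activations` and the 0.1 Leaky ReLU slope follow the table; everything else is illustrative.

```javascript
// Sketch of the five activations and their derivatives.
// Note: the derivative of softplus is exactly the sigmoid.
const activations = {
  relu:      { f: x => Math.max(0, x),            df: x => (x > 0 ? 1 : 0) },
  sigmoid:   { f: x => 1 / (1 + Math.exp(-x)),    df: x => { const s = 1 / (1 + Math.exp(-x)); return s * (1 - s); } },
  tanh:      { f: x => Math.tanh(x),              df: x => 1 - Math.tanh(x) ** 2 },
  leakyRelu: { f: x => Math.max(0.1 * x, x),      df: x => (x > 0 ? 1 : 0.1) },
  softplus:  { f: x => Math.log(1 + Math.exp(x)), df: x => 1 / (1 + Math.exp(-x)) }
};

// Print the values an info panel might show for a given input.
const xInput = -3;
for (const [name, { f, df }] of Object.entries(activations)) {
  console.log(`${name}: f(${xInput}) = ${f(xInput).toFixed(4)}, f'(${xInput}) = ${df(xInput).toFixed(4)}`);
}
```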
## Interactive Features

- Function Selector: Choose which activation to examine
- Show Derivative: Toggle to display f'(x) as a dashed line
- Compare All: Overlay all functions for comparison
- Input Slider: Trace along the curve to see exact values
- Info Panel: Shows f(x), f'(x), range, and gradient status
## Visual Indicators

- Yellow regions: Low-gradient areas (|f'(x)| < 0.1) where vanishing gradients can occur (see the sketch after this list)
- Solid line: The activation function f(x)
- Dashed line: The derivative f'(x)
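One way to see where the yellow low-gradient regions come from is to sample x values and flag those where |f'(x)| falls below the 0.1 threshold. The sketch below is an illustration, not the MicroSim's code; the function name `lowGradientXs` and the sampling range are assumptions, and sigmoid is used as the example.

```javascript
// Flag x values where the derivative magnitude is below a threshold.
function lowGradientXs(df, threshold = 0.1, xMin = -6, xMax = 6, step = 0.5) {
  const flagged = [];
  for (let x = xMin; x <= xMax; x += step) {
    if (Math.abs(df(x)) < threshold) flagged.push(x);
  }
  return flagged;
}

// Sigmoid's derivative s(1 - s) drops below 0.1 once |x| exceeds roughly 2.1.
const sigmoidDeriv = x => {
  const s = 1 / (1 + Math.exp(-x));
  return s * (1 - s);
};
console.log(lowGradientXs(sigmoidDeriv));  // e.g. [-6, -5.5, ..., -2.5, 2.5, ..., 6]
```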
## Lesson Plan

### Learning Objectives

Students will be able to:
- Describe the shape and range of common activation functions
- Explain why nonlinear activations are necessary (see the sketch after this list)
- Identify regions where gradients vanish
- Choose appropriate activations for different use cases
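To support the objective on why nonlinear activations are necessary, here is a minimal one-dimensional sketch with made-up weights showing that stacking linear layers without an activation collapses to a single linear layer.

```javascript
// Two linear layers with no activation between them.
const layer1 = x => 2 * x + 1;            // w1 = 2, b1 = 1 (made-up numbers)
const layer2 = h => -3 * h + 4;           // w2 = -3, b2 = 4
const stacked   = x => layer2(layer1(x)); // -3 * (2x + 1) + 4 = -6x + 1
const collapsed = x => -6 * x + 1;        // the equivalent single linear layer
console.log(stacked(2.5), collapsed(2.5));  // both print -14
// Inserting ReLU between the layers, layer2(Math.max(0, layer1(x))),
// breaks this equivalence: the composite map is no longer a straight line.
```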
### Suggested Activities

- Gradient Exploration: Move the slider to x = -3 for sigmoid. What happens to f'(x)? (A worked calculation follows this list.)
- Compare ReLU Family: Look at ReLU, Leaky ReLU, and Softplus side by side
- Saturation Investigation: Find where sigmoid and tanh have near-zero gradients
- Zero-Centered Discussion: Compare sigmoid (not zero-centered) with tanh (zero-centered)
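For the Gradient Exploration activity, here is a worked calculation, independent of the MicroSim, of roughly what the info panel should show for sigmoid at x = -3:

```javascript
// Sigmoid at x = -3 sits in the saturated (low-gradient) region.
const sigmoid = x => 1 / (1 + Math.exp(-x));
const x = -3;
const fx  = sigmoid(x);       // ≈ 0.0474
const dfx = fx * (1 - fx);    // ≈ 0.0452, well under the 0.1 threshold
console.log(fx.toFixed(4), dfx.toFixed(4));  // "0.0474 0.0452"
```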
### Discussion Questions

- Why does ReLU dominate modern deep learning despite having a discontinuous derivative?
- What does "vanishing gradient" mean and why is it a problem? (A numeric illustration follows this list.)
- When would you choose sigmoid over tanh for an output layer?
- Why might Leaky ReLU be preferred over standard ReLU?
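For the vanishing-gradient question, the toy calculation below (an illustration, not the MicroSim's code) shows how gradients shrink when sigmoid's derivative, which never exceeds 0.25, is multiplied layer after layer during backpropagation.

```javascript
// Multiply the best-case sigmoid derivative (0.25 at x = 0) across layers.
const sigmoidDeriv = x => {
  const s = 1 / (1 + Math.exp(-x));
  return s * (1 - s);
};

let gradientFactor = 1.0;
for (let layer = 1; layer <= 10; layer++) {
  gradientFactor *= sigmoidDeriv(0);  // 0.25 per layer, at best
  console.log(`after layer ${layer}: ${gradientFactor.toExponential(2)}`);
}
// After 10 layers the factor is 0.25^10 ≈ 9.5e-7, so early layers barely learn.
```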
## References

- Nair, V., & Hinton, G. E. (2010). Rectified Linear Units Improve Restricted Boltzmann Machines. ICML 2010.
- Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep Sparse Rectifier Neural Networks. AISTATS 2011.