Convolution Operation

About This MicroSim

This visualization shows exactly how convolution works in convolutional neural networks (CNNs). Watch as the kernel slides across the input image, performing element-wise multiplication and summing to produce each output pixel.

The Convolution Formula

\((\mathbf{I} * \mathbf{K})_{i,j} = \sum_{m}\sum_{n} I_{i+m, j+n} \cdot K_{m,n}\)
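
To make the formula concrete, here is a minimal NumPy sketch of a CNN-style convolution (strictly a cross-correlation, which is what most deep-learning libraries compute). The function name `conv2d_valid` and the `stride` argument are illustrative and not part of the MicroSim itself:

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """CNN-style convolution (cross-correlation) with no padding ("valid")."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise multiply the kernel with the patch it covers, then sum.
            patch = image[i * stride : i * stride + kh,
                          j * stride : j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out
```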

Output Size Formula

\(\text{output\_size} = \frac{\text{input\_size} - \text{kernel\_size} + 2 \times \text{padding}}{\text{stride}} + 1\)
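
A small helper makes this formula easy to check against the MicroSim's stride and padding controls (the helper name and the example values are assumptions for illustration):

```python
def output_size(input_size, kernel_size, padding=0, stride=1):
    """Spatial output size of a convolution, per the formula above."""
    return (input_size - kernel_size + 2 * padding) // stride + 1

# 7x7 input with a 3x3 kernel (the example used in the math details below)
print(output_size(7, 3, padding=0, stride=1))  # 5 -> "valid" padding
print(output_size(7, 3, padding=1, stride=1))  # 7 -> "same" padding at stride 1
print(output_size(7, 3, padding=0, stride=2))  # 3
```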

Interactive Features

  • Kernel Selector: Try different kernels (edge detection, blur, sharpen, etc.)
  • Stride: Change how many pixels the kernel moves between positions
  • Padding: Valid (no padding) or Same (preserve dimensions)
  • Step: Advance one position at a time
  • Animate: Watch the full convolution process

Preset Kernels

| Kernel      | Effect                 | Pattern                  |
|-------------|------------------------|--------------------------|
| Edge Detect | Highlights edges       | Center +8, neighbors -1  |
| Blur        | Smooths the image      | All equal weights        |
| Sharpen     | Enhances details       | Center +5, cross -1      |
| Sobel X     | Detects vertical edges | Gradient filter          |
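
For reference, the preset patterns in the table correspond to kernel matrices like the following (a sketch; the MicroSim's exact values may differ slightly):

```python
import numpy as np

EDGE_DETECT = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])   # center +8, neighbors -1

BLUR = np.ones((3, 3)) / 9.0             # all equal weights

SHARPEN = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])       # center +5, cross -1

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])         # horizontal gradient -> vertical edges
```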

Lesson Plan

Learning Objectives

Students will be able to:

  1. Explain how convolution kernels extract features from images
  2. Calculate output dimensions given input size, kernel size, stride, and padding
  3. Describe the effect of different kernel patterns
  4. Understand weight sharing in convolutional layers

Suggested Activities

  1. Dimension Calculation: For each stride/padding combination, verify the output size formula
  2. Kernel Exploration: Compare what edge detection vs blur kernels produce
  3. Manual Computation: Pause the animation and verify one output value by hand (a worked example follows this list)
  4. Same vs Valid: When would you choose "same" padding over "valid"?
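
For the manual-computation activity, here is a hand-checkable example (the 3×3 input patch values are made up for illustration):

```python
import numpy as np

patch = np.array([[1, 2, 0],
                  [0, 1, 3],
                  [2, 1, 1]])        # 3x3 region of the input under the kernel

sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])

# By hand: 0*1 + (-1)*2 + 0*0 + (-1)*0 + 5*1 + (-1)*3 + 0*2 + (-1)*1 + 0*1 = -1
print(np.sum(patch * sharpen))       # -1
```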

Discussion Questions

  1. Why does convolution reduce spatial dimensions (with valid padding)?
  2. How does a 3×3 kernel have only 9 parameters but process arbitrarily large images?
  3. What real-world features might an edge detection kernel capture?
  4. Why might we use stride > 1 instead of pooling for downsampling?

Mathematical Details

A 3×3 kernel with stride 1 and valid padding on a 7×7 input:

  • Output size: \((7 - 3 + 2 \times 0) / 1 + 1 = 5\), so the output is 5×5
  • Parameters: 9 weights + 1 bias = 10
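
A quick sanity check of these numbers, using SciPy's `correlate2d` as a stand-in for the MicroSim's convolution (the random input is purely illustrative):

```python
import numpy as np
from scipy.signal import correlate2d

image = np.random.rand(7, 7)
kernel = np.random.rand(3, 3)

# CNN convolution is cross-correlation; mode='valid' means no padding.
out = correlate2d(image, kernel, mode='valid')
print(out.shape)        # (5, 5)

# The parameter count does not depend on the input size:
print(kernel.size + 1)  # 9 weights + 1 bias = 10
```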
