Attention Mechanism Step-by-Step
Run the Attention Mechanism Visualizer Fullscreen
Edit the MicroSim with the p5.js editor
About This MicroSim
This step-by-step visualization demonstrates how the attention mechanism works in transformers. Walk through each stage of the computation (a code sketch of these stages follows the list):
- Input: Token embeddings as vectors
- Project Q,K,V: Linear projections create Query, Key, Value matrices
- Compute Scores: Query-Key dot products measure compatibility
- Softmax: Normalize scores to attention weights (probabilities)
- Weighted Sum: Combine Value vectors using attention weights
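The same five stages can be traced in plain NumPy. This is a minimal sketch, not the MicroSim's actual code: the matrix sizes, random projection weights, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8              # illustrative sizes (assumed)

# Stage 1 - Input: one embedding vector per token, stacked as rows
X = rng.normal(size=(seq_len, d_model))

# Stage 2 - Project Q, K, V: learned linear maps (random stand-ins here)
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Stage 3 - Compute scores: query-key dot products, scaled by sqrt(d_k)
S = Q @ K.T / np.sqrt(d_k)                   # shape (seq_len, seq_len)

# Stage 4 - Softmax: normalize each row into attention weights
A = np.exp(S - S.max(axis=-1, keepdims=True))
A /= A.sum(axis=-1, keepdims=True)

# Stage 5 - Weighted sum: mix the value vectors using the weights
output = A @ V                               # shape (seq_len, d_k)

print(A.round(2))         # each row sums to 1
print(output.shape)
```

Reading one row of `A` corresponds to choosing a query position in the MicroSim: it shows how strongly that token attends to every position in the sequence.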
How to Use
- Step Slider: Move through the 5 stages of attention computation
- Query Position: Select which token's attention to visualize
- Observe: Watch how attention weights determine which positions to focus on
Key Concepts
The Attention Formula
\[\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V\]
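As a direct translation of the formula, the whole computation fits in one function. This is a sketch under assumed 2-D (single-head, unbatched) shapes; the function name is illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for 2-D arrays Q, K, V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # QK^T / sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted sum of values
```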
Query-Key-Value Intuition
- Query: "What am I looking for?"
- Key: "What do I contain?"
- Value: "What's my actual content?"
High query-key compatibility (a large dot product) means that token's value contributes more to the output.
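A tiny numeric illustration of this idea, with all numbers invented for the example: the key most aligned with the query receives the larger softmax weight, so its value dominates the weighted sum.

```python
import numpy as np

q = np.array([1.0, 0.0])                   # query: "looking for" the (1, 0) direction
K = np.array([[0.9, 0.1],                  # key 0: closely aligned with the query
              [0.0, 1.0]])                 # key 1: nearly orthogonal to it
V = np.array([[10.0, 0.0],                 # value 0
              [ 0.0, 10.0]])               # value 1

scores = K @ q / np.sqrt(2)                # dot-product compatibility per key
w = np.exp(scores) / np.exp(scores).sum()  # softmax over the two positions
print(w.round(3))                          # ~[0.654, 0.346]: key 0 dominates
print((w @ V).round(2))                    # output is pulled toward value 0
```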
Softmax Normalization
Attention weights in each row sum to 1, creating a probability distribution over positions. With the scaled scores \(S = QK^T / \sqrt{d_k}\), each weight is
\[A_{ij} = \frac{\exp(S_{ij})}{\sum_k \exp(S_{ik})}\]
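A row-wise softmax can be sketched as follows; subtracting each row's maximum before exponentiating is a standard numerical-stability trick that cancels in the ratio, and the example scores are made up for illustration.

```python
import numpy as np

def row_softmax(S):
    """Turn a score matrix into attention weights, one row at a time."""
    S = S - S.max(axis=-1, keepdims=True)   # stability shift; cancels in the ratio
    E = np.exp(S)
    return E / E.sum(axis=-1, keepdims=True)

S = np.array([[2.0, 1.0, 0.1],
              [0.5, 0.5, 0.5]])
A = row_softmax(S)
print(A.round(3))              # second row is uniform: equal scores give equal weights
print(A.sum(axis=-1))          # each row sums to 1
```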
Lesson Plan
Learning Objectives:
- Understand the role of Query, Key, and Value matrices
- Trace the flow of information through attention computation
- Interpret attention weights as a soft addressing mechanism
Activities:
- Step through all 5 stages and describe what happens at each
- Change the query position and observe how attention patterns change
- Identify which tokens attend most strongly to each other
Assessment:
- Why do we scale by √d_k in the score computation?
- What does a uniform attention distribution (all weights equal) mean?
- How would masking affect the attention computation?
References
- Attention Is All You Need (Vaswani et al., 2017) - the original transformer paper
- Chapter 11: Generative AI and LLMs
- The Illustrated Transformer (Jay Alammar) - a visual walkthrough of the transformer architecture