LoRA Low-Rank Adaptation Visualizer

About This MicroSim

This visualization demonstrates LoRA (Low-Rank Adaptation), a technique for efficiently fine-tuning large language models by training only a small number of additional parameters.

Instead of updating the full weight matrix \(W\), LoRA learns the weight update \(\Delta W\) as the product of two low-rank matrices:

\[W' = W + \Delta W = W + BA\]

where:
\(W \in \mathbb{R}^{d \times k}\) is the pretrained weight matrix, kept frozen (not trained)
\(B \in \mathbb{R}^{d \times r}\) and \(A \in \mathbb{R}^{r \times k}\) are the trainable low-rank factors
\(r \ll \min(d, k)\) is the rank (typically 4, 8, or 16)
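To make the shapes concrete, here is a minimal pure-Python sketch (no ML library assumed) that builds a small frozen \(W\) and trainable factors \(B\) and \(A\), forms \(W' = W + BA\), and checks that \(\Delta W = BA\) has rank at most \(r\) even though it is a full \(d \times k\) matrix. The helper names (`matmul`, `matadd`, `rank`) and the tiny sizes \(d = 6\), \(k = 5\), \(r = 2\) are illustrative choices, not part of the MicroSim.

```python
import random
from fractions import Fraction

def matmul(X, Y):
    """Multiply matrices stored as lists of rows: (m x n) @ (n x p) -> (m x p)."""
    n = len(Y)
    assert all(len(row) == n for row in X), "inner dimensions must match"
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matadd(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def rank(M):
    """Matrix rank via Gaussian elimination over exact fractions (no rounding)."""
    rows = [[Fraction(v) for v in row] for row in M]
    pivots = 0
    for c in range(len(rows[0]) if rows else 0):
        p = next((i for i in range(pivots, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        rows[pivots], rows[p] = rows[p], rows[pivots]
        for i in range(len(rows)):
            if i != pivots and rows[i][c] != 0:
                f = rows[i][c] / rows[pivots][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[pivots])]
        pivots += 1
    return pivots

d, k, r = 6, 5, 2
rng = random.Random(0)
W = [[rng.randint(-3, 3) for _ in range(k)] for _ in range(d)]  # frozen, d x k
B = [[rng.randint(-3, 3) for _ in range(r)] for _ in range(d)]  # trainable, d x r
A = [[rng.randint(-3, 3) for _ in range(k)] for _ in range(r)]  # trainable, r x k

delta_W = matmul(B, A)        # same d x k shape as W, but rank at most r
W_prime = matadd(W, delta_W)  # W' = W + BA

print(len(W_prime), len(W_prime[0]))  # 6 5
print(rank(delta_W) <= r)             # True
```

Integer entries keep the rank computation exact; with floating-point weights a tolerance-based rank test would be needed instead.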

How to Use

Rank Slider: Adjust the LoRA rank (1-16) to see parameter savings
Dimension Slider: Change the matrix size to see how the savings scale
Observe: Watch how the parameter count changes with rank

Key Insights

Parameter Efficiency

For a \(d \times k\) weight matrix:

Method	Trainable parameters
Full fine-tuning	\(d \times k\)
LoRA (rank \(r\))	\(r(d + k)\)

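Example: With \(d = k = 4096\) and \(r = 8\):
Full: \(4096^2 = 16{,}777{,}216\) (about 16.8M) parameters per matrix
LoRA: \(8 \times (4096 + 4096) = 65{,}536\) (about 65.5K) parameters, about 0.4% of full fine-tuning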
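The table's formulas and the example can be checked with a few lines of Python. `full_params` and `lora_params` are hypothetical helper names, and the counts cover a single \(d \times k\) weight matrix only (biases and other layers are ignored).

```python
def full_params(d, k):
    """Trainable parameters when fine-tuning the whole d x k matrix."""
    return d * k

def lora_params(d, k, r):
    """Trainable parameters in B (d x r) plus A (r x k)."""
    return r * (d + k)

d = k = 4096
r = 8
full = full_params(d, k)
lora = lora_params(d, k, r)
print(f"Full: {full:,}")               # Full: 16,777,216
print(f"LoRA: {lora:,}")               # LoRA: 65,536
print(f"Fraction: {lora / full:.2%}")  # Fraction: 0.39%
```

Because the LoRA count grows linearly in \(d + k\) while the full count grows as \(d \cdot k\), the trainable fraction shrinks as matrices get larger.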

Why Low-Rank Works

Research suggests that the weight updates learned during adaptation often lie in a low-dimensional subspace: the "intrinsic dimension" of fine-tuning is much smaller than the total parameter count, so a rank-\(r\) update \(BA\) can capture much of the change.

LoRA Benefits

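Memory efficient: Train only about 0.1-1% of the original parameters, which shrinks the memory needed for gradients and optimizer state
No added inference latency: After training, merge \(BA\) into \(W\) to get \(W' = W + BA\), a single matrix of the original size
Modular: Swap different LoRA adapters (\(B\), \(A\) pairs) for different tasks on one shared base model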
Stable: Original model weights remain frozen
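The merging and adapter-swapping benefits rest on simple matrix arithmetic: folding \(BA\) into \(W\) changes no shapes, and subtracting it again recovers the frozen base weight. Below is a minimal sketch with small integer matrices so that equality checks are exact; `merge`, `unmerge`, and the two task adapters are hypothetical names and values chosen for illustration.

```python
def matmul(X, Y):
    """(m x n) @ (n x p) for matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(M, x):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def merge(W, B, A):
    """Fold an adapter into the base weight: W' = W + BA."""
    return [[w + u for w, u in zip(rw, ru)] for rw, ru in zip(W, matmul(B, A))]

def unmerge(W_merged, B, A):
    """Remove an adapter again: W = W' - BA."""
    return [[w - u for w, u in zip(rw, ru)] for rw, ru in zip(W_merged, matmul(B, A))]

W = [[1, 2, 0], [0, 1, 3]]        # frozen base weight (d=2, k=3)
B1, A1 = [[1], [2]], [[1, 0, 1]]  # hypothetical rank-1 adapter, task 1
B2, A2 = [[0], [1]], [[2, 2, 0]]  # hypothetical rank-1 adapter, task 2

W1 = merge(W, B1, A1)
x = [1, 1, 1]
# Merged inference is one matvec, yet matches the unmerged path Wx + B(Ax).
assert matvec(W1, x) == [a + b for a, b in zip(matvec(W, x), matvec(B1, matvec(A1, x)))]

# Swap tasks: unmerge adapter 1, then merge adapter 2.
W2 = merge(unmerge(W1, B1, A1), B2, A2)
print(W1)  # [[2, 2, 1], [2, 1, 5]]
print(W2)  # [[1, 2, 0], [2, 3, 3]]
```

With floating-point weights, unmerging recovers \(W\) only up to rounding error, so a real system might keep an unmodified copy of the base weights instead of relying on subtraction.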

Lesson Plan

Learning Objectives:

Understand why low-rank approximations enable efficient fine-tuning
Calculate parameter savings for different rank values
Explain the LoRA forward pass: \(h = Wx + BAx\)
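The forward pass in the third objective can be sketched directly. Evaluating \(B(Ax)\) right to left sends the input through an \(r\)-dimensional bottleneck, so the adapter path costs \(r(k + d)\) multiplications per input vector instead of the \(d \cdot r \cdot k\) needed to form \(BA\) explicitly. The function names are hypothetical, and the sketch omits the \(\alpha / r\) scaling factor that the original LoRA paper applies to the update.

```python
def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, B, A, x):
    """h = Wx + B(Ax), evaluated right to left so BA is never formed."""
    base = matvec(W, x)    # frozen path: d*k multiplications
    z = matvec(A, x)       # down-projection to r dims: r*k multiplications
    update = matvec(B, z)  # up-projection back to d dims: d*r multiplications
    return [b + u for b, u in zip(base, update)]

def adapter_mults(d, k, r):
    """Extra multiplications per input for the unmerged adapter path."""
    return r * k + d * r

W = [[1, 0], [0, 1], [1, 1]]  # d=3, k=2
B = [[1], [0], [2]]           # d x r with r=1
A = [[3, 1]]                  # r x k
print(lora_forward(W, B, A, [1, 2]))  # [6, 2, 13]
print(adapter_mults(4096, 4096, 8))   # 65536
```

For \(d = k = 4096\) and \(r = 8\), the adapter adds 65,536 multiplications per input on top of the 16,777,216 for \(Wx\); merging as described under LoRA Benefits removes even that overhead at inference time.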

Activities:

Find the largest rank that still achieves at least 99% parameter savings for a 1024×1024 matrix
Compare parameter counts for different model sizes
Discuss when LoRA might not work well (what if task requires full-rank updates?)
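For the first two activities, a small helper can check students' answers. This is a sketch: `savings` and `max_rank_for_savings` are hypothetical names, the target is passed as an exact fraction to avoid floating-point edge cases, and the sizes in the loop are arbitrary square matrices rather than specific models.

```python
from fractions import Fraction

def savings(d, k, r):
    """Fraction of trainable parameters saved versus full fine-tuning."""
    return 1 - Fraction(r * (d + k), d * k)

def max_rank_for_savings(d, k, target):
    """Largest rank r >= 1 whose savings are at least `target` (0 if none)."""
    best = 0
    for r in range(1, min(d, k) + 1):
        if savings(d, k, r) < target:
            break  # savings only shrink as r grows
        best = r
    return best

# Activity 1: largest rank with at least 99% savings on a 1024 x 1024 matrix.
print(max_rank_for_savings(1024, 1024, Fraction(99, 100)))

# Activity 2: savings at a fixed rank r = 8 for square matrices of growing size.
for d in (1024, 4096, 8192):
    print(d, f"{float(savings(d, d, 8)):.2%}")
```

The loop stops at the first rank that misses the target because savings decrease monotonically with \(r\).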

Assessment:

Why is \(B\) initialized to zeros and \(A\) to small random values?
How does LoRA compare to other efficient fine-tuning methods?
What is the computational overhead of LoRA during training versus inference?
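The initialization question can be explored numerically. In the LoRA paper, \(A\) starts from a random Gaussian and \(B\) from zeros, so \(BA = 0\) and \(W' = W\) at step 0: training begins from exactly the pretrained model. With \(h = Wx + BAx\) and upstream gradient \(g = \partial L / \partial h\), the gradients are \(\partial L / \partial B = g\,(Ax)^\top\) and \(\partial L / \partial A = (B^\top g)\,x^\top\). The first is nonzero because \(A\) is random; if \(A\) were also zero, both gradients would vanish and gradient descent could never move the adapter. The sketch below uses hypothetical helper names, an arbitrary standard deviation of 0.01, and a made-up gradient `g`.

```python
import random

def init_lora(d, k, r, std=0.01, seed=0):
    """Hypothetical init helper: A ~ N(0, std^2) entries, B = 0."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, std) for _ in range(k)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d)]
    return B, A

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def outer(u, v):
    """Outer product u v^T."""
    return [[a * b for b in v] for a in u]

d, k, r = 4, 3, 2
B, A = init_lora(d, k, r)
x = [1.0, -2.0, 0.5]       # example input (length k)
g = [0.3, -0.1, 0.2, 0.4]  # made-up upstream gradient dL/dh (length d)

update = matvec(B, matvec(A, x))     # BAx
grad_B = outer(g, matvec(A, x))      # dL/dB = g (Ax)^T, d x r
BT = [list(col) for col in zip(*B)]  # B^T, r x d
grad_A = outer(matvec(BT, g), x)     # dL/dA = (B^T g) x^T, r x k

print(all(v == 0.0 for v in update))                 # True: W' = W at step 0
print(any(v != 0.0 for row in grad_B for v in row))  # True: B gets a learning signal
print(all(v == 0.0 for row in grad_A for v in row))  # True until B moves
```

After the first update \(B\) becomes nonzero, and from then on both factors receive gradients.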

References

LoRA: Low-Rank Adaptation of Large Language Models - Original paper
Chapter 11: Generative AI and LLMs
Hugging Face PEFT Library