Variation Graph for a Genomic Region

Run the Variation Graph for a Genomic Region MicroSim Fullscreen
Edit in the p5.js Editor

About This MicroSim

This MicroSim visualizes a variation graph (VG) — a directed graph that represents a reference genome along with known genetic variants. Instead of a single linear reference sequence, a variation graph encodes multiple alleles as alternative paths through the graph.

Graph Structure

Reference path — The main path through the graph representing the reference genome sequence
SNP paths — Alternative nodes that branch and rejoin, encoding single nucleotide polymorphisms
Insertion paths — Extra nodes inserted between reference nodes, encoding insertions
Deletion paths — Edges that skip reference nodes, encoding deletions
Read paths — Toggleable alignments showing how sequencing reads traverse different paths

Why Variation Graphs?

Traditional linear reference genomes introduce reference bias — reads from non-reference alleles may fail to align or align incorrectly. Variation graphs solve this by:

Representing all known variants as first-class paths
Allowing reads to align to any path, reducing reference bias
Enabling genotyping by identifying which paths each read traverses
Tools: vg, GraphAligner, minigraph

How to Use

Examine the graph — Follow the reference path and identify where alternative paths branch off for SNPs, insertions, and deletions
Toggle read paths — Show or hide read alignments to see which paths individual reads traverse
Identify variants — Each branching point in the graph represents a known variant site

Suggested Exploration

Find the SNP site: one path has the reference allele, the other has the alternate allele. Which reads support each allele?
Find the insertion: extra nodes appear in one path but not the reference. How do reads align through this region?
Find the deletion: one path skips a reference node. What does this mean for reads from individuals with this deletion?

Iframe Embed Code

<iframe src="https://dmccreary.github.io/bioinformatics/sims/variation-graph/main.html"
        height="502"
        width="100%"
        scrolling="no"></iframe>

Lesson Plan

Grade Level

College introductory bioinformatics

Duration

15-20 minutes

Prerequisites

Understanding of reference genomes and genome variants (SNPs, indels)
Concept of read alignment to a reference
Basic graph theory (directed graphs, paths)

Activities

Exploration (5 min): Trace the reference path through the graph. Identify all variant sites (branching points). Classify each as SNP, insertion, or deletion.
Read Alignment (5 min): Toggle on read paths. For each variant site, count how many reads support the reference allele vs. the alternate allele. What genotype would you call?
Discussion (5 min): Why does a linear reference genome create bias against non-reference alleles? How do variation graphs solve this problem? Which populations benefit most from variation graphs?
Assessment (5 min): Answer the reflection questions below.

Assessment

What is reference bias in read alignment, and how do variation graphs reduce it?
How is a SNP represented in a variation graph compared to a VCF file?
Why are variation graphs particularly important for studying structurally diverse genomic regions (like the MHC locus)?
A read traverses the reference path at all variant sites. What genotype does this suggest?