De Bruijn Graph Construction
Run the De Bruijn Graph Construction MicroSim Fullscreen
Edit in the p5.js Editor
About This MicroSim
This MicroSim demonstrates how de Bruijn graphs are constructed from DNA sequences — the mathematical foundation behind modern genome assemblers like SPAdes, Velvet, and ABySS. Students enter a DNA sequence, choose a k-mer size, and watch the sequence decompose into overlapping k-mers that form a directed graph.
How It Works
- The input DNA sequence is split into overlapping k-mers (subsequences of length k)
- Each k-mer becomes a directed edge in the graph
- The nodes are (k-1)-mers — the prefix and suffix of each k-mer
- Nodes that share the same (k-1)-mer sequence are merged, creating the characteristic branching and merging structure of de Bruijn graphs
- Nodes are laid out in a circle for clarity
Why This Matters for Bioinformatics
De Bruijn graphs are the dominant data structure for short-read genome assembly:
- Sequencing machines produce millions of short reads (100-300 bp)
- These reads are broken into k-mers and assembled into a de Bruijn graph
- Walking through the graph (finding an Eulerian path) reconstructs the original genome
- Repeated sequences create cycles in the graph, which is why genome assembly is computationally challenging
How to Use
- Sequence input — Type or paste a DNA sequence (A, T, G, C). Invalid characters are stripped.
- k slider — Adjust the k-mer size. Smaller k produces fewer, more connected nodes; larger k produces more nodes with less ambiguity.
- Build button — Construct the de Bruijn graph from the current sequence and k value
- Reset button — Return to the default sequence
Suggested Experiments
- Start with the default sequence
ATGGCGTGCAand k=4. Count the nodes and edges. Are any nodes repeated? - Change k from 3 to 6 and observe how the graph structure changes — more nodes appear with less overlap
- Try a repetitive sequence like
ATGATGATGand see how repeats create cycles in the graph - Enter a longer sequence and notice how the graph becomes more complex — this illustrates the assembly challenge
Iframe Embed Code
You can add this MicroSim to any web page by adding this to your HTML:
1 2 3 4 | |
Lesson Plan
Grade Level
College introductory bioinformatics
Duration
15-20 minutes
Prerequisites
- Understanding of DNA sequences and k-mers
- Basic graph theory (nodes, directed edges)
- Concept of genome sequencing and short reads
Activities
- Exploration (5 min): Build the graph with the default sequence at k=4. List all the k-mers and verify that each one appears as an edge. Verify that nodes are (k-1)-mers.
- K-mer Size Effects (5 min): Keep the same sequence but change k from 3 to 5 to 7. How does the graph change? At what k value does the graph become a simple linear path with no branches?
- Repeat Sequences (5 min): Enter
ATGATGATGATG. Build at k=3, then k=4, then k=5. At which k value do you first see a cycle? What does this tell you about how repeats complicate genome assembly? - Assessment (5 min): Answer the reflection questions below.
Assessment
- In a de Bruijn graph, what do the nodes represent and what do the edges represent?
- Why does increasing k reduce the number of branches in the graph?
- A genome contains a 500-bp tandem repeat. If your k-mer size is 100, will the de Bruijn graph contain a cycle at this repeat? Why or why not?
- Why do genome assemblers use de Bruijn graphs instead of simply aligning all reads to each other?