Genome Assembly Pipeline Overview
About This MicroSim
This MicroSim presents the genome assembly pipeline as an animated, step-through flowchart. Students advance through each stage to understand how millions of short sequencing reads are transformed into a contiguous genome assembly.
Pipeline Stages
- Raw Reads — Millions of short DNA sequences (100-300 bp) from a sequencing machine
- QC Filter — Remove low-quality reads, trim adapters, filter contaminants
- K-mer Counting — Count all k-length subsequences to estimate genome size and detect errors
- De Bruijn Graph — Construct the assembly graph where k-mers define edges between (k-1)-mer nodes
- Contigs — Traverse the graph to produce contiguous sequences (contigs)
- Scaffolding — Use paired-end or mate-pair information to order and orient contigs
- Gap Filling — Close the gaps left between scaffolded contigs using reads that overlap the contig ends flanking each gap
- Final Assembly — The completed genome assembly in FASTA format
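The middle stages above (k-mers, de Bruijn graph, contigs) can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not how a production assembler works: the reads are error-free, only the forward strand is considered, and isolated cycles in the graph are ignored. The function names `build_de_bruijn` and `extract_contigs` and the toy reads are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes it points to.

    Every k-mer found in a read contributes one edge: prefix -> suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extract_contigs(graph):
    """Spell out maximal non-branching paths as contig strings.

    Simplification: paths that form an isolated cycle are not reported.
    """
    in_degree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            in_degree[node] += 1
    nodes = set(graph) | set(in_degree)

    def is_simple(node):
        # Exactly one incoming and one outgoing edge: safe to walk through.
        return in_degree[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for node in sorted(nodes):
        if is_simple(node):
            continue  # contigs start only at branch points or path ends
        for successor in sorted(graph.get(node, ())):
            path = node + successor[-1]
            current = successor
            while is_simple(current):
                current = next(iter(graph[current]))
                path += current[-1]
            contigs.append(path)
    return contigs


# Three overlapping toy reads drawn from the sequence ATGGCGTGCA.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCA"]
contigs = extract_contigs(build_de_bruijn(reads, k=4))
print(contigs)  # ['ATGGCGTGCA']
```

Because no (k-1)-mer repeats in this toy sequence, the graph is a single unbranched path and one contig is recovered. A repeated (k-1)-mer would create a branch, and the walk would stop there and emit several shorter contigs.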
How to Use
- Step through — Advance through each pipeline stage to see what processing occurs
- Read descriptions — Each stage explains the computational methods and tools used
- Follow the data flow — Understand how raw reads are progressively transformed into a genome
Lesson Plan
Grade Level
College introductory bioinformatics
Duration
15-20 minutes
Prerequisites
- Understanding of DNA sequencing technologies
- Basic concept of k-mers and sequence overlap
- Familiarity with the de Bruijn graph concept
Activities
- Exploration (5 min): Step through all stages. At each, note what the input is, what processing occurs, and what the output is.
- Error Handling (5 min): Sequencing errors create erroneous k-mers. At which pipeline stages are errors detected or corrected? How does k-mer counting help?
- Discussion (5 min): Repetitive sequences (transposons, tandem repeats) are the main challenge in genome assembly. At which stage(s) do repeats cause problems? How does scaffolding help resolve ambiguities?
- Assessment (5 min): Answer the reflection questions below.
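As optional support for the Error Handling activity, the sketch below shows why k-mer counts expose sequencing errors: a k-mer from the true genome is seen many times across overlapping reads, while a k-mer created by a random error is usually seen only once. The toy reads, the threshold `min_count`, and the function names are assumptions made for this illustration. Real pipelines also merge each k-mer with its reverse complement, which is omitted here.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-length substring across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def split_by_abundance(counts, min_count):
    """Separate well-supported k-mers from rare ones that are likely errors."""
    solid = {kmer for kmer, n in counts.items() if n >= min_count}
    weak = set(counts) - solid
    return solid, weak


# Five correct copies of a read plus one copy with a single base error
# (the fifth base T is misread as A).
reads = ["ACGTTGCA"] * 5 + ["ACGTAGCA"]
counts = count_kmers(reads, k=4)
solid, weak = split_by_abundance(counts, min_count=2)
print(sorted(weak))  # ['AGCA', 'CGTA', 'GTAG', 'TAGC']
```

The single wrong base produces four low-count k-mers, one for each k-mer window that covers the error. Students can change `k` or move the error toward the end of the read to see how the number of affected k-mers changes.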
Assessment
- Why is de Bruijn graph construction preferred over overlap-layout-consensus for short-read assembly?
- What is the difference between a contig and a scaffold?
- How does k-mer counting help estimate genome size before assembly?
- Why are long reads (PacBio, Oxford Nanopore) sometimes used alongside short reads to improve assemblies?
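For instructors reviewing the question on genome size, one common back-of-the-envelope approach divides the total number of k-mers observed in the reads by the depth at the main peak of the k-mer histogram. The sketch below is a simplified version under stated assumptions: k-mers seen fewer than `min_count` times are treated as errors and dropped, the genome is haploid, and the most common remaining depth is taken as the coverage peak. The function name and the synthetic counts are hypothetical; the dictionary keys are placeholders, not real k-mers.

```python
from collections import Counter


def estimate_genome_size(kmer_counts, min_count=2):
    """Estimate genome size (in k-mer positions) as total k-mers / peak depth."""
    # Histogram: depth -> number of distinct k-mers observed at that depth.
    histogram = Counter(n for n in kmer_counts.values() if n >= min_count)
    if not histogram:
        raise ValueError("no k-mers at or above min_count")
    # The depth shared by the most distinct k-mers approximates coverage.
    peak_depth = max(histogram, key=histogram.get)
    total_kmers = sum(depth * n for depth, n in histogram.items())
    return total_kmers / peak_depth


# Synthetic example: 1000 unique-copy k-mers at depth 30, 10 two-copy
# repeat k-mers at depth 60, and 50 error k-mers seen once.
toy_counts = {}
toy_counts.update({f"unique_{i}": 30 for i in range(1000)})
toy_counts.update({f"repeat_{i}": 60 for i in range(10)})
toy_counts.update({f"error_{i}": 1 for i in range(50)})

print(estimate_genome_size(toy_counts))  # 1020.0
```

The estimate of 1020 matches the 1000 unique positions plus 10 k-mers that each occur twice in the genome, which shows how repeats are counted by copy number rather than collapsed.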