References: Genome Assembly and Variation Graphs

  1. De Bruijn Graph - Wikipedia - Mathematical foundation of de Bruijn graphs and their application to genome assembly, explaining k-mer decomposition, Eulerian paths, and how sequencing reads are reconstructed into contigs.

  2. Genome Assembly - Wikipedia - Overview of genome assembly approaches including overlap-layout-consensus and de Bruijn graph methods, covering scaffolding, gap filling, and quality metrics like N50 and contig statistics.

  3. Pan-genome - Wikipedia - Describes the concept of a pan-genome representing all genetic variation within a species, including core and dispensable genomes, and graph-based pangenome reference structures.

  4. Genome Assembly and Annotation - Mark Sherlock - Springer - Practical guide to genome assembly workflows covering read preprocessing, assembly algorithms, scaffolding strategies, and quality assessment methods for next-generation sequencing data.

  5. Bioinformatics Algorithms: An Active Learning Approach (3rd Edition) - Phillip Compeau and Pavel Pevzner - Active Learning Publishers - Interactive textbook with detailed coverage of de Bruijn graph assembly, read error correction, and genome rearrangement algorithms, with programming challenges.

  6. vg Toolkit Documentation - vg Team - Wiki documentation for the vg variation graph toolkit, covering graph construction, read mapping with Giraffe, variant calling, and pangenome reference graph operations.

  7. Human Pangenome Reference Consortium - HPRC - Resources from the consortium building a human pangenome reference, explaining why graph-based references better represent population genetic diversity than linear references.

  8. SPAdes Genome Assembler Manual - Center for Algorithmic Biotechnology - Documentation for the SPAdes assembler using de Bruijn graphs, covering assembly modes for various data types and parameter optimization strategies.

  9. Galaxy Training: Genome Assembly - Galaxy Project - Hands-on tutorials for genome assembly workflows in the Galaxy platform, covering quality control, assembly with various tools, and assembly evaluation metrics.

  10. GFA Format Specification - GFA-spec - Specification for the Graphical Fragment Assembly format used to represent assembly and variation graphs, defining segment, link, and path records for graph-based genome representations.