References: Bioinformatics Data Formats
-
FASTA Format - Wikipedia - Describes the widely used text-based format for representing nucleotide and protein sequences, including header conventions, multi-sequence files, and format variations.
-
FASTQ Format - Wikipedia - Explains the FASTQ format for storing biological sequences with per-base quality scores, covering Phred encoding schemes and its role in next-generation sequencing pipelines.
-
Variant Call Format - Wikipedia - Overview of the VCF specification for representing genomic variants, including SNPs, indels, and structural variants, with explanations of header metadata and genotype fields.
-
Bioinformatics Data Skills - Vince Buffalo - O'Reilly Media - Practical guide to working with bioinformatics file formats using command-line tools, covering parsing and manipulation of FASTA, FASTQ, BED, GFF, and SAM/BAM files.
-
Genomics in the Cloud - Geraldine Van der Auwera and Brian O'Connor - O'Reilly Media - Covers bioinformatics data formats in the context of GATK Best Practices pipelines, with detailed explanations of BAM, VCF, and interval file specifications.
-
SAM/BAM Format Specification - SAMtools/hts-specs - The official specification document for the Sequence Alignment/Map format, defining header lines, alignment fields, CIGAR strings, and optional tags.
-
GFF3 Specification - Sequence Ontology - Formal specification of the Generic Feature Format version 3 for genomic annotations, covering nine-column structure, parent-child relationships, and multi-feature encoding.
-
GenBank Flat File Format - NCBI - Annotated sample GenBank record that explains each field of a database entry, including LOCUS, DEFINITION, ACCESSION, the FEATURES table, and the ORIGIN sequence section.
-
PDB File Format Documentation - wwPDB - Official documentation of the PDB coordinate file format version 3.3, covering ATOM records, HETATM entries, and structural metadata used in macromolecular structure files.
-
Samtools Documentation - HTSlib - Reference manual for samtools, the standard toolkit for manipulating SAM/BAM/CRAM files, including view, sort, index, and flagstat operations essential for sequencing workflows.