Concept Taxonomy
The following taxonomy organizes the 480 bioinformatics concepts into 14 categories.
Categories
1. Foundation Concepts (FOUND)
Core biological and molecular concepts that underpin all bioinformatics work: DNA, RNA, protein, genes, genomes, mutations, epigenetics, and the central dogma.
2. Biological Databases (DBAS)
Major public repositories for biological data including NCBI, UniProt, PDB, KEGG, Reactome, and specialized disease/ontology databases, plus programmatic access methods.
3. Data Formats (DFMT)
Standard file formats used in bioinformatics pipelines: FASTA, FASTQ, GenBank, GFF3, VCF, SAM/BAM, PDB, and SBML, plus data quality and format conversion practices.
4. Graph Theory (GRTH)
Fundamental graph theory concepts: nodes, edges, graph types (directed, weighted, bipartite, labeled property graph or LPG), properties (degree, centrality, clustering), traversal algorithms, and network models (scale-free, small-world).
5. Graph Databases (GRDB)
Graph database technologies and query languages: Neo4j, Memgraph, Cypher, GQL, SPARQL, schema design, data loading, optimization, and scalability.
6. Sequence Analysis (SEQA)
Sequence alignment methods and tools: pairwise/multiple alignment algorithms, BLAST, scoring matrices, HMMs, sequence similarity networks, and motif discovery.
7. Phylogenetics (PHYL)
Evolutionary analysis and phylogenetic methods: tree-building algorithms, bootstrap analysis, molecular clocks, phylogenetic networks, reticulate evolution, and comparative genomics.
8. Structural Bioinformatics (STRU)
Protein structure analysis: folding, prediction (homology modeling, AlphaFold), contact maps, residue networks, docking, drug-likeness, and molecular fingerprints.
9. Protein Interactions (PPIS)
Protein-protein interaction networks: experimental methods, PPI databases, network topology (hubs, bottlenecks, modules), dynamic networks, and network comparison.
10. Genomics (GENO)
Genome assembly and variation: de Bruijn graphs, pangenome graphs, variant calling, sequencing technologies, read mapping, and annotation.
11. Transcriptomics (TRNS)
Gene expression and regulation: RNA-seq pipelines, differential expression, co-expression and regulatory network inference (WGCNA, ARACNE), and single-cell and spatial transcriptomics.
12. Pathways and Disease (PATH)
Metabolic pathways, signaling cascades, disease networks, drug repurposing, network medicine, cancer genomics, and precision medicine.
13. Knowledge Graphs (KNOW)
Biomedical knowledge graphs and ontologies: Gene Ontology, graph embeddings, link prediction, graph neural networks (GNNs), text mining, and multi-omics data integration.
14. Tools and Capstone (TOOL)
Python tools (Biopython, NetworkX, Neo4j driver), visualization libraries, reproducibility practices, and capstone project topics.
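The category codes above lend themselves to a simple lookup table for tagging concepts. The sketch below is one possible representation, not part of the taxonomy itself: the codes and names are copied from the list above, while the `count_by_category` helper and the three example concept assignments are hypothetical and included only for illustration.

```python
from collections import Counter

# Category codes and names, copied from the taxonomy list above.
TAXONOMY = {
    "FOUND": "Foundation Concepts",
    "DBAS": "Biological Databases",
    "DFMT": "Data Formats",
    "GRTH": "Graph Theory",
    "GRDB": "Graph Databases",
    "SEQA": "Sequence Analysis",
    "PHYL": "Phylogenetics",
    "STRU": "Structural Bioinformatics",
    "PPIS": "Protein Interactions",
    "GENO": "Genomics",
    "TRNS": "Transcriptomics",
    "PATH": "Pathways and Disease",
    "KNOW": "Knowledge Graphs",
    "TOOL": "Tools and Capstone",
}


def count_by_category(concept_tags):
    """Count concepts per category code, rejecting codes not in TAXONOMY.

    `concept_tags` maps a concept name to a category code (hypothetical helper).
    """
    unknown = {code for code in concept_tags.values() if code not in TAXONOMY}
    if unknown:
        raise ValueError(f"Unknown category codes: {sorted(unknown)}")
    counts = Counter(concept_tags.values())
    # Report every category, including those with no concepts assigned.
    return {code: counts.get(code, 0) for code in TAXONOMY}


# Hypothetical concept-to-category assignments, for illustration only.
example_tags = {
    "FASTA Format": "DFMT",
    "Cypher Query Language": "GRDB",
    "De Bruijn Graph": "GENO",
}
counts = count_by_category(example_tags)
print(counts["DFMT"], counts["TOOL"])
```

Keeping the codes in a single dictionary means a mistyped tag is caught when the counts are computed rather than silently creating a fifteenth category.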