Graph-Based Discovery Pipeline

Run the Graph-Based Discovery Pipeline MicroSim Fullscreen

About This MicroSim

This MicroSim visualizes the end-to-end pipeline for graph-based biological discovery as a directed acyclic graph (DAG). Each node represents a major stage, from raw data acquisition through graph construction, computational analysis, machine learning, visualization, interpretation, and communication of results.

Pipeline Stages

Data Acquisition — Retrieve biological data from databases (STRING, KEGG, UniProt, PDB)
Data Wrangling — Clean, normalize, and integrate heterogeneous data sources
Graph Construction — Build the biological network (nodes, edges, properties)
Graph Analysis — Compute network metrics (centrality, clustering, community detection)
Machine Learning — Apply graph embeddings, GNNs, or link prediction
Visualization — Create interactive network visualizations for exploration
Interpretation — Connect computational findings to biological hypotheses
Communication — Publish results, share data, write papers
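
The stage list above can be sketched as a small DAG in code. This is a minimal illustration using Python's standard `graphlib` module, not the MicroSim's actual implementation. The dependency edges in `DEPENDS_ON` are assumptions made for this example (including the extra edge from Graph Analysis directly to Visualization); only the stage names come from this page.

```python
from graphlib import TopologicalSorter

# Stage names as listed on this page, in their documented order.
STAGES = [
    "Data Acquisition",
    "Data Wrangling",
    "Graph Construction",
    "Graph Analysis",
    "Machine Learning",
    "Visualization",
    "Interpretation",
    "Communication",
]

# Hypothetical dependencies: each stage maps to the stages it needs first.
# The MicroSim's real edges may differ.
DEPENDS_ON = {
    "Data Acquisition": set(),
    "Data Wrangling": {"Data Acquisition"},
    "Graph Construction": {"Data Wrangling"},
    "Graph Analysis": {"Graph Construction"},
    "Machine Learning": {"Graph Analysis"},
    "Visualization": {"Graph Analysis", "Machine Learning"},
    "Interpretation": {"Visualization"},
    "Communication": {"Interpretation"},
}


def execution_order(depends_on):
    """Return one valid run order; raises graphlib.CycleError if not a DAG."""
    return list(TopologicalSorter(depends_on).static_order())


order = execution_order(DEPENDS_ON)
for position, stage in enumerate(order, start=1):
    print(position, stage)
```

A topological sort is a natural fit here because it guarantees that every stage runs only after the stages it depends on, and it fails loudly if someone accidentally introduces a cycle.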

How to Use

Click each pipeline stage to see its description, key tools, and example outputs
Follow the flow — Trace the path from data to discovery
Note dependencies — Some stages branch or merge, reflecting the non-linear nature of research
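
One way to make "branch" and "merge" concrete is to count incoming and outgoing edges per stage. The sketch below assumes a hypothetical edge list (the MicroSim's actual connections are not given on this page): a stage with more than one outgoing edge is a branch point, and a stage with more than one incoming edge is a merge point.

```python
from collections import defaultdict

# Hypothetical (upstream, downstream) edges, invented for illustration.
EDGES = [
    ("Data Acquisition", "Data Wrangling"),
    ("Data Wrangling", "Graph Construction"),
    ("Graph Construction", "Graph Analysis"),
    ("Graph Analysis", "Machine Learning"),
    ("Graph Analysis", "Visualization"),
    ("Machine Learning", "Visualization"),
    ("Visualization", "Interpretation"),
    ("Interpretation", "Communication"),
]


def branch_and_merge_points(edges):
    """Return (branch stages, merge stages), each sorted alphabetically."""
    out_degree = defaultdict(int)
    in_degree = defaultdict(int)
    for upstream, downstream in edges:
        out_degree[upstream] += 1
        in_degree[downstream] += 1
    branches = sorted(s for s, d in out_degree.items() if d > 1)
    merges = sorted(s for s, d in in_degree.items() if d > 1)
    return branches, merges


branches, merges = branch_and_merge_points(EDGES)
print("Branch points:", branches)
print("Merge points:", merges)
```

With these assumed edges, results flow from Graph Analysis along two paths that rejoin at Visualization.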

Iframe Embed Code

<iframe src="https://dmccreary.github.io/bioinformatics/sims/graph-discovery-pipeline/main.html"
        height="470"
        width="100%"
        scrolling="no"></iframe>

Lesson Plan

Grade Level

Introductory college-level bioinformatics

Duration

15-20 minutes

Prerequisites

Familiarity with biological databases
Basic understanding of network analysis
Familiarity with the concept of computational pipelines

Activities

Exploration (5 min): Click each stage and list the key tools or methods used at that stage.
Pipeline Design (5 min): You want to discover new drug targets for Alzheimer's disease using a protein interaction network. For each pipeline stage, describe what specific actions you would take.
Discussion (5 min): Why is data wrangling often the most time-consuming stage? What challenges arise when integrating data from multiple sources?
Assessment (3 min): Answer the reflection questions below.
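
For the Pipeline Design activity, the Graph Construction and Graph Analysis stages might be sketched as follows. This is a toy example in plain Python: the protein names `P1` to `P5` and their interactions are made-up placeholders, not real Alzheimer's data. In practice the edges would come from a source such as STRING, and a library such as NetworkX would typically handle the graph.

```python
from collections import defaultdict

# Hypothetical interactions between placeholder proteins (not real data).
INTERACTIONS = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P4", "P5"),
]


def build_graph(interactions):
    """Graph Construction: build an undirected adjacency map."""
    adjacency = defaultdict(set)
    for a, b in interactions:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency):
    """Graph Analysis: fraction of the other nodes each node touches."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


graph = build_graph(INTERACTIONS)
centrality = degree_centrality(graph)

# Rank proteins from most to least connected; ties break alphabetically.
ranked = sorted(centrality, key=lambda p: (-centrality[p], p))
for protein in ranked:
    print(protein, round(centrality[protein], 2))
```

Highly connected proteins are only candidate targets. The Interpretation stage still has to connect a high centrality score to a biological hypothesis.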

Assessment

List the eight stages of the graph-based discovery pipeline in order.
Why is graph construction placed after data wrangling rather than immediately after data acquisition?
How do machine learning methods (like graph neural networks) add value beyond traditional graph analysis metrics?
Why is visualization important for biological interpretation of network analysis results?