Integrated Knowledge Graph Pipeline

Run the Integrated Knowledge Graph Pipeline MicroSim Fullscreen

About This MicroSim

This MicroSim shows the complete knowledge graph (KG) analysis pipeline, from data sources and KG construction through three machine learning (ML) branches (graph embeddings, graph neural networks, and link prediction) that converge on downstream applications such as drug repurposing and disease gene discovery.

Pipeline Structure

Inputs — Ontologies (GO, Disease Ontology) and databases (STRING, DrugBank, OMIM)
KG Construction — Build the integrated biomedical knowledge graph
Three ML Branches:
- Graph Embeddings (e.g., TransE, node2vec) — Learn low-dimensional vector representations of entities
- Graph Neural Networks (e.g., GCN, GAT) — Learn node/edge representations using message passing
- Link Prediction — Predict missing edges (new drug-target or gene-disease associations)
Applications — Drug repurposing, disease gene discovery, protein function prediction
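
The Graph Embeddings and Link Prediction branches above can be illustrated with a minimal Python sketch. It stores a toy KG as (head, relation, tail) triples and scores candidate edges with the TransE distance ||h + r - t||, where a lower distance means a more plausible edge. Everything specific here is an assumption for illustration: the entity and relation names (`drugA`, `geneX`, `targets`, and so on) are hypothetical, and the 2-D vectors are set by hand so that each known triple satisfies h + r = t exactly. A real pipeline would learn the vectors from the graph rather than assign them.

```python
import numpy as np

# Toy knowledge graph as (head, relation, tail) triples.
# All names are hypothetical placeholders, not records from any database.
triples = [
    ("drugA", "targets", "geneX"),
    ("drugB", "targets", "geneY"),
    ("geneX", "associated_with", "disease1"),
    ("geneY", "associated_with", "disease2"),
]

# Hand-set 2-D embeddings (an assumption for illustration; normally learned).
entity_vec = {
    "drugA": np.array([0.0, 0.0]),
    "drugB": np.array([0.0, 1.0]),
    "geneX": np.array([1.0, 0.0]),
    "geneY": np.array([1.0, 1.0]),
    "disease1": np.array([1.0, 2.0]),
    "disease2": np.array([1.0, 3.0]),
}
relation_vec = {
    "targets": np.array([1.0, 0.0]),
    "associated_with": np.array([0.0, 2.0]),
}


def transe_score(head, relation, tail):
    """TransE distance ||h + r - t||; lower means a more plausible triple."""
    diff = entity_vec[head] + relation_vec[relation] - entity_vec[tail]
    return float(np.linalg.norm(diff))


def rank_tails(head, relation):
    """Rank every entity as a candidate tail, most plausible first."""
    return sorted(entity_vec, key=lambda tail: transe_score(head, relation, tail))


print(transe_score("drugA", "targets", "geneX"))   # 0.0
print(rank_tails("geneY", "associated_with")[:2])  # ['disease2', 'disease1']
```

With learned embeddings, high-ranking candidates that are not yet edges in the graph would be the proposed new drug-target or gene-disease associations.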

How to Use

Click each pipeline node to see its description and key methods
Follow branches — Trace how KG construction feeds into three parallel ML approaches
Explore applications — See how each ML method contributes to biological discovery

Iframe Embed Code

<iframe src="https://dmccreary.github.io/bioinformatics/sims/kg-integrated-pipeline/main.html"
        height="550"
        width="100%"
        scrolling="no"></iframe>

Lesson Plan

Grade Level

College introductory bioinformatics

Duration

15-20 minutes

Prerequisites

Understanding of knowledge graphs
Basic understanding of machine learning concepts
Familiarity with graph-based data representations

Activities

Exploration (5 min): Click each node and read its description. Note which methods fall under each ML branch.
Method Comparison (5 min): Compare graph embeddings with graph neural networks (GNNs). What are the key differences in how each learns from graph structure?
Application Design (5 min): You want to predict which existing drugs might treat a new disease. Which branch of the pipeline would you use? Trace the full path from data sources to prediction.
Assessment (3 min): Answer the reflection questions below.
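
For the Method Comparison activity, the message-passing step that sets GNNs apart from embedding lookups can be sketched in a few lines of NumPy. This is a minimal sketch of one GCN-style layer, assuming the commonly used symmetric normalization with self-loops. The 3-node path graph, one-hot features, and identity weight matrix are toy assumptions chosen so that the output shows only the neighborhood mixing; it is not a trained model.

```python
import numpy as np


def gcn_layer(adj, features, weights):
    """One GCN-style propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops
    deg = a_hat.sum(axis=1)                  # node degrees including self-loops
    d_inv_sqrt = np.diag(deg ** -0.5)        # D^-1/2
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt
    return np.maximum(norm_adj @ features @ weights, 0.0)  # ReLU


# Toy 3-node path graph: 0 - 1 - 2 (an assumption for illustration).
adj = np.array([
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
])
features = np.eye(3)  # one-hot node features
weights = np.eye(3)   # identity weights, so the output equals the normalized adjacency

out = gcn_layer(adj, features, weights)
print(out.round(3))
```

After one layer, node 0 carries information from itself and its neighbor node 1 but nothing from node 2, which is two hops away. Stacking a second layer would let that information reach it, whereas an embedding method such as TransE assigns each entity a vector without this explicit neighbor aggregation.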

Assessment

What are the three machine learning branches in this pipeline, and how do they differ?
Why is link prediction particularly useful for drug repurposing?
How do graph embeddings convert a knowledge graph into a format suitable for machine learning?
What role do ontologies play in the KG construction stage?