Node2Vec Biased Random Walk

Run the Node2Vec Biased Random Walk MicroSim Fullscreen
Edit in the p5.js Editor

About This MicroSim

This MicroSim visualizes the node2vec biased random walk algorithm, a foundational technique for learning low-dimensional vector representations (embeddings) of nodes in a graph. Node2vec embeddings are widely used in bioinformatics for tasks such as protein function prediction, drug-target interaction discovery, and disease gene prioritization.

How Node2Vec Works

Node2vec generates node embeddings by first collecting sequences of nodes through biased random walks, then feeding those sequences into a word2vec-style model (Skip-gram). The key insight is that the walk's bias — controlled by two parameters p and q — determines what kind of structural information the embedding captures.

The p and q Parameters

At each step, the walker is at a current node and came from a previous node. Every neighbor of the current node is assigned an unnormalized transition weight based on its relationship to the previous node:

1/p (return weight) — applied to the previous node. Low p encourages the walker to backtrack, producing walks that stay near the starting node and capture local community structure.
1 (stay weight) — applied to neighbors that are also neighbors of the previous node. These are "BFS-like" moves that explore the local neighborhood.
1/q (outward weight) — applied to neighbors that are not connected to the previous node. Low q encourages the walker to move outward into unexplored territory, producing longer-range "DFS-like" walks that capture structural equivalence.

Intuition for Parameter Settings

Setting	Walk Behavior	Captures
Low p, high q	Stays close, revisits neighbors	Local community / homophily
High p, low q	Explores outward, avoids backtracking	Structural role / equivalence
p = 1, q = 1	Unbiased (DeepWalk equivalent)	Balanced mix

Why This Matters for Bioinformatics

In protein-protein interaction (PPI) networks, node2vec embeddings can:

Predict protein function — proteins with similar embeddings tend to share Gene Ontology annotations
Identify drug targets — by embedding drug-target-disease graphs and finding nodes close in embedding space
Detect disease modules — low-p walks capture tightly connected protein complexes that often correspond to disease pathways
Transfer annotations — structurally equivalent nodes (captured by low-q walks) often perform analogous roles even if they are distant in the network

How to Use

p slider — Controls the return parameter. Slide left (low p) to encourage backtracking; slide right (high p) to discourage it
q slider — Controls the in-out parameter. Slide left (low q) to encourage outward exploration (DFS-like); slide right (high q) to encourage local exploration (BFS-like)
Step — Advance the walk by one step. Green dashed lines show candidate neighbors with their transition weight labels (1/p, 1, or 1/q)
Auto — Toggle automatic stepping to watch the walk unfold
Reset — Return the walker to the starting node

Color Legend

Red — Current node (walker's position)
Green — Candidate neighbors for the next step
Orange — Previously visited nodes in the walk trail
Blue — Unvisited nodes

The side panel shows the full walk history, current parameter values, and the probability each candidate has of being selected next.

Try these experiments:

Set p = 0.25, q = 4 — watch the walker repeatedly revisit the same small cluster of nodes (BFS / community detection behavior)
Set p = 4, q = 0.25 — watch the walker traverse long paths across the graph without backtracking (DFS / structural role behavior)
Set p = 1, q = 1 — observe the unbiased random walk (equivalent to the original DeepWalk algorithm)
Run several walks with the same parameters and notice how the stochastic nature produces different paths each time

Iframe Embed Code

You can add this MicroSim to any web page by adding this to your HTML:

<iframe src="https://dmccreary.github.io/bioinformatics/sims/node2vec-random-walk/main.html"
        height="522"
        width="100%"
        scrolling="no"></iframe>

Lesson Plan

Grade Level

College introductory bioinformatics

Duration

15-20 minutes

Prerequisites

Basic understanding of graphs (nodes, edges, neighbors)
Familiarity with random walks on graphs
Introduction to the concept of node embeddings and why vector representations of biological entities are useful

Activities

Exploration (5 min): With default settings (p = 1, q = 1), click Step repeatedly and observe the walk path. Note which nodes get visited and how the walk spreads through the graph. This is the unbiased DeepWalk baseline.
BFS-like behavior (5 min): Set p = 0.25 and q = 4. Reset and step through 10-15 steps. Observe how the walker stays within a local neighborhood. Discuss why this behavior would capture protein complex membership in a PPI network.
DFS-like behavior (5 min): Set p = 4 and q = 0.25. Reset and step through 10-15 steps. Observe how the walker moves outward through the graph. Discuss why this behavior would capture structural roles (e.g., hub proteins vs. peripheral proteins).
Discussion (5 min): Compare the walk histories from activities 2 and 3. If two proteins always appear in the same BFS-like walks, what does that suggest about their relationship? What if they appear in similar DFS-like walks but are in different parts of the network?

Assessment

Explain in your own words what the p parameter controls and how it affects the random walk behavior.
A researcher wants to use node2vec to find proteins that belong to the same protein complex. Should they use low or high p? Low or high q? Explain your reasoning.
Why does node2vec use biased random walks instead of simple unbiased random walks? What information would be lost with unbiased walks?
Two proteins have very similar node2vec embeddings generated with low q. Are they necessarily close together in the PPI network? What can you infer about them?