Sequence Similarity Network
About This MicroSim
This MicroSim visualizes a sequence similarity network (SSN) — a graph where proteins are nodes and edges connect proteins with significant sequence similarity as determined by BLAST. An E-value threshold slider controls which edges are displayed, letting students explore how stringency affects network structure.
Visual Encoding
- Node colors — Proteins are colored by protein family (e.g., kinases, proteases, transcription factors)
- Edges — Connect proteins with BLAST E-values below the current threshold
- E-value slider — Adjusts the significance threshold. Low E-values (strict) show only highly similar pairs; high E-values (permissive) reveal more distant relationships.
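The edge rule above can be sketched in a few lines of Python. This is only an illustration, not the MicroSim's own code: the protein names and E-values in `HITS` are invented, `edges_at_threshold` is a hypothetical helper, and the sketch assumes a hit is shown when its E-value is at or below the slider value.

```python
# Hypothetical pairwise BLAST hits: (query, subject, E-value).
# All names and values are invented for illustration.
HITS = [
    ("kinA", "kinB", 1e-80),
    ("kinA", "kinC", 1e-45),
    ("proA", "proB", 1e-60),
    ("kinC", "proA", 1e-4),
    ("tfA", "tfB", 1e-30),
    ("tfB", "proB", 0.5),
]


def edges_at_threshold(hits, max_evalue):
    """Return the undirected edges whose E-value is at or below max_evalue.

    Assumption: "at or below" is used here; the simulation may use a
    strict "below" comparison instead.
    """
    edges = set()
    for query, subject, evalue in hits:
        if query != subject and evalue <= max_evalue:
            # Sort the pair so (a, b) and (b, a) count as the same edge.
            edges.add(tuple(sorted((query, subject))))
    return edges


strict = edges_at_threshold(HITS, 1e-40)      # only highly similar pairs
permissive = edges_at_threshold(HITS, 1.0)    # distant relationships too
print(len(strict), "edges at 1e-40;", len(permissive), "edges at 1.0")
```

Lowering `max_evalue` can only remove edges, which is why the network becomes sparser as the slider moves toward stricter values.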
Why Sequence Similarity Networks?
SSNs are an alternative to phylogenetic trees for visualizing relationships among large protein families:
- They scale to thousands of sequences (trees become unreadable at this scale)
- They naturally reveal protein family boundaries as disconnected clusters
- They can show relationships between distantly related families that share structural or functional features
How to Use
- E-value threshold slider — Adjust to control which BLAST hits are shown as edges
- Hover over nodes for protein names and family assignments
- Observe clusters — At strict thresholds, clusters correspond to protein families
- Relax the threshold — Watch how families merge as distant similarities become visible
Suggested Experiments
- Start with a very strict threshold (low E-value) — each protein family should appear as a separate cluster
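- Gradually relax the threshold and watch when clusters begin to merge. The E-value at which two families merge gives a rough indication of how distantly related they are: the more permissive the threshold needed, the weaker the similarity linking them
- At very permissive thresholds, most proteins may connect into a single giant component. This illustrates the "twilight zone" of sequence similarity, where weak hits can no longer be reliably distinguished from chance matches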
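The experiments above amount to counting connected components while the threshold is relaxed. The following is a minimal sketch of that sweep using a union-find structure; the hit list is invented, `count_clusters` is a hypothetical helper, and the MicroSim itself may compute clusters differently.

```python
# Hypothetical proteins and BLAST hits (query, subject, E-value); values invented.
NODES = ["kinA", "kinB", "kinC", "proA", "proB", "tfA", "tfB"]
HITS = [
    ("kinA", "kinB", 1e-80),
    ("kinA", "kinC", 1e-45),
    ("proA", "proB", 1e-60),
    ("kinC", "proA", 1e-4),
    ("tfA", "tfB", 1e-30),
    ("tfB", "proB", 0.5),
]


def count_clusters(nodes, hits, max_evalue):
    """Count connected components using only hits with E-value <= max_evalue."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, evalue in hits:
        if evalue <= max_evalue:
            parent[find(a)] = find(b)  # merge the two components
    return len({find(n) for n in nodes})


# Sweep from strict to permissive and record the number of clusters.
thresholds = [1e-40, 1e-20, 1e-3, 1.0]
cluster_counts = [count_clusters(NODES, HITS, t) for t in thresholds]
for t, c in zip(thresholds, cluster_counts):
    print(f"E <= {t:g}: {c} cluster(s)")
```

With this toy data the cluster count falls step by step as the threshold is relaxed, ending in a single giant component at the most permissive setting.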
Lesson Plan
Grade Level
College introductory bioinformatics
Duration
15-20 minutes
Prerequisites
- Understanding of BLAST and sequence similarity
- Concept of E-values as significance measures
- Familiarity with protein families
Activities
- Exploration (5 min): Find the strictest threshold where all same-colored proteins are still connected. This is a good working threshold for separating families, because any stricter value starts to fragment at least one family.
- Family Boundaries (5 min): Relax the threshold until two families merge. What E-value does this happen at? What does this tell you about the evolutionary relationship between these families?
- Discussion (5 min): How does an SSN compare to a phylogenetic tree for analyzing protein family relationships? What can SSNs show that trees cannot, and vice versa?
- Assessment (5 min): Answer the reflection questions below.
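For instructors who want to check answers to the Exploration and Family Boundaries activities against a data set, both searches can be sketched as follows. Everything here is an assumption made for illustration: the family labels and E-values are invented, the helper names are hypothetical, and "connected" is read as "all members of a family lie in one connected component".

```python
# Hypothetical family assignments and BLAST hits; all values invented.
FAMILY = {
    "kinA": "kinase", "kinB": "kinase", "kinC": "kinase",
    "proA": "protease", "proB": "protease",
    "tfA": "tf", "tfB": "tf",
}
HITS = [
    ("kinA", "kinB", 1e-80), ("kinA", "kinC", 1e-45),
    ("proA", "proB", 1e-60), ("kinC", "proA", 1e-4),
    ("tfA", "tfB", 1e-30), ("tfB", "proB", 0.5),
]


def component_labels(nodes, hits, max_evalue):
    """Map each node to a representative node of its connected component."""
    adjacency = {n: set() for n in nodes}
    for a, b, evalue in hits:
        if evalue <= max_evalue:
            adjacency[a].add(b)
            adjacency[b].add(a)
    label = {}
    for start in nodes:
        if start in label:
            continue
        label[start] = start
        stack = [start]  # depth-first traversal from this node
        while stack:
            current = stack.pop()
            for neighbor in adjacency[current]:
                if neighbor not in label:
                    label[neighbor] = start
                    stack.append(neighbor)
    return label


def strictest_intact_threshold(hits, family):
    """Smallest observed E-value at which every family is one component."""
    for t in sorted({e for _, _, e in hits}):
        label = component_labels(family, hits, t)
        per_family = {}
        for node, fam in family.items():
            per_family.setdefault(fam, set()).add(label[node])
        if all(len(components) == 1 for components in per_family.values()):
            return t
    return None  # some family never becomes fully connected


def merge_threshold(hits, family, fam_a, fam_b):
    """Smallest observed E-value at which fam_a and fam_b share a component."""
    for t in sorted({e for _, _, e in hits}):
        label = component_labels(family, hits, t)
        comps_a = {label[n] for n, f in family.items() if f == fam_a}
        comps_b = {label[n] for n, f in family.items() if f == fam_b}
        if comps_a & comps_b:
            return t
    return None  # the two families never merge


print(strictest_intact_threshold(HITS, FAMILY))
print(merge_threshold(HITS, FAMILY, "kinase", "protease"))
```

Only E-values that actually occur in the hit list are tried as thresholds, since the network can change only at those values.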
Assessment
- What does an edge in a sequence similarity network represent?
- How does the E-value threshold affect the number of clusters in the network?
- Why might two proteins from different families still be connected in an SSN?
- What advantage do SSNs have over phylogenetic trees for very large protein families?