Skip to content

Protein-Protein Interaction Networks

Summary

Covers experimental PPI detection methods (yeast two-hybrid, co-immunoprecipitation, mass spectrometry), interaction databases, network topology analysis including hubs and bottlenecks, dynamic networks, and graph models for protein interactions.

Concepts Covered

This chapter covers the following 30 concepts from the learning graph:

  1. Protein Interaction Network
  2. Interactome
  3. Yeast Two-Hybrid
  4. Co-Immunoprecipitation
  5. Affinity Purification MS
  6. Cross-Linking Mass Spec
  7. PPI Confidence Scoring
  8. Binary vs Complex PPIs
  9. Network Hubs
  10. Network Bottlenecks
  11. Network Modules
  12. Graph Model for PPIs
  13. Hub-and-Spoke Topology
  14. Date Hubs vs Party Hubs
  15. Essential Proteins
  16. Protein Complex Detection
  17. Clique Detection
  18. Dense Subgraph Mining
  19. Network Rewiring
  20. Dynamic PPI Networks
  21. Tissue-Specific PPIs
  22. Host-Pathogen PPIs
  23. Viral Interactome
  24. PPI Prediction Methods
  25. Interaction Domain Pairs
  26. Co-Evolution Analysis
  27. Network Alignment
  28. Network Comparison
  29. Graphlet Analysis
  30. Network Centrality in PPIs

Prerequisites

This chapter builds on concepts from:


Welcome, Explorers! Let's Connect the Dots!

Olli welcomes you Proteins rarely work alone. Inside every cell, thousands of proteins interact with one another — forming complexes, relaying signals, and assembling the molecular machinery that sustains life. In this chapter we will map those interactions as networks, apply graph algorithms to uncover the organizational principles hidden in the wiring, and see how network analysis can identify the proteins that matter most in health and disease. Let's connect the dots!

The Protein Interaction Landscape

A protein interaction network (PIN) represents the physical contacts between proteins in a cell as a graph \(G = (V, E)\), where each node \(v \in V\) is a protein and each edge \((u, v) \in E\) represents a physical interaction between proteins \(u\) and \(v\). These networks are typically undirected, because binding is a mutual event, though edge weights or confidence scores can encode the strength or reliability of each interaction.

The complete set of protein-protein interactions in an organism is called its interactome. The human interactome is estimated to contain roughly 20,000 proteins and between 130,000 and 650,000 binary interactions, though current experimental maps cover only a fraction of this space. Model organisms such as Saccharomyces cerevisiae (budding yeast) have the most thoroughly characterized interactomes, providing a reference framework for understanding the organizing principles of interaction networks.

Why study the interactome as a graph? Because network structure reveals biological function. Proteins that cluster together in the network often participate in the same biological process. Highly connected proteins tend to be essential for survival. Proteins that bridge different network neighborhoods often serve as regulatory control points. These are graph-theoretic insights — degree, clustering, betweenness — applied to the molecular biology of the cell.

Diagram: Protein Interaction Network Overview

View PPI Network Overview MicroSim Fullscreen

Diagram Details sim-id: ppi-network-overview Library: p5.js Status: Specified An interactive force-directed graph visualization of a sample protein interaction network. Nodes represent proteins, sized by degree. Hub proteins are highlighted in a distinct color. Students can hover over a node to see its name, degree, and functional annotation. A sidebar panel shows global network statistics: number of nodes, edges, average degree, clustering coefficient, and network diameter. Students can toggle between a full network view and a filtered view showing only high-confidence interactions.

Experimental Methods for Detecting PPIs

Mapping the interactome requires experimental methods that can detect which proteins physically bind to one another. These methods differ in throughput, sensitivity, and the type of interaction they detect. Understanding their strengths and limitations is essential for interpreting PPI data correctly.

Binary Interaction Methods

The yeast two-hybrid (Y2H) system is the workhorse of binary interaction detection. It exploits the modular nature of transcription factors, which consist of a DNA-binding domain (BD) and an activation domain (AD). In a Y2H assay, one protein of interest (the "bait") is fused to the BD, and a second protein (the "prey") is fused to the AD. If bait and prey physically interact inside a yeast cell, the BD and AD are brought together, reconstituting a functional transcription factor that activates a reporter gene. Growth on selective media or color change indicates an interaction.

Y2H detects binary PPIs — direct physical contacts between two proteins — making it the gold standard for identifying pairwise interactions. Large-scale Y2H screens have mapped thousands of interactions in yeast, fly, worm, and human. However, Y2H has notable limitations: it operates in the yeast nucleus (which may not reflect the native cellular compartment), it can miss interactions that require post-translational modifications absent in yeast, and it produces a significant false-positive rate estimated at 25-50% in early studies.

Co-immunoprecipitation (Co-IP) takes a complementary approach. An antibody against a target protein (the bait) is used to pull the bait and any proteins bound to it out of a cell lysate. The precipitated material is then analyzed — typically by western blot for targeted detection or by mass spectrometry for unbiased identification of all co-precipitated proteins. Co-IP detects interactions in the context of native protein complexes, preserving post-translational modifications and multi-protein assemblies.

Mass Spectrometry-Based Methods

Affinity purification coupled with mass spectrometry (AP-MS) scales up the Co-IP concept for systematic interactome mapping. A tagged bait protein is expressed in cells, purified along with its binding partners using affinity chromatography, and the co-purified proteins are identified by tandem mass spectrometry (LC-MS/MS). AP-MS excels at identifying the composition of complex PPIs — multi-protein assemblies in which the bait may contact some partners directly and others indirectly through bridging proteins.

The distinction between binary vs complex PPIs is important for network construction. Y2H identifies direct binary contacts between two proteins, producing edges that represent true physical binding. AP-MS identifies membership in a complex, but the exact pairwise contacts within the complex are ambiguous. This ambiguity creates a modeling choice: should a complex of proteins A, B, and C be represented as a clique (edges AB, AC, BC) or as a hub-and-spoke topology with the bait at the center (edges AB, AC only)?

Cross-linking mass spectrometry (XL-MS) resolves this ambiguity by providing distance constraints between specific residues. Bifunctional cross-linking reagents covalently connect lysine residues (or other reactive amino acids) that are within a defined distance threshold — typically 20-30 angstroms. After enzymatic digestion, cross-linked peptide pairs are identified by mass spectrometry. XL-MS not only confirms direct binary contacts within a complex but also provides structural information about the spatial arrangement of subunits.

Method Interaction Type Throughput Key Strength Key Limitation
Yeast two-hybrid Binary High (genome-scale) Detects direct pairwise contacts High false-positive rate; nuclear context only
Co-immunoprecipitation Complex Low-medium Native cellular conditions Cannot distinguish direct from indirect contacts
Affinity purification MS Complex High (systematic) Identifies complex composition Spoke vs matrix ambiguity
Cross-linking MS Binary (with distance) Medium Provides structural distance constraints Limited to cross-linkable residue pairs

PPI Confidence Scoring

Not all experimentally detected interactions are equally reliable. PPI confidence scoring systems integrate multiple lines of evidence to assign a probability that a reported interaction is a true positive. The STRING database, for example, combines evidence from experimental assays, computational prediction, text mining, and co-expression data into a combined score between 0 and 1. An interaction supported by multiple independent experiments, conserved across species, and connecting proteins with correlated expression patterns receives a high confidence score.

Confidence scores matter for network analysis because the topology of a PPI network changes dramatically depending on the score threshold. At low thresholds, the network is dense and noisy with many false positives. At high thresholds, the network is sparse and reliable but may miss real interactions. Investigators must choose a threshold that balances sensitivity and specificity for their analytical goals.

What's the Link?

Olli is thinking Think of confidence scoring as edge filtering on a weighted graph. Each edge has a weight representing the evidence for that interaction. By setting a threshold \(\tau\) and keeping only edges with weight \(\geq \tau\), you are pruning the graph. The choice of \(\tau\) affects every downstream analysis — degree distributions, hub identification, and module detection all change with the confidence cutoff. Always report which threshold you used!

The Graph Model for Protein Interactions

The formal graph model for PPIs represents the interactome as an undirected graph \(G = (V, E)\) with optional edge weights:

\[G = (V, E, w) \quad \text{where} \quad V = \{p_1, p_2, \ldots, p_n\}, \quad E \subseteq V \times V, \quad w: E \rightarrow \mathbb{R}^+\]

Each node \(p_i\) represents a protein, and an edge \((p_i, p_j)\) indicates that proteins \(p_i\) and \(p_j\) physically interact. The weight function \(w\) can encode confidence scores, binding affinities, or stoichiometric ratios. This simple yet powerful representation enables the application of the full toolkit of graph theory from Chapter 4 to the biology of protein interactions.

In practice, PPI networks exhibit several characteristic topological properties:

  • Scale-free degree distribution: The degree distribution follows an approximate power law \(P(k) \sim k^{-\gamma}\) with \(\gamma\) typically between 1.5 and 3, meaning most proteins have few interactions while a small number have very many.
  • Small-world property: The average shortest path length is small (typically 4-5 for the yeast interactome), and the clustering coefficient is high relative to a random graph of the same size.
  • Modularity: The network contains densely connected subgraphs (modules) that correspond to functional units such as protein complexes and signaling pathways.

Diagram: PPI Network Topology Explorer

View PPI Topology Explorer MicroSim Fullscreen

Diagram Details sim-id: ppi-topology-explorer Library: p5.js Status: Specified An interactive visualization that displays a PPI network and allows students to explore its topological properties. The left panel shows a force-directed layout of the network with nodes colored by degree. The right panel displays interactive charts: a degree distribution histogram (log-log scale), a clustering coefficient distribution, and a shortest path length distribution. Students can click on individual nodes to highlight their neighborhood and see local metrics. A confidence threshold slider dynamically filters edges and updates all statistics in real time.

Network Hubs, Bottlenecks, and Centrality

Network Hubs

Network hubs are proteins with significantly more interaction partners than the average protein in the network. Formally, a hub is a node whose degree \(k_i\) exceeds a threshold, often defined as one or two standard deviations above the mean degree \(\langle k \rangle\). In the yeast interactome, hub proteins interact with 15 or more partners, while the average protein interacts with only 4-6 others.

Hubs are biologically important because they occupy central positions in the network's wiring diagram. Studies in yeast have shown that hub proteins are more likely to be essential proteins — those whose deletion is lethal. This observation, sometimes called the centrality-lethality rule, was among the first demonstrations that network topology has direct biological consequences. Approximately 60% of hub proteins in the yeast interactome are essential, compared to only 20% of proteins with few interactions.

Date Hubs vs Party Hubs

Not all hubs are the same. Han and colleagues (2004) proposed a distinction between date hubs and party hubs based on the co-expression patterns of a hub and its interaction partners.

  • Party hubs interact with most of their partners simultaneously. Their interaction partners are co-expressed at the same time and place, suggesting that party hubs function as permanent scaffolds within stable protein complexes. Removal of a party hub disrupts its local complex but may have limited effects on the broader network.
  • Date hubs interact with different partners at different times or in different cellular contexts. Their partners are not co-expressed with one another, indicating that date hubs serve as dynamic connectors that link different functional modules. Removal of a date hub can fragment the network by severing connections between modules.

This distinction maps onto graph-theoretic concepts: party hubs have high local clustering coefficients (their partners also interact with each other), while date hubs have low clustering coefficients (their partners do not form a clique).

Network Bottlenecks and Centrality

Network bottlenecks are proteins with high betweenness centrality — they lie on many shortest paths between other proteins in the network. The betweenness centrality of a node \(v\) is defined as:

\[C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}\]

where \(\sigma_{st}\) is the total number of shortest paths from node \(s\) to node \(t\), and \(\sigma_{st}(v)\) is the number of those paths passing through \(v\).

Network centrality in PPIs extends beyond betweenness to include closeness centrality, eigenvector centrality, and PageRank. Each measure captures a different aspect of a protein's importance:

Centrality Measure What It Captures Biological Interpretation
Degree centrality Number of direct partners Binding promiscuity; hub status
Betweenness centrality Control of information flow Bottleneck; regulatory gatekeeper
Closeness centrality Proximity to all other proteins Speed of signal propagation
Eigenvector centrality Connections to other well-connected nodes Influence in the network core

Bottlenecks are not always hubs. A protein can have modest degree but still sit at a critical junction between two densely connected modules, giving it high betweenness. Yu and colleagues (2007) showed that bottleneck proteins are essential more frequently than would be expected by their degree alone, suggesting that control of information flow is a distinct axis of biological importance.

Explorer's Tip

Olli has a tip When analyzing a PPI network, always compute multiple centrality measures rather than relying on degree alone. A protein with high betweenness but low degree may be an important regulatory switch that is easy to miss if you only look at hub status. Network analysis tools like NetworkX and Cytoscape can compute all standard centrality measures in a single pass.

Network Modules and Protein Complex Detection

Network Modules

Network modules are groups of proteins that are more densely connected to each other than to the rest of the network. Modules in PPI networks often correspond to functional units: protein complexes, signaling pathways, or metabolic subsystems. Module detection is one of the most important applications of graph algorithms to interactome analysis.

Protein Complex Detection

Protein complex detection algorithms identify densely connected subgraphs in the PPI network that are likely to represent physical protein complexes. Several graph-theoretic approaches have been developed specifically for this task.

Clique detection searches for complete subgraphs — sets of proteins in which every pair interacts. A clique of size \(k\) in the PPI network suggests a complex in which all \(k\) proteins are in direct physical contact. The Clique Percolation Method (CPM) finds overlapping communities by identifying adjacent \(k\)-cliques (cliques that share \(k-1\) nodes) and merging them into communities. This allows a single protein to belong to multiple complexes, reflecting biological reality.

However, strict clique requirements are often too stringent for noisy PPI data where some true interactions may be missing. Dense subgraph mining relaxes the clique requirement, searching instead for subgraphs whose internal edge density exceeds a threshold. The MCODE algorithm, widely used in Cytoscape, identifies dense regions by first weighting nodes by their local neighborhood density, then expanding clusters outward from seed nodes with the highest weights. The MCL (Markov Cluster) algorithm simulates random walks on the graph, exploiting the fact that a random walker tends to become trapped within densely connected modules.

Algorithm Approach Overlap Allowed Typical Use
Clique Percolation (CFinder) Adjacent \(k\)-cliques Yes Finding overlapping complexes
MCODE Seed-and-expand by density Limited Cytoscape-based complex detection
MCL Random walk / flow simulation No Large-scale module detection
ClusterONE Cohesiveness optimization Yes Balanced complex detection

Diagram: Module Detection in PPI Networks

View Module Detection MicroSim Fullscreen

Diagram Details sim-id: ppi-module-detection Library: p5.js Status: Specified An interactive visualization showing a PPI network with modules highlighted by color. Students can select different detection algorithms from a dropdown (MCODE, MCL, Clique Percolation) and see how the identified modules change. A side panel lists detected modules with their sizes, densities, and GO enrichment terms. Clicking on a module zooms in to show the internal structure and highlights inter-module connector proteins (bottlenecks) in a distinct color.

Dynamic and Context-Specific Networks

Dynamic PPI Networks

The interactome is not a static wiring diagram. Dynamic PPI networks change over time as proteins are synthesized, degraded, modified, and translocated between cellular compartments. An interaction can exist only when both partners are present in the same compartment at the same time. Modeling these dynamics requires integrating PPI data with expression data, localization data, and cell cycle information.

Network rewiring refers to changes in the interaction network in response to cellular signals, environmental stress, or disease. When a kinase phosphorylates a protein, it may create or destroy binding interfaces, adding or removing edges in the network. During the cell cycle, entire modules are assembled and disassembled as proteins are expressed and degraded in coordinated waves. This rewiring means that the "true" interactome at any given moment is a time-dependent subgraph of the complete reference interactome.

Tissue-Specific PPIs

Not all proteins are expressed in every tissue. Tissue-specific PPIs arise because an interaction can occur only in tissues where both partners are expressed. The liver interactome differs from the brain interactome because each tissue expresses a different subset of the proteome. Constructing tissue-specific PPI networks involves filtering the reference interactome by tissue-specific gene expression data from resources such as GTEx (Genotype-Tissue Expression) or the Human Protein Atlas.

Tissue-specific networks reveal that network topology varies across tissues. Some hub proteins maintain their centrality across many tissues, while others are hubs only in specific contexts. This variation has implications for understanding tissue-specific disease mechanisms — a mutation in a protein that is a hub in cardiac tissue but peripheral in liver may cause heart disease without affecting liver function.

What's the Link?

Olli is thinking Think of the reference interactome as a master graph, and each tissue-specific network as an induced subgraph — the subgraph formed by keeping only the nodes (proteins) expressed in that tissue and the edges between them. The same graph, filtered through different expression profiles, produces different topologies. This is why the same mutation can have different consequences in different tissues!

Host-Pathogen Interactions and the Viral Interactome

Host-Pathogen PPIs

Host-pathogen PPIs describe the physical interactions between proteins from a pathogen (virus, bacterium, or parasite) and proteins from the host organism. These interactions are the molecular basis of infection: pathogens hijack host cellular machinery by binding to host proteins, redirecting signaling pathways, suppressing immune responses, and commandeering the protein synthesis apparatus.

Mapping host-pathogen interactions creates a bipartite graph — one set of nodes represents pathogen proteins, the other represents host proteins, and edges cross between the two sets. Analysis of this bipartite structure reveals which host pathways are most frequently targeted. Pathogens consistently target host hub and bottleneck proteins, presumably because disrupting these central nodes has the greatest impact on host cell function.

Viral Interactome

The viral interactome refers specifically to the interactions among viral proteins themselves and between viral proteins and host proteins. Viruses encode relatively few proteins (HIV has only 15, SARS-CoV-2 has 29), so each viral protein must interact with multiple host targets to accomplish the viral life cycle. The SARS-CoV-2 interactome, mapped by Gordon and colleagues (2020), identified 332 high-confidence interactions between 26 viral proteins and human host proteins. This map immediately suggested repurposable drug targets — host proteins whose inhibition might disrupt viral replication without directly targeting the virus.

Computational Prediction of PPIs

PPI Prediction Methods

Experimental interactome mapping is expensive and incomplete. PPI prediction methods use computational approaches to predict interactions that have not yet been experimentally observed. These methods draw on sequence features, structural information, evolutionary signals, and network context.

Sequence-based prediction exploits the observation that interacting proteins often contain complementary interaction domain pairs — pairs of protein domains that have been observed to mediate interactions in known complexes. If protein A contains domain X and protein B contains domain Y, and the X-Y domain pair is known to mediate interactions in other protein pairs, then A and B are predicted to interact. The iPfam and 3did databases catalog known domain-domain interactions derived from co-crystal structures.

Co-Evolution Analysis

Co-evolution analysis predicts interactions based on the principle that interacting proteins experience correlated evolutionary pressures. If a mutation in one protein's binding interface must be compensated by a corresponding mutation in its partner's interface, the two proteins will show correlated evolutionary rates across species. Methods such as mirror tree analysis compare phylogenetic distance matrices for pairs of protein families; high correlation between the distance matrices suggests co-evolution and, by inference, physical interaction.

More sophisticated approaches use direct coupling analysis (DCA) to identify co-evolving residue pairs between protein families, analogous to the intra-protein contact prediction methods that contributed to AlphaFold's success (Chapter 8). Inter-protein co-evolution signatures can predict not only whether two proteins interact but which residues are at the binding interface.

Comparing Interactomes: Network Alignment and Graphlets

Network Alignment

Network alignment is the process of finding correspondences between nodes in two or more PPI networks from different species such that both sequence similarity and network topology are preserved. Global network alignment seeks a one-to-one mapping between all nodes in two networks that maximizes a combined score of sequence similarity and topological similarity. Local network alignment identifies conserved subnetworks — regions of similar topology that may correspond to conserved functional modules.

The alignment problem is computationally related to the graph isomorphism problem and is NP-hard in general. Practical algorithms such as IsoRank, GRAAL, and HubAlign use heuristics to find good alignments in polynomial time. Network alignment enables the transfer of functional annotations from well-studied organisms (such as yeast) to less-studied organisms based on conserved network neighborhoods.

Network Comparison

Network comparison encompasses methods for quantifying the similarity or difference between two PPI networks without requiring a node-to-node alignment. Comparison metrics include global statistics (degree distribution, diameter, clustering coefficient), spectral properties (eigenvalue distributions of the adjacency or Laplacian matrices), and subgraph frequencies.

Graphlet Analysis

Graphlet analysis provides a fine-grained approach to network comparison. Graphlets are small connected non-isomorphic subgraphs — there are 2 graphlets on 2 nodes, 4 on 3 nodes, 11 on 4 nodes, and 30 on 5 nodes. The graphlet degree distribution (GDD) counts, for each node, the number of times it participates in each graphlet in each topologically distinct position (called an orbit). The resulting graphlet degree vector (GDV) provides a 73-dimensional signature for each node (considering graphlets up to 5 nodes).

Comparing graphlet frequency distributions between two networks provides a sensitive measure of topological similarity. Przulj (2007) showed that PPI networks from different organisms have similar graphlet frequency distributions that differ systematically from random graph models (Erdos-Renyi, scale-free, or geometric random graphs), suggesting that evolution has selected for specific local network motifs. The graphlet correlation distance (GCD) provides a scalar summary of the topological distance between two networks.

\[\text{GCD}(G_1, G_2) = \sqrt{\sum_{i,j} \left( r_{ij}^{(G_1)} - r_{ij}^{(G_2)} \right)^2}\]

where \(r_{ij}^{(G)}\) is the Spearman correlation between orbits \(i\) and \(j\) in graph \(G\).

Diagram: Graphlet Orbits and Network Comparison

View Graphlet Analysis MicroSim Fullscreen

Diagram Details sim-id: graphlet-analysis Library: p5.js Status: Specified An interactive visualization showing all graphlets up to 5 nodes with their orbits labeled. Students can select a node in a sample PPI network to see its graphlet degree vector displayed as a bar chart. A comparison panel shows graphlet frequency distributions for two networks (e.g., yeast and human PPI networks) side by side, with a computed graphlet correlation distance. Students can toggle between different random graph models to see how their graphlet distributions differ from biological networks.

Watch Out!

Olli warns you Be careful not to over-interpret PPI network topology without accounting for detection biases. Well-studied proteins have more reported interactions simply because they have been tested more often — this "study bias" can create artificial hubs. Always check whether a protein's high degree reflects genuine biological centrality or merely reflects its popularity as a research target. The number of PubMed citations for a protein correlates with its degree in curated PPI databases.

Case Study: Cancer Driver Gene Identification via PPI Network Analysis

Network analysis of the interactome has become a powerful tool for identifying cancer driver genes — genes whose mutations causally contribute to tumor development. This case study illustrates how PPI network topology can distinguish true cancer drivers from the thousands of passenger mutations that accumulate in tumor genomes.

The challenge. A typical tumor genome carries 50-100 nonsynonymous mutations, but only a handful of these are true drivers. Frequency-based methods identify recurrently mutated genes across patient cohorts, but rare drivers that are mutated in only a few patients are missed. Network-based approaches complement frequency analysis by identifying genes that, while individually rare, cluster together in the PPI network near known cancer pathways.

The network approach. Researchers construct a PPI network and overlay somatic mutation data from cancer genome sequencing projects (such as The Cancer Genome Atlas, TCGA). Proteins encoded by mutated genes are marked on the network. Network propagation algorithms — which spread mutation signals across network edges using random walks or heat diffusion — amplify weak signals from rare mutations by aggregating evidence from network neighbors. A gene that is not frequently mutated itself but whose network neighbors are frequently mutated receives a high propagation score.

The heat diffusion model propagates mutation scores according to:

\[\mathbf{f}(t) = e^{-\beta L t} \mathbf{f}(0)\]

where \(\mathbf{f}(0)\) is the initial mutation score vector, \(L\) is the graph Laplacian, \(\beta\) is a diffusion rate parameter, and \(t\) is the diffusion time. After propagation, proteins with high scores in \(\mathbf{f}(t)\) are candidate cancer drivers even if they were not frequently mutated in the original data.

Results. Network-based methods such as HotNet2 and NetSig have identified novel cancer driver genes that were missed by frequency-based approaches alone, including genes in chromatin remodeling complexes and RNA splicing machinery. The method works because cancer driver mutations tend to cluster in specific network modules corresponding to hallmark cancer pathways: cell cycle regulation, DNA damage response, PI3K-AKT signaling, and apoptosis. Module-based analysis also reveals which combinations of pathway disruptions co-occur within individual tumors, informing precision oncology strategies.

This case study demonstrates the practical value of the network concepts covered in this chapter. Hub and bottleneck analysis identifies the most critical nodes in cancer signaling networks. Module detection reveals the pathway-level organization of driver mutations. Network propagation leverages the graph structure to amplify weak signals. And confidence scoring ensures that the underlying PPI data is reliable enough to support clinical conclusions.

Key Takeaways

  1. Protein interaction networks represent physical protein-protein interactions as graphs, where nodes are proteins and edges are interactions. The complete set of interactions in an organism is its interactome.

  2. Experimental methods for detecting PPIs include yeast two-hybrid (binary interactions), co-immunoprecipitation, affinity purification mass spectrometry (complex membership), and cross-linking mass spectrometry (distance-constrained binary contacts). Each method has distinct strengths and biases.

  3. Confidence scoring integrates multiple evidence types to assign reliability scores to interactions. The choice of confidence threshold critically affects network topology and all downstream analyses.

  4. Network hubs (high-degree nodes) are more likely to be essential proteins. Date hubs connect different functional modules dynamically, while party hubs serve as stable scaffolds within complexes.

  5. Network bottlenecks (high betweenness centrality) control information flow between network modules and are essential independent of their degree. Multiple centrality measures capture different aspects of a protein's network importance.

  6. Module detection algorithms — including clique percolation, MCODE, and MCL — identify densely connected subgraphs that correspond to protein complexes and functional pathways.

  7. The interactome is dynamic: interactions change with cell cycle stage, tissue type, and environmental conditions. Tissue-specific networks are induced subgraphs of the reference interactome filtered by expression data.

  8. Host-pathogen PPIs reveal how pathogens hijack host hub and bottleneck proteins. The viral interactome maps interactions between a small number of viral proteins and hundreds of host targets.

  9. PPI prediction methods use interaction domain pairs, co-evolution analysis, and machine learning to predict interactions not yet experimentally detected.

  10. Network alignment, comparison, and graphlet analysis enable cross-species comparisons of interactome topology, revealing evolutionarily conserved network motifs and functional modules.

Great Work, Explorers!

Olli celebrates You have navigated the rich landscape of protein interaction networks — from the experimental benchtop to the computational graph. You now understand how to build, filter, analyze, and compare interactomes using the graph-theoretic tools from Chapter 4. In the next chapter, we will shift our focus to genome assembly and variation graphs, where graph structures capture the diversity of DNA sequences across populations. Onward, explorers!

See Annotated References