Quiz: Transcriptomics and Gene Regulatory Networks
Test your understanding of RNA-seq analysis, differential expression, co-expression networks, regulatory network inference, and single-cell transcriptomics.
1. Why do DESeq2 and edgeR use the negative binomial distribution rather than a normal distribution for modeling RNA-seq counts?
- RNA-seq counts are continuous and symmetric, which the negative binomial handles better
- The negative binomial distribution is computationally faster than the normal distribution
- RNA-seq counts are discrete with mean-variance coupling, where higher-expressed genes have higher variance, which the negative binomial captures appropriately
- The normal distribution can only model data with exactly three replicates
Show Answer
The correct answer is C. RNA-seq read counts are discrete (not continuous), skewed, and exhibit mean-variance coupling where genes with higher expression show proportionally higher variance. The negative binomial distribution models this relationship with a gene-specific dispersion parameter. Both DESeq2 and edgeR use empirical Bayes shrinkage to stabilize dispersion estimates when replicates are few. Using a normal distribution or t-test would violate distributional assumptions and inflate false positive rates.
Concept Tested: Statistical Testing for DE
2. What is the false discovery rate (FDR) and why is it important in differential expression analysis?
- FDR is the probability that a single gene is a false positive; it replaces the p-value entirely
- FDR controls the expected proportion of false positives among all genes declared significant, accounting for the multiple testing burden of testing thousands of genes
- FDR measures the fold change threshold required for a gene to be considered significant
- FDR is only relevant when analyzing fewer than 100 genes
Show Answer
The correct answer is B. When testing thousands of genes simultaneously, using a standard p-value threshold (e.g., 0.05) would produce hundreds of false positives by chance alone. The false discovery rate controls the expected proportion of false discoveries among all rejected hypotheses. The Benjamini-Hochberg procedure adjusts p-values to produce FDR-corrected values (often called q-values or adjusted p-values). An FDR threshold of 0.05 means that no more than 5% of genes called significant are expected to be false positives.
Concept Tested: False Discovery Rate
3. What does WGCNA identify in transcriptomic data?
- Individual differentially expressed genes between two conditions
- Modules of highly co-expressed genes using weighted correlation networks, often corresponding to shared biological functions
- The three-dimensional structure of RNA molecules
- Single nucleotide polymorphisms in transcribed regions
Show Answer
The correct answer is B. WGCNA (Weighted Gene Co-expression Network Analysis) builds a co-expression network where genes are nodes and edge weights reflect pairwise expression correlation across samples. It then identifies modules — clusters of genes with highly correlated expression patterns — using hierarchical clustering. These modules often correspond to shared biological pathways or functions, and module eigengenes can be correlated with clinical traits to identify disease-associated gene programs.
Concept Tested: WGCNA and Co-Expression Network
4. How does single-cell RNA-seq (scRNA-seq) differ from bulk RNA-seq?
- scRNA-seq measures only non-coding RNAs while bulk RNA-seq measures mRNA
- scRNA-seq profiles gene expression in individual cells, revealing cellular heterogeneity that bulk RNA-seq masks by averaging across thousands of cells
- scRNA-seq requires a reference genome while bulk RNA-seq does not
- scRNA-seq produces longer reads than bulk RNA-seq
Show Answer
The correct answer is B. Bulk RNA-seq measures the average expression across thousands or millions of cells in a sample, masking cell-to-cell variation. Single-cell RNA-seq profiles transcriptomes of individual cells, enabling discovery of rare cell types, mapping of cell state transitions, and construction of cell-type-specific regulatory networks. The trade-off is that scRNA-seq data is sparser (more zeros per gene due to dropout) and requires specialized computational methods for normalization and analysis.
Concept Tested: Single-Cell RNA-Seq
5. In a gene regulatory network, what do nodes and directed edges represent?
- Nodes represent metabolites and edges represent enzymatic reactions
- Nodes represent genes or transcription factors and directed edges represent regulatory relationships such as activation or repression
- Nodes represent chromosomes and edges represent physical linkage
- Nodes represent experimental samples and edges represent statistical correlations
Show Answer
The correct answer is B. In a gene regulatory network (GRN), nodes represent genes or transcription factors and directed edges represent regulatory relationships — indicating that one gene's product regulates the expression of another gene. Edges can represent activation (positive regulation) or repression (negative regulation). GRNs are inherently directed graphs because regulation flows from transcription factor to target gene. Inference methods like ARACNE and GENIE3 reconstruct these networks from expression data.
Concept Tested: Gene Regulatory Network and Graph Model for Regulation
6. What is the purpose of the read alignment step in an RNA-seq pipeline, and why must it be splice-aware?
- Read alignment converts RNA sequences to DNA sequences; splice awareness corrects sequencing errors
- Read alignment maps reads to a reference genome; splice-aware aligners handle reads that span exon-exon junctions where introns have been removed
- Read alignment removes duplicate reads; splice awareness identifies alternative promoters
- Read alignment counts the number of reads per gene; splice awareness normalizes for gene length
Show Answer
The correct answer is B. Read alignment maps trimmed RNA-seq reads back to the reference genome coordinates. Because RNA-seq reads come from mature mRNA where introns have been spliced out, many reads span exon-exon junctions and cannot be mapped contiguously to the genome. Splice-aware aligners like STAR and HISAT2 can identify these junction-spanning reads and map them correctly across the intron gap, which is essential for accurate transcript quantification.
Concept Tested: Read Alignment
7. What is a log2 fold change of -2 in differential expression analysis?
- The gene is expressed at twice the level in the treatment group
- The gene is expressed at four times the level in the treatment group
- The gene is expressed at one-quarter the level in the treatment group compared to control
- The gene shows no change in expression between conditions
Show Answer
The correct answer is C. A log2 fold change of -2 means \(2^{-2} = 0.25\), indicating the gene is expressed at one-quarter (25%) of its control level in the treatment group — a four-fold down-regulation. Log2 fold change treats up- and down-regulation symmetrically: +1 means two-fold up, -1 means two-fold down, +2 means four-fold up, -2 means four-fold down. This symmetry makes it the standard metric displayed on volcano plots and MA plots.
Concept Tested: Fold Change
8. How does ARACNE infer gene regulatory relationships from expression data?
- By measuring physical binding between proteins using mass spectrometry
- By computing mutual information between gene expression profiles and applying the data processing inequality to remove indirect interactions
- By aligning gene sequences to find regulatory motifs
- By comparing gene expression between exactly two experimental conditions
Show Answer
The correct answer is B. ARACNE (Algorithm for the Reconstruction of Accurate Cellular Networks) computes mutual information between all pairs of genes based on their expression profiles across multiple samples. It then applies the data processing inequality (DPI) to distinguish direct regulatory interactions from indirect ones: if gene A regulates gene B, and B regulates C, the mutual information between A and C will be the weakest of the three pairs, and the A-C edge is pruned. This produces a sparser, more accurate network.
Concept Tested: ARACNE and Mutual Information
9. What is trajectory analysis in single-cell transcriptomics?
- Tracking the physical movement of cells under a microscope
- Ordering single cells along a pseudotime axis to reconstruct continuous biological processes like differentiation, revealing how gene expression changes progressively
- Aligning RNA-seq reads to a reference trajectory stored in a database
- Measuring the speed at which RNA polymerase transcribes genes
Show Answer
The correct answer is B. Trajectory analysis (also called pseudotime analysis) computationally orders individual cells along a continuous path representing a biological process such as cell differentiation, immune activation, or disease progression. Even though all cells are measured at a single time point, cells at different stages of the process are captured simultaneously. Algorithms like Monocle and Slingshot reconstruct the ordering, enabling analysis of how gene expression changes progressively along the trajectory.
Concept Tested: Trajectory Analysis
10. What is an operon, and in which organisms is it primarily found?
- An operon is a cluster of co-regulated genes transcribed as a single mRNA unit, found primarily in prokaryotes
- An operon is a type of non-coding RNA found in eukaryotic genomes
- An operon is a protein complex that regulates transcription in all organisms
- An operon is an alternative splicing pattern unique to plant genomes
Show Answer
The correct answer is A. An operon is a cluster of functionally related genes that share a single promoter and are transcribed together as one polycistronic mRNA molecule. Operons are a hallmark of prokaryotic gene regulation, enabling coordinated expression of genes in the same metabolic pathway. The classic example is the lac operon in E. coli, which coordinates expression of genes for lactose metabolism. Operons are rare in eukaryotes, where each gene typically has its own promoter.
Concept Tested: Operon