Quiz: Biostatistics — Regression and Advanced Methods

Test your understanding of regression models, survival analysis, meta-analysis, and interrupted time series with these review questions.

1. In a linear regression model, the coefficient β₁ for a continuous predictor variable is interpreted as:

A. The probability that the outcome occurs for each one-unit increase in the predictor
B. The expected change in the outcome for a one-unit increase in the predictor, holding other variables constant
C. The odds of the outcome for individuals with the predictor present versus absent
D. The correlation coefficient between the predictor and the outcome

The correct answer is B. In linear regression, β₁ is the expected change in the continuous outcome variable for a one-unit increase in the predictor, with all other covariates held constant. This "all else equal" interpretation is what makes multiple regression useful for isolating the effect of one variable while controlling for confounders. Option A describes a predicted probability from logistic regression; option C describes an odds ratio.
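The "all else equal" reading can be checked with a few lines of arithmetic. The sketch below assumes a made-up fitted model for systolic blood pressure; the coefficients and the names `predict_sbp`, `B_AGE`, and `B_BMI` are hypothetical and chosen only for illustration.

```python
# Hypothetical fitted model (coefficients are made up for illustration):
#   systolic_bp = 90 + 0.5 * age + 1.25 * bmi
B0, B_AGE, B_BMI = 90.0, 0.5, 1.25


def predict_sbp(age: float, bmi: float) -> float:
    """Predicted systolic blood pressure (mmHg) from the hypothetical model."""
    return B0 + B_AGE * age + B_BMI * bmi


# Raise age by one year while holding BMI fixed at two different values.
diff_at_bmi_24 = predict_sbp(51, 24) - predict_sbp(50, 24)
diff_at_bmi_32 = predict_sbp(51, 32) - predict_sbp(50, 32)

# Both differences equal the age coefficient, whatever value BMI is held at.
print(diff_at_bmi_24, diff_at_bmi_32)
```

In both comparisons the predicted outcome rises by exactly the age coefficient, which is what "holding other variables constant" means in practice.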

Concept Tested: Linear Regression Coefficient Interpretation

2. Logistic regression is preferred over linear regression when the outcome variable is:

A. A continuous measurement like blood pressure
B. A count of events per unit time
C. A binary outcome such as disease/no disease
D. A time-to-event variable with censoring

The correct answer is C. Logistic regression is designed for binary (dichotomous) outcomes. It models the log-odds of the outcome as a linear function of the predictors, so each coefficient is a log-odds ratio and exponentiating it gives an odds ratio. Using linear regression for a binary outcome can produce predicted probabilities outside [0, 1] and violates the assumptions of normally distributed, constant-variance residuals. Poisson regression handles count data (option B); Cox regression handles time-to-event data (option D).
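A small numeric sketch may help connect log-odds, probabilities, and odds ratios. The coefficients below (`B0`, `B_AGE`) and the straight-line model in `linear_prob` are invented for illustration and are not estimates from any dataset.

```python
import math

# Hypothetical coefficients for the log-odds of disease as a function of age.
B0, B_AGE = -6.0, 0.08


def inv_logit(z: float) -> float:
    """Map a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def prob_disease(age: float) -> float:
    return inv_logit(B0 + B_AGE * age)


def odds(p: float) -> float:
    return p / (1.0 - p)


# The coefficient is a log-odds ratio; exponentiating gives the odds ratio
# for a one-year increase in age.
or_from_coef = math.exp(B_AGE)
or_from_probs = odds(prob_disease(61)) / odds(prob_disease(60))


# A straight-line "probability" model (also hypothetical) can leave [0, 1].
def linear_prob(age: float) -> float:
    return -0.5 + 0.02 * age


ages = [10, 40, 70, 100]
logistic_probs = [prob_disease(a) for a in ages]
linear_probs = [linear_prob(a) for a in ages]

print(round(or_from_coef, 4), round(or_from_probs, 4))
print([round(p, 3) for p in logistic_probs])
print([round(p, 3) for p in linear_probs])
```

The two odds ratios agree, the logistic predictions stay strictly between 0 and 1, and the linear predictions fall below 0 at age 10 and above 1 at age 100.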

Concept Tested: Logistic Regression Indications

3. The odds ratio from logistic regression approximates the relative risk most closely when:

A. The sample size is very large
B. The outcome is rare in the study population
C. All predictors are dichotomous
D. The model is adjusted for all measured confounders

The correct answer is B. When disease incidence is low — typically less than 10% — the OR approximates the RR because the denominator of the odds (1 − p) approaches 1.0. As outcome frequency increases, OR diverges from RR. For common outcomes, log-binomial or Poisson regression with robust variance can directly estimate the prevalence or risk ratio.
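The rare-outcome approximation is easy to verify numerically. The sketch below uses two hypothetical pairs of risks (exposed, unexposed) that share the same risk ratio of 2; the scenario values are assumptions chosen for illustration.

```python
def risk_ratio(p_exposed: float, p_unexposed: float) -> float:
    return p_exposed / p_unexposed


def odds_ratio(p_exposed: float, p_unexposed: float) -> float:
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical risks in the exposed and unexposed groups.
scenarios = {"rare": (0.02, 0.01), "common": (0.60, 0.30)}
results = {
    name: (risk_ratio(*risks), odds_ratio(*risks))
    for name, risks in scenarios.items()
}

for name, (rr, or_) in results.items():
    print(f"{name}: RR = {rr:.2f}, OR = {or_:.2f}")
```

With the rare outcome the OR is about 2.02, close to the RR of 2; with the common outcome the OR is 3.50, well above the same RR.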

Concept Tested: OR Approximation of RR

4. In survival analysis, censoring occurs when:

A. An outcome event is observed before the planned study end date
B. A participant's follow-up ends without the event having been observed, because of loss to follow-up, withdrawal, or the end of the study
C. Participants who experience the outcome early are excluded from analysis
D. The Kaplan-Meier curve reaches a cumulative survival probability of zero

The correct answer is B. Censoring occurs when a participant's exact event time is unknown because they were lost to follow-up, withdrew, or the study ended before they experienced the outcome. Censored participants still contribute the time during which they were observed to be event-free. The Kaplan-Meier estimator and the Cox proportional hazards model use this partial information, and they are valid when censoring is unrelated to the risk of the event (non-informative censoring); simply excluding censored participants would introduce bias.
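To see how censored participants contribute their observed time, here is a minimal Kaplan-Meier sketch in plain Python. The function name `kaplan_meier` and the six follow-up records are hypothetical; an event flag of 1 means the event was observed and 0 means the participant was censored at that time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather everyone whose follow-up ends at time t (event or censored).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Censored participants leave the risk set without changing survival.
        n_at_risk -= n_leaving
    return curve


# Hypothetical follow-up times in months; 1 = event observed, 0 = censored.
times = [2, 3, 3, 5, 8, 9]
events = [1, 0, 1, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```

The participants censored at months 3 and 8 stay in the denominator up to the moment they leave, so they shape the estimate without ever being counted as events.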

Concept Tested: Censoring in Survival Analysis

5. A forest plot in a meta-analysis displays:

A. The geographic distribution of studies included in the review
B. The effect estimate and confidence interval from each individual study plus the pooled estimate
C. The risk of bias assessment for each included study
D. The dose-response relationship between exposure and outcome across studies

The correct answer is B. A forest plot graphically displays each study's effect estimate and its confidence interval as a horizontal line with a box (sized by study weight), with a diamond at the bottom representing the pooled meta-analytic estimate. Visual inspection shows whether study results are consistent or heterogeneous. Risk-of-bias assessment (option C) is typically displayed in a separate table.
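The numbers behind a forest plot can be sketched as a small table. The example below assumes a fixed-effect (common-effect) model with inverse-variance weights, which is one common choice rather than the only one; the three studies, their log risk ratios, and their standard errors are made up.

```python
import math

# Hypothetical studies: (label, log risk ratio, standard error).
studies = [
    ("Study A", -0.30, 0.20),
    ("Study B", -0.10, 0.10),
    ("Study C", -0.25, 0.15),
]

# Inverse-variance weights: more precise studies get larger boxes.
weights = [1 / se**2 for _, _, se in studies]
pooled = sum(w * est for w, (_, est, _) in zip(weights, studies)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))


def ci95(est: float, se: float):
    """Normal-approximation 95% confidence interval."""
    return est - 1.96 * se, est + 1.96 * se


for (label, est, se), w in zip(studies, weights):
    low, high = ci95(est, se)
    share = w / sum(weights)
    print(f"{label:8s} {est:+.2f} [{low:+.2f}, {high:+.2f}]  weight {share:.0%}")

low, high = ci95(pooled, pooled_se)
print(f"{'Pooled':8s} {pooled:+.2f} [{low:+.2f}, {high:+.2f}]")
```

Each printed row corresponds to one horizontal line and box on the plot, and the final row corresponds to the diamond.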

Concept Tested: Forest Plot in Meta-Analysis

6. I² in a meta-analysis quantifies:

A. The number of studies with statistically significant findings
B. The proportion of total variability in effect estimates due to heterogeneity rather than chance
C. The weighted average effect size across all included studies
D. The probability that all studies share a common true effect size

The correct answer is B. I² ranges from 0% to 100% and estimates the proportion of observed variability in study estimates attributable to true heterogeneity (differences in populations, interventions, outcomes) rather than sampling error. A common rule of thumb treats values near 25%, 50%, and 75% as low, moderate, and high heterogeneity, respectively, although these cut-offs are rough guides rather than strict thresholds. High heterogeneity suggests a single pooled estimate may be misleading unless its sources are explored, for example through subgroup analysis or meta-regression.
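I² is usually derived from Cochran's Q as I² = max(0, (Q − df) / Q) × 100, where df is the number of studies minus one. The sketch below applies that formula with inverse-variance weights; the function name `i_squared` and both sets of study results are hypothetical.

```python
def i_squared(effects, std_errors):
    """I² (%) from Cochran's Q under inverse-variance weighting."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated at zero by convention.
    return max(0.0, (q - df) / q) * 100


# Hypothetical effect estimates with identical standard errors.
consistent = i_squared([0.30, 0.32, 0.28], [0.10, 0.10, 0.10])
scattered = i_squared([0.10, 0.50, 0.90], [0.10, 0.10, 0.10])

print(f"consistent studies: I² = {consistent:.0f}%")
print(f"scattered studies:  I² = {scattered:.0f}%")
```

When the estimates differ by no more than sampling error would predict, Q falls below its degrees of freedom and I² is 0%; widely scattered estimates with the same precision push I² above 90%.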

Concept Tested: Heterogeneity in Meta-Analysis (I²)

7. Interrupted time series (ITS) analysis is best suited to evaluate:

A. Randomized trials where participants cross over between treatment arms
B. The impact of a policy or population-level intervention using pre- and post-intervention trend data
C. Time-to-event outcomes where follow-up duration varies across participants
D. Seasonal variation in infectious disease incidence using spectral decomposition

The correct answer is B. ITS analysis uses pre-intervention time series data to model the expected trend, then tests whether the intervention produced a change in level (immediate step change) and/or slope (change in trend) after implementation. It is particularly useful for evaluating population-level policy changes — such as a tobacco tax or a vaccination program — where randomization is impossible. The design requires sufficient pre-intervention data points to establish a reliable baseline trend.
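The level and slope changes can be illustrated by fitting one straight line before and one after the intervention, which gives the same point estimates as a segmented regression with level-change and slope-change terms. This is a simplified sketch: the monthly rates are synthetic and noise-free, the helper `fit_line` is hypothetical, and autocorrelation and seasonality, which a real ITS analysis should address, are ignored.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic monthly rates (made up, noise-free): the policy starts at month 12,
# drops the level by 8 and reduces the slope by 0.5 per month.
T0 = 12
months = list(range(24))
rates = [50 + 1.0 * t - (8 + 0.5 * (t - T0) if t >= T0 else 0) for t in months]

pre = [(t, y) for t, y in zip(months, rates) if t < T0]
post = [(t, y) for t, y in zip(months, rates) if t >= T0]
pre_int, pre_slope = fit_line(*zip(*pre))
post_int, post_slope = fit_line(*zip(*post))

# Level change: gap between the two fitted lines at the intervention month.
level_change = (post_int + post_slope * T0) - (pre_int + pre_slope * T0)
# Slope change: difference in trend after versus before the intervention.
slope_change = post_slope - pre_slope

print(round(level_change, 2), round(slope_change, 2))
```

The pre-intervention line projected forward plays the role of the counterfactual, so the recovered level and slope changes match the values built into the synthetic series.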

Concept Tested: Interrupted Time Series Design

8. In a Cox proportional hazards model, the hazard ratio for a binary predictor represents:

A. The probability that a participant experiences the event by a specified time
B. The ratio of the instantaneous event rate in the exposed group to that in the unexposed group at any given time
C. The difference in median survival time between exposure groups
D. The odds that an event occurs in the exposed group relative to the unexposed group

The correct answer is B. The hazard ratio (HR) in a Cox model is the ratio of the instantaneous event rate (hazard) in the exposed group to that in the unexposed group, assumed to be constant over time (the proportional hazards assumption). An HR of 1.5 means that, at any given time, exposed participants who are still event-free have a 50% higher instantaneous rate of the event than comparable unexposed participants. Like the odds ratio, the HR approximates the risk ratio only under specific conditions, such as a rare outcome or short follow-up.
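The gap between a hazard ratio and a risk ratio shows up once cumulative risk is computed over follow-up. The sketch below assumes constant hazards (an exponential survival model), which is a simplification, since the Cox model leaves the baseline hazard unspecified; the baseline hazard of 0.10 per person-year is hypothetical.

```python
import math

BASELINE_HAZARD = 0.10  # events per person-year in the unexposed (hypothetical)
HAZARD_RATIO = 1.5      # exposed versus unexposed, constant over time


def cumulative_risk(hazard: float, years: float) -> float:
    """Risk of the event by `years` under a constant hazard."""
    return 1 - math.exp(-hazard * years)


def risk_ratio_at(years: float) -> float:
    exposed = cumulative_risk(BASELINE_HAZARD * HAZARD_RATIO, years)
    unexposed = cumulative_risk(BASELINE_HAZARD, years)
    return exposed / unexposed


for years in (0.1, 1, 5, 20):
    rr = risk_ratio_at(years)
    print(f"t = {years:>4} y: HR = {HAZARD_RATIO}, RR = {rr:.3f}")
```

The HR stays at 1.5 throughout, while the risk ratio starts close to 1.5 for short follow-up and drifts toward 1 as most participants in both groups eventually experience the event.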

Concept Tested: Cox Proportional Hazards Model

9. Multiple imputation typically produces less biased estimates than complete-case analysis primarily because:

A. Multiple imputation eliminates all uncertainty introduced by missing values
B. Multiple imputation uses the observed data to preserve the distribution and relationships among variables
C. Complete-case analysis requires a larger sample size than multiple imputation
D. Multiple imputation assumes data are missing completely at random, which is always satisfied

The correct answer is B. Multiple imputation creates several complete datasets by replacing missing values with plausible values drawn from the conditional distribution of the missing variable given the observed data, then pools results across the imputed datasets. This preserves the variability and covariance structure of the data, and the pooled standard errors reflect the extra uncertainty due to the missing values. Complete-case analysis is generally unbiased only when data are missing completely at random (MCAR), a strong assumption that is rarely met. Multiple imputation remains valid under the weaker missing-at-random (MAR) assumption, provided the imputation model is reasonably specified.
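The pooling step is commonly done with Rubin's rules: the pooled estimate is the mean across imputed datasets, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below covers only that pooling step, not the imputation itself; the function name `pool_rubin` and the five sets of results are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    pooled_estimate = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # average sampling variance
    between = statistics.variance(estimates)   # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled_estimate, math.sqrt(total)


# Hypothetical coefficient and squared standard error from 5 imputed datasets.
estimates = [0.42, 0.47, 0.39, 0.45, 0.44]
variances = [0.010, 0.011, 0.009, 0.010, 0.010]

pooled_estimate, pooled_se = pool_rubin(estimates, variances)
naive_se = math.sqrt(statistics.fmean(variances))

print(f"pooled estimate = {pooled_estimate:.3f}")
print(f"pooled SE = {pooled_se:.3f} (vs {naive_se:.3f} ignoring between-imputation variance)")
```

The pooled standard error is larger than the one obtained by ignoring between-imputation variance, which is why option A is wrong: multiple imputation carries the uncertainty from missing values forward rather than eliminating it.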

Concept Tested: Multiple Imputation vs. Complete-Case Analysis

10. Excess mortality during a pandemic is estimated by comparing:

A. Hospital death records to nursing home death records in the same period
B. Observed total deaths to modeled expected deaths based on historical trends
C. COVID-19 certified deaths to all-cause mortality in the pre-pandemic period
D. Age-standardized mortality rates between high-income and low-income countries

The correct answer is B. Excess mortality = observed all-cause deaths minus expected deaths (modeled from historical trend data, seasonality, and population changes). It captures deaths caused directly by the disease plus indirect deaths from healthcare disruption, and it does not depend on accurate cause-of-death certification. During COVID-19, excess mortality exceeded confirmed COVID-19 death counts in many countries, revealing a pandemic toll that included misattributed and indirect deaths.
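The subtraction itself can be sketched with a deliberately simple baseline. The example below takes expected deaths to be the average of the same calendar week over three earlier years, which is an assumption standing in for a proper model of trend, seasonality, and population change; all death counts are invented.

```python
# Hypothetical all-cause weekly death counts for the same four calendar weeks.
historical = {
    2017: [1000, 1020, 990, 1010],
    2018: [1010, 1000, 1005, 995],
    2019: [990, 1010, 1005, 1015],
}
observed_2020 = [1050, 1300, 1600, 1400]
certified_covid_2020 = [20, 180, 420, 300]  # also hypothetical

# Expected deaths: average of the same week across the baseline years.
expected = [sum(week) / len(week) for week in zip(*historical.values())]

# Excess deaths: observed all-cause deaths minus expected deaths, week by week.
excess = [obs - exp_deaths for obs, exp_deaths in zip(observed_2020, expected)]
total_excess = sum(excess)

print(f"expected per week: {[round(e) for e in expected]}")
print(f"total excess deaths: {total_excess:.0f}")
print(f"certified COVID-19 deaths: {sum(certified_covid_2020)}")
```

In this made-up series the excess exceeds the certified count, and the difference is the portion attributable to uncertified or indirect deaths.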

Concept Tested: Excess Mortality Estimation