Inference for Means
Summary
This chapter extends inference procedures to population means using t-distributions. Students will learn about the t-distribution and its properties, construct confidence intervals and perform hypothesis tests for one-sample and two-sample means, and understand when to use paired t-procedures. The robustness of t-procedures and their conditions are emphasized.
Concepts Covered
This chapter covers the following 17 concepts from the learning graph:
- T-Distribution
- T vs Z Distribution
- Degrees of Freedom
- T Critical Values
- One-Sample T-Interval
- Conditions for T-Procedures
- One-Sample T-Test
- Two-Sample T-Interval
- Two-Sample T-Test
- Pooled vs Unpooled
- Paired T-Test
- Paired Data
- When to Pair
- Robustness
- Regression Model
- Slope Parameter
- Standard Error of Slope
Prerequisites
This chapter builds on concepts from:
From Proportions to Means: A New Challenge
So far, you've mastered inference for proportions—estimating and testing claims about what fraction of a population has some characteristic. But what about quantitative data? What if we want to estimate the average height of students, compare mean test scores between two groups, or test whether a new teaching method improves learning?
Welcome to inference for means! This chapter opens up a whole new world of statistical analysis, one that handles measurements, amounts, and continuous data. The good news? The logical framework you learned for proportions still applies—we'll still construct confidence intervals and perform hypothesis tests. The twist? We need a new distribution to work with.
"Acorn for your thoughts?" Sylvia tilts her head thoughtfully. "When I wanted to know if south-side oaks produced more acorns, I wasn't just counting successes and failures—I was measuring actual quantities! How many acorns per tree? What's the average weight? That's quantitative data, and it needs special treatment. Don't worry—we've got just the tool for the job!"
By the end of this chapter, you'll be able to:
- Understand why we need the t-distribution for inference about means
- Calculate degrees of freedom and find t critical values
- Construct and interpret confidence intervals for one mean and the difference of two means
- Perform hypothesis tests for means using one-sample and two-sample t-tests
- Recognize when paired data requires special treatment
- Evaluate the robustness of t-procedures when conditions aren't perfectly met
Why Not Use the Z-Distribution?
When we did inference for proportions, we used the normal (Z) distribution. This worked because the sampling distribution of \( \hat{p} \) is approximately normal for large samples, and we knew (or could estimate) the population proportion to calculate the standard error.
But here's the problem with means: to calculate the standard error of \( \bar{x} \), we'd need to know the population standard deviation \( \sigma \). And we almost never know \( \sigma \)!
The standard error formula for sample means is:

$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} $$

Since we don't know \( \sigma \), we substitute the sample standard deviation \( s \):

$$ SE_{\bar{x}} = \frac{s}{\sqrt{n}} $$

This substitution introduces extra uncertainty—\( s \) is itself a random variable that varies from sample to sample. The normal distribution doesn't account for this extra variability, especially in smaller samples. Enter the t-distribution!
| When Doing Inference About... | We Know... | We Use... |
|---|---|---|
| Proportions | Can estimate \( p \) from \( \hat{p} \) | Z-distribution |
| Means (known \( \sigma \)) | Population SD | Z-distribution |
| Means (unknown \( \sigma \)) | Only sample SD \( s \) | t-distribution |
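You can see this extra uncertainty in a quick simulation. The following sketch (assuming Python with NumPy, which is not part of this chapter's required tools) draws many small samples from a normal population and computes \( (\bar{x} - \mu)/(s/\sqrt{n}) \) for each; the statistic lands beyond ±1.96 far more often than the 5% the normal distribution would predict.

```python
# A quick simulation, assuming NumPy: for small n, the standardized mean
# computed with s (not sigma) exceeds +/-1.96 much more than 5% of the time.
import numpy as np

rng = np.random.default_rng(7)
n, reps = 5, 100_000
samples = rng.normal(0, 1, size=(reps, n))          # true mean is 0
t_stats = samples.mean(axis=1) / (samples.std(ddof=1, axis=1) / np.sqrt(n))
print((np.abs(t_stats) > 1.96).mean())               # roughly 0.12, not 0.05
```

That extra tail probability is exactly what the t-distribution's heavier tails account for.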
The T-Distribution
The t-distribution (also called Student's t-distribution) was developed by William Sealy Gosset in 1908 while working at the Guinness Brewery in Dublin. He published under the pseudonym "Student" because Guinness didn't allow employees to publish under their own names—hence "Student's t."
The t-distribution looks similar to the normal distribution but accounts for the extra uncertainty when we estimate \( \sigma \) with \( s \).
Properties of the T-Distribution
The t-distribution has several key properties:
- Symmetric and bell-shaped: Just like the normal distribution
- Centered at zero: The mean is 0 (when sampling from a normal population)
- Heavier tails: More probability in the tails than the normal distribution
- Depends on sample size: Gets closer to normal as \( n \) increases
- Defined by degrees of freedom: The shape is determined by a parameter called degrees of freedom
"Here's something that really helped the concept click for me," Sylvia shares. "The t-distribution is basically saying 'Hey, we're less certain about things because we had to estimate the spread from our sample.' Those heavier tails mean extreme values are more likely than with the normal distribution. It's the distribution being honest about our uncertainty!"
Visual Comparison: T vs. Normal
The t-distribution's heavier tails have real consequences for inference. Because more probability is in the tails, critical values for the t-distribution are larger than for the normal distribution. This means:
- Confidence intervals are wider when using t
- It's harder to get statistically significant results with small samples
As the sample size increases, the t-distribution approaches the normal distribution. With 30+ observations, they're quite similar. With 100+ observations, they're nearly identical.
| Degrees of Freedom | Critical Value for 95% CI |
|---|---|
| 5 | 2.571 |
| 10 | 2.228 |
| 20 | 2.086 |
| 30 | 2.042 |
| 50 | 2.009 |
| 100 | 1.984 |
| ∞ (Normal) | 1.960 |
Notice how the critical values decrease as degrees of freedom increase, approaching 1.96 (the z* value for 95% confidence).
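If you have Python handy, a short sketch (assuming SciPy is available) reproduces this table with `scipy.stats.t.ppf`:

```python
# A minimal sketch, assuming SciPy: find t* for a 95% CI by asking for the
# 97.5th percentile, which leaves 2.5% in the upper tail.
from scipy import stats

for df in [5, 10, 20, 30, 50, 100]:
    t_star = stats.t.ppf(0.975, df)
    print(f"df = {df:>3}: t* = {t_star:.3f}")

print(f"Normal (z*): {stats.norm.ppf(0.975):.3f}")   # 1.960, the limiting value
```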
Diagram: T-Distribution vs Normal Distribution Comparison
T-Distribution vs Normal Distribution Comparison
Type: microsim
Bloom Level: Understand (L2) Bloom Verb: Compare, contrast
Learning Objective: Students will compare the shapes of t-distributions with different degrees of freedom to the standard normal distribution, understanding how heavier tails affect inference.
Data Visibility Requirements: - Stage 1: Show standard normal distribution (Z) as baseline curve in blue - Stage 2: Overlay t-distribution with user-selected df in orange/red - Stage 3: Show critical values for both distributions at 95% confidence level - Stage 4: Display area in tails for both distributions
Visual Elements: - Two overlapping distribution curves on same axes - Standard normal curve (blue, solid line) - T-distribution curve (orange, dashed line initially) - Shaded tail areas showing 2.5% in each tail - Vertical lines marking critical values - Legend showing which curve is which
Interactive Controls: - Slider: Degrees of freedom (df) from 1 to 100 - Radio buttons: Show 90%, 95%, or 99% confidence level - Checkbox: Show/hide shaded tail areas - Checkbox: Show/hide critical value lines
Display Panel (right side): - Current df value - t critical value for selected confidence level - z critical value for comparison - Difference between t and z
Default Parameters: - df = 5 - Confidence level = 95% - Tail areas shown - Critical values shown
Behavior: - As df slider moves, t-distribution curve smoothly transitions - Critical value lines and tail areas update in real-time - At high df (100+), curves should nearly overlap - At low df (1-5), t-distribution should have noticeably heavier tails
Instructional Rationale: Slider exploration is appropriate because the Apply/compare objective requires learners to see how the parameter (df) affects the distribution shape. Real-time visual feedback helps build intuition about why small samples produce wider intervals.
Implementation: p5.js with canvas-based controls
Degrees of Freedom
Degrees of freedom (df) is a parameter that determines the exact shape of the t-distribution. For the procedures in this chapter:
- One-sample t-procedures: df = n - 1
- Two-sample t-procedures: df is calculated from a complex formula (or conservatively estimated)
- Paired t-procedures: df = n - 1 (where n is the number of pairs)
But what ARE degrees of freedom? Conceptually, they represent the number of independent pieces of information available to estimate a parameter.
Here's an analogy: Imagine you have 5 numbers that must add up to 50. You can choose the first 4 numbers freely, but once you've chosen them, the 5th number is determined—it's whatever value makes the sum equal 50. You had 4 "degrees of freedom" in your choices.
Similarly, when calculating the sample standard deviation \( s \), we use the sample mean \( \bar{x} \) in our calculations. Since we've already used the data to calculate \( \bar{x} \), we've "used up" one degree of freedom. That's why df = n - 1.
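A quick numerical check (assuming NumPy; the data values are made up for illustration) shows the n - 1 divisor in action:

```python
# Sketch, assuming NumPy: the sample SD divides by n - 1 (the degrees of
# freedom), not n, because x-bar was already estimated from the same data.
import numpy as np

data = np.array([4.0, 7.0, 6.0, 9.0, 5.0])
n = len(data)
s = np.std(data, ddof=1)                             # ddof=1 -> divide by n - 1
manual = np.sqrt(np.sum((data - data.mean())**2) / (n - 1))
print(s, manual)                                     # both approximately 1.924
```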
Why Degrees of Freedom Matter
Degrees of freedom affect:
- Shape of the t-distribution: Lower df means heavier tails
- Critical values: Lower df means larger critical values
- Width of confidence intervals: Lower df means wider intervals
- Difficulty of rejecting H₀: Lower df means we need more extreme evidence
"I love thinking about this one!" Sylvia's tail twitches with excitement. "Degrees of freedom are like how many independent choices you have left. If you're filling 5 bags with exactly 100 acorns total, you can put whatever you want in the first 4 bags. But that last bag? No choice—it gets whatever makes the total 100. Four degrees of freedom!"
T Critical Values
T critical values (denoted \( t^* \)) are the values that mark off specified areas in the tails of the t-distribution. To find a t critical value, you need:
- The confidence level (or significance level)
- The degrees of freedom
Finding T Critical Values
Most statistics courses use t-tables, calculators, or statistical software to find t critical values.
For a confidence interval at confidence level C:

- Find the value \( t^* \) such that C% of the t-distribution is between -\( t^* \) and +\( t^* \)
- This leaves (1 - C)/2 in each tail

For a hypothesis test at significance level α:

- For a two-sided test: find \( t^* \) with α/2 in each tail
- For a one-sided test: find \( t^* \) with α in the relevant tail
| Confidence Level | Area in Each Tail | Example t* (df=20) |
|---|---|---|
| 90% | 0.05 | 1.725 |
| 95% | 0.025 | 2.086 |
| 99% | 0.005 | 2.845 |
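The same lookup works in software. A minimal sketch, assuming SciPy, contrasts the two-sided and one-sided critical values for df = 20 at α = 0.05 (these match the 95% and 90% rows of the table above):

```python
# Sketch, assuming SciPy: t* depends on df AND on whether alpha is split
# between two tails (CI / two-sided test) or kept in one tail.
from scipy import stats

df, alpha = 20, 0.05
two_sided = stats.t.ppf(1 - alpha / 2, df)   # 2.086: alpha/2 = 0.025 per tail
one_sided = stats.t.ppf(1 - alpha, df)       # 1.725: all of alpha in one tail
print(two_sided, one_sided)
```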
Diagram: Interactive T Critical Value Finder
Interactive T Critical Value Finder
Type: microsim
Bloom Level: Apply (L3) Bloom Verb: Calculate, use
Learning Objective: Students will find t critical values for different degrees of freedom and confidence levels, connecting the visual representation to the numerical values used in formulas.
Visual Elements: - T-distribution curve centered on canvas - Shaded regions showing tail areas or central area - Vertical lines at critical values - Labels showing t* values on the axis
Interactive Controls: - Slider: Degrees of freedom (1 to 100) - Dropdown: Select test type (Two-sided, Right-tailed, Left-tailed) - Dropdown: Select confidence/significance level (90%, 95%, 99%) - Toggle: Show confidence interval view vs. hypothesis test view
Display Panel: - Current df - t critical value(s) - Shaded area percentage - Comparison to z (when df > 30)
Behavior: - Curve shape updates smoothly with df changes - Shaded areas and critical value lines update in real-time - For two-sided: shade both tails - For one-sided: shade appropriate tail - Display exact t* value rounded to 3 decimal places
Implementation: p5.js with canvas-based controls
Conditions for T-Procedures
Before using any t-procedure, we must check that certain conditions for t-procedures are met. The validity of our inference depends on these conditions.
The Three Conditions
1. Random: The data must come from a random sample or randomized experiment.
- This ensures our sample is representative
- Without randomness, we cannot make inferences about the population
- Check: Was there a random selection or random assignment process?
2. Normal/Large Sample: The sampling distribution of \( \bar{x} \) must be approximately normal.
This condition is satisfied if EITHER:

- The population distribution is approximately normal, OR
- The sample size is large (n ≥ 30) due to the Central Limit Theorem

For smaller samples (n < 30):

- Look at a dotplot, histogram, or Normal probability plot of the data
- Check for severe skewness or outliers
- The more symmetric and outlier-free the data, the smaller the sample can be
3. Independence: Individual observations must be independent.
- For sampling without replacement: The population should be at least 10 times the sample size (10% condition)
- For experiments: Random assignment helps ensure independence
Checking Normality
The t-procedures are fairly robust to violations of the normality condition—they work reasonably well even when the population isn't perfectly normal. However:
- With small samples (n < 15), the data should be close to normal with no outliers
- With moderate samples (15 ≤ n < 30), the procedures can handle slight skewness
- With large samples (n ≥ 30), the CLT kicks in, and normality matters less
Sylvia's Normality Check Tip
"Here's my rule of thumb: Graph the data first! If it looks roughly symmetric and has no extreme outliers, you're probably fine. If it looks like a ski slope (heavily skewed) or has values way out in the tails, be cautious—especially with small samples."
| Sample Size | Acceptable Data Shape |
|---|---|
| n < 15 | Must be close to normal, no outliers |
| 15 ≤ n < 30 | Can handle moderate skewness, no extreme outliers |
| n ≥ 30 | CLT applies; any reasonable distribution works |
| n ≥ 40 | Even skewed distributions are usually fine |
One-Sample T-Interval
A one-sample t-interval is a confidence interval for a single population mean μ when σ is unknown (which is almost always).
The Formula

$$ \bar{x} \pm t^* \frac{s}{\sqrt{n}} $$

Where:

- \( \bar{x} \) = sample mean
- \( t^* \) = t critical value for the desired confidence level with df = n - 1
- \( s \) = sample standard deviation
- \( n \) = sample size
- \( \frac{s}{\sqrt{n}} \) = standard error of the mean
Interpretation
We interpret the interval the same way as before: "We are C% confident that the true population mean μ lies between [lower bound] and [upper bound]."
Remember: The confidence level refers to the method, not to any particular interval. If we repeatedly took samples and built 95% confidence intervals, about 95% of those intervals would contain the true μ.
Complete Example: Study Time
Scenario: A researcher wants to estimate the average amount of time high school students spend on homework per week. A random sample of 25 students reported their weekly homework hours.
Data summary:

- Sample size: n = 25
- Sample mean: \( \bar{x} = 8.2 \) hours
- Sample standard deviation: s = 3.1 hours
- Desired confidence level: 95%

Step 1: Check conditions

- Random? Assume the sample was randomly selected ✓
- Normal? With n = 25, we need to check the data. Assume a histogram showed a roughly symmetric distribution with no extreme outliers ✓
- Independent? The population of high school students is much larger than 10(25) = 250 ✓

Step 2: Find the critical value

- df = 25 - 1 = 24
- For 95% confidence, t* = 2.064 (from a t-table or calculator)

Step 3: Calculate the confidence interval

$$ \bar{x} \pm t^* \frac{s}{\sqrt{n}} = 8.2 \pm 2.064 \cdot \frac{3.1}{\sqrt{25}} = 8.2 \pm 1.28 $$

95% CI: (6.92, 9.48) hours

Step 4: Interpret

We are 95% confident that the true mean weekly homework time for all high school students is between 6.92 and 9.48 hours.
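Here is the same calculation in code, a sketch assuming SciPy, working from the summary statistics above:

```python
# Sketch of the homework example, assuming SciPy; summary statistics only.
from math import sqrt
from scipy import stats

xbar, s, n, conf = 8.2, 3.1, 25, 0.95
df = n - 1
t_star = stats.t.ppf(1 - (1 - conf) / 2, df)   # about 2.064
me = t_star * s / sqrt(n)                       # margin of error, about 1.28
print(f"{conf:.0%} CI: ({xbar - me:.2f}, {xbar + me:.2f})")   # (6.92, 9.48)
```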
Diagram: One-Sample T-Interval Calculator
One-Sample T-Interval Calculator
Type: microsim
Bloom Level: Apply (L3) Bloom Verb: Calculate, demonstrate
Learning Objective: Students will construct and interpret one-sample t-intervals for a population mean by entering sample statistics and seeing the step-by-step calculation process.
Data Visibility Requirements: - Stage 1: Show input values (x̄, s, n, confidence level) - Stage 2: Show df calculation (n - 1) - Stage 3: Show t lookup with visual on distribution - Stage 4: Show SE calculation (s / √n) - Stage 5: Show margin of error (t × SE) - Stage 6: Show final interval (x̄ ± ME)
Visual Elements: - Input form for sample statistics - Step-by-step calculation display - T-distribution curve with shaded confidence region - Number line showing the confidence interval - Written interpretation in proper statistical language
Interactive Controls: - Number input: Sample mean (x̄) - Number input: Sample standard deviation (s) - Number input: Sample size (n) - Dropdown: Confidence level (90%, 95%, 99%) - Button: Calculate
Display Areas: - Left: Calculation steps with formulas and values - Right: T-distribution visualization - Bottom: Number line with interval marked - Below: Written interpretation template
Behavior: - Validate inputs (n ≥ 2, s > 0) - Display df warning if n < 15 (check normality) - Show each calculation step when Calculate is pressed - Animate the interval appearing on the number line - Generate proper interpretation sentence
Default Values: - x̄ = 8.2 - s = 3.1 - n = 25 - Confidence = 95%
Implementation: p5.js with canvas-based input fields
One-Sample T-Test
The one-sample t-test is used to test a hypothesis about a single population mean when σ is unknown.
Setting Up the Test
Null hypothesis: \( H_0: \mu = \mu_0 \) (the population mean equals some specified value)
Alternative hypothesis: - Two-sided: \( H_a: \mu \neq \mu_0 \) - Right-tailed: \( H_a: \mu > \mu_0 \) - Left-tailed: \( H_a: \mu < \mu_0 \)
The Test Statistic

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$

This formula measures how many standard errors the sample mean is from the hypothesized mean. It follows a t-distribution with df = n - 1.
Finding the P-Value
The p-value depends on the direction of the alternative:
- Two-sided (\( H_a: \mu \neq \mu_0 \)): P = 2 × P(T > |t|)
- Right-tailed (\( H_a: \mu > \mu_0 \)): P = P(T > t)
- Left-tailed (\( H_a: \mu < \mu_0 \)): P = P(T < t)
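In software, the three cases correspond to different tail calls. A minimal sketch, assuming SciPy (the t-statistic and df here are arbitrary, just for illustration):

```python
# Sketch, assuming SciPy: the three p-value cases for a t-statistic.
from scipy import stats

t, df = 2.1, 24
p_two = 2 * stats.t.sf(abs(t), df)   # two-sided: 2 * P(T > |t|)
p_right = stats.t.sf(t, df)          # right-tailed: P(T > t)
p_left = stats.t.cdf(t, df)          # left-tailed: P(T < t)
print(p_two, p_right, p_left)
```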
Complete Example: Sleep Study
Scenario: It's recommended that high school students get at least 8 hours of sleep per night. A health researcher suspects that students at a particular school get less than the recommended amount. She surveys a random sample of 36 students and finds they average 7.2 hours with a standard deviation of 1.8 hours. Test at α = 0.05.
Step 1: State hypotheses

- \( H_0: \mu = 8 \) (students get the recommended amount)
- \( H_a: \mu < 8 \) (students get less than recommended) [left-tailed]

Step 2: Check conditions

- Random: Random sample of students ✓
- Normal/Large Sample: n = 36 ≥ 30, so CLT applies ✓
- Independence: Population of students >> 360 ✓

Step 3: Calculate test statistic

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{7.2 - 8}{1.8/\sqrt{36}} = \frac{-0.8}{0.3} \approx -2.67 $$

Step 4: Find p-value

- df = 36 - 1 = 35
- P-value = P(T < -2.67) ≈ 0.006

Step 5: Make conclusion

Since the p-value (0.006) < α (0.05), we reject H₀.

Step 6: Interpret in context

There is convincing statistical evidence that students at this school get less than the recommended 8 hours of sleep per night on average.
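From the summary statistics, the whole test takes only a few lines. A sketch assuming SciPy:

```python
# The sleep study in code: a sketch assuming SciPy, summary statistics only.
from math import sqrt
from scipy import stats

xbar, mu0, s, n = 7.2, 8.0, 1.8, 36
t = (xbar - mu0) / (s / sqrt(n))     # about -2.67
p = stats.t.cdf(t, df=n - 1)         # left-tailed: P(T < t), about 0.006
print(t, p)
```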
Two-Sample T-Procedures: Comparing Two Means
Often we want to compare the means of two different groups—does a new teaching method work better than the traditional one? Do students who exercise perform differently academically? These questions call for two-sample t-procedures.
The Setup
We have two independent groups:

- Group 1: sample size \( n_1 \), sample mean \( \bar{x}_1 \), sample SD \( s_1 \)
- Group 2: sample size \( n_2 \), sample mean \( \bar{x}_2 \), sample SD \( s_2 \)
We want to estimate or test \( \mu_1 - \mu_2 \), the difference between population means.
Two-Sample T-Interval
The two-sample t-interval for \( \mu_1 - \mu_2 \) is:

$$ (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

The degrees of freedom for this interval come from a complicated formula (Welch's approximation). Most calculators and software compute this automatically. A conservative approach uses df = the smaller of (n₁ - 1) and (n₂ - 1).
Two-Sample T-Test
The two-sample t-test tests whether two population means are different.
Hypotheses:

- \( H_0: \mu_1 - \mu_2 = 0 \) (or equivalently, \( \mu_1 = \mu_2 \))
- \( H_a: \mu_1 - \mu_2 \neq 0 \) (or >, or <)

Test statistic:

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$
Conditions for Two-Sample T-Procedures
The same three conditions apply, but now for BOTH samples:
- Random: Both samples must be randomly selected (or randomly assigned in an experiment)
- Normal/Large Sample: Both sampling distributions of \( \bar{x} \) should be approximately normal
- Independence: Observations within each sample are independent; the two samples are independent of each other
"Time to squirrel away this key insight!" Sylvia taps her notebook. "The two samples MUST be independent of each other. If the same subjects appear in both groups, or if there's some natural pairing, you need a different approach—paired data. We'll get to that soon!"
| Comparing... | Example | Method |
|---|---|---|
| Two independent groups | Boys vs. girls | Two-sample t |
| Same subjects, two conditions | Before vs. after | Paired t |
| Matched pairs | Twins, siblings | Paired t |
Pooled vs. Unpooled Procedures
You may encounter two versions of two-sample t-procedures: pooled and unpooled.
Unpooled (Welch's) Procedure
The formulas above are the unpooled (or Welch's) version. This is the default in AP Statistics and most modern software because:
- It doesn't assume equal variances in the two populations
- It's more robust to violations of assumptions
- The degrees of freedom calculation is more accurate
Pooled Procedure
The pooled version assumes that \( \sigma_1 = \sigma_2 \) (equal population variances). It combines (pools) the sample variances into a single estimate of the common variance.
The pooled estimate is:

$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$

With df = \( n_1 + n_2 - 2 \).
Which to Use?
| Situation | Recommendation |
|---|---|
| AP Statistics exam | Use unpooled (two-sample t) unless told otherwise |
| Software default | Usually unpooled |
| Sample SDs are very different | Definitely unpooled |
| Told variances are equal | Can use pooled |
| Randomized experiment with same variance | Either is acceptable |
When in Doubt, Use Unpooled
The unpooled procedure is safer because it doesn't require the equal-variance assumption. When variances truly are equal, both methods give similar results. When variances differ, the pooled method can be misleading.
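In software, the pooled/unpooled choice is usually a single flag. A sketch assuming SciPy and NumPy, with simulated data (the group sizes, means, and SDs are made up for illustration):

```python
# Sketch, assuming SciPy: equal_var=False gives Welch's unpooled test (the
# recommendation in this chapter); equal_var=True gives the pooled test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group1 = rng.normal(78, 10, size=30)   # simulated data for illustration
group2 = rng.normal(72, 12, size=32)

welch = stats.ttest_ind(group1, group2, equal_var=False)   # unpooled (Welch)
pooled = stats.ttest_ind(group1, group2, equal_var=True)   # pooled
print(welch.statistic, welch.pvalue)
print(pooled.statistic, pooled.pvalue)
```

When the sample SDs are similar, the two results barely differ; when they are not, trust the Welch version.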
Diagram: Two-Sample T-Test Visualization
Two-Sample T-Test Visualization
Type: microsim
Bloom Level: Analyze (L4) Bloom Verb: Compare, differentiate
Learning Objective: Students will compare two group means visually and statistically, understanding when the difference is statistically significant versus when overlap makes conclusions uncertain.
Visual Elements: - Two dotplots or histograms showing sample data (side by side or stacked) - Vertical lines at each sample mean - Display of sample statistics for each group - Number line showing confidence interval for μ₁ - μ₂ - T-distribution with test statistic marked
Interactive Controls: - Input fields for: n₁, x̄₁, s₁, n₂, x̄₂, s₂ - OR ability to generate random samples with specified parameters - Dropdown: Confidence level / significance level - Radio buttons: Alternative hypothesis direction - Button: Perform test
Display Areas: - Top: Visual comparison of two groups - Middle: Summary statistics table - Bottom-left: Confidence interval for difference - Bottom-right: Hypothesis test results (t-statistic, df, p-value)
Data Visibility: - Show both sample distributions - Mark means with clear visual indicators - Display difference between means prominently - Show whether CI for difference includes 0
Behavior: - When user changes inputs, visualizations update - Highlight when p < α (statistically significant) - Show connection: if 0 is not in CI, test rejects H₀ - Display interpretation in words
Default Values: - Group 1: n₁ = 30, x̄₁ = 78, s₁ = 10 - Group 2: n₂ = 32, x̄₂ = 72, s₂ = 12 - α = 0.05
Implementation: p5.js with canvas-based controls
Paired Data and the Paired T-Test
Sometimes the two samples aren't independent—they're connected in some meaningful way. This is paired data, and it requires a different approach.
What Is Paired Data?
Paired data occurs when each observation in one group is naturally linked to an observation in the other group. Common examples:
- Before/after measurements: The same subjects measured at two different times
- Matched pairs: Subjects are deliberately paired based on similar characteristics
- Twins or siblings: Each pair shares genetic or environmental factors
- Left/right measurements: Same person, different sides
When to Pair
The key question for when to pair: Is there a natural connection between observations across groups?
| Scenario | Independent or Paired? | Why? |
|---|---|---|
| Compare test scores of class A vs. class B | Independent | Different students |
| Compare pretest vs. posttest for same students | Paired | Same students |
| Compare sleep of athletes vs. non-athletes | Independent | Different people |
| Compare sleep on weekdays vs. weekends for same people | Paired | Same people |
"Don't worry—every statistician drops an acorn sometimes when figuring this out!" Sylvia laughs. "I remember getting confused until I asked myself: 'Is there a natural one-to-one matching?' If each observation in Group 1 has a specific partner in Group 2, you've got paired data!"
The Paired T-Test
For paired data, we don't compare the two samples directly. Instead, we:
1. Calculate the difference for each pair: \( d = x_1 - x_2 \)
2. Treat these differences as a single sample
3. Apply a one-sample t-test to the differences
Hypotheses:

- \( H_0: \mu_d = 0 \) (the mean difference is zero)
- \( H_a: \mu_d \neq 0 \) (or > 0, or < 0)

Test statistic:

$$ t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} $$

Where:

- \( \bar{d} \) = mean of the differences
- \( s_d \) = standard deviation of the differences
- \( n \) = number of pairs

Degrees of freedom: df = n - 1 (number of pairs minus 1)
Confidence Interval for Mean Difference

$$ \bar{d} \pm t^* \frac{s_d}{\sqrt{n}} $$
Complete Example: Study Technique
Scenario: Researchers want to test whether a new study technique improves test scores. They recruit 20 students and give them a pretest, teach them the technique, and give a posttest. Here are summary statistics for the differences (Post - Pre):
- n = 20 pairs
- Mean difference: \( \bar{d} = 4.2 \) points
- SD of differences: \( s_d = 6.5 \) points
- Test at α = 0.05
Step 1: State hypotheses

- \( H_0: \mu_d = 0 \) (technique doesn't improve scores)
- \( H_a: \mu_d > 0 \) (technique improves scores) [right-tailed]

Step 2: Check conditions

- Random: Assume students were randomly selected ✓
- Normal: n = 20, so check a histogram of the differences for approximate normality ✓
- Independence: Differences are independent of each other ✓

Step 3: Calculate test statistic

$$ t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{4.2}{6.5/\sqrt{20}} = \frac{4.2}{1.453} \approx 2.89 $$

Step 4: Find p-value

- df = 20 - 1 = 19
- P-value = P(T > 2.89) ≈ 0.0047

Step 5: Make conclusion

Since the p-value (0.0047) < α (0.05), we reject H₀.

Step 6: Interpret

There is convincing statistical evidence that the new study technique improves test scores, on average.
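From summary statistics, the paired test is just a one-sample calculation on the differences. A sketch assuming SciPy (with raw paired data you could call `stats.ttest_rel(post, pre)` instead):

```python
# The study-technique example from summary statistics: sketch assuming SciPy.
from math import sqrt
from scipy import stats

dbar, s_d, n = 4.2, 6.5, 20
t = dbar / (s_d / sqrt(n))           # about 2.89
p = stats.t.sf(t, df=n - 1)          # right-tailed: P(T > t), about 0.0047
print(t, p)
```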
Diagram: Paired vs Independent Data Decision Flowchart
Paired vs Independent Data Decision Flowchart
Type: infographic
Bloom Level: Analyze (L4) Bloom Verb: Differentiate, distinguish
Learning Objective: Students will correctly identify whether a given scenario calls for paired or independent samples t-procedures by following a decision flowchart.
Layout: Decision tree flowchart with yes/no branches
Starting Question: "Comparing two groups?"
Branch 1: "Are the same individuals measured twice?" - Yes → Paired data - No → Continue to Branch 2
Branch 2: "Are individuals deliberately matched into pairs?" - Yes → Paired data - No → Continue to Branch 3
Branch 3: "Is there any natural one-to-one connection between observations?" - Yes → Paired data - No → Independent samples
End Nodes: - "Paired data → Use paired t-test (analyze differences)" - "Independent samples → Use two-sample t-test"
Visual Style: - Diamond shapes for decision points - Rectangular boxes for conclusions - Green arrows for "Yes" - Orange arrows for "No" - Sylvia illustration at start with speech bubble
Interactive Features: - Hover over each node for example scenario - Click end nodes for summary of appropriate procedure - Optional: Quiz mode where students classify scenarios
Color Scheme: - Sylvia green for decision diamonds - Auburn accent for conclusion boxes - Cream background
Implementation: HTML/CSS/JavaScript or p5.js
Why Pairing Matters: The Advantage of Paired Design
Why do we bother with pairing? Because it often gives us more power to detect real differences.
The Key Insight
When we pair data, we control for variability between subjects. Consider measuring blood pressure before and after taking medication:
- Two-sample approach: We'd see huge variability because different people have different baseline blood pressures
- Paired approach: We focus on the CHANGE within each person, removing between-person variability
By eliminating subject-to-subject variability, the differences tend to have less spread, leading to:

- Smaller standard error
- Narrower confidence intervals
- More statistical power
Mathematical Comparison
For independent samples, the SE of the difference is:

$$ SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

For paired data, the SE of the mean difference is:

$$ SE_{\bar{d}} = \frac{s_d}{\sqrt{n}} $$
When subjects are consistent (their individual measurements are similar), \( s_d \) will be much smaller than the individual sample SDs, making paired procedures more powerful.
| Design Type | Controls for... | Best when... |
|---|---|---|
| Independent samples | Nothing special | Groups are naturally separate |
| Paired data | Between-subject variability | Within-subject changes are the focus |
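A small simulation (assuming NumPy; all parameters are made up for illustration) makes the advantage concrete: give each subject a large personal baseline plus a small, consistent treatment effect, then compare the two standard errors:

```python
# Sketch, assuming NumPy: pairing removes the big between-subject baseline,
# so the SE of the mean difference is much smaller than the independent SE.
import numpy as np

rng = np.random.default_rng(0)
n = 25
baseline = rng.normal(120, 15, size=n)                # between-subject spread
before = baseline + rng.normal(0, 3, size=n)
after = baseline - 5 + rng.normal(0, 3, size=n)       # true change of -5

diffs = after - before
se_paired = diffs.std(ddof=1) / np.sqrt(n)
se_indep = np.sqrt(before.var(ddof=1) / n + after.var(ddof=1) / n)
print(se_paired, se_indep)    # the paired SE is several times smaller
```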
Robustness of T-Procedures
How well do t-procedures work when our conditions aren't perfectly met? This quality is called robustness.
What Robustness Means
A procedure is robust if it gives reasonably accurate results even when some assumptions are violated. T-procedures are considered quite robust, meaning:
- Confidence levels are approximately correct even when the population isn't exactly normal
- P-values are approximately valid even with moderate departures from normality
Guidelines for Robustness
Sample size matters most:
- n < 15: The data should be close to normal with no outliers. T-procedures are NOT robust with very small samples from non-normal populations.
- 15 ≤ n < 30: The procedures can handle moderate skewness but are sensitive to extreme outliers.
- n ≥ 30: The Central Limit Theorem provides robustness. Even fairly skewed distributions work well.
- n ≥ 40: Strong robustness. The procedures work for most real-world distributions.
What affects robustness most:
- Outliers: The biggest concern! Outliers affect both \( \bar{x} \) and \( s \), potentially distorting results
- Extreme skewness: One-sided tails pull the mean away from the center
- Heavy tails: Populations with many extreme values
What doesn't affect robustness much:
- Slight skewness: Especially with larger samples
- Non-normality with symmetric distributions: T-procedures handle these well
- Gaps in the data: Unless they indicate outliers
Sylvia's Robustness Rule
"When in doubt, graph it out! Always look at your data before running inference. A boxplot or dotplot can reveal outliers and skewness. If you see major problems, you might need a larger sample or alternative methods."
Diagram: Robustness Exploration MicroSim
Robustness Exploration MicroSim
Type: microsim
Bloom Level: Evaluate (L5) Bloom Verb: Judge, assess
Learning Objective: Students will assess how violations of the normality condition affect the reliability of t-procedures by simulating many samples from populations with different shapes and observing the actual confidence interval coverage rates.
Data Visibility Requirements: - Show population distribution shape - Generate many (100+) samples of specified size - Calculate confidence interval for each sample - Track what percentage of intervals contain the true μ - Compare to nominal confidence level
Visual Elements: - Population distribution display (normal, skewed, uniform, with outliers) - Animation of sample CIs being generated - Running count of "hits" (CI contains μ) vs "misses" - Bar chart comparing actual coverage to nominal level - Final summary statistics
Interactive Controls: - Dropdown: Population shape (Normal, Right-skewed, Left-skewed, Uniform, With outliers) - Slider: Sample size (5, 10, 15, 20, 30, 50, 100) - Slider: Number of simulations (100, 500, 1000) - Button: Run simulation - Radio: Confidence level (90%, 95%, 99%)
Display Areas: - Top: Population distribution visualization - Middle: Animation of samples and CIs - Bottom: Summary comparing actual vs. expected coverage
Key Metrics Shown: - Nominal confidence level (e.g., 95%) - Actual coverage rate (e.g., 93.4%) - Whether the difference is concerning
Expected Behavior: - Normal population: coverage ≈ nominal at all sample sizes - Skewed population, small n: coverage < nominal - Skewed population, large n: coverage ≈ nominal (robustness!) - Outliers: coverage varies depending on severity
Instructional Rationale: Simulation is appropriate for the Evaluate objective because students need to see empirical evidence of how robustness works. Seeing actual coverage rates helps them judge when to trust t-procedures.
Implementation: p5.js with canvas-based controls
Introduction to Regression Inference
The final concepts in this chapter connect to regression—specifically, making inferences about the slope of a linear relationship. While full regression analysis often has its own chapter, understanding the basics of inference for slopes fits naturally here because it uses t-procedures.
The Regression Model
A regression model assumes that the relationship between an explanatory variable x and a response variable y follows:

$$ y = \alpha + \beta x + \epsilon $$

Where:

- \( \alpha \) (alpha) = population y-intercept
- \( \beta \) (beta) = population slope (the slope parameter)
- \( \epsilon \) (epsilon) = random error term (assumed to be normally distributed)
The regression line we calculate from sample data, \( \hat{y} = a + bx \), estimates this true relationship.
Why Test the Slope?
The most common inferential question about regression is: Is there a significant linear relationship between x and y?
This translates to testing whether the true slope β equals zero:
- \( H_0: \beta = 0 \) (no linear relationship)
- \( H_a: \beta \neq 0 \) (there IS a linear relationship)
If β = 0, then y doesn't change as x changes—there's no linear relationship. If we can reject this hypothesis, we have evidence of a genuine linear association.
Standard Error of the Slope
The standard error of the slope measures how much the sample slope b would vary from sample to sample:

$$ SE_b = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} $$

Where \( s \) is the standard deviation of the residuals.
This formula isn't on the AP formula sheet—you'll use calculator or software output. But understanding what it means is important: smaller SE means more precise estimates of the true slope.
T-Test for the Slope
The test statistic for testing \( H_0: \beta = 0 \) is:

$$ t = \frac{b - 0}{SE_b} $$
This follows a t-distribution with df = n - 2 (we estimate two parameters: slope and intercept).
Confidence Interval for the Slope

$$ b \pm t^* \cdot SE_b \qquad (df = n - 2) $$

This interval tells us the range of plausible values for the true population slope.
Reading Computer Output
Most regression questions provide computer output. You need to identify:
| Term | What to Look For |
|---|---|
| Slope estimate (b) | Usually labeled "Coef" or "Estimate" for the x-variable |
| SE of slope | Usually labeled "SE Coef" or "Std Error" |
| t-statistic | Often provided, or calculate as b/SE |
| p-value | Usually labeled "P" or "p-value" |
| df | Typically n - 2 for simple linear regression |
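If you want to verify output by hand, SciPy's `linregress` reports the same quantities. A sketch with made-up data:

```python
# Sketch, assuming SciPy: linregress returns the slope, its standard error,
# and the two-sided p-value for H0: beta = 0.
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 3.8, 4.2, 5.1, 5.8, 6.9, 7.4]
res = stats.linregress(x, y)

t = res.slope / res.stderr       # t-statistic, df = n - 2 = 6
print(res.slope, res.stderr, t, res.pvalue)
```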
Summary: Choosing the Right T-Procedure
Let's bring it all together. Here's how to choose the appropriate t-procedure:
| Question Type | Parameter | Procedure | Degrees of Freedom |
|---|---|---|---|
| Estimate/test one mean | μ | One-sample t | n - 1 |
| Compare two independent means | μ₁ - μ₂ | Two-sample t | Formula or conservative |
| Compare paired measurements | μ_d | Paired t | n - 1 (# of pairs) |
| Test slope of regression | β | Regression t | n - 2 |
Decision Checklist
When facing a problem involving means, ask yourself:
1. How many groups?
   - One group → One-sample t
   - Two groups → Continue to question 2
2. Are the groups independent or paired?
   - Independent → Two-sample t
   - Paired → Paired t
3. What do you want to do?
   - Estimate → Confidence interval
   - Test a claim → Hypothesis test
4. Check conditions!
   - Random?
   - Normal/Large Sample?
   - Independent?
"Time to squirrel away this knowledge!" Sylvia beams. "You've got a whole toolkit now for inference about means. The key is matching the right tool to the situation. And remember—always check those conditions before diving in!"
Key Formulas Summary
One-Sample T-Interval

$$ \bar{x} \pm t^* \frac{s}{\sqrt{n}} \qquad (df = n - 1) $$

One-Sample T-Test Statistic

$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$

Two-Sample T-Interval (Unpooled)

$$ (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

Two-Sample T-Test Statistic (Unpooled)

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

Paired T-Interval

$$ \bar{d} \pm t^* \frac{s_d}{\sqrt{n}} \qquad (df = n - 1 \text{ pairs}) $$

Paired T-Test Statistic

$$ t = \frac{\bar{d}}{s_d/\sqrt{n}} $$

T-Test for Regression Slope

$$ t = \frac{b}{SE_b} \qquad (df = n - 2) $$
Chapter Summary
In this chapter, you learned how to extend statistical inference to population means using t-distributions. Let's recap the key ideas:
The T-Distribution:
- Used when the population standard deviation is unknown (almost always)
- Has heavier tails than the normal distribution
- Approaches the normal distribution as sample size increases
- Shape determined by degrees of freedom
Conditions for T-Procedures:
- Random: Data from random sample or randomized experiment
- Normal/Large Sample: Population normal OR n ≥ 30
- Independence: Observations are independent (10% condition for sampling)
One-Sample Procedures:
- Use when estimating or testing one population mean
- df = n - 1
- Check conditions on the sample data
Two-Sample Procedures:
- Use when comparing two independent groups
- Use unpooled (Welch's) approach unless told otherwise
- Check conditions on both samples
Paired Procedures:
- Use when observations are naturally paired (before/after, matched pairs)
- Calculate differences first, then do one-sample analysis
- More powerful when subjects vary more than within-subject changes
Robustness:
- T-procedures work reasonably well even when conditions aren't perfect
- Larger samples provide more robustness
- Watch out for outliers and extreme skewness with small samples
Regression Inference:
- Test whether slope differs from zero to assess linear relationship
- Uses t-distribution with df = n - 2
- Usually read results from computer output
You now have a complete toolkit for inference about means. These procedures are workhorses of statistical analysis, used in countless real-world applications from medical research to educational studies to quality control.
Acorn for Your Thoughts: Self-Check Questions
- Why do we use the t-distribution instead of the normal distribution for inference about means?
Because we have to estimate the population standard deviation using the sample standard deviation s. This introduces extra uncertainty that the t-distribution accounts for with its heavier tails.
- A researcher has 12 observations and wants to construct a 95% confidence interval for the mean. What degrees of freedom should she use?
df = n - 1 = 12 - 1 = 11
- How would you decide whether to use a two-sample t-test or a paired t-test?
Ask whether there's a natural one-to-one pairing between observations. If the same subjects are measured twice, if subjects are deliberately matched, or if there's any inherent connection between pairs—use paired. If the groups are completely separate with no connection, use two-sample.
- Why are t-procedures considered "robust"?
They give reasonably accurate results even when the population isn't perfectly normal, especially with larger sample sizes. The CLT helps ensure the sampling distribution is approximately normal even when the population isn't.
- What's the advantage of paired data over independent samples?
Paired data controls for between-subject variability. When we look at differences within subjects, we eliminate the noise from comparing different individuals, often leading to smaller standard errors and more statistical power.
Looking Ahead
In the next chapter, we'll explore inference for categorical data using chi-square tests. These methods let us analyze relationships between categorical variables and test whether observed frequencies match expected patterns. Get ready to expand your statistical toolkit even further!