Sampling Distributions
Summary
This chapter introduces the concept of sampling distributions, which form the theoretical foundation for statistical inference. Students will learn about sampling variability, the sampling distribution of sample proportions and sample means, and the Central Limit Theorem. Understanding these concepts is essential for constructing confidence intervals and performing hypothesis tests.
Concepts Covered
This chapter covers the following 16 concepts from the learning graph:
- Sampling Variability
- Sampling Distribution
- Sample Proportion
- Sampling Dist of Proportion
- Mean of Sample Proportion
- SD of Sample Proportion
- Conditions for Proportion SD
- Sample Mean
- Sampling Dist of Mean
- Mean of Sample Mean
- SD of Sample Mean
- Central Limit Theorem
- CLT Conditions
- Normal Approximation
- Statistical Inference
- Sample Size for CI
Prerequisites
This chapter builds on concepts from:
- Chapter 5: Standardization and Normal Distributions
- Chapter 11: Sampling and Bias
- Chapter 13: Random Variables
Introduction: From Sample to Population
Welcome back! Sylvia here, and I have to tell you—this chapter contains some of the most beautiful ideas in all of statistics. I know, I know, you might think I say that about every chapter. But sampling distributions? These are the mathematical bridge that lets us make claims about millions of people based on just a few hundred responses. That's pretty amazing when you think about it!
Here's the big question we're tackling: If you take a sample from a population and calculate a statistic (like a mean or proportion), how confident can you be that your sample statistic is close to the true population parameter? After all, different samples give different results—that's just the nature of randomness.
Think about it this way: I've spent years tracking acorn production in Oak Valley. Every autumn, I can't possibly count every acorn on every tree (trust me, I've tried—it made me a little nutty). Instead, I sample a few trees and use those results to estimate the whole forest's production. But here's what kept me up at night: how much can my estimate vary from sample to sample? And can I trust my estimate?
That's exactly what sampling distributions help us understand. They're the key to unlocking statistical inference—the process of using sample data to draw conclusions about populations.
Sampling Variability: Why Samples Differ
Let's start with a fundamental truth that every statistician must embrace: sampling variability is unavoidable. Different random samples from the same population will yield different statistics.
Imagine we want to know the proportion of students at your school who prefer online learning over in-person classes. If you randomly surveyed 50 students, you might find that 62% prefer online learning. But if your friend surveyed a different random sample of 50 students, they might find 58%. A third sample might give 65%.
None of these samples are "wrong"—they're just different. This natural fluctuation in sample statistics from one sample to another is called sampling variability.
| Sample | Number Surveyed | Proportion Preferring Online |
|---|---|---|
| Sample 1 | 50 | 0.62 |
| Sample 2 | 50 | 0.58 |
| Sample 3 | 50 | 0.65 |
| Sample 4 | 50 | 0.60 |
| Sample 5 | 50 | 0.56 |
Sylvia Says
Don't worry—every statistician drops an acorn sometimes. Sampling variability isn't a problem to fix; it's a reality to understand and quantify. Once we know how much our estimates typically vary, we can account for that uncertainty in our conclusions.
The key insight is that while individual samples vary, the pattern of this variation is predictable. If we could take thousands of samples and calculate a statistic from each one, the distribution of those statistics would follow a recognizable pattern. This pattern is called a sampling distribution.
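To make this concrete, here is a minimal simulation sketch in Python (NumPy is assumed to be available; the true proportion of 0.6 is a made-up value for illustration). Each pass through the loop plays the role of one survey like those in the table above:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

p_true = 0.6   # hypothetical true proportion preferring online learning
n = 50         # students surveyed per sample

# Draw 5 independent random samples and record each sample proportion.
for i in range(5):
    sample = rng.random(n) < p_true   # True = prefers online learning
    print(f"Sample {i + 1}: p-hat = {sample.mean():.2f}")
```

Run it a few times with different seeds and you'll see exactly the kind of sample-to-sample wobble shown in the table: different numbers every time, all hovering around the truth.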
What Is a Sampling Distribution?
A sampling distribution is the distribution of a statistic (like the sample mean or sample proportion) calculated from all possible samples of the same size from a population.
Here's how to think about it conceptually:
- Imagine taking every possible random sample of size \( n \) from a population
- Calculate your statistic (mean, proportion, etc.) for each sample
- Create a distribution of all those statistics
The resulting distribution shows us:
- The center: What value does the statistic typically cluster around?
- The spread: How much do sample statistics vary?
- The shape: Is the distribution normal, skewed, or something else?
Now, in reality, we only take one sample. But understanding the theoretical sampling distribution helps us know how reliable our single sample is likely to be.
Diagram: Sampling Distribution Concept Visualization
Sampling Distribution Concept Visualization
Type: microsim
Learning Objective: Understand (L2) how individual samples combine to form a sampling distribution; explain how the distribution of sample statistics differs from the distribution of individual data points
Bloom Taxonomy Level: Understand (L2)
Bloom Taxonomy Verb: explain, demonstrate
Purpose: Demonstrate visually how taking many samples and calculating a statistic from each creates a predictable sampling distribution, even when individual samples vary
Canvas layout:
- Drawing area (full width): Population visualization at top, samples in middle, sampling distribution histogram at bottom
Visual elements:
- Top section: A large population represented as colored dots (blue and orange for a proportion, or a distribution for means)
- Middle section: Animation showing samples being drawn, with sample statistic calculated
- Bottom section: Histogram building as each sample statistic is added
- Counter showing number of samples taken
- Display of current sample statistic
Interactive controls:
- Button: "Take 1 Sample" - draws one sample and adds statistic to histogram
- Button: "Take 10 Samples" - rapidly draws 10 samples
- Button: "Take 100 Samples" - rapidly draws 100 samples
- Button: "Reset" - clears the histogram and starts over
- Slider: Sample size n (10, 25, 50, 100)
- Display: Mean and standard deviation of the sampling distribution
Default parameters:
- Population proportion: 0.6 (or mean = 100, SD = 15 for means)
- Sample size: 25
- Animation speed: moderate
Behavior:
- When sample is taken, show dots being selected from population
- Calculate sample statistic and briefly display it
- Add statistic to histogram with animation
- Update running mean and SD of sampling distribution
- As more samples accumulate, show histogram becoming smoother and more normal-shaped
Instructional Rationale: Step-through demonstration is appropriate because students need to see the process of sampling and statistic calculation repeatedly to understand that the sampling distribution emerges from aggregating many sample statistics. The ability to control pace (1, 10, or 100 samples) lets students observe both individual variation and the overall pattern.
Implementation: p5.js with responsive canvas sizing
Sample Proportion and Its Sampling Distribution
When we're dealing with categorical data, we often want to estimate the proportion of a population that has a certain characteristic. The sample proportion, denoted \( \hat{p} \) (read as "p-hat"), is calculated as:

\( \hat{p} = \frac{\text{number of successes}}{n} \)

where \( n \) is the sample size and "success" refers to the outcome we're counting.

For example, if you survey 200 students and 124 say they prefer morning classes, then:

\( \hat{p} = \frac{124}{200} = 0.62 \)
The Sampling Distribution of the Sample Proportion
Now here's where it gets exciting! If we could take many, many samples of size \( n \) from a population with true proportion \( p \), the sampling distribution of \( \hat{p} \) has these remarkable properties:
Center (Mean of \( \hat{p} \)):

\( \mu_{\hat{p}} = p \)

The mean of the sampling distribution equals the population proportion! This tells us that \( \hat{p} \) is an unbiased estimator of \( p \)—on average, our sample proportion hits the target.

Spread (Standard Deviation of \( \hat{p} \)):

\( \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \)

This formula reveals something profound: larger samples have smaller standard deviations, meaning less variability in our estimates. This is why pollsters survey 1,000 people instead of 100!
| Sample Size \( n \) | Standard Deviation (when \( p = 0.5 \)) |
|---|---|
| 25 | 0.100 |
| 100 | 0.050 |
| 400 | 0.025 |
| 1,600 | 0.0125 |
Notice the pattern: to cut the standard deviation in half, you need to quadruple the sample size. This is because of the square root in the formula.
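As a quick check on the table above, here is a short Python sketch that evaluates the formula \( \sqrt{p(1-p)/n} \) at \( p = 0.5 \) for each sample size:

```python
import math

p = 0.5
for n in [25, 100, 400, 1600]:
    se = math.sqrt(p * (1 - p) / n)   # SD of p-hat
    print(f"n = {n:>4}: SD of p-hat = {se:.4f}")
```

The output reproduces the table exactly: 0.1000, 0.0500, 0.0250, 0.0125. Each quadrupling of \( n \) cuts the standard deviation in half.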
Conditions for Using the Proportion Formula
The standard deviation formula works when these conditions are met:
- Independence Condition: Individual observations must be independent. This is satisfied when sampling is random and, if sampling without replacement, the sample size is less than 10% of the population (the "10% condition").
- Large Counts Condition: Both \( np \geq 10 \) and \( n(1-p) \geq 10 \). This ensures enough successes and failures for the normal approximation to work.
Why the 10% Rule?
When we sample without replacement, each selection affects the probability of subsequent selections. However, when the sample is less than 10% of the population, this effect is negligible, and we can treat selections as approximately independent.
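If you'd like to see the 10% condition at work, here is a small simulation sketch (NumPy assumed; the population of 10,000 people and the 60% success rate are made-up values). Because the sample is only 1% of the population, sampling without replacement behaves almost exactly as the with-replacement formula predicts:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Hypothetical finite population: 10,000 people, 60% "successes".
N, p = 10_000, 0.6
population = np.r_[np.ones(int(N * p)), np.zeros(N - int(N * p))]

n = 100   # just 1% of the population (well under 10%)
p_hats = [rng.choice(population, size=n, replace=False).mean()
          for _ in range(5_000)]

print(f"Simulated SD of p-hat:  {np.std(p_hats):.4f}")
print(f"Formula sqrt(p(1-p)/n): {np.sqrt(p * (1 - p) / n):.4f}")
```

The two numbers come out essentially identical, which is exactly why we can treat such selections as approximately independent.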
Sample Mean and Its Sampling Distribution
When working with quantitative data, we often focus on the sample mean, denoted \( \bar{x} \) (read as "x-bar"):

\( \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n} \)
Just like the sample proportion, the sample mean has a sampling distribution with predictable properties.
Properties of the Sampling Distribution of the Sample Mean
Center (Mean of \( \bar{x} \)):

\( \mu_{\bar{x}} = \mu \)

The mean of the sampling distribution equals the population mean. Like \( \hat{p} \), the sample mean \( \bar{x} \) is an unbiased estimator.

Spread (Standard Deviation of \( \bar{x} \)):

\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)

This quantity is sometimes called the standard error of the mean. Again, we see that larger samples lead to less variability—the square root of \( n \) appears in the denominator.
Let me give you a concrete example. Suppose SAT math scores have a population mean \( \mu = 500 \) and standard deviation \( \sigma = 100 \).
| Sample Size | Standard Deviation of \( \bar{x} \) |
|---|---|
| 1 | 100 |
| 25 | 20 |
| 100 | 10 |
| 400 | 5 |
With one person, your "sample mean" is just their score—wildly variable. But with 400 people, your sample mean will typically be within about 10 points (two standard errors) of the true mean.
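Here is a simulation sketch checking the SAT table above (NumPy assumed; the simulation happens to draw scores from a normal population with \( \mu = 500 \) and \( \sigma = 100 \), though the standard error formula holds for any population shape):

```python
import numpy as np

rng = np.random.default_rng(seed=3)
mu, sigma = 500, 100   # SAT math population mean and SD

for n in [25, 100, 400]:
    # 10,000 simulated samples of size n; keep each sample's mean.
    means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:>3}: simulated SD of x-bar = {means.std():5.2f}, "
          f"formula sigma/sqrt(n) = {sigma / np.sqrt(n):5.2f}")
```

The simulated values land right on the table's 20, 10, and 5.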
Diagram: Standard Error and Sample Size Explorer
Standard Error and Sample Size Explorer
Type: microsim
Learning Objective: Apply (L3) the relationship between sample size and standard error; demonstrate how increasing sample size reduces the variability of the sampling distribution
Bloom Taxonomy Level: Apply (L3)
Bloom Taxonomy Verb: demonstrate, calculate, apply
Purpose: Allow students to interactively explore how sample size affects the spread of sampling distributions for both proportions and means
Canvas layout:
- Left side (60%): Graph showing sampling distribution curve
- Right side (40%): Control panel and calculations
Visual elements:
- Normal curve representing the sampling distribution
- Shaded region showing one standard error on each side of the mean
- Numerical display of standard error calculation
- Formula display with current values substituted
Interactive controls:
- Radio buttons: Choose "Proportion" or "Mean"
- Slider: Sample size n (10 to 500)
- For Proportion mode: Slider for p (0.1 to 0.9)
- For Mean mode: Input for population standard deviation sigma (1 to 100)
- Display: Current standard error with calculation breakdown
Default parameters:
- Mode: Proportion
- p = 0.5
- n = 50
- sigma = 15 (for mean mode)
Behavior:
- As sample size increases, curve becomes narrower (less spread)
- Standard error calculation updates in real-time
- Show the formula with actual numbers substituted
- Highlight that quadrupling n halves the standard error
Instructional Rationale: This apply-level simulation lets students manipulate parameters and immediately see the effect on the sampling distribution. The focus on the formula with substituted values helps connect the abstract formula to concrete calculations.
Implementation: p5.js with responsive canvas sizing
The Central Limit Theorem: The Crown Jewel of Statistics
And now, my tail is absolutely tingling because we've arrived at the most important theorem in statistics: the Central Limit Theorem (CLT).
Here's the amazing claim:
The Central Limit Theorem
For a random sample of size \( n \) from any population with mean \( \mu \) and standard deviation \( \sigma \), the sampling distribution of the sample mean \( \bar{x} \) becomes approximately normal as \( n \) gets larger, regardless of the shape of the population distribution.
Let that sink in. The population could be skewed, bimodal, uniform, or any shape whatsoever. It doesn't matter! If you take large enough samples, the distribution of sample means will be approximately normal.
This is genuinely remarkable. It's like saying, "No matter how chaotic the ingredients, the cake always comes out the same shape."
Why the CLT Matters
The Central Limit Theorem is why we can:
- Use normal probability calculations for inference
- Construct confidence intervals
- Perform hypothesis tests
- Make predictions about sample statistics
Without the CLT, we'd need to know the exact shape of every population we study—an impossible task!
CLT Conditions: When Does It Apply?
The Central Limit Theorem works when:
- Random sampling: Data must come from a random sample or randomized experiment
- Independence: Observations must be independent (10% condition for sampling without replacement)
- Sample size: \( n \) must be "large enough"
What counts as "large enough"? Here's a practical guide:
| Population Shape | Minimum Sample Size |
|---|---|
| Already normal | Any size works |
| Slightly skewed | \( n \geq 15 \) |
| Moderately skewed | \( n \geq 25 \) |
| Heavily skewed | \( n \geq 40 \) |
| Extremely skewed or outliers | May need \( n \geq 100 \) |
The more skewed or unusual the population, the larger the sample you need for the sampling distribution to become approximately normal.
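You can watch the CLT kick in without any graphics at all. The sketch below (NumPy assumed; the exponential population is just one convenient example of a heavily right-skewed shape) draws many sample means and measures their skewness; values near zero indicate a roughly symmetric, normal-looking sampling distribution:

```python
import numpy as np

rng = np.random.default_rng(seed=4)

for n in [1, 5, 40]:
    # 20,000 samples of size n from a right-skewed exponential population.
    means = rng.exponential(scale=1.0, size=(20_000, n)).mean(axis=1)
    z = (means - means.mean()) / means.std()
    skewness = (z ** 3).mean()   # ~2 for the raw population, ~0 if normal
    print(f"n = {n:>2}: skewness of sample means = {skewness:+.2f}")
```

With \( n = 1 \) the sampling distribution is just as skewed as the population; by \( n = 40 \) the skewness has mostly melted away, matching the guideline in the table above.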
Diagram: Central Limit Theorem Demonstration
Central Limit Theorem Demonstration
Type: microsim
Learning Objective: Analyze (L4) how the Central Limit Theorem transforms non-normal population distributions into normal sampling distributions as sample size increases; compare sampling distributions across different population shapes
Bloom Taxonomy Level: Analyze (L4)
Bloom Taxonomy Verb: compare, examine, differentiate
Purpose: Provide visual, interactive proof of the Central Limit Theorem by showing how sampling distributions become normal regardless of the population shape
Canvas layout:
- Top section (30%): Population distribution display
- Middle section (50%): Sampling distribution histogram that builds dynamically
- Bottom section (20%): Controls and statistics
Visual elements:
- Population distribution shown as histogram or smooth curve
- Building histogram of sample means
- Overlaid normal curve on sampling distribution for comparison
- Statistics panel showing mean and SD of sampling distribution
- Comparison of theoretical vs observed values
Interactive controls:
- Dropdown: Population shape (Uniform, Skewed Right, Skewed Left, Bimodal, Normal, U-shaped)
- Slider: Sample size n (1, 2, 5, 10, 25, 50, 100)
- Button: "Take 1 Sample"
- Button: "Take 100 Samples"
- Button: "Take 1000 Samples"
- Button: "Reset"
- Toggle: Show/hide normal overlay curve
Default parameters:
- Population: Skewed Right
- Sample size: n = 1
- Show normal overlay: On
Behavior:
- When n = 1, sampling distribution matches population shape
- As n increases, sampling distribution becomes more bell-shaped
- Normal overlay helps students see how close to normal the distribution is
- Display theoretical mean and SD alongside observed values
- Color coding: population in one color, sampling distribution in another
Data Visibility Requirements:
- Stage 1: Show population distribution clearly labeled
- Stage 2: Show individual sample being drawn (highlight selected values)
- Stage 3: Show calculation of sample mean for that sample
- Stage 4: Show sample mean being added to histogram
- Stage 5: As samples accumulate, show histogram converging to normal
Instructional Rationale: The power of this simulation is letting students see the CLT in action. Starting with n=1 (where sampling distribution matches population) and gradually increasing n shows the transformation process. The analyze-level objective is supported by comparing different population shapes and sample sizes.
Implementation: p5.js with responsive canvas sizing; pre-defined population distributions
The CLT for Proportions
The Central Limit Theorem also applies to sample proportions! When the conditions are met, the sampling distribution of \( \hat{p} \) is approximately normal with:
- Mean: \( \mu_{\hat{p}} = p \)
- Standard deviation: \( \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \)
The conditions for the normal approximation for proportions are:
- Random sample
- Independence (10% condition)
- Large counts: \( np \geq 10 \) and \( n(1-p) \geq 10 \)
The large counts condition ensures we have enough "successes" and "failures" for the normal approximation to work well.
Normal Approximation in Practice
Once we know a sampling distribution is approximately normal, we can use z-scores to find probabilities! This is called the normal approximation.
For Sample Proportions
If the conditions are met, we can calculate:

\( z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \)

This z-score tells us how many standard errors our sample proportion is from the population proportion.

For Sample Means

Similarly, for sample means:

\( z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \)
Worked Example: Polling for an Election
Let's work through a realistic example. Suppose 52% of voters in a large city support a ballot measure. A polling organization surveys a random sample of 400 voters.
Question: What's the probability that the sample proportion supporting the measure is less than 50%?
Step 1: Check conditions
- Random sample: Given
- Independence: 400 is less than 10% of the city's voters
- Large counts: \( np = 400(0.52) = 208 \geq 10 \) and \( n(1-p) = 400(0.48) = 192 \geq 10 \)
Step 2: Find mean and standard deviation of sampling distribution

\( \mu_{\hat{p}} = 0.52 \) and \( \sigma_{\hat{p}} = \sqrt{\frac{0.52 \times 0.48}{400}} \approx 0.025 \)

Step 3: Calculate z-score

\( z = \frac{0.50 - 0.52}{0.025} = -0.80 \)
Step 4: Find probability using normal distribution
Using a z-table or calculator: \( P(Z < -0.80) \approx 0.212 \)
Interpretation: Even though the true proportion is 52%, there's about a 21% chance that a random sample of 400 voters will show less than 50% support. This illustrates why close elections are so hard to predict!
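Here is the same calculation as a Python sketch, using only the standard library's NormalDist, so the numbers above are easy to double-check:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.52, 400
se = sqrt(p * (1 - p) / n)        # Step 2: SD of the sampling distribution
z = (0.50 - p) / se               # Step 3: z-score for p-hat = 0.50

print(f"SE = {se:.4f}")                                # ~0.0250
print(f"z  = {z:.2f}")                                 # ~-0.80
print(f"P(p-hat < 0.50) = {NormalDist().cdf(z):.3f}")  # ~0.212
```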
Sylvia Says
Acorn for your thoughts? This example shows why sample size matters. If the pollster surveyed 1,600 voters instead of 400, the standard error would be half as large (0.0125 instead of 0.025), making it much less likely to get a misleading result.
Statistical Inference: Connecting Samples to Populations
Everything we've learned in this chapter builds toward statistical inference—the process of using sample data to make conclusions about population parameters.
There are two main types of statistical inference:
- Confidence Intervals: Estimating a parameter with a range of plausible values
- Hypothesis Testing: Evaluating claims about a parameter
Both rely critically on understanding sampling distributions! Here's why:
- When we construct a confidence interval, we need to know how much sample statistics typically vary (the standard error)
- When we perform a hypothesis test, we need to know how likely our observed result would be if the null hypothesis were true (requires knowing the sampling distribution)
The sampling distribution is the bridge between our single sample and the population we're trying to understand.
Diagram: Statistical Inference Workflow
Statistical Inference Workflow
Type: infographic
Learning Objective: Understand (L2) how sampling distributions enable statistical inference; explain the connection between sample statistics, sampling distributions, and population parameters
Bloom Taxonomy Level: Understand (L2)
Bloom Taxonomy Verb: explain, summarize
Purpose: Provide a visual roadmap showing how sample data, sampling distributions, and population parameters connect in the inference process
Layout: Flowchart style with three main stages arranged vertically or horizontally
Components:

1. Population box (top or left):
   - Label: "Population"
   - Contains parameter symbols (p or mu)
   - Text: "Unknown - what we want to learn about"
   - Color: Green (Sylvia's cardigan color)
2. Sample box (middle):
   - Label: "Sample (n)"
   - Shows data collection arrow from population
   - Contains statistic symbols (p-hat or x-bar)
   - Text: "Observed - what we can measure"
   - Color: Auburn (Sylvia's fur color)
3. Sampling Distribution box (central, larger):
   - Shows normal curve
   - Labels for mean and standard error
   - Text: "Theoretical - describes how statistics vary"
   - Connects sample to population via CLT
   - Color: Light green background
4. Inference arrows:
   - Confidence Interval arrow: points from sample toward population
   - Hypothesis Test arrow: compares sample to claimed population value

Interactive elements:
- Hover over each component to reveal detailed explanation
- Click on "Confidence Interval" to highlight that path
- Click on "Hypothesis Test" to highlight that path
- Tooltips explain each concept
Color scheme: Use Sylvia color palette (green, auburn, cream)
Implementation: HTML/CSS/JavaScript with SVG elements and interactive hover/click events
Sample Size Determination for Confidence Intervals
Looking ahead to confidence intervals (coming up in the next chapter!), the sample size directly affects how precise our estimates can be. The margin of error \( E \) in a confidence interval depends on the standard error:

\( E = z^* \times (\text{standard error}) \)

Since the standard error decreases as \( n \) increases, larger samples give smaller margins of error and more precise estimates.
Sample Size Formula for Proportions
If you want a specific margin of error \( E \) for a proportion at a given confidence level, you can solve for the required sample size:

\( n = \left(\frac{z^*}{E}\right)^2 p(1-p) \)

If you don't know \( p \) in advance, use \( p = 0.5 \) (which maximizes the sample size, giving a conservative estimate).
Example: How many people should you survey to estimate a proportion within 3 percentage points (E = 0.03) with 95% confidence?
Using \( z^* = 1.96 \) for 95% confidence and \( p = 0.5 \):

\( n = \left(\frac{1.96}{0.03}\right)^2 (0.5)(0.5) \approx 1067.1 \)

Rounding up, you'd need 1,068 people to achieve that precision.
| Desired Margin of Error | Sample Size Needed (95% confidence) |
|---|---|
| 5% (0.05) | 385 |
| 4% (0.04) | 601 |
| 3% (0.03) | 1,068 |
| 2% (0.02) | 2,401 |
| 1% (0.01) | 9,604 |
This table explains why national polls typically survey around 1,000-1,500 people—that's enough for a margin of error around 3%, which is acceptable for most purposes.
Cost-Benefit of Larger Samples
Notice the diminishing returns: going from 3% to 2% margin of error requires more than doubling your sample size (from ~1,000 to ~2,400). At some point, the extra precision isn't worth the extra cost!
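The whole sample-size table above can be reproduced with a few lines of Python (standard library only); note the rounding up at the end, since you always round sample sizes up:

```python
from math import ceil

z_star, p = 1.96, 0.5   # 95% confidence, conservative planning value
for E in [0.05, 0.04, 0.03, 0.02, 0.01]:
    n = (z_star / E) ** 2 * p * (1 - p)
    # Round to 6 decimals first to dodge floating-point fuzz, then round up.
    print(f"E = {E:.2f}: n = {n:9.2f} -> survey {ceil(round(n, 6))}")
```

Printing the raw values alongside the rounded-up ones also makes the diminishing returns obvious at a glance.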
Putting It All Together: A Complete Example
Let's work through a comprehensive example that uses everything from this chapter.
Scenario: A factory produces light bulbs with a mean lifetime of 1,200 hours and standard deviation of 100 hours. Quality control randomly selects 64 bulbs to test.
Question 1: What are the mean and standard deviation of the sampling distribution of the sample mean lifetime?

\( \mu_{\bar{x}} = 1200 \) hours and \( \sigma_{\bar{x}} = \frac{100}{\sqrt{64}} = \frac{100}{8} = 12.5 \) hours
Question 2: Can we assume the sampling distribution is approximately normal? Why?
Yes! By the Central Limit Theorem, with \( n = 64 \), the sampling distribution of \( \bar{x} \) will be approximately normal regardless of the population shape. Even if individual bulb lifetimes aren't normally distributed, the mean of 64 bulbs will be.
Question 3: What's the probability that the sample mean is between 1,190 and 1,210 hours?
First, calculate z-scores:

\( z_1 = \frac{1190 - 1200}{12.5} = -0.80 \) and \( z_2 = \frac{1210 - 1200}{12.5} = 0.80 \)

Using normal probability:

\( P(-0.80 < Z < 0.80) \approx 0.788 - 0.212 = 0.576 \)
There's about a 58% chance the sample mean falls within 10 hours of the true mean.
Question 4: What's the probability that the sample mean is less than 1,175 hours?

\( z = \frac{1175 - 1200}{12.5} = -2.00 \), so \( P(Z < -2.00) \approx 0.023 \)

Only about a 2.3% chance! If the sample mean actually came out this low, quality control might suspect a problem with the production process.
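As with the polling example, a standard-library Python sketch can confirm these answers:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1200, 100, 64
se = sigma / sqrt(n)               # Question 1: standard error = 12.5 hours
Z = NormalDist()

between = Z.cdf((1210 - mu) / se) - Z.cdf((1190 - mu) / se)
below = Z.cdf((1175 - mu) / se)

print(f"SE = {se} hours")
print(f"P(1190 < x-bar < 1210) = {between:.3f}")   # ~0.576
print(f"P(x-bar < 1175)        = {below:.3f}")     # ~0.023
```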
Diagram: Light Bulb Sampling Distribution Calculator
Sampling Distribution Calculator
Type: microsim
Learning Objective: Apply (L3) sampling distribution concepts to calculate probabilities; practice using z-scores with sampling distributions
Bloom Taxonomy Level: Apply (L3)
Bloom Taxonomy Verb: calculate, solve, apply
Purpose: Interactive calculator for finding probabilities involving sample means or proportions, with visual feedback
Canvas layout:
- Top section (30%): Input parameters
- Middle section (50%): Normal curve with shaded region
- Bottom section (20%): Results and calculations
Visual elements:
- Normal distribution curve representing the sampling distribution
- Shaded region showing the probability being calculated
- Labels for mean and standard error on the curve
- Step-by-step calculation display
Interactive controls:
- Radio buttons: "Sample Mean" or "Sample Proportion"
- For Mean: Inputs for population mean (mu), population SD (sigma), sample size (n)
- For Proportion: Inputs for population proportion (p), sample size (n)
- Dropdown: Probability type (less than, greater than, between)
- Input fields: Cutoff value(s)
- Button: "Calculate"
Default parameters:
- Mode: Sample Mean
- mu = 1200
- sigma = 100
- n = 64
- Probability: less than 1175
Behavior:
- Display calculation steps: "Step 1: Find standard error", "Step 2: Calculate z-score", "Step 3: Find probability"
- Show the formula with values substituted
- Animate the shaded region on the curve
- Display final probability prominently
Instructional Rationale: This apply-level tool provides scaffolded practice for probability calculations. Showing each step explicitly helps students learn the process while getting immediate feedback on their understanding.
Implementation: p5.js with responsive canvas sizing
Common Misconceptions to Avoid
Before we wrap up, let's address some common mistakes students make with sampling distributions. Even experienced squirrels slip up sometimes!
Misconception 1: "A larger sample means a larger standard deviation."
Reality: It's the opposite! Larger samples give smaller standard errors because there's less variability in the sampling distribution.
Misconception 2: "The Central Limit Theorem says the population becomes normal."
Reality: The CLT says the sampling distribution of the mean becomes normal. The population's shape doesn't change at all.
Misconception 3: "We need the population to be normal for inference to work."
Reality: Thanks to the CLT, we only need a large enough sample. The population can be any shape.
Misconception 4: "The standard deviation and standard error are the same thing."

Reality: The standard deviation measures spread of individual values. The standard error measures spread of sample statistics. They're related but different:

\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)
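One way to internalize the difference is to simulate it (NumPy assumed; the IQ-style population with mean 100 and SD 15 is a made-up example). The spread of individual values stays near \( \sigma \), while the spread of sample means shrinks to \( \sigma/\sqrt{n} \):

```python
import numpy as np

rng = np.random.default_rng(seed=5)
mu, sigma, n = 100, 15, 25

samples = rng.normal(mu, sigma, size=(10_000, n))
print(f"SD of individual values: {samples.std():.1f}")               # ~15
print(f"SD of sample means (SE): {samples.mean(axis=1).std():.1f}")  # ~3
```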
Misconception 5: "A sample proportion of 0.60 means the population proportion is 0.60."
Reality: The sample proportion is an estimate of the population proportion. Due to sampling variability, it may not equal the true value exactly.
Summary: Time to Squirrel Away This Knowledge!
Congratulations! You've just learned some of the most foundational concepts in statistical inference. Let's recap the big ideas:
Key Concepts:
- Sampling variability is the natural variation in statistics from sample to sample
- A sampling distribution shows the pattern of how a statistic varies across all possible samples
- The sample proportion \( \hat{p} \) and sample mean \( \bar{x} \) are unbiased estimators of their population parameters
- The standard error measures variability in the sampling distribution and decreases as sample size increases
- The Central Limit Theorem states that sampling distributions of means become approximately normal for large samples, regardless of population shape
- For proportions, the normal approximation requires the large counts condition (\( np \geq 10 \) and \( n(1-p) \geq 10 \))
- Sampling distributions are the foundation for statistical inference—confidence intervals and hypothesis tests
Key Formulas:
| Statistic | Mean of Sampling Distribution | Standard Error |
|---|---|---|
| \( \hat{p} \) | \( \mu_{\hat{p}} = p \) | \( \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \) |
| \( \bar{x} \) | \( \mu_{\bar{x}} = \mu \) | \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \) |
The Big Picture:
Sampling distributions connect what we observe (sample statistics) to what we want to know (population parameters). The CLT guarantees that with large enough samples, we can use normal probability calculations to quantify our uncertainty and make inferences about populations.
You've Got This!
Look at you go! You just learned something that took statisticians centuries to figure out. The Central Limit Theorem is one of the most beautiful results in mathematics, and now you understand why it's so powerful. My tail's puffing up with pride!
Practice Problems
Problem 1: Checking Conditions
A researcher wants to estimate the proportion of households in a city that have solar panels. She plans to survey 150 randomly selected households.
a) If the true proportion is 0.08, can she use the normal approximation for the sampling distribution of \( \hat{p} \)?
b) What if the true proportion were 0.15?
Click to reveal answer
a) Check: \( np = 150(0.08) = 12 \geq 10 \) ✓ and \( n(1-p) = 150(0.92) = 138 \geq 10 \) ✓. Yes, conditions are met!
b) \( np = 150(0.15) = 22.5 \geq 10 \) ✓ and \( n(1-p) = 150(0.85) = 127.5 \geq 10 \) ✓. Yes, conditions are also met.
Problem 2: Sample Mean Probabilities
The weights of apples from an orchard have a mean of 150 grams and standard deviation of 25 grams. A sample of 100 apples is selected.
a) What are the mean and standard error of the sampling distribution of \( \bar{x} \)?
b) What is the probability that the sample mean weight is greater than 155 grams?
Click to reveal answer
a) Mean: \( \mu_{\bar{x}} = 150 \) grams; Standard error: \( \sigma_{\bar{x}} = \frac{25}{\sqrt{100}} = 2.5 \) grams
b) \( z = \frac{155 - 150}{2.5} = 2.0 \); \( P(Z > 2.0) = 1 - 0.977 = 0.023 \), or about 2.3%
Problem 3: Sample Size Calculation
A political campaign wants to estimate support for their candidate within 2 percentage points (margin of error) with 95% confidence. How many voters should they survey? (Use \( p = 0.5 \) for planning.)
Click to reveal answer
Using the formula: \( n = \left(\frac{1.96}{0.02}\right)^2 \times 0.5(0.5) = (98)^2 \times 0.25 = 9604 \times 0.25 = 2401 \)
They need to survey about 2,401 voters.
Problem 4: Understanding the CLT
Explain why a marketing researcher can use normal probability calculations to find the probability that the sample mean income of 50 randomly selected customers exceeds $60,000, even though income distributions are typically right-skewed.
Click to reveal answer
By the Central Limit Theorem, the sampling distribution of the sample mean becomes approximately normal for sufficiently large samples (per this chapter's guidelines, \( n \geq 40 \) even for heavily skewed populations), regardless of the shape of the population distribution. Since \( n = 50 \), the sampling distribution of the sample mean income will be approximately normal, even though individual incomes are right-skewed. This allows us to use z-scores and the normal distribution to calculate probabilities.
Looking Ahead
In the next chapter, we'll put sampling distributions to work! We'll learn how to construct confidence intervals—ranges of plausible values for population parameters based on sample data. Everything we learned about standard errors and the normal approximation will be essential.
Until then, keep collecting those data points, and remember—every sample tells a story about the population it came from!