Testing Means: the t-test (One-Sample / Two-Sample: Student's and Welch's / Paired)
Table of Contents
- Why test means? (uncertainty and variability)
- What mean testing is, and how to think about it
- Assumptions (normality, equal variance)
- Types and worked examples
Intro — Starting from a Question
If we compare two groups' means and the numbers differ, are they really different? Couldn't the sample means differ merely due to variability (uncertainty)? You must grasp this question to understand mean testing, and hypothesis testing more broadly.
Figure 1. Two independent groups (same mean and variance)
Suppose that, to compare the effects of drugs #1 and #2, we give each drug to a group with the same makeup. In reality, the two drugs have identical mean effects and variances. Yet when we sample and observe, drug #1's mean comes out at ① and drug #2's at ②. Can we conclude the effects differ just because the sample means differ? No: here the effects are equal, and variability alone made the sample means come out differently.
Figure 2. Significance level 5%: adding a drug #3 example
So the heart of mean testing is judging whether an observed mean (or mean difference) is an ordinary result that variability alone produces 95% of the time, or an extreme result in the remaining 5% that variability alone rarely produces. Figure 2 adds the distribution of X3: intuitively, X1 and X2 share the same mean while X3's differs.
$$\bar{X}_1 \sim (\mu_1,\ \mathrm{std}_1) \qquad \bar{X}_2 \sim (\mu_2,\ \mathrm{std}_2)$$
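$$\Downarrow \quad \text{(independent groups, under } H_0: \mu_1 = \mu_2\text{)}$$
$$\bar{X}_1 - \bar{X}_2 \sim \left(0,\ \mathrm{std}_{\text{combined}}\right), \qquad \mathrm{std}_{\text{combined}} = \sqrt{\mathrm{std}_1^2 + \mathrm{std}_2^2}$$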
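Laying out every distribution and eyeballing whether two means are equal quickly becomes impractical. Mean testing instead reduces the comparison to a single test statistic (the observed mean difference divided by its standard error) and checks whether that value falls in the ordinary 95% or the extreme 5% of its reference distribution.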
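The 95%/5% idea can be checked with a small simulation. This is a minimal sketch under illustrative assumptions (group size 20, population mean 100, σ = 10 known, fixed seed; `simulate_mean_gaps` is a hypothetical name, not the drug data above): both groups are drawn from the same normal population, so every gap between their sample means is pure variability. The gap should center on 0 with standard deviation √(σ²/n + σ²/n), and only about 5% of gaps should land beyond ±1.96 of those units.

```python
import math
import random
import statistics


def simulate_mean_gaps(trials=10_000, n=20, mu=100.0, sigma=10.0, seed=0):
    """Draw two groups of size n from the SAME normal population, many times.

    Returns the list of mean gaps (x̄1 - x̄2), the share of gaps beyond
    ±1.96 standard errors, and the theoretical standard error of the gap.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    se_gap = math.sqrt(sigma**2 / n + sigma**2 / n)  # std of x̄1 - x̄2 (the "combined" std)
    gaps = []
    extreme = 0
    for _ in range(trials):
        mean1 = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        mean2 = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        gap = mean1 - mean2
        gaps.append(gap)
        if abs(gap) > 1.96 * se_gap:  # lands in the extreme 5% (both tails together)
            extreme += 1
    return gaps, extreme / trials, se_gap


gaps, share_extreme, se_gap = simulate_mean_gaps()
print(f"mean gap: {statistics.fmean(gaps):.3f}")  # should be near 0: no real difference
print(f"std of gap: {statistics.stdev(gaps):.3f} (theory {se_gap:.3f})")
print(f"share beyond ±1.96 SE: {share_extreme:.3f}")  # should be near 0.05
```

Because σ is treated as known here, the Z cutoff 1.96 is the right one; with an estimated σ the t-distribution takes over, as discussed in the assumptions section.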
(Aside) "What if a sample from distribution #3 lands within the 95% of #1·#2?" Conversely, "what if #1·#2 are actually equal but a sample happens to be found in the 5%?" — if these questions occurred to you, you're ready to understand Type I / Type II errors. (Not this post's topic, so omitted.)
Body
1. What is mean testing?
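A technique for judging whether a sample mean differs significantly from a reference value, or whether two groups' means differ significantly from each other. A non-significant result only means we cannot conclude there is a difference; it does not prove the means are equal.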
2. Assumptions
Assumptions are the conditions under which the theory behind a test holds; which technique you should use depends on whether they are met.
2.1 Normality assumption
The t-test is performed under the assumption that the population follows a normal distribution. This matters especially when the sample is small.
Reference: as the degrees of freedom (df) grow with the sample size, the t-distribution approaches the Z (standard normal) distribution; at df = 30 it is already close to Z.
- If the sample is large enough (a common rule of thumb is $n>30$), the Central Limit Theorem makes the sample mean approximately normal, so you can use the t-test even if the population isn't perfectly normal.
Q. If the CLT makes the sample mean normal, why use the t-distribution instead of the Z distribution?
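A. Using the Z distribution requires the population standard deviation σ, which we usually don't know. Estimating it with the sample standard deviation s adds extra uncertainty, and the t-distribution's heavier tails account for that. As the degrees of freedom grow, s becomes a reliable estimate of σ and t approaches Z anyway, so using t costs little even for large samples.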
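A quick numerical look at t approaching Z, as a sketch: the probability that |T| exceeds 1.96 (the Z cutoff for a two-sided 5% test) is noticeably above 5% for small df and shrinks toward 0.05 as df grows. `t_two_sided_p` is a hypothetical helper that integrates the t density with Simpson's rule; it is an illustration, not a library routine (in practice `scipy.stats.t.sf` gives t tail areas directly).

```python
import math


def t_two_sided_p(t, df, steps=2000):
    """Two-sided tail area P(|T| >= |t|) for Student's t with df degrees of freedom.

    Numerical sketch: Simpson's rule on the t density over [0, |t|]; steps must be even.
    """
    # log of the density's normalizing constant, via lgamma so large df do not overflow
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    central = (pdf(0.0) + inner + pdf(a)) * h / 3  # ≈ P(0 <= T <= |t|)
    return 2 * (0.5 - central)  # symmetry: P(|T| >= |t|) = 2 * (0.5 - P(0 <= T <= |t|))


z_tail = math.erfc(1.96 / math.sqrt(2))  # P(|Z| >= 1.96), about 0.05
for df in (4, 10, 30, 100, 1000):
    print(f"df={df:>4}: P(|T| >= 1.96) = {t_two_sided_p(1.96, df):.4f}")
print(f"Z      : P(|Z| >= 1.96) = {z_tail:.4f}")
```

Using the Z cutoff with a t statistic at df = 4 would roughly double the intended 5% error rate, which is why small samples need the t-distribution.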
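If the sample is small and clearly non-normal, use a nonparametric test instead, such as the Mann-Whitney U test (in place of the independent two-sample t-test) or the Wilcoxon signed-rank test (in place of the one-sample or paired t-test).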
2.2 Equal-variance (homoscedasticity) assumption
This applies only to the independent two-sample t-test, which compares the means of two mutually independent groups, each with its own variance. (The one-sample test compares one group against a fixed reference value, so there is no second variance to compare; the paired test reduces the two measurements of the same subjects to a single column of differences, so again only one variance is involved.)
Q. What if you run a two-sample t-test knowing the variances differ?
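A. First, interpretation suffers. For example, if a new drug's treatment group has very large variance, then even a significant mean difference comes with an effect that varies greatly between patients: it may not work for some, or may even make them worse, so the result is less reliable and consistent. Second, the pooled (Student's) test assumes a single common variance, so when that is false its false-positive rate can drift away from the nominal 5%, especially when the group sizes differ. Welch's t-test (3.2 b) does not pool the variances, which is why it is used in that case.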
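The statistical cost of ignoring unequal variances can be seen in a small simulation sketch. All settings (group sizes 200 and 50, standard deviations 1 and 3, the seed, and the function names `mean_and_var` and `false_positive_rates`) are illustrative assumptions. Both groups share the same true mean, so every rejection is a false positive; the larger group has the smaller variance, which makes the pooled variance understate the uncertainty of the mean difference. Since both statistics have large degrees of freedom in this setup, the sketch compares them with the normal cutoff 1.96 rather than exact t critical values.

```python
import math
import random


def mean_and_var(xs):
    """Sample mean and unbiased sample variance (n - 1 in the denominator)."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def false_positive_rates(trials=2000, n1=200, n2=50, sd1=1.0, sd2=3.0, seed=1):
    """Share of trials where the pooled (Student's) and Welch statistics exceed 1.96
    while H0 (equal means) is true but the variances differ."""
    rng = random.Random(seed)
    pooled_hits = welch_hits = 0
    for _ in range(trials):
        m1, v1 = mean_and_var([rng.gauss(0.0, sd1) for _ in range(n1)])
        m2, v2 = mean_and_var([rng.gauss(0.0, sd2) for _ in range(n2)])
        diff = m1 - m2
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
        t_pooled = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))
        t_welch = diff / math.sqrt(v1 / n1 + v2 / n2)
        # Large-df approximation: compare both statistics with the normal cutoff 1.96
        pooled_hits += int(abs(t_pooled) > 1.96)
        welch_hits += int(abs(t_welch) > 1.96)
    return pooled_hits / trials, welch_hits / trials


pooled_rate, welch_rate = false_positive_rates()
print(f"Student (pooled) false-positive rate: {pooled_rate:.3f}")
print(f"Welch false-positive rate:            {welch_rate:.3f}")
```

With these settings the pooled rate should land far above 0.05 while Welch's stays close to it. Swapping which group has the larger variance pushes the pooled test the other way (too conservative).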
3. Types
3.1 One-Sample t-test
- Purpose: whether one sample's mean differs significantly from a specific reference value
- Example: whether blood pressure after taking a new drug differs from a reference (120)
| Patient | Blood pressure after dosing (mmHg) |
|---|---|
| 1 | 118 |
| 2 | 121 |
| 3 | 119 |
| 4 | 117 |
| 5 | 120 |
$$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
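The numerator is the gap between the sample mean and the reference value μ₀: the farther it is from 0, the stronger the evidence of a difference; the closer to 0, the weaker. Dividing by the standard error $s/\sqrt{n}$ standardizes the gap, and under the null hypothesis ($\mu = \mu_0$) the statistic follows a t-distribution with $n - 1$ degrees of freedom.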
$$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}} = \dfrac{119 - 120}{1.58 / \sqrt{5}} \approx -1.41$$
One-Sample t-test t-value result
Figure 3. One-Sample t-test hypothesis test (t-distribution)
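Compute the p-value on a t-distribution with $n - 1 = 4$ degrees of freedom. Since the question is "equal or different?", the test is two-sided: at significance level 0.05, each tail gets 0.025. The area at or below t = −1.41 is about 0.115 > 0.025; equivalently, the two-sided p-value (twice that area) is about 0.23 > 0.05. So we cannot reject the null hypothesis: there is no significant difference from 120 mmHg (which is not proof that the mean equals 120).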
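The one-sample calculation above can be reproduced in a few lines. This is a sketch using only the Python standard library; `t_two_sided_p` is a hypothetical helper that integrates the t density numerically with Simpson's rule, included so the block runs on its own. For real analyses, `scipy.stats.ttest_1samp(bp, 120)` returns the t-statistic and the two-sided p-value directly.

```python
import math
import statistics


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|) via Simpson's rule on the t density (steps must be even)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    central = (pdf(0.0) + inner + pdf(a)) * h / 3  # ≈ P(0 <= T <= |t|)
    return 2 * (0.5 - central)


bp = [118, 121, 119, 117, 120]  # blood pressure after dosing (mmHg), from the table
mu0 = 120                       # reference value
n = len(bp)
x_bar = statistics.fmean(bp)    # 119.0
s = statistics.stdev(bp)        # sample standard deviation (n - 1), about 1.58
t = (x_bar - mu0) / (s / math.sqrt(n))
p = t_two_sided_p(t, df=n - 1)
print(f"t = {t:.3f}, df = {n - 1}, two-sided p = {p:.4f}")
```

The unrounded statistic is exactly −√2 ≈ −1.414 here, so the p-value differs slightly from one computed with the rounded −1.41.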
3.2 Independent two-sample t-test
- Purpose: whether two independent groups' means differ significantly
- Example: comparing mean blood pressure of medicated vs unmedicated groups
| Patient | Group | Blood pressure (mmHg) |
|---|---|---|
| 1 | Medicated | 115 |
| 2 | Medicated | 118 |
| 3 | Medicated | 116 |
| 4 | Unmedicated | 122 |
| 5 | Unmedicated | 124 |
It splits into equal-variance and unequal-variance cases.
a) Equal variance met — Student's t-test (Pooled)
$$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p = \sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}$$
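$$\begin{aligned} \bar{x}_1 &= \dfrac{115 + 118 + 116}{3} \approx 116.33, \quad s_1^2 \approx 2.333, \qquad \bar{x}_2 = \dfrac{122 + 124}{2} = 123, \quad s_2^2 = 2 \\[6pt] s_p^2 &= \dfrac{(3-1)(2.333) + (2-1)(2)}{3 + 2 - 2} \approx 2.222 \\[6pt] t &= \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{116.33 - 123}{\sqrt{2.222\left(\dfrac{1}{3} + \dfrac{1}{2}\right)}} \approx -4.90 \end{aligned}$$
Student's t-test t-value result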
Figure 4. Student's t-test hypothesis test (t-distribution)
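With $n_1 + n_2 - 2 = 3$ degrees of freedom and t ≈ −4.9, the two-sided p-value is about 0.016 < 0.05 (equivalently, the area in the lower tail is about 0.008 < 0.025) → reject the null and accept the alternative: the two group means differ.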
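A standard-library sketch of the pooled calculation on the table's data. `t_two_sided_p` is a hypothetical Simpson's-rule helper, included so the block runs on its own. With SciPy, `scipy.stats.ttest_ind(medicated, unmedicated)` (its default `equal_var=True` selects Student's pooled test) computes the same statistic and two-sided p-value.

```python
import math
import statistics


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|) via Simpson's rule on the t density (steps must be even)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    central = (pdf(0.0) + inner + pdf(a)) * h / 3  # ≈ P(0 <= T <= |t|)
    return 2 * (0.5 - central)


medicated = [115, 118, 116]
unmedicated = [122, 124]
n1, n2 = len(medicated), len(unmedicated)
x1, x2 = statistics.fmean(medicated), statistics.fmean(unmedicated)
v1, v2 = statistics.variance(medicated), statistics.variance(unmedicated)  # about 2.333 and 2

sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance, about 2.222
t = (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
p = t_two_sided_p(t, df)
print(f"t = {t:.3f}, df = {df}, two-sided p = {p:.4f}")
```

With these numbers the statistic works out to exactly −√24 ≈ −4.899.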
b) Equal variance not met — Welch's t-test
$$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
$$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{116.33 - 123}{\sqrt{\dfrac{2.333}{3} + \dfrac{2}{2}}} \approx \dfrac{-6.67}{1.333} \approx -5.00$$
Welch's t-test t-value result
Figure 5. Welch's t-test
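Welch's test uses approximate (Welch–Satterthwaite) degrees of freedom, about 2.43 here. With t ≈ −5.00, the two-sided p-value is about 0.025 < 0.05 (the area at or below −5.00 is about 0.013 < 0.025) → reject the null, accept the alternative.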
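Welch's version needs its own degrees of freedom. A common choice, and the one SciPy's `ttest_ind(..., equal_var=False)` uses, is the Welch–Satterthwaite approximation:

$$\nu \approx \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

A standard-library sketch on the same table follows; `t_two_sided_p` is a hypothetical Simpson's-rule helper that works for non-integer df because it uses `math.lgamma`.

```python
import math
import statistics


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|) via Simpson's rule on the t density (steps must be even)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    central = (pdf(0.0) + inner + pdf(a)) * h / 3  # ≈ P(0 <= T <= |t|)
    return 2 * (0.5 - central)


medicated = [115, 118, 116]
unmedicated = [122, 124]
n1, n2 = len(medicated), len(unmedicated)
se1_sq = statistics.variance(medicated) / n1    # s1²/n1, about 0.778
se2_sq = statistics.variance(unmedicated) / n2  # s2²/n2 = 1.0
t = (statistics.fmean(medicated) - statistics.fmean(unmedicated)) / math.sqrt(se1_sq + se2_sq)
# Welch–Satterthwaite degrees of freedom (generally not an integer)
df = (se1_sq + se2_sq) ** 2 / (se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1))
p = t_two_sided_p(t, df)
print(f"t = {t:.3f}, df = {df:.2f}, two-sided p = {p:.4f}")
```

Welch's df (about 2.43) is smaller than Student's 3, so the same size of statistic yields a somewhat larger p-value.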
3.3 Paired-Sample t-test
- Purpose: comparing the mean difference of two paired measurements from the same group
- Example: blood pressure change before/after dosing (before/after a process, before/after a trial, before/after PT, etc.)
| Patient | Before | After |
|---|---|---|
| 1 | 130 | 120 |
| 2 | 128 | 119 |
| 3 | 135 | 125 |
| 4 | 132 | 123 |
| 5 | 129 | 121 |
$$t = \dfrac{\bar{d}}{s_d / \sqrt{n}}$$
$$\begin{aligned} d_i &= [\,130-120,\ 128-119,\ 135-125,\ 132-123,\ 129-121\,] = [\,10,\ 9,\ 10,\ 9,\ 8\,] \\[6pt] \bar{d} &= \dfrac{10 + 9 + 10 + 9 + 8}{5} = 9.2 \\[6pt] s_d &= \sqrt{\dfrac{(10-9.2)^2 + (9-9.2)^2 + (10-9.2)^2 + (9-9.2)^2 + (8-9.2)^2}{5 - 1}} = \sqrt{0.7} \approx 0.837 \\[6pt] t &= \dfrac{\bar{d}}{s_d / \sqrt{n}} = \dfrac{9.2}{0.837 / \sqrt{5}} \approx 24.6 \end{aligned}$$
Paired-Sample t-test t-value process and result
The paired test works only with the differences $d_i$: it is a one-sample t-test of their mean against 0, with $n - 1 = 4$ degrees of freedom.
Figure 6. Paired t-test
With 4 degrees of freedom, a t-value this large gives a two-sided p-value of roughly 2 × 10⁻⁵, far below 0.05 → reject the null, accept the alternative (mean blood pressure differs before and after dosing).
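Since the paired test is a one-sample t-test on the differences against 0, the sketch below reduces the table to `d` and proceeds as in the one-sample case. It uses only the standard library; `t_two_sided_p` is a hypothetical Simpson's-rule helper included so the block is self-contained. With SciPy, `scipy.stats.ttest_rel(before, after)` tests the same before-minus-after differences.

```python
import math
import statistics


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value P(|T| >= |t|) via Simpson's rule on the t density (steps must be even)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    central = (pdf(0.0) + inner + pdf(a)) * h / 3  # ≈ P(0 <= T <= |t|)
    return 2 * (0.5 - central)


before = [130, 128, 135, 132, 129]
after = [120, 119, 125, 123, 121]
d = [pre - post for pre, post in zip(before, after)]  # paired differences: [10, 9, 10, 9, 8]
n = len(d)
d_bar = statistics.fmean(d)   # 9.2
s_d = statistics.stdev(d)     # sqrt(0.7), about 0.837
t = d_bar / (s_d / math.sqrt(n))
p = t_two_sided_p(t, df=n - 1)
print(f"t = {t:.2f}, df = {n - 1}, two-sided p = {p:.2e}")
```

Pairing is what makes the result so strong: the between-patient spread in baseline blood pressure cancels out in the differences, leaving only the small spread of the changes.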
Wrap-up
Mean testing is a technique to judge whether a difference in two group means is due to mere variability or a real difference. We looked at the assumptions (normality, equal variance) and per-type calculations with examples.
📦 Migrated from my own Korean blog (my own writing). Original: taehyuklee.tistory.com/24