Testing Means — the t-test (One / Two: Student's·Welch's / Paired)

Table of Contents

  1. The rationale for testing means (why is it needed? feat. uncertainty/variability)
  2. The definition and understanding of mean testing
  3. Assumptions (normality, equal variance)
  4. Types and worked examples

Intro — Starting from a Question

If we compare two groups' means and the numbers differ, are they really different? Couldn't the sample means differ merely due to variability (uncertainty)? You must grasp this question to understand mean testing, and hypothesis testing more broadly.

Figure 1. Two independent groups (same mean and variance)

Suppose that, to compare drugs #1 and #2, we give each drug to a separate group of similar patients. In truth, the two drugs have the same mean effect and the same variance. Yet when we sample and measure, drug #1's sample mean lands at ① and drug #2's at ②. Can we conclude the effects differ just because the sample means differ? No. Here the effects are equal, and variability alone made the sample means come out differently.

Figure 2. Significance level 5%: adding a drug #3 example

So the heart of mean testing is judging whether a given mean is an ordinary (95%) result from mere variability, or an extreme (5%) result beyond variability. Figure 2 adds the X3 distribution. Intuitively, X1 and X2 have the same mean while X3 differs.

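$$\bar{X}_1 \sim (\mu_1,\ \mathrm{std}_1) \qquad \bar{X}_2 \sim (\mu_2,\ \mathrm{std}_2)$$

$$\Downarrow \quad \text{if } \mu_1 = \mu_2 \text{ and the two groups are independent}$$

$$\bar{X}_1 - \bar{X}_2 \sim \left(0,\ \mathrm{std}_{\text{combined}}\right), \qquad \mathrm{std}_{\text{combined}} = \sqrt{\mathrm{std}_1^2 + \mathrm{std}_2^2}$$

Here std₁ and std₂ are the standard deviations of the sample means (their standard errors); for independent groups the variances add.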

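Laying all the distributions side by side and eyeballing whether two means are equal quickly becomes impractical. Mean testing instead condenses the comparison into a single test statistic (the mean difference divided by its standard deviation) and asks how extreme that value would be if the means were truly equal.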
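To make the "variability alone" idea concrete, here is a small simulation sketch using only the Python standard library. It assumes two hypothetical drugs with exactly the same true effect (mean 10, standard deviation 2, 20 patients per group; all numbers are invented for illustration), draws many pairs of samples, and standardizes each mean difference with std_combined. Even though the means are truly equal, the share of standardized differences outside ±1.96 should come out near 5%: those are the "extreme" results produced purely by chance.

```python
import random
import statistics

# Hypothetical setup: both drugs share the same true mean and SD.
MU, SIGMA, N = 10.0, 2.0, 20
TRIALS = 10_000

rng = random.Random(0)  # fixed seed so the run is reproducible

# SD of the difference of two independent sample means:
# std_combined = sqrt(std_1^2 + std_2^2), where std_i = SIGMA / sqrt(N)
std_combined = (2 * SIGMA**2 / N) ** 0.5

extreme = 0
for _ in range(TRIALS):
    x1 = [rng.gauss(MU, SIGMA) for _ in range(N)]
    x2 = [rng.gauss(MU, SIGMA) for _ in range(N)]
    # Standardized mean difference (SIGMA is known here, so this is a Z value)
    z = (statistics.fmean(x1) - statistics.fmean(x2)) / std_combined
    if abs(z) > 1.96:  # outside the central 95% of N(0, 1)
        extreme += 1

share_extreme = extreme / TRIALS
print(f"share of 'extreme' differences despite equal means: {share_extreme:.3f}")
```

Because SIGMA is treated as known, this sketch uses a Z value; the t-test below exists precisely because in practice SIGMA has to be estimated from the sample.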

(Aside) "What if a sample from distribution #3 happens to land within the 95% range of #1·#2?" Or conversely, "what if #1·#2 really are equal, but a sample happens to fall in the extreme 5%?" If these questions occurred to you, you're ready to understand Type II and Type I errors, respectively. (They are not this post's topic, so I'll skip them.)

Body

1. What is mean testing?

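A technique for judging whether a group's mean differs significantly from a reference value or from another group's mean, or whether the observed gap is small enough to be explained by sampling variability alone.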

2. Assumptions

An assumption is a condition the theory is built on: the test's results are only valid when it holds. Which variant of the t-test you should use depends on whether the assumptions are met.

2.1 Normality assumption

The t-test is performed under the assumption that the population follows a normal distribution. This matters especially when the sample is small.

Reference figure: as the sample size (and hence the degrees of freedom, df) grows, the t-distribution approaches the Z distribution; at df = 30 it is already nearly Z.

  • If the sample is large enough (a common rule of thumb is $n>30$), the Central Limit Theorem makes the sample mean approximately normal, so you can use the t-test even if the population isn't perfectly normal.

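Q. If the CLT makes the sample mean normal, why use the t-distribution instead of the Z distribution?
A. Because the Z test requires the population standard deviation σ, which we almost never know. We substitute the sample standard deviation s, and the extra uncertainty from estimating it makes the standardized mean follow a t-distribution, which has heavier tails than Z. As the degrees of freedom grow, t approaches Z anyway, so using t costs little with large samples.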
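A quick numerical check of the claim that t approaches Z: the sketch below (standard-library Python; `t_pdf` and `z_pdf` are illustrative helper names, not a library API) evaluates both densities on a grid over [−5, 5] and prints the largest gap between them for several df values. The gap should shrink steadily as df grows and already be small at df = 30.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def z_pdf(x: float) -> float:
    """Density of the standard normal (Z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


# Grid of x values from -5 to 5 in steps of 0.01
grid = [i / 100 for i in range(-500, 501)]

gaps = {}
for df in (1, 4, 10, 30, 100):
    gaps[df] = max(abs(t_pdf(x, df) - z_pdf(x)) for x in grid)
    print(f"df={df:>3}: max |t_pdf - z_pdf| = {gaps[df]:.4f}")
```

The largest gap sits at the peak (x = 0): small df puts less mass in the center and more in the tails, which is exactly the extra caution that comes from estimating σ with s.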

If data are few and not normal, use a nonparametric test (Mann-Whitney U, Wilcoxon signed-rank, etc.).

2.2 Equal-variance (homoscedasticity) assumption

This applies only to the independent two-sample t-test, because that test compares the means of two mutually independent groups. (The one-sample test compares a single group against a reference value, so there is no second variance to check; the paired test works on the differences within one group, so the check is unnecessary.)

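Q. What if you run a two-sample t-test knowing the variances differ?
A. Statistically, the pooled (Student's) t-test assumes one common variance; when that fails, especially with unequal group sizes, its p-values become unreliable, which is why Welch's t-test (3.2 b) is used instead. Practically, large variance is a warning in itself: e.g., if a new drug's treatment group has very large variance, then even if the mean difference is significant, the effect varies greatly between individual patients. It may not work for some, or may even worsen them, which lowers the treatment's reliability and consistency.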

3. Types

3.1 One-Sample t-test

  • Purpose: whether one sample's mean differs significantly from a specific reference value
  • Example: whether blood pressure after taking a new drug differs from a reference (120)

| Patient | Blood pressure after dosing (mmHg) |
|---|---|
| 1 | 118 |
| 2 | 121 |
| 3 | 119 |
| 4 | 117 |
| 5 | 120 |

$$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

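The numerator is the gap between the sample mean and the reference value $\mu_0$: the farther it is from 0, the stronger the evidence of a difference; the closer to 0, the more consistent the data are with $\mu_0$. Dividing by the standard error $s/\sqrt{n}$ (not the raw standard deviation $s$) standardizes it. If the null hypothesis $\mu = \mu_0$ holds and the population is normal, this statistic follows a t-distribution with $n - 1$ degrees of freedom.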

$$t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}} = \dfrac{119 - 120}{1.58 / \sqrt{5}} \approx -1.41$$

Figure 3. One-Sample t-test hypothesis test (t-distribution)

Compute the p-value on a t-distribution with df = n − 1 = 4. Since we are asking "equal or different?", the test is two-sided (significance level 0.05, i.e. 0.025 in each tail). The tail area at or below t = −1.41 is about 0.116, so the two-sided p-value is about 2 × 0.116 ≈ 0.231 > 0.05 → we cannot reject the null hypothesis: no significant difference from the reference value 120.
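The one-sample calculation can be reproduced with a short, standard-library-only sketch. The helpers `t_pdf` and `two_sided_p` are hypothetical names: `two_sided_p` approximates P(|T| ≥ |t|) by integrating the t density from 0 to |t| with Simpson's rule, which is adequate for illustration but not a substitute for a proper statistics library (with SciPy installed, `scipy.stats.ttest_1samp` does this in one call).

```python
import math
import statistics


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: float, steps: int = 20_000) -> float:
    """P(|T| >= |t|), approximated with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3      # area under the density between 0 and |t|
    return 1.0 - 2.0 * central   # symmetric density: both tails are equal


# Blood pressure after dosing (mmHg), from the table above
bp_after = [118, 121, 119, 117, 120]
mu0 = 120  # reference value

n = len(bp_after)
x_bar = statistics.fmean(bp_after)
s = statistics.stdev(bp_after)           # sample SD, n - 1 in the denominator
t = (x_bar - mu0) / (s / math.sqrt(n))   # one-sample t statistic
df = n - 1
p = two_sided_p(t, df)

print(f"x_bar={x_bar:.2f}  s={s:.3f}  t={t:.3f}  df={df}  two-sided p={p:.4f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

With the unrounded s, t is exactly −√2 ≈ −1.414 and the two-sided p is about 0.230, so the conclusion (fail to reject H0) is unchanged.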

3.2 Independent two-sample t-test

  • Purpose: whether two independent groups' means differ significantly
  • Example: comparing mean blood pressure of medicated vs unmedicated groups

| Patient | Group | Blood pressure (mmHg) |
|---|---|---|
| 1 | Medicated | 115 |
| 2 | Medicated | 118 |
| 3 | Medicated | 116 |
| 4 | Unmedicated | 122 |
| 5 | Unmedicated | 124 |

It splits into equal-variance and unequal-variance cases.

a) Equal variance met — Student's t-test (Pooled)

$$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad s_p = \sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}}$$

$$s_1^2 = 2.333,\quad s_2^2 = 2,\quad s_p^2 = \dfrac{(3-1)(2.333) + (2-1)(2)}{3 + 2 - 2} \approx 2.222$$

$$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{116.33 - 123}{\sqrt{2.222\left(\dfrac{1}{3} + \dfrac{1}{2}\right)}} \approx -4.90$$

Figure 4. Student's t-test hypothesis test (t-distribution)

With df = $n_1 + n_2 - 2 = 3$, the tail area at or below t = −4.90 is about 0.008, so the two-sided p-value is ≈ 0.016 < 0.05 → reject the null and accept the alternative (the two group means differ).
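Here is a standard-library sketch of the pooled (Student's) calculation. It repeats the hypothetical `t_pdf`/`two_sided_p` helpers (Simpson's-rule integration of the t density) so it runs on its own; with SciPy available, `scipy.stats.ttest_ind(a, b)` with its default `equal_var=True` performs the same pooled test.

```python
import math
import statistics


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: float, steps: int = 20_000) -> float:
    """P(|T| >= |t|), approximated with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)  # symmetric density: double one tail


medicated = [115, 118, 116]
unmedicated = [122, 124]

n1, n2 = len(medicated), len(unmedicated)
m1, m2 = statistics.fmean(medicated), statistics.fmean(unmedicated)
v1, v2 = statistics.variance(medicated), statistics.variance(unmedicated)  # n - 1 denominators

# Pooled variance: both groups are assumed to share one common variance
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
p = two_sided_p(t, df)

print(f"sp2={sp2:.4f}  t={t:.3f}  df={df}  two-sided p={p:.4f}")
```

Unrounded, the pooled variance is 20/9 ≈ 2.222 and t is exactly −2√6 ≈ −4.899, giving a two-sided p of about 0.016.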

b) Equal variance not met — Welch's t-test

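$$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad \nu \approx \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$

Instead of pooling, each group keeps its own variance. The degrees of freedom ν come from the Welch-Satterthwaite approximation and are generally not an integer.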

$$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{116.33 - 123}{\sqrt{\dfrac{2.333}{3} + \dfrac{2}{2}}} \approx \dfrac{-6.67}{1.333} \approx -5.00$$

Figure 5. Welch's t-test hypothesis test

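Welch's test uses the Welch-Satterthwaite degrees of freedom, here $\nu = (1.778)^2 / \left(\tfrac{0.778^2}{2} + \tfrac{1^2}{1}\right) \approx 2.43$. With that ν, the two-sided p-value for t = −5.00 is ≈ 0.025 < 0.05 → reject the null, accept the alternative.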
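Welch's version changes only the standard error and the degrees of freedom. The sketch below again repeats the hypothetical `t_pdf`/`two_sided_p` helpers so it is self-contained; Simpson's rule works unchanged for a non-integer ν because `math.lgamma` accepts any positive real. With SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` is the library route.

```python
import math
import statistics


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution (df may be non-integer)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: float, steps: int = 20_000) -> float:
    """P(|T| >= |t|), approximated with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)  # symmetric density: double one tail


medicated = [115, 118, 116]
unmedicated = [122, 124]

n1, n2 = len(medicated), len(unmedicated)
m1, m2 = statistics.fmean(medicated), statistics.fmean(unmedicated)
v1, v2 = statistics.variance(medicated), statistics.variance(unmedicated)

# Variance of each sample mean; no pooling, each group keeps its own variance
var_mean1, var_mean2 = v1 / n1, v2 / n2
t = (m1 - m2) / math.sqrt(var_mean1 + var_mean2)

# Welch-Satterthwaite approximate degrees of freedom
df = (var_mean1 + var_mean2) ** 2 / (
    var_mean1**2 / (n1 - 1) + var_mean2**2 / (n2 - 1)
)
p = two_sided_p(t, df)

print(f"t={t:.3f}  df={df:.3f}  two-sided p={p:.4f}")
```

Compared with the pooled test, t is slightly larger in magnitude here (−5.00 vs −4.90), but the smaller df (about 2.43 instead of 3) gives heavier tails, so the p-value ends up larger (about 0.025 instead of 0.016).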

3.3 Paired-Sample t-test

  • Purpose: comparing the mean difference of two paired measurements from the same group
  • Example: blood pressure change before/after dosing (before/after a process, before/after a trial, before/after PT, etc.)

| Patient | Before (mmHg) | After (mmHg) |
|---|---|---|
| 1 | 130 | 120 |
| 2 | 128 | 119 |
| 3 | 135 | 125 |
| 4 | 132 | 123 |
| 5 | 129 | 121 |

$$t = \dfrac{\bar{D}}{s_D / \sqrt{n}}$$

$$\begin{aligned} D_i &= [\,130-120,\ 128-119,\ 135-125,\ 132-123,\ 129-121\,] = [\,10,\ 9,\ 10,\ 9,\ 8\,] \\[6pt] \bar{D} &= \dfrac{10 + 9 + 10 + 9 + 8}{5} = 9.2 \\[6pt] s_D &= \sqrt{\dfrac{(10-9.2)^2 + (9-9.2)^2 + (10-9.2)^2 + (9-9.2)^2 + (8-9.2)^2}{5 - 1}} = \sqrt{0.7} \approx 0.837 \\[6pt] t &= \dfrac{\bar{D}}{s_D / \sqrt{n}} = \dfrac{9.2}{0.837 / \sqrt{5}} \approx 24.59 \end{aligned}$$

Everything is computed from the paired differences $D_i$: the paired t-test is simply a one-sample t-test of the $D_i$ against 0, with df = n − 1 = 4.

Figure 6. Paired t-test hypothesis test

The tail area at or above t = 24.59 (df = 4) is tiny: the two-sided p-value is about 1.6 × 10⁻⁵, far below 0.05 → reject the null, accept the alternative (the before/after means differ).
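Since the paired test is a one-sample test on the differences, the sketch below just builds $D_i$ and reuses the one-sample formula against 0. As before, `t_pdf`/`two_sided_p` are hypothetical standard-library helpers (Simpson's-rule integration of the t density), repeated so the block runs on its own; with SciPy available, `scipy.stats.ttest_rel(before, after)` is the library equivalent.

```python
import math
import statistics


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: float, steps: int = 20_000) -> float:
    """P(|T| >= |t|), approximated with Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)  # symmetric density: double one tail


before = [130, 128, 135, 132, 129]
after = [120, 119, 125, 123, 121]

# Paired differences D_i = before_i - after_i
d = [b - a for b, a in zip(before, after)]

n = len(d)
d_bar = statistics.fmean(d)
s_d = statistics.stdev(d)                # SD of the differences, n - 1 denominator
t = d_bar / (s_d / math.sqrt(n))         # one-sample t of D_i against 0
df = n - 1
p = two_sided_p(t, df)

print(f"D={d}  mean={d_bar:.2f}  s_D={s_d:.3f}  t={t:.2f}  df={df}  two-sided p={p:.2e}")
```

Using the unrounded $s_D = \sqrt{0.7}$ gives t ≈ 24.59; rounding $s_D$ to 0.84 first would give about 24.49, which is why intermediate values are best kept at full precision.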

Wrap-up

Mean testing is a technique to judge whether a difference in two group means is due to mere variability or a real difference. We looked at the assumptions (normality, equal variance) and per-type calculations with examples.



📦 Migrated from my own Korean blog (my own writing). Original: taehyuklee.tistory.com/24
