Understanding Hypothesis Testing — Test Statistic, Null/Alternative, p-value, Significance Level, Critical Value
While studying statistics, the hypothesis-testing terms (null/alternative hypothesis, test statistic, p-value, significance level, critical value) felt disconnected, so this post ties them all together through a single example. I drew the figures myself.
Goals of this post
- Understand intuitively what hypothesis testing is
- Grasp why the test statistic was introduced and what it means
- Learn the precise definitions of the null and alternative hypotheses
- Learn the precise meaning of p-value, significance level, and critical value
- How these terms are mapped and used in hypothesis testing
What is hypothesis testing?
Hypothesis testing means making a claim about a population and then using an observed sample to judge whether the data are consistent with that claim.
Literally, you "test" a "hypothesis." Let's show it with an intuitive figure.
Figure 1. Draw 30 people from a population (mean $\mu$) → judge the "average age is 70" hypothesis with the sample mean $\bar{X}=68$
Body — A Mean-Test Example
The basic story
Suppose we hypothesize that a group's average age is 70. Since we can't survey the whole population, we draw a sample of 30 people and compute the sample mean, getting 68.
(Q1) If the sample doesn't match the hypothesis, is the hypothesis wrong?
No. Sampling variability exists: repeating the same procedure yields a different sample mean each time, so a sample mean of 68 alone doesn't let us conclude "the population mean is not 70."
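To make this variability concrete, here is a small simulation sketch. It assumes the hypothesis is exactly true (population mean 70) and, purely for illustration, that ages are normally distributed with a standard deviation of 5. Neither the normal shape nor the value 5 comes from the example; both are my assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN = 70     # the hypothesis is taken to be exactly true here
ASSUMED_SD = 5     # hypothetical population standard deviation (an assumption)
SAMPLE_SIZE = 30
N_REPEATS = 10_000


def draw_sample_mean() -> float:
    """Draw one sample of 30 ages and return its mean."""
    sample = [random.gauss(TRUE_MEAN, ASSUMED_SD) for _ in range(SAMPLE_SIZE)]
    return statistics.mean(sample)


sample_means = [draw_sample_mean() for _ in range(N_REPEATS)]

# Share of repetitions whose sample mean is 68 or lower, even though mu = 70.
share_at_most_68 = sum(m <= 68 for m in sample_means) / N_REPEATS

print(f"smallest sample mean:  {min(sample_means):.2f}")
print(f"largest sample mean:   {max(sample_means):.2f}")
print(f"share with mean <= 68: {share_at_most_68:.3f}")
```

Under these assumptions a mean of 68 or lower turns up in a small but nonzero share of samples, which is why a single sample mean cannot settle the question by itself.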
(Q2) So how do we test the hypothesis with a sample? — the test statistic
Test statistic — a numeric indicator computed from sample data (like the sample mean) to test a hypothesis about the population.
The test statistic for a mean hypothesis is defined as:
$$t = \dfrac{\bar{X} - \mu_0}{\,s / \sqrt{n}\,}$$
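- The numerator is the sample mean minus the hypothesized mean: 0 if the sample mean equals the hypothesized mean, and farther from 0 as the difference grows.
- The denominator $s/\sqrt{n}$ is the standard error of the sample mean, where $s$ is the sample standard deviation and $n$ is the sample size.
- That is, $t$ measures the distance from the hypothesized mean in units of standard error.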
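As a quick worked sketch, the formula can be evaluated for the running example. The example gives $\bar{X}=68$, $\mu_0=70$, and $n=30$ but no sample standard deviation, so the value `s = 5` below is a made-up assumption, and the function name `t_statistic` is mine.

```python
import math


def t_statistic(sample_mean: float, hypothesized_mean: float,
                sample_sd: float, n: int) -> float:
    """One-sample t statistic: (x_bar - mu_0) / (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# x_bar = 68, mu_0 = 70, n = 30 come from the example; s = 5 is hypothetical.
t = t_statistic(sample_mean=68, hypothesized_mean=70, sample_sd=5, n=30)
print(round(t, 2))  # -2.19
```

With this assumed spread, the sample mean sits a little more than two standard errors below the hypothesized mean.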
(Q3) So where do uncertainty/variability (variance) come in? — the distribution
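If the null hypothesis is true (and the population is roughly normal), the test statistic follows the t-distribution (Student's distribution) with $n-1$ degrees of freedom, which is 29 in this example.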
Figure 3. The t-distribution that the test statistic follows
We judge using this distribution's properties.
- Near the center (high probability) : the difference between sample mean and hypothesized mean is small → consistent with the hypothesis
- At both tails (low probability) : the difference is large → evidence against the hypothesis
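The "high near the center, low in the tails" shape can be checked numerically. Below is a minimal sketch that evaluates the standard t density formula using only the standard library. The function name `t_pdf` is my own, and 29 degrees of freedom corresponds to the sample of 30.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    # Normalizing constant, computed on the log scale for numerical stability.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


DF = 29  # n - 1 for a sample of 30
for x in (0.0, 1.0, 2.0, 3.0):
    print(f"density at t = {x:+.1f}: {t_pdf(x, DF):.4f}")
```

The density falls off steadily as $t$ moves away from 0, and it is symmetric, so the same values apply on the negative side.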
The meaning of hypothesis testing
Figure 4. Accept/reject depending on where each sample's test statistic falls on the distribution
A test statistic in the extreme tail region would rarely occur if the hypothesis were true, so when we observe one we reject the hypothesis. Conversely, when it falls in the ordinary range, we accept it. Strictly speaking we "fail to reject" it, since an unremarkable sample does not prove the hypothesis true.
Definitions of Hypothesis-Testing Terms
One figure contains all the terms. Using it as the reference, let's define them one by one.
Figure 5. A combined visualization of hypothesis-testing terms
1. Null hypothesis ($H_0$)
The default hypothesis about the population, assumed true until the sample gives enough evidence against it. e.g., "The average age is 70."
2. Significance level ($\alpha$)
The probability threshold used as the criterion for rejecting the null hypothesis. Equivalently, it is the Type I error rate we are willing to tolerate.
- If the test statistic falls in the rejection region defined by the significance level, the hypothesis is rejected.
- Conventions vary by field: 0.05 is a common default, and stricter levels such as 0.01 are used where false positives are more costly.
- At 0.05 in a two-sided test, reject in the extreme 5% of the distribution (2.5% in each tail) and accept in the central 95%.
3. Rejection (Critical) Region
The range of test-statistic values for which the hypothesis is rejected. Its total probability under $H_0$ equals $\alpha$.
4. Acceptance Region
The range of test-statistic values for which the hypothesis is accepted (not rejected). Its total probability under $H_0$ equals $1-\alpha$.
5. Critical Value
The test-statistic ($t$) value at the boundary of the rejection region.
We judge accept/reject by whether the sample's test statistic lies beyond the critical value. For a two-sided test, that means checking whether $|t|$ exceeds it.
- If the test statistic is in the rejection region → reject
- If the test statistic is in the acceptance region → accept
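As a rough sketch, the two-sided critical value can be found numerically with only the standard library: integrate the t density and search for the point where the central area equals $1-\alpha$. The helper names (`t_pdf`, `central_area`, `critical_value`) are mine for illustration. In practice a t-table or a statistics library gives the same number.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def central_area(c: float, df: int, steps: int = 2000) -> float:
    """P(-c <= T <= c), integrating the density on [0, c] with Simpson's rule."""
    h = c / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3  # factor 2: the density is symmetric around 0


def critical_value(alpha: float, df: int) -> float:
    """Two-sided critical value c with P(|T| > c) = alpha, found by bisection.

    The search interval [0, 50] is an assumption that is ample for
    moderate degrees of freedom and ordinary significance levels.
    """
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < 1 - alpha:
            lo = mid  # central area too small: move the boundary outward
        else:
            hi = mid
    return (lo + hi) / 2


crit = critical_value(alpha=0.05, df=29)
print(round(crit, 3))  # 2.045
```

For a sample of 30 at $\alpha = 0.05$, the rejection region is therefore $|t| > 2.045$.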
6. p-value
Under the assumption that the null hypothesis is true, the probability of observing a statistic as extreme as or more extreme than the one actually observed in the sample.
Figure 6. p-value < significance level → rejection region → reject the hypothesis
Figure 7. p-value > significance level → acceptance region → accept the hypothesis
- p-value < significance level → in the rejection region → reject
- p-value > significance level → in the acceptance region → accept
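The p-value rule can be sketched in the same spirit. The code below computes a two-sided p-value by numerically integrating the t density, using only the standard library. The observed statistic `t_obs = -2.19` is hypothetical: it corresponds to the running example only if the sample standard deviation is assumed to be 5, which the example does not state.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_obs: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t_obs|) under H0, as 1 minus the central area."""
    c = abs(t_obs)
    h = c / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # area between -|t_obs| and +|t_obs|
    return 1 - central


ALPHA = 0.05
t_obs = -2.19  # hypothetical value (assumes s = 5 in the running example)
p = two_sided_p_value(t_obs, df=29)

print(f"p-value = {p:.4f}")
print("reject H0" if p < ALPHA else "do not reject H0")
```

Because the p-value is the tail area beyond the observed statistic, comparing it with $\alpha$ gives the same decision as comparing $|t|$ with the critical value.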
7. Alternative hypothesis ($H_1$)
The competing hypothesis, adopted when the null hypothesis is rejected.
Two-sided — the mean is not 70:
$$H_0:\ \mu = 70 \quad\text{vs.}\quad H_1:\ \mu \neq 70$$
One-sided — the mean is greater / less than 70:
$$H_1:\ \mu > 70 \quad\text{or}\quad \mu < 70$$
- Two-sided: "the mean is not 70"
- One-sided: "the mean is greater than 70" or "less than 70"
The form of the alternative hypothesis determines whether the test is two-sided or one-sided, which in turn changes how the p-value is computed.
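To spell out that difference, write $T$ for a variable that follows the t-distribution under $H_0$ and $t_{\text{obs}}$ for the statistic computed from the sample (both symbols are notation I am introducing here). The p-value for each form of $H_1$ is then:

$$p_{\mu \neq 70} = 2\,P\big(T \ge |t_{\text{obs}}|\big), \qquad p_{\mu > 70} = P\big(T \ge t_{\text{obs}}\big), \qquad p_{\mu < 70} = P\big(T \le t_{\text{obs}}\big)$$

Since the t-distribution is symmetric, the one-sided p-value is half the two-sided one whenever the observed statistic lies on the side named by $H_1$.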
Error Types
| Error | Situation | Probability |
|---|---|---|
| Type I | $H_0$ is actually true but rejected | $\alpha$ (usually 5%) |
| Type II | $H_1$ is actually true but $H_0$ is accepted | $\beta$ |
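Both error rates can be estimated with a simulation sketch. Everything numeric below other than $\mu_0 = 70$, $n = 30$, and $\alpha = 0.05$ is an assumption for illustration: normally distributed ages, a population standard deviation of 5, and a "true" mean of 68 for the Type II case. The critical value 2.045 is the two-sided t-table value for 29 degrees of freedom.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU_0 = 70               # mean claimed by H0
ALT_MEAN = 68           # hypothetical true mean for the Type II scenario
ASSUMED_SD = 5          # hypothetical population standard deviation
N = 30
CRITICAL_VALUE = 2.045  # two-sided, alpha = 0.05, df = 29 (t-table value)
TRIALS = 5_000


def rejects_h0(true_mean: float) -> bool:
    """Draw one sample from a population with `true_mean` and test H0: mu = 70."""
    sample = [random.gauss(true_mean, ASSUMED_SD) for _ in range(N)]
    standard_error = statistics.stdev(sample) / math.sqrt(N)
    t = (statistics.mean(sample) - MU_0) / standard_error
    return abs(t) > CRITICAL_VALUE


# Type I: H0 is true (mean really is 70) but the test rejects it.
type_1_rate = sum(rejects_h0(MU_0) for _ in range(TRIALS)) / TRIALS

# Type II: H0 is false (mean really is 68) but the test fails to reject it.
type_2_rate = sum(not rejects_h0(ALT_MEAN) for _ in range(TRIALS)) / TRIALS

print(f"estimated Type I rate (alpha): {type_1_rate:.3f}")
print(f"estimated Type II rate (beta): {type_2_rate:.3f}")
```

The Type I estimate should land close to the chosen $\alpha$, while $\beta$ depends on how far the true mean is from 70, on the spread, and on the sample size.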
Reference — the Actual Student's Distribution
The figures above are simplified for intuition; the actual t-distribution (by degrees of freedom) looks like this.
Summary
Figure 9. Hypothesis-testing summary
The hypothesis-testing process
- Set a hypothesis (null hypothesis) about the population
- Define a test statistic to evaluate the hypothesis and identify its distribution
- Draw a sample and compute the test statistic
- At $\alpha = 0.05$, accept (fail to reject) if it falls in the central 95% of the distribution; reject if it falls in the extreme 5%
Two ways to decide accept/reject
- Test-statistic method — compare against the critical value
- p-value method — compare against the significance level
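The two decision methods can be run side by side as a sketch. The 30 ages below are invented for illustration (they are not data from this post), the helper names are mine, and the critical value 2.045 is the t-table value for 29 degrees of freedom at a two-sided $\alpha = 0.05$.

```python
import math
import statistics

# Hypothetical ages for a sample of 30 people (made up for illustration).
ages = [60, 61, 62, 63, 64, 64, 65, 65, 66, 66,
        67, 67, 67, 68, 68, 68, 68, 69, 69, 69,
        70, 70, 71, 71, 72, 72, 73, 74, 75, 76]

MU_0 = 70               # H0: the average age is 70
ALPHA = 0.05
CRITICAL_VALUE = 2.045  # t-table value for df = 29, two-sided alpha = 0.05


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_obs: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t_obs|) under H0, via Simpson's rule on [0, |t_obs|]."""
    c = abs(t_obs)
    h = c / steps
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


n = len(ages)
df = n - 1
standard_error = statistics.stdev(ages) / math.sqrt(n)
t_obs = (statistics.mean(ages) - MU_0) / standard_error
p_value = two_sided_p_value(t_obs, df)

# Method 1: compare the test statistic with the critical value.
reject_by_critical_value = abs(t_obs) > CRITICAL_VALUE
# Method 2: compare the p-value with the significance level.
reject_by_p_value = p_value < ALPHA

print(f"t = {t_obs:.2f}, p-value = {p_value:.4f}")
print("critical-value method rejects H0:", reject_by_critical_value)
print("p-value method rejects H0:       ", reject_by_p_value)
```

The two methods are the same test viewed from different axes (the $t$ axis versus the probability axis), so they reach the same decision.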
Caveats
- Always consider the possibility of Type I and Type II errors
- Interpretation differs for two-sided vs one-sided tests
📦 Migrated from my own Korean blog (my own writing). Original: taehyuklee.tistory.com/15