Understanding Hypothesis Testing — Test Statistic, Null/Alternative, p-value, Significance Level, Critical Value

While studying statistics, the hypothesis-testing terms (null/alternative hypothesis, test statistic, p-value, significance level, critical value) felt disconnected, so this post ties them all together through a single example. I drew the figures myself.

Goals of this post

  1. Understand intuitively what hypothesis testing is
  2. Grasp why the test statistic was introduced and what it means
  3. Learn the precise definitions of the null and alternative hypotheses
  4. Learn the precise concepts of p-value, significance level, and critical value
  5. See how these terms map onto one another and are used in hypothesis testing

What is hypothesis testing?

Make some claim about a population, and observe a sample to judge whether the claim is right or wrong.

Literally, you "test" a "hypothesis." Let's show it with an intuitive figure.

Figure 1. Draw 30 people from a population (mean $\mu$) → judge the "average age is 70" hypothesis with the sample mean $\bar{X}=68$

Body — A Mean-Test Example

The basic story

Suppose we hypothesize that a group's average age is 70. Since we can't survey the whole population, we draw a sample of 30 people and compute the sample mean, which comes out to 68.

(Q1) If the sample doesn't match the hypothesis, is the hypothesis wrong?

No. Because variability (variance) exists — repeating the same procedure yields different results — a sample mean of 68 doesn't let us conclude "the population mean is not 70."

(Q2) So how do we test the hypothesis with a sample? — the test statistic

Test statistic — a numeric indicator computed from sample data (like the sample mean) to test a hypothesis about the population.

The test statistic for a mean hypothesis is defined as:

$$t = \dfrac{\bar{X} - \mu_0}{\,s / \sqrt{n}\,}$$

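  • The numerator is the sample mean minus the hypothesized mean: 0 if there's no difference from the hypothesized mean, and farther from 0 as the difference grows.
  • The denominator $s/\sqrt{n}$ is the standard error of the sample mean ($s$ is the sample standard deviation, $n$ the sample size), so the difference is expressed in units of the sampling variability.
  • That is, it measures the distance from the hypothesized mean as a number.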
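As a minimal sketch, the formula can be evaluated for the running example. The post gives $n = 30$, $\bar{X} = 68$, and $\mu_0 = 70$, but not the sample standard deviation, so `s = 5` below is a made-up value used only for illustration; the function name `t_statistic` is likewise hypothetical.

```python
import math

def t_statistic(sample_mean, mu0, s, n):
    """One-sample t statistic: (sample_mean - mu0) / (s / sqrt(n))."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# From the post: n = 30, sample mean 68, hypothesized mean 70.
# ASSUMPTION: s = 5 is invented for illustration; the post does not report it.
t_stat = t_statistic(sample_mean=68, mu0=70, s=5, n=30)
print(round(t_stat, 3))  # -2.191
```

With this assumed spread, the sample mean sits about 2.19 standard errors below the hypothesized mean.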

(Q3) So where do uncertainty/variability (variance) come in? — the distribution

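If the null hypothesis is true and the population is approximately normally distributed, the test statistic follows the t-distribution (Student's distribution) with $n-1$ degrees of freedom, here 29.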

Figure 3. The t-distribution that the test statistic follows

We judge using this distribution's properties.

  • Near the center (high probability) : the difference between sample mean and hypothesized mean is small → consistent with the hypothesis
  • At both tails (low probability) : the difference is large → evidence against the hypothesis

The meaning of hypothesis testing

Figure 4. Accept/reject depending on where each sample's test statistic falls on the distribution

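A sample whose test statistic lands in the extreme, low-probability region would rarely arise if the hypothesis were true, so when such a sample is observed we reject the hypothesis. Conversely, when the statistic lands in the ordinary probability range, we accept it (strictly speaking, we fail to reject it: the data are consistent with the hypothesis but do not prove it).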

Definitions of Hypothesis-Testing Terms

One figure contains all the terms. Using it as the reference, let's define them one by one.

Figure 5. A combined visualization of hypothesis-testing terms (test statistic, rejection/acceptance regions, critical value, significance level)

1. Null hypothesis ($H_0$)

The hypothesis set about the population. e.g., "The average age is 70."

2. Significance level ($\alpha$)

The probability threshold, chosen before looking at the data, that serves as the criterion for accepting or rejecting a hypothesis. It equals the probability of rejecting $H_0$ when $H_0$ is actually true.

  • If the test statistic is found in the significance-level region, the hypothesis is rejected.
  • It varies by field; as a rough convention, natural sciences often use 0.01 and humanities/social sciences 0.05.
  • At 0.05 in a two-sided test, reject in the extreme 5% region (2.5% in each tail), accept in the central 95% range.

3. Rejection (Critical) Region

The range of test-statistic values for which the hypothesis is rejected. Its total probability under $H_0$ equals the significance level.

4. Acceptance Region

The range of test-statistic values for which the hypothesis is accepted (not rejected). Its total probability under $H_0$ is $1-\alpha$.

5. Critical Value

The test-statistic ($t$) value at the boundary of the significance-level region.

We judge accept/reject by whether the sample's test statistic exceeds the critical value (in a two-sided test, whether its absolute value $|t|$ exceeds it).

  • If the test statistic is in the rejection region → reject
  • If the test statistic is in the acceptance region → accept
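The critical-value comparison can be sketched with the standard library alone. The helper names (`t_pdf`, `t_cdf`, `two_sided_critical_value`) are hypothetical, the CDF is approximated by Simpson's rule rather than taken from a statistics package, and the observed statistic again relies on the made-up sample standard deviation `s = 5`, which the post does not provide.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def two_sided_critical_value(alpha, df):
    """Find c such that P(|T| > c) = alpha, by bisection on the CDF."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha, n = 0.05, 30
t_crit = two_sided_critical_value(alpha, n - 1)

# ASSUMPTION: s = 5 is invented for illustration; the post does not report it.
t_obs = (68 - 70) / (5 / math.sqrt(n))
reject = abs(t_obs) > t_crit
print(f"critical value = ±{t_crit:.3f}, t = {t_obs:.3f}, reject H0: {reject}")
```

For 29 degrees of freedom and $\alpha = 0.05$ the critical value is about 2.045, matching standard t tables, so under the assumed `s = 5` the statistic of about -2.19 falls in the rejection region.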

6. p-value

Under the assumption that the null hypothesis is true, the probability of observing a statistic as extreme as or more extreme than the one actually observed in the sample.

Figure 6. p-value < significance level → rejection region → reject the hypothesis

Figure 7. p-value > significance level → acceptance region → accept the hypothesis

  • p-value < significance level → in the rejection region → reject
  • p-value > significance level → in the acceptance region → accept
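One way to make the definition concrete is to simulate a world in which the null hypothesis is true and count how often a statistic at least as extreme as the observed one appears. This is only a sketch under assumptions that are not in the post: a normal population, a made-up sample standard deviation `s = 5` for the observed statistic, and a fixed random seed. A statistics package would normally compute the p-value from the t-distribution directly instead of simulating.

```python
import math
import random

def t_statistic(sample, mu0):
    """One-sample t statistic for a list of observations."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    return (mean - mu0) / math.sqrt(var / n)

mu0, n = 70, 30
# ASSUMPTION: observed statistic uses an invented s = 5 (not in the post).
t_obs = (68 - mu0) / (5 / math.sqrt(n))

rng = random.Random(0)  # fixed seed so the run is reproducible
trials = 20_000
extreme = 0
for _ in range(trials):
    # Simulate H0 being true: the population mean really is 70.
    # ASSUMPTION: normal population. The sd of 5 is arbitrary, because the
    # null distribution of the t statistic does not depend on it.
    sample = [rng.gauss(mu0, 5) for _ in range(n)]
    if abs(t_statistic(sample, mu0)) >= abs(t_obs):
        extreme += 1

p_value = extreme / trials  # two-sided: counts both tails
print(f"simulated two-sided p-value = {p_value:.3f}")
```

Under these assumptions the simulated p-value comes out below 0.05, which agrees with the rule above: p-value < significance level → reject.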

7. Alternative hypothesis ($H_1$)

The claim that competes with the null hypothesis. It is the hypothesis we adopt when the null hypothesis is rejected.

Two-sided — the mean is not 70:

$$H_0:\ \mu = 70 \quad\text{vs.}\quad H_1:\ \mu \neq 70$$

One-sided — the mean is greater / less than 70:

$$H_1:\ \mu > 70 \quad\text{or}\quad \mu < 70$$

  • Two-sided: "the mean is not 70"
  • One-sided: "the mean is greater than 70" or "less than 70"

The form of the alternative hypothesis determines a two-sided / one-sided test, and changes how the p-value is computed.
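To spell out how the computation changes, write $T$ for a variable following the t-distribution with $n-1$ degrees of freedom and $t_{\text{obs}}$ for the statistic computed from the sample (this notation is introduced here for illustration and does not appear in the figures):

$$p_{\text{two-sided}} = P\big(|T| \ge |t_{\text{obs}}|\big) = 2\,P\big(T \ge |t_{\text{obs}}|\big)$$

$$p_{\text{right}} = P\big(T \ge t_{\text{obs}}\big)\ \ (H_1:\ \mu > 70), \qquad p_{\text{left}} = P\big(T \le t_{\text{obs}}\big)\ \ (H_1:\ \mu < 70)$$

Because the t-distribution is symmetric about 0, the two-sided p-value is twice the smaller of the two one-sided p-values.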

Error Types

| Error | Situation | Probability |
|---|---|---|
| Type I | $H_0$ is actually true but rejected | $\alpha$ (usually 5%) |
| Type II | $H_1$ is actually true but $H_0$ is accepted | $\beta$ |
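Both error rates can be estimated by simulation, as a rough sketch. Everything beyond $n = 30$ and $\mu_0 = 70$ is an assumption made for illustration: a normal population with standard deviation 5, a true mean of 68 for the Type II case, and the table value 2.045 as the two-sided critical value for $\alpha = 0.05$ with 29 degrees of freedom. Note that $\beta$ is not a single number; it depends on how far the true mean is from 70.

```python
import math
import random

T_CRIT = 2.045  # two-sided critical value, alpha = 0.05, df = 29 (t table)

def rejects_h0(sample, mu0):
    """Return True if the two-sided t test rejects H0: mean == mu0."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    t = (mean - mu0) / math.sqrt(var / n)
    return abs(t) > T_CRIT

def rejection_rate(true_mean, rng, mu0=70, sd=5, n=30, trials=10_000):
    """Fraction of simulated samples for which H0 is rejected.

    ASSUMPTION: normal population with sd = 5 (invented for illustration).
    """
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        if rejects_h0(sample, mu0):
            hits += 1
    return hits / trials

rng = random.Random(1)  # fixed seed so the run is reproducible

# Type I: H0 is true (mean really is 70) but the test rejects it.
type_1_rate = rejection_rate(true_mean=70, rng=rng)

# Type II: H1 is true (ASSUMED true mean 68) but the test fails to reject H0.
type_2_rate = 1 - rejection_rate(true_mean=68, rng=rng)

print(f"estimated Type I rate  = {type_1_rate:.3f}")
print(f"estimated Type II rate = {type_2_rate:.3f}")
```

The estimated Type I rate should land close to the chosen $\alpha = 0.05$, while the Type II rate for this particular alternative is much larger, which is why a non-rejection is weak evidence that $H_0$ is true.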

Reference — the Actual Student's Distribution

The figures above are simplified for intuition; the actual t-distribution (by degrees of freedom) looks like this.

Figure 8. The actual Student's t-distribution by degrees of freedom (source: JMP/SAS docs)

Summary

Figure 9. Hypothesis-testing summary

The hypothesis-testing process

  1. Set a hypothesis (null hypothesis) about the population
  2. Define a test statistic to evaluate the hypothesis and identify its distribution
  3. Draw a sample and compute the test statistic
  4. With $\alpha = 0.05$, accept if it falls in the ordinary 95% range of the distribution; reject if it falls in the extreme 5% region

Two ways to decide accept/reject

  1. Test-statistic method — compare against the critical value
  2. p-value method — compare against the significance level

Caveats

  • Always consider the possibility of Type I and Type II errors
  • Interpretation differs for two-sided vs one-sided tests

📦 Migrated from my own Korean blog (my own writing). Original: taehyuklee.tistory.com/15
