Understanding Hypothesis Testing: Test Statistic, Null Hypothesis, Alternative Hypothesis, p-value, Significance Level, Critical Value — Concepts (the essence)

2026.05.27 ·#statistics #hypothesis-testing #p-value #test-statistic

Goal of this post

You can intuitively understand what hypothesis testing is.
You can understand why the test statistic was introduced, and what it is.
You can know the exact definitions of the null hypothesis and the alternative hypothesis.
You can understand the exact concepts of p-value, significance level, and critical value.
Finally, you can see how these terms are mapped and used in hypothesis testing.

What is hypothesis testing?

In statistics, hypothesis testing is a method of setting up some claim (hypothesis) about a population (the whole group), observing a sample (a subset), and judging whether that claim is right or wrong.

Main body (Story: a one-sample t-test example for the mean)

* If you only want the summary, you can skip straight to the last section, Conclusion. *

Let's first illustrate the idea of hypothesis testing defined above with an intuitive figure.

Figure 1. A figure to explain the concept of hypothesis testing

Figure explanation

Let me give a concrete story. Suppose I claim that the average age of some group is about 70. But the group is too large to survey everyone. To check whether my claim (hypothesis) is true, I drew a sample of 30 people and computed the sample mean, which is an estimator of the population mean. The sample mean came out to 68.

(Q1) If the sample data don't match the hypothesis, is the hypothesis wrong?

Just because the sample mean came out to 68, can we really say the above hypothesis is wrong? The population mean might actually be 70.

No. Sampling involves randomness: if you repeated the same procedure with another 30 people, you would most likely get a different sample mean. Because of this uncertainty or variability (measured statistically by the variance), a sample mean of 68 alone does not let you conclude that the population mean is not 70.
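To see this variability concretely, here is a minimal Python sketch (standard library only). It draws five samples of 30 from a hypothetical population whose true mean really is 70. The normal shape and the standard deviation of 10 are assumptions made up for illustration. The point is only that the sample means scatter around 70 rather than landing on it exactly.

```python
import random
import statistics

# Hypothetical population: ages ~ Normal(mean=70, sd=10). Both numbers are assumptions.
TRUE_MEAN, SD, N = 70, 10, 30
random.seed(0)  # fixed seed so the run is reproducible

sample_means = []
for _ in range(5):
    sample = [random.gauss(TRUE_MEAN, SD) for _ in range(N)]  # one sample of 30 people
    sample_means.append(statistics.mean(sample))

# Five sample means from the SAME population: they differ even though the true mean is 70.
print([round(m, 1) for m in sample_means])
```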

(Q2) How will you verify your claim with a sample? (How will you do the hypothesis test?)

The significance of the test statistic

If the sample mean can differ from the population mean even when the hypothesis is true, how can a hypothesis test be done with the sample mean at all? The answer introduced here is to judge by the distribution of the test statistic.

📌 What is a test statistic? A numerical indicator (statistic) computed from sample data, for example from the sample mean, to test a hypothesis about the population.

Let's define a test statistic appropriate for the above hypothesis about the mean.

$$ t = \frac{\bar{X} - \mu_0}{\,s/\sqrt{n}\,} $$ Formula 1. The test statistic for a hypothesis about the mean

The numerator is the sample mean minus the hypothesized mean. So t is close to 0 when the sample mean barely differs from the hypothesized mean, and it moves farther from 0 as the difference grows. The denominator s/√n is the estimated standard error of the sample mean (s is the sample standard deviation, n the sample size). It expresses that difference in units of sampling variability. In other words, the test statistic for a mean test measures the distance from the hypothesized mean. It quantifies how far the sample mean is from the hypothesized mean, and it plays a central role in deciding whether that difference is statistically significant.
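Formula 1 translates directly into code. The sketch below uses Python's standard `statistics` module. The function name `t_statistic` and the five ages are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """One-sample t statistic from Formula 1: (x_bar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return (x_bar - mu0) / (s / math.sqrt(n))

# Hypothetical ages, made up so the result is easy to check by hand.
ages = [68, 72, 65, 70, 67]             # sample mean 68.4
print(round(t_statistic(ages, 70), 3))  # prints -1.324 (sample mean is below 70)
print(t_statistic([69, 70, 71], 70))    # sample mean equals mu0, so t is 0
```

The sign of t tells you the direction of the difference, and its size tells you the distance in standard-error units.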

*The meaning of "statistically significant" is revisited later.

(Q3) Where is uncertainty/variability (variance) used?

The distribution of the test statistic and the significance of hypothesis testing

Earlier we said that, because uncertainty or variability (variance) exists, we cannot jump to a conclusion just because the sample mean differs from the hypothesized mean. But the test statistic itself is just one number measuring the distance between the sample mean and the hypothesized mean. The sample standard deviation s appears in its denominator, but that only rescales the distance. It does not tell us how large a value of t is too large to be explained by chance. So how can a hypothesis test be done with the test statistic alone?

💡 Answer summary

Here the core of hypothesis testing is revealed. The test statistic quantifies the distance from the hypothesized mean. To judge whether that distance is statistically significant, we look at where it falls within the distribution of the test statistic. (This may not quite land yet; a concrete example follows below.)

Explanation of the summary (important)

Let me give a concrete example. To evaluate the hypothesis about the mean, we defined the test statistic in Formula 1. Statisticians have proven that this test statistic follows the t-distribution (Student's t-distribution) with n − 1 degrees of freedom. This holds if the null hypothesis is true and the population is normally distributed.
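For reference, this statement can be written as a formula. It assumes a normal population and that H0 : μ = μ0 holds. The statistic from Formula 1 then has Student's t-distribution with ν = n − 1 degrees of freedom, whose density is the standard textbook expression in the second line.

$$ t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t_{n-1} \quad \text{if } H_0 : \mu = \mu_0 \text{ is true} $$

$$ f_\nu(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad \nu = n - 1 $$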

Figure 3. The t-distribution (Student's distribution)

Let me explain again, mapping the statement "the test statistic follows the t-distribution above" onto the figure.

Figure 4. The meaning of hypothesis testing

Explanation of the meaning of hypothesis testing (explaining Figure 4)

Suppose I drew 4 samples from the population. (Sample 1 through Sample 4 in the figure above.)

Computing the test statistic for each sample gives t1 through t4 in the figure. Because t follows Student's t-distribution, a sample's test statistic will most likely land near 0 and only rarely at the two ends (area1 and area2 in the figure). So Samples 3 and 4 are the kind of samples we expect to draw most of the time. Samples 1 and 2 were drawn even though such samples are unlikely.
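A quick simulation makes this concrete. The sketch assumes, hypothetically, that H0 is true and that ages come from a normal population with mean 70 and standard deviation 10. It draws many samples of 30, computes t for each, and counts how often t lands near 0 versus beyond ±2.045. That cutoff is the t-table critical value for 29 degrees of freedom at α = 0.05. Expect roughly two thirds of the values near 0 and roughly 5% in the two tails; the exact numbers depend on the random seed.

```python
import math
import random
import statistics

def t_statistic(sample, mu0):
    """One-sample t statistic: (x_bar - mu0) / (s / sqrt(n))."""
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(len(sample)))

# Assumptions for illustration: H0 is true and ages ~ Normal(70, 10); n = 30 per sample.
MU0, SD, N, REPS = 70, 10, 30, 10_000
T_CRIT = 2.045  # t-table value, df = 29, two-sided alpha = 0.05
random.seed(1)

ts = [t_statistic([random.gauss(MU0, SD) for _ in range(N)], MU0) for _ in range(REPS)]

near_zero = sum(abs(t) < 1 for t in ts) / REPS      # the ordinary middle of the distribution
in_tails = sum(abs(t) > T_CRIT for t in ts) / REPS  # area1 + area2: rare under H0
print(f"|t| < 1: {near_zero:.3f}   |t| > {T_CRIT}: {in_tails:.3f}")
```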

Interpretation point (the point of understanding)

The test statistic above represents the difference between the sample mean and the hypothesized population mean. t3 (Sample 3) and t4 (Sample 4) lie in the ordinary, high-probability part of the distribution (sample mean − hypothesized population mean ≈ 0). So for these samples, the sample mean is close to the hypothesized mean. On the other hand, t1 (Sample 1) and t2 (Sample 2) lie in the extreme, low-probability tails. Their sample means are far from the hypothesized population mean.

In summary, t3 and t4 tend to support the hypothesis because their difference from the hypothesized mean is small. t1 and t2 count as evidence against the hypothesis because their difference is large.

Samples like t1 and t2 would almost never occur if the hypothesis were true, so when such a sample appears, we judge that the hypothesis is wrong. We call this rejecting the hypothesis. Conversely, when the test statistic lands in the ordinary part of the distribution (t3, t4), we judge that the data are consistent with the hypothesis. We call this accepting the hypothesis. (Strictly speaking, statisticians usually say we fail to reject it, because a consistent sample does not prove the hypothesis true.)

Definitions of hypothesis-testing terms

Now we can finally define the terms — null hypothesis, alternative hypothesis, acceptance region, rejection region, significance level, critical value, p-value — and organize hypothesis testing.

Figure 5. Explanation and visualization of hypothesis-testing terms

1. Null hypothesis (H0 : Null Hypothesis)

The hypothesis we set up about the population (the whole group). In the example, 'the average age is 70', written H0 : μ = 70.

2. Significance level (α : Significance Level)

The probability used as the threshold for accepting or rejecting a hypothesis. In the example, it is the total area area1 + area2, which is the region where the hypothesis is rejected. When the sample's test statistic falls in this region, the hypothesis is rejected.

In statistics, we never accept a hypothesis because it is certainly correct or reject it because it is certainly wrong. We accept or reject it based on whether the observed result is likely or unlikely under the hypothesis. So we must fix in advance the probability that serves as the threshold for rejection and acceptance.

The threshold differs by field. For example, a significance level of 0.01 is sometimes used in the natural sciences and 0.05 in the humanities. If we set the significance level to 0.05, the blue region (area1 + area2) in the figure has a total area of 0.05, with 0.025 in each tail. The remaining region is then 1 − α = 0.95, since the total area under the distribution is 1.

To summarize: if the sample's test statistic (about the hypothesis) falls within the ordinary 95%, the hypothesis is accepted; if it's found in the extreme 5% probability region, it's judged not ordinary, and the hypothesis is rejected.

3. Rejection region (Critical Region)

In the test statistic's distribution, the probability region (range) where the hypothesis is rejected is called the rejection region.

4. Acceptance region (Acceptance Region)

In the test statistic's distribution, the probability region (range) where the hypothesis is accepted is called the acceptance region.

5. Critical value (Critical Value)

The value of the test statistic t that marks the boundary of the significance-level region is called the critical value. Once it is fixed, we decide whether to accept the hypothesis by checking whether the test statistic of the sample we drew lies beyond the critical value. For this two-sided test, that means checking whether |t| exceeds it.
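The critical value can also be computed instead of read from a table. This is a minimal, dependency-free sketch. It integrates the t density with Simpson's rule and finds the critical value by bisection. In practice a library such as SciPy would normally be used. The helper names here are hypothetical, and the four t values are made up for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_tail(c, df, steps=2000):
    """P(|T| >= c) = 1 - 2 * (integral of the pdf from 0 to c), via Simpson's rule (steps is even)."""
    h = c / steps
    total = t_pdf(0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3

def critical_value(alpha, df):
    """Find c with P(|T| >= c) = alpha by bisection; the tail area shrinks as c grows."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_sided_tail(mid, df) > alpha:
            lo = mid  # tail still too large: the boundary lies further out
        else:
            hi = mid
    return (lo + hi) / 2

ALPHA, DF = 0.05, 29  # n = 30 people, so df = n - 1 = 29
c = critical_value(ALPHA, DF)
print(f"critical value: {c:.3f}")  # about 2.045, matching the t table

for t in [-2.6, 2.3, -0.4, 1.1]:  # hypothetical observed t values
    verdict = "reject H0" if abs(t) > c else "accept (fail to reject) H0"
    print(f"t = {t:+.1f}: {verdict}")
```

With α = 0.05 and df = 29 this should agree with the table value of about 2.045. As df grows, it approaches the normal value of about 1.96.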

In the example, let's judge the 4 samples using the test statistics t1 through t4.

Sample 1: t1 lies in the rejection region, beyond the critical value. An outcome with probability 5% or less under H0 occurred, so reject.
Sample 2: t2 lies in the rejection region, beyond the critical value. An outcome with probability 5% or less under H0 occurred, so reject.
Sample 3: t3 lies in the acceptance region, inside the critical values. An ordinary outcome (within the central 95%) occurred, so accept.
Sample 4: t4 lies in the acceptance region, inside the critical values. An ordinary outcome (within the central 95%) occurred, so accept.

6. p-value (Probability Value)

The p-value is the probability, assuming the null hypothesis is true, of observing a test statistic equal to or more extreme than the one actually observed in the sample.

(For reference, the reason we say "under the premise that the null hypothesis is true" is that we defined the test statistic under the null hypothesis and do the hypothesis test under its distribution.)

Figure 6. Figure explaining p-value (rejection)

Let me explain p-value based on the actually observed statistics t1 and t2.

The probability of observing a statistic at least as extreme as t1 is the area of region ① (the red region), the tail beyond t1. This area is the (one-tail) p-value for Sample 1 (t1).
The probability of observing a statistic at least as extreme as t2 is the area of region ② (the red region), the tail beyond t2. This area is the (one-tail) p-value for Sample 2 (t2).

Accepting/rejecting the hypothesis via p-value

Judging acceptance/rejection with the p-value is possible by comparing the p-value with the significance level.

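For example, suppose the red area for t1 is 0.013 and for t2 is 0.018. The area for t1 is smaller than 0.05 (significance level) / 2 = 0.025, so t1 lies in the rejection region and we reject the hypothesis. Equivalently, because the alternative is two-sided, the p-value is usually reported as twice the tail area (2 × 0.013 = 0.026) and compared directly with α = 0.05. t2 is rejected by the same logic.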

Figure 7. Figure explaining p-value (acceptance)

Likewise, let me explain p-value based on the actually observed statistics t3 and t4.

The probability of observing a statistic at least as extreme as t3 is the area of region ③ (the red region), the tail beyond t3. This area is the (one-tail) p-value for Sample 3 (t3).
The probability of observing a statistic at least as extreme as t4 is the area of region ④ (the red region), the tail beyond t4. This area is the (one-tail) p-value for Sample 4 (t4).

Accepting/rejecting the hypothesis via p-value

For example, suppose the red area for t3 is 0.2 and for t4 is 0.25. The area for t3 is larger than 0.05 (significance level) / 2 = 0.025, so t3 lies in the acceptance region and we accept the hypothesis. Equivalently, the two-sided p-value 2 × 0.2 = 0.4 is larger than α = 0.05. t4 is accepted by the same logic.
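The arithmetic above can be checked with a few lines of Python, using the illustrative red-area values from the text (0.013, 0.018, 0.2, 0.25). The sketch assumes the two-sided alternative H1 : μ ≠ 70. Under that assumption, comparing one tail with α/2 and comparing the doubled (two-sided) p-value with α always give the same decision.

```python
ALPHA = 0.05

# Red-area (one-tail) values quoted in the text for the four samples; illustrative only.
tail_areas = {"t1": 0.013, "t2": 0.018, "t3": 0.2, "t4": 0.25}

decisions = {}
for name, tail in tail_areas.items():
    p_two_sided = 2 * tail               # H1: mu != 70, so both tails count
    reject = p_two_sided < ALPHA         # rule: two-sided p-value vs alpha
    assert reject == (tail < ALPHA / 2)  # same decision as one tail vs alpha / 2
    decisions[name] = reject
    print(f"{name}: p = {p_two_sided:.3f} -> {'reject' if reject else 'accept (fail to reject)'} H0")
```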

7. Alternative hypothesis (H1 : Alternative Hypothesis)

Unlike the null hypothesis, this hypothesis is not tested directly. It is supported indirectly, by rejecting the null hypothesis.

So far we've focused on whether to accept or reject the null hypothesis. Thinking further, a question arises: if the null hypothesis is rejected, what is the conclusion?

$$ H_1 : \mu \neq 70 $$ Formula 2. The alternative hypothesis of the example

In the example above, rejecting the null hypothesis (as with Samples 1 and 2) means supporting the alternative hypothesis (claim) that the mean is not 70.

$$ H_1 : \mu > 70 \qquad \text{or} \qquad H_1 : \mu < 70 $$ Formula 3. An alternative hypothesis outside the example

However, besides 'the mean is not 70,' the alternative hypothesis can also be stated in a specific direction, such as 'the mean is greater than 70' or 'the mean is less than 70.' This choice changes how the test is carried out and how the test statistic is interpreted. Depending on whether it is a two-sided (two-tailed) or a one-sided (one-tailed) test, the rejection region and the way the p-value is computed also change.
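As a compact reference, here are the standard tail probabilities for each form of the alternative. T denotes a random variable with the t-distribution with n − 1 degrees of freedom, which is the null distribution of Formula 1. t_obs is the value computed from the sample.

$$ H_1 : \mu \neq 70 \;\Rightarrow\; p = P\left(|T| \ge |t_{\text{obs}}|\right) = 2\,P\left(T \ge |t_{\text{obs}}|\right) $$

$$ H_1 : \mu > 70 \;\Rightarrow\; p = P\left(T \ge t_{\text{obs}}\right) $$

$$ H_1 : \mu < 70 \;\Rightarrow\; p = P\left(T \le t_{\text{obs}}\right) $$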

I'll cover two-sided and one-sided tests in a separate post. Here I want to emphasize that the null hypothesis and the alternative hypothesis must always be set up together as a pair. The alternative hypothesis is the one supported when the null hypothesis is rejected.

* Reference (the actual Student's distribution)

Figure 8. The t-distribution, Student's distribution (source: JMP SAS docs)

The t-distribution looks like the figure above. As the degrees of freedom grow (df = n − 1, so this means a larger sample), it gets closer and closer to the standard normal distribution. The two curves look almost the same. The difference is that the standard normal (z) statistic divides by the known population standard deviation σ, whereas the t statistic divides by the sample standard deviation s. Estimating σ from the sample adds extra variability, which gives the t-distribution heavier tails, most noticeably for small samples.
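The effect of replacing σ with s is easiest to see with very small samples. This hypothetical simulation assumes a normal population with mean 70, σ = 10, and samples of n = 5, all chosen for illustration. It computes both the z statistic (known σ) and the t statistic (sample s) from the same samples. It then counts how often each exceeds 1.96 in absolute value. The z rate should be near 5%, while the t rate should be noticeably higher (around 12% for 4 degrees of freedom). This is why small-sample tests use the t critical value instead of 1.96.

```python
import math
import random
import statistics

# Assumptions for illustration: H0 is true, ages ~ Normal(70, sigma = 10), tiny samples (n = 5).
MU0, SIGMA, N, REPS = 70, 10, 5, 20_000
random.seed(2)

z_extreme = t_extreme = 0
for _ in range(REPS):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    x_bar = statistics.mean(sample)
    z = (x_bar - MU0) / (SIGMA / math.sqrt(N))                     # divides by the known population sigma
    t = (x_bar - MU0) / (statistics.stdev(sample) / math.sqrt(N))  # divides by the sample estimate s
    z_extreme += abs(z) > 1.96  # a True comparison counts as 1
    t_extreme += abs(t) > 1.96

print(f"P(|z| > 1.96) ~ {z_extreme / REPS:.3f}   P(|t| > 1.96) ~ {t_extreme / REPS:.3f}")
```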

(Q4) Are there cases of wrongly accepting or wrongly rejecting? (Type I error, Type II error)

As stated above, if the sample's test statistic falls within the ordinary 95%, the hypothesis is accepted. If it falls in the extreme 5% region, it is judged unusual and the hypothesis is rejected.

We can explain the core of hypothesis testing this way, and then it's natural and obvious to wonder: 'Could there be a case where the hypothesis is actually correct, yet the sample happens to fall in the extreme 5% probability region, so the hypothesis is rejected?'

Yes. The hypothesis can be wrongly rejected in exactly this way, and such a case is called a Type I Error. A Type I error is rejecting the null hypothesis even though the null hypothesis is actually true. The probability of this error is the significance level (α) we choose, which is often set to 5%.

Conversely, there are also cases of accepting the null hypothesis even though the alternative hypothesis is actually true. This is called a Type II Error, and the probability of this error is denoted β. As a result, in statistical hypothesis testing, we must always keep in mind the possibilities of these Type I and Type II errors.
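Both error rates can be estimated by simulation. This sketch makes several assumptions: a normal population with standard deviation 10, samples of 30, and α = 0.05 (critical value 2.045 from the t table, df = 29). For the Type II case it also assumes a hypothetical true mean of 65. When H0 is true, the rejection rate estimates the Type I error rate and should come out near α. When the true mean is 65, the non-rejection rate estimates β for that particular alternative.

```python
import math
import random
import statistics

T_CRIT = 2.045  # t-table critical value: df = 29, two-sided alpha = 0.05

def rejects_h0(sample, mu0):
    """Two-sided one-sample t-test at alpha = 0.05 (valid for n = 30 because T_CRIT uses df = 29)."""
    t = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(len(sample)))
    return abs(t) > T_CRIT

def rejection_rate(true_mean, mu0=70, sd=10, n=30, reps=5_000):
    """Share of simulated samples from a hypothetical Normal(true_mean, sd) population that reject H0: mu = mu0."""
    hits = sum(rejects_h0([random.gauss(true_mean, sd) for _ in range(n)], mu0) for _ in range(reps))
    return hits / reps

random.seed(3)
type_1 = rejection_rate(true_mean=70)    # H0 true: every rejection is a Type I error
beta = 1 - rejection_rate(true_mean=65)  # H1 true (assumed mean 65): every non-rejection is a Type II error
print(f"Type I error rate ~ {type_1:.3f} (alpha = 0.05)   Type II error rate beta ~ {beta:.3f}")
```

Note that β depends on how far the true mean is from 70. A true mean closer to 70 would give a larger β.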

Conclusion (summary)

The main body's explanation was long. Distilling everything I understand into writing wasn't easy. As a conclusion, let me organize hypothesis testing in one go.

Figure 9. Hypothesis-testing summary

What is hypothesis testing?

Hypothesis testing is the process of setting up a hypothesis about the population, drawing a sample, and observing it to judge whether that hypothesis is correct.

The significance of the test statistic

First, we set up a hypothesis, called the null hypothesis. Then, to evaluate it, we define a test statistic suited to that hypothesis and find the distribution the test statistic follows when the null hypothesis is true.

The significance of hypothesis testing via the test-statistic distribution

Then we compute the test statistic from the sample. If it falls within the ordinary 95% of the distribution (with α = 0.05), we accept the hypothesis. If it falls in the extreme 5% region, we judge it unusual and reject the hypothesis.

Here, the probability of the extreme region, such as 5%, is called the significance level (α), and the boundary value of that region is called the critical value. There is also the p-value. Assuming the null hypothesis is true, it is the probability of observing a test statistic as extreme as the sample's, or more extreme.

[Two indicators for rejecting or accepting the null hypothesis]

• Judgment via the test statistic: if the sample's test statistic is located outside the critical value, reject the null hypothesis.

• Judgment via the p-value: if the p-value is smaller than the significance level (α), reject the null hypothesis.

What is the alternative hypothesis?

The hypothesis that is supported when the null hypothesis is rejected is called the alternative hypothesis. (How the test statistic is interpreted depends on the alternative hypothesis: two-sided test or one-sided test.)

Type I error and Type II error

There will be cases where the null hypothesis is actually true, yet the sample happens to fall in the extreme 5% probability region so the null hypothesis is rejected — such an error is called a Type I error. Conversely, the case of accepting the null hypothesis even though the alternative hypothesis is true is called a Type II error.

Thus, hypothesis testing is the process of evaluating the validity of the null hypothesis based on sample data, and through it we reach a conclusion about a particular hypothesis regarding the population.

References

JMP official docs (t-distribution, etc.)
MiniTab official docs
Wikipedia (definition reference)
Introduction to Statistics — Jayu Academy (textbook)
School course materials — Introduction to Data Analysis (major course)
My own thoughts and notes (feat. GPT)

📦 Migrated from the Tistory blog I used to run. Original: taehyuklee.tistory.com/15
