What Are Degrees of Freedom — Why We Divide by n-1 to Estimate Variance

This post walks through how the population variance is estimated from a sample, and what degrees of freedom mean, in the setting of parametric statistics. After reading it, you should understand:

  1. The concept of degrees of freedom
  2. Why the population-variance estimator divides by n-1
     • Underestimation view: to correct bias
     • Degrees-of-freedom view: the essential meaning of variance

Body

$$s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$ Eq. 1. Sample variance estimate

$$\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$ Eq. 2. Population variance
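As a quick check of Eq. 1 and Eq. 2, here is a minimal Python sketch that computes both divisors by hand and compares them with the standard library's `statistics.variance` (divides by n-1) and `statistics.pvariance` (divides by n). The data reuse the five values from the example in section 5; the helper name `sum_of_squared_deviations` is a hypothetical name chosen for this sketch.

```python
import statistics

# The five values from the section 5 example; their mean is exactly 20.
data = [25.0, 10.0, 15.0, 40.0, 10.0]

def sum_of_squared_deviations(values):
    """Sum of (x_i - mean)^2 over the given values."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

n = len(data)
ss = sum_of_squared_deviations(data)

sample_var = ss / (n - 1)   # Eq. 1: treat the data as a sample, divide by n-1
population_var = ss / n     # Eq. 2: treat the data as the whole population, divide by N

# The standard library should agree with the hand-computed values.
print(sample_var, statistics.variance(data))
print(population_var, statistics.pvariance(data))
```

The same sum of squared deviations feeds both formulas; only the divisor differs.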

Variance (and its square root, the standard deviation) can be read intuitively in three ways.

  • 0) As a measure of uncertainty: it quantifies how much the same trial yields different results.
  • 1) As an average distance from the mean: it expresses how far the data points lie from the mean. Strictly, variance averages the squared distances, and the standard deviation brings the result back to the units of the data.
  • 2) As the precision, or explanatory power, of the sample mean.

Before the intuition that variance is an "average of distances," remember it's a basic statistic for quantifying uncertainty. For detailed background, see the related post "Uncertainty, Variability & Variance."

1. Why divide by n-1?

Deviation/variance is physically a distance concept, yet the sample-variance estimate divides by $n-1$, not $n$. Why?

Answer) If you only wanted the distance of the sample's own data from its mean, dividing by $n$ would be right. But this formula is an estimator: it estimates the population variance from a sample. What matters is how well it estimates the parameter, not how well it describes the sample itself.

2. What's wrong with dividing by n?

Answer) It produces a biased estimate — specifically an underestimate. That is, it estimates smaller than the true population variance.
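The underestimation can be seen empirically. The following is a small simulation sketch, not a proof: it assumes a normal population with variance 4, a sample size of 5, and a fixed seed, all chosen only for illustration. It averages both estimators over many samples.

```python
import random

def variance_estimates(sample):
    """Return (divide-by-n, divide-by-(n-1)) variance estimates for one sample."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    return ss / n, ss / (n - 1)

def average_estimates(n=5, sigma=2.0, trials=20000, seed=0):
    """Average both estimators over many samples from Normal(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_by_n = total_by_n1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        by_n, by_n1 = variance_estimates(sample)
        total_by_n += by_n
        total_by_n1 += by_n1
    return total_by_n / trials, total_by_n1 / trials

mean_by_n, mean_by_n1 = average_estimates()
print(f"true variance 4.0 | divide by n: {mean_by_n:.3f} | divide by n-1: {mean_by_n1:.3f}")
```

With these assumed settings, the divide-by-n average should land near $4 \times 4/5 = 3.2$, while the divide-by-(n-1) average should land near the true value of 4.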

3. Why does underestimation happen?

Figure 1. Sampling from a population (denser intervals are reflected more)

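Since the high-density "Interval 1" region of the population is sampled more often, the sample tends to cluster in the dense region and looks more concentrated than the population. On top of that, the deviations are measured from the sample mean $\bar{x}$, which sits at the center of the sample itself rather than at the true mean $\mu$, so they come out smaller on average than deviations from $\mu$. As a result, dividing the sum of squared deviations by $n$ underestimates the population variance. Dividing by $n-1$ corrects this.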
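One way to make the underestimation precise is the following identity, a standard decomposition offered here as a supplementary sketch rather than as part of the original argument. Writing $\mu$ for the population mean and $\bar{x}$ for the sample mean:

$$\sum_{i=1}^{n}(x_i-\mu)^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu)^2 \;\ge\; \sum_{i=1}^{n}(x_i-\bar{x})^2$$

The cross term $2(\bar{x}-\mu)\sum_{i}(x_i-\bar{x})$ vanishes because deviations from the sample mean sum to zero. So the squared deviations measured from $\bar{x}$ never exceed those measured from $\mu$, and dividing them by $n$ falls short of $\sigma^2$ on average.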

As $n$ grows, the sample approaches the population and $n-1 \approx n$, so the difference becomes negligible. When $n$ is small, the $-1$ has a larger effect. In other words, the correction matters most for small samples.
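To see how quickly the correction fades, note that the expected sum of squared deviations is $\sigma^2(n-1)$ (the result derived in section 4), so the divide-by-n estimate has expected value $\sigma^2 (n-1)/n$. A tiny sketch, with `shrink_factor` as a hypothetical helper name, tabulates that ratio:

```python
def shrink_factor(n):
    """Expected divide-by-n estimate as a fraction of the true variance: (n-1)/n."""
    return (n - 1) / n

# Small samples are shrunk a lot; large samples hardly at all.
for n in (2, 5, 10, 100, 1000):
    print(f"n = {n:>4}: expected estimate = {shrink_factor(n):.3f} x sigma^2")
```

At $n = 2$ the uncorrected estimate is expected to be only half the true variance, while at $n = 1000$ it is within 0.1% of it.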

4. Is it mathematically proven?

$$\begin{aligned} &E\big[(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + (X_3-\bar{X})^2 + \cdots + (X_n-\bar{X})^2\big] \\[4pt] &= E\big[X_1^2 + X_2^2 + X_3^2 + \cdots + X_n^2 - 2(X_1 + X_2 + X_3 + \cdots + X_n)\bar{X} + n\bar{X}^2\big] \\[4pt] &= E[X_1^2] + E[X_2^2] + E[X_3^2] + \cdots + E[X_n^2] - 2nE[\bar{X}^2] + nE[\bar{X}^2] \\[4pt] &= E[X_1^2] + E[X_2^2] + E[X_3^2] + \cdots + E[X_n^2] - nE[\bar{X}^2] \\[4pt] &= n(\mu^2 + \sigma^2) - n\!\left(\mu^2 + \dfrac{\sigma^2}{n}\right) = n\sigma^2 - \sigma^2 = \sigma^2(n-1) \\[6pt] &\Rightarrow\ \sigma^2(n-1) = E\big[(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + (X_3-\bar{X})^2 + \cdots + (X_n-\bar{X})^2\big] \\[6pt] &\Rightarrow\ \sigma^2 = E\!\left[\dfrac{(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + (X_3-\bar{X})^2 + \cdots + (X_n-\bar{X})^2}{n-1}\right] \end{aligned}$$ Figure 2. Deriving that the denominator of the population-variance estimator is $n-1$ (see the final line)

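Helper equations (1) and (2). Equation (1) turns $(X_1 + \cdots + X_n)\bar{X}$ into $n\bar{X}^2$. Equation (2), applied to each $X_i$ and to $\bar{X}$ (whose variance is $\sigma^2/n$ for independent draws), gives $E[X_i^2] = \mu^2 + \sigma^2$ and $E[\bar{X}^2] = \mu^2 + \sigma^2/n$.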

$$\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n} \;\Rightarrow\; n\bar{X} = X_1 + X_2 + \cdots + X_n \quad (1)$$

$$\mathrm{VAR}(X) = \sigma^2 = E[(X-\mu)^2] = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2 \;\Rightarrow\; E(X^2) = \mu^2 + \sigma^2 \quad (2)$$

So the $n-1$ correction is justified not only intuitively but also mathematically.
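The result $E\big[\sum_i (X_i-\bar{X})^2\big] = \sigma^2(n-1)$ can also be checked exactly on a small case. The sketch below assumes a fair six-sided die as the population (an illustrative choice, not from the original post) and enumerates every possible sample of size $n$ with exact fractions, so no simulation noise is involved.

```python
from fractions import Fraction
from itertools import product

# Assumed population for illustration: a fair die with faces 1..6.
FACES = [Fraction(k) for k in range(1, 7)]

def population_variance(values):
    """Exact population variance (divide by N), as in Eq. 2."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)

def expected_sum_of_squares(values, n):
    """Exact E[sum_i (X_i - Xbar)^2] over all equally likely samples of size n."""
    total = Fraction(0)
    count = 0
    for sample in product(values, repeat=n):  # independent draws with replacement
        xbar = sum(sample) / n
        total += sum((x - xbar) ** 2 for x in sample)
        count += 1
    return total / count

sigma2 = population_variance(FACES)
for n in (2, 3, 4):
    print(n, expected_sum_of_squares(FACES, n), (n - 1) * sigma2)
```

For each $n$, the two printed fractions should match, which is the identity from the derivation evaluated on this particular population.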

5. What are degrees of freedom?

"The number of data points that can actually vary independently when computing some estimate."

Suppose we draw 5 samples from a population and the sample mean is 20.

  • $X_1$–$X_4$ : sampled freely (independently) from the population.
  • $X_5$ : automatically fixed by the others because of the constraint that the sample mean is 20.

E.g., if $X_1=25, X_2=10, X_3=15, X_4=40$, the five values must sum to $5 \times 20 = 100$. The first four sum to 90, so $X_5=10$ is forced. The last value cannot be chosen freely.

Generalization: of $n$ values, $n-1$ are drawn freely and the last is determined by the mean constraint. The number that can vary independently, $n-1$, is the degrees of freedom.
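The mean constraint above can be written as a one-line computation. In this minimal sketch, `forced_last_value` is a hypothetical helper name: given the $n-1$ freely chosen values and the required sample mean, it returns the only value the last point can take.

```python
def forced_last_value(free_values, target_mean):
    """Return the value the n-th point must take so that all n values average to target_mean."""
    n = len(free_values) + 1          # the free values plus the one forced value
    return n * target_mean - sum(free_values)

# The example from the text: four free values and a required mean of 20.
x5 = forced_last_value([25, 10, 15, 40], 20)
print(x5)  # 10
```

However the first four values are chosen, the function returns exactly one answer, which is why only $n-1$ of the values count as free.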

(Aside) The same idea appears in mechanical engineering: a point free to move along the x, y, and z axes has DOF=3, and constraining motion along the x-axis leaves DOF=2. The essence of degrees of freedom is the number of elements that can change.

6. What do degrees of freedom mean in variance? (the essence)

Answer) If the sample mean is treated as already fixed, only $n-1$ points carry variability. Only $n-1$ values can be drawn at random from the population. Given the sample mean, the remaining one is determined by the others and contributes no independent information.

Figure 4. The n-1 points with variability

So the sum of $(\text{data} - \text{sample mean})^2$, which expresses the variability of all the data, is divided by $n-1$, the number of data points that carry variability, to give a representative value: the variance. This is the essential way to see variance from an uncertainty perspective, prior to the "average distance" view.
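The same point can be seen in the deviations themselves. As a small sketch using the five values from section 5, the deviations from the sample mean always sum to zero, so the last deviation can be recovered from the other $n-1$ and adds no independent variability.

```python
# The five values from the section 5 example; their mean is exactly 20.
data = [25.0, 10.0, 15.0, 40.0, 10.0]
n = len(data)
mean = sum(data) / n

deviations = [x - mean for x in data]

# The last deviation is fixed by the other n-1, because all n must sum to zero.
last_from_others = -sum(deviations[:-1])

# Total variability spread over the n-1 deviations that are free to vary.
variance = sum(d ** 2 for d in deviations) / (n - 1)

print(deviations, last_from_others, variance)
```

Here `last_from_others` equals the final entry of `deviations`, which is the sense in which only $n-1$ of the squared terms carry independent information.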

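The population variance is computed (not estimated) from every member of the population around the known mean $\mu$, so nothing is random, no degree of freedom is lost, and the divisor is $N$. The sample, by contrast, is a random draw, and its mean $\bar{x}$ is itself estimated from the same data, so degrees of freedom come into play.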


References

  1. 12 Math — YouTube
  2. Introduction to Statistics — Jayu Academy (textbook)
  3. Course material — Introduction to Data Analysis

📦 Migrated from my own Korean blog (my own writing). Original: taehyuklee.tistory.com/14
