Implementing the t-test (t test) [One, Two, Paired Samples t-test: Python feat. SciPy & Statsmodels]

2024.11.05 ·#statistics #t-test #Python #SciPy #statsmodels #mean-test

Statistics libraries in the Python ecosystem
The difference between SciPy and Statsmodels
How to do mean tests (t-test) in SciPy (feat. reproducing the theory-post examples as-is)
- One sample t-test
- Two samples t-test (Student's & Welch's t-test)
- Paired samples t-test

In this post I'll perform mean tests directly in Python. If you need to understand the theory of mean tests, please read the previous post first.

Testing Means: the t-test [One, Two (Student's & Welch's), Paired Samples t-test]

Is a difference in two group means a real difference or just variability — the rationale for the t-test, its assumptions (normality, equal variance), and worked calculations for each type.

taystudios.com/blog

1. Statistics libraries in the Python ecosystem

In the Python ecosystem, the main libraries for statistical analysis are SciPy and statsmodels, and scikit-learn is widely used for machine learning. In this post I'll mainly use SciPy, but I'll also describe the characteristics of statsmodels.

2. The difference between SciPy and Statsmodels

# Install both libraries with pip (the Python package manager)
pip install scipy
pip install statsmodels

Having used both libraries, I can summarize as follows.

SciPy : geared toward quickly computing p-values and basic statistics. It's often used for simple statistical computations on large data, or whenever a quick decision is needed.

The two libraries' t-test outputs look much alike, so I'll show the difference in output with a linear regression example instead.

# SciPy output
Slope: -1.7
Intercept: 86.16666666666667
R-squared: 0.12690281030444972
P-value: 0.19248793653349897

Statsmodels : provides the features needed to diagnose and evaluate statistical models. For example, after a linear regression it reports detailed statistics, including a test statistic (t-value) for each coefficient, a test statistic for the whole model (F-statistic), and performance metrics such as the coefficient of determination (R-squared). These indicators can be used to interpret and validate a statistical model.

Running a linear regression with Statsmodels gives the following output.

# Statsmodels output
                            OLS Regression Results
==============================================================================
Dep. Variable:                 scores   R-squared:                       0.127
Model:                            OLS   Adj. R-squared:                  0.060
Method:                 Least Squares   F-statistic:                     1.890
Date:                Mon, 04 Nov 2024   Prob (F-statistic):              0.192
Time:                        00:20:44   Log-Likelihood:                -40.667
No. Observations:                  15   AIC:                             85.33
Df Residuals:                      13   BIC:                             86.75
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         86.1667      1.597     53.969      0.000      82.717      89.616
group_code    -1.7000      1.237     -1.375      0.192      -4.372       0.972
==============================================================================
Omnibus:                        1.370   Durbin-Watson:                   0.739
Prob(Omnibus):                  0.504   Jarque-Bera (JB):                0.911
Skew:                          -0.272   Prob(JB):                        0.634
Kurtosis:                       1.922   Cond. No.                         2.92
==============================================================================

To summarize: if you need a quick decision and just want p-values and test statistics fast, SciPy is a good fit. If you need more thorough validation and model diagnostics, Statsmodels is the better choice.

3. Mean tests (t-test) using SciPy

For consistency, I'll reuse the examples from the t-test theory post exactly as they were. Check that the test statistics (t-values) and p-values below match the ones computed by hand there.

3.1 One sample t-test

[Table1. Patient number & blood pressure measured after drug administration]

Patient	Blood pressure after dosing (mm Hg)
Patient 1	118
Patient 2	121
Patient 3	119
Patient 4	117
Patient 5	120

Test whether this group's mean blood pressure is significantly different from the baseline (120 mm Hg).

# Use the ttest_1samp function in scipy.stats
from scipy import stats
import numpy as np

x1 = np.array([118, 121, 119, 117, 120])
mu0 = 120.0  # the reference mean to compare against
result = stats.ttest_1samp(x1, popmean=mu0, alternative='two-sided', axis=0)
print(result)

# Result (t test statistic, p-value)
# TtestResult(statistic=np.float64(-1.414213562373095), pvalue=np.float64(0.23019964108049873), df=np.int64(4))
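To tie this back to the theory post, the one-sample statistic is t = (x̄ - μ0) / (s / √n), compared against a t distribution with n - 1 degrees of freedom. A short sketch that computes it by hand with NumPy and compares it with ttest_1samp (the variable names are my own):

```python
import numpy as np
from scipy import stats

x1 = np.array([118, 121, 119, 117, 120])
mu0 = 120.0

n = len(x1)
x_bar = x1.mean()                  # sample mean
s = x1.std(ddof=1)                 # sample standard deviation (n - 1 in the denominator)
t_manual = (x_bar - mu0) / (s / np.sqrt(n))
dof = n - 1
p_manual = 2 * stats.t.sf(abs(t_manual), dof)  # two-sided p-value

result = stats.ttest_1samp(x1, popmean=mu0)
print("by hand:", t_manual, p_manual)
print("SciPy:  ", result.statistic, result.pvalue)
```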

For the one-sample t-test, use ttest_1samp in scipy.stats (see the official docs).

Positional argument

First argument (x1): the sample you want to test. Per the official docs it accepts any array_like input, so a Python list or a slice of a pandas DataFrame works too. Since the input is converted to a NumPy array internally anyway, I prefer to pass a NumPy array.

Keyword arguments (these can also be passed positionally, in the order listed in the official docs)

popmean = {popmean arg} (required): the expected (hypothesized) population mean to compare against. I stored it in a variable I arbitrarily named mu0; in the example above it is 120.

alternative = {alternative arg : str type} (sets the alternative hypothesis; the null hypothesis is that the population mean equals popmean)

"two-sided" : two-sided test : alternative hypothesis : population mean != popmean (default)
"less" : one-tailed (left) : alternative hypothesis : population mean < popmean
"greater" : one-tailed (right) : alternative hypothesis : population mean > popmean
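Because these options describe the alternative hypothesis, the three p-values for the same data are linked. A quick sketch on the Table1 data:

```python
import numpy as np
from scipy import stats

x1 = np.array([118, 121, 119, 117, 120])
mu0 = 120.0

# Same data, three different alternative hypotheses
p_two = stats.ttest_1samp(x1, popmean=mu0, alternative="two-sided").pvalue
p_less = stats.ttest_1samp(x1, popmean=mu0, alternative="less").pvalue
p_greater = stats.ttest_1samp(x1, popmean=mu0, alternative="greater").pvalue

print(f"two-sided: {p_two:.4f}, less: {p_less:.4f}, greater: {p_greater:.4f}")
```

For a symmetric t distribution, the "less" and "greater" p-values sum to 1, and the two-sided p-value is twice the smaller of the two. With t ≈ -1.41 here, "less" gives the smaller one-sided p-value.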

axis = {axis arg : int or None}

0 : the test runs along axis 0, i.e. once per column of a 2D array (default)
1 : the test runs along axis 1, i.e. once per row of a 2D array
None : the input is raveled (flattened) to 1D first, and a single test is run

from scipy import stats
import numpy as np

# Example of how the result depends on the axis argument
arr = np.array([[1.5, 2.3, 3.7],
                [4.1, 5.2, 6.8],
                [7.4, 8.5, 9.1],
                [10.2, 11.4, 12.9],
                [13.3, 14.1, 15.6]])

mu0 = 5

# The earlier example printed the whole result object;
# you can also unpack it into the test statistic and the p-value.
t_stat_axis0, p_value_axis0 = stats.ttest_1samp(arr, popmean=mu0, axis=0)
t_stat_axis1, p_value_axis1 = stats.ttest_1samp(arr, popmean=mu0, axis=1)
t_stat_none, p_value_none = stats.ttest_1samp(arr, popmean=mu0, axis=None)

print("T-statistic:", t_stat_axis0)
print("P-value:", p_value_axis0)
print("T-statistic:", t_stat_axis1)
print("P-value:", p_value_axis1)
print("T-statistic:", t_stat_none)
print("P-value:", p_value_none)

# axis=0: one test per column, so 3 results
T-statistic: [1.09461774 1.56522962 2.1804585 ]
P-value: [0.33517446 0.19258282 0.09469596]

# axis=1: one test per row, so 5 results
T-statistic: [-3.88856886  0.46776758  6.6964953   8.3223972  13.84510894]
P-value: [0.06022121 0.68597053 0.02158075 0.01413253 0.00517637]

# axis=None: everything is flattened to 1D, so a single result
T-statistic: 2.947475125290377
P-value: 0.010598724800782978

The snippet above shows how the shape of the result changes with the axis option: one result per column, one per row, or a single result.

There is also nan_policy, which controls how NaN values are handled ('propagate', 'raise', or 'omit'), but it concerns data handling rather than statistics, so I won't go into it in depth.
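For completeness, a short sketch of the three nan_policy options, using the Table1 data with one hypothetical missing value (np.nan) appended:

```python
import numpy as np
from scipy import stats

# Table1 data plus one hypothetical missing measurement
x_nan = np.array([118, 121, 119, 117, 120, np.nan])
mu0 = 120.0

# 'propagate' (default): any NaN makes the result NaN
res_prop = stats.ttest_1samp(x_nan, popmean=mu0, nan_policy="propagate")

# 'omit': NaNs are dropped before the test
res_omit = stats.ttest_1samp(x_nan, popmean=mu0, nan_policy="omit")

# Dropping the NaNs yourself should give the same answer as 'omit'
res_dropped = stats.ttest_1samp(x_nan[~np.isnan(x_nan)], popmean=mu0)

print(res_prop.statistic, res_omit.statistic, res_dropped.statistic)

# 'raise': a NaN in the input raises a ValueError
try:
    stats.ttest_1samp(x_nan, popmean=mu0, nan_policy="raise")
except ValueError as err:
    print("ValueError:", err)
```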

3.2 Two samples t-test

[Table2. Patient number, group classification & blood pressure measured by dosing status]

Patient	Group	Blood pressure after dosing (mm Hg)
Patient 1	Dosed	115
Patient 2	Dosed	118
Patient 3	Dosed	116
Patient 4	Not dosed	122
Patient 5	Not dosed	124

Test whether the mean blood pressure differs significantly between the dosed group and the not-dosed group. (The two groups don't need the same sample size.)

3.2.1 Student's t-test

from scipy import stats
import numpy as np

x1 = np.array([115, 118, 116])  # dosed group
x2 = np.array([122, 124])       # not-dosed group

t_stat, p_val = stats.ttest_ind(x1, x2, equal_var=True, alternative='two-sided')
print("T-statistic:", t_stat)
print("P-value:", p_val)

# Result - Student's t-test, when equal_var = True
# T-statistic: -4.898979485566359
# P-value: 0.016276603459428517

For the independent two-sample t-test, use ttest_ind in scipy.stats (see the official docs).

Positional arguments

First argument (x1) / second argument (x2) : the first and second samples. The order matters: the numerator of the test statistic is the difference of the sample means, mean(x1) - mean(x2), so swapping the two arguments flips the sign of t.
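A sketch of the Student's (pooled-variance) statistic computed by hand on the Table2 data, which also shows that swapping x1 and x2 only flips the sign of t. It assumes the pooled variance s_p² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2), with n1 + n2 - 2 degrees of freedom:

```python
import numpy as np
from scipy import stats

x1 = np.array([115, 118, 116])  # dosed group
x2 = np.array([122, 124])       # not-dosed group

n1, n2 = len(x1), len(x2)
# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
# Numerator: difference of the sample means, x1 first
t_manual = (x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

t_12 = stats.ttest_ind(x1, x2, equal_var=True).statistic
t_21 = stats.ttest_ind(x2, x1, equal_var=True).statistic  # arguments swapped

print(t_manual, t_12, t_21)
```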

Keyword arguments (these can also be passed positionally, in the order listed in the official docs)

equal_var = {equal_var arg : bool type}

"True" : assume the populations behind x1 and x2 have equal variances and use the pooled variance (Student's t-test, default)
"False" : do not assume equal variances (automatically switches to Welch's t-test)

alternative = {alternative arg : str type} (sets the alternative hypothesis; the null hypothesis is that the two population means are equal)

"two-sided" : two-sided test : alternative hypothesis : population 1 mean != population 2 mean (default)
"less" : one-tailed (left) : alternative hypothesis : population 1 mean < population 2 mean
"greater" : one-tailed (right) : alternative hypothesis : population 1 mean > population 2 mean

axis = {axis arg : int or None}

0 : the test runs along axis 0, i.e. once per column of a 2D array (default)
1 : the test runs along axis 1, i.e. once per row of a 2D array
None : both inputs are flattened to 1D before the test (demonstrated in 3.1, so omitted here)

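from scipy import stats
import numpy as np

data = np.array([[115, 118, 116],
                 [122, 124, 123]])

data1 = np.array([[231, 123, 132],
                  [223, 321, 421]])

# 1. axis=0 (one test per column)
t_stat_axis0, p_val_axis0 = stats.ttest_ind(data, data1, axis=0, equal_var=True)

# 2. axis=1 (one test per row)
t_stat_axis1, p_val_axis1 = stats.ttest_ind(data, data1, axis=1, equal_var=True)

print("T-statistic:", t_stat_axis0, "P-value:", p_val_axis0)
print("T-statistic:", t_stat_axis1, "P-value:", p_val_axis1)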

# results by axis
# axis=0
T-statistic: [-20.41364284  -1.01973393  -1.08618662]
P-value: [0.00239111 0.41512875 0.39087777]

# axis=1
T-statistic: [-1.31950545 -3.47552886]
P-value: [0.25745718 0.02545524]

3.2.2 Welch's t-test

from scipy import stats
import numpy as np

x1 = np.array([115, 118, 116])  # dosed group
x2 = np.array([122, 124])       # not-dosed group

t_stat, p_val = stats.ttest_ind(x1, x2, equal_var=False, alternative='two-sided')
print("T-statistic:", t_stat)
print("P-value:", p_val)

# Result - Welch's t-test, when equal_var = False
# T-statistic: -5.0000000000000036
# P-value: 0.025054956171761823

Welch's t-test uses the same function, ttest_ind: setting equal_var=False switches to Welch's t-test, which is the appropriate choice when the equal-variance assumption does not hold.
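What equal_var=False changes can be sketched by hand: Welch's t-test uses each group's own variance in the standard error instead of a pooled one, and its degrees of freedom come from the Welch-Satterthwaite approximation, so they are generally not an integer. A sketch on the Table2 data (variable names are my own):

```python
import numpy as np
from scipy import stats

x1 = np.array([115, 118, 116])  # dosed group
x2 = np.array([122, 124])       # not-dosed group

# Per-group squared standard errors (no pooling)
se1_sq = x1.var(ddof=1) / len(x1)
se2_sq = x2.var(ddof=1) / len(x2)

t_welch = (x1.mean() - x2.mean()) / np.sqrt(se1_sq + se2_sq)

# Welch-Satterthwaite degrees of freedom
df_welch = (se1_sq + se2_sq) ** 2 / (
    se1_sq ** 2 / (len(x1) - 1) + se2_sq ** 2 / (len(x2) - 1)
)
p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)  # two-sided p-value

res = stats.ttest_ind(x1, x2, equal_var=False)
print("by hand:", t_welch, df_welch, p_welch)
print("SciPy:  ", res.statistic, res.pvalue)
```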

The remaining arguments are the same as Student's t-test, so refer to the above.
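Since the title also mentions statsmodels, here is a sketch of the same tests there, as a cross-check of the SciPy results. It assumes `statsmodels.stats.weightstats.ttest_ind`, whose `usevar` parameter selects pooled (Student's) or unequal (Welch's) variances, and `DescrStatsW.ttest_mean` for the one-sample case. Both return a plain (t statistic, p-value, degrees of freedom) tuple rather than a result object.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.weightstats import ttest_ind as sm_ttest_ind, DescrStatsW

x1 = np.array([115, 118, 116])  # dosed group
x2 = np.array([122, 124])       # not-dosed group

# Student's (pooled variance) and Welch's (unequal variances)
t_pooled, p_pooled, df_pooled = sm_ttest_ind(x1, x2, usevar="pooled")
t_unequal, p_unequal, df_unequal = sm_ttest_ind(x1, x2, usevar="unequal")
print("Student:", t_pooled, p_pooled, df_pooled)
print("Welch:  ", t_unequal, p_unequal, df_unequal)

# One-sample test from section 3.1
bp = np.array([118, 121, 119, 117, 120])
t_one, p_one, df_one = DescrStatsW(bp).ttest_mean(value=120)
print("One-sample:", t_one, p_one, df_one)
```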

3.3 Paired Samples t-test

[Table3. Patient number & blood pressure before/after drug administration]

Patient	Before dosing (mm Hg)	After dosing (mm Hg)
Patient 1	130	120
Patient 2	128	119
Patient 3	135	125
Patient 4	132	123
Patient 5	129	121

Test whether blood pressure decreased significantly after dosing, i.e. whether the mean of the per-patient differences (before - after) is greater than 0.

In the theory post we tested whether there was any significant difference (a two-sided test); this time, let's run a one-sided test for a significant decrease.

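from scipy import stats
import numpy as np

# blood-pressure data before dosing (data)
data = np.array([130, 128, 135, 132, 129])

# blood-pressure data after dosing (data1)
data1 = np.array([120, 119, 125, 123, 121])

# Paired-samples t-test on the differences data - data1 (before - after).
# 'greater' tests H1: mean(before - after) > 0, i.e. blood pressure decreased.
t_stat, p_val = stats.ttest_rel(data, data1, alternative='greater')
print("T-statistic:", t_stat)
print("P-value:", p_val)

# Result
# T-statistic: 24.58803425594304
# P-value: about 8.1e-06

For the paired-samples t-test, use ttest_rel in scipy.stats (see the official docs).

Taking the significance level as 0.05, the p-value (about 8.1e-06) is far below 0.05, so we reject the null hypothesis and conclude that blood pressure decreased significantly after dosing. Watch the direction: alternative refers to the mean of data - data1, so 'less' would test whether blood pressure increased. Running the same call with alternative='less' gives p ≈ 0.99999, and a p-value that large means we fail to reject the null hypothesis; it is not evidence of a decrease. The remaining arguments, such as axis, work as explained above, so I'll skip them.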

⚠️ Note that for the paired-samples t-test, both samples must contain the same number of observations (one per pair). If the lengths differ, a ValueError occurs. (ValueError: unequal length arrays)
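The length requirement follows from how the paired test works: it is a one-sample t-test of the per-patient differences (before - after) against 0, so every observation needs a partner. A sketch confirming the equivalence on the Table3 data:

```python
import numpy as np
from scipy import stats

before = np.array([130, 128, 135, 132, 129])
after = np.array([120, 119, 125, 123, 121])

# Paired test, H1: mean(before - after) > 0 (blood pressure decreased)
paired = stats.ttest_rel(before, after, alternative="greater")

# Equivalent one-sample test of the per-patient differences against 0
diff = before - after
one_sample = stats.ttest_1samp(diff, popmean=0, alternative="greater")

print(paired.statistic, paired.pvalue)
print(one_sample.statistic, one_sample.pvalue)
```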

Conclusion

In this post I implemented, with SciPy, the mean tests we covered in theory earlier. Comparing the t-statistics and p-values I computed by hand in the theory post with the values computed in Python, I confirmed that they match. I hope you'll try both computing them by hand and implementing them yourself.

References

SciPy official documentation
Statsmodels official documentation
Introduction to Data Analysis (university major course)

📦 Migrated from the Tistory blog I used to run. Original: taehyuklee.tistory.com/26
