I think your question highlights a statistical point rather than a NumPy issue.
As @Shadrack Sylvestar Mbwagha mentioned, in statistics this is known as the multiple comparisons problem: under the null hypothesis (in your case, the hypothesis that the data you are sampling actually follows a Gaussian distribution) the p-value follows a uniform distribution on [0, 1]. This means that if you choose alpha=0.05, you will have a 5% probability of rejecting the null hypothesis even when it is true.
If you repeat your experiment over a large number of replicates you can easily see this:
import numpy as np
from scipy import stats
from matplotlib import pyplot as plt
p_vals = []
for i in range(10_000):
    rng = np.random.default_rng(i)
    x = rng.standard_normal(100_000)
    p_vals.append(stats.normaltest(x).pvalue)
p_vals = np.array(p_vals)
plt.hist(p_vals, bins=np.linspace(0,1,21))
plt.xlabel("p-value")
plt.ylabel("frequency")
plt.show()

Interestingly, you can see that the p-values follow a uniform distribution on [0, 1], and (p_vals < 0.05).mean() (i.e. the proportion of significant p-values) is indeed very close to 0.05.
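If you want a quick self-contained check of that proportion (here I use 2,000 replicates of 1,000 samples each just to keep it fast; the exact sizes are arbitrary):

```python
import numpy as np
from scipy import stats

# Draw many independent samples that really are normal and collect
# the p-value of the normality test on each one.
p_vals = np.array([
    stats.normaltest(np.random.default_rng(i).standard_normal(1_000)).pvalue
    for i in range(2_000)
])

# Under the null hypothesis, roughly alpha = 5% of the tests come out
# "significant" even though every sample is genuinely normal.
print((p_vals < 0.05).mean())
```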
For this reason, in multiple testing you want to adjust your p-values to control the FDR (false discovery rate). If you correct your p-values with the Benjamini-Hochberg method you will indeed get the following output:
q_vals = stats.false_discovery_control(p_vals, method="bh")
plt.hist(q_vals, bins=np.linspace(0, 1, 21))
plt.xlabel("adjusted p-value")
plt.ylabel("frequency")
plt.show()

This highlights that, once you have adjusted your p-values for the multiple tests, you can no longer reject the null hypothesis, which is consistent with the data being sampled from a normal distribution (keep in mind, though, that failing to reject the null does not strictly prove it).
Let me know if you have any other questions.