I think your question highlights a statistical point rather than a NumPy issue.
As @Shadrack Sylvestar Mbwagha mentioned, in statistics this is known as the multiple comparisons problem: under the null hypothesis (in your case, the hypothesis that the data you are sampling actually follows a Gaussian distribution) the p-value follows a uniform distribution on [0, 1]. This means that if you choose alpha=0.05, you will have a 5% probability of rejecting the null hypothesis even when it is true.
If you repeat your experiment over a large number of replicates you can easily see this:
import numpy as np
from scipy import stats
from matplotlib import pyplot as plt
p_vals = []
for i in range(10_000):
    rng = np.random.default_rng(i)
    x = rng.standard_normal(100_000)
    p_vals.append(stats.normaltest(x).pvalue)
p_vals = np.array(p_vals)
plt.hist(p_vals, bins=np.linspace(0,1,21))
plt.xlabel("p-value")
plt.ylabel("frequency")
plt.show()

Interestingly, you can see that the p-values follow a uniform distribution on [0, 1], and (p_vals < 0.05).mean() (i.e. the proportion of significant p-values) is indeed very close to 0.05.
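If you want a quick self-contained check of that proportion (here I use 2,000 replicates of 1,000 samples each just to keep it fast; the exact sizes are arbitrary):

```python
import numpy as np
from scipy import stats

# Draw many independent samples that really are normal and collect
# the p-value of the normality test on each one.
p_vals = np.array([
    stats.normaltest(np.random.default_rng(i).standard_normal(1_000)).pvalue
    for i in range(2_000)
])

# Under the null hypothesis, roughly alpha = 5% of the tests come out
# "significant" even though every sample is genuinely normal.
print((p_vals < 0.05).mean())
```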
For this reason, in multiple testing you want to adjust your p-values to control the FDR (false discovery rate). If you correct your p-values with the Benjamini-Hochberg method you will indeed get the following output:
q_vals = stats.false_discovery_control(p_vals, method="bh")
plt.hist(q_vals, bins=np.linspace(0, 1, 21))
plt.xlabel("adjusted p-value")
plt.ylabel("frequency")
plt.show()

This highlights that, once you have adjusted your p-values for the multiple tests, you can no longer reject the null hypothesis, which is consistent with the data being sampled from a normal distribution (keep in mind, though, that failing to reject the null does not strictly prove it).
Let me know if you have any other questions.