I have two arrays - array1 and array2. How confidently can I state that they are generated from uncorrelated distributions? Just checking the correlation values would not suffice; I would require p-values as well.
I thought of using the following method:
import numpy as np
from scipy.stats import pearsonr
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([5, 4, 3, 2, 1])
# Calculate the Pearson correlation coefficient and p-value
res = pearsonr(array1, array2, alternative='two-sided')
correlation_coefficient, p_value = res.statistic, res.pvalue
# Print the results
print(f"Pearson Correlation Coefficient: {correlation_coefficient}")
print(f"P-value: {p_value}")
# Define the confidence level (e.g., 95%)
confidence_level = 0.95
# Make a conclusion based on the p-value
if p_value < (1 - confidence_level):
    print(f"I am {confidence_level * 100}% confident that these two arrays are correlated.")
The output:
Pearson Correlation Coefficient: -1.0
P-value: 0.0
I am 95.0% confident that these two arrays are correlated.
However, one problem with this approach is this statement in the docstring:
This function also performs a test of the null hypothesis that the distributions underlying the samples are uncorrelated and normally distributed.
I just want to test the hypothesis that the distributions underlying the samples are uncorrelated without caring about the nature of the distributions - normal or anything else.
Is there any alternative that solves this issue?
Two tentative solutions
Simulate the distribution of uncorrelated arrays
I can generate two random arrays (of the same length as array1) and calculate their correlation. By repeating this many times, I can get a distribution of the correlation values among uncorrelated arrays, without any reference to the underlying distribution type. I can then use this distribution to get the p-value of the actual observed correlation between array1 and array2.
Of course, the distribution used by the numpy random generators might affect the result. Besides, I would rather avoid writing that much code if a solution already exists.
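A minimal sketch of this simulation idea, assuming the uncorrelated arrays are drawn from independent standard normal distributions. That choice, the seed, and the number of simulations (n_sim) are assumptions made for illustration, and the resulting p-value depends on them:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([5, 4, 3, 2, 1])
observed_r = np.corrcoef(array1, array2)[0, 1]

n_sim = 10_000        # assumption: number of simulated array pairs
n = len(array1)

# Assumption: both simulated arrays come from independent standard normals.
x = rng.standard_normal((n_sim, n))
y = rng.standard_normal((n_sim, n))

# Row-wise Pearson correlation of each simulated pair.
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
null_r = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

# Two-sided p-value: share of simulated |r| at least as extreme as the observed |r|.
# The +1 terms count the observed value itself, so the estimate is never exactly 0.
n_extreme = np.sum(np.abs(null_r) >= abs(observed_r) - 1e-12)
p_value = (n_extreme + 1) / (n_sim + 1)

print(f"Observed r: {observed_r}")
print(f"Simulated p-value: {p_value}")
```

Swapping standard_normal for another generator changes the null distribution of r, which is exactly the dependence on the distribution that worries me.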
Can I use the method parameter of scipy.stats.pearsonr?
I am not sure what the method parameter of pearsonr does, particularly when method is set to PermutationMethod. Is that somehow a solution to my query?
With MonteCarloMethod for method, pearsonr can do something similar to what you described in your first tentative solution, but yes, the distribution might matter. Using PermutationMethod is similar, but rather than random samples from a distribution of your choice, it uses random permutations of one of your arrays. (Under the null hypothesis that there is no correlation, all permutations would have been equally likely.) More about the theory is in docs.scipy.org/doc/scipy/tutorial/stats/resampling.html, but if you'd like, I can write an answer with some code.