The relevant columns of my dataframe are logger, year, avg_max_temp, avg_min_temp, tot_precipitation, and yield.

There are 100 different loggers and 10 different years. I want to subset the table by logger and, for each logger, find the Pearson correlation between year and each of avg_max_temp, avg_min_temp, and tot_precipitation. Because there are 100 loggers, I'd expect the resulting dataframe to have 100 rows: the three correlation columns plus a column for the logger ID.

Here’s how I would do this analysis for all the data combined:

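```python
import pandas as pd

# Pearson correlation matrix of the numeric columns
# (numeric_only=True skips the string 'logger' column; required in pandas >= 2.0)
corr_df = df.corr(method='pearson', numeric_only=True)

# Keep only the 'year' row and the three weather columns
corr_df = corr_df.drop(columns=['year', 'yield'])
corr_df = corr_df.drop(index=['avg_max_temp', 'avg_min_temp', 'tot_precipitation', 'yield'])

print(corr_df)
```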
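Assuming the dataframe has no numeric columns beyond the ones listed, only the year row survives the two drops above. The same three numbers can then probably be obtained more directly with pandas' DataFrame.corrwith, which correlates every column of a DataFrame with a given Series. This is a minimal sketch: the small df below is hypothetical toy data so the snippet runs on its own, and weather_cols is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataframe
df = pd.DataFrame({
    'logger': ['a.txt'] * 4 + ['b.txt'] * 4,
    'year': [2001, 2002, 2003, 2004] * 2,
    'avg_max_temp': [20.0, 21.0, 22.0, 23.0, 18.0, 17.0, 16.0, 15.0],
    'avg_min_temp': [5.0, 5.5, 5.2, 6.0, 3.0, 3.1, 2.9, 2.5],
    'tot_precipitation': [100.0, 90.0, 95.0, 80.0, 120.0, 125.0, 130.0, 140.0],
    'yield': [3.0, 3.2, 3.1, 3.5, 2.0, 2.1, 1.9, 1.8],
})

# Hypothetical helper name for the three columns of interest
weather_cols = ['avg_max_temp', 'avg_min_temp', 'tot_precipitation']

# Pearson correlation of each weather column with 'year' across all rows;
# the result is a Series indexed by the weather column names
overall_corr = df[weather_cols].corrwith(df['year'], method='pearson')
print(overall_corr)
```

This avoids building the full correlation matrix and then dropping most of it.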

However, I can’t figure out how to do this for each of the 100 dataloggers. Any help would be hugely appreciated. Thanks in advance!

1 Answer

You can loop through a groupby object, which yields each portion of the df that shares a single logger value, compute the Pearson correlation coefficients for each group, and concatenate the results into your final corr_df DataFrame.

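```python
import pandas as pd

frames = []
for group, df_group in df.groupby('logger'):
    # Pearson correlation matrix for this logger's rows only
    # (numeric_only=True skips the string 'logger' column)
    group_corr_df = df_group.corr(method='pearson', numeric_only=True)

    # Keep only the 'year' row and the three weather columns
    group_corr_df = group_corr_df.drop(columns=['year', 'yield'])
    group_corr_df = group_corr_df.drop(index=['avg_max_temp', 'avg_min_temp', 'tot_precipitation', 'yield'])

    # Record which logger this row belongs to
    group_corr_df['logger'] = group
    frames.append(group_corr_df)

# Concatenate once at the end (cheaper than calling pd.concat inside the loop)
corr_df = pd.concat(frames, ignore_index=True)
```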
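If you prefer to avoid the explicit loop, a groupby/apply call that runs corrwith on each logger's rows should give the same one-row-per-logger table, with the logger ID as a regular column after reset_index. This is a sketch of an alternative, not what the loop above does internally; it assumes the same column names, and the df below is hypothetical toy data (the logger names 'a.txt' and 'b.txt' are made up).

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataframe
df = pd.DataFrame({
    'logger': ['a.txt'] * 4 + ['b.txt'] * 4,
    'year': [2001, 2002, 2003, 2004] * 2,
    'avg_max_temp': [20.0, 21.0, 22.0, 23.0, 18.0, 17.0, 16.0, 15.0],
    'avg_min_temp': [5.0, 5.5, 5.2, 6.0, 3.0, 3.1, 2.9, 2.5],
    'tot_precipitation': [100.0, 90.0, 95.0, 80.0, 120.0, 125.0, 130.0, 140.0],
    'yield': [3.0, 3.2, 3.1, 3.5, 2.0, 2.1, 1.9, 1.8],
})

weather_cols = ['avg_max_temp', 'avg_min_temp', 'tot_precipitation']

# Select only the needed columns so the grouping column is not passed to apply;
# each group returns a Series, which apply stacks into one row per logger
corr_df = (
    df.groupby('logger')[['year'] + weather_cols]
      .apply(lambda g: g[weather_cols].corrwith(g['year']))
      .reset_index()  # turn the 'logger' index into a regular column
)
print(corr_df)
```

The result has columns logger, avg_max_temp, avg_min_temp, and tot_precipitation, which matches the shape described in the question.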

4 Comments

Thanks so much for the reply here. This is almost there. It gives me a df with the three output columns. Do you know if there is a way to also include the logger's id as a fourth column? Or in other words, the string saying what group each trio of data corresponds with?
@Trev we can add a new column called logger to group_corr_df just before we concat
Thank you @Derek O, that's a huge help. Is there any way you could explain how the loop works? What are group and df_group referring to in the original groupby dataframe?
@Trev Sure, I'm happy to explain a bit further. You are looping through portions of the dataframe where logger equals each unique logger value in turn, so the loop runs once per logger (groupby sorts the keys by default). If the first logger value is '011072.txt', then on the first iteration of the loop group = '011072.txt' and df_group = df[df['logger'] == '011072.txt'].
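A small way to see this for yourself is to print each key and check that the group matches the boolean-mask subset. This is a minimal sketch on hypothetical toy data: '011072.txt' is taken from the comment above, while '022081.txt' and the years are made up, and seen is just a list used to collect what each iteration produced.

```python
import pandas as pd

# Hypothetical toy data; only the 'logger' and 'year' columns matter here
df = pd.DataFrame({
    'logger': ['011072.txt', '011072.txt', '022081.txt', '022081.txt', '022081.txt'],
    'year': [2001, 2002, 2001, 2002, 2003],
})

seen = []
for group, df_group in df.groupby('logger'):
    # group is the logger value; df_group holds only that logger's rows,
    # with their original index labels
    same_rows = df_group.equals(df[df['logger'] == group])
    print(group, len(df_group), same_rows)
    seen.append((group, len(df_group), same_rows))
```

Each iteration should report the logger key, the number of rows for that logger, and True for the comparison.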
