I have 4 pandas data frames that are similar in structure to this one:

index   day1    day2   day3   day4   day5 ....
0        1.23   5.41    0      0      2.31
1        2.31   7.15    0      0      1.32 
...

For each row, I want to calculate the mean, standard deviation, kurtosis, and skewness, and add the results as new columns to another, existing data frame. Right now I do this with a for loop: I keep a counter, append it as a string to each column name, and so avoid overwriting the results of the previous iteration. It looks like this:

phen_1=rain_calc.iloc[:,:20]
phen_2=rain_calc.iloc[:,20:55]
phen_3=rain_calc.iloc[:,55:70]
phen_4=rain_calc.iloc[:,70:80]
phen_5=rain_calc.iloc[:,70:110]

dfs_phens=[phen_1,phen_2,phen_3,phen_4,phen_5]

phen=1

for df in dfs_phens:
    
    
    mean_col='mean_'+str(phen)
    std_col='std_'+str(phen)
    skew_col='skew_'+str(phen)
    kurt_col='kurt_'+str(phen)
    total_col='total_'+str(phen)
    
    original_df[mean_col] =df.mean(axis=1)
    original_df[std_col] =df.std(axis=1)
    original_df[skew_col] =df.skew(axis=1)
    original_df[kurt_col]=df.kurt(axis=1)
    original_df[total_col]=df.sum(axis=1)
    
    phen=phen+1

This works and gives me the output I want: new columns with the calculated statistics. However, I wonder if there is a smarter and more elegant way to write it :)

So my goal is to improve the script: to name the new columns inside the for loop without building each string by hand every time, as I'm doing now.

2 Answers

You can apply all of these functions in one call per data frame with pandas.DataFrame.aggregate.

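import pandas as pd

dfs = []

for i, df in enumerate(dfs_phens):
    # Row-wise statistics for this slice; columns become mean_1, std_1, ...
    df_ = (df.agg(['mean', 'std', 'skew', 'kurt', 'sum'], axis='columns')
           .rename(columns=lambda col: f'{col}_{i+1}'))
    dfs.append(df_)

# Note: this replaces original_df with the statistics only. To keep its
# existing columns, use pd.concat([original_df, *dfs], axis=1) instead.
original_df = pd.concat(dfs, axis=1)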
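
For reference, here is a minimal self-contained sketch of the same idea that can be run as-is. The data, the `station` column, and the slice boundaries in `bounds` are made up for illustration and are not the question's real values; it also assumes the goal is to keep the existing columns of original_df, so the statistics are concatenated next to them rather than replacing them.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data: 6 rows, 12 "day" columns.
rng = np.random.default_rng(0)
rain_calc = pd.DataFrame(rng.random((6, 12)),
                         columns=[f"day{i + 1}" for i in range(12)])
original_df = pd.DataFrame({"station": list("abcdef")})

# Made-up column ranges, one per period (not the question's boundaries).
bounds = [(0, 4), (4, 8), (8, 12)]

stats = []
for i, (start, stop) in enumerate(bounds, start=1):
    part = rain_calc.iloc[:, start:stop]
    # One row-wise aggregation per slice, then suffix each statistic
    # with the period number: mean_1, std_1, ...
    stats.append(
        part.agg(["mean", "std", "skew", "kurt", "sum"], axis="columns")
            .add_suffix(f"_{i}")
    )

# Concatenate next to the existing columns instead of overwriting them.
result = pd.concat([original_df, *stats], axis=1)
print(result.columns.tolist())
```

Here add_suffix does the same job as the rename with a lambda above. Note that the total column comes out as sum_1, sum_2, ... because the column name is taken from the aggregation name.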
4 Comments

This script saves only the results of the last loop iteration, not all of them.
@Reut I don't know exactly what you want to do; the script overrides original_df in each loop.
In the end, the columns added to original_df are "mean_5", "std_5", "skew_5", without the data from the previous loops (mean_4, mean_3, etc.).
@Reut See the update.

Have you considered using a dict?

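for phen, df in enumerate(dfs_phens, start=1):
    # Map each column-name prefix to the row-wise statistic for this slice.
    # Add "skew", "kurt" and "total" entries the same way.
    my_dict = {
        "mean" : df.mean(axis=1),
        "std"  : df.std(axis=1),
        }

    # Build the final column name once, here, instead of once per statistic.
    for colname, data in my_dict.items():
        original_df[colname + "_" + str(phen)] = data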
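
A possible variation on the dict idea, as a rough sketch: store functions instead of results, so the dict is defined once outside the loop. The name `stat_funcs` and the tiny example frame are hypothetical; the values are borrowed from the table in the question, and the two slices are arbitrary.

```python
import pandas as pd

# Hypothetical small frame standing in for rain_calc.
rain_calc = pd.DataFrame({
    "day1": [1.23, 2.31],
    "day2": [5.41, 7.15],
    "day3": [0.0, 0.0],
    "day4": [0.0, 0.0],
    "day5": [2.31, 1.32],
})
original_df = pd.DataFrame(index=rain_calc.index)
dfs_phens = [rain_calc.iloc[:, :3], rain_calc.iloc[:, 3:]]

# Column-name prefix -> row-wise statistic; defined once, outside the loop.
stat_funcs = {
    "mean": lambda d: d.mean(axis=1),
    "std": lambda d: d.std(axis=1),
    "total": lambda d: d.sum(axis=1),
}

for phen, df in enumerate(dfs_phens, start=1):
    for prefix, func in stat_funcs.items():
        original_df[f"{prefix}_{phen}"] = func(df)

print(original_df.round(2))
```

Adding another statistic then only means adding one entry to the dict.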
