Create new columns in pandas dataframe inside for loop and give them different names

Question

I have several pandas data frames (column slices of rain_calc in the code below) with a structure similar to this one:

index   day1    day2   day3   day4   day5 ....
0        1.23   5.41    0      0      2.31
1        2.31   7.15    0      0      1.32 
...

I want to calculate the mean, std, kurtosis, and skewness of each row and add them as new columns to another, existing data frame. Right now I do this in a for loop: to avoid overwriting the results of the previous iteration, I keep a loop counter and append it, as a string, to each new column name. It looks like this:
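A minimal sketch of the per-row statistics on a tiny frame built from the two sample rows above (truncated to day1..day5). Passing axis=1 makes each reduction run across the day columns of a row, so every statistic comes back as one value per row:

```python
import pandas as pd

# Two sample rows copied from the table above (day1..day5 only).
rain = pd.DataFrame(
    {
        "day1": [1.23, 2.31],
        "day2": [5.41, 7.15],
        "day3": [0.0, 0.0],
        "day4": [0.0, 0.0],
        "day5": [2.31, 1.32],
    }
)

# axis=1 reduces across the columns, giving one value per row.
row_mean = rain.mean(axis=1)
row_std = rain.std(axis=1)    # sample std (ddof=1) by default
row_skew = rain.skew(axis=1)  # NaN if a row has fewer than 3 values
row_kurt = rain.kurt(axis=1)  # Fisher (excess) kurtosis; NaN below 4 values
row_total = rain.sum(axis=1)

print(row_mean.round(3).tolist())
```

Each result is a Series aligned on the row index, which is why it can be assigned directly as a new column of another frame that shares that index.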

phen_1 = rain_calc.iloc[:, :20]
phen_2 = rain_calc.iloc[:, 20:55]
phen_3 = rain_calc.iloc[:, 55:70]
phen_4 = rain_calc.iloc[:, 70:80]
phen_5 = rain_calc.iloc[:, 70:110]  # overlaps phen_4 (columns 70:80); 80:110 may be intended

dfs_phens=[phen_1,phen_2,phen_3,phen_4,phen_5]

phen = 1

for df in dfs_phens:
    # build this iteration's column names from the loop counter
    mean_col = 'mean_' + str(phen)
    std_col = 'std_' + str(phen)
    skew_col = 'skew_' + str(phen)
    kurt_col = 'kurt_' + str(phen)
    total_col = 'total_' + str(phen)

    # row-wise statistics across the day columns of this slice
    original_df[mean_col] = df.mean(axis=1)
    original_df[std_col] = df.std(axis=1)
    original_df[skew_col] = df.skew(axis=1)
    original_df[kurt_col] = df.kurt(axis=1)
    original_df[total_col] = df.sum(axis=1)

    phen = phen + 1
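A minimal sketch of the same loop using enumerate(..., start=1) and f-strings, so the phase number comes from the loop header instead of a manual counter. The slices are also generated from a list of boundary positions; the boundaries [0, 20, 55, 70, 80, 110] assume the fifth slice was meant to start at 80, and rain_calc and original_df are small random stand-ins (all hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical stand-ins: 3 rows x 110 day columns, plus an existing frame.
rain_calc = pd.DataFrame(
    rng.random((3, 110)), columns=[f"day{i}" for i in range(1, 111)]
)
original_df = pd.DataFrame({"station": ["a", "b", "c"]})

# Hypothetical phase boundaries (column positions): [0:20], [20:55], ...
bounds = [0, 20, 55, 70, 80, 110]
dfs_phens = [rain_calc.iloc[:, start:stop] for start, stop in zip(bounds, bounds[1:])]

# enumerate supplies the phase number, so no manual phen counter is needed.
for phen, df in enumerate(dfs_phens, start=1):
    original_df[f"mean_{phen}"] = df.mean(axis=1)
    original_df[f"std_{phen}"] = df.std(axis=1)
    original_df[f"skew_{phen}"] = df.skew(axis=1)
    original_df[f"kurt_{phen}"] = df.kurt(axis=1)
    original_df[f"total_{phen}"] = df.sum(axis=1)

print(original_df.shape)
```

The names are still strings, but each one is written once, next to the statistic it labels, and zip(bounds, bounds[1:]) makes overlapping or missing column ranges easier to spot than five hand-written slices.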

This works and gives me the output I want: new columns with the calculated statistics. However, I wonder if there is a smarter, more elegant way to write it :)

So my goal is to improve my script so that it names the new columns inside the for loop without building the name strings by hand every time, as I do now.
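If the aim is to avoid building name strings at all, one option (a different technique from the two answers below) is to let pd.concat do the labelling: passing a dict of frames with axis=1 turns the dict keys into an outer column level. A sketch with small random stand-ins for the slices and the existing frame (hypothetical data):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Hypothetical stand-ins for the phase slices.
dfs_phens = [pd.DataFrame(rng.random((3, 5))) for _ in range(3)]

stats = ["mean", "std", "skew", "kurt", "sum"]

# Each value is a rows x 5 frame of row-wise statistics; the dict key
# (phase number) becomes the outer column level, so no "mean_1" strings exist.
per_phase = pd.concat(
    {phen: df.agg(stats, axis="columns") for phen, df in enumerate(dfs_phens, start=1)},
    axis=1,
)

# Columns are tuples such as (1, 'mean'); per_phase[1] selects phase 1.
print(per_phase.columns.tolist()[:3])
```

The trade-off is a MultiIndex on the columns: selecting a phase or a statistic becomes easy, but joining the result onto a frame with flat column names needs an extra step (for example flattening the tuples back into strings).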

Ynjxsjmh · Accepted Answer · 2022-04-20 08:13:29Z

1

You can try applying all of these functions at once with pandas.DataFrame.aggregate (alias agg), passing axis='columns' so each statistic is computed per row.

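import pandas as pd

dfs = []

for i, df in enumerate(dfs_phens):
    # one frame of row-wise statistics per slice, with columns like mean_1, std_1, ...
    df_ = (df.agg(['mean', 'std', 'skew', 'kurt', 'sum'], axis='columns')
           .rename(columns=lambda col: f'{col}_{i+1}'))
    dfs.append(df_)

# concatenate onto the existing original_df so its current columns are kept
original_df = pd.concat([original_df] + dfs, axis=1)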

edited Apr 20, 2022 at 8:13

answered Apr 19, 2022 at 15:30

4 Comments

Reut Over a year ago

This script keeps only the results of the last loop iteration, not all of them.

Ynjxsjmh Over a year ago

@Reut I don't know exactly what you want to do; the script overrides original_df in each loop.

Reut Over a year ago

In the end, the only columns added to original_df are "mean_5", "std_5", "skew_5", without the data from the previous iterations (mean_4, mean_3, etc.).

Ynjxsjmh Over a year ago

@Reut See the update.
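A sketch of the aggregation approach that sidesteps the override discussed in these comments: the new blocks are concatenated onto original_df instead of replacing it, and DataFrame.add_suffix stands in for the rename lambda. The data are small random stand-ins (hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
# Hypothetical stand-ins for the phase slices and the existing frame.
dfs_phens = [pd.DataFrame(rng.random((3, 6))), pd.DataFrame(rng.random((3, 4)))]
original_df = pd.DataFrame({"station": ["a", "b", "c"]})

stats = ["mean", "std", "skew", "kurt", "sum"]

# One frame of row statistics per phase, with columns like "mean_1".
blocks = [
    df.agg(stats, axis="columns").add_suffix(f"_{phen}")
    for phen, df in enumerate(dfs_phens, start=1)
]

# Put original_df first so its existing columns are kept, then the new blocks.
original_df = pd.concat([original_df, *blocks], axis=1)

print(original_df.columns.tolist())
```

Building all blocks first and concatenating once also avoids inserting columns one at a time, which pandas can flag as a fragmented frame when there are many of them.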

PirateNinja · Accepted Answer · 2022-04-19 12:24:33Z

0

Have you considered using a dict?

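phen = 1

for df in dfs_phens:
    # map each column prefix to this slice's row-wise statistic
    my_dict = {
        "mean": df.mean(axis=1),
        "std": df.std(axis=1),
        # add "skew", "kurt", "total" the same way
    }
    for colname, data in my_dict.items():
        original_df[colname + "_" + str(phen)] = data

    phen = phen + 1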
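The dict idea can be taken one step further by mapping each prefix to the function that computes it, so the statistics are defined once, outside the loop, and DataFrame.assign adds a whole phase of columns at a time. A sketch; stat_funcs and the stand-in frames are hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
# Hypothetical stand-ins for the phase slices and the existing frame.
dfs_phens = [pd.DataFrame(rng.random((3, 5))), pd.DataFrame(rng.random((3, 7)))]
original_df = pd.DataFrame({"station": ["a", "b", "c"]})

# Hypothetical mapping: column prefix -> function giving one value per row.
stat_funcs = {
    "mean": lambda d: d.mean(axis=1),
    "std": lambda d: d.std(axis=1),
    "skew": lambda d: d.skew(axis=1),
    "kurt": lambda d: d.kurt(axis=1),
    "total": lambda d: d.sum(axis=1),
}

for phen, df in enumerate(dfs_phens, start=1):
    # assign returns a new frame with one column per (statistic, phase) pair.
    original_df = original_df.assign(
        **{f"{name}_{phen}": func(df) for name, func in stat_funcs.items()}
    )

print(original_df.columns.tolist())
```

Adding or removing a statistic then means editing one line of stat_funcs rather than touching the loop body.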

answered Apr 19, 2022 at 12:24
