Sequential Calculation of Pandas Column without For Loop

Question

I have the sample DataFrame below:

        perc    2018_norm
0   0.009069    27.799849
1   0.011384    0.00
2   -0.000592   0.00
3   -0.002667   0.00

The value in the first row of 2018_norm comes from another DataFrame. From the second row through the end of the DataFrame, I then want to calculate each 2018_norm value from the previous row's 2018_norm value and the percentage change in the perc column. I can currently do this with a for loop, which gives the following result:
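Written out, the rule described above is a recurrence. With $p_k$ standing for `perc` in row $k$ and $n_k$ for `2018_norm` in row $k$ (symbols introduced here for illustration only), unrolling it suggests that every row is just the seed times a running product, and that $p_0$ is never used:

```latex
n_k = n_{k-1}\,(1 + p_k), \qquad k \ge 1
\quad\Longrightarrow\quad
n_k = n_0 \prod_{j=1}^{k} (1 + p_j)
```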

       perc      2018_norm
0   0.009069    27.799849
1   0.011384    28.116324
2   -0.000592   28.099667
3   -0.002667   28.024713
4   -0.006538   27.841490

For loops on DataFrames are slow, so I suspect I am missing something basic, but my Google searching hasn't turned up what I am looking for.

I've tried variations of y1df['2018_norm'].iloc[1:] = (y1df['perc'] * y1df['2018_norm'].shift(1)) + y1df['2018_norm'].shift(1) that just yield:

       perc      2018_norm
0   0.009069    27.799849
1   0.011384    28.116324
2   -0.000592   0.00
3   -0.002667   0.00
4   -0.006538   0.00

What am I missing?

EDIT: To clarify, a basic for loop with df.iloc was not what I wanted. A for loop using iterrows sped the computation up substantially, so that is a great solution for my use. Wen-Ben's response also directly answers the question as I originally (if unintentionally) posed it.

I don't think you can do this without a loop. Your calculation in each row is based on the output of the previous calculation, so you have to do them in sequence rather than as an array operation.
– Christoph Burschka, Commented Mar 15, 2019 at 19:16
You're correct; I should have clarified that I needed something more streamlined in pandas, like iterrows, instead of a basic for loop with df.iloc.
– serranzau, Commented Mar 15, 2019 at 19:22
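A minimal sketch of the point made in the comment above, rebuilding the question's four sample rows as `y1df` (the construction is assumed; the question does not show it). The whole right-hand side of the shift-based attempt is evaluated once against the column as it currently is, so `shift(1)` never sees the values being computed:

```python
import pandas as pd

# Sample rows from the question; every row after the first starts at 0.0.
y1df = pd.DataFrame({
    "perc": [0.009069, 0.011384, -0.000592, -0.002667],
    "2018_norm": [27.799849, 0.0, 0.0, 0.0],
})

# Computed in one shot from the ORIGINAL column: shift(1) gives
# [NaN, 27.799849, 0.0, 0.0], so only row 1 has a non-zero predecessor.
prev = y1df["2018_norm"].shift(1)
attempt = y1df["perc"] * prev + prev

# Rows 2 and 3 come out as 0.0 * (1 + perc) == 0.0, matching the question.
print(attempt)
```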

Nathaniel · Accepted Answer · 2019-03-15 19:14:10Z

4

You can use df.iterrows() to loop much more quickly through a pandas data frame:

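# Assumes y1df has a default RangeIndex (0, 1, 2, ...), so idx-1 is the previous row
for idx, row in y1df.iterrows():
    if idx > 0:  # skip the first row; it holds the seed value
        # new value = previous row's value scaled by (1 + this row's perc)
        y1df.loc[idx, '2018_norm'] = (1 + row['perc']) * y1df.loc[idx-1, '2018_norm']

print(y1df)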

           perc  2018_norm
    0  0.009069  27.799849
    1  0.011384  28.116322
    2 -0.000592  28.099678
    3 -0.002667  28.024736
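If an explicit loop is kept, a common alternative (swapped in here, not part of this answer) is to loop over a plain NumPy array and assign the finished column once, which avoids a `.loc` write per row and is usually faster. A sketch on the question's sample rows; `propagate` is a hypothetical helper name, and because it works on positions it does not rely on a `RangeIndex`:

```python
import numpy as np
import pandas as pd


def propagate(perc: np.ndarray, seed: float) -> np.ndarray:
    """Hypothetical helper: out[0] = seed, out[k] = out[k-1] * (1 + perc[k])."""
    out = np.empty(len(perc), dtype=float)
    if out.size == 0:
        return out
    out[0] = seed
    for k in range(1, len(perc)):
        # Each step reads the value just computed, hence the sequential loop.
        out[k] = out[k - 1] * (1.0 + perc[k])
    return out


y1df = pd.DataFrame({
    "perc": [0.009069, 0.011384, -0.000592, -0.002667],
    "2018_norm": [27.799849, 0.0, 0.0, 0.0],
})

# Loop over plain floats, then write the whole column back in one assignment.
y1df["2018_norm"] = propagate(y1df["perc"].to_numpy(), y1df["2018_norm"].iloc[0])
print(y1df)
```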

answered Mar 15, 2019 at 19:14

Nathaniel

2 Comments

serranzau Over a year ago

Thank you, I knew there had to be something with iterrows I wasn't easily figuring out in the docs.

Nathaniel Over a year ago

No problem. Once you figure out how to use it, it makes pandas a whole lot faster!

BENY · Accepted Answer · 2019-03-15 19:17:43Z

3

This is just a cumulative product (cumprod):

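# Take 1 + the NEXT row's perc, form the running product, then shift it down one
# row so row k holds (1 + perc[1]) * ... * (1 + perc[k]); row 0 gets 1 (the seed)
s=(df.perc.shift(-1).fillna(1)+1).cumprod().shift().fillna(1)*df['2018_norm'].iloc[0]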
df['2018_norm']=s
df
Out[390]: 
       perc  2018_norm
0  0.009069  27.799849
1  0.011384  28.116322
2 -0.000592  28.099678
3 -0.002667  28.024736
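The same closed form can arguably be written a little more directly: use `1 + perc` as each row's growth factor, set row 0's factor to 1 (its value is the seed), and multiply the running product by the seed. A sketch on the question's sample rows; it is a rewrite for readability, not the answerer's code:

```python
import pandas as pd

df = pd.DataFrame({
    "perc": [0.009069, 0.011384, -0.000592, -0.002667],
    "2018_norm": [27.799849, 0.0, 0.0, 0.0],
})

seed = df["2018_norm"].iloc[0]

# Per-row growth factor; row 0 is the seed itself, so its factor is 1.
factors = 1 + df["perc"]
factors.iloc[0] = 1.0

# Running product of the factors, scaled by the seed.
df["2018_norm"] = seed * factors.cumprod()
print(df)
```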

answered Mar 15, 2019 at 19:17

BENY

2 Comments

Nathaniel Over a year ago

This is a nice answer.

serranzau Over a year ago

Yes, thanks for submitting. That is surely because I am newer, but the iterrows solution is easier for me to fully understand and ultimately reuse. Your answer does directly address the question as I initially posed it.
