
I have the sample DataFrame below:

        perc    2018_norm
0   0.009069    27.799849
1   0.011384    0.00
2   -0.000592   0.00
3   -0.002667   0.00

The value in the first row of 2018_norm comes from another DataFrame. I then want to calculate the remaining values of the 2018_norm column, from the second row to the end, using the percentage change in the perc column and the previous row's value in 2018_norm. I can currently do this with a for loop, which gives the following result:

       perc      2018_norm
0   0.009069    27.799849
1   0.011384    28.116324
2   -0.000592   28.099667
3   -0.002667   28.024713
4   -0.006538   27.841490
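Written out, the loop applies the recurrence below, where n_i stands for the 2018_norm value in row i and p_i for the perc value in row i (notation introduced here for illustration only). Unrolling it shows that every row is the seed value times a running product of growth factors, which is why a vectorized form can exist even though each row depends on the previous one.

```latex
n_i = n_{i-1}\,(1 + p_i), \qquad i \ge 1
\quad\Longrightarrow\quad
n_i = n_0 \prod_{k=1}^{i} (1 + p_k)
```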

For loops over DataFrames are slow, so I suspect I am missing something basic, but my Google searches haven't turned up what I am looking for.

I've tried variations of y1df['2018_norm'].iloc[1:] = (y1df['perc'] * y1df['2018_norm'].shift(1)) + y1df['2018_norm'].shift(1), which just yield:

       perc      2018_norm
0   0.009069    27.799849
1   0.011384    28.116324
2   -0.000592   0.00
3   -0.002667   0.00
4   -0.006538   0.00

What am I missing?

EDIT: To clarify, a basic for loop with df.iloc was not what I wanted. A for loop with iterrows sped up the computation substantially, so it is a great solution for my use. Wen-Ben's response also directly answers the question I didn't mean to ask in my original post.

  • I don't think you can do this without a loop. Your calculation in each row is based on the output of the previous calculation, so you have to do them in sequence rather than as an array operation. Commented Mar 15, 2019 at 19:16
  • You're correct; I should've clarified that I needed something more streamlined in pandas, like iterrows, instead of a basic for loop with df.iloc. Commented Mar 15, 2019 at 19:22
  • @ChristophBurschka you do not need a for loop here. Commented Mar 15, 2019 at 19:23
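To see why the shift-based attempt in the question fills only row 1, here is a minimal reproduction using the question's first four sample rows. It is a sketch of the right-hand side only; the point is that shift(1) is evaluated once, against the column as it is before any assignment, so rows 2 onward see the placeholder 0.0 rather than a freshly computed value.

```python
import pandas as pd

# Sample data from the question: 2018_norm is seeded only in row 0.
y1df = pd.DataFrame({
    "perc": [0.009069, 0.011384, -0.000592, -0.002667],
    "2018_norm": [27.799849, 0.0, 0.0, 0.0],
})

# shift(1) reads the column's *current* contents in a single pass,
# so the previous-row value is 27.799849 for row 1 but 0.0 for rows 2 and 3.
prev = y1df["2018_norm"].shift(1)
attempt = y1df["perc"] * prev + prev

print(attempt.tolist())  # row 0 is NaN, row 1 is ~28.1163, rows 2-3 stay 0.0
```

This is the dependency the first comment describes: a single array operation cannot feed its own output back in, so you need either a loop or a reformulation as a cumulative product.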

2 Answers


You can use df.iterrows() to loop much more quickly through a pandas data frame:

for idx, row in y1df.iterrows():
    if idx > 0:  # skip the first row; it holds the seed value
        # Assumes the default 0..n-1 integer index, so idx - 1 is the previous row
        y1df.loc[idx, '2018_norm'] = (1 + row['perc']) * y1df.loc[idx-1, '2018_norm']

print(y1df)

       perc  2018_norm
0  0.009069  27.799849
1  0.011384  28.116322
2 -0.000592  28.099678
3 -0.002667  28.024736
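The loop above indexes with y1df.loc[idx-1, ...], which relies on the default 0..n-1 integer index. A common alternative (not part of this answer) that avoids that assumption, and typically also avoids the cost of iterrows building a Series for every row, is a plain positional loop over a NumPy array. The helper name compound_loop is hypothetical; this is a sketch using the question's sample values.

```python
import numpy as np
import pandas as pd

y1df = pd.DataFrame({
    "perc": [0.009069, 0.011384, -0.000592, -0.002667],
    "2018_norm": [27.799849, 0.0, 0.0, 0.0],
})

def compound_loop(perc, start):
    """Hypothetical helper: norm[i] = norm[i-1] * (1 + perc[i]), by position."""
    perc = np.asarray(perc, dtype=float)
    out = np.empty_like(perc)
    out[0] = start  # seed value taken from row 0
    for i in range(1, len(perc)):
        out[i] = out[i - 1] * (1.0 + perc[i])
    return out

y1df["2018_norm"] = compound_loop(y1df["perc"].to_numpy(), y1df["2018_norm"].iloc[0])
print(y1df)
```

Because it works purely by position, it gives the same values as the iterrows loop regardless of what the DataFrame's index labels are.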

2 Comments

Thank you; I knew there had to be something with iterrows that I wasn't figuring out from the docs.
No problem. Once you figure out how to use it, it makes pandas a whole lot faster!

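This is just a cumulative product (cumprod): each value is the first 2018_norm value multiplied by the running product of (1 + perc) from row 1 onward.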

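# shift(-1) then shift() aligns the factors so row i gets the product of (1 + perc) over rows 1..i
# and row 0 gets 1; the inner fillna(1) only pads the last row, which the outer shift() drops.
s = (df.perc.shift(-1).fillna(1) + 1).cumprod().shift().fillna(1) * df['2018_norm'].iloc[0]
df['2018_norm'] = s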
df
Out[390]: 
       perc  2018_norm
0  0.009069  27.799849
1  0.011384  28.116322
2 -0.000592  28.099678
3 -0.002667  28.024736
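The one-liner above shifts perc forward and back to line the growth factors up with the right rows. Under the same assumption the other answer makes (perc in row i applies to the step from row i-1 to row i), an equivalent and perhaps easier-to-read form sets the first factor to 1 and takes the cumulative product directly. This is a sketch using the question's sample values, not the answerer's code.

```python
import pandas as pd

df = pd.DataFrame({
    "perc": [0.009069, 0.011384, -0.000592, -0.002667],
    "2018_norm": [27.799849, 0.0, 0.0, 0.0],
})

start = df["2018_norm"].iloc[0]

# Growth factor for each row; row 0 is the seed, so its factor is 1.
factors = 1 + df["perc"]
factors.iloc[0] = 1.0

# norm[i] = norm[0] * product of (1 + perc[k]) for k = 1..i
df["2018_norm"] = start * factors.cumprod()
print(df)
```

No shifting is needed because the only row whose factor must be neutral is the seed row, and setting it to 1 by position does that directly.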

2 Comments

This is a nice answer.
Yes, thanks for submitting. This is surely because I am newer to this, but the iterrows solution is easier for me to fully understand and reuse. Your answer does, however, directly address how I originally posed my question.
