I have this pandas dataframe that looks like this:
index up_walk down_walk up_avg down_avg
0 0.000000 17.827148 0.36642 9.06815
1 1.550781 0.000000 NaN NaN
2 0.957031 0.000000 NaN NaN
3 0.000000 2.878906 NaN NaN
I wanted to calculate the missing values that currently are NAN by this formula:
df['up_avg'][i] = df['up_avg'][i-1] * 12 + df['up_walk'][i]
explanation: I want to calculate for every row the value based on the previous row in the same column, plus the value in the current row from a different column. And that for every row with missing values. and Continue this calculation to the end of the dataframe. In this case, I have a dependency in every new row calculation that is based on the previous up_avg value calculation.
The problem is that using a loop is very slow because of the large dataframe(10K)
Can anyone please help implement a lambda function for this?
if this is not possible, can anyone share a script for an efficient loop?
I tried a lot of things with no success like this:
df['up_avg'] = df.apply(lambda x: pd.Series(np.where((x.up_avg != None), x.up_avg.shift() * 12 + x.up_walk, x.up_avg)))
got an error - "AttributeError: 'Series' object has no attribute 'up_avg'"
and also using shift to create new columns and then using a lambda function with no success
I expect that my dataframe will look like this at the end:
index up_walk down_walk up_avg down_avg
0 0.000000 17.827148 0.36642 9.06815
1 1.550781 0.000000 5.947821 108.8178
2 0.957031 0.000000 72.330883 1305.8136
3 0.000000 2.878906 867.970596 15672.642106
Thanks a lot!