I have a pandas DataFrame with over 10k rows. I need to iterate through every row and compute a value based on the previous row's value after that row has been updated. A plain for loop is very slow.
Example DF:
a b c
1 2 3
2 3 4
3 4 5
for loop example:
for i in range(1, len(DF)):
    DF['b'] = DF['b'][i-1] + DF['c']
I also tried
DF['b'] = DF['b'].shift(1)+DF['c']
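but that doesn't use the updated value of 'b': shift(1) reads the original column, so each row only sees the previous row's value from before the update.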
What is the best way to do such calculations?
ANSWERED: loc and iloc help. The best way to do this is (assuming the default 0..n-1 integer index, since loc is label-based):
for i in range(1, len(DF)):
    DF.loc[i, 'b'] = DF.loc[i-1, 'b'] + DF.loc[i, 'c']
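If the row-by-row .loc loop above is still too slow, one option is to run the same recurrence over plain NumPy arrays and write the result back once. This is only a sketch: it assumes the example DF from the question, numeric columns, and the default integer index. The local names b and c are introduced here for illustration.

```python
import pandas as pd

# Example frame from the question.
DF = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4], "c": [3, 4, 5]})

# Extract the columns as NumPy arrays once. Scalar access on an array is
# typically far cheaper than a DF.loc[...] lookup on every iteration.
b = DF["b"].to_numpy().copy()  # copy so the original column is not modified in place
c = DF["c"].to_numpy()

# Same recurrence as the .loc loop: each b[i] uses the already-updated b[i-1].
for i in range(1, len(b)):
    b[i] = b[i - 1] + c[i]

# Write the finished column back in a single assignment.
DF["b"] = b
print(DF)
```

The logic is unchanged from the .loc version; only the per-row indexing overhead is removed.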
Comments:
Do you mean range(1, len(DF)):?
This looks like a cumsum. If it is more complicated, there may be no way to avoid looping over the rows. In that case, you may need Cython to boost performance.
Do you need cumsum on both columns? A minimal working example would be helpful. The devil is in the details...
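For the specific recurrence in the question, b[i] = b[i-1] + c[i] with b[0] left unchanged, the cumsum suggestion can be sketched as follows. Unrolling the recurrence gives b[i] = b[0] + c[1] + ... + c[i], which is the cumulative sum of c minus c[0], plus b[0]. The example DF and the names b0 and c0 are assumptions for illustration. This only applies when the update is a plain running sum like this one.

```python
import pandas as pd

# Example frame from the question.
DF = pd.DataFrame({"a": [1, 2, 3], "b": [2, 3, 4], "c": [3, 4, 5]})

# The first row of 'b' is the starting value and is never overwritten by the loop.
b0 = DF["b"].iloc[0]
c0 = DF["c"].iloc[0]

# cumsum() includes c[0], which the loop never adds, so subtract it back out.
DF["b"] = b0 + DF["c"].cumsum() - c0
print(DF)
```

On the example data this produces the same 'b' column as the .loc loop, without any Python-level iteration.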