I have an Excel data file with thousands of rows and columns.
I am using Python and have started using pandas dataframes to analyze the data.
In column D, I want to calculate the annual change in the values in column C, for each year and for each ID.
I can do this in Excel: if the org ID is the same as that in the prior row, calculate the annual change (leaving the cells highlighted in blue blank, because that’s the first period for that particular ID). I don’t know how to do this using Python. Can anyone help?
1 Answer
Assuming the dataframe is already sorted (by ID, then by year within each ID):
df.groupby('ID').Cash.pct_change()
However, if the data is sorted you can speed things up, because grouping is not necessary to calculate the percentage change from one row to the next. Compute the change over the whole column, then mask the rows where the ID differs from the previous row:
df.Cash.pct_change().mask(
df.ID != df.ID.shift()
)
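As a quick check that the two expressions agree, here is a minimal sketch on made-up data. Only the ID and Cash column names come from the answer; the Year column and all of the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample, already sorted by ID and then by Year.
# "Year" and the numbers are invented; only ID and Cash come from the answer.
df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "B"],
    "Year": [2014, 2015, 2016, 2014, 2015],
    "Cash": [100.0, 110.0, 121.0, 50.0, 75.0],
})

# Option 1: percentage change computed separately within each ID.
grouped = df.groupby("ID").Cash.pct_change()

# Option 2: percentage change over the whole column, then blank out every
# row whose ID differs from the previous row (the first row of each ID).
masked = df.Cash.pct_change().mask(df.ID != df.ID.shift())

# Show the two results side by side.
print(pd.DataFrame({"grouped": grouped, "masked": masked}))
```

In both columns the first row of each ID is NaN, which corresponds to the cells left blank in the spreadsheet.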
Both expressions above should produce the column values you are looking for. To add them to the dataframe, assign the result to a new column (or create a new dataframe that includes it):
df['AnnChange'] = df.groupby('ID').Cash.pct_change()
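If the rows are not sorted yet, one possible approach is to sort first and then assign. This is only a sketch: the Year column name and the sample values are assumptions, since the question does not show the actual layout.

```python
import pandas as pd

# Hypothetical, deliberately unsorted sample; "Year" is an assumed column name.
df = pd.DataFrame({
    "ID": ["B", "A", "A", "B", "A"],
    "Year": [2015, 2014, 2016, 2014, 2015],
    "Cash": [75.0, 100.0, 121.0, 50.0, 110.0],
})

# Sort so that each ID's rows are contiguous and in chronological order.
df = df.sort_values(["ID", "Year"]).reset_index(drop=True)

# No explicit loop over rows is needed: pct_change is applied per ID,
# and the first row of each ID has no prior value, so it becomes NaN.
df["AnnChange"] = df.groupby("ID").Cash.pct_change()

print(df)
```

The first year of each ID ends up as NaN in AnnChange rather than a computed change.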
3 Comments
ps1495
Thanks! Would this skip calculating the % change for the highlighted rows (the first year for each ID)? Also, do I still have to loop through the data frame? Sorry for the basic questions - this is my second day working with data frames.
piRSquared
Why don’t you try it and see 🙂
ps1495
Thanks! Both the options you proposed worked. I really appreciate your quick response. Have a good evening.