I have a Pandas dataframe that looks something like this:
                     Item1  Item2  Item3
Customer date
1        2014-03-24    0.0   10.0   50.0
         2014-06-23    0.0   20.0   60.0
         2014-09-22    0.0   20.0   40.0
         2014-12-22    3.0   30.0   20.0
         2014-12-29    0.0   30.0   20.0
2        2014-03-24    0.0   10.0   50.0
         2014-06-23    0.0   20.0   60.0
         2014-09-22    0.0   20.0   40.0
         2014-12-22    4.0   30.0   20.0
         2014-12-29    0.0   30.0   20.0
3        2014-03-24    0.0   10.0   50.0
         2014-06-23    0.0   20.0   60.0
         2014-09-22    0.0   20.0   40.0
         2014-12-22    5.0   30.0   20.0
         2014-12-29    0.0   30.0   20.0
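For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame above. It assumes the `date` level holds real datetimes (in the original data it could just as well be strings) and that the columns are named exactly `Item1`, `Item2` and `Item3`.

```python
import pandas as pd

# Assumption: the "date" level holds datetimes; strings would behave the same here.
dates = pd.to_datetime(
    ["2014-03-24", "2014-06-23", "2014-09-22", "2014-12-22", "2014-12-29"]
)
index = pd.MultiIndex.from_product([[1, 2, 3], dates], names=["Customer", "date"])

rows = []
# The three customers differ only in Item1 on 2014-12-22 (3.0, 4.0, 5.0).
for spike in (3.0, 4.0, 5.0):
    rows += [
        [0.0, 10.0, 50.0],
        [0.0, 20.0, 60.0],
        [0.0, 20.0, 40.0],
        [spike, 30.0, 20.0],
        [0.0, 30.0, 20.0],
    ]

df = pd.DataFrame(rows, index=index, columns=["Item1", "Item2", "Item3"])
print(df)
```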
It has a MultiIndex on customer number and date. I want to calculate the first difference of each item for each customer, while ignoring instances where the value goes from 0 to 0 (those should come out as NaN). The output would look like this:
                     Item1  Item2  Item3
Customer date
1        2014-03-24    NaN    NaN    NaN
         2014-06-23    NaN   10.0   10.0
         2014-09-22    NaN    0.0  -20.0
         2014-12-22    3.0   10.0  -20.0
         2014-12-29   -3.0    0.0    0.0
2        2014-03-24    NaN    NaN    NaN
         2014-06-23    NaN   10.0   10.0
         2014-09-22    NaN    0.0  -20.0
         2014-12-22    4.0   10.0  -20.0
         2014-12-29   -4.0    0.0    0.0
3        2014-03-24    NaN    NaN    NaN
         2014-06-23    NaN   10.0   10.0
         2014-09-22    NaN    0.0  -20.0
         2014-12-22    5.0   10.0  -20.0
         2014-12-29   -5.0    0.0    0.0
If not for the need to exclude 0-to-0 changes, df.groupby(level=0).diff() would work fine.
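One possible vectorized approach, sketched under the assumption that the frame looks like the sample above (the block rebuilds it so it runs on its own): take the grouped `diff()` as usual, then set to NaN every cell where both the current value and the previous value for the same customer are 0. `groupby(level=0).shift()` supplies the previous value per customer, so the comparison never crosses a customer boundary.

```python
import pandas as pd

# Rebuild the sample frame from the question so this block runs on its own.
dates = pd.to_datetime(
    ["2014-03-24", "2014-06-23", "2014-09-22", "2014-12-22", "2014-12-29"]
)
index = pd.MultiIndex.from_product([[1, 2, 3], dates], names=["Customer", "date"])
rows = []
for spike in (3.0, 4.0, 5.0):
    rows += [[0.0, 10.0, 50.0], [0.0, 20.0, 60.0], [0.0, 20.0, 40.0],
             [spike, 30.0, 20.0], [0.0, 30.0, 20.0]]
df = pd.DataFrame(rows, index=index, columns=["Item1", "Item2", "Item3"])

by_customer = df.groupby(level=0)
diffs = by_customer.diff()      # ordinary first difference within each customer
previous = by_customer.shift()  # previous row's values, NaN on each customer's first row

# True only where the value was 0 and is still 0.
zero_to_zero = df.eq(0) & previous.eq(0)

# Set those cells to NaN; every other difference is kept unchanged.
result = diffs.mask(zero_to_zero)
print(result)
```

Because `shift()` yields NaN on each customer's first row, the mask is False there and the NaN from `diff()` is simply kept. Only 0-to-0 steps are masked: a repeated non-zero value, such as Item2 going from 30.0 to 30.0, still gives 0.0, which matches the desired output.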
I can devise a way to loop through the rows to do this, but the dataframe is quite massive (tens of thousands of customers and dozens of items), so this will not fly. I reckon there is a way to do this with an .apply() operation, but I cannot quite sort it out at this point.
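The `.apply()` route guessed at above can also be made to work. In the sketch below, `diff_skip_zero_runs` is a hypothetical helper that applies the same 0-to-0 mask inside each customer's group. It is shown for comparison rather than as a recommendation: `apply` calls a Python function once per group, so with tens of thousands of customers a fully vectorized `diff()`/`shift()` approach is likely to be noticeably faster. `group_keys=False` keeps pandas from prepending the Customer level to the result a second time.

```python
import pandas as pd

# Rebuild the sample frame from the question so this block runs on its own.
dates = pd.to_datetime(
    ["2014-03-24", "2014-06-23", "2014-09-22", "2014-12-22", "2014-12-29"]
)
index = pd.MultiIndex.from_product([[1, 2, 3], dates], names=["Customer", "date"])
rows = []
for spike in (3.0, 4.0, 5.0):
    rows += [[0.0, 10.0, 50.0], [0.0, 20.0, 60.0], [0.0, 20.0, 40.0],
             [spike, 30.0, 20.0], [0.0, 30.0, 20.0]]
df = pd.DataFrame(rows, index=index, columns=["Item1", "Item2", "Item3"])


def diff_skip_zero_runs(group: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: first difference of one customer's rows,
    with 0 -> 0 steps replaced by NaN."""
    return group.diff().mask(group.eq(0) & group.shift().eq(0))


# group_keys=False: do not add the Customer key as an extra index level.
result_apply = df.groupby(level=0, group_keys=False).apply(diff_skip_zero_runs)
print(result_apply)
```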