Python Pandas Conditional First Differencing

Question

I have a Pandas dataframe that looks something like this:

                          Item1  Item2  Item3
Customer  date                                                           
1         2014-03-24       0.0   10.0   50.0   
          2014-06-23       0.0   20.0   60.0   
          2014-09-22       0.0   20.0   40.0   
          2014-12-22       3.0   30.0   20.0
          2014-12-29       0.0   30.0   20.0   
2         2014-03-24       0.0   10.0   50.0   
          2014-06-23       0.0   20.0   60.0   
          2014-09-22       0.0   20.0   40.0   
          2014-12-22       4.0   30.0   20.0
          2014-12-29       0.0   30.0   20.0    
3         2014-03-24       0.0   10.0   50.0   
          2014-06-23       0.0   20.0   60.0   
          2014-09-22       0.0   20.0   40.0   
          2014-12-22       5.0   30.0   20.0
          2014-12-29       0.0   30.0   20.0

It is multi-indexed on customer number and date. I want to calculate the first difference in each item for each customer, while ignoring instances where the value goes from 0 to 0. The output would look like this:

                          Item1  Item2  Item3
Customer  date                                                           
1         2014-03-24       NaN   NaN    NaN   
          2014-06-23       NaN   10.0   10.0   
          2014-09-22       NaN    0.0  -20.0   
          2014-12-22       3.0   10.0  -20.0
          2014-12-29      -3.0    0.0    0.0  
2         2014-03-24       NaN   NaN    NaN   
          2014-06-23       NaN   10.0   10.0   
          2014-09-22       NaN    0.0  -20.0   
          2014-12-22       4.0   10.0  -20.0
          2014-12-29      -4.0    0.0    0.0  
3         2014-03-24       NaN   NaN    NaN   
          2014-06-23       NaN   10.0   10.0   
          2014-09-22       NaN    0.0  -20.0   
          2014-12-22       5.0   10.0  -20.0
          2014-12-29      -5.0    0.0    0.0

If not for the need to exclude 0-to-0 changes, df.groupby(level=0).diff() would work fine.
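To make the problem concrete, here is a minimal sketch that rebuilds the sample frame above and applies the plain grouped diff. The construction code is an assumption for illustration (the question does not show how the frame was built); only `df.groupby(level=0).diff()` comes from the question itself.

```python
import pandas as pd

# Rebuild the sample frame from the question: three customers that differ
# only in the size of the Item1 spike on 2014-12-22.
dates = pd.to_datetime(
    ["2014-03-24", "2014-06-23", "2014-09-22", "2014-12-22", "2014-12-29"]
)
frames = []
for customer, spike in [(1, 3.0), (2, 4.0), (3, 5.0)]:
    frames.append(
        pd.DataFrame(
            {
                "Item1": [0.0, 0.0, 0.0, spike, 0.0],
                "Item2": [10.0, 20.0, 20.0, 30.0, 30.0],
                "Item3": [50.0, 60.0, 40.0, 20.0, 20.0],
            },
            index=pd.MultiIndex.from_product(
                [[customer], dates], names=["Customer", "date"]
            ),
        )
    )
df = pd.concat(frames)

# Plain first difference within each customer. The 0-to-0 steps in Item1
# come out as 0.0 here, which is exactly what the question wants to hide.
plain = df.groupby(level=0).diff()
print(plain)
```

In this output the Item1 rows for 2014-06-23 and 2014-09-22 are 0.0 rather than NaN, which is the gap the rest of the thread tries to close.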

I can devise a way to loop through the rows to do this, but the dataframe is quite massive (tens of thousands of customers and dozens of items), so this will not fly. I reckon there is a way to do this with an .apply() operation, but I cannot quite sort it out at this point.

BENY · Accepted Answer · 2017-11-02 14:33:14Z

You are almost there; just add .mask:

    df.groupby(level=0).diff().mask(df==0)
    Out[740]: 
                         Item1  Item2  Item3
    Customer date                           
    1        2014-03-24    NaN    NaN    NaN
             2014-06-23    NaN   10.0   10.0
             2014-09-22    NaN    0.0  -20.0
             2014-12-22    3.0   10.0  -20.0
    2        2014-03-24    NaN    NaN    NaN
             2014-06-23    NaN   10.0   10.0
             2014-09-22    NaN    0.0  -20.0
             2014-12-22    4.0   10.0  -20.0
    3        2014-03-24    NaN    NaN    NaN
             2014-06-23    NaN   10.0   10.0
             2014-09-22    NaN    0.0  -20.0
             2014-12-22    5.0   10.0  -20.0
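One caveat, raised in the comments on this answer: `.mask(df==0)` hides every row whose current value is zero, so a drop from a non-zero value back to zero is hidden as well. A minimal sketch on a small hypothetical frame (the three rows below are made up for illustration):

```python
import pandas as pd

# Hypothetical single-customer frame: Item1 goes 0 -> 3 -> 0.
idx = pd.MultiIndex.from_product(
    [[1], pd.to_datetime(["2014-09-22", "2014-12-22", "2014-12-29"])],
    names=["Customer", "date"],
)
df = pd.DataFrame({"Item1": [0.0, 3.0, 0.0]}, index=idx)

diffed = df.groupby(level=0).diff()   # NaN, 3.0, -3.0
masked = diffed.mask(df == 0)         # the -3.0 step is hidden too

print(masked)
```

The last row is NaN after masking even though the underlying change (3.0 to 0.0) is not a 0-to-0 step.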

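EDIT: to mask only the leading run of zeros within each customer, so that a later drop back to zero is kept:

    df.groupby(level=0).diff().mask(df.groupby(level='Customer').apply(lambda x: (x==0).cumprod())==1)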
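The cumulative product of `(x==0)` stays at 1 only while every value so far in a customer's series has been zero, so the mask covers just the leading zeros. Below is a sketch of the same idea written with `groupby(...).cumprod()` instead of `.apply`. This rewrite is an assumption on my part, not the answerer's code; it is meant to sidestep the fact that, depending on the pandas version, `groupby(...).apply` may prepend the group key to the index and break alignment with the diff. The two-customer frame is a cut-down version of the question's data.

```python
import pandas as pd

dates = pd.to_datetime(
    ["2014-03-24", "2014-06-23", "2014-09-22", "2014-12-22", "2014-12-29"]
)
idx = pd.MultiIndex.from_product([[1, 2], dates], names=["Customer", "date"])
df = pd.DataFrame(
    {
        "Item1": [0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0],
        "Item2": [10.0, 20.0, 20.0, 30.0, 30.0] * 2,
    },
    index=idx,
)

# True while a customer's series has been zero since its first row.
leading_zero = (
    df.eq(0).astype(int).groupby(level="Customer").cumprod().eq(1)
)

result = df.groupby(level="Customer").diff().mask(leading_zero)
print(result)
```

Note that this only hides zeros at the start of each series. If a series returned to zero and then stayed there, the later 0-to-0 steps would still show up as 0.0.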

edited Nov 2, 2017 at 14:33

answered Nov 1, 2017 at 21:58

BENY

4 Comments

user7969724 Over a year ago

While this works for the sample I originally provided, it also masks non-zero-to-zero changes. If I had included another row for 2014-12-29 in which Item1 returns to zero, the change would be NaN'd out instead of showing -3.0, -4.0, and -5.0. This is a good tip, though, and I might be able to make use of it. I have updated my example datasets to reflect this aspect of the problem.

BENY Over a year ago

@BrianPreslopsky After that change, it is almost a different question.

user7969724 Over a year ago

Not really. I specifically said to exclude only 0-to-0 changes. Actually, I got the answer with a tiny modification to the recommended solution: df.groupby(level=0).diff().mask((df==0) & (df.shift(1)==0)). Thank you, Wen!

BENY Over a year ago

df.groupby(level=0).diff().mask((df==0) & (df.shift(1)==0)) is not a safe way to do it. Let me modify my answer.
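The comment does not say why the plain `df.shift(1)` version is unsafe. One plausible reason (an assumption, not something stated in the thread) is that an ungrouped shift ignores customer boundaries, so it picks up the wrong "previous" row whenever a customer's rows are not contiguous. A sketch with a group-aware shift, on a small hypothetical frame whose rows are deliberately interleaved by date:

```python
import pandas as pd

dates = pd.to_datetime(["2014-03-24", "2014-06-23", "2014-09-22"])
# Rows ordered by date, so the two customers' rows alternate.
idx = pd.MultiIndex.from_arrays(
    [[1, 2, 1, 2, 1, 2], dates.repeat(2)], names=["Customer", "date"]
)
# Customer 1: 0 -> 0 -> 3, customer 2: 5 -> 5 -> 0.
df = pd.DataFrame({"Item1": [0.0, 5.0, 0.0, 5.0, 3.0, 0.0]}, index=idx)

diffed = df.groupby(level=0).diff()

# Ungrouped shift: the "previous" row may belong to another customer.
naive = diffed.mask((df == 0) & (df.shift(1) == 0))

# Grouped shift: the previous row is always the same customer's.
prev = df.groupby(level=0).shift(1)
safe = diffed.mask((df == 0) & (prev == 0))

print(safe)
```

Here the naive version leaves customer 1's 0-to-0 step visible as 0.0, while the grouped version masks it and still keeps customer 2's 5-to-5 step (0.0) and 5-to-0 step (-5.0). On a frame that is sorted so each customer's rows are contiguous, the two versions should give the same result, because the first row of each group is already NaN after the diff.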
