
Is there a way to look back at the previous row and calculate a new variable? As long as the previous row belongs to the same case, I want to compute (previous change) - (current change) and attribute the result to the previous row's 'ChangeEvent' in new columns.

Here is my DataFrame:

>>> df
  ChangeEvent StartEvent  case              change      open  
0    Homeless   Homeless     1 2014-03-08 00:00:00 2014-02-08  
1       other   Homeless     1 2014-04-08 00:00:00 2014-02-08     
2    Homeless   Homeless     1 2014-05-08 00:00:00 2014-02-08      
3        Jail   Homeless     1 2014-06-08 00:00:00 2014-02-08     
4        Jail       Jail     2 2014-06-08 00:00:00 2014-02-08   

I want to add columns like these:

Jail  Homeless case
 0    6        1
 0    30       1
 0    0        1

... and so on

Here is the code that builds the df:

import datetime as DT

import pandas as pd

d = {'case': pd.Series([1, 1, 1, 1, 2]),
     'open': pd.Series([DT.datetime(2014, 3, 2), DT.datetime(2014, 3, 2), DT.datetime(2014, 3, 2),
                        DT.datetime(2014, 3, 2), DT.datetime(2014, 3, 2)]),
     'change': pd.Series([DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8), DT.datetime(2014, 5, 8),
                          DT.datetime(2014, 6, 8), DT.datetime(2014, 6, 8)]),
     'StartEvent': pd.Series(['Homeless', 'Homeless', 'Homeless', 'Homeless', 'Jail']),
     'ChangeEvent': pd.Series(['Homeless', 'irrelivant', 'Homeless', 'Jail', 'Jail']),
     'close': pd.Series([DT.datetime(2015, 3, 2), DT.datetime(2015, 3, 2), DT.datetime(2015, 3, 2),
                         DT.datetime(2015, 3, 2), DT.datetime(2015, 3, 2)])}
df = pd.DataFrame(d)

2 Answers


The way to get the previous row's value is to use the shift method:

In [11]: df.change.shift(1)
Out[11]:
0          NaT
1   2014-03-08
2   2014-04-08
3   2014-05-08
4   2014-06-08
Name: change, dtype: datetime64[ns]

Now you can subtract the original column from the shifted one. Note: this is with pandas 0.13.1 (datetime handling has had a lot of work recently, so YMMV with older versions).

In [12]: df.change.shift(1) - df.change
Out[12]:
0        NaT
1   -31 days
2   -30 days
3   -31 days
4     0 days
Name: change, dtype: timedelta64[ns]

You can just apply this to each case/group:

In [13]: df.groupby('case')['change'].apply(lambda x: x.shift(1) - x)
Out[13]:
0        NaT
1   -31 days
2   -30 days
3   -31 days
4        NaT
dtype: timedelta64[ns]
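
To get closer to the per-event columns the question asks for, the grouped shift can be combined with a mask on ChangeEvent. The following is only a sketch under one reading of the question: the time from a row's change to the next change in the same case is credited to that row's ChangeEvent, and the last row of each case (which has no following change) is counted as 0. The variable names `next_change` and `days` are illustrative, and the frame is a trimmed-down copy of the one displayed in the question.

```python
import pandas as pd

# Trimmed-down copy of the question's frame (only the columns used here).
df = pd.DataFrame({
    "case": [1, 1, 1, 1, 2],
    "change": pd.to_datetime(
        ["2014-03-08", "2014-04-08", "2014-05-08", "2014-06-08", "2014-06-08"]
    ),
    "ChangeEvent": ["Homeless", "other", "Homeless", "Jail", "Jail"],
})

# Date of the next change within the same case (NaT on each case's last row).
next_change = df.groupby("case")["change"].shift(-1)

# Whole days until the next change; assumption: an open-ended last row counts as 0.
days = (next_change - df["change"]).dt.days.fillna(0)

# One column per event of interest: the duration where the row's
# ChangeEvent matches, 0 everywhere else.
for event in ["Homeless", "Jail"]:
    df[event] = days.where(df["ChangeEvent"] == event, 0)

print(df[["Jail", "Homeless", "case"]])
```

If the open-ended last state should instead run until the case's close date, the fill value would need to come from that column rather than 0.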

1 Comment

Your last step can just be: -df.groupby('case')['change'].diff() (diff computes current minus previous, hence the negation), though I don't think diff is cythonized, so speed should be the same.
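
As a quick check of how the grouped diff relates to the shift-based expression in the answer, here is a small sketch on a trimmed-down copy of the question's data. The names `via_diff` and `via_shift` are illustrative. Note that diff returns current minus previous, so it is the negation of shift(1) minus the current value.

```python
import pandas as pd

df = pd.DataFrame({
    "case": [1, 1, 1, 1, 2],
    "change": pd.to_datetime(
        ["2014-03-08", "2014-04-08", "2014-05-08", "2014-06-08", "2014-06-08"]
    ),
})

grouped = df.groupby("case")["change"]

# Current minus previous, computed separately within each case.
via_diff = grouped.diff()

# Previous minus current, as in the answer; the sign is opposite.
via_shift = grouped.shift(1) - df["change"]

print(via_diff)
print(via_diff.equals(-via_shift))
```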

In addition to the previous answers, here is a link on handling the NaT / NaN values so that the resulting series has no gaps: How to fill NaT and NaN values separately
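
For illustration, a minimal sketch of filling the gaps follows. It is not necessarily the approach in the linked post: it assumes that a missing difference should be treated as zero, and the variable names are illustrative. A timedelta column holds NaT and is filled with a Timedelta, while the numeric days column holds NaN and is filled with a number.

```python
import pandas as pd

change = pd.Series(pd.to_datetime(["2014-03-08", "2014-04-08", "2014-05-08"]))

# The first difference has no previous row, so it is NaT.
gaps = change.diff()

# Fill the timedelta gaps with a zero-length Timedelta (assumed default).
gaps_filled = gaps.fillna(pd.Timedelta(0))

# As plain day counts the gap shows up as NaN; fill it with 0 and cast to int.
days_filled = gaps.dt.days.fillna(0).astype(int)

print(gaps_filled)
print(days_filled)
```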
