get previous row's value and calculate new column pandas python

Question

Is there a way to look back at the previous row and calculate a new variable? As long as the previous row belongs to the same case, I want (previous change) - (current change), attributed to the previous row's 'ChangeEvent' in new columns.

Here is my DataFrame:

>>> df
  ChangeEvent StartEvent  case              change      open  
0    Homeless   Homeless     1 2014-03-08 00:00:00 2014-02-08  
1       other   Homeless     1 2014-04-08 00:00:00 2014-02-08     
2    Homeless   Homeless     1 2014-05-08 00:00:00 2014-02-08      
3        Jail   Homeless     1 2014-06-08 00:00:00 2014-02-08     
4        Jail       Jail     2 2014-06-08 00:00:00 2014-02-08

I want to add columns like this:

Jail  Homeless case
 0    6        1
 0    30       1
 0    0        1

... and so on

Here is the code that builds the df:

import pandas as pd
import datetime as DT

# Four rows for case 1 and one row for case 2
d = {'case': pd.Series([1, 1, 1, 1, 2]),
     'open': pd.Series([DT.datetime(2014, 3, 2)] * 5),
     'change': pd.Series([DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
                          DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
                          DT.datetime(2014, 6, 8)]),
     'StartEvent': pd.Series(['Homeless', 'Homeless', 'Homeless', 'Homeless', 'Jail']),
     'ChangeEvent': pd.Series(['Homeless', 'irrelivant', 'Homeless', 'Jail', 'Jail']),
     'close': pd.Series([DT.datetime(2015, 3, 2)] * 5)}
df = pd.DataFrame(d)

Andy Hayden · Accepted Answer · 2014-02-27 23:04:08Z

105

The way to get the previous row's value is the shift method:

In [11]: df.change.shift(1)
Out[11]:
0          NaT
1   2014-03-08
2   2014-04-08
3   2014-05-08
4   2014-06-08
Name: change, dtype: datetime64[ns]
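A minimal, self-contained sketch of what shift does, rebuilt from the question's data (only the columns needed here are kept; the names prev_change and next_change are illustrative). shift(1) moves every value down one row, so each row sees the value from the row above it; shift(-1) looks one row ahead instead.

```python
import datetime as DT

import pandas as pd

# Only the 'case' and 'change' columns from the question are needed here
df = pd.DataFrame({
    'case': [1, 1, 1, 1, 2],
    'change': [DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
               DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
               DT.datetime(2014, 6, 8)],
})

prev_change = df['change'].shift(1)   # row above; the first row has none, so NaT
next_change = df['change'].shift(-1)  # row below; the last row has none, so NaT
print(pd.DataFrame({'change': df['change'], 'prev': prev_change, 'next': next_change}))
```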

Now you can subtract the two Series. Note: this is with pandas 0.13.1 (datetime handling has had a lot of work recently, so your mileage may vary with older versions).

In [12]: df.change.shift(1) - df.change
Out[12]:
0        NaT
1   -31 days
2   -30 days
3   -31 days
4     0 days
Name: change, dtype: timedelta64[ns]
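If plain day counts are wanted rather than timedeltas, the difference can be converted with the .dt.days accessor. A sketch assuming the question's change column (the names delta and delta_days are illustrative); because the first row is NaT, the day counts come back as float64 with NaN in that position.

```python
import datetime as DT

import pandas as pd

df = pd.DataFrame({
    'change': [DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
               DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
               DT.datetime(2014, 6, 8)],
})

# (previous change) - (current change), as a timedelta Series
delta = df['change'].shift(1) - df['change']
# Whole days; NaT becomes NaN, which forces a float dtype
delta_days = delta.dt.days
print(delta_days)
```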

You can just apply this to each case/group:

In [13]: df.groupby('case')['change'].apply(lambda x: x.shift(1) - x)
Out[13]:
0        NaT
1   -31 days
2   -30 days
3   -31 days
4        NaT
dtype: timedelta64[ns]
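In current pandas versions, GroupBy.shift can be called directly; it returns a Series aligned with the original index and avoids the extra group level that apply may add to the index. The sketch below also shows one possible reading of the "attribute it to the previous ChangeEvent in new columns" part of the question. The names prev_event and delta and the per-event columns are hypothetical, and the sketch keeps the question's (previous change) - (current change) sign, so the values are negative.

```python
import datetime as DT

import pandas as pd

df = pd.DataFrame({
    'case': [1, 1, 1, 1, 2],
    'change': [DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
               DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
               DT.datetime(2014, 6, 8)],
    'ChangeEvent': ['Homeless', 'irrelivant', 'Homeless', 'Jail', 'Jail'],
})

g = df.groupby('case')
# Previous row's event and change date, restarting at each case (NaN/NaT on a case's first row)
df['prev_event'] = g['ChangeEvent'].shift(1)
df['delta'] = g['change'].shift(1) - df['change']

# Hypothetical layout: one column per event name holding the day difference
# when that event was the previous row's ChangeEvent, and 0 otherwise
days = df['delta'].dt.days
for event in df['ChangeEvent'].unique():
    df[event] = days.where(df['prev_event'] == event, 0)

print(df[['case', 'prev_event', 'delta', 'Homeless', 'irrelivant', 'Jail']])
```

Row 4 starts a new case, so it gets NaT rather than a difference against case 1's last row.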



1 Comment

Jeff Over a year ago

Your last example can just be df.groupby('case')['change'].diff() (though I don't think diff is cythonized, so the speed should be the same).
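Note the sign: diff() computes the current value minus the previous one, i.e. x - x.shift(1), so it is the negation of the answer's x.shift(1) - x. A small check, assuming the question's data (via_shift and via_diff are illustrative names):

```python
import datetime as DT

import pandas as pd

df = pd.DataFrame({
    'case': [1, 1, 1, 1, 2],
    'change': [DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
               DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
               DT.datetime(2014, 6, 8)],
})

# The answer's direction: previous - current, within each case
via_shift = df.groupby('case')['change'].shift(1) - df['change']
# The comment's direction: current - previous, within each case
via_diff = df.groupby('case')['change'].diff()

print(pd.DataFrame({'shift(1) - x': via_shift, 'diff()': via_diff}))
```

Negate the diff() result (or swap the operands) if you need the answer's sign.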

Julian · Answer · 2020-04-28 16:52:57Z

0

In addition to the previous answers, here is a link on handling the NaT/NaN values in the result, so that the series is uninterrupted: "How to fill NaT and NaN values separately".
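The linked question is not reproduced here. As one simple option, the NaT values that start each case can be filled with a zero-length Timedelta. This assumes that "no previous row" should count as zero days, which may not suit every analysis; the name filled is illustrative.

```python
import datetime as DT

import pandas as pd

df = pd.DataFrame({
    'case': [1, 1, 1, 1, 2],
    'change': [DT.datetime(2014, 3, 8), DT.datetime(2014, 4, 8),
               DT.datetime(2014, 5, 8), DT.datetime(2014, 6, 8),
               DT.datetime(2014, 6, 8)],
})

delta = df.groupby('case')['change'].shift(1) - df['change']
# Assumption: a case's first row (no previous row) is treated as a 0-day difference
filled = delta.fillna(pd.Timedelta(0))
print(filled.dt.days)  # integer days, since no NaT remains
```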
