Difference between dates in Pandas dataframe

Question

This is related to this question, but now I need to find the difference between dates that are stored as 'YYYY-MM-DD' strings. Essentially, what we need is the difference between consecutive values in the count column, normalized by the number of days between each pair of rows.

My dataframe is:

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-27,website1,US,0,84,228,0.0,16.0,3.369048,58.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-20,website2,AU,1,91,100,4.0,148.0,4.727272,531.0
2017-02-21,website2,AU,1,91,118,6.0,149.0,4.727272,533.0
2017-02-22,website2,AU,1,91,114,4.0,151.0,4.727272,534.0
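To reproduce the example, the CSV above can be loaded straight into a DataFrame. This is a minimal sketch that assumes the data is pasted in as a string (the name csv_text is only illustrative). Passing parse_dates=["date"] to read_csv() makes the date column datetime64 while reading:

```python
import io

import pandas as pd

# The question's data, pasted verbatim as CSV text
csv_text = """date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-27,website1,US,0,84,228,0.0,16.0,3.369048,58.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-20,website2,AU,1,91,100,4.0,148.0,4.727272,531.0
2017-02-21,website2,AU,1,91,118,6.0,149.0,4.727272,533.0
2017-02-22,website2,AU,1,91,114,4.0,151.0,4.727272,534.0
"""

# parse_dates turns 'date' into datetime64 during parsing,
# so no separate pd.to_datetime() call is needed afterwards
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(df.dtypes)
```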

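I'd like to find the difference between consecutive dates after grouping by site+country_code+kind+ID tuples (the date itself is not part of the key). In the desired output, count holds the change from the previous row in the same group and day_diff holds the number of days elapsed: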

date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count,day_diff
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,0,0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,0,1
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,0,1
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,1,1
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,0,1
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,0,1
2017-03-27,website1,US,0,84,228,0.0,16.0,3.369048,4,2
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,0,0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,3,1
2017-02-20,website2,AU,1,91,100,4.0,148.0,4.727272,7,4
2017-02-21,website2,AU,1,91,118,6.0,149.0,4.727272,2,1
2017-02-22,website2,AU,1,91,114,4.0,151.0,4.727272,1,1
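The change-in-count column can be produced with a grouped diff() on count. This is a minimal sketch on a trimmed copy of the data that keeps only the key, date and count columns; the names keys and count_diff are illustrative, not from the question:

```python
import io

import pandas as pd

# Trimmed copy of the question's data: grouping keys, date and count only
csv_text = """date,site,country_code,kind,ID,count
2017-03-20,website1,US,0,84,53.0
2017-03-21,website1,US,0,84,53.0
2017-03-22,website1,US,0,84,53.0
2017-03-23,website1,US,0,84,54.0
2017-03-24,website1,US,0,84,54.0
2017-03-25,website1,US,0,84,54.0
2017-03-27,website1,US,0,84,58.0
2017-02-15,website2,AU,1,91,521.0
2017-02-16,website2,AU,1,91,524.0
2017-02-20,website2,AU,1,91,531.0
2017-02-21,website2,AU,1,91,533.0
2017-02-22,website2,AU,1,91,534.0
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# 'date' is deliberately left out of the key: with it, every group
# would contain a single row and diff() would return only NaN
keys = ["site", "country_code", "kind", "ID"]

# diff() runs inside each group, so the first row of a group is NaN;
# fillna(0) replaces it with 0
df["count_diff"] = df.groupby(keys)["count"].diff().fillna(0)

print(df[["date", "site", "count", "count_diff"]])
```

This assumes rows are already sorted by date within each group, as they are in the question; otherwise sort with df.sort_values(keys + ["date"]) first.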

One option would be to convert the date column to a pandas datetime column using pd.to_datetime() and then call diff(), but that produces values like "x days" of type timedelta64. I'd like to use this difference to compute the daily average count, so if this can be done in a single step (or at least a less painful one), that would work well.
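For reference, the timedelta64 values that diff() returns can be turned into plain numbers in two common ways. A small sketch on two dates taken from the question:

```python
import pandas as pd

# Two of the question's dates, converted with pd.to_datetime()
dates = pd.to_datetime(pd.Series(["2017-02-16", "2017-02-20"]))

delta = dates.diff()  # timedelta64 dtype: NaT, then 4 days
print(delta.dtype)

# Option 1: the .dt.days accessor gives the whole-day component
# (NaN for the NaT row, so the result is float)
days_int = delta.dt.days

# Option 2: dividing by a one-day Timedelta gives float days and keeps
# any fractional part, which matters if the timestamps include times of day
days_float = delta / pd.Timedelta(days=1)

print(days_int.tolist(), days_float.tolist())
```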

MaxU - stand with Ukraine · Accepted Answer · 2017-10-17 20:09:25Z

You can use the .dt.days accessor:

In [72]: df['date'] = pd.to_datetime(df['date'])

In [73]: df['day_diff'] = df.groupby(['site','country_code','kind','ID'])['date'] \
                            .diff().dt.days.fillna(0)

In [74]: df
Out[74]:
         date      site country_code  kind  ID  rank  votes  sessions  avg_score  count  day_diff
0  2017-03-20  website1           US     0  84   226    0.0      15.0   3.370812   53.0       0.0
1  2017-03-21  website1           US     0  84   214    0.0      15.0   3.370812   53.0       1.0
2  2017-03-22  website1           US     0  84   226    0.0      16.0   3.370812   53.0       1.0
3  2017-03-23  website1           US     0  84   234    0.0      16.0   3.369048   54.0       1.0
4  2017-03-24  website1           US     0  84   226    0.0      16.0   3.369048   54.0       1.0
5  2017-03-25  website1           US     0  84   212    0.0      16.0   3.369048   54.0       1.0
6  2017-03-27  website1           US     0  84   228    0.0      16.0   3.369048   58.0       2.0
7  2017-02-15  website2           AU     1  91   144    4.0     148.0   4.727272  521.0       0.0
8  2017-02-16  website2           AU     1  91   144    3.0     147.0   4.727272  524.0       1.0
9  2017-02-20  website2           AU     1  91   100    4.0     148.0   4.727272  531.0       4.0
10 2017-02-21  website2           AU     1  91   118    6.0     149.0   4.727272  533.0       1.0
11 2017-02-22  website2           AU     1  91   114    4.0     151.0   4.727272  534.0       1.0
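To get the daily average change in count that the question ultimately asks for, the per-group count difference can be divided by the per-group day difference. A sketch using the same grouping keys as above; count_per_day is a hypothetical column name, and the data is the same trimmed copy (key, date and count columns only):

```python
import io

import pandas as pd

csv_text = """date,site,country_code,kind,ID,count
2017-03-20,website1,US,0,84,53.0
2017-03-21,website1,US,0,84,53.0
2017-03-22,website1,US,0,84,53.0
2017-03-23,website1,US,0,84,54.0
2017-03-24,website1,US,0,84,54.0
2017-03-25,website1,US,0,84,54.0
2017-03-27,website1,US,0,84,58.0
2017-02-15,website2,AU,1,91,521.0
2017-02-16,website2,AU,1,91,524.0
2017-02-20,website2,AU,1,91,531.0
2017-02-21,website2,AU,1,91,533.0
2017-02-22,website2,AU,1,91,534.0
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

g = df.groupby(["site", "country_code", "kind", "ID"])

# Days elapsed and change in count since the previous row of the same group
day_diff = g["date"].diff().dt.days
count_diff = g["count"].diff()

# Count change per elapsed day; the first row of each group is NaN/NaN,
# so fill with 0 only after dividing
df["count_per_day"] = (count_diff / day_diff).fillna(0)

print(df)
```

Filling after the division matters: dividing by the zero-filled day_diff column from the answer would give 0/0 = NaN on the first row of each group. Duplicate dates within a group would also give a zero day difference, producing inf or NaN.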
