This is related to this question, but now I need to find the difference between dates that are stored in 'YYYY-MM-DD'. Essentially the difference between values in the count column is what we need, but normalized by the number of days between each row.
My dataframe is:
date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,53.0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,53.0
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,53.0
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,54.0
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,54.0
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,54.0
2017-03-27,website1,US,0,84,228,0.0,16.0,3.369048,58.0
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,521.0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,524.0
2017-02-20,website2,AU,1,91,100,4.0,148.0,4.727272,531.0
2017-02-21,website2,AU,1,91,118,6.0,149.0,4.727272,533.0
2017-02-22,website2,AU,1,91,114,4.0,151.0,4.727272,534.0
And I'd like to find the difference between each date after grouping by date+site+country+kind+ID tuples.
[date,site,country_code,kind,ID,rank,votes,sessions,avg_score,count,day_diff
2017-03-20,website1,US,0,84,226,0.0,15.0,3.370812,0,0
2017-03-21,website1,US,0,84,214,0.0,15.0,3.370812,0,1
2017-03-22,website1,US,0,84,226,0.0,16.0,3.370812,0,1
2017-03-23,website1,US,0,84,234,0.0,16.0,3.369048,0,1
2017-03-24,website1,US,0,84,226,0.0,16.0,3.369048,0,1
2017-03-25,website1,US,0,84,212,0.0,16.0,3.369048,0,1
2017-03-27,website1,US,0,84,228,0.0,16.0,3.369048,4,2
2017-02-15,website2,AU,1,91,144,4.0,148.0,4.727272,0,0
2017-02-16,website2,AU,1,91,144,3.0,147.0,4.727272,3,1
2017-02-20,website2,AU,1,91,100,4.0,148.0,4.727272,7,4
2017-02-21,website2,AU,1,91,118,6.0,149.0,4.727272,3,1
2017-02-22,website2,AU,1,91,114,4.0,151.0,4.727272,1,1]
One option would be to convert the date column to a Pandas datetime one using pd.to_datetime() and use the diff function but that results in values of "x days", of type timetelda64. I'd like to use this difference to find the daily average count so if this can be accomplished in even a single/less painful step, that would work well.