I converted the data below into a DataFrame that looks like this:
data = [
{"start_ts": "2018-05-14 10:54:33", "end_ts": "2018-05-14 11:54:33", "product": "a", "value": 1},
{"start_ts": "2018-05-14 11:54:33", "end_ts": "2018-05-14 12:54:33", "product": "a", "value": 1},
{"start_ts": "2018-05-14 13:54:33", "end_ts": "2018-05-14 14:54:33", "product": "a", "value": 1},
{"start_ts": "2018-05-14 10:54:33", "end_ts": "2018-05-14 11:54:33", "product": "b", "value": 1}
]
product start_ts end_ts value
0 a 2018-05-14 10:54:33 2018-05-14 11:54:33 1
1 a 2018-05-14 11:54:33 2018-05-14 12:54:33 1
2 a 2018-05-14 13:54:33 2018-05-14 14:54:33 1
3 b 2018-05-14 10:54:33 2018-05-14 11:54:33 1
I'm trying to bucketize the rows above into single rows by merging runs of contiguous timestamps (where a row's start_ts equals the previous row's end_ts for the same product) and summing the value column, like below.
Expected:
product start_ts end_ts value
0 a 2018-05-14 10:54:33 2018-05-14 12:54:33 2
1 a 2018-05-14 13:54:33 2018-05-14 14:54:33 1
2 b 2018-05-14 10:54:33 2018-05-14 11:54:33 1
I'm unable to get the expected output above using the code below:
def merge_dates(grp):
    date_groups = (grp['start_ts'] != grp['end_ts'].shift())
    return grp.groupby(date_groups).agg({'start_ts': 'first', 'end_ts': 'last'})

df.groupby(["product"]).apply(merge_dates)
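For what it's worth, my guess at the problem: the boolean mask can only form two groups (True/False), so non-adjacent runs get lumped together, and 'value' is never aggregated. A sketch of one possible fix (assuming rows are already sorted by start_ts within each product) is to take a cumulative sum of the break condition so each contiguous run gets its own id:

```python
import pandas as pd

data = [
    {"start_ts": "2018-05-14 10:54:33", "end_ts": "2018-05-14 11:54:33", "product": "a", "value": 1},
    {"start_ts": "2018-05-14 11:54:33", "end_ts": "2018-05-14 12:54:33", "product": "a", "value": 1},
    {"start_ts": "2018-05-14 13:54:33", "end_ts": "2018-05-14 14:54:33", "product": "a", "value": 1},
    {"start_ts": "2018-05-14 10:54:33", "end_ts": "2018-05-14 11:54:33", "product": "b", "value": 1},
]
df = pd.DataFrame(data)

def merge_dates(grp):
    # A new block starts whenever start_ts differs from the previous row's end_ts;
    # cumsum() turns those boolean breaks into a distinct id per contiguous run.
    block = (grp["start_ts"] != grp["end_ts"].shift()).cumsum()
    return grp.groupby(block).agg(
        {"start_ts": "first", "end_ts": "last", "value": "sum"}
    )

out = (
    df.groupby("product", group_keys=True)
      .apply(merge_dates)
      .reset_index(level=0)   # move 'product' back to a column
      .reset_index(drop=True)
)
print(out)
```

This collapses rows 0 and 1 of product "a" into one row (value 2) while leaving the non-contiguous third row and product "b" untouched.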
I need some advice. Any help would be greatly appreciated!
Thanks