I have a time series dataset, shown below. I would like to split it into 20-second bins, get the minimum and maximum timestamps in each bin, and add a flag to each bin indicating whether it contains at least one successful result (success: result = 0; failed: result = 1).
data = [{"product": "abc", "test_tstamp": 1530693399, "result": 1},
{"product": "abc", "test_tstamp": 1530693405, "result": 0},
{"product": "abc", "test_tstamp": 1530693410, "result": 1},
{"product": "abc", "test_tstamp": 1530693411, "result": 0},
{"product": "abc", "test_tstamp": 1530693415, "result": 0},
{"product": "abc", "test_tstamp": 1530693420, "result": 1},
{"product": "abc", "test_tstamp": 1530693430, "result": 1},
{"product": "abc", "test_tstamp": 1530693431, "result": 1}]
I'm able to cut the data into 20s intervals using pandas.cut() and get the min and max timestamps for each bin:
import numpy as np
import pandas as pd
arange = np.arange(1530693398, 1530693440, 20)
data = [{"product": "abc", "test_tstamp": 1530693399, "result": 1},
{"product": "abc", "test_tstamp": 1530693405, "result": 0},
{"product": "abc", "test_tstamp": 1530693410, "result": 1},
{"product": "abc", "test_tstamp": 1530693411, "result": 0},
{"product": "abc", "test_tstamp": 1530693415, "result": 0},
{"product": "abc", "test_tstamp": 1530693420, "result": 1},
{"product": "abc", "test_tstamp": 1530693430, "result": 1},
{"product": "abc", "test_tstamp": 1530693431, "result": 1}]
df = pd.DataFrame(data)
df['bins'] = pd.cut(df['test_tstamp'], arange)
output_1 = df.groupby(["bins"]).agg({'result': np.ma.count, 'test_tstamp': {'mindate': np.min, 'maxdate': np.max}})
                         test_tstamp             result
                             maxdate     mindate  count
bins
(1530693398, 1530693418]  1530693415  1530693399      5
(1530693418, 1530693438]  1530693431  1530693420      3
I'm also able to count the successful and failed results in each bin using groupby():
output_2 = df.groupby(["bins", "result"]).result.count()
                                 result
bins                     result
(1530693398, 1530693418] 0            3
                         1            2
(1530693418, 1530693438] 1            3
I'm not sure how to combine output_1 and output_2 so that, instead of the result count column above, each bin has result success, result failed, and flag columns.
Expected Output:
                         test_tstamp             result           flag
                             maxdate     mindate success failed
bins
(1530693398, 1530693418]  1530693415  1530693399       3      2   True
(1530693418, 1530693438]  1530693431  1530693420       0      3  False
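For reference, here is a minimal sketch of one way the combined table might be built in a single groupby pass. It assumes pandas 0.25 or later (for named aggregation). The helper columns `success` and `failed` and the names `bin_edges` and `summary` are illustrative, not part of the code above. It produces flat column names rather than the two-level header shown in the expected output.

```python
import numpy as np
import pandas as pd

data = [{"product": "abc", "test_tstamp": 1530693399, "result": 1},
        {"product": "abc", "test_tstamp": 1530693405, "result": 0},
        {"product": "abc", "test_tstamp": 1530693410, "result": 1},
        {"product": "abc", "test_tstamp": 1530693411, "result": 0},
        {"product": "abc", "test_tstamp": 1530693415, "result": 0},
        {"product": "abc", "test_tstamp": 1530693420, "result": 1},
        {"product": "abc", "test_tstamp": 1530693430, "result": 1},
        {"product": "abc", "test_tstamp": 1530693431, "result": 1}]
df = pd.DataFrame(data)

# Same 20-second bin edges as in the code above.
bin_edges = np.arange(1530693398, 1530693440, 20)
df["bins"] = pd.cut(df["test_tstamp"], bin_edges)

# Illustrative helper columns: 1 where the row is a success / failure, else 0.
df["success"] = (df["result"] == 0).astype(int)
df["failed"] = (df["result"] == 1).astype(int)

# Named aggregation (assumes pandas >= 0.25): one output column per keyword.
summary = df.groupby("bins", observed=True).agg(
    maxdate=("test_tstamp", "max"),
    mindate=("test_tstamp", "min"),
    success=("success", "sum"),
    failed=("failed", "sum"),
)

# A bin is flagged True when it has at least one successful result.
summary["flag"] = summary["success"] > 0
print(summary)
```

Summing 0/1 indicator columns keeps everything in one aggregation, so there is no need to merge output_1 and output_2 afterwards. If the two-level header matters, the columns could presumably be relabelled with a MultiIndex as a final step.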
Any pointers would help! Thank you!