GroupBy and aggregate function in Pandas

Question

I have a time series dataset, shown below. I would like to split it into 20-second bins, get the min and max timestamps in each bin, and flag each bin based on whether it contains at least one successful result (success: result = 0; failed: result = 1).

data = [{"product": "abc", "test_tstamp": 1530693399, "result": 1},
    {"product": "abc", "test_tstamp": 1530693405, "result": 0},
    {"product": "abc", "test_tstamp": 1530693410, "result": 1},
    {"product": "abc", "test_tstamp": 1530693411, "result": 0},
    {"product": "abc", "test_tstamp": 1530693415, "result": 0},
    {"product": "abc", "test_tstamp": 1530693420, "result": 1},
    {"product": "abc", "test_tstamp": 1530693430, "result": 1},
    {"product": "abc", "test_tstamp": 1530693431, "result": 1}]

I'm able to cut the data into 20-second intervals using pandas.cut() and get the min and max timestamps for each bin:

import numpy as np
import pandas as pd
arange = np.arange(1530693398, 1530693440, 20)
data = [{"product": "abc", "test_tstamp": 1530693399, "result": 1},
    {"product": "abc", "test_tstamp": 1530693405, "result": 0},
    {"product": "abc", "test_tstamp": 1530693410, "result": 1},
    {"product": "abc", "test_tstamp": 1530693411, "result": 0},
    {"product": "abc", "test_tstamp": 1530693415, "result": 0},
    {"product": "abc", "test_tstamp": 1530693420, "result": 1},
    {"product": "abc", "test_tstamp": 1530693430, "result": 1},
    {"product": "abc", "test_tstamp": 1530693431, "result": 1}]
df = pd.DataFrame(data)
df['bins'] = pd.cut(df['test_tstamp'], arange)
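# Nested-dict renaming in agg() was removed in pandas 1.0, so aggregate with lists and rename afterwards.
output_1 = (df.groupby("bins")
              .agg({'test_tstamp': ['max', 'min'], 'result': ['count']})
              .rename(columns={'max': 'maxdate', 'min': 'mindate'}))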

                         test_tstamp               result
                         maxdate     mindate       count
bins                                                   
(1530693398, 1530693418]  1530693415  1530693399      5
(1530693418, 1530693438]  1530693431  1530693420      3

I can also count the successful and failed results in each bin using groupby():

output_2 = df.groupby(["bins", "result"]).result.count()
                                     result
 bins                     result        
 (1530693398, 1530693418] 0            3
                          1            2
 (1530693418, 1530693438] 1            3

I'm not sure how to combine output_1 and output_2 so that, instead of the result count column above, each bin has result success, result failed, and flag columns.

Expected Output:

                         test_tstamp              result          flag
                             maxdate     mindate success failed
bins
(1530693398, 1530693418]  1530693415  1530693399       3      2   True
(1530693418, 1530693438]  1530693431  1530693420       0      3  False

Any pointers would help! Thank you!

Worked? Didn't work?

– cs95, Commented Jul 9, 2018 at 15:26

cs95 · Accepted Answer · 2018-07-09 04:25:06Z

Unstack output_2 and then concatenate the two outputs:

output_2 = (
    output_2
       .unstack(fill_value=0)
       .rename(columns={0 : 'success', 1 : 'failed'}))

df = (pd.concat([output_1.test_tstamp, output_2], axis=1, keys=['test_tstamp', 'result'])
        .assign(flag=output_2.success.gt(0)))

                         test_tstamp              result          flag
result                       mindate     maxdate success failed       
bins                                                                  
(1530693398, 1530693418]  1530693399  1530693415       3      2   True
(1530693418, 1530693438]  1530693420  1530693431       0      3  False
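
As an alternative to building two outputs and joining them, the same per-bin summary can probably be produced in a single groupby pass. The following is only a minimal sketch, assuming pandas 0.25 or later (for named aggregation). The helper columns `success` and `failed` and the variable names `edges` and `summary` are illustrative and do not come from the original post. It yields flat column names rather than the two-level header shown above.

```python
import numpy as np
import pandas as pd

# Sample data from the question (result 0 = success, result 1 = failed).
data = [
    {"product": "abc", "test_tstamp": 1530693399, "result": 1},
    {"product": "abc", "test_tstamp": 1530693405, "result": 0},
    {"product": "abc", "test_tstamp": 1530693410, "result": 1},
    {"product": "abc", "test_tstamp": 1530693411, "result": 0},
    {"product": "abc", "test_tstamp": 1530693415, "result": 0},
    {"product": "abc", "test_tstamp": 1530693420, "result": 1},
    {"product": "abc", "test_tstamp": 1530693430, "result": 1},
    {"product": "abc", "test_tstamp": 1530693431, "result": 1},
]
df = pd.DataFrame(data)

# Right-closed 20-second bins, using the same edges as the question.
edges = np.arange(1530693398, 1530693440, 20)
df["bins"] = pd.cut(df["test_tstamp"], edges)

# Illustrative helper columns: one boolean per outcome, so a plain sum counts them.
df = df.assign(success=df["result"].eq(0), failed=df["result"].eq(1))

# Named aggregation: each keyword becomes an output column.
summary = df.groupby("bins", observed=True).agg(
    mindate=("test_tstamp", "min"),
    maxdate=("test_tstamp", "max"),
    success=("success", "sum"),
    failed=("failed", "sum"),
)

# A bin is flagged True when it contains at least one successful result.
summary["flag"] = summary["success"] > 0
print(summary)
```

If the two-level header is required, the unstack-and-concat approach above remains the more direct route.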
