
I have a pandas dataframe (stored in a .csv file) with the following format.

val,date,time
0.001,01JAN90,0:00:00
0.002,01JAN90,0:01:00
0.005,01JAN90,0:02:00
0.056,01JAN90,0:03:00
...
0.067,31DEC90,23:55:00
0.007,31DEC90,23:56:00
0.006,31DEC90,23:57:00
0.004,31DEC90,23:58:00
0.003,31DEC90,23:59:00

This is: a single float (val column) for each minute (time column) of each day (date column) in a year. I need to group the val entries from the entire year that fall within a given hour range. I define 15 hour ranges as:

t_range = [['5:30:00', '6:30:00'], ['6:30:00', '7:30:00'], ...,
['19:30:00', '20:30:00']]

The answer given in Pandas Groupby Range of Values deals with ranges defined as floats, but my ranges are defined as strings.

My idea is to first convert all the HH:MM:SS values in time to floats, and then apply the solution based on groupby and pd.cut. Is this the proper approach? If not, how should I be using pandas to do this?
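For reference, the conversion step I have in mind can be sketched like this (toy values for illustration; pd.to_timedelta parses HH:MM:SS strings directly):

```python
import pandas as pd

# Convert HH:MM:SS strings to minutes since midnight, so the hour
# ranges become numeric and usable with pd.cut.
times = pd.Series(['5:30:00', '6:30:00', '23:59:00'])
mins = pd.to_timedelta(times).dt.total_seconds() / 60
print(mins.tolist())  # [330.0, 390.0, 1439.0]
```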

  • Can you be sure that you have data at minutes frequency? Commented Apr 21, 2017 at 17:48
  • You mean if any rows are missing? I'm sure there are not, the length of the columns is exactly (N days in year)*(24 hs)*(60 mins) Commented Apr 21, 2017 at 17:50
  • Then I guess a simple approach is to rely on the number of observations (1h = 60 obs). If you define a dummy variable "count" 1,2,3... you can use modulo to understand in which range you are and then use group by Commented Apr 21, 2017 at 17:57
  • You are correct, but I'd prefer a general solution that does not make this assumption. Commented Apr 21, 2017 at 18:22
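The counter-and-modulo idea from the comments can be sketched as follows (toy data; it assumes exactly one row per minute with no gaps, which is the assumption the asker wants to avoid):

```python
import numpy as np
import pandas as pd

# One row per minute: an integer counter floor-divided by 60
# yields one group label per hour of data.
df = pd.DataFrame({'val': np.ones(180)})      # 3 hours of toy minute data
df['hour_block'] = np.arange(len(df)) // 60   # 0, 0, ..., 1, 1, ..., 2, 2, ...
sums = df.groupby('hour_block')['val'].sum()
print(sums.tolist())  # [60.0, 60.0, 60.0]
```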

1 Answer


IIUC you can do it this way:

import numpy as np
import pandas as pd

start = 5*60 + 30   # 5:30 as minutes since midnight
end = 20*60 + 30    # 20:30
step = 60           # one-hour bins

df['ts'] = pd.to_datetime(df.date + ' ' + df.time, format='%d%b%y %H:%M:%S')
df['mins'] = df.ts.dt.hour*60 + df.ts.dt.minute

# filter out all "non-interesting" entries
x = df.query("@start <= mins <= @end")

bins = np.arange(start-step, end+step, step)
labels = ['({0[0]:02d}:{0[1]:02d}:00, {1[0]:02d}:{1[1]:02d}:00]'.format(divmod(b, 60),
                                                                        divmod(b+step, 60))
          for b in bins[:-1]]

x.groupby(pd.cut(x['mins'], bins=bins, labels=labels))['val'].sum().dropna()

Result:

In [164]: x.groupby(pd.cut(x['mins'], bins=bins, labels=labels))['val'].sum().dropna()
Out[164]:
mins
(05:30:00, 06:30:00]    0.006
(06:30:00, 07:30:00]    0.004
(07:30:00, 08:30:00]    0.003
(08:30:00, 09:30:00]    0.111
(09:30:00, 10:30:00]    0.001
(10:30:00, 11:30:00]    0.002
(11:30:00, 12:30:00]    0.005
(12:30:00, 13:30:00]    0.056
Name: val, dtype: float64

Source DF:

In [166]: df
Out[166]:
     val     date      time
0  0.067  01DEC90  04:00:00
1  0.007  01DEC90  05:00:00
2  0.006  01DEC90  06:00:00
3  0.004  01DEC90  07:00:00
4  0.003  01DEC90  08:00:00
5  0.111  01DEC90  09:00:00
6  0.001  01JAN90  10:00:00
7  0.002  01JAN90  11:00:00
8  0.005  01JAN90  12:00:00
9  0.056  01JAN90  13:00:00

Explanation:

bins: bin edges, in minutes since midnight

In [181]: bins
Out[181]: array([ 270,  330,  390,  450,  510,  570,  630,  690,  750,  810,  870,  930,  990, 1050, 1110, 1170, 1230])

labels

In [182]: labels
Out[182]:
['(04:30:00, 05:30:00]',
 '(05:30:00, 06:30:00]',
 '(06:30:00, 07:30:00]',
 '(07:30:00, 08:30:00]',
 '(08:30:00, 09:30:00]',
 '(09:30:00, 10:30:00]',
 '(10:30:00, 11:30:00]',
 '(11:30:00, 12:30:00]',
 '(12:30:00, 13:30:00]',
 '(13:30:00, 14:30:00]',
 '(14:30:00, 15:30:00]',
 '(15:30:00, 16:30:00]',
 '(16:30:00, 17:30:00]',
 '(17:30:00, 18:30:00]',
 '(18:30:00, 19:30:00]',
 '(19:30:00, 20:30:00]']
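The divmod calls in the label comprehension simply split a minutes-since-midnight edge into an (hours, minutes) pair, which the format string renders as HH:MM:00:

```python
# divmod(minutes, 60) -> (hours, remaining minutes)
print(divmod(330, 60))  # (5, 30)

# Build one interval label from two consecutive bin edges.
label = '({0[0]:02d}:{0[1]:02d}:00, {1[0]:02d}:{1[1]:02d}:00]'.format(divmod(270, 60),
                                                                      divmod(330, 60))
print(label)  # (04:30:00, 05:30:00]
```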