How to handle a time index when using groupby in Python

Question

I have a CSV file with several variables.
Date and Time are stored in separate columns.
My data looks like this:

  Date         Time       Axis1     Axis2    Axis3
   .             .         .          .       .
   .             .         .          .       .
2017-10-15    13:40:00     20         0       40
2017-10-15    13:40:10     40         10      100
2017-10-15    13:40:20     50         0       0
2017-10-15    13:40:30     10         10      60
2017-10-15    13:40:40     0          0       20
2017-10-15    13:40:50     0          0       10
2017-10-16    06:20:30     10         0       10
2017-10-16    06:20:40     70         0       10
2017-10-16    06:20:50     20         100     80
   .             .         .          .       .
   .             .         .          .       .
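A minimal sketch for loading rows like the ones above. It assumes the file is whitespace-separated as displayed (a real CSV would use the default comma separator) and uses only the nine visible rows; the name `sample` is illustrative. It shows that Date and Time arrive as two separate string columns, not as datetimes.

```python
import io

import pandas as pd

# The nine rows visible in the question; the real file has many more.
sample = """Date Time Axis1 Axis2 Axis3
2017-10-15 13:40:00 20 0 40
2017-10-15 13:40:10 40 10 100
2017-10-15 13:40:20 50 0 0
2017-10-15 13:40:30 10 10 60
2017-10-15 13:40:40 0 0 20
2017-10-15 13:40:50 0 0 10
2017-10-16 06:20:30 10 0 10
2017-10-16 06:20:40 70 0 10
2017-10-16 06:20:50 20 100 80
"""

# sep=r"\s+" splits on runs of whitespace; for a comma-separated file use pd.read_csv(path)
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# Date and Time are plain strings here, so no time-based grouping is possible yet
print(df.dtypes)
```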

There are many more rows (more than ten thousand).
Note that there are time gaps, for example between 10/15 and 10/16.
I'd like to sum all three Axis values per minute.
This is the structure I expect:

  Date         Time       Axis1     Axis2    Axis3
   .             .         .          .       .
   .             .         .          .       .
2017-10-15    13:40:00     120        20      230
2017-10-16    06:20:00     100        100     100
2017-10-16    06:21:00     ?          ?       ?
   .             .         .          .       .
   .             .         .          .       .
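One way to express "sum per minute" without building a DatetimeIndex at all, sketched under the assumption that Time is always a zero-padded `HH:MM:SS` string so its first five characters identify the minute. This is an alternative to the accepted answer, not the asker's code; the names `minute` and `per_minute` are illustrative.

```python
import io

import pandas as pd

sample = """Date Time Axis1 Axis2 Axis3
2017-10-15 13:40:00 20 0 40
2017-10-15 13:40:10 40 10 100
2017-10-15 13:40:20 50 0 0
2017-10-15 13:40:30 10 10 60
2017-10-15 13:40:40 0 0 20
2017-10-15 13:40:50 0 0 10
2017-10-16 06:20:30 10 0 10
2017-10-16 06:20:40 70 0 10
2017-10-16 06:20:50 20 100 80
"""
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# "13:40:10" -> "13:40": the HH:MM prefix acts as the minute key (assumes zero-padded times)
minute = df["Time"].str[:5].rename("Minute")

# Group on the date string plus the minute prefix and sum only the Axis columns;
# only minutes that actually contain rows appear in the result
per_minute = df.groupby([df["Date"], minute])[["Axis1", "Axis2", "Axis3"]].sum()
print(per_minute)
```

The result has a (Date, Minute) MultiIndex rather than a single timestamp, which is enough if you only need the per-minute totals.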

I tried groupby, resample, and pd.Grouper, but none of them work for me.
The main problem is that after I set Time as the index and call groupby('Date') followed by resample('1Min').sum(), the time index starts at 00:00:00 instead of 13:40:00.

Thanks for your help!

You can use between_time after the resample operation to filter out the time range you don't want.
– Ian, Commented Sep 5, 2018 at 2:39
For example, if you only need the data between 06:20 and 13:40 each day, you can do df = df.between_time('06:20:00', '13:40:00')
– Ian, Commented Sep 5, 2018 at 6:00
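A sketch of the comment's suggestion on the nine sample rows. between_time only works on a DatetimeIndex, so one is built from Date and Time first. The 06:20 to 13:40 window is the one from the comment and happens to fit this sample; real data would need its own window. The names `resampled` and `kept` are illustrative.

```python
import io

import pandas as pd

sample = """Date Time Axis1 Axis2 Axis3
2017-10-15 13:40:00 20 0 40
2017-10-15 13:40:10 40 10 100
2017-10-15 13:40:20 50 0 0
2017-10-15 13:40:30 10 10 60
2017-10-15 13:40:40 0 0 20
2017-10-15 13:40:50 0 0 10
2017-10-16 06:20:30 10 0 10
2017-10-16 06:20:40 70 0 10
2017-10-16 06:20:50 20 100 80
"""
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# between_time needs a DatetimeIndex, so combine the two string columns first
df.index = pd.to_datetime(df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M:%S")

# resample emits a bin for every minute from 13:40 on 10/15 to 06:20 on 10/16,
# including the overnight minutes that have no data (they sum to 0)
resampled = df[["Axis1", "Axis2", "Axis3"]].resample("1min").sum()

# Keep only bins whose time of day is in the window (both ends inclusive by default)
kept = resampled.between_time("06:20:00", "13:40:00")
print(kept)
```

This relies on the data falling inside a known daily window; minutes outside the window are dropped even if they contain readings.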

Scott Boston · Accepted Answer · 2018-09-05 02:43:56Z


Let's try:

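import pandas as pd
# Combine the separate Date and Time string columns into one DatetimeIndex
df = df.set_index(pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%Y-%m-%d %H:%M:%S'))

# Floor each timestamp to its minute ('min'; the older 'T' alias is deprecated) and sum the Axis columns
df.groupby(df.index.floor('min'))[['Axis1', 'Axis2', 'Axis3']].sum()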

Output:

                     Axis1  Axis2  Axis3
2017-10-15 13:40:00    120     20    230
2017-10-16 06:20:00    100    100    100

Note: Passing the format parameter to pd.to_datetime makes parsing faster. Use floor rather than resample so that no rows are created for the missing times between readings.
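To make the floor-versus-resample point concrete, here is a small comparison on the nine sample rows (the names `axes`, `by_floor` and `by_resample` are illustrative). Floor plus groupby yields one row per minute that has data, while resample yields one row for every minute in the whole span, overnight gap included.

```python
import io

import pandas as pd

sample = """Date Time Axis1 Axis2 Axis3
2017-10-15 13:40:00 20 0 40
2017-10-15 13:40:10 40 10 100
2017-10-15 13:40:20 50 0 0
2017-10-15 13:40:30 10 10 60
2017-10-15 13:40:40 0 0 20
2017-10-15 13:40:50 0 0 10
2017-10-16 06:20:30 10 0 10
2017-10-16 06:20:40 70 0 10
2017-10-16 06:20:50 20 100 80
"""
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")
df.index = pd.to_datetime(df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M:%S")
axes = df[["Axis1", "Axis2", "Axis3"]]

# floor + groupby: one row per minute that actually contains readings
by_floor = axes.groupby(axes.index.floor("min")).sum()

# resample: one row per minute from the first to the last reading, empty minutes as 0
by_resample = axes.resample("min").sum()

print(len(by_floor), len(by_resample))
```

With more than ten thousand rows spread over many days, the resample variant can produce far more rows than there are readings, which is why floor is the better fit here.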

answered Sep 5, 2018 at 2:43

Scott Boston


4 Comments

Dougie Hwang Over a year ago

It doesn't seem to work for me :( The result is the same as what I got. The first row looks like this: 2017-10-15 00:00:00

Dougie Hwang Over a year ago

Actually it does work in this case, but for some reason it's not working on my data.

Dougie Hwang Over a year ago

I think the output starts at 00:00:00 because of groupby. I'd like to keep the original order. Could you give me some advice?

Scott Boston Over a year ago

Can you examine your input data more carefully? For example, find one of the groups that starts at 00:00:00 and query your input data for the minimum time on that day.
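A sketch of the check suggested in that comment: find the earliest timestamp on each day and any rows whose time is exactly midnight. It runs on the nine sample rows, where nothing is at 00:00:00; on the real file, a day whose earliest timestamp is 00:00:00, or a non-empty `midnight_rows`, would be the place to look. The names `ts`, `first_per_day` and `midnight_rows` are illustrative.

```python
import io

import pandas as pd

sample = """Date Time Axis1 Axis2 Axis3
2017-10-15 13:40:00 20 0 40
2017-10-15 13:40:10 40 10 100
2017-10-15 13:40:20 50 0 0
2017-10-15 13:40:30 10 10 60
2017-10-15 13:40:40 0 0 20
2017-10-15 13:40:50 0 0 10
2017-10-16 06:20:30 10 0 10
2017-10-16 06:20:40 70 0 10
2017-10-16 06:20:50 20 100 80
"""
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")
ts = pd.to_datetime(df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M:%S")

# Earliest timestamp recorded on each calendar day
first_per_day = ts.groupby(ts.dt.date).min()
print(first_per_day)

# Rows whose time of day is exactly 00:00:00 (normalize() sets the time to midnight)
midnight_rows = df[ts.dt.normalize() == ts]
print(midnight_rows)
```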
