I have a CSV file with several variables.
The date and time are stored in separate columns.
My data looks like this:

  Date         Time       Axis1     Axis2    Axis3
   .             .         .          .       .
   .             .         .          .       .
2017-10-15    13:40:00     20         0       40
2017-10-15    13:40:10     40         10      100
2017-10-15    13:40:20     50         0       0
2017-10-15    13:40:30     10         10      60
2017-10-15    13:40:40     0          0       20
2017-10-15    13:40:50     0          0       10
2017-10-16    06:20:30     10         0       10
2017-10-16    06:20:40     70         0       10
2017-10-16    06:20:50     20         100     80
   .             .         .          .       .
   .             .         .          .       .

and there are more rows (more than ten thousand).
You may notice that there is a time gap between 10/15 and 10/16.
I'd like to sum all three Axis values by minute.
This is the structure I expect:

  Date         Time       Axis1     Axis2    Axis3
   .             .         .          .       .
   .             .         .          .       .
2017-10-15    13:40:00     120        20      230
2017-10-16    06:20:00     100        100     100
2017-10-16    06:21:00     ?          ?       ?
   .             .         .          .       .
   .             .         .          .       .

I tried groupby, resample and pd.Grouper, but none of them worked for me.
The main problem is that, after I set the time as the index and use groupby('Date') and resample('1Min').sum(), the time index starts from 00:00:00 instead of 13:40:00.

Thanks for your help!

  • You can use between_time after resample operation to filter out the time-range you don't want. Commented Sep 5, 2018 at 2:39
  • Could you show an example?? Commented Sep 5, 2018 at 4:13
  • For example, if you only need the data between 06:20 and 13:40 for every day, you can do df = df.between_time('06:20:00','13:40:00') Commented Sep 5, 2018 at 6:00
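
The between_time suggestion from the comments might look roughly like the sketch below. The small DataFrame is a hypothetical stand-in for the question's data (only Axis1, a few rows), so treat it as an illustration rather than a tested solution for the real file.

```python
import pandas as pd

# Hypothetical stand-in for the question's data, already indexed by timestamp.
idx = pd.to_datetime([
    "2017-10-15 13:40:00", "2017-10-15 13:40:10",
    "2017-10-16 06:20:30", "2017-10-16 06:20:40",
])
df = pd.DataFrame({"Axis1": [20, 40, 10, 70]}, index=idx)

# resample creates a row for every minute between the first and last
# timestamp, including the empty overnight minutes (their sum is 0).
resampled = df.resample("1min").sum()

# between_time keeps rows whose time of day lies inside the window
# (both endpoints are included by default).
filtered = resampled.between_time("06:20:00", "13:40:00")
print(len(resampled), len(filtered))
```

Note that the window is applied to the time of day on every date, so any empty minutes that fall inside the window on other days would still be kept as rows of zeros.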

1 Answer

Let's try:

df = df.set_index(pd.to_datetime(df['Date']+' '+df['Time'], format='%Y-%m-%d %H:%M:%S'))

df.groupby(df.index.floor('min')).sum()  # 'min' is the current alias; 'T' is deprecated in recent pandas

Output:

                     Axis1  Axis2  Axis3
2017-10-15 13:40:00    120     20    230
2017-10-16 06:20:00    100    100    100

Note: Use the format parameter in pd.to_datetime to speed up parsing. Flooring the index avoids resampling or grouping over missing times.
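
For reference, here is a self-contained sketch of the same idea that can be run as is. It assumes the CSV has the column names shown in the question; the inline string stands in for the real file, and selecting the three Axis columns explicitly is an addition so that the text columns are not included in the sum.

```python
import io
import pandas as pd

# Sample rows copied from the question; the real file would be read with pd.read_csv(path).
raw = """Date,Time,Axis1,Axis2,Axis3
2017-10-15,13:40:00,20,0,40
2017-10-15,13:40:10,40,10,100
2017-10-15,13:40:20,50,0,0
2017-10-15,13:40:30,10,10,60
2017-10-15,13:40:40,0,0,20
2017-10-15,13:40:50,0,0,10
2017-10-16,06:20:30,10,0,10
2017-10-16,06:20:40,70,0,10
2017-10-16,06:20:50,20,100,80
"""
df = pd.read_csv(io.StringIO(raw))

# Combine the two text columns into one timestamp per row.
stamp = pd.to_datetime(df["Date"] + " " + df["Time"], format="%Y-%m-%d %H:%M:%S")

# Floor to the minute and sum only the numeric axis columns.
# Only minutes that occur in the data become groups, so no empty rows appear.
per_minute = df.groupby(stamp.dt.floor("min"))[["Axis1", "Axis2", "Axis3"]].sum()
print(per_minute)
```

Because the grouping key is computed from the rows themselves, the result starts at the first minute present in the data rather than at midnight.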


4 Comments

It doesn't seem to work for me :( The result is the same as what I got before. The first row looks like this: 2017-10-15 00:00:00
Actually, it does work in this case, but for some reason it's not working on my data.
I think it starts from 00:00:00 because of groupby. I'd like to keep the order. Could you give me some advice?
Can you examine your input data more carefully? For example, find one of the groups that has 00:00:00 and query your input data for the minimum time on that day.
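
The check suggested in the last comment could be sketched as follows. The DataFrame is hypothetical and deliberately contains one row stamped at midnight, to show what such a check would reveal if the input really has a 00:00:00 entry.

```python
import pandas as pd

# Hypothetical sample; one row has a midnight time on purpose.
df = pd.DataFrame({
    "Date": ["2017-10-15", "2017-10-15", "2017-10-16", "2017-10-16"],
    "Time": ["00:00:00", "13:40:10", "06:20:30", "06:20:40"],
    "Axis1": [5, 40, 10, 70],
})

# Earliest recorded time on each day; zero-padded HH:MM:SS strings
# sort in chronological order, so a plain min() is enough.
earliest = df.groupby("Date")["Time"].min()
print(earliest)

# Rows that really are stamped at midnight in the input.
midnight_rows = df[df["Time"] == "00:00:00"]
print(midnight_rows)
```

If the earliest time for a day is 00:00:00, the midnight row in the result comes from the data itself and not from the grouping step.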
