I would like to sum the column duration grouped by date, but the columns begin and end are datetimes. Here is a sample of the df:

begin                       end                         duration
2020-10-14 19:17:52.724020  2020-10-14 19:21:40.179003  227.45
2020-10-14 19:21:40.179003  2020-10-14 19:21:44.037103  3.86
2020-10-14 19:59:27.183161  2020-10-14 20:00:43.847816  76.66
2020-10-14 20:00:43.847816  2020-10-14 20:00:43.847822  0
2020-10-14 20:02:14.341240  2020-10-14 23:59:59.900000  14265.56
2020-10-15 00:00:00.000000  2020-10-15 05:25:32.935971  19532.94
2020-10-15 05:25:32.935971  2020-10-15 05:25:33.068959  0.13

df.info()

begin       41763 non-null  datetime64[ns] 
end         41763 non-null  datetime64[ns] 
duration    41763 non-null  float64   

The result must be:

begin         duration
2020-10-14    14,573.53
2020-10-15    19,533.07

So I tried the code below on my whole df, but it works for some dates and not for others: I did the same calculation in Excel, and for one date I get a different result.

import pandas as pd
import datetime

df = df.groupby(df['begin_'].dt.date)['duration_'].sum()/3600
  • "It works for some dates and not for others": can you add some rows for which it does not work? Commented Oct 27, 2020 at 7:45
  • Yeah, but afterwards I need to delete it. Commented Oct 27, 2020 at 7:48
  • I can't, it is too big... Commented Oct 27, 2020 at 7:50
  • You'll need to come up with a minimal reproducible example of the problem, otherwise this does not seem reproducible. Commented Oct 27, 2020 at 7:53

2 Answers

You can use the date() method of the datetime object: apply it to the column to get the date. After that, grouping works fine.

def reduce_to_date(value):
    return value.date()

df['begin'] = df['begin'].apply(reduce_to_date)

df.groupby('begin')['duration'].sum()/3600
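
For reference, here is a minimal, self-contained sketch of the same grouping on the sample rows from the question. It assumes the columns are named begin and duration as in the sample table. The names daily_seconds and daily_hours are only illustrative. It groups on df["begin"].dt.date directly, so the original begin column is not overwritten.

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here).
df = pd.DataFrame({
    "begin": pd.to_datetime([
        "2020-10-14 19:17:52.724020",
        "2020-10-14 19:21:40.179003",
        "2020-10-14 19:59:27.183161",
        "2020-10-14 20:00:43.847816",
        "2020-10-14 20:02:14.341240",
        "2020-10-15 00:00:00.000000",
        "2020-10-15 05:25:32.935971",
    ]),
    "duration": [227.45, 3.86, 76.66, 0.0, 14265.56, 19532.94, 0.13],
})

# Group on the calendar date of `begin`; the datetime column stays intact.
daily_seconds = df.groupby(df["begin"].dt.date)["duration"].sum()

# Same totals expressed in hours (hypothetical name, for illustration).
daily_hours = daily_seconds / 3600

# Roughly 14573.53 s and 19533.07 s, matching the expected result in the question.
print(daily_seconds.round(2))
```

On this sample the per-day sums agree with the expected table, so if other dates differ from Excel, the cause is probably in the data for those dates rather than in the grouping itself.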

The first step is to separate the time and the date in your timestamps. Below is an example where the dates are defined the same way as in your dataframe.

0   2018-07-02 10:54:00 227.45
1   2018-07-02 10:54:00 3.86
2   2018-07-02 10:54:00 76.66
3   2018-07-02 10:54:00 14265.56
4   2018-07-02 10:54:00 19532.94

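import pandas as pd
d = {'DATA': ['2018-07-02 10:54:00', '2018-07-02 10:54:00', '2018-07-02 10:54:00', '2018-07-02 10:54:00', '2018-07-02 10:54:00'], 'duration': [227.45, 3.86, 76.66, 14265.56, 19532.94]}
df = pd.DataFrame(d).astype({'DATA': 'datetime64[ns]'})  # build the frame and parse the strings as datetimes
DF = df.assign(Date=df.DATA.dt.date, Time=df.DATA.dt.time, Duration=df.duration)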

The next step is to group by the way you did, but simply by naming the column you group by:

DF.groupby(['Date']).sum(numeric_only=True)

which gives

Date        Duration     duration
2018-07-02  34106.47    34106.47

2 Comments

Hmm, it works the same as df.groupby(df['begin_'].dt.date)['duration_'].sum(), so this is not necessary.
Can you single out the parts of your dataframe for which the methods (yours and mine) do not work (assuming our methods are equivalent)? Could it be that some rows begin on a given day but end the next day?
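
To illustrate the hypothesis in the last comment, here is a rough sketch on made-up rows (not taken from the question's data). Grouping by the date of begin attributes a whole interval to the day it starts on, whereas splitting each interval at midnight spreads it over the days it covers. The helper split_at_midnight and the names by_begin and by_day are hypothetical. Whether this actually explains the mismatch with Excel is an assumption, since no failing rows were posted.

```python
import pandas as pd

def split_at_midnight(begin, end):
    """Return (date, seconds) pieces of the interval [begin, end], cut at each midnight."""
    pieces = []
    cur = begin
    # While `cur` and `end` fall on different calendar days, cut at the next midnight.
    while cur.normalize() < end.normalize():
        next_midnight = cur.normalize() + pd.Timedelta(days=1)
        pieces.append((cur.date(), (next_midnight - cur).total_seconds()))
        cur = next_midnight
    pieces.append((cur.date(), (end - cur).total_seconds()))
    return pieces

# Made-up example: the first interval crosses midnight, the second does not.
df = pd.DataFrame({
    "begin": pd.to_datetime(["2020-10-14 23:00:00", "2020-10-15 02:00:00"]),
    "end": pd.to_datetime(["2020-10-15 01:00:00", "2020-10-15 02:30:00"]),
})
df["duration"] = (df["end"] - df["begin"]).dt.total_seconds()

# Approach from the question: everything is counted on the begin date.
by_begin = df.groupby(df["begin"].dt.date)["duration"].sum()

# Alternative: split each interval per day first, then sum per date.
rows = [p for b, e in zip(df["begin"], df["end"]) for p in split_at_midnight(b, e)]
by_day = pd.DataFrame(rows, columns=["date", "seconds"]).groupby("date")["seconds"].sum()

print(by_begin)
print(by_day)
```

If the two results differ for the problematic dates in the real data, intervals spanning midnight are the likely cause. If they agree, the discrepancy lies elsewhere.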
