How to sum a column grouped by date when the key column is a datetime? Python Pandas

Question

I would like to sum the column duration grouped by date, but the columns begin and end are datetime, as in this piece of df:

begin                       end                         duration
2020-10-14 19:17:52.724020  2020-10-14 19:21:40.179003  227.45
2020-10-14 19:21:40.179003  2020-10-14 19:21:44.037103  3.86
2020-10-14 19:59:27.183161  2020-10-14 20:00:43.847816  76.66
2020-10-14 20:00:43.847816  2020-10-14 20:00:43.847822  0
2020-10-14 20:02:14.341240  2020-10-14 23:59:59.900000  14265.56
2020-10-15 00:00:00.000000  2020-10-15 05:25:32.935971  19532.94
2020-10-15 05:25:32.935971  2020-10-15 05:25:33.068959  0.13

df.info()

begin       41763 non-null  datetime64[ns] 
end         41763 non-null  datetime64[ns] 
duration    41763 non-null  float64

The result must be:

begin         duration
2020-10-14    14,573.53
2020-10-15    19,533.07

So I tried the code below on my whole df, but it works for certain dates and not for others. I did the same calculation in Excel, and for one date I get a different result.

import pandas as pd
import datetime

df = df.groupby(df['begin_'].dt.date)['duration_'].sum()/3600

"this but its works for certain date and no for other." - Can you add some rows for which it does not work?
– jezrael, Commented Oct 27, 2020 at 7:45
You'll need to come up with a minimal reproducible example of the problem; otherwise this does not seem reproducible.
– FObersteiner, Commented Oct 27, 2020 at 7:53

thomas · Accepted Answer · 2020-10-27 08:10:04Z

1

You can use the date method of the datetime object. Apply it to the column to get the date of each row. After that, grouping works as usual.

def reduce_to_date(value):
    return value.date()

df['begin'] = df['begin'].apply(reduce_to_date)

df.groupby('begin')['duration'].sum()/3600
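
As a variation on the snippet above, here is a minimal sketch that groups by calendar day without overwriting the begin column. It assumes the sample rows from the question and swaps the apply call for the vectorized `.dt.normalize()` accessor, which is a different technique from the one in this answer. The names `daily_seconds` and `daily_hours` are illustrative only. Note that the expected output in the question is in seconds, while dividing by 3600 converts the totals to hours.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "begin": pd.to_datetime([
        "2020-10-14 19:17:52.724020",
        "2020-10-14 19:21:40.179003",
        "2020-10-14 19:59:27.183161",
        "2020-10-14 20:00:43.847816",
        "2020-10-14 20:02:14.341240",
        "2020-10-15 00:00:00.000000",
        "2020-10-15 05:25:32.935971",
    ]),
    "duration": [227.45, 3.86, 76.66, 0.0, 14265.56, 19532.94, 0.13],
})

# normalize() sets the time part to midnight, so all rows of one day share a key.
day_key = df["begin"].dt.normalize()

# Sum per day in seconds (illustrative name), then convert to hours.
daily_seconds = df.groupby(day_key)["duration"].sum()
daily_hours = daily_seconds / 3600

print(daily_seconds.round(2))
print(daily_hours.round(4))
```

Because the key stays a datetime64 value, the result can still be sliced or resampled by date afterwards.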

Serge de Gosson de Varennes · Accepted Answer · 2020-10-27 08:12:06Z

1

The first step is to separate the time and the date in the timestamp you have. Below is an example where the dates are defined the same way as in your dataframe.

0   2018-07-02 10:54:00 227.45
1   2018-07-02 10:54:00 3.86
2   2018-07-02 10:54:00 76.66
3   2018-07-02 10:54:00 14265.56
4   2018-07-02 10:54:00 19532.94


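import pandas as pd
d = {'Date': pd.to_datetime(['2018-07-02 10:54:00', '2018-07-02 10:54:00', '2018-07-02 10:54:00', '2018-07-02 10:54:00', '2018-07-02 10:54:00']), 'duration': [227.45, 3.86, 76.66, 14265.56, 19532.94]}  # parse the strings so the .dt accessor works
df = pd.DataFrame(d)
DF = df.assign(Date=df.Date.dt.date, Time=df.Date.dt.time, Duration=df.duration)  # split the timestamp into date and time parts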

The next step is to group the way you did, but simply by naming the variable you group by:

DF.groupby(['Date']).sum(numeric_only=True)  # numeric_only skips the non-numeric Time column

which gives

Date        Duration     duration
2018-07-02  34106.47    34106.47
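
To check whether adding a separate Date column changes anything compared with grouping on `.dt.date` directly, here is a minimal sketch using the sample rows from the question. The variable names `by_column`, `by_series` and `same` are illustrative, and the comparison only covers these seven rows, so it is a sanity check rather than a proof.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "begin": pd.to_datetime([
        "2020-10-14 19:17:52.724020",
        "2020-10-14 19:21:40.179003",
        "2020-10-14 19:59:27.183161",
        "2020-10-14 20:00:43.847816",
        "2020-10-14 20:02:14.341240",
        "2020-10-15 00:00:00.000000",
        "2020-10-15 05:25:32.935971",
    ]),
    "duration": [227.45, 3.86, 76.66, 0.0, 14265.56, 19532.94, 0.13],
})

# Variant A: materialise a Date column first, then group by its name.
by_column = df.assign(Date=df["begin"].dt.date).groupby("Date")["duration"].sum()

# Variant B: group directly on the derived Series, with no extra column.
by_series = df.groupby(df["begin"].dt.date)["duration"].sum()

# Compare keys and totals; only the index name differs ("Date" vs "begin").
same = (
    list(by_column.index) == list(by_series.index)
    and (by_column.to_numpy() == by_series.to_numpy()).all()
)
print(same)
```

Selecting the duration column before calling sum also avoids summing any non-numeric helper columns such as Time.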

2 Comments

jezrael Over a year ago

Hmm, this works the same as df.groupby(df['begin_'].dt.date)['duration_'].sum(), so the extra columns are not necessary.

Serge de Gosson de Varennes Over a year ago

Can you single out the parts of your dataframe for which the methods (yours and mine) do not work (assuming our methods are equivalent)? Could it be that some rows begin on a given day but end on the next day?
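
The last question in this comment can be tested directly. Below is a minimal sketch, using two hypothetical rows (not taken from the asker's data), that flags intervals whose begin and end fall on different calendar days and computes how many seconds belong to the begin day. Whether this explains the mismatch with Excel is an assumption; the thread does not confirm it.

```python
import pandas as pd

# Hypothetical rows: the second interval starts on one day and ends on the next.
df = pd.DataFrame({
    "begin": pd.to_datetime(["2020-10-14 20:02:14", "2020-10-14 23:50:00"]),
    "end": pd.to_datetime(["2020-10-14 23:59:59", "2020-10-15 00:10:00"]),
})
df["duration"] = (df["end"] - df["begin"]).dt.total_seconds()

# True for rows whose begin and end are on different calendar days.
crosses_midnight = df["begin"].dt.normalize() != df["end"].dt.normalize()
print(df[crosses_midnight])

# Cut each interval at the first midnight after 'begin', or at 'end' if that comes first.
next_midnight = df["begin"].dt.normalize() + pd.Timedelta(days=1)
cutoff = next_midnight.where(next_midnight < df["end"], df["end"])

# Seconds attributable to the begin day, and the remainder that spills past midnight.
first_day_seconds = (cutoff - df["begin"]).dt.total_seconds()
spill_seconds = df["duration"] - first_day_seconds
print(first_day_seconds.tolist(), spill_seconds.tolist())
```

Grouping by the begin date credits the whole duration to the first day, so any row flagged here would produce different daily totals from a calculation that splits durations at midnight.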
