Pandas calculate based on multiple rows and conditions

Question

I'm new to pandas. I need to calculate the time spent by each person at each location, and drop rows that have no matching pair in the Date column. My data looks like this:

Unit    Name    Location    Date    Time
0  K1  Somebody1    LOC1  2020-05-12  07:00
1  K1  Somebody1    LOC1  2020-05-12  20:10
2  K1  Somebody1    LOC1  2020-05-13  06:00
3  K1  Somebody1    LOC1  2020-05-13  20:00
4  K1  Somebody1    LOC1  2020-05-14  06:37
5  K1  Somebody1    LOC2  2020-05-15  07:00
6  K1  Somebody1    LOC2  2020-05-15  20:10
7  K1  Somebody1    LOC2  2020-05-16  06:00
8  K1  Somebody1    LOC2  2020-05-16  20:00
9  K1  Somebody1    LOC2  2020-05-17  06:37
10  K1  Somebody2    LOC2  2020-05-13  07:00
11  K1  Somebody2    LOC2  2020-05-14  10:10
12  K1  Somebody2    LOC2  2020-05-14  16:50
13  K1  Somebody2    LOC2  2020-05-15  05:36
14  K1  Somebody3    LOC1  2020-05-13  07:00
15  K1  Somebody3    LOC1  2020-05-14  10:10
16  K1  Somebody3    LOC1  2020-05-14  16:50
17  K1  Somebody3    LOC1  2020-05-15  05:36

I have only managed to convert the Time column to time objects with:

df['Time'] = df['Time'].apply(lambda x: datetime.strptime(x,'%H:%M').time())

I tried pivot tables, groupby, and for loops, and I'm out of ideas. I want the output to look like this:

LOC1
      Somebody1  2020-05-12  13h 10m
                 2020-05-13  14h 00m
TOTAL                        27h 10m
      Somebody2  date        hours
                 date        hours
TOTAL                        sum for somebody2
      Somebody3  date        hours
                 date        hours
TOTAL                        sum for somebody3

LOC2
      Somebody1  date        hours
                 date        hours
TOTAL                        sum for somebody1
      Somebody2  date        hours   
                 date        hours
TOTAL                        sum for somebody2

or something similar

Umar.H · Accepted Answer · 2020-05-20 12:50:33Z


If I understand correctly, you can use groupby to get the first and last timestamp per day, then combine_first to add the totals:

import numpy as np
import pandas as pd

# combine Date and Time (both still strings) into a single timestamp
df['datetime'] = pd.to_datetime(df['Date'] + ' ' +  df['Time'])

df1 = df.groupby(['Name','Location', df['datetime'].dt.normalize()])\
                                  .agg(start=('datetime','first'),
                                   end=('datetime','last'))

df1['timespent'] = (df1['end'] - df1['start']) / np.timedelta64(1,'h')

# create total row.
m = df1.unstack(['Name','Location'])['timespent'].sum().unstack()
m = m.assign(TOTAL=m.sum(1)).stack().to_frame('timespent')



final = df1.drop(['start','end'],axis=1).combine_first(m)

# keep only days with more than one entry (single-entry days have timespent == 0)
final[final['timespent'] > 0]

                               timespent
Name      Location datetime             
Somebody1 LOC1     2020-05-12  13.166667
                   2020-05-13  14.000000
          TOTAL    NaT         27.166667
Somebody2 LOC2     2020-05-14   6.666667
          TOTAL    NaT          6.666667
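
The question asked for durations such as 13h 10m rather than decimal hours. A minimal sketch of one way to format them, assuming a column of float hours like timespent above; the helper name format_hours is hypothetical and not part of pandas:

```python
import pandas as pd

def format_hours(hours: float) -> str:
    """Render decimal hours as 'Hh MMm' (hypothetical helper)."""
    total_minutes = int(round(hours * 60))
    h, m = divmod(total_minutes, 60)
    return f"{h}h {m:02d}m"

# Small stand-in for the 'timespent' column shown above.
timespent = pd.Series([13.166667, 14.0, 27.166667])
formatted = timespent.map(format_hours)
print(formatted.tolist())
```

Rounding to whole minutes before splitting avoids results like 13h 09m caused by floating-point hours.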


1 Comment

Abdul Alhazred Over a year ago

Edited because I gave the wrong example. Your advice is really good. I tried to reverse the groupby order so that location comes first, but that didn't go well: it sums all work for one person, but I need the sum per location only.
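
One possible way to get totals per person within each location, as the comment asks, is to group the daily durations by Location and Name rather than unstacking. This is a sketch under assumptions, not the answerer's method: it uses the column names from the question, a small subset of its sample rows, and the illustrative names daily and totals.

```python
import pandas as pd

# Subset of the question's sample data.
rows = [
    ("Somebody1", "LOC1", "2020-05-12", "07:00"),
    ("Somebody1", "LOC1", "2020-05-12", "20:10"),
    ("Somebody1", "LOC1", "2020-05-13", "06:00"),
    ("Somebody1", "LOC1", "2020-05-13", "20:00"),
    ("Somebody1", "LOC1", "2020-05-14", "06:37"),  # no pair, dropped below
    ("Somebody1", "LOC2", "2020-05-15", "07:00"),
    ("Somebody1", "LOC2", "2020-05-15", "20:10"),
]
df = pd.DataFrame(rows, columns=["Name", "Location", "Date", "Time"])
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# First and last timestamp, plus entry count, per location, person and day.
daily = df.groupby(["Location", "Name", "Date"]).agg(
    start=("datetime", "min"),
    end=("datetime", "max"),
    n=("datetime", "size"),
)

# Keep only days with at least two entries (a start and an end).
daily = daily[daily["n"] >= 2].copy()
daily["hours"] = (daily["end"] - daily["start"]).dt.total_seconds() / 3600

# Total hours per person within each location.
totals = daily.groupby(level=["Location", "Name"])["hours"].sum()
print(daily["hours"])
print(totals)
```

Filtering on the entry count drops unpaired days explicitly, instead of relying on a zero duration.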

python kindergarden developer · Accepted Answer · 2020-05-20 13:59:05Z


You can start with grep to collect the times for each person and then calculate the difference between consecutive rows. For example, put the people's names into a file, one per line, and loop over it with grep:

for i in $(cat list-names); do grep "$i" a.csv | awk '{print $6}'; done

where a.csv:

0  K1  Somebody1    LOC1  2020-05-12  17:00
1  K1  Somebody1    LOC1  2020-05-12  20:10

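Also, to get the difference in hours between consecutive rows (minutes included), do:

awk '
    # convert HH:MM in column 6 to minutes since midnight
    { split($6, t, ":"); cur = t[1] * 60 + t[2] }
    NR == 1 { old = cur; next }
    { printf "%.2f\n", (cur - old) / 60; old = cur }
' a.csv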
