I'm new to pandas. I need to calculate the time spent by each person at each location, and drop rows that have no matching pair in the Date column. My data looks like this:

    Unit       Name  Location        Date   Time
 0    K1  Somebody1      LOC1  2020-05-12  07:00
 1    K1  Somebody1      LOC1  2020-05-12  20:10
 2    K1  Somebody1      LOC1  2020-05-13  06:00
 3    K1  Somebody1      LOC1  2020-05-13  20:00
 4    K1  Somebody1      LOC1  2020-05-14  06:37
 5    K1  Somebody1      LOC2  2020-05-15  07:00
 6    K1  Somebody1      LOC2  2020-05-15  20:10
 7    K1  Somebody1      LOC2  2020-05-16  06:00
 8    K1  Somebody1      LOC2  2020-05-16  20:00
 9    K1  Somebody1      LOC2  2020-05-17  06:37
10    K1  Somebody2      LOC2  2020-05-13  07:00
11    K1  Somebody2      LOC2  2020-05-14  10:10
12    K1  Somebody2      LOC2  2020-05-14  16:50
13    K1  Somebody2      LOC2  2020-05-15  05:36
14    K1  Somebody3      LOC1  2020-05-13  07:00
15    K1  Somebody3      LOC1  2020-05-14  10:10
16    K1  Somebody3      LOC1  2020-05-14  16:50
17    K1  Somebody3      LOC1  2020-05-15  05:36

I only managed to convert the Time column to datetime.time objects with

from datetime import datetime
df['Time'] = df['Time'].apply(lambda x: datetime.strptime(x,'%H:%M').time())

I tried pivot tables, groupby and for loops, and I'm out of ideas. I would like the output to look like this:

LOC1
      Somebody1  2020-05-12  13h 10m
                 2020-05-13  14h 00m
TOTAL                        27h 10m
      Somebody2  date        hours
                 date        hours
TOTAL                        sum for somebody2
      Somebody3  date        hours
                 date        hours
TOTAL                        sum for somebody3

LOC2
      Somebody1  date        hours
                 date        hours
TOTAL                        sum for somebody1
      Somebody2  date        hours   
                 date        hours
TOTAL                        sum for somebody2

or something similar

2 Answers

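IIUC, you can use groupby to get each day's first and last timestamp, then combine_first to append the totals:

import numpy as np
import pandas as pd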
df['datetime'] = pd.to_datetime(df['Date'] + ' ' +  df['Time'])

df1 = df.groupby(['Name','Location', df['datetime'].dt.normalize()])\
                                  .agg(start=('datetime','first'),
                                   end=('datetime','last'))

df1['timespent'] = (df1['end'] - df1['start']) / np.timedelta64(1,'h')

# create total row.
m = df1.unstack(['Name','Location'])['timespent'].sum().unstack()
m = m.assign(TOTAL=m.sum(1)).stack().to_frame('timespent')



final = df1.drop(['start','end'],axis=1).combine_first(m)

#if you want to remove single entry days
final[final['timespent'] > 0]

                               timespent
Name      Location datetime             
Somebody1 LOC1     2020-05-12  13.166667
                   2020-05-13  14.000000
          TOTAL    NaT         27.166667
Somebody2 LOC2     2020-05-14   6.666667
          TOTAL    NaT          6.666667
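
If the totals are needed per location and person, with the "13h 10m" style from the question, one possible variation is sketched below. It is only a sketch under assumptions: the sample frame is a subset of the question's rows, the names daily, totals and fmt are made up for illustration, and a day counts as "paired" when it has at least two timestamps.

```python
import pandas as pd

# Subset of the rows from the question, rebuilt by hand for this sketch.
df = pd.DataFrame(
    {
        "Name": ["Somebody1"] * 5 + ["Somebody2"] * 4,
        "Location": ["LOC1"] * 5 + ["LOC2"] * 4,
        "Date": ["2020-05-12", "2020-05-12", "2020-05-13", "2020-05-13",
                 "2020-05-14", "2020-05-13", "2020-05-14", "2020-05-14",
                 "2020-05-15"],
        "Time": ["07:00", "20:10", "06:00", "20:00", "06:37",
                 "07:00", "10:10", "16:50", "05:36"],
    }
)
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# One row per (Location, Name, Date): first/last timestamp and row count.
daily = (
    df.groupby(["Location", "Name", "Date"])["datetime"]
      .agg(start="min", end="max", n="count")
      .reset_index()
)

# Assumption: a day with a single timestamp has no pair and is dropped.
daily = daily[daily["n"] >= 2].copy()
daily["spent"] = daily["end"] - daily["start"]

# Totals per location and person (Location is the first grouping key).
totals = daily.groupby(["Location", "Name"])["spent"].sum()


def fmt(td):
    """Format a Timedelta as hours and minutes, e.g. '13h 10m'."""
    minutes = int(td.total_seconds() // 60)
    return f"{minutes // 60}h {minutes % 60:02d}m"


daily["spent_fmt"] = daily["spent"].map(fmt)
totals_fmt = totals.map(fmt)

print(daily[["Location", "Name", "Date", "spent_fmt"]])
print(totals_fmt)
```

Grouping by Location first keeps each person's hours separate per location, so the same name appearing in LOC1 and LOC2 gets two independent totals.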
1 Comment

Edited because I gave the wrong example. Your advice is really good. I tried to reverse the groupby order so that location comes first, but that didn't go well: it sums all the work for one person, but I need the sum per location only.

You can begin with grep to collect the times for each person and then calculate the time difference between rows. For example, put the people's names into a file, one per line, and loop over it with grep:

for i in $(cat list-names); do grep "$i" a.csv | awk '{print $6}'; done

where a.csv:

0  K1  Somebody1    LOC1  2020-05-12  17:00
1  K1  Somebody1    LOC1  2020-05-12  20:10

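Also, to get the difference in hours between consecutive rows (converting HH:MM to fractional hours first, so the minutes are not lost):

awk '
    { split($6, t, ":"); now = t[1] + t[2] / 60 }   # HH:MM as fractional hours
    NR == 1 { old = now; next }                     # first row: nothing to compare with
    { print now - old; old = now }                  # difference to the previous row
' a.csv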
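
Since the question is about Python, the same row-to-row difference can be sketched there too. This is only an illustration: the two sample rows are pasted in as a string instead of being read from a.csv, and the names rows, stamps and diffs_hours are made up.

```python
from datetime import datetime

# The two sample rows from a.csv above, inlined so the sketch is self-contained.
rows = """\
0  K1  Somebody1    LOC1  2020-05-12  17:00
1  K1  Somebody1    LOC1  2020-05-12  20:10
""".splitlines()

# Fields 5 and 6 (Date and Time) joined and parsed into full datetimes.
stamps = [
    datetime.strptime(" ".join(line.split()[4:6]), "%Y-%m-%d %H:%M")
    for line in rows
]

# Difference between each row and the previous one, in hours.
diffs_hours = [
    (later - earlier).total_seconds() / 3600
    for earlier, later in zip(stamps, stamps[1:])
]

print(diffs_hours)
```

Parsing the date together with the time means the difference stays correct even when two rows fall on different days.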
