1

Here is my dataframe that I am working on. There are two pay periods defined: first 15 days and last 15 days for each month.

         date  employee_id hours_worked   id job_group  report_id
0  2016-11-14            2         7.50  385         B         43
1  2016-11-15            2         4.00  386         B         43
2  2016-11-30            2         4.00  387         B         43
3  2016-11-01            3        11.50  388         A         43
4  2016-11-15            3         6.00  389         A         43
5  2016-11-16            3         3.00  390         A         43
6  2016-11-30            3         6.00  391         A         43

I need to group by employee_id and job_group but at the same time I have to achieve date range for that grouped row.

For example grouped results would be like following for employee_id 1:

Expected Output:

         date  employee_id hours_worked  job_group  report_id
1  2016-11-15            2         11.50        B         43
2  2016-11-30            2         4.00         B         43
4  2016-11-15            3         17.50        A         43
5  2016-11-16            3         9.00         A         43

Is this possible using pandas dataframe groupby?

2 Answers 2

1

Use SM with Grouper and last add SemiMonthEnd:

df['date'] = pd.to_datetime(df['date'])

d = {'hours_worked':'sum','report_id':'first'}
df = (df.groupby(['employee_id','job_group',pd.Grouper(freq='SM',key='date', closed='right')])
       .agg(d)
       .reset_index())

df['date'] = df['date'] + pd.offsets.SemiMonthEnd(1)
print (df)
   employee_id job_group       date  hours_worked  report_id
0            2         B 2016-11-15          11.5         43
1            2         B 2016-11-30           4.0         43
2            3         A 2016-11-15          17.5         43
3            3         A 2016-11-30           9.0         43
Sign up to request clarification or add additional context in comments.

3 Comments

Ok tried but somehow its not working with data set i had provided.
@Kishan - Can you create minimal, complete, and verifiable example ? Or more data rows with expected output?
First row data is 2016-11-14 which should come under 2016-11-15 bracket but it went to 2016-10-31 bracket
1

a. First, (for each employee_id) use multiple Grouper with the .sum() on the hours_worked column. Second, use DateOffset to achieve bi-weekly date column. After these 2 steps, I have assigned the date in the grouped DF based on 2 brackets (date ranges) - if day of month (from the date column) is <=15, then I set the day in date to 15, else I set the day to 30. This day is then used to assemble a new date. I calculated month end day based on 1, 2.

b. (For each employee_id) get the .last() record for the job_group and report_id columns

c. merge a. and b. on the employee_id key

# a.
hours = (df.groupby([
            pd.Grouper(key='employee_id'),
            pd.Grouper(key='date', freq='SM')
                    ])['hours_worked']
            .sum()
            .reset_index())
hours['date'] = pd.to_datetime(hours['date'])
hours['date'] = hours['date'] + pd.DateOffset(days=14)

# Assign day based on bracket (date range) 0-15 or bracket (date range) >15
from pandas.tseries.offsets import MonthEnd
hours['bracket'] = hours['date'] + MonthEnd(0)
hours['bracket'] = pd.to_datetime(hours['bracket']).dt.day
hours.loc[hours['date'].dt.day <= 15, 'bracket'] = 15
hours['date'] = pd.to_datetime(dict(year=hours['date'].dt.year,
                                    month=hours['date'].dt.month,
                                    day=hours['bracket']))
hours.drop('bracket', axis=1, inplace=True)

# b.
others = (df.groupby('employee_id')['job_group','report_id']
            .last()
            .reset_index())

# c.
merged = hours.merge(others, how='inner', on='employee_id')

Raw data for employee_id==1 and employeeid==3

df.sort_values(by=['employee_id','date'], inplace=True)
print(df[df.employee_id.isin([1,3])])

    index       date  employee_id  hours_worked   id job_group  report_id
0       0 2016-11-14            1           7.5  481         A         43
10     10 2016-11-21            1           6.0  491         A         43
11     11 2016-11-22            1           5.0  492         A         43
15     15 2016-12-14            1           7.5  496         A         43
25     25 2016-12-21            1           6.0  506         A         43
26     26 2016-12-22            1           5.0  507         A         43
6       6 2016-11-02            3           6.0  487         A         43
4       4 2016-11-08            3           6.0  485         A         43
3       3 2016-11-09            3          11.5  484         A         43
5       5 2016-11-11            3           3.0  486         A         43
20     20 2016-11-12            3           3.0  501         A         43
21     21 2016-12-02            3           6.0  502         A         43
19     19 2016-12-08            3           6.0  500         A         43
18     18 2016-12-09            3          11.5  499         A         43

Output

print(merged)

    employee_id       date  hours_worked job_group  report_id
0             1 2016-11-15           7.5         A         43
1             1 2016-11-30          11.0         A         43
2             1 2016-12-15           7.5         A         43
3             1 2016-12-31          11.0         A         43
4             2 2016-11-15          31.0         B         43
5             2 2016-12-15          31.0         B         43
6             3 2016-11-15          29.5         A         43
7             3 2016-12-15          23.5         A         43
8             4 2015-03-15           5.0         B         43
9             4 2016-02-29           5.0         B         43
10            4 2016-11-15           5.0         B         43
11            4 2016-11-30          15.0         B         43
12            4 2016-12-15           5.0         B         43
13            4 2016-12-31          15.0         B         43

1 Comment

This answers is updated based on new data and explanation provided in comments section of OP - see here - direct source for new data is here.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.