1

I have an DataFrame like this:


    item_id     facility_id timestamp            min_inv     current_inv    demand
0   12          1185        2022-01-11 00:00:00  2.0         3.0            1
1   12          1185        2022-01-11 01:00:00  2.0         3.0            1
2   12          1185        2022-01-11 02:00:00  2.0         3.0            2
3   12          1185        2022-01-11 03:00:00  2.0         3.0            2
4   12          1185        2022-01-11 04:00:00  2.0         3.0            1
5   23          1185        2022-01-11 00:00:00  2.0         3.0            1
6   23          1185        2022-01-11 01:00:00  2.0         3.0            2
7   23          1185        2022-01-11 02:00:00  2.0         3.0            1
8   23          1185        2022-01-11 03:00:00  2.0         3.0            2
9   23          1185        2022-01-11 04:00:00  2.0         3.0            1
10  34          1185        2022-01-11 00:00:00  2.0         3.0            1
11  34          1185        2022-01-11 01:00:00  2.0         3.0            1
12  34          1185        2022-01-11 02:00:00  2.0         3.0            4
13  34          1185        2022-01-11 03:00:00  2.0         3.0            2
14  34          1185        2022-01-11 04:00:00  2.0         3.0            2
15  45          1185        2022-01-11 00:00:00  2.0         3.0            3
16  45          1185        2022-01-11 01:00:00  2.0         3.0            2
17  45          1185        2022-01-11 02:00:00  2.0         3.0            3
18  45          1185        2022-01-11 03:00:00  2.0         3.0            1
19  45          1185        2022-01-11 04:00:00  2.0         3.0            3

I want to groupby item_id and resample it for each 2 hour. In the resample, I want to use different aggfunc for different columns.

The way I'm doing this is:

tmp = input_df.groupby('item_id')
new_df = pd.DataFrame()
for group, df in tmp:
    tmp_df = df.set_index('timestamp').resample('2H').agg({'current_inv':'last',
                                                           'min_inv':'last',
                                                           'demand':'sum'})
    tmp_df['item_id'] = group

    new_df = pd.concat([new_df, tmp_df])

new_df gives the output.

Is there more pythonic way for doing this? I don't want to use for loop.

1 Answer 1

3

IIUC, you can use on='timestamp' as parameter of resample method to get your expect result in a vectorized way:

out = df.groupby('item_id').resample('2H', on='timestamp') \
        .agg({'current_inv': 'last', 'min_inv':'last', 'demand':'sum'}).reset_index()
print(out)

# Output
    item_id           timestamp  current_inv  min_inv  demand
0        12 2022-01-11 00:00:00          3.0      2.0       2
1        12 2022-01-11 02:00:00          3.0      2.0       4
2        12 2022-01-11 04:00:00          3.0      2.0       1
3        23 2022-01-11 00:00:00          3.0      2.0       3
4        23 2022-01-11 02:00:00          3.0      2.0       3
5        23 2022-01-11 04:00:00          3.0      2.0       1
6        34 2022-01-11 00:00:00          3.0      2.0       2
7        34 2022-01-11 02:00:00          3.0      2.0       6
8        34 2022-01-11 04:00:00          3.0      2.0       2
9        45 2022-01-11 00:00:00          3.0      2.0       5
10       45 2022-01-11 02:00:00          3.0      2.0       4
11       45 2022-01-11 04:00:00          3.0      2.0       3
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.