I am trying to resample a pandas DataFrame and sum some of its columns. Additionally, I want to get None/NaN as the result when there are no rows in a resampling period. For aggregation on a single column, I can do the following:

import pandas as pd

df = pd.DataFrame(index=[pd.to_datetime('2020-01-01')], columns=['value'])
df.resample('5min').agg("sum", min_count=1)

According to the pandas docs, the keyword argument min_count is passed to resample.Resampler.sum, the method associated with the string "sum", and the result is as desired:

           value
2020-01-01  None

However, this doesn't work if I pass a dictionary to agg, e.g.

df = pd.DataFrame(index=[pd.to_datetime('2020-01-01')], columns=['value'])
df.resample('5min').agg({'value': 'sum'}, min_count=1)

will output:

           value
2020-01-01     0

I would like to know the right way to pass arguments to the aggregation functions specified inside the dict.

1 Answer

This is currently not possible. There is/was a similar issue with agg.

Assuming multiple columns:

import pandas as pd

df = pd.DataFrame(index=[pd.to_datetime('2020-01-01')],
                  columns=['value', 'value2', 'value3'])

If you want to apply the same aggregation to several columns, just select them on the resampler before calling agg:

out = df.resample('5min')[['value', 'value2']].agg('sum', min_count=1)

Output:

           value value2
2020-01-01  None   None
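
To see the effect of min_count with actual numbers, here is a minimal sketch on made-up data (the timestamps and values are assumptions, not taken from the question). It compares the plain string aggregation with the same call plus min_count=1 for a resampling period that contains no rows:

```python
import pandas as pd

# Made-up numeric data; no row falls into the 00:05 bin.
idx = pd.to_datetime(['2020-01-01 00:00', '2020-01-01 00:01', '2020-01-01 00:10'])
df = pd.DataFrame({'value': [1.0, 2.0, 3.0],
                   'value2': [10.0, 20.0, 30.0],
                   'value3': [0.5, 0.5, 0.5]}, index=idx)

# Select the columns that share the same aggregation on the resampler.
r = df.resample('5min')[['value', 'value2']]

plain = r.agg('sum')                # the empty 00:05 bin is summed to 0
strict = r.agg('sum', min_count=1)  # the empty 00:05 bin becomes NaN
```

Non-empty bins are identical in both results; only the empty period differs.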

If you need different aggregation functions, use a dictionary and concat:

funcs = {'value': 'sum', 'value2': 'min'}

r = df.resample('5min')
out = pd.concat({k: r[k].agg([v], min_count=1)
                 for k, v in funcs.items()}, axis=1)

Output:

           value value2
             sum    min
2020-01-01  None    NaN

And if you need different aggregation functions and different kwargs:

funcs = {'value': 'sum', 'value2': 'min'}
kwargs = {'value2': {'min_count': 1}}

r = df.resample('5min')

out = pd.concat({k: r[k].agg([v], **kwargs.get(k, {}))
                 for k, v in funcs.items()}, axis=1)

Output:

           value value2
             sum    min
2020-01-01     0    NaN

3 Comments

Will the method you proposed have much impact on performance?
It will have an impact, but using a dictionary already has a significant impact in the first place. For instance, using an input with 10K rows and 3 columns and pre-computing the resampler, this gives r.agg('sum') -> 402 µs ± 40.6 µs ; r.agg({'value': 'sum', 'value2': 'sum', 'value3': 'sum'}) -> 1.46 ms ± 112 µs ; and for the concat approach -> 2.15 ms ± 60.4 µs.
If you have multiple columns with the same aggregation/kwargs, then the best would be to combine those in a single agg call, then concat with the other aggregations. If you want more performance-oriented details, you might want to provide a reproducible example and maybe open a follow-up question.
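
A rough sketch of that last suggestion, assuming made-up numeric data: columns that share a function and kwargs go through a single agg call, and the pieces are concatenated afterwards. The specs list and its layout are hypothetical names used for illustration, not a pandas API.

```python
import pandas as pd

# Made-up numeric data; no row falls into the 00:05 bin.
idx = pd.to_datetime(['2020-01-01 00:00', '2020-01-01 00:01', '2020-01-01 00:10'])
df = pd.DataFrame({'value': [1.0, 2.0, 3.0],
                   'value2': [4.0, 5.0, 6.0],
                   'value3': [7.0, 8.0, 9.0]}, index=idx)

# Hypothetical spec: (columns, aggregation name, kwargs) for each group of
# columns that share the same function and keyword arguments.
specs = [
    (['value', 'value2'], 'sum', {'min_count': 1}),
    (['value3'], 'min', {}),
]

r = df.resample('5min')

# One agg call per group instead of one per column, then a single concat.
out = pd.concat([r[cols].agg(func, **kw) for cols, func, kw in specs], axis=1)
```

Because each call uses a string rather than a list, the resulting columns stay flat (value, value2, value3) instead of becoming a MultiIndex. Whether this is measurably faster on real data is something to time on your own input.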
