
I know that in pandas, I can do something like this, where I apply multiple aggregations to the same column:

import pandas as pd
df = pd.DataFrame({'id':[1,1,2,2], 'x1':[0,1,0,1], 'x2':[1,0,1,0],'x3':[0,1,0,1], 'x4':[1,0,1,0]})
df.groupby('id').agg({'x1':['sum', 'max'], 'x2':['sum','max']})

Is there a syntax shortcut for the reverse case, applying the same aggregation to several columns at once? I also need more than one type of aggregation in the same call.

Valid Syntax Example

df.groupby('id').agg({'x1':'sum', 'x2':'sum', 'x3':'mean', 'x4':'mean'})

Desired Outcome Example

df.groupby('id').agg({['x1', 'x2']:sum, ['x3', 'x4']:mean})

I know this isn't a valid key-value pair, but it hopefully illustrates what I'm aiming for. As for why: my current aggregation statement is getting long, and I am looking for ways to shorten it.

2 Answers


Using a list as a dictionary key is not valid in Python, because lists are mutable and therefore unhashable.
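A quick plain-Python check of why the key in the desired syntax fails, while a tuple key (used further down) is accepted. This is only an illustration; the variable names are arbitrary:

```python
# Dict keys must be hashable; a list is not, so this raises TypeError
error_message = None
try:
    bad = {['x1', 'x2']: 'sum'}
except TypeError as exc:
    error_message = str(exc)

# A tuple of strings is hashable, so it is a valid key
good = {('x1', 'x2'): 'sum', ('x3', 'x4'): 'mean'}

print(error_message)       # message mentions an unhashable list
print(good[('x1', 'x2')])  # sum
```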


You are close: what you need is to select the columns after groupby, but this applies the same aggregation to all of the selected columns:

df.groupby('id')[['x1', 'x2']].sum()

Or:

df.groupby('id')[['x1', 'x2']].agg('sum')
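Selecting a column subset is not limited to a single function name: passing a list to `.agg` applies every function in the list to every selected column. What it cannot do is give different column groups different functions, which is what the dictionary approaches handle. A small sketch using the question's `df`:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2], 'x1': [0, 1, 0, 1], 'x2': [1, 0, 1, 0],
                   'x3': [0, 1, 0, 1], 'x4': [1, 0, 1, 0]})

# The same set of functions is applied to each selected column,
# producing MultiIndex columns such as ('x1', 'sum')
res = df.groupby('id')[['x1', 'x2']].agg(['sum', 'max'])
print(res)
```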

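If you want a more dynamic solution, one option is to create a dictionary keyed by tuples and then flatten it. The only requirement is that each column name appears in just one tuple: the flattened dict can hold each key only once, so a repeated column would silently keep only its last aggregation: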

d = {('x1', 'x2'):['sum','max'], ('x3', 'x4'):'mean'}
d1 = {x:v for k, v in d.items() for x in k}
print (d1)
{'x1': ['sum', 'max'], 'x2': ['sum', 'max'], 'x3': 'mean', 'x4': 'mean'}

print (df.groupby('id').agg(d1))
    x1      x2       x3   x4
   sum max sum max mean mean
id                          
1    1   1   1   1  0.5  0.5
2    1   1   1   1  0.5  0.5
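If the tuple-keyed dictionary grows, the flattening step can be wrapped in a small function that raises on the duplicate-column pitfall instead of silently letting the last tuple win. `expand_agg_spec` is a hypothetical helper name for this sketch, not part of pandas:

```python
import pandas as pd

def expand_agg_spec(spec):
    """Flatten {(col, ...): func} into {col: func} for groupby(...).agg.

    Raises ValueError if a column appears under more than one key,
    instead of silently keeping only the last assignment.
    """
    flat = {}
    for cols, func in spec.items():
        # Also accept a bare string key for a single column
        if isinstance(cols, str):
            cols = (cols,)
        for col in cols:
            if col in flat:
                raise ValueError(f"column {col!r} listed more than once")
            flat[col] = func
    return flat

df = pd.DataFrame({'id': [1, 1, 2, 2], 'x1': [0, 1, 0, 1], 'x2': [1, 0, 1, 0],
                   'x3': [0, 1, 0, 1], 'x4': [1, 0, 1, 0]})

spec = {('x1', 'x2'): ['sum', 'max'], ('x3', 'x4'): 'mean'}
result = df.groupby('id').agg(expand_agg_spec(spec))
print(result)
```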

1 Comment

This does work for a single aggregation, but I realize I need to update my question with more detail.

Consider a dictionary comprehension that uses zip to pair equal-length lists of column groups with their aggregations. Then pass the resulting dictionary to groupby().agg:

cols = [['x1', 'x2'], ['x3', 'x4']]
aggs = ['sum', 'mean']

d = {c:a for col,a in zip(cols, aggs) for c in col}

df.groupby('id').agg(d) 
#     x1  x2   x3   x4
# id                  
# 1    1   1  0.5  0.5
# 2    1   1  0.5  0.5
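A different way to express the same pairing, without building a per-column dictionary at all, is to aggregate each column group separately and join the pieces side by side with `pd.concat`. This is a sketch of an alternative technique, not the method above; it reuses the `cols` and `aggs` names:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2], 'x1': [0, 1, 0, 1], 'x2': [1, 0, 1, 0],
                   'x3': [0, 1, 0, 1], 'x4': [1, 0, 1, 0]})

cols = [['x1', 'x2'], ['x3', 'x4']]
aggs = ['sum', 'mean']

g = df.groupby('id')
# Aggregate each column group with its own function, then align the
# partial results on the 'id' index and place them side by side
out = pd.concat([g[c].agg(a) for c, a in zip(cols, aggs)], axis=1)
print(out)
```

The result has the same flat columns (`x1`, `x2`, `x3`, `x4`) as the dictionary version.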
