Python pandas perform same aggregation on multiple columns

Question

I know that in pandas, I can do something like this, where I apply multiple aggregations to the same column:

import pandas as pd
df = pd.DataFrame({'id':[1,1,2,2], 'x1':[0,1,0,1], 'x2':[1,0,1,0],'x3':[0,1,0,1], 'x4':[1,0,1,0]})
df.groupby('id').agg({'x1':['sum', 'max'], 'x2':['sum','max']})

Is there a syntax shortcut for the reverse case, applying the same aggregation to multiple columns? I also need to perform more than one type of aggregation across different groups of columns.

Valid Syntax Example

df.groupby('id').agg({'x1':'sum', 'x2':'sum', 'x3':'mean', 'x4':'mean'})

Desired Outcome Example

df.groupby('id').agg({['x1', 'x2']:sum, ['x3', 'x4']:mean})

I know this isn't a valid key-value pair, but hopefully illustrates what I'm aiming for. As to why I want to do this, my current aggregation statement is getting long and I am looking for ways to shorten it.

jezrael · Accepted Answer · 2020-01-05 17:49:48Z

3

Using a list as a dictionary key is not valid in Python, because lists are not hashable.

You are close: what you need is to select the columns after groupby, but this applies the same aggregation to every selected column:

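# Use a list of columns (double brackets); the bare tuple form was deprecated in pandas 1.0 and later removed.
df.groupby('id')[['x1', 'x2']].sum()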

Or:

df.groupby('id')[['x1', 'x2']].agg('sum')
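
Since selecting columns this way applies one aggregation to all of them, one way to mix aggregations is to aggregate each column subset separately and join the results on the group index. This is a minimal sketch of that idea, not part of the original answer; it reuses the question's df and assumes a pandas version that accepts a list of columns after groupby:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2], 'x1': [0, 1, 0, 1], 'x2': [1, 0, 1, 0],
                   'x3': [0, 1, 0, 1], 'x4': [1, 0, 1, 0]})

g = df.groupby('id')

# Aggregate each column subset on its own, then join the pieces
# side by side; both pieces share the same 'id' index.
result = pd.concat([g[['x1', 'x2']].sum(), g[['x3', 'x4']].mean()], axis=1)
print(result)
```

This stays readable for two or three column groups; for more, building a dictionary programmatically is probably shorter.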

For a more dynamic solution, create a dictionary keyed by tuples of column names and then flatten it into one entry per column. Each column may appear in only one tuple, because dictionary keys are unique and a repeated column would overwrite its earlier entry:

d = {('x1', 'x2'):['sum','max'], ('x3', 'x4'):'mean'}
d1 = {x:v for k, v in d.items() for x in k}
print(d1)
{'x1': ['sum', 'max'], 'x2': ['sum', 'max'], 'x3': 'mean', 'x4': 'mean'}

print(df.groupby('id').agg(d1))
    x1      x2       x3   x4
   sum max sum max mean mean
id                          
1    1   1   1   1  0.5  0.5
2    1   1   1   1  0.5  0.5
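
If the flattening is used in several places, it could be wrapped in a small helper that also guards against a column appearing in more than one tuple. The function name expand_agg_spec below is hypothetical (it is not a pandas API), and this is only a sketch of the same comprehension with an explicit check:

```python
import pandas as pd

def expand_agg_spec(spec):
    """Expand {(col, ...): agg} into {col: agg}. Hypothetical helper, not a pandas function."""
    flat = {}
    for cols, agg in spec.items():
        for col in cols:
            # A plain comprehension would silently overwrite a repeated column.
            if col in flat:
                raise ValueError(f"column {col!r} appears in more than one key")
            flat[col] = agg
    return flat

df = pd.DataFrame({'id': [1, 1, 2, 2], 'x1': [0, 1, 0, 1], 'x2': [1, 0, 1, 0],
                   'x3': [0, 1, 0, 1], 'x4': [1, 0, 1, 0]})

spec = {('x1', 'x2'): ['sum', 'max'], ('x3', 'x4'): 'mean'}
flat = expand_agg_spec(spec)
out = df.groupby('id').agg(flat)
print(out)
```

Raising on duplicates is a design choice made here for illustration; the comprehension in the answer keeps the last value instead.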

edited Jan 5, 2020 at 17:49

answered Jan 5, 2020 at 17:25

jezrael

1 Comment

ZachTurn Over a year ago

This does work for a single aggregation, but I realize I need to update my question with more detail.

Parfait · Accepted Answer · 2020-01-05 18:28:55Z

0

Consider a dictionary comprehension that uses zip on two equal-length lists: one of column groups and one of the aggregate to apply to each group. Then pass the resulting dictionary to groupby().agg:

cols = [['x1', 'x2'], ['x3', 'x4']]
aggs = ['sum', 'mean']

d = {c:a for col,a in zip(cols, aggs) for c in col}

df.groupby('id').agg(d) 
#     x1  x2   x3   x4
# id                  
# 1    1   1  0.5  0.5
# 2    1   1  0.5  0.5
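
The same zip pattern might be combined with pandas named aggregation (available since pandas 0.25) to get flat, descriptive column names when a group of columns needs several aggregates. This is a sketch rather than part of the original answer; the col_agg naming scheme and the nested aggs list are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2], 'x1': [0, 1, 0, 1], 'x2': [1, 0, 1, 0],
                   'x3': [0, 1, 0, 1], 'x4': [1, 0, 1, 0]})

cols = [['x1', 'x2'], ['x3', 'x4']]
aggs = [['sum', 'max'], ['mean']]  # one list of aggregates per column group

# Build keyword arguments of the form new_name=(column, function).
named = {f'{c}_{a}': (c, a)
         for col, agg in zip(cols, aggs)
         for c in col
         for a in agg}

flat_result = df.groupby('id').agg(**named)
print(flat_result)
```

The result has single-level columns such as x1_sum and x3_mean instead of a MultiIndex, which can be easier to work with downstream.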

answered Jan 5, 2020 at 18:28

Parfait
