
Best to illustrate by example:

I would like to group a DataFrame by col1 and col2, then sum col3 and col4 and average col5 within each group.

If I just wanted to sum col3 through col5, I'd use df.groupby(['col1','col2']).sum().

Could you add sample data and the expected result? (commented Oct 19, 2015 at 15:00)

1 Answer


You can use the GroupBy.agg() method (or its alias GroupBy.aggregate()) for this.

The aggregate() method can accept a dictionary as its argument, in which case it treats the keys as column names and each value as the function used to aggregate that column. From the documentation:

By passing a dict to aggregate you can apply a different aggregation to the columns of a DataFrame.

Example:

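import pandas as pd

# Sum col3 and col4, average col5, within each (col1, col2) group.
# The string aliases 'sum' and 'mean' avoid the FutureWarning that recent
# pandas versions emit when NumPy callables such as np.sum are passed.
result = df.groupby(['col1', 'col2']).agg({'col3': 'sum', 'col4': 'sum', 'col5': 'mean'})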
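For a fully runnable version, here is a minimal sketch with a small DataFrame using the question's column names. The data values are hypothetical, invented only for illustration; the aggregation call is the same dict-based agg() described above.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    'col1': ['a', 'a', 'b'],
    'col2': ['x', 'x', 'y'],
    'col3': [1, 2, 5],
    'col4': [10, 20, 40],
    'col5': [1.0, 3.0, 5.0],
})

# Keys are column names, values are the aggregation applied to that column
result = df.groupby(['col1', 'col2']).agg({'col3': 'sum', 'col4': 'sum', 'col5': 'mean'})
print(result)
```

For the group ('a', 'x') this gives col3 = 3, col4 = 30 and col5 = 2.0; the single-row group ('b', 'y') keeps its original values.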

Demo:

In [50]: df = pd.DataFrame([[1,2,3,4,5],[1,2,6,7,8],[2,3,4,5,6]],columns=list('ABCDE'))

In [51]: df
Out[51]:
   A  B  C  D  E
0  1  2  3  4  5
1  1  2  6  7  8
2  2  3  4  5  6

In [52]: df.groupby(['A','B']).aggregate({'C':'sum','D':'sum','E':'mean'})
Out[52]:
     C   D    E
A B
1 2  9  11  6.5
2 3  4   5  6.0
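Since pandas 0.25, agg() also supports "named aggregation", which lets you choose the output column names at the same time. This is an alternative to the dict form, not what the answer above uses; the output names C_sum, D_sum and E_mean below are arbitrary choices made for illustration.

```python
import pandas as pd

# Same data as the demo above
df = pd.DataFrame([[1, 2, 3, 4, 5], [1, 2, 6, 7, 8], [2, 3, 4, 5, 6]], columns=list('ABCDE'))

# Named aggregation: each keyword becomes an output column and is given
# as a tuple of (input column, aggregation function)
result = df.groupby(['A', 'B']).agg(
    C_sum=('C', 'sum'),
    D_sum=('D', 'sum'),
    E_mean=('E', 'mean'),
)
print(result)
```

The numbers match the demo (9, 11 and 6.5 for group (1, 2)); only the column labels differ.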

5 Comments

Thanks. Is there a default aggregation for all columns not mentioned?
Sorry, I didn't understand your question.
Say I want to sum over two specific columns and average over all the rest, without naming them explicitly.
I don't think you can do that directly, but you can use a dictionary comprehension to build the dictionary, for example: {k: 'sum' if k in {'col3','col4'} else 'mean' for k in df.columns if k not in {'col1','col2'}}
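A minimal sketch of that dictionary-comprehension approach, assuming a hypothetical extra column col6 so that "the rest" contains more than one column. The variable names group_cols, sum_cols and agg_spec are illustrative, not part of any API.

```python
import pandas as pd

# Hypothetical data: two grouping columns, two columns to sum, the rest to average
df = pd.DataFrame({
    'col1': ['a', 'a', 'b'],
    'col2': ['x', 'x', 'y'],
    'col3': [1, 2, 5],
    'col4': [10, 20, 40],
    'col5': [1.0, 3.0, 5.0],
    'col6': [2.0, 4.0, 8.0],
})

group_cols = ['col1', 'col2']
sum_cols = {'col3', 'col4'}

# Every non-grouping column gets 'sum' if it is listed in sum_cols, otherwise 'mean'
agg_spec = {c: 'sum' if c in sum_cols else 'mean'
            for c in df.columns if c not in group_cols}

result = df.groupby(group_cols).agg(agg_spec)
print(result)
```

Any column added to the DataFrame later is averaged automatically unless it is added to sum_cols.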
Great. Thank you very much.
