define aggfunc for each values column in pandas pivot table

Question

I was trying to generate a pivot table with multiple "values" columns. I know I can use aggfunc to aggregate values the way I want, but what if I don't want to sum or average both columns, and instead want the sum of one column and the mean of the other? Is it possible to do this using pandas?

import numpy as np
import pandas as pd

df = pd.DataFrame({
          'A' : ['one', 'one', 'two', 'three'] * 6,
          'B' : ['A', 'B', 'C'] * 8,
          'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
          'D' : np.random.randn(24),
          'E' : np.random.randn(24)
})

Now this will get a pivot table with sum:

pd.pivot_table(df, values=['D','E'], rows=['B'], aggfunc=np.sum)

And this for mean:

pd.pivot_table(df, values=['D','E'], rows=['B'], aggfunc=np.mean)

How can I get sum for D and mean for E?

Hope my question is clear enough.

DataSwede · Accepted Answer · 2014-07-02 23:13:24Z

76

You can apply a specific function to a specific column by passing in a dict.

pd.pivot_table(df, values=['D','E'], rows=['B'], aggfunc={'D':np.sum, 'E':np.mean})
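On pandas versions where the rows keyword has been renamed to index, the same idea might look like the sketch below. The fixed random seed and the string aggregator names 'sum' and 'mean' (used here in place of np.sum and np.mean) are assumptions added for reproducibility; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator so the frame is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'A': ['one', 'one', 'two', 'three'] * 6,
    'B': ['A', 'B', 'C'] * 8,
    'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
    'D': rng.standard_normal(24),
    'E': rng.standard_normal(24),
})

# A dict maps each values column to its own aggregation:
# sum of D and mean of E, grouped by B.
result = pd.pivot_table(
    df,
    values=['D', 'E'],
    index=['B'],
    aggfunc={'D': 'sum', 'E': 'mean'},
)
print(result)
```

The result should have one row per distinct value of B, with D holding group sums and E holding group means.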


3 Comments

whytheq Over a year ago

Very nice answer. Elegant compared to the marked answer.

Philip Egger Over a year ago

I agree with the answer. However, in the most recent version of pandas, the keyword argument rows has been replaced by index. Running pd.pivot_table(df, values=['D','E'], index=['B'], aggfunc={'D':np.sum, 'E':np.mean}) worked for me.

Vinh Over a year ago

@PhilipEgger Could you share what worked with the new pandas version? That would help. Thank you

roman · Accepted Answer · 2013-11-21 13:18:17Z

28

You can concat two DataFrames:

>>> df1 = pd.pivot_table(df, values=['D'], rows=['B'], aggfunc=np.sum)
>>> df2 = pd.pivot_table(df, values=['E'], rows=['B'], aggfunc=np.mean)
>>> pd.concat((df1, df2), axis=1)
          D         E
B                    
A  1.810847 -0.524178
B  2.762190 -0.443031
C  0.867519  0.078460

Or you can pass a list of functions as the aggfunc parameter and then reindex:

>>> df3 = pd.pivot_table(df, values=['D','E'], rows=['B'], aggfunc=[np.sum, np.mean])
>>> df3
        sum                mean          
          D         E         D         E
B                                        
A  1.810847 -4.193425  0.226356 -0.524178
B  2.762190 -3.544245  0.345274 -0.443031
C  0.867519  0.627677  0.108440  0.078460
>>> df3 = df3.ix[:, [('sum', 'D'), ('mean','E')]]
>>> df3.columns = ['D', 'E']
>>> df3
          D         E
B                    
A  1.810847 -0.524178
B  2.762190 -0.443031
C  0.867519  0.078460
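The transcript above relies on rows= and .ix, which newer pandas releases no longer accept. A minimal sketch of the same "aggregate everything, then pick columns" approach using index= and .loc follows; the tiny hand-written frame and the string aggregator names are assumptions chosen so the numbers are easy to check by eye.

```python
import pandas as pd

# Assumption: a small illustrative frame, not the question's random data.
df = pd.DataFrame({
    'B': ['A', 'B', 'A', 'B'],
    'D': [1.0, 2.0, 3.0, 4.0],
    'E': [10.0, 20.0, 30.0, 40.0],
})

# Compute both aggregations for both columns; the resulting columns
# form a MultiIndex of (function name, column name) pairs.
both = pd.pivot_table(df, values=['D', 'E'], index=['B'],
                      aggfunc=['sum', 'mean'])

# Keep only sum-of-D and mean-of-E, then flatten the column labels.
picked = both.loc[:, [('sum', 'D'), ('mean', 'E')]]
picked.columns = ['D', 'E']
print(picked)
```

This computes twice as many aggregates as needed, so it is mainly useful when the full table is wanted anyway.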

Although, it would be nice to have an option to define aggfunc for each column individually. I don't know how it could be done; maybe by passing a dict-like parameter into aggfunc, like {'D':np.mean, 'E':np.sum}.

Update: actually, in your case you can pivot by hand:

>>> df.groupby('B').aggregate({'D':np.sum, 'E':np.mean})
          E         D
B                    
A -0.524178  1.810847
B -0.443031  2.762190
C  0.078460  0.867519
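Note that the output columns above come out as E, D rather than D, E. One way to control the order explicitly is pandas' named aggregation syntax for groupby; this is a swapped-in variant of the dict form, sketched on a small hypothetical frame rather than the question's data.

```python
import pandas as pd

# Assumption: a small illustrative frame, not the question's random data.
df = pd.DataFrame({
    'B': ['A', 'B', 'A', 'B'],
    'D': [1.0, 2.0, 3.0, 4.0],
    'E': [10.0, 20.0, 30.0, 40.0],
})

# Named aggregation: output_name=(input_column, aggregation).
# Output columns appear in the order the keywords are written.
by_hand = df.groupby('B').agg(
    D=('D', 'sum'),
    E=('E', 'mean'),
)
print(by_hand)
```

The keyword names are free to differ from the input columns (for example D_sum, E_mean), which can make the result easier to read.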

edited Nov 21, 2013 at 13:18

answered Nov 21, 2013 at 11:50


3 Comments

VIKASH JAISWAL Over a year ago

Thanks, both ways would work. However, what I was hoping for was a single-step way: as you said, being able to define a function for each individual column. I will wait to see if anyone else knows of such a way; otherwise I will accept yours as the answer in a while.

roman Over a year ago

@VIKASHJAISWAL see third method, I think this is what you need

VIKASH JAISWAL Over a year ago

Excellent. Exactly what I was looking for. In fact, this works for grouping by multiple columns as well: df.groupby(['B','C']).aggregate({'D':np.sum, 'E':np.mean}). Thanks for your effort.

user10987461 · Accepted Answer · 2019-02-02 11:56:27Z

0

table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
                       aggfunc={'D': np.mean, 'E': np.sum})

table
                  D         E
               mean       sum
A   C
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333

