How to aggregate unique count with pandas pivot_table

Question

This code:

df2 = (
    pd.DataFrame({
        'X' : ['X1', 'X1', 'X1', 'X1'], 
        'Y' : ['Y2', 'Y1', 'Y1', 'Y1'], 
        'Z' : ['Z3', 'Z1', 'Z1', 'Z2']
    })
)
g = df2.groupby('X')
pd.pivot_table(g, values='X', rows='Y', cols='Z', margins=False, aggfunc='count')

returns the following error:

Traceback (most recent call last): ... 
AttributeError: 'Index' object has no attribute 'index'

How do I get a pivot table with counts of unique values of one DataFrame column for two other columns?
Is there an aggfunc for counting unique values? Should I be using np.bincount()?

NB. I am aware of pandas.Series.value_counts(); however, I need a pivot table.

EDIT: The output should be:

Z   Z1  Z2  Z3
Y             
Y1   1   1 NaN
Y2 NaN NaN   1

Trenton McKinney · Accepted Answer · 2021-08-19 20:43:53Z

140

Do you mean something like this?

>>> df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=lambda x: len(x.unique()))

Z   Z1  Z2  Z3
Y             
Y1   1   1 NaN
Y2 NaN NaN   1

Note that len(x.unique()) counts a missing value as one more distinct value, so it assumes you don't have NAs in your DataFrame. Otherwise, you can use x.value_counts().count() or len(x.dropna().unique()).
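
To illustrate the note about missing values, here is a minimal sketch on a made-up Series (the data is hypothetical and not part of the question). It compares the three expressions mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical series with one missing value, for illustration only.
x = pd.Series(['X1', 'X1', np.nan])

with_na = len(x.unique())                    # NaN is counted as a distinct value
without_na = len(x.dropna().unique())        # NaN is dropped before counting
via_value_counts = x.value_counts().count()  # value_counts() skips NaN by default

print(with_na, without_na, via_value_counts)
```

On this input the first expression gives 2 while the other two give 1, which is the discrepancy the note warns about.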

edited Aug 19, 2021 at 20:43

Trenton McKinney

answered Oct 12, 2012 at 15:19

Chang She

1 Comment

telianer Jan 14 at 18:29

The issue with pd.NA can be avoided by the native pandas unique counter: lambda x: x.nunique() as the aggfunc. I briefly tested this with float("nan") and pd.NA and the function does not count the nulls as unique values.

Jaroslav Bezděk · Accepted Answer · 2021-04-07 11:31:31Z

66

This is a good way of counting entries within .pivot_table:

>>> df2.pivot_table(values='X', index=['Y','Z'], columns='X', aggfunc='count')

        X1  X2
Y   Z       
Y1  Z1   1   1
    Z2   1  NaN
Y2  Z3   1  NaN

edited Apr 7, 2021 at 11:31

Jaroslav Bezděk

answered Oct 28, 2013 at 8:48

julian peng

3 Comments

Alper Over a year ago

This does exactly what is required without an obscure lambda.

RockyK Over a year ago

Note: Pandas no longer accepts rows/cols as parameters. pandas.pydata.org/pandas-docs/stable/generated/…

Fernando Wittmann Over a year ago

@Alper that's incorrect. Using 'count' will count all instances, not only unique ones. This reply does not answer the question.
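
The difference between the two aggregations can be checked on the question's own df2. This is a small sketch, assuming a pandas version where aggfunc accepts the strings 'count' and 'nunique'.

```python
import pandas as pd

df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

# 'count' counts every non-null row in each (Y, Z) cell.
counts = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc='count')

# 'nunique' counts distinct values of X in each (Y, Z) cell.
uniques = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc='nunique')

print(counts.loc['Y1', 'Z1'])   # two rows fall into (Y1, Z1)
print(uniques.loc['Y1', 'Z1'])  # but they share a single X value
```

The (Y1, Z1) cell holds two rows with the same X, so 'count' reports 2 there and 'nunique' reports 1.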

Javier · Accepted Answer · 2018-07-16 17:45:37Z

46

Since at least version 0.16 of pandas, pivot_table no longer takes the parameter "rows".

As of 0.23, the solution would be:

df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)

which returns:

Z    Z1   Z2   Z3
Y                
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0
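
As an aside, the same table can probably also be reached without pivot_table, by grouping and unstacking. This is only a sketch of an alternative route, not the method this answer proposes.

```python
import pandas as pd

df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

# Count distinct X per (Y, Z) pair, then move Z from the index to the columns.
# Combinations that never occur in the data become NaN after unstacking.
table = df2.groupby(['Y', 'Z'])['X'].nunique().unstack('Z')

print(table)
```

The intermediate Series returned by groupby(...).nunique() can be handy when the long format is needed as well as the pivoted one.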

answered Jul 16, 2018 at 17:45

Javier

Jaroslav Bezděk · Accepted Answer · 2021-04-07 11:32:17Z

13

aggfunc=pd.Series.nunique provides a distinct count. The full code is as follows:

df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)

Credit to @hume for this solution (see comment under the accepted answer). Adding as an answer here for better discoverability.

edited Apr 7, 2021 at 11:32

Jaroslav Bezděk

answered Jul 6, 2018 at 3:06

Manavalan Gajapathy

Trenton McKinney · Accepted Answer · 2022-10-16 18:04:33Z

- The aggfunc parameter in pandas.DataFrame.pivot_table will take 'nunique' as a string, or in a list
  - pandas.Series.nunique or pandas.core.groupby.DataFrameGroupBy.nunique
- Tested in pandas 1.5.0

out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=['nunique', 'count', lambda x: len(x.unique()), len])

[out]:
             nunique           count           <lambda>            len          
Z       Z1   Z2   Z3    Z1   Z2   Z3       Z1   Z2   Z3   Z1   Z2   Z3
Y                                                                     
Y1     1.0  1.0  NaN   2.0  1.0  NaN      1.0  1.0  NaN  2.0  1.0  NaN
Y2     NaN  NaN  1.0   NaN  NaN  1.0      NaN  NaN  1.0  NaN  NaN  1.0


out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc='nunique')

[out]:
Z    Z1   Z2   Z3
Y                
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0

out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=['nunique'])

[out]:
             nunique          
Z       Z1   Z2   Z3
Y                   
Y1     1.0  1.0  NaN
Y2     NaN  NaN  1.0

Pablo Navarro · Accepted Answer · 2012-10-12 15:21:39Z

4

You can construct a pivot table for each distinct value of X. In this case,

import numpy

for xval, xgroup in g:
    # rows/cols were renamed to index/columns in later pandas versions
    ptable = pd.pivot_table(xgroup, index='Y', columns='Z', 
        margins=False, aggfunc=numpy.size)

will construct a pivot table for each value of X. You may want to index ptable using xval. With this code, I get (for X1)

     X        
Z   Z1  Z2  Z3
Y             
Y1   2   1 NaN
Y2 NaN NaN   1

answered Oct 12, 2012 at 15:21

Pablo Navarro

1 Comment

dmi Over a year ago

Thank you. However I am not counting the number of occurrences of each distinct value of X, I am counting the number of distinct values in X for Y and Z.

Debajyoti Dutta · Accepted Answer · 2020-12-02 10:35:55Z

1

aggfunc=pd.Series.nunique counts only the unique values of a series, in this case the unique values of a column. It is therefore not a drop-in alternative to aggfunc='count'.

For simple counting, it is better to use aggfunc=pd.Series.count.

answered Dec 2, 2020 at 10:35

Debajyoti Dutta

Jaroslav Bezděk · Accepted Answer · 2021-04-07 11:35:54Z

1

Since none of the answers are up to date with the latest version of pandas, I am writing another solution for this problem:

import pandas as pd

# Set example
df2 = (
    pd.DataFrame({
        'X' : ['X1', 'X1', 'X1', 'X1'], 
        'Y' : ['Y2', 'Y1', 'Y1', 'Y1'], 
        'Z' : ['Z3', 'Z1', 'Z1', 'Z2']
    })
)

# Pivot
pd.crosstab(index=df2['Y'], columns=df2['Z'], values=df2['X'], aggfunc=pd.Series.nunique)

which returns:

Z    Z1   Z2   Z3
Y                
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0

edited Apr 7, 2021 at 11:35

Jaroslav Bezděk

answered Aug 8, 2019 at 18:33

Benoit Drogou

1 Comment

Kardi Teknomo Over a year ago

Actually, for counting frequencies, pd.crosstab is preferable to a pivot table.
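
For the plain frequency case this comment refers to, a minimal sketch on the question's df2 looks like the following. With no values or aggfunc given, pd.crosstab tabulates how many rows fall into each (Y, Z) combination.

```python
import pandas as pd

df2 = pd.DataFrame({
    'X': ['X1', 'X1', 'X1', 'X1'],
    'Y': ['Y2', 'Y1', 'Y1', 'Y1'],
    'Z': ['Z3', 'Z1', 'Z1', 'Z2'],
})

# Row frequencies per (Y, Z); combinations that never occur are reported as 0.
freq = pd.crosstab(index=df2['Y'], columns=df2['Z'])

print(freq)
```

Note that this counts rows, not distinct values of X, so it answers a different question from the nunique-based solutions.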

william_grisaitis · Accepted Answer · 2019-12-26 21:49:50Z

0

For best performance I recommend doing DataFrame.drop_duplicates followed by aggfunc='count'.

Others are correct that aggfunc=pd.Series.nunique will work. This can be slow, however, if the number of index groups you have is large (>1000).

So instead of (to quote @Javier)

df2.pivot_table('X', 'Y', 'Z', aggfunc=pd.Series.nunique)

I suggest

df2.drop_duplicates(['X', 'Y', 'Z']).pivot_table('X', 'Y', 'Z', aggfunc='count')

This works because it guarantees that every subgroup (each combination of ('Y', 'Z')) will have unique (non-duplicate) values of 'X'.
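
The equivalence can be checked on a small example. The sketch below uses hypothetical data (not the question's df2) in which one cell contains two distinct X values, and it makes no claim about timing.

```python
import pandas as pd

# Hypothetical data: the (Y1, Z1) cell holds X1 twice and X2 once.
df = pd.DataFrame({
    'X': ['X1', 'X1', 'X2', 'X1'],
    'Y': ['Y1', 'Y1', 'Y1', 'Y2'],
    'Z': ['Z1', 'Z1', 'Z1', 'Z3'],
})

# Unique count computed directly.
slow = df.pivot_table(values='X', index='Y', columns='Z', aggfunc='nunique')

# Unique count computed as a plain count over de-duplicated rows.
fast = (
    df.drop_duplicates(['X', 'Y', 'Z'])
      .pivot_table(values='X', index='Y', columns='Z', aggfunc='count')
)

# Compare cell by cell, treating empty cells as 0.
same = (slow.fillna(0).to_numpy() == fast.fillna(0).to_numpy()).all()
print(same)
```

Both tables report 2 for (Y1, Z1) and 1 for (Y2, Z3), so the two approaches agree on this input.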

answered Dec 26, 2019 at 21:49

william_grisaitis
