93

This code:

df2 = (
    pd.DataFrame({
        'X' : ['X1', 'X1', 'X1', 'X1'], 
        'Y' : ['Y2', 'Y1', 'Y1', 'Y1'], 
        'Z' : ['Z3', 'Z1', 'Z1', 'Z2']
    })
)
g = df2.groupby('X')
pd.pivot_table(g, values='X', rows='Y', cols='Z', margins=False, aggfunc='count')

returns the following error:

Traceback (most recent call last): ... 
AttributeError: 'Index' object has no attribute 'index'

How do I get a Pivot Table with counts of unique values of one DataFrame column for two other columns?
Is there aggfunc for count unique? Should I be using np.bincount()?

NB. I am aware of pandas.Series.values_counts() however I need a pivot table.


EDIT: The output should be:

Z   Z1  Z2  Z3
Y             
Y1   1   1 NaN
Y2 NaN NaN   1
0

9 Answers 9

140

Do you mean something like this?

>>> df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=lambda x: len(x.unique()))

Z   Z1  Z2  Z3
Y             
Y1   1   1 NaN
Y2 NaN NaN   1

Note that using len assumes you don't have NAs in your DataFrame. You can do x.value_counts().count() or len(x.dropna().unique()) otherwise.

Sign up to request clarification or add additional context in comments.

1 Comment

The issue with pd.NA can be avoided by the native pandas unique counter: lambda x: x.nunique() as the aggfunc. I briefly tested this with float("nan") and pd.NA and the function does not count the nulls as unique values.
66

This is a good way of counting entries within .pivot_table:

>>> df2.pivot_table(values='X', index=['Y','Z'], columns='X', aggfunc='count')

        X1  X2
Y   Z       
Y1  Z1   1   1
    Z2   1  NaN
Y2  Z3   1  NaN

3 Comments

This does exactly what is required without an obscure lambda.
Note: Pandas no longer accepts rows/cols as parameters. pandas.pydata.org/pandas-docs/stable/generated/…
@Alper that's incorrect. Using 'count' will count all instances, not only unique ones. This reply does not answer the question.
46

Since at least version 0.16 of pandas, it does not take the parameter "rows"

As of 0.23, the solution would be:

df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=pd.Series.nunique)

which returns:

Z    Z1   Z2   Z3
Y                
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0

Comments

13

aggfunc=pd.Series.nunique provides distinct count. Full code is following:

df2.pivot_table(values='X', rows='Y', cols='Z', aggfunc=pd.Series.nunique)

Credit to @hume for this solution (see comment under the accepted answer). Adding as an answer here for better discoverability.

Comments

8
out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=['nunique', 'count', lambda x: len(x.unique()), len])

[out]:
             nunique           count           <lambda>            len          
Z       Z1   Z2   Z3    Z1   Z2   Z3       Z1   Z2   Z3   Z1   Z2   Z3
Y                                                                     
Y1     1.0  1.0  NaN   2.0  1.0  NaN      1.0  1.0  NaN  2.0  1.0  NaN
Y2     NaN  NaN  1.0   NaN  NaN  1.0      NaN  NaN  1.0  NaN  NaN  1.0


out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc='nunique')

[out]:
Z    Z1   Z2   Z3
Y                
Y1  1.0  1.0  NaN
Y2  NaN  NaN  1.0

out = df2.pivot_table(values='X', index='Y', columns='Z', aggfunc=['nunique'])

[out]:
             nunique          
Z       Z1   Z2   Z3
Y                   
Y1     1.0  1.0  NaN
Y2     NaN  NaN  1.0

Comments

4

You can construct a pivot table for each distinct value of X. In this case,

for xval, xgroup in g:
    ptable = pd.pivot_table(xgroup, rows='Y', cols='Z', 
        margins=False, aggfunc=numpy.size)

will construct a pivot table for each value of X. You may want to index ptable using the xvalue. With this code, I get (for X1)

     X        
Z   Z1  Z2  Z3
Y             
Y1   2   1 NaN
Y2 NaN NaN   1

1 Comment

Thank you. However I am not counting the number of occurrences of each distinct value of X, I am counting the number of distinct values in X for Y and Z.
1

aggfunc=pd.Series.nunique will only count unique values for a series - in this case count the unique values for a column. But this doesn't quite reflect as an alternative to aggfunc='count'

For simple counting, it better to use aggfunc=pd.Series.count

Comments

1

Since none of the answers are up to date with the last version of Pandas, I am writing another solution for this problem:

import pandas as pd

# Set example
df2 = (
    pd.DataFrame({
        'X' : ['X1', 'X1', 'X1', 'X1'], 
        'Y' : ['Y2', 'Y1', 'Y1', 'Y1'], 
        'Z' : ['Z3', 'Z1', 'Z1', 'Z2']
    })
)

# Pivot
pd.crosstab(index=df2['Y'], columns=df2['Z'], values=df2['X'], aggfunc=pd.Series.nunique)

which returns:

Z   Z1  Z2  Z3
Y           
Y1  1.0 1.0 NaN
Y2  NaN NaN 1.0

1 Comment

Actually, for count the frequency, pd.crosstab is preferable than pivot table.
0

For best performance I recommend doing DataFrame.drop_duplicates followed up aggfunc='count'.

Others are correct that aggfunc=pd.Series.nunique will work. This can be slow, however, if the number of index groups you have is large (>1000).

So instead of (to quote @Javier)

df2.pivot_table('X', 'Y', 'Z', aggfunc=pd.Series.nunique)

I suggest

df2.drop_duplicates(['X', 'Y', 'Z']).pivot_table('X', 'Y', 'Z', aggfunc='count')

This works because it guarantees that every subgroup (each combination of ('Y', 'Z')) will have unique (non-duplicate) values of 'X'.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.