
I have this DataFrame:

  column1 column2 column3 column4
0    name    True    True     NaN
1    name     NaN    True     NaN
2   name1     NaN    True    True
3   name1    True    True    True

and I would like to group by and count distinct values over all columns. I am trying:

df.groupby('column1').nunique()

but I am receiving this error:

AttributeError: 'DataFrameGroupBy' object has no attribute 'nunique'

Anybody have a suggestion?
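(Note: this error comes from an old pandas version; in recent pandas, `DataFrameGroupBy` does have `nunique`, so the original call works directly. A minimal sketch reconstructing the question's frame:)

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'column1': ['name', 'name', 'name1', 'name1'],
    'column2': [True, np.nan, np.nan, True],
    'column3': [True, True, True, True],
    'column4': [np.nan, np.nan, True, True],
})

# On recent pandas this works directly; NaN is excluded by default
counts = df.groupby('column1').nunique()
print(counts)
```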

1 Answer


You can use stack to reshape to a Series, then Series.groupby with SeriesGroupBy.nunique:

df1 = df.set_index('column1').stack(dropna=False)

print (df1.groupby(level=[0,1]).nunique(dropna=False).unstack())

Sample:

print (df)
  column1 column2 column3 column4
0    name    True    True     NaN
1    name     NaN    True     NaN
2   name1     NaN    True    True
3   name1    True    True    True

df1 = df.set_index('column1').stack(dropna=False)
print (df1)
column1         
name     column2    True
         column3    True
         column4     NaN
         column2     NaN
         column3    True
         column4     NaN
name1    column2     NaN
         column3    True
         column4    True
         column2    True
         column3    True
         column4    True
dtype: object

print (df1.groupby(level=[0,1]).nunique(dropna=False).unstack(fill_value=0))
         column2  column3  column4
column1                           
name           2        1        1
name1          2        1        1

print (df1.groupby(level=[0,1]).nunique().unstack(fill_value=0))
         column2  column3  column4
column1                           
name           1        1        0
name1          1        1        1

Another solution with a double apply:

print (df.groupby('column1')
         .apply(lambda x: x.iloc[:,1:].apply(lambda y: y.nunique(dropna=False))))
         column2  column3  column4
column1                           
name           2        1        1
name1          2        1        1

print (df.groupby('column1').apply(lambda x: x.iloc[:,1:].apply(lambda y: y.nunique())))
         column2  column3  column4
column1                           
name           1        1        0
name1          1        1        1
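A shorter alternative without reshaping (a sketch, not from the original answer) is to forward dropna=False through agg with a lambda; df is assumed to be the sample frame from the question:

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'column1': ['name', 'name', 'name1', 'name1'],
    'column2': [True, np.nan, np.nan, True],
    'column3': [True, True, True, True],
    'column4': [np.nan, np.nan, True, True],
})

# The lambda counts NaN as its own value per column, matching
# the nunique(dropna=False) output above
res = df.groupby('column1').agg(lambda s: s.nunique(dropna=False))
print(res)
```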

6 Comments

This only gives me the count of the first column. I need unique counts of all columns.
Sorry, I misunderstood it. But now I think it is correct - I added a groupby over both levels of the Series index.
Yeah, with the correction it works, only I need to unstack it to get a proper DataFrame.
I just found out that the counts are not correct, because my data only has the values True or NaN. How is it possible that the counts are wrong with your method?
You need to add only the parameter dropna=False to nunique - print (df1.groupby(level=[0,1]).nunique(dropna=False).unstack())
