
I have this DataFrame:

  column1 column2 column3 column4
0    name    True    True     NaN
1    name     NaN    True     NaN
2   name1     NaN    True    True
3   name1    True    True    True

and I would like to group by and count distinct values over all columns. I am trying:

df.groupby('column1').nunique()

but I am receiving this error:

AttributeError: 'DataFrameGroupBy' object has no attribute 'nunique'

Anybody have a suggestion?
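(Note: this error comes from an old pandas version; in recent pandas, `DataFrameGroupBy` does have `nunique`, so the original call works directly. A minimal sketch reconstructing the question's frame:)

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'column1': ['name', 'name', 'name1', 'name1'],
    'column2': [True, np.nan, np.nan, True],
    'column3': [True, True, True, True],
    'column4': [np.nan, np.nan, True, True],
})

# On recent pandas this works directly; NaN is excluded by default
counts = df.groupby('column1').nunique()
print(counts)
```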

1 Answer


You can use stack to reshape to a Series, then Series.groupby with SeriesGroupBy.nunique:

df1 = df.set_index('column1').stack(dropna=False)

print (df1.groupby(level=[0,1]).nunique(dropna=False).unstack())

Sample:

print (df)
  column1 column2 column3 column4
0    name    True    True     NaN
1    name     NaN    True     NaN
2   name1     NaN    True    True
3   name1    True    True    True

df1 = df.set_index('column1').stack(dropna=False)
print (df1)
column1         
name     column2    True
         column3    True
         column4     NaN
         column2     NaN
         column3    True
         column4     NaN
name1    column2     NaN
         column3    True
         column4    True
         column2    True
         column3    True
         column4    True
dtype: object

print (df1.groupby(level=[0,1]).nunique(dropna=False).unstack(fill_value=0))
         column2  column3  column4
column1                           
name           2        1        1
name1          2        1        1

print (df1.groupby(level=[0,1]).nunique().unstack(fill_value=0))
         column2  column3  column4
column1                           
name           1        1        0
name1          1        1        1

Another solution with a double apply:

print (df.groupby('column1')
         .apply(lambda x: x.iloc[:,1:].apply(lambda y: y.nunique(dropna=False))))
         column2  column3  column4
column1                           
name           2        1        1
name1          2        1        1

print (df.groupby('column1').apply(lambda x: x.iloc[:,1:].apply(lambda y: y.nunique())))
         column2  column3  column4
column1                           
name           1        1        0
name1          1        1        1
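A shorter alternative without reshaping (a sketch, not from the original answer) is to forward dropna=False through agg with a lambda; df is assumed to be the sample frame from the question:

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'column1': ['name', 'name', 'name1', 'name1'],
    'column2': [True, np.nan, np.nan, True],
    'column3': [True, True, True, True],
    'column4': [np.nan, np.nan, True, True],
})

# The lambda counts NaN as its own value per column, matching
# the nunique(dropna=False) output above
res = df.groupby('column1').agg(lambda s: s.nunique(dropna=False))
print(res)
```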

6 Comments

This only gives me the count of the first column. I need unique counts of all columns.
Sorry, I misunderstood it. But now I think it is correct - I added a groupby over both levels of the Series index.
Yeah, with the correction it works, only I need to unstack it to get a proper DataFrame.
I just found out that the counts are not correct, because my data only has the values True or NaN. How is it possible that the counts are wrong with your method?
You need to add only the parameter dropna=False to nunique - print (df1.groupby(level=[0,1]).nunique(dropna=False).unstack())
