I'm using pandas to count unique combinations of sets of variables in a dataframe. I'm currently using the .groupby() function, but I think I'm missing part of its functionality.

Example code:

import pandas as pd
df = pd.DataFrame([['A','C','G'],
                   ['A','C','H'],
                   ['A','D','G'],
                   ['A','D','H'],
                   ['B','E','I'],
                   ['B','F','I']], columns=['a','b','c'])
df

   a  b  c
0  A  C  G
1  A  C  H
2  A  D  G
3  A  D  H
4  B  E  I
5  B  F  I

Say I want to know, for every unique value of a, how many different b values it has. In this example, the desired output is A: 2, B: 2, because A has two unique b values and B has two unique b values.

If I were counting the unique c's per a, I would expect A: 2, B: 1.

My current code is:

df.groupby(['a','b'],as_index=False).count().groupby(['a'], as_index=False).count()[['a','b']]

   a  b
0  A  2
1  B  2

df.groupby(['a','c'], as_index=False).count().groupby(['a'],as_index=False).count()[['a','c']]

   a  c
0  A  2
1  B  1

This gives me the correct result, but I think there should be a way to avoid two sets of groupby() and count(), no?

1 Answer

How about nunique?

df.groupby('a')['b'].nunique()
Out[36]: 
a
A    2
B    2
Name: b, dtype: int64
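
If you need the counts for both b and c in one pass, selecting a list of columns before calling nunique should work too. This is only a sketch built on the example frame from the question; the variable names b_per_a and counts are mine, not part of pandas.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame([['A', 'C', 'G'],
                   ['A', 'C', 'H'],
                   ['A', 'D', 'G'],
                   ['A', 'D', 'H'],
                   ['B', 'E', 'I'],
                   ['B', 'F', 'I']], columns=['a', 'b', 'c'])

# Distinct b values per a, returned as a Series indexed by a
b_per_a = df.groupby('a')['b'].nunique()

# Distinct b and c values per a, returned as a DataFrame indexed by a
counts = df.groupby('a')[['b', 'c']].nunique()
print(counts)
```

For this data, A has 2 distinct values in both b and c, while B has 2 in b and 1 in c, which matches what the two chained groupby/count calls in the question produce. Call .reset_index() on either result if you want a as a regular column, as in the as_index=False output.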

1 Comment

Great suggestion, this is exactly what I was looking for!
