Groupby multiple columns to find the unique count of one column using Python pandas

Question

I have a dataframe like:

column1    column2    column3
 ram        tall        good
 rohan      short       fine
 ajay       tall        best
 alia       tall        good
 aman       medium      fine
 john       short       good
 jack       short       fine

Now I need output like:

Unique count of good in tall, short, medium on the basis of column1:

tall=2 , short=1 , medium=0

Unique count of fine in tall, short, medium on the basis of column1:

tall=0 , short=2 , medium=1

Unique count of best in tall, short, medium on the basis of column1:

tall=1 , short=0 , medium=0

I am a beginner in pandas. Thanks in advance.

Shubham Sharma · Accepted Answer · 2020-12-26 09:36:58Z

5

Let's try pd.crosstab:

pd.crosstab(df['column3'], df['column2'])

column2  medium  short  tall
column3                     
best          0      0     1
fine          1      2     0
good          0      1     2
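A self-contained sketch of the crosstab approach, rebuilding the sample frame from the question so it can be run as-is. The `reindex` step is an addition (not part of the answer above) that puts the columns in the tall, short, medium order the question uses. Note that `pd.crosstab` counts rows; in this sample that equals the number of distinct column1 names because every name appears once.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "column1": ["ram", "rohan", "ajay", "alia", "aman", "john", "jack"],
    "column2": ["tall", "short", "tall", "tall", "medium", "short", "short"],
    "column3": ["good", "fine", "best", "good", "fine", "good", "fine"],
})

# Rows are column3 values, columns are column2 values, cells are row counts
table = pd.crosstab(df["column3"], df["column2"])

# Optional: reorder the columns to match the order used in the question
table = table.reindex(columns=["tall", "short", "medium"], fill_value=0)
print(table)
```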

answered Dec 26, 2020 at 9:36

Shubham Sharma

71.8k · 6 gold badges · 26 silver badges · 58 bronze badges


Dani Mesejo · Accepted Answer · 2020-12-26 09:41:54Z

1

Use value_counts + unstack:

res = df[['column3', 'column2']].value_counts().unstack('column2', fill_value=0)
print(res)

Output

column2  medium  short  tall
column3                     
best          0      0     1
fine          1      2     0
good          0      1     2

As an alternative, use groupby + unstack:

res = df.groupby(['column3', 'column2']).count().unstack('column2', fill_value=0)
print(res)

Output (groupby)

        column1           
column2  medium short tall
column3                   
best          0     0    1
fine          1     2    0
good          0     1    2
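The extra `column1` level in the groupby output appears because `.count()` counts non-null values per remaining column. A variant sketch (swapping in `.size()`, which is not what the answer above uses) counts rows per group and returns a Series, so unstacking it gives a single-level column header. The two agree here because column1 has no missing values.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "column1": ["ram", "rohan", "ajay", "alia", "aman", "john", "jack"],
    "column2": ["tall", "short", "tall", "tall", "medium", "short", "short"],
    "column3": ["good", "fine", "best", "good", "fine", "good", "fine"],
})

# size() counts rows in each (column3, column2) group and returns a Series,
# so unstacking column2 yields plain columns without a "column1" level
res = df.groupby(["column3", "column2"]).size().unstack("column2", fill_value=0)
print(res)
```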

The idea behind both approaches is to build counts indexed by (column3, column2) pairs and then unstack the column2 level into columns. If you want to match the same order as specified in your question, convert column2 to a Categorical first:

df['column2'] = pd.Categorical(df['column2'], categories=['tall', 'short', 'medium'], ordered=True)
res = df[['column3', 'column2']].value_counts().unstack('column2', fill_value=0)
print(res)

Output

column2  tall  short  medium
column3                     
best        1      0       0
fine        0      2       1
good        2      1       0
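If you need the exact text layout from the question rather than a table, one possible sketch is to read the counts back out of the table and format them. The `order` list and the `lines` output are illustrative choices, not part of either answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "column1": ["ram", "rohan", "ajay", "alia", "aman", "john", "jack"],
    "column2": ["tall", "short", "tall", "tall", "medium", "short", "short"],
    "column3": ["good", "fine", "best", "good", "fine", "good", "fine"],
})

order = ["tall", "short", "medium"]
table = pd.crosstab(df["column3"], df["column2"]).reindex(columns=order, fill_value=0)

# Build one line per column3 value, e.g. "good: tall=2 , short=1 , medium=0"
lines = []
for quality in ["good", "fine", "best"]:
    counts = " , ".join(f"{height}={table.loc[quality, height]}" for height in order)
    lines.append(f"{quality}: {counts}")
print("\n".join(lines))
```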

edited Dec 26, 2020 at 9:41

answered Dec 26, 2020 at 9:31

Dani Mesejo

62.2k · 6 gold badges · 57 silver badges · 86 bronze badges

3 Comments

ash Over a year ago

Thanks Dani, one small doubt: if I have the same row multiple times in the dataframe, can I replace .value_counts() with nunique() to find the count?

Dani Mesejo Over a year ago

If you want to count the duplicates, I would say use .value_counts. Notice that the [fine, short] pair already appears 2 times.

ash Over a year ago

Yes, correct, but for [fine, short] the column1 name is different, so they will be counted as different rows. Okay, got it, thanks.
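To make the distinction in this thread concrete, here is a sketch comparing row counts with distinct-name counts. It adds one hypothetical exact duplicate row ("ram", "tall", "good") to the question's data. Counting rows (as `value_counts`, `crosstab` and `size` do) counts the duplicate twice, while `nunique` on column1 per group counts each name once, which is one reading of "unique count on the basis of column1". Dropping duplicates with `df.drop_duplicates()` before counting rows would be another option.

```python
import pandas as pd

# Question's data plus one hypothetical duplicate row ("ram", "tall", "good")
df = pd.DataFrame({
    "column1": ["ram", "rohan", "ajay", "alia", "aman", "john", "jack", "ram"],
    "column2": ["tall", "short", "tall", "tall", "medium", "short", "short", "tall"],
    "column3": ["good", "fine", "best", "good", "fine", "good", "fine", "good"],
})

# Row counts: the duplicated ram row is counted twice
rows = df.groupby(["column3", "column2"]).size().unstack("column2", fill_value=0)

# Distinct column1 names per group: the duplicate is counted once
names = (
    df.groupby(["column3", "column2"])["column1"]
    .nunique()
    .unstack("column2", fill_value=0)
)
print(rows)
print(names)
```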
