Pandas: how to do value counts within groups

Question

I have the following dataframe. I want to group by a and b first. Within each group, I need to count the values of c and keep only the one with the highest count. If several c values tie for the highest count in a group, any one of them can be picked.

a    b    c
1    1    x
1    1    y
1    1    y
1    2    y
1    2    y
1    2    z
2    1    z
2    1    z
2    1    a
2    1    a

The expected result would be

a    b    c
1    1    y
1    2    y
2    1    z

What is the right way to do it? It would be even better if I could print out each group with c's value counts sorted as an intermediate step.

@anky there are lots of duplicates in the data, not just for a and b but for a, b and c too. Part of the reason for doing this is to remove most of the duplicates.
– ddd, Commented Apr 10, 2020 at 16:04
I get your point, but your question says to pick any one when several c values tie for the most counts. The a=2, b=1 group has both z and a appearing twice, so shouldn't just one of them be taken in the output?
– anky, Commented Apr 10, 2020 at 16:06
What exactly is the issue? Have you tried anything or done any research?
– AMC, Commented Apr 10, 2020 at 17:19

gosuto · Accepted Answer · 2020-04-10 16:15:24Z

8

You are looking for .value_counts():

df.groupby(['a', 'b'])['c'].value_counts()

a  b  c
1  1  y    2
      x    1
   2  y    2
      z    1
2  1  a    2
      z    2
Name: c, dtype: int64
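To get from these counts to one row per group, one option is to keep the first row of each (a, b) group, since value_counts() sorts each group by count in descending order. This is a minimal sketch rather than part of the original answer; the column name "n" is an arbitrary placeholder, and which value wins a tie (a or z for a=2, b=1) is not guaranteed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

# Intermediate step: counts of c within each (a, b) group, largest first.
counts = df.groupby(["a", "b"])["c"].value_counts()

# Keep the first (most frequent) row of each group, then drop the counts.
top = (
    counts.groupby(level=["a", "b"])
    .head(1)
    .reset_index(name="n")  # "n" is an arbitrary name for the count column
    .drop(columns="n")
)
print(top)
```

Printing counts before the final step shows the sorted value counts per group that the question asks for.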


J. Doe · Accepted Answer · 2020-04-10 16:18:38Z

3

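Group the original dataframe by ['a', 'b'] and take the .max() of c. Note that .max() returns the largest c value in each group (the alphabetically last one for strings), which is not always the most frequent one: for a=1, b=2 it returns z, not y.

df.groupby(['a', 'b'])['c'].max()

You can also aggregate the 'max' and 'count' values together. Passing a list of function names works across pandas versions, whereas the dict-renaming form raises an error on a grouped Series in pandas 1.0 and later:

df.groupby(['a', 'b'])['c'].agg(['max', 'count']).reset_index()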
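If the goal is the most frequent value rather than the largest one, a possible variation is to aggregate with Series.mode() instead of max. This is a sketch under that assumption, not the answer's own method; the helper name most_frequent is made up for illustration, and ties are resolved by taking the first of the returned modes.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})


def most_frequent(values: pd.Series):
    """Return one of the most frequent values (hypothetical helper)."""
    # mode() can return several values on a tie; take the first one.
    return values.mode().iloc[0]


# One row per (a, b) group holding a most frequent c value.
result = df.groupby(["a", "b"])["c"].agg(most_frequent).reset_index()

# For comparison: max() picks the largest value, not the most frequent.
by_max = df.groupby(["a", "b"])["c"].max()

print(result)
print(by_max)
```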

edited Apr 10, 2020 at 16:18

answered Apr 10, 2020 at 15:51


1 Comment

ddd Over a year ago

I know this is the final result I wanted but is there a way to sort c's occurrences within a group first?
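One way to see c's occurrences sorted within each group is to loop over the groups and call value_counts() on each one. This is only a sketch of that idea; the winners dictionary is an illustrative name, not something from the thread.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1, 1],
    "c": ["x", "y", "y", "y", "y", "z", "z", "z", "a", "a"],
})

winners = {}  # illustrative: maps (a, b) to a most frequent c value
for (a, b), group in df.groupby(["a", "b"]):
    # value_counts() sorts by count, largest first.
    vc = group["c"].value_counts()
    print(f"a={a}, b={b}")
    print(vc.to_string())
    # The first index entry is a most frequent value (ties are arbitrary).
    winners[(a, b)] = vc.index[0]

print(winners)
```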

Georgina Skibinski · Accepted Answer · 2020-04-10 18:26:46Z

1

Try:

df = (
    df.groupby(["a", "b", "c"])["c"]
    .count()
    .sort_values(ascending=False)
    .reset_index(name="dropme")
    .drop_duplicates(subset=["a", "b"], keep="first")
    .drop("dropme", axis=1)
)


