I have this simple dataframe df:

User,C,G
111,ar,1
112,es,1
112,es,1
112,es,2
113,ca,2
113,ca,3
113,ca,3
114,en,4

I grouped it like this:

result = df.groupby(['User','G'])['C'].value_counts()

obtaining:

User  G    
111   1  ar    1
112   1  es    2
      2  es    1
113   2  ca    1
      3  ca    2
114   4  en    1

My goal is to keep, for each User, only the row with the maximum value_counts result, so that the resulting Series looks like this:

User  G    
111   1  ar    1
112   1  es    2
113   3  ca    2
114   4  en    1

I also found this question about a similar issue, but I can't figure out how to apply that method in my case.

2 Answers

You can first create a boolean mask that indicates whether a particular row holds the maximum count within its User group.

# group_keys=False keeps the original index, so the mask aligns with result
mask = result.groupby(level='User', group_keys=False).apply(lambda g: g == g.max())
mask

User  G    
111   1  ar     True
112   1  es     True
      2  es    False
113   2  ca    False
      3  ca     True
114   4  en     True
dtype: bool

Then select the rows using this boolean mask:

result[mask]

User  G    
111   1  ar    1
112   1  es    2
113   3  ca    2
114   4  en    1
dtype: int64
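
The same mask idea can also be sketched without apply. This is only an illustration, assuming the question's data is rebuilt inline as below; the names max_per_user and filtered are made up for the example. transform('max') broadcasts each user's largest count back onto that user's rows, so the comparison yields a mask with the same index as result.

```python
import pandas as pd

# The question's data, rebuilt inline so the sketch runs on its own.
df = pd.DataFrame({
    'User': [111, 112, 112, 112, 113, 113, 113, 114],
    'C': ['ar', 'es', 'es', 'es', 'ca', 'ca', 'ca', 'en'],
    'G': [1, 1, 1, 2, 2, 3, 3, 4],
})

result = df.groupby(['User', 'G'])['C'].value_counts()

# Broadcast each user's largest count back onto every row of that user.
max_per_user = result.groupby(level='User').transform('max')

# True where a row's count equals its user's maximum (tied rows are all kept).
mask = result == max_per_user
filtered = result[mask]
print(filtered)
```

Note that this keeps every row that ties for the maximum, which may or may not be what you want.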

1 Comment

What about multiple equal occurrences? Is there a way to select only a single maximum value? I mean, sometimes the maximum value is not unique...

Years later:

import pandas as pd

data_rows = [
    {'User': 111, 'C': 'ar', 'G': 1},
    {'User': 112, 'C': 'es', 'G': 1},
    {'User': 112, 'C': 'es', 'G': 1}, 
    {'User': 113, 'C': 'es', 'G': 2},
    {'User': 113, 'C': 'ca', 'G': 2},
    {'User': 113, 'C': 'ca', 'G': 3},
    {'User': 113, 'C': 'ca', 'G': 3}, 
    {'User': 114, 'C': 'en', 'G': 4},
]
df = pd.DataFrame(data_rows)

'''
   User   C  G
0   111  ar  1
1   112  es  1
2   112  es  1
3   113  es  2
4   113  ca  2
5   113  ca  3
6   113  ca  3
7   114  en  4
'''

res = df.groupby(['User','G'])['C'].value_counts() 
'''
User  G  C 
111   1  ar    1
112   1  es    2
113   2  ca    1
         es    1
      3  ca    2
114   4  en    1
Name: count, dtype: int64
'''

res = res.loc[res.groupby('User').idxmax()]
print(res)
'''
User  G  C 
111   1  ar    1
112   1  es    2
113   3  ca    2
114   4  en    1
Name: count, dtype: int64
'''
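
On the tie question raised in the comment above: idxmax returns the label of the first maximum in each group, so selecting by it keeps exactly one row per user even when the maximum count is not unique. Here is a small sketch with hypothetical data in which user 112 has two tied combinations; the names counts and first_max are made up for the example.

```python
import pandas as pd

# Hypothetical data: user 112 has two (G, C) combinations tied at a count of 2.
df = pd.DataFrame({
    'User': [111, 112, 112, 112, 112, 113],
    'C': ['ar', 'es', 'es', 'es', 'es', 'ca'],
    'G': [1, 1, 1, 2, 2, 3],
})

counts = df.groupby(['User', 'G'])['C'].value_counts()

# idxmax gives the index label (a tuple) of the first maximum per user,
# so only one row per user survives even when counts are tied.
labels = counts.groupby(level='User').idxmax().tolist()
first_max = counts.loc[labels]
print(first_max)
```

Which of the tied rows counts as "first" depends on the order of the grouped result (here, the lower G), so sort beforehand if you need a different tie-break.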
