PANDAS Group By with Multiple Functions Applied

Question

I have a pandas DataFrame with the following structure, for example:

id,sex,age,rank,skill
1,M,9,1,A
1,M,8,2,G
1,M,10,3,F
2,F,10,3,M
2,F,8,4,W
2,F,6,4,O
3,M,5,1,Q
3,M,4,3,N
3,M,9,4,Y

My desired output after the groupby/apply operation on the dataframe is:

id,sex,age,rank,skill
1,M,8,1,A
2,F,6,3,M
3,M,4,1,Q

In other words, I want to group by the id field: the sex field does not change, age takes its min(), rank takes its min(), and skill takes the value that was present at the min() of rank.

I understand that multiple agg functions can be passed to the groupby in a dict, but I do not understand how to handle values that are constant, or that depend on the result of a function applied to another field of the groupby.

BENY · Accepted Answer · 2017-08-16 21:22:18Z

3

In your expected output it is the min of rank, but in your explanation you mentioned the max.

My answer is based on your expected output:

df.groupby(['id','sex'],as_index=False).agg({'age':'min','rank':'min'}).\
merge(df.drop(columns='age'),on=['id','sex','rank'],how='left')

Out[931]: 
   id sex  age  rank skill
0   1   M    8     1     A
1   2   F    6     3     M
2   3   M    4     1     Q
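
For readers who want to run this end to end, here is a minimal self-contained sketch of the same aggregate-then-merge idea. The DataFrame is rebuilt from the sample in the question; the variable names `mins` and `result` are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "sex":   ["M", "M", "M", "F", "F", "F", "M", "M", "M"],
    "age":   [9, 8, 10, 10, 8, 6, 5, 4, 9],
    "rank":  [1, 2, 3, 3, 4, 4, 1, 3, 4],
    "skill": ["A", "G", "F", "M", "W", "O", "Q", "N", "Y"],
})

# Step 1: per-group minimum of age and rank (sex is constant within each id).
mins = df.groupby(["id", "sex"], as_index=False).agg({"age": "min", "rank": "min"})

# Step 2: merge back on rank to pick up the skill found at the minimum rank.
# The original age column is dropped so only the aggregated age survives.
result = mins.merge(df.drop(columns="age"), on=["id", "sex", "rank"], how="left")
print(result)
```

The `columns=` keyword is used because recent pandas versions no longer accept the axis as a second positional argument to `drop`.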


2 Comments

Pylander Over a year ago

This is definitely the best approach. I find kbball's restructured version a little easier to follow.

BENY Over a year ago

@Pylander Glad it helped. Step by step is always easier to follow; I will take that into consideration in my future posts. Thank you, and have a nice day.

user2285236 · Answer · 2017-08-16 21:21:16Z

1

For columns that have constant values, you have several options: first, last, etc. For the skill value that corresponds to the highest (or minimum in your example) rank value, you need to use idxmin. For idxmin to work, skill should be the index, so as a first step set it as the index.

df.set_index('skill').groupby('id').agg({'sex': 'first', 
                                         'age': 'min', 
                                         'rank': ['min', 'idxmin']})
Out: 
     sex age rank       
   first min  min idxmin
id                      
1      M   8    1      A
2      F   6    3      M
3      M   4    1      Q
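
If flat column names are preferred over the two-level header above, one possible variation is pandas named aggregation (available in pandas 0.25 and later). This is a sketch of an alternative spelling, not part of the original answer; the output names passed as keywords and the variable `flat` are chosen here for illustration, and the sample data is rebuilt from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "sex":   ["M", "M", "M", "F", "F", "F", "M", "M", "M"],
    "age":   [9, 8, 10, 10, 8, 6, 5, 4, 9],
    "rank":  [1, 2, 3, 3, 4, 4, 1, 3, 4],
    "skill": ["A", "G", "F", "M", "W", "O", "Q", "N", "Y"],
})

flat = (
    df.set_index("skill")          # idxmin returns index labels, so skill must be the index
      .groupby("id")
      .agg(
          sex=("sex", "first"),     # constant within each id
          age=("age", "min"),
          rank=("rank", "min"),
          skill=("rank", "idxmin"), # index label (skill) at the minimum rank
      )
      .reset_index()
)
print(flat)
```

Each keyword names one output column and pairs it with a (source column, function) tuple, so no renaming of a MultiIndex is needed afterwards. Like the original, this assumes the skill values are unique enough to serve as index labels.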


3 Comments

Pylander Over a year ago

Thanks for explaining how to integrate idxmin. I also corrected the rank value to min(); that was my confusion. One additional question: is it simple to rename the idxmin output to the original field name? I am likely to have many columns that need to be set this way in the real-world example.

user2285236 Over a year ago

@Pylander Since the function is called on the rank column, the name will be associated with that. It is really a pain to rename MultiIndexes, though. I'd construct a flat columns list from scratch myself. Maybe Wen's approach would be more suitable?

Pylander Over a year ago

Yes, I ended up seeing the limitations of this approach unfortunately. Very clean format though which I like. Wen's answer reformatted by kbball will work best in the end.

kjmerf · Answer · 2017-08-16 22:03:57Z

1

+1 for Wen.

Mine has a few more steps but it's the same idea and perhaps easier to read if you're not following:

func = {'sex': 'min', 'age': 'min', 'rank': 'min'}

df_agg = df.groupby('id').agg(func)
df_agg = df_agg.reset_index()

df = df.drop(columns='age')
df = pd.merge(df_agg, df, on = ['id', 'sex', 'rank'])

Set the aggregations you want to apply to each column. Then group by id, using agg. You need to reset the index at this point or else you won't be able to perform the merge in the next step, as id will be treated as the index.

df still stores your original DataFrame. Drop age from df, as you'll only need the minimized age, stored in df_agg. Then perform the merge on the columns you'd expect to match: id, sex and rank. You are merging on rank to pull the correct skill along for the ride.
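
One caveat with merging on rank: if two rows in the same id share the minimum rank, the merge returns both of them. The sketch below illustrates this with a hypothetical variant of the question's data (the rank of skill W is changed from 4 to 3 to force a tie) and shows one possible way to keep exactly one row per id using idxmin on the default integer index. The names `df_tied`, `merged`, `best_rows`, `min_age` and `one_per_id` are illustrative only.

```python
import pandas as pd

# Hypothetical variant of the sample data: skill W now ties with M at rank 3 for id 2.
df_tied = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "sex":   ["M", "M", "M", "F", "F", "F", "M", "M", "M"],
    "age":   [9, 8, 10, 10, 8, 6, 5, 4, 9],
    "rank":  [1, 2, 3, 3, 3, 4, 1, 3, 4],
    "skill": ["A", "G", "F", "M", "W", "O", "Q", "N", "Y"],
})

# Aggregate-then-merge: the tie produces two rows for id 2.
df_agg = df_tied.groupby("id").agg({"sex": "min", "age": "min", "rank": "min"}).reset_index()
merged = pd.merge(df_agg, df_tied.drop(columns="age"), on=["id", "sex", "rank"])
print(len(merged))

# Alternative: idxmin gives the label of the first minimum-rank row per id,
# so selecting those labels keeps exactly one row per group.
best_rows = df_tied.loc[df_tied.groupby("id")["rank"].idxmin(), ["id", "sex", "rank", "skill"]]
min_age = df_tied.groupby("id", as_index=False)["age"].min()
one_per_id = best_rows.merge(min_age, on="id")[["id", "sex", "age", "rank", "skill"]]
print(one_per_id)
```

Which behaviour is wanted depends on the data: keeping all tied rows may be correct in some cases, while the idxmin variant silently picks the first one.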


1 Comment

Pylander Over a year ago

I have to give the credit to Wen, but I am using your adapted solution in the end. Thanks!
