Python Pandas max value in a group as a new column

Question

I am trying to calculate a new column that contains the maximum value for each of several groups. I come from a Stata background, where the code would be something like this:

by group, sort: egen max = max(odds)

For example:

import pandas as pd
data = {'group': ['A', 'A', 'B', 'B'],
        'odds': [85, 75, 60, 65]}
df = pd.DataFrame(data)

Then I would like it to look like:

    group    odds    max
     A        85      85
     A        75      85
     B        60      65
     B        65      65

Eventually, I want to form a column that computes 1/(max-min) * odds, where max and min are taken within each group.

jpp · Accepted Answer · 2019-01-09 19:47:15Z

55

Use groupby + transform:

df['max'] = df.groupby('group')['odds'].transform('max')

This is equivalent to the more verbose:

maxima = df.groupby('group')['odds'].max()
df['max'] = df['group'].map(maxima)

The transform method returns a result aligned with the original DataFrame's index, with each group's maximum repeated for every row in that group, so no explicit mapping is required.
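For the eventual goal stated in the question, 1/(max-min) * odds per group, the same pattern can be applied twice. The following is only a minimal sketch, assuming the sample data from the question; the column names `min` and `scaled` are illustrative and do not come from the original post.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'group': ['A', 'A', 'B', 'B'],
                   'odds': [85, 75, 60, 65]})

# Reuse one groupby object for both aggregations
grouped = df.groupby('group')['odds']
df['max'] = grouped.transform('max')
df['min'] = grouped.transform('min')

# 1/(max-min) * odds, using each row's own group max and min
# ('scaled' is a hypothetical column name)
df['scaled'] = df['odds'] / (df['max'] - df['min'])
```

Note that a group whose values are all identical has max - min equal to 0, so the division would need separate handling in that case.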


BPC · Accepted Answer · 2020-01-10 19:16:17Z

4

Using the approach from jpp above works, but it also gives a "SettingWithCopyWarning". While this may not be an issue, I believe the code below would remove that warning:

df = df.assign(max = df.groupby('group')['odds'].transform('max')).values


2 Comments

jpp Over a year ago

Be careful, you are assigning a NumPy array (values attribute of a dataframe) to df. I don't think that's what you want.

Jinto Lonappan Over a year ago

I had to convert the NumPy Array to DF again. Still, this was a much faster solution.
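Following jpp's comment, dropping the trailing `.values` should keep the result as a DataFrame, so no conversion back is needed. This is a sketch under an assumption: the original post does not say how `df` was created, so here it is taken to be a filtered subset of a larger frame (a common source of SettingWithCopyWarning). The names `raw` and the extra group `'C'` are hypothetical.

```python
import pandas as pd

# Hypothetical larger frame; df is a filtered subset of it
raw = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'C'],
                    'odds': [85, 75, 60, 65, 50]})

# .copy() makes df an independent DataFrame rather than a slice of raw
df = raw[raw['group'] != 'C'].copy()

# assign returns a new DataFrame; without .values the result stays a DataFrame
df = df.assign(max=df.groupby('group')['odds'].transform('max'))
```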

toniitony · Accepted Answer · 2017-05-12 04:38:06Z

0

df['max'] = df.group_col.map(lambda x: df.groupby('group_col').odds.max()[x])


4 Comments

Adnan Umer Over a year ago

It would be better if you could explain your answer a bit. Code-only answers are not acceptable on SO.

toniitony Over a year ago

The lambda function does a groupby on group_col and returns the maximum values of the odds column in each group. The indices of these returned values are the name of the group they belong to. So for each element in group_col, we map the appropriate maximum value by doing (lambda x (the group name): groupby_returns_max_values [x]).

jpp Over a year ago

A lambda function isn't necessary here; you could use a series mapping directly. But, better, use groupby + transform (as shown in another answer).

nick Over a year ago

The answer from @jpp is much faster for large dataframes
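The speed difference mentioned in these comments likely comes from the lambda re-running the groupby once per row. The sketch below is one rough way to check that the three approaches agree and to time them yourself. The data is synthetic, the column is named `group` rather than `group_col`, and the sizes are arbitrary; all of these are assumptions for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (assumption): 500 rows spread over 20 groups
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({'group': rng.integers(0, 20, size=n).astype(str),
                   'odds': rng.integers(1, 100, size=n)})

def with_lambda():
    # Recomputes the groupby maximum for every row
    return df['group'].map(lambda x: df.groupby('group')['odds'].max()[x])

def with_map():
    # Computes the per-group maxima once, then maps them onto the rows
    return df['group'].map(df.groupby('group')['odds'].max())

def with_transform():
    # Lets pandas broadcast the per-group maxima back to the rows
    return df.groupby('group')['odds'].transform('max')

# Print wall-clock time for a few runs of each approach
for fn in (with_lambda, with_map, with_transform):
    print(fn.__name__, timeit.timeit(fn, number=3))
```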
