I am trying to calculate a new column that contains the maximum value within each of several groups. I'm coming from a Stata background, so I know the Stata code would be something like this:

by group, sort: egen max = max(odds) 

For example:

data = {'group' : ['A', 'A', 'B','B'],
    'odds' : [85, 75, 60, 65]}

Then I would like it to look like:

    group    odds    max
     A        85      85
     A        75      85
     B        60      65
     B        65      65

Eventually I am trying to form a column that takes 1/(max-min) * odds where max and min are for each group.

3 Answers

Use groupby + transform:

df['max'] = df.groupby('group')['odds'].transform('max')

This is equivalent to the more verbose:

maxima = df.groupby('group')['odds'].max()
df['max'] = df['group'].map(maxima)

The transform method broadcasts each group's result back onto the rows of that group, so the output has the same index as the original DataFrame and no explicit mapping is required.
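For the question's eventual goal of 1/(max-min) * odds, the same pattern can be applied twice. The following is a minimal sketch using the sample data from the question; the column names `min` and `scaled` are my own choice for illustration and do not come from the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'group': ['A', 'A', 'B', 'B'],
                   'odds': [85, 75, 60, 65]})

# Reuse one groupby object for both per-group statistics
grouped = df.groupby('group')['odds']
df['max'] = grouped.transform('max')
df['min'] = grouped.transform('min')

# 1/(max-min) * odds, computed row by row with each group's own max and min
# ('scaled' is a hypothetical column name)
df['scaled'] = df['odds'] / (df['max'] - df['min'])

print(df)
```

Note that a group whose values are all identical has max - min equal to 0, so the division yields inf (or NaN for 0/0) for those rows; you may want to handle that case explicitly.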

Using the approach from jpp above works, but it also gives a "SettingWithCopyWarning". While this may not be an issue, I believe the code below would remove that warning:

df = df.assign(max = df.groupby('group')['odds'].transform('max')).values

2 Comments

Be careful, you are assigning a NumPy array (values attribute of a dataframe) to df. I don't think that's what you want.
I had to convert the NumPy Array to DF again. Still, this was a much faster solution.
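Following up on the comments above: dropping the trailing `.values` appears to avoid the round trip through NumPy altogether, since `assign` already returns a new DataFrame. This is a sketch on the question's sample data, not a claim about what the answer's author intended; the name `result` is only for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'group': ['A', 'A', 'B', 'B'],
                   'odds': [85, 75, 60, 65]})

# assign returns a new DataFrame with the extra column;
# without .values the result stays a DataFrame
result = df.assign(max=df.groupby('group')['odds'].transform('max'))

print(type(result))
print(result)
```

If the SettingWithCopyWarning arises because `df` was itself sliced from another DataFrame, taking an explicit `.copy()` of that slice before adding the column is another common remedy.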
df['max'] = df.group_col.map(lambda x: df.groupby('group_col').odds.max()[x])

4 Comments

It will be better if you can explain a bit your answer. Only code is not acceptable on SO.
The lambda function does a groupby on group_col and returns the maximum values of the odds column in each group. The indices of these returned values are the name of the group they belong to. So for each element in group_col, we map the appropriate maximum value by doing (lambda x (the group name): groupby_returns_max_values [x]).
A lambda function isn't necessary here; you could use a series mapping directly. But, better, use groupby + transform (as shown in another answer).
The answer from @jpp is much faster for large dataframes
