10

I have a large dataframe (from 500k to 1M rows) which contains for example these 3 numeric columns: ID, A, B

I want to filter the results in order to obtain a table like the one in the image below, where, for each unique value of column id, i have the maximum and minimum value of A and B. How can i do?

EDIT: i have updated the image below in order to be more clear: when i get the max or min from a column i need to get also the data associated to it of the others columns

enter image description here

2
  • what do you do if several rows have the same min or max value/ Commented Nov 13, 2016 at 0:27
  • good question by @Boud. It didn't matter in my first answer (which interpreted the question incorrectly). My current answer is using idxmax/min which will take the first row in the event of ties. I think this becomes a much harder question without some assumption like that... Commented Nov 13, 2016 at 17:53

2 Answers 2

9

Sample data (note that you posted an image which can't be used by potential answerers without retyping, so I'm making a simple example in its place):

df=pd.DataFrame({ 'id':[1,1,1,1,2,2,2,2],
                   'a':range(8), 'b':range(8,0,-1) })

The key to this is just using idxmax and idxmin and then futzing with the indexes so that you can merge things in a readable way. Here's the whole answer and you may wish to examine intermediate dataframes to see how this is working.

df_max = df.groupby('id').idxmax()
df_max['type'] = 'max'
df_min = df.groupby('id').idxmin()
df_min['type'] = 'min'

df2 = df_max.append(df_min).set_index('type',append=True).stack().rename('index')

df3 = pd.concat([ df2.reset_index().drop('id',axis=1).set_index('index'), 
                  df.loc[df2.values] ], axis=1 )

df3.set_index(['id','level_2','type']).sort_index()

                 a  b
id level_2 type      
1  a       max   3  5
           min   0  8
   b       max   0  8
           min   3  5
2  a       max   7  1
           min   4  4
   b       max   4  4
           min   7  1

Note in particular that df2 looks like this:

id  type   
1   max   a    3
          b    0
2   max   a    7
          b    4
1   min   a    0
          b    3
2   min   a    4
          b    7

The last column there holds the index values in df that were derived with idxmax & idxmin. So basically all the information you need is in df2. The rest of it is just a matter of merging back with df and making it more readable.

Sign up to request clarification or add additional context in comments.

1 Comment

many thanks for your response. This is almost what i need. I have updated the question in order to be more clear
4

For anyone looking to get min and max values of a specific column where there is a unique ID, this is how I modified the above code:

df_maxA = df.groupby('id').max()['A']
df_maxA['type'] = 'max'
df_minA = df.groupby('id').max()['A']
df_minA['type'] = 'min'

df_maxB = df.groupby('id').max()['B']
df_maxB['type'] = 'max'
df_minB = df.groupby('id').max()['B']
df_minB['type'] = 'min'

Then you can merge these together to create a single dataframe.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.