Get index of the minimum of multi-index Pandas DataFrame using level

Question

I have a Pandas DataFrame that is multiindexed and want to find the minimum value of a certain column in a subset of rows on each level, and get the entire contents of those rows.

import pandas as pd

idx = pd.MultiIndex.from_product([['v1', 'v2'],
                                  ['record' + str(i) for i in range(1, 7)]],
                                 names=['level1', 'level2'])

df = pd.DataFrame([[2., 114], [2., 1140],
                   [3., 114], [3., 1140],
                   [5., 114], [5., 1140],
                   [2., 114], [2., 1140],
                   [3., 114], [3., 1140],
                   [5., 114], [5., 1140]],
                  columns=['col1', 'col2'],
                  index=idx)

My structure:

                 col1  col2
level1 level2
v1     record1    2.0   114
       record2    2.0  1140
       record3    3.0   114
       record4    3.0  1140
       record5    5.0   114
       record6    5.0  1140
v2     record1    2.0   114
       record2    2.0  1140
       record3    3.0   114
       record4    3.0  1140
       record5    5.0   114
       record6    5.0  1140

Example of the desired output: for each value of level1, I want the row with the minimum col2 among the rows where col1 == 5:

                 col1  col2
level1 level2
v1     record5    5.0   114
v2     record5    5.0   114

I know that I can get a subset of rows by using a comparison statement.

df.loc[df['col1'] == 5]

And I also know that I can get the minimum values of a column within that subset from all levels.

df['col2'][df['col1'] == 5].groupby(level='level1').min()

And if I specify a single value of the first level, I can get the matching row for that one group.

df.loc[('v1', df.loc['v1']['col2'][df.loc['v1']['col1'] == 5].idxmin())]

But I cannot figure out whether there is an efficient way to get the indexes for all values of the first level at once.

There does not seem to be a method available along the lines of this:

df['col2'][df['col1'] == 5].idxmin(level='level1')

I can get to what I want with this:

df.loc[
  (df['col1'] == 5) &
  (df['col2'].isin(df['col2'][df['col1'] == 5].groupby(level='level1').min().values))
]

But with everything else that is in Pandas, is there a better way to get to my output?

It is at the top, but I bolded it and added some larger line breaks to make it clearer.
– getglad, Commented Jun 16, 2016 at 18:00

piRSquared · Accepted Answer · 2016-06-16 18:29:42Z

6

This should work:

df.loc[df.loc[df.col1 == 5.].groupby(level=0).col2.idxmin()]

            col1  col2
v1 record5   5.0   114
v2 record5   5.0   114

Note

I'm using idxmin, as you expected to. The difference is the context: calling it after a groupby, as in groupby(level=0).col2.idxmin(), does what you expected col2.idxmin(level=...) to do.
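As a minimal sketch of the steps behind that one-liner, the example below splits it into filter, per-group idxmin, and row lookup. The data is illustrative rather than the question's exact frame: col2 is shuffled so the minimum is not the first matching row in each group, and the labels are converted with tolist() only to make the lookup explicit.

```python
import pandas as pd

# Illustrative data (an assumption, not the question's frame): the smallest
# col2 among the col1 == 5 rows is deliberately not the first row per group.
idx = pd.MultiIndex.from_product(
    [['v1', 'v2'], ['record' + str(i) for i in range(1, 5)]],
    names=['level1', 'level2'])
df = pd.DataFrame({'col1': [5., 5., 3., 5., 5., 3., 5., 5.],
                   'col2': [1140, 114, 7, 500, 900, 1, 300, 200]},
                  index=idx)

# Step 1: keep only the rows where col1 == 5.
subset = df.loc[df['col1'] == 5]

# Step 2: for each first-level group, get the full (level1, level2) label
# of the row holding the smallest col2.
labels = subset.groupby(level='level1')['col2'].idxmin()

# Step 3: use those labels to pull the complete rows from the original frame.
result = df.loc[labels.tolist()]
print(result)
```

Because idxmin returns the complete index tuple for each group, the final lookup recovers every column of the winning rows, not just col2.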

edited Jun 16, 2016 at 18:29

answered Jun 16, 2016 at 18:02

piRSquared


Alexander · Answer · 2016-06-17 00:07:22Z

1

>>> (df[df.col1 == 5]
     .groupby(level=0, as_index=False).col2
     .apply(lambda group: group.nsmallest(1)))
0  v1  record5    114
1  v2  record5    114
dtype: int64

Or...

>>> df[df.col1 == 5].groupby(level=0).col2.nsmallest(1)
v1  v1  record5    114
v2  v2  record5    114
dtype: int64

But I'm not sure why the first level shows twice (i.e. 'v1' 'v1' ...).
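A likely explanation, offered as an assumption rather than something the answer confirms: nsmallest keeps each row's original index, and groupby prepends the group key on top of it, so grouping by the first level repeats that level. The sketch below drops the extra level only if it is present, then uses the remaining index to fetch the full rows. The small frame is illustrative.

```python
import pandas as pd

# Illustrative two-rows-per-group frame (an assumption, not the full example).
idx = pd.MultiIndex.from_product(
    [['v1', 'v2'], ['record5', 'record6']], names=['level1', 'level2'])
df = pd.DataFrame({'col1': [5., 5., 5., 5.],
                   'col2': [114, 1140, 114, 1140]}, index=idx)

smallest = df[df['col1'] == 5].groupby(level=0)['col2'].nsmallest(1)

# If the group key was prepended, the result has one more index level than df.
# Drop it so the index lines up with the original frame again.
if smallest.index.nlevels > df.index.nlevels:
    smallest = smallest.droplevel(0)

# The cleaned index can now select the complete rows.
rows = df.loc[smallest.index]
print(rows)
```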

edited Jun 17, 2016 at 0:07

answered Jun 16, 2016 at 18:29

Alexander


1 Comment

getglad Over a year ago

Wouldn't .nth(0) just take the first row, rather than the minimum? In my example, the minimum was always the first, but that might not always be the case in production
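The concern in this comment can be checked with a small sketch. The data below is illustrative and deliberately ordered so that the first matching row in each group is not the minimum. It compares nth(0), which is positional, with min() and idxmin(), which look at the values.

```python
import pandas as pd

# Illustrative data (an assumption): the first col1 == 5 row in each group
# does NOT hold the smallest col2.
idx = pd.MultiIndex.from_product(
    [['v1', 'v2'], ['record5', 'record6']], names=['level1', 'level2'])
df = pd.DataFrame({'col1': [5., 5., 5., 5.],
                   'col2': [1140, 114, 900, 300]}, index=idx)

grouped = df[df['col1'] == 5].groupby(level='level1')['col2']

first_values = grouped.nth(0).tolist()    # first row of each group, by position
min_values = grouped.min().tolist()       # smallest value of each group
min_labels = grouped.idxmin().tolist()    # index labels of those smallest values

print(first_values, min_values, min_labels)
```

With this ordering the two approaches disagree, so a position-based pick is only safe when the data is already sorted by col2 within each group.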
