7

I have a Pandas DataFrame that is multiindexed and want to find the minimum value of a certain column in a subset of rows on each level, and get the entire contents of those rows.

import pandas as pd

idx = pd.MultiIndex.from_product([['v1', 'v2'],
                                  ['record' + str(i) for i in range(1, 7)]])

df = pd.DataFrame([[2., 114], [2., 1140],
                   [3., 114], [3., 1140],
                   [5., 114], [5., 1140],
                   [2., 114], [2., 1140],
                   [3., 114], [3., 1140],
                   [5., 114], [5., 1140]],
                  columns=['col1', 'col2'],
                  index=idx)

My structure:

                 col1  col2
level1 level2
v1     record1    2.0   114
       record2    2.0  1140
       record3    3.0   114
       record4    3.0  1140
       record5    5.0   114
       record6    5.0  1140
v2     record1    2.0   114
       record2    2.0  1140
       record3    3.0   114
       record4    3.0  1140
       record5    5.0   114
       record6    5.0  1140

Example desired output I want the minimum value of another column where col1 == 5:

                 col1  col2
level1 level2
v1     record5    5.0   114
v2     record5    5.0   114

I know that I can get a subset of rows by using a comparison statement.

df.ix[df['col1'] == 5]

And I also know that I can get the minimum values of a column within that subset from all levels.

df['col2'][df['col1'] == 5].min(level='level1')

And if I want to specify the level, then I can get the index of 1 row on specific level.

df.ix['v1', pay_up_file.ix['v1']['col2'][(df.ix['v1']['col1'] == 5)].idxmin()]

But I cannot figure out if there is an efficient way to get the indexes from all levels

There does not seem to be a method available along the lines of this:

df['col2'][df['col1'] == 5].idxmin(level='level1')

I can get to what I want with this:

df.ix[
  (df['col1'] == 5) & 
  (df['col2'].isin(df['col2'][df['col1'] == 5].min(level='level1').values))
]

But with everything else that is in Pandas, is there a better way to get to my output?

1
  • It is at the top, but i bolded it and put some larger line breaks to make it more clear Commented Jun 16, 2016 at 18:00

2 Answers 2

6

This should work:

df.loc[df.loc[df.col1 == 5.].groupby(level=0).col2.idxmin()]

            col1  col2
v1 record5   5.0   114
v2 record5   5.0   114

Note

I'm using idxmin as you thought you ought to. But the context matters. I'm using it following a groupby(level=0).col2.idxmin() which acts as you thought col2.idxmin(level=...) should.

Sign up to request clarification or add additional context in comments.

Comments

1
>>> (df[df.col1 == 5]
     .groupby(level=0, as_index=False).col2
     .apply(lambda group: group.nsmallest(1))
0  v1  record5    114
1  v2  record5    114
dtype: int64

Or...

>>> df[df.col1 == 5].groupby(level=0).col2.nsmallest(1)
v1  v1  record5    114
v2  v2  record5    114
dtype: int64

But I'm not sure why the first level shows twice (i.e. 'v1' 'v1' ...).

1 Comment

Wouldn't .nth(0) just take the first row, rather than the minimum? In my example, the minimum was always the first, but that might not always be the case in production

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.