Slicing a MultiIndex DataFrame by multiple values from a specified level

Question

I want to slice a MultiIndex DataFrame by multiple values from a secondary level. For example, in the following DataFrame:

                val1  val2
ind1 ind2 ind3            
1    6    s1      10     8
2    7    s1      20     6
3    8    s2      30     4
4    9    s2      50     2
5    10   s3      60     0

I wish to slice only the rows in which ind3 == s1 or ind3 == s3:

           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

Best hypothetical option would be to pass multiple arguments to .xs, since it is possible to explicitly state the desired level.

I could obviously concat all the sliced-by-single-value DataFrames:

In[2]: pd.concat([df.xs('s1',level=2), df.xs('s3',level=2)])
Out[2]:
           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

But (a) it's tedious and not so readable when using more than 2 values, and (b) for large DataFrames it's quite heavy (or at least heavier than a multi-value slicing option, if that exists).

Here's the code to build the example DataFrame:

import pandas as pd
df = pd.DataFrame({'ind1':[1,2,3,4,5], 'ind2':[6,7,8,9,10], 'ind3':['s1','s1','s2','s2','s3'], 'val1':[10,20,30,50,60], 'val2':[8,6,4,2,0]}).set_index(['ind1','ind2','ind3'])

Alex · Accepted Answer · 2016-08-04 18:15:49Z

14

As with most selection from a DataFrame, you can use a mask or an indexer (loc in this case).

To get the mask, you can use get_level_values (docs) on the MultiIndex followed by isin (docs).

m = df.index.get_level_values('ind3').isin(['s1', 's3'])
df[m].reset_index(level=2, drop=True)

To use loc:

df.loc[(slice(None), slice(None), ['s1', 's3']), :].reset_index(level=2, drop=True)

both output

           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

Note: the loc way can also be written as seen in Alberto Garcia-Raboso's answer. Many people prefer that syntax as it is more consistent with loc syntax for an Index. Both syntax styles are discussed in the docs.

edited Aug 4, 2016 at 18:15

answered Aug 4, 2016 at 17:53

Alex

19.2k9 gold badges65 silver badges82 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

danielhadar Over a year ago

Thanks for the reply! Notice I'm looking for a result w.o. 'ind3'.

Alicia Garcia-Raboso · Accepted Answer · 2016-08-04 17:54:49Z

11

You can use an IndexSlice:

idx = pd.IndexSlice
result = df.loc[idx[:, :, ['s1', 's3']], idx[:]]
result.index = result.index.droplevel('ind3')
print(result)

Output:

           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

The second line above can also be written as

result = df.loc(axis=0)[idx[:, :, ['s1', 's3']]]

answered Aug 4, 2016 at 17:54

Alicia Garcia-Raboso

14k1 gold badge47 silver badges48 bronze badges

3 Comments

Alex Over a year ago

IndexSlice - nice!

danielhadar Over a year ago

@AlbertoGarcia-Raboso Thanks for your reply! 2nd answer was first so I approved it, but yours works as well.

ylnor Over a year ago

That should be the chosen answer. Beautiful!

Collectives™ on Stack Overflow

Slicing a MultiIndex DataFrame by multiple values from a specified level

2 Answers 2

1 Comment

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

1 Comment

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related