15

I want to slice a MultiIndex DataFrame by multiple values from a secondary level. For example, in the following DataFrame:

                val1  val2
ind1 ind2 ind3            
1    6    s1      10     8
2    7    s1      20     6
3    8    s2      30     4
4    9    s2      50     2
5    10   s3      60     0

I wish to slice only the rows in which ind3 == s1 or ind3 == s3:

           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

Best hypothetical option would be to pass multiple arguments to .xs, since it is possible to explicitly state the desired level.

I could obviously concat all the sliced-by-single-value DataFrames:

In[2]: pd.concat([df.xs('s1',level=2), df.xs('s3',level=2)])
Out[2]:
           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

But (a) it's tedious and not so readable when using more than 2 values, and (b) for large DataFrames it's quite heavy (or at least heavier than a multi-value slicing option, if that exists).

Here's the code to build the example DataFrame:

import pandas as pd
df = pd.DataFrame({'ind1':[1,2,3,4,5], 'ind2':[6,7,8,9,10], 'ind3':['s1','s1','s2','s2','s3'], 'val1':[10,20,30,50,60], 'val2':[8,6,4,2,0]}).set_index(['ind1','ind2','ind3'])

2 Answers 2

14

As with most selection from a DataFrame, you can use a mask or an indexer (loc in this case).

To get the mask, you can use get_level_values (docs) on the MultiIndex followed by isin (docs).

m = df.index.get_level_values('ind3').isin(['s1', 's3'])
df[m].reset_index(level=2, drop=True)

To use loc:

df.loc[(slice(None), slice(None), ['s1', 's3']), :].reset_index(level=2, drop=True)

both output

           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

Note: the loc way can also be written as seen in Alberto Garcia-Raboso's answer. Many people prefer that syntax as it is more consistent with loc syntax for an Index. Both syntax styles are discussed in the docs.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for the reply! Notice I'm looking for a result w.o. 'ind3'.
11

You can use an IndexSlice:

idx = pd.IndexSlice
result = df.loc[idx[:, :, ['s1', 's3']], idx[:]]
result.index = result.index.droplevel('ind3')
print(result)

Output:

           val1  val2
ind1 ind2            
1    6       10     8
2    7       20     6
5    10      60     0

The second line above can also be written as

result = df.loc(axis=0)[idx[:, :, ['s1', 's3']]]

3 Comments

IndexSlice - nice!
@AlbertoGarcia-Raboso Thanks for your reply! 2nd answer was first so I approved it, but yours works as well.
That should be the chosen answer. Beautiful!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.