Simple pandas MultiIndex slicing

Question

I have the following DataFrame:

import pandas as pd

df_1 = pd.DataFrame({
        "alpha": [1, 1, 1, 2, 2, 2, 3, 3, 3],
        "beta": [3, 4, 5, 3, 4, 5, 3, 4, 5],
        "val_1": ["x", "y", "z", "w", "a", "b", "v1", "v2", "v3"],
        "val_2": ["z1", "z2", "z3", "w1", "w2", "w3", "zz1", "zz2", "zz3"]
    })
df_1.set_index(["alpha", "beta"], inplace=True)

I am trying to select every row where beta is either 3 or 5.

I have gone through the pandas documentation multiple times and cannot find a way to do this. The closest I've come to what I think must be the answer is:

df_1.xs((3,5), level="beta", drop_level=False)

This currently fails. What is the proper way to index or slice the MultiIndex to get these rows?
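For reference, xs() selects a single key per level named in `level`, so it cannot express "beta is 3 or 5" in one call. A minimal workaround sketch, assuming you want to stay with xs(): take one cross-section per wanted value and concatenate them. This is an illustration, not one of the approaches from the answers.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "alpha": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "beta": [3, 4, 5, 3, 4, 5, 3, 4, 5],
    "val_1": ["x", "y", "z", "w", "a", "b", "v1", "v2", "v3"],
    "val_2": ["z1", "z2", "z3", "w1", "w2", "w3", "zz1", "zz2", "zz3"],
}).set_index(["alpha", "beta"])

# xs() takes one key on the given level, so take one cross-section per
# wanted beta value; drop_level=False keeps the (alpha, beta) index intact.
parts = [df_1.xs(b, level="beta", drop_level=False) for b in (3, 5)]

# concat() stacks all beta == 3 rows before all beta == 5 rows;
# sort_index() restores the original (alpha, beta) ordering.
result = pd.concat(parts).sort_index()
print(result)
```

This works, but it needs one xs() call per value, which is why the answers use a single vectorized selection instead.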

Couldn't you just brute-force it? Or is the dataset too large? – Olian04, Jan 17, 2017 at 17:16
In this question stackoverflow.com/questions/15463729/… they solve basically the same problem by converting it to a Panel. – javidcf, Jan 17, 2017 at 17:23

Nickil Maveli · Accepted Answer · 2017-01-17 17:35:07Z

5

You can use the DataFrame.query() method to subset on the values of the beta index level:

df_1.query('beta == 3 or beta == 5')  # More succinctly: df_1.query('beta == [3,5]')
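A short sketch of the query() spellings, assuming the df_1 from the question. query() can refer to MultiIndex level names such as beta directly; comparing against a list with == behaves as a membership test, and `in` together with `@` (which pulls in a local Python variable; `wanted` is a name chosen here for illustration) is another way to write the same filter.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "alpha": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "beta": [3, 4, 5, 3, 4, 5, 3, 4, 5],
    "val_1": ["x", "y", "z", "w", "a", "b", "v1", "v2", "v3"],
    "val_2": ["z1", "z2", "z3", "w1", "w2", "w3", "zz1", "zz2", "zz3"],
}).set_index(["alpha", "beta"])

# Explicit "or" of two comparisons on the index level named "beta".
a = df_1.query("beta == 3 or beta == 5")

# "== list" inside query() is treated like isin().
b = df_1.query("beta == [3, 5]")

# "in" with "@" referencing a local variable (hypothetical name "wanted").
wanted = [3, 5]
c = df_1.query("beta in @wanted")

print(a)
```

All three keep the original row order, because query() filters with a boolean mask.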

edited Jan 17, 2017 at 17:35

answered Jan 17, 2017 at 17:19

Nickil Maveli


2 Comments

sparc_spread Over a year ago

This is pretty awesome. I have avoided query() until now because it was marked "experimental", but honestly this seems much more intuitive than the indexing methods. Is there any reason not to use query() all the time (e.g. performance)?

Nickil Maveli Over a year ago

AFAIK, .query() can be faster than ordinary boolean indexing on large DataFrames (perhaps on the order of a million rows). Also, it currently lacks various string-handling capabilities.

akuiper · Accepted Answer · 2017-01-17 17:26:44Z

2

Another option is to use get_level_values and isin to build a boolean mask for indexing:

df_1[df_1.index.get_level_values(1).isin([3,5])]
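The same idea spelled out, assuming the df_1 from the question. get_level_values() also accepts the level name, which reads more clearly than the position 1 and keeps working if the levels are reordered:

```python
import pandas as pd

df_1 = pd.DataFrame({
    "alpha": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "beta": [3, 4, 5, 3, 4, 5, 3, 4, 5],
    "val_1": ["x", "y", "z", "w", "a", "b", "v1", "v2", "v3"],
    "val_2": ["z1", "z2", "z3", "w1", "w2", "w3", "zz1", "zz2", "zz3"],
}).set_index(["alpha", "beta"])

# Pull out the "beta" level as a flat Index, one entry per row.
beta_values = df_1.index.get_level_values("beta")

# isin() returns a boolean array: True where beta is 3 or 5.
mask = beta_values.isin([3, 5])

# Boolean indexing keeps the matching rows and the full MultiIndex.
result = df_1[mask]
print(result)
```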

answered Jan 17, 2017 at 17:26

akuiper



Ted Petrou · Accepted Answer · 2017-01-17 17:29:50Z

2

You can use pd.IndexSlice. There is a very similar example directly in the documentation.

df_1.loc[pd.IndexSlice[:, [3,5]], :]


           val_1 val_2
alpha beta            
1     3        x    z1
      5        z    z3
2     3        w    w1
      5        b    w3
3     3       v1   zz1
      5       v3   zz3
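A sketch of the IndexSlice approach, assuming the df_1 from the question. The pandas advanced-indexing documentation also shows passing axis=0 to .loc so the slicer applies only to the rows, which avoids the trailing ":" for columns. df_1 is built with an already sorted index; if yours is not, calling sort_index() first is a common precaution before MultiIndex slicing.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "alpha": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "beta": [3, 4, 5, 3, 4, 5, 3, 4, 5],
    "val_1": ["x", "y", "z", "w", "a", "b", "v1", "v2", "v3"],
    "val_2": ["z1", "z2", "z3", "w1", "w2", "w3", "zz1", "zz2", "zz3"],
}).set_index(["alpha", "beta"])

idx = pd.IndexSlice

# ":" takes every alpha, [3, 5] picks those beta values;
# the trailing ":" keeps all columns.
r1 = df_1.loc[idx[:, [3, 5]], :]

# Same row selection, with the slicer bound to the row axis only.
r2 = df_1.loc(axis=0)[idx[:, [3, 5]]]

print(r1)
```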

answered Jan 17, 2017 at 17:29

Ted Petrou


1 Comment

Zeugma Over a year ago

For the record, a variation of this answer: df_1.loc[(slice(None), [3,5]), :]
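The two spellings are interchangeable: pd.IndexSlice[:, [3, 5]] simply builds the tuple (slice(None), [3, 5]). A small sketch, assuming the df_1 from the question, that checks this and also selects a single column with the same row key:

```python
import pandas as pd

df_1 = pd.DataFrame({
    "alpha": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "beta": [3, 4, 5, 3, 4, 5, 3, 4, 5],
    "val_1": ["x", "y", "z", "w", "a", "b", "v1", "v2", "v3"],
    "val_2": ["z1", "z2", "z3", "w1", "w2", "w3", "zz1", "zz2", "zz3"],
}).set_index(["alpha", "beta"])

# IndexSlice is sugar: it returns the plain tuple you would write by hand.
key = pd.IndexSlice[:, [3, 5]]
print(key == (slice(None), [3, 5]))

# Explicit tuple form: slice(None) on alpha, a list on beta.
rows = df_1.loc[(slice(None), [3, 5]), :]

# The same row key with a single column label returns a Series.
val_1 = df_1.loc[(slice(None), [3, 5]), "val_1"]
print(val_1)
```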
