I have the following DataFrame:

import pandas as pd
from pandas import DataFrame

df_1 = DataFrame({
        "alpha" : [1,1,1,2,2,2,3,3,3],
        "beta" : [3,4,5,3,4,5,3,4,5],
        "val_1" : ["x", "y", "z", "w", "a", "b", "v1", "v2", "v3"],
        "val_2" : ["z1", "z2", "z3", "w1", "w2", "w3", "zz1", "zz2", "zz3"]
    })
df_1.set_index(["alpha", "beta"], inplace=True)

I am trying to select every row where beta is either 3 or 5.

I have gone through the pandas documentation multiple times and cannot find a way to do this. The closest I've come to what I think must be the answer is:

df_1.xs((3,5), level="beta", drop_level=False)

This currently fails. What is the proper indexing/slicing way to get these rows?
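For context: `xs` takes one label per level (a tuple key is read as one label for each of several levels), which is likely why passing `(3,5)` for the single `beta` level fails. Below is a minimal sketch of one workaround, assuming it is acceptable to call `xs` once per value and concatenate the pieces; the names `wanted`, `parts` and `result` are only illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "alpha": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "beta": [3, 4, 5, 3, 4, 5, 3, 4, 5],
    "val_1": ["x", "y", "z", "w", "a", "b", "v1", "v2", "v3"],
    "val_2": ["z1", "z2", "z3", "w1", "w2", "w3", "zz1", "zz2", "zz3"],
}).set_index(["alpha", "beta"])

wanted = [3, 5]  # illustrative list of beta values to keep

# xs selects a single label on the given level, so call it once per value;
# drop_level=False keeps "beta" in the resulting MultiIndex
parts = [df_1.xs(b, level="beta", drop_level=False) for b in wanted]

# concat stacks the pieces (all beta == 3 rows, then all beta == 5 rows);
# sort_index restores the original (alpha, beta) ordering
result = pd.concat(parts).sort_index()
print(result)
```

This builds one intermediate frame per value, so a single-step selection is usually simpler when one is available.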

  • Couldn't you just brute-force it? Or is the dataset too large? Commented Jan 17, 2017 at 17:16
  • In this question stackoverflow.com/questions/15463729/… they solve basically the same problem by converting it to a Panel. Commented Jan 17, 2017 at 17:23

3 Answers

You can use the DataFrame.query() method to subset the rows based on the specified values:

df_1.query('beta == 3 or beta == 5')  # More succinctly: df_1.query('beta == [3,5]')
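`query` can refer to MultiIndex level names such as `beta` directly. A small sketch, assuming a pandas version where `query` resolves index level names, showing that the `or` form, the `==` list form, the `in` operator and a local variable referenced with `@` should all select the same rows; `wanted` is just an illustrative name.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "alpha": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "beta": [3, 4, 5, 3, 4, 5, 3, 4, 5],
    "val_1": ["x", "y", "z", "w", "a", "b", "v1", "v2", "v3"],
    "val_2": ["z1", "z2", "z3", "w1", "w2", "w3", "zz1", "zz2", "zz3"],
}).set_index(["alpha", "beta"])

wanted = [3, 5]  # illustrative local variable holding the beta values

by_or = df_1.query("beta == 3 or beta == 5")
by_list = df_1.query("beta == [3, 5]")  # == against a list behaves like "in"
by_in = df_1.query("beta in @wanted")   # @ refers to a Python variable in scope

print(by_in)
```

The `@` form is handy when the values come from elsewhere in the program rather than being hard-coded in the query string.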

2 Comments

This is pretty awesome. I have avoided query() until now because it has been marked "experimental", but honestly, this seems so much more intuitive than the indexing methods. Is there any reason not to use query() all the time (e.g. performance)?
AFAIK, .query() can be faster than ordinary boolean indexing on large DataFrames (perhaps on the order of a million rows), particularly when the numexpr engine is installed. Also, it currently lacks various string-handling capabilities.

Another option is to use get_level_values and isin to construct a boolean mask for indexing:

df_1[df_1.index.get_level_values(1).isin([3,5])]
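A variation of the same idea, sketched under the assumption that the level is named `beta` as in the question: referring to the level by name instead of by position `1` keeps the code correct if the levels are reordered, and `MultiIndex.isin` accepts a `level` argument that does the lookup in one call. The variable names are illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "alpha": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "beta": [3, 4, 5, 3, 4, 5, 3, 4, 5],
    "val_1": ["x", "y", "z", "w", "a", "b", "v1", "v2", "v3"],
    "val_2": ["z1", "z2", "z3", "w1", "w2", "w3", "zz1", "zz2", "zz3"],
}).set_index(["alpha", "beta"])

wanted = [3, 5]  # illustrative list of beta values to keep

# Boolean mask built from the "beta" level, looked up by name rather than position
mask = df_1.index.get_level_values("beta").isin(wanted)
by_mask = df_1[mask]

# MultiIndex.isin can test a single level directly via its level argument
by_isin = df_1[df_1.index.isin(wanted, level="beta")]

print(by_mask)
```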

You can use pd.IndexSlice. There is a very similar example in the documentation.

df_1.loc[pd.IndexSlice[:, [3,5]], :]


           val_1 val_2
alpha beta            
1     3        x    z1
      5        z    z3
2     3        w    w1
      5        b    w3
3     3       v1   zz1
      5       v3   zz3

1 Comment

For the record, a variation of this answer: df_1.loc[(slice(None), [3,5]), :]
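A quick sketch checking that the two spellings select the same rows: `pd.IndexSlice[:, [3, 5]]` simply builds the tuple `(slice(None), [3, 5])`. The pandas docs recommend sorting a MultiIndex before slicing it; the index here is already sorted, so `sort_index()` is only a precaution in this sketch.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "alpha": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "beta": [3, 4, 5, 3, 4, 5, 3, 4, 5],
    "val_1": ["x", "y", "z", "w", "a", "b", "v1", "v2", "v3"],
    "val_2": ["z1", "z2", "z3", "w1", "w2", "w3", "zz1", "zz2", "zz3"],
}).set_index(["alpha", "beta"])

# Precaution: keep the MultiIndex sorted before slicing
df_sorted = df_1.sort_index()

idx = pd.IndexSlice
via_indexslice = df_sorted.loc[idx[:, [3, 5]], :]
# Same selection written with an explicit slice object instead of IndexSlice
via_tuple = df_sorted.loc[(slice(None), [3, 5]), :]

print(via_tuple)
```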