2

Assume we have the following dataframe:

d = {'col1': ['a1', 'b1', 'c1', 'a1'], 'col2': ['a2', 'b2', 'b2', 'c2'], 'year':[2011, 2011, 2012, 2012], 'rank':[1, 2, 1, 2]}
df = pd.DataFrame(data=d).set_index(['year', 'rank']).sort_index()

          col1 col2
year rank          
2011 1      a1   a2
     2      b1   b2
2012 1      c1   b2
     2      a1   c2

How can I select all columns where col1 != 'a1' or year != 2011?

If year wouldn't be an index, I could do this by

df[(df.col1 != 'a1') | (df.year != 2011)]

However, as year is an index, df.year would throw an AttributeError.

How can I formulate the condition for the index? Thanks in advance!

1
  • Do you absolutely need to use those columns as the index? Commented Jan 18, 2020 at 15:43

4 Answers 4

3

You can access the index by the method df.index.get_level_values, e.g. you can gain the searched result by

In [29]: df[(df.col1 != 'a1') | (df.index.get_level_values('year') != 2011)]
Out[29]:
          col1 col2
year rank
2011 2      b1   b2
2012 1      c1   b2
     2      a1   c2

Some Sidenote:

The comparison df.index.get_level_values('year') != 2011 will be an numpy array, therefore we need to get the values from the pd.Series for comparing with df.col1 != 'a1' (in some older pandas versions you may have used to acess the values with .values or similar, because comparing of an series with index with some array was not possible. Howewer, at least with 0.24 and above this is not necessary anymore).

Sign up to request clarification or add additional context in comments.

1 Comment

Oh yes, that's true. That behaviour seem to have changed some while ago, when it didn't worked to compare some indexed series with an array, I'll adjust that. Thanks for pointing out!
3

You can use the method query() that treats both the index and columns of the frame as a column:

df.query("col1 != 'a1' | year != 2011")

Output:

          col1 col2
year rank          
2011 2      b1   b2
2012 1      c1   b2
     2      a1   c2

Comments

0

You can access the index through loc and iloc operators.

df[df['col1'] != 'a1'].loc[2011]

To access both the year and rank index together, df.loc[2011,1], which will output a1 and a2

Comments

0

You can try

df1 = df[df.index.get_level_values('year').isin([2011])]
df2 = df[df.col1 == 'a1']
result = pd.concat([df1,df2]).drop_duplicates()

Output

        col1    col2
year    rank        
2011    1   a1  a2
        2   b1  b2
2012    2   a1  c2

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.