Pandas: Selecting rows by condition on column and index

Question

Assume we have the following dataframe:

import pandas as pd

d = {'col1': ['a1', 'b1', 'c1', 'a1'], 'col2': ['a2', 'b2', 'b2', 'c2'], 'year': [2011, 2011, 2012, 2012], 'rank': [1, 2, 1, 2]}
df = pd.DataFrame(data=d).set_index(['year', 'rank']).sort_index()

          col1 col2
year rank          
2011 1      a1   a2
     2      b1   b2
2012 1      c1   b2
     2      a1   c2

How can I select all rows where col1 != 'a1' or year != 2011?

If year weren't part of the index, I could do this with

df[(df.col1 != 'a1') | (df.year != 2011)]

However, since year is an index level, df.year raises an AttributeError.

How can I formulate the condition for the index? Thanks in advance!

Do you absolutely need to use those columns as the index?

– AMC, Commented Jan 18, 2020 at 15:43

Nico Albers · Accepted Answer · 2020-01-18 15:13:41Z

3

You can access an index level with the method df.index.get_level_values. For example, you get the desired result with

In [29]: df[(df.col1 != 'a1') | (df.index.get_level_values('year') != 2011)]
Out[29]:
          col1 col2
year rank
2011 2      b1   b2
2012 1      c1   b2
     2      a1   c2

A side note:

The comparison df.index.get_level_values('year') != 2011 returns a NumPy array rather than a pd.Series. In some older pandas versions you may have had to take the values of df.col1 != 'a1' (with .values or similar) before combining the two, because combining a Series that carries an index with a plain array was not possible. However, at least with 0.24 and above this is no longer necessary.
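To make the two kinds of mask concrete, here is a minimal sketch that rebuilds the frame from the question and combines an index-level comparison with a column comparison. The names year_mask and col_mask are only illustrative, and the sketch assumes a reasonably recent pandas version.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
d = {
    'col1': ['a1', 'b1', 'c1', 'a1'],
    'col2': ['a2', 'b2', 'b2', 'c2'],
    'year': [2011, 2011, 2012, 2012],
    'rank': [1, 2, 1, 2],
}
df = pd.DataFrame(data=d).set_index(['year', 'rank']).sort_index()

# Index-level comparison: a plain NumPy boolean array, one entry per row.
year_mask = df.index.get_level_values('year') != 2011

# Column comparison: a boolean Series that carries the frame's index.
col_mask = df.col1 != 'a1'

# Series | ndarray is combined position by position.
result = df[col_mask | year_mask]
print(type(year_mask).__name__, type(col_mask).__name__)
print(result)
```

Because year_mask has no index of its own, it is matched to the rows purely by position, so it has to come from the same frame that is being filtered.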

edited Jan 18, 2020 at 15:13

answered Jan 18, 2020 at 14:47


1 Comment

Nico Albers Over a year ago

Oh yes, that's true. That behaviour seems to have changed a while ago; it used to be impossible to compare an indexed Series with an array. I'll adjust that. Thanks for pointing it out!

Mykola Zotko · Accepted Answer · 2020-01-18 15:06:18Z

3

You can use the method query(), which lets you refer to both index levels and columns of the frame by name:

df.query("col1 != 'a1' | year != 2011")

Output:

          col1 col2
year rank          
2011 2      b1   b2
2012 1      c1   b2
     2      a1   c2
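If the values to compare against live in Python variables, query() can reference them with the @ prefix. The following is a small sketch of that, with an explicit boolean mask alongside for comparison. The variable names excluded_col1 and excluded_year are made up for this example.

```python
import pandas as pd

# Same frame as in the question.
d = {
    'col1': ['a1', 'b1', 'c1', 'a1'],
    'col2': ['a2', 'b2', 'b2', 'c2'],
    'year': [2011, 2011, 2012, 2012],
    'rank': [1, 2, 1, 2],
}
df = pd.DataFrame(data=d).set_index(['year', 'rank']).sort_index()

# Hypothetical variable names, referenced inside the query string with "@".
excluded_col1 = 'a1'
excluded_year = 2011

# "year" is an index level name, "col1" is a column name.
by_query = df.query("col1 != @excluded_col1 or year != @excluded_year")

# The same selection written as an explicit boolean mask.
by_mask = df[(df.col1 != excluded_col1)
             | (df.index.get_level_values('year') != excluded_year)]

print(by_query)
print(by_query.equals(by_mask))
```

Writing the condition with the keyword or instead of | is a matter of taste here; it just avoids having to think about operator precedence inside the string.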

edited Jan 18, 2020 at 15:06

answered Jan 18, 2020 at 14:52


Raunaq Jain · Accepted Answer · 2020-01-18 14:50:03Z

0

You can access the index through the loc and iloc indexers.

df[df['col1'] != 'a1'].loc[2011]

To select by both the year and rank levels together, use df.loc[2011, 1], which returns the row with col1 = a1 and col2 = a2.
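A short sketch of what these .loc lookups return on the frame from the question may help. The variable names are only illustrative. Note that chaining a column filter with .loc[2011] keeps only rows that satisfy both conditions, so it illustrates index access rather than the "or" condition asked about.

```python
import pandas as pd

# Same frame as in the question.
d = {
    'col1': ['a1', 'b1', 'c1', 'a1'],
    'col2': ['a2', 'b2', 'b2', 'c2'],
    'year': [2011, 2011, 2012, 2012],
    'rank': [1, 2, 1, 2],
}
df = pd.DataFrame(data=d).set_index(['year', 'rank']).sort_index()

# A single key selects on the first index level (year); the result is
# indexed by the remaining level (rank).
rows_2011 = df.loc[2011]

# A tuple selects on both levels and returns that one row as a Series.
row_2011_1 = df.loc[(2011, 1)]

# Column filter first, then year lookup: rows where col1 != 'a1' AND year == 2011.
not_a1_in_2011 = df[df['col1'] != 'a1'].loc[2011]

print(rows_2011)
print(row_2011_1)
print(not_a1_in_2011)
```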

answered Jan 18, 2020 at 14:50


abhilb · Accepted Answer · 2020-01-18 15:19:48Z

0

You can try

df1 = df[df.index.get_level_values('year').isin([2011])]
df2 = df[df.col1 == 'a1']
result = pd.concat([df1,df2]).drop_duplicates()

Output

          col1 col2
year rank          
2011 1      a1   a2
     2      b1   b2
2012 2      a1   c2
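One thing to be aware of with this pattern: drop_duplicates() compares column values only and ignores the index, so two different (year, rank) rows that happen to hold the same values would be collapsed into one. A possible variant, sketched here with the same two conditions, removes repeats by index entry instead. The name combined is just for illustration.

```python
import pandas as pd

# Same frame as in the question.
d = {
    'col1': ['a1', 'b1', 'c1', 'a1'],
    'col2': ['a2', 'b2', 'b2', 'c2'],
    'year': [2011, 2011, 2012, 2012],
    'rank': [1, 2, 1, 2],
}
df = pd.DataFrame(data=d).set_index(['year', 'rank']).sort_index()

df1 = df[df.index.get_level_values('year').isin([2011])]
df2 = df[df.col1 == 'a1']

# Rows matching both conditions appear twice after the concat.
combined = pd.concat([df1, df2])

# Keep the first occurrence of each (year, rank) index entry.
result = combined[~combined.index.duplicated()].sort_index()
print(result)
```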

answered Jan 18, 2020 at 15:19
