pandas: efficient row selection based on multiple condition columns

Question

Suppose I have a DataFrame with three columns, A, B, and C. I would like to select the rows for which column B satisfies some condition or column C satisfies some condition. Is there an efficient way to do this?

To be concrete, suppose I have:

import pandas as pd

df = pd.DataFrame({'A': ['mary', 'john', 'ashley'],
                   'B': ['xiao', 'derric', 'john'],
                   'C': ['faye', 'linnett', 'bruce']})

I would like to select the rows where column B is 'john' or column C is 'john'. Is there a more elegant way to do this than:

df[(df['B'] == 'john') | (df['C'] == 'john')]

In my real application, df will have many rows and this selection will be performed many times, so efficiency matters.

The basic options are chaining with | or query. See this post for timings.
– user7864386, Commented May 4, 2022 at 18:50
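For readers who have not used both forms, here is a minimal sketch of the two options named in that comment, applied to the question's df. The lowercase target 'john' is an assumption chosen to match the sample data, and the variable names by_mask and by_query are illustrative only. No timings are claimed here.

```python
import pandas as pd

df = pd.DataFrame({'A': ['mary', 'john', 'ashley'],
                   'B': ['xiao', 'derric', 'john'],
                   'C': ['faye', 'linnett', 'bruce']})

# Option 1: build one boolean mask per column and combine them with |
by_mask = df[(df['B'] == 'john') | (df['C'] == 'john')]

# Option 2: express the same condition as a query string
by_query = df.query("B == 'john' or C == 'john'")

# Both approaches should select the same rows
print(by_mask.equals(by_query))
```

Note that column A also contains 'john', but neither form looks at it, so only the row whose B value matches is selected.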

Ian Wright · Accepted Answer · 2022-05-04 22:20:33Z

1

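cols = ['A', 'B', ...]  # placeholder: replace with the columns to test

# Boolean mask: True for each row where any listed column equals 'john'
(df[cols] == 'john').any(axis='columns')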
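As a sketch of how this could be applied to the question's data: the example below assumes the columns to test are B and C (as the question asks) and that the target is the lowercase 'john' found in the sample data. The names mask, result, and mask_isin are illustrative. The isin variant at the end is a related but different technique, shown only because it extends to several target values.

```python
import pandas as pd

df = pd.DataFrame({'A': ['mary', 'john', 'ashley'],
                   'B': ['xiao', 'derric', 'john'],
                   'C': ['faye', 'linnett', 'bruce']})

cols = ['B', 'C']  # assumption: the columns named in the question

# Compare every cell in the chosen columns with the target value,
# then reduce across columns: True where at least one cell matches
mask = (df[cols] == 'john').any(axis='columns')

# Use the mask to select the matching rows
result = df[mask]
print(result)

# Variant: isin accepts a list, so several target values can be tested at once
mask_isin = df[cols].isin(['john', 'John']).any(axis='columns')
```

The comparison df[cols] == 'john' produces a DataFrame of booleans with the same shape as df[cols], and any(axis='columns') collapses it to one boolean per row. This avoids writing one condition per column by hand when the same test applies to many columns.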


1 Comment

user17242583 Over a year ago

While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply.
