Suppose I have a data frame with 3 columns A, B, C. I would like to select the rows for which column B satisfies some condition or column C satisfies some condition. Is there an efficient way of doing this?

To be concrete, suppose I have:

import pandas as pd
df = pd.DataFrame({'A': ['mary', 'john', 'ashley'],
                   'B': ['xiao', 'derric', 'john'],
                   'C': ['faye', 'linnett', 'bruce']})

I would like to select the rows where column B is 'john' or column C is 'john'. Is there a more elegant way to do this than:

df[(df['B']=='john') | (df['C']=='john')]

In my real application, df will have many rows and this row selection is done many times. So efficiency is desirable.

  • The basic options are chaining with | or query. See this post for timings. Commented May 4, 2022 at 18:50
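To make the two options in this comment concrete, here is a minimal sketch that uses the question's example frame and assumes the lowercase value 'john' is the one being searched for. It only shows that both spellings select the same rows. It makes no claim about which is faster, since that depends on the data and is worth measuring on the real frame.

```python
import pandas as pd

df = pd.DataFrame({'A': ['mary', 'john', 'ashley'],
                   'B': ['xiao', 'derric', 'john'],
                   'C': ['faye', 'linnett', 'bruce']})

# Option 1: combine two boolean masks with the element-wise OR operator.
by_mask = df[(df['B'] == 'john') | (df['C'] == 'john')]

# Option 2: express the same condition as a query string.
by_query = df.query("B == 'john' or C == 'john'")

print(by_mask)
print(by_query)
```

Both calls return a new DataFrame and leave df unchanged.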

1 Answer

cols = ['B', 'C']  # the columns to check; list as many as needed

df[(df[cols] == 'john').any(axis='columns')]
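As a runnable sketch of this approach on the question's example frame: the choice of cols = ['B', 'C'] and the lowercase search value 'john' are assumptions taken from the question's data, not part of the original answer. Comparing a sub-frame to a scalar gives a boolean DataFrame, and any(axis='columns') reduces it to one boolean per row.

```python
import pandas as pd

df = pd.DataFrame({'A': ['mary', 'john', 'ashley'],
                   'B': ['xiao', 'derric', 'john'],
                   'C': ['faye', 'linnett', 'bruce']})

cols = ['B', 'C']  # assumption: only these columns should be tested

# Element-wise comparison: a boolean DataFrame with one column per entry in cols.
matches = df[cols] == 'john'

# Reduce across columns: True for a row if any tested column matched.
mask = matches.any(axis='columns')

# Boolean indexing keeps the rows where the mask is True.
result = df[mask]
print(result)
```

Column A is not in cols, so the row where A is 'john' is not selected. This form tests every listed column against the same value. If each column needs a different condition, the masks still have to be combined with |.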

1 Comment

While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply.
