
To check whether a single string is contained in the values of one column (for example, "abc" is contained in "abcdef"), the following code is useful:

df_filtered = df.filter(df.columnName.contains('abc'))

The result would include, for example, "_wordabc", "thisabce", "2abc1".

How can I check for multiple strings (for example ['ab1','cd2','ef3']) at the same time?

I'm ideally searching for something like this:

df_filtered = df.filter(df.columnName.contains(['ab1','cd2','ef3']))

The result would include, for example, "x_ab1", "_cd2_", "abef3".

Please post scalable solutions (no for loops, for example), because the aim is to check a large list of around 1000 elements.

1 Answer


All you need is isin

df_filtered = df.filter(df['columnName'].isin('word1','word2','word3'))

Edit

You need the rlike function, which matches a regular expression against each value, to achieve your result:

words="(aaa|bbb|ccc)"

df.filter(df['columnName'].rlike(words))
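
For a list of around 1000 strings, the pattern can be built once on the driver instead of being typed by hand. The following is only a sketch: the helper names build_pattern and filter_contains_any are hypothetical, and it assumes the search strings should be matched literally, so each one is escaped with Python's re.escape. It also assumes those escapes are accepted by the Java regex engine that rlike uses, which holds for plain alphanumeric strings such as 'ab1'.

```python
import re


def build_pattern(words):
    """Join literal search strings into one regex alternation."""
    if not words:
        # An empty pattern would match every row, so reject it early.
        raise ValueError("words must not be empty")
    # re.escape keeps characters such as '.' or '+' literal.
    return "|".join(re.escape(w) for w in words)


def filter_contains_any(df, column, words):
    """Keep rows of `df` whose `column` contains at least one of `words`."""
    # Imported here so build_pattern stays usable without Spark installed.
    from pyspark.sql import functions as F

    return df.filter(F.col(column).rlike(build_pattern(words)))
```

With the DataFrame from the question, the call would look like filter_contains_any(df, 'columnName', ['ab1', 'cd2', 'ef3']). The join over the list runs once on the driver; the row-by-row matching is still a single rlike expression evaluated by Spark.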

2 Comments

Thanks, but this isn't what I'm aiming for. I'll clarify my question because I meant "contains", not "equals".
@Manrique Please check the edit now.
