My question is the reverse of the usual str.contains() check: I want to find which substrings stored in a dataframe column are contained in a long string variable.

The dataframe looks like this:

Account  Substring            Category
1001     Cash Payment         Category #1
1002     Credit Card Payment  Category #2

The long string variable is long_str = "Cash Payment by Customer".

So when using .loc to filter the dataframe for records whose substring is contained in long_str, is there a function similar to str.contains() that works in the opposite direction?

Below is the code I tried for filtering the dataframe. It does not work, because str.contains() checks whether each Substring value contains long_str rather than the other way around. Thanks!

df.loc[df['Substring'].str.contains(long_str)]

1 Answer

You can simply use the pandas.Series.apply method for that:

>>> long_str = "Cash Payment by Customer"
>>> df.loc[df.Substring.apply(lambda x: x in long_str)]
   Account     Substring     Category
0     1001  Cash Payment  Category #1
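
For anyone who wants to reproduce this from scratch, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the two sample rows in the question, and the names `mask` and `result` are just illustrative; the filtering itself is the same apply-based approach shown above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Account": [1001, 1002],
    "Substring": ["Cash Payment", "Credit Card Payment"],
    "Category": ["Category #1", "Category #2"],
})
long_str = "Cash Payment by Customer"

# Boolean mask: True where the row's Substring occurs inside long_str.
mask = df["Substring"].apply(lambda s: s in long_str)

# Keep only the rows where the mask is True.
result = df.loc[mask]
print(result)
```

Note that this assumes every value in the Substring column is a string; a missing value (NaN) would raise a TypeError inside the lambda.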
2 Comments

Hi @ank, thank you so much! This works. I didn't realize we could use the apply function inside .loc[]. The only problem is performance when there are millions of records to search. Is there a solution that performs better? Thank you so much!
Hi @XavierSun. I'm not aware of a better solution for this. However, using a list comprehension directly instead of the apply method could be faster. Give this a try and check the performance: df.loc[[x in long_str if x is not None else False for x in df.Substring]]
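
The list-comprehension idea from the comment above can be sketched as a small helper. The function name `rows_contained_in` and the third sample row (with a missing Substring) are hypothetical additions for illustration. One assumption worth flagging: pandas usually stores missing values as NaN (a float) rather than None, so this sketch guards with `isinstance(s, str)` instead of `is not None`.

```python
import numpy as np
import pandas as pd

def rows_contained_in(df, column, text):
    """Return the rows of df whose `column` value is a substring of `text`."""
    # The isinstance guard makes NaN and None evaluate to False
    # instead of raising a TypeError on `s in text`.
    mask = [isinstance(s, str) and s in text for s in df[column]]
    return df.loc[mask]

# Rows 1001 and 1002 come from the question; row 1003 is a made-up
# example of a record with a missing Substring.
df = pd.DataFrame({
    "Account": [1001, 1002, 1003],
    "Substring": ["Cash Payment", "Credit Card Payment", np.nan],
    "Category": ["Category #1", "Category #2", "Category #3"],
})

filtered = rows_contained_in(df, "Substring", "Cash Payment by Customer")
print(filtered)
```

Whether this actually beats apply depends on the data, so it is worth timing both versions on a realistic sample before committing to one. Also keep in mind that an empty string counts as a substring of any text, so rows with an empty Substring would be kept.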
