I need to select the rows of a DataFrame whose string column contains any string from a list. The strings in the search list are only substrings of the values in the column.

import pandas as pd

df = pd.DataFrame(data={'text':['abc def', 'def ghi', 'poi opo', 'aswwf', 'abcs  sd'], 'id':[1, 2, 3, 4, 5]})

Out [1]:
       text  id
0   abc def   1
1   def ghi   2
2   poi opo   3
3     aswwf   4
4  abcs  sd   5

search = ['abc', 'poi']

Required:


Out [2]:
       text  id
0   abc def   1
1   poi opo   3
2  abcs  sd   5
2 Answers

Use Series.str.contains with boolean indexing. Joining the list values with | builds a regex that matches any of them (regex OR):

pat = '|'.join(search)
df1 = df[df['text'].str.contains(pat)]
print(df1)
       text  id
0   abc def   1
2   poi opo   3
4  abcs  sd   5
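
Because the joined string is interpreted as a regex, a search term that contains a metacharacter changes what is matched. One way to keep the single-pattern approach is to escape each term with re.escape before joining. The following is a minimal sketch under that assumption; the name search_special and the extra '.' term are hypothetical and only illustrate the effect.

```python
import re
import pandas as pd

df = pd.DataFrame({'text': ['abc def', 'def ghi', 'poi opo', 'aswwf', 'abcs  sd'],
                   'id': [1, 2, 3, 4, 5]})

# Hypothetical search list: '.' is added only to show the effect of a metacharacter.
search_special = ['abc', 'poi', '.']

# Unescaped, '.' matches any character, so every non-empty row is selected.
unescaped = df[df['text'].str.contains('|'.join(search_special))]

# Escaping each term makes it match literally; only '|' keeps its regex meaning.
pat = '|'.join(re.escape(s) for s in search_special)
escaped = df[df['text'].str.contains(pat)]
print(escaped)
```

With the escaped pattern, only the rows containing 'abc' or 'poi' remain, since no value in the column contains a literal dot.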

@jezrael's answer is great, provided the search strings contain no regex special characters such as |. Alternatively, you can test each element separately and combine the results with a logical OR at the end. To search for strings that contain special characters, you can use:

df[pd.concat([df.text.str.contains(i, regex=False) for i in search], axis=1).any(axis=1)]

This gives the expected result:

       text  id
0   abc def   1
2   poi opo   3
4  abcs  sd   5
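
The same per-term idea can also be written as a loop that ORs the boolean masks together, which avoids building the intermediate DataFrame. This is only a sketch: the helper name contains_any is hypothetical, and na=False is an added assumption so that missing values count as non-matches.

```python
import pandas as pd

def contains_any(series, terms):
    """Return a boolean mask: True where the value contains any term literally."""
    mask = pd.Series(False, index=series.index)
    for term in terms:
        # regex=False treats the term as plain text; na=False maps NaN to False.
        mask |= series.str.contains(term, regex=False, na=False)
    return mask

df = pd.DataFrame({'text': ['abc def', 'def ghi', 'poi opo', 'aswwf', 'abcs  sd'],
                   'id': [1, 2, 3, 4, 5]})
search = ['abc', 'poi']

result = df[contains_any(df['text'], search)]
print(result)
```

An empty list of terms yields an all-False mask, so no rows are selected in that case.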
