Check which elements of series are substring of a given text - Python, Pandas

Question

What I am trying to do is:

Given a Series of strings, find, in a vectorized manner, all the elements (and their indexes) that are substrings of another main string.

The Input:

import pandas as pd

series = pd.Series(['ab', 'abcd', 'bcc', 'abc'], name='text')
main_text = 'abcX'

# The series:
0      ab
1    abcd
2     bcc
3     abc
Name: text, dtype: object

The desired output:

0      ab
3     abc
Name: text, dtype: object

What I tried:

df_test = pd.DataFrame(series)
df_test['text2'] = main_text
df_test['text'].isin(df_test)

# And this of course won't work, since it checks whether the main string
# is a substring of each string in the series:
series.str.contains(main_text, regex=True)

Thanks!

mozway · Accepted Answer · 2022-03-04 09:00:34Z


You don't need a regex; simply use Python's in operator:

series[[e in main_text for e in series]]

output:

0     ab
3    abc
Name: text, dtype: object
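
For reference, the one-liner above can be spelled out as a self-contained sketch. The names mask and matches are only illustrative, and the sketch assumes every element of the series is a string (a NaN or other non-string value would make the in test raise a TypeError):

```python
import pandas as pd

series = pd.Series(['ab', 'abcd', 'bcc', 'abc'], name='text')
main_text = 'abcX'

# Build a plain Python list of booleans: True where the element
# is a substring of main_text.
mask = [element in main_text for element in series]

# Indexing a Series with a boolean list keeps only the True positions
# and preserves the original index labels.
matches = series[mask]
print(matches)
```

Because the original index is preserved, matches.index gives the positions of the matching strings directly.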

answered Mar 4, 2022 at 9:00


4 Comments

Ilan12 Over a year ago

Yes, but this is not efficient; I want to do it in a vectorized manner.

mozway Over a year ago

@Ilan12 you won't be able to do better with pandas; you can't vectorize here.

Ilan12 Over a year ago

Maybe with apply it would be faster than with a regular loop.

mozway Over a year ago

No, regular loops are most often faster than apply ;) see here or here for example.
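
One way to check the loop-versus-apply claim on your own machine is a small timing sketch like the one below. The helper names with_loop and with_apply and the enlarged test series are assumptions made for illustration, not part of the original answer, and the timings will vary with the pandas version and hardware:

```python
import timeit

import pandas as pd

# Hypothetical larger input: the question's four strings repeated many times.
series = pd.Series(['ab', 'abcd', 'bcc', 'abc'] * 25_000, name='text')
main_text = 'abcX'


def with_loop(s, text):
    # List comprehension producing a boolean list used as a mask.
    return s[[e in text for e in s]]


def with_apply(s, text):
    # Series.apply calling the same substring test element by element.
    return s[s.apply(lambda e: e in text)]


# Both approaches should select exactly the same rows.
assert with_loop(series, main_text).equals(with_apply(series, main_text))

loop_time = timeit.timeit(lambda: with_loop(series, main_text), number=5)
apply_time = timeit.timeit(lambda: with_apply(series, main_text), number=5)
print(f"list comprehension: {loop_time:.3f}s, apply: {apply_time:.3f}s")
```

Neither variant is vectorized in the NumPy sense; both call Python's in once per element, so any difference comes from per-call overhead rather than from the substring test itself.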
