
What I am trying to do:

Given a Series of strings, I want to find, in a vectorized manner, the indexes of all strings that are substrings of another main string.

The Input:

import pandas as pd

series = pd.Series(['ab', 'abcd', 'bcc', 'abc'], name='text')
main_text = 'abcX'

# The series:
0      ab
1    abcd
2     bcc
3     abc
Name: text, dtype: object

The desired output:

0      ab
3     abc
Name: text, dtype: object

What I tried:

df_test = pd.DataFrame(series)
df_test['text2'] = main_text
df_test['text'].isin(df_test)

# And this of course won't work, since it checks whether the main string is a
# substring of the series strings (the opposite direction):
series.str.contains(main_text, regex=True)
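To see why neither attempt works, here is a small sketch comparing a simplified form of the isin attempt, the str.contains call (with regex=False so main_text is matched literally), and the direction actually wanted. The names exact, reversed_check and wanted are illustrative only.

```python
import pandas as pd

series = pd.Series(['ab', 'abcd', 'bcc', 'abc'], name='text')
main_text = 'abcX'

# isin() tests exact equality against the given values, not substring
# containment, so no element equals 'abcX'.
exact = series.isin([main_text])

# str.contains() asks "does each element contain main_text?", which is the
# reverse of the question; regex=False treats main_text as a literal string.
reversed_check = series.str.contains(main_text, regex=False)

# The wanted direction is "is each element contained in main_text?".
wanted = series.map(lambda s: s in main_text)

print(exact.tolist())           # [False, False, False, False]
print(reversed_check.tolist())  # [False, False, False, False]
print(wanted.tolist())          # [True, False, False, True]
```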

Thanks!

1 Answer


You don't need a regex; simply use Python's in operator:

series[[e in main_text for e in series]]

output:

0     ab
3    abc
Name: text, dtype: object
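A self-contained version of the same idea, which also extracts the index labels the question asks for (variable names are illustrative):

```python
import pandas as pd

series = pd.Series(['ab', 'abcd', 'bcc', 'abc'], name='text')
main_text = 'abcX'

# Build a boolean mask with Python's `in` operator (a substring test),
# one element at a time.
mask = [s in main_text for s in series]

# Boolean indexing keeps the matching rows; .index gives their labels.
matches = series[mask]
matching_indexes = matches.index.tolist()

print(matches)
print(matching_indexes)  # [0, 3]
```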

4 Comments

Yes, but this is not efficient; I want to do it in a vectorized manner.
@Ilan12 You won't be able to do better with pandas; this check can't be vectorized.
Maybe using apply would be faster than a regular loop?
No, regular loops are most often faster than apply ;) see here or here for example.
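One possible workaround, not suggested in the thread and assuming main_text is short: a string is a substring of main_text exactly when it equals some slice main_text[i:j], so all those slices can be collected into a set once and the membership test handed to Series.isin, which uses pandas' own hash-based lookup instead of a Python-level loop over the Series. The helper all_substrings is hypothetical. Building the set takes O(n²) slices for a main string of length n, so this only pays off when main_text is short relative to the Series; whether it is actually faster should be measured on your data.

```python
import pandas as pd


def all_substrings(text: str) -> set:
    """Hypothetical helper: every contiguous substring of text, including ''."""
    n = len(text)
    return {text[i:j] for i in range(n + 1) for j in range(i, n + 1)}


series = pd.Series(['ab', 'abcd', 'bcc', 'abc'], name='text')
main_text = 'abcX'

# Precompute the substrings once, then let isin() do the per-element lookup.
mask = series.isin(all_substrings(main_text))
result = series[mask]

# Sanity check: same rows as the plain loop from the answer.
loop_result = series[[e in main_text for e in series]]
assert result.equals(loop_result)

print(result.index.tolist())  # [0, 3]
```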
