
Okay, so I may just be googling wrong or not reading the documentation correctly, but I couldn't find anything on this.

Say I have:

sample_str = "rose aaron robert moro"
pat = 'ro'

I want to find all words (preferably using re.search()) that contain 'ro' but DON'T begin OR end with it. That is, I want one or more characters before and after 'ro'. So I would want 'aaron' to match, but none of the other words in sample_str.

How would I do this? I tried a bunch of things, including '+ro+', but it gave me an error. I am not new to Python but have some trouble with regex, so if anyone could explain, that would be great.

Thanks

  • regex101.com Commented Jan 25, 2018 at 23:15
  • Is regex required? How about str methods? Commented Jan 26, 2018 at 4:37

3 Answers

I believe you can use a negative look-ahead/look-behind for this.

\b(?!ro)\w+(?<!ro)\b

When applied to "rose aaron robert moro", it matches only aaron.

Explanation

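\b = a word boundary (here, the start of a word)
(?!ro) = negative look-ahead: the word must not start with ro
\w+ = one or more word characters
(?<!ro)\b = negative look-behind, then another word boundary: the word must not end with ro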
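
As a minimal sketch of how this could be used from Python, the snippet below builds the same pattern from the question's pat variable. The name regex and the use of re.escape are my own additions, not part of the answer above. Note that a look-behind in Python's re module needs a fixed-width pattern, which a literal string such as 'ro' satisfies.

```python
import re

sample_str = "rose aaron robert moro"
pat = "ro"

# Build \b(?!ro)\w+(?<!ro)\b from `pat`. re.escape keeps any regex
# metacharacters in `pat` literal (an assumption about how `pat` is used).
escaped = re.escape(pat)
regex = re.compile(rf"\b(?!{escaped})\w+(?<!{escaped})\b")

# findall returns every matching word.
print(regex.findall(sample_str))  # ['aaron']

# search returns only the first match (or None if there is none).
first = regex.search(sample_str)
if first:
    print(first.group(0))  # aaron
```

Like the pattern above, this only excludes words that start or end with pat; on its own it does not require the word to contain pat.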

Working Example

https://regex101.com/r/WcSlsx/2/


1 Comment

I think this is the simplest and best answer. Thanks. Also, does '\b' indicate a word boundary? Can you use it to mark the beginning and end of a word?
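
On the '\b' question, a small sketch may help. \b matches the empty position between a word character and a non-word character (or the start or end of the string), and \B matches every other position. The variable names below are illustrative only.

```python
import re

sample_str = "rose aaron robert moro"

# \b anchors "ro" to the edge of a word; \B requires that it is NOT at an edge.
starts = re.findall(r"\bro\w*", sample_str)      # words beginning with "ro"
ends = re.findall(r"\w*ro\b", sample_str)        # words ending with "ro"
inner = re.findall(r"\w*\Bro\B\w*", sample_str)  # "ro" strictly inside a word

print(starts)  # ['rose', 'robert']
print(ends)    # ['moro']
print(inner)   # ['aaron']
```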

This problem is simple enough to use str methods. For a non-regex approach:

[x for x in sample_str.split() if (not x.startswith(pat)) and (not x.endswith(pat))]
# ['aaron']

Note: this will include any word that neither starts nor ends with the pattern, even one that does not contain it at all. If you want to restrict the result to words that also contain the pattern, try this:

sample_str = "rose aaron robert moro nopattern"
pat = "ro"

[x for x in sample_str.split() if (not x.startswith(pat)) and (not x.endswith(pat)) and (pat in x)]
# ['aaron']


I believe hoipolloi has the best answer using look-ahead/look-behind. That said, I spent a considerable amount of time trying to figure out how to do this specific case without look-arounds, enough that I would be disappointed not to share it. Here is the pattern I came up with.

r'(?:\b)((?:[^r\s]|(r[^o\s]))\S*(?:([^r\s]o)|[^o\s])|\w|(?:[^r]\s\w)|(?:\w[^o\s]))(?:\b)'

You can then use re.findall to find all occurrences of the pattern.

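import re

sample_str = "rose aaron robert moro"
pattern = r'(?:\b)((?:[^r\s]|(r[^o\s]))\S*(?:([^r\s]o)|[^o\s])|\w|(?:[^r]\s\w)|(?:\w[^o\s]))(?:\b)'
# The pattern has several capturing groups, so findall returns a list of
# tuples (one tuple of groups per match) rather than a list of strings.
matches = re.findall(pattern, sample_str)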

This is overly complex, impossible to generalize and very ugly. But hey, it was fun.
