How to find a phrase that is NOT at a word boundary from a string in Regex?

Question

Okay so this may be just googling wrong or not reading documentation correctly, but I couldn't find anything on this.

Say I have:

sample_str = "rose aaron robert moro"
pat = 'ro'

I want to find all instances of words (preferably using re.search()) which DON'T end OR begin with 'ro'. That is, I want one or more characters before and after 'ro'. So I would want 'aaron' to match, but none of the other words in sample_str.

How would I do this? I tried a bunch of things, including '+ro+', but it gave me an error. I am not new to Python but have some trouble with regex, so if anyone could explain, that would be great.

Thanks

regex101.com

– Tyler Cowan, Commented Jan 25, 2018 at 23:15
Is regex required? How about str methods?

– pylang, Commented Jan 26, 2018 at 4:37

hoipolloi · Accepted Answer · 2018-01-25 23:32:48Z

6

I believe you can use a negative look-ahead/look-behind for this.

\b(?!ro)\w+(?<!ro)\b

When applied to rose aaron robert moro, it matches only aaron.

Explanation

\b = a word boundary
(?!ro) = not followed by ro
\w+ = one or more word characters
(?<!ro)\b = another word boundary, not preceded by ro
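In Python, the pattern above can be applied roughly as follows. This is a minimal sketch: building the pattern from `pat` with `re.escape` is an assumption added here so that the same idea works for patterns containing regex metacharacters, and the variable names are only illustrative.

```python
import re

sample_str = "rose aaron robert moro"
pat = "ro"

# Build the pattern from `pat`; re.escape guards against regex metacharacters.
p = re.escape(pat)
pattern = re.compile(rf"\b(?!{p})\w+(?<!{p})\b")

# findall returns every matching word; search returns only the first match.
all_words = pattern.findall(sample_str)
first = pattern.search(sample_str)

print(all_words)       # ['aaron']
print(first.group(0))  # aaron
```

Since re.search() stops at the first hit, re.findall() or re.finditer() is usually the better fit when several words may qualify.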

Working Example

https://regex101.com/r/WcSlsx/2/

edited Jan 25, 2018 at 23:32

answered Jan 25, 2018 at 23:15

hoipolloi


1 Comment

LearningCoding Over a year ago

I think this is the simplest and best answer. Thanks. Also, so '\b' indicates a word boundary? You can use it to mark the beginning and end of a word?

pylang · Answer · 2018-01-26 04:48:55Z

2

This problem is simple enough to use str methods. For a non-regex approach:

[x for x in sample_str.split() if (not x.startswith(pat)) and (not x.endswith(pat))]
# ['aaron']

Note: this keeps every word that neither starts nor ends with the pattern, including words that do not contain the pattern at all. To keep only words that also contain the pattern, try this:

sample_str = "rose aaron robert moro nopattern"
pat = "ro"

[x for x in sample_str.split() if (not x.startswith(pat)) and (not x.endswith(pat)) and (pat in x)]
# ['aaron']
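For comparison, the same stricter condition (the word must contain the pattern, with at least one character on each side) can be sketched as a regex. The pattern below is an illustrative variant of the look-around approach, and the function name `inner_words` is hypothetical.

```python
import re

def inner_words(text, pat):
    """Return words that contain `pat` but neither start nor end with it."""
    p = re.escape(pat)
    # \w+{p}\w+ requires at least one word character on each side of pat;
    # the look-arounds reject words that start or end with pat.
    pattern = rf"\b(?!{p})\w+{p}\w+(?<!{p})\b"
    return re.findall(pattern, text)

sample_str = "rose aaron robert moro nopattern"
print(inner_words(sample_str, "ro"))  # ['aaron']
```

One difference to keep in mind: str.split() splits on whitespace only, while \w+ stops at punctuation, so the two approaches can disagree on text such as "aaron," where a comma is attached to the word.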

answered Jan 26, 2018 at 4:48

pylang


Olivier Melançon · Answer · 2018-01-26 13:33:30Z

1

I believe hoipolloi has the best answer using look-ahead/look-behind. That said, I spent enough time trying to figure out how to do this specific case without using extended regexp syntax that I would be disappointed not to share it. Here is the pattern I came up with.

r'(?:\b)((?:[^r\s]|(r[^o\s]))\S*(?:([^r\s]o)|[^o\s])|\w|(?:[^r]\s\w)|(?:\w[^o\s]))(?:\b)'

You can then use re.findall to find all occurrences of the pattern.

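import re

sample_str = "rose aaron robert moro"
pattern = r'(?:\b)((?:[^r\s]|(r[^o\s]))\S*(?:([^r\s]o)|[^o\s])|\w|(?:[^r]\s\w)|(?:\w[^o\s]))(?:\b)'
# The pattern contains capturing groups, so re.findall returns a list of tuples;
# the first element of each tuple is the full matched text.
matches = re.findall(pattern, sample_str)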

This is overly complex, impossible to generalize and very ugly. But hey, it was fun.

edited Jan 26, 2018 at 13:33

answered Jan 26, 2018 at 3:58

Olivier Melançon
