9

The following regex is supposed to match any :text: that is preceded by start-of-string, whitespace, or :, and followed by end-of-string, whitespace, or : (along with a few extra rules).

I'm not great at regex but I've come up with the desired solution in regexr.com:

(?<=\s|:|^)(:[^\s|:]+:)(?=\s|:|$)
:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:

Result: :match1:, :match2:, :match3:, :match4:

But on Python 3 this raises an error.

re.search(r"(?<=\s|:|^)(:[^\s|:]+:)(?=\s|:|$)", txt)

re.error: look-behind requires fixed-width pattern
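
The error can be reproduced with the standard library alone. The sketch below assumes the cause is that the alternatives inside the lookbehind differ in width (`\s` and `:` consume one character, `^` consumes none), which the built-in `re` module rejects when compiling. The helper name `compile_error` and the trailing `x` in the test patterns are hypothetical, used only for illustration.

```python
import re


def compile_error(pattern):
    """Return the re.error message for pattern, or None if it compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# Alternatives of different widths inside one lookbehind: rejected by re.
mixed = compile_error(r"(?<=\s|:|^)x")

# Every alternative is exactly one character wide: accepted by re.
fixed = compile_error(r"(?<=\s|:)x")

print(mixed)
print(fixed)
```

If that assumption holds, only the first pattern should report an error, which suggests the `^` alternative is what needs to move out of the lookbehind.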

Anyone know a good workaround for this issue? Any tips are appreciated.


3 Answers 3

13

Possibly the easiest solution is to use the third-party regex module (available on PyPI), which supports variable-width lookbehinds:

import regex as re

data = """:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:"""

for match in re.finditer(r"(?<=\s|:|^)(:[^\s|:]+:)(?=\s|:|$)", data):
    print(match.group(0))

This yields

:match1:
:match2:
:match3:
:match4:

3 Comments

This is definitely easier than splitting up look-behinds. If you can use a different library, of course.
This just saved me from my hell-long pattern to break :)
I was using pandas str.findall and it apparently uses re under the bonnet. Had to use regex with .apply instead, but it works now.
7

In Python, you can use this workaround to avoid the error:

(?:^|(?<=[\s:]))(:[^\s:]+:)(?=[\s:]|$)

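Anchors ^ and $ are zero-width assertions anyway, so ^ can be moved out of the lookbehind into a plain alternation. What remains inside the lookbehind, [\s:], always has a width of exactly one character, which the re module accepts.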
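
As a quick check, here is a minimal sketch that runs this pattern with the built-in `re` module against the sample text from the question. The second pattern, `negated`, is not part of the answer above: it is an assumed alternative that expresses the same boundaries with negative lookarounds ("not preceded or followed by a character that is neither whitespace nor a colon"), which also holds at the start and end of the string.

```python
import re

data = """:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:"""

# Pattern from the answer: ^ sits outside the lookbehind, so the
# lookbehind itself is always exactly one character wide.
anchored = re.compile(r"(?:^|(?<=[\s:]))(:[^\s:]+:)(?=[\s:]|$)")

# Assumed alternative (not from the answer): negative lookarounds
# that succeed at the string edges without needing ^ or $.
negated = re.compile(r"(?<![^\s:])(:[^\s:]+:)(?![^\s:])")

# With a single capture group, findall returns the group's text.
matches = anchored.findall(data)
alt_matches = negated.findall(data)

print(matches)
print(alt_matches)
```

Both patterns stay within what `re` supports, so no extra dependency is needed; which one reads better is largely a matter of taste.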


1

Another option would be to install regex:

$ pip3 install regex

Then we can write an expression that uses (*SKIP)(*FAIL) to discard the patterns we don't want:

import regex as re

expression = r'(?:^\d+:[^:\r\n]+:$|^:[^:\r\n]+:\d+$|^(?!.*:\b\S+\b:).*$)(*SKIP)(*FAIL)|:[a-z0-9]+:'
string = '''
:match1::match2: :match3:
:match4:
000:matchNot:
:matchNot:000
:match Not:

'''

print(re.findall(expression, string))

Output

[':match1:', ':match2:', ':match3:', ':match4:']

If you wish to simplify, modify, or explore the expression, paste it into regex101.com: the top right panel explains each part, and you can see how it matches against sample inputs.


