
I want to match:

first second

and

second first

so the regular expression:

re.match(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))', 'first second')

matches, but this one:

re.match(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))', 'second first')

does not match. Is this a bug in how backreferences work inside an A|B alternation?

  • When in doubt, do not blame the regular expression engine; it is rarely a bug in the engine. Commented Apr 10, 2014 at 15:05
  • That's fine, of course, but if you try it, it doesn't work; the documentation doesn't say anything about this, and the syntax looks right to me. So if you find the problem somewhere else, you're right. Commented Apr 10, 2014 at 15:07
  • There is more information about backreferences in the "Groups" section of the Stack Overflow Regular Expressions FAQ. Commented Apr 10, 2014 at 15:11
  • @aliteralmind: the FAQ doesn't (yet) have any proper references for backreferences; the best that's there now is a specific phone-number pattern that doesn't say much about how they work. Commented Apr 10, 2014 at 15:16
  • Hm. Okay. Then perhaps I'll look around for better ones, and please let me know if you find (or create! :) any...or feel free to add them yourself. Commented Apr 10, 2014 at 15:24

2 Answers

You've misunderstood how backreferences work. For a backreference to match anything, the group it refers to must have matched too.

In your second example, the (?P<f>first) group didn't match anything, so the (?P=f) backreference cannot match anything either.
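A short reproduction of the behaviour described above, run against Python's re module (a sketch; the tiny (a)|b pattern is an illustrative assumption, not from the question). The question's pattern fails on 'second first', and the smaller pattern shows that a backreference to a group that never took part in the match simply fails rather than matching an empty string.

```python
import re

# The question's pattern: the second alternative consists only of backreferences.
pattern = re.compile(r'(?:(?P<f>first) (?P<s>second)|(?P=s) (?P=f))')

ok = pattern.match('first second')    # matches via the first alternative
bad = pattern.match('second first')   # None: groups f and s never captured anything
print(ok.group() if ok else None)
print(bad)

# Minimal illustration: group 1 only captures when the (a) branch is taken.
unset_ref = re.match(r'(?:(a)|b)\1', 'b')   # b branch taken, \1 refers to an unset group -> fails
set_ref = re.match(r'(?:(a)|b)\1', 'aa')    # (a) captured 'a', so \1 matches the second 'a'
print(unset_ref)
print(set_ref.group() if set_ref else None)
```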

Backreferences are the wrong tool here; you'll have to repeat at least one of your groups literally:

r'(?:(?P<f>first )?(?P<s>second)(?(f)| first))'

would use a conditional pattern, (?(f)...|...), that matches " first" after "second" only if the f group did not match before "second":

>>> import re
>>> pattern = re.compile(r'(?:(?P<f>first )?(?P<s>second)(?(f)$| first))')
>>> pattern.match('first second').group()
'first second'
>>> pattern.match('second first').group()
'second first'
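As an extra check (not part of the original answer), the same conditional pattern can be run against strings that are neither order; the sample inputs below are illustrative assumptions.

```python
import re

# Same pattern as in the session above: if group f ("first ") matched, the
# conditional requires end of string; otherwise it requires " first" after "second".
pattern = re.compile(r'(?:(?P<f>first )?(?P<s>second)(?(f)$| first))')

results = {}
for text in ['first second', 'second first', 'second second', 'first second first']:
    m = pattern.match(text)
    results[text] = m.group() if m else None
    print(repr(text), '->', results[text])
```

Note that match() only anchors at the start of the string: in the no-f branch, trailing text such as 'second first!' still matches the 'second first' prefix, so re.fullmatch (Python 3.4+) is the stricter choice if the whole string must be one of the two orders.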

  • Thank you, I understand. Do you know a way to do that (first second|second first) without copy-pasting parts of the regex?
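Neither answer generates the alternation automatically. One option, sketched here with a hypothetical helper any_order (not part of re), is to build the (first second|second first) alternation in Python with itertools.permutations, so nothing is copied by hand. It assumes the words are separated by a single fixed separator.

```python
import itertools
import re


def any_order(words, sep=' '):
    """Hypothetical helper: compile a regex matching every word in `words`
    exactly once, in any order, joined by `sep`.

    The alternation has len(words)! branches, so this only suits a few words.
    """
    branches = (
        re.escape(sep).join(re.escape(word) for word in perm)
        for perm in itertools.permutations(words)
    )
    return re.compile('(?:' + '|'.join(branches) + ')')


pattern = any_order(['first', 'second'])
print(pattern.pattern)
print(bool(pattern.fullmatch('first second')))
print(bool(pattern.fullmatch('second first')))
print(bool(pattern.fullmatch('first first')))
```

re.escape keeps words containing regex metacharacters literal; the factorial growth is the trade-off for not writing the branches by hand.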

How about:

(?=.*(?P<f>first))(?=.*(?P<s>second))

(?=...) is a positive lookahead: it asserts that the word first is present somewhere in the string without making it part of the match (it's a zero-width assertion). The same applies to second.

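This regex matches if first and second both occur in the string, in any order (on a single line, since . does not match a newline by default); it does not require them to be adjacent.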
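A small sketch of the lookahead approach in Python's re; the test strings are illustrative assumptions. The overall match is empty (both assertions are zero-width), while the named groups record where each word was found. The 'second, then first' case shows that the words need not be adjacent.

```python
import re

# Two zero-width lookaheads, each scanning ahead with .* for one word.
pattern = re.compile(r'(?=.*(?P<f>first))(?=.*(?P<s>second))')

for text in ['first second', 'second first', 'second, then first', 'first only']:
    m = pattern.match(text)
    # The match itself is empty; the group spans locate each word.
    print(repr(text), '->', (m.span('f'), m.span('s')) if m else None)
```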
