Is there any mechanism in Python for combining compiled regular expressions?
I know it's possible to compile a new expression by extracting the plain-old-string .pattern property from existing pattern objects. But this fails in several ways. For example:
import re
first = re.compile(r"(hello?\s*)")
# one-two-three or one/two/three - but not one-two/three or one/two-three
second = re.compile(r"one(?P<r1>[-/])two(?P=r1)three", re.IGNORECASE)
# Incorrect - back-reference \1 would refer to the wrong capturing group now,
# and we get an error "redefinition of group name 'r1' as group 3; was
# group 2 at position 47" for the `(?P)` group.
# Result is also now case-sensitive, unlike 'second' which is IGNORECASE
both = re.compile(first.pattern + second.pattern + second.pattern)
The result I'm looking for is achievable like so in Perl:
$first = qr{(hello?\s*)};
# one-two-three or one/two/three - but not one-two/three or one/two-three
$second = qr{one([-/])two\g{-1}three}i;
$both = qr{$first$second$second};
A test shows the results:
test($second, "...one-two-three..."); # Matches
test($both, "...hello one-two-THREEone-two-three..."); # Matches
test($both, "...hellone/Two/ThreeONE-TWO-THREE..."); # Matches
test($both, "...HELLO one/Two/ThreeONE-TWO-THREE..."); # No match
sub test {
my ($pat, $str) = @_;
print $str =~ $pat ? "Matches\n" : "No match\n";
}
Is there a library somewhere that makes this use case possible in Python? Or a built-in feature I'm missing somewhere?
(Note - one very useful feature in the Perl regex above is \g{-1}, which unambiguously refers to the immediately preceding capture group, so that there are no collisions of the type that Python is complaining about when I try to compile the combined expression. I haven't seen that anywhere in Python world, not sure if there's an alternative I haven't thought of.)
firstregex shouldn't be required here to reproduce the problem, right? (Or maybe it causes the issue with the case sensitivity?)'([0-9]+)one(?P<r1>[-/])two(?P=r1)threeone(?P<r1>[-/])two(?P=r1)three'which is likely not what you want. Adding the patterns simply concatenates them as strings. Perhaps you want to join them with '|'?(?i)in the pattern itself.(?i), I would definitely have tripped on that.