How can I combine the groups of p1 and p2 in the following code?

import re

txt = "Sab11Mba11"
p1 = "(S(a|b)(a|b))"
p2 = "(M(a|b)(a|b))"
px = "(" + p1 + '|' + p2 + ")"

print(re.findall(p1, txt)) # [('Sab', 'a', 'b')]
print(re.findall(p2, txt)) # [('Mba', 'b', 'a')]
print(re.findall(px, txt)) # [('Sab', 'Sab', 'a', 'b', '', '', ''), ('Mba', '', '', '', 'Mba', 'b', 'a')]

Can you please explain why I get empty strings, and how I can get [('Sab', 'a', 'b'), ('Mba', 'b', 'a')] instead?

2 Answers

re.findall returns one value for every capturing group in the pattern, so groups that did not participate in a match still appear in the output, as empty strings.
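
To see where the empty strings come from, here is a minimal sketch that compares re.finditer with re.findall on the same alternation. The names raw_groups and found are only illustrative and are not part of the original code.

```python
import re

txt = "Sab11Mba11"
# Groups 1-3 belong to the S branch, groups 4-6 to the M branch.
px = "(S(a|b)(a|b))|(M(a|b)(a|b))"

# Match.groups() reports groups that did not participate as None ...
raw_groups = [m.groups() for m in re.finditer(px, txt)]
print(raw_groups)

# ... while re.findall() reports those same groups as empty strings.
found = re.findall(px, txt)
print(found)
```

Each alternative keeps its own group numbers, so every result tuple has six slots, and only the three that belong to the matching branch are filled.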

You need to remove the outer parentheses and filter the empty values out of the resulting tuples:

import re

txt = "Sab11Mba11"
p1 = "(S(a|b)(a|b))"
p2 = "(M(a|b)(a|b))"
px = p1 + '|' + p2
print([tuple(filter(lambda m: m != '', x)) for x in re.findall(px, txt)])
# => [('Sab', 'a', 'b'), ('Mba', 'b', 'a')]

4 Comments

Should I use the same approach (filtering out the empty values) with sub? For example:

def repl(x):
    c = list(filter(lambda m: m is not None, x.groups()))
    return c[2] + c[1]

re.sub(px, repl, txt)

@OlegDats If that is what you need, why not?
I hoped to get the result without needing to filter. Performance is important in my case. My real task involves many more cases and longer strings, so filtering every time is not efficient.
@OlegDats There is no other way here; I explained why at the top of the answer.
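
For the sub case raised in the comments, a minimal runnable sketch might look like the following. It assumes the pattern without the outer parentheses (as in the answer) and that the goal is to swap the two inner captures, as the commenter's c[2] + c[1] suggests. The names captured and result are illustrative.

```python
import re

txt = "Sab11Mba11"
px = "(S(a|b)(a|b))|(M(a|b)(a|b))"

def repl(match):
    # In a match object, groups of the branch that did not match are None,
    # so keep only the groups that actually captured something.
    captured = [g for g in match.groups() if g is not None]
    # captured is [whole match, first inner group, second inner group];
    # return the two inner captures in swapped order.
    return captured[2] + captured[1]

result = re.sub(px, repl, txt)
print(result)
```

Note that the callback receives a match object, where unused groups are None rather than the empty strings that re.findall returns.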

You can try a branch reset group. It requires the third-party regex module from PyPI instead of the built-in re:

import regex as re

txt = 'Sab11Mba11'
p1 = r'(S(a|b)(a|b))'
p2 = r'(M(a|b)(a|b))'

px = r'(?|' + p1 + '|' + p2 + ')'
print(re.findall(px, txt))

Prints:

[('Sab', 'a', 'b'), ('Mba', 'b', 'a')]

Inside a branch reset group, every alternative restarts its group numbering at the same index, so both branches here fill groups 1 to 3.

In general, remember to use raw-string notation when working with regular expressions, assuming S, a and b are placeholders for other constructs. Also note that you don't need px at all if you use an f-string. For example:

re.findall(fr'(?|{p1}|{p2})', txt)
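
As a small illustration of both points, the sketch below uses only the built-in re module, so it uses a plain alternation rather than a branch reset group (which re does not support). The names plain_len, raw_len and combined are illustrative.

```python
import re

# Without the r prefix, "\b" is a single backspace character; with it,
# the regex engine receives the two characters \ and b (a word boundary).
plain_len = len("\b")
raw_len = len(r"\b")
print(plain_len, raw_len)

txt = "Sab11Mba11"
p1 = r"(S(a|b)(a|b))"
p2 = r"(M(a|b)(a|b))"

# The fr prefix combines raw-string and f-string behaviour,
# so the pattern can be assembled inline without a helper variable.
combined = re.findall(fr"{p1}|{p2}", txt)
print(combined)
```

One thing to watch with f-string patterns: literal braces, such as the quantifier {2}, have to be doubled ({{2}}) so they are not read as replacement fields.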
