15

I have Python regex objects, say re_first and re_second, that I would like to concatenate.

import re
FLAGS_TO_USE = re.VERBOSE | re.IGNORECASE
re_first = re.compile( r"""abc #Some comments here """, FLAGS_TO_USE )
re_second = re.compile( r"""def #More comments here """, FLAGS_TO_USE )

I want a single regex that matches either of the patterns above. So far, I have:

pattern_combined = re_first.pattern + '|' + re_second.pattern
re_combined = re.compile( pattern_combined, FLAGS_TO_USE ) 

This doesn't scale well as the number of regex objects grows. I end up with something looking like:

pattern_combined = '|'.join( [ first.pattern, second.pattern, third.pattern, etc ] )

The point is that the list to concatenate can be very long. Any ideas how to avoid this mess? Thanks in advance.

2
  • 1
    Why would you want to compile them individually before concatenating? Commented Feb 28, 2014 at 18:22
  • 2
    @1_CR That's because that's how they are given; they are inputs. To get the string patterns individually, I would have to do even more gruesome acrobatics. Commented Feb 28, 2014 at 19:01

3 Answers

20

I don't think you will find a solution that doesn't involve creating a list with the regex objects first. I would do it this way:

# create patterns here...
re_first = re.compile(...)
re_second = re.compile(...)
re_third = re.compile(...)

# create a list with them
regexes = [re_first, re_second, re_third]

# create the combined one
pattern_combined = '|'.join(x.pattern for x in regexes)

Of course, you can also do the opposite: Combine the patterns and then compile, like this:

pattern1 = r'pattern-1'
pattern2 = r'pattern-2'
pattern3 = r'pattern-3'

patterns = [pattern1, pattern2, pattern3]

compiled_combined = re.compile('|'.join(patterns), FLAGS_TO_USE)
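
One caveat if the patterns were written for re.VERBOSE, as in the question: a trailing # comment runs to the end of the line, so a bare | join puts the | (and everything after it) inside that comment. Below is a minimal sketch of one way around it, assuming the inputs are compiled objects as in the question. combine_regexes is a hypothetical helper name, not part of the re module:

```python
import re

FLAGS_TO_USE = re.VERBOSE | re.IGNORECASE

re_first = re.compile(r"""abc #Some comments here """, FLAGS_TO_USE)
re_second = re.compile(r"""def #More comments here """, FLAGS_TO_USE)


def combine_regexes(regexes, flags):
    """Hypothetical helper: OR together the patterns of compiled regexes."""
    # Each pattern goes in its own non-capturing group. The newline before
    # the closing parenthesis ends any trailing verbose-mode comment.
    parts = ["(?:" + r.pattern + "\n)" for r in regexes]
    return re.compile("|".join(parts), flags)


# Naive join: the | ends up inside the first pattern's comment.
re_naive = re.compile(
    "|".join(r.pattern for r in [re_first, re_second]), FLAGS_TO_USE
)

# Grouped join: both alternatives survive.
re_combined = combine_regexes([re_first, re_second], FLAGS_TO_USE)
```

Here re_naive only ever matches abc, while re_combined matches either pattern. Numbered groups and backreferences inside the sub-patterns would be renumbered in the combined pattern, so this sketch assumes they are not used.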

4 Comments

You can't join patterns that contain verbose-mode comments with |: the | lands inside the trailing comment and is ignored. Join with \n| instead.
What if some of the sub-patterns contain the | operator themselves? Doesn't that cause a problem? For example, consider the case where pattern1 = r'[a-z]|[0-9]'.
@ShahryarSaljoughi Why would it be a problem? If both [a-z] and [0-9] are valid inputs for pattern 1 and your full pattern is pattern1|pattern2|pattern3, then substituting pattern1 with [a-z]|[0-9] is totally valid: [a-z]|[0-9]|pattern2|pattern3. And for any alternation that isn't at the top level, you need groups anyway.
@h345k34cr You're right, it completely makes sense now. I don't know what I was thinking back then.
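
To illustrate the point in the comments above about top-level versus nested alternation, here is a small sketch. pattern1 is the example from the comment; pattern2 and the trailing "!" are made up for illustration:

```python
import re

pattern1 = r"[a-z]|[0-9]"
pattern2 = r"_"  # hypothetical second pattern

# Top-level alternation: joining with | needs no extra grouping.
either = re.compile(pattern1 + "|" + pattern2)

# Sequencing is different: | binds loosest, so without a group the "!"
# attaches only to the last alternative, [0-9].
ungrouped = re.compile(pattern1 + "!")
grouped = re.compile("(?:" + pattern1 + ")!")
```

With these definitions, ungrouped accepts a lone letter but rejects a letter followed by "!", while grouped requires the "!" after either alternative.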
5

Toss the pattern strings (not the compiled objects) in a list, and then

'|'.join(your_list)


1

You can also concatenate raw strings directly, for example:

prep_re = r"\b" + r"\b|\b".join(prepositions) + r"\b"
re.findall(prep_re, paragraph, re.IGNORECASE)
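
A self-contained variant of the same idea, as a sketch: the sample prepositions and paragraph are made up for illustration. It assumes the list holds literal words rather than regex fragments, so each entry is passed through re.escape, and longer entries are tried first so that a multi-word preposition is not cut short by a shorter one:

```python
import re

# Sample data, made up for illustration.
prepositions = ["in", "on", "under", "in front of"]
paragraph = "The cat sat ON the mat, in front of the door, not under it."

# Escape each literal, then sort longest first so "in front of" is tried
# before "in".
alternatives = sorted(map(re.escape, prepositions), key=len, reverse=True)

# One pair of word boundaries around a single non-capturing group.
prep_re = r"\b(?:" + "|".join(alternatives) + r")\b"

found = re.findall(prep_re, paragraph, re.IGNORECASE)
```

Because the group is non-capturing, re.findall returns the full matched text for each hit.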
