I'm matching multiple regex patterns against very long strings (up to 10e8 characters). Is there a way to know which of my regex patterns produced each match in Python? Or should I run a separate regex search for each pattern instead?

import re

pat = re.compile('C[GT]GG|A[AT]TA|T[TG]TA')
for m in pat.finditer(longString):
    print(m.start(), m.end())
    # how to know which pat matched?

2 Answers

You can use m.group() to see the text that the regex matched in your input:

>>> pat = re.compile('pat1|pat2|pat3')  # assumption: simple literal alternatives
>>> for m in pat.finditer('pat290'):
...     print(m.start(), m.end(), m.group())
...
0 4 pat2

>>> for m in pat.finditer('pat3789'):
...     print(m.start(), m.end(), m.group())
...
0 4 pat3

>>> for m in pat.finditer('some-pat1234567'):
...     print(m.start(), m.end(), m.group())
...
5 9 pat1

2 Comments

Thanks anubhava. I actually know about m.group(), but my patterns are a bit more complicated than 'pat1'; they may be something more like 'A[AC][GC]T'. I have now specified this in the question.
Hmm, I don't think you can get C[GT]GG from the match results.

You can use the lastindex (or lastgroup) attribute of re match objects, which reports the index (or group name) of the last matched group.

However, you have to modify your regular expression so that each subexpression becomes a capturing group, by enclosing each one in parentheses:

pat = re.compile('(C[GT]GG)|(A[AT]TA)|(T[TG]TA)')
for m in pat.finditer(longString):
    print(m.start(), m.end(), 'group index:', m.lastindex)
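To get back to the sub-pattern text itself rather than just an index, one option is to keep the sub-patterns in a list and use lastindex to look them up. The following is only a sketch: the names SUBPATTERNS, find_with_pattern and sample are made up for illustration, and it assumes that none of the sub-patterns contains capturing groups of its own (nested groups would shift the indices).

```python
import re

# The sub-patterns from the question, kept in a list so that
# group index i (1-based) corresponds to SUBPATTERNS[i - 1].
SUBPATTERNS = ['C[GT]GG', 'A[AT]TA', 'T[TG]TA']

# Wrap each sub-pattern in its own capturing group and join with '|'.
pat = re.compile('|'.join('(%s)' % p for p in SUBPATTERNS))


def find_with_pattern(text):
    """Return (start, end, sub-pattern) for each match in text."""
    return [(m.start(), m.end(), SUBPATTERNS[m.lastindex - 1])
            for m in pat.finditer(text)]


sample = 'CGGGxxAATAxxTGTA'  # hypothetical short input
for start, end, sub in find_with_pattern(sample):
    print(start, end, sub)
```

Note that finditer reports non-overlapping matches, and when several alternatives could match at the same position the leftmost alternative in the pattern wins.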

If you prefer symbolic names (which improve readability), the pattern syntax is a little more verbose:

pat = re.compile('(?P<C_GT_GG>C[GT]GG)|(?P<A_AT_TA>A[AT]TA)|(?P<T_TG_TA>T[TG]TA)')
for m in pat.finditer(longString):
    print(m.start(), m.end(), 'group index:', m.lastindex, 'group name:', m.lastgroup)
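With many sub-patterns, writing the named groups by hand gets error-prone, so the combined pattern could be built from a dictionary instead. This is a rough sketch under a few assumptions: the labels in NAMED, the helper count_by_pattern and the sample string are all hypothetical, each label must be a valid Python identifier, and the sub-patterns are assumed to contain no capturing groups of their own.

```python
import re
from collections import Counter

# Hypothetical labels mapped to the question's sub-patterns.
NAMED = {
    'C_GT_GG': 'C[GT]GG',
    'A_AT_TA': 'A[AT]TA',
    'T_TG_TA': 'T[TG]TA',
}

# Build '(?P<name>sub)|(?P<name>sub)|...' from the dictionary.
pat = re.compile('|'.join('(?P<%s>%s)' % (name, sub)
                          for name, sub in NAMED.items()))


def count_by_pattern(text):
    """Count matches per group name in a single pass over text."""
    return Counter(m.lastgroup for m in pat.finditer(text))


sample = 'CTGGAATACGGGTTTA'  # hypothetical short input
counts = count_by_pattern(sample)
for name in NAMED:
    print(name, NAMED[name], counts[name])
```

Because everything is matched in one pass, the long string is scanned only once, and m.lastgroup tells you which labelled sub-pattern produced each match.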
