
I have a question about regular expressions. When using the alternation ("or") construct:

$ python
Python 2.7.3 (default, Sep 26 2012, 21:51:14) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> for mo in re.finditer('a|ab', 'ab'):
...     print mo.start(0), mo.end(0)
... 
0 1

we get only one match. That is expected, since the leftmost match is reported using the first branch that succeeds. My question is whether it is possible, and if so how, to construct a regular expression that would yield both (0,1) and (0,2). And, more generally, how to do that for any regex of the form r1 | r2 | ... | rn.

Similarly, is it possible to achieve this for the *, + and ? constructs? By default:

>>> for mo in re.finditer('a*', 'aaa'):
...     print mo.start(0), mo.end(0)
... 
0 3
3 3
>>> for mo in re.finditer('a+', 'aaa'):
...     print mo.start(0), mo.end(0)
... 
0 3
>>> for mo in re.finditer('a?', 'aaa'):
...     print mo.start(0), mo.end(0)
... 
0 1
1 2
2 3
3 3
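Outside of a single regex, one way to get every span asked for above is brute force: try the pattern against every substring and keep the ones it matches completely. This is a minimal sketch, assuming Python 3.4+ (for `Pattern.fullmatch`, which the Python 2.7 used here lacks) and patterns without anchors or lookarounds, since those can behave differently when `pos`/`endpos` restrict the search. `all_matches` is a hypothetical helper name. It reports each distinct (start, end) span once, and it makes O(n²) regex calls, so it only suits short strings.

```python
import re

def all_matches(pattern, string):
    """Return every (start, end) span where `pattern` matches string[start:end].

    Hypothetical helper: brute force over all substrings. Assumes no anchors
    or lookarounds, whose behavior can change when pos/endpos are used.
    """
    regex = re.compile(pattern)
    spans = []
    for start in range(len(string) + 1):
        for end in range(start, len(string) + 1):
            # fullmatch(string, pos, endpos) succeeds only if the whole
            # slice string[start:end] is matched by the pattern
            if regex.fullmatch(string, start, end):
                spans.append((start, end))
    return spans

print(all_matches('a|ab', 'ab'))  # [(0, 1), (0, 2)]
print(all_matches('a+', 'aaa'))   # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(all_matches('a?', 'aaa'))   # [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3), (3, 3)]
```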

My second question: why do empty strings match at the end of the string, but nowhere else, in the * and ? cases?

EDIT:

I think I realize now that both questions were nonsense: as @mgilson said, re.finditer only returns non-overlapping matches, and I guess that whenever a regular expression accepts (part of) a string, it terminates the search. Thus, it is impossible with the default behavior of Python's matching engine.

Still, I wonder: if Python uses backtracking for regex matching, it should not be very difficult to make it continue searching after accepting a string. But that would break the usual behavior of regular expressions.

EDIT2:

This is possible in Perl; see the answer by @Qtax below.

2 Answers


I don't think this is possible. The docs for re.finditer state:

Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string

(emphasis on "non-overlapping" is mine)
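A common partial workaround, sketched here with a hypothetical helper `overlapping_matches`, is to wrap the pattern in a zero-width lookahead containing a capturing group. Because each overall match is empty, finditer moves on one position at a time and therefore reports matches that overlap. It still returns only one (the preferred) match per start position, though, so for a|ab it yields a but never ab.

```python
import re

def overlapping_matches(pattern, string):
    # The lookahead consumes nothing, so finditer tries every start position;
    # group 1 captures what the inner pattern matched at that position.
    wrapped = re.compile('(?=(' + pattern + '))')
    return [(m.start(1), m.end(1)) for m in wrapped.finditer(string)]

print(overlapping_matches('a+', 'aaa'))   # [(0, 3), (1, 3), (2, 3)]
print(overlapping_matches('a|ab', 'ab'))  # [(0, 1)]: still one match per start
```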


As for your other question, why empty strings don't match elsewhere: I think it is because the rest of the string has already been consumed by an earlier match, and finditer only returns non-overlapping matches (see the answer to the first part ;-).
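The scanning rule behind this can be modelled in a few lines. This is a simplified sketch, not CPython's actual implementation: at each position it takes the one match the engine prefers, then resumes at that match's end (or one character later, after an empty match). It reproduces the question's output for a*, a+ and a?, and shows why positions 1 and 2 never get an empty match for a*: the greedy match at 0 has already consumed them. It ignores the Python 3.7 change that lets a non-empty match start right after an empty one, and `scan` is a hypothetical name.

```python
import re

def scan(pattern, string):
    """Simplified model of the finditer loop (hypothetical, not CPython's code)."""
    regex = re.compile(pattern)
    spans = []
    pos = 0
    while pos <= len(string):
        m = regex.match(string, pos)   # the single preferred match at pos
        if m is None:
            pos += 1                   # nothing starts here; try the next position
            continue
        spans.append(m.span())
        # resume after a non-empty match; step past an empty one
        pos = m.end() if m.end() > pos else pos + 1
    return spans

print(scan('a*', 'aaa'))  # [(0, 3), (3, 3)]
print(scan('a+', 'aaa'))  # [(0, 3)]
print(scan('a?', 'aaa'))  # [(0, 1), (1, 2), (2, 3), (3, 3)]
```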


2 Comments

@answerers -- If you prove me wrong on this point, please @notify me. I'm interested to know how this one turns out :)
Of course, the second question was foolish; I should have read the docs :)

Just want to mention that you can do such things in Perl, using an expression like:

(?:a|ab)(?{ say $& })(?!)

The (?{ code }) construct executes the code every time the regex engine reaches that position in the pattern: here, right after your regex, where it prints the match so far. The (?!) that follows always fails, forcing the regex engine to backtrack and try the next possible match, and so on.

This will work for any kind of expression.

Example:

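perl -E "$_='ab'; /(?:a|ab)(?{ say $& })(?!)/"

(The double-quoted form suits Windows cmd. In a Unix shell such as bash, the shell would expand "$_" itself, so swap the quote types: perl -E '$_="ab"; /(?:a|ab)(?{ say $& })(?!)/'. The same applies to the second example below.)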

Output:

a
ab

Another example:

perl -E "$_='aaaa'; /a+(?{ say $& })(?!)/"

Output:

aaaa
aaa
aa
a
aaa
aa
a
aa
a
a
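Python's re module has no (?{ code }) hook, but the mechanism described in this answer can be modelled with generators: each construct yields every position where it could end, in the order a backtracking engine would try them, and the caller keeps asking for the next one instead of stopping at the first success (the role (?!) plays in Perl). This is a toy sketch with hypothetical names and a hand-built syntax tree instead of a regex parser, supporting only literals, alternation, sequence, *, + and ?. For the two Perl examples above it produces the same sequence as the output shown.

```python
def match(node, s, i):
    """Yield every end position where `node` can match s starting at i,
    in the order a backtracking engine would try them (toy model)."""
    kind = node[0]
    if kind == 'lit':
        if s.startswith(node[1], i):
            yield i + len(node[1])
    elif kind == 'alt':
        for branch in node[1]:            # first alternative first
            yield from match(branch, s, i)
    elif kind == 'seq':
        yield from match_seq(node[1], s, i)
    elif kind == 'star':
        for j in match(node[1], s, i):    # greedy: one more repetition first
            if j > i:                     # skip empty repetitions to avoid looping
                yield from match(node, s, j)
        yield i                           # then stop repeating
    elif kind == 'plus':
        for j in match(node[1], s, i):    # one mandatory repetition...
            yield from match(('star', node[1]), s, j)  # ...then a greedy star
    elif kind == 'opt':
        yield from match(node[1], s, i)   # greedy: try the item first
        yield i
    else:
        raise ValueError('unknown node kind: %r' % (kind,))

def match_seq(nodes, s, i):
    if not nodes:
        yield i
        return
    for j in match(nodes[0], s, i):
        yield from match_seq(nodes[1:], s, j)

def all_paths(node, s):
    """Analogue of appending (?{ say $& })(?!): record each match, then keep
    backtracking, trying every start position in turn."""
    found = []
    for start in range(len(s) + 1):
        for end in match(node, s, start):
            found.append(s[start:end])
    return found

a_or_ab = ('alt', [('lit', 'a'), ('lit', 'ab')])   # (?:a|ab)
a_plus = ('plus', ('lit', 'a'))                    # a+
print(all_paths(a_or_ab, 'ab'))    # ['a', 'ab']
print(all_paths(a_plus, 'aaaa'))   # ['aaaa', 'aaa', 'aa', 'a', 'aaa', 'aa', 'a', 'aa', 'a', 'a']
```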

7 Comments

Very cool, I did not know about that extension before. I wonder whether something similar exists in Python or other languages such as JavaScript. I am currently reading the Python docs and hope I find something similar :)
@Timo, surely not in JavaScript, and no other languages/libs that I know of have such code-execution features. But some libs (PCRE?) probably have settings that give the same result as in this case.
Yup, I did not find anything about that in the Python docs. I guess this proves that Perl is still the best tool for anything related to regular expressions :D
@Timo -- It looks like there might be something similar to what you are looking for in the third-party regex module, although I can't seem to get it to work (it seemed like the overlapped=True keyword would be what you wanted).