I have a question regarding regular expressions. When using the alternation (or) construct:
$ python
Python 2.7.3 (default, Sep 26 2012, 21:51:14)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> for mo in re.finditer('a|ab', 'ab'):
...     print mo.start(0), mo.end(0)
...
0 1
we get only one match, which is expected, since the first (leftmost) branch that is accepted is the one reported. My question is whether it is possible, and if so how, to construct a regular expression that would yield both (0,1) and (0,2). Also, how can this be done in general for any regex of the form r1 | r2 | ... | rn?
Similarly, is it possible to achieve this for the *, +, and ? constructs? By default:
>>> for mo in re.finditer('a*', 'aaa'):
...     print mo.start(0), mo.end(0)
...
0 3
3 3
>>> for mo in re.finditer('a+', 'aaa'):
...     print mo.start(0), mo.end(0)
...
0 3
>>> for mo in re.finditer('a?', 'aaa'):
...     print mo.start(0), mo.end(0)
...
0 1
1 2
2 3
3 3
My second question: with * and ?, why do empty strings match at the end of the string, but not anywhere else?
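A quick check that may help with the second question: running the same pattern on a string that has other characters around the run of a's. The input 'baaab' is an arbitrary string picked for illustration; it is not from the session above.

```python
import re

# 'baaab' is an arbitrary test string chosen for illustration.
# Collect the span of every match that re.finditer reports for 'a*'.
spans = [(mo.start(0), mo.end(0)) for mo in re.finditer('a*', 'baaab')]
print(spans)
```

This should report empty matches at positions 0, 4 and 5 in addition to the (1, 4) match, which suggests that empty matches are not tied to the end of the string: they show up wherever the scan reaches a position at which no 'a' can be consumed. In 'aaa' the only such position is the end.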
EDIT:
I think I realize now that both questions were nonsense: as @mgilson said, re.finditer only returns non-overlapping matches, and I guess that whenever a regular expression accepts (a part of) a string, it stops searching from that position. Thus, this is impossible with the default settings of the Python matching engine.
Still, I wonder: if Python uses backtracking in regex matching, it should not be very difficult to make it continue searching after accepting a string. But this would break the usual behavior of regular expressions.
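One brute-force workaround, as a minimal sketch rather than a recommended solution: test every substring against an anchored copy of the pattern. The helper name all_matches is made up for this example, and the sketch assumes the pattern has no lookaround or anchors that depend on the surrounding text, since each candidate is matched as an isolated slice. It calls the engine once per substring, so it is only practical for short inputs.

```python
import re

def all_matches(pattern, text):
    """Return every (start, end) span whose substring the pattern matches in full.

    Hypothetical helper for illustration; not part of the re module.
    """
    # Wrap the pattern in a group and anchor it with \Z so that a match
    # has to consume the whole candidate substring, not just a prefix.
    anchored = re.compile(r'(?:%s)\Z' % pattern)
    spans = []
    for start in range(len(text) + 1):
        for end in range(start, len(text) + 1):
            if anchored.match(text[start:end]):
                spans.append((start, end))
    return spans

print(all_matches('a|ab', 'ab'))
```

For 'a|ab' on 'ab' this yields both (0, 1) and (0, 2). Because the anchor forces the engine to backtrack into the second branch, the same idea carries over to *, + and ? without changes.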
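To make the "continue searching after accepting" idea concrete, here is a toy backtracking matcher built from generators. It is not how the re module is implemented; lit, alt and star are hypothetical combinators invented for this sketch. Each matcher takes a text and a start index and yields every end index at which it can accept, in the order a backtracking engine would try them.

```python
def lit(chars):
    """Match a literal string."""
    def match(text, i):
        if text.startswith(chars, i):
            yield i + len(chars)
    return match

def alt(*branches):
    """Match r1 | r2 | ... | rn, trying the branches left to right."""
    def match(text, i):
        for branch in branches:
            for end in branch(text, i):
                yield end
    return match

def star(inner):
    """Match inner*, greedily: longer repetitions are yielded first."""
    def match(text, i):
        for mid in inner(text, i):
            if mid > i:  # skip empty repetitions to avoid looping forever
                for end in match(text, mid):
                    yield end
        yield i  # zero repetitions is always acceptable
    return match

# Taking only the first yielded end mimics the usual behavior;
# exhausting the generator keeps searching after each acceptance.
print(list(alt(lit('a'), lit('ab'))('ab', 0)))
print(list(star(lit('a'))('aaa', 0)))
```

In this sketch the first value from each generator corresponds to the match a regular engine would report, and the remaining values are the alternatives it would only reach by backtracking.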
EDIT2:
This is possible in Perl. See answer by @Qtax below.