I have the following Python regex:
>>> p = re.compile(r"(\b\w+)\s+\1")
\b : word boundary
\w+ : one or more alphanumerical characters
\s+ : one or more whitespaces (can be , \t, \n, ..)
\1 : backreference to group 1 ( = the part between (..))
This regex should find all double occurences of a word - if the two occurences are next to each other with some whitespace in between.
The regex seems to work fine when using the search function:
>>> p.search("I am in the the car.")
<_sre.SRE_Match object; span=(8, 15), match='the the'>
The found match is the the, just as I had expected. The weird behaviour is in the findall function:
>>> p.findall("I am in the the car.")
['the']
The found match is now only the. Why the difference?
findallreturns only the capturing groups if there are any (or the complete match otherwise).sre_constants.error: invalid group reference 1 at position 13error when changing my group(...)by(?:...)to make it non-capturing. Perhaps that's because using a backrefence to a non-capturing group is impossible?\1to match against..