I am having trouble understanding the output of this regular expression. I am using the following regex to find dates in text:
^(?:(1[0-2]|0?[1-9])-(3[01]|[12][0-9]|0?[1-9])|(3[01]|[12][0-9]|0?[1-9])-(1[0-2]|0?[1-9]))-(?:[0-9]{2})?[0-9]{2}$
It appears to be matching the pattern within text correctly, but I'm confused by the return values.
For this test string:
TestString = "10-20-2015"
It's returning this:
[('10', '20', '', '')]
If I put () around the entire regex, I get this returned:
[('10-20-2015', '10', '20', '', '')]
I would expect it to simply return the full date string, but it appears to be breaking the results up and I don't understand why. Wrapping my regex in () returns the full date string, but it also returns 4 extra values.
How do I make this ONLY match the full date string and not small parts of the string?
From my console:
Python 3.4.2 (default, Oct 8 2014, 10:45:20)
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> pattern = "^(?:(1[0-2]|0?[1-9])-(3[01]|[12][0-9]|0?[1-9])|(3[01]|[12][0-9]|0?[1-9])-(1[0-2]|0?[1-9]))-(?:[0-9]{2})?[0-9]{2}$"
>>> TestString = "10-20-2015"
>>> re.findall(pattern, TestString, re.I)
[('10', '20', '', '')]
>>> pattern = "(^(?:(1[0-2]|0?[1-9])-(3[01]|[12][0-9]|0?[1-9])|(3[01]|[12][0-9]|0?[1-9])-(1[0-2]|0?[1-9]))-(?:[0-9]{2})?[0-9]{2}$)"
>>> re.findall(pattern, TestString, re.I)
[('10-20-2015', '10', '20', '', '')]
>>>
>>> TestString = "10--2015"
>>> re.findall(pattern, TestString, re.I)
[]
>>> pattern = "^(?:(1[0-2]|0?[1-9])-(3[01]|[12][0-9]|0?[1-9])|(3[01]|[12][0-9]|0?[1-9])-(1[0-2]|0?[1-9]))-(?:[0-9]{2})?[0-9]{2}$"
>>> re.findall(pattern, TestString, re.I)
[]
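For reference, the tuples above are how re.findall is documented to behave: when the pattern contains capturing groups, it returns the groups rather than the whole match, and groups that did not take part in the match come back as empty strings. A minimal sketch of one way to get only the full match without changing the pattern is re.finditer with group(0). The variable names here are just for illustration.

```python
import re

# Same pattern as in the console session above (four capturing groups).
pattern = r"^(?:(1[0-2]|0?[1-9])-(3[01]|[12][0-9]|0?[1-9])|(3[01]|[12][0-9]|0?[1-9])-(1[0-2]|0?[1-9]))-(?:[0-9]{2})?[0-9]{2}$"
test_string = "10-20-2015"

# findall returns one tuple per match, with one entry per capturing group.
groups_only = re.findall(pattern, test_string)

# finditer yields match objects; group(0) is always the entire match.
full_matches = [m.group(0) for m in re.finditer(pattern, test_string)]

print(groups_only)   # [('10', '20', '', '')]
print(full_matches)  # ['10-20-2015']
```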
Based on the response, here was my answer: ((?:(?:1[0-2]|0[1-9])-(?:3[01]|[12][0-9]|0[1-9])|(?:3[01]|[12][0-9]|0[1-9])-(?:1[0-2]|0[1-9]))-(?:[0-9]{2})?[0-9]{2})
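A quick check of that final pattern, as a sketch: every inner group is now non-capturing, so the only capturing group is the outer one and re.findall returns plain strings. Note that this version also drops the ^ and $ anchors and requires two-digit months and days, so it behaves a little differently from the original. The sample text below is made up for illustration.

```python
import re

# Final pattern from the post: one outer capturing group, inner groups non-capturing.
final_pattern = r"((?:(?:1[0-2]|0[1-9])-(?:3[01]|[12][0-9]|0[1-9])|(?:3[01]|[12][0-9]|0[1-9])-(?:1[0-2]|0[1-9]))-(?:[0-9]{2})?[0-9]{2})"

# Hypothetical sample text containing one valid and one malformed date.
text = "Shipped on 10-20-2015, not 10--2015."
found = re.findall(final_pattern, text)
print(found)  # ['10-20-2015']

# Single-digit month and day no longer match, because 0? became 0.
single_digit = re.findall(final_pattern, "1-5-2015")
print(single_digit)  # []
```

If single-digit months and days should still be accepted, keeping the 0? from the original pattern inside the non-capturing groups would be one option.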