I'm trying to match dates with several regular expressions that all use the same named groups, so that each regex writes its result into the same DataFrame columns. The idea is to try the first regex; if it doesn't match, try the second and send its result to the same groups/columns, and so forth. Every regex has at most 3 date groups (month, day, year). Sometimes the order is different, and sometimes only some of the groups are present (for example, only month and year). Don't worry about the regexes' correctness, I just want to figure out the groups problem. Sample regexes:
regex1 = r'(?P<extracted>(?P<month>\d{1,2})[/-](?P<day>\d{1,2})[/-](?P<year>\d{2,4}))'
regex2 = r'(?P<extracted>(?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s(?P<year>[12]\d{3}))'
regex3 = r'(?P<extracted>(?P<year>[12]\d{3}))'
full_regex = f'({regex1}|{regex2}|{regex3})'
df_captured = df['original'].str.extract(full_regex)
The problem is that a named group can't be defined more than once in the same pattern, so combining the regexes like this fails. Is there a solution that doesn't use nested if statements or something uglier?
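To make the intended behaviour concrete, here is a minimal sketch of the "first regex that matches wins" idea, assuming every value in the column is a string. It is not meant as the definitive approach: the sample rows, the helper `first_match`, and the `columns` list are made up for illustration, and it uses plain `re` in a loop instead of a single combined pattern.

```python
import re
import pandas as pd

# Hypothetical sample data; the real df is not shown in the question.
df = pd.DataFrame({'original': [
    'on 3/14/2019 we met',
    'around Mar 2015',
    'sometime in 1999',
    'no date here',
]})

# The three patterns, tried in priority order.
# [12] is used instead of [1|2], which would also accept a literal '|'.
patterns = [
    r'(?P<extracted>(?P<month>\d{1,2})[/-](?P<day>\d{1,2})[/-](?P<year>\d{2,4}))',
    r'(?P<extracted>(?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s(?P<year>[12]\d{3}))',
    r'(?P<extracted>(?P<year>[12]\d{3}))',
]
compiled = [re.compile(p) for p in patterns]

# Output columns shared by all patterns (assumed names, taken from the groups).
columns = ['extracted', 'month', 'day', 'year']


def first_match(text):
    """Return the named groups of the first pattern that matches, else {}."""
    for rx in compiled:
        m = rx.search(text)
        if m:
            return m.groupdict()
    return {}


# One dict per row; groups a pattern does not define end up as NaN.
records = [first_match(text) for text in df['original']]
df_captured = pd.DataFrame(records, index=df.index, columns=columns)
print(df_captured)
```

Because each pattern is compiled on its own, the group names never collide, and a row that matches none of the patterns simply comes out as all NaN.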