Although the code in the question was missing some parts and could not be run at the time of writing this answer, I put something together to help understand the problem.
I think you can achieve what you want by carefully creating capturing groups. Based on the simple input you provided (Sample Input:12/2020 - 04/2021), I came up with this solution.
I have created two regexes in this example. They share the same pattern up to capturing group 3. regex2 ends differently: it captures the word Present instead of a numeric date, so it has no capturing groups 4 and 5.
group 1: captures the start month
group 2: captures the start year
group 3: captures the full end date with regex1, or the word Present with regex2
group 4: captures the end month if the end date is not the word Present
group 5: captures the end year if the end date is not the word Present
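To make the group numbering concrete, here is a small sketch that runs each pattern against one line of the sample input and prints what every group captured. The variable names closed and ongoing are only illustrative.

```python
import re

regex1 = r'(\d{2})\/(\d{4})\s-\s((\d{2})\/(\d{4}))'
regex2 = r'(\d{2})\/(\d{4})\s-\s(Present)'

# A range with an explicit end date: groups 1 to 5 are all filled.
closed = re.search(regex1, "12/2020 - 04/2021")
# A range that is still ongoing: only groups 1 to 3 exist.
ongoing = re.search(regex2, "05/2021 - Present")

print(closed.groups())   # ('12', '2020', '04/2021', '04', '2021')
print(ongoing.groups())  # ('05', '2021', 'Present')
```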
Note that I have not handled all the exceptions that could occur with various inputs.
    import re
    from datetime import datetime
    from dateutil import relativedelta

    line = """
    12/2020 - 04/2021
    05/2021 - Present
    """

    # Raw strings keep the backslashes intact for the regex engine.
    regex1 = r'(\d{2})\/(\d{4})\s-\s((\d{2})\/(\d{4}))'
    regex2 = r'(\d{2})\/(\d{4})\s-\s(Present)'
    regexList = [regex1, regex2]

    # dateutil-free way to count months between two dates (not used below).
    def diff_month(d1, d2):
        return (d1.year - d2.year) * 12 + d1.month - d2.month

    exp_ = 0  # total experience in months
    for regex in regexList:
        for date_match in re.finditer(regex, line):
            start_month = int(date_match.group(1))
            start_year = int(date_match.group(2))
            start = datetime(start_year, start_month, 1)
            if date_match.group(3) == "Present":
                # Ongoing range: count up to the first day of the current month.
                today = datetime.today()
                end = datetime(today.year, today.month, 1)
            else:
                end_month = int(date_match.group(4))
                end_year = int(date_match.group(5))
                # Add one month so the end month is counted in full.
                # Using relativedelta avoids an invalid month 13 for December.
                end = datetime(end_year, end_month, 1) + relativedelta.relativedelta(months=1)
            delta = relativedelta.relativedelta(end, start)
            exp_ += delta.months + 12 * delta.years

    print(f"Total Experience = {exp_ // 12} years {exp_ % 12} months")
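Result (the exact value depends on the date the script is run, because the Present range ends at the current month):
    Total Experience = 0 years 7 months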
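As a variation, the two patterns can be merged into a single regex with named groups, where the end is either MM/YYYY or the word Present. This is only a sketch under a few assumptions: RANGE_RE and total_months are names I made up, and today is passed in as a parameter (fixed here to 15 July 2021, an arbitrary date) so that the output is reproducible. Counting months as plain integers also sidesteps any month-13 problem for ranges that end in December.

```python
import re
from datetime import date

# Hypothetical combined pattern: the end is either MM/YYYY or the word Present.
RANGE_RE = re.compile(
    r'(?P<start_month>\d{2})/(?P<start_year>\d{4})\s-\s'
    r'(?:(?P<end_month>\d{2})/(?P<end_year>\d{4})|Present)'
)

def total_months(text, today):
    """Sum the months covered by every date range found in text."""
    total = 0
    for m in RANGE_RE.finditer(text):
        start = int(m['start_year']) * 12 + int(m['start_month'])
        if m['end_month'] is None:
            # "Present": count up to, but not including, the current month.
            end = today.year * 12 + today.month
        else:
            # Explicit end date: +1 so the end month itself is counted.
            end = int(m['end_year']) * 12 + int(m['end_month']) + 1
        total += end - start
    return total

text = """
12/2020 - 04/2021
05/2021 - Present
"""
# Assumed run date, fixed only to make the example deterministic.
months = total_months(text, today=date(2021, 7, 15))
print(f"Total Experience = {months // 12} years {months % 12} months")
# Total Experience = 0 years 7 months
```

In real use you would pass date.today() instead of the fixed date.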
For input as simple as Sample Input:12/2020 - 04/2021, you don't need the regexes you have already tried. From what I can see, though, your real input is much more complex than this, and you will not be able to capture every date pattern with regexes. I would also recommend looking at NLP or a suitable ML model to extract these values.