
I need help with a regex that matches multiple patterns, but the code doesn't seem to be working. I want to extract the text matching the date-range patterns for 'experience' in a resume.

    regex1 = '(?P<fmonth>\w+.\d+)\s*(\D|to)\s*(?P<smonth>\w+.\d+|present)'
    regex2 = '(?P<day>\d{1,2})\s*(?P<tmonth>\w+.\d+)\s*(\D|-)\s*(?P<bmonth>\w+.\d+|present)'
    regex3 = '(0[1-9]|1[0-2])/?([0-9]{4})\s*(\D|-)\s*(0[1-9]|1[0-2])/?([0-9]{4})'
    regex4= '(\d{4}-\d{2})\s*(\D|-)\s*(\d{4}-\d{2}|present)'
    regexList = [regex1,regex2,regex3,regex4]
    for regex in regexList:
        # experience= re.findall(regex,line)
        experience = re.match(regex,line)
        exp_.append(experience)
        print(exp_)

But re.match always returns None, even though text matching the date format is present in the resume.
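A likely cause, assuming the text stored in line does not begin with the date range (the question does not show what line contains): re.match only tries to match at the very start of the string, while re.search and re.findall scan the whole string. The sketch below uses the question's regex3, written as a raw string, against a hypothetical resume line. Raw strings (r'...') also avoid invalid-escape warnings for sequences like \d in recent Python versions.

```python
import re

# regex3 from the question, written as a raw string
regex3 = r'(0[1-9]|1[0-2])/?([0-9]{4})\s*(\D|-)\s*(0[1-9]|1[0-2])/?([0-9]{4})'

# Hypothetical resume line: the date range does not start at position 0
line = "Experience: 12/2020 - 04/2021"

# re.match only tries position 0, where "E" cannot start a month, so it returns None
print(re.match(regex3, line))

# re.search scans the string and finds the range further in
print(re.search(regex3, line).group(0))

# re.findall returns one tuple of capturing groups per match
print(re.findall(regex3, line))
```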

Sample input: 12/2020 - 04/2021

Desired output: the total experience, calculated from date ranges like the one above that appear in a resume.

  • Please add a sample input with the desired output to your question. Commented Jul 8, 2021 at 13:38
  • I would recommend using NLTK instead of regex in this particular case. Commented Jul 8, 2021 at 13:42
  • I'm not very familiar with NLTK; I will explore it. Thanks, Jonathan. Commented Jul 8, 2021 at 13:51
  • To extract Sample Input: 12/2020 - 04/2021 you don't need the regexes you have already tried. As I see it, the real input is much more complex than this, and you will not be able to capture every date-pattern scenario. I would also recommend NLP or a suitable ML model to capture these values. Commented Jul 8, 2021 at 14:02

1 Answer


The code in the question is not executable as posted (some parts are missing), but I tried something that should help clarify the problem.

I think you can achieve what you want by carefully designing the capturing groups. Based on the sample input you provided (12/2020 - 04/2021), I came up with this solution.

I have created two regexes in this example. They share the same pattern up to capturing group 3. regex2 ends differently: it captures a word instead of a date, so it has no capturing groups 4 and 5.

group 1: captures the start month

group 2: captures the start year

group 3: captures the full end date (regex1) or the word Present (regex2)

group 4: captures the end month (regex1 only, when the end date is not Present)

group 5: captures the end year (regex1 only, when the end date is not Present)
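To see the groups listed above concretely, here is a small check that runs both regexes (as raw strings) on one line each and prints the captured groups:

```python
import re

regex1 = r'(\d{2})/(\d{4})\s-\s((\d{2})/(\d{4}))'
regex2 = r'(\d{2})/(\d{4})\s-\s(Present)'

# regex1: start month, start year, full end date, end month, end year
m1 = re.search(regex1, "12/2020 - 04/2021")
print(m1.groups())

# regex2: start month, start year, the word Present (no groups 4 and 5)
m2 = re.search(regex2, "05/2021 - Present")
print(m2.groups())
```

The first print shows ('12', '2020', '04/2021', '04', '2021') and the second ('05', '2021', 'Present').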

Note that I have not handled every edge case that other input formats could produce.
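As one possible way to cover a few more variations, the two patterns can be folded into a single regex with named groups. This is a hypothetical variation, not the answer's code: it assumes MM/YYYY dates, accepts "-" or "to" (as the question's regex1 did) with any amount of whitespace as the separator, and matches "present" case-insensitively.

```python
import re

# Hypothetical combined pattern: MM/YYYY start, "-" or "to" separator,
# then either an MM/YYYY end date or the word "present" (any case).
DATE_RANGE = re.compile(
    r'(?P<start_month>0[1-9]|1[0-2])/(?P<start_year>\d{4})'
    r'\s*(?:-|to)\s*'
    r'(?:(?P<end_month>0[1-9]|1[0-2])/(?P<end_year>\d{4})|(?P<present>present))',
    re.IGNORECASE,
)

text = """
12/2020 - 04/2021
05/2021 to Present
"""

# finditer yields one match object per date range found anywhere in the text;
# groups that did not take part in a match are None in groupdict()
results = [m.groupdict() for m in DATE_RANGE.finditer(text)]
for r in results:
    print(r)
```

Checking whether present is None then tells you which branch matched, instead of looping over two separate regexes.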

    import re
    from datetime import datetime

    from dateutil import relativedelta  # third-party: pip install python-dateutil

    line = """
    12/2020 - 04/2021
    05/2021 - Present
    """

    # Raw strings keep the backslashes intact for the regex engine
    regex1 = r'(\d{2})/(\d{4})\s-\s((\d{2})/(\d{4}))'
    regex2 = r'(\d{2})/(\d{4})\s-\s(Present)'
    regexList = [regex1, regex2]

    exp_ = 0  # total experience in months
    for regex in regexList:
        # finditer scans the whole string (re.match only matches at the start)
        for date_match in re.finditer(regex, line):
            start_month = int(date_match.group(1))
            start_year = int(date_match.group(2))
            start = datetime(start_year, start_month, 1)
            if date_match.group(3) == "Present":
                today = datetime.today()
                end = datetime(today.year, today.month, 1)
            else:
                end_month = int(date_match.group(4))
                end_year = int(date_match.group(5))
                # Add one month so the end month counts in full; adding it with
                # relativedelta also avoids datetime(year, 13, 1) failing when
                # the end month is 12
                end = datetime(end_year, end_month, 1) + relativedelta.relativedelta(months=1)
            delta = relativedelta.relativedelta(end, start)
            exp_ += delta.months + 12 * delta.years

    print("Total Experience = " + str(exp_ // 12) + " years " + str(exp_ % 12) + " months")

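Result at the time of writing (the Present range is counted up to the current month, so the total grows over time):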

Total Experience = 0 years 7 months
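If you would rather avoid the third-party python-dateutil dependency, the whole-month formula (end.year - start.year) * 12 + end.month - start.month is enough for dates given only as month and year. A minimal sketch for a fixed end date, counting both the start and the end month as the answer does; the helper names are hypothetical:

```python
from datetime import date


def diff_month(d1, d2):
    # Number of whole months from d2 to d1
    return (d1.year - d2.year) * 12 + d1.month - d2.month


def months_inclusive(start_month, start_year, end_month, end_year):
    # +1 so that both the start month and the end month count in full
    return diff_month(date(end_year, end_month, 1), date(start_year, start_month, 1)) + 1


total = months_inclusive(12, 2020, 4, 2021)  # Dec, Jan, Feb, Mar, Apr
years, months = divmod(total, 12)
print("Total Experience = " + str(years) + " years " + str(months) + " months")
```

For 12/2020 - 04/2021 this gives 5 months, the same count the answer's code produces for that range, and an end month of 12 needs no special handling.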

5 Comments

Thanks a lot @pubudu, I will look into the code you shared. I'm sure this will solve my problem :)
One more question: when I read the entire resume as text and use the resume text as line, the above code doesn't work.
Why do you need to do it line by line when you can do it for the whole document at once? Anyway, you want the sum of the total experience, right? Please feed the whole text to the line variable and check. It would also be better to rename that variable to something more meaningful.
Sure @pubudu, thanks for the help. I will try it out.
Great. Let me know if that worked for you.
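Feeding the whole document to finditer works, but one possible reason it finds nothing (an assumption, since the failing resume text isn't shown) is spacing: the answer's regex1 uses \s-\s, which requires exactly one whitespace character on each side of the hyphen. The sketch below compares it with a hypothetical relaxed variant using \s*-\s* on made-up resume text passed in as a single string:

```python
import re

# Answer's regex1 as a raw string: exactly one whitespace character around "-"
strict = r'(\d{2})/(\d{4})\s-\s((\d{2})/(\d{4}))'
# Hypothetical relaxed variant: any amount of whitespace (including none) around "-"
relaxed = r'(\d{2})/(\d{4})\s*-\s*((\d{2})/(\d{4}))'

# Hypothetical resume text, processed as one string rather than line by line
resume_text = """Job A
12/2020  -  04/2021
Job B
06/2019-11/2020
"""

strict_hits = [m.group(0) for m in re.finditer(strict, resume_text)]
relaxed_hits = [m.group(0) for m in re.finditer(relaxed, resume_text)]
print(strict_hits)
print(relaxed_hits)
```

Here strict_hits is empty, while relaxed_hits contains both ranges.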
