How do I extract the string before and after a specific string, and extract only 12-digit numbers for the roll no?

import re

input_file = """my bday is on 04/01/1997 and 
            frnd bday on 28/12/2018, 
            account no is A000142116 and 
            valid for 30 days for me and 
            for my frnd only 4 DAYS.my roll no is 130302101786
            and register number is 1600523941. Admission number is 
            181212001103"""

# Iterate over lines (iterating the string directly would yield single characters)
for line in input_file.splitlines():
    m1 = re.findall(r"[\d]{1,2}/[\d]{1,2}/[\d]{4}", line)
    m2 = re.findall(r"A(\d+)", line)
    m3 = re.findall(r"(\d+)days", line)
    m4 = re.findall(r"(\d+)DAYS", line)
    m5 = re.findall(r"(\d+)", line)
    m6 = re.findall(r"(\d+)", line)
    m7 = re.findall(r"(\d+)", line)
    for date_n in m1:
        print(date_n)
    for account_no in m2:
        print(account_no)
    for valid_days in m3:
        print(valid_days)
    for frnd_DAYS in m4:
        print(frnd_DAYS)
    for roll_no in m5:
        print(roll_no)
    for register_no in m6:
        print(register_no)
    for admission_no in m7:
        print(admission_no)

Expected Output:

04/01/1997
28/12/2018
A000142116
30 days
4 DAYS
130302101786
1600523941
181212001103

2 Answers


Use one expression for all of them, with case-insensitive matching so that "4 DAYS" is matched as well as "30 days":

\b[A-Z]?\d[/\d]*\b(?:\s+days)?

See a demo on regex101.com.
You would need to define the "account number" format more precisely here.
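A minimal sketch of the same expression in Python. The sample text is retyped without the leading indentation (this does not change the matches), and re.IGNORECASE is passed because without it (?:\s+days)? cannot match the uppercase "DAYS", so the fifth result would be just "4".

```python
import re

text = """my bday is on 04/01/1997 and
frnd bday on 28/12/2018,
account no is A000142116 and
valid for 30 days for me and
for my frnd only 4 DAYS.my roll no is 130302101786
and register number is 1600523941. Admission number is
181212001103"""

# Optional letter, a digit, then any run of digits and slashes,
# optionally followed by whitespace and "days".
# re.IGNORECASE lets "days" match "DAYS" (and [A-Z] match lowercase too).
PATTERN = re.compile(r"\b[A-Z]?\d[/\d]*\b(?:\s+days)?", re.IGNORECASE)

matches = PATTERN.findall(text)
print(matches)
# ['04/01/1997', '28/12/2018', 'A000142116', '30 days', '4 DAYS',
#  '130302101786', '1600523941', '181212001103']
```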

I would use a regex pattern with an alternation for all your possible matches:

\d{2}/\d{2}/\d{4}|\d+ days|[A-Z0-9]{10,}

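This matches either a date, a number of days, or a run of 10 or more letters and digits. That last branch covers the account number as well as the roll, register and admission numbers; I assume these IDs are always at least 10 characters long and consist only of letters and digits.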
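If you also need to know which kind of value each match is, one option (a sketch, not part of the original answer) is to give each branch of the same alternation a named group and read Match.lastgroup. The group names date, days and id are made up for illustration. Note that with re.IGNORECASE the id branch would also match any word of 10 or more letters; the sample text happens to contain none.

```python
import re

text = """my bday is on 04/01/1997 and
frnd bday on 28/12/2018,
account no is A000142116 and
valid for 30 days for me and
for my frnd only 4 DAYS.my roll no is 130302101786
and register number is 1600523941. Admission number is
181212001103"""

# Same alternation, with a (hypothetical) name on each branch.
PATTERN = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{4})"
    r"|(?P<days>\d+ days)"
    r"|(?P<id>[A-Z0-9]{10,})",
    re.IGNORECASE,
)

# Only one branch can participate in a match, so lastgroup names it.
labelled = [(m.lastgroup, m.group()) for m in PATTERN.finditer(text)]
for kind, value in labelled:
    print(f"{kind:5} {value}")
```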

import re

input_file = """my bday is on 04/01/1997 and 
                frnd bday on 28/12/2018, 
                account no is A000142116 and 
                valid for 30 days for me and 
                for my frnd only 4 DAYS.my roll no is 130302101786
                and register number is 1600523941. Admission number is 
                181212001103"""

results = re.findall(r'\d{2}/\d{2}/\d{4}|\d+ days|[A-Z0-9]{10,}', input_file, flags=re.IGNORECASE)
print(results)

['04/01/1997', '28/12/2018', 'A000142116', '30 days', '4 DAYS', '130302101786',
 '1600523941', '181212001103']
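For the second part of the question (only 12-digit roll numbers), a bare \b\d{12}\b is not enough, because the admission number in the sample is also 12 digits long. A sketch of the "string before/after a specific string" idea is to anchor each value on the label text that precedes it. The field names and label phrases below are assumptions that only fit this sample's wording; real input would need its own labels.

```python
import re

text = """my bday is on 04/01/1997 and
frnd bday on 28/12/2018,
account no is A000142116 and
valid for 30 days for me and
for my frnd only 4 DAYS.my roll no is 130302101786
and register number is 1600523941. Admission number is
181212001103"""

# Hypothetical field names. Each pattern anchors on the label that
# precedes the value in this particular sample; group 1 is the value.
FIELD_PATTERNS = {
    "bday": r"my bday is on (\d{2}/\d{2}/\d{4})",
    "frnd_bday": r"frnd bday on (\d{2}/\d{2}/\d{4})",
    "account_no": r"account no is ([A-Z]\d+)",
    "valid_days": r"valid for (\d+ days)",
    "frnd_days": r"my frnd only (\d+ days)",
    # Exactly 12 digits: the trailing \b rejects a longer run of digits.
    "roll_no": r"roll no is (\d{12})\b",
    "register_no": r"register number is (\d+)",
    # \s+ allows the line break between the label and the number.
    "admission_no": r"admission number is\s+(\d+)",
}


def extract_fields(text):
    """Return {field: value or None} for every label in FIELD_PATTERNS."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        m = re.search(pattern, text, flags=re.IGNORECASE)
        fields[name] = m.group(1) if m else None
    return fields


# Without a label, both 12-digit numbers match.
print(re.findall(r"\b\d{12}\b", text))  # ['130302101786', '181212001103']

for name, value in extract_fields(text).items():
    print(f"{name}: {value}")
```

Usage note: a roll number with more or fewer than 12 digits makes roll_no come back as None instead of silently matching a truncated value.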
