
Student assignment files follow a naming pattern like this:

160608726_Task Basic Programming_02_02.pdf
200610612_Task PTI_12_01.xls
180609074_Task Industrial Automation 1_04_04.doc

For each file, I want to extract the Student ID Number of the student who owns it, the student's batch, the course name, the week, the semester, and the file extension.

The student batch is given by the first two digits of the Student ID Number (for example, 20 means the 2020 batch and 18 means the 2018 batch).
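The batch rule is a slice plus an addition. Here is a minimal sketch, assuming every batch falls between 2000 and 2099. `batch_year` is a hypothetical helper name:

```python
def batch_year(student_id: str) -> int:
    """Return the batch year encoded in the first two digits of a student ID.

    Assumes every batch falls in 2000-2099 (e.g. "20..." -> 2020, "18..." -> 2018).
    """
    return 2000 + int(student_id[:2])


print(batch_year("200610612"))  # 2020
print(batch_year("180609074"))  # 2018
```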

Expected program output for the file name 200610612_Task PTI_12_01.xls:

This assignment belongs to a student with Student ID Number 200610612.
This student belongs to the 2020 batch.
The subject is PTI.
This assignment is for the 12th week.
The course is held in semester 1.
This file extension is xls.

I have tried several patterns. My most recent attempt is:

import re

txt = "200610612_Tugas PTI_12_01.xls"
x = re.findall(r"\d{2}", 'txt')
print(x)

but the output is [].
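The empty list has a simple cause. `'txt'` is in quotes, so `re.findall` searches the three-letter string "txt" rather than the variable. That string contains no digits. Even with the variable passed, `\d{2}` only chops digit runs into pairs, which is why the answer below anchors the pattern on the file name's structure instead. A short illustration:

```python
import re

txt = "200610612_Tugas PTI_12_01.xls"

# In the attempt, 'txt' is quoted, so findall searches the literal string
# "txt" instead of the variable; it has no digits, hence [].
print(re.findall(r"\d{2}", 'txt'))  # []

# With the variable, \d{2} matches non-overlapping digit pairs anywhere in
# the name, so the ID is chopped up and the pieces no longer map to fields.
print(re.findall(r"\d{2}", txt))  # ['20', '06', '10', '61', '12', '01']

# The batch digits are simply the first two characters of the ID part.
student_id = txt.split("_", 1)[0]
print(student_id, student_id[:2])  # 200610612 20
```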

Answer

I suggest this regular expression with named groups:

^(?P<StudentIDNumber>(?P<batch>\d{2})\d*)_Task\s+(?P<Subject>[^_]+)_(?P<week>\d+)_(?P<Semester>\d+)\.(?P<Extension>\w+)$

See the regex demo. Details:

  • ^ - start of string
  • (?P<StudentIDNumber>(?P<batch>\d{2})\d*) - Group "StudentIDNumber": two or more digits, with Group "batch" capturing the first two digits
  • _Task - the literal text _Task (replace Task with \w+ if that word may vary)
  • \s+ - one or more whitespace characters
  • (?P<Subject>[^_]+) - Group "Subject": one or more characters other than _
  • _ - a _ char
  • (?P<week>\d+) - Group "week": one or more digits
  • _ - a _ char
  • (?P<Semester>\d+) - Group "Semester": one or more digits
  • \. - a dot
  • (?P<Extension>\w+) - Group "Extension": one or more word characters
  • $ - end of string.
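The breakdown above can also be kept inside the pattern itself. Here is a sketch using `re.VERBOSE`, which ignores whitespace in the pattern and allows `#` comments. It applies the `\w+` tip in place of the literal `Task`, so a name like 200610612_Tugas PTI_12_01.xls from the question's code also matches. The name `rx_verbose` is only illustrative:

```python
import re

# Same structure as the compact pattern. In VERBOSE mode, whitespace in the
# pattern is ignored, so the space before the subject is matched by \s+.
rx_verbose = re.compile(r"""
    ^
    (?P<StudentIDNumber>(?P<batch>\d{2})\d*)  # ID; first two digits = batch
    _\w+                                       # "_Task", "_Tugas", ... (any word)
    \s+                                        # whitespace before the subject
    (?P<Subject>[^_]+)                         # subject: everything up to the next _
    _(?P<week>\d+)                             # week number
    _(?P<Semester>\d+)                         # semester number
    \.(?P<Extension>\w+)                       # file extension
    $
""", re.VERBOSE)

for fn in ["200610612_Task PTI_12_01.xls", "200610612_Tugas PTI_12_01.xls"]:
    m = rx_verbose.match(fn)
    print(m.groupdict() if m else f"no match: {fn}")
```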

See the Python demo:

import re

# Named groups capture each field; "batch" is nested inside "StudentIDNumber".
rx = re.compile(r'^(?P<StudentIDNumber>(?P<batch>\d{2})\d*)_Task\s+(?P<Subject>[^_]+)_(?P<week>\d+)_(?P<Semester>\d+)\.(?P<Extension>\w+)$')
files = ['160608726_Task Basic Programming_02_02.pdf',
         '200610612_Task PTI_12_01.xls',
         '180609074_Task Industrial Automation 1_04_04.doc']
for fn in files:
    print(f"Parsing {fn}")
    res = rx.match(fn)
    if res:  # names that do not follow the pattern are skipped
        resdict = res.groupdict()
        # Turn the two-digit prefix into a full year, e.g. "20" -> "2020"
        resdict["batch"] = f"20{resdict['batch']}"
        print(resdict)

Output:

Parsing 160608726_Task Basic Programming_02_02.pdf
{'StudentIDNumber': '160608726', 'batch': '2016', 'Subject': 'Basic Programming', 'week': '02', 'Semester': '02', 'Extension': 'pdf'}
Parsing 200610612_Task PTI_12_01.xls
{'StudentIDNumber': '200610612', 'batch': '2020', 'Subject': 'PTI', 'week': '12', 'Semester': '01', 'Extension': 'xls'}
Parsing 180609074_Task Industrial Automation 1_04_04.doc
{'StudentIDNumber': '180609074', 'batch': '2018', 'Subject': 'Industrial Automation 1', 'week': '04', 'Semester': '04', 'Extension': 'doc'}
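The demo prints raw dictionaries, and the question asks for sentences. Here is a minimal sketch that formats the same groups into that report. It reuses the answer's pattern. `ordinal` and `describe` are hypothetical helper names, and the ordinal rule covers English suffixes only:

```python
import re

rx = re.compile(r'^(?P<StudentIDNumber>(?P<batch>\d{2})\d*)_Task\s+(?P<Subject>[^_]+)_(?P<week>\d+)_(?P<Semester>\d+)\.(?P<Extension>\w+)$')


def ordinal(n: int) -> str:
    """English ordinal: 1 -> '1st', 2 -> '2nd', 12 -> '12th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def describe(filename: str) -> str:
    """Build the report from the question for one file name (hypothetical helper)."""
    m = rx.match(filename)
    if not m:
        return f"{filename} does not follow the naming pattern."
    g = m.groupdict()
    return "\n".join([
        f"This assignment belongs to a student with Student ID Number {g['StudentIDNumber']}.",
        f"This student belongs to the 20{g['batch']} batch.",
        f"The subject is {g['Subject']}.",
        f"This assignment is for the {ordinal(int(g['week']))} week.",
        # int() drops the leading zero, so "01" becomes 1
        f"The course is held in semester {int(g['Semester'])}.",
        f"This file extension is {g['Extension']}.",
    ])


print(describe("200610612_Task PTI_12_01.xls"))
```

For 200610612_Task PTI_12_01.xls this prints six sentences, with week 12 rendered as "12th" and semester 01 as "1".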