Optional Regex Component

Question

I am using a regex to extract four or five new fields: Store name, Details, Reason (optional), Pause time start, and Pause time end. Unlike the other four fields, Reason does not appear in every message. When it does appear, it sits between Store and Details in the text.

I am currently using this code to find the four required fields (which works):

import re

parser = re.compile(r"your store, ([^,]+).*Details: ([^\n]*).*Created at: ([^\n]*).*Scheduled end time: ([^\n]*)", flags=re.DOTALL | re.MULTILINE)

# Start with empty columns so rows without a match stay blank.
df1['STORE'] = ''
df1['DETAILS'] = ''
df1['TIME_PAUSE_CREATED'] = ''
df1['TIME_PAUSE_END'] = ''

for index, text in df1['DESCRIPTION'].items():
    # findall returns one tuple of captured groups per match.
    for field in parser.findall(text):
        # .loc avoids chained assignment such as df1['STORE'][index] = ...
        df1.loc[index, 'STORE'] = field[0]
        df1.loc[index, 'DETAILS'] = field[1]
        df1.loc[index, 'TIME_PAUSE_CREATED'] = field[2]
        df1.loc[index, 'TIME_PAUSE_END'] = field[3]

Is there a way to make an optional regex field and append that (else append 'Null') and continue scraping the other fields? I have tried using the following, but this only returns null values after store name:

parser = re.compile(r"your store, ([^,]+).*(Reason: ([^\n]*))?.*|Details: ([^\n]*).*)Created at: ([^\n]*).*Scheduled end time: ([^\n]*)", flags=re.DOTALL | re.MULTILINE)

Ideally I would add a matching 'Reason' column alongside the other fields, but the regex still isn't working for me.

Thank you!

"Hi, This is an automated email to notify you that your store, Restaurant Name, has been temporarily deactivated on DoorDash. Reason: Unspecified Details: Dasher reports store closed after speaking with a support agent. Store temporarily deactivated for the rest of the day after the agent attempted to call the store to confirm whether the store was open, and was either not able to reach the store to confirm. Deactivation Time Created at: Thursday, July 29, 2021 at 12:02 PM Scheduled end time: Friday, July 30, 2021 at 4:00 AM Thanks, DoorDash Merchant team"
– cjwehr, Commented Nov 5, 2021 at 21:01

Chris Maurer · Accepted Answer · 2021-11-06 05:25:48Z


I take it from your example that Reason: is not always supplied. That's fine: add it as an optional group (zero or one occurrences). If it's not present, that capture group comes back empty (None from a match object's group(), or an empty string in the tuples returned by findall). Between Store and Details add (?:Reason: (.*?))?. The (?: ... ) wrapper is non-capturing, and the final question mark says the whole Reason: section can occur zero or one times, making it optional. The whole regex (after a little extra cleanup) should read:

your store, ([^,]+).*?(?:Reason: (.*?))?\sDetails: (.*?)(?:\sDeactivation Time)?\sCreated at: (.*?[AP]M).*Scheduled end time: (.*?[AP]M)

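Remember that Reason: will now be in field[1], so the capture groups after it shift by one: Details becomes field[2], Created at becomes field[3], and Scheduled end time becomes field[4]. Store name stays in field[0].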
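As a small illustration of that index shift, the sketch below uses a deliberately simplified pattern (an assumption for brevity, not the full regex) and two made-up messages. It relies on the fact that Python's findall returns an empty string for an optional group that did not take part in the match; the 'Null' placeholder follows the wording of the question.

```python
import re

# Simplified stand-in for the full pattern: store, optional reason, details.
toy = re.compile(r"store, ([^,]+),.*?(?:Reason: (.*?))?\sDetails: (.*)")

# Made-up messages, one with and one without the optional Reason part.
messages = [
    "store, Cafe, paused. Reason: Unspecified Details: closed",
    "store, Diner, paused. Details: closed",
]

rows = []
for text in messages:
    for field in toy.findall(text):
        store, reason, details = field  # field[0], field[1], field[2]
        # findall() yields '' for an optional group that did not match.
        rows.append((store, reason or "Null", details))

print(rows)
```

With re.search or re.match the skipped group would be None instead of an empty string, so the same `or "Null"` fallback covers both cases.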

I checked this regex against your example string above on the Regex101 website.
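For anyone who wants to try this outside Regex101, here is a minimal sketch of using the regex from this answer in Python. The helper name parse_description, the COLUMNS list (including the new REASON entry), the re.DOTALL flag carried over from the question, and the abridged sample message are all assumptions made for illustration.

```python
import re

# The regex from this answer, split across lines for readability.
PATTERN = re.compile(
    r"your store, ([^,]+).*?(?:Reason: (.*?))?\sDetails: (.*?)"
    r"(?:\sDeactivation Time)?\sCreated at: (.*?[AP]M)"
    r".*Scheduled end time: (.*?[AP]M)",
    flags=re.DOTALL,  # assumption: kept from the question so '.' spans newlines
)

# Hypothetical column names; REASON is the new optional field.
COLUMNS = ["STORE", "REASON", "DETAILS", "TIME_PAUSE_CREATED", "TIME_PAUSE_END"]


def parse_description(text):
    """Return a dict of fields, or None if the message does not match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # A skipped optional group is None here, so swap in the 'Null' placeholder.
    return {col: "Null" if val is None else val
            for col, val in zip(COLUMNS, m.groups())}


# Abridged version of the example message (Details text shortened).
sample = (
    "Hi, This is an automated email to notify you that your store, "
    "Restaurant Name, has been temporarily deactivated on DoorDash. "
    "Reason: Unspecified Details: Dasher reports store closed. "
    "Deactivation Time Created at: Thursday, July 29, 2021 at 12:02 PM "
    "Scheduled end time: Friday, July 30, 2021 at 4:00 AM "
    "Thanks, DoorDash Merchant team"
)

with_reason = parse_description(sample)
without_reason = parse_description(sample.replace(" Reason: Unspecified", ""))

print(with_reason)
print(without_reason)
```

Each returned dict could then be written into the DataFrame row by row, in the same way the question's loop assigns field[0] through field[3].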
