I am using a regex to extract four or five fields: store name, reason (optional), details, pause start time, and pause end time. Unlike the other four fields, Reason does not appear in every message. When it does appear, it sits between Store and Details in the text.

I am currently using this code to find the four required fields (which works):

import re

parser = re.compile(r"your store, ([^,]+).*Details: ([^\n]*).*Created at: ([^\n]*).*Scheduled end time: ([^\n]*)", flags=re.DOTALL | re.MULTILINE)

# Start every new column as an empty string
df1['STORE'] = ''
df1['DETAILS'] = ''
df1['TIME_PAUSE_CREATED'] = ''
df1['TIME_PAUSE_END'] = ''

for index, text in df1['DESCRIPTION'].items():
    # findall returns one tuple of captured groups per match
    for field in parser.findall(text):
        # .at writes a single cell by label and avoids chained assignment
        df1.at[index, 'STORE'] = field[0]
        df1.at[index, 'DETAILS'] = field[1]
        df1.at[index, 'TIME_PAUSE_CREATED'] = field[2]
        df1.at[index, 'TIME_PAUSE_END'] = field[3]

Is there a way to make a regex field optional, append it when it is present (else append 'Null'), and continue scraping the other fields? I have tried the following, but it only returns null values after the store name:

parser = re.compile(r"your store, ([^,]+).*(Reason: ([^\n]*))?.*|Details: ([^\n]*).*)Created at: ([^\n]*).*Scheduled end time: ([^\n]*)", flags=re.DOTALL | re.MULTILINE)

Ideally I would add a 'Reason' column like the ones for the other fields, but the regex still isn't working for me.

Thank you!

  • Do you have any real world input strings? Commented Nov 5, 2021 at 20:24
  • "Hi, This is an automated email to notify you that your store, Restaurant Name, has been temporarily deactivated on DoorDash. Reason: Unspecified Details: Dasher reports store closed after speaking with a support agent. Store temporarily deactivated for the rest of the day after the agent attempted to call the store to confirm whether the store was open, and was either not able to reach the store to confirm. Deactivation Time Created at: Thursday, July 29, 2021 at 12:02 PM Scheduled end time: Friday, July 30, 2021 at 4:00 AM Thanks, DoorDash Merchant team" Commented Nov 5, 2021 at 21:01
  • @cjw528 Can you add the example string to the question? Commented Nov 5, 2021 at 21:05

1 Answer

I take it from your example that Reason: is not always supplied? That's OK, just add it as an optional (zero or one occurrences) group. If it's not present, that capture group does not participate in the match: re.findall reports it as an empty string, while match.group() returns None. Between Store and Details add (?:Reason: (.*?))?. The final question mark says the whole Reason: section can occur zero or one times, making it optional. The whole regex (after a little extra cleanup) should read:

your store, ([^,]+).*?(?:Reason: (.*?))?\sDetails: (.*?)(?:\sDeactivation Time)?\sCreated at: (.*?[AP]M).*Scheduled end time: (.*?[AP]M)

Remember that Reason: will now be in field[1] and the other capture groups will be shifted down one: Details moves to field[2], the created time to field[3], and the end time to field[4].
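To make the optional group and the shifted indexes concrete, here is a minimal sketch of how the regex above could be applied to a single description. The helper name parse_description, the FIELDS tuple, the REASON key, and the missing="Null" default are my own illustrative choices, not part of the original code, and re.DOTALL is an assumption carried over from the question so that `.` can cross line breaks.

```python
import re

# Regex from the answer above; re.DOTALL is an assumption taken from the
# question's flags so that '.' also matches newlines.
PATTERN = re.compile(
    r"your store, ([^,]+).*?(?:Reason: (.*?))?\sDetails: (.*?)"
    r"(?:\sDeactivation Time)?\sCreated at: (.*?[AP]M)"
    r".*Scheduled end time: (.*?[AP]M)",
    flags=re.DOTALL,
)

# Hypothetical column names, listed in capture-group order.
FIELDS = ("STORE", "REASON", "DETAILS", "TIME_PAUSE_CREATED", "TIME_PAUSE_END")


def parse_description(text, missing="Null"):
    """Return a dict with the five fields; absent values become `missing`."""
    match = PATTERN.search(text)
    if match is None:
        # Nothing matched at all: every field gets the placeholder.
        return dict.fromkeys(FIELDS, missing)
    # With search(), an optional group that did not participate is None.
    return {
        name: (value if value is not None else missing)
        for name, value in zip(FIELDS, match.groups())
    }


if __name__ == "__main__":
    # Shortened version of the sample email from the comments.
    sample = (
        "your store, Restaurant Name, has been temporarily deactivated on "
        "DoorDash. Reason: Unspecified Details: Dasher reports store closed. "
        "Deactivation Time Created at: Thursday, July 29, 2021 at 12:02 PM "
        "Scheduled end time: Friday, July 30, 2021 at 4:00 AM Thanks"
    )
    print(parse_description(sample))
```

Returning a dict keyed by name sidesteps the index shift entirely. Assuming df1 is the DataFrame from the question, the results could then be attached with something like df1.join(pd.DataFrame(df1['DESCRIPTION'].map(parse_description).tolist(), index=df1.index)).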

I tested this regex against your example string above on the Regex101 website.
