I am using a regex to extract four or five fields from a text column: Store name, Reason (optional), Details, Pause time start, and Pause time end. Unlike the other four fields, Reason does not appear in every record. When it does appear, it sits between Store and Details in the text.
I am currently using this code to find the four required fields (which works):
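import re
parser = re.compile(r"your store, ([^,]+).*Details: ([^\n]*).*Created at: ([^\n]*).*Scheduled end time: ([^\n]*)", flags=re.DOTALL | re.MULTILINE)
# Create the output columns up front
df1['STORE'] = ''
df1['DETAILS'] = ''
df1['TIME_PAUSE_CREATED'] = ''
df1['TIME_PAUSE_END'] = ''
# Assumes df1 has the default 0..n-1 index, so the enumerate position is also the row label
for index, i in enumerate(df1.DESCRIPTION):
    txt = parser.findall(i)
    for field in txt:
        # .loc writes to df1 itself and avoids chained-assignment warnings
        df1.loc[index, 'STORE'] = field[0]
        df1.loc[index, 'DETAILS'] = field[1]
        df1.loc[index, 'TIME_PAUSE_CREATED'] = field[2]
        df1.loc[index, 'TIME_PAUSE_END'] = field[3]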
Is there a way to make a regex group optional, store its value when it is present (or 'Null' otherwise), and still capture the other fields? I have tried the following, but it only returns null values after the store name:
parser = re.compile(r"your store, ([^,]+).*(Reason: ([^\n]*))?.*|Details: ([^\n]*).*)Created at: ([^\n]*).*Scheduled end time: ([^\n]*)", flags=re.DOTALL | re.MULTILINE)
Ideally I would be able to add a 'Reason' column alongside the other fields, but the regex still isn't working for me.
Thank you!
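For reference, here is a minimal sketch of one way the optional field could be handled. It assumes the labels appear in the order Store, Reason, Details, Created at, Scheduled end time. The sample texts and the helper name `extract_fields` are made up for illustration and are not taken from the real data. The idea is to use lazy `.*?` gaps and wrap the Reason part in an optional non-capturing group. With a greedy `.*`, the gap swallows the Reason line before the optional group gets a chance to match, so the group always comes back empty.

```python
import re

# Lazy gaps (.*?) stop at the first label that allows a match; the Reason
# part is an optional non-capturing group, so it is skipped when absent.
parser = re.compile(
    r"your store, ([^,]+)"
    r".*?(?:Reason: ([^\n]*).*?)?"
    r"Details: ([^\n]*)"
    r".*?Created at: ([^\n]*)"
    r".*?Scheduled end time: ([^\n]*)",
    flags=re.DOTALL,
)

COLUMNS = ["STORE", "REASON", "DETAILS", "TIME_PAUSE_CREATED", "TIME_PAUSE_END"]


def extract_fields(text):
    """Return a dict of the five fields, or None if the text does not match.

    Hypothetical helper: a missing Reason is reported as the string 'Null'.
    """
    match = parser.search(text)
    if match is None:
        return None
    # An optional group that did not take part in the match yields None.
    values = ["Null" if value is None else value for value in match.groups()]
    return dict(zip(COLUMNS, values))


# Made-up sample descriptions, only to exercise both cases.
with_reason = (
    "We have paused your store, Main Street Cafe, on the platform.\n"
    "Reason: Too many cancelled orders\n"
    "Details: Paused by operations team\n"
    "Created at: 2021-03-01 10:00\n"
    "Scheduled end time: 2021-03-01 12:00\n"
)
without_reason = (
    "We have paused your store, Main Street Cafe, on the platform.\n"
    "Details: Paused by operations team\n"
    "Created at: 2021-03-01 10:00\n"
    "Scheduled end time: 2021-03-01 12:00\n"
)

print(extract_fields(with_reason))
print(extract_fields(without_reason))
```

The returned dict could then be written into the row inside the existing loop, for example with `df1.loc[index, column] = value` for each key, after adding a `REASON` column next to the other four.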
