I want to extract the dates that follow DATE FILLED and REFILL in my text. The order of DATE FILLED and REFILL is not fixed, and the date can look like either of these:
6/23/20
6-23-20
My Python script is:
import re

expiration_date_regex = re.compile(r"(USE\s+BY.*(?P<expiration>\d{1,2}/\d{1,2}/\d{2,4}))|(DATE\s+FILLED.*(?P<date_filled>\d{1,2}/\d{1,2}/\d{2,4}))", re.M)

def find_matches(regex, text):
    # findall returns one tuple per match, with one entry per capture group
    matches = regex.findall(text)
    for match in matches:
        print(match)

# text holds the label text shown below
find_matches(expiration_date_regex, text)
My text is:
CVS pharmacy
713-217 HsonSt
OTY: 90
REFILL 0 Refills
PRSCBN. A Beil
DATE FILLED 6/23/20
USE BY. 6/23/21
RPH Bill Liu
MFR AUROBINDO PHARM
ST DEA BC2236645
This is a WHITE
REDTME
But I'm getting output like the following. It is almost reasonable, but I don't understand what the first two empty strings in the first tuple mean, or the last two empty strings in the second tuple. It looks something like a bitmask:
('', '', 'DATE FILLED 6/23/20', '6/23/20')
('USE BY. 6/23/21', '6/23/21', '', '')
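The empty strings are not a bitmask. When a pattern contains several capture groups, `findall` returns a tuple with one slot per group, and any group that took no part in a given match is reported as `''`. Here the pattern has four groups (two per alternative), so each match fills two slots and leaves the other two empty. A minimal sketch, assuming a shortened copy of the text above and a hypothetical helper named `find_named_matches`, uses `finditer` and `groupdict()` to keep only the named groups that actually matched:

```python
import re

# Shortened copy of the label text from the question (assumption for this sketch).
text = """REFILL 0 Refills
DATE FILLED 6/23/20
USE BY. 6/23/21
RPH Bill Liu"""

# Same pattern as in the question, split across two string literals for readability.
expiration_date_regex = re.compile(
    r"(USE\s+BY.*(?P<expiration>\d{1,2}/\d{1,2}/\d{2,4}))"
    r"|(DATE\s+FILLED.*(?P<date_filled>\d{1,2}/\d{1,2}/\d{2,4}))",
    re.M,
)


def find_named_matches(regex, text):
    """Hypothetical helper: one dict per match, without the unmatched groups."""
    results = []
    for match in regex.finditer(text):
        # groupdict() maps named groups that did not participate to None.
        results.append(
            {name: value for name, value in match.groupdict().items() if value is not None}
        )
    return results


named = find_named_matches(expiration_date_regex, text)
print(named)
```

Each dictionary then carries only the date that was found, keyed by the group name, so there are no placeholder slots to skip.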
(USE\s+BY|DATE\s+FILLED).*(?P<date>\d{1,2}/\d{1,2}/\d{2,4})
(?i)(USE\s+BY|DATE\s+FILL(?:ED)?).*(?P<date>\d{1,2}[-/]\d{1,2}[-/]\d{2,4})
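Putting both keywords into one alternation means every match fills the same groups, so no empty placeholders appear. The following is a sketch of how the case-insensitive, `[-/]` variant might be used, not a definitive implementation. It makes three assumptions that are not in the original: the keyword group is given the name `label`, the helper `extract_dates` is hypothetical, and the greedy `.*` is replaced by a lazy `.*?`. The last change matters because a greedy `.*` backtracks from the end of the line, so a two-digit month such as 12/23/20 would be captured as 2/23/20.

```python
import re

# Assumptions: the group name `label` and the lazy `.*?` are changes made for this sketch.
DATE_AFTER_KEYWORD = re.compile(
    r"(?P<label>USE\s+BY|DATE\s+FILL(?:ED)?).*?"
    r"(?P<date>\d{1,2}[-/]\d{1,2}[-/]\d{2,4})",
    re.IGNORECASE,
)


def extract_dates(text):
    """Hypothetical helper: return (label, date) pairs in order of appearance."""
    return [
        (match.group("label"), match.group("date"))
        for match in DATE_AFTER_KEYWORD.finditer(text)
    ]


# Shortened copy of the label text from the question.
sample = """REFILL 0 Refills
DATE FILLED 6/23/20
USE BY. 6/23/21
RPH Bill Liu"""

print(extract_dates(sample))
```

Because `.` does not match a newline by default, each match stays on the line where its keyword appears, and a keyword line without a date is simply skipped.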