I have a basic question about parsing using Python's parsec.py library.
I would like to extract a date that appears somewhere inside a text. For example:
Lorem ipsum dolor sit amet. A number 42 is present here. But here is a date 11/05/2017. Can you extract this?
or
Lorem ipsum dolor sit amet.
A number 42 is present here.
But here is a date 11/05/2017. Can you extract this?
In both cases I want the parser to return 11/05/2017.
I only want to use the parsec.py parsing library and I don't want to use regex directly; parsec's built-in regex function is okay.
I tried something like:
from parsec import *
ss = "Lorem ipsum dolor sit amet. A number 42 is present here. But here is a date 11/05/2017. Can you extract this?"
date_parser = regex(r'[0-9]{2}/[0-9]{2}/[0-9]{4}')
date = date_parser.parse(ss)
I get ParseError: expected [0-9]{2}/[0-9]{2}/[0-9]{4} at 0:0
Is there a way to ignore the text until the date_parser pattern is reached, without raising an error?
re.findall(r'[0-9]{2}/[0-9]{2}/[0-9]{4}', ss) will find all occurrences of your date pattern in ss. This is not difficult to maintain, and is simpler than any solution involving parsec.
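As a minimal sketch of the re.findall approach on both inputs from the question (the variable names here are only illustrative):

```python
import re

# Same date pattern as in the question, compiled once for reuse.
DATE_PATTERN = re.compile(r'[0-9]{2}/[0-9]{2}/[0-9]{4}')

single_line = ("Lorem ipsum dolor sit amet. A number 42 is present here. "
               "But here is a date 11/05/2017. Can you extract this?")
multi_line = ("Lorem ipsum dolor sit amet.\n"
              "A number 42 is present here.\n"
              "But here is a date 11/05/2017. Can you extract this?")

# findall scans the whole string, so line breaks before the date do not matter.
dates_single = DATE_PATTERN.findall(single_line)
dates_multi = DATE_PATTERN.findall(multi_line)

print(dates_single)  # ['11/05/2017']
print(dates_multi)   # ['11/05/2017']
```

Note that findall returns a list, so a text with no date gives an empty list rather than an error.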