How to parse words with brackets using Python pyparsing?

Question

I'm trying to extract the free-text part of user queries.

For example, a user enters:

domain:example.com and Welcome to my website

Currently the output is:

>>> parser.parseString("domain:example.com and Welcome to my website")
([(['domain', ':', 'example.com'], {}), 'and Welcome to my website'], {})

My pyparsing code is:

import pyparsing as pp

word = pp.Word(pp.printables, excludeChars=":")
non_tag = word + ~pp.FollowedBy(":")
# tagged value is two words with a ":"
tag = pp.Group(word + ":" + word)
# one or more non-tag words - use originalTextFor to get back
# a single string, including intervening white space
phrase = pp.originalTextFor(non_tag[1, ...])
parser = (phrase | tag)[...]
free_text_search_res = parser.parseString(filters)

This works as expected. My problem is that I also need to parse the query below correctly, and currently I get:

>>> parser.parseString("domain:example.com and date:[2012-12-12 TO 2014-12-12] and Welcome to my website")
([(['domain', ':', 'example.com'], {}), 'and', (['date', ':', '[2012-12-12'], {}), 'TO 2014-12-12] and Welcome to my website'], {})

The date part is wrong. I expected it to be ['date', ':', '[2012-12-12 TO 2014-12-12]']. What have I done wrong?

You can see in the result that the string you expected to be taken as a date has been split at the first space. That isn't surprising, as your definition of word is a series of consecutive printable characters. If you want special syntax to denote a date, your pyparsing definition has to include it: perhaps a date starts with [, ends with ], and can follow a :.
– DisappointedByUnaccountableMod, Commented Apr 14, 2021 at 6:37
@barny Square brackets are also included in printables. Is that what you mean? Or do I need to state the start and end of the word explicitly, as in [ word ]?
– Alireza, Commented Apr 14, 2021 at 6:43
Space isn't included in printables. That's why word splits your text into words, isn't it?
– DisappointedByUnaccountableMod, Commented Apr 14, 2021 at 18:46
@barny Yes, but including it stops my parser from parsing correctly. If you have sample code, I would appreciate it.
– Alireza, Commented Apr 15, 2021 at 7:47
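
The point made in the comments can be checked directly. This is a minimal sketch that reuses the word definition from the question; the variable names has_space, has_bracket and first_token are only illustrative.

```python
import pyparsing as pp

# Same definition as in the question: any printable character except ":".
word = pp.Word(pp.printables, excludeChars=":")

# pp.printables holds the non-whitespace printable characters,
# so brackets are allowed inside a word but a space is not.
has_space = " " in pp.printables
has_bracket = "[" in pp.printables

# The word therefore stops at the first space inside the brackets.
first_token = word.parseString("[2012-12-12 TO 2014-12-12]")[0]
print(has_space, has_bracket, first_token)
```

Because the space ends the word, the bracketed range needs its own expression that allows spaces between [ and ].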

Tarun Lalwani · Accepted Answer · 2021-04-15 16:57:51Z

You can try something like the following:

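import pyparsing as pp

word = pp.Word(pp.printables, excludeChars=":")
# a bracketed value may contain spaces; otherwise fall back to a plain word
word = ("[" + pp.Word(pp.printables + " ", excludeChars=":[]") + "]") | word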
non_tag = word + ~pp.FollowedBy(":")
# tagged value is two words with a ":"
tag = pp.Group(word + ":" + word)
# one or more non-tag words - use originalTextFor to get back
# a single string, including intervening white space
phrase = pp.originalTextFor(non_tag[1, ...])
parser = (phrase | tag)[...]

# free_text_search_res = parser.parseString(filters)
# tag.parseString("date:[2012-12-12 TO 2014-12-12]")

parser.parseString("domain:example.com and date:[2012-12-12 TO 2014-12-12] and Welcome to my website")

This gives the following result:

([(['domain', ':', 'example.com'], {}), 'and', (['date', ':', '[', '2012-12-12 TO 2014-12-12', ']'], {}), 'and Welcome to my website'], {})
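
Note that this result returns the brackets and the range as three separate tokens, while the question asked for a single string '[2012-12-12 TO 2014-12-12]'. One possible way to get that, as a sketch rather than the only option, is to wrap the bracketed expression in pp.Combine, which joins adjacent matched tokens into one string. The names plain_word and bracketed are introduced here for illustration and are not part of the original code.

```python
import pyparsing as pp

# A plain word: any run of printable characters except ":".
plain_word = pp.Word(pp.printables, excludeChars=":")

# A bracketed value: "[", then text that may contain spaces, then "]".
# pp.Combine joins the three matched pieces into a single token.
bracketed = pp.Combine(
    "[" + pp.Word(pp.printables + " ", excludeChars=":[]") + "]"
)

word = bracketed | plain_word
non_tag = word + ~pp.FollowedBy(":")
# tagged value is two words with a ":"
tag = pp.Group(word + ":" + word)
# one or more non-tag words, returned as a single string
phrase = pp.originalTextFor(non_tag[1, ...])
parser = (phrase | tag)[...]

query = (
    "domain:example.com and date:[2012-12-12 TO 2014-12-12] "
    "and Welcome to my website"
)
result = parser.parseString(query)
print(result.asList())
```

With this variant the date tag should come back as ['date', ':', '[2012-12-12 TO 2014-12-12]'], and the rest of the output stays the same.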
