I'm trying to extract the free-text part of a user's query.

For example, the user enters:

domain:example.com and Welcome to my website

The current output is:

>> parser.parseString("domain:example.com and Welcome to my website")
([(['domain', ':', 'example.com'], {}), 'and Welcome to my website'], {})

My pyparsing code is:

import pyparsing as pp

word = pp.Word(pp.printables, excludeChars=":")
non_tag = word + ~pp.FollowedBy(":")
# tagged value is two words with a ":"
tag = pp.Group(word + ":" + word)
# one or more non-tag words - use originalTextFor to get back
# a single string, including intervening white space
phrase = pp.originalTextFor(non_tag[1, ...])
parser = (phrase | tag)[...]
# filters holds the raw query string entered by the user
free_text_search_res = parser.parseString(filters)

This works as expected. The problem is that I also need to parse the following query correctly:

>> parser.parseString("domain:example.com and date:[2012-12-12 TO 2014-12-12] and Welcome to my website")
([(['domain', ':', 'example.com'], {}), 'and', (['date', ':', '[2012-12-12'], {}), 'TO 2014-12-12] and Welcome to my website'], {})

The date part is wrong. I expected ['date', ':', '[2012-12-12 TO 2014-12-12]']. What have I done wrong?

  • You can see in the result that the string you expect to be taken as a date has been split at the first space, which isn't surprising, since your definition of word is a series of consecutive printable characters. If you want special syntax to denote a date, your pyparsing definition has to include it: perhaps a date starts with [, ends with ], and can follow a :. Commented Apr 14, 2021 at 6:37
  • @barny Square brackets are also included in printables. Is that what you mean? Or do I need to state the start and end of the word explicitly, as in [ word ]? Commented Apr 14, 2021 at 6:43
  • Space isn't included in printables. That's why word splits your text into words, isn't it? Commented Apr 14, 2021 at 18:46
  • @barny Yes, but including it stops my parser from parsing correctly. If you have sample code, I would appreciate it. Commented Apr 15, 2021 at 7:47
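
The behaviour discussed in these comments can be checked in isolation. This is a minimal sketch that reuses the question's definition of word; the names space_is_printable and first_token are introduced here only for illustration. Because the space character is not part of pp.printables, the match stops at the first space.

```python
import pyparsing as pp

# Same definition as in the question: any run of printable characters except ":"
word = pp.Word(pp.printables, excludeChars=":")

# pp.printables contains no whitespace, so a Word built from it cannot span a space
space_is_printable = " " in pp.printables

# Only the first whitespace-delimited chunk of the bracketed range is consumed
first_token = word.parseString("[2012-12-12 TO 2014-12-12]").asList()

print(space_is_printable)
print(first_token)
```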

1 Answer

You can extend word so that a bracketed run, which may contain spaces, is tried before a plain word:

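import pyparsing as pp

word = pp.Word(pp.printables, excludeChars=":")
# a bracketed value may contain spaces; try it first, then fall back to a plain word
word = ("[" + pp.Word(pp.printables + " ", excludeChars=":[]") + "]") | word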
non_tag = word + ~pp.FollowedBy(":")
# tagged value is two words with a ":"
tag = pp.Group(word + ":" + word)
# one or more non-tag words - use originalTextFor to get back
# a single string, including intervening white space
phrase = pp.originalTextFor(non_tag[1, ...])
parser = (phrase | tag)[...]

# free_text_search_res = parser.parseString(filters)
# tag.parseString("date:[2012-12-12 TO 2014-12-12]")

parser.parseString("domain:example.com and date:[2012-12-12 TO 2014-12-12] and Welcome to my website")

This gives the following result. Note that the brackets come back as separate tokens inside the date group:

([(['domain', ':', 'example.com'], {}), 'and', (['date', ':', '[', '2012-12-12 TO 2014-12-12', ']'], {}), 'and Welcome to my website'], {})
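
If the bracketed range should come back as a single token, as the question expects, one possible variation is to match it with pp.Regex and allow it only on the value side of a tag. This is a sketch and not part of the answer's code: the names bracketed, value, query and result are introduced here for illustration, and it assumes a pyparsing version that supports the [1, ...] notation already used in the question.

```python
import pyparsing as pp

# Plain word: any run of printable characters except ":"
word = pp.Word(pp.printables, excludeChars=":")

# Hypothetical helper: "[" up to the next "]" as ONE token, spaces included
bracketed = pp.Regex(r"\[[^\]]*\]")

# A tag value is a bracketed range if possible, otherwise a plain word
value = bracketed | word

# A word that is not the key of a tag
non_tag = word + ~pp.FollowedBy(":")
tag = pp.Group(word + ":" + value)

# One or more non-tag words, returned as a single string
phrase = pp.originalTextFor(non_tag[1, ...])
parser = (phrase | tag)[...]

query = "domain:example.com and date:[2012-12-12 TO 2014-12-12] and Welcome to my website"
result = parser.parseString(query).asList()
print(result)
```

Because the regular expression consumes everything up to the closing bracket, the date group holds three tokens: 'date', ':' and the whole bracketed range.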
