Lucene QueryParser: Parse multi-term string without analyzing

Question

I serialized a BooleanQuery built from TermQuery objects into a string. Now I am trying to deserialize that string back into a BooleanQuery on a different node in a distributed system. The query involves multiple fields, and while deserializing I do not want to use an analyzer.

For example, I am trying to parse the following string without analyzing it:

+contents:maxItemsPerBlock +path:/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java

Lucene's QueryParser requires an analyzer, but I want the field values above to be treated as terms exactly as they are. I am looking for a query parser that does something like the following, since I do not want to parse the strings and construct the query myself:

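import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

TermQuery q1 = new TermQuery(new Term("contents", "maxItemsPerBlock"));
TermQuery q2 = new TermQuery(new Term("path", "/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java"));
BooleanQuery q = new BooleanQuery();
q.add(q1, BooleanClause.Occur.MUST);
q.add(q2, BooleanClause.Occur.MUST);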
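If a small amount of hand parsing turns out to be acceptable after all, the flat +field:value format above maps one-to-one onto TermQuery clauses. The sketch below assumes every clause is a single field:value token with an optional + or - prefix, that term text contains no whitespace, and that the first ':' separates the field from the text; FlatTermClauseParser, Clause and its local Occur enum are hypothetical names. It uses only the JDK, so each triple would still have to become new TermQuery(new Term(field, text)) added with the matching BooleanClause.Occur.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a flat "+field:value -field:value field:value" string
// into (occur, field, text) triples without any analysis of the term text.
class FlatTermClauseParser {

    // Local stand-in mirroring the names of Lucene's BooleanClause.Occur.
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final String field;
        final String text;   // raw term text, used verbatim

        Clause(Occur occur, String field, String text) {
            this.occur = occur;
            this.field = field;
            this.text = text;
        }

        @Override
        public String toString() {
            return occur + " " + field + ":" + text;
        }
    }

    static List<Clause> parse(String query) {
        List<Clause> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) continue;                 // empty input yields one empty token
            Occur occur = Occur.SHOULD;                    // no prefix means an optional clause
            if (token.charAt(0) == '+') {
                occur = Occur.MUST;
                token = token.substring(1);
            } else if (token.charAt(0) == '-') {
                occur = Occur.MUST_NOT;
                token = token.substring(1);
            }
            // Groups such as +(a b) are not supported; reject the obvious case.
            if (token.startsWith("(")) {
                throw new IllegalArgumentException("groups are not supported: " + token);
            }
            int colon = token.indexOf(':');                // first ':' separates field from text
            if (colon <= 0) {
                throw new IllegalArgumentException("expected field:value but got " + token);
            }
            clauses.add(new Clause(occur, token.substring(0, colon), token.substring(colon + 1)));
        }
        return clauses;
    }

    public static void main(String[] args) {
        String s = "+contents:maxItemsPerBlock "
                + "+path:/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java";
        for (Clause c : parse(s)) {
            System.out.println(c);
        }
    }
}
```

The path keeps its '/' and '-' characters because nothing here interprets query syntax beyond the leading prefix and the first colon.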

Also, when I tried using a WhitespaceAnalyzer with a QueryParser, I got an "IllegalArgumentException: field must not be null" error. Here is the sample code:

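import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
Analyzer analyzer = new WhitespaceAnalyzer();
String field = "contents";
QueryParser parser = new QueryParser(null, analyzer); // null default field
Query query = parser.parse("+contents:maxItemsPerBlock +path:/home/rchallapalli/Desktop/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java");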

java.lang.IllegalArgumentException: field must not be null
at org.apache.lucene.search.MultiTermQuery.<init>(MultiTermQuery.java:233)
at org.apache.lucene.search.AutomatonQuery.<init>(AutomatonQuery.java:99)
at org.apache.lucene.search.AutomatonQuery.<init>(AutomatonQuery.java:81)
at org.apache.lucene.search.RegexpQuery.<init>(RegexpQuery.java:108)
at org.apache.lucene.search.RegexpQuery.<init>(RegexpQuery.java:93)
at org.apache.lucene.queryparser.classic.QueryParserBase.newRegexpQuery(QueryParserBase.java:572)
at org.apache.lucene.queryparser.classic.QueryParserBase.getRegexpQuery(QueryParserBase.java:774)
at org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:844)
at org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:348)
at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:247)
at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:202)
at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:160)
at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:117)

The error message seems fairly clear: the first argument to the QueryParser constructor can't be null. If you don't care about the default field, just pass it a placeholder field name: new QueryParser("none", analyzer);
– femtoRgon, commented Aug 4, 2015 at 16:02
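The stack trace passes through getRegexpQuery because, in classic QueryParser syntax, text between two '/' characters is a regular expression. As far as we can tell, path:/home/ is therefore read as a RegexpQuery on path, and the remaining pieces of the path fall to the default field, which is null here; with a non-null default field the parse succeeds, but the path still ends up split into several clauses. One way around this, sketched below with hypothetical names (ClassicSyntaxWriter, escape, clause), is to backslash-escape the term text when writing the string and to parse it with an analyzer that keeps each term intact, such as KeywordAnalyzer, which emits its whole input as a single token. The character list is our reading of the classic syntax; Lucene itself provides QueryParser.escape(String) for this.

```java
// Sketch of the writing side: emit each clause in classic QueryParser syntax with the
// term text backslash-escaped, so that '/' is not read as the start of a /regex/.
class ClassicSyntaxWriter {

    // Syntax characters of the classic parser as we understand them; in real code
    // prefer org.apache.lucene.queryparser.classic.QueryParser.escape(String).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (char c : text.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');   // prefix every syntax character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // prefix is "+" (MUST), "-" (MUST_NOT) or "" (SHOULD). Field names are assumed to be
    // plain identifiers, so only the term text is escaped.
    static String clause(String prefix, String field, String text) {
        return prefix + field + ":" + escape(text);
    }

    public static void main(String[] args) {
        String q = String.join(" ",
                clause("+", "contents", "maxItemsPerBlock"),
                clause("+", "path", "/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java"));
        // +contents:maxItemsPerBlock +path:\/lucene\-5.1.0\/core\/src\/java\/org\/apache\/lucene\/codecs\/blocktree\/Stats.java
        System.out.println(q);
    }
}
```

On the receiving node, new QueryParser("none", new KeywordAnalyzer()).parse(q) should then turn each value back into a single unanalyzed term, since the classic parser removes the escaping backslashes before passing the text to the analyzer.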

Allen Chou · Accepted Answer · 2015-08-03 10:54:55Z


Considering the text in your question, WhitespaceAnalyzer, which splits tokens only at whitespace, may be a good choice.

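Before you serialize the BooleanQuery built from TermQuery objects, the term in each TermQuery is already exactly what you want to match in the Lucene index, so at parse time the string only needs to be split into terms, not otherwise transformed.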

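// code in Scala (Lucene 5.x constructors, which no longer take a Version argument)
import org.apache.lucene.analysis.core.WhitespaceAnalyzer
import org.apache.lucene.queryparser.classic.QueryParser

val parser = new QueryParser("", new WhitespaceAnalyzer())
val parsedQuery = parser.parse(searchString)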

I tried the following two cases, a single-valued field and a multi-valued field, and both parse without errors:

 +contents:maxItemsPerBlock +path:/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java

 +(contents:maxItemsPerBlock contents:minItemsPerBlock) +path:/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java

Besides, in our system, Query objects passed between nodes are serialized and deserialized with Java's ObjectOutputStream and ObjectInputStream. You could try that approach, so that you don't have to deal with analyzers at all.
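The Java-serialization route can be sketched without depending on whether Lucene's own Query classes implement Serializable in the version you run (worth checking before relying on it): serialize a small data object per TermQuery clause and rebuild the BooleanQuery on the receiving node. TermClauseDto and ClauseWire are hypothetical names, and the wire format is an assumption of this sketch, not something Lucene defines.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical transport helpers. On the receiving node each entry would become
//   bq.add(new TermQuery(new Term(dto.field, dto.text)), BooleanClause.Occur.valueOf(dto.occur));
class ClauseWire {

    static byte[] write(List<TermClauseDto> clauses) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(clauses));   // ArrayList is Serializable
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();                       // stream is flushed by close()
    }

    @SuppressWarnings("unchecked")
    static List<TermClauseDto> read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (List<TermClauseDto>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<TermClauseDto> sent = Arrays.asList(
                new TermClauseDto("MUST", "contents", "maxItemsPerBlock"),
                new TermClauseDto("MUST", "path",
                        "/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java"));
        for (TermClauseDto c : read(write(sent))) {
            System.out.println(c.occur + " " + c.field + ":" + c.text);
        }
    }
}

// One TermQuery clause: the term text travels as-is and is never analyzed.
class TermClauseDto implements Serializable {
    private static final long serialVersionUID = 1L;

    final String occur;   // "MUST", "SHOULD" or "MUST_NOT", the BooleanClause.Occur names
    final String field;
    final String text;

    TermClauseDto(String occur, String field, String text) {
        this.occur = occur;
        this.field = field;
        this.text = text;
    }
}
```

Because only field names and raw term text cross the wire, no query syntax, escaping or analyzer is involved on either side.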

edited Aug 3, 2015 at 10:54

answered Aug 3, 2015 at 5:11

Allen Chou


7 Comments

rahul Over a year ago

Thanks for your reply. In my case, I do not know in advance which fields will be involved in the query, so I can't use MultiFieldQueryParser. I am trying to avoid parsing the query string and building the query with TermQuery myself, since I would have to repeat most of the query parser's code.

Allen Chou Over a year ago

@rahul, if you do not know the fields involved in the query, how can you construct the query string passed to public Query parse(String query)? In your case, if you don't know that you are going to match "contents" and "path", how can you build the query string "+contents:.... +path:...."?

rahul Over a year ago

More info: first I serialized a BooleanQuery built from TermQuery objects into a string. Now I am trying to deserialize that string back into a BooleanQuery on a different node in a distributed system. The query involves multiple fields, and while deserializing I do not want to use an analyzer.

Allen Chou Over a year ago

@rahul, thanks for the extra info, now I get it. I just edited my answer; it may still not help, but at least it is more relevant. I suggest you put that extra info in your question so others can understand your problem better and help you solve it.

rahul Over a year ago

There is one problem with using a whitespace analyzer: if I indexed the first sentence of this comment with it, I could not search for "analyzer". I would have to search for "analyzer." (with a dot at the end).
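The effect rahul describes can be illustrated with a rough stand-in for whitespace tokenization (plain String.split, not Lucene's WhitespaceTokenizer): splitting on whitespace alone leaves trailing punctuation attached to the token. Note that the analyzer handed to QueryParser only controls how the query string is tokenized; the index can still be built with a different analyzer. Assuming the serialized TermQuery text holds terms as they already appear in the index, a whitespace split at parse time passes each term through unchanged as long as it contains no whitespace. WhitespaceSplitDemo and whitespaceTokens are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Rough illustration only: approximates whitespace-only tokenization with String.split.
// It does not lowercase and does not strip punctuation, much like WhitespaceAnalyzer.
class WhitespaceSplitDemo {

    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> tokens = whitespaceTokens("I cannot search for analyzer.");
        System.out.println(tokens);                        // [I, cannot, search, for, analyzer.]
        System.out.println(tokens.contains("analyzer"));   // false
        System.out.println(tokens.contains("analyzer."));  // true
    }
}
```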
