I serialized a BooleanQuery built from TermQuery instances into a string. Now I am trying to deserialize that string back into a BooleanQuery on a different node in a distributed system. The query spans multiple fields, and I do not want to use an analyzer while deserializing.

For example, I am trying to parse the string below without analyzing it:

+contents:maxItemsPerBlock +path:/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java

QueryParser in Lucene requires an analyzer, but I want the field values above to be treated as terms as-is. I am looking for a query parser that produces the equivalent of the code below, because I do not want to parse the string and construct the query myself.

TermQuery q1 = new TermQuery(new Term("contents", "maxItemsPerBlock"));
TermQuery q2 = new TermQuery(new Term("path", "/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java"));
BooleanQuery q = new BooleanQuery();
q.add(q1, BooleanClause.Occur.MUST);
q.add(q2, BooleanClause.Occur.MUST);

Also, when I tried using a WhitespaceAnalyzer with a QueryParser, I got an "IllegalArgumentException: field must not be null" error. The sample code and stack trace are below.

Analyzer analyzer = new WhitespaceAnalyzer();
String field = "contents";
QueryParser parser = new QueryParser(null, analyzer);
Query query = parser.parse("+contents:maxItemsPerBlock +path:/home/rchallapalli/Desktop/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java");

java.lang.IllegalArgumentException: field must not be null
at org.apache.lucene.search.MultiTermQuery.<init>(MultiTermQuery.java:233)
at org.apache.lucene.search.AutomatonQuery.<init>(AutomatonQuery.java:99)
at org.apache.lucene.search.AutomatonQuery.<init>(AutomatonQuery.java:81)
at org.apache.lucene.search.RegexpQuery.<init>(RegexpQuery.java:108)
at org.apache.lucene.search.RegexpQuery.<init>(RegexpQuery.java:93)
at org.apache.lucene.queryparser.classic.QueryParserBase.newRegexpQuery(QueryParserBase.java:572)
at org.apache.lucene.queryparser.classic.QueryParserBase.getRegexpQuery(QueryParserBase.java:774)
at org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:844)
at org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:348)
at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:247)
at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:202)
at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:160)
at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:117)
  • The error message seems fairly clear: The first arg to the QueryParser ctor can't be null. If you don't care about the field argument, just pass it a garbage field name: new QueryParser("none", analyzer); Commented Aug 4, 2015 at 16:02

1 Answer

Given the query string in your question, WhitespaceAnalyzer, which splits tokens only at whitespace, may be an option.

The terms inside the TermQuery objects you serialized are already exactly what you want to match in the Lucene index, so the parser only has to split the string on whitespace and should not alter the tokens any further.

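// Scala, Lucene 4.x constructors (the question's Lucene 5.1.0 code uses the form without the version argument)
import org.apache.lucene.analysis.core.WhitespaceAnalyzer
import org.apache.lucene.queryparser.classic.QueryParser
val parser = new QueryParser(version, "", new WhitespaceAnalyzer(version)) // empty default field instead of null
val parsedQuery = parser.parse(searchString)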

I tried the following two cases, one with a single value per field and one with multiple values for a field, and both parse:

 +contents:maxItemsPerBlock +path:/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java

 +(contents:maxItemsPerBlock contents:minItemsPerBlock) +path:/lucene-5.1.0/core/src/java/org/apache/lucene/codecs/blocktree/Stats.java
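If the parser route turns out to be fragile, the flat form of the string is simple enough to split by hand. The stack trace in the question passes through getRegexpQuery, which suggests the unescaped `/` characters in the path are being read as regular-expression delimiters, so a parse that does not throw may still not yield plain term queries. The sketch below is a minimal, Lucene-free illustration of the splitting step only. The class and method names (`FlatTermClauses`, `Clause`, `parse`) are hypothetical, and it assumes every clause has the form `[+|-]field:term`, with no whitespace inside terms, no colon inside field names, and no parenthesized groups such as the second case above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits a flat serialized query
// such as "+contents:foo +path:/a/b" into (occur, field, term) triples.
final class FlatTermClauses {

    /** One clause: occur prefix, field name, and the raw, unanalyzed term text. */
    static final class Clause {
        final char occur;   // '+' for MUST, '-' for MUST_NOT, ' ' for SHOULD
        final String field;
        final String text;

        Clause(char occur, String field, String text) {
            this.occur = occur;
            this.field = field;
            this.text = text;
        }
    }

    /** Assumes flat input only: no nested groups, no whitespace inside terms. */
    static List<Clause> parse(String serialized) {
        List<Clause> clauses = new ArrayList<>();
        for (String token : serialized.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input produces a single empty token
            }
            char occur = ' ';
            String rest = token;
            if (token.charAt(0) == '+' || token.charAt(0) == '-') {
                occur = token.charAt(0);
                rest = token.substring(1);
            }
            // The first colon separates the field from the term; everything
            // after it, including '/' and '.', is kept verbatim.
            int colon = rest.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("expected field:term, got: " + token);
            }
            clauses.add(new Clause(occur, rest.substring(0, colon), rest.substring(colon + 1)));
        }
        return clauses;
    }
}
```

Each resulting clause maps directly onto the construction shown in the question: `new TermQuery(new Term(clause.field, clause.text))`, added to the BooleanQuery with the occur value that corresponds to the prefix.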

Besides, in our system, queries passed between nodes are serialized and deserialized with Java's ObjectOutputStream and ObjectInputStream. You could try that approach, which avoids the analyzer question entirely.
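A minimal sketch of that round trip follows. It only works for objects that implement java.io.Serializable, and whether Lucene's Query classes do depends on the Lucene version, so check that before relying on it. To stay self-contained, the example uses a hypothetical stand-in class (`TermClause`) rather than a real Lucene query; `QueryWire`, `toBytes`, and `fromBytes` are likewise names made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper showing a byte-array round trip with Java serialization.
final class QueryWire {

    /** Stand-in for a query clause; a real Lucene Query would take its place
     *  only if that class is Serializable in the Lucene version in use. */
    static final class TermClause implements Serializable {
        private static final long serialVersionUID = 1L;
        final String field;
        final String text;
        final boolean required;

        TermClause(String field, String text, boolean required) {
            this.field = field;
            this.text = text;
            this.required = required;
        }
    }

    /** Serializes any Serializable value into a byte array (sending node). */
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds the object from the bytes (receiving node). */
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

Because the term text travels as a plain String field, it arrives unchanged and no analyzer is involved. Both nodes need the same class definitions on their classpath for the read side to succeed.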

7 Comments

Thanks for your reply. In my case I do not know which fields will be involved in the query, so I cannot use MultiFieldQueryParser. I am trying to avoid parsing the query string and then building the query from TermQuery objects myself, since I would have to repeat most of the code in the query parser.
@rahul, if you do not know the fields involved in the query, how can you construct the query string passed to public Query parse(String query)? In your case, if you don't know that you are going to match "contents" and "path", how can you build the query string "+contents:.... +path:...."?
More info: First I serialized a BooleanQuery built from TermQuery instances into a string. Now I am trying to deserialize the string back into a BooleanQuery on a different node in a distributed system. So while deserializing, I have multiple fields and I do not want to use an analyzer.
@rahul, thanks for the extra info, now I get it. I just edited my answer; it may still not solve your problem, but it should at least be more relevant. I suggest you put that extra info in your question so others can understand your problem better and help you solve it.
There is one problem with using a whitespace analyzer. If I indexed the first sentence of this comment, I could not search for "analyzer". I would have to search for "analyzer." (with a dot at the end).