Lucene: exception - Query parser encountered <EOF> after "some word"

Question

I am working on a classification problem: classifying product reviews as positive, negative or neutral, based on training data, using the Lucene API.

I am using an ArrayList of Review objects - "reviewList" that stores the attributes for each review while crawling the web pages.

The review attributes, which include "polarity" and "review content", are then indexed using the indexer. After that, I need to classify the remaining review objects based on the indexed ones. While doing so, the query parser hits a review whose "review content" makes it report an unexpected <EOF>, and the program terminates.

The line causing the error is marked with a comment:

    IndexReader reader = IndexReader.open(FSDirectory.open(new File("index")));
    IndexSearcher searcher = new IndexSearcher(reader);
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_31);
    QueryParser parser = new QueryParser(Version.LUCENE_31, "Review", analyzer);

    int length = Crawler.reviewList.size();
    for (int i = 200; i < length; i++) {
        String true_class;
        double r_stars = Crawler.reviewList.get(i).getStars();

        if (r_stars < 2.0) {
            true_class = "-1";
        } else if (r_stars > 3.0) {
            true_class = "1";
        } else {
            true_class = "0";
        }

        String[] reviewTokens = Crawler.reviewList.get(i).getReview().split(" ");
        String parsedReview = "";

        int j;

        for (j = 0; j < reviewTokens.length; j++) {
            if (reviewTokens[j] != null) {
                if (!((reviewTokens[j].contains("-")) || (reviewTokens[j].contains("!")))) {
                    parsedReview += reviewTokens[j] + " ";
                }
            } else {
                break;
            }
        }

        Query query = parser.parse(parsedReview); // CAUSING ERROR!!

        TopScoreDocCollector results = TopScoreDocCollector.create(5, true);
        searcher.search(query, results);
        ScoreDoc[] hits = results.topDocs().scoreDocs;

I've filtered the text manually to remove the characters that I thought were causing the error, and I also check whether the next token is null, but the error persists.

This is the stack trace:

    Exception in thread "main" org.apache.lucene.queryParser.ParseException: Cannot parse 'I made the choice ... be all "thumbs ': Lexical error at line 1, column 938.  Encountered: <EOF> after : "\"thumbs "
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:216)
        at Sentiment_Analysis.Classification.classify(Classification.java:58)
        at Sentiment_Analysis.Main.main(Main.java:17)
    Caused by: org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 938.  Encountered: <EOF> after : "\"thumbs "
        at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1229)
        at org.apache.lucene.queryParser.QueryParser.jj_scan_token(QueryParser.java:1709)
        at org.apache.lucene.queryParser.QueryParser.jj_3R_2(QueryParser.java:1598)
        at org.apache.lucene.queryParser.QueryParser.jj_3_1(QueryParser.java:1605)
        at org.apache.lucene.queryParser.QueryParser.jj_2_1(QueryParser.java:1585)
        at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1280)
        at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1266)
        at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1313)
        at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1266)
        at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1226)
        at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206)
        ... 2 more
    Java Result: 1

Please help me solve this problem. I have been banging my head against it for hours now!

John Topley · Accepted Answer · 2013-12-05 10:35:00Z

37

You should escape the double quote and other special characters with QueryParser.escape:

    Query query = parser.parse(QueryParser.escape(parsedReview));

As the QueryParser.escape Javadoc puts it:

Returns a String where those characters that QueryParser expects to be escaped are escaped by a preceding '\'.
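To make the effect concrete, here is a minimal, Lucene-free sketch of that kind of escaping. The review in the stack trace ends with an unbalanced `"`, which the parser reads as the start of a phrase that never closes, hence the `<EOF>`. The class name `EscapeSketch`, the method `escapeQuerySyntax` and the exact character list are assumptions made for illustration, not Lucene's code (newer releases may treat further characters, such as `/`, as special). In real code, call `QueryParser.escape` itself.

```java
// Hand-rolled illustration only; in real code use Lucene's QueryParser.escape.
class EscapeSketch {
    // Characters treated as query syntax in this sketch (an assumed list).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Returns a copy of s with a backslash placed before every special character.
    static String escapeQuerySyntax(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // An unbalanced double quote, like the one in the failing review.
        String review = "be all \"thumbs up";
        System.out.println(escapeQuerySyntax(review)); // prints: be all \"thumbs up
    }
}
```

With the text escaped this way, the manual filter that drops tokens containing "-" or "!" should no longer be needed, since those characters get escaped as well.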

edited Dec 5, 2013 at 10:35

John Topley

answered Apr 21, 2012 at 14:45

Pau Kiat Wee

6 Comments

Reema Over a year ago

Thanks a ton! It was spot on.. :D

Chunliang Lyu Over a year ago

For those using a more recent release (Lucene 4.6 for me), the escape function has been moved to the QueryParserUtil class.

Divyang Shah Over a year ago

I want to make this using solr library instead of lucene library, any idea?

Superole Over a year ago

@ChunliangLyu In Lucene 4.10.4, escape() is still in QueryParser (inherited from QueryParserBase), but there is also one in QueryParserUtil, as you mention. I wonder what the difference is?

Chunliang Lyu Over a year ago

@Superole Yes, you are right: QueryParser inherits the method from QueryParserBase. I have checked the implementations in QueryParserBase and QueryParserUtil in the current revision, and it turns out they are exactly the same. So there is no functional difference, perhaps only a tiny performance difference.

WonderWorker · Accepted Answer · 2017-06-05 12:43:03Z

2

I recognise this problem.

Declaring the GROUP BY before the WHERE declaration works fine in Teradata, but throws an error while parsing.

To fix, move the GROUP BY declaration after the WHERE declaration.

edited Jun 5, 2017 at 12:43

WonderWorker

answered Jun 5, 2017 at 12:17

Rishabh Sharma
