
I'm putting together a proof of concept for full-text search in our application using Lucene.NET. Some queries work fine, but others return results that don't match what the Luke tool returns. More problematically, this query:

(Description:tasty) (Gtin:00018389732061)

always yields this exception:

An unhandled exception of type 'System.IndexOutOfRangeException' occurred in Lucene.Net.dll
   at Lucene.Net.Search.TermScorer.Score() in d:\Lucene.Net\FullRepo\trunk\src\core\Search\TermScorer.cs:line 136
   at Lucene.Net.Search.BooleanScorer.BooleanScorerCollector.Collect(Int32 doc) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\BooleanScorer.cs:line 88
   at Lucene.Net.Search.TermScorer.Score(Collector c, Int32 end, Int32 firstDocID) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\TermScorer.cs:line 80
   at Lucene.Net.Search.BooleanScorer.Score(Collector collector, Int32 max, Int32 firstDocID) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\BooleanScorer.cs:line 323
   at Lucene.Net.Search.BooleanScorer.Score(Collector collector) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\BooleanScorer.cs:line 389
   at Lucene.Net.Search.IndexSearcher.Search(Weight weight, Filter filter, Collector collector) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\IndexSearcher.cs:line 228
   at Lucene.Net.Search.IndexSearcher.Search(Weight weight, Filter filter, Int32 nDocs) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\IndexSearcher.cs:line 188
   at Lucene.Net.Search.Searcher.Search(Query query, Filter filter, Int32 n) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\Searcher.cs:line 108
   at Lucene.Net.Search.Searcher.Search(Query query, Int32 n) in d:\Lucene.Net\FullRepo\trunk\src\core\Search\Searcher.cs:line 118
   at...

If I use this query instead:

(Description:tasty) (Gtin:000)

I get results back. What is causing the exception in the first query? FWIW, here is the relevant code snippet:

protected virtual IList<Document> GetDocuments(BooleanQuery query, DirectoryInfo indexLocation, string defaultField)
{
    var docs = new List<Document>();

    using (var dir = new MMapDirectory(indexLocation))
    {
        using (var searcher = new IndexSearcher(dir))
        {
            var queryParser = new QueryParser(Constants.LuceneVersion, defaultField, new StandardAnalyzer(Constants.LuceneVersion));
            TopDocs result = searcher.Search(query, Constants.MaxHits);

            if (result == null) return docs;

            foreach (var scoredoc in result.ScoreDocs.OrderByDescending(d => d.Score))
            {
                docs.Add(searcher.Doc(scoredoc.Doc));
            }
            return docs;
        }
    }
}

Based on the comments below, here is my current code, unedited, which still doesn't work.

protected virtual IList<Document> GetDocuments(BooleanQuery query, DirectoryInfo indexLocation, string defaultField)
{
    var docs = new List<Document>();

    using (var dir = new MMapDirectory(indexLocation))
    {
        using (var searcher = new IndexSearcher(dir))
        {
            using (var analyzer = new StandardAnalyzer(Constants.LuceneVersion))
            {
                var queryParser = new QueryParser(Constants.LuceneVersion, defaultField, analyzer);
                var collector = TopScoreDocCollector.Create(Constants.MaxHits, true);
                var parsed = queryParser.Parse(query.ToString());
                searcher.Search(parsed, collector);

                var docsresult = new List<string>();
                var matches = collector.TopDocs().ScoreDocs;
                foreach (var scoredoc in matches.OrderByDescending(d => d.Score))
                {
                    docs.Add(searcher.Doc(scoredoc.Doc));
                }
                return docs;
            }
        }
    }
}
  • Additionally, this query: +(Description:tasty) +Gtin:000* returns no hits in my Lucene.NET implementation, while Luke (correctly) returns 11 matching documents. Commented Mar 9, 2016 at 15:57
  • Is "Gtin" indexed as a string or numeric field? Commented Mar 20, 2016 at 16:48
  • Luke will often return "different" results because the analyzer it uses is often different from the one the fields were indexed with. Commented Mar 20, 2016 at 16:51
  • Your example doesn't actually parse the query. It uses the BooleanQuery that's passed in. Commented Mar 21, 2016 at 13:16
  • Any multi-part query will end up as a BooleanQuery. Can I assume that the queries you've been referring to are the "ToString()" of the "query" arg? What is this "fluent query builder"? I've been using Lucene.net for many years. I've never had good experiences with 3rd party builders for anything other than the simple. I've always ended up either generating a query string and parsing it or, rarely, building the Query object graph directly. It's going to be hard to go further unless you can provide a fuller example somewhat like my Answer below. Commented Mar 21, 2016 at 18:48
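Building the Query object graph directly, as the last comment mentions, could look roughly like this for the question's first query. This is a sketch assuming the Lucene.Net 3.0.x API and that Gtin is indexed NOT_ANALYZED; the names QueryBuilders and DescriptionOrGtin are hypothetical. Because no analyzer runs here, each term must already be in its indexed form, which is one way hand-built queries can drift from what QueryParser or Luke produces.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

// Hypothetical helper, not part of Lucene.Net or the OP's code.
public static class QueryBuilders
{
    // Equivalent of the parsed string "(Description:tasty) (Gtin:00018389732061)":
    // two optional (SHOULD) clauses combined in one BooleanQuery.
    public static BooleanQuery DescriptionOrGtin(string descriptionWord, string gtin)
    {
        var query = new BooleanQuery();

        // Description is ANALYZED with StandardAnalyzer, which lowercases tokens.
        // Lowercasing here only approximates the analyzer for a single plain word.
        query.Add(new TermQuery(new Term("Description", descriptionWord.ToLowerInvariant())), Occur.SHOULD);

        // Gtin is assumed NOT_ANALYZED, so the exact code string is the indexed term.
        query.Add(new TermQuery(new Term("Gtin", gtin)), Occur.SHOULD);

        return query;
    }
}
```

Calling ToString() on the result gives a query string without the redundant parentheses, which can be compared against what QueryParser produces for the same input.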

1 Answer


Not strictly an answer as it "works on my machine". Posting as an answer so that I can share the unit test code that "works". Hopefully the OP can show what is different with their version.

This version assumes that the "Gtin" field is a string field and is not analyzed (as it seems to be a code).

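using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]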
public class UnitTest4
{
    [TestMethod]
    public void TestLucene()
    {
        var writer = CreateIndex();
        Add(writer, "tasty", "00018389732061");
        writer.Flush(true, true, true);

        var searcher = new IndexSearcher(writer.GetReader());
        Test(searcher, "(Description:tasty) (Gtin:00018389732061)");
        Test(searcher, "Description:tasty Gtin:00018389732061");
        Test(searcher, "+Description:tasty +Gtin:00018389732061");
        Test(searcher, "+Description:tasty +Gtin:000*");

        writer.Dispose();
    }

    private void Test(IndexSearcher searcher, string query)
    {
        var result = Search(searcher, query);
        Console.WriteLine(string.Join(", ", result));
        Assert.AreEqual(1, result.Count);
        Assert.AreEqual("00018389732061", result[0]);
    }

    private List<string> Search(IndexSearcher searcher, string expr)
    {
        using (var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30))
        {
            var queryParser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "Description", analyzer);
            var collector = TopScoreDocCollector.Create(1000, true);
            var query = queryParser.Parse(expr);
            searcher.Search(query, collector);

            var result = new List<string>();
            var matches = collector.TopDocs().ScoreDocs;
            foreach (var item in matches)
            {
                var id = item.Doc;
                var doc = searcher.Doc(id);
                result.Add(doc.GetField("Gtin").StringValue);
            }
            return result;
        }
    }

    IndexWriter CreateIndex()
    {
        var directory = new RAMDirectory();

        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        var writer = new IndexWriter(directory, analyzer, new IndexWriter.MaxFieldLength(1000));

        return writer;
    }
    void Add(IndexWriter writer, string desc, string id)
    {
        var document = new Document();
        document.Add(new Field("Description", desc, Field.Store.YES, Field.Index.ANALYZED));
        document.Add(new Field("Gtin", id, Field.Store.YES, Field.Index.NOT_ANALYZED));

        writer.AddDocument(document);
    }
}

Comments

I apparently whiffed on the GTIN thing. You are right, that's a code field. I did not realize it should be indexed as not analyzed. Going to try that now...
OK, I tried a slight variation of yours. I changed GTIN to not be analyzed, and I am also converting the query to text. No joy: the same error bubbles up out of the stack. Note that I have a physical index, not an in-memory one, so the code is a little different... ack, the code is too long for a comment, I will try to edit my original post.
Any field that contains a code/key/ID-type value should be NOT_ANALYZED. The type of Directory used in the example code will not make any difference to the behaviour. However, in Lucene.Net I don't think MMapDirectory actually has any benefits (there may even be bugs). Could you try with a simple FSDirectory? Windows does a pretty good job of caching file contents, and since segment files are written once and then only read, you get pretty good performance characteristics anyway.
Apparently that was it! It appears that FSDirectory is a base class, so I used one of the derived classes, SimpleFSDirectory. Magically, it started working.
Ha! That'll teach me to make assumptions! BTW, you shouldn't create SimpleFSDirectory directly. Use FSDirectory.Open (a static method) instead.
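As a rough illustration of the workaround reported in this thread, the question's GetDocuments method could be reworked to open the index with FSDirectory.Open instead of constructing MMapDirectory. This is a sketch assuming the Lucene.Net 3.0.x API; the class name IndexQueries and the maxHits parameter (standing in for the OP's Constants.MaxHits) are hypothetical, and the unused QueryParser is dropped. It does not explain the underlying MMapDirectory failure; it only applies the change that made the error go away.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Documents;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical container class; the signature is simplified from the question.
public static class IndexQueries
{
    public static IList<Document> GetDocuments(Query query, DirectoryInfo indexLocation, int maxHits)
    {
        var docs = new List<Document>();

        // FSDirectory.Open lets Lucene.Net choose the FSDirectory implementation,
        // rather than newing up MMapDirectory or SimpleFSDirectory directly.
        using (var dir = FSDirectory.Open(indexLocation))
        // Open the searcher read-only; it owns and disposes its own IndexReader.
        using (var searcher = new IndexSearcher(dir, true))
        {
            TopDocs result = searcher.Search(query, maxHits);

            // ScoreDocs come back already ordered by descending score.
            foreach (var scoreDoc in result.ScoreDocs)
            {
                // Stored fields are loaded into the Document here, so the
                // returned documents stay usable after the searcher is disposed.
                docs.Add(searcher.Doc(scoreDoc.Doc));
            }
        }

        return docs;
    }
}
```

Because the top hits are already sorted by score, the extra OrderByDescending from the question's code is not needed in this version.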