
Can I use Lucene to query an ElasticSearch index?

Using ElasticSearch, I created an index and inserted these three documents:

$ curl -XPOST localhost:9200/index1/type1 -d '{"f1":"dog"}'
$ curl -XPOST localhost:9200/index1/type2 -d '{"f2":"cat"}'
$ curl -XPOST localhost:9200/index1/type2 -d '{"f3":"horse"}'

So I have one index, two types, and three documents. Now I would like to search for them using standard Lucene. Using a hex editor, I identified which shard holds the indexed documents, and I can successfully query that shard's index. However, I can't figure out how to retrieve the field values from the matching document(s).

The following program finds the matching document but is unable to retrieve its field values.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

import java.io.File;

public class TestES {

    void doWork(String[] args) throws Exception {
        // Index reader for already created ElasticSearch index
        String indx1 = "/path-to-index/elasticsearch-0.90.0.RC2-SNAPSHOT/data/elasticsearch/nodes/0/indices/index1/1/index";
        Directory index = FSDirectory.open(new File(indx1));
        IndexReader reader = DirectoryReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);

        // Looks like the query is correct since we do get a hit
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_41);
        Query q = new QueryParser(Version.LUCENE_41, "f2", analyzer).parse("cat");
        TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;

        // We do get a hit, but every field prints as null except "_uid"
        if (hits.length > 0) {
            int docId = hits[0].doc;
            Document d = searcher.doc(docId);
            System.out.println("DocID " + docId + ", _uid: " + d.get("_uid"));
            System.out.println("DocID " + docId + ", f2: " + d.get("f2"));
        }
        reader.close();
    }

    public static void main(String[] args) throws Exception {
        TestES hl = new TestES();
        hl.doWork(args);
    }
}

Results:
DocID 0, _uid: type2#3K5QXeZhQnit9UXM9_4bng
DocID 0, f2: null

The _uid value above is correct.

The Eclipse debugger shows that the Document variable d does have two fields:

  • stored,indexed,tokenized,omitNorms<_uid:type2#3K5QXeZhQnit9UXM9_4bng>
  • stored<_source:[7b 22 66 32 22 3a 22 63 61 74 22 7d]>
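The bytes shown for _source are plain UTF-8 text: 0x7b is {, 0x22 is ", 0x66 is f, and so on. A stdlib-only sketch that decodes the hex dump copied from the debugger (the class and helper names are made up for illustration):

```java
import java.nio.charset.StandardCharsets;

public class DecodeSource {

    // Turns a space-separated hex dump such as "7b 22 66" into raw bytes.
    static byte[] hexToBytes(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    // Interprets the bytes as UTF-8, the encoding of the JSON body sent with curl.
    static String decode(String hex) {
        return new String(hexToBytes(hex), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Bytes copied from the Eclipse view of the stored _source field
        System.out.println(decode("7b 22 66 32 22 3a 22 63 61 74 22 7d")); // prints {"f2":"cat"}
    }
}
```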

Unfortunately, d.get("_source") also returns null.
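Instead of inspecting d in the debugger, the stored fields can be listed in code. A sketch against the Lucene 4.x API (class and method names are hypothetical) that reports whether each stored field carries a string or a binary value. In Lucene 4.x, Document.get(name) only returns string values, so it yields null for a binary field such as _source. f2 itself comes back null because Elasticsearch indexes f2 but does not store it as a separate Lucene field by default; the original document is kept only in _source.

```java
import java.nio.charset.StandardCharsets;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.util.BytesRef;

public class DumpStoredFields {

    // Summarizes one stored field: string-valued, binary-valued, or something else.
    static String describe(IndexableField f) {
        if (f.stringValue() != null) {
            return f.name() + " (string): " + f.stringValue();
        }
        BytesRef bytes = f.binaryValue();
        if (bytes != null) {
            return f.name() + " (binary, " + bytes.length + " bytes)";
        }
        return f.name() + " (other)";
    }

    // Prints every stored field of a document, e.g. one returned by searcher.doc(docId).
    static void dump(Document d) {
        for (IndexableField f : d.getFields()) {
            System.out.println(describe(f));
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for the document loaded from the shard above
        Document d = new Document();
        d.add(new StringField("_uid", "type2#3K5QXeZhQnit9UXM9_4bng", Field.Store.YES));
        d.add(new StoredField("_source", "{\"f2\":\"cat\"}".getBytes(StandardCharsets.UTF_8)));
        dump(d);
    }
}
```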

How can I retrieve the document fields for a matching query?

Thank you.

  • Well, first off I would ask you why you're making your life harder than it should be :) Anyways, you're doing it right: the _source field is stored by default and contains the whole document you sent to elasticsearch. You have to retrieve it and parse it as a JSON document. Don't know why you get null. Did you make sure you're using the right Lucene version? Commented May 30, 2013 at 8:37
  • I was afraid someone would ask that question :) Yes, I verified that I am running elasticsearch-0.90.0.RC2-SNAPSHOT/bin and the Lucene jars are in elasticsearch-0.90.0.RC2-SNAPSHOT/lib. I still cannot retrieve "_source" Commented May 30, 2013 at 11:06
  • Ah, I got it. Interestingly, I needed to retrieve the field "_source" as a binary value. So this worked: d.getBinaryValue("_source"), and it retrieved [7b 22 66 32 22 3a 22 63 61 74 22 7d], which is {"f2":"cat"} Commented May 30, 2013 at 11:36
  • Right, sure! Missed that at first glance. Maybe you can post it as your own answer since you solved it! Commented May 30, 2013 at 14:27
  • I wonder if there is a way to scope the Lucene query by ElasticSearch type. For example, is something like QueryParser(Version.LUCENE_41, "type1/f2", analyzer).parse("cat") possible? Commented May 31, 2013 at 14:32
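One possible approach to the type question, not verified against a real shard: the Elasticsearch 0.90 mapping docs describe the _type field as indexed but not analyzed, and not stored (which would explain why it does not show up in the Eclipse view). If that holds, the type can be required as a separate TermQuery next to the parsed query rather than being encoded in the field name. A sketch against the Lucene 4.x API; the class and method names are hypothetical, and the assumption that each document's type is indexed as a single term in _type should be checked on the actual index.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class TypeScopedQuery {

    // Wraps a query so that it only matches documents of the given type.
    // Assumption: ES 0.90 indexes the type as one unanalyzed term in "_type".
    static Query scopeToType(Query userQuery, String type) {
        BooleanQuery scoped = new BooleanQuery();
        scoped.add(userQuery, BooleanClause.Occur.MUST);
        scoped.add(new TermQuery(new Term("_type", type)), BooleanClause.Occur.MUST);
        return scoped;
    }

    public static void main(String[] args) {
        // In the question's program, userQuery would be the QueryParser result for "cat"
        Query userQuery = new TermQuery(new Term("f2", "cat"));
        System.out.println(scopeToType(userQuery, "type2"));
    }
}
```

The resulting query can be passed to searcher.search(q, collector) exactly like the original one.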

1 Answer


As stated in the comment, I needed to retrieve the field "_source" as a binary value. So this worked: d.getBinaryValue("_source") and it retrieved [7b 22 66 32 22 3a 22 63 61 74 22 7d] which is {"f2":"cat"}. Javanna, thanks for helping.
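The fix as a small helper, assuming the Lucene 4.x API: Document.getBinaryValue returns a BytesRef, and utf8ToString() decodes it. This assumes, as the bytes above show, that _source holds uncompressed UTF-8 JSON. The class and method names are hypothetical, and the returned string still has to be parsed with a JSON library to get at f2.

```java
import java.nio.charset.StandardCharsets;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.util.BytesRef;

public class ReadSource {

    // Returns the stored _source as a JSON string, or null if the field is absent.
    // d.get("_source") cannot be used: _source is stored as bytes, not as a string.
    static String sourceJson(Document d) {
        BytesRef ref = d.getBinaryValue("_source");
        if (ref == null) {
            return null;
        }
        // utf8ToString honours the BytesRef's offset and length
        return ref.utf8ToString();
    }

    public static void main(String[] args) {
        // In-memory stand-in for searcher.doc(docId) holding the same bytes
        Document d = new Document();
        d.add(new StoredField("_source", "{\"f2\":\"cat\"}".getBytes(StandardCharsets.UTF_8)));
        System.out.println(sourceJson(d)); // prints {"f2":"cat"}
    }
}
```

In the question's program, the second println would become something like System.out.println("DocID " + docId + ", _source: " + ReadSource.sourceJson(d));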
