I need some help with Lucene index files; maybe some of you can help me out.

I have JSON like this:

[
    {
        "Id": 4476,
        "UrlName": null,
        "PhoneData": [
            {
                "PhoneType": "O",
                "PhoneNumber": "0065898"
            },
            {
                "PhoneType": "F",
                "PhoneNumber": "0065898"
            }
        ],
        "Contact": [],
        "Services": [
            {
                "ServiceId": 10,
                "ServiceGroup": 2
            },
            {
                "ServiceId": 20,
                "ServiceGroup": 1
            }
        ]
    }
]

Adding the first two fields is relatively easy:

        // add Lucene fields mapped to DB fields
        doc.Add(new Field("Id", sampleData.Id.Value.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("UrlName", sampleData.UrlName.Value ?? "null", Field.Store.YES, Field.Index.ANALYZED));

But how can I add PhoneData and Services to the index so that they stay connected to the unique Id?

  • Not sure about Lucene, but in Solr I just flatten those JSON objects and index them. Commented Sep 21, 2013 at 14:26
  • Hm, I must say that I have never used Solr, but as I remember, Solr uses Lucene under the hood. Anyway, I think I need to explore Solr in more detail, because I see everyone here on Stack Overflow mentioning Solr. :-) Do you have any examples of doing this in Solr? Thanks. Commented Sep 21, 2013 at 14:33
  • In Solr I'd add PhoneData_PhoneType and similarly flatten the others too. Commented Sep 21, 2013 at 14:45
  • Given that it is the only one, could you please accept my answer as the best? (It's 15 points.) Thanks. Commented Feb 17, 2016 at 8:08

1 Answer

To index JSON objects I would go this way:

  1. Store the whole value under a payload field, named for example $json. This field would be stored but not indexed.
  2. For each indexable property (possibly nested), create an indexable field whose name is an XPath-like path expression identifying the property, for example PhoneData.PhoneType.

If it is OK for all nested properties to be indexed, then it's simple: just iterate over all of them, generating an indexable field for each one.
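The "iterate over all of them" case can be sketched as a small recursive walk. This is only an illustration, not the API of any particular library: it assumes the JSON is parsed with System.Text.Json (available in .NET Core 3.0 and later), and the names JsonFlattener and Flatten are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: turns a JSON value into (path, value) pairs.
public static class JsonFlattener
{
    // Yields one pair per scalar leaf. Array elements reuse the path of the
    // property that holds the array, so a repeated property produces several
    // pairs with the same path. Nulls and empty arrays produce nothing.
    public static IEnumerable<KeyValuePair<string, string>> Flatten(JsonElement element, string path = "")
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                foreach (JsonProperty property in element.EnumerateObject())
                {
                    string childPath = path.Length == 0 ? property.Name : path + "." + property.Name;
                    foreach (var pair in Flatten(property.Value, childPath))
                        yield return pair;
                }
                break;

            case JsonValueKind.Array:
                foreach (JsonElement item in element.EnumerateArray())
                    foreach (var pair in Flatten(item, path))
                        yield return pair;
                break;

            case JsonValueKind.Null:
            case JsonValueKind.Undefined:
                break; // nothing to index

            default:
                // String, number, or boolean leaf.
                yield return new KeyValuePair<string, string>(path, element.ToString());
                break;
        }
    }
}
```

Applied to a record shaped like the one in the question, this would give pairs such as Id = 4476, PhoneData.PhoneType = O, PhoneData.PhoneType = F and Services.ServiceId = 10. Each pair could then be added to the document with the same call the question already uses, for example doc.Add(new Field(pair.Key, pair.Value, Field.Store.NO, Field.Index.NOT_ANALYZED)); Lucene accepts several fields with the same name in one document. Two caveats: the walk is lazy, so the JsonDocument must stay undisposed until the pairs have been consumed, and flattening loses the link between values of the same array entry (which PhoneType belongs to which PhoneNumber).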

But if you don't want to index all of them (a more realistic case), deciding which properties are indexable is a separate problem; in that case you could:

  • Accept from the client the path expressions of the index fields to be created when storing the document, or
  • Put JSON Schema into play to describe your data (assuming your JSON records have a common schema), and extend it with a custom property that would allow you to tag which properties are indexable.
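The first option, client-supplied path expressions, might look roughly like the sketch below. It is an assumption-laden illustration rather than how any specific library does it: paths are plain dot-separated property names, arrays are traversed implicitly, parsing relies on System.Text.Json, and PathSelector, Values and Select are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: resolves dot-separated paths such as
// "PhoneData.PhoneType" against a parsed JSON value.
public static class PathSelector
{
    // Returns every scalar value reachable through the given path.
    public static IEnumerable<string> Values(JsonElement root, string path)
    {
        return Values(root, path.Split('.'), 0);
    }

    // Returns (path, value) pairs for the requested paths only.
    public static IEnumerable<KeyValuePair<string, string>> Select(JsonElement root, IEnumerable<string> paths)
    {
        foreach (string path in paths)
            foreach (string value in Values(root, path))
                yield return new KeyValuePair<string, string>(path, value);
    }

    private static IEnumerable<string> Values(JsonElement element, string[] segments, int depth)
    {
        // Arrays do not consume a path segment: fan out over their items.
        if (element.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement item in element.EnumerateArray())
                foreach (string value in Values(item, segments, depth))
                    yield return value;
            yield break;
        }

        // Whole path consumed: emit the value if it is a scalar.
        if (depth == segments.Length)
        {
            if (element.ValueKind != JsonValueKind.Object &&
                element.ValueKind != JsonValueKind.Null &&
                element.ValueKind != JsonValueKind.Undefined)
                yield return element.ToString();
            yield break;
        }

        // Descend into the next property, if the object has it.
        if (element.ValueKind == JsonValueKind.Object &&
            element.TryGetProperty(segments[depth], out JsonElement child))
        {
            foreach (string value in Values(child, segments, depth + 1))
                yield return value;
        }
    }
}
```

With this shape, the client would pass something like new[] { "Id", "PhoneData.PhoneType" } together with the document, and only those paths would become indexed fields; paths that do not exist in a given record simply produce no values.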

I have created a library that does this (and much more), which may help you.

You can check it at https://github.com/brutusin/flea-db
