
I am pretty new to Elasticsearch, so please bear with me and let me know if I need to provide any additional information. I have inherited a project and need to implement new search functionality. The document/mapping structure is already in place but can be changed if it cannot facilitate what I am trying to achieve. I am using Elasticsearch version 5.6.16.

A company can offer a number of services. Service offerings are grouped into sets, and each set is composed of three categories:

  • Product(s) (ID 1)
  • Process(es) (ID 3)
  • Material(s) (ID 4)

The document structure looks like this:

[{
  "id": 4485,
  "name": "Company A",
  // ...
  "services": {
    "595": {
      "1": [
        95, 97, 91
      ],
      "3": [
        475, 476, 471
      ],
      "4": [
        644, 645, 683
      ]
    },
    "596": {
      "1": [
        91, 89, 76
      ],
      "3": [
        476, 476, 301
      ],
      "4": [
        644, 647, 555
      ]
    },
    "597": {
      "1": [
        92, 93, 89
      ],
      "3": [
        473, 472, 576
      ],
      "4": [
        641, 645, 454
      ]
    }
  }
}]

In the above example, 595, 596 and 597 are the set IDs, and 1, 3 and 4 are the category IDs (listed above).

The mapping looks like this:

[{
  "id": {
    "type": "long"
  },
  "name": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  },
  "services": {
    "properties": {
      // ...
      "595": {
        "properties": {
          "1": {"type": "long"},
          "3": {"type": "long"},
          "4": {"type": "long"}
        }
      },
      "596": {
        "properties": {
          "1": {"type": "long"},
          "3": {"type": "long"},
          "4": {"type": "long"}
        }
      },
      // ...
    }
  }
}]

When searching for a company that provides Products (ID 1), a search for 91 and 95 should return Company A, because both IDs are within the same set (595). But a search for 95 and 76 should not return Company A: while the company does offer both of these products, they are not in the same set. The same rules apply when searching Processes and Materials, or a combination of these.

I am looking for confirmation that the current document/mapping structure will facilitate this type of search.

  • If so, given 3 arrays of IDs (Products, Processes and Materials), what is the JSON to find all companies that provide these services within the same set?
  • If not, how should the document/mapping be changed to allow this search?

Thank you for your help.

1 Answer

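It is a bad idea to use what is really a value (the set ID) as a field name. Every new set adds new fields to the mapping (one per category), and in Elasticsearch every field gets its own inverted index, so the mapping keeps growing with your data. Elasticsearch even caps the number of fields per index through the index.mapping.total_fields.limit setting (1000 by default). It also makes the search awkward: to check whether several IDs occur in the same set, the query has to name every set field explicitly. I don't think it is reasonable to keep that structure.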
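To illustrate the problem, here is a minimal Python sketch that builds the query you would need against your current mapping. old_structure_query is a hypothetical helper, not part of any client library. It should work, but only if you already know every set ID, and the query grows with each new set:

```python
import json

def old_structure_query(set_ids, category_id, wanted_ids):
    """Hypothetical helper: query the CURRENT mapping for IDs in one set.

    Because every set ID is its own field (services.<set>.<category>),
    the query must enumerate every set explicitly.
    """
    per_set = []
    for set_id in set_ids:
        # Every wanted ID must be present in this particular set's field.
        field = f"services.{set_id}.{category_id}"
        must = [{"term": {field: value}} for value in wanted_ids]
        per_set.append({"bool": {"must": must}})
    # The company matches if at least one set contains all wanted IDs.
    return {"query": {"bool": {"should": per_set, "minimum_should_match": 1}}}

# Products (category 1) 91 and 95, checked against the three known sets.
body = old_structure_query(["595", "596", "597"], "1", [91, 95])
print(json.dumps(body, indent=2))
```

Adding a fourth set means regenerating every query with a fourth should clause, which is exactly the kind of coupling between data and queries you want to avoid.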

Instead, change your data model to something like the one below. I have also included a sample document, a possible query, and what the response looks like.

Note that, for the sake of simplicity, I'm focusing only on the services field from your mapping.

Mapping:

PUT my_services_index
{
  "mappings": {
    "properties": {
      "services":{
        "type": "nested",                   // Note this
        "properties": {
          "service_key":{
            "type": "keyword"               // Note that I've used keyword here. Feel free to use text plus a keyword sub-field if you plan to implement partial + exact search.
          },
          "product_key": {
            "type": "keyword"
          },
          "product_values": {
            "type": "keyword"
          },
          "process_key":{
            "type": "keyword"
          },
          "process_values":{
            "type": "keyword"
          },
          "material_key":{
            "type": "keyword"
          },
          "material_values":{
            "type": "keyword"
          }
        }
      }
    }
  }
}

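Notice that I've used the nested datatype. I'd suggest reading the Elasticsearch documentation on the nested datatype to understand why we need it instead of the plain object type. In short, with a plain object type the values of all the sets are flattened into a single array per field, so the association between values of the same set is lost and a search for products 95 and 76 would still match Company A. A nested field indexes each set as its own hidden document, which keeps the sets apart.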
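To make the difference concrete, here's a small Python model using the example from your question. It is a simplification of how Elasticsearch treats arrays of objects (not real Lucene internals), and the function names are hypothetical:

```python
# Simplified in-memory model (not real Lucene internals) of the question's
# Company A, showing two of its sets and only the product IDs.
sets = [
    {"service_key": "595", "product_values": ["95", "97", "91"]},
    {"service_key": "596", "product_values": ["91", "89", "76"]},
]

def object_match(sets, wanted):
    # Plain "object" type: the arrays of every set are flattened into one
    # list per field, so this only checks that each ID occurs somewhere.
    flattened = {v for s in sets for v in s["product_values"]}
    return all(w in flattened for w in wanted)

def nested_match(sets, wanted):
    # "nested" type: each set is its own hidden document, so every wanted
    # ID has to occur inside the same set.
    return any(all(w in s["product_values"] for w in wanted) for s in sets)

print(object_match(sets, ["95", "76"]))  # True: a false positive
print(nested_match(sets, ["95", "76"]))  # False: no single set has both
print(nested_match(sets, ["95", "91"]))  # True: set 595 has both
```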

Sample Document:

POST my_services_index/_doc/1
{
  "services": [
    {
      "service_key": "595",
      "product_key": "1",
      "product_values": ["95", "97", "91"],
      "process_key": "3",
      "process_values": ["475", "476", "471"],
      "material_key": "4",
      "material_values": ["644", "645", "683"]
    },
    {
      "service_key": "596",
      "product_key": "1",
      "product_values": ["91", "89", "76"],
      "process_key": "3",
      "process_values": ["476", "476", "301"],
      "material_key": "4",
      "material_values": ["644", "647", "555"]
    }
  ]
}

Notice how this structure lets you manage your data even if it ends up having many different combinations of product_key, process_key and material_key.

You can read the above document as a single document in my_services_index that contains two nested documents, one per set.

Sample Query:

POST my_services_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {                                      // Note this
            "path": "services",
            "query": {
              "bool": {
                "must": [
                  {
                    "term": {
                      "services.service_key": "595"
                    }
                  },
                  {
                    "term": {
                      "services.product_key": "1"
                    }
                  },
                  {
                    "term": {
                      "services.product_values": "95"
                    }
                  }
                ]
              }
            },
            "inner_hits": {}                              // Note this
          }
        }
      ]
    }
  }
}

Note that I've used a nested query: all the clauses inside it must match the same nested document, i.e. the same set. This particular query looks for product 95 in set 595; to search across all sets, leave out the service_key clause.

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.828546,
    "hits" : [                              // Note this: the original document is returned
      {
        "_index" : "my_services_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.828546,
        "_source" : {
          "services" : [
            {
              "service_key" : "595",
              "product_key" : "1",
              "product_values" : [
                "95",
                "97",
                "91"
              ],
              "process_key" : "3",
              "process_values" : [
                "475",
                "476",
                "471"
              ],
              "material_key" : "4",
              "material_values" : [
                "644",
                "645",
                "683"
              ]
            },
            {
              "service_key" : "596",
              "product_key" : "1",
              "product_values" : [
                "91",
                "89",
                "76"
              ],
              "process_key" : "3",
              "process_values" : [
                "476",
                "476",
                "301"
              ],
              "material_key" : "4",
              "material_values" : [
                "644",
                "647",
                "555"
              ]
            }
          ]
        },
        "inner_hits" : {                    // Note this: it tells you which nested document (set) was a hit
          "services" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 1.828546,
              "hits" : [
                {
                  "_index" : "my_services_index",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_nested" : {
                    "field" : "services",
                    "offset" : 0
                  },
                  "_score" : 1.828546,
                  "_source" : {
                    "service_key" : "595",
                    "product_key" : "1",
                    "product_values" : [
                      "95",
                      "97",
                      "91"
                    ],
                    "process_key" : "3",
                    "process_values" : [
                      "475",
                      "476",
                      "471"
                    ],
                    "material_key" : "4",
                    "material_values" : [
                      "644",
                      "645",
                      "683"
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}
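For the general case in your question (three arrays of IDs, with the set unknown), the service_key and *_key clauses can be dropped and one term clause added per requested ID. Here's a minimal Python sketch that builds such a request body and reads the matching set IDs back out of inner_hits. The function names are hypothetical, and the field names assume the nested mapping above:

```python
import json

def same_set_query(products=(), processes=(), materials=()):
    """Build a search body matching companies where ONE set contains every ID.

    All term clauses live inside a single nested query, so they must all be
    satisfied by the same nested (set) document.
    """
    must = []
    for field, ids in (("services.product_values", products),
                       ("services.process_values", processes),
                       ("services.material_values", materials)):
        # One term clause per ID gives AND semantics; a single "terms"
        # query would match if ANY of the IDs were present.
        must.extend({"term": {field: str(i)}} for i in ids)
    if not must:
        # An empty bool would match every set, i.e. every company.
        raise ValueError("at least one ID is required")
    return {
        "query": {
            "nested": {
                "path": "services",
                "query": {"bool": {"must": must}},
                "inner_hits": {},  # report which set(s) matched
            }
        }
    }

def matched_sets(response):
    """Map each hit's _id to the service_key of every matching set."""
    result = {}
    for hit in response["hits"]["hits"]:
        # With "inner_hits": {}, inner hits are named after the nested path.
        inner = hit.get("inner_hits", {}).get("services", {})
        result[hit["_id"]] = [h["_source"]["service_key"]
                              for h in inner.get("hits", {}).get("hits", [])]
    return result

print(json.dumps(same_set_query(products=[91, 95]), indent=2))
```

With the sample document, products 91 and 95 should match through set 595 only, while products 95 and 76 should not match at all, because no single set contains both. If you don't need relevance scores, the term clauses could go into a bool filter instead of must.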

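Note that I've used the keyword datatype for all the fields. Please feel free to pick whichever datatypes your business requirements call for.

The idea I've provided is to help you understand the document model. Also be aware that the requests above use Elasticsearch 7.x syntax. On 5.6.16 the mapping has to be declared under a mapping type name (e.g. "mappings": { "doc": { "properties": { ... } } }), documents are indexed under that type (e.g. POST my_services_index/doc/1), and hits.total in the response is a plain number rather than an object.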
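To move your existing data to this model, the services object of each current document could be converted with something like the following minimal Python sketch. The function name and the category-name table are assumptions based on the category IDs in your question; loading the old documents and re-indexing the result (for example with the Bulk API) is left out:

```python
import json

# Category IDs from the question: 1 = Product, 3 = Process, 4 = Material.
CATEGORY_NAMES = {"1": "product", "3": "process", "4": "material"}

def to_nested(services):
    """Convert {set_id: {category_id: [ids]}} into a list of nested set objects."""
    result = []
    for set_id, categories in services.items():
        entry = {"service_key": str(set_id)}
        for cat_id, ids in categories.items():
            # Raises KeyError for a category ID not listed in CATEGORY_NAMES.
            name = CATEGORY_NAMES[str(cat_id)]
            entry[f"{name}_key"] = str(cat_id)
            # Stored as strings because the sample mapping uses keyword fields.
            entry[f"{name}_values"] = [str(i) for i in ids]
        result.append(entry)
    return result

old_services = {
    "595": {"1": [95, 97, 91], "3": [475, 476, 471], "4": [644, 645, 683]},
    "596": {"1": [91, 89, 76], "3": [476, 476, 301], "4": [644, 647, 555]},
}
print(json.dumps({"services": to_nested(old_services)}, indent=2))
```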

Hope this helps!


2 Comments

Thank you very much for taking the time to help. I suspected the structure was part of the issue. Being new to Elasticsearch and not knowing all the lingo made it difficult to find what I was looking for.
Make sure you note down your possible use cases and the queries they require; based on that, you can adapt the above model. I'm glad I could help!
