Assume I have the following two documents in my Elasticsearch index:

{
    "name": "bob",
    "likes": ["computer", "cat", "water"]
},
{
    "name": "alice",
    "likes": ["gaming", "gambling"]
}

I would now like to query for documents that like computer, laptop or cat. This should match bob; note that it has to be an exact string match.

As a result I need the matched values as well as the number of matches, so I would like to get the following back (since it found computer and cat, but not laptop or water):

{
    "name": "bob",
    "likes": ["computer", "cat"],
    "likes_count": 2
}

Is there a way to achieve this with a single Elasticsearch query? (Note that I'm still stuck with ES 2.4, but will hopefully be able to upgrade soon.)

Ideally I would also like to sort the output by likes_count.

Thank you!

1 Answer

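The best way would be to map likes as a nested data type. (The requests below use 7.x syntax. On 2.4 there is no keyword or text type, so the fields would be of type string with "index": "not_analyzed" for exact matching, and the mapping would sit under a type name.)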

Mapping

PUT index71
{
  "mappings": {
    "properties": {
      "name":{
        "type": "text"
      },
      "likes":{
        "type": "nested", 
        "properties": {
          "name":{
            "type":"keyword"
          }
        }
      }
    }
  }
}

Query:

GET index71/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "likes",
            "query": {
              "bool": {
                "must": [
                  {
                    "terms": {
                      "likes.name": [
                        "computer",
                        "cat",
                        "laptop"
                      ]
                    }
                  }
                ]
              }
            },
            "inner_hits": {}  // returns the matched elements of the nested type
          }
        }
      ]
    }
  },
  "aggs": {
    "likes": {
      "nested": {
        "path": "likes"
      },
      "aggs": {
        "matcheLikes": {
          "filter": {
            "bool": {
              "must": [
                  {
                    "terms": {
                      "likes.name": [
                        "computer",
                        "cat",
                        "laptop"
                      ]
                    }
                  }
                ]
            }
          },
          "aggs": {
            "likeCount": {
              "value_count": {
                "field": "likes.name"
              }
            }
          }
        }
      }
    }
  }
}

Result:

   "hits" : [
      {
        "_index" : "index71",
        "_type" : "_doc",
        "_id" : "u9qTo3ABH6obcmRRRhSA",
        "_score" : 1.0,
        "_source" : {
          "name" : "bob",
          "likes" : [
            {
              "name" : "computer"
            },
            {
              "name" : "cat"
            },
            {
              "name" : "water"
            }
          ]
        },
        "inner_hits" : {
          "likes" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : 1.0,
              "hits" : [
                {
                  "_index" : "index71",
                  "_type" : "_doc",
                  "_id" : "u9qTo3ABH6obcmRRRhSA",
                  "_nested" : {
                    "field" : "likes",
                    "offset" : 0
                  },
                  "_score" : 1.0,
                  "_source" : {
                    "name" : "computer"
                  }
                },
                {
                  "_index" : "index71",
                  "_type" : "_doc",
                  "_id" : "u9qTo3ABH6obcmRRRhSA",
                  "_nested" : {
                    "field" : "likes",
                    "offset" : 1
                  },
                  "_score" : 1.0,
                  "_source" : {
                    "name" : "cat"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  },
  "aggregations" : {
    "likes" : {
      "doc_count" : 3,
      "matcheLikes" : {
        "doc_count" : 2,
        "likeCount" : {
          "value" : 2
        }
      }
    }
  }
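
To get from this response to the shape asked for in the question (name, matched likes, likes_count), the inner hits can be flattened on the client. Below is a minimal Python sketch, assuming the response has already been parsed into a dict with the structure shown above. The function name summarize_hits is made up for illustration, and the sorting by count happens client-side rather than in Elasticsearch.

```python
def summarize_hits(response):
    """Reduce a nested-query response to name, matched likes and their count."""
    rows = []
    for hit in response["hits"]["hits"]:
        inner = hit["inner_hits"]["likes"]["hits"]["hits"]
        # Restore the original array order using the nested offset.
        inner = sorted(inner, key=lambda h: h["_nested"]["offset"])
        matched = [h["_source"]["name"] for h in inner]
        rows.append({
            "name": hit["_source"]["name"],
            "likes": matched,
            "likes_count": len(matched),
        })
    # Documents with the most matches come first.
    rows.sort(key=lambda r: r["likes_count"], reverse=True)
    return rows
```

Keep in mind that inner_hits returns only the top three nested hits by default, so its size has to be raised if more than three terms can match in one document.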

If likes cannot be changed to a nested type, scripts have to be used instead, which will impact performance.

Mapping

{
  "index72" : {
    "mappings" : {
      "properties" : {
        "likes" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

Query:

{
  "script_fields": {  // iterates through likes and collects the matched values
    "matchedElements": {
      "script": "def matchedLikes=[];def list_to_check = ['computer', 'laptop', 'cat']; for(int i=0;i<doc['likes.keyword'].length;i++){ if(list_to_check.contains(doc['likes.keyword'][i])) {matchedLikes.add(doc['likes.keyword'][i])}} return matchedLikes;"
    }
  },
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "likes.keyword": [
                  "computer",
                  "laptop",
                  "cat"
                ]
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "Name": {
      "terms": {
        "field": "name.keyword",
        "size": 10
      },
      "aggs": {
        "Count": {
          "scripted_metric": {  // counts the matched values
            "init_script": "state.matchedLikes=[]",
            "map_script": "def list_to_check = ['computer', 'laptop', 'cat']; for(int i=0;i<doc['likes.keyword'].length;i++){ if(list_to_check.contains(doc['likes.keyword'][i])) {state.matchedLikes.add(doc['likes.keyword'][i]);}}",
            "combine_script": "int count = 0; for (int i=0;i<state.matchedLikes.length;i++) { count += 1 } return count;",
            "reduce_script": "int count = 0; for (a in states) { count += a } return count"
          }
        }
      }
    }
  }
}

Result:

  "hits" : [
      {
        "_index" : "index72",
        "_type" : "_doc",
        "_id" : "wtqso3ABH6obcmRR0hSV",
        "_score" : 0.0,
        "fields" : {
          "matchedElements" : [
            "cat",
            "computer"
          ]
        }
      }
    ]
  },
  "aggregations" : {
    "Name" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "bob",
          "doc_count" : 1,
          "Count" : {
            "value" : 2
          }
        }
      ]
    }
  }
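
The script above hard-codes the search terms, so the script source changes with every search. A possible variation is to pass the terms as script params and build the request body programmatically. The Python sketch below only constructs the body; it has not been run against a cluster. The helper name build_script_fields_request and the parameter name wanted are assumptions for illustration, and the "source" key assumes a version that uses Painless with that key (as in 7.x).

```python
# Painless source that reads the search terms from params instead of a literal list.
PAINLESS_MATCHED = (
    "def matched = [];"
    " for (def v : doc['likes.keyword']) {"
    " if (params.wanted.contains(v)) { matched.add(v); } }"
    " return matched;"
)


def build_script_fields_request(wanted):
    """Build a search body that returns the matched likes per document."""
    wanted = list(wanted)
    return {
        # Exact matching on the keyword sub-field, without scoring.
        "query": {"bool": {"filter": {"terms": {"likes.keyword": wanted}}}},
        "script_fields": {
            "matchedElements": {
                "script": {
                    "lang": "painless",
                    "source": PAINLESS_MATCHED,
                    "params": {"wanted": wanted},
                }
            }
        },
    }
```

Calling build_script_fields_request(["computer", "laptop", "cat"]) yields a body that can be sent to the _search endpoint of index72 in place of the query part and script_fields part shown above.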

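EDIT 1: To score documents with more matches higher, replace the terms query with a bool should clause of term queries, so that each matched term is scored individually. Note that the nested query combines the scores of the matching nested objects with score_mode avg by default, so the parent score in the result below is the average of its inner-hit scores rather than their sum.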

GET index71/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "likes",
            "query": {
              "bool": {
                "should": [
                  {
                    "term": {
                      "likes.name": "computer"
                    }
                  },
                  {
                    "term": {
                      "likes.name": "cat"
                    }
                  },
                  {
                    "term": {
                      "likes.name": "laptop"
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        }
      ]
    }
  },
  "aggs": {
    "likes": {
      "nested": {
        "path": "likes"
      },
      "aggs": {
        "matcheLikes": {
          "filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "likes.name": [
                      "computer",
                      "cat",
                      "laptop"
                    ]
                  }
                }
              ]
            }
          },
          "aggs": {
            "likeCount": {
              "value_count": {
                "field": "likes.name"
              }
            }
          }
        }
      }
    }
  }
}

Result:

  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.5363467,
    "hits" : [
      {
        "_index" : "index71",
        "_type" : "_doc",
        "_id" : "u9qTo3ABH6obcmRRRhSA",
        "_score" : 1.5363467,
        "_source" : {
          "name" : "bob",
          "likes" : [
            {
              "name" : "computer"
            },
            {
              "name" : "cat"
            },
            {
              "name" : "water"
            }
          ]
        },
        "inner_hits" : {
          "likes" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : 1.7917595,
              "hits" : [
                {
                  "_index" : "index71",
                  "_type" : "_doc",
                  "_id" : "u9qTo3ABH6obcmRRRhSA",
                  "_nested" : {
                    "field" : "likes",
                    "offset" : 1
                  },
                  "_score" : 1.7917595,
                  "_source" : {
                    "name" : "cat"
                  }
                },
                {
                  "_index" : "index71",
                  "_type" : "_doc",
                  "_id" : "u9qTo3ABH6obcmRRRhSA",
                  "_nested" : {
                    "field" : "likes",
                    "offset" : 0
                  },
                  "_score" : 1.2809337,
                  "_source" : {
                    "name" : "computer"
                  }
                }
              ]
            }
          }
        }
      },
      {
        "_index" : "index71",
        "_type" : "_doc",
        "_id" : "pr-lqHABcSMy6UhGAWtW",
        "_score" : 1.2809337,
        "_source" : {
          "name" : "bob",
          "likes" : [
            {
              "name" : "computer"
            },
            {
              "name" : "gaming"
            },
            {
              "name" : "gambling"
            }
          ]
        },
        "inner_hits" : {
          "likes" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 1.2809337,
              "hits" : [
                {
                  "_index" : "index71",
                  "_type" : "_doc",
                  "_id" : "pr-lqHABcSMy6UhGAWtW",
                  "_nested" : {
                    "field" : "likes",
                    "offset" : 0
                  },
                  "_score" : 1.2809337,
                  "_source" : {
                    "name" : "computer"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  },
  "aggregations" : {
    "likes" : {
      "doc_count" : 6,
      "matcheLikes" : {
        "doc_count" : 3,
        "likeCount" : {
          "value" : 3
        }
      }
    }
  }
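
If the ranking should follow the number of matches exactly, one option is to give every matching nested object a constant score of 1 and let the nested query add those scores up with score_mode sum. The parent score is then the number of matching likes, and the default sort by _score puts documents with more matches first. The Python sketch below only builds such a request body. It has not been run against a cluster, and the helper name build_match_count_query is made up for illustration.

```python
def build_match_count_query(wanted):
    """Nested query whose parent score equals the number of matching likes."""
    return {
        "query": {
            "nested": {
                "path": "likes",
                # Add up the scores of all matching nested objects.
                "score_mode": "sum",
                "query": {
                    # Every matching nested object scores exactly 1.0.
                    "constant_score": {
                        "filter": {"terms": {"likes.name": list(wanted)}},
                        "boost": 1.0,
                    }
                },
                "inner_hits": {},
            }
        }
    }
```

With this body the two documents from the result above would be expected to score 2.0 and 1.0. Duplicate entries in likes would each count as a match.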

2 Comments

Glad I could be of help.
I just tried to understand what your aggregation does in the first case with the nested mapping. How is this sorting the returned values by the number of matches it found? (so objects that match 4 search keywords should get scored higher than objects that only match 2 search keywords)
