Return matched values and the count of values that matched in elasticsearch query

Question

Assume I have the following two documents in my Elasticsearch index:

{
    "name": "bob",
    "likes": ["computer", "cat", "water"]
},
{
    "name": "alice",
    "likes": ["gaming", "gambling"]
}

I would now like to query for documents that like computer, laptop, or cat. This matches bob; note that it should be an exact string match.

As a result I need the matched values as well as the count of matches, so I would like to get the following back (since it found computer and cat, but not laptop or water):

{
    "name": "bob",
    "likes": ["computer", "cat"],
    "likes_count": 2
}

Is there a way to achieve this with a single Elasticsearch query? (Note that I'm still stuck with ES 2.4, but will hopefully be able to upgrade soon.)

Ideally I would also like to sort the output by likes_count.

Thank you!

jaspreet chahal · Accepted Answer · 2020-03-05 03:19:29Z

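The best way would be to map likes as a nested data type, so that each like is indexed as its own nested object and can be matched and returned individually. The requests below use 7.x syntax; on ES 2.4 the mapping needs a type name, and the keyword type would be a string field with "index": "not_analyzed".

Mapping: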

PUT index71
{
  "mappings": {
    "properties": {
      "name":{
        "type": "text"
      },
      "likes":{
        "type": "nested", 
        "properties": {
          "name":{
            "type":"keyword"
          }
        }
      }
    }
  }
}
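
Note that this mapping changes the document shape: each like becomes an object with a name field, as the _source in the result further down shows. A minimal Python sketch of that conversion, assuming the original documents look like those in the question (to_nested_doc is a hypothetical helper name, not part of any Elasticsearch client):

```python
def to_nested_doc(doc):
    """Turn {"likes": ["cat", ...]} into {"likes": [{"name": "cat"}, ...]}."""
    converted = dict(doc)  # shallow copy so the caller's dict is left untouched
    converted["likes"] = [{"name": like} for like in doc.get("likes", [])]
    return converted


bob = {"name": "bob", "likes": ["computer", "cat", "water"]}
print(to_nested_doc(bob))
```

The converted documents would then be reindexed into index71 with whatever client or bulk request you already use.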

Query (inner_hits returns only the nested elements that matched):

GET index71/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "likes",
            "query": {
              "bool": {
                "must": [
                  {
                    "terms": {
                      "likes.name": [
                        "computer",
                        "cat",
                        "laptop"
                      ]
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        }
      ]
    }
  },
  "aggs": {
    "likes": {
      "nested": {
        "path": "likes"
      },
      "aggs": {
        "matcheLikes": {
          "filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "likes.name": [
                      "computer",
                      "cat",
                      "laptop"
                    ]
                  }
                }
              ]
            }
          },
          "aggs": {
            "likeCount": {
              "value_count": {
                "field": "likes.name"
              }
            }
          }
        }
      }
    }
  }
}
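
The terms list appears twice in the request above (once in the query, once in the filter aggregation), so it is easy for the two copies to drift apart. A small Python sketch that builds the same body from one list; build_nested_query is a hypothetical helper, and it drops the single-clause bool wrappers, which I assume is equivalent here:

```python
import json


def build_nested_query(terms, path="likes", field="likes.name"):
    """Build the nested search body for an arbitrary list of exact terms."""
    terms_clause = {"terms": {field: list(terms)}}
    return {
        "query": {
            "nested": {
                "path": path,
                "query": terms_clause,
                "inner_hits": {},  # return only the matching nested objects
            }
        },
        "aggs": {
            "likes": {
                "nested": {"path": path},
                "aggs": {
                    "matcheLikes": {  # name kept as in the answer
                        "filter": terms_clause,
                        "aggs": {
                            "likeCount": {"value_count": {"field": field}}
                        },
                    }
                },
            }
        },
    }


body = build_nested_query(["computer", "cat", "laptop"])
print(json.dumps(body, indent=2))
```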

Result:

   "hits" : [
      {
        "_index" : "index71",
        "_type" : "_doc",
        "_id" : "u9qTo3ABH6obcmRRRhSA",
        "_score" : 1.0,
        "_source" : {
          "name" : "bob",
          "likes" : [
            {
              "name" : "computer"
            },
            {
              "name" : "cat"
            },
            {
              "name" : "water"
            }
          ]
        },
        "inner_hits" : {
          "likes" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : 1.0,
              "hits" : [
                {
                  "_index" : "index71",
                  "_type" : "_doc",
                  "_id" : "u9qTo3ABH6obcmRRRhSA",
                  "_nested" : {
                    "field" : "likes",
                    "offset" : 0
                  },
                  "_score" : 1.0,
                  "_source" : {
                    "name" : "computer"
                  }
                },
                {
                  "_index" : "index71",
                  "_type" : "_doc",
                  "_id" : "u9qTo3ABH6obcmRRRhSA",
                  "_nested" : {
                    "field" : "likes",
                    "offset" : 1
                  },
                  "_score" : 1.0,
                  "_source" : {
                    "name" : "cat"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  },
  "aggregations" : {
    "likes" : {
      "doc_count" : 3,
      "matcheLikes" : {
        "doc_count" : 2,
        "likeCount" : {
          "value" : 2
        }
      }
    }
  }
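
Note that the aggregation counts matches across all hits, while the per-document matches and their count are in each hit's inner_hits. A hedged Python sketch that reshapes one hit into the form asked for in the question; summarize_hit is a hypothetical helper, and the sample hit is trimmed from the result above:

```python
def summarize_hit(hit, inner_name="likes"):
    """Reduce a search hit to its name, matched likes, and match count."""
    inner = hit["inner_hits"][inner_name]["hits"]
    matched = [h["_source"]["name"] for h in inner["hits"]]
    total = inner["total"]
    # 7.x reports total as {"value": n, ...}; older versions use a plain int.
    count = total["value"] if isinstance(total, dict) else total
    return {"name": hit["_source"]["name"], "likes": matched, "likes_count": count}


sample_hit = {
    "_source": {"name": "bob"},
    "inner_hits": {
        "likes": {
            "hits": {
                "total": {"value": 2, "relation": "eq"},
                "hits": [
                    {"_source": {"name": "computer"}},
                    {"_source": {"name": "cat"}},
                ],
            }
        }
    },
}
print(summarize_hit(sample_hit))
```

Keep in mind that inner_hits returns only 3 nested hits by default, so with more than three search terms you would probably want to set "size" inside inner_hits; the total still reflects all matches. Sorting the summaries by likes_count on the client only orders the page of hits you fetched, not the whole index.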

If likes cannot be changed to a nested type, scripts have to be used instead, which will hurt performance because they run per document at query time.

Mapping:

{
  "index72" : {
    "mappings" : {
      "properties" : {
        "likes" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

Query (script_fields iterates over likes and returns the matched values; the scripted_metric aggregation counts them):

{
  "script_fields": {
    "matchedElements": {
      "script": "def matchedLikes = []; def list_to_check = ['computer', 'laptop', 'cat']; for (int i = 0; i < doc['likes.keyword'].length; i++) { if (list_to_check.contains(doc['likes.keyword'][i])) { matchedLikes.add(doc['likes.keyword'][i]); } } return matchedLikes;"
    }
  },
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "terms": {
                "likes.keyword": [
                  "computer",
                  "laptop",
                  "cat"
                ]
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "Name": {
      "terms": {
        "field": "name.keyword",
        "size": 10
      },
      "aggs": {
        "Count": {
          "scripted_metric": {
            "init_script": "state.matchedLikes = []",
            "map_script": "def list_to_check = ['computer', 'laptop', 'cat']; for (int i = 0; i < doc['likes.keyword'].length; i++) { if (list_to_check.contains(doc['likes.keyword'][i])) { state.matchedLikes.add(doc['likes.keyword'][i]); } }",
            "combine_script": "return state.matchedLikes.size();",
            "reduce_script": "int count = 0; for (a in states) { count += a; } return count;"
          }
        }
      }
    }
  }
}

Result:

  "hits" : [
      {
        "_index" : "index72",
        "_type" : "_doc",
        "_id" : "wtqso3ABH6obcmRR0hSV",
        "_score" : 0.0,
        "fields" : {
          "matchedElements" : [
            "cat",
            "computer"
          ]
        }
      }
    ]
  },
  "aggregations" : {
    "Name" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "bob",
          "doc_count" : 1,
          "Count" : {
            "value" : 2
          }
        }
      ]
    }
  }
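
With the script-based variant, the matched values arrive under fields rather than in inner_hits, and the hit above carries no _source because script_fields was requested without it. A hedged Python sketch for reshaping such a hit; summarize_script_hit is a hypothetical helper, and the name is only filled in if you also request _source in the search body:

```python
def summarize_script_hit(hit, field="matchedElements"):
    """Reduce a script_fields hit to its matched likes and their count."""
    matched = hit.get("fields", {}).get(field, [])
    return {
        # None unless the request also asked for _source (an assumption here)
        "name": hit.get("_source", {}).get("name"),
        "likes": matched,
        "likes_count": len(matched),
    }


sample_hit = {"_id": "wtqso3ABH6obcmRR0hSV", "fields": {"matchedElements": ["cat", "computer"]}}
print(summarize_script_hit(sample_hit))
```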

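EDIT 1: To give a higher score to documents with more matches, replace the terms query with a should clause that holds one term query per value. Each matching nested object then contributes a relevance score to its parent. Note that the nested query combines those scores according to score_mode, which defaults to avg: in the result below, the parent score 1.5363467 is the average of the inner-hit scores 1.7917595 and 1.2809337. Set "score_mode": "sum" on the nested query if the score should grow with the number of matches.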

GET index71/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "likes",
            "query": {
              "bool": {
                "should": [
                  {
                    "term": {
                      "likes.name": "computer"
                    }
                  },
                  {
                    "term": {
                      "likes.name": "cat"
                    }
                  },
                  {
                    "term": {
                      "likes.name": "laptop"
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        }
      ]
    }
  },
  "aggs": {
    "likes": {
      "nested": {
        "path": "likes"
      },
      "aggs": {
        "matcheLikes": {
          "filter": {
            "bool": {
              "must": [
                {
                  "terms": {
                    "likes.name": [
                      "computer",
                      "cat",
                      "laptop"
                    ]
                  }
                }
              ]
            }
          },
          "aggs": {
            "likeCount": {
              "value_count": {
                "field": "likes.name"
              }
            }
          }
        }
      }
    }
  }
}

Result:

  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.5363467,
    "hits" : [
      {
        "_index" : "index71",
        "_type" : "_doc",
        "_id" : "u9qTo3ABH6obcmRRRhSA",
        "_score" : 1.5363467,
        "_source" : {
          "name" : "bob",
          "likes" : [
            {
              "name" : "computer"
            },
            {
              "name" : "cat"
            },
            {
              "name" : "water"
            }
          ]
        },
        "inner_hits" : {
          "likes" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : 1.7917595,
              "hits" : [
                {
                  "_index" : "index71",
                  "_type" : "_doc",
                  "_id" : "u9qTo3ABH6obcmRRRhSA",
                  "_nested" : {
                    "field" : "likes",
                    "offset" : 1
                  },
                  "_score" : 1.7917595,
                  "_source" : {
                    "name" : "cat"
                  }
                },
                {
                  "_index" : "index71",
                  "_type" : "_doc",
                  "_id" : "u9qTo3ABH6obcmRRRhSA",
                  "_nested" : {
                    "field" : "likes",
                    "offset" : 0
                  },
                  "_score" : 1.2809337,
                  "_source" : {
                    "name" : "computer"
                  }
                }
              ]
            }
          }
        }
      },
      {
        "_index" : "index71",
        "_type" : "_doc",
        "_id" : "pr-lqHABcSMy6UhGAWtW",
        "_score" : 1.2809337,
        "_source" : {
          "name" : "bob",
          "likes" : [
            {
              "name" : "computer"
            },
            {
              "name" : "gaming"
            },
            {
              "name" : "gambling"
            }
          ]
        },
        "inner_hits" : {
          "likes" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : 1.2809337,
              "hits" : [
                {
                  "_index" : "index71",
                  "_type" : "_doc",
                  "_id" : "pr-lqHABcSMy6UhGAWtW",
                  "_nested" : {
                    "field" : "likes",
                    "offset" : 0
                  },
                  "_score" : 1.2809337,
                  "_source" : {
                    "name" : "computer"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  },
  "aggregations" : {
    "likes" : {
      "doc_count" : 6,
      "matcheLikes" : {
        "doc_count" : 3,
        "likeCount" : {
          "value" : 3
        }
      }
    }
  }
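
A sketch of a variant whose score is meant to track the number of matched likes rather than term relevance: wrap each term in constant_score and set score_mode to sum on the nested query, so that every matching nested object adds a fixed 1.0. This is an assumption on my part rather than something tested against the index above, and on 2.4 the absolute scores may be scaled differently; build_count_scored_query is a hypothetical helper name.

```python
import json


def build_count_scored_query(terms, path="likes", field="likes.name"):
    """Nested query in which each matching nested object should score 1.0."""
    should = [
        {"constant_score": {"filter": {"term": {field: term}}, "boost": 1.0}}
        for term in terms
    ]
    return {
        "query": {
            "nested": {
                "path": path,
                "score_mode": "sum",  # add up child scores instead of averaging
                "query": {"bool": {"should": should}},
                "inner_hits": {},
            }
        }
    }


body = build_count_scored_query(["computer", "cat", "laptop"])
print(json.dumps(body, indent=2))
```

Since hits are sorted by _score in descending order by default, documents with more matching likes should then come first without an explicit sort.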

I just tried to understand what your aggregation does in the first case with the nested mapping. How is this sorting the returned values by the number of matches it found? (so objects that match 4 search keywords should get scored higher than objects that only match 2 search keywords)
