My query with multiple multi_match clauses looks like this:

"query": {
  "bool": {
    "should": [
      {
        "multi_match": {
          "query": "test",
          "fields": ["field1^15", "field2^8"],
          "tie_breaker": 0.2,
          "minimum_should_match": "50%"
        }
      },
      {
        "multi_match": {
          "query": "test2",
          "fields": ["field1^15", "field2^8"],
          "tie_breaker": 0.2,
          "minimum_should_match": "50%"
        }
      }
    ]
  }
}

I want to get all distinct field1 values that match the query. How can I achieve that?

EDIT: Mapping:

"field1": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  },
  "analyzer": "nGram_analyzer"
}

This is what I tried so far (I still get multiple identical field1 values):

"query": {
  "bool": {
    "should": [
      {
        "multi_match": {
          "query": "test",
          "fields": ["field1^15", "field2^8"],
          "tie_breaker": 0.2,
          "minimum_should_match": "50%"
        }
      },
      {
        "multi_match": {
          "query": "test2",
          "fields": ["field1^15", "field2^8"],
          "tie_breaker": 0.2,
          "minimum_should_match": "50%"
        }
      }
    ]
  }
},
"aggs": {
  "field1": {
    "terms": {
      "field": "field1.keyword",
      "size": 100
    }
  }
}

UPDATE:

The query

GET /test/test/_search
{
  "_source": ["field1"],
  "size": 10000,
  "query": {
    "multi_match": {
      "query": "test",
      "fields": ["field1^15", "field2^8"],
      "tie_breaker": 0.2,
      "minimum_should_match": "50%"
    }
  },
  "aggs": {
    "field1": {
      "terms": {
        "field": "field1.keyword",
        "size": 1
      }
    }
  }
}

results in

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 35,
    "max_score": 110.26815,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "AVzz99c4X4ZbfhscNES7",
        "_score": 110.26815,
        "_source": {
          "field1": "test-hier"
        }
      },
      {
        "_index": "test",
        "_type": "test",
        "_id": "AVzz8JWGX4ZbfhscMwe_",
        "_score": 107.45808,
        "_source": {
          "field1": "test-hier"
        }
      },
      {
        "_index": "test",
        "_type": "test",
        "_id": "AVzz8JWGX4ZbfhscMwe_",
        "_score": 107.45808,
        "_source": {
          "field1": "test-da"
        }
      },
      ...

So there should actually be only one "test-hier" in the results.

  • You can simply add a terms aggregation on the field1 field. Can you show your mapping? Commented Jul 24, 2018 at 12:22
  • I edited my post with the mapping. It's the same for the other fields. Commented Jul 24, 2018 at 12:46

1 Answer

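You can add a terms aggregation on the field1.keyword field to get all distinct values (adjust size to a value that better matches the cardinality of your field). The top_hits sub-aggregation with size 1 returns a single document per distinct value. The distinct values appear in the aggregations section of the response, not in the hits section: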

{
  "size": 0,
  "query": {...},
  "aggs": {
    "field1": {
      "terms": {
        "field": "field1.keyword",
        "size": 100
      },
      "aggs": {
        "single_hit": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
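
To illustrate reading the result, here is a minimal Python sketch that builds the request body above for the two multi_match clauses from the question and pulls the distinct values out of the aggregations section of a response. The helper names `build_search_body` and `distinct_field1_values` are hypothetical, and the sketch assumes the response has already been parsed into a dict (for example by an HTTP or Elasticsearch client); it does not send the request itself.

```python
def build_search_body(queries, fields, agg_size=100):
    """Build a bool/should of multi_match clauses plus a terms aggregation.

    Hypothetical helper: it mirrors the request body shown in the answer.
    """
    should = [
        {
            "multi_match": {
                "query": q,
                "fields": fields,
                "tie_breaker": 0.2,
                "minimum_should_match": "50%",
            }
        }
        for q in queries
    ]
    return {
        "size": 0,  # no regular hits; only the aggregation is of interest
        "query": {"bool": {"should": should}},
        "aggs": {
            "field1": {
                "terms": {"field": "field1.keyword", "size": agg_size},
                "aggs": {"single_hit": {"top_hits": {"size": 1}}},
            }
        },
    }


def distinct_field1_values(response):
    """Return the bucket keys of the 'field1' terms aggregation.

    Each bucket key is one distinct field1.keyword value; an absent
    aggregation yields an empty list.
    """
    buckets = response.get("aggregations", {}).get("field1", {}).get("buckets", [])
    return [bucket["key"] for bucket in buckets]
```

Called as `build_search_body(["test", "test2"], ["field1^15", "field2^8"])`, this produces the body to send to the `_search` endpoint. Note that the terms aggregation returns at most `agg_size` buckets, so a field with more distinct matching values than that would be truncated.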

5 Comments

That's what I tried, but unfortunately I get multiple results with the same field1 value.
E.g.: {field1: "foo", field2: "bar"} and {field1: "foo", field2: "barbar"}. Of course, these are different field combinations, but I'm only interested in field1. Could Elasticsearch simply pick the first result and drop the others?
Sure! Sorry for the delay and thanks for your help. I updated the post.
Any idea? Or anyone else? I still couldn't find a solution for this.
I've updated my answer by adding a top_hits sub-aggregation with size 1. That's probably what you want. You need to look in the aggregations section, not in the hits section.
