I would like a query that returns the number of times a field value is repeated, grouped by the unique value of another field. I have this JSON:

        {
          "name" : "james",
          "city" : "chicago" <----------- same
        },
        {
          "name" : "james",
          "city" : "san francisco"
        },
        {
          "name" : "james",
          "city" : "chicago" <----------- same
        },
        {
          "name" : "Mike",
          "city" : "chicago"
        },
        {
          "name" : "Mike",
          "city" : "texas" <----------- same
        },
        {
          "name" : "Mike",
          "city" : "texas" <----------- same
        },
        {
          "name" : "Peter",
          "city" : "chicago"
        },

I want to make a query that counts based on the unique combination of two fields. For example, james equals 2, because there are two identical documents (name: james, city: chicago) and one different document (name: james, city: san francisco). The output would then be the following:

  {
    "key" : "james",
    "doc_count" : 2
  },
  {
    "key" : "Mike",
    "doc_count" : 2
  },
  {
    "key" : "Peter",
    "doc_count" : 1
  },

Is it possible to count the unique value combinations of two fields?
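To pin down the expected numbers, here is a minimal Python sketch that computes them client-side from the sample documents above. It assumes the intended count is the number of distinct city values per name, which matches the output listed (james: 2, Mike: 2, Peter: 1); the function name `distinct_cities_per_name` is only illustrative.

```python
from collections import defaultdict

# Sample documents copied from the question.
docs = [
    {"name": "james", "city": "chicago"},
    {"name": "james", "city": "san francisco"},
    {"name": "james", "city": "chicago"},
    {"name": "Mike", "city": "chicago"},
    {"name": "Mike", "city": "texas"},
    {"name": "Mike", "city": "texas"},
    {"name": "Peter", "city": "chicago"},
]


def distinct_cities_per_name(documents):
    """Return {name: number of distinct city values seen for that name}."""
    cities = defaultdict(set)
    for doc in documents:
        cities[doc["name"]].add(doc["city"])
    return {name: len(values) for name, values in cities.items()}


print(distinct_cities_per_name(docs))
# {'james': 2, 'Mike': 2, 'Peter': 1}
```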

2 Answers

You can do a two-level terms aggregation:

{
  "size": 0,
  "aggs": {
    "names": {
      "terms": {
        "field": "name.keyword",
        "size": 10
      },
      "aggs": {
        "citys_by_name": {
          "terms": {
            "field": "city.keyword",
            "size": 10,
            "min_doc_count": 2
          }
        }
      }
    }
  }
}

The response will look like this:

"aggregations" : {
    "names" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "james",
          "doc_count" : 15,
          "citys_by_name" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "chicago",
                "doc_count" : 14
              }
            ]
          }
        },
        {
          "key" : "Peter",
          "doc_count" : 2,
          "citys_by_name" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "chicago",
                "doc_count" : 2
              }
            ]
          }
        },
        {
          "key" : "mike",
          "doc_count" : 2,
          "citys_by_name" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [ ]
          }
        }
      ]
    }
  }
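The nested response does not give a single number per name, so some client-side post-processing may be needed. As a rough sketch, assuming the response has been parsed into a Python dict with the shape shown above, the helper below (a hypothetical name, `repeated_cities_per_name`) counts the city buckets under each name. Because the query sets `min_doc_count` to 2, only cities that occur at least twice for a name are counted.

```python
# Trimmed copy of the response shown above, keeping only the fields used here.
response = {
    "aggregations": {
        "names": {
            "buckets": [
                {"key": "james", "doc_count": 15,
                 "citys_by_name": {"buckets": [{"key": "chicago", "doc_count": 14}]}},
                {"key": "Peter", "doc_count": 2,
                 "citys_by_name": {"buckets": [{"key": "chicago", "doc_count": 2}]}},
                {"key": "mike", "doc_count": 2,
                 "citys_by_name": {"buckets": []}},
            ]
        }
    }
}


def repeated_cities_per_name(resp):
    """Map each name bucket to the number of city sub-buckets it contains."""
    result = {}
    for bucket in resp["aggregations"]["names"]["buckets"]:
        result[bucket["key"]] = len(bucket["citys_by_name"]["buckets"])
    return result


print(repeated_cities_per_name(response))
# {'james': 1, 'Peter': 1, 'mike': 0}
```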

Or you can concatenate fields:

GET test/_search
{
  "size": 0,
  "aggs": {
    "names": {
      "terms": {
        "script": {
          "source": "return doc['name.keyword'].value + ' ' + doc['city.keyword'].value",
          "lang": "painless"
        },
        "field": "name.keyword",
        "size": 10,
        "min_doc_count": 2
      }
    }
  }
}

The response will look like this:

 "aggregations" : {
    "names" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "james chicago",
          "doc_count" : 14
        },
        {
          "key" : "Peter chicago",
          "doc_count" : 2
        }
      ]
    }
  }
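To make the behaviour of the concatenated key concrete, here is a small Python sketch that mimics it on the sample documents from the question rather than on the index behind the response above. It is an approximation of the idea, not of Elasticsearch itself: the key is built as name, a space, then city, and groups below `min_doc_count` are dropped. The function name `concat_terms` is illustrative.

```python
from collections import Counter

# Sample documents copied from the question.
docs = [
    {"name": "james", "city": "chicago"},
    {"name": "james", "city": "san francisco"},
    {"name": "james", "city": "chicago"},
    {"name": "Mike", "city": "chicago"},
    {"name": "Mike", "city": "texas"},
    {"name": "Mike", "city": "texas"},
    {"name": "Peter", "city": "chicago"},
]


def concat_terms(documents, min_doc_count=2):
    """Group documents by 'name city' and keep groups with enough documents."""
    counts = Counter(f"{d['name']} {d['city']}" for d in documents)
    return [
        {"key": key, "doc_count": count}
        for key, count in counts.most_common()
        if count >= min_doc_count
    ]


for bucket in concat_terms(docs):
    print(bucket)
```

On the sample data, only "james chicago" and "Mike texas" reach two documents, so those are the only two buckets returned. Note that a plain space separator can be ambiguous when the values themselves contain spaces; a separator that cannot appear in the data is a safer choice.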

If you want more stats on the buckets, use the stats_bucket aggregation:

{
  "size": 0,
  "aggs": {
    "names": {
      "terms": {
        "script": {
          "source": "return doc['name.keyword'].value + ' ' + doc['city.keyword'].value",
          "lang": "painless"
        },
        "field": "name.keyword",
        "size": 10,
        "min_doc_count": 2
      }
    },
    "names_stats": {
      "stats_bucket": {
        "buckets_path": "names._count"
      }
    }
  }
}

This will result in:

"aggregations" : {
    "names" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "james PARIS",
          "doc_count" : 15
        },
        {
          "key" : "james chicago",
          "doc_count" : 13
        },
        {
          "key" : "samuel PARIS",
          "doc_count" : 11
        },
        {
          "key" : "fred PARIS",
          "doc_count" : 2
        }
      ]
    },
    "names_stats" : {
      "count" : 4,
      "min" : 2.0,
      "max" : 15.0,
      "avg" : 10.25,
      "sum" : 41.0
    }
  }
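The `names_stats` values can be checked by hand from the bucket counts in the response above (15, 13, 11 and 2). The following Python sketch recomputes them; it only illustrates the arithmetic, and the function name `stats_over_buckets` is hypothetical.

```python
def stats_over_buckets(doc_counts):
    """Compute count, min, max, avg and sum over a list of bucket doc_counts."""
    if not doc_counts:
        raise ValueError("at least one bucket is required")
    total = float(sum(doc_counts))
    return {
        "count": len(doc_counts),
        "min": float(min(doc_counts)),
        "max": float(max(doc_counts)),
        "avg": total / len(doc_counts),
        "sum": total,
    }


# doc_count values of the four buckets in the response above.
print(stats_over_buckets([15, 13, 11, 2]))
# {'count': 4, 'min': 2.0, 'max': 15.0, 'avg': 10.25, 'sum': 41.0}
```

The `count` field is the number of buckets that passed `min_doc_count`, which is the "number of keys" figure asked about in the comments.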

3 Comments

Would it be possible to get the number of keys in the buckets? For example, in the last response with key: james chicago and key: Peter chicago, the number of keys in the buckets == 2.
Use stats_bucket. I'll update my answer. But be careful with the performance of this query.
Did my answer help you? If so, please don't forget to upvote.

This was the solution that solved the problem for me:

GET test/_search?filter_path=aggregations.count
{
  "size": 0,
  "aggs": {
    "names": {
      "terms": {
        "script": {
          "source": "return doc['name.keyword'].value + ' ' + doc['city.keyword'].value",
          "lang": "painless"
        },
        "field": "name.keyword",
        "size": 10,
        "min_doc_count": 2
      }
    },
    "count": {
      "cardinality": {
        "script": "return doc['name.keyword'].value + ' ' + doc['city.keyword'].value"
      }
    }
  }
}

Output:

{
  "aggregations" : {
    "count" : {
      "value" : 2
    }
  }
}
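For intuition, the cardinality aggregation here counts the distinct concatenated name and city keys. A minimal Python sketch of the same idea, run on the sample documents from the question (not on the index that produced the output above), could look like this. It uses a tuple instead of a concatenated string, which is an assumption made here to avoid separator collisions, and the function name `distinct_pairs` is illustrative.

```python
# Sample documents copied from the question.
docs = [
    {"name": "james", "city": "chicago"},
    {"name": "james", "city": "san francisco"},
    {"name": "james", "city": "chicago"},
    {"name": "Mike", "city": "chicago"},
    {"name": "Mike", "city": "texas"},
    {"name": "Mike", "city": "texas"},
    {"name": "Peter", "city": "chicago"},
]


def distinct_pairs(documents):
    """Count the distinct (name, city) combinations across all documents."""
    return len({(d["name"], d["city"]) for d in documents})


print(distinct_pairs(docs))
# 5
```

Two caveats apply. The `cardinality` aggregation in the query is independent of the `names` terms aggregation, so `min_doc_count` does not filter what it counts. Its result is also an approximation on high-cardinality data, whereas the set-based sketch above is exact.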
