
I'm running several aggregations to SUM some values on our installation of ES 1.7.2.

I found out the hard way that, in some seemingly random situations, the doc_count of a bucket doesn't match the SUM of the doc_count values of its nested buckets.

"key": 503,
"doc_count": 383778,
"regionid": {...}

So doc_count=383778

If I SUM the doc_count of every element of the regionid list below, I get doc_count=383718, i.e. 60 documents fewer than the parent bucket reports.

 "key": 503,
 "doc_count": 383778,
 "regionid": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
       {
          "key": 1,
          "doc_count": 303821,
          "ProviderId": {...}
       },
       {
          "key": 27,
          "doc_count": 23834,
          "ProviderId": {...}
       },
       {
          "key": 25,
          "doc_count": 9565,
          "ProviderId": {...}
       },
       {
          "key": 36,
          "doc_count": 8857,
          "ProviderId": {...}
       },
       {
          "key": 14,
          "doc_count": 8222,
          "ProviderId": {...}
       },
       {
          "key": 68,
          "doc_count": 6746,
          "ProviderId": {...}
       },
       {
          "key": 19,
          "doc_count": 4574,
          "ProviderId": {...}
       },
       {
          "key": 28,
          "doc_count": 4164,
          "ProviderId": {...}
       },
       {
          "key": 10,
          "doc_count": 3006,
          "ProviderId": {...}
       },
       {
          "key": 31,
          "doc_count": 2020,
          "ProviderId": {...}
       },
       {
          "key": 21,
          "doc_count": 1410,
          "ProviderId": {...}
       },
       {
          "key": 32,
          "doc_count": 1368,
          "ProviderId": {...}
       },
       {
          "key": 22,
          "doc_count": 1367,
          "ProviderId": {...}
       },
       {
          "key": 8,
          "doc_count": 1010,
          "ProviderId": {...}
       },
       {
          "key": 16,
          "doc_count": 825,
          "ProviderId": {...}
       },
       {
          "key": 35,
          "doc_count": 559,
          "ProviderId": {...}
       },
       {
          "key": 34,
          "doc_count": 517,
          "ProviderId": {...}
       },
       {
          "key": 26,
          "doc_count": 414,
          "ProviderId": {...}
       },
       {
          "key": 18,
          "doc_count": 371,
          "ProviderId": {...}
       },
       {
          "key": 15,
          "doc_count": 362,
          "ProviderId": {...}
       },
       {
          "key": 33,
          "doc_count": 185,
          "ProviderId": {...}
       },
       {
          "key": 9,
          "doc_count": 143,
          "ProviderId": {...}
       },
       {
          "key": 29,
          "doc_count": 102,
          "ProviderId": {...}
       },
       {
          "key": 17,
          "doc_count": 100,
          "ProviderId": {...}
       },
       {
          "key": 30,
          "doc_count": 96,
          "ProviderId": {...}
       },
       {
          "key": 20,
          "doc_count": 80,
          "ProviderId": {...}
       }
    ]
 }
},

Does anyone know why this is happening?

Maybe it's a bug?

Part of my aggregation:

 {
    "aggs": {
       "Provider": {
          "terms": {
             "field": "Provider"
          },
          "aggs": {
             "Gateway": {
                "terms": {
                   "field": "Gateway"
                },
                "aggs": {
                   "CustomerId": {
                      "terms": {
                         "field": "CustomerId"
                      },
                      "aggs": {
                         "regionid": {
                            "terms": {
                               "field": "regionid"

Any help is appreciated. Thanks!

2 Comments

  • Is it possible that 60 of your documents don't have a value for the provider field? Commented Feb 26, 2016 at 4:06
  • Actually this was the problem: a "long" field had an empty value (a quick way to verify it is sketched below). Thanks! Commented Feb 26, 2016 at 19:32
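
For anyone hitting the same symptom: documents that have no value for a field simply don't land in any bucket of a terms aggregation on that field, so a parent bucket's doc_count can legitimately exceed the sum of its children. A quick way to check this on ES 1.x is to count the documents missing the field. This is only a sketch: it assumes an index named myindex and uses the regionid field from the question, and the missing filter shown here is the 1.x construct (later versions use must_not + exists instead):

 # "myindex" is a placeholder index name; adjust host/port as needed
 curl -XGET 'localhost:9200/myindex/_count' -d '{
    "query": {
       "constant_score": {
          "filter": {
             "missing": { "field": "regionid" }
          }
       }
    }
 }'

To reproduce the exact difference of 60, you would additionally filter on the keys of the enclosing Provider/Gateway/CustomerId buckets.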

1 Answer


Aggregations in ES are not exact; they are an estimate based on the number of records sampled on each shard. Given a big enough sample size, the counts can be exact, but that has significant performance implications.

You can read more in the ES documentation on the shard_size parameter of the terms aggregation.

The flatter your index (meaning the more buckets the aggregation returns), the more you need to increase the shard size. We found that for a flat index in our system a 20x multiplier was a good rule of thumb: if we return the top 10 terms of an aggregation, we use a shard_size of 200.
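
As a minimal sketch, here is that rule of thumb applied to the Provider aggregation from the question; the size of 10 and shard_size of 200 are just the illustrative top-10 numbers from the paragraph above, not tuned values:

 {
    "aggs": {
       "Provider": {
          "terms": {
             "field": "Provider",
             "size": 10,
             "shard_size": 200
          }
       }
    }
 }

Each shard then returns its top 200 terms instead of its top 10, which makes the merged doc_count values far less likely to be off.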


1 Comment

  • Awesome. I'll take a look.
