
I want to group values by a field (account ID in my case) using a terms aggregation and return only the buckets whose doc_count is less than some value.

I can specify the min_doc_count parameter, but there is no max_doc_count, so I'm looking for a way to simulate this behavior. Here is one of my many attempts, but it doesn't work:

{
  "size": 0,
  "aggs": {
    "by_account": {
      "terms": {
        "field": "accountId"
      },
      "aggs": {
        "by_account_filtered": {
          "bucket_selector": {
            "buckets_path": {
              "totalDocs": "_count"
            },
            "script": "params.totalDocs < 10000"
          }
        }
      }
    }
  }
}

What am I doing wrong?

1 Answer

The bucket_selector aggregation needs to be nested (since it is a parent-type pipeline aggregation) and must be a sibling of the metric aggregation that it uses to filter buckets.

So we use a top-level terms aggregation, then a nested value_count aggregation to expose each bucket's count to the sibling bucket_selector aggregation.

Try this:

{
  "size": 0,
  "aggs": {
    "by_account": {
      "terms": {
        "field": "accountId"
      },
      "aggs": {
        "by_account_number": {
          "value_count" : {
            "field" : "accountId"
          }
        },
        "by_account_filtered": {
          "bucket_selector": {
            "buckets_path": {
              "totalDocs": "by_account_number"
            },
            "script": "params.totalDocs < 10000"
          }
        }
      }
    }
  }
}
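
For what it's worth, here is a minimal Python sketch of the same request body, in case you build it from code. The names `build_max_doc_count_query` and `emulate_selector` are hypothetical helpers, not part of any Elasticsearch client, and the second one only mimics locally what the selector script keeps. Keep in mind that value_count counts field values, so it only matches doc_count if each document has exactly one accountId value.

```python
import json


def build_max_doc_count_query(field, max_docs, size=10):
    """Return a request body mirroring the JSON above (hypothetical helper)."""
    return {
        "size": 0,
        "aggs": {
            "by_account": {
                # 10 is the default size of a terms aggregation.
                "terms": {"field": field, "size": size},
                "aggs": {
                    # Metric aggregation that exposes a per-bucket count.
                    "by_account_number": {"value_count": {"field": field}},
                    # Pipeline aggregation that drops buckets failing the script.
                    "by_account_filtered": {
                        "bucket_selector": {
                            "buckets_path": {"totalDocs": "by_account_number"},
                            "script": f"params.totalDocs < {int(max_docs)}",
                        }
                    },
                },
            }
        },
    }


def emulate_selector(buckets, max_docs):
    """Mimic the selector locally: keep buckets whose doc_count is below max_docs."""
    return [bucket for bucket in buckets if bucket["doc_count"] < max_docs]


if __name__ == "__main__":
    print(json.dumps(build_max_doc_count_query("accountId", 10000), indent=2))
```

The printed JSON can be sent as the body of a search request against your index.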

EDIT: If you want to get the accounts with the lowest doc_count:

{
  "size": 0,
  "aggs": {
    "by_account": {
      "terms": {
        "field": "accountId",
        "order" : { "_count" : "asc" },
        "size": 100
      },
      "aggs": {
        "by_account_number": {
          "value_count" : {
            "field" : "accountId"
          }
        },
        "by_account_filtered": {
          "bucket_selector": {
            "buckets_path": {
              "totalDocs": "by_account_number"
            },
            "script": "params.totalDocs < 10000"
          }
        }
      }
    }
  }
}
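
Reading the result of that request could look roughly like this in Python. The sample response is made up for illustration (the keys and counts are hypothetical) and `small_accounts` is a hypothetical helper. One caveat, as far as I know: the Elasticsearch documentation discourages ordering by ascending _count, because the counts can be less accurate on multi-shard indices.

```python
def small_accounts(response, agg_name="by_account"):
    """Return (account id, doc_count) pairs for the buckets left in the response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Hypothetical response: buckets arrive in ascending doc_count order, and the
# bucket_selector has already removed any bucket with 10000 or more values.
sample_response = {
    "aggregations": {
        "by_account": {
            "buckets": [
                {"key": "acc-7", "doc_count": 3, "by_account_number": {"value": 3}},
                {"key": "acc-2", "doc_count": 41, "by_account_number": {"value": 41}},
            ]
        }
    }
}

if __name__ == "__main__":
    for account_id, count in small_accounts(sample_response):
        print(account_id, count)
```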

5 Comments

If I paste it as is, I get an error Found two aggregation type definitions in [by_account]: [terms] and [by_account_filtered]. If I add nested aggs and move by_account_filtered inside, I get No aggregation found for path [by_account._count].
@ArtemMalinko sorry for the untested answer. This time it should be good, although I didn't have a scripting-enabled environment to finish testing it.
It filters the selected buckets, thank you very much for the help. But it turned out that it's not exactly what I need. I probably didn't formulate my question clearly enough. This code filters the top buckets returned by the terms aggregation, while I want to filter all values stored in Elasticsearch. So I get nothing if all top results are > 10000. I can increase the size of the terms aggregation, but that doesn't scale well. Maybe you also know how to do this kind of aggregation? Like min_doc_count, but the other way around.
It will be hard to get all the values, and besides, Elasticsearch is not a regular database: it approximates bucket counts and other metrics. But you could try sorting by doc_count in ascending order with a fixed aggregation size to get the X lowest doc_counts ( elastic.co/guide/en/elasticsearch/reference/current/… )
This answer helped me solve a hard problem. Thank you.
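
On the follow-up about covering every account instead of only the top buckets: one option not mentioned in this thread is the terms aggregation's partition filter (`include` with `partition` and `num_partitions`), which splits the terms into groups that can be requested one at a time. Below is a rough Python sketch, assuming that option is available in your Elasticsearch version. `partitioned_queries` and its default size are hypothetical, and this has not been tested against a real cluster.

```python
def partitioned_queries(field, max_docs, num_partitions, size=1000):
    """Yield one request body per partition of the field's terms (hypothetical helper)."""
    for partition in range(num_partitions):
        yield {
            "size": 0,
            "aggs": {
                "by_account": {
                    "terms": {
                        "field": field,
                        # Only terms that fall into this partition are considered.
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                        # Assumption: large enough to hold every term of one partition.
                        "size": size,
                    },
                    "aggs": {
                        "by_account_number": {"value_count": {"field": field}},
                        "by_account_filtered": {
                            "bucket_selector": {
                                "buckets_path": {"totalDocs": "by_account_number"},
                                "script": f"params.totalDocs < {int(max_docs)}",
                            }
                        },
                    },
                }
            },
        }
```

Each body would be sent as its own search request and the surviving buckets merged on the client. Picking `num_partitions` so that one partition fits within `size` is left to the caller.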
