Count API: count documents matching field A with distinct values of field B

Question

For instance, given this search result (reduced to 3 hits for brevity):

{
  "hits": {
    "total": {
      "value": 51812937,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "desc-imunizacao",
        "_type": "_doc",
        "_id": "7d0ac34a-1d4f-435a-9e5f-6dc2d77bb251-i0b0",
        "_score": 1.0,
        "_source": {
          "vacina_descricao_dose": "    2ª Dose",
          "estabelecimento_uf": "BA",
          "document_id": "7d0ac34a-1d4f-435a-9e5f-6dc2d77bb251-i0b0"
        }
      },
      {
        "_index": "desc-imunizacao",
        "_type": "_doc",
        "_id": "2dc55c6a-5ac1-4550-8990-5ca611808e8a-i0b0",
        "_score": 1.0,
        "_source": {
          "vacina_descricao_dose": "    1ª Dose",
          "estabelecimento_uf": "SE",
          "document_id": "2dc55c6a-5ac1-4550-8990-5ca611808e8a-i0b0"
        }
      },
      {
        "_index": "desc-imunizacao",
        "_type": "_doc",
        "_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0",
        "_score": 1.0,
        "_source": {
          "vacina_descricao_dose": "    1ª Dose",
          "estabelecimento_uf": "SE",
          "document_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0"
        }
      }
    ]
  }
}

If I wanted to query for "estabelecimento_uf": "SE" and keep only one hit per duplicated "document_id" value, I would issue:

{
  "_source": ["document_id", "estabelecimento_uf", "vacina_descricao_dose"],
  "query": {
    "match": {
      "estabelecimento_uf": {
        "query": "SE"
      }
    }
  },
  "collapse": {
    "field": "document_id",
    "inner_hits": {
      "name": "latest",
      "size": 1
    }
  }
}
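To make the intent of `collapse` concrete, here is a small local Python simulation (not Elasticsearch itself) run against the three hits shown above. It keeps one hit per `document_id` among the hits whose `estabelecimento_uf` is "SE". Two simplifying assumptions: exact string equality stands in for the analyzed `match` query, and the input is assumed to already be in Elasticsearch's sort order, so "first seen" stands in for "top hit of the group". The function name `collapse_hits` is hypothetical.

```python
from typing import Any


def collapse_hits(hits: list[dict[str, Any]], uf: str, field: str) -> list[dict[str, Any]]:
    """Keep the first hit per distinct value of `field` among hits whose
    estabelecimento_uf equals `uf` (a local approximation of collapse)."""
    seen: set[Any] = set()
    collapsed: list[dict[str, Any]] = []
    for hit in hits:
        source = hit["_source"]
        if source["estabelecimento_uf"] != uf:
            continue  # stands in for the match query on estabelecimento_uf
        key = source[field]
        if key in seen:
            continue  # this group already has its representative hit
        seen.add(key)
        collapsed.append(hit)
    return collapsed


# The three hits from the search result above, reduced to the fields used here.
sample_hits = [
    {"_source": {"estabelecimento_uf": "BA", "document_id": "7d0ac34a-1d4f-435a-9e5f-6dc2d77bb251-i0b0"}},
    {"_source": {"estabelecimento_uf": "SE", "document_id": "2dc55c6a-5ac1-4550-8990-5ca611808e8a-i0b0"}},
    {"_source": {"estabelecimento_uf": "SE", "document_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0"}},
]

kept = collapse_hits(sample_hits, uf="SE", field="document_id")
print(len(kept))  # 2
```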

Is there a way to achieve this with Elasticsearch's Count API? That is: count the documents that match a query on field A (estabelecimento_uf), counting each unique value of field B (document_id) only once, given that document_id contains duplicates.

This is a public API: https://imunizacao-es.saude.gov.br/_search

This is the authentication:

User: imunizacao_public Pass: qlto5t&7r_@+#Tlstigi
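For reference, the Count API is the `_count` endpoint: its request body takes only a `query`, and the response reports the number of matching documents in a `count` field, so duplicated `document_id` values are counted as separate documents. A minimal standard-library Python sketch that prepares (but does not send) an authenticated `_count` request against the endpoint above; `build_count_request` is a hypothetical helper, and the credentials are passed in rather than hard-coded.

```python
import base64
import json
import urllib.request

BASE_URL = "https://imunizacao-es.saude.gov.br"  # public endpoint from the question


def build_count_request(user: str, password: str, uf: str) -> urllib.request.Request:
    """Prepare a POST to the _count endpoint using HTTP basic auth.

    The Count API body only carries a query; it has no option for
    counting distinct values of another field.
    """
    body = {"query": {"match": {"estabelecimento_uf": {"query": uf}}}}
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{BASE_URL}/_count",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_count_request("imunizacao_public", "<password>", "SE")
    # Sending it would return JSON shaped like {"count": ..., "_shards": {...}}:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp)["count"])
    print(req.get_method(), req.full_url)
```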

Yes, maybe something like the cardinality aggregation; I'll try it. I was expecting to use the Count API for this... (Rick Stanley, commented May 14, 2021 at 4:16)

Bhavya · Accepted Answer · 2021-05-15 02:08:26Z


You can combine a filter aggregation with a cardinality aggregation to get the count of unique document_id values among the documents that match a filter:

{
  "size": 0,
  "aggs": {
    "filter_agg": {
      "filter": {
        "term": {
          "estabelecimento_uf.keyword": "SE"
        }
      },
      "aggs": {
        "count_docid": {
          "cardinality": {
            "field": "document_id.keyword"
          }
        }
      }
    }
  }
}
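If you send this from Python, a small sketch like the following builds the same request body for any filter field/value and reads the two numbers back out of the response. The helper names `distinct_count_body` and `read_distinct_count` are hypothetical; the aggregation names `filter_agg` and `count_docid` match the query above.

```python
from typing import Any


def distinct_count_body(filter_field: str, filter_value: str, distinct_field: str) -> dict[str, Any]:
    """Build a size-0 search body: a filter aggregation wrapping a cardinality aggregation."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "filter_agg": {
                "filter": {"term": {filter_field: filter_value}},
                "aggs": {"count_docid": {"cardinality": {"field": distinct_field}}},
            }
        },
    }


def read_distinct_count(response: dict[str, Any]) -> tuple[int, int]:
    """Return (documents in the filter bucket, distinct-value count from cardinality)."""
    bucket = response["aggregations"]["filter_agg"]
    return bucket["doc_count"], bucket["count_docid"]["value"]


body = distinct_count_body("estabelecimento_uf.keyword", "SE", "document_id.keyword")
```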

As far as I know, you cannot get the count of distinct field values using the Count API. You can either use the field collapsing feature (as done in the question) or use the cardinality aggregation.

Here is a working example with index data, search queries, and search results.

Index Data:

{
  "vacina_descricao_dose": "    2ª Dose",
  "estabelecimento_uf": "BA",
  "document_id": "7d0ac34a-1d4f-435a-9e5f-6dc2d77bb251-i0b0"
}
{
  "vacina_descricao_dose": "    1ª Dose",
  "estabelecimento_uf": "SE",
  "document_id": "2dc55c6a-5ac1-4550-8990-5ca611808e8a-i0b0"
}
{
  "vacina_descricao_dose": "    1ª Dose",
  "estabelecimento_uf": "SE",
  "document_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0"
}
{
  "vacina_descricao_dose": "    1ª Dose",
  "estabelecimento_uf": "SE",
  "document_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0"
}
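To reproduce this example locally, the four documents can be loaded with the `_bulk` API, which takes newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. A minimal Python sketch that builds that payload; the index name `test` is an assumption (the answer does not name its index), and `bulk_payload` is a hypothetical helper.

```python
import json
from typing import Any

docs = [
    {"vacina_descricao_dose": "    2ª Dose", "estabelecimento_uf": "BA",
     "document_id": "7d0ac34a-1d4f-435a-9e5f-6dc2d77bb251-i0b0"},
    {"vacina_descricao_dose": "    1ª Dose", "estabelecimento_uf": "SE",
     "document_id": "2dc55c6a-5ac1-4550-8990-5ca611808e8a-i0b0"},
    {"vacina_descricao_dose": "    1ª Dose", "estabelecimento_uf": "SE",
     "document_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0"},
    {"vacina_descricao_dose": "    1ª Dose", "estabelecimento_uf": "SE",
     "document_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0"},
]


def bulk_payload(index: str, documents: list[dict[str, Any]]) -> str:
    """Build an NDJSON body for POST /_bulk: one index action line per document."""
    lines: list[str] = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"  # _bulk requires a final newline


payload = bulk_payload("test", docs)
```

Send the payload as the body of `POST /_bulk` with the `Content-Type: application/x-ndjson` header.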

Search Query 1:

{
  "size": 0,
  "query": {
    "match": {
      "estabelecimento_uf": "SE"
    }
  },
  "aggs": {
    "count_doc_id": {
      "cardinality": {
        "field": "document_id.keyword"
      }
    }
  }
}

Search Result:

"aggregations": {
  "count_doc_id": {
    "value": 2            // note this
  }
}

Search Query 2:

{
  "size": 0,
  "aggs": {
    "filter_agg": {
      "filter": {
        "term": {
          "estabelecimento_uf.keyword": "SE"
        }
      },
      "aggs": {
        "count_docid": {
          "cardinality": {
            "field": "document_id.keyword"
          }
        }
      }
    }
  }
}

Search Result:

"aggregations": {
  "filter_agg": {
    "doc_count": 3,
    "count_docid": {
      "value": 2         // note this
    }
  }
}
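As a sanity check, both numbers in this result can be reproduced in plain Python from the four indexed documents: `doc_count` counts every document in the filter bucket, while the cardinality value counts distinct `document_id` values, so the duplicated `d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0` document is counted once. The documents are reduced to the two fields involved, and `filter_and_count` is a hypothetical helper.

```python
from typing import Any

# The four indexed documents, reduced to the fields the queries use.
INDEX_DATA: list[dict[str, Any]] = [
    {"estabelecimento_uf": "BA", "document_id": "7d0ac34a-1d4f-435a-9e5f-6dc2d77bb251-i0b0"},
    {"estabelecimento_uf": "SE", "document_id": "2dc55c6a-5ac1-4550-8990-5ca611808e8a-i0b0"},
    {"estabelecimento_uf": "SE", "document_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0"},
    {"estabelecimento_uf": "SE", "document_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0"},
]


def filter_and_count(docs: list[dict[str, Any]], field: str, value: str,
                     distinct_field: str) -> tuple[int, int]:
    """Return (matching documents, exact number of distinct values of distinct_field)."""
    matching = [doc for doc in docs if doc[field] == value]
    distinct = {doc[distinct_field] for doc in matching}
    return len(matching), len(distinct)


print(filter_and_count(INDEX_DATA, "estabelecimento_uf", "SE", "document_id"))  # (3, 2)
```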

answered May 14, 2021 at 6:06 by Bhavya (edited May 15, 2021 at 2:08)



Comments:

Rick Stanley:

This is what I ended up doing, minus the first-level filter aggregation: I used a "query match" on estabelecimento_uf instead. I figured out it wasn't possible with the Count API. I forgot to post my answer. Thanks.

Rick Stanley:

I compared my query's result with yours. With this (minified) query: {"size":0,"query":{"match":{"estabelecimento_uf":"SE"}},"aggs":{"count_doc_id":{"cardinality":{"field":"document_id"}}}}, I get a count of 612571, against 613904 from your query. Could you help me understand the difference? Note: there's no .keyword here. (Useful: a JSON beautify tool.)
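One possible source of such a gap (an assumption, not verified against this data): the cardinality aggregation is approximate. It is based on the HyperLogLog++ algorithm, and counts are only expected to be close to accurate below its `precision_threshold` (default 3000, maximum 40000), so at roughly 600,000 distinct values both numbers are estimates, and the two queries also differ (`match` versus a `term` filter on `.keyword`). Raising the threshold improves accuracy but still yields an estimate at this scale. A Python sketch of the minified query with the threshold set explicitly; `cardinality_query` is a hypothetical helper.

```python
import json
from typing import Any

# Elasticsearch treats precision_threshold values above this as if they were 40000.
MAX_PRECISION_THRESHOLD = 40000


def cardinality_query(uf: str, field: str,
                      precision_threshold: int = MAX_PRECISION_THRESHOLD) -> dict[str, Any]:
    """Build the size-0 match + cardinality body with an explicit precision_threshold."""
    threshold = min(precision_threshold, MAX_PRECISION_THRESHOLD)  # mirror the server-side cap
    return {
        "size": 0,
        "query": {"match": {"estabelecimento_uf": uf}},
        "aggs": {
            "count_doc_id": {
                "cardinality": {"field": field, "precision_threshold": threshold}
            }
        },
    }


print(json.dumps(cardinality_query("SE", "document_id")))
```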

Rick Stanley:

Actually, I don't think the answer you provided is correct. If I remove the cardinality aggregation, it returns the same result. And I'm certain that there are duplicates for document_id in a number of states (estabelecimento_uf), so I think my query works as expected. I'll post it shortly; hopefully someone will point out the differences or mistakes.

Bhavya:

@RickStanley if you look at this part of the documentation (elastic.co/guide/en/elasticsearch/reference/current/…), you will find that the query works the same way as your minified version.

Rick Stanley:

I think it's safe and fair to say that this is the right answer, regardless of my data. But this .keyword property left me wondering about its usage. I'm no Elasticsearch expert; I'm just getting started.
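On the `.keyword` question: when Elasticsearch maps a JSON string field dynamically (no explicit mapping), it creates an analyzed `text` field plus a `keyword` sub-field, so `document_id.keyword` refers to the un-analyzed value. Exact `term` filters and aggregations such as `cardinality` need the keyword form. If `document_id` is already mapped as `keyword` in the index, as the working minified query without `.keyword` suggests may be the case here (an assumption), the bare field name works. The Python sketch below writes out the default dynamic string mapping and picks the field path to aggregate on; `aggregatable_field` is a hypothetical helper that only handles these two string cases.

```python
from typing import Any

# Default dynamic mapping Elasticsearch applies to a JSON string field.
DYNAMIC_STRING_MAPPING: dict[str, Any] = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}


def aggregatable_field(name: str, mapping: dict[str, Any]) -> str:
    """Return the field path to use in a term filter or aggregation (string fields only)."""
    if mapping.get("type") == "keyword":
        return name  # already a keyword field: use it directly
    if mapping.get("type") == "text" and "keyword" in mapping.get("fields", {}):
        return f"{name}.keyword"  # analyzed text: use its keyword sub-field
    raise ValueError(f"{name} has no keyword form to aggregate on")


print(aggregatable_field("document_id", DYNAMIC_STRING_MAPPING))  # document_id.keyword
print(aggregatable_field("document_id", {"type": "keyword"}))     # document_id
```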
