I'm trying to create a query that returns how many documents have no data for two fields (date.new and date.old). I have tried the query below, but it behaves like OR logic: it counts the documents missing date.new and the documents missing date.old separately, so documents missing either field are included. Does anyone know how I can count only the documents that are missing both fields?

{
   "aggs":{
      "Missing_field_count1":{
         "missing":{
            "field":"date.new"
         }
      },
      "Missing_field_count2":{
         "missing":{
            "field":"date.old"
         }
      }
   }
}

2 Answers

Aggregations are not the right feature for this. Use two exists queries inside a bool/must_not clause. A document matches only if none of the must_not clauses match, so only documents missing both fields are counted:

GET index/_count
{
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "date.new"
          }
        },
        {
          "exists": {
            "field": "date.old"
          }
        }
      ]
    }
  }
}
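To make the AND semantics of must_not concrete, here is a minimal Python sketch. It builds the same request body and then mimics the exists check locally on a few made-up documents. The helper names (build_count_body, field_exists, count_missing_all) and the sample documents are hypothetical and only for illustration; the local check is a rough stand-in for Elasticsearch's exists query, not its actual implementation.

```python
import json


def build_count_body(fields):
    """Build a _count request body matching documents where none of `fields` exists."""
    return {
        "query": {
            "bool": {
                # One exists clause per field; must_not requires that none of them match.
                "must_not": [{"exists": {"field": f}} for f in fields]
            }
        }
    }


def field_exists(doc, path):
    """Rough local stand-in for the exists query: follow a dotted path through nested dicts."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    # Simplification: treat null and an empty list as "no value".
    return current is not None and current != []


def count_missing_all(docs, fields):
    """Count documents in which every one of `fields` is missing."""
    return sum(1 for d in docs if not any(field_exists(d, f) for f in fields))


# Hypothetical sample documents, for illustration only.
docs = [
    {"date": {"new": 1501, "old": 10}},  # has both fields
    {"title": "elasticsearch"},          # has neither field
    {"title": "elasticsearch-query"},    # has neither field
    {"date": {"new": 1400}},             # has only date.new
]
fields = ["date.new", "date.old"]

print(json.dumps(build_count_body(fields), indent=2))
print(count_missing_all(docs, fields))
```

Only the two documents with neither field are counted. The document that has date.new but not date.old is excluded, which is the difference from the two separate missing aggregations in the question.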
3 Comments

Thank you for your fast reply. I think I might have been unclear in my initial question: I only want to know how many documents are missing both of those fields, not to get the actual documents returned. I'm new to Elasticsearch, so I'm a bit lost :/
See my updated answer, you simply need to hit the _count endpoint to get what you want.
Yes, this solved my question! Thank you for the good explanation
hits.total.value is the number of documents that match the search request. hits.total.relation indicates whether that value is exact (eq) or a lower bound (gte).

Index Data:

{
  "date": {
    "new": 1501,
    "old": 10
  }
}

{
  "title": "elasticsearch"
}

{
  "title": "elasticsearch-query"
}

{
  "date": {
    "new": 1400
  }
}

The query given by @Val covers your use case. Run as a search against the data above, it returns only the documents that have neither field.

Search Result:

"hits": {
    "total": {
      "value": 2,                <-- note this
      "relation": "eq"
    },
    "max_score": 0.0,
    "hits": [
      {
        "_index": "65112793",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.0,
        "_source": {
          "title": "elasticsearch"
        }
      },
      {
        "_index": "65112793",
        "_type": "_doc",
        "_id": "5",
        "_score": 0.0,
        "_source": {
          "title": "elasticsearch-query"
        }
      }
    ]
  }
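As a small illustration of reading that total from client code, here is a Python sketch that interprets hits.total from an already-parsed response. The function name describe_total and the trimmed response dict are hypothetical, and the sketch assumes the Elasticsearch 7+ response shape in which hits.total is an object with value and relation.

```python
def describe_total(response):
    """Summarize hits.total from a parsed search response (dict)."""
    total = response["hits"]["total"]
    value, relation = total["value"], total["relation"]
    if relation == "eq":
        # The count is exact.
        return f"exactly {value} matching documents"
    # relation == "gte": the count is only a lower bound.
    return f"at least {value} matching documents"


# Trimmed-down version of the search result shown above.
response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "max_score": 0.0,
        "hits": [],
    }
}
print(describe_total(response))
```

When relation is gte, the value is only a lower bound, so the _count endpoint is the safer choice if an exact number is needed.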

1 Comment

Using the _count endpoint is better, especially when there are more than 10,000 hits.