I would like to find the minimum value of a field in a nested array object after aggregation.

Data example:

[
  {
    "id": "i1",
    "version": 1,
    "entries": [
      {
        "name": "n1",
        "position": 1
      }, {
        "name": "n2",
        "position": 2
      }
    ]
  }, {
    "id": "i1",
    "version": 2,
    "entries": [
      {
        "name": "n2",
        "position": 3
      }, {
        "name": "n3",
        "position": 4
      }
    ]
  },
  {
    "id": "i2",
    "version": 1,
    "entries": [
      {
        "name": "n1",
        "position": 8
      }, {
        "name": "n2",
        "position": 7
      }
    ]
  }, {
    "id": "i2",
    "version": 2,
    "entries": [
      {
        "name": "n2",
        "position": 6
      }, {
        "name": "n3",
        "position": 5
      }
    ]
  }
]

Pseudo Query:

SELECT min(entries["n2"].position) WHERE entries.name="n2" GROUP BY id;

Expected Result:

[
  {
    "id": "i1",
    "min(position)": 2
  }, {
    "id": "i2",
    "min(position)": 6
  }
]

I can do this in application code, but that performs poorly because I would have to return the document sources, which can be quite large.

I am thinking of denormalizing the data, but first I would like to know whether this is possible at all.

1 Answer

You can do it by nesting several aggregations like this:

terms agg -> nested agg -> filter agg -> min agg
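To make the chain above concrete, here is a minimal plain-Python sketch of what each stage contributes, run against the sample documents from the question. It only mimics the bucketing logic and says nothing about how Elasticsearch executes the request; the function name `min_position_by_id` is made up for this illustration.

```python
from collections import defaultdict

# Sample documents copied from the question.
docs = [
    {"id": "i1", "version": 1, "entries": [{"name": "n1", "position": 1}, {"name": "n2", "position": 2}]},
    {"id": "i1", "version": 2, "entries": [{"name": "n2", "position": 3}, {"name": "n3", "position": 4}]},
    {"id": "i2", "version": 1, "entries": [{"name": "n1", "position": 8}, {"name": "n2", "position": 7}]},
    {"id": "i2", "version": 2, "entries": [{"name": "n2", "position": 6}, {"name": "n3", "position": 5}]},
]


def min_position_by_id(docs, name):
    """Hypothetical helper: group by id, keep entries with the given name, take the minimum position."""
    buckets = defaultdict(list)
    for doc in docs:                       # terms agg: one bucket per distinct "id"
        for entry in doc["entries"]:       # nested agg: step into each nested entry
            if entry["name"] == name:      # filter agg: keep only the matching entries
                buckets[doc["id"]].append(entry["position"])
    # min agg: smallest position in each bucket.
    # Ids without any matching entry are simply absent from this dict.
    return {key: min(positions) for key, positions in buckets.items()}


print(min_position_by_id(docs, "n2"))  # {'i1': 2, 'i2': 6}
```

The order of the stages matters: the filter has to sit inside the nested step so that it is evaluated per entry rather than per top-level document.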

To test it I set up an index:

PUT /test_index
{
   "settings": {
      "number_of_shards": 1
   },
   "mappings": {
      "doc": {
         "properties": {
            "entries": {
               "type": "nested",
               "properties": {
                  "name": {
                     "type": "string"
                  },
                  "position": {
                     "type": "long"
                  }
               }
            },
            "id": {
               "type": "string"
            },
            "version": {
               "type": "long"
            }
         }
      }
   }
}

And indexed your docs:

PUT /test_index/doc/_bulk
{"index":{"_id":1}}
{"id":"i1","version":1,"entries":[{"name":"n1","position":1},{"name":"n2","position":2}]}
{"index":{"_id":2}}
{"id":"i1","version":2,"entries":[{"name":"n2","position":3},{"name":"n3","position":4}]}
{"index":{"_id":3}}
{"id":"i2","version":1,"entries":[{"name":"n1","position":8},{"name":"n2","position":7}]}
{"index":{"_id":4}}
{"id":"i2","version":2,"entries":[{"name":"n2","position":6},{"name":"n3","position":5}]}
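If the documents already live in a Python list, the bulk body does not have to be typed by hand. The sketch below builds the same newline-delimited payload with the standard `json` module; `bulk_body` is a hypothetical helper, and the sequential `_id` values are just the ones I chose above. Sending the payload is left out on purpose, since that depends on your client.

```python
import json

# Sample documents copied from the question.
docs = [
    {"id": "i1", "version": 1, "entries": [{"name": "n1", "position": 1}, {"name": "n2", "position": 2}]},
    {"id": "i1", "version": 2, "entries": [{"name": "n2", "position": 3}, {"name": "n3", "position": 4}]},
    {"id": "i2", "version": 1, "entries": [{"name": "n1", "position": 8}, {"name": "n2", "position": 7}]},
    {"id": "i2", "version": 2, "entries": [{"name": "n2", "position": 6}, {"name": "n3", "position": 5}]},
]


def bulk_body(docs):
    """Hypothetical helper: one action line plus one source line per document."""
    compact = (",", ":")  # no spaces, so every JSON object stays on a single line
    lines = []
    for doc_id, doc in enumerate(docs, start=1):
        lines.append(json.dumps({"index": {"_id": doc_id}}, separators=compact))
        lines.append(json.dumps(doc, separators=compact))
    # The bulk format expects the body to end with a newline.
    return "\n".join(lines) + "\n"


print(bulk_body(docs), end="")
```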

Here is the query:

POST /test_index/_search?search_type=count
{
   "aggs": {
      "id_terms": {
         "terms": {
            "field": "id"
         },
         "aggs": {
            "nested_entries": {
               "nested": {
                  "path": "entries"
               },
               "aggs": {
                  "filter_name": {
                     "filter": {
                        "term": {
                           "entries.name": "n2"
                        }
                     },
                     "aggs": {
                        "min_position": {
                           "min": {
                              "field": "entries.position"
                           }
                        }
                     }
                  }
               }
            }
         }
      }
   }
}

And the result:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 4,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "id_terms": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "i1",
               "doc_count": 2,
               "nested_entries": {
                  "doc_count": 4,
                  "filter_name": {
                     "doc_count": 2,
                     "min_position": {
                        "value": 2,
                        "value_as_string": "2.0"
                     }
                  }
               }
            },
            {
               "key": "i2",
               "doc_count": 2,
               "nested_entries": {
                  "doc_count": 4,
                  "filter_name": {
                     "doc_count": 2,
                     "min_position": {
                        "value": 6,
                        "value_as_string": "6.0"
                     }
                  }
               }
            }
         ]
      }
   }
}
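The value you asked for sits three levels deep in each bucket. A small sketch of flattening the response into the shape from your expected result, assuming the response has already been parsed into a Python dict and that the aggregation names match the query above (`flatten_min_positions` is a made-up name):

```python
def flatten_min_positions(response):
    """Hypothetical helper: turn the nested aggregation response into flat rows."""
    rows = []
    for bucket in response["aggregations"]["id_terms"]["buckets"]:
        # Walk down the same path the query defines: nested -> filter -> min.
        min_value = bucket["nested_entries"]["filter_name"]["min_position"]["value"]
        rows.append({"id": bucket["key"], "min(position)": min_value})
    return rows


# Trimmed copy of the response shown above (only the keys the helper reads).
response = {
    "aggregations": {
        "id_terms": {
            "buckets": [
                {"key": "i1", "nested_entries": {"filter_name": {"min_position": {"value": 2}}}},
                {"key": "i2", "nested_entries": {"filter_name": {"min_position": {"value": 6}}}},
            ]
        }
    }
}

print(flatten_min_positions(response))
# [{'id': 'i1', 'min(position)': 2}, {'id': 'i2', 'min(position)': 6}]
```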

Here is the code I used all together:

http://sense.qbox.io/gist/34a013099ef07fb527d9d7cf8490ad1bbafa718b

1 Comment

The link to the gist is dead.
