I'm trying to aggregate over dynamically mapped fields in Elasticsearch.

For example:

POST test/_doc/1
{
    "settings": {
        "range": {
            "value": 200,
            "display": "200 km"
        },
        "transmitter": {
            "value": 1.2,
            "display": "1.2 Ghz"
        }
    }
}

The properties under settings are dynamic. Essentially I need a query like this:

{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "settings": {
            "terms": {
                "field": "settings.*.display"
            }
        }
    }
}

Since * doesn't work here, I'm wondering if there's a way to return the fields from a painless script and then maybe use a pipeline aggregation? I can't find the painless equivalent to Object.keys(settings) in JavaScript.

I've seen an approach with nested objects, but I'd like to avoid it: there might be many 'settings' properties, and the default limit on distinct nested fields in an index is 50 (index.mapping.nested_fields.limit), whereas the nested_objects limit allows 10000 nested objects per document.

1 Answer

The Painless equivalent of Object.keys() is .keySet(). You can implement the following iterative logic in a scripted metric aggregation:

GET test/_search
{
  "size": 0,
  "aggs": {
    "dynamic_fields_agg": {
      "scripted_metric": {
        "init_script": "state.map = [:];",
        "map_script": """
          def source = params._source['settings'];
          // Skip documents that have no 'settings' object.
          if (source != null) {
            for (def key : source.keySet()) {
              // Collect the 'display' value of every dynamic key.
              if (source[key].containsKey("display")) {
                if (state.map.containsKey(key)) {
                  state.map[key].add(source[key].display);
                } else {
                  state.map[key] = [source[key].display];
                }
              }
            }
          }
        """,
        "combine_script": "return state",
        "reduce_script": "return states"
      }
    }
  }
}

which will yield something like

{
  "aggregations":{
    "dynamic_fields_agg":{
      "value":[
        {
          "map":{
            "range":[
              "200 km"
            ],
            "transmitter":[
              "1.2 Ghz"
            ]
          }
        }
      ]
    }
  }
}

Now you can post-process the values in the reduce/combine scripts however you like.
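
As one possible post-processing step, the reduce script can merge the per-shard maps and drop duplicate display values. This is only a sketch: it assumes the combine_script stays "return state" as above, and the variable name merged is made up for illustration. Outside the Kibana console, the script has to be passed as a single JSON-escaped string.

```
// Painless body for "reduce_script" (a sketch, not verified against a cluster)
def merged = [:];
for (def s : states) {
  // A shard may contribute a null state; skip it.
  if (s == null) { continue; }
  for (def entry : s.map.entrySet()) {
    def key = entry.getKey();
    if (!merged.containsKey(key)) {
      merged[key] = [];
    }
    for (def value : entry.getValue()) {
      // Keep each display value only once per key.
      if (!merged[key].contains(value)) {
        merged[key].add(value);
      }
    }
  }
}
return merged;
```

With this in place, "value" in the response should be a single map from each settings key to its distinct display values, rather than a list with one entry per shard.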


Using nested fields would not bring you much advantage here -- wildcard paths are not allowed there either. I asked that myself some time ago.


UPDATE -- the inline version:

GET /test/_search
{
  "size": 0,
  "aggs": {
    "dynamic_fields_agg": {
      "scripted_metric": {
        "init_script": "state.map = [:];",
        "map_script": "def source = params._source[\"settings\"];\nfor (def key : source.keySet()) {\n  if (source[key].containsKey(\"display\")) {\n    if (state.map.containsKey(key)) {\n      state.map[key].add(source[key].display);\n    } else {\n      state.map[key] = [source[key].display];\n    }\n  }\n}",
        "combine_script": "return state",
        "reduce_script": "return states"
      }
    }
  }
}

3 Comments

I can't get the multiline script to work, but it works inline, thank you! This is pretty impressive, I'll look into the reduce function to remove dupes, then the result is exactly what I need.
Cool! I've added the inline version to my answer.
It seems that with combine_script: 'return state.map', the output is reduced by one level. Also, my approach with reduce was wrong. Instead, I now avoid adding a duplicate value in the first place, with another condition: if (!state.map[key].values.contains(source[key].displayValue)) {. Thanks!
