I'm trying to count documents with a unique nested field value (and, next, fetch the documents themselves). Getting the unique documents seems to work, but when I execute a count request I get the following error:

Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/package/_count?ignore_throttled=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true], status line [HTTP/1.1 400 Bad Request] {"error":{"root_cause":[{"type":"parsing_exception","reason":"request does not support [collapse]","line":1,"col":216}],"type":"parsing_exception","reason":"request does not support [collapse]","line":1,"col":216},"status":400}

The code:

        BoolQueryBuilder innerTemplNestedBuilder = QueryBuilders.boolQuery();
        NestedQueryBuilder templatesNestedQuery = QueryBuilders.nestedQuery("attachment", innerTemplNestedBuilder, ScoreMode.None);
        BoolQueryBuilder mainQueryBuilder = QueryBuilders.boolQuery().must(templatesNestedQuery);
        if (!isEmpty(templateName)) {
            innerTemplNestedBuilder.filter(QueryBuilders.termQuery("attachment.name", templateName));
        }
        SearchSourceBuilder searchSourceBuilder = SearchSourceBuilder.searchSource()
                    .collapse(new CollapseBuilder("attachment.uuid"))
                    .query(mainQueryBuilder);
        // THE NEXT LINE CAUSES THE ERROR
        long count = client.count(new CountRequest("package").source(searchSourceBuilder), RequestOptions.DEFAULT).getCount();
        // THIS WORKS
        SearchResponse searchResponse = client.search(
                    new SearchRequest(
                            new String[] {"package"},
                            searchSourceBuilder.timeout(new TimeValue(20, TimeUnit.SECONDS)).from(offset).size(limit)
                    ).indices("package").searchType(SearchType.DFS_QUERY_THEN_FETCH),
                    RequestOptions.DEFAULT
        );
        return ....;

The overall intention is to get a page of documents plus the total number of such documents. Maybe another approach for this already exists. If I try to get the count using an aggregation with cardinality, I get zero, so it looks like it doesn't work on nested fields.

Count request:

{
    "query": {
        "bool": {
            "must": [
                {
                    "nested": {
                        "query": {
                            "bool": {
                                "adjust_pure_negative": true,
                                "boost": 1.0
                            }
                        },
                        "path": "attachment",
                        "ignore_unmapped": false,
                        "score_mode": "none",
                        "boost": 1.0
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1.0
        }
    },
    "collapse": {
        "field": "attachment.uuid"
    }
}

How the mapping was created:

curl -X DELETE "localhost:9200/package?pretty"
curl -X PUT    "localhost:9200/package?include_type_name=true&pretty" -H 'Content-Type: application/json' -d '{
    "settings" :  {
        "number_of_shards" : 1,
        "number_of_replicas" : 1
    }}'
curl -X PUT    "localhost:9200/package/_mappings?pretty" -H 'Content-Type: application/json' -d'
{
      "dynamic": false,
      "properties" : {
        "attachment": {
            "type": "nested",
            "properties": {
                "uuid" : { "type" : "keyword" },
                "name" : { "type" : "text" }
            }
        },
        "uuid" : {
          "type" : "keyword"
        }
      }
}
'

The query generated by the code should look something like this:

curl -X POST "localhost:9200/package/_count?pretty" -H 'Content-Type: application/json' -d' { "query" :
    {
        "bool": {
            "must": [
                {
                    "nested": {
                        "query": {
                            "bool": {
                                "adjust_pure_negative": true,
                                "boost": 1.0
                            }
                        },
                        "path": "attachment",
                        "ignore_unmapped": false,
                        "score_mode": "none",
                        "boost": 1.0
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1.0
        }
    },
    "collapse": {
        "field": "attachment.uuid"
    }
}'
  • Can you dump the query that your Java(?) query builder actually produces? A sample of your documents plus your mapping would be useful too. You won't be able to paste them in the comments, so just edit your question please. Commented Mar 6, 2020 at 16:49
  • @jzzfs edited: added the count request and index mapping. Commented Mar 10, 2020 at 9:36
  • @jzzfs also updated the error message to be more precise. Commented Mar 10, 2020 at 9:50
  • If I try to get the total number using '_search', I still get the un-collapsed value as 'total'. Commented Mar 10, 2020 at 11:21

1 Answer

Collapsing can only be used in the _search context, not in _count.

Secondly, what does your query even do? You've got a lot of redundant parameters there like boost:1 etc. You might as well say:

POST /package/_count?pretty
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "attachment",
            "query": {
              "match_all": {}
            }
          }
        }
      ]
    }
  }
}

which does not really do anything beyond matching every document that has at least one attachment :)


To answer your original question of "counting documents with unique nested field value",

let's imagine 3 documents, 2 of which have the same attachment.uuid value:

[
  {
    "attachment":{
      "uuid":"04144e14-62c3-11ea-bc55-0242ac130003"
    }
  },
  {
    "attachment":{
      "uuid":"04144e14-62c3-11ea-bc55-0242ac130003"
    }
  },
  {
    "attachment":{
      "uuid":"100b9632-62c3-11ea-bc55-0242ac130003"
    }
  }
]

To get the terms breakdown of the uuids, run

GET package/_search
{
  "size": 0,
  "aggs": {
    "nested_uniques": {
      "nested": {
        "path": "attachment"
      },
      "aggs": {
        "subagg": {
          "terms": {
            "field": "attachment.uuid"
          }
        }
      }
    }
  }
}

which yields

...
{
  "aggregations":{
    "nested_uniques":{
      "doc_count":3,
      "subagg":{
        "doc_count_error_upper_bound":0,
        "sum_other_doc_count":0,
        "buckets":[
          {
            "key":"04144e14-62c3-11ea-bc55-0242ac130003",
            "doc_count":2
          },
          {
            "key":"100b9632-62c3-11ea-bc55-0242ac130003",
            "doc_count":1
          }
        ]
      }
    }
  }
}

To get a single count of the unique attachment.uuid values, we're gonna have to get slightly more clever:

GET package/_search
{
  "size": 0,
  "aggs": {
    "nested_uniques": {
      "nested": {
        "path": "attachment"
      },
      "aggs": {
        "scripted_uniques": {
          "scripted_metric": {
            "init_script": "state.my_map = [:];",
            "map_script": """
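              // Skip nested docs where the field is unmapped or has no value.
              if (doc.containsKey('attachment.uuid') && doc['attachment.uuid'].size() > 0) {
                state.my_map[doc['attachment.uuid'].value.toString()] = 1;
              }
            """,
            "combine_script": """
              // Hand the per-shard uuid map over to the reduce phase.
              return state.my_map;
            """,
            "reduce_script": """
              // Merge the keys from all shards so that a uuid present
              // on several shards is only counted once.
              def uniques = new HashSet();
              for (shard_map in states) {
                if (shard_map != null) {
                  uniques.addAll(shard_map.keySet());
                }
              }
              return uniques.size();
            """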
          }
        }
      }
    }
  }
}

which returns

...
{
  "aggregations":{
    "nested_uniques":{
      "doc_count":3,
      "scripted_uniques":{
        "value":2
      }
    }
  }
}

and this scripted_uniques: 2 is exactly what you're after.
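
One caveat on the reduce phase, in case the index ever grows beyond the single shard used in the question: scripted_metric runs init/map/combine once per shard and reduce once on the coordinating node, so a distinct count has to come from merging the per-shard key sets, not from adding up per-shard totals. Here is a minimal plain-Java sketch of that difference. No Elasticsearch is involved; the class name, the method names and the two-shard layout are made up purely for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java stand-in for the combine/reduce phases (hypothetical, not an Elasticsearch API). */
final class ShardMergeSketch {

    /** Adds up per-shard distinct counts; a uuid present on two shards is counted twice. */
    static int sumOfShardCounts(List<Set<String>> shardKeySets) {
        int sum = 0;
        for (Set<String> keys : shardKeySets) {
            sum += keys.size();
        }
        return sum;
    }

    /** Unions the per-shard key sets first, so each uuid is counted exactly once. */
    static int mergedDistinctCount(List<Set<String>> shardKeySets) {
        Set<String> all = new HashSet<>();
        for (Set<String> keys : shardKeySets) {
            all.addAll(keys);
        }
        return all.size();
    }

    public static void main(String[] args) {
        // Assumed layout: the duplicated uuid from the example ends up on both shards.
        List<Set<String>> shards = List.of(
                Set.of("04144e14-62c3-11ea-bc55-0242ac130003"),
                Set.of("04144e14-62c3-11ea-bc55-0242ac130003",
                       "100b9632-62c3-11ea-bc55-0242ac130003"));
        System.out.println(sumOfShardCounts(shards));    // 3
        System.out.println(mergedDistinctCount(shards)); // 2
    }
}
```

With one shard both numbers agree, which is why the example output above is 2 either way.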


Note: I solved this use case using nested scripted metric aggs but if any of you know of a cleaner approach, I'm more than happy to learn it!

2 Comments

thanks it works. Not sure how it will work under load on lots of documents, but it's the only working approach so far...
No prob. I have it working with ~500k heavily nested docs. It takes a few seconds but never fails/times out.
