6

I have an index with 20 different fields. I need to be able to pull unique docs where combination of fields "cat" and "sub" are unique. In SQL it would look this way: select unique cat, sub from table A; I can do it for one field this way:

{
"size": 0,
"aggs" : {
    "unique_set" : {
        "terms" : { "field" : "cat" }
    }
}}

but how do I add another field to check uniqueness across two fields?

Thanks,

3 Answers 3

4

SQL's SELECT DISTINCT [cat], [sub] can be imitated with a Composite Aggregation.

{
  "size": 0, 
  "aggs": {
    "cat_sub": {
      "composite": {
        "sources": [
          { "cat": { "terms": { "field": "cat" } } },
          { "sub": { "terms": { "field": "sub" } } }
        ]
      }
    }
  }
}

Returns...

"buckets" : [
  {
    "key" : {
      "cat" : "a",
      "sub" : "x"
    },
    "doc_count" : 1
  },
  {
    "key" : {
      "cat" : "a",
      "sub" : "y"
    },
    "doc_count" : 2
  },
  {
    "key" : {
      "cat" : "b",
      "sub" : "y"
    },
    "doc_count" : 3
  }
]
Sign up to request clarification or add additional context in comments.

Comments

1

The only way to solve this are probably nested aggregations:

{
"size": 0,
    "aggs" : {
        "unique_set_1" : {

            "terms" : {
                     "field" : "cats"
            },
            "aggregations" : { 
                "unique_set_2": {
                    "terms": {"field": "sub"}
                }
            }
        }
    }

}

1 Comment

Thank you for reply. I tried to run it the way you proposed, but I can't make sense of the data. How do I know what it the total count of unique sets?
-3

Quote:

I need to be able to pull unique docs where combination of fields "cat" and "sub" are unique.

This is nonsense; your question is unclear. You can have 10s unique pairs {cat, sub}, and 100s unique triplets {cat, sub, field_3}, and 1000s unique documents Doc{cat, sub, field3, field4, ...}.

If you are interested in document counts per unique pair {"Category X", "Subcategory Y"} then you can use Cardinality aggregations. For two or more fields you will need to use scripting which will come with performance hit.

Example:

{
    "aggs" : {
        "multi_field_cardinality" : {
            "cardinality" : {
                "script": "doc['cats'].value + ' _my_custom_separator_ ' + doc['sub'].value"
            }
        }
    }
}

Alternate solution: use nested Terms terms aggregations.

3 Comments

Any alternative if the Elastic cluster reports "scripts of type [inline], operation [aggs] and lang [groovy] are disabled" ?
Alternative is to use nested terms aggregations. But again initial question does not make sense:
"select unique cat, sub from table A; " does return unique pairs and DOES NOT unique documents containing unique pairs but user wants "to be able to pull unique docs where ..." - nonsense.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.