Elasticsearch, how to return unique values of two fields

Question

I have an index with 20 different fields. I need to be able to pull unique docs where combination of fields "cat" and "sub" are unique. In SQL it would look like this: select unique cat, sub from table A; I can do it for one field this way:

{
  "size": 0,
  "aggs": {
    "unique_set": {
      "terms": { "field": "cat" }
    }
  }
}

but how do I add another field to check uniqueness across two fields?

Thanks,

Kyle McClellan · Accepted Answer · 2020-10-08 15:46:22Z

4

SQL's SELECT DISTINCT [cat], [sub] can be imitated with a Composite Aggregation.

{
  "size": 0, 
  "aggs": {
    "cat_sub": {
      "composite": {
        "sources": [
          { "cat": { "terms": { "field": "cat" } } },
          { "sub": { "terms": { "field": "sub" } } }
        ]
      }
    }
  }
}

Returns...

"buckets" : [
  {
    "key" : {
      "cat" : "a",
      "sub" : "x"
    },
    "doc_count" : 1
  },
  {
    "key" : {
      "cat" : "a",
      "sub" : "y"
    },
    "doc_count" : 2
  },
  {
    "key" : {
      "cat" : "b",
      "sub" : "y"
    },
    "doc_count" : 3
  }
]
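
A composite aggregation returns its buckets in pages (10 per page by default), and a response that contains buckets normally carries an `after_key` that can be passed back as `after` to fetch the next page. A minimal Python sketch of that loop, under stated assumptions: `search` is a hypothetical callable standing in for whatever client call sends the request body to the index and returns the parsed JSON response, and `build_body`, `iter_pairs` and `page_size` are illustrative names, not part of any library.

```python
from typing import Callable, Iterator, Optional, Tuple


def build_body(after: Optional[dict] = None, page_size: int = 100) -> dict:
    """Build the composite aggregation request for one page of (cat, sub) pairs."""
    composite = {
        "size": page_size,
        "sources": [
            {"cat": {"terms": {"field": "cat"}}},
            {"sub": {"terms": {"field": "sub"}}},
        ],
    }
    if after is not None:
        # Resume after the last key of the previous page.
        composite["after"] = after
    return {"size": 0, "aggs": {"cat_sub": {"composite": composite}}}


def iter_pairs(
    search: Callable[[dict], dict], page_size: int = 100
) -> Iterator[Tuple[str, str, int]]:
    """Yield (cat, sub, doc_count) for every unique pair, following after_key.

    `search` is a hypothetical stand-in for the real client call.
    """
    after = None
    while True:
        agg = search(build_body(after, page_size))["aggregations"]["cat_sub"]
        for bucket in agg["buckets"]:
            yield bucket["key"]["cat"], bucket["key"]["sub"], bucket["doc_count"]
        after = agg.get("after_key")
        # Stop once a page comes back empty or without a key to resume from.
        if not agg["buckets"] or after is None:
            break
```

Counting the items produced by `iter_pairs` gives the total number of unique pairs, at the cost of one request per page.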

Lilith Wittmann · Accepted Answer · 2015-07-13 22:00:15Z

1

Probably the only way to solve this is with nested aggregations:

{
  "size": 0,
  "aggs": {
    "unique_set_1": {
      "terms": { "field": "cat" },
      "aggregations": {
        "unique_set_2": {
          "terms": { "field": "sub" }
        }
      }
    }
  }
}
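
To read the response of a nested aggregation like this one, each outer bucket key is paired with the keys of its inner buckets. A rough Python sketch, assuming the aggregation names `unique_set_1` and `unique_set_2` from the query above; the helper name `flatten_pairs` and the sample response are made up for illustration. Keep in mind that a terms aggregation returns only the top 10 buckets by default, so the result is complete only if `size` is set high enough at both levels.

```python
from typing import List, Tuple


def flatten_pairs(response: dict) -> List[Tuple[str, str, int]]:
    """Turn nested terms buckets into a flat list of (outer_key, inner_key, doc_count)."""
    pairs = []
    for outer in response["aggregations"]["unique_set_1"]["buckets"]:
        for inner in outer["unique_set_2"]["buckets"]:
            pairs.append((outer["key"], inner["key"], inner["doc_count"]))
    return pairs


# Made-up response in the shape a nested terms aggregation returns.
sample = {
    "aggregations": {
        "unique_set_1": {
            "buckets": [
                {
                    "key": "a",
                    "doc_count": 3,
                    "unique_set_2": {
                        "buckets": [
                            {"key": "x", "doc_count": 1},
                            {"key": "y", "doc_count": 2},
                        ]
                    },
                },
                {
                    "key": "b",
                    "doc_count": 3,
                    "unique_set_2": {"buckets": [{"key": "y", "doc_count": 3}]},
                },
            ]
        }
    }
}

pairs = flatten_pairs(sample)
print(len(pairs))  # 3 unique pairs in this sample
```

The length of the flattened list is the total count of unique sets, within the limits of the bucket sizes requested.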

1 Comment

epipko Over a year ago

Thank you for the reply. I tried to run it the way you proposed, but I can't make sense of the data. How do I know what the total count of unique sets is?

Fuad Efendi · Accepted Answer · 2016-12-13 20:52:48Z

-3

Quote:

I need to be able to pull unique docs where combination of fields "cat" and "sub" are unique.

This is nonsense; your question is unclear. You can have tens of unique pairs {cat, sub}, hundreds of unique triplets {cat, sub, field_3}, and thousands of unique documents Doc{cat, sub, field3, field4, ...}.

If you are interested in the number of unique pairs {"Category X", "Subcategory Y"}, you can use a Cardinality aggregation. For two or more fields you will need scripting, which comes with a performance hit.

Example:

{
    "aggs" : {
        "multi_field_cardinality" : {
            "cardinality" : {
                "script": "doc['cat'].value + ' _my_custom_separator_ ' + doc['sub'].value"
            }
        }
    }
}
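
The script builds one string per document, and the cardinality aggregation counts the distinct strings (approximately, on large sets). The separator matters: without it, different pairs can concatenate to the same string. A small pure-Python illustration of that point, with made-up values and no Elasticsearch involved:

```python
# Made-up (cat, sub) values; this only mimics what the script concatenates.
docs = [("ab", "c"), ("a", "bc"), ("ab", "c")]

SEPARATOR = " _my_custom_separator_ "

# Plain concatenation merges ("ab", "c") and ("a", "bc") into "abc".
without_separator = {cat + sub for cat, sub in docs}

# With a separator that does not occur in the data, distinct pairs stay distinct.
with_separator = {cat + SEPARATOR + sub for cat, sub in docs}

print(len(without_separator))  # 1
print(len(with_separator))     # 2
```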

Alternative solution: use nested terms aggregations.

edited Dec 13, 2016 at 20:52

answered Aug 9, 2016 at 16:01

3 Comments

blong Over a year ago

Any alternative if the Elastic cluster reports "scripts of type [inline], operation [aggs] and lang [groovy] are disabled" ?

Fuad Efendi Over a year ago

The alternative is to use nested terms aggregations. But again, the initial question does not make sense:

Fuad Efendi Over a year ago

"select unique cat, sub from table A; " does return unique pairs and DOES NOT return unique documents containing unique pairs, but the user wants "to be able to pull unique docs where ..." - nonsense.
