Elasticsearch: how to make an aggregation field not change the case of values

Question

I have the following mapping for an aggregation field:

"language" : {
    "type" : "string",
    "index": "analyzed",
    "analyzer" : "standard"
}

A sample document's value for this property might look like: "en zh_CN"

This property is used only for aggregation. When I run a terms aggregation on it:

{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            },
            "filter": {
                ...
            }
        }
    },
    "aggregations": {
        "facets": {
            "terms": {
                "field": "language"
            }
        }
    }
}

The bucket keys come back in lowercase:

  "aggregations" : {
    "facets" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "zh_cn",
        "doc_count" : 2
      }, {
        "key" : "en",
        "doc_count" : 1
      } ]
    }
  }

How can I run this aggregation without ES lowercasing the values? I suspect I need to change the mapping for this property, but I am not sure how.

Thanks and regards.

Sloan Ahrens · Accepted Answer · 2015-03-13 19:38:31Z

8

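The standard analyzer lowercases every token it produces, which is why your bucket keys come back lowercased. Try this in your mapping instead: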

"language" : {
    "type" : "string",
    "index": "not_analyzed"
}

With not_analyzed, each document's value for that field is indexed unmodified as a single token, and those tokens are what your terms aggregation returns. For the example value you provided, the aggregation will return it verbatim:

"aggregations": {
   "facets": {
      "buckets": [
         {
            "key": "en zh_CN",
            "doc_count": 1
         }
      ]
   }
}

If you still want the text to be tokenized on whitespace, you can try the whitespace analyzer in your mapping. It splits the text on whitespace and does not lowercase the resulting tokens:

"language": {
   "type": "string",
   "analyzer": "whitespace"
}

Then your aggregation will return:

"aggregations": {
   "facets": {
      "buckets": [
         {
            "key": "en",
            "doc_count": 1
         },
         {
            "key": "zh_CN",
            "doc_count": 1
         }
      ]
   }
}
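To make the difference between the three mappings concrete, here is a small Python sketch that imitates how each one turns the sample value into terms and how a terms aggregation would then count them. This is not Elasticsearch code: the function names are hypothetical, and standard_like is only a rough stand-in for the standard analyzer that happens to behave the same way on this particular input.

```python
from collections import Counter

def standard_like(text):
    # Assumption: approximates the standard analyzer for this input only
    # (split on whitespace, then lowercase each token).
    return [token.lower() for token in text.split()]

def not_analyzed(text):
    # The whole field value becomes a single, unmodified term.
    return [text]

def whitespace(text):
    # Split on whitespace, keep the original case.
    return text.split()

def terms_buckets(docs, analyze):
    # A terms aggregation counts documents per term, so each term is
    # counted at most once per document.
    counts = Counter()
    for doc in docs:
        counts.update(set(analyze(doc)))
    return dict(counts)

docs = ["en zh_CN"]

for name, analyze in [("standard (approx.)", standard_like),
                      ("not_analyzed", not_analyzed),
                      ("whitespace", whitespace)]:
    print(name, terms_buckets(docs, analyze))
```

Only the whitespace variant both splits the value into separate keys and preserves the case of "zh_CN".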

Here is the code I used to test both examples:

http://sense.qbox.io/gist/a7b3c7d50c7012537c50d576d03940b28b5f8793

edited Mar 13, 2015 at 19:38

answered Mar 13, 2015 at 19:29

Sloan Ahrens


2 Comments

curious1 Over a year ago

Sloan, thanks for your input! Your mapping is not going to work for me, because the value "en zh_CN" actually has two elements, "en" and "zh_CN", and they should be two bucket keys. With your mapping, I get a single bucket key such as "en zh_CN" in the aggregation results.

Sloan Ahrens Over a year ago

Yeah, I just added another example to my answer that might work for you.
