Hello, we ran into a problem with a lowercase normalizer in aggregation queries. Our initial mapping looks like this:

"mappings": {
    "properties": {
      "keyword_value": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      }
    }
  }

An aggregation query returns the aggregation results (sum, count, etc.) with keyword_value as the bucket key, but the key is in lower case.

The issue is that we want to retrieve the keyword_value value in its original case.

If we run a basic search, we do get the data from the keyword_value field in its original case.

We have a couple of approaches in mind. One is to make an additional query to retrieve the original values (this could affect our performance). Another is to update the mapping with a new field without the normalizer and populate that field with an additional query (not suitable for us, since we don't want to reindex the data).

Could you please suggest the best way to retrieve keyword_value in its original case? Can we somehow bypass the lowercase normalizer at query time? And why does the aggregation return the key in lower case while a basic query returns the original?

  • How exactly is this Java related? Seeing as you have "normalizer": "lowercase_normalizer", are you really surprised that you get lowercase values? Maybe it's worth looking at other Elasticsearch questions about the same topic, like: stackoverflow.com/questions/51664234/… Commented Apr 2 at 6:38
  • Removed the java tag, thanks for pointing that out. Yes, I have little experience with Elasticsearch and am currently working with existing code. Thanks for sharing the topic, but as stated in the question, we already have a similar approach in mind and want to avoid updating the existing index mapping. Commented Apr 2 at 7:01

1 Answer

The approach you mention, "update mapping with a new field without normalizer", is the most efficient way for your use case, for the following reasons:

  1. It is easy to implement.

  2. You don't need to create a new index and reindex into it.

    1. New documents will have both keyword_value and keyword_value.original.

    2. For existing documents, use an _update_by_query API call.

  3. It gives better search speed compared with the other solutions.
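For comparison, one of the "other solutions" that leaves the mapping untouched is a top_hits sub-aggregation, which returns one source document per bucket inside the same request. The Python sketch below only builds such a request body as a dict (it mirrors what you would send to _search) and reads the original value back from a response. The names by_value and sample and both helper functions are hypothetical, chosen for this illustration. This variant probably costs more per bucket, because it has to fetch _source, and when a bucket holds several spellings (for example "MuSaB" and "musab") it returns only one of them.

```python
import json


def build_agg_with_original(field: str = "keyword_value") -> dict:
    """Terms aggregation plus a top_hits sub-aggregation (one source doc per bucket).

    "by_value" and "sample" are hypothetical aggregation names for this sketch.
    """
    return {
        "size": 0,
        "aggs": {
            "by_value": {
                "terms": {"field": field},
                "aggs": {
                    "sample": {
                        # Return a single document and only the field we care about.
                        "top_hits": {"size": 1, "_source": {"includes": [field]}}
                    }
                },
            }
        },
    }


def original_values(response: dict, field: str = "keyword_value") -> dict:
    """Map each normalized bucket key to one original-case value from _source."""
    result = {}
    for bucket in response["aggregations"]["by_value"]["buckets"]:
        hits = bucket["sample"]["hits"]["hits"]
        if hits:
            result[bucket["key"]] = hits[0]["_source"][field]
    return result


if __name__ == "__main__":
    # Print the body so it can be pasted after "GET <index>/_search".
    print(json.dumps(build_agg_with_original(), indent=2))
```

The dedicated sub-field described next avoids both drawbacks, since the aggregation then reads the un-normalized terms directly.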

Here is how to add the sub-field and backfill existing documents:

PUT test_index_lowercase
{
  "mappings": {
    "properties": {
      "keyword_value": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      }
    }
  },
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase","asciifolding"]
        }
      }
    }
  }
}

PUT test_index_lowercase/_doc/1
{
  "keyword_value": "MuSaB"
}

PUT test_index_lowercase/_doc/2
{
  "keyword_value": "musab"
}

GET test_index_lowercase/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "keyword_value"
      }
    }
  }
}

PUT test_index_lowercase/_mapping
{
  "properties": {
    "keyword_value": {
      "type": "keyword",
      "normalizer": "lowercase_normalizer",
      "fields": {
        "original": {
          "type": "keyword"
        }
      }
    }
  }
}

# Re-process existing documents in place so they pick up the new "original" sub-field
POST test_index_lowercase/_update_by_query?conflicts=proceed

GET test_index_lowercase/_search
{
  "size": 0,
  "aggs": {
    "1": {
      "terms": {
        "field": "keyword_value"
      }
    },
    "2": {
      "terms": {
        "field": "keyword_value.original"
      }
    }
  }
}
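
To see why the two aggregations above differ: a terms aggregation groups on the indexed terms, which the normalizer has already rewritten, whereas a plain search returns the untouched _source. The following is a small Python simulation of that behavior, not Elasticsearch code. The normalize function is only a rough approximation of the lowercase and asciifolding filters, and both function names are made up for this sketch.

```python
import unicodedata
from collections import Counter


def normalize(value: str) -> str:
    """Rough stand-in for the "lowercase" + "asciifolding" filters (an approximation)."""
    decomposed = unicodedata.normalize("NFKD", value.lower())
    # Drop combining marks so accented letters fold to their base letter.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def terms_buckets(values, normalizer=None):
    """Count values the way a terms aggregation groups indexed terms."""
    indexed = [normalizer(v) if normalizer else v for v in values]
    return dict(Counter(indexed))


# The two documents indexed in the example above.
docs = ["MuSaB", "musab"]

print(terms_buckets(docs, normalize))  # simulates the agg on keyword_value
print(terms_buckets(docs))             # simulates the agg on keyword_value.original
```

With the two sample documents, the first call yields a single "musab" bucket with a count of 2, while the second keeps "MuSaB" and "musab" as separate buckets with a count of 1 each, which is what aggregations "1" and "2" should return.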