Elasticsearch sort alphabetically then numerically

Question

I am looking for an elegant way to sort my results alphabetically first and then numerically.

My current solution inserts a "~" before values that start with a number, using the sort script below ("~" sorts lexicographically after "z"):

"sort": {
  "_script": {
    "script": "s = doc['name.raw'].value; n = org.elasticsearch.common.primitives.Ints.tryParse(s.split(' ')[0][0]); if (n != null) { '~' + s } else { s }",
    "type": "string"
  }
}

but I wonder if there is a more elegant and perhaps more performant solution.

Input:

ZBA ABC ...
ABC SDK ...
123 RIU ...
12B BTE ...
11J TRE ...
BCA 642 ...

Desired output:

ABC SDK ...
BCA 642 ...
ZBA ABC ...
11J TRE ...
12B BTE ...
123 RIU ...

Val · Accepted Answer · 2016-06-04 06:10:49Z

You can do the same thing at indexing time using a custom analyzer that leverages a pattern_replace character filter. Doing the work once at indexing time is more performant than running a script sort at search time for every query.

It works in the same vein as your solution: every digit in the value gets a tilde ~ prepended and everything else is left untouched, but this happens at indexing time and the resulting value is indexed in the name.sort sub-field.

PUT /tests
{
  "settings": {
    "analysis": {
      "char_filter": {
        "pre_num": {
          "type": "pattern_replace",
          "pattern": "(\\d)",
          "replacement": "~$1"
        }
      },
      "analyzer": {
        "number_tagger": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [],
          "char_filter": [
            "pre_num"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "sort": {
              "type": "string",
              "analyzer": "number_tagger",
              "search_analyzer": "standard"
            }
          }
        }
      }
    }
  }
}
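To see what the pre_num character filter above does to each value, here is a minimal Python sketch that mimics it outside Elasticsearch. The function name `sort_key` is made up for illustration, and the sketch assumes that plain string comparison in Python orders these ASCII values the same way Elasticsearch orders the indexed terms:

```python
import re


def sort_key(name: str) -> str:
    """Mimic the pre_num char filter: put '~' in front of every digit."""
    return re.sub(r"(\d)", r"~\1", name)


names = ["ZBA ABC", "ABC SDK", "123 RIU", "12B BTE", "11J TRE", "BCA 642"]

# Show the value that would end up in the name.sort sub-field.
for name in names:
    print(f"{name!r} -> {sort_key(name)!r}")

# Sort the original names by their transformed value.
ordered = sorted(names, key=sort_key)
print(ordered)
```

Because "~" compares higher than any ASCII letter or digit, a tagged digit sorts after a letter in the same position, while the original name stays untouched.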

Then you can index your data:

POST /tests/test/_bulk
{"index": {}}
{"name": "ZBA ABC"}
{"index": {}}
{"name": "ABC SDK"}
{"index": {}}
{"name": "123 RIU"}
{"index": {}}
{"name": "12B BTE"}
{"index": {}}
{"name": "11J TRE"}
{"index": {}}
{"name": "BCA 642"}

Then your query can simply look like this:

POST /tests/_search
{
  "sort": {
    "name.sort": "asc"
  }
}

And the response you'll get looks like this (abridged):

{
  "hits": {
    "hits": [
      {
        "_source": {
          "name": "ABC SDK"
        }
      },
      {
        "_source": {
          "name": "BCA 642"
        }
      },
      {
        "_source": {
          "name": "ZBA ABC"
        }
      },
      {
        "_source": {
          "name": "11J TRE"
        }
      },
      {
        "_source": {
          "name": "12B BTE"
        }
      },
      {
        "_source": {
          "name": "123 RIU"
        }
      }
    ]
  }
}
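One detail that is easy to miss: tagging every digit is not quite the same as prepending a single tilde to the whole value. The following Python sketch compares the two on the numeric-looking names from the question. Both helper names are hypothetical, and `whole_value_key` is only an approximation of the question's script, not a port of it:

```python
import re


def whole_value_key(name: str) -> str:
    """Approximate the question's script: one '~' if the value starts with a digit."""
    return "~" + name if name[:1].isdigit() else name


def per_digit_key(name: str) -> str:
    """Mimic the pattern_replace filter: '~' in front of every digit."""
    return re.sub(r"\d", r"~\g<0>", name)


names = ["123 RIU", "12B BTE", "11J TRE"]

by_whole_value = sorted(names, key=whole_value_key)
by_digit = sorted(names, key=per_digit_key)

print(by_whole_value)  # ['11J TRE', '123 RIU', '12B BTE']
print(by_digit)        # ['11J TRE', '12B BTE', '123 RIU']
```

With a single leading tilde, "3" still compares lower than "B", so 123 lands before 12B. With a tilde on every digit, letters win at each position, which is the order shown in the desired output.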

edited Jun 4, 2016 at 6:10

answered Jun 4, 2016 at 5:21

Val


14 Comments

dimartiro Over a year ago

I like the idea of making the change at indexing time, but is there no smarter solution than adding a tilde before the number? That fails to convince me.

Val Over a year ago

The tilde is only added to a field that you use solely for sorting. Lexicographical sorting is what it is. The original input is not changed at all. The solution is the same as yours, just carried out at indexing time instead of relying on costly scripting at search time.

dimartiro Over a year ago

Yes, I understand that, but I would like to find a way other than storing another field with the tilde (~) before the number.

Val Over a year ago

May I ask what bothers you about this solution?

Val Over a year ago

I can provide another solution that computes a sort number, but it still needs another field just for sorting, since you want to change how lexicographic sorting works.
