Elasticsearch sort based on the number of occurrences a string appears in an array

Question

I have an array field containing a list of strings, e.g. ["NY", "CA"].

At search time I have a filter which matches any of the strings in the array.

I would like to sort the results so that the documents with the most occurrences of the searched string ("NY") come first.

Results should include:

document 1: ["CA", "NY", "NY"]
document 2: ["NY", "FL"]
document 3: ["NY", "CA", "NY", "NY"]

Results should be ordered as follows:

document 3, document 1, document 2

Is this possible? If so, how?

I have this problem right now, and I think that in practice it will sort based on term frequencies if other documents have "CA" but not "NY".
– Henley Wing Chiu, Commented Sep 14, 2013 at 15:26

brupm · Accepted Answer · 2013-03-12 22:04:14Z

For those curious, I was not able to boost based on how many times the word occurs in the array. I did, however, accomplish what I needed with the following:

curl -X POST "http://localhost:9200/index/document/1" -d '{"id":1,"states_ties":["CA"],"state_abbreviation":"CA","worked_in_states":["CA"],"training_in_states":["CA"]}'
curl -X POST "http://localhost:9200/index/document/2" -d '{"id":2,"states_ties":["CA","NY"],"state_abbreviation":"FL","worked_in_states":["NY","CA"],"training_in_states":["NY","CA"]}'
curl -X POST "http://localhost:9200/index/document/3" -d '{"id":3,"states_ties":["CA","NY","FL"],"state_abbreviation":"NY","worked_in_states":["NY","CA"],"training_in_states":["NY","FL"]}'

curl -X GET 'http://localhost:9200/index/_search?per_page=10&pretty' -d '{
  "query": {
    "custom_filters_score": {
      "query": {
        "terms": {
          "states_ties": [
            "CA"
          ]
        }
      },
      "filters": [
        {
          "filter": {
            "term": {
              "state_abbreviation": "CA"
            }
          },
          "boost": 1.03
        },
        {
          "filter": {
            "terms": {
              "worked_in_states": [
                "CA"
              ]
            }
          },
          "boost": 1.02
        },
        {
          "filter": {
            "terms": {
              "training_in_states": [
                "CA"
              ]
            }
          },
          "boost": 1.01
        }
      ],
      "score_mode": "multiply"
    }
  },
  "sort": [
    {
      "_score": "desc"
    }
  ]
}'

Results (id: score):

1: 0.75584483
2: 0.73383
3: 0.7265643
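
As a rough cross-check of those scores, here is a minimal Python sketch of how the matching boosts would combine, assuming that "score_mode": "multiply" multiplies together the boost of every filter a document matches and applies the product to the base query score. The helper name combined_boost and the in-memory docs dictionary are hypothetical and only mirror the three documents indexed above; this is not Elasticsearch code.

```python
from math import prod

# Documents as indexed by the curl commands above (only the fields the filters touch).
docs = {
    1: {"state_abbreviation": "CA", "worked_in_states": ["CA"], "training_in_states": ["CA"]},
    2: {"state_abbreviation": "FL", "worked_in_states": ["NY", "CA"], "training_in_states": ["NY", "CA"]},
    3: {"state_abbreviation": "NY", "worked_in_states": ["NY", "CA"], "training_in_states": ["NY", "FL"]},
}

# (field, boost) pairs copied from the "filters" section of the query.
BOOSTS = [
    ("state_abbreviation", 1.03),
    ("worked_in_states", 1.02),
    ("training_in_states", 1.01),
]


def combined_boost(doc, term):
    """Hypothetical helper: multiply the boosts of every filter the document matches."""
    matched = []
    for field, boost in BOOSTS:
        value = doc[field]
        values = value if isinstance(value, list) else [value]
        if term in values:
            matched.append(boost)
    # prod([]) is 1, so a document matching no filter keeps its base score.
    return prod(matched)


factors = {doc_id: combined_boost(doc, "CA") for doc_id, doc in docs.items()}
ranking = sorted(factors, key=factors.get, reverse=True)
print(factors)
print(ranking)
```

Under that assumption the factors come out as 1.03 × 1.02 × 1.01 for document 1, 1.02 × 1.01 for document 2 and 1.02 for document 3. That agrees with the reported scores, whose ratios are about 1.03 (document 1 to document 2) and 1.01 (document 2 to document 3). In other words the ordering here reflects how many fields contain "CA", not how many times "CA" occurs within one array.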

femtoRgon · Accepted Answer · 2013-03-11 16:19:43Z


This would be accomplished by the standard Lucene scoring implementation. If you simply search for "NY" without specifying an order, results are sorted by relevance, and a document with more occurrences of the term gets a higher relevance score, all else being equal.
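
To illustrate the idea on the question's three documents, here is a minimal Python sketch of the term-frequency part of that scoring only. It assumes the classic Lucene similarity, where the tf component is the square root of the number of occurrences, and it deliberately ignores idf and field-length normalization, which also affect real scores. The tf helper and the docs dictionary are hypothetical names for illustration.

```python
from math import sqrt

# The three documents from the question.
docs = {
    1: ["CA", "NY", "NY"],
    2: ["NY", "FL"],
    3: ["NY", "CA", "NY", "NY"],
}


def tf(values, term):
    """Hypothetical helper: tf component as sqrt(number of occurrences)."""
    return sqrt(values.count(term))


scores = {doc_id: tf(values, "NY") for doc_id, values in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # [3, 1, 2]
```

With only term frequency considered, the order is document 3, document 1, document 2, which is the order the question asks for. In a real index, the length normalization penalizes longer fields, so the actual ranking can differ.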


4 Comments

brupm Over a year ago

Not for a filter query, I have added supporting code to the question.

femtoRgon Over a year ago

Ah, I see. I don't believe you can do that though. Filtering does what it says, it filters. Either a doc gets through the filter or it doesn't. It simply restricts the result set. I don't believe there is any concept allowing you to determine that doc1 passes a filter better than doc2. I would suggest that using a filter is the wrong way to approach your problem.

brupm Over a year ago

gist.github.com/brupm/5138787 here's the supporting code. But I believe femtoRgon is correct.

brupm Over a year ago

Also, even when using query_string searches, the score only seems to calculate properly if I search across the entire doc: gist.github.com/brupm/5138842
