I have an array field containing a list of strings, e.g. ["NY", "CA"].

At search time I have a filter which matches any of the strings in the array.

I would like to sort the results so that documents with the most occurrences of the searched string ("NY") come first.

Results should include:

document 1: ["CA", "NY", "NY"]
document 2: ["NY", "FL"]
document 3: ["NY", "CA", "NY", "NY"]

Results should be ordered as follows:

document 3, document 1, document 2

Is this possible? If so, how?
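
To pin down the target ordering, here is a plain Python sketch that computes it client-side: count how many times the searched value appears in each document's array and sort by that count, highest first. It only illustrates the desired result and is not an Elasticsearch feature. The function name `rank_by_occurrences` is hypothetical; the documents are the three from the example above.

```python
from typing import Dict, List


def rank_by_occurrences(docs: Dict[int, List[str]], term: str) -> List[int]:
    """Return document IDs ordered by how often `term` appears in each array.

    Documents that do not contain the term at all are dropped, as a filter would drop them.
    """
    counts = {doc_id: values.count(term) for doc_id, values in docs.items()}
    matching = [doc_id for doc_id, n in counts.items() if n > 0]
    # sorted() is stable (also with reverse=True), so equal counts keep their input order.
    return sorted(matching, key=lambda doc_id: counts[doc_id], reverse=True)


docs = {
    1: ["CA", "NY", "NY"],
    2: ["NY", "FL"],
    3: ["NY", "CA", "NY", "NY"],
}

print(rank_by_occurrences(docs, "NY"))  # [3, 1, 2]
```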

  • I have this problem right now, and I think in practice it will sort by term frequency only if other documents have "CA" but not "NY". Commented Sep 14, 2013 at 15:26

2 Answers

For those curious: I was not able to boost based on how many times the term occurs in the array. I did, however, accomplish what I needed with the following:

curl -X POST "http://localhost:9200/index/document/1" -d '{"id":1,"states_ties":["CA"],"state_abbreviation":"CA","worked_in_states":["CA"],"training_in_states":["CA"]}'
curl -X POST "http://localhost:9200/index/document/2" -d '{"id":2,"states_ties":["CA","NY"],"state_abbreviation":"FL","worked_in_states":["NY","CA"],"training_in_states":["NY","CA"]}'
curl -X POST "http://localhost:9200/index/document/3" -d '{"id":3,"states_ties":["CA","NY","FL"],"state_abbreviation":"NY","worked_in_states":["NY","CA"],"training_in_states":["NY","FL"]}'

curl -X GET 'http://localhost:9200/index/_search?size=10&pretty' -d '{
  "query": {
    "custom_filters_score": {
      "query": {
        "terms": {
          "states_ties": [
            "CA"
          ]
        }
      },
      "filters": [
        {
          "filter": {
            "term": {
              "state_abbreviation": "CA"
            }
          },
          "boost": 1.03
        },
        {
          "filter": {
            "terms": {
              "worked_in_states": [
                "CA"
              ]
            }
          },
          "boost": 1.02
        },
        {
          "filter": {
            "terms": {
              "training_in_states": [
                "CA"
              ]
            }
          },
          "boost": 1.01
        }
      ],
      "score_mode": "multiply"
    }
  },
  "sort": [
    {
      "_score": "desc"
    }
  ]
}'
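
The query above uses custom_filters_score, which belongs to the early (0.90-era) query DSL and was later replaced by the function_score query. As a rough sketch of how the same boosts might be expressed there, assuming Elasticsearch 1.4 or later (where each function can carry a `weight`), the Python below only builds and prints the request body. Field names and boost values are copied from the query above; the helper name `function_score_body` is hypothetical, and nothing is sent to a cluster.

```python
import json


def function_score_body(state: str) -> dict:
    """Build a function_score request body mirroring the custom_filters_score query.

    Each function applies its weight only to documents matching its filter;
    score_mode "multiply" multiplies the weights of all matching functions, and
    boost_mode "multiply" multiplies that product into the query score.
    """
    weighted_filters = [
        ({"term": {"state_abbreviation": state}}, 1.03),
        ({"terms": {"worked_in_states": [state]}}, 1.02),
        ({"terms": {"training_in_states": [state]}}, 1.01),
    ]
    return {
        "query": {
            "function_score": {
                "query": {"terms": {"states_ties": [state]}},
                "functions": [{"filter": f, "weight": w} for f, w in weighted_filters],
                "score_mode": "multiply",
                "boost_mode": "multiply",
            }
        }
    }


print(json.dumps(function_score_body("CA"), indent=2))
```

Results come back sorted by `_score` descending by default, so the explicit `sort` clause is not needed in this variant.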

Results of the custom_filters_score query (id: score):

1: 0.75584483
2: 0.73383
3: 0.7265643
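
With score_mode set to multiply, a document's query score is multiplied by the product of the boosts of every filter it matches. The Python sketch below redoes that bookkeeping for the three indexed documents (field values copied from the curl commands above; `DOCS` and `combined_boost` are illustrative names). It deliberately leaves out the underlying query score, so it reproduces the ordering rather than the exact scores.

```python
from math import prod

# The three indexed documents, reduced to the fields the filters look at.
DOCS = {
    1: {"state_abbreviation": "CA", "worked_in_states": ["CA"], "training_in_states": ["CA"]},
    2: {"state_abbreviation": "FL", "worked_in_states": ["NY", "CA"], "training_in_states": ["NY", "CA"]},
    3: {"state_abbreviation": "NY", "worked_in_states": ["NY", "CA"], "training_in_states": ["NY", "FL"]},
}


def combined_boost(doc: dict, state: str) -> float:
    """Multiply the boosts of every filter the document matches (score_mode: multiply)."""
    boosts = []
    if doc["state_abbreviation"] == state:  # term filter, boost 1.03
        boosts.append(1.03)
    if state in doc["worked_in_states"]:  # terms filter, boost 1.02
        boosts.append(1.02)
    if state in doc["training_in_states"]:  # terms filter, boost 1.01
        boosts.append(1.01)
    return prod(boosts)  # prod([]) == 1, so a document matching no filter keeps its query score


boosts = {doc_id: combined_boost(doc, "CA") for doc_id, doc in DOCS.items()}
ranking = sorted(boosts, key=boosts.get, reverse=True)
print({doc_id: round(b, 4) for doc_id, b in boosts.items()})  # {1: 1.0611, 2: 1.0302, 3: 1.02}
print(ranking)  # [1, 2, 3]
```

This ranks documents by which fields mention the state, not by how many times the state occurs in a single array; that is the workaround this answer settles for.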

Lucene's standard scoring already does this. If you simply query for "NY" without specifying a sort order, results are sorted by relevance, and, all else being equal, a document with more occurrences of the term gets a higher score.
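
As a rough illustration of "more occurrences score higher", the sketch below applies two factors from Lucene's classic TF-IDF similarity (the Elasticsearch default before 5.0): tf = sqrt(freq) and the length norm 1/sqrt(number of terms in the field). It assumes the array is indexed as one multi-valued field whose term frequencies and norms are recorded (keyword fields, for example, default to `index_options: docs`, which drops frequencies). It ignores idf (the same for every document when a single term is searched), query normalization, boosts, and the lossy one-byte encoding of norms. Newer versions default to BM25, which also grows with term frequency but saturates. The function name `classic_tf_norm` is hypothetical.

```python
from math import sqrt
from typing import Dict, List


def classic_tf_norm(values: List[str], term: str) -> float:
    """tf(freq) * lengthNorm under Lucene's classic TF-IDF similarity (simplified)."""
    freq = values.count(term)
    tf = sqrt(freq)  # classic tf: square root of the raw term frequency
    length_norm = 1 / sqrt(len(values))  # shorter fields weigh more
    return tf * length_norm


docs: Dict[int, List[str]] = {
    1: ["CA", "NY", "NY"],
    2: ["NY", "FL"],
    3: ["NY", "CA", "NY", "NY"],
}

scores = {doc_id: classic_tf_norm(v, "NY") for doc_id, v in docs.items()}
for doc_id in sorted(scores, key=scores.get, reverse=True):
    print(doc_id, round(scores[doc_id], 3))
# 3 0.866
# 1 0.816
# 2 0.707
```

Even with the length norm penalizing longer arrays, this comes out as 3, 1, 2, the order asked for in the question. None of it applies to filters, which do not score.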

4 Comments

Not with a filter query. I have added supporting code to the question.
Ah, I see. I don't believe you can do that, though. Filtering does what it says: it filters. Either a document gets through the filter or it doesn't; it simply restricts the result set. I don't believe there is any concept that lets you determine that doc1 passes a filter better than doc2. I would suggest that a filter is the wrong way to approach your problem.
Here's the supporting code: gist.github.com/brupm/5138787. But I believe femtoRgon is correct.
Also, even when using query_string searches, the score only seems to calculate properly if I search across the entire doc: gist.github.com/brupm/5138842
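
To make the filter-versus-query distinction concrete: in the current Elasticsearch query DSL, a clause under `bool.filter` only includes or excludes documents and contributes nothing to the score, while the same clause under `bool.must` is scored, so term frequency can affect the order. The sketch builds both bodies in Python. The field name `states` is hypothetical (the question does not name its field), and it is assumed to be a keyword field mapped with `index_options: freqs`, since keyword fields do not record term frequencies by default.

```python
import json

FIELD = "states"  # hypothetical field name; assumed keyword with index_options: freqs


def filtered_body(value: str) -> dict:
    """Filter context: documents either match or not; the clause adds nothing to _score."""
    return {"query": {"bool": {"filter": [{"term": {FIELD: value}}]}}}


def scored_body(value: str) -> dict:
    """Query context: the same term clause is scored, so term frequency can affect ranking."""
    return {"query": {"bool": {"must": [{"term": {FIELD: value}}]}}}


print(json.dumps(filtered_body("NY")))
print(json.dumps(scored_body("NY")))
```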
