
I'm trying to use Elasticsearch to build a search in which distance from a center point influences relevance.

I don't want to simply sort on distance from a point, which I know is possible, because I want relevance based on the searched query to also affect results.

I'd like to pass in a search string, say "coffee", and a lat/lon, say "38, -77", and get my results ordered by a combination of how related they are to "coffee" and how close they are to "38, -77".

Thanks!

2 Answers


The recently (0.90.4) added function_score query type adds support for ranking based on distance. This is an alternative to the custom score query type.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html

An example lifted from there:

"query": {
  "function_score": {
    "functions": [
      { "gauss":  { "loc":   { "origin": "51,0", "scale": "5km" }}}
    ]
  }
}

This applies a decay function (there are several) to a field ("loc"), scoring each document by its distance from an origin at a given scale. This is exactly what you'd want for distance ranking, since it gives you a lot of flexibility in how results are ranked without writing custom scripts.
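To make the decay concrete, here is a minimal Python sketch of the Gaussian decay as the function_score documentation describes it, plus a helper that builds a request body combining a text match with the distance decay. The helper names (`gauss_decay`, `build_query`) and the text field `name` are assumptions for illustration, not part of the original example; the defaults `decay=0.5` and `offset=0` follow the documented defaults.

```python
import math


def gauss_decay(distance_km, scale_km, offset_km=0.0, decay=0.5):
    """Gaussian decay: 1.0 at the origin, `decay` at distance offset + scale."""
    # sigma^2 is chosen so the score equals `decay` exactly one scale away.
    sigma_sq = -(scale_km ** 2) / (2.0 * math.log(decay))
    d = max(0.0, distance_km - offset_km)
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))


def build_query(text, lat, lon, scale="5km", text_field="name", geo_field="loc"):
    """Build a function_score body: text relevance combined with distance decay.

    `text_field` and `geo_field` are hypothetical field names; the geo field
    is assumed to be mapped as a geo_point.
    """
    return {
        "query": {
            "function_score": {
                # Text relevance comes from an ordinary match query.
                "query": {"match": {text_field: text}},
                # By default the function result is multiplied into that score.
                "functions": [
                    {"gauss": {geo_field: {"origin": f"{lat},{lon}", "scale": scale}}}
                ],
            }
        }
    }


body = build_query("coffee", 38, -77)
```

With a scale of 5km, a document at the origin keeps its full text score, while one 5 km away has its score halved. Shrinking the scale makes distance dominate; growing it lets text relevance dominate.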


2 Comments

What kind of field is "loc"? Can you explain a little more about it? I have something like "loc" : "12.5,65.5" and it doesn't seem to be working.
This might have changed since I wrote the answer; there has been a lot of change around geo features in recent Elasticsearch versions, and 0.90.4 was a long time ago.

You can use the distance function in the script of a Custom Score Query to modify _score based on the distance from a center point.

7 Comments

Would those weights be applied after the actual search? The reason I ask is because if the limit on the search was 100 results and the total matching results was 1000, then some very close results could be left out if they came after the first 100.
These weights would be applied after the search but before the retrieval. First search is performed and 1000 (in your example) results are collected. For every collected result the relevancy score is calculated using provided script, and top 100 results are retained. When all 1000 records from the search results are processed the top 100 records are retrieved.
Thanks so much for your help! Is this how elasticsearch always works? It seems for some queries that include a large amount of "fuzziness", the potential search set could include millions of results with low relevancy. How does elasticsearch know when to stop looking?
This is how elasticsearch search works in most cases. If you think about it, to find 10 most relevant records out of millions of results, you have to calculate relevancy score for these millions of results. There are some ways to shortcut this process (by using routing, scan search type, limit filter, etc.), but I cannot fit all of them into 600 characters of a comment. Moreover I don't think it's a good idea to hijack this question with somewhat irrelevant discussion. So, I would propose moving this discussion to a separate question or elasticsearch mailing list.
Ah yes, I apologize. Posted a new question: stackoverflow.com/questions/13114958/… Thanks again for all your help!
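The collection process described in the comments above can be sketched in Python as follows. This is only an illustration of the idea (score every match, retain the top N), not Elasticsearch's actual implementation; the `top_n` helper, the `(doc_id, text_score, distance_km)` tuple layout, and the rescoring formula are all assumptions.

```python
import heapq


def top_n(hits, rescore, n):
    """Score every matching hit, then keep only the n highest-scoring ones.

    hits    -- iterable of (doc_id, text_score, distance_km) tuples (hypothetical layout)
    rescore -- function combining text_score and distance_km into a final score
    """
    scored = (
        (rescore(text_score, distance_km), doc_id)
        for doc_id, text_score, distance_km in hits
    )
    # nlargest consumes all hits, so a strong late hit cannot be cut off early.
    return heapq.nlargest(n, scored)


def distance_weighted(text_score, distance_km):
    """Example rescoring: dampen the text score as distance grows."""
    return text_score / (1.0 + distance_km)
```

Because every match is rescored before the cut, a nearby document that is the 1000th match still lands in the top 100 if its combined score is high enough.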