I'm not getting the expected result when using a phrase in a query_string query in Elasticsearch.

Let's say I have a title, 'john wayne goes to manhattan'. I've indexed the title field with the 'standard' analyzer, and the following is my query. With or without the fuzzy indicator (~), it finds nothing unless 'john wayne' is spelled correctly. There are no results for 'john wane' or similar.

"query": {
  "query_string": {
    "fields": ["title^2"],
    "query": "\"john wayne\"~1",
    "default_operator": "AND",
    "phrase_slop": 0,
    "minimum_should_match": "100%"
  }
}

I've tried altering the number after the tilde to increase the fuzziness, but still get no matches.

Any ideas?

1 Answer

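Doing a fuzzy search on a phrase is actually a "proximity" search. Instead of measuring the Levenshtein distance between letters, it measures the proximity between terms in the query. In other words, the number after the tilde on a quoted phrase is a slop (how far apart the terms may be), not an edit distance, so a misspelled term inside the phrase still has to match exactly.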

Your query should return results if it were:

"query" : "john wane~1" 

See here for more info on the difference: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_fuzziness
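
To make the distinction concrete, here is a minimal Python sketch of the edit distance that term-level fuzziness is based on. It is only an illustration, not Elasticsearch's implementation: the function name `levenshtein` is my own, and it models plain Levenshtein distance, whereas Elasticsearch by default also counts an adjacent transposition as a single edit.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


# "wane" is one insertion away from "wayne", so wane~1 can match it.
print(levenshtein("wane", "wayne"))
```

A phrase followed by `~1`, by contrast, never looks at characters at all; it only relaxes how far apart the exactly matching terms may sit.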

Edit:

Here is a concrete example that recreates the problem.

Create some docs:

curl -XPUT "http://localhost:9200/test/test/1" -d'
{
    "message" : "My best friend is John Wayne, who is yours?"
}'

curl -XPUT "http://localhost:9200/test/test/2" -d'
{
    "message" : "My best friend is John Marion Wayne, who is yours?"
}'

curl -XPUT "http://localhost:9200/test/test/3" -d'
{
    "message" : "My best friend is John Marion Mitchell Wayne, who is yours?"
}'

A naive sample query, not a phrase:

curl -XGET "http://localhost:9200/_search" -d'
{
    "query" : {
        "query_string": {
           "query": "john AND wane~1"
        }
    }
}'

Here is how to do the phrase query with span queries. Notice that the terms are lowercased, because span_term queries are not analyzed. You can also adjust the span slop to control how close to each other the terms must be.

curl -XGET "http://localhost:9200/_search" -d'
{
    "query" : {
        "span_near" : {
            "clauses" : [
                { "span_term" : { "message" : "john" } },
                { "span_term" : { "message" : "wayne" } }
            ],
            "slop" : 0,
            "in_order" : true
        }
    }
}'
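
To see what the slop does with the three sample docs, here is a rough Python simulation. It is a sketch under assumptions, not Elasticsearch itself: `tokenize` is a crude stand-in for the standard analyzer, `near_in_order` and `matching_ids` are hypothetical helpers, and slop is modelled as the number of tokens allowed between the two terms when they appear in order.

```python
import re

# The three documents indexed above, keyed by id.
DOCS = {
    1: "My best friend is John Wayne, who is yours?",
    2: "My best friend is John Marion Wayne, who is yours?",
    3: "My best friend is John Marion Mitchell Wayne, who is yours?",
}


def tokenize(text):
    # Rough approximation of the standard analyzer: lowercase and split on non-word characters.
    return re.findall(r"\w+", text.lower())


def near_in_order(tokens, first, second, slop):
    """True if `second` follows `first` with at most `slop` tokens between them."""
    firsts = [i for i, t in enumerate(tokens) if t == first]
    seconds = [i for i, t in enumerate(tokens) if t == second]
    return any(0 <= s - f - 1 <= slop for f in firsts for s in seconds)


def matching_ids(slop):
    # Ids of the docs where "john" is followed by "wayne" within the given slop.
    return [doc_id for doc_id, text in DOCS.items()
            if near_in_order(tokenize(text), "john", "wayne", slop)]


for slop in (0, 1, 2):
    print(slop, matching_ids(slop))
```

Under this model, slop 0 matches only doc 1, slop 1 adds doc 2, and slop 2 adds doc 3. None of these settings would help with 'wane', because the terms themselves must still match exactly.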

And now here is exactly what you are looking for: a phrase query in which each term is matched fuzzily.

curl -XGET "http://localhost:9200/_search" -d'
{
    "query" : {
        "span_near" : {
            "clauses" : [
                {
                    "span_multi" : {
                        "match" : {
                            "fuzzy" : {
                                "message" : {
                                    "value" : "john",
                                    "fuzziness" : "1"
                                }
                            }
                        }
                    }
                },
                {
                    "span_multi" : {
                        "match" : {
                            "fuzzy" : {
                                "message" : {
                                    "value" : "wane",
                                    "fuzziness" : "1"
                                }
                            }
                        }
                    }
                }
            ],
            "slop" : 0,
            "in_order" : true
        }
    }
}'
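
If you need this for arbitrary phrases, the request body can be generated rather than written by hand. The following Python sketch assumes a hypothetical helper named `fuzzy_phrase_query`; it only builds the same JSON structure as the query above and does not talk to Elasticsearch. It lowercases and splits on whitespace itself, since the fuzzy clauses are not analyzed.

```python
import json


def fuzzy_phrase_query(field, phrase, fuzziness="1", slop=0):
    """Build a span_near body with one fuzzy span_multi clause per term.

    Hypothetical helper: terms are lowercased and split on whitespace here,
    which only approximates what an analyzer would do at index time.
    """
    clauses = [
        {"span_multi": {"match": {"fuzzy": {
            field: {"value": term, "fuzziness": fuzziness}
        }}}}
        for term in phrase.lower().split()
    ]
    return {"query": {"span_near": {
        "clauses": clauses,
        "slop": slop,
        "in_order": True,
    }}}


body = fuzzy_phrase_query("message", "John Wane")
print(json.dumps(body, indent=4))
```

The printed JSON can be passed to curl with -d, as in the examples above. Raising `slop` would also let the fuzzy terms match with other words between them.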

6 Comments

doesn't that just make 'wane' in this case the one that is fuzzy? you'd have to do "john~1 wane~1", but then you're gonna get variants of 'john' as well which may or may not be desirable. what i'm looking for is the whole phrase 'john wayne' to be fuzzy, but also that it's a phrase, and it doesn't find 'john' in one document and 'wayne' in another.
Yes, it does. I just wanted to show a sample query that does a fuzzy search on a specific term. If you want the entire phrase to be fuzzy on a per-term basis, things get more complex. If you treat your field as a single token, you can do fuzzy queries on that, but they will apply at the level of the entire field. I think your best bet would be: elasticsearch.org/guide/en/elasticsearch/reference/current/…
i'm not sure what that query does. the documentation is limited, imho. it seems to just be saying i can wrap the matches in the span_multi ... ok but what does it do?
Yeah, I get your confusion. The Elasticsearch docs lack context, to say the least. I updated the answer above with a full example.
thank you so much for the complete example! you mentioned they were not analyzed ... so what would you suggest to use for an analyzer if you were to use one?