elasticsearch query string not performing fuzzy query as expected when using phrases

Question

i'm not getting the expected result when using a phrase in the query_string for elasticsearch.

let's say i have a title, 'john wayne goes to manhattan'. i've indexed the title field with a 'standard' analyzer and the following is my query. with or without the fuzzy indicator (~) it won't find anything unless i have 'john wayne' spelled correctly. no results for 'john wane' or similar.

"query": {
  "query_string": {
    "fields": ["title^2"],
    "query": "\"john wayne\"~1",
    "default_operator": "AND",
    "phrase_slop": 0,
    "minimum_should_match": "100%"
  }
}

i've tried altering the number after the tilde to increase the fuzziness, but still no matches.

any ideas?

ppearcy · Accepted Answer · 2014-06-24 20:51:49Z

6

Doing a fuzzy search on a phrase is actually a "proximity" search. Instead of measuring the Levenshtein distance between the letters of a term, it measures the proximity between the terms in the phrase.
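To make the difference concrete, here is a minimal Python sketch (plain Python, not Elasticsearch code). The helper names `levenshtein` and `phrase_gap` are hypothetical. The first counts character edits between two terms; the second counts the tokens sitting between two terms that appear in order, which is only a simplified stand-in for phrase slop. It assumes plain Levenshtein distance and whitespace tokenization rather than a real analyzer.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def phrase_gap(tokens, first, second):
    """Fewest tokens between `first` and a later `second`, or None if no such pair."""
    gaps = [j - i - 1
            for i, t in enumerate(tokens) if t == first
            for j, u in enumerate(tokens) if u == second and j > i]
    return min(gaps) if gaps else None


# Assumption: whitespace split stands in for the analyzed title field.
title = "john wayne goes to manhattan".split()

print(levenshtein("wane", "wayne"))         # character-level distance
print(phrase_gap(title, "john", "wayne"))   # position-level gap, exact terms
print(phrase_gap(title, "john", "wane"))    # None: the token "wane" is not in the title
```

A term-level `~1` can bridge 'wane' and 'wayne' because they are one edit apart, while a phrase-level `~1` only relaxes term positions and still needs the exact token 'wayne' to be present.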

Your query should return results if it were:

"query" : "john wane~1"

See here for more info on the difference: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_fuzziness

Edit:

Here is a concrete example that recreates the problem:

Create some docs

curl -XPUT "http://localhost:9200/test/test/1" -d'
{
    "message" : "My best friend is John Wayne, who is yours?"
}'

curl -XPUT "http://localhost:9200/test/test/2" -d'
{
    "message" : "My best friend is John Marion Wayne, who is yours?"
}'

curl -XPUT "http://localhost:9200/test/test/3" -d'
{
    "message" : "My best friend is John Marion Mitchell Wayne, who is yours?"
}'

Sample naive query, non phrase:

curl -XGET "http://localhost:9200/_search" -d'
{
    "query" : {
        "query_string": {
           "query": "john AND wane~1"
        }
    }
}'

Here is how to do the phrase query with span queries. Notice the terms are lowercased, because the span_term query is not analyzed. You can also adjust the span slop to control how close together the terms must be.

curl -XGET "http://localhost:9200/_search" -d'
{
    "query" : {
        "span_near" : {
            "clauses" : [
                { "span_term" : { "message" : "john" } },
                { "span_term" : { "message" : "wayne" } }
            ],
            "slop" : 0,
            "in_order" : true
        }
    }
}'
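As a rough illustration of what the slop setting changes for the three sample docs, here is a small Python simulation. It is not how Elasticsearch is implemented: `tokenize`, `span_near_in_order` and `matching_ids` are hypothetical helpers, the regex tokenizer is only an approximation of the standard analyzer, and slop is modelled as the number of tokens allowed between the two terms when they appear in order.

```python
import re

docs = {
    1: "My best friend is John Wayne, who is yours?",
    2: "My best friend is John Marion Wayne, who is yours?",
    3: "My best friend is John Marion Mitchell Wayne, who is yours?",
}


def tokenize(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


def span_near_in_order(tokens, first, second, slop):
    """True if `second` follows `first` with at most `slop` tokens in between."""
    starts = [i for i, t in enumerate(tokens) if t == first]
    ends = [j for j, t in enumerate(tokens) if t == second]
    return any(0 <= j - i - 1 <= slop for i in starts for j in ends)


def matching_ids(slop):
    """Ids of the sample docs where 'john' is followed by 'wayne' within `slop` tokens."""
    return [doc_id for doc_id, text in docs.items()
            if span_near_in_order(tokenize(text), "john", "wayne", slop)]


for slop in (0, 1, 2):
    print(slop, matching_ids(slop))
```

Under these assumptions, a slop of 0 keeps only the doc where the two names are adjacent, and each extra unit of slop admits one more intervening middle name.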

And now here is exactly what you are looking for: a phrase query in which each term is matched fuzzily.

curl -XGET "http://localhost:9200/_search" -d'
{
    "query" : {
        "span_near" : {
            "clauses" : [
                {
                    "span_multi" : {
                        "match" : {
                            "fuzzy" : {
                                "message" : {
                                    "value" : "john",
                                    "fuzziness" : "1"
                                }
                            }
                        }
                    }
                },
                {
                    "span_multi" : {
                        "match" : {
                            "fuzzy" : {
                                "message" : {
                                    "value" : "wane",
                                    "fuzziness" : "1"
                                }
                            }
                        }
                    }
                }
            ],
            "slop" : 0,
            "in_order" : true
        }
    }
}'
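Writing one span_multi clause per term by hand gets verbose for longer phrases. Below is a minimal sketch of a helper that builds the same request body from a phrase. The function name `fuzzy_phrase_query` and its parameters are hypothetical, and it assumes that lowercasing and splitting on whitespace is close enough to how the field was analyzed, since the fuzzy clauses themselves are not analyzed.

```python
import json


def fuzzy_phrase_query(field, phrase, fuzziness="1", slop=0):
    """Build a span_near body with one fuzzy span_multi clause per term of `phrase`."""
    clauses = [
        {
            "span_multi": {
                "match": {
                    "fuzzy": {field: {"value": term, "fuzziness": fuzziness}}
                }
            }
        }
        # Assumption: lowercase + whitespace split approximates the index-time analysis.
        for term in phrase.lower().split()
    ]
    return {
        "query": {
            "span_near": {"clauses": clauses, "slop": slop, "in_order": True}
        }
    }


body = fuzzy_phrase_query("message", "John Wane")
print(json.dumps(body, indent=4))
```

The printed JSON can be sent as the body of the same `_search` request shown above. Raising `slop` would additionally let other terms sit between the fuzzy matches.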

edited Jun 24, 2014 at 20:51

answered Jun 21, 2014 at 4:57


6 Comments

bigerock Over a year ago

doesn't that just make 'wane' in this case the one that is fuzzy? you'd have to do "john~1 wane~1", but then you're gonna get variants of 'john' as well which may or may not be desirable. what i'm looking for is the whole phrase 'john wayne' to be fuzzy, but also that it's a phrase, and it doesn't find 'john' in one document and 'wayne' in another.

ppearcy Over a year ago

Yes, it does. I just wanted to show a sample query that would do a fuzzy search on the specific term. If you want the entire phrase to be fuzzy on a per-term basis, things get more complex. If you treat your field as a single token, you can do fuzzy queries on that, but it will be at the entire field level. I think your best bet would be: elasticsearch.org/guide/en/elasticsearch/reference/current/…

bigerock Over a year ago

i'm not sure what that query does. the documentation is limited, imho. it seems to just be saying i can wrap the matches in the span_multi ... ok but what does it do?

ppearcy Over a year ago

Yeah, I get your confusion. The Elasticsearch docs lack context, to say the least. Updated above with a full example.

bigerock Over a year ago

thank you so much for the complete example! you mentioned they were not analyzed ... so what would you suggest to use for an analyzer if you were to use one?
