0

I have an Elasticsearch index with the following documents, and I want autocomplete functionality over the specified fields:

mapping: https://gist.github.com/anonymous/0609b1d110d91dceb9a90faa76d1d5d4

Use case:

My queries are typed prefixes, e.g. "sta", "star", "star w", ... "star war", with an additional filter such as tags = "science fiction". The queries could also match other fields such as description or actors (in the cast field; note that this is nested). I also want to know which field was matched.

I investigated two ways of doing that, but neither method seems to address the use case above:

1) Suggester autocomplete:

https://www.elastic.co/guide/en/elasticsearch/reference/1.7/search-suggesters-completion.html

With this, it seems I have to add another field called "suggest" that replicates the data, which is not desirable.

2) Using a prefix filter/query:

https://www.elastic.co/guide/en/elasticsearch/reference/1.7/query-dsl-prefix-filter.html

This returns the whole document, not the exact matching terms.

Is there a clean way of achieving this? Please advise.

4 Answers

1

Don't create the mapping separately; insert the data directly into the index, and Elasticsearch will create a default mapping for it. Use the query below for autocomplete.

GET /netflix/movie/_search
{
  "query": {
    "query_string": {
      "query": "sta*"
    }
  }
}

4 Comments

Thanks for the reply, but this will return the whole document. What about returning just the terms for that document, as tags will be a long list?
Can you share your mapping?
When the user searches, do you want to show only the search_terms data?
I want both the search terms and which fields they matched on. Thanks again.
1

I think the completion suggester would be the cleanest way, but if that is undesirable you could use aggregations on the name field.

This is a sample index (I am assuming from your question that you are using ES 1.7):

PUT netflix
{
  "settings": {
    "analysis": {
      "analyzer": {
        "prefix_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim",
            "edge_filter"
          ]
        },
        "keyword_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim"
          ]
        }
      },
      "filter": {
        "edge_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      }
    }
  },
  "mappings": {
    "movie":{
      "properties": {
        "name":{
          "type": "string",
          "fields": {
            "prefix":{
              "type": "string",
              "index_analyzer": "prefix_analyzer",
              "search_analyzer": "keyword_analyzer"
            },
            "raw":{
              "type": "string",
              "analyzer": "keyword_analyzer"
            }
          }
        },
        "tags":{
          "type": "string", "index": "not_analyzed"
        }
      }
    }
  }
}

Using multi-fields, the name field is analyzed in different ways. name.prefix uses the keyword tokenizer with an edge n-gram filter, so the string "star wars" is indexed as s, st, sta, and so on. At search time, keyword_analyzer is used so that the search query is not broken into multiple small tokens. name.raw is used for the aggregation.
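To make the index-time versus search-time difference concrete, here is a rough pure-Python approximation of the two analyzers. It is only a sketch of the idea, not Elasticsearch's actual analysis code; the function names are made up for illustration, and it assumes the keyword tokenizer yields the whole string as a single token.

```python
def prefix_analyzer(text, min_gram=1, max_gram=20):
    """Approximate prefix_analyzer: keyword tokenizer -> lowercase -> trim -> edge n-grams."""
    token = text.lower().strip()          # whole input stays one token
    longest = min(len(token), max_gram)   # edge n-grams never exceed max_gram
    return [token[:n] for n in range(min_gram, longest + 1)]


def keyword_analyzer(text):
    """Approximate keyword_analyzer: keyword tokenizer -> lowercase -> trim."""
    return [text.lower().strip()]


def matches(indexed_name, typed_query):
    """A match query on name.prefix hits when the single query token is an indexed n-gram."""
    return keyword_analyzer(typed_query)[0] in prefix_analyzer(indexed_name)


print(prefix_analyzer("Star Wars"))
print(matches("Star Wars", "sta"))     # prefix of the whole name
print(matches("Star Wars", "star w"))  # prefix that spans the space
print(matches("Star Wars", "wars"))    # not a prefix of the whole name
```

Because the whole name is one token, only prefixes of the full name match; a query such as "wars" would need a different tokenizer on the index side.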

The following query returns the top 10 suggestions.

GET netflix/movie/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "tags": "sci-fi"
        }
      },
      "query": {
        "match": {
          "name.prefix": "sta"
        }
      }
    }
  },
  "size": 0,
  "aggs": {
    "unique_movie_name": {
      "terms": {
        "field": "name.raw",
        "size": 10
      }
    }
  }
}

The results will look something like this:

"aggregations": {
      "unique_movie_name": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "star trek",
               "doc_count": 1
            },
            {
               "key": "star wars",
               "doc_count": 1
            }
         ]
      }
   }

UPDATE:

I think you could use highlighting for this purpose. The highlight section gives you the whole word and the field it matched. You can also use inner hits, with highlighting inside them, to get the nested docs as well.

{
  "query": {
    "query_string": {
      "query": "sta*"
    }
  },
  "_source": false,
  "highlight": {
    "fields": {
      "*": {}
    }
  }
}
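On the client side, the matched field and word can be read from the highlight section of each hit. The sketch below assumes the default highlighter tags (<em> and </em>) and uses a hand-written sample response, not real output from the netflix index; the helper name matched_fields is hypothetical.

```python
import re

EM_PATTERN = re.compile(r"<em>(.*?)</em>")


def matched_fields(response):
    """Collect (doc_id, field, highlighted_words) from the highlight section of each hit."""
    results = []
    for hit in response.get("hits", {}).get("hits", []):
        for field, fragments in hit.get("highlight", {}).items():
            words = []
            for fragment in fragments:
                words.extend(EM_PATTERN.findall(fragment))
            results.append((hit.get("_id"), field, words))
    return results


# Hypothetical response shape for illustration only.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "highlight": {"name": ["<em>Star</em> Wars"]}},
            {"_id": "2", "highlight": {"description": ["A <em>star</em> is born"]}},
        ]
    }
}

for doc_id, field, words in matched_fields(sample_response):
    print(doc_id, field, words)
```

Each tuple tells the autocomplete UI which field produced the suggestion, so it can label the entry as a movie name, description, and so on.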

5 Comments

Thanks a lot for your solution. Here is the complete mapping: gist.github.com/anonymous/0609b1d110d91dceb9a90faa76d1d5d4. Is there a way to return which fields it matched on (it has nested fields too) as part of the result, using your solution or any other way? Thanks a lot once again!
What is the requirement? Would you like to suggest results (autocomplete), or do you want to know which fields the user query matched? I thought you wanted to autocomplete the movie name field.
I would like to suggest results (autocomplete) and also know which field each suggestion came from. The suggestion could come from the movie name as well as from an actor's name, the description, etc., so that in the autosuggest each suggestion is displayed with what that entity is. E.g. query = "ar" results in 1) "arnold schwarzenegger" as an actor entity (it matched the actor name), 2) "arabian nights" as a movie (it matched the movie name). I really appreciate your help with this.
I have updated the answer. Let me know if you need any further help.
Thanks a lot for your help. Highlights should work, will try it out!
0

You can use a lowercase filter for the Elasticsearch index. This will let your searches match uppercase letters as well.

Create the index using the settings below:

PUT lowercase_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "whitespace_lowercase": {
          "tokenizer": "whitespace",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "field1": { "type": "text", "analyzer": "whitespace_lowercase" }
    }
  }
}

Now when you search, documents match regardless of whether the text is in lowercase or uppercase.

0

I created this table for myself:

UseCase Completion S. Context S. Term S. Phrase S. search_as_you_type Edge N-Gram
Basic Auto-Complete X X X X
Flexible Search/Query X X
High Performance for Large Datasets X X X X
Higher Memory Usage X X X
Higher Storage Usage X X
Substring Matches X X
Dynamic Data Updates X X X X
Relevance Scoring X X X X
Spell Correction X X
complexity to implement low high medium high low medium
Speciality fast prefix matching context-aware suggestions single term corrections multi term corrections implements edge n-gram, full text partial matching

It helps to differentiate between query suggestion and search.

Since this question was asked, the search_as_you_type field type has been implemented, which is exactly what the author would have needed back then :D
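As a rough illustration of that option, the sketch below builds a mapping and a query body as Python dicts. It assumes a newer Elasticsearch release than the 1.7 referenced in the question, reuses the name and tags fields from the earlier answer, and assumes tags is mapped as keyword; the helper names are hypothetical. The multi_match query of type bool_prefix over the field and its ._2gram and ._3gram subfields follows the pattern shown in the Elasticsearch documentation for this field type.

```python
import json


def build_mapping():
    """Index body with a search_as_you_type field (field names are assumptions)."""
    return {
        "mappings": {
            "properties": {
                "name": {"type": "search_as_you_type"},
                "tags": {"type": "keyword"},
            }
        }
    }


def build_query(typed, tag, size=10):
    """Search body: prefix-style match on name, filtered by an exact tag."""
    return {
        "size": size,
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": typed,
                        "type": "bool_prefix",
                        "fields": ["name", "name._2gram", "name._3gram"],
                    }
                },
                "filter": {"term": {"tags": tag}},
            }
        },
    }


# Print the bodies so they can be pasted into a console or sent by any HTTP client.
print(json.dumps(build_mapping(), indent=2))
print(json.dumps(build_query("star w", "sci-fi"), indent=2))
```

Unlike the completion suggester, this is an ordinary search, so filters and highlighting can be combined with it in the same request.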
