Implementing search using Elasticsearch

Question

I am currently implementing Elasticsearch in my application. Assume that "Hello World" is the data we need to search. Our requirement is that this document should be returned when the keyword entered is "h", "Hello World", or "Hello Worlds".

This is our current query.

{
  "query": {
    "wildcard": {
      "message": {
        "title": "h*"
      }
    }
  }
}

With this query we get the right result for the keyword "h", but we also need results when the keyword contains a small spelling mistake.

Does this answer your question? delete record from elastic search using typescript with node js
– Mayank Pandav, Commented Mar 4, 2020 at 10:57
No, our requirement is that we need to get results even when there is a small spelling mistake in the word we searched for.
– Arjun Sankar, Commented Mar 4, 2020 at 11:07

Amit · Accepted Answer · 2020-03-04 14:36:10Z

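You need to use the english analyzer, which stems tokens to their root form. More info can be found here.

I implemented it with your example data, query, and expected results, using an edge n-gram analyzer at index time, the english analyzer at search time, and a match query. The edge n-grams make the prefix "h" match, and stemming reduces "Worlds" to "world" so it matches the indexed token.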

Index Mapping

{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete", 
        "search_analyzer": "english" 
      }
    }
  }
}
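
To see why this mapping works, here is a rough Python sketch of what the `autocomplete` analyzer emits at index time. It is only an approximation: the regex split stands in for the `standard` tokenizer, and the helper names `edge_ngrams` and `autocomplete_analyze` are made up for illustration, not part of Elasticsearch.

```python
import re


def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the leading substrings of `token`, from min_gram to max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def autocomplete_analyze(text):
    """Approximate the custom analyzer: split into words, lowercase, then edge n-gram."""
    # Assumption: a simple word regex is close enough to the standard tokenizer here.
    words = re.findall(r"\w+", text.lower())
    tokens = []
    for word in words:
        tokens.extend(edge_ngrams(word))
    return tokens


index_tokens = autocomplete_analyze("Hello World")
print(index_tokens)
# ['h', 'he', 'hel', 'hell', 'hello', 'w', 'wo', 'wor', 'worl', 'world']

print("h" in index_tokens)       # True: the prefix was stored at index time
print("worlds" in index_tokens)  # False: the raw plural is not an indexed token
print("world" in index_tokens)   # True: the stemmed form is
```

The last two checks are the reason for `"search_analyzer": "english"`: the search text has to be stemmed from "worlds" to "world" before it can hit one of the stored tokens.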

Index document

{
   "title" : "Hello World"
}

Search query for `h` and its result

{
  "query": {
    "match": {
      "title": "h"
    }
  }
}

 "hits": [
         {
            "_index": "so-60524477-partial-key",
            "_type": "_doc",
            "_id": "1",
            "_score": 0.42763555,
            "_source": {
               "title": "Hello World"
            }
         }
      ]

Search query for `Hello Worlds`, which returns the same document

{
  "query": {
    "match": {
      "title": "Hello worlds"
    }
  }
}

Result

"hits": [
         {
            "_index": "so-60524477-partial-key",
            "_type": "_doc",
            "_id": "1",
            "_score": 0.8552711,
            "_source": {
               "title": "Hello World"
            }
         }
      ]

jaspreet chahal · Accepted Answer · 2020-03-04 11:20:32Z

Edge n-grams or n-grams perform better than wildcards. For a wildcard query, everything in the index has to be scanned to see what matches the pattern. N-grams break text into small tokens at index time. For example, with edge n-grams, "Quick Foxes" will be stored as [ Qu, Qui, Quic, Quick, Fo, Fox, Foxe, Foxes ], depending on the min_gram and max_gram sizes (here 2 and 5).
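
The difference can be sketched in plain Python. This is a simplified model, not how Elasticsearch is implemented internally: the `docs` dictionary, the whitespace split, and the function names are assumptions made for illustration. The wildcard-style search tests the pattern against every term, while the edge n-gram style search is a single dictionary lookup against prefixes computed when the document was indexed.

```python
from fnmatch import fnmatchcase

# Hypothetical toy corpus: document id -> text.
docs = {1: "Quick Foxes", 2: "Lazy Dog"}


def wildcard_search(pattern):
    """Scan every term of every document and test it against the pattern."""
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if any(fnmatchcase(term, pattern) for term in text.split())
    )


def build_prefix_index(min_gram=2, max_gram=5):
    """Precompute edge n-grams at index time: prefix -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.split():
            for n in range(min_gram, min(len(term), max_gram) + 1):
                index.setdefault(term[:n], set()).add(doc_id)
    return index


prefix_index = build_prefix_index()


def prefix_search(prefix):
    """Look the prefix up directly; no scanning at query time."""
    return sorted(prefix_index.get(prefix, set()))


print(wildcard_search("Qui*"))  # [1]
print(prefix_search("Qui"))     # [1]
print(sorted(p for p, ids in prefix_index.items() if 1 in ids))
# ['Fo', 'Fox', 'Foxe', 'Foxes', 'Qu', 'Qui', 'Quic', 'Quick']
```

The trade-off is index size: every stored prefix takes space, which is why min_gram and max_gram are worth tuning.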

Fuzziness can be used to match similar terms, such as a keyword with a small spelling mistake.

Mapping

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text":{
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

Query

GET my_index/_search
{
  "query": {
    "match": {
      "text": {
        "query": "hello worlds",
        "fuzziness": 1
      }
    }
  }
}
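
For intuition on what `"fuzziness": 1` allows, here is a minimal sketch of edit distance in Python. It implements plain Levenshtein distance (insert, delete, substitute). As far as I know, Elasticsearch by default also counts swapping two adjacent characters as a single edit, which this sketch does not do, so treat it as an approximation. The function name is just for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between strings a and b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (or keep if equal)
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("worlds", "world"))  # 1: within fuzziness 1
print(levenshtein("helo", "hello"))    # 1: within fuzziness 1
print(levenshtein("wrold", "world"))   # 2 here, because a swap costs two substitutions
```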

4 Comments

Arjun Sankar Over a year ago

The results are empty when I used the query above.

jaspreet chahal Over a year ago

Can you paste the output of GET my_index/_search?

Arjun Sankar Over a year ago

[ "took" => 0 "timed_out" => false "_shards" => array:4 [] "hits" => array:3 [ "total" => array:2 [] "max_score" => null "hits" => [] ] "aggregations" => array:1 [] ]

jaspreet chahal Over a year ago

The index is empty. You need to add data. PUT my_index creates a new index named my_index. The n-gram or edge n-gram tokenizer controls how tokens are stored, so it has to be configured when the index is created, and documents must be indexed after that.
