I am currently implementing Elasticsearch in my application. Assume that "Hello World" is the data we need to search. Our requirement is that we get this result when entering "h", "Hello World", or "Hello Worlds" as the keyword.

This is our current query.

{
  "query": {
    "wildcard": {
      "title": {
        "value": "h*"
      }
    }
  }
}

This returns the right result for the keyword "h", but we also need results when the keyword contains small spelling mistakes.

  • Does this answer your question? delete record from elastic search using typescript with node js Commented Mar 4, 2020 at 10:57
  • No, our requirement is that we get results even when there is a small spelling mistake in the word we searched for. Commented Mar 4, 2020 at 11:07
  • @OpsterElasticsearchNinja Will try it out today. Commented Mar 5, 2020 at 5:54
  • @ArjunSankar Thanks, and let me know if you have further questions. Commented Mar 5, 2020 at 6:00

2 Answers

You need to use the english analyzer, which stems tokens to their root form. More information can be found in the Elasticsearch documentation on language analyzers.

I implemented it using your example data, query, and expected results, with an edge n-gram analyzer at index time and a match query.

Index Mapping

{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": { 
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "autocomplete", 
        "search_analyzer": "english" 
      }
    }
  }
}

Index document

{
   "title" : "Hello World"
}

Search query for h and its result

{
  "query": {
    "match": {
      "title": "h"
    }
  }
}

"hits": [
  {
    "_index": "so-60524477-partial-key",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.42763555,
    "_source": {
      "title": "Hello World"
    }
  }
]

Search query for Hello Worlds, which returns the same document

{
  "query": {
    "match": {
      "title": "Hello worlds"
    }
  }
}

Result

"hits": [
  {
    "_index": "so-60524477-partial-key",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.8552711,
    "_source": {
      "title": "Hello World"
    }
  }
]
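
To see why both queries match, here is a rough Python sketch of what this mapping does at index time and at search time. It is only an approximation under stated assumptions: the standard tokenizer is replaced by a whitespace split, and crude_stem is a hypothetical stand-in for the english analyzer's stemmer that only strips a trailing "s". All function names are made up for illustration and are not part of Elasticsearch.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Prefixes of a token, like the edge_ngram filter in the mapping."""
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


def index_tokens(text):
    """Index-time analysis: lowercase, split into words, emit edge n-grams.

    Assumption: a whitespace split stands in for the standard tokenizer.
    """
    tokens = set()
    for word in text.lower().split():
        tokens.update(edge_ngrams(word))
    return tokens


def crude_stem(word):
    """Hypothetical stand-in for the english stemmer: drop one trailing 's'."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


def search_tokens(text):
    """Search-time analysis: lowercase, split, stem. No n-grams here."""
    return [crude_stem(word) for word in text.lower().split()]


def matches(document, query):
    """A match query with the default OR operator hits if any term matches."""
    indexed = index_tokens(document)
    return any(term in indexed for term in search_tokens(query))


if __name__ == "__main__":
    for keyword in ("h", "Hello World", "Hello worlds"):
        print(keyword, matches("Hello World", keyword))
```

The point of using a different search_analyzer is that the query text is not broken into prefixes again; it only has to be reduced to a form ("h", "hello", "world") that already exists among the indexed prefixes.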

Edge n-grams or n-grams perform better than wildcard queries. A wildcard query has to scan the index's terms at search time to see which ones match the pattern, whereas n-grams break the text into small tokens when it is indexed. For example, with min_gram 2 and max_gram 5, "Quick Foxes" will be stored as [ Qu, Qui, Quic, Quick, Fo, Fox, Foxe, Foxes ]; the exact tokens depend on the min_gram and max_gram sizes.
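
As a small illustration of the tokenization described above, the following Python sketch produces edge n-grams for each word. It is not Elasticsearch code; the function name and the whitespace split are assumptions made for the example, and the default sizes are chosen to reproduce the "Quick Foxes" token list.

```python
def edge_ngrams(text, min_gram=2, max_gram=5):
    """Return the leading substrings (edge n-grams) of every word in text."""
    grams = []
    for word in text.split():
        # A word shorter than max_gram only yields prefixes up to its length.
        longest = min(max_gram, len(word))
        for size in range(min_gram, longest + 1):
            grams.append(word[:size])
    return grams


if __name__ == "__main__":
    print(edge_ngrams("Quick Foxes"))
    # ['Qu', 'Qui', 'Quic', 'Quick', 'Fo', 'Fox', 'Foxe', 'Foxes']
```

Because every prefix is already stored as its own token, a search for "Qui" becomes an exact term lookup instead of a pattern scan.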

Fuzziness can be used to find similar terms.
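
Fuzziness is expressed as an edit distance: the number of single-character changes needed to turn one term into another. The Python sketch below computes the plain Levenshtein distance to show what a fuzziness of 1 tolerates. The function names are hypothetical, and this is only an illustration; by default Elasticsearch also counts a swap of two adjacent characters as a single edit, which this plain version does not.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def within_fuzziness(query_term, indexed_term, fuzziness=1):
    """True if the two terms are at most `fuzziness` edits apart."""
    return edit_distance(query_term, indexed_term) <= fuzziness


if __name__ == "__main__":
    print(within_fuzziness("worlds", "world"))  # one deletion apart
    print(within_fuzziness("helo", "hello"))    # one insertion apart
```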

Mapping

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text":{
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}

Query

GET my_index/_search
{
  "query": {
    "match": {
      "text": {
        "query": "hello worlds",
        "fuzziness": 1
      }
    }
  }
}

4 Comments

The results are empty when I use the query above.
Can you paste the output of GET my_index/_search?
[ "took" => 0 "timed_out" => false "_shards" => array:4 [] "hits" => array:3 [ "total" => array:2 [] "max_score" => null "hits" => [] ] "aggregations" => array:1 [] ]
The index is empty; you need to add data. PUT my_index creates a new index named my_index. The ngram or edge_ngram tokenizer determines how tokens are stored, so it has to be configured in the settings and mapping when the index is created, before any documents are indexed.
