Query to match substrings in elasticsearch

Question

I have a field with the following mapping:

{
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256
        }
    }
}

One of the documents has the value "abcdef" for this field. What kind of ES query should I use to match this document when searching for "def"?

I have tried match and prefix queries. ES version: 5.1.1

Could you post the search requests you've tried, please?

– Adam Benson, Commented Jul 18, 2019 at 12:43

Amit · Accepted Answer · 2019-07-18 12:51:12Z

You can create a custom analyzer that uses the n-gram tokenizer and apply it to the field on which you want substring search. Wildcard searches are quite costly, and I guess that's why you don't want to use them, as mentioned in your duplicate SO question.

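Here are my index settings and mapping, based on your requirement. Note that the mapping uses the typeless syntax of ES 7.x; on 5.x, the properties block must be nested under a mapping type name.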

{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "my_tokenizer"
                }
            },
            "tokenizer": {
                "my_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 3,
                    "token_chars": [
                        "letter",
                        "digit"
                    ]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "foo": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                },
                "analyzer": "my_analyzer"
            }
        }
    }
}

I created a field called foo and applied my custom n-gram analyzer to it, so for the value abcdef it produces the tokens below.

{
    "tokens": [
        {
            "token": "abc",
            "start_offset": 0,
            "end_offset": 3,
            "type": "word",
            "position": 0
        },
        {
            "token": "bcd",
            "start_offset": 1,
            "end_offset": 4,
            "type": "word",
            "position": 1
        },
        {
            "token": "cde",
            "start_offset": 2,
            "end_offset": 5,
            "type": "word",
            "position": 2
        },
        {
            "token": "def",
            "start_offset": 3,
            "end_offset": 6,
            "type": "word",
            "position": 3
        }
    ]
}
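
To make it clearer where these tokens come from, here is a minimal Python sketch that approximates what the tokenizer does with min_gram 3, max_gram 3 and token_chars set to letter and digit. It is only an illustration, not Elasticsearch's actual implementation; the function name ngram_tokens is hypothetical, and Python's isalpha()/isdigit() are an assumed stand-in for the letter and digit character classes.

```python
def ngram_tokens(text, min_gram=3, max_gram=3):
    """Approximate an n-gram tokenizer that keeps only letters and digits.

    Hypothetical helper for illustration; not the Elasticsearch implementation.
    """
    tokens = []
    run_start = None
    # A trailing space acts as a sentinel so the last run is flushed.
    for i, ch in enumerate(text + " "):
        keep = ch.isalpha() or ch.isdigit()
        if keep and run_start is None:
            run_start = i
        elif not keep and run_start is not None:
            run = text[run_start:i]
            # Slide over the run, emitting every gram between min and max length.
            for start in range(len(run)):
                for size in range(min_gram, max_gram + 1):
                    end = start + size
                    if end > len(run):
                        break
                    tokens.append({
                        "token": run[start:end],
                        "start_offset": run_start + start,
                        "end_offset": run_start + end,
                        "position": len(tokens),
                    })
            run_start = None
    return tokens


if __name__ == "__main__":
    print([t["token"] for t in ngram_tokens("abcdef")])
    # ['abc', 'bcd', 'cde', 'def']
```

Note that an input shorter than min_gram, such as "ab", yields no tokens at all, so values that short cannot be matched through this field.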

The search query below then returns the document containing abcdef.

{
    "query": {
        "term" : {
            "foo" : "def"
        }
    }
}
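
A term query is not analyzed, so it only matches a search string that is itself one of the indexed tokens, which with the settings above means exactly three characters. For longer search strings, one option is a match query with "operator": "and", which analyzes the input with the same analyzer and requires every resulting trigram to be present. The Python sketch below builds either request body; the helper name substring_query and its gram parameter are hypothetical. Treat it as a rough approximation: the "and" variant can also match documents where the trigrams occur in different places rather than as one contiguous substring.

```python
def substring_query(field, text, gram=3):
    """Build a search body for a field indexed with fixed-size n-grams.

    Hypothetical helper; assumes min_gram == max_gram == gram on the field.
    """
    if len(text) < gram:
        # Inputs shorter than the gram size produce no indexed tokens to match.
        raise ValueError("search text must have at least %d characters" % gram)
    if len(text) == gram:
        # The text is itself a single indexed token, so a term query is enough.
        return {"query": {"term": {field: text}}}
    # Longer text: let the analyzer split it and require all of its n-grams.
    return {"query": {"match": {field: {"query": text, "operator": "and"}}}}


if __name__ == "__main__":
    print(substring_query("foo", "def"))
    print(substring_query("foo", "cdef"))
```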

EDIT: Here is my Postman collection link if you want to check all the API calls. Just replace the port and index with your own ES port and index.

3 Comments

User3518958 Over a year ago

OK, so in the n-gram analyzer we can configure the length of the tokens that get built? This is very good info.

Amit Over a year ago

@User3518958, yes, of course. You can see the "min_gram": 3 and "max_gram": 3 attributes in my tokenizer settings, and there is more information in the link I provided for n-gram.

Amit Over a year ago

@User3518958, could you please let me know whether this answered your question?
