
When I make this query:

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "regexp":{
            "main_text": ".*word r.*"
        }
    }
}
'

I get no results. But when I query:

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "regexp":{
            "main_text": ".*word.*"
        }
    }
}
'

I get results with word (including results with "word r..."). I am using Elasticsearch 6.2.2. Any idea what is going on?

2 Answers

Let's say you have the following sentence:

word raincoat bword wordcd

If the field main_text is of type text and uses the default Standard Analyzer, the sentence is broken into these tokens:

word
raincoat
bword
wordcd

(Yup, no spaces.)

These tokens are what actually get stored in the inverted index, and when you query with match or even regexp, Elasticsearch matches against these tokens.

Note that the inverted index does not store the sentence as is (e.g. "word raincoat"), nor does it store "word " (notice the trailing space).

So with the regex .*word.* you get documents containing word, bword and wordcd, because those are terms in your inverted index. (A regexp query has to match a whole term, which is why the leading and trailing .* are needed.)

But with the regex .*word r.*, no single term in the inverted index contains a space ("word raincoat" is never stored as one term), so nothing matches and you get no results.
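To see this without a cluster, here is a rough local simulation (not Elasticsearch itself). The helpers tokenize and match_terms are hypothetical: tokenize only approximates the Standard Analyzer for plain ASCII text (lowercase, split on non-alphanumerics), and grep -E stands in for Lucene's regex syntax, which behaves the same for these two patterns. grep -x enforces the whole-term match that a regexp query requires.

```shell
#!/bin/sh
# Hypothetical helper: rough approximation of the standard analyzer for
# ASCII text. Lowercase, then turn every run of non-alphanumeric
# characters into a newline, so each term ends up on its own line.
tokenize() {
  printf '%s\n' "$1" |
    tr '[:upper:]' '[:lower:]' |
    tr -cs '[:alnum:]' '\n' |
    sed '/^$/d'
}

# Hypothetical helper: a regexp query must match an entire term, so use
# grep -x (whole-line match) against the one-term-per-line list.
match_terms() {
  tokenize "$1" | grep -xE "$2"
}

text="word raincoat bword wordcd"

echo "Terms in the index:"
tokenize "$text"

echo "Terms matching .*word.* :"
match_terms "$text" '.*word.*'

echo "Terms matching .*word r.* :"
match_terms "$text" '.*word r.*' || echo "(none)"
```

The first pattern prints word, bword and wordcd; the second prints nothing from grep because no term contains a space. On a real cluster, the _analyze API shows the tokens your analyzer actually produces.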

What you can do is map the field main_text as type keyword. A keyword field doesn't go through the analysis phase, so the entire value is stored as is in the inverted index, and your regex .*word r.* would then work as expected.

You always search the inverted index, so you can only get back what the inverted index stores.

If you need both partial (full-text) search and exact matching, I'd suggest using a multi-field for main_text (or whichever field you intend to search).
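A minimal sketch of such a multi-field, assuming Elasticsearch 6.x (where mappings still take a type name). The index name my_index, the type name doc and the sub-field name raw are hypothetical placeholders. The text field keeps full-text search working, while main_text.raw is an unanalyzed keyword copy that the regexp query can match against the whole value:

```shell
# Create the index with main_text as text plus a keyword sub-field.
# my_index, doc and raw are placeholder names.
curl -X PUT "localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "doc": {
      "properties": {
        "main_text": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" }
          }
        }
      }
    }
  }
}
'

# Run the regexp against the keyword sub-field, which holds the full value.
curl -X GET "localhost:9200/my_index/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "regexp": {
      "main_text.raw": ".*word r.*"
    }
  }
}
'
```

Two caveats: keyword values are matched case-sensitively, and a sub-field added to an existing mapping is only populated for documents indexed after the change, so existing data would need to be reindexed. If the index was created with dynamic mapping, it may already have a main_text.keyword sub-field (by default with ignore_above set to 256, so longer values are not indexed there).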

Hope this helps!


This is because regexp is a term-level query, not a full-text query. Your field is probably analyzed with a tokenizer that splits on whitespace (the default standard analyzer does), so you won't ever find a token containing whitespace.
