no results when using whitespace in regex query

Question

When I make this query:

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "regexp":{
            "main_text": ".*word r.*"
        }
    }
}
'

I get no results. But when I query:

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "regexp":{
            "main_text": ".*word.*"
        }
    }
}
'

I get results with word (including results with "word r..."). I am using Elasticsearch 6.2.2. Any idea what is going on?

Kamal Kunjapur · Accepted Answer · 2018-11-05 21:07:19Z

Let's say you have the sentence below:

word raincoat bword wordcd

If the field main_text is of type text and uses the default analyzer, i.e. the Standard Analyzer, then the sentence is broken into the tokens below:

word
raincoat
bword
wordcd

(Yup, no spaces.)

These tokens are what is actually stored in the inverted index, and when you query using match or even regexp, Elasticsearch tries to match against these tokens.

Note that it doesn't store the sentence as is, e.g. "word raincoat", nor does it store "word " (notice the trailing space) in the inverted index.

Now, when you use the regex .*word.*, you get documents containing word, bword and wordcd, because that's what your inverted index has.

Likewise, when you use the regex .*word r.*, no single token contains "word r", since the inverted index doesn't store "word raincoat" together, so you don't get any results.

What you can do is make the field main_text of type keyword. The keyword datatype doesn't go through the analysis phase, so the entire value is stored as is in the inverted index. Your regex .*word r.* would then work as expected.

You always search the inverted index, so you only get what the inverted index stores.
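
The difference between the two field types can be imitated in a few lines of Python. This is only a rough sketch, not Elasticsearch code: lowercasing plus a whitespace split stands in for the Standard Analyzer (an approximation that happens to be good enough for this sentence), re.fullmatch stands in for the fact that a regexp query has to match a whole term, and the function names are made up for illustration.

```python
import re


def analyze(text):
    """Crude stand-in for the Standard Analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def regexp_matches(pattern, terms):
    """Return the terms that the pattern matches in full, the way a regexp query would."""
    compiled = re.compile(pattern)
    return [term for term in terms if compiled.fullmatch(term)]


sentence = "word raincoat bword wordcd"

analyzed_terms = analyze(sentence)  # roughly what a text field puts in the index
keyword_terms = [sentence]          # a keyword field indexes the value as one term

print(regexp_matches(".*word.*", analyzed_terms))    # ['word', 'bword', 'wordcd']
print(regexp_matches(".*word r.*", analyzed_terms))  # []
print(regexp_matches(".*word r.*", keyword_terms))   # ['word raincoat bword wordcd']
```

The second call comes back empty for the same reason the query in the question does: none of the indexed terms contains a space.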

If you need both partial search and exact search, then I'd suggest you use a multi-field for main_text, or whatever field name you intend to use.
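
As a sketch of what such a multi-field could look like, the snippet below builds the two request bodies as Python dicts and prints them as JSON. The mapping type name "doc" (6.x mappings are nested under a type name) and the sub-field name "raw" are assumptions picked for illustration, not something taken from the question.

```python
import json

# Hypothetical names: "doc" is the mapping type, "raw" is the keyword sub-field.
mapping_body = {
    "mappings": {
        "doc": {
            "properties": {
                "main_text": {
                    "type": "text",  # analyzed: usable for match queries
                    "fields": {
                        "raw": {"type": "keyword"}  # not analyzed: whole value is one term
                    },
                }
            }
        }
    }
}

# The regexp query then targets the keyword sub-field instead of the text field.
query_body = {
    "query": {
        "regexp": {
            "main_text.raw": ".*word r.*"
        }
    }
}

print(json.dumps(mapping_body, indent=2))
print(json.dumps(query_body, indent=2))
```

The first body would be sent when creating the index, and the second to _search in the same way as the curl calls in the question. Documents indexed before the sub-field existed would need to be reindexed for main_text.raw to be populated.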

Hope this helps!

YonatanBM · Accepted Answer · 2018-11-05 17:32:05Z

This is because regexp is a term-level query and not a full-text query. You are probably using a whitespace tokenizer, in which case you won't ever find a token containing whitespace.
