Elasticsearch regexp with space not working

Question

Let's assume I have books whose titles are indexed in Elasticsearch as follows:

curl -XPUT "http://localhost:9200/_river/books/_meta" -d'
{
  "type": "jdbc",
  "jdbc": {
    "driver": "org.postgresql.Driver",
    "url": "jdbc:postgresql://localhost:5432/...",
    "user": "...",
    "password": "...",
    "index": "books",
    "type": "books",
    "sql": "SELECT * FROM books"
  }
}'

For instance, I have a book called "Afoo barb".

The following code (searching for '.*foo.*') correctly returns the book:

client.search({
  index: 'books',
  'from': 0,
  'size': 10,
  'body': {
    'query': {
      'filtered': {
        'filter': {
          'bool': {
            'must': {
              'regexp': { title: '.*foo.*' }
            }
          }
        }
      }
    }
  }
});

But the following code (searching for '.*foo bar.*') does not:

client.search({
  index: 'books',
  'from': 0,
  'size': 10,
  'body': {
    'query': {
      'filtered': {
        'filter': {
          'bool': {
            'must': {
              'regexp': { title: '.*foo bar.*' }
            }
          }
        }
      }
    }
  }
});

I also tried replacing the space with '\s' or '.*', but that doesn't work either.

I think the title is split into terms (['Afoo', 'barb']), so no single term can match '.*foo bar.*'.

How can I make Elasticsearch apply the regexp to the complete title?

karthik manchala · Accepted Answer · 2015-05-26 11:06:26Z


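Elasticsearch applies the regexp to the terms produced by the tokenizer for that field, not to the original text of the field. The pattern must also match an entire term, so with an analyzer that splits on whitespace '.*foo bar.*' can never match: no term contains a space.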
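The behaviour described above can be modelled in a few lines of plain JavaScript. This is only a toy model, not Elasticsearch code: `analyzeStandard` and `analyzeKeyword` are hypothetical stand-ins for the default standard analyzer and for the keyword tokenizer plus a lowercase filter, and JavaScript `RegExp` is used as an approximation of Lucene's regexp syntax, which differs in details. The anchoring (`^...$`) mirrors the fact that a regexp query must match a whole term.

```javascript
// Toy model, NOT Elasticsearch: shows why '.*foo bar.*' fails on split terms.

// Hypothetical stand-in for the standard analyzer:
// lowercase, then split on anything that is not a letter or digit.
function analyzeStandard(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Hypothetical stand-in for keyword tokenizer + lowercase filter:
// the whole value becomes a single term.
function analyzeKeyword(text) {
  return [text.toLowerCase()];
}

// A regexp query matches a document if the pattern matches ONE entire term,
// so the pattern is anchored with ^(?:...)$ here.
function regexpMatches(terms, pattern) {
  const re = new RegExp('^(?:' + pattern + ')$');
  return terms.some((term) => re.test(term));
}

const title = 'Afoo barb';
const standardTerms = analyzeStandard(title); // ['afoo', 'barb']
const keywordTerms = analyzeKeyword(title);   // ['afoo barb']

console.log(regexpMatches(standardTerms, '.*foo.*'));     // true
console.log(regexpMatches(standardTerms, '.*foo bar.*')); // false
console.log(regexpMatches(keywordTerms, '.*foo bar.*'));  // true
```

The last line is the point of the answer: once the whole title is a single term, the space in the pattern has something to match.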

You can either index the field with a different tokenizer, or write the regexp so that it matches the individual terms of the documents you need.

Example, with the title field indexed using the keyword tokenizer (so that "Afoo barb" is stored as a single term):

'regexp': { title: '.*foo bar.*' }
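A hedged sketch of what "indexing with the keyword tokenizer" could look like, written for the Elasticsearch 1.x API that the question uses (the `string` field type and the `filtered` query were replaced by `text`/`keyword` and `bool` in later versions). The analyzer name `title_keyword` and both helper functions are hypothetical; the index and type name `books` come from the river config in the question. The code only builds request bodies and does not talk to a server.

```javascript
// Hedged sketch for Elasticsearch 1.x; 'title_keyword' is a hypothetical name.

// Index settings + mapping: analyze 'title' with the keyword tokenizer so the
// whole title becomes ONE term ("Afoo barb" -> "afoo barb").
function buildBooksIndexBody() {
  return {
    settings: {
      analysis: {
        analyzer: {
          title_keyword: {
            type: 'custom',
            tokenizer: 'keyword',   // no splitting on spaces
            filter: ['lowercase'],  // store the term lowercased
          },
        },
      },
    },
    mappings: {
      books: {
        properties: {
          // 'string' is the 1.x field type.
          title: { type: 'string', analyzer: 'title_keyword' },
        },
      },
    },
  };
}

// Search body: the regexp is applied to the single full-title term.
function buildTitleRegexpSearch(pattern) {
  return {
    query: {
      filtered: {
        filter: {
          regexp: { title: pattern },
        },
      },
    },
  };
}

const indexBody = buildBooksIndexBody();
const searchBody = buildTitleRegexpSearch('.*foo bar.*');
console.log(JSON.stringify(searchBody));
```

With the legacy `elasticsearch` JS client from the question, these bodies would be passed as `client.indices.create({ index: 'books', body: indexBody })` (before the river fills the index) and `client.search({ index: 'books', body: searchBody })`. The analyzer runs at index time, which is why it belongs in the mapping rather than in a `query_string` parameter. An existing field's analyzer cannot be changed in place, so the index has to be recreated and the data re-imported. If normal full-text search on the title is still needed, a multi-field (for example a hypothetical `title.raw` sub-field using this analyzer) keeps both.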

edited May 26, 2015 at 11:06

answered May 21, 2015 at 17:20

karthik manchala



8 Comments

pidupuis Over a year ago

'.*(foo|bar).*' does not work, since it returns the union of the results of '.*foo.*' and '.*bar.*'. I want the intersection instead, because I don't want to match the title 'Foo baz'...
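On the original split terms, an intersection can be expressed by putting one regexp per term in a `bool` filter's `must` array, instead of OR-ing them inside a single pattern with `|`. This is a sketch under assumptions: `buildIntersectionSearch` and `passesAll` are hypothetical helpers, the patterns are lowercase because the default analyzer lowercases terms, and `passesAll` is a toy JavaScript model of the filter's semantics, not Elasticsearch code. Note that it only requires some term to match each pattern; it does not check that the terms are adjacent or in order.

```javascript
// Build a filtered query (1.x syntax) that ANDs one regexp per pattern.
function buildIntersectionSearch(patterns) {
  return {
    query: {
      filtered: {
        filter: {
          bool: {
            must: patterns.map((p) => ({ regexp: { title: p } })),
          },
        },
      },
    },
  };
}

// Toy evaluator, NOT Elasticsearch: a document passes if EVERY pattern
// matches at least one whole (anchored) term.
function passesAll(terms, patterns) {
  return patterns.every((p) => {
    const re = new RegExp('^(?:' + p + ')$');
    return terms.some((t) => re.test(t));
  });
}

const body = buildIntersectionSearch(['.*foo.*', '.*bar.*']);
console.log(passesAll(['afoo', 'barb'], ['.*foo.*', '.*bar.*'])); // true
console.log(passesAll(['foo', 'baz'], ['.*foo.*', '.*bar.*']));   // false
```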

pidupuis Over a year ago

What tokenizer should I use to search against the entire original text?

karthik manchala Over a year ago

@pidupuis you can use the keyword tokenizer for that.

pidupuis Over a year ago

Thanks, but I can't get it to work... Based on what I read on forums, I tried adding it as a setting when creating the index, and also passing it directly as a parameter in a query_string query, but neither worked, and I can't find any explicit documentation...

karthik manchala Over a year ago

@pidupuis have you read this?
