
Let's assume I have books whose titles are indexed with Elasticsearch as follows:

curl -XPUT "http://localhost:9200/_river/books/_meta" -d'
{
  "type": "jdbc",
  "jdbc": {
    "driver": "org.postgresql.Driver",
    "url": "jdbc:postgresql://localhost:5432/...",
    "user": "...",
    "password": "...",
    "index": "books",
    "type": "books",
    "sql": "SELECT * FROM books"
  }
}'

For instance, I have a book called "Afoo barb".

The following code (searching for '.*foo.*') correctly returns the book:

client.search({
  index: 'books',
  from: 0,
  size: 10,
  body: {
    query: {
      filtered: {
        filter: {
          bool: {
            must: {
              regexp: { title: '.*foo.*' }
            }
          }
        }
      }
    }
  }
});

But the following code (searching for '.*foo bar.*') does not:

client.search({
  index: 'books',
  from: 0,
  size: 10,
  body: {
    query: {
      filtered: {
        filter: {
          bool: {
            must: {
              regexp: { title: '.*foo bar.*' }
            }
          }
        }
      }
    }
  }
});

I tried replacing the space with '\s' or '.*', but that does not work either.

I think the title is split into terms (['Afoo', 'barb']), so the regexp '.*foo bar.*' has nothing to match against.

How can I make Elasticsearch apply the regexp to the complete title?

1 Answer


Elasticsearch will apply the regexp to the terms produced by the tokenizer for that field, and not to the original text of the field.
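To make this term-level behaviour concrete, here is a rough simulation in plain JavaScript that does not talk to Elasticsearch at all. It assumes the default analyzer roughly lowercases the text and splits it on whitespace, and that a regexp has to match a whole term. The helper names (standardLikeTerms, keywordTerms, regexpMatches) are made up for illustration.

```javascript
// Hypothetical helpers; they only approximate what Elasticsearch does.

// Rough stand-in for the default analyzer: lowercase, then split on whitespace.
function standardLikeTerms(text) {
  return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Stand-in for the keyword tokenizer: the whole value is kept as one term.
function keywordTerms(text) {
  return [text];
}

// A document matches if any single term matches the pattern in full
// (the pattern is anchored at both ends of the term).
function regexpMatches(terms, pattern) {
  const anchored = new RegExp('^(?:' + pattern + ')$');
  return terms.some((term) => anchored.test(term));
}

const title = 'Afoo barb';
const analyzed = standardLikeTerms(title); // ['afoo', 'barb']
const single = keywordTerms(title);        // ['Afoo barb']

console.log(regexpMatches(analyzed, '.*foo.*'));     // true
console.log(regexpMatches(analyzed, '.*foo bar.*')); // false
console.log(regexpMatches(single, '.*foo bar.*'));   // true
```

No individual term contains a space, which is why replacing the space with '\s' or '.*' inside the pattern does not help when the title is split into terms.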

You can either use a different tokenizer when indexing the field, or write the regexp so that it matches the individual terms and the required documents are still returned with a high score.

Example with the keyword tokenizer, which keeps the whole title as a single term:

'regexp': { title: '.*foo bar.*' }
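As a sketch of how the keyword tokenizer could be wired in, the objects below describe an index whose title field uses a custom analyzer, plus the matching search body. This assumes an Elasticsearch 1.x style mapping (string fields, the filtered query from the question) and that the index is created before the river first writes to it. The analyzer name 'title_keyword' is invented for this example.

```javascript
// Index settings and mapping: 'title_keyword' is a made-up analyzer name.
const indexBody = {
  settings: {
    analysis: {
      analyzer: {
        title_keyword: {
          type: 'custom',
          tokenizer: 'keyword' // emit the whole title as one term
        }
      }
    }
  },
  mappings: {
    books: {
      properties: {
        title: { type: 'string', analyzer: 'title_keyword' }
      }
    }
  }
};

// Same query shape as in the question, now run against whole-title terms.
const searchBody = {
  query: {
    filtered: {
      filter: {
        bool: {
          must: {
            regexp: { title: '.*foo bar.*' }
          }
        }
      }
    }
  }
};

// With the client from the question, these would be sent roughly as:
//   client.indices.create({ index: 'books', body: indexBody });
//   client.search({ index: 'books', from: 0, size: 10, body: searchBody });
console.log(JSON.stringify(indexBody, null, 2));
console.log(JSON.stringify(searchBody, null, 2));
```

Because this analyzer has no lowercase filter, matching stays case-sensitive: the stored term is 'Afoo barb' exactly as it appears in the database. Existing documents would need to be reindexed for the new mapping to take effect.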

8 Comments

'.*(foo|bar).*' does not work, since it returns the union of the results of '.*foo.*' and '.*bar.*'. I want the intersection instead, because I don't want to match the title 'Foo baz'.
Which tokenizer should I use to search against the entire original text?
@pidupuis You can use the keyword tokenizer for your purpose.
Thanks, but I can't get it to work. Based on what I read on forums, I tried adding it as a setting when creating the index, and also passing it directly as a parameter in a query_string, but neither works and I can't find any explicit documentation.
@pidupuis have you read this?