Ignore spaces in Elasticsearch

Question

For my search, I want the "space" character to be optional in a filter request.
For example:
when I filter on "THE ONE", I see the corresponding document.
I want to see it even if I write "THEONE".
This is how my query is built today:

boolQueryBuilder.must(QueryBuilders.boolQuery()
        .should(QueryBuilders.wildcardQuery("description", "*" + searchedWord.toLowerCase() + "*"))
        .should(QueryBuilders.wildcardQuery("id", "*" + searchedWord.toUpperCase() + "*"))
        .should(QueryBuilders.wildcardQuery("label", "*" + searchedWord.toUpperCase() + "*"))
        .minimumShouldMatch("1"));
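One way to see why "THEONE" finds nothing: a wildcard query is tested against each indexed term on its own, never against the original text with its spaces removed. The plain-Java model below is not the Elasticsearch API, and `WildcardModel` and its methods are hypothetical names. It assumes `label` is stored as a single unanalyzed term such as "THE ONE" (the upper-casing of `searchedWord` suggests that), and also tries an analyzed field whose terms are "the" and "one".

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Plain-Java model (hypothetical, not Elasticsearch code) of a wildcard query:
 * the pattern must match one whole indexed term.
 */
final class WildcardModel {

    /** Converts a wildcard pattern ('*' = any run of chars, '?' = one char) to a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    /** True if at least one indexed term matches the whole wildcard pattern. */
    static boolean matchesAnyTerm(List<String> indexedTerms, String wildcard) {
        Pattern p = toRegex(wildcard);
        return indexedTerms.stream().anyMatch(term -> p.matcher(term).matches());
    }

    public static void main(String[] args) {
        // Assumed keyword (unanalyzed) field: the whole value is one term.
        List<String> keywordTerms = List.of("THE ONE");
        System.out.println(matchesAnyTerm(keywordTerms, "*THE ONE*")); // true
        System.out.println(matchesAnyTerm(keywordTerms, "*THEONE*"));  // false: the term contains a space
        // Assumed analyzed field: "THE ONE" was indexed as two lowercase terms.
        List<String> analyzedTerms = List.of("the", "one");
        System.out.println(matchesAnyTerm(analyzedTerms, "*theone*")); // false: no single term holds both words
    }
}
```

Either way, no stored term contains "THEONE", so the index itself has to produce a joined term.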

What I want is to add this filter, taken from "Writing a space-ignoring autocompleter with ElasticSearch":

"word_joiner": {
  "type": "word_delimiter",
  "catenate_all": true
}

But I don't know how to do this using the API. Any idea?
Thanks!

EDIT: Following @raam86's suggestion, I added my own custom analyzer:

{
    "index": {
      "number_of_shards": 1,
      "analysis": {
        "filter": {
          "word_joiner": {
            "type": "word_delimiter",
            "catenate_all": true
          }
        },
        "analyzer": {
          "custom_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "word_joiner"
            ]
          }
        }
      }
    }
}
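A likely reason this analyzer has no effect: the standard tokenizer splits "THE ONE" at the space before `word_joiner` runs, and `word_delimiter` only splits and re-joins the parts of a single token. The simplified Java model below illustrates that per-token behaviour. `WordDelimiterModel` is a hypothetical name; the model only splits on non-alphanumeric characters (the real filter also splits on case changes and letter/digit boundaries) and does not reproduce the real filter's token order or positions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model (hypothetical, not Elasticsearch code) of a word_delimiter
 * filter with catenate_all=true. It processes ONE token at a time.
 */
final class WordDelimiterModel {

    /** Splits one token into alphanumeric parts and adds the joined form when there are several. */
    static List<String> filterToken(String token) {
        List<String> parts = new ArrayList<>();
        for (String part : token.split("[^\\p{Alnum}]+")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        List<String> out = new ArrayList<>(parts);
        if (parts.size() > 1) {
            out.add(String.join("", parts)); // catenate_all: glue the parts of THIS token only
        }
        return out;
    }

    /** Applies the filter to every token the tokenizer produced, independently of its neighbours. */
    static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.addAll(filterToken(token));
        }
        return out;
    }

    public static void main(String[] args) {
        // The standard tokenizer has already split "THE ONE" at the space:
        // each token has a single part, so nothing is catenated.
        System.out.println(filter(List.of("THE", "ONE"))); // [THE, ONE]
        // A single token such as "THE-ONE" (e.g. from a whitespace tokenizer) does get joined.
        System.out.println(filter(List.of("THE-ONE")));    // [THE, ONE, THEONE]
    }
}
```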

And here is the document:

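import java.io.Serializable;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName = "cake", type = "pa")
@Setting(settingPath = "/elasticsearch/config/settings.json")
public class PaElasticEntity implements Serializable {
    @Field(type = FieldType.String, analyzer = "custom_analyzer")
    private String maker;
}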

Still not working...

Nikita Klimov · Accepted Answer · 2017-09-26 12:13:24Z


You need a shingle token filter, which joins adjacent tokens. Here is a simple example.

1. Create the index with these settings:

PUT joinword
{
    "settings": {
        "analysis": {
            "filter": {
                "word_joiner": {
                    "type": "shingle",
                    "output_unigrams": "true",
                    "token_separator": ""
                }
            },
            "analyzer": {
                "word_join_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "word_joiner"
                    ]
                }
            }
        }
    }
}
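To make the filter's effect concrete, here is a small plain-Java model of `word_joiner` (not Elasticsearch code; `ShingleModel` is a hypothetical name). It assumes the shingle defaults `min_shingle_size` = 2 and `max_shingle_size` = 2, which these settings do not override, and always starts shingles at size 2.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Model (hypothetical) of a shingle filter with output_unigrams=true and
 * token_separator="" applied to already-lowercased tokens.
 */
final class ShingleModel {

    static List<String> shingle(List<String> tokens, int maxShingleSize) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i)); // output_unigrams=true: keep the original token
            StringBuilder joined = new StringBuilder(tokens.get(i));
            for (int size = 2; size <= maxShingleSize && i + size <= tokens.size(); size++) {
                joined.append(tokens.get(i + size - 1)); // token_separator="": no space between words
                out.add(joined.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tokens of "ONE TWO" after the standard tokenizer and the lowercase filter.
        System.out.println(shingle(List.of("one", "two"), 2)); // [one, onetwo, two]
    }
}
```

With three words and the default maximum of 2, only neighbouring pairs are joined, e.g. "the one two" gives the, theone, one, onetwo, two.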

2. Check that the analyzer works as expected:

GET joinword/_analyze?pretty
{
  "analyzer": "word_join_analyzer",
  "text": "ONE TWO"
}
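The same `_analyze` call can also be made from Java, which avoids shell quoting entirely. This is a sketch using the JDK 11+ `java.net.http` client; it assumes Elasticsearch is reachable at http://localhost:9200 and the `joinword` index exists, sends the body with POST (which `_analyze` also accepts), and uses a hypothetical `AnalyzeRequest` helper that does not escape quotes or backslashes in the text.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical helper that sends an _analyze request (assumes a local Elasticsearch). */
final class AnalyzeRequest {

    /** Builds the JSON body; assumes analyzer and text contain no quotes or backslashes. */
    static String body(String analyzer, String text) {
        return "{\"analyzer\": \"" + analyzer + "\", \"text\": \"" + text + "\"}";
    }

    static HttpRequest build(String baseUrl, String index, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_analyze?pretty"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(analyzer, text)))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpRequest request = build("http://localhost:9200", "joinword", "word_join_analyzer", "ONE TWO");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON list of the tokens produced by the analyzer
    }
}
```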

output:

{
  "tokens" : [ {
    "token" : "one",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "<ALPHANUM>",
    "position" : 0
  }, {
    "token" : "onetwo",
    "start_offset" : 0,
    "end_offset" : 7,
    "type" : "shingle",
    "position" : 0
  }, {
    "token" : "two",
    "start_offset" : 4,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}

So now you can find this document by searching for one, two, or onetwo. Because of the lowercase filter, match queries on this field are case-insensitive.
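This can be checked with a rough plain-Java model of the whole chain (hypothetical `JoinWordSearch` class, not Elasticsearch code). It assumes the standard tokenizer can be approximated by splitting on non-letter/non-digit characters, and models a `match` query as matching when any analyzed query term equals an indexed term, which is its default OR behaviour. Because the query text goes through the same analyzer, "ONEtWO" becomes "onetwo" and hits the shingle.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Rough model (hypothetical) of word_join_analyzer plus a default match query. */
final class JoinWordSearch {

    /** Approximate tokenizer, lowercase filter, then 2-word shingles with separator "". */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i)); // output_unigrams=true
            if (i + 1 < tokens.size()) {
                out.add(tokens.get(i) + tokens.get(i + 1)); // shingle with token_separator=""
            }
        }
        return out;
    }

    /** A document matches if any analyzed query term is among its indexed terms. */
    static boolean matches(String document, String query) {
        Set<String> indexed = new HashSet<>(analyze(document));
        return analyze(query).stream().anyMatch(indexed::contains);
    }

    public static void main(String[] args) {
        for (String q : List.of("one", "TWO", "ONEtWO", "onetwo", "three")) {
            System.out.println(q + " -> " + matches("ONE TWO", q));
        }
    }
}
```

The question's own case works the same way: "THE ONE" is indexed with the extra term "theone", so a query for "THEONE" matches.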

Working Spring example

Full project available on GitHub.

Entity:

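import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.Setting;

@Document(indexName = "document", type = "document", createIndex = false)
@Setting(settingPath = "elasticsearch/document_index_settings.json")
public class DocumentES {
    @Id
    private String id;

    // word_join_analyzer must be defined in the index settings loaded via @Setting
    @Field(type = FieldType.String, analyzer = "word_join_analyzer")
    private String title;

    public DocumentES() {
    }

    public DocumentES(String title) {
        this.title = title;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    @Override
    public String toString() {
        return "DocumentES{" +
                "id='" + id + '\'' +
                ", title='" + title + '\'' +
                '}';
    }
}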

Main:

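import static org.elasticsearch.index.query.QueryBuilders.matchQuery;

import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.data.elasticsearch.ElasticsearchProperties;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.query.IndexQueryBuilder;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

@SpringBootApplication
@EnableConfigurationProperties(value = {ElasticsearchProperties.class})
public class Application implements CommandLineRunner {
    @Autowired
    ElasticsearchTemplate elasticsearchTemplate;

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        // Create the index with the @Setting analysis settings, then put the field mapping
        elasticsearchTemplate.createIndex(DocumentES.class);
        elasticsearchTemplate.putMapping(DocumentES.class);

        elasticsearchTemplate.index(new IndexQueryBuilder()
                .withIndexName("document")
                .withType("document")
                .withObject(new DocumentES("ONE TWO")).build()
        );

        // Make the new document searchable now instead of sleeping and hoping it is ready
        elasticsearchTemplate.refresh(DocumentES.class);

        // Mixed case and no space: still matches "ONE TWO" through the "onetwo" shingle
        NativeSearchQuery query = new NativeSearchQueryBuilder()
                .withIndices("document")
                .withTypes("document")
                .withQuery(matchQuery("title", "ONEtWO"))
                .build();

        List<DocumentES> result = elasticsearchTemplate.queryForList(query, DocumentES.class);

        result.forEach(System.out::println);
    }
}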

answered Sep 25, 2017 at 22:27 by Nikita Klimov (edited Sep 26, 2017 at 12:13)


19 Comments

Anna Over a year ago

Thanks for the answer! It still doesn't work with this analyzer :( But I couldn't do step two... I just tried this query: http://localhost:9200/cake/_search?q=ONETWO and it doesn't give me any results. What tool do you use to perform the second step?

raam86 Over a year ago

@anna you can use curl, something like: curl -XGET "http://localhost:9200/cake/_analyze?pretty" -d '{ "analyzer": "word_join_analyzer", "text": "ONE TWO" }'

Anna Over a year ago

All right, so I executed this: curl -XGET "http://localhost:9200/cake/_analyze?analyzer=word_join_analyzer&pretty" -d 'ONE TWO' and I got the error: curl: (6) Could not resolve host: TWO'... Does it allow spaces? I tried using %20 instead, but the results were totally wrong.

Nikita Klimov Over a year ago

curl localhost:9200/joinword/_analyze?pretty -d '{"analyzer":"word_join_analyzer", "text": "ONE TWO"}'

Anna Over a year ago

Found the solution: escaping the space character as "\ ". Now I get the right output, the same one shown by @Nikita Klimov... So does this mean that this works?
