Due to a particular document production process, I have a bunch of documents with malformed words that have spaces inside them. These could be important words to search for, and for the moment I can't obtain the documents in another format, so I want to know if there is any way to index the documents and still find them using correctly formed words at query time. For example, I could have the word 'e ng i ne er' in an indexed document and I want to find it by typing 'engineer'. Do you know of ways to achieve that in Elasticsearch?
1 Answer
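I would start with the NGram tokenizer. It can be configured to treat only letters and digits as token characters, so the spaces act as separators and the remaining fragments can still match the n-grams produced from the correctly spelled query word.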
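
To see why this can work, here is a rough Python imitation of n-gram tokenization restricted to letters and digits. It is only a sketch, not Elasticsearch itself: the function name `ngram_tokens` is made up for illustration, and the gram sizes of 1 and 2 are assumed values that you would tune for your data.

```python
import re

def ngram_tokens(text, min_gram=1, max_gram=2):
    """Imitate an n-gram tokenizer that keeps only letters and digits."""
    tokens = []
    # Anything that is not a letter or digit (spaces included) splits the text.
    for chunk in re.findall(r"[A-Za-z0-9]+", text.lower()):
        for start in range(len(chunk)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(chunk):
                    tokens.append(chunk[start:start + size])
    return tokens

indexed = set(ngram_tokens("e ng i ne er"))  # the malformed word in the document
query = set(ngram_tokens("engineer"))        # the word typed at query time
shared = indexed & query                     # grams both sides have in common

print(sorted(shared))
```

Every gram produced from the broken word also appears among the grams of 'engineer', which is what lets the query match. The flip side is that short grams match many unrelated words too, so expect to need some relevance tuning.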
1 Comment
panchtox
Thanks Volodymyr! I went through the shingles path, mostly using this example with some modifications. But I understand that it is essentially a variation of your proposal.
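
The linked example is not reproduced here, so the following is only a guess at the general idea: shingles join adjacent tokens, and if they are joined with an empty separator, the fragments can be glued back into the original word. The sketch below imitates that in plain Python. The `shingles` helper and the sizes 2 to 5 are hypothetical, chosen just for this illustration.

```python
def shingles(tokens, min_size=2, max_size=5, separator=""):
    """Join runs of adjacent tokens, the way a shingle filter would."""
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(separator.join(tokens[start:start + size]))
    return out

tokens = "e ng i ne er".split()  # fragments after whitespace tokenization
result = shingles(tokens)

print(result)
```

With five fragments and a maximum size of 5, one of the shingles is the full word 'engineer', so a plain query for it can match. Words broken into more pieces than the maximum shingle size would not be reassembled.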