Due to a particular document production process, I have a bunch of documents containing malformed words with spaces inside them. These could be important words to search for, and for the moment I can't obtain the documents in another format. So I want to know if there is any way to index the documents so that they can still be found by typing correctly formed words at query time. For example, an indexed document may contain the word 'e ng i ne er', and I want to find it by typing 'engineer'. Do you know of any ways to achieve this in Elasticsearch?

1 Answer

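I would try starting with the NGram tokenizer. If you configure it to keep only letters and digits, it breaks the text into short character fragments. A query for the correctly spelled word can then still match fragments of the indexed word, even though the indexed version contains spaces.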
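To illustrate the idea without a running cluster, here is a rough Python simulation of what such a tokenizer produces. It assumes settings similar to an Elasticsearch `ngram` tokenizer with `token_chars: ["letter", "digit"]` and `min_gram`/`max_gram` of 2 and 3. Those values are my own choice for the example, and `ngram_tokenize` and `overlap` are hypothetical helpers, not Elasticsearch APIs.

```python
import re


def ngram_tokenize(text: str, min_gram: int = 2, max_gram: int = 3) -> set[str]:
    """Rough simulation of an n-gram tokenizer restricted to letters/digits.

    The text is split on every character that is not a letter or digit
    (similar in spirit to token_chars: ["letter", "digit"]), and each run
    is cut into character n-grams of length min_gram..max_gram.
    """
    grams: set[str] = set()
    # [^\W_]+ matches runs of word characters excluding underscore,
    # i.e. roughly "letters and digits".
    for run in re.findall(r"[^\W_]+", text.lower()):
        for n in range(min_gram, max_gram + 1):
            for i in range(len(run) - n + 1):
                grams.add(run[i:i + n])
    return grams


def overlap(query: str, document: str) -> float:
    """Fraction of the query's n-grams that also occur in the document."""
    q = ngram_tokenize(query)
    d = ngram_tokenize(document)
    return len(q & d) / len(q) if q else 0.0


if __name__ == "__main__":
    print(overlap("engineer", "e ng i ne er"))  # shares "ng", "ne", "er"
    print(overlap("engineer", "m an ag er"))    # shares only "er"
```

The match is partial and relevance-scored rather than exact. Single-letter fragments such as "e" and "i" produce no grams when `min_gram` is 2, so the broken word only shares some of its fragments with the query. It still outscores an unrelated broken word, but expect some noise.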

1 Comment

Thanks Volodymyr! I went down the shingles path, mostly following this example with some modifications, but I understand that it is essentially a variation of your proposal.
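The linked example isn't available here, so the following is only a sketch of the general shingle idea, not the configuration actually used. Adjacent whitespace tokens are glued together with an empty separator, so "e ng i ne er" also yields the token "engineer". In Elasticsearch this roughly corresponds to a `shingle` token filter with `token_separator` set to `""` and a `max_shingle_size` large enough to cover the longest broken word. Going beyond the default spread may require raising the `index.max_shingle_diff` index setting. `shingle_tokens` and `matches` are hypothetical helpers that simulate this in Python.

```python
def shingle_tokens(text: str, min_size: int = 2, max_size: int = 5) -> set[str]:
    """Simulate whitespace tokenizing plus a shingle filter that joins
    adjacent tokens with an empty separator (unigrams are kept too)."""
    tokens = text.lower().split()
    out = set(tokens)  # like output_unigrams: keep the single tokens
    for size in range(min_size, max_size + 1):
        for i in range(len(tokens) - size + 1):
            out.add("".join(tokens[i:i + size]))
    return out


def matches(query: str, document: str) -> bool:
    """True if the (lower-cased) query equals one of the document's tokens."""
    return query.lower() in shingle_tokens(document)


if __name__ == "__main__":
    print(matches("engineer", "e ng i ne er"))  # 5 fragments joined -> match
    print("engineer" in shingle_tokens("e ng i ne er", max_size=4))  # too short
```

The trade-off is index size and spurious joins. Every run of up to `max_size` tokens becomes a term, including joins across genuine word boundaries. On the other hand, an exact term match on the glued token is much more precise than n-gram overlap.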
