We recently discovered that, because we aren't sanitizing search terms as they come into our system, we occasionally get parsing exceptions in Elasticsearch when special characters such as / (forward slash) are used with "query_string". So we decided to switch to "simple_query_string". However, the two query types do not appear to use the same analyzers. I reviewed "When Analyzers Are Used" to see whether it indicates any difference between the simple and regular query string queries, but it does not, so I'm wondering whether this is a bug. For example:
"query_string": { "query": "sales", "fields": [ "title" ] }
will use the analyzer for the "title" field, which is our "en_analyzer" (see definition below), properly stem "sales" to "sale", and find the matching documents. Simply changing "query_string" to "simple_query_string" does not: we have to search for "sale" or add an analyzer to the query, like so:
"simple_query_string": { "query": "sales", "fields": [ "title" ], "analyzer": "en_analyzer" }
Of course, not all of our fields are analyzed the same way, so the default behavior described in the documentation I referenced above makes perfect sense and is what we want. Is this a bug, or does "simple_query_string" just not behave the same way with respect to field analysis at query time? We are using ES 1.7.2.
The relevant parts of our definition for "en_analyzer" are:
"en_analyzer": {
  "type": "custom",
  "tokenizer": "icu_tokenizer",
  "filter": [ "icu_normalizer", "en_stop_filter", "en_stem_filter", "icu_folding", "shingle_filter" ],
  "char_filter": [ "html_strip" ]
}
with:
"en_stop_filter": { "type": "stop", "stopwords": [ "_english_" ] },
"en_stem_filter": { "type": "stemmer", "name": "minimal_english" }
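To make the workaround concrete, here is a rough sketch of how the explicit-analyzer form could be generated per field, so that fields with different analyzers each get their own clause. This is only an illustration, not something we have confirmed as the intended usage: the helper names `simple_query` and `per_field_query` are made up, the "summary" field and its "standard" analyzer are hypothetical, and combining the clauses under bool/should is an assumption about how one might keep per-field analysis.

```python
import json


def simple_query(text, field, analyzer=None):
    """Build one simple_query_string clause for a single field.

    If an analyzer name is given, it is set explicitly on the clause,
    mirroring the workaround shown in the question.
    """
    clause = {"query": text, "fields": [field]}
    if analyzer is not None:
        clause["analyzer"] = analyzer
    return {"simple_query_string": clause}


def per_field_query(text, field_analyzers):
    """Combine one clause per field under bool/should.

    field_analyzers maps a field name to the analyzer name to force for it.
    """
    clauses = [simple_query(text, field, analyzer)
               for field, analyzer in field_analyzers.items()]
    return {"query": {"bool": {"should": clauses}}}


# "title" and "en_analyzer" come from the question; "summary" and
# "standard" are hypothetical placeholders for a differently analyzed field.
body = per_field_query("sales", {"title": "en_analyzer", "summary": "standard"})
print(json.dumps(body, indent=2))
```

The printed body could then be sent as a normal search request. It avoids relying on whatever default "simple_query_string" picks, at the cost of repeating the analyzer names that are already in the mapping.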