I have a keyword field that I would like to tokenize (split on commas), but it may also contain values with "+" characters. For example:
query_string.keywords = Living,Music,+concerts+and+live+bands,News,Portland
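To make the desired result concrete, here is a rough Python sketch of the tokens I am after. It only mimics the two steps (replace the pluses, then split on commas) and is not how Elasticsearch implements analysis. The function name `expected_tokens` and the whitespace trimming are my own assumptions for illustration, and it ignores any lowercasing the analyzer may apply.

```python
import re


def expected_tokens(raw, replacement=" "):
    """Hypothetical helper: approximate the tokens I want from the keyword field."""
    # Stand-in for the char_filter: replace every literal "+".
    # The plus is escaped because a bare "+" is a regex quantifier.
    cleaned = re.sub(r"\+", replacement, raw)
    # Stand-in for the tokenizer: split on runs of commas, like "([,]+)".
    parts = re.split(r",+", cleaned)
    # Assumption: trim surrounding whitespace and drop empty pieces.
    return [part.strip() for part in parts if part.strip()]


keywords = "Living,Music,+concerts+and+live+bands,News,Portland"
print(expected_tokens(keywords))
# ['Living', 'Music', 'concerts and live bands', 'News', 'Portland']
print(expected_tokens(keywords, replacement=""))
# ['Living', 'Music', 'concertsandlivebands', 'News', 'Portland']
```

Either output would be fine for my purposes; the first corresponds to replacing pluses with spaces, the second to replacing them with empty strings.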
When creating the index, the following settings do a nice job of splitting the keywords on commas:
{
  "settings": {
    "number_of_shards": 5,
    "analysis": {
      "analyzer": {
        "happy_tokens": {
          "type": "pattern",
          "pattern": "([,]+)"
        }
      }
    }
  },
  "mappings": {
    "post": {
      "properties": {
        "query_string.keywords": {
          "type": "string",
          "analyzer": "happy_tokens"
        }
      }
    }
  }
}
How can I add a char_filter (such as the one below) to this analyzer so that the "+" characters are replaced with spaces or empty strings?
"char_filter": {
  "kill_pluses": {
    "type": "pattern_replace",
    "pattern": "\\+",
    "replacement": ""
  }
}