I'm having trouble constructing an address search query in Elasticsearch.
The address is stored in ES with the following structure:
Address {
  street,
  city,
  zipcode
}
And here is a sample query:
GET /adr-address/_search
{
  "query": {
    "multi_match": {
      "query": "mainstreet, houston",
      "type": "most_fields",
      "fields": [ "street", "city", "zipcode" ]
    }
  }
}
The hits come back in this order:
"hits": [
  {
    "_source": {
      "id": "S6v4xyO8UE5NRcWtmMATPQ==",
      "street": "Houston 2nd Avenue",
      "zipcode": "8032",
      "city": "Houston"
    }
  },
  {
    "_source": {
      "id": "aLgQFrO8zCT8m88lAnYZPQ==",
      "street": "Houston 1st Avenue",
      "zipcode": "8044",
      "city": "Houston"
    }
  },
  {
    "_source": {
      "id": "aLgQFrO8zCT8m88lAnYZPQ==",
      "street": "mainstreet",
      "zipcode": "8044",
      "city": "Houston"
    }
  },
The multi_match query works fine most of the time, except when the street itself contains the city name. Elasticsearch ranks those documents higher, which is understandable but not acceptable here, because the exact "mainstreet" match ends up below them.
Here is the output of _validate/query with explain:
GET /adr-address/_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "mainstreet, houston",
      "type": "most_fields",
      "fields": [ "street", "city", "zipcode" ]
    }
  }
}
{
  "valid": true,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "explanations": [
    {
      "index": "adr-address",
      "valid": true,
      "explanation": "(zipcode:mainstreet zipcode:houston) (street:mainstreet street:houston) (city:mainstreet city:houston)"
    }
  ]
}
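To make the explanation string above easier to reason about, here is a minimal Python sketch of how the query appears to be expanded: every term is tried against every field. This is only a toy model. The `tokenize` function is an assumption (lowercase, split on non-alphanumerics) and not the analyzer configured on the index, the helper names are made up for illustration, and no relevance scoring is modelled at all.

```python
import re

# Field order copied from the explanation string in the post.
FIELDS = ["zipcode", "street", "city"]

def tokenize(text):
    """Toy analyzer (assumption): lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]

def expand(query, fields=FIELDS):
    """Return one (field, term) clause per field/term pair."""
    terms = tokenize(query)
    return [(field, term) for field in fields for term in terms]

def explain(query, fields=FIELDS):
    """Render the clauses in the same shape as the explanation string."""
    terms = tokenize(query)
    groups = (" ".join(f"{field}:{term}" for term in terms) for field in fields)
    return " ".join(f"({group})" for group in groups)

def matched_clauses(doc, query, fields=FIELDS):
    """List the clauses a document satisfies (no scoring involved)."""
    return [(f, t) for f, t in expand(query, fields) if t in tokenize(doc.get(f, ""))]

QUERY = "mainstreet, houston"

# Two of the hits from the post, reduced to the searched fields.
docs = [
    {"street": "Houston 2nd Avenue", "zipcode": "8032", "city": "Houston"},
    {"street": "mainstreet", "zipcode": "8044", "city": "Houston"},
]

print(explain(QUERY))
for doc in docs:
    print(doc["street"], "->", matched_clauses(doc, QUERY))
```

In this toy model both documents satisfy two clauses: "Houston 2nd Avenue" matches street:houston and city:houston, while "mainstreet" matches street:mainstreet and city:houston. Nothing in the expanded query ties "houston" to the city field only, so the final order is decided by the per-clause relevance scores that most_fields adds up, which this sketch does not reproduce.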
Note that the Google Maps API returns accurate results for the same query.
Assumptions and conditions so far:
- Token delimiters are: space, comma, numbers, etc.
- The input can contain a multi-word street name, zip code, or city, in any order
Any suggestions on how I could improve the search results?