
So I have an ES document whose structure I have no control over, so I cannot change the mappings. I have a location field (mapping type text) that is analyzed by ES. My documents look like this:

[
 {
  title: "Something that happened in the US",
  location: "United States, London"
 },
 {
  title: "Something that happened in the UK",
  location: "United Kingdom, London"
 }
]

I am trying to write a query that filters only on the location field and returns results that are either in the United States or in the United Kingdom, but not both.

{
  "query": {
    "match": {
      "location": { "query": "united states" }
    }
  }
}

This does not work because the word "united" is present in both location values. The field is analyzed, so the match query splits "united states" into separate terms that are combined with OR by default, and both documents are returned. I have tried adding "operator": "and" to the match query, but then it returns no results at all. What am I missing? Is there a way to achieve this with the match query?
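To see why both documents come back, here is a rough Python model of the match query; it is not Elasticsearch itself. It assumes the field uses something close to the standard analyzer, approximated here as lowercasing and splitting on non-alphanumeric characters, and the helper names `analyze` and `match` are hypothetical. With the default `or` operator a document matches if any query term is present, so the shared term `united` is enough. With `and`, every query term must be present, which also means that a single term that never occurs in the field (a misspelling, for example) makes the query match nothing.

```python
import re

def analyze(text: str) -> list[str]:
    # Rough stand-in for the standard analyzer (assumption): lowercase, then
    # split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def match(field_value: str, query: str, operator: str = "or") -> bool:
    # Models a match query on one text field: the query is analyzed into terms,
    # which are then combined with OR (the default) or AND.
    doc_terms = set(analyze(field_value))
    query_terms = analyze(query)
    if operator == "and":
        return all(term in doc_terms for term in query_terms)
    return any(term in doc_terms for term in query_terms)

locations = ["United States, London", "United Kingdom, London"]

for query, op in [("united states", "or"), ("united states", "and"), ("united statess", "and")]:
    hits = [loc for loc in locations if match(loc, query, op)]
    print(f"{query!r} ({op}): {hits}")
```

In this model `or` returns both locations, `and` returns only the United States one, and `and` with a misspelled term returns nothing.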

  • Can you please share your index mapping? And, based on the example given above, what is your expected search result? Commented Oct 20, 2021 at 2:02
  • The issue is that this is a private API of a company partner, and I only know that the location field is of type text and is analyzed. What I want is an ES query that returns only results from the United States or only results from the United Kingdom. Using the match query for this does not work because the word "united" is present in both location fields. Commented Oct 20, 2021 at 7:37

2 Answers


I understand that you are trying to create an XOR filter on the location field. Elasticsearch's bool query has no XOR shortcut, but XOR can be constructed from the OR, AND and NOT operators, which map to bool clauses like this:

must -> and
should -> or
must_not -> not

So there are two ways to construct a xor filter with these operators (pseudocode):

must( should(uk, us), must_not( must(uk, us) ) )

should( must( uk, must_not(us)), must(must_not(uk), us))
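Both constructions can be checked with a quick truth table. This is a plain Python sketch of the boolean logic only: `uk` and `us` stand for "the UK clause matches" and "the US clause matches". It assumes each `should` list sits in its own `bool` with no `must` or `filter` clause, so that at least one `should` clause has to match, which is Elasticsearch's default `minimum_should_match` in that situation.

```python
from itertools import product

def xor_form_1(uk: bool, us: bool) -> bool:
    # must( should(uk, us), must_not( must(uk, us) ) )
    return (uk or us) and not (uk and us)

def xor_form_2(uk: bool, us: bool) -> bool:
    # should( must(uk, must_not(us)), must(must_not(uk), us) )
    return (uk and not us) or (not uk and us)

# Print every combination: both forms are true exactly when one side matches.
for uk, us in product([False, True], repeat=2):
    print(uk, us, xor_form_1(uk, us), xor_form_2(uk, us))
```

The first form is usually easier to read as a bool query, because each country clause appears only twice and never inside a `must_not` on its own.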

There is also the more readable query_string query, which supports boolean syntax.

Below is an example of a bool and match query combination and an example of a query_string query; both act as an exclusive OR. You can test these queries in Kibana Dev Tools:

PUT /test_xor

PUT /test_xor/_doc/1
{
  "type": "neither uk nor us",
  "location": [
    {
      "title": "Something that happened in Germany",
      "location": "Germany, Berlin"
    },
    {
      "title": "Something that happened in France",
      "location": "France, Paris"
    }
  ]
}

PUT /test_xor/_doc/2
{
  "type": "only us",
  "location": [
    {
      "title": "Something that happened in Germany",
      "location": "Germany, Berlin"
    },
    {
      "title": "Something that happened in the US",
      "location": "United States, London"
    }
  ]
}

PUT /test_xor/_doc/3
{
  "type": "only uk",
  "location": [
    {
      "title": "Something that happened in Germany",
      "location": "Germany, Berlin"
    },
    {
      "title": "Something that happened in the UK",
      "location": "United Kingdom, London"
    }
  ]
}

PUT /test_xor/_doc/4
{
  "type": "uk and us",
  "location": [
    {
      "title": "Something that happened in the US",
      "location": "United States, London"
    },
    {
      "title": "Something that happened in the UK",
      "location": "United Kingdom, London"
    }
  ]
}

GET /test_xor/_search
{
  "query": {
    "query_string" : {
        "query": "(\"United States, London\" OR \"United Kingdom, London\") AND NOT (\"United States, London\" AND \"United Kingdom, London\")",
        "fields": ["location.location"]
    }
  }
}

GET /test_xor/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "location.location": {
                    "query": "United States, London",
                    "operator": "and"
                  }
                }
              },
              {
                "match": {
                  "location.location": {
                    "query": "United Kingdom, London",
                    "operator": "and"
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "bool": {
                  "must": [
                    {
                      "match": {
                        "location.location": {
                          "query": "United States, London",
                          "operator": "and"
                        }
                      }
                    },
                    {
                      "match": {
                        "location.location": {
                          "query": "United Kingdom, London",
                          "operator": "and"
                        }
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}
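As a sanity check of which test documents the bool query should return, here is a rough Python model; it is not Elasticsearch. It assumes the object array under `location` is flattened, as Elasticsearch does for objects that are not mapped as `nested`, so `location.location` behaves like one multi-valued text field whose terms are pooled per document. The analyzer is approximated by lowercasing and splitting on non-alphanumerics, the helper names are hypothetical, and the document contents follow the intent given by each document's `type` label.

```python
import re

def analyze(text: str) -> list[str]:
    # Approximation of the standard analyzer (assumption).
    return re.findall(r"[a-z0-9]+", text.lower())

def match_and(values: list[str], query: str) -> bool:
    # match with "operator": "and" on a flattened multi-valued field:
    # terms from all array entries are pooled before the check.
    terms = {t for v in values for t in analyze(v)}
    return all(t in terms for t in analyze(query))

US = "United States, London"
UK = "United Kingdom, London"

# location.location values per document id, as intended by each "type" label.
docs = {
    1: ["Germany, Berlin", "France, Paris"],  # neither uk nor us
    2: ["Germany, Berlin", US],               # only us
    3: ["Germany, Berlin", UK],               # only uk
    4: [US, UK],                              # uk and us
}

def xor_query(values: list[str]) -> bool:
    # must( should(us, uk), must_not( must(us, uk) ) )
    us = match_and(values, US)
    uk = match_and(values, UK)
    return (us or uk) and not (us and uk)

hits = [doc_id for doc_id, values in docs.items() if xor_query(values)]
print(hits)  # [2, 3]
```

Only the "only us" and "only uk" documents pass; document 4 is excluded because both `and` matches succeed on its pooled terms. One consequence of the pooling is that an `and` match can be satisfied by terms coming from different array entries, which the quoted phrases in the query_string version should avoid, thanks to the default position gap Elasticsearch inserts between array values.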

2 Comments

I'm not sure exactly what you understood, but what I want is to match the exact words that are given to the match query. So if I write {match: {location: "united states"}}, I want it to match all documents whose location field contains both the words "united" and "states". That does not work if I use match as in the example above. Essentially, when I search for "united kingdom" I want to be sure that no records from the United States are returned. No complicated XOR, just a match query that matches ALL the words provided to it.
Ah, OK. I understood it that way because you wrote "return results that are either united states or united kingdom but not both"; the "not both" part led me to think you wanted an XOR result. For your problem you can just set the operator in the match query to "and", as I did in the example: "match": { "location.location": { "query": "United Kingdom, London", "operator": "and" } }

After trying various things I came up with this solution, which seems to work, though I suspect there is a better way to achieve this:

Answering my own question: would this make sense? 🤔

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "location": "united"
          }
        },
        {
          "match": {
            "location": "states"
          }
        }
      ]
    }
  }
}
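This bool query requires every single-word match clause to succeed, which is the same condition as one match query with "operator": "and". A rough Python model (hypothetical helper names; the analyzer is approximated by lowercasing and splitting on non-alphanumerics, which is an assumption about the field) shows that the two agree, and also that neither of them cares about word order:

```python
import re

def analyze(text: str) -> list[str]:
    # Approximation of the standard analyzer (assumption).
    return re.findall(r"[a-z0-9]+", text.lower())

def bool_must_of_matches(field_value: str, words: list[str]) -> bool:
    # bool.must with one match clause per word: every clause has to match.
    terms = set(analyze(field_value))
    return all(any(t in terms for t in analyze(word)) for word in words)

def match_with_and(field_value: str, query: str) -> bool:
    # match with "operator": "and": every analyzed query term has to be present.
    terms = set(analyze(field_value))
    return all(t in terms for t in analyze(query))

for loc in ["United States, London", "United Kingdom, London", "London, States of the United"]:
    print(loc, bool_must_of_matches(loc, ["united", "states"]), match_with_and(loc, "united states"))
```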

[EDIT]: I have actually found a better solution which looks like this:

{
  "query": {
    "match_phrase": {
      "location": "united states"
    }
  }
}
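match_phrase is stricter than an `and` match: the terms must also appear next to each other and in the same order (with the default `slop` of 0). A rough Python model of that check, assuming the same simplified analyzer (lowercasing and splitting on non-alphanumerics) and using a hypothetical helper name:

```python
import re

def analyze(text: str) -> list[str]:
    # Approximation of the standard analyzer (assumption).
    return re.findall(r"[a-z0-9]+", text.lower())

def match_phrase(field_value: str, phrase: str) -> bool:
    # The analyzed phrase terms must appear consecutively and in the same order.
    tokens = analyze(field_value)
    wanted = analyze(phrase)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))

for loc in ["United States, London", "United Kingdom, London", "London, States of the United"]:
    print(loc, match_phrase(loc, "united states"))
```

Unlike the `and` match, the phrase check rejects a value that contains both words in a different order.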

2 Comments

Doesn't this return only matches with "united states" in the location field? You wouldn't get any with "united kingdom", right? I understood that you want XOR-like behavior.
Additionally, I later found out that this query returns duplicate results in some cases, and I still haven't figured out why this happens.
