
I'm trying to write an Elasticsearch query that allows filtering the result set. The application provides a filter for job titles and also an exclusion filter for the very same job titles. For example, in the data set below, I want to filter for Engineer but also exclude Software Engineer. The problem is that the query now also excludes Principal Software Engineer, and it shouldn't.

Here's the data I'm using:

{
  "data": [
    {
      "email": "[email protected]",
      "job_title": "Industrial Electrical Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Chief Revenue Officer"
    },
    {
      "email": "[email protected]",
      "job_title": "Principal Software Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Software Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Design Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Software Designer"
    },
    {
      "email": "[email protected]",
      "job_title": "Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Mechanical Design Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Electrical Engineer"
    },
    {
      "email": "[email protected]",
      "job_title": "Chief Executive Officer"
    }
  ]
}

And here is the Elasticsearch query:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "job_title": {
                    "query": "Engineer",
                    "operator": "and"
                  }
                }
              }
            ]
          }
        }
      ],
      "filter": [
        {
          "term": {
            "user_id": 1
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "match": {
                  "job_title": "Software Engineer"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Answer 1:

Assuming that job_title is a text field: Elasticsearch uses the standard analyzer for text fields when no analyzer is specified. It lowercases the text and splits it into words, so "Software Engineer" is broken into these tokens:

{
  "tokens": [
    {
      "token": "software",
      "start_offset": 0,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "engineer",
      "start_offset": 9,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

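A match query uses the OR operator by default, so a must_not clause containing a match query for "Software Engineer" excludes every document whose job_title contains either the token software or the token engineer. That removes "Principal Software Engineer", and in fact every title that contains "Engineer", so combined with the must clause the query returns no results.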
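
To make the OR semantics concrete, here is a simplified Python model of the question's query over the sample titles. It is not Elasticsearch code: standard_tokens is a hypothetical stand-in for the standard analyzer that only lowercases and splits on non-alphanumeric characters (enough for these plain ASCII titles), and match is a hypothetical helper modelling a match query's operator setting.

```python
import re

# Job titles from the question's sample data.
TITLES = [
    "Industrial Electrical Engineer",
    "Chief Revenue Officer",
    "Principal Software Engineer",
    "Software Engineer",
    "Engineer",
    "Design Engineer",
    "Software Designer",
    "Engineer",
    "Mechanical Design Engineer",
    "Electrical Engineer",
    "Chief Executive Officer",
]


def standard_tokens(text):
    """Approximates the standard analyzer for plain ASCII text:
    split on non-alphanumeric characters and lowercase each token."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def match(field_value, query, operator="or"):
    """Models a match query: with "or", any query token must be present;
    with "and", all query tokens must be present."""
    field_tokens = set(standard_tokens(field_value))
    check = all if operator == "and" else any
    return check(tok in field_tokens for tok in standard_tokens(query))


def run_original_query(titles):
    # must: match "Engineer" with operator "and" (a single token, so same as "or")
    # must_not: match "Software Engineer" with the default "or" operator
    return [
        t for t in titles
        if match(t, "Engineer", operator="and")
        and not match(t, "Software Engineer")
    ]


print(standard_tokens("Software Engineer"))  # → ['software', 'engineer']
print(run_original_query(TITLES))            # → []
```

Every title that passes the must clause contains the token engineer, so the must_not clause removes all of them.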


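If you have not explicitly defined a mapping, dynamic mapping creates job_title as a text field with a job_title.keyword sub-field of type keyword. A keyword field is not analyzed: the whole value is indexed as a single term. So in the must_not clause, use a term query on job_title.keyword instead of a match query on job_title (notice the ".keyword" after the job_title field); it excludes only documents whose title is exactly "Software Engineer".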

Modify your query as follows:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "job_title": {
                    "query": "Engineer",
                    "operator": "and"
                  }
                }
              }
            ]
          }
        }
      ],
      "filter": [
        {
          "bool": {
            "must_not": [
              {
                "term": {
                  "job_title.keyword": "Software Engineer"
                }
              }
            ]
          }
        }
      ]
    }
  }
}
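
A similarly simplified Python model of the corrected query, under the same assumptions (hypothetical helper names, a regex stand-in for the standard analyzer on ASCII titles): the must clause still works on analyzed tokens, while the must_not term query compares the excluded value with the whole, unmodified job_title.keyword value.

```python
import re

# Job titles from the question's sample data.
TITLES = [
    "Industrial Electrical Engineer",
    "Chief Revenue Officer",
    "Principal Software Engineer",
    "Software Engineer",
    "Engineer",
    "Design Engineer",
    "Software Designer",
    "Engineer",
    "Mechanical Design Engineer",
    "Electrical Engineer",
    "Chief Executive Officer",
]


def standard_tokens(text):
    """Approximates the standard analyzer for plain ASCII text."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def run_keyword_query(titles, excluded):
    # must: match "Engineer" against the analyzed text field
    # must_not: term query on job_title.keyword, i.e. an exact,
    # case-sensitive comparison with the whole original value
    return [
        t for t in titles
        if "engineer" in standard_tokens(t) and t != excluded
    ]


print(run_keyword_query(TITLES, "Software Engineer"))
# 'Principal Software Engineer' is kept; only 'Software Engineer' is dropped
print(len(run_keyword_query(TITLES, "software engineer")))  # → 8, nothing excluded
```

The second call shows that, because the comparison is exact, a lowercase value in the term query excludes nothing.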

Update 1:

If you are using Elasticsearch 7.10 or later and want the exact-term exclusion to be case-insensitive as well, you can set the case_insensitive parameter on the term query:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "job_title": {
                    "query": "Engineer",
                    "operator": "and"
                  }
                }
              }
            ]
          }
        }
      ],
      "filter": [
        {
          "bool": {
            "must_not": [
              {
                "term": {
                  "job_title.keyword": {
                    "value": "software engineer",
                    "case_insensitive": true
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

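Otherwise, if you are using a version earlier than 7.10, you need to add a lowercase normalizer to the keyword sub-field in your index mapping, as shown below, and then reindex the data. The normalizer is applied both when indexing and to the value of a term query on that field, so "software engineer" and "Software Engineer" match the same documents.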

{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "job_title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "normalizer": "my_normalizer"
          }
        }
      }
    }
  }
}
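
For these plain ASCII titles, both the case_insensitive parameter and a lowercase normalizer effectively compare the excluded value with the whole job title after lowercasing both sides. A Python sketch of that comparison, again with hypothetical helper names and a regex stand-in for the standard analyzer:

```python
import re

# Job titles from the question's sample data.
TITLES = [
    "Industrial Electrical Engineer",
    "Chief Revenue Officer",
    "Principal Software Engineer",
    "Software Engineer",
    "Engineer",
    "Design Engineer",
    "Software Designer",
    "Engineer",
    "Mechanical Design Engineer",
    "Electrical Engineer",
    "Chief Executive Officer",
]


def standard_tokens(text):
    """Approximates the standard analyzer for plain ASCII text."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def lowercase_normalizer(value):
    # Models a custom normalizer whose only filter is "lowercase"
    return value.lower()


def run_case_insensitive_query(titles, excluded):
    # must: match "Engineer" against the analyzed text field
    # must_not: exact comparison of the whole value, after lowercasing both sides
    excluded_norm = lowercase_normalizer(excluded)
    return [
        t for t in titles
        if "engineer" in standard_tokens(t)
        and lowercase_normalizer(t) != excluded_norm
    ]


for value in ("Software Engineer", "software engineer", "SOFTWARE ENGINEER"):
    result = run_case_insensitive_query(TITLES, value)
    # each spelling keeps 7 titles and drops "Software Engineer"
    print(value, "->", len(result), "Software Engineer" in result)
```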

Comments

Thank you for your response. I had tried this version with .keyword before but wasn't getting any results. After your post I tried again, this time writing Software Engineer instead of software engineer as I had been doing before, and it worked.
@Cosmin glad this worked for you :-) Can you please accept the answer as well :-)
@Cosmin it didn't work with lowercase letters because a term query on a keyword field looks for an exact match; the value is not analyzed (and therefore not lowercased) the way the standard analyzer processes a text field.
Can this be adjusted somehow to be case insensitive? I'm sure users will always type in lowercase.
@Cosmin yes, you can do that, but you need to change your index mapping. Can you please tell me which version of Elasticsearch you are using?
Answer 2:

You can use a match_phrase query in your must_not clause to exclude only the exact phrase 'Software Engineer'.

Comments

I tried that already. The problem with match_phrase is that if I want to exclude Software Engineer it also excludes Principal Software Engineer.
I think the behavior you describe should happen only for 'phrase_prefix' @Cosmin
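
For reference, a match_phrase query matches when the query's tokens appear next to each other, in order, anywhere in the field, which is why it also excludes "Principal Software Engineer". A simplified Python model of that check (default slop of 0; match_phrase and standard_tokens are hypothetical helpers, the latter a regex stand-in for the standard analyzer on ASCII text):

```python
import re


def standard_tokens(text):
    """Approximates the standard analyzer for plain ASCII text."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def match_phrase(field_value, query):
    """True if the query's tokens occur consecutively and in order
    anywhere in the field's tokens (models match_phrase with slop 0)."""
    field = standard_tokens(field_value)
    phrase = standard_tokens(query)
    n = len(phrase)
    return any(field[i:i + n] == phrase for i in range(len(field) - n + 1))


for title in ("Software Engineer", "Principal Software Engineer", "Software Designer"):
    print(title, match_phrase(title, "Software Engineer"))
# Software Engineer True
# Principal Software Engineer True
# Software Designer False
```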
