matching multiple terms using match_phrase - Elasticsearch

Question

I'm trying to fetch two documents that match the searched terms. Searching for each document separately works fine.

The query:


{
   "query":{
      "bool":{
         "should":[
            {
               "match_phrase":{
                  "email":"elpaso"
               }
            },
            {
               "match_phrase":{
                  "email":"walker"
               }
            }
         ]
      }
   }
}

I'm expecting to retrieve both documents that have these words in their email field, but the query only returns the first one, elpaso.

Is this an issue related to index mapping? I'm using type text for this field.

Any concept I am missing?

Index mapping:

{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      },
      "email": {
        "type": "text"
      }
    }
  }
}

Sample data:

{
   "id":"4a43f351-7b62-42f2-9b32-9832465d271f",
   "name":"Walker, Gary (Mr.) .",
   "email":"[email protected]"
}


{
   "id":"1fc18c05-da40-4607-a901-3d78c523cea6",
   "name":"Texas Chiropractic Association P.A.C.",
   "email":"[email protected]"
}

{
   "id":"9a2323f4-e008-45f0-9f7f-11a1f4439042",
   "name":"El Paso Energy Corp. PAC",
   "email":"[email protected]"
}

I also noticed that if I use elpaso and txchiro instead of walker, the query works as expected!

I noticed that the issue happens only when I search for part of the field. If I search by the exact, entire email address, everything works fine.

Is this expected from match_phrase?

Bhavya · Accepted Answer · 2021-04-01 16:14:18Z


You are not getting any result for walker because Elasticsearch uses the standard analyzer when no analyzer is specified. It tokenizes [email protected] as follows:

GET /_analyze
{
  "analyzer" : "standard",
  "text" : "[email protected]"
}

The following tokens will be generated:

{
  "tokens": [
    {
      "token": "walkergrym",
      "start_offset": 0,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "mail.com",
      "start_offset": 11,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

Since there is no walker token, "[email protected]" does not appear in your search result.

By contrast, the tokens generated for "[email protected]" are txchiro and mail.com, and for "[email protected]" they are elpaso and mail.com. Those searches succeed because the searched term is a whole token.
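The effect can be reproduced outside Elasticsearch with a minimal Python sketch. The token lists are copied from the analyzer output described in this answer; the document labels and the `matches` helper are hypothetical, and the helper only mimics a whole-token comparison, not real Lucene matching or scoring.

```python
# Tokens the standard analyzer produced for each email, as quoted in this answer.
# The dictionary keys are made-up labels, not real document ids.
indexed_tokens = {
    "walker_doc": ["walkergrym", "mail.com"],
    "txchiro_doc": ["txchiro", "mail.com"],
    "elpaso_doc": ["elpaso", "mail.com"],
}

def matches(term, tokens):
    """Hypothetical helper: a single-term query matches only a whole token."""
    return term.lower() in tokens

for term in ("elpaso", "txchiro", "walker"):
    hits = [doc for doc, tokens in indexed_tokens.items() if matches(term, tokens)]
    print(term, "->", hits)
```

Only walker comes back empty, because it is a prefix of the token walkergrym rather than a token of its own.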

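You can use the edge_ngram tokenizer to achieve the required result. It indexes prefixes of each word (3 to 6 letters long in the settings below), so walker becomes a token of its own.

Here is a working example with the index mapping, search query, and search result: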

Index Mapping:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 6,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "email": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      }
    }
  }
}
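To get a feel for which tokens this mapping would index, here is a rough Python approximation of the tokenizer settings above (min_gram 3, max_gram 6, token_chars letter). The function name `edge_ngrams` and the sample address are made up for illustration, the regex only covers ASCII letters, and the `_analyze` API remains the authoritative way to check real output.

```python
import re

def edge_ngrams(text, min_gram=3, max_gram=6):
    """Approximate an edge_ngram tokenizer restricted to letters.

    Assumption: split on any character that is not an ASCII letter, then
    emit prefixes of each word from min_gram up to max_gram characters.
    No lowercasing is applied, since the mapping defines no lowercase filter.
    """
    grams = []
    for word in re.findall(r"[A-Za-z]+", text):
        # Words shorter than min_gram yield an empty range and no tokens.
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:size])
    return grams

# Hypothetical address, not taken from the sample data.
print(edge_ngrams("walkerdemo@example.com"))
```

The printed list includes walker as a separate token, which is what lets a search for walker find the document.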

Search Query (using match instead of match_phrase):

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "email": "elpaso"
          }
        },
        {
          "match": {
            "email": "walker"
          }
        }
      ]
    }
  }
}
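When the list of search terms varies, the same request body can be generated instead of written by hand. This is a minimal sketch using only the Python standard library; `build_should_query` is a hypothetical helper, and sending the body to a cluster is left out.

```python
import json

def build_should_query(field, terms):
    """Build a bool/should query with one match clause per term."""
    return {
        "query": {
            "bool": {
                "should": [{"match": {field: term}} for term in terms]
            }
        }
    }

body = build_should_query("email", ["elpaso", "walker"])
print(json.dumps(body, indent=2))
```

A bool query that contains only should clauses requires at least one of them to match, so a document containing either term is returned.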

Search Result:

"hits": [
      {
        "_index": "66907434",
        "_type": "_doc",
        "_id": "1",
        "_score": 3.9233165,
        "_source": {
          "id": "4a43f351-7b62-42f2-9b32-9832465d271f",
          "name": "Walker, Gary (Mr.) .",
          "email": "[email protected]"
        }
      },
      {
        "_index": "66907434",
        "_type": "_doc",
        "_id": "3",
        "_score": 3.9233165,
        "_source": {
          "id": "9a2323f4-e008-45f0-9f7f-11a1f4439042",
          "name": "El Paso Energy Corp. PAC",
          "email": "[email protected]"
        }
      }
    ]

edited Apr 1, 2021 at 16:14

answered Apr 1, 2021 at 15:34

Bhavya


5 Comments

queroga_vqz Over a year ago

Yeah, it's weird. I've added my index mapping to the question. Do you think everything is all right with it?

Bhavya Over a year ago

@queroga_vqz there is no issue with the index mapping. Can you please share your sample index data?

queroga_vqz Over a year ago

Thank you for your time @ESCoder. I've added the sample data, and also another working case.

Bhavya Over a year ago

@queroga_vqz please go through the updated answer and let me know if this resolves your issue.

queroga_vqz Over a year ago

Perfect. I didn't know how ES worked on top of the standard tokenizer. Thanks a lot for the lesson!
