I'm trying to fetch two documents that match the search parameters with a single query. Searching for each document separately works fine.

The query:


{
   "query":{
      "bool":{
         "should":[
            {
               "match_phrase":{
                  "email":"elpaso"
               }
            },
            {
               "match_phrase":{
                  "email":"walker"
               }
            }
         ]
      }
   }
}

I'm expecting to retrieve both documents that have these words in their email field, but the query only returns the first one (elpaso).

Is this an issue related to index mapping? I'm using type text for this field.

Is there a concept I am missing?

Index mapping:

{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      },
      "email": {
        "type": "text"
      }
    }
  }
}

Sample data:

{
   "id":"4a43f351-7b62-42f2-9b32-9832465d271f",
   "name":"Walker, Gary (Mr.) .",
   "email":"walkergrym@mail.com"
}

{
   "id":"1fc18c05-da40-4607-a901-3d78c523cea6",
   "name":"Texas Chiropractic Association P.A.C.",
   "email":"txchiro@mail.com"
}

{
   "id":"9a2323f4-e008-45f0-9f7f-11a1f4439042",
   "name":"El Paso Energy Corp. PAC",
   "email":"elpaso@mail.com"
}

I also noticed that if I use elpaso and txchiro instead of walker the query works as expected!

I also noticed that the issue only happens when I search for part of the field. If I search for the entire email address, everything works fine.

Is this expected from match_phrase?

1 Answer

You are not getting any result for walker because Elasticsearch uses the standard analyzer when no analyzer is specified. You can check how it tokenizes walkergrym@mail.com with the analyze API:

GET /_analyze
{
  "analyzer" : "standard",
  "text" : "walkergrym@mail.com"
}

The following tokens will be generated:

{
  "tokens": [
    {
      "token": "walkergrym",
      "start_offset": 0,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "mail.com",
      "start_offset": 11,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

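Since there is no token for walker, the document with "walkergrym@mail.com" does not appear in your search result. A single-word match_phrase has to match a whole token, and walker is only a prefix of the token walkergrym.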

For "txchiro@mail.com", on the other hand, the generated tokens are txchiro and mail.com, and for "elpaso@mail.com" they are elpaso and mail.com, so both of those searches match.
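To make the token comparison concrete, here is a small Python sketch that imitates what the standard analyzer does to these three addresses. It is only a rough approximation (a regex, not the real Unicode segmentation rules), the helper names `approx_standard_tokens` and `search` are made up for illustration, and the addresses are assumed from the tokens quoted above.

```python
import re

def approx_standard_tokens(text):
    # Lowercase, then keep runs of letters/digits; a dot between two such runs
    # does not split them, so "mail.com" stays one token while "@" separates.
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())

# Assumed addresses, reconstructed from the tokens quoted in the answer.
EMAILS = ["walkergrym@mail.com", "txchiro@mail.com", "elpaso@mail.com"]
INDEX = {email: approx_standard_tokens(email) for email in EMAILS}

def search(term):
    # A one-word match_phrase only hits when the word equals a whole indexed token.
    return [email for email, tokens in INDEX.items() if term in tokens]

for term in ("elpaso", "txchiro", "walker"):
    print(term, "->", search(term))
```

In this sketch only walker comes back empty, because the indexed token is walkergrym and nothing shorter.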

You can use the edge_ngram tokenizer to get the result you want.

Here is a working example with the index mapping, search query, and search result, using the sample data from the question.

Index Mapping:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 6,
          "token_chars": [
            "letter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "email": {
        "type": "text",
        "analyzer": "my_analyzer"
      },
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      }
    }
  }
}
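As a rough illustration of what this tokenizer configuration produces, the following Python sketch emits the prefixes an edge_ngram tokenizer with min_gram 3, max_gram 6 and token_chars letter would be expected to generate. The function name `edge_ngrams` and the sample address are assumptions for illustration; the authoritative check is still the analyze API against the real index.

```python
import re

def edge_ngrams(text, min_gram=3, max_gram=6):
    """Approximate edge_ngram with token_chars=["letter"]:
    split on anything that is not a letter, then emit prefixes of each word."""
    grams = []
    for word in re.findall(r"[^\W\d_]+", text):
        # Words shorter than min_gram yield nothing; prefixes stop at max_gram.
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:size])
    return grams

# Assumed sample address, reconstructed from the tokens in the answer.
print(edge_ngrams("walkergrym@mail.com"))
```

With these settings walker becomes one of the indexed tokens, which is why a search for it can now match. Note that the analyzer as defined has no lowercase filter, so matching should be case-sensitive.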

Search Query:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "email": "elpaso"
          }
        },
        {
          "match": {
            "email": "walker"
          }
        }
      ]
    }
  }
}
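One detail worth keeping in mind: since the mapping sets no separate search analyzer, the query text should go through the same edge n-gram analyzer as the indexed field. The Python sketch below imitates that behaviour for the two should clauses. It is a simplified model (set overlap instead of real scoring), the helper names are hypothetical, and the document IDs and addresses are assumed from the sample data and search result.

```python
import re

def edge_ngrams(text, min_gram=3, max_gram=6):
    # Same approximation of the tokenizer config: letter runs, prefixes of 3..6 chars.
    grams = set()
    for word in re.findall(r"[^\W\d_]+", text):
        for size in range(min_gram, min(max_gram, len(word)) + 1):
            grams.add(word[:size])
    return grams

# Assumed IDs and addresses (IDs 1 and 3 appear in the search result; 2 is a guess).
DOCS = {"1": "walkergrym@mail.com", "2": "txchiro@mail.com", "3": "elpaso@mail.com"}
INDEX = {doc_id: edge_ngrams(email) for doc_id, email in DOCS.items()}

def should_match(queries):
    # bool/should over match clauses: a document hits if any clause shares a gram with it.
    return [doc_id for doc_id, grams in INDEX.items()
            if any(edge_ngrams(query) & grams for query in queries)]

print(should_match(["elpaso", "walker"]))
print(should_match(["walter"]))
```

In this model the second call also returns the walker document, because walter and walker share the gram wal. If that kind of loose match is unwanted, a separate `search_analyzer` on the field is one option to look into.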

Search Result:

"hits": [
  {
    "_index": "66907434",
    "_type": "_doc",
    "_id": "1",
    "_score": 3.9233165,
    "_source": {
      "id": "4a43f351-7b62-42f2-9b32-9832465d271f",
      "name": "Walker, Gary (Mr.) .",
      "email": "walkergrym@mail.com"
    }
  },
  {
    "_index": "66907434",
    "_type": "_doc",
    "_id": "3",
    "_score": 3.9233165,
    "_source": {
      "id": "9a2323f4-e008-45f0-9f7f-11a1f4439042",
      "name": "El Paso Energy Corp. PAC",
      "email": "elpaso@mail.com"
    }
  }
]

5 Comments

yeah, it's weird. I've added my index mapping to the question. Do you think everything is all right with it?
@queroga_vqz there is no issue with the index mapping. Can you please share your sample index data?
Thank you for your time @ESCoder. I added the sample data, and also another working case.
@queroga_vqz please go through the updated answer and let me know if this resolves your issue.
Perfect. I didn't know how ES worked on top of the standard tokenizer. Thanks a lot for the lesson!
