When I run a simple search query on an email address, it returns nothing unless I remove everything from the "@" onward. Why?

I want to run fuzzy and autocomplete queries on these email addresses.

ELASTICSEARCH INFO :

{
  "name" : "ZZZ",
  "cluster_name" : "YYY",
  "cluster_uuid" : "XXX",
  "version" : {
    "number" : "6.5.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "WWW",
    "build_date" : "2018-11-29T23:58:20.891072Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

MAPPING :

PUT users
{
  "mappings":
  {
    "_doc": { "properties": { "mail": { "type": "text" } } }
  }
}

ALL DATA :

[
    { "mail": "[email protected]" },
    { "mail": "[email protected]" }
]

WORKING QUERY :

The term query matches when I search for "firstname.lastname", but the stored value is "[email protected]", not "firstname.lastname".

QUERY :
GET users/_search
{ "query": { "term": { "mail": "firstname.lastname" } }}

RETURN :
{
  "took": 7,
  "timed_out": false,
  "_shards": { "total": 6, "successful": 6, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 4.336203,
    "hits": [
      {
        "_index": "users",
        "_type": "_doc",
        "_id": "H1dQ4WgBypYasGfnnXXI",
        "_score": 4.336203,
        "_source": {
          "mail": "[email protected]"
        }
      }
    ]
  }
}

FAILING QUERY :

QUERY :
GET users/_search
{ "query": { "term": { "mail": "firstname.lastname@company.com" } }}

RETURN :
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 6, "successful": 6, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

SOLUTION :

Change the mapping so the mail field uses a custom analyzer built on the uax_url_email tokenizer, then reindex, because the analyzer of an existing field cannot be changed in place.

PUT users
{
  "settings":
  {
    "index": { "analysis": { "analyzer": { "mail": { "tokenizer":"uax_url_email" } } } }
  },
  "mappings":
  {
    "_doc": { "properties": { "mail": { "type": "text", "analyzer":"mail" } } }
  }
}
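
A minimal sketch of the reindex step, assuming the new mapping was created under a separate index name. Here users_v2 is a hypothetical name, and the address in the term query is only illustrative:

```
# Copy every document from the old index into the new one
# (users_v2 is a hypothetical index created with the mapping above)
POST _reindex
{
  "source": { "index": "users" },
  "dest": { "index": "users_v2" }
}

# The full address should now be a single term in the inverted index
GET users_v2/_search
{ "query": { "term": { "mail": "firstname.lastname@company.com" } } }
```

Note that this analyzer defines only a tokenizer and no lowercase filter, so the term query is case-sensitive. Adding "filter": ["lowercase"] to the analyzer would normalize case at index time, in which case the term value must be lowercase as well.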

1 Answer

If you don't specify an analyzer for an indexed text field, it uses the standard analyzer, whose standard tokenizer splits on the @ symbol (I don't have a source for this, but the _analyze output below demonstrates it).

If you use a term query rather than a match query, the exact term you supply is looked up in the inverted index without being analyzed first (see "elasticsearch match vs term query").

These are the tokens stored in the inverted index for your value:

GET users/_analyze
{
  "text": "firstname.lastname@company.com"
}

{
  "tokens": [
    {
      "token": "firstname.lastname",
      "start_offset": 0,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "company.com",
      "start_offset": 19,
      "end_offset": 30,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

To resolve this, you could specify your own analyzer for the mail field, or you could use the match query, which analyzes the search text the same way it analyzes the indexed text.

GET users/_search
{
  "query": {
    "match": {
      "mail": "firstname.lastname@company.com"
    }
  }
}
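
For the other option, a custom analyzer, it can help to check how a tokenizer treats an address before changing the mapping. A sketch using the _analyze API with an explicit tokenizer, assuming the built-in uax_url_email tokenizer and the same example address:

```
# Built-in tokenizer, so no index or custom analyzer is needed
GET _analyze
{
  "tokenizer": "uax_url_email",
  "text": "firstname.lastname@company.com"
}
```

This should return the whole address as one token rather than splitting it at the @, so a term query for the full address can match a field analyzed this way.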

3 Comments

I cannot use match because the mail must match exactly, but I will look for more information about tokenizers. Thank you very much.
You've probably figured it out already but the uax_url_email tokenizer might be what you want elastic.co/guide/en/elasticsearch/reference/current/…
Yes it is, I will add the solution to my post, thank you again
