ElasticSearch - Unable to filter on an array of strings

Question

I have the following model class:

public class NewsItem
{
   public String Language  { get; set; }
   public DateTime DateUpdated  { get; set; }
   public List<String> Tags { get; set; }
}

I index it with NEST using the automapping, resulting in the mapping below:

{
  "search": {
    "mappings": {
      "news": {
        "properties": {
          "dateUpdated": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "language": {
            "type": "string"
          },
          "tags": {
            "type": "string"
          }
        }
      }
    }
  }
}

I then run a query on language which works fine:

{
  "query": {
    "constant_score": {
      "filter": [
        {
          "terms": {
            "language": [
              "en"
            ]
          }
        }
      ]
    }
  },
  "sort": {
    "dateUpdated": {
      "order": "desc"
    }
  }
}

But running the same query on the tags property doesn't work. Are there any special tricks for querying an array field? I have read the docs again and again, and I don't understand why this query returns no results:

{
  "query": {
    "constant_score": {
      "filter": [
        {
          "terms": {
            "tags": [
              "Hillary"
            ]
          }
        }
      ]
    }
  },
  "sort": {
    "dateUpdated": {
      "order": "desc"
    }
  }
}

The document returned from another query:

{
  "_index": "search",
  "_type": "news",
  "_score": 0.12265198,
  "_source": {
    "tags": [
      "Hillary"
    ],
    "language": "en",
    "dateUpdated": "2016-11-07T15:41:00Z"
  }
}

Val · Accepted Answer · 2016-11-09 10:02:18Z

2

Your tags field is analyzed, hence Hillary has been indexed as hillary. So you have two ways out:

A. Use a match query instead (since the terms query does not analyze the token):

{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {              <--- use match here
            "tags": "Hillary"
          }
        }
      ]
    }
  },
  "sort": {
    "dateUpdated": {
      "order": "desc"
    }
  }
}

B. Keep the terms query but lowercase the token:

{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "tags": [
              "hillary"           <--- lowercase here
            ]
          }
        }
      ]
    }
  },
  "sort": {
    "dateUpdated": {
      "order": "desc"
    }
  }
}

2 Comments

Christer Nordvik Over a year ago

I rewrote this before posting it here; the actual tag was more like "election-2016". You put me on the right track: searching for "2016" worked, so I need to stop this tag field from being analyzed, as I only want exact matches like "election-2016". Thanks!

Val Over a year ago

Then yes, in that case, you should make your field not_analyzed instead and use a terms query.
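
For reference, a minimal sketch of what such a mapping might look like. It assumes Elasticsearch 2.x (inferred from the "string" type in the question's mapping) and reuses the news type and tags field from the question. The index setting of an existing field cannot be changed in place, so this would go into the body used to create a fresh index, followed by a reindex:

```json
{
  "mappings": {
    "news": {
      "properties": {
        "tags": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
```

With this in place, a terms query on tags has to supply the exact stored value, including case, for example "election-2016". On Elasticsearch 5.x and later the equivalent would be a field of type keyword.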

rajat · Answer · 2016-11-09 10:05:06Z

1

By default, Elasticsearch runs an analyzer on all string fields, whereas the terms filter computes an exact match. This means ES has indexed 'Hillary' as 'hillary' while you are querying for 'Hillary'. There are two ways to fix this: either use a match query instead of a terms query, or skip automapping and create the index yourself, with the tags field analyzed the way you want. You can also query for 'hillary', but that only solves this one case: if a tag were something like 'us elections', 'us' and 'elections' would be indexed as separate tokens.
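
One way to check how a given tag is tokenized is the _analyze API. The sketch below assumes Elasticsearch 2.x and the index name search from the question; the body would be sent as a GET request to /search/_analyze, and the sample text is only an illustration:

```json
{
  "field": "tags",
  "text": "election-2016"
}
```

With the default analyzer on the tags field, the response should list the lowercased tokens election and 2016 separately, which is why a terms query for the full tag finds nothing.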
