
I have the following model class:

using System;
using System.Collections.Generic;

public class NewsItem
{
   public String Language { get; set; }
   public DateTime DateUpdated { get; set; }
   public List<String> Tags { get; set; }
}

I index it with NEST using automapping, which produces the mapping below:

{
  "search": {
    "mappings": {
      "news": {
        "properties": {
          "dateUpdated": {
            "type": "date",
            "format": "strict_date_optional_time||epoch_millis"
          },
          "language": {
            "type": "string"
          },
          "tags": {
            "type": "string"
          }
        }
      }
    }
  }
}

I then run a query on language, which works fine:

{
  "query": {
    "constant_score": {
      "filter": [
        {
          "terms": {
            "language": [
              "en"
            ]
          }
        }
      ]
    }
  },
  "sort": {
    "dateUpdated": {
      "order": "desc"
    }
  }
}

But running the same query on the tags property doesn't work. Are there any special tricks to querying an array field? I have read the docs again and again, and I don't understand why this query returns no results:

{
  "query": {
    "constant_score": {
      "filter": [
        {
          "terms": {
            "tags": [
              "Hillary"
            ]
          }
        }
      ]
    }
  },
  "sort": {
    "dateUpdated": {
      "order": "desc"
    }
  }
}

Here is the document, as returned by another query:

{
  "_index": "search",
  "_type": "news",
  "_score": 0.12265198,
  "_source": {
    "tags": [
      "Hillary"
    ],
    "language": "en",
    "dateUpdated": "2016-11-07T15:41:00Z"
  }
}

Answer 1

Your tags field is analyzed, so "Hillary" was indexed as the lowercase term "hillary", while a terms query looks for exactly the term you pass it. So you have two ways out:

A. Use a match query instead, since it analyzes the query text the same way the field was analyzed (the terms query does not analyze its input):

{
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "tags": "Hillary"
          }
        }
      ]
    }
  },
  "sort": {
    "dateUpdated": {
      "order": "desc"
    }
  }
}

B. Keep the terms query but lowercase the token:

{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "tags": [
              "hillary"
            ]
          }
        }
      ]
    }
  },
  "sort": {
    "dateUpdated": {
      "order": "desc"
    }
  }
}
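To see why both options work, here is a toy Python model of the inverted index behind the tags field. It is a sketch, not Elasticsearch code: standard_analyze is a hypothetical stand-in that approximates the standard analyzer by splitting on non-word characters and lowercasing, and terms_query / match_query are hypothetical helpers that mimic how each query treats its input.

```python
import re

def standard_analyze(text):
    # Hypothetical approximation of Elasticsearch's standard analyzer:
    # split on non-word characters, then lowercase each token.
    return [tok.lower() for tok in re.findall(r"\w+", text)]

# Index the document from the question into an inverted index
# (term -> set of document ids), analyzing each tag first.
docs = {1: {"tags": ["Hillary"], "language": "en"}}
inverted = {}
for doc_id, doc in docs.items():
    for tag in doc["tags"]:
        for token in standard_analyze(tag):
            inverted.setdefault(token, set()).add(doc_id)

def terms_query(values):
    # terms: each value is looked up verbatim, with no analysis.
    hits = set()
    for value in values:
        hits |= inverted.get(value, set())
    return hits

def match_query(text):
    # match: the query text goes through the same analyzer first
    # (tokens are ORed, like match's default operator).
    hits = set()
    for token in standard_analyze(text):
        hits |= inverted.get(token, set())
    return hits

print(sorted(inverted))            # ['hillary']
print(terms_query(["Hillary"]))    # set()
print(terms_query(["hillary"]))    # {1}
print(match_query("Hillary"))      # {1}
```

The only difference between the two helpers is whether the query text is analyzed before the lookup, which is the difference between option A and the original terms query; option B simply does the lowercasing by hand.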
Comments

I simplified this before posting it here; the actual tag was more like "election-2016". You put me on the right track: searching for "2016" worked, so I need to stop the tags field from being analyzed, because I only want exact matches like "election-2016". Thanks!
Then yes, in that case, you should make your field not_analyzed instead and use a terms query.
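The same kind of toy model shows what not_analyzed changes. In an Elasticsearch 2.x mapping this corresponds to setting "index": "not_analyzed" on the tags string field (from 5.0 on, the keyword type plays this role). The Python below only simulates the behaviour; build_index, terms_query and both analyzer functions are hypothetical, and standard_analyze is only an approximation of the real standard analyzer.

```python
import re

def standard_analyze(text):
    # Hypothetical approximation of the standard analyzer:
    # split on non-word characters (including "-"), then lowercase.
    return [tok.lower() for tok in re.findall(r"\w+", text)]

def not_analyzed(text):
    # not_analyzed: the whole value is indexed as one unchanged term.
    return [text]

def build_index(docs, analyzer):
    # Inverted index: term -> set of document ids.
    inverted = {}
    for doc_id, tags in docs.items():
        for tag in tags:
            for token in analyzer(tag):
                inverted.setdefault(token, set()).add(doc_id)
    return inverted

def terms_query(inverted, values):
    # terms: exact, unanalyzed lookup of each value.
    hits = set()
    for value in values:
        hits |= inverted.get(value, set())
    return hits

# Document 2 is a hypothetical extra document tagged just "2016".
docs = {1: ["election-2016"], 2: ["2016"]}
analyzed_idx = build_index(docs, standard_analyze)
exact_idx = build_index(docs, not_analyzed)

print(sorted(analyzed_idx))                          # ['2016', 'election']
print(terms_query(analyzed_idx, ["election-2016"]))  # set()
print(terms_query(analyzed_idx, ["2016"]))           # {1, 2}
print(terms_query(exact_idx, ["election-2016"]))     # {1}
print(terms_query(exact_idx, ["2016"]))              # {2}
```

With the analyzed field, "2016" matches both documents and the full tag matches neither; with the not_analyzed field, each terms lookup matches only the document carrying that exact tag.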
Answer 2

By default, Elasticsearch runs an analyzer on all string fields, whereas the terms filter looks for an exact match against the indexed terms. This means Elasticsearch stored 'Hillary' as 'hillary', while you are querying for 'Hillary'. There are two ways to fix this: either use a match query instead of a terms query, or, instead of relying on automapping, create the index yourself and configure how the tags field is analyzed. You could also query for 'hillary', but that only fixes this particular case: if a tag were something like 'us elections', 'us' and 'elections' would be stored as separate terms.
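A minimal sketch of the 'us elections' case, using the same kind of toy inverted index (standard_analyze and terms_query are hypothetical helpers; the analyzer is an approximation that splits on non-word characters and lowercases). It shows that no lowercase terms query can single out the multi-word tag once the field is analyzed.

```python
import re

def standard_analyze(text):
    # Hypothetical approximation of the standard analyzer:
    # split on non-word characters, then lowercase each token.
    return [tok.lower() for tok in re.findall(r"\w+", text)]

# Three hypothetical documents whose tags share individual words.
docs = {1: ["us elections"], 2: ["us open"], 3: ["uk elections"]}

# Inverted index for an analyzed "tags" field: term -> document ids.
inverted = {}
for doc_id, tags in docs.items():
    for tag in tags:
        for token in standard_analyze(tag):
            inverted.setdefault(token, set()).add(doc_id)

def terms_query(values):
    # terms: exact, unanalyzed lookup; a document matches if it
    # contains ANY of the given terms.
    hits = set()
    for value in values:
        hits |= inverted.get(value, set())
    return hits

print(terms_query(["us elections"]))     # set(): no single term "us elections"
print(terms_query(["us", "elections"]))  # {1, 2, 3}: far too broad
```

Neither query isolates document 1, which is why controlling the mapping of the tags field is the more robust fix.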
