
I'm inserting some companies into the index, where the countries attribute is an array of country codes:

curl -XPUT 'http://localhost:9200/test/company/10' -d '{"countries" : ["CH", "CN"], "name" : "company10"}'
curl -XPUT 'http://localhost:9200/test/company/11' -d '{"countries" : ["AT", "CH", "CN", "DE", "EN", "FR"], "name" : "company11"}'
curl -XPUT 'http://localhost:9200/test/company/12' -d '{"countries" : ["AT", "CN", "EN", "FR"], "name" : "company12"}'
curl -XPUT 'http://localhost:9200/test/company/13' -d '{"countries" : ["CH", "CN", "HU"], "name" : "company13"}'
curl -XPUT 'http://localhost:9200/test/company/14' -d '{"countries" : ["CH", "CN", "EN", "FR"], "name" : "company14"}'
curl -XPUT 'http://localhost:9200/test/company/15' -d '{"countries" : ["AT", "CN", "DE", "EN", "FR", "HU"], "name" : "company15"}'
curl -XPUT 'http://localhost:9200/test/company/16' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR", "HU"], "name" : "company16"}'
curl -XPUT 'http://localhost:9200/test/company/17' -d '{"countries" : ["BE", "CN", "EN"], "name" : "company17"}'
curl -XPUT 'http://localhost:9200/test/company/18' -d '{"countries" : ["AT", "CH", "CN", "DE"], "name" : "company18"}'
curl -XPUT 'http://localhost:9200/test/company/19' -d '{"countries" : ["AT", "CH", "CN", "DE", "EN", "FR", "HU"], "name" : "company19"}'
curl -XPUT 'http://localhost:9200/test/company/20' -d '{"countries" : ["EN", "FR"], "name" : "company20"}'
curl -XPUT 'http://localhost:9200/test/company/21' -d '{"countries" : ["AT", "BE", "DE", "FR", "HU"], "name" : "company21"}'
curl -XPUT 'http://localhost:9200/test/company/22' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR", "HU"], "name" : "company22"}'
curl -XPUT 'http://localhost:9200/test/company/23' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "HU"], "name" : "company23"}'
curl -XPUT 'http://localhost:9200/test/company/24' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "FR"], "name" : "company24"}'
curl -XPUT 'http://localhost:9200/test/company/25' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR"], "name" : "company25"}'
curl -XPUT 'http://localhost:9200/test/company/26' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "FR", "HU"], "name" : "company26"}'
curl -XPUT 'http://localhost:9200/test/company/27' -d '{"countries" : ["AT", "EN", "FR"], "name" : "company27"}'
curl -XPUT 'http://localhost:9200/test/company/28' -d '{"countries" : ["CN"], "name" : "company28"}'
curl -XPUT 'http://localhost:9200/test/company/29' -d '{"countries" : ["BE", "CH", "CN", "EN", "FR"], "name" : "company29"}'
curl -XPUT 'http://localhost:9200/test/company/30' -d '{"countries" : ["CN"], "name" : "company30"}'

I want to aggregate the companies by country code (the countries attribute) and count how many companies are present for each country.
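For reference, the expected per-country numbers can be tallied offline from the sample documents above. This is a small bash/awk sketch, not an Elasticsearch request; DOCS and count_per_country are hypothetical names, and the data is copied from the curl commands.

```shell
#!/usr/bin/env bash
# Offline reference tally, NOT an Elasticsearch request: how many of the sample
# companies list each country code. Data copied from the curl commands above.
DOCS='10 CH CN
11 AT CH CN DE EN FR
12 AT CN EN FR
13 CH CN HU
14 CH CN EN FR
15 AT CN DE EN FR HU
16 AT BE CH DE EN FR HU
17 BE CN EN
18 AT CH CN DE
19 AT CH CN DE EN FR HU
20 EN FR
21 AT BE DE FR HU
22 AT BE CH DE EN FR HU
23 AT BE CH CN DE EN HU
24 AT BE CH CN DE EN FR
25 AT BE CH DE EN FR
26 AT BE CH CN DE EN FR HU
27 AT EN FR
28 CN
29 BE CH CN EN FR
30 CN'

# Print "<code> <number of companies>" for every code, sorted by code.
count_per_country() {
  printf '%s\n' "$DOCS" | awk '
    {
      split("", seen)                 # reset the per-document duplicate guard
      for (i = 2; i <= NF; i++)       # field 1 is the document id
        if (!seen[$i]++) n[$i]++
    }
    END { for (c in n) print c, n[c] }
  ' | sort
}

count_per_country
```

For this data it should print AT 13, BE 9, CH 13, CN 15, DE 11, EN 15, FR 14 and HU 8, which gives something concrete to compare facet counts against.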

Sadly, even this (counting only the AT code) doesn't work:

curl -XGET 'http://localhost:9200/test/company/_search?pretty=true' -d '
{"query"  : { "match_all" : {} },
 "facets" : {
    "foo" : {
      "filter" : {
        "term" : { "countries" : "AT" }
      }
    }
  }
}
'

I'm getting:

...

"facets" : {
  "foo" : {
    "_type" : "filter",
    "count" : 0
  }
}

What am I doing wrong?

  • Is only AT not working? Did you try CN? Commented Oct 1, 2013 at 10:53
  • I've now tried CN too: same response as for AT in the facets section. Commented Oct 1, 2013 at 10:57
  • Hmm, OK, I was thinking it could be due to ES not indexing stopwords (stackoverflow.com/questions/17883936/…). But if CN also doesn't work, that can't be the cause. Commented Oct 1, 2013 at 10:58
  • Good catch, thanks, but the issue still exists... any other ideas? Commented Oct 1, 2013 at 11:02

1 Answer


I think it is because term filters are not analyzed, while the countries field was indexed with the default analyzer. AT is a stopword, so it is not indexed at all. You can check this with the _analyze API: http://localhost:9200/test/_analyze?text=AT&field=countries.

You can also check a non-stopword, for example CN, and see that it is lowercased: http://localhost:9200/test/_analyze?text=CN&field=countries. So cn (which is what is actually stored in the index) doesn't match CN in your facet filter.
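To make the two effects concrete, here is a rough bash imitation of what the _analyze calls report for a single code. It assumes the default standard analyzer of that Elasticsearch generation (lowercase filter plus Lucene's English stopword list); simulate_standard_analyzer is a hypothetical name, and the real analyzer also tokenizes longer text, which this sketch skips.

```shell
#!/usr/bin/env bash
# Rough offline imitation of the _analyze output for ONE token.
# Assumption: default standard analyzer of that era = lowercase + English stopwords
# (the list below is assumed to be Lucene's default English stopword set).
STOPWORDS=' a an and are as at be but by for if in into is it no not of on or such that the their then there these they this to was will with '

simulate_standard_analyzer() {  # $1 = one token, e.g. a country code
  local token
  token=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')   # lowercase filter
  case "$STOPWORDS" in
    *" $token "*) ;;                # stop filter: the token is dropped entirely
    *) printf '%s\n' "$token" ;;    # otherwise the lowercased token is indexed
  esac
}

simulate_standard_analyzer AT   # prints nothing
simulate_standard_analyzer CN   # prints cn
```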

You can modify your search to use the lowercased country code:

curl -XGET 'http://localhost:9200/test/company/_search?pretty=true' -d '
{"query"  : { "match_all" : {} },
 "facets" : {
    "foo" : {
      "filter" : {
        "term" : { "countries" : "cn" }
      }
    }
  }
}'

to get

"facets" : {
    "foo" : {
      "_type" : "filter",
      "count" : 15
    }
  }
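The same reasoning explains both the 0 for AT and the 15 for cn, and it predicts what an exact not_analyzed field would return. The bash sketch below simulates a term filter over the sample documents; it is not Elasticsearch itself, and it assumes the analyzed field behaves like the default standard analyzer (lowercasing plus Lucene's English stopwords). The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Offline simulation, NOT Elasticsearch: how many sample documents a "term"
# filter matches. Assumption: "analyzed" = lowercase + Lucene English stopwords;
# "not_analyzed" = each code stored exactly as sent.
STOPWORDS=' a an and are as at be but by for if in into is it no not of on or such that the their then there these they this to was will with '

DOCS='10 CH CN
11 AT CH CN DE EN FR
12 AT CN EN FR
13 CH CN HU
14 CH CN EN FR
15 AT CN DE EN FR HU
16 AT BE CH DE EN FR HU
17 BE CN EN
18 AT CH CN DE
19 AT CH CN DE EN FR HU
20 EN FR
21 AT BE DE FR HU
22 AT BE CH DE EN FR HU
23 AT BE CH CN DE EN HU
24 AT BE CH CN DE EN FR
25 AT BE CH DE EN FR
26 AT BE CH CN DE EN FR HU
27 AT EN FR
28 CN
29 BE CH CN EN FR
30 CN'

# Term that ends up in the index for one code (empty if it is dropped).
indexed_term() {  # $1 = analyzed | not_analyzed, $2 = country code
  if [ "$1" = not_analyzed ]; then
    printf '%s\n' "$2"
    return
  fi
  local t
  t=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case "$STOPWORDS" in
    *" $t "*) ;;                  # stopword: nothing is indexed
    *) printf '%s\n' "$t" ;;
  esac
}

# The term filter value is NOT analyzed: it must equal an indexed term exactly.
term_filter_count() {  # $1 = mapping mode, $2 = value used in the term filter
  local mode=$1 value=$2 count=0 id codes code
  while read -r id codes; do
    for code in $codes; do
      if [ "$(indexed_term "$mode" "$code")" = "$value" ]; then
        count=$((count + 1))
        break                     # count each document once
      fi
    done
  done <<EOF
$DOCS
EOF
  printf '%s\n' "$count"
}

term_filter_count analyzed AT      # 0: "at" is a stopword and never indexed
term_filter_count analyzed CN      # 0: the index holds "cn", not "CN"
term_filter_count analyzed cn      # 15, as in the facet output above
term_filter_count not_analyzed AT  # 13: codes are stored verbatim
```

With a not_analyzed field the filter value has to match the stored code exactly, so AT would then match 13 documents while cn would match none.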

But I think you should define a mapping that sets countries to "index": "not_analyzed", which avoids both problems (stopword removal and lowercasing):

# Delete index
#
curl -XDELETE 'http://localhost:9200/test'

# Create with mapping
#
curl -XPUT 'http://localhost:9200/test/' -d '{
  "mappings": {
    "company": {
      "properties": {
        "countries": { "type": "string", "index" : "not_analyzed"  }
      }
    }
  }
}'


# Index documents
#
curl -XPUT 'http://localhost:9200/test/company/10' -d '{"countries" : ["CH", "CN"], "name" : "company10"}'
curl -XPUT 'http://localhost:9200/test/company/11' -d '{"countries" : ["AT", "CH", "CN", "DE", "EN", "FR"], "name" : "company11"}'
curl -XPUT 'http://localhost:9200/test/company/12' -d '{"countries" : ["AT", "CN", "EN", "FR"], "name" : "company12"}'
curl -XPUT 'http://localhost:9200/test/company/13' -d '{"countries" : ["CH", "CN", "HU"], "name" : "company13"}'
curl -XPUT 'http://localhost:9200/test/company/14' -d '{"countries" : ["CH", "CN", "EN", "FR"], "name" : "company14"}'
curl -XPUT 'http://localhost:9200/test/company/15' -d '{"countries" : ["AT", "CN", "DE", "EN", "FR", "HU"], "name" : "company15"}'
curl -XPUT 'http://localhost:9200/test/company/16' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR", "HU"], "name" : "company16"}'
curl -XPUT 'http://localhost:9200/test/company/17' -d '{"countries" : ["BE", "CN", "EN"], "name" : "company17"}'
curl -XPUT 'http://localhost:9200/test/company/18' -d '{"countries" : ["AT", "CH", "CN", "DE"], "name" : "company18"}'
curl -XPUT 'http://localhost:9200/test/company/19' -d '{"countries" : ["AT", "CH", "CN", "DE", "EN", "FR", "HU"], "name" : "company19"}'
curl -XPUT 'http://localhost:9200/test/company/20' -d '{"countries" : ["EN", "FR"], "name" : "company20"}'
curl -XPUT 'http://localhost:9200/test/company/21' -d '{"countries" : ["AT", "BE", "DE", "FR", "HU"], "name" : "company21"}'
curl -XPUT 'http://localhost:9200/test/company/22' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR", "HU"], "name" : "company22"}'
curl -XPUT 'http://localhost:9200/test/company/23' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "HU"], "name" : "company23"}'
curl -XPUT 'http://localhost:9200/test/company/24' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "FR"], "name" : "company24"}'
curl -XPUT 'http://localhost:9200/test/company/25' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR"], "name" : "company25"}'
curl -XPUT 'http://localhost:9200/test/company/26' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "FR", "HU"], "name" : "company26"}'
curl -XPUT 'http://localhost:9200/test/company/27' -d '{"countries" : ["AT", "EN", "FR"], "name" : "company27"}'
curl -XPUT 'http://localhost:9200/test/company/28' -d '{"countries" : ["CN"], "name" : "company28"}'
curl -XPUT 'http://localhost:9200/test/company/29' -d '{"countries" : ["BE", "CH", "CN", "EN", "FR"], "name" : "company29"}'
curl -XPUT 'http://localhost:9200/test/company/30' -d '{"countries" : ["CN"], "name" : "company30"}'

# Refresh index
#
curl -XPOST 'http://localhost:9200/test/_refresh'

# Search
#
curl -XGET 'http://localhost:9200/test/company/_search?pretty=true' -d '
{"query"  : { "match_all" : {} },
 "facets" : {
    "foo" : {
      "filter" : {
        "term" : { "countries" : "AT" }
      }
    }
  }
}
'
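Once the mapping is in place, the counts for every country can come from one terms facet rather than one filter facet per code. A minimal sketch, assuming the facets API of the Elasticsearch version used here; ES_URL, country_facet_body and country_facet_search are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around a "terms" facet on the not_analyzed countries field.
ES_URL=${ES_URL:-http://localhost:9200}   # hypothetical override; defaults to the URL above

# Request body: one term/count entry per distinct country code.
country_facet_body() {
  cat <<'EOF'
{
  "query"  : { "match_all" : {} },
  "facets" : {
    "countries" : {
      "terms" : { "field" : "countries", "size" : 20 }
    }
  }
}
EOF
}

# Send the search; "size" only needs to exceed the number of distinct codes (8 here).
country_facet_search() {
  curl -XGET "$ES_URL/test/company/_search?pretty=true" -d "$(country_facet_body)"
}
```

Running country_facet_search should return, under the countries facet, a terms array with one term/count pair per code. Choosing a size larger than the number of distinct codes keeps any of them from being cut off.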

3 Comments

Awesome explanation of the issues, thanks a lot. It works like a charm ;)
I have the same problem. I added index: not_analyzed for the field 'country_code', though some countries ('at', 'be', etc.) were still excluded from facets. I'm still checking. For now I've just added _ before each country code, so it stores _at, _be, etc.
Oh, @vhyza was right: I had 2 types, one with the provided mapping and another auto-created without not_analyzed. My fault, so yes, index: not_analyzed solves the problem.
