We have an index running with 241.047 items in it. These items can have any number of subitems, which are indexed as nested documents. The total number of subitems is 381.705.

Neither include_in_parent nor include_in_root is set in the mapping, which means that each nested document is indexed as an additional document. The index should therefore contain a total of 241.047 + 381.705 = 622.752 documents.

When I run the following curl command to look up the number of documents in the index, I get a different number. It's not far off, but I'm wondering why it doesn't return the number I'm expecting.

  • curl -XGET 'http://localhost:9200/catawiki_development/_status?pretty' returns 622.861

In addition, when I run a curl command to get the number of root documents, I get a different number than when I run a match_all query and ask for the number of documents returned:

  • curl -XGET 'http://localhost:9200/elasticsearch_development/_count?pretty' returns 241.156
  • The match_all query returns the correct number of documents, 241.047
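The arithmetic behind these figures can be checked with a short sketch. The numbers are the ones quoted above, written without the "." thousands separator; the variable names are only illustrative. It shows that both reported totals exceed the expected values by the same amount, although it does not say where the extra documents come from.

```python
# Figures quoted in the question, without the "." thousands separator.
root_items = 241_047        # items (root documents)
nested_subitems = 381_705   # subitems indexed as nested documents

# Expected size of the index: one document per item plus one per subitem.
expected_total = root_items + nested_subitems

status_total = 622_861      # reported by the _status request
count_api_total = 241_156   # reported by the _count request
match_all_total = 241_047   # reported by the match_all query

# How far each reported figure is from the expected one.
status_surplus = status_total - expected_total
count_surplus = count_api_total - match_all_total

print(expected_total)   # 622752
print(status_surplus)   # 109
print(count_surplus)    # 109
```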

How can these differences be explained?

  • May I know if the answer helped? Commented Jan 23, 2014 at 19:09
  • Sorry, at the time of your answer I've already stopped developing the elasticsearch based application. I wasn't able to try it out unfortunately so I can't tell you if your answer helped. Commented Jan 31, 2014 at 15:10
  • Fair enough, thanks for getting back to me anyway! Commented Jan 31, 2014 at 16:07

1 Answer

The code path of a count API request is quite different from that of a normal search request. In fact, it is a shortcut that only returns the count of the documents matching a query, that's it. It also differs from a search with search_type=count, which is effectively only the first part of a search: the search request is broadcast to all shards, but there is no reduce/fetch phase since we only want to return the total number of matching documents. You can also add facets etc. to a search request (including when using search_type=count), which is something that you cannot do with the count API.

That said, for the reason above I'm not that surprised you see a difference. It would be nice to understand exactly what the problem is, though. The best approach would be to reproduce the problem with a small number of documents and open an issue including a curl recreation so that we can have a look at it.

In the meantime, I would suggest using a search request with search_type=count if you have problems with the count API. That one is guaranteed to return the same number of documents as a normal search, simply because it uses exactly the same logic.
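As a minimal sketch of that comparison, the snippet below builds both requests and reads the total from each response. It assumes an Elasticsearch version that still supports search_type=count, a node on localhost:9200, and the index name from the question. The helper names (search_count_url, compare_counts, and so on) are hypothetical, and the response fields it reads (hits.total and count) are assumed to be plain integers.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"      # host taken from the question
INDEX = "elasticsearch_development"     # index name taken from the question
MATCH_ALL = {"query": {"match_all": {}}}


def search_count_url(base_url, index):
    """URL of a search request that only computes the hit total."""
    return f"{base_url}/{index}/_search?search_type=count"


def count_api_url(base_url, index):
    """URL of the count API for the same index."""
    return f"{base_url}/{index}/_count"


def total_from_search(response):
    """Read the total from a search response (assumed integer hits.total)."""
    return response["hits"]["total"]


def total_from_count(response):
    """Read the total from a count API response."""
    return response["count"]


def fetch_json(url, body=None):
    """Send a request (POST with a JSON body, else GET) and decode the reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    request = urllib.request.Request(url, data=data, headers=headers)
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def compare_counts(base_url=BASE_URL, index=INDEX):
    """Return (search total, count API total); needs a running node."""
    search_total = total_from_search(
        fetch_json(search_count_url(base_url, index), MATCH_ALL)
    )
    # No body, as in the curl command from the question: counts all documents.
    count_total = total_from_count(fetch_json(count_api_url(base_url, index)))
    return search_total, count_total
```

Calling compare_counts() against a live node returns both totals side by side, which makes it easy to see whether the two APIs disagree for a given index.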
