19

I've set up Elasticsearch as one cluster with 4 nodes. Number of shards per index: 1; number of replicas per index: 3.

When I run a simple query like the following one multiple times, I get different results (different total hits and different top 10 documents):

http://localhost:9200/index_name/_search?q=term

Is there different data on each shard? I would like all shards to be up to date. What can I do?

This is the result of /_cluster/health:

{
  "cluster_name" : "secret",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 24,
  "active_shards" : 96,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

As a temporary solution, I rebuild the index through the Ruby gem tire: ModelName.rebuild_index

But I need a long-term solution.

9
  • More details about your setup would be nice. Some information is missing here, for example the number of replicas per index; the number of shards per node would also be nice to know. Commented Jun 18, 2014 at 14:09
  • Number of shards per index: 1. Number of replicas per index: 3. Where do I get the number of shards per node from? I cannot see it in my elasticsearch.yml. Commented Jun 18, 2014 at 14:20
  • Sorry, I meant shards per index. Given those values, I honestly don't understand why you have this problem. Commented Jun 18, 2014 at 14:35
  • Can you post the output of cluster health? elasticsearch.org/guide/en/elasticsearch/reference/current/… Commented Jun 18, 2014 at 14:58
  • I added the cluster health output to the description. Commented Jun 19, 2014 at 8:11

3 Answers

16

We ran into a similar problem, and it turned out to be because Elasticsearch round-robins between the copies of a shard (the primary and its replicas) when searching. Each copy can return a slightly different _score because its index statistics differ slightly, due to the way ES handles deleted documents in an index. In our case this meant that similar results were often placed slightly lower or higher in the result order and, combined with pagination (using from and size in the search query), the same results turned up on two separate "pages" or did not appear at all when moving from page to page.

We found an Elasticsearch article on consistent scoring which explains this quite neatly and implemented a preference parameter to ensure that we always get the same scores for a particular search by querying the same shards:

http://localhost:9200/index_name/_search?q=term&preference=blablabla
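The preference value can be any custom string, as long as it stays the same for searches that should hit the same shard copies. A minimal sketch of one way to derive it, assuming each user has some stable session identifier; the `search_url` helper and the `user-42` session id are hypothetical and only illustrate the idea:

```python
import hashlib
from urllib.parse import urlencode


def search_url(base_url: str, index: str, term: str, session_id: str) -> str:
    """Build a _search URL whose preference value is stable per session."""
    # Hash the session id so the preference string is short and opaque.
    # The same session id always yields the same string, so repeated
    # searches from that session are routed to the same shard copies.
    preference = hashlib.sha1(session_id.encode("utf-8")).hexdigest()[:16]
    query = urlencode({"q": term, "preference": preference})
    return f"{base_url}/{index}/_search?{query}"


# Hypothetical session id; any stable per-user value would do.
first = search_url("http://localhost:9200", "index_name", "term", "user-42")
second = search_url("http://localhost:9200", "index_name", "term", "user-42")
```

Using a per-session value rather than one fixed string for everybody should keep each user's pages consistent while still spreading load across the replicas.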

We also thought about adding an explicit sort, but did not need one: Elasticsearch breaks ties between equal scores by the internal Lucene document ID, so once preference pins a search to the same shard copies, results with the same score are always returned in the same order.

6

This is because you haven't specified a sort order and size. Every time you query, you get an arbitrary first 10 records, because the default size of a result set in Elasticsearch is 10.

You can add sorting with curl in the following way:

curl -XPOST 'localhost:9200/_search' -d '{
  "query" : {
    ...
  },
  "sort" : [
    {"price" : {"order" : "asc", "mode" : "avg"}}
  ]
}'

Check here for more info, especially on using from and size together with sort, which is the most common approach to pagination.
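As a rough sketch of how sort, from and size fit together for pagination, the request body could be built like this. The `price` field comes from the curl example above; the `title` field and the `product_id` tie-breaker are hypothetical and assume the index has a unique, sortable field:

```python
import json


def paged_search_body(query: dict, page: int, page_size: int = 10) -> dict:
    """Build a search body with a deterministic sort and from/size paging."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {
        "query": query,
        "sort": [
            {"price": {"order": "asc"}},
            # Hypothetical unique field used as a tie-breaker, so that
            # documents with the same price always come back in the same order.
            {"product_id": {"order": "asc"}},
        ],
        # from is the offset of the first hit on the requested page.
        "from": (page - 1) * page_size,
        "size": page_size,
    }


# Body for the third page of 10 results; send it as the -d payload of the search.
body = paged_search_body({"match": {"title": "term"}}, page=3)
payload = json.dumps(body)
```

The tie-breaker matters because paging only stays consistent when every document has a well-defined position in the sort order.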

update:

Although the default sort is score DESC, it sometimes does not work well when records don't have a relevant _score, as per http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_sorting.html#_sorting

3 Comments

I should not get different total hits each time, even without specifying sort and size. When I don't specify a sort, I get the preferred default sort, which is "score DESC".
score DESC is correct, but the trouble starts when two records have the same score.
Check this out: elasticsearch.org/guide/en/elasticsearch/guide/current/… When records don't have a meaningful score, it starts smelling.
1

This question helped me, as the answer says,

One of the possible reasons could be distributed IDF: by default, Elasticsearch uses the local IDF on each shard to save some performance, which leads to different IDF values across the cluster.

ES doc here
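To make the local-versus-global IDF difference concrete, here is a small worked sketch. It assumes the BM25 form of IDF (older Elasticsearch versions used a classic TF/IDF formula instead), and the per-shard document counts are made up for illustration:

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF in the BM25 form: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical statistics per shard:
# (documents in the shard, documents in the shard containing the term).
shard_stats = {"shard_0": (1000, 10), "shard_1": (1000, 40)}

# Local IDF: each shard only sees its own statistics.
local_idf = {name: bm25_idf(n, df) for name, (n, df) in shard_stats.items()}

# Global IDF: statistics summed over all shards before scoring.
total_docs = sum(n for n, _ in shard_stats.values())
total_freq = sum(df for _, df in shard_stats.values())
global_idf = bm25_idf(total_docs, total_freq)
```

With these numbers the term looks rarer on shard_0 than on shard_1, so the same document would score differently depending on where it lives. Adding search_type=dfs_query_then_fetch to the search request asks Elasticsearch to collect global term statistics first, at the cost of an extra round trip.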
