
I have an Elasticsearch cluster (it's not a real cluster, because I have only one node). The cluster holds 1.8 TB of data in about 350 indexes.

I would like to add a new node to the cluster, but I don't want to replicate all the data. I want to split my shards across the 2 nodes (each node holding 1 shard).

Each index has 2 shards, 0 and 1, and I would like to split my data between the nodes. Is this possible? How will this affect Kibana performance?

Thanks a lot

Amit

Answer

Each index has 2 shards, 0 and 1, and I would like to split my data. Is this possible?

When you add a second node to your cluster, your data will automatically be "rebalanced" and replicated.

Presumably, if you were to run

$ curl -XGET 'localhost:9200/_cluster/health?pretty'

then you would see that your cluster health is probably yellow, because your replicas cannot be assigned: Elasticsearch never places a replica on the same node as its primary. You can tell because you would presumably have as many unassigned shards (unallocated replicas) as assigned primary shards.
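The same check can be scripted. Below is a minimal Python sketch, assuming an unauthenticated node reachable at http://localhost:9200. The helper names `fetch_cluster_health` and `looks_like_missing_replicas` are hypothetical, and only the `status`, `active_primary_shards`, and `unassigned_shards` fields of the health response are used.

```python
import json
import urllib.request


def fetch_cluster_health(base_url: str = "http://localhost:9200") -> dict:
    """GET /_cluster/health from a node and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
        return json.load(resp)


def looks_like_missing_replicas(health: dict) -> bool:
    """True if the cluster is yellow and there are exactly as many unassigned
    shard copies as active primaries, which is what one replica per primary
    produces on a single-node cluster."""
    return (
        health["status"] == "yellow"
        and health["unassigned_shards"] == health["active_primary_shards"]
    )
```

Against a running node you would call `looks_like_missing_replicas(fetch_cluster_health())`. The pure check is kept separate from the HTTP call so it can be tried on any saved health response.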

What will happen when you start the second node? It will immediately begin copying shards from the original node. Once that completes, the data will exist on both nodes, and this is what you want. If you scale further by eventually adding a third node, the shards will be spread across the cluster less predictably, without a 1:1 relationship between the nodes. Only once you add a third node can you reasonably avoid copying all of the shard data to every node.
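To put rough numbers on this, here is a back-of-the-envelope Python sketch using the 1.8 TB from the question. It assumes shards of roughly equal size spread evenly across nodes; the real allocator balances shard counts and respects disk watermarks, so actual usage will differ. `per_node_storage_tb` is a hypothetical helper.

```python
def per_node_storage_tb(primary_data_tb: float, replicas: int, nodes: int) -> float:
    """Rough average data per node, assuming equal-sized shards spread evenly.

    Each primary wants `replicas` extra copies, but no two copies of the same
    shard may share a node, so at most `nodes` copies can be assigned; any
    extra replicas stay unassigned and use no disk.
    """
    if nodes < 1 or replicas < 0:
        raise ValueError("need at least one node and a non-negative replica count")
    assigned_copies = min(1 + replicas, nodes)
    return primary_data_tb * assigned_copies / nodes
```

With one replica, this gives 1.8 TB on a single node (replicas unassigned), 1.8 TB on each of two nodes, and about 1.2 TB on each of three nodes. With zero replicas on two nodes it gives 0.9 TB each.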

Other considerations:

  • Be sure to set discovery.zen.minimum_master_nodes to 2. This should always be set to M / 2 + 1, using integer (truncated) division, where M is the number of master-eligible nodes in your cluster. If you don't set it, you will eventually cause yourself data loss. (This applies to Elasticsearch 6.x and earlier; from 7.0 onward the setting is ignored and the cluster manages its voting configuration automatically.)
  • You want replication because it gives you higher availability if either node suffers a hardware failure. With the setting above on a two-node cluster, losing one node would leave the cluster unable to elect a master, so it would be effectively read-only until the failed node returned or the setting was lowered, but at least the data would still exist.
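The quorum arithmetic from the first bullet, and why a two-node cluster cannot elect a master after losing one node, can be sketched in Python. The function names are hypothetical, and this models only the formula, not Elasticsearch's discovery logic.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """M / 2 + 1 with integer (truncated) division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def can_elect_master(reachable_master_eligible: int, minimum: int) -> bool:
    """A master can be elected only while at least `minimum` master-eligible
    nodes can see each other."""
    return reachable_master_eligible >= minimum


# Print the recommended setting for small clusters.
for m in (1, 2, 3, 4, 5):
    print(f"{m} master-eligible node(s) -> minimum_master_nodes = {minimum_master_nodes(m)}")
```

For two nodes the quorum is 2, so `can_elect_master(1, 2)` is False: with one node down, no master can be elected. Three master-eligible nodes also need a quorum of 2, which lets the cluster survive the loss of any one node.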

How will this affect Kibana performance?

It's hard to say how much this will improve performance, but it most likely will, simply by spreading the workload across two machines.


4 Comments

My cluster is now green, not yellow. I don't want to replicate my data; I just want to split the shards so that the size of my data is divided between the 2 nodes. Right now I have number_of_replicas = 0 and number_of_shards = 2, which is why my cluster status is green.
If it's green with one node, then you have no replicas. As a result, it will naturally rebalance itself and split the shards between the two nodes.
Thanks @pickypg, so I can leave replicas = 0, and when I add the new node it will only split my shards?
@AmitDaniel Correct. I honestly don't recommend running without replicas (and I think you'd see a read performance boost with them), but it will do what you want without them.
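If you later decide you want replicas, the replica count (unlike the primary shard count) is a dynamic index setting that can be changed at any time through the update index settings API, `PUT /<index>/_settings`. Below is a minimal Python sketch that builds such a request, assuming an unauthenticated node at http://localhost:9200. The helper name and the index name `my-index` are placeholders.

```python
import json
import urllib.request


def replica_settings_request(index: str, replicas: int,
                             base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) PUT /<index>/_settings setting number_of_replicas."""
    if replicas < 0:
        raise ValueError("replica count must be non-negative")
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Sending it with `urllib.request.urlopen(replica_settings_request("my-index", 1))` would ask the cluster to allocate one replica per primary for that index. On a two-node cluster, that means each node ends up holding a full copy of the data again.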
