Elasticsearch - Add one node to a running cluster

Question

I have an Elasticsearch cluster (not a real cluster, since it has only 1 node). The cluster holds 1.8 TB of data across roughly 350 indexes.

I would like to add a new node to the cluster, but I don't want to replicate all of the data. I want to split my shards across the 2 nodes (each node would hold 1 shard).

Each index has 2 shards, 0 and 1, and I would like to split my data. Is this possible? How will this affect Kibana performance?

Thanks a lot

Amit

pickypg · Accepted Answer · 2015-07-12 17:56:32Z

Each index has 2 shards, 0 and 1, and I would like to split my data. Is this possible?

When you add a second node to your cluster, your data will be automatically "rebalanced" and, if your indexes are configured with replicas, replicated.

Presumably, if you were to run

$ curl -XGET 'localhost:9200/_cluster/health?pretty'

then you would see that your current cluster health is probably yellow because no replication is taking place. You can tell because you presumably have as many unassigned shards (unallocated replicas) as you have assigned primary shards.
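As a rough sketch of how to read that health output, the helper below maps the `status` field to what it says about shard assignment. The `ES_HOST` variable and both function names are assumptions made for illustration; only the `_cluster/health` endpoint comes from the answer.

```shell
# Assumed host; adjust for your own cluster.
ES_HOST="${ES_HOST:-localhost:9200}"

# Fetch cluster health as JSON (needs a reachable node, so it is only defined here).
cluster_health() {
  curl -s -XGET "$ES_HOST/_cluster/health?pretty"
}

# Describe what a status color means in terms of shard assignment.
explain_status() {
  case "$1" in
    green)  echo "all primary and replica shards are assigned" ;;
    yellow) echo "all primaries are assigned, but some replicas are unassigned" ;;
    red)    echo "at least one primary shard is unassigned" ;;
    *)      echo "unknown status: $1" ;;
  esac
}
```

On a single node, a green status therefore implies that the indexes have no replicas configured, while yellow implies replicas exist but have nowhere to go.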

What will happen when you start the second node in the cluster? It will immediately begin to copy shards from the original node. Once that completes, the data will exist on both nodes, which is what you want. As you scale further by eventually adding a third node, the shards will be spread across the cluster in a less predictable way that no longer results in a 1:1 relationship between the nodes. Only once you add a third node can you reasonably avoid copying all of the shard data to every node.
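To check where the shards actually ended up once the second node has joined, one option is the `_cat/shards` API. The sketch below is illustrative only: `ES_HOST`, the function names, and the sample node names in the usage note are assumptions, and the counting helper assumes node names contain no spaces.

```shell
# Assumed host; adjust for your own cluster.
ES_HOST="${ES_HOST:-localhost:9200}"

# List every shard with its index, shard number, primary/replica flag, state, and node.
list_shards() {
  curl -s -XGET "$ES_HOST/_cat/shards?h=index,shard,prirep,state,node"
}

# Count started shards per node from list_shards-style output on stdin.
# Columns: index shard prirep state node
shards_per_node() {
  awk '$4 == "STARTED" { n[$5]++ } END { for (k in n) print k, n[k] }' | sort
}
```

Against a live cluster this would be run as `list_shards | shards_per_node`. With hypothetical node names `node-a` and `node-b`, an even split shows roughly the same count next to each name.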

Other considerations:

Be sure to set discovery.zen.minimum_master_nodes to 2. This should always be set to M / 2 + 1 using integer division (truncated division), where M is the number of master-eligible nodes in your cluster. If you don't set it, you will eventually cause yourself data loss.
You want replication because it gives you higher availability in the event of hardware failure on either node. Because of the above setting, if one node of a two-node cluster failed, the cluster would be read-only until a second node joined again or the setting was unset, but at least the data would still exist.
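The M / 2 + 1 rule above can be worked through with shell integer arithmetic. The function name is an assumption made for this sketch; the formula is the one given in the answer.

```shell
# Print M / 2 + 1 using integer (truncated) division,
# where M is the number of master-eligible nodes.
minimum_master_nodes() {
  echo $(( $1 / 2 + 1 ))
}

minimum_master_nodes 2   # prints 2
minimum_master_nodes 3   # prints 2
minimum_master_nodes 5   # prints 3
```

For the two-node cluster discussed here, the result is 2, which is why losing either node leaves the cluster without a quorum. In the Zen discovery versions this answer was written for, the value would typically go into elasticsearch.yml as `discovery.zen.minimum_master_nodes: 2`.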

How will this affect Kibana performance?

It's hard to say whether this will really improve performance, but it most likely will, simply by spreading the workload across two machines.

edited Jul 12, 2015 at 17:56

answered Jul 12, 2015 at 17:21

pickypg


4 Comments

Amit Daniel Over a year ago

My cluster is currently green, not yellow. I don't want to replicate my data; I just want to split the shards so that the data is divided between the 2 nodes. Right now I have number_of_replicas = 0 and number_of_shards = 2, which is why my cluster status is green.

pickypg Over a year ago

If it's green with one node, then you have no replicas. As a result, it will naturally rebalance itself and split the shards between the two nodes.

Amit Daniel Over a year ago

Thanks @pickypg. So I can leave the replicas at 0, and when I add the new node it will only split my shards?

pickypg Over a year ago

@AmitDaniel Correct. I honestly don't recommend it without replicas (and I think you'd see a read level performance boost with the replicas), but it will do what you want without them.
