Starting from one host running Cassandra, I am trying to add a new node and form a cluster.

I updated the seeds list on both hosts and, after restarting both nodes, ran nodetool status and saw both nodes forming a cluster. However, I am seeing what looks like data loss: I do not see all the data that I added to a column family before I added the new node.

Steps to reproduce:

  1. Start a node with the following settings in cassandra.yaml:

    • initial_token:
    • num_tokens: 256
    • seed_list: host1
  2. Create a keyspace and a column family and insert some data.

  3. Start another node with exactly the same settings as host1, then make the following change on both nodes: seeds: host1, host2
  4. When I log in to CQL from host2, I do not see all the data.
  • Did you update the replication factor on your keyspace? Commented Apr 3, 2014 at 20:05
  • No, the replication factor is still set to 1. What is good practice: should I set the replication factor to the number of nodes in my cluster? Commented Apr 3, 2014 at 20:32
  • Updating the replication factor does not help. Still seeing the issue. Commented Apr 4, 2014 at 1:09
  • The idea with the replication factor is that you could lose a machine and still have all of your data. With 2 servers and a replication factor of 1, losing a machine means losing half of your data. So with 2 servers it makes sense to go with a replication factor of 2. But if you went to 3 nodes, staying with an RF of 2 would still allow you to get 100% of your data if you lost a node. Commented Apr 4, 2014 at 12:14
  • I am having this exact same problem. Did you determine a solution? Commented Jun 19, 2014 at 22:49
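
The replication-factor arithmetic from the comment above can be sketched in a few lines of Python. This is only an illustration, not Cassandra code: the function names are hypothetical, and the per-node share assumes that tokens are spread evenly across the nodes.

```python
def tolerated_node_losses(replication_factor: int) -> int:
    """Number of nodes that can fail while every row keeps at least one replica."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor - 1


def approx_share_per_node(nodes: int, replication_factor: int) -> float:
    """Approximate fraction of the data set stored on each node.

    Assumes an even token distribution (an assumption of this sketch);
    real clusters will deviate somewhat.
    """
    if nodes < 1:
        raise ValueError("cluster needs at least one node")
    # A row cannot have more replicas than there are nodes.
    effective_rf = min(replication_factor, nodes)
    return effective_rf / nodes


if __name__ == "__main__":
    # The three cases discussed in the comment: (nodes, replication factor).
    for nodes, rf in [(2, 1), (2, 2), (3, 2)]:
        share = approx_share_per_node(nodes, rf)
        losses = tolerated_node_losses(rf)
        print(f"{nodes} nodes, RF={rf}: ~{share:.0%} per node, tolerates {losses} node loss(es)")
```

With 2 nodes and a replication factor of 1, each node holds roughly half of the data and no node can be lost, which matches the comment's reasoning.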

2 Answers

Running:

nodetool cleanup
nodetool repair
nodetool rebuild

should solve the issue.

5 Comments

Tried this, but I still see the same issue. Is there anything else that I could try?
With a replication factor of 1, each row is written to only one node.
Right, but doing a select * should return data from all the hosts, right? I am not sure why select * is not returning all of the data.
@Nitin, you may need to use consistency of QUORUM or ALL if you are not already.
I tried a repair, but it blocks forever (or at least for what looks like forever) without apparently doing anything: the process sits idle and no data gets transferred. Is there a reason the repair would not work in this situation?
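
To put the consistency-level suggestion above in context, here is a small Python sketch of how many replicas a read has to hear from at each level. It uses the usual quorum formula for a single data center (replication factor divided by two, rounded down, plus one). The function name is hypothetical and the sketch does not talk to a cluster.

```python
def replicas_required(consistency_level: str, replication_factor: int) -> int:
    """Replicas that must respond for a read at the given consistency level.

    Covers only ONE, QUORUM and ALL in a single data center (a simplification).
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # Majority of the replicas: floor(RF / 2) + 1.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


if __name__ == "__main__":
    for rf in (1, 2, 3):
        counts = {lvl: replicas_required(lvl, rf) for lvl in ("ONE", "QUORUM", "ALL")}
        print(f"RF={rf}: {counts}")
```

At a replication factor of 1, all three levels require exactly one replica, so changing the consistency level alone would not be expected to change what a query returns. The levels only start to differ once the replication factor is 2 or more.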

I suggest running nodetool cleanup on both nodes so that the keys get distributed.

1 Comment

This does not work. Surprisingly, when I decommission the newly added node, I can see all the data again from CQL on the old host.
