I have a 3 node cluster with 1 seed and nodes in different zones. All running in GCE with GoogleCLoudSnitch.
I wanted to change hardware on each node so I started with adding a new seed in a different region which joined perfectly to the cluster. Then I started with "nodetool decommission" and when done I removed the the node when it is down and "nodetool status" states it's not in the cluster. I did this for all nodes and lastly I did it on the "extra" seed in the different region just to remove it to get back to a 3 node cluster.
We lost data! What can possibly be the problem? I saw a commando, "nodetool rebuild", which I ran and actually got some data back. "nodetool cleanup" didn't help either. Should I have run "nodetool flush" prior to "decommission"?
At the time of running "decommission" most keyspaces had ..
{'class' : 'NetworkTopologyStrategy', 'europe-west1' : 2}"
Should I first altered key spaces to include the new region/datacenter, which would be "'europe-west3' : 1" since only one node exist in that datacenter? I also noted that some keyspaces in the cluster had by mistake ..
{ 'class' : 'SimpleStrategy', 'replication_factor' : 1 }
Could this have caused the loss of data? It seems that it was in the "SimpleStrategy keyspaces" the data was lost.
(Disclaimer: I'm a ScyllaDB employee)
Did you 1st add new nodes to replace the ones you are decommissioning and configured the keyspace replication strategy accordingly? (you only mentioned the new seed node in your description, you did not mention if you did it for the other nodes).
Your data loss can very well be a result of the following:
Not altering the keyspaces to include the new region/zone with the proper replication strategy and replication factor.
Keyspaces that were configured with simple strategy (no netwrok aware) replication policy and replication factor 1. This means that the data was stored only in 1 node, and once that node went down and decommissioned, you basically lost the data.
Did you by any chance took snapshots and stored them outside your cluster? If you did you could try and restore them.
I would highly recommend reviewing these procedures for better understanding and the proper way to perform the procedure you intended to perform:
http://docs.scylladb.com/procedures/add_dc_to_exist_dc/
http://docs.scylladb.com/procedures/replace_running_node/
Related
I see some data inconsistency after moving data to a new cluster.
Old cluster has 9 nodes in total and each has got 2+ TB of data on it.
New cluster has same set of nodes as old and configuration is same.
Here is what I've performed in order:
nodetool snapshot.
Copied that snapshot to destination
Created a new Keyspace on Destination Cluster.
Used sstableloader utility to load.
Restarted all nodes.
After successful completion of transfer, I ran few queries to compare(Old vs New Cluster) and found out that the new cluster is not consistent but the data I see is properly distributed on each node (nodetool status).
Same query returns different sets of results for some of the partitions and I get zero rows first time, second time 100 rows,200 rows and eventually it becomes consistent for few partitions and record count matches with old cluster.
Few partitions have no data in the new cluster where as old cluster has data for those partitions.
I tried running queries on cqlsh with CONSISTENCY ALL but the problem still exist.
Did i miss any important steps to consider before and after?
Is there any procedure to find out the root cause of this?
I am currently running "nodetool repair" but I doubt if that could solve as I tried with Consistency ALL.
Highly Appreciate your help!
The fact that the results eventually becomes consistent indicates that the replicas are out-of-sync.
You can verify this by reviewing the logs around the time that you were loading data, particularly for dropped mutations. You can also check the output of nodetool netstats. If you're seeing blocking read repairs, that's another confirmation that the replicas are out-of-sync.
If you still have other partitions you can test, enable TRACING ON in cqlsh when you query with CONSISTENCY ALL. You will see if there are digest mismatches in the trace output which should also trigger read repairs. Cheers!
[EDIT] Based on your comments below, it sounds like you possibly did not load the snapshots from ALL the nodes in the source cluster with sstableloader. If you've missed loading SSTables to the target cluster, then that would explain why data is missing.
I have 3 cassandra nodes, when I execute a query, 2 nodes are giving same response but 1 node is giving different response
Suppose I executed following query
select * from employee;
Node1 and Node2 are giving 2 rows but Node3 is giving 0 rows(empty response)
How to solve this issue
1.You are not using Network topology.
2.Your replication factor is 2.
Simple strategy : Use only for a single datacenter and one rack. SimpleStrategy places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology (rack or datacenter location).
Go to this link :
https://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archDataDistributeReplication.html
I did the following steps, then problem was solved and now the data is in sync in all the 3 nodes
run the command nodetool rebuild on the instances and also
update 'replication_factor': '2' to 'replication_factor': '3'
I have the following datacenter-aware configuration:
Primary Datacenter: 3 node cluster, RF=3
Data size is more than 100GB per node
I would lite to add new data center (Secondary Datacenter: 3 node cluster, RF=3)
I know how do that.
But the problem is: How sync data from primary to secondary quickly?
I tried "nodetool repair" (with various keys) and "nodetool rebuild" but it takes much time near 10 hours.
I used cassandra 2.1.15 version
nodetool rebuild is usually the fastest way to sync new nodes.
To speed it up you could try a couple things:
If you have a lot of network bandwidth between the data centers, try increasing the cassandra.yaml parameter inter_dc_stream_throughput_outbound_megabits_per_sec. This defaults to 200 Mbs, so you could try a higher value.
You could also use a smaller replication factor than 3 in the new data center, for example start with 1 to get it up and running as quickly as possible, then later alter the keyspace to a higher value and use repair to create the extra replicas.
I have 3 nodes in cluster
Node1 = 127.0.0.1:9160
Node2 = 127.0.0.2:9161
Node3 = 127.0.0.3:9162
I want to use only one node(node1) for insertion. Other two nodes should be used for fault tolerance on writing millions of records. i.e. when node1 is down either node2 or node3 should take care of writing.For that I formed a cluster with replication factor of 2 and added seed nodes properly in cassandra.yalm file. It is working fine. But due to partition whenever I write the data to the node 1, rows are getting scattered across all the node in the cluster. So is there any way to use the nodes for only replication in the cluster?...Or is there any way to disable the partitioning?...
thanks in advance..
No. Cassandra is a fully distributed system.
What are you trying to achieve here? We have a 6 node cluster with RF=3 and since PlayOrm fixed the config bug they had in astyanax, even if we start getting one slow node, it automatically starts going to the other nodes to keep the system fast. Why would you want to avoid great features like that???? IF your primary node gets slow you would be screwed in your situation.
If you describe your use-case better, we might be able to give you better ideas.
I have a Cassandra cluster with 2 nodes. I am using NetworkTopologyStrategy
I was trying to increase the replication factor of keyspace in Cassandra to 2. I did the following steps:
UPDATE KEYSPACE demo WITH strategy_options = {DC1:2,DC2:2}; on both the nodes
Then I ran the nodetool repair on both the nodes
Then I ran my Hector code to count the number of rows and columns in the database.
I get the following error: Unavailable Exception
Also when I run the command
./nodetool –h ip_address ring
I found that both nodes ownership is 0 %. Please tell me how should I fix that.
You mention "both nodes", which implies that you have two total nodes rather than two data centers as would be suggested by your strategy options. Specifying {DC1:2,DC2:2} would require a minimum of four nodes (two in each DC to satisfy the replication factor), although this would not be advised since essentially all your nodes would be points of failure.
A minimal Cassandra cluster should have at least three nodes, in which case a RF of two would allow one node to go down without bringing down the system. It sounds like you have a single cluster (rather than two data centers), so what you really need is one more node (3 total), RF=2, using the SimpleStrategy instead of NetworkTopologyStrategy.