I have a 3-node cassandra cluster in aws cloud which is running perfectly.
The traffic is low and I want to scale it down to two or single node due to economic constraints.
What could be the right practice here? Can I pause the other 2 nodes?
Is some data loss is expected?
If the cassandra nodes are available and you decommission them "gracefully", no data loss occurs. The reason is because when you decommission nodes token/data re-distribution occurs (so the process takes some time). If you "hard force" a node out (or if it becomes unavailable for any reason) and your RF is not configured to have data redundancy (e.g. set to 1), you will lose data. So try to remove the node "gracefully" (nodetool decommission (not sure how that's done in AWS)) and when you're done, be sure your RF settings per keyspace are correct (i.e. don't have RF > nodes and be sure it's > 1 if you want redundancy).
-Jim
Related
Does Cassandra maintains RF when a node goes down. For e.g. if number of nodes is 5 and RF is 2 then when a single node goes down, does the remaining replica copies it's data to some other node to maintain the RF of 2?
In the Datastax's documentation, it's mentioned that "If a node fails, the load is spread evenly across other nodes in the cluster". Does this mean that migration of data happens when a node goes down? Is this a feature available only in Datastax's Cassandra and not Apache Cassandra?
No, instead a "hint" will be stored in the coordinator node and will get eventually written to the node which owns the token range when the node comes back up - the write will succeed depending on your consistency level. So in the above example the write will succeed if you are writing with consistency level as ONE.
If the node is down only for short period - the node will receive the data back from hints from other nodes when it comes back. But if you decommission a node, then the data gets replicated to other nodes and the other nodes will have the new token ranges (same case when a node is added to the cluster as well).
Over time the data in one replica can become inconsistent with others and the repair process helps Cassandra in fixing them - https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesTOC.html
This is applicable in Apache Cassandra as well.
I have a cluster 4 cassandra nodes. I have recently added a new node but data processing is taking too long. Is there a way to make this process faster ? output of nodetool
Less data per node. Your screenshot shows 80TB per node, which is insanely high.
The recommendation is 1TB per node, 2TB at most. The logic behind this is bootstrap times get too high (as you have noticed). A good Cassandra ring should be able to rapidly recover from node failure. What happens if other nodes fail while the first one is rebuilding?
Keep in mind that the typical model for Cassandra is lots of smaller nodes, in contrast to SQL where you would have a few really powerful servers. (Scale out vs scale up)
So, I would fix the problem by growing your cluster to have 10X - 20X the number of nodes.
https://groups.google.com/forum/m/#!topic/nosql-databases/FpcSJcN9Opw
Now presently we are having a Dc with 3 nodes and with a replication of 3, i am planning to decommission a node do i need to decrease my replication to 2 or just decommissioning node will adjust the data among the two nodes with a replication of 3??
Decommissioning a node will not cause your Cassandra cluster to break necessarily, but it will make it so that a few things will stop working.
A few things that will happen if you decommission the node but don't adjust the replication factor:
First, nothing about your replication factor will be changed just because you decommission a node. To do otherwise would cause chaos.
Queries (both read and write) that attempt to use ConsistencyLevel.ALL will fail, because they will not be able to get 3 machines to participate
Queries with ConsistencyLevel.QUORUM will be less available, because BOTH remaining machines will need to respond to queries to meet quorum.
Because you have 3 machines and a RF of 3, that means that every machine has a complete copy of the data. Decommission the node, update your replication factor, and then run nodetool repair on the remaining two nodes. After you do that, you should be good to go.
My 2 cents: I would suggest you to first change your replication to 2, run a repair on all nodes and then issue "nodetool decommission" from the node you want to decommission. There will be data moving around, but by doing it this way nothing should stop working.
I have a keyspace with replication factor set to 3 but I have only a single node. Will then the disk space be used 3 times the data size? As the replicas are not yet assigned to any other nodes, will cassandra stop creating replicas unless new nodes join the cluster?
No, the disk space used would not be three times the size. The single node would own the entire token range and all writes would be written to that single node once.
What happens with the writes for the other two replicas would depend on if those nodes were previously present in the cluster and are currently down, or if they have never been added to the cluster. If they had never been added, then C* would just skip trying to write to them.
If they had been added but are currently down, and if you have hinted handoffs enabled and are still within the hinted handoff window, then C* will store hints for the down nodes on the single up node.
It depends on the replication strategy you have used . Assuming your queries are working you might have used SimpleStrategy , if you try to write to such a configuration your write should fail as it needs to write to 2 additional replica node before it gives a acknowledgement to the client ,which in case of SimpleStratagy are the next two clockwise nodes in the Ring.
We have Cassandra cluster with single token per node, total 22 nodes, average load per node is 500Gb. It has SimpleStrategy for the main keyspace and SimpleSnitch.
We need to migrate all data to the new datacenter and shutdown the old one without a downtime. New cluster has 28 nodes. I want to have vnodes on it.
I'm thinking of the following process:
Migrate the old cluster to vnodes
Setup the new cluster with vnodes
Add nodes from the new cluster to the old one and wait until it balances everything
Switch clients to the new cluster
Decommission nodes from the old cluster one by one
But there are a lot of technical details. First of all, should I shuffle the old cluster after vnodes migration? Then, what is the best way to switch to NetworkTopologyStrategy and to GossipingPropertyFileSnitch? I want to switch to NetworkTopologyStrategy because new cluster has 2 different racks with separate power/network switches.
should I shuffle the old cluster after vnodes migration?
You don't need to. If you go from one token per node to 256 (the default), each node will split its range into 256 adjacent, equally sized ranges. This doesn't affect where data lives. But it means that when you bootstrap in a new node in the new DC it will remain balanced throughout the process.
what is the best way to switch to NetworkTopologyStrategy and to GossipingPropertyFileSnitch?
The difficulty is that switching replication strategy is in general not safe since data would need to be moved around the cluster. NetworkToplogyStrategy (NTS) will place data on different nodes if you tell it nodes are in different racks. For this reason, you should move to NTS before adding the new nodes.
Here is a method to do this, after you have upgraded the old cluster to vnodes (your step 1 above):
1a. List all existing nodes as being in DC0 in the properties file. List the new nodes as being in DC1 and their correct racks.
1b. Change the replication strategy to NTS with options DC0:3 (or whatever your current replication factor is) and DC1:0.
Then to add the new nodes, follow the process here: http://www.datastax.com/docs/1.2/operations/add_replace_nodes#adding-a-data-center-to-a-cluster. Remember to set the number of tokens to 256 since it will be 1 by default.
In step 5, you should set the replication factor for DC0 to be 0 i.e. change replication options to DC0:0, DC1:3. Now those nodes aren't being used so decommission won't stream any data but you should still do it rather than powering them off so they are removed from the ring.
Note one risk is that writes made at a low consistency level to the old nodes could get lost. To guard against this, you could write at CL.LOCAL_QUORUM after you switch to the new DC. There is still a small window where writes could get lost (between steps 3 and 4). If it is important, you can run repair before decommissioning the old nodes to guarantee no losses or write at a high consistency level.
If you are trying to migrate to a new cluster with vnodes, wouldn't you need to change the Partitioner. The documents say that it isn't a good idea to migrate data between different Partitioners.