Maintaining RF when a node fails - Cassandra

Does Cassandra maintain the replication factor (RF) when a node goes down? For example, if the number of nodes is 5 and the RF is 2, then when a single node goes down, does the remaining replica copy its data to some other node to maintain the RF of 2?
In the DataStax documentation it is mentioned that "If a node fails, the load is spread evenly across other nodes in the cluster". Does this mean that a migration of data happens when a node goes down? Is this a feature available only in DataStax's Cassandra and not in Apache Cassandra?

No. Instead, a "hint" will be stored on the coordinator node and will eventually be written to the node that owns the token range once it comes back up - whether the write succeeds in the meantime depends on your consistency level. So in the above example the write will succeed if you are writing with a consistency level of ONE.
If the node is down only for a short period, it will receive the data back via hints from the other nodes when it comes back up. But if you decommission a node, its data gets replicated to the other nodes and they take over its token ranges (the same redistribution happens when a node is added to the cluster).
Over time the data in one replica can become inconsistent with the others, and the repair process helps Cassandra fix that - https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesTOC.html
This is applicable in Apache Cassandra as well.
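To make this concrete, a minimal sketch (the keyspace ks and table t are hypothetical) of a write at consistency level ONE, followed by a repair once the failed node is back up - standard cqlsh and nodetool usage, nothing specific to this scenario:

    -- in cqlsh: succeeds as long as one replica acknowledges the write
    CONSISTENCY ONE;
    INSERT INTO ks.t (id, val) VALUES (1, 'x');

    # from the shell, once the failed node has rejoined (especially if it was
    # down longer than the hint window), bring the replicas back in sync
    nodetool repair ks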

Related

Cassandra: Are SSTables created for replicated data?

Suppose we have a 3-node cluster in Cassandra and the replication factor is 2.
During the compaction process on a node, does the data replicated from some other node's primary range also get compacted?
For example, if node 1 owns the token range 1-10 and node 2 and node 3 are replicas for node 1, then when we initiate compaction on node 2, will the SSTables on node 2 also contain (and compact) the data replicated from node 1?
TIA
Data replication happens either in real time, when you write data, via hints if a node was offline, or via an explicit repair operation (repair itself relies on a special type of compaction, a validation compaction).
The compaction process is independent on every node - there are different factors that can lead to flushes happening at different times, different data sizes, etc. Compaction of data on one node won't send data to other nodes, unless it's a validation compaction triggered via repair.
I recommend reading the DSE Architecture guide, which explains how data is replicated, etc.
P.S. In your example, if you have RF=2, then only one node will be a replica for Node1...
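If you want to check which nodes actually hold the replicas for a given key, a quick sketch (the keyspace ks, table t, and partition key 1 are hypothetical):

    # prints the addresses of the replica nodes for that partition key
    nodetool getendpoints ks t 1

With RF=2 this lists two nodes: the one owning the token range and one additional replica.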

Cassandra cluster scaling down

I have a 3-node Cassandra cluster in the AWS cloud which is running perfectly.
The traffic is low and I want to scale it down to two nodes or a single node due to economic constraints.
What would be the right practice here? Can I just pause the other 2 nodes?
Is some data loss expected?
If the Cassandra nodes are available and you decommission them "gracefully", no data loss occurs, because decommissioning redistributes the leaving node's tokens/data across the remaining nodes (so the process takes some time). If you "hard force" a node out (or it becomes unavailable for any reason) and your RF is not configured for redundancy (e.g. set to 1), you will lose data. So remove the nodes "gracefully" (nodetool decommission - not sure how that's done in AWS), and when you're done, make sure your RF settings per keyspace are correct (i.e. don't have RF > number of nodes, and keep RF > 1 if you want redundancy).
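In command form, the sequence looks roughly like this (the keyspace name my_ks and the target RF of 2 are only examples):

    # on each node you want to remove, one at a time
    nodetool decommission

    # then, from any remaining node, adjust the RF if needed and repair
    cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};"
    nodetool repair my_ks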
-Jim

Cassandra: For a single node cluster, will keyspace replication factor >1 increase disk space usage?

I have a keyspace with the replication factor set to 3, but I have only a single node. Will the disk space used then be 3 times the data size? As the replicas are not yet assigned to any other nodes, will Cassandra stop creating replicas until new nodes join the cluster?
No, the disk space used would not be three times the size. The single node would own the entire token range and all writes would be written to that single node once.
What happens with the writes for the other two replicas depends on whether those nodes were previously present in the cluster and are currently down, or whether they have never been added to the cluster. If they have never been added, C* simply skips trying to write to them.
If they had been added but are currently down, and if you have hinted handoff enabled and are still within the hinted handoff window, then C* will store hints for the down nodes on the single up node.
It also depends on the replication strategy you have used. Assuming your queries are working, you have probably used SimpleStrategy. With such a configuration, a write at a consistency level higher than ONE should fail, because it needs acknowledgements from the two additional replica nodes before responding to the client - which, in the case of SimpleStrategy, would be the next two nodes clockwise in the ring.
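To make the consistency-level point concrete, a minimal sketch on a single-node cluster (the keyspace ks and table t are hypothetical):

    -- in cqlsh
    CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
    CREATE TABLE ks.t (id int PRIMARY KEY, val text);

    CONSISTENCY ONE;
    INSERT INTO ks.t (id, val) VALUES (1, 'x');   -- succeeds: one live replica is enough

    CONSISTENCY QUORUM;
    INSERT INTO ks.t (id, val) VALUES (2, 'y');   -- fails: a quorum of RF=3 needs 2 live replicas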

Cassandra replication within a cluster without partitioning data

I have 3 nodes in a cluster:
Node1 = 127.0.0.1:9160
Node2 = 127.0.0.2:9161
Node3 = 127.0.0.3:9162
I want to use only one node (node1) for insertion. The other two nodes should be used for fault tolerance when writing millions of records, i.e. when node1 is down, either node2 or node3 should take care of the writes. For that I formed a cluster with a replication factor of 2 and added the seed nodes properly in the cassandra.yaml file. It is working fine. But due to partitioning, whenever I write data to node 1, the rows get scattered across all the nodes in the cluster. So is there any way to use the other nodes only for replication in the cluster?... Or is there any way to disable the partitioning?...
thanks in advance..
No. Cassandra is a fully distributed system.
What are you trying to achieve here? We have a 6-node cluster with RF=3 and, since PlayOrm fixed the config bug they had in astyanax, even if one node starts getting slow, it automatically starts going to the other nodes to keep the system fast. Why would you want to avoid great features like that? If your primary node got slow, you would be stuck in your setup.
If you describe your use-case better, we might be able to give you better ideas.

How to migrate data from a Cassandra cluster of size N to a different cluster of size N+/-M

I'm trying to figure out how to migrate data from one cassandra cluster, to another cassandra cluster of a different ring size...say from a 5 node cluster to a 7 node cluster.
I started looking at sstable2json, since it creates a JSON file for the SSTables on a specific Cassandra node. My thought was to do this for a column family on each node in the ring. So on a 5-node ring, this would give me 5 JSON files, one for the column family's data residing on each node.
Then I'd merge the JSON files into one file and use json2sstable to import it into a new cluster of size, let's say, 7. I was hoping Cassandra would then replicate/balance the data evenly across the nodes in the ring, but I just read that SSTables are immutable once written. So if I did what I just described, I'd end up with a ring where all the data in my column family sits on one node.
So can anyone help me figure out the process for migrating data from one cluster to a different cluster of a different ring size?
Better: use bin/sstableloader on the sstables from the old ring, to stream to the new one.
Normally sstableloader is used in a sequence like this:
Create sstables locally using SSTableWriter
Use sstableloader to stream the data in the sstables to the right nodes (bin/sstableloader path-to-directory-full-of-sstables). The directory name is assumed to be the keyspace, which will be the case if you point it at an existing Cassandra data directory.
Since you're looking to stream data from an existing cluster A to a new cluster B, you can skip straight to running sstableloader against the data on each node in cluster A.
More details on using sstableloader are in this blog post.
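For illustration, a rough sketch of that streaming step (the paths and the keyspace name my_keyspace are placeholders; newer Cassandra releases also take a -d flag naming a node in the target cluster):

    # run against the sstables copied from each node of the old cluster
    bin/sstableloader /path/to/my_keyspace
    # or, on newer releases, pointing at a keyspace/table directory:
    bin/sstableloader -d 10.0.0.1 /path/to/my_keyspace/my_table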
You don't need to use sstable2json. If you have the space, you can (see the sketch after this list):
get all the sstables from all of the nodes on the old ring
put them all together on each of the new servers (renaming any which have the same names)
run nodetool cleanup on each node in the new ring and they will throw away the data that doesn't belong to them.
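A rough sketch of that approach (hostnames and paths are placeholders):

    # 1. gather the sstables from every node of the old ring and place them in
    #    the matching data directory on each new node (rename any files whose
    #    names collide before copying them in)
    scp old-node-1:/var/lib/cassandra/data/my_keyspace/*.db /var/lib/cassandra/data/my_keyspace/

    # 2. restart Cassandra on each new node so it picks up the files, then
    #    discard the data that no longer belongs to that node
    nodetool cleanup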
You may follow these steps (a rough command sketch follows the list):
1. Join the 7 new nodes to the 5-node cluster, setting up each node with its own ring token. At this point you have a cluster with 12 nodes.
2. Remove the 5 old nodes from the cluster built in step 1.
3. Set up the ring tokens for the remaining nodes once the 5 old nodes have been removed.
4. Repair the 7-node cluster.
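In terms of commands, a rough sketch of steps 2-4 (the token value is a placeholder, and this assumes a single-token, non-vnode setup):

    # step 2: run on each of the 5 old nodes, one at a time
    nodetool decommission

    # step 3: adjust the token of each remaining node if needed
    nodetool move <new-token>

    # step 4: run on each of the 7 remaining nodes
    nodetool repair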
I would venture to say that this isn't as big of a problem as it may seem.
Create your new ring and define the tokens for each node appropriately as per http://wiki.apache.org/cassandra/Operations#Token_selection
Import data into the new ring.
The ring will balance itself based on the tokens you have defined: http://wiki.apache.org/cassandra/Operations#Import_.2BAC8_export
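For reference, the token selection referred to in the first step is, for the RandomPartitioner, just an even split of the token space: initial_token(i) = i * 2**127 / 7 for i = 0..6, so each of the 7 nodes owns an equal slice of the ring (this is the formula described on the Token_selection wiki page linked above).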
