Changing the number of vNodes in an existing Cassandra 2.2.x cluster - will data be lost or not?

If the number of vNodes in the existing Cassandra 2.2.x cluster is changed - will it cause all the data in that cluster to be lost or not?
Is it possible to change # of vNodes and keep all the data stored in the Cassandra cluster?

The num_tokens value in cassandra.yaml is only read when a node first bootstraps; once a node has joined the ring its tokens are stored in the system tables, so changing the setting afterwards has basically no effect. You won't lose data.
There used to be a feature called shuffle for exactly this, but it turned out you really don't want to change the token layout in this way: the streaming associated with shuffle will pretty much kill your cluster.
If you need to do this - the best method is to create a new DC with the desired token ranges and then rebuild them as per the instructions here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
You can then point your app at the new DC and throw away the old.
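As a rough sketch of that procedure (keyspace, DC, and host names here are placeholders, not from the question):

```shell
# On each node of the new DC: set num_tokens (e.g. 256) in cassandra.yaml
# and the DC/rack in cassandra-rackdc.properties before starting the node.

# Replicate the keyspace to the new DC as well (names are hypothetical):
cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"

# On each node in the NEW DC, stream the existing data from the old DC:
nodetool rebuild -- DC1

# After pointing the app at DC2: remove DC1 from the replication settings
# and decommission the old nodes.
```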

Related

Data Inconsistency in Cassandra Cluster after migration of data to a new cluster

I see some data inconsistency after moving data to a new cluster.
Old cluster has 9 nodes in total and each has got 2+ TB of data on it.
New cluster has same set of nodes as old and configuration is same.
Here is what I've performed in order:
nodetool snapshot.
Copied that snapshot to the destination cluster.
Created the new keyspace on the destination cluster.
Used the sstableloader utility to load the snapshots.
Restarted all nodes.
After the transfer completed successfully, I ran a few queries to compare the old and new clusters and found that the new cluster is not consistent, although the data looks properly distributed across the nodes (nodetool status).
The same query returns different result sets for some partitions: zero rows the first time, then 100 rows, then 200, and eventually the count becomes consistent for a few partitions and matches the old cluster.
A few partitions have no data in the new cluster, whereas the old cluster has data for those partitions.
I tried running the queries in cqlsh with CONSISTENCY ALL, but the problem still exists.
Did I miss any important steps before or after the load?
Is there any procedure to find the root cause of this?
I am currently running "nodetool repair", but I doubt it will help, since I already tried with CONSISTENCY ALL.
I'd highly appreciate your help!
The fact that the results eventually become consistent indicates that the replicas are out of sync.
You can verify this by reviewing the logs around the time that you were loading data, particularly for dropped mutations. You can also check the output of nodetool netstats. If you're seeing blocking read repairs, that's another confirmation that the replicas are out-of-sync.
If you still have other partitions you can test, enable TRACING ON in cqlsh when you query with CONSISTENCY ALL. You will see if there are digest mismatches in the trace output which should also trigger read repairs. Cheers!
[EDIT] Based on your comments below, it sounds like you possibly did not load the snapshots from ALL the nodes in the source cluster with sstableloader. If you've missed loading SSTables to the target cluster, then that would explain why data is missing.
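A minimal cqlsh session for the tracing check described above (the table name and key are hypothetical):

```
cqlsh> CONSISTENCY ALL;
cqlsh> TRACING ON;
cqlsh> SELECT * FROM my_ks.my_table WHERE pk = 'some-key';
-- In the trace output, look for "Digest mismatch" events: each one means
-- the replicas returned different data and a read repair was triggered.
```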

Migrate data from one cassandra cluster to another

Hi, I want to migrate data from my Cassandra cluster to another Cassandra cluster. I have seen many posts suggesting various methods, but they are not very clear or have limitations. The methods I've seen are as follows:
Using the COPY TO and COPY FROM commands: easy to use, but there seems to be a limitation on the number of rows it can copy.
Using sstableloader: most articles suggest using sstableloader to move data from one cluster to another, but I did not find clear details on the steps. How are the SSTables created (is there a nodetool command, or does it require writing a Java application? are they created per node or per cluster?), and how are they moved from one cluster to another? Creating snapshots is tedious in that they are created per node and each has to be transferred to the other cluster. I have also seen answers suggesting parallel ssh to snapshot the whole cluster, but did not find an example of that either.
Any help would be appreciated.
It's really a question that requires more information to provide a definitive answer. For example, do you need to keep metadata such as WriteTime and TTLs on the data? Does the destination cluster have the same topology (number of nodes, token allocation, etc.)?
Basically, you have the following options:
Use sstableloader - a tool shipped with Cassandra itself that is used for restoring from backups, etc. To perform a data migration you create a snapshot of the table to load (using nodetool snapshot) and run sstableloader on that snapshot. The main advantage is that it keeps the metadata (TTL/WriteTime). The main disadvantage is that you need to take the snapshot and run the loader on every node of the source cluster, and you need exactly the same schema and partitioner in the destination cluster;
You can use a backup/restore tool such as medusa, which basically automates taking the snapshot & loading the data;
You can use Apache Spark to copy data from one table to another using the Spark Cassandra Connector, for example as described in this blog post - just read the table from one cluster and write to a table in another cluster. This works fine for simple copy operations, and you have the possibility of transforming the data if necessary, but it becomes more complex if you need to preserve metadata. Plus it needs Spark;
Use DataStax Bulk Loader (DSBulk) to export data to files on disk, and load into another cluster. In contrast to cqlsh's COPY command, it's heavily optimized for loading/unloading of big amounts of data. It works with Cassandra 2.1+ and most DSE versions (except ancient ones).
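As a sketch of the sstableloader option above (keyspace, table, and host names are placeholders):

```shell
# On EVERY node in the source cluster:
nodetool snapshot -t migration my_ks

# This creates snapshot files under each table's data directory, e.g.
#   /var/lib/cassandra/data/my_ks/my_table-<id>/snapshots/migration/

# With the same schema already created on the target cluster, stream each
# node's snapshot in (the last two path components must be keyspace/table):
sstableloader -d target_host1,target_host2 /tmp/migration/my_ks/my_table/
```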
If you are able to set up the target cluster with exactly the same topology as the source cluster, the fastest way may be to simply copy the data files from the source to the target cluster, since this avoids the overhead of processing the data to redistribute it to different nodes. In order for this to work, your target cluster must have the same number of nodes, the same rack configuration, and even the same tokens assigned to each node.
To get the tokens for a source node, you can run nodetool info -T | grep Token | awk '{print $3}' | tr '\n' , | sed 's/,$/\n/'. You can then copy the comma-separated list of tokens from the output and paste it into the initial_token setting in your target node's cassandra.yaml. Once you start the node, check its tokens using nodetool info -T to verify that it has the correct tokens. Repeat these steps for each node in the target cluster.
Once you have all of your target nodes set up with exactly the same tokens, DC, and racks as the source cluster, take a snapshot of the desired tables on the source cluster and copy the snapshots to the corresponding node's data directories on the target cluster. DataStax OpsCenter can automate the process of backing up and restoring data and will use direct copying for clusters with the same topology. It appears that medusa can do this too, though I have not used that tool myself.
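A sketch of the direct-copy approach for identical topologies (host names, paths, and the table id are placeholders):

```shell
# On each source node, snapshot the table:
nodetool snapshot -t clone my_ks

# Copy the snapshot contents into the matching table directory on the
# corresponding target node (the one configured with the SAME tokens):
rsync -a /var/lib/cassandra/data/my_ks/my_table-<id>/snapshots/clone/ \
  target_nodeN:/var/lib/cassandra/data/my_ks/my_table-<id>/

# On the target node, make Cassandra pick up the copied SSTables:
nodetool refresh my_ks my_table
```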

What is the best way to change the partitioner in cassandra

Currently we are using the random partitioner and we want to switch to the Murmur3 partitioner. I know we can achieve this by using sstable2json and then json2sstable to convert the SSTables manually and then use sstableloader, or we can create a new cluster with Murmur3 and write an application to pull all the data from the old cluster and write it to the new one.
Is there another, easier way to achieve this?
There is no easy way; it's a pretty massive change, so you might want to check whether it's absolutely necessary (do some benchmarks - the difference is likely undetectable). It's more the kind of change to make if you're switching to a new cluster anyway.
To do it live: create a new cluster that uses Murmur3 and write to both clusters. In the background, read and copy data to the new cluster while the writes are duplicated. Once the background job is complete, flip reads from the old cluster to the new one, and then you can decommission the old cluster.
Offline: sstable2json -> json2sstable is a pretty inefficient mechanism. It will be a lot faster if you use an sstable reader and an sstable writer (i.e. edit SSTableExport in the Cassandra code to write a new sstable instead of dumping JSON). If you have a smaller dataset, the cqlsh COPY command may be viable.
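For reference, the offline conversion mentioned above looks roughly like this (these tools shipped with Cassandra 2.x and were removed in later versions; all paths and names are placeholders):

```shell
# Dump an SSTable to JSON on the RandomPartitioner cluster:
sstable2json /data/my_ks/my_table/my_ks-my_table-ka-1-Data.db > dump.json

# Recreate the table on the Murmur3 cluster, then rebuild an SSTable:
json2sstable -K my_ks -c my_table dump.json /out/my_ks-my_table-ka-1-Data.db
```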

Cassandra cluster - Store equal data among the nodes

In a Cassandra cluster, how can we ensure all nodes hold roughly equal amounts of data, rather than one node having much more data and another much less?
If this scenario occurs, what are the best practices?
Thanks
It is OK to expect a slight variation of 5-10%. The most common causes are that the distribution of your partitions may not be truly random (more partitions on some nodes) and that there may be a large variation in partition sizes (the smallest partition is a few kilobytes while the largest is 2 GB).
There are also 2 other possible scenarios to consider.
SINGLE-TOKEN CLUSTER
If the tokens are not correctly calculated, some nodes may have a larger token range compared to others. Use the token generation tool to get a list of tokens that is correctly distributed around the ring.
If the cluster is deployed with DataStax Enterprise, the easiest way is to rebalance your cluster with OpsCenter.
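For a single-token cluster, the evenly spaced Murmur3Partitioner tokens can also be computed by hand; this hypothetical 4-node example divides the 64-bit token ring into equal steps:

```shell
python3 -c '
num_nodes = 4                       # assumed cluster size
step = 2**64 // num_nodes           # Murmur3 tokens span -2**63 .. 2**63 - 1
tokens = [i * step - 2**63 for i in range(num_nodes)]
print(",".join(str(t) for t in tokens))
'
```

Each value would then go into the initial_token setting of one node's cassandra.yaml.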
VNODES CLUSTER
Confirm that you have allocated the same number of tokens in cassandra.yaml with the num_tokens directive.
Unless you are using ByteOrderedPartitioner for your cluster, that should not happen. See the DataStax documentation here for more information about the available partitioners and why this should not (normally) happen.

How to migrate single-token cluster to a new vnodes cluster without downtime?

We have a Cassandra cluster with a single token per node, 22 nodes in total, and an average load of 500 GB per node. It uses SimpleStrategy for the main keyspace and SimpleSnitch.
We need to migrate all data to the new datacenter and shut down the old one without downtime. The new cluster has 28 nodes, and I want to use vnodes on it.
I'm thinking of the following process:
Migrate the old cluster to vnodes
Setup the new cluster with vnodes
Add nodes from the new cluster to the old one and wait until it balances everything
Switch clients to the new cluster
Decommission nodes from the old cluster one by one
But there are a lot of technical details. First of all, should I shuffle the old cluster after the vnodes migration? Then, what is the best way to switch to NetworkTopologyStrategy and GossipingPropertyFileSnitch? I want to switch to NetworkTopologyStrategy because the new cluster has 2 different racks with separate power/network switches.
should I shuffle the old cluster after vnodes migration?
You don't need to. If you go from one token per node to 256 (the default), each node will split its range into 256 adjacent, equally sized ranges. This doesn't affect where data lives. But it means that when you bootstrap in a new node in the new DC it will remain balanced throughout the process.
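A toy illustration of that splitting (not a Cassandra tool; the range size is made up): each node's single contiguous range becomes 256 adjacent, equally sized sub-ranges, so data placement does not change:

```shell
python3 - <<'EOF'
start, end = 0, 2**32                # pretend this node owned (0, 2**32]
n = 256                              # default num_tokens
step = (end - start) // n
ranges = [(start + i * step, start + (i + 1) * step) for i in range(n)]
assert ranges[0][0] == start and ranges[-1][1] == end   # same total range
print(len(ranges), ranges[0], ranges[-1])
EOF
```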
what is the best way to switch to NetworkTopologyStrategy and to GossipingPropertyFileSnitch?
The difficulty is that switching replication strategy is in general not safe, since data would need to be moved around the cluster. NetworkTopologyStrategy (NTS) will place data on different nodes if you tell it nodes are in different racks. For this reason, you should move to NTS before adding the new nodes.
Here is a method to do this, after you have upgraded the old cluster to vnodes (your step 1 above):
1a. List all existing nodes as being in DC0 in the properties file. List the new nodes as being in DC1 and their correct racks.
1b. Change the replication strategy to NTS with options DC0:3 (or whatever your current replication factor is) and DC1:0.
Then to add the new nodes, follow the process here: http://www.datastax.com/docs/1.2/operations/add_replace_nodes#adding-a-data-center-to-a-cluster. Remember to set the number of tokens to 256 since it will be 1 by default.
In step 5, you should set the replication factor for DC0 to be 0, i.e. change the replication options to DC0:0, DC1:3. Now those nodes aren't being used, so decommission won't stream any data, but you should still decommission them rather than powering them off, so they are removed from the ring.
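The replication changes in steps 1b and 5 can be sketched as CQL (the keyspace name and RF values are examples):

```shell
# Step 1b: all replicas stay in the old DC for now:
cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC0': 3, 'DC1': 0};"

# Step 5, once the new DC is built and clients have moved:
cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
  {'class': 'NetworkTopologyStrategy', 'DC0': 0, 'DC1': 3};"

# Then, on each old node (streams nothing since DC0 holds no replicas):
nodetool decommission
```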
Note one risk is that writes made at a low consistency level to the old nodes could get lost. To guard against this, you could write at CL.LOCAL_QUORUM after you switch to the new DC. There is still a small window where writes could get lost (between steps 3 and 4). If it is important, you can run repair before decommissioning the old nodes to guarantee no losses or write at a high consistency level.
If you are trying to migrate to a new cluster with vnodes, wouldn't you need to change the partitioner? The documentation says it isn't a good idea to migrate data between different partitioners.
