Is it possible to backup a 6-node DataStax Enterprise cluster and restore it to a new 4-node cluster? - cassandra

I have this case. We have 6 nodes DSE cluster and the task is to back it up, and restore all the keyspaces, tables and data into a new cluster. But this new cluster has only 4 nodes.
Is it possible to do this?

Yes, it is definitely possible to do this. This operation is more commonly referred to as "cloning" -- you are copying the data from one DataStax Enterprise (DSE) cluster to another.
There is a Cassandra utility called sstableloader which reads the SSTables and loads it to a cluster even when the destination cluster's topology is not identical to the source.
I have previously documented the procedure in How to migrate data in tables to a new Cassandra cluster which is also applicable to DSE clusters. Cheers!

Related

Cassandra Cluster Migration Issues

Now:
Single Node
cassandra 3.11.3 + kairosdb 1.2
Two data storage path
/data/cassandra/data/kairosdb 4T Old data
/data1/cassandra/data/kaiosdb 1.1T Now Wrting data
target:
Three Node
cassandra 3.11.3 + kairosdb 1.2
One data storage path
/data/cassandra/data/kairosdb
In this case, how to migrate the data in two data directories under a single node to a three-node cluster, each node of this three-node cluster has only one data directory
I understand how to do it (and have practiced it) when migrating a single node to a three-node cluster, but only when there is only one data directory.2 data directories are migrated to 1,I have searched the Internet for a long time, but there is no reference material.
Data directories are something that the individual Cassandra node cares about but the Cluster doesn't.
Usually you'd want to have all nodes share the same configuration but for replication it really doesn't matter where the SSTables are on Disk on each node.
So migrating here would be the same as you've practiced.
That said the process I'd choose would be to add the new nodes as a second DC with the right replication, run a repair to have all the data in sync and then decommission the original node.

DSE Analytics in Cluster Configuration

Previously we had three nodes cluster with two Cassandra nodes datacenter in one dc and one spark enabled node in different dc.
Spark was running smoothly in that configurations.
Then we tried adding another node in analytics dc with spark enabled. We had configured GossipingPropertyFileSnitch as well as added seeds.
But now when we start the cluster, spark master is assigned to both the nodes separately. So spark job still runs on a single node. What configurations are we missing regarding running spark job in a cluster?
Most probably you didn't make an adjustments in the Analytics keyspace replication, or didn't run the repair after you added a node. Please refer to instructions in official documentation.
Also, please check that you configured the same DC for both of Analytics nodes, because the Spark master is elected per DC.

Migrate Datastax Enterprise Cassandra to Apache Cassandra

We have currently using DSE 4.8 and 5.12. we want to migrate to apache cassandra .since we don't use spark or search thought save some bucks moving to apache. can this be achieved without down time. i see sstableloader works other way. can any one share me the steps to follow to migrate from dse to apache cassandra. something like this from dse to apache.
https://support.datastax.com/hc/en-us/articles/204226209-Clarification-for-the-use-of-SSTABLELOADER
Figure out what version of Apache Cassandra is being run by DSE. Based on the DSE documentation DSE 4.8.14 is using Apache Cassandra 2.1 and DSE 5.1 is using Apache Cassandra 3.11
Simplest way to do this is to build another DC (Logical DC per Cassandra) and add it to the existing cluster.
As usual, with a "Nodetool Rebuild {from-old-DC}" on to the new DC nodes, let Cassandra take care of streaming data to the new Apache Cassandra nodes naturally.
Once data streaming is completed, based on the LoadBalancingPolicy being used by applications, switch their local_dc to DC2 (the new DC). Once the new DC starts taking traffic, shutdown nodes in old DC say DC1 one by one.
alter keyspace dse_system and dse_security not using everywhere
on non-seed nodes, cleanup cassandra data directory
turn on replace in cassandra-env.sh
start instance
monitoring streaming process using command 'nodetool netstats|grep Receiving'
change seeds node definition and rolling restart before finally migrate previous seeds nodes.

Migrate from Open Source Cassandra to Datastax Enterprise

We have our current production cluster running on Cassandra 2.2.4
[cqlsh 5.0.1 | Cassandra 2.2.4 | CQL spec 3.3.1 | Native protocol v4]
We want to migrate this setup to a new cluster with DSE 5.0 without disturbing our current production.
What are the steps to do this, with zero/minimal downtime?
We want to have this as a separate cluster.
Can we use sstableloader from source to destination cluster and do a sstableupgrade at destination?
Should we stop compaction on the existing cluster, when running sstabloader?
How to transfer newly created sstables because of the production traffic?
Should we make application to write to both clusters, but only read from the old cluster, until new cluster is in sync with old cluster?
Should we run sstableloader from old data directory or from snapshot directory. What is the difference between the 2 approaches?
Setup your new cluster using DSE 5.0
Begin writing to both of your clusters
SCP your SSTables over from the old cluster to the new cluster
Use sstableloader, from your new cluster, on the SSTables you just copied to the new cluster.
Should we run sstableloader from old data directory or from snapshot directory. What is the difference between the 2 approaches?
Depends, the snapshots are exactly that, a snapshot of a specific state your cluster was in at some point in time. If you want the freshest data, use the SSTables currently in your data directory.

Copying Cassandra data from one cluster to another

I have a cluster setup in Cassandra on AWS. Now, I need to move my cluster to some other place. Since taking image is not possible, I will create a new cluster exactly a replica of the old one. Now I need to move the data from this cluster to another. How can I do so?
My cluster has 2 data centers and each data center has 3 Cassandra machines with 1 seed machine.
Do you have connectivity between the old and new cluster? If yes, why not linking the cluster and let cassandra replicate the data to the new cluster? After data is transferred, shut down the old cluster. Ideally you wouldn`t even have any downtime.
You can take your SSTables from your data directory and then can use sstableloader in new data center to import the data.
Before doing this activity you might consider doing compaction so that you have only one SSTable per table.
SSTable Loader
Using SFTP server or through some other way, transfer the SSTables from old cluster to new cluster (one DC is enough) and use SSTableLoader. The data replication to another DC will be taken care by Cassandra.
In cassandra there are two type of strategy SimpleStrategy and NetworkTopologyStrategy by using NetworkTopologyStrategy you can replicate in different cluster. see this documentation Data replication
You can use COPY command to export and import csv from one table to another
Simple data importing and exporting with Cassandra

Resources