Migrate from Open Source Cassandra to Datastax Enterprise - cassandra

We have our current production cluster running on Cassandra 2.2.4
[cqlsh 5.0.1 | Cassandra 2.2.4 | CQL spec 3.3.1 | Native protocol v4]
We want to migrate this setup to a new cluster with DSE 5.0 without disturbing our current production.
What are the steps to do this, with zero/minimal downtime?
We want to have this as a separate cluster.
Can we use sstableloader from source to destination cluster and do a sstableupgrade at destination?
Should we stop compaction on the existing cluster, when running sstabloader?
How to transfer newly created sstables because of the production traffic?
Should we make application to write to both clusters, but only read from the old cluster, until new cluster is in sync with old cluster?
Should we run sstableloader from old data directory or from snapshot directory. What is the difference between the 2 approaches?

Setup your new cluster using DSE 5.0
Begin writing to both of your clusters
SCP your SSTables over from the old cluster to the new cluster
Use sstableloader, from your new cluster, on the SSTables you just copied to the new cluster.
Should we run sstableloader from old data directory or from snapshot directory. What is the difference between the 2 approaches?
Depends, the snapshots are exactly that, a snapshot of a specific state your cluster was in at some point in time. If you want the freshest data, use the SSTables currently in your data directory.

Related

Cassandra Cluster Migration Issues

Now:
Single Node
cassandra 3.11.3 + kairosdb 1.2
Two data storage path
/data/cassandra/data/kairosdb 4T Old data
/data1/cassandra/data/kaiosdb 1.1T Now Wrting data
target:
Three Node
cassandra 3.11.3 + kairosdb 1.2
One data storage path
/data/cassandra/data/kairosdb
In this case, how to migrate the data in two data directories under a single node to a three-node cluster, each node of this three-node cluster has only one data directory
I understand how to do it (and have practiced it) when migrating a single node to a three-node cluster, but only when there is only one data directory.2 data directories are migrated to 1,I have searched the Internet for a long time, but there is no reference material.
Data directories are something that the individual Cassandra node cares about but the Cluster doesn't.
Usually you'd want to have all nodes share the same configuration but for replication it really doesn't matter where the SSTables are on Disk on each node.
So migrating here would be the same as you've practiced.
That said the process I'd choose would be to add the new nodes as a second DC with the right replication, run a repair to have all the data in sync and then decommission the original node.

Is it possible to backup a 6-node DataStax Enterprise cluster and restore it to a new 4-node cluster?

I have this case. We have 6 nodes DSE cluster and the task is to back it up, and restore all the keyspaces, tables and data into a new cluster. But this new cluster has only 4 nodes.
Is it possible to do this?
Yes, it is definitely possible to do this. This operation is more commonly referred to as "cloning" -- you are copying the data from one DataStax Enterprise (DSE) cluster to another.
There is a Cassandra utility called sstableloader which reads the SSTables and loads it to a cluster even when the destination cluster's topology is not identical to the source.
I have previously documented the procedure in How to migrate data in tables to a new Cassandra cluster which is also applicable to DSE clusters. Cheers!

Restore snapshots from 3 node Cassandra cluster to a new 6 node cluster

I am new to cassandra and would like some help on restoring snapshots from 3 node Cassandra cluster to a new 6 node cluster.
We have few keyspaces and would like to copy data from dev to production.
Thanks in advance.
The easiest way is to use the sstableloader tool that is bundled with Cassandra. You can find it in %installdir%/bin/sstableloader.
You will first need to re-create the schema on your new cluster:
dump the schema for the keyspace you want to transfer from your original cluster using cqlsh -e 'DESC KEYSPACE mykeyspace;' > mykeyspace.cql
load it into your new cluster using cqlsh -f mykeyspace.cql.
(optional) If you new cluster will have a different replication configuration you'll need to modify it manually after loading the schema. (ALTER KEYSPACE mykeyspace WITH REPLICATION = ...;)
Once that's done, you can start bulk-loading the SSTables from your keyspace snapshots into the new cluster:
sstableloader --nodes 10.0.0.1,10.0.0.2 -f /etc/cassandra/cassandra.yaml /path/to/mykeyspace/snapshot/
Note that this might take a while if you have a lot of data to load. You should also run a full repair on the new cluster afterwards to ensure that the replicas are properly distributed.

Migrate Datastax Enterprise Cassandra to Apache Cassandra

We have currently using DSE 4.8 and 5.12. we want to migrate to apache cassandra .since we don't use spark or search thought save some bucks moving to apache. can this be achieved without down time. i see sstableloader works other way. can any one share me the steps to follow to migrate from dse to apache cassandra. something like this from dse to apache.
https://support.datastax.com/hc/en-us/articles/204226209-Clarification-for-the-use-of-SSTABLELOADER
Figure out what version of Apache Cassandra is being run by DSE. Based on the DSE documentation DSE 4.8.14 is using Apache Cassandra 2.1 and DSE 5.1 is using Apache Cassandra 3.11
Simplest way to do this is to build another DC (Logical DC per Cassandra) and add it to the existing cluster.
As usual, with a "Nodetool Rebuild {from-old-DC}" on to the new DC nodes, let Cassandra take care of streaming data to the new Apache Cassandra nodes naturally.
Once data streaming is completed, based on the LoadBalancingPolicy being used by applications, switch their local_dc to DC2 (the new DC). Once the new DC starts taking traffic, shutdown nodes in old DC say DC1 one by one.
alter keyspace dse_system and dse_security not using everywhere
on non-seed nodes, cleanup cassandra data directory
turn on replace in cassandra-env.sh
start instance
monitoring streaming process using command 'nodetool netstats|grep Receiving'
change seeds node definition and rolling restart before finally migrate previous seeds nodes.

How to use sstableloader from a node not in Cassandra Cluster ring

we are using apache-cassandra 1.1.9 version on Production Cassandra Cluster on linux.I want to upload some data using sstableloader.
I was able to generate sstables for a small data and then tried to upload these sstables into Cassandra Cluster using sstableloader from another machine(which is in same network but not in cassandra cluster ring) but get below error
"Could not retrieve endpoint ranges:"
I do not understand why this error is coming.
This machine , where i am running sstableloader, has same cassandra installation.I copied the cassandra.yaml from production cassandra into my host machine's apache-cassandra/conf folder.
My sstables are in below directory structure:-
/path/to/keyspace dir/Keyspace/*.db
SStable command I am running is below
./sstableloader -d -i , /home/Data/Keyspace/
Could not retrieve endpoint ranges:
Please advise , if i am doing wrong here ?
Found the solution.
The sstableloader command needs to be executed from the directory containing the Keyspace subdirectory.
For e.g
if /home/Data is the directory structure, under which there is sub directories keyspace/ColumnFamily/
then execute command like below from /home/Data/ directory.
~/apache-cassandra/bin/sstableloader -d /keyspace/ColumnFamily
This is a bit old, but I ran into the "Could not retrieve endpoint ranges" error recently with a different root cause.
In our case, data was being exported from a production system, and loaded to a new development instance. The development instance had been incorrectly set up so the sstables were generated using dse 4.7, and the sstableloader being run was dse 4.6.
Note that it is possible to ingest tables from 4.6 into dse 4.7 for debugging, etc, but that it is necessary to run nodetool upgradesstables first. That isn't what was going on here.

Resources