Multi DC replication between different Cassandra versions - cassandra

We have an existing Cassandra cluster (3.0.9) running on production.
Now ,we want to create data pipelines to ingest data from Cassandra and persist in hadoop. We are thinking of using CDC feature (available from Cassandra 3.8) along with Kafka Connect.
We are thinking of creating a new read-only DC which will replicate data from the Production DC.This new DC will be running the latest Cassandra version (3.8+) with CDC enabled.
My questions:
For replication to work, do we need both dc's running same version of Cassandra? Can't we achieve this without upgrading the DC used by the service?
Is it possible to enable CDC feature only in the new read-only DC?
UPDATE :
More information from C* mailing list https://lists.apache.org/thread.html/r9e705895c480f264998c29cf69c0eb2296382049467e31c447f676c7%40%3Cuser.cassandra.apache.org%3E

I think, it should be the same version as existing DC for replication of the data by adding a DC. you may refer below recommended document below for adding new datacenter in existing cluster.
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsAddDCToCluster.html
You should upgrade the existing DC from lower to upper version of Cassandra to get expected feature.
You can make your DC as read only without sending any direct traffic in the new DC. all connection should be on older DC.

Related

Is it possible to backup a 6-node DataStax Enterprise cluster and restore it to a new 4-node cluster?

I have this case. We have 6 nodes DSE cluster and the task is to back it up, and restore all the keyspaces, tables and data into a new cluster. But this new cluster has only 4 nodes.
Is it possible to do this?
Yes, it is definitely possible to do this. This operation is more commonly referred to as "cloning" -- you are copying the data from one DataStax Enterprise (DSE) cluster to another.
There is a Cassandra utility called sstableloader which reads the SSTables and loads it to a cluster even when the destination cluster's topology is not identical to the source.
I have previously documented the procedure in How to migrate data in tables to a new Cassandra cluster which is also applicable to DSE clusters. Cheers!

Does Scylla DB have a similar migration support to GKE as K8ssandra's Zero Downtime Migration feature?

We are trying to migrate our ScyllaDB cluster deployed on GCE machines to the GKE cluster in Google Cloud, we came across one approach of Cassandra migration and want to implement the same here in ScyllaDB migration. Below is the link for the same, can you please suggest if this is possible in Scylla ?
or if Scylla hasn't introduced such a migration technique with the Scylla K8S operator ?
https://k8ssandra.io/blog/tutorials/cassandra-database-migration-to-kubernetes-zero-downtime/
Adding a new "destination" DC to your existing cluster "source" DC, is a very common technic to migrate to a new DC.
Add the new "destination" DC
Change replication factor settings accordingly
nodetool rebuild --> stream data from the "source" DC to the "destination" DC
nodetool repair the new DC.
Update your application clients to connect to the new DC once it's ready to serve (all data streamed + repaired)
Decommission the "old" (source) DC
For the gory details see here:
https://docs.scylladb.com/stable/operating-scylla/procedures/cluster-management/add-dc-to-existing-dc.html
https://docs.scylladb.com/stable/operating-scylla/procedures/cluster-management/decommissioning-data-center.html
If you prefer to go the full scan route. CQL reads on the source and CQL writes on the destination, with some ability for data manipulation and save points to resume from, than the Scylla Spark Migrator is a good option.
https://github.com/scylladb/scylla-code-samples/tree/master/spark-scylla-migrator-demo
You can also use the Scylla Spark migrator to migrate parquet files
https://www.scylladb.com/2020/06/10/migrate-parquet-files-with-the-scylla-migrator/
Remember not to migrate Materialized views (MV), you can always re-create them post migration again from the base tables.
We use an Apache Spark-based Migrator: https://github.com/scylladb/scylla-migrator
Here's the blog we wrote on how to do this back in 2019: https://www.scylladb.com/2019/02/07/moving-from-cassandra-to-scylla-via-apache-spark-scylla-migrator/
Though in this case, you aren't moving from Cassandra to ScyllaDB; just moving from one ScyllaDB instance to another. If this makes sense to you, it should be straight forward. If you have questions, feel free to join our Slack community to get more interactive assistance:
http://slack.scylladb.com/

Migrate Datastax Enterprise Cassandra to Apache Cassandra

We have currently using DSE 4.8 and 5.12. we want to migrate to apache cassandra .since we don't use spark or search thought save some bucks moving to apache. can this be achieved without down time. i see sstableloader works other way. can any one share me the steps to follow to migrate from dse to apache cassandra. something like this from dse to apache.
https://support.datastax.com/hc/en-us/articles/204226209-Clarification-for-the-use-of-SSTABLELOADER
Figure out what version of Apache Cassandra is being run by DSE. Based on the DSE documentation DSE 4.8.14 is using Apache Cassandra 2.1 and DSE 5.1 is using Apache Cassandra 3.11
Simplest way to do this is to build another DC (Logical DC per Cassandra) and add it to the existing cluster.
As usual, with a "Nodetool Rebuild {from-old-DC}" on to the new DC nodes, let Cassandra take care of streaming data to the new Apache Cassandra nodes naturally.
Once data streaming is completed, based on the LoadBalancingPolicy being used by applications, switch their local_dc to DC2 (the new DC). Once the new DC starts taking traffic, shutdown nodes in old DC say DC1 one by one.
alter keyspace dse_system and dse_security not using everywhere
on non-seed nodes, cleanup cassandra data directory
turn on replace in cassandra-env.sh
start instance
monitoring streaming process using command 'nodetool netstats|grep Receiving'
change seeds node definition and rolling restart before finally migrate previous seeds nodes.

Can I set cassandra cluster with vnode using in one datacenter and another not

I had a cassandra cluster with one datacenter and it is upgraded from 1.0.7 to 2.1.8. However, the old datacenter does not use vnode settings because the 1.0.7 does not support it.
Right now, I want to add a new datacenter of 2.1.8 version and want to use vnode settings in the new data center. Can I keep the old datacenter not using vnode and the new datacenter with vnode settings?
You could try the procedure listed here.
"You cannot directly convert a single-token nodes to a vnode. However, you can configure another data center configured with vnodes already enabled and let Cassandra automatic mechanisms distribute the existing data into the new nodes. This method has the least impact on performance."

Creating new datacenter with Datastax OpsCenter

I'd like to enable vnodes on my cassandra cluster, which has an Analytics dc and a regular Cassandra dc. I am using OpsCenter 5.0.1 and DSE 4.5. My question is: how can I create a new dc with OpsCenter, with vnodes enabled, so I can transfer my data over from my existing dc's. I am following the instructions on this page, but surely I don't have to manually edit the config file on every node, to enable a new datacenter, right? Any help much appreciated.
Unfortunately OpsCenter's automated provisioning doesn't currently support creating multi-dc clusters or adding data centers to existing clusters. We know this is important functionality that's missing, and are working on making that available as soon as we can.

Resources