Migrate DataStax Enterprise Cassandra to Apache Cassandra

We are currently using DSE 4.8 and 5.12, and we want to migrate to Apache Cassandra. Since we don't use Spark or Search, we thought we could save some money by moving to Apache. Can this be achieved without downtime? I see that sstableloader works the other way around. Can anyone share the steps to follow to migrate from DSE to Apache Cassandra? Something like this article, but from DSE to Apache:
https://support.datastax.com/hc/en-us/articles/204226209-Clarification-for-the-use-of-SSTABLELOADER

Figure out which version of Apache Cassandra is being run by DSE. Based on the DSE documentation, DSE 4.8.14 uses Apache Cassandra 2.1 and DSE 5.1 uses Apache Cassandra 3.11.
The simplest way to do this is to build another DC (a logical DC, as far as Cassandra is concerned) with Apache Cassandra nodes and add it to the existing cluster.
As usual, run nodetool rebuild -- <old-DC-name> on each of the new DC's nodes and let Cassandra take care of streaming the data to the new Apache Cassandra nodes naturally.
Once data streaming is complete, switch the applications' local_dc to DC2 (the new DC) in whatever LoadBalancingPolicy they use. Once the new DC starts taking traffic, shut down the nodes in the old DC (say DC1) one by one. A sketch of the rebuild commands follows below.
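A minimal sketch of that rebuild flow, assuming the old DC is named DC1, the new Apache Cassandra DC is named DC2, and a placeholder keyspace my_keyspace (adjust names and replication factors to your cluster):

    # Run once, from any node: add replicas in the new DC to each application keyspace
    cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"

    # Run on every node in DC2: stream the existing data over from DC1
    nodetool rebuild -- DC1

    # Watch the streaming progress on the DC2 nodes
    nodetool netstats | grep -i receiving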

Alter the dse_system and dse_security keyspaces so they no longer use EverywhereStrategy.
On non-seed nodes, clean up the Cassandra data directory.
Turn on the replace option in cassandra-env.sh.
Start the instance.
Monitor the streaming process with 'nodetool netstats | grep Receiving'.
Change the seed node definitions and do a rolling restart before finally migrating the previous seed nodes. A command-level sketch of these steps follows below.
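A rough sketch of those steps in command form, assuming the data center is named DC1, a replication factor of 3, and default package paths (all placeholders; the dse_system and dse_security keyspace names come from the steps above):

    # 1. Stop the DSE-specific keyspaces from using EverywhereStrategy
    cqlsh -e "ALTER KEYSPACE dse_system WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};"
    cqlsh -e "ALTER KEYSPACE dse_security WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};"

    # 2. On the non-seed node being migrated: stop DSE and clear the data directory
    sudo service dse stop
    sudo rm -rf /var/lib/cassandra/data/*    # path depends on your installation

    # 3. In cassandra-env.sh on the replacement Apache Cassandra node, set the
    #    replace option to the IP of the node being replaced (its own IP when
    #    replacing in place)
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=<node-ip>"

    # 4. Start the instance and monitor streaming
    sudo service cassandra start
    nodetool netstats | grep Receiving

    # 5. Update the seeds list in cassandra.yaml and perform a rolling restart
    #    before finally migrating the previous seed nodes.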

Related

Is it possible to backup a 6-node DataStax Enterprise cluster and restore it to a new 4-node cluster?

I have this case: we have a 6-node DSE cluster and the task is to back it up and restore all the keyspaces, tables, and data into a new cluster. But this new cluster has only 4 nodes.
Is it possible to do this?
Yes, it is definitely possible to do this. This operation is more commonly referred to as "cloning" -- you are copying the data from one DataStax Enterprise (DSE) cluster to another.
There is a Cassandra utility called sstableloader which reads the SSTables and loads them into a cluster even when the destination cluster's topology is not identical to the source.
I have previously documented the procedure in How to migrate data in tables to a new Cassandra cluster which is also applicable to DSE clusters. Cheers!
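A minimal sketch of that cloning flow with sstableloader, assuming a placeholder keyspace ks1 with table tbl1 and a destination contact point 10.0.0.1:

    # On each source node: flush the table and take a snapshot
    nodetool flush ks1 tbl1
    nodetool snapshot -t clone_snap ks1

    # Copy the snapshot's SSTable files into a directory laid out as <keyspace>/<table>/
    # on a machine that can reach the new cluster, then stream them in:
    sstableloader -d 10.0.0.1 /path/to/ks1/tbl1/

sstableloader redistributes the data according to the destination cluster's own topology, which is why the node counts don't have to match.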

Multi DC replication between different Cassandra versions

We have an existing Cassandra cluster (3.0.9) running in production.
Now we want to create data pipelines to ingest data from Cassandra and persist it in Hadoop. We are thinking of using the CDC feature (available from Cassandra 3.8) along with Kafka Connect.
We are thinking of creating a new read-only DC which will replicate data from the production DC. This new DC will run the latest Cassandra version (3.8+) with CDC enabled.
My questions:
For replication to work, do both DCs need to run the same version of Cassandra? Can't we achieve this without upgrading the DC used by the service?
Is it possible to enable the CDC feature only in the new read-only DC?
UPDATE:
More information from the C* mailing list: https://lists.apache.org/thread.html/r9e705895c480f264998c29cf69c0eb2296382049467e31c447f676c7%40%3Cuser.cassandra.apache.org%3E
I think the new DC should run the same version as the existing DC when replicating data by adding a DC. You may refer to the recommended document below for adding a new datacenter to an existing cluster.
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsAddDCToCluster.html
You should upgrade the existing DC from the lower to the higher version of Cassandra to get the expected feature.
You can keep the new DC read-only by not sending any direct traffic to it; all application connections should stay on the older DC. A sketch of what enabling CDC would look like on the new DC is shown below.
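A minimal sketch of enabling CDC once the new DC is on 3.8+, using a hypothetical table ks1.events (CDC is a per-node cassandra.yaml setting plus a per-table schema flag):

    # In cassandra.yaml on each node of the new DC (requires a node restart):
    #   cdc_enabled: true
    #   cdc_raw_directory: /var/lib/cassandra/cdc_raw    # default location

    # Mark the table for CDC; the table property is part of the schema, but only
    # nodes with cdc_enabled: true actually retain CDC commit log segments
    cqlsh -e "ALTER TABLE ks1.events WITH cdc = true;"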

DSE Analytics in Cluster Configuration

Previously we had a three-node cluster: two Cassandra nodes in one DC and one Spark-enabled node in a different DC.
Spark was running smoothly in that configuration.
Then we tried adding another Spark-enabled node to the Analytics DC. We had configured GossipingPropertyFileSnitch as well as added seeds.
But now when we start the cluster, a Spark master is assigned to each of the two nodes separately, so a Spark job still runs on a single node. What configuration are we missing to run Spark jobs across the cluster?
Most probably you didn't adjust the replication of the Analytics keyspaces, or didn't run a repair after you added the node. Please refer to the instructions in the official documentation.
Also, please check that you configured the same DC for both Analytics nodes, because the Spark master is elected per DC. A sketch of the keyspace adjustment follows below.
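For illustration, a hedged sketch of the kind of adjustment meant here, assuming the Analytics DC is named Analytics and that your DSE version keeps its Spark master leases in the dse_leases keyspace (check the documentation for your DSE version for the exact keyspace names and recommended replication factors):

    # Give the lease keyspace replicas in the Analytics DC
    cqlsh -e "ALTER KEYSPACE dse_leases WITH replication = {'class': 'NetworkTopologyStrategy', 'Analytics': 2};"

    # Repair the altered keyspace so both Analytics nodes hold the lease data
    nodetool repair dse_leases

    # Verify that both Spark nodes report the same data center
    nodetool status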

Migrate from Open Source Cassandra to Datastax Enterprise

We have our current production cluster running on Cassandra 2.2.4
[cqlsh 5.0.1 | Cassandra 2.2.4 | CQL spec 3.3.1 | Native protocol v4]
We want to migrate this setup to a new cluster with DSE 5.0 without disturbing our current production.
What are the steps to do this, with zero/minimal downtime?
We want to have this as a separate cluster.
Can we use sstableloader from the source to the destination cluster and then run sstableupgrade at the destination?
Should we stop compaction on the existing cluster while running sstableloader?
How do we transfer the SSTables that are newly created by production traffic?
Should we make the application write to both clusters, but read only from the old cluster, until the new cluster is in sync with the old one?
Should we run sstableloader from the old data directory or from the snapshot directory? What is the difference between the two approaches?
Set up your new cluster using DSE 5.0.
Begin writing to both of your clusters.
SCP your SSTables over from the old cluster to the new cluster.
Use sstableloader, from your new cluster, on the SSTables you just copied over (see the sketch after this answer).
Should we run sstableloader from the old data directory or from the snapshot directory? What is the difference between the two approaches?
It depends: a snapshot is exactly that, a snapshot of the specific state your cluster was in at some point in time. If you want the freshest data, use the SSTables currently in your data directory.
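A minimal sketch of steps 3 and 4 for a single table, assuming a placeholder keyspace ks1 with table tbl1, default data paths, and a new-cluster contact point 10.0.1.10:

    # On an old-cluster node: flush so the newest writes are in SSTables on disk
    nodetool flush ks1 tbl1

    # Copy the table's SSTables; keep the <keyspace>/<table>/ directory layout
    ssh newhost mkdir -p /tmp/load/ks1/tbl1
    rsync -av --exclude 'snapshots' --exclude 'backups' /var/lib/cassandra/data/ks1/tbl1-*/ newhost:/tmp/load/ks1/tbl1/

    # On the new cluster: stream the copied SSTables in, then rewrite them in the
    # newer format if the versions differ
    sstableloader -d 10.0.1.10 /tmp/load/ks1/tbl1/
    nodetool upgradesstables ks1 tbl1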

Drop keyspace is not working

I'm trying to use DataStax Enterprise to deploy a database system.
My cluster:
2 DSE Cassandra nodes
2 DSE Solr nodes
I created a keyspace via cqlsh on one node, but I could not drop that keyspace from the other nodes in the cluster, only from the node where it was created. Does anybody know why?
This sounds like https://issues.apache.org/jira/browse/CASSANDRA-5202
I have had to delete the data directory for the troubled keyspace on all nodes and restart DSE to fix it.
Do you have the output from cqlsh on the node that won't allow you to drop the keyspace?
What version of DSE are you using?
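Since CASSANDRA-5202 is about schema disagreement, a quick hedged check worth running before deleting anything, using standard nodetool commands on the nodes that refuse the DROP:

    # All nodes should report a single schema version
    nodetool describecluster

    # If one node is stuck on a stale schema version, try resetting its local
    # schema so it re-fetches the schema from the rest of the cluster
    nodetool resetlocalschema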
