Cassandra replication not working across data centers - cassandra

I am new to Cassandra and have configured a Cassandra cluster that spans multiple AWS data centers.
I have 3 replicas in eu-central-1 and 3 replicas in eu-west-1.
I created a keyspace from the eu-central-1 seed as follows: CREATE KEYSPACE my_test WITH REPLICATION = {'class':'NetworkTopologyStrategy', 'eu-west-1':'3', 'eu-central-1':'3'}; After that I created several tables under this keyspace.
The keyspace and tables were not replicated to the 3 eu-west-1 replicas. Should the keyspace and its tables be replicated to the eu-west-1 seeds automatically? If yes, what is wrong with my configuration?

Yes, whatever tables belong in the keyspace my_test should have replicated to both DCs.
How are you determining that the tables have not replicated? I'd be happy to update my answer when you update your original question.
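While waiting for those details, one thing worth ruling out is a mismatch between the data center names used in the keyspace definition and the names the cluster itself reports in nodetool status. The sketch below is only an illustration: the sample status text is hypothetical (it is not taken from the asker's cluster), and the helper names are made up for this example.

```python
import re

def datacenters_from_status(status_text):
    """Return the data center names listed in `nodetool status` output."""
    return re.findall(r"^Datacenter:\s*(\S+)", status_text, flags=re.MULTILINE)

def unknown_datacenters(replication, status_text):
    """Return replication-map keys that the cluster does not report as a DC."""
    known = set(datacenters_from_status(status_text))
    return sorted(dc for dc in replication if dc != "class" and dc not in known)

# Hypothetical output, trimmed to the header lines that matter here.
SAMPLE_STATUS = """\
Datacenter: eu-central
======================
Status=Up/Down
Datacenter: eu-west
===================
Status=Up/Down
"""

# Replication map copied from the CREATE KEYSPACE statement in the question.
replication = {"class": "NetworkTopologyStrategy", "eu-west-1": 3, "eu-central-1": 3}

if __name__ == "__main__":
    print(unknown_datacenters(replication, SAMPLE_STATUS))
```

If the list comes back non-empty on a real cluster, the keyspace refers to data centers the cluster does not know by that name, and the replication settings would need to use the names shown by nodetool status.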
Since you're new to Cassandra, I recommend datastax.com/dev which has links to free hands-on tutorials where you can quickly learn the basics of Cassandra.
This tutorial is a good place to start -- datastax.com/try-it-out.
We also have FREE live workshops where you get to learn hands-on in a fun environment with other participants and have a chance to win prizes. Have a look at the list of upcoming workshops here -- datastax.com/workshops. Cheers!

Related

Is it possible to backup a 6-node DataStax Enterprise cluster and restore it to a new 4-node cluster?

We have a 6-node DSE cluster, and the task is to back it up and restore all the keyspaces, tables, and data into a new cluster. However, the new cluster has only 4 nodes.
Is it possible to do this?
Yes, it is definitely possible to do this. This operation is more commonly referred to as "cloning" -- you are copying the data from one DataStax Enterprise (DSE) cluster to another.
There is a Cassandra utility called sstableloader which reads SSTables and loads them into a cluster, even when the destination cluster's topology is not identical to the source.
I have previously documented the procedure in "How to migrate data in tables to a new Cassandra cluster", which is also applicable to DSE clusters. Cheers!

Does Scylla DB have a similar migration support to GKE as K8ssandra's Zero Downtime Migration feature?

We are trying to migrate our ScyllaDB cluster, deployed on GCE machines, to a GKE cluster in Google Cloud. We came across an approach for Cassandra migration and want to apply the same approach to our ScyllaDB migration. The link is below. Can you please suggest whether this is possible in Scylla,
or whether Scylla has not yet introduced such a migration technique with the Scylla K8S operator?
https://k8ssandra.io/blog/tutorials/cassandra-database-migration-to-kubernetes-zero-downtime/
Adding a new "destination" DC alongside the existing "source" DC of your cluster is a very common technique for migrating to a new DC.
1. Add the new "destination" DC.
2. Change the replication factor settings accordingly.
3. Run nodetool rebuild to stream data from the "source" DC to the "destination" DC.
4. Run nodetool repair on the new DC.
5. Update your application clients to connect to the new DC once it is ready to serve (all data streamed and repaired).
6. Decommission the "old" (source) DC.
For the gory details see here:
https://docs.scylladb.com/stable/operating-scylla/procedures/cluster-management/add-dc-to-existing-dc.html
https://docs.scylladb.com/stable/operating-scylla/procedures/cluster-management/decommissioning-data-center.html
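As a rough illustration of the order of operations described above, here is a small Python sketch that prints the commands for one keyspace. It is simplified on purpose: the keyspace and DC names are hypothetical, it ignores system keyspaces that also need their replication changed, and the linked procedures remain the reference for the exact steps.

```python
def add_dc_plan(keyspace, source_dc, dest_dc, rf):
    """Return the ordered commands for moving one keyspace to a new DC."""
    both = (
        "{'class': 'NetworkTopologyStrategy', "
        f"'{source_dc}': {rf}, '{dest_dc}': {rf}}}"
    )
    dest_only = f"{{'class': 'NetworkTopologyStrategy', '{dest_dc}': {rf}}}"
    return [
        # Replicate to both DCs once the destination nodes have joined.
        f"ALTER KEYSPACE {keyspace} WITH replication = {both};",
        # Run on every node in the destination DC to stream existing data.
        f"nodetool rebuild -- {source_dc}",
        # Run on the destination nodes before clients are switched over.
        "nodetool repair",
        # After clients point at the destination DC, stop replicating to the source.
        f"ALTER KEYSPACE {keyspace} WITH replication = {dest_only};",
        # Run on each node of the source DC, one node at a time.
        "nodetool decommission",
    ]

if __name__ == "__main__":
    # "my_ks", "gce_dc" and "gke_dc" are placeholder names for this example.
    for step, command in enumerate(add_dc_plan("my_ks", "gce_dc", "gke_dc", 3), 1):
        print(step, command)
```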
If you prefer the full-scan route (CQL reads on the source and CQL writes on the destination, with some ability for data manipulation and savepoints to resume from), then the Scylla Spark Migrator is a good option.
https://github.com/scylladb/scylla-code-samples/tree/master/spark-scylla-migrator-demo
You can also use the Scylla Spark Migrator to migrate Parquet files:
https://www.scylladb.com/2020/06/10/migrate-parquet-files-with-the-scylla-migrator/
Remember not to migrate materialized views (MVs); you can always re-create them from the base tables after the migration.
We use an Apache Spark-based Migrator: https://github.com/scylladb/scylla-migrator
Here's the blog we wrote on how to do this back in 2019: https://www.scylladb.com/2019/02/07/moving-from-cassandra-to-scylla-via-apache-spark-scylla-migrator/
Though in this case, you aren't moving from Cassandra to ScyllaDB; you are just moving from one ScyllaDB instance to another. If this makes sense to you, it should be straightforward. If you have questions, feel free to join our Slack community to get more interactive assistance:
http://slack.scylladb.com/

How to read and write on a Cassandra node that is located on a different machine in the datacenter?

I am a Cassandra newbie. I read that Cassandra distributes data across the network (cluster, datacenter, etc.), so I'd like to clearly understand something:
Let's say I have 3 physical computers (host1, host2, and host3) with Ubuntu 16.04 and Cassandra installed on each one.
These computers are on my LAN, and they can ping one another.
Now I create a keyspace on host1:
CREATE KEYSPACE mykeyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'MyLAN': 3};
Can I interact with mykeyspace from the other hosts (host2 and host3) using the cqlsh client?
When I add another host4 to the LAN, can I still view mykeyspace after altering it to use 4 replicas?
Any clear explanation or ideas?
Yes, you can. There's nothing special about each individual C* node -- you can connect to any node.
Yes, you can. But you don't generally change the keyspace replication when you add nodes. Our general recommendation is 3 replicas per data center.
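To illustrate why the replication factor usually stays at 3 when a fourth node joins, here is a small sketch. It assumes evenly distributed tokens, and the function names are made up for this example; it is not output from any Cassandra tool.

```python
def quorum(rf):
    """Number of replicas that must respond for a QUORUM read or write."""
    return rf // 2 + 1

def avg_ownership(rf, nodes):
    """Average fraction of the data each node holds, assuming even tokens."""
    return min(rf, nodes) / nodes

if __name__ == "__main__":
    # Same replication factor, before and after adding host4.
    for nodes in (3, 4):
        print(nodes, "nodes:", "quorum =", quorum(3),
              "ownership =", avg_ownership(3, nodes))
```

With 3 replicas, adding host4 leaves the quorum size unchanged and simply spreads the data so that each node holds a smaller share.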
If you're interested, I've explained and included diagrams to illustrate it in great detail in this post -- https://community.datastax.com/questions/5486/. Cheers!

Adding new keyspace in existing production cassandra cluster

I have an existing Cassandra cluster running in AWS. It has a total of 6 nodes in the same data center but in multiple regions. We are using Cassandra version 2.2.8 in production. There are two existing keyspaces already present in the production environment. I want to add a new keyspace to the production cluster.
I am new to Cassandra, so I am looking for answers to the following:
Can I add a new keyspace to the existing production cluster without taking the cluster down?
Are there any best practices you would recommend for adding the new keyspace to the existing cluster?
What are the possible steps to add a new keyspace?
I really appreciate your help!
Yes, you can add keyspaces online.
When you add a keyspace, you have to choose the replication factor. Since you are running across multiple AWS regions, you are probably using Ec2MultiRegionSnitch as the endpoint_snitch, right?
If so, you probably configured dc_suffix=_XYZ, so your DCs are named like "us-east_XYZ" (check with nodetool status).
Then, you can use something like this:
CREATE KEYSPACE my_keyspace
WITH REPLICATION = {
'class' : 'NetworkTopologyStrategy', 'us-east_XYZ' : 2, 'us-west_XYZ' : 2 }
AND DURABLE_WRITES = true;
See docs: CREATE KEYSPACE
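If you would rather generate the statement from the DC names you actually see in nodetool status, a minimal Python sketch could look like the following. The function name is made up for this example, and the DC names are the same placeholders used above, so substitute your own.

```python
def create_keyspace_cql(name, dc_replicas, durable_writes=True):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    if not dc_replicas:
        raise ValueError("at least one data center is required")
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += [f"'{dc}': {rf}" for dc, rf in dc_replicas.items()]
    body = ", ".join(options)
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH REPLICATION = {{{body}}} "
        f"AND DURABLE_WRITES = {str(durable_writes).lower()};"
    )

if __name__ == "__main__":
    # Placeholder DC names; use the ones reported by nodetool status.
    print(create_keyspace_cql("my_keyspace", {"us-east_XYZ": 2, "us-west_XYZ": 2}))
```

The resulting statement can be pasted into cqlsh on any node while the cluster stays online.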

Copying Cassandra data from one cluster to another

I have a Cassandra cluster set up on AWS. Now, I need to move my cluster to some other place. Since taking an image is not possible, I will create a new cluster that is an exact replica of the old one. Now I need to move the data from the old cluster to the new one. How can I do so?
My cluster has 2 data centers and each data center has 3 Cassandra machines with 1 seed machine.
Do you have connectivity between the old and new cluster? If yes, why not link the clusters and let Cassandra replicate the data to the new cluster? After the data is transferred, shut down the old cluster. Ideally you wouldn't even have any downtime.
You can take the SSTables from your data directory and then use sstableloader in the new data center to import the data.
Before doing this, you might consider running a compaction so that you have only one SSTable per table.
SSTable Loader
Using an SFTP server or some other method, transfer the SSTables from the old cluster to the new cluster (one DC is enough) and run sstableloader. Cassandra will take care of replicating the data to the other DC.
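As a sketch of what the load step looks like once the files have been copied over, the helper below assembles an sstableloader command line. The host addresses and directory are placeholders for this example; the one real constraint it checks is that sstableloader expects the directory path to end in keyspace/table.

```python
import shlex

def sstableloader_command(target_hosts, table_dir):
    """Build the argv list for loading one table directory into a cluster."""
    parts = [p for p in table_dir.split("/") if p]
    if len(parts) < 2:
        raise ValueError("table_dir must end in <keyspace>/<table>")
    # -d takes a comma-separated list of initial contact nodes in the target cluster.
    return ["sstableloader", "-d", ",".join(target_hosts), table_dir]

if __name__ == "__main__":
    # Placeholder hosts and path; run once per table directory that was copied.
    cmd = sstableloader_command(["10.0.0.1", "10.0.0.2"], "/tmp/load/my_ks/my_table")
    print(shlex.join(cmd))
```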
Cassandra has two replication strategies, SimpleStrategy and NetworkTopologyStrategy. With NetworkTopologyStrategy you can replicate data across different data centers. See the documentation: Data replication
You can use the cqlsh COPY command to export a table to CSV and import that CSV into another table:
Simple data importing and exporting with Cassandra
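For reference, the pair of cqlsh statements involved looks roughly like what this sketch prints. The table names and CSV path are placeholders, and this route is generally better suited to smaller tables than to a full cluster move.

```python
def copy_statements(source_table, dest_table, csv_path):
    """Return the cqlsh COPY statements for a CSV export and import."""
    export = f"COPY {source_table} TO '{csv_path}' WITH HEADER = TRUE;"
    load = f"COPY {dest_table} FROM '{csv_path}' WITH HEADER = TRUE;"
    return export, load

if __name__ == "__main__":
    # Run the first statement in cqlsh on the old cluster, the second on the new one.
    for statement in copy_statements("my_ks.users", "my_ks.users", "/tmp/users.csv"):
        print(statement)
```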
