Cassandra Cluster Migration Issues - cassandra

Now:
Single Node
cassandra 3.11.3 + kairosdb 1.2
Two data storage path
/data/cassandra/data/kairosdb 4T Old data
/data1/cassandra/data/kaiosdb 1.1T Now Wrting data
target:
Three Node
cassandra 3.11.3 + kairosdb 1.2
One data storage path
/data/cassandra/data/kairosdb
In this case, how to migrate the data in two data directories under a single node to a three-node cluster, each node of this three-node cluster has only one data directory
I understand how to do it (and have practiced it) when migrating a single node to a three-node cluster, but only when there is only one data directory.2 data directories are migrated to 1,I have searched the Internet for a long time, but there is no reference material.

Data directories are something that the individual Cassandra node cares about but the Cluster doesn't.
Usually you'd want to have all nodes share the same configuration but for replication it really doesn't matter where the SSTables are on Disk on each node.
So migrating here would be the same as you've practiced.
That said the process I'd choose would be to add the new nodes as a second DC with the right replication, run a repair to have all the data in sync and then decommission the original node.

Related

Is it possible to backup a 6-node DataStax Enterprise cluster and restore it to a new 4-node cluster?

I have this case. We have 6 nodes DSE cluster and the task is to back it up, and restore all the keyspaces, tables and data into a new cluster. But this new cluster has only 4 nodes.
Is it possible to do this?
Yes, it is definitely possible to do this. This operation is more commonly referred to as "cloning" -- you are copying the data from one DataStax Enterprise (DSE) cluster to another.
There is a Cassandra utility called sstableloader which reads the SSTables and loads it to a cluster even when the destination cluster's topology is not identical to the source.
I have previously documented the procedure in How to migrate data in tables to a new Cassandra cluster which is also applicable to DSE clusters. Cheers!

Cassandra cluster bulk loader hangs during export

I am migrating a simple 4 node Cassandra cluster from one cloud provider to another. The number of nodes in both the clouds are same however the newer cluster is at version 3.11.0 and the older one is at 3.0.11. I am using sstableloader to stream data from one cluster to another (schema has been created on new cluster separately). As per the release notes this should not be a problem.
However, for certain column families with sstableloader I get progress to 100% but then it hangs there for hours (time hang >> time to stream). The total data to stream on each node is below 500 GB. Any help on why this is happening and how to avoid is appreciated.
Create a new node and add to the existing cluster from the new cloud server
Flush tables from the memtable to SSTables on disk
Delete one node from old cloud server. Like wise repeat for each node.

Copying Cassandra data from one cluster to another

I have a cluster setup in Cassandra on AWS. Now, I need to move my cluster to some other place. Since taking image is not possible, I will create a new cluster exactly a replica of the old one. Now I need to move the data from this cluster to another. How can I do so?
My cluster has 2 data centers and each data center has 3 Cassandra machines with 1 seed machine.
Do you have connectivity between the old and new cluster? If yes, why not linking the cluster and let cassandra replicate the data to the new cluster? After data is transferred, shut down the old cluster. Ideally you wouldn`t even have any downtime.
You can take your SSTables from your data directory and then can use sstableloader in new data center to import the data.
Before doing this activity you might consider doing compaction so that you have only one SSTable per table.
SSTable Loader
Using SFTP server or through some other way, transfer the SSTables from old cluster to new cluster (one DC is enough) and use SSTableLoader. The data replication to another DC will be taken care by Cassandra.
In cassandra there are two type of strategy SimpleStrategy and NetworkTopologyStrategy by using NetworkTopologyStrategy you can replicate in different cluster. see this documentation Data replication
You can use COPY command to export and import csv from one table to another
Simple data importing and exporting with Cassandra

Cassandra ring configuration

I am trying to connect Apache Cassandra nodes into a ring. They are not Datastax versions, but Cassandra 1.2.8 from the Apache website. When trying to add one as the seed of the other I get following exception:
Unable to find compaction strategy class 'com.datastax.bdp.hadoop.cfs.compaction.CFSCompactionStrategy'
Before that I change the "listen_address" and "rpc_address" to local IP address of each node. Next step I add one IP as a seed to another node. The nodes start up, an exception is printed but both nodes run fine until restart. After restarting either node the exception is printed and nodes do not run.
This is very strange - I do not have any DSE components.
Did you previously use any DSE components? If you did and are using the same data directory on any of your nodes, it may find old column families that were created with this compaction strategy. If you have no data you want in the data directories on all your nodes, you should clear them by stopping all nodes, deleting the directories, then starting the nodes.
Or if you have any DSE nodes still up, they may be joining the new cluster and propagating their schema, so creating column families with this compaction strategy. You can find out by looking in the logs and seeing which nodes try to connect. If any aren't from your 1.2.8 ring then this is probably the cause.
That error means you either had a DSE Analytics node in your ring at some point, or you restored your schema from someplace that had an Analytics node.
I would check if you have the folder /etc/dse/ on your VM, that would mean DSE was installed there.
To just wipe the node and start from scratch schema wise, you can stop the node, remove the /system/schema_* folders, then start the node. When it starts it will have no schema. Re-create any keyspaces/column families you had before, and they will get read from disk.

Priam backup automatic restore

I have a Cassandra cluster managed by Priam, with 3 nodes. I use ephemeral disks to store my Cassandra data, so when I start 1 node, the Cassandra data dir is empty.
I have Priam properly configured and I can see backups are saved in Amazon S3. Suppose a node goes down and then I start another node. Will Priam know how to automatic restore backup from S3 when the node comes up again? The Cassandra data dir will start empty, so I am assuming Priam would give the new node the same token as the old one and it would restore the data... Right?
Yes. I have been running standalone Cassandra on EC2, small Cassandra clusters on mesos on EC2, and larger DataStax Enterprise clusters (with Cassandra) on EC2.
I have been using the Priam 3.x branch.
On restore, it calculates the initial_token, updates the cassandra.yaml file, restores the snapshot and incremental backup files, and restarts Cassandra.
According to Priam/Netflix conventions, if you have a 3 node cluster with Cassandra, your nodes should be named some_thing-other-things. Each node should be a part of an Auto-scaling group called some_thing. Each node should also use a Security Group named some_thing.
Create a 3 node dev cluster and test your backups and restores with data that you can easily recreate, that you don't care about too much. Get used to managing the Auto-scaling groups and Priam. Then, try it on test clusters with data that you care about.

Resources