How to migrate Cassandra data from a Windows server onto a Linux server?

I am not sure if my Windows-based Cassandra installation will work as-is on Linux-based Cassandra nodes.
My data resides in Cassandra on Windows, and I plan to shift it onto a Linux server in order to use Elassandra.
Can the same data files be copied from the Windows OS to the Linux OS, into the same Cassandra directories?
As the two use different file systems, I have some doubts that this will ever work.
If not, what is the workaround to migrate all the data?

The issue with the files has more to do with the version of Cassandra, rather than the OS. Cassandra's implementation in Java makes the underlying OS somewhat (albeit not completely) irrelevant.
Each version of Cassandra has a specific format for writing its SSTable files. As long as the version of Cassandra is the same on each server, copying the files should work.
Otherwise, if the Windows and Linux servers can see each other on the network, the easiest way to migrate would be to join the Linux server to the "cluster" on Windows. Just give the Linux server the IP of the Windows machine as its seed, set the cluster_name to be the same, and it should join. Then adjust the keyspace replication and run a repair.
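A minimal sketch of that approach (the cluster name, IPs, keyspace name, and replication factor here are examples, not values from the question):

# On the Linux node, in conf/cassandra.yaml (values are examples):
#   cluster_name: 'Test Cluster'    <- must match the Windows node
#   seeds: "192.168.1.10"           <- Windows machine's IP, under seed_provider
# Start Cassandra, then confirm the node has joined:
nodetool status
# Make the keyspace replicate to both nodes, then repair the new one:
cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};"
nodetool repair my_ks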

You should not repair, but rather stream data from your existing DC with nodetool rebuild -- [source_dc_name]:
1. Start all nodes in your new DC with auto_bootstrap: false in conf/cassandra.yaml.
2. Run nodetool rebuild on these nodes.
3. Remove auto_bootstrap: false from conf/cassandra.yaml.
If you start Elassandra in your new DC, the Elasticsearch indices will be rebuilt while streaming or repairing, so have fun with it!
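A sketch of that sequence, run against each node in the new DC (the source DC name DC_OLD is hypothetical, and the keyspace must already replicate to the new DC for rebuild to stream anything):

# Step 1: in conf/cassandra.yaml on every new node, before first start:
#   auto_bootstrap: false
# Step 2: once all new nodes are up, stream the data from the old DC:
nodetool rebuild -- DC_OLD
# Step 3: remove (or set back to true) auto_bootstrap in conf/cassandra.yaml.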

Related

cassandra 3.11.x mixing versions

We have a 6-node Cassandra 3.11.3 cluster on Ubuntu 16.04. These are virtual machines.
We are switching to physical machines, on brand (8!) new servers that will have Debian 11 and presumably Cassandra 3.11.12.
Since the main version is always 3.11.x and Ubuntu 16.04 is out of support, the question is: can we just let the new machines join the old cluster and then decommission the outdated ones?
I hope to get some tips about this, because intuitively it seems fine, but we are not too sure about that.
Thank you.
We have a 6-node Cassandra 3.11.3 cluster on Ubuntu 16.04. These are virtual machines. We are switching to physical machines on brand (8!) new servers...
Quick tip here: it's a good idea to build your clusters in multiples of your RF. Not sure what your RF is, but if RF=3, I'd either stay at six nodes or get one more and go to nine. It's all about even data distribution.
can we just let the new machines join the old cluster and then decommission the outdated ones?
In short, no. You'll want to upgrade the existing nodes to 3.11.12 first. I can't recall whether 3.11.3 and 3.11.12 are SSTable-compatible, but I wouldn't risk it.
Secondly, the best way to do this is to build your new (physical) nodes into the cluster as their own logical data center. Start them up empty, then run a nodetool rebuild on each. Once that's complete, decommission the old nodes.
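Roughly, with hypothetical DC and keyspace names (DC_OLD, DC_NEW, my_ks) and RF=3:

# Replicate the keyspace to the new physical DC:
cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC_OLD': 3, 'DC_NEW': 3};"
# On each new node, stream the data over from the old DC:
nodetool rebuild -- DC_OLD
# Once rebuilds finish and clients point at DC_NEW, retire the old
# nodes one at a time (run on each old node):
nodetool decommission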
There is a somewhat simpler solution: move the data from each virtual machine onto a physical server, as follows:
1. Prepare the Cassandra installation on the physical machine: configure the same cluster name, etc.
2. Stop Cassandra in the virtual machine and make sure that it won't restart.
3. Copy all Cassandra data (/var/lib/cassandra or similar) from the VM to the physical server.
4. Start the Cassandra process on the physical server.
Repeat that process for all VM nodes, updating the seeds, etc. at some point. After the process is finished, you can add the two physical servers that are left over. Also, to speed up the process, you can do an initial copy of the data before stopping Cassandra in the VM and then, after it's stopped, re-sync the data with rsync or similar; this way you minimize the downtime (see the sketch below).
This approach is much faster than adding a new node and then decommissioning an old one, as the data does not need to be streamed twice. It works because, once a node is initialized, Cassandra identifies nodes by their assigned UUID (host ID), not by IP address.
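A sketch of the copy-then-resync idea for one node (the host name, service name, and paths are examples):

# Initial bulk copy while Cassandra is still running on the VM:
rsync -a /var/lib/cassandra/ physical-host:/var/lib/cassandra/
# Stop Cassandra on the VM and make sure it stays down:
sudo systemctl stop cassandra && sudo systemctl disable cassandra
# Re-sync only what changed since the first pass (much faster):
rsync -a --delete /var/lib/cassandra/ physical-host:/var/lib/cassandra/
# Start Cassandra on the physical server; it keeps the same host ID:
ssh physical-host sudo systemctl start cassandra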
Another approach is to follow the instructions for replacing a dead node. In that case the data is streamed only once, but it could be a bit slower than a direct copy of the data.

Migrate Datastax Enterprise Cassandra to Apache Cassandra

We are currently using DSE 4.8 and 5.12, and we want to migrate to Apache Cassandra. Since we don't use Spark or Search, we thought we'd save some bucks by moving to Apache. Can this be achieved without downtime? I see sstableloader works the other way around. Can anyone share the steps to follow to migrate from DSE to Apache Cassandra? Something like this, but from DSE to Apache:
https://support.datastax.com/hc/en-us/articles/204226209-Clarification-for-the-use-of-SSTABLELOADER
Figure out which version of Apache Cassandra is being run by DSE. Based on the DSE documentation, DSE 4.8.14 uses Apache Cassandra 2.1 and DSE 5.1 uses Apache Cassandra 3.11.
The simplest way to do this is to build another DC (a logical DC, in Cassandra terms) and add it to the existing cluster.
As usual, run a nodetool rebuild {from-old-DC} on the new DC nodes and let Cassandra take care of streaming the data to the new Apache Cassandra nodes naturally.
Once data streaming is completed, switch the applications' local_dc to DC2 (the new DC), based on the LoadBalancingPolicy they use. Once the new DC starts taking traffic, shut down the nodes in the old DC, say DC1, one by one.
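In outline, with a hypothetical application keyspace my_ks and DC names DC1 (DSE) and DC2 (Apache Cassandra):

# Replicate the application keyspaces into the new DC:
cqlsh -e "ALTER KEYSPACE my_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"
# On each node in DC2, stream the data over:
nodetool rebuild -- DC1
# Then switch the clients' LoadBalancingPolicy (e.g. a DC-aware policy
# with local_dc = DC2), drop DC1 from replication, and shut it down.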
Another route is to replace the DSE nodes in place:
1. Alter the dse_system and dse_security keyspaces so they no longer use the (DSE-only) EverywhereStrategy.
2. On the non-seed nodes, clean out the Cassandra data directory.
3. Turn on the replace option in cassandra-env.sh.
4. Start the instance.
5. Monitor the streaming process with nodetool netstats | grep Receiving.
6. Change the seed node definitions and do a rolling restart before finally migrating the previous seed nodes.
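A sketch of steps 1 and 3-5 on one replacement node (the IP and the replication settings are examples; the ALTER shows one possible way off EverywhereStrategy):

# Step 1, for example:
cqlsh -e "ALTER KEYSPACE dse_system WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};"
# Step 3: in cassandra-env.sh on the replacement node, before starting:
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.5"
# Steps 4-5: start the instance, then watch the data stream in:
nodetool netstats | grep Receiving
# Remove the replace_address option again once the node has joined.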

Running remote cqlsh to execute commands on Cassandra Cluster

So I have a Cassandra cluster of 6 nodes on my Ubuntu machines, and now I have another machine running Windows Server 2008. I have installed the DataStax distribution of Apache Cassandra on this new Windows machine, and I want to be able to run CQL commands from the Windows machine against the Ubuntu machines, i.e. remote command execution.
I tried opening cqlsh in cmd with the IP of one of my nodes and the port, like cqlsh 192.168.4.7 9160.
But I can't seem to make it work. Also, I don't want to add the new machine to my existing cluster. Please suggest.
I get: Provided version 3.1.1 is not supported by this server (supported: 2.0.0, 3.0.5).
Any workaround you could suggest?
Basically, you have two options here. The harder one would be to upgrade your cluster (the tough, long-term solution). There have been many improvements since 1.2.9 that you could take advantage of, not to mention long-since-fixed bugs that you may be running into.
The other, quicker option would be to install 1.2.9 on your Windows machine. Probably the easiest way to do this would be to zip up your Cassandra dir on Ubuntu (minus the data, commitlog, and saved_caches dirs, of course), copy it to your Windows machine, and expand it. Then the cqlsh versions would match up, and you could solve your immediate problem.
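Something like this on the Ubuntu side, assuming the install lives at ~/apache-cassandra (paths and the node IP are examples):

# Zip up the install, leaving out the runtime state:
cd ~ && zip -r cassandra-1.2.9.zip apache-cassandra \
    -x "apache-cassandra/data/*" \
    -x "apache-cassandra/commitlog/*" \
    -x "apache-cassandra/saved_caches/*"
# Copy the zip to the Windows machine, expand it, and connect from
# its bin directory:
#   cqlsh 192.168.4.7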

How to use sstableloader from a node not in Cassandra Cluster ring

We are using apache-cassandra 1.1.9 on a production Cassandra cluster on Linux, and I want to upload some data using sstableloader.
I was able to generate SSTables for a small data set and then tried to upload them into the Cassandra cluster using sstableloader from another machine (which is on the same network but not in the Cassandra cluster ring), but I get the error below:
"Could not retrieve endpoint ranges:"
I do not understand why this error is coming up.
The machine where I am running sstableloader has the same Cassandra installation. I copied cassandra.yaml from the production Cassandra into my host machine's apache-cassandra/conf folder.
My SSTables are in the below directory structure:
/path/to/keyspace dir/Keyspace/*.db
The sstableloader command I am running is below:
./sstableloader -d -i , /home/Data/Keyspace/
Could not retrieve endpoint ranges:
Please advise if I am doing something wrong here.
Found the solution.
The sstableloader command needs to be executed from the directory containing the Keyspace subdirectory.
For example, if /home/Data is the directory under which there are subdirectories keyspace/ColumnFamily/, then execute the command below from the /home/Data/ directory:
~/apache-cassandra/bin/sstableloader -d /keyspace/ColumnFamily
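For example, with a hypothetical keyspace ks1 and column family cf1 (the host IP is also made up), mirroring the layout described above:

# SSTables laid out as <parent>/<keyspace>/<column family>/*.db:
#   /home/Data/ks1/cf1/*.db
cd /home/Data
~/apache-cassandra/bin/sstableloader -d 192.168.4.7 ks1/cf1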
This is a bit old, but I ran into the "Could not retrieve endpoint ranges" error recently with a different root cause.
In our case, data was being exported from a production system and loaded into a new development instance. The development instance had been set up incorrectly, so the SSTables were generated using DSE 4.7 while the sstableloader being run was from DSE 4.6.
Note that it is possible to ingest tables from 4.6 into DSE 4.7 for debugging, etc., but it is necessary to run nodetool upgradesstables first. That isn't what was going on here.

Cassandra store Keyspace to new Disk

I just set up a fresh Windows server with a fresh DataStax installation, including Cassandra 1.2 and OpsCenter 2.1.3. I've tried finding solutions to these questions on the Cassandra wikis and the DataStax website, but I can only find Unix-specific information or DataStax API information.
Cassandra defaulted to using the C: drive (I was never asked to select a drive for Cassandra during install).
1. In the same Cassandra instance, can I have keyspaces on separate disks?
2. If not, how do I migrate the existing keyspace to the new drive? (Just reconfiguring cassandra.yaml to use a new directory would lose my OpsCenter data and may even break OpsCenter.)
3. If yes, how can I create a new keyspace on a separate drive? cassandra.yaml seems to only have configuration options for a single store location.
4. Should I be creating a new cluster to store my data in? If I start adding new nodes to the default cluster, the DataStax OpsCenter data will be getting replicated, which seems like a bad idea.
If there is good documentation on this somewhere, please point me there.
Thanks,
Adam
You cannot get Cassandra to split keyspaces and store them in different directories; they are all stored under a common data directory that is specified in the cassandra.yaml file.
However, you can use NTFS mount points to mount different drives under the data directory on your server, but this will not be simple or easily expandable.
If you want to move where Cassandra stores its data, then stop the Cassandra daemon/service, change the cassandra.yaml file to store the data at the new location, and copy/move the entirety of the data directory to that new location. THEN start Cassandra back up, and it will come up fine with the data in the new location. I have done this quite a few times now, and Cassandra comes back up without incident and with no lost data. (If you do not move the data, it will lose it all and recreate the directory structure under the new location.)
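For example, to relocate everything to a hypothetical D: drive, the relevant cassandra.yaml entries would look like this (stop the service, move the old directories to these paths, then start it again):

# cassandra.yaml (paths are examples):
data_file_directories:
    - D:/cassandra/data
commitlog_directory: D:/cassandra/commitlog
saved_caches_directory: D:/cassandra/saved_caches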
Data getting replicated is not a bad thing; it is what Cassandra was designed for. I don't know what replication factor OpsCenter uses, but it does not store a massive amount of data, so replication is not a problem.
