Cassandra upgrade from 2.0.x to 2.1.x or 3.0.x - cassandra

I've searched for previous versions of this question, but none seem to fit my case. I have an existing Cassandra cluster running 2.0.x. I've been allocated new VMs, so I do NOT want to upgrade my existing Cassandra nodes - rather I want to migrate to a) new VMs and b) a more current version of Cassandra.
1. I know for in-place upgrades, I would upgrade to the latest 2.0.x, then to the latest 2.1.x. AFAIK, there's no SSTable inconsistency here. If I go this route via addition of new nodes, I assume I would follow the DataStax instructions for adding new nodes/decommissioning old nodes?
2. Given the above, is it possible to move from 2.0.x to 3.0.x? I know the SSTable format is different; however, if I'm adding new nodes (rather than re-using SSTables on disk), does this matter?
It seems to me that #2 has to work - otherwise, it implies that any upgrade requiring SSTable upgrades would require all nodes to be taken offline simultaneously; otherwise, there would be mixed 2.x.x and 3.0.x versions running in the same cluster at some point.
Am I completely wrong? Does anyone have any experience doing this?

Yes, it is possible to migrate data to a different environment (the new VMs with the updated Cassandra) using sstableloader, but you will need C* 3.0.5 or later, as that version added support for loading SSTables from previous versions.
Once the process is complete, it is recommended to run nodetool upgradesstables to ensure there are no incompatibilities in the data, followed by nodetool cleanup.
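As a rough sketch of the sequence above, the function below only prints the commands for one table rather than running them. The contact point, staging directory, keyspace, and table names are hypothetical placeholders; sstableloader expects the last two components of the path to be the keyspace and table names, and the nodetool commands would need to be repeated on every node of the new cluster.

```shell
#!/bin/sh
# Sketch only: prints commands, runs nothing.
# All hosts, paths, keyspace and table names below are hypothetical.

# $1 = contact point in the new cluster, $2 = staging dir,
# $3 = keyspace, $4 = table
print_load_plan() (
    node=$1 staging=$2 ks=$3 tbl=$4
    # sstableloader reads keyspace/table from the last two path components
    echo "sstableloader -d $node $staging/$ks/$tbl"
    # rewrite any SSTables that are not yet in the node's current format
    echo "nodetool -h $node upgradesstables $ks $tbl"
    # remove data the node does not own
    echo "nodetool -h $node cleanup $ks"
)

print_load_plan 10.0.0.11 /var/tmp/staging my_keyspace my_table
```

Printing first makes it easy to review the plan before pasting the commands into a terminal on the new cluster.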
Your comment that "it implies that any upgrade requiring SSTable upgrades would require all nodes to be taken offline simultaneously" is not correct. Upgrading one node at a time creates a mixed cluster with nodes on both versions, as you mentioned. That is not optimal, but it lets you avoid downtime in production. (Note that the impact of this operation will depend on the consistency level used in your application.)

Don't worry about the migration. You can simply migrate your Cassandra 2.0.x cluster to Cassandra 3.0.x, but it is better to migrate from Cassandra 2.0.x to the latest Cassandra 2.x.x first and then to Cassandra 3.0.x. You need to follow these steps:
Backup data
Uninstall present version
Install the version you want to upgrade
Restore data
As this is a migration, take care of your data at every step. For the backup and restore, there are two options:
Create snapshots of your SSTables; after installing the new version of Cassandra, place the files in the data location and run sstableloader.
Back up your schema to a .cql file and copy every table to a .csv file; after installing the new version of Cassandra, source your schema from the .cql file and copy each table back from its .csv file.
Once you are confident about how you will complete the migration, you can write a bash script to automate the backup and restore steps.
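A minimal sketch of the second option (schema to .cql, tables to .csv) might look like the following. It only prints the cqlsh commands; the host, keyspace, table names, and backup directory are hypothetical, and it assumes a cqlsh that supports the -e and -f options (with an older cqlsh the statements can be piped to it instead).

```shell
#!/bin/sh
# Sketch only: prints cqlsh commands, runs nothing.
# Host, keyspace, tables and backup directory are hypothetical.

# $1 = host, $2 = keyspace, $3 = backup dir, remaining args = tables
print_export_plan() (
    host=$1 ks=$2 dir=$3
    shift 3
    # dump the keyspace schema as CQL
    echo "cqlsh $host -e \"DESCRIBE KEYSPACE $ks\" > $dir/$ks-schema.cql"
    for tbl in "$@"; do
        # one CSV file per table
        echo "cqlsh $host -e \"COPY $ks.$tbl TO '$dir/$ks.$tbl.csv' WITH HEADER = true\""
    done
)

# Same arguments, to be used against the new cluster
print_import_plan() (
    host=$1 ks=$2 dir=$3
    shift 3
    # recreate the schema first, then load each CSV
    echo "cqlsh $host -f $dir/$ks-schema.cql"
    for tbl in "$@"; do
        echo "cqlsh $host -e \"COPY $ks.$tbl FROM '$dir/$ks.$tbl.csv' WITH HEADER = true\""
    done
)

print_export_plan old-node my_keyspace /var/tmp/backup users events
print_import_plan new-node my_keyspace /var/tmp/backup users events
```

COPY goes through the normal read and write path, so this route suits small tables better than large ones.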

Related

Migrate Data from one Riak cluster to another

I have a situation where we need to migrate data from one Riak cluster to another and then remove the old cluster. The ring size will be same, even the region will be the same. We need to do this to upgrade the instances to AL2. Is there a clean approach to do so on Prod, without realtime data loss?
The answer to this may be tied to your version of Riak KV. If you have the open source version of Riak KV 2.2.3 or earlier, this will require an in-situ upgrade to Riak KV 2.2.6 before progressing. See https://www.tiot.jp/riak-docs/riak/kv/2.2.6/setup/upgrading/version/ with packages at https://files.tiot.jp/riak/kv/2.2/2.2.6/
For the Enterprise Edition of Riak KV 2.2.3 and earlier, or the open source edition of Riak KV 2.2.6 or higher, you can use multi-data centre replication (MDC).
Use both of these at the same time for proper replication and to prevent data loss:
fullsync replication will copy across all stored data on its first run and then any missing data on subsequent runs.
realtime replication will replicate all transactions in almost realtime.
If you then set this up as bidirectional replication (each cluster replicating to the other for both fullsync and realtime), you will be able to seamlessly switch your production environment from one cluster to the other without any issues. Once you are happy everything is working as expected, you can shut down the old cluster.
Please see the documentation for replication at https://www.tiot.jp/riak-docs/riak/kv/2.2.6/using/cluster-operations/v3-multi-datacenter/
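To make the bidirectional setup a little more concrete, here is a sketch that prints the riak-repl (v3 replication) commands for one direction and is then called once per direction. The cluster names and the cluster manager addresses are hypothetical, and it assumes each cluster has already been given its name with riak-repl clustername; check the linked documentation for your exact version before running anything.

```shell
#!/bin/sh
# Sketch only: prints riak-repl commands, runs nothing.
# Cluster names and host:port values are hypothetical.

# Commands to run on the source cluster.
# $1 = sink cluster name, $2 = host:port of the sink's cluster manager
print_repl_direction() (
    sink=$1 sink_addr=$2
    echo "riak-repl connect $sink_addr"
    echo "riak-repl realtime enable $sink"
    echo "riak-repl realtime start $sink"
    echo "riak-repl fullsync enable $sink"
    echo "riak-repl fullsync start $sink"
)

echo "# on a node of old_cluster (after: riak-repl clustername old_cluster)"
print_repl_direction new_cluster 10.0.1.10:9080
echo "# on a node of new_cluster (after: riak-repl clustername new_cluster)"
print_repl_direction old_cluster 10.0.0.10:9080
```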

Does cassandra upgrade require to run nodetool upgradesstables for cluster holding TTLed data

I am running a 3-node Apache Cassandra cluster as Docker containers, holding time-series data with a 45-day TTL.
I am planning to upgrade from the current Cassandra version, 2.2.5, to the 3.11.4 release. I have identified the following steps for the upgrade:
Backup existing data
Flush one of the Cassandra nodes
bin/nodetool -h cassandra1 -u ca_itoa -pw ca_itoa drain
Stop the cassandra1 node
Start the new cassandra 3.11.4 container
Upgrade the SSTable
bin/nodetool -u ca_itoa -pw ca_itoa upgradesstables
Check the node status. Repeat the process for the rest of the nodes
I have a few questions about the upgrade process:
Are the steps correct?
Is it mandatory to run the upgradesstables command? It is time-consuming, and I would like to avoid it if I can. The data has a TTL set. Will Cassandra keep writing in the new SSTable format while the old SSTable data gets cleaned up as it expires? My assumption is that, after 45 days, all SSTables would be in the new format.
Just some additional thoughts:
For Step #6, you actually don't have to run upgradesstables right away. In fact, if you're upgrading a production system, it's probably better that you don't until the application team verifies that they can connect ok. Remember, older versions of the driver which work in 2.2 may not work with 3.11.4.
To this end, I would wait until the entire cluster is running on the new version before running upgradesstables on each node.
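The two-phase approach described here could be sketched as below. The function prints a plan instead of executing it; the node names are hypothetical, the nodetool credentials from the question are left out for brevity, and the stop/start step is only a comment because it depends on how the containers are managed.

```shell
#!/bin/sh
# Sketch only: prints the plan, runs nothing. Node names are hypothetical.

# Arguments = node host names, in upgrade order
print_rolling_plan() (
    # Phase 1: move every node to the new binaries, one at a time
    for node in "$@"; do
        echo "nodetool -h $node drain"
        echo "# stop $node, start the new-version container, wait until nodetool status shows it as UN"
    done
    # Phase 2: only once the whole cluster runs the new version
    for node in "$@"; do
        echo "nodetool -h $node upgradesstables"
    done
)

print_rolling_plan cassandra1 cassandra2 cassandra3
```

Keeping the SSTable rewrite in a separate second phase leaves room for the application team to verify connectivity in between.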
Is it mandatory to run upgradesstables command?
As each Cassandra version is capable of reading its own SSTable format as well as the prior major version, I guess it's not mandatory. But it's definitely something that you should want to do. Especially when upgrading to 3.x.
Cassandra 3 contains a significant upgrade to the storage engine, which results in a much smaller disk footprint. One cluster I upgraded saw a 90% reduction in disk needs.
Plus, you'd be incurring additional latency when reading records which may be spread across the old SSTable files as well as the new. Reads for records across multiple files are bad enough as it is. But now you'd be forcing Cassandra to read and collate results from two formats.
So while I wouldn't say it's "mandatory," I'd definitely say it qualifies as a "good idea."
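If you do decide to wait for TTL expiry instead, one way to see how many old-format files remain is to count the data files by their format version. The sketch below assumes the file naming used by 2.2 and 3.x, where the data file starts with the format version (for example la-1-big-Data.db or md-7-big-Data.db), and it skips snapshot and backup directories. The helper name is hypothetical, and the data directory in the usage comment is only a common default that may differ inside your containers.

```shell
#!/bin/sh
# Sketch: count SSTable data files per format version under a data directory.
# Assumes 2.2/3.x style names: <version>-<generation>-big-Data.db

# $1 = Cassandra data directory; prints "<version> <count>" per line
sstable_versions() {
    find "$1" -type f -name '*-big-Data.db' \
        ! -path '*/snapshots/*' ! -path '*/backups/*' \
        | sed 's|.*/||; s|-.*||' \
        | sort | uniq -c | awk '{print $2, $1}'
}

# usage (path is an assumption): sstable_versions /var/lib/cassandra/data
```

When only versions starting with "m" are listed, no 2.2-format files are left on that node.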
Yes, you need to run nodetool upgradesstables on each node after the Cassandra upgrade, since you are going from 2.2.x to 3.11.4: the SSTable file format and file names will also change. You can run this process in the background and it will not cause any issues. Please refer to the link below for more details: https://blog.thethings.io/upgrading-apache-cassandra-cluster/

Can we use sstableloader when the cluster is in partially upgraded state?

We have a cluster with 2 nodes on DSE 4.8 and one on DSE 4.5. Can we use sstableloader to stream snapshot data from DSE 4.5 into the cluster?
Streaming is one of the operations you should avoid until your cluster is fully upgraded. Note during an upgrade you may see a schema mismatch across nodes. The upgrade limitations docs here outline some of the things you should avoid:
https://docs.datastax.com/en/upgrade/doc/upgrade/datastax_enterprise/upgrdDSE47to48.html#upgrdDSE47to48__upglim
I can see that you're upgrading to DSE 4.8 from DSE 4.5. These versions use Cassandra 2.1 and 2.0 respectively. The SSTable format changed between these two versions, so make sure you also run upgradesstables.
It would be a good idea to complete your upgrade and then try to stream the data. You should use the DSE 4.8 / C* 2.1 sstableloader to do the loading. It should be able to stream in the older-format tables. The following JIRA seems to imply that support for this was added:
https://issues.apache.org/jira/browse/CASSANDRA-5772

Regarding upgrade from 2.0.3 to 2.0.7

I am planning an upgrade to Cassandra 2.0.7. My base version is 2.0.3. I have not done an upgrade before, so I want to be absolutely sure about what I am doing. Can someone explain what needs to be done apart from this?
Do a nodetool drain to stop all writes to the particular node.
Stop the Cassandra node (I have an 8-node, 2-data-center network topology; I am bringing down one node in DC1).
Change the cassandra.yaml accordingly in the new binary tarball.
Make the required changes for the new node (I am using the gossiping property file snitch, so I will make the changes for that).
Start the new Cassandra binary (2.0.7).
The questions that concern me most:
1. Do I have to copy the data from 2.0.3 to 2.0.7?
2. Even if it's a rolling upgrade, I think the steps above will do (except for moving from one version to another). Is my assumption right?
3. I am going to do this operation on a running application. I plan to keep the application running while doing this, as I have enough replicas at local quorum to satisfy reads and writes. Does this idea have any disadvantages? I love Cassandra for this kind of operation, but would like to know if there are any potential problems.
4. I will keep the existing 2.0.3 installation on the machine while doing this. If there is a problem with 2.0.7, I can start the 2.0.3 version again, right? I just want to know whether there will be any data conflicts with other nodes in the cluster. Or is having a snapshot to recover the data the best option?
5. Apart from this, is there anything else I have to bear in mind?
Do I have to copy the data from 2.0.3 to 2.0.7? Even if it's a rolling upgrade, I think the following steps will do (except moving from one version to another). My assumption is right?
If you just upgrade the binaries, you can leave all of the data in place and it will use it automatically.
Am going to do this operation on a running application. Am planning to have the application running while doing this as I have enough replicas in local quorum to satisfy reads and writes. Does this idea have any disadvantages ? I loved cassandra for this kind of operation but would like to know of there are any potential problems ?
Normal read and write operations are fine. While you are temporarily running a mixed-version cluster, it's best to avoid doing anything that involves streaming (repairs) or topology changes (bootstrapping or decommissioning nodes). They might work, but they're not officially supported and you're more likely to have problems.
I will be having the existing 2.0.3 in my running machine while doing this. If there is a problem in 2.0.7 , I shall start off 2.0.3 version again right? Just wanted to know whether there will be any data conflicts with other nodes in the cluster? Or having a snapshot to recover the data is the best option?
You want to have a snapshot to recover from. Newer versions of Cassandra may use new SSTable or commitlog formats which the older version will not be able to read.

migrating cassandra from 1.1.2 to 1.2.6

My current Cassandra version is 1.1.2, running as a single-node cluster. I would like to upgrade it to 1.2.6 with multiple nodes in the ring. Is it appropriate to migrate directly to 1.2.6, or should I migrate version by version?
I found the upgrading steps from this link
http://fossies.org/linux/misc/apache-cassandra-1.2.6-bin.tar.gz:a/apache-cassandra-1.2.6/NEWS.txt.
There are 9 other releases available between these two versions.
I migrated a two-node cluster from 1.1.6 to 1.2.6 without problems and without going version by version. Anyway, you should take a closer look at:
http://www.datastax.com/documentation/cassandra/1.2/index.html?pagename=docs&version=1.2&file=index#upgrade/upgradeC_c.html#concept_ds_smb_nyr_ck
Because version 1.2 brings a lot of new features, such as the partitioners, you may need to change some configuration for your cluster.
You may directly hop on to C1.2.6.
We migrated our 4-node cluster from C1.0.9 to C1.2.8 recently without any issues. This was a rolling upgrade i.e. upgrade one node at a time and after each upgrade of a node, allow the cluster to stabilize (depends upon the traffic during upgrade)
These are the steps that we followed:
Perform below on each node,
Run nodetool disablegossip and nodetool disablethrift, so that this node is seen as DOWN by the other nodes.
flush/drain the memtables, run compaction to merge SSTables
take snapshot and enable incremental backups
This stops all the other nodes/clients from writing to this node, and since the memtables are flushed to disk, startup is fast because the node does not need to replay the commit logs.
stop Cassandra (though this node is down, cluster is available for write/read, so zero downtime)
upgrade sstables to new storage format using sstableupgrade
install/untar Cassandra 1.2.8 on the new locations
move upgraded sstables to appropriate location
merge Cassandra.yaml from previous version and current version by a manual diff (need to detail out difference)
start Cassandra
watch the startup messages to ensure the node comes up without difficulty and is shown in the ring with mixed 1.0.x/1.2.x
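The node preparation part of the steps above could be sketched as follows. The function prints the nodetool commands for a single node and runs nothing; the host name and snapshot tag are hypothetical, and taking the snapshot before the final drain is an ordering assumption of this sketch rather than something the steps prescribe.

```shell
#!/bin/sh
# Sketch only: prints nodetool commands for one node, runs nothing.
# Host name and snapshot tag are hypothetical.

# $1 = node host name, $2 = snapshot tag
print_node_prep() (
    node=$1 tag=$2
    # take the node out of client and cluster traffic
    echo "nodetool -h $node disablegossip"
    echo "nodetool -h $node disablethrift"
    # write memtables to disk, then merge SSTables as in the steps above
    echo "nodetool -h $node flush"
    echo "nodetool -h $node compact"
    # keep a restorable copy before touching the binaries
    echo "nodetool -h $node snapshot -t $tag"
    # final flush; the node stops accepting writes and can be shut down
    echo "nodetool -h $node drain"
)

print_node_prep node1 pre-upgrade
```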
