Issues clearing data from a keyspace on Cassandra 2.1.3 + Stargate - cassandra

Our QA team has requested that we completely clear all data within the app's keyspace on our Cassandra 2.1.3 server prior to testing. (Cassandra 2.1.3 is running on an Ubuntu 14.04 LTS Azure D12 instance [4 cores, 28 GB memory].)
We attempted to TRUNCATE the column families and afterwards had problems with both Cassandra and Stargate index corruption (queries returned incorrect or no data).
We have attempted to DELETE the data from the column families and had the same problem with indexes and tombstoning.
We were told to use DROP KEYSPACE with snapshots turned off. This resulted in one or more of the following: Cassandra shutting down with all remote connections forcibly closed; on several occasions, a partially deleted state in which we could still access the keyspace via DevCenter although it did not appear in the schema_keyspaces table; and corrupted indexes.
There are fewer than 100,000 records across 30 column families, so not a whole lot of data.
We cannot upgrade Cassandra to the latest version because Stargate only supports the C* 2.1.3 version.
Any other recommendations on how we can resolve this problem?

We answered the question internally.
Remove StarGate. Once we removed StarGate, the TRUNCATE and DROP KEYSPACE functionalities began to work appropriately again.
We notified StarGate support.

Related

Does a Cassandra upgrade require running nodetool upgradesstables for a cluster holding TTLed data?

I am running a 3-node Apache Cassandra cluster in Docker containers, holding time-series data with a 45-day TTL.
I am planning to upgrade from the current Cassandra version, 2.2.5, to the Cassandra 3.11.4 release. I have identified the following steps for the upgrade:
Backup existing data
Flush one of the Cassandra nodes
bin/nodetool -h cassandra1 -u ca_itoa -pw ca_itoa drain
Stop the cassandra1 node
Start the new cassandra 3.11.4 container
Upgrade the SSTables
bin/nodetool -u ca_itoa -pw ca_itoa upgradesstables
Check the node status. Repeat the process for the rest of the nodes
I have a few questions about the upgrade process:
Are the steps correct?
Is it mandatory to run the upgradesstables command? It is time-consuming, and I would like to avoid it if I can. The data has a TTL set. Will Cassandra keep writing new data in the new SSTable format while the old SSTable data gets cleaned up as it expires? My assumption is that, after 45 days, all SSTables would be in the shiny new format.
Just some additional thoughts:
For Step #6, you actually don't have to run upgradesstables right away. In fact, if you're upgrading a production system, it's probably better that you don't until the application team verifies that they can connect ok. Remember, older versions of the driver which work in 2.2 may not work with 3.11.4.
To this end, I would wait until the entire cluster is running on the new version before running upgradesstables on each node.
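The ordering suggested here (upgrade the binaries node by node first, and run upgradesstables only once the whole cluster is on the new version) could be sketched roughly as follows. This is only an illustration that prints a plan and does not touch a cluster. The hostnames `cassandra2` and `cassandra3` and the function name are assumptions, the stop/start steps are plain descriptions because they depend on how the containers are managed, and the credentials from the question are left out.

```python
from typing import List


def rolling_upgrade_plan(nodes: List[str], new_version: str) -> List[str]:
    """Return an ordered list of steps for a rolling upgrade.

    Phase 1 swaps the binaries one node at a time.
    Phase 2 rewrites SSTables, and starts only after every node
    is already running the new version.
    """
    plan: List[str] = []

    # Phase 1: drain, stop, start the new version, check status.
    for node in nodes:
        plan.append(f"nodetool -h {node} drain")
        plan.append(f"stop Cassandra on {node}")
        plan.append(f"start Cassandra {new_version} on {node}")
        plan.append(f"nodetool -h {node} status")

    # Phase 2: rewrite SSTables in the new format, node by node.
    for node in nodes:
        plan.append(f"nodetool -h {node} upgradesstables")

    return plan


if __name__ == "__main__":
    # Hypothetical hostnames; only cassandra1 appears in the question.
    for step in rolling_upgrade_plan(
        ["cassandra1", "cassandra2", "cassandra3"], "3.11.4"
    ):
        print(step)
```

Keeping the two phases in separate loops gives the application team a window to verify driver connectivity before any SSTables are rewritten.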
Is it mandatory to run the upgradesstables command?
As each Cassandra version is capable of reading its own SSTable format as well as the prior major version, I guess it's not mandatory. But it's definitely something that you should want to do. Especially when upgrading to 3.x.
Cassandra 3 contains a significant upgrade to the storage engine, which results in a much smaller disk footprint. One cluster I upgraded saw a 90% reduction in disk needs.
Plus, you'd be incurring additional latency when reading records which may be spread across the old SSTable files as well as the new. Reads for records across multiple files are bad enough as it is. But now you'd be forcing Cassandra to read and collate results from two formats.
So while I wouldn't say it's "mandatory," I'd definitely say it qualifies as a "good idea."
Yes, you need to run nodetool upgradesstables on each node after the Cassandra upgrade, as you are upgrading from 2.2.x to 3.11.4. The SSTable file format and extension will also change. You can run this process in the background and it will not cause any issues. Please refer to the link below for more details: https://blog.thethings.io/upgrading-apache-cassandra-cluster/

A node went into the draining state after creating keyspaces on Cassandra

While setting up Cassandra on 15 nodes, we found that one of the nodes went into the draining state after we created keyspaces on Cassandra.
We don’t know why the node went into the draining state. We use consistency level ALL, so in this situation we are not able to query Cassandra.
Later we tried to upgrade DC/OS to the latest version along with the latest Cassandra version, but this upgrade resulted in corruption of data on all the nodes. From this point on we were not able to add existing data, as it was getting corrupted.

Cassandra Upgrade limitations

We are upgrading from DSE 4.5 to DSE 4.8.9 in a 10-node production cluster.
We have daily batch jobs running in our application which bulk load data into the cluster. Some jobs TRUNCATE the tables and load fresh data, and some loader jobs continuously insert data.
Consider these scenarios :
Case 1:
Let's say one of my nodes has DSE 4.8 installed but upgradesstables is still running.
All nodes are online at this moment and 2 different schemas exist (9 nodes on DSE 4.5 and 1 node on DSE 4.8.9).
In this case, will TRUNCATE work?
Case 2:
One of my nodes is fully upgraded to DSE 4.8, which leaves my cluster in a partially upgraded state: all nodes online, 2 schemas exist (9 nodes on DSE 4.5 and 1 node on DSE 4.8).
Will TRUNCATE work in this case?
Please suggest.
Thanks!
It's not recommended to issue a TRUNCATE command during an upgrade; this is one of the limitations outlined here.
To quote the link:
Do not enable new features.
Do not run nodetool repair.
Do not issue these types of CQL queries during a rolling restart: DDL and TRUNCATE.
During upgrades, the nodes on different versions might show a schema disagreement.
Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage. Upgrading is not complete until the SSTables are upgraded.
It should be standard practice to upgrade the binaries on all the nodes first, so that there is one schema across the cluster.
Avoid using TRUNCATE until all nodes have finished running "upgradesstables".
The comment from markc is also worth noting:
Do not enable new features.
Do not run nodetool repair.
Do not issue these types of CQL queries during a rolling restart: DDL and TRUNCATE.
During upgrades, the nodes on different versions might show a schema disagreement.
Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage. Upgrading is not complete until the SSTables are upgraded.
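The advice in these answers (one version across the cluster, SSTables upgraded everywhere, and only then TRUNCATE) could be expressed as a simple pre-flight check in a batch job. The sketch below is hypothetical: the function name and the two input dictionaries are assumptions, and how a job would collect each node's version and upgradesstables status is not covered here.

```python
from typing import Dict


def truncate_is_safe(
    node_versions: Dict[str, str],
    sstables_upgraded: Dict[str, bool],
) -> bool:
    """Return True only when the cluster looks fully upgraded.

    node_versions maps node name -> release version it is running.
    sstables_upgraded maps node name -> whether upgradesstables
    has completed on that node (missing entries count as False).
    """
    if not node_versions:
        return False

    # Mixed versions mean a rolling upgrade is still in progress.
    single_version = len(set(node_versions.values())) == 1

    # Every node must have finished rewriting its SSTables.
    all_upgraded = all(
        sstables_upgraded.get(node, False) for node in node_versions
    )
    return single_version and all_upgraded


if __name__ == "__main__":
    # Case 2 from the question: 9 nodes on 4.5, 1 node on 4.8.9.
    versions = {f"node{i}": "4.5" for i in range(1, 10)}
    versions["node10"] = "4.8.9"
    print(truncate_is_safe(versions, {"node10": True}))
```

For both cases in the question this check returns False, which matches the recommendation to hold TRUNCATE jobs until the upgrade is complete.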

Cassandra upgrade from 2.0.x to 2.1.x or 3.0.x

I've searched for previous versions of this question, but none seem to fit my case. I have an existing Cassandra cluster running 2.0.x. I've been allocated new VMs, so I do NOT want to upgrade my existing Cassandra nodes - rather I want to migrate to a) new VMs and b) a more current version of Cassandra.
1. I know for in-place upgrades, I would upgrade to the latest 2.0.x, then to the latest 2.1.x. AFAIK, there's no SSTable inconsistency here. If I go this route via addition of new nodes, I assume I would follow the DataStax instructions for adding new nodes/decommissioning old nodes?
2. Given the above, is it possible to move from 2.0.x to 3.0.x? I know the SSTable format is different; however, if I'm adding new nodes (rather than re-using SSTables on disk), does this matter?
It seems to me that #2 has to work - otherwise, it implies that any upgrade requiring SSTable upgrades would require all nodes to be taken offline simultaneously; otherwise, there would be mixed 2.x.x and 3.0.x versions running in the same cluster at some point.
Am I completely wrong? Does anyone have any experience doing this?
Yes, it is possible to migrate data to a different environment (the new VMs with the updated Cassandra) using sstableloader, but you will need C* 3.0.5 or above, as that version added support for loading SSTables from previous versions.
Once the process is completed, it is recommended to run nodetool upgradesstables to ensure that there are no incompatibilities in the data, followed by nodetool cleanup.
Regarding your comment, "... it implies that any upgrade requiring SSTable upgrades would require all nodes to be taken offline simultaneously ...": this is not true. Doing the upgrade one node at a time will create a mixed cluster with nodes on the two versions, as you mentioned, which is not optimal, but it will allow you to avoid any downtime in production. (Note that the impact of this operation will depend on the consistency level used in your application.)
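To illustrate why the consistency level matters while nodes are restarted one at a time, here is a small worked example. It assumes a single datacenter and only the ONE, QUORUM and ALL levels; the function names are made up for this sketch.

```python
def replicas_required(consistency_level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for the given level."""
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # A quorum is a strict majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def tolerates_down_replicas(
    consistency_level: str, replication_factor: int, down: int
) -> bool:
    """True if requests can still succeed with `down` replicas offline."""
    alive = replication_factor - down
    return alive >= replicas_required(consistency_level, replication_factor)


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        ok = tolerates_down_replicas(level, replication_factor=3, down=1)
        print(f"{level}: survives one replica down = {ok}")
```

With a replication factor of 3, QUORUM needs 2 replicas and so survives a single node restart, whereas ALL needs all 3 and fails as soon as any replica is down.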
Don't worry about the migration. You can simply migrate your Cassandra 2.0.x cluster to Cassandra 3.0.x, but it's better to migrate from Cassandra 2.0.x to the latest Cassandra 2.x.x first and then to Cassandra 3.0.x. You need to follow these steps:
Backup data
Uninstall present version
Install the version you want to upgrade
Restore data
As you are doing a migration, you need to be careful with your data at all times. For the data backup and restore, there are two approaches:
Create snapshots of your SSTables; then, after installing the new version of Cassandra, place the files in the data location and run sstableloader.
Back up your schemas to a .cql file and copy all the tables to .csv files; then, after installing the new version of Cassandra, source your schema from the .cql file and copy each table back in from its .csv file.
Once you are confident about how you will complete the migration, you can write a bash script to carry out the backup and restore steps.
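The second approach (schema to a .cql file, tables to .csv files) could be sketched as below. The script only builds the cqlsh command lines and does not run them. The keyspace, table and directory names are placeholders, and the `HEADER = TRUE` option is a choice made for this sketch rather than something the answer specifies.

```python
from typing import List


def backup_commands(keyspace: str, tables: List[str], out_dir: str) -> List[str]:
    """Commands to dump the schema and export each table to CSV."""
    cmds = [
        f'cqlsh -e "DESCRIBE KEYSPACE {keyspace}" > {out_dir}/{keyspace}_schema.cql'
    ]
    for table in tables:
        stmt = f"COPY {keyspace}.{table} TO '{out_dir}/{table}.csv' WITH HEADER = TRUE"
        cmds.append(f'cqlsh -e "{stmt}"')
    return cmds


def restore_commands(keyspace: str, tables: List[str], out_dir: str) -> List[str]:
    """Commands to recreate the schema and import each CSV file."""
    cmds = [f"cqlsh -f {out_dir}/{keyspace}_schema.cql"]
    for table in tables:
        stmt = f"COPY {keyspace}.{table} FROM '{out_dir}/{table}.csv' WITH HEADER = TRUE"
        cmds.append(f'cqlsh -e "{stmt}"')
    return cmds


if __name__ == "__main__":
    # Placeholder names, not taken from the question.
    for cmd in backup_commands("app", ["users", "events"], "/tmp/backup"):
        print(cmd)
    for cmd in restore_commands("app", ["users", "events"], "/tmp/backup"):
        print(cmd)
```

Printing the commands first makes it easy to review them before wrapping them in the bash script mentioned above.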

Migrate Datastax Enterprise Cassandra to Apache Cassandra or Datastax Community?

I have a large but simple Cassandra database on a DataStax 4.6 cluster. The license renewal is prohibitive for this very simple use case, and I am trying to migrate to either straight Apache Cassandra or the DataStax Community version. First, is it possible to do an inline update?
I have altered all the keyspaces to remove the "EverywhereStrategy" replication strategy, but I still get an error that the DSC version of Cassandra I'm trying to get to join the cluster doesn't support it. I'm using like Cassandra versions (2.0.16), and most other things seem to be close.
java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Unable to find replication strategy class 'org.apache.cassandra.locator.EverywhereStrategy'
If it's not possible to do an inline upgrade, what would be the best strategy to migrate a decent-sized (30 nodes, 150 TB) cluster?
So to make this work you have to extract any of the DSE features that you may have on any of your tables.
This meant I had to change the replication strategy on the dse_system keyspace from EverywhereStrategy to SimpleStrategy with RF=3 (or almost anything else; after the conversion you can drop this keyspace). The error message was:
java.lang.RuntimeException: org.apache.cassandra.exceptions.ConfigurationException: Unable to find replication strategy class 'org.apache.cassandra.locator.EverywhereStrategy'
I also had to drop the unused CFS keyspaces. We never used the Hadoop/CFS integration, so we had nothing in these keyspaces anyway. I didn't capture the error for this.
We did have a Solr index on a table we were testing on this cluster about a year ago, so I had to drop this column family. The error message was:
java.lang.RuntimeException: java.lang.ClassNotFoundException: com.datastax.bdp.search.solr.Cql3SolrSecondaryIndex
There may be other incompatibilities if you use other features of Datastax Enterprise that you would have to remove, but this was enough for me to get the migration working.
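The replication change described in this answer might look like the following CQL, generated here by a small helper so the statement can be reviewed before it is run in cqlsh. The helper name is made up for illustration; the keyspace name and RF=3 come from the answer.

```python
def alter_to_simple_strategy(keyspace: str, replication_factor: int) -> str:
    """Build an ALTER KEYSPACE statement that switches to SimpleStrategy."""
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};"
    )


def drop_keyspace(keyspace: str) -> str:
    """Build the DROP KEYSPACE statement used after the conversion."""
    return f"DROP KEYSPACE {keyspace};"


if __name__ == "__main__":
    print(alter_to_simple_strategy("dse_system", 3))
    print(drop_keyspace("dse_system"))
```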
dse-core.jar contains the EverywhereStrategy class.
We solved this problem by doing the following:
Replace everything except the above JAR so nodes can come up fine. Once all nodes are migrated to OSS, drop the dse_system keyspace (that uses this replication), delete the JAR and restart the nodes one by one.
