How to back up and restore huge Cassandra database tables? - database-administration

Need to migrate Cassandra 2.0 to 3.11 on a new server.
Old server: CentOS 5/6
Old Cassandra version: 2.0
New server: CentOS 8
New Cassandra version: 3.11
There are a few tables with up to 20 million records. I tried the snapshot and COPY methods, but the backup is not being restored.
Is there a better approach?
Any other tool?
I tried the snapshot method, which is not working, possibly because of the version difference.
I tried the COPY method, but that only works for small tables; I used it for the other tables, which were smaller in size.
I have 4 tables with 5 to 20 million records each.
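For context, the snapshot-based flow I attempted looks roughly like this (a sketch only; my_ks, my_table, and new_node_ip are placeholders):

    # On each old 2.0 node: flush memtables and snapshot the keyspace
    nodetool flush my_ks
    nodetool snapshot -t migration my_ks

    # sstableloader expects a <keyspace>/<table> directory layout,
    # so copy the snapshot's sstables into one first
    mkdir -p /tmp/load/my_ks/my_table
    cp /var/lib/cassandra/data/my_ks/my_table/snapshots/migration/* \
       /tmp/load/my_ks/my_table/

    # Stream into the new cluster (-d takes one or more target node IPs).
    # Note: 3.11's sstableloader may not accept 2.0-era sstable formats,
    # which may be why the restore fails; the commonly documented path is
    # to upgrade through 2.1 and run "nodetool upgradesstables" first.
    sstableloader -d new_node_ip /tmp/load/my_ks/my_table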

Related

Cassandra backup and recovery - drop table / schema alter

I am working on a backup and recovery strategy for our Cassandra system and am trying to understand how backup and sstable recovery work in Cassandra. Here are my observations and related questions. (My need is to set up a standby/backup cluster that would become active if the primary cluster goes down, so I want to keep the two in sync in terms of data by taking periodic backups on the active cluster and recovering them to the standby cluster.)
1. Took a snapshot backup. Dropped a table in Cassandra. Stopped Cassandra, recovered from the snapshot backup (copied the sstables to the data/ folder), started Cassandra. Ran cqlsh on the node, and I still do not see the table. Should this work? Am I missing a step?
2. In the above scenario, I then tried to re-create the schema (I back up the schema along with the snapshot, as sketched below) using the cqlsh SOURCE command. This created the table for me, but it creates a "new version" of the table: the snapshot has the older version (table folders labelled with a different UUID). After recovery, I still see no data in the table. Possibly because I created a new table?
3. I was finally able to recover the data after running nodetool repair and using sstableloader to restore the table data from another node in the cluster.
My questions are:
a. What is the right way to set up a new (blank, no schema) cluster from a snapshot? How do you set up the schema and recover the data?
b. How do you restore a cluster from a backup with table alterations? How do you bring a cluster running an older version of the schema to a newer version of the schema when recovering from a backup (snapshot or incremental)?
(NOTE: Cassandra newbie here.)
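For reference, this is roughly how I capture and replay the schema mentioned above (my_ks is a placeholder):

    # Save the keyspace schema alongside the snapshot...
    cqlsh -e "DESCRIBE KEYSPACE my_ks" > my_ks_schema.cql

    # ...and replay it later on the target cluster
    cqlsh -f my_ks_schema.cql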
So if you want to restore a snapshot, you need to copy the snapshot files back into the sstable directory and then run nodetool refresh. You can read
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsBackupSnapshotRestore.html
for more information. Also, there are third-party tools that can back up your data and then restore it as it was at the time of the backup. We use one such tool, Cohesity (formerly Talena/Imanis). It has a lot of capabilities (refreshing A to B, restore/rename, etc.). There are other popular ones as well. All of them have costs associated with them.
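As a minimal sketch (ks1, t1, and the path components in angle brackets are placeholders; on 3.x the table directory name carries a UUID suffix):

    # Copy the snapshot's sstables back into the live table directory
    cp /var/lib/cassandra/data/ks1/t1-<table_id>/snapshots/<snapshot_name>/* \
       /var/lib/cassandra/data/ks1/t1-<table_id>/

    # Have Cassandra pick up the newly placed sstables without a restart
    nodetool refresh ks1 t1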
Hopefully that helps?
-Jim

No results when querying for old records in Cassandra after bulk restore using sstableloader

I am facing a very weird situation.
I had a 3-node cluster with around 300 GB of data on each node.
I had to move the data to a new 5-node cluster.
I used sstableloader to move the data.
Once sstableloader completed successfully, I could query the restored records via cqlsh.
But I cannot query those restored records from Java code.
Any help in solving this problem would be appreciated.

Cassandra upgrade from version 3.2.4 to 6.0

We have a production environment with a 4-node Cassandra cluster. This environment holds data with a TTL of 730 days, and the data has accumulated to a very large amount (14 TB). We know this is not ideal. We have a Spring-based Java application using JDBC. Writes are around 1000 records/sec.
As part of maintenance we want to upgrade from Cassandra 3.2.4 to 6.0, and in the new cluster we want to follow the ideal Cassandra node configuration of roughly 1 TB of data per node. What would be the ideal way to migrate to Cassandra 6.0 without affecting latency in the application, and with zero downtime (ZDT)? 12 TB is a huge amount of data and compaction is a daunting task. We want to rectify this.
One solution we came up with is an offline/online model where the old 3.2.4 database would remain while the new Cassandra 6.0 cluster uses a smaller TTL. The one thing we want to avoid is added latency in the application. Can replication across DCs with different versions of Cassandra help (see the sketch below)?
We don't know the design decisions made during the development phase, but we want to rectify this as part of maintenance.
Correct us if our understanding is wrong.
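The cross-DC pattern we are considering looks roughly like this (a sketch only, since version compatibility between the two clusters would have to be verified first; my_ks, old_dc, and new_dc are placeholders):

    # Extend the keyspace's replication to cover the new datacenter
    cqlsh -e "ALTER KEYSPACE my_ks WITH replication =
        {'class': 'NetworkTopologyStrategy', 'old_dc': 3, 'new_dc': 3};"

    # On each node in the new DC, stream the existing data from the old DC
    nodetool rebuild -- old_dc

Once the new DC has caught up and clients point at it, the old DC could be decommissioned.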

Cassandra cluster bulk loader hangs during export

I am migrating a simple 4-node Cassandra cluster from one cloud provider to another. The number of nodes in both clouds is the same; however, the newer cluster is at version 3.11.0 and the older one at 3.0.11. I am using sstableloader to stream data from one cluster to the other (the schema has been created on the new cluster separately). As per the release notes, this should not be a problem.
However, for certain column families sstableloader reaches 100% progress but then hangs there for hours (time spent hanging >> time to stream). The total data to stream on each node is below 500 GB. Any help on why this is happening and how to avoid it is appreciated.
An alternative migration approach (see the sketch below for the per-node commands):
1. Create a new node on the new cloud provider and add it to the existing cluster.
2. Flush tables from the memtable to SSTables on disk.
3. Remove one node from the old cloud provider. Repeat likewise for each node.
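A sketch of the per-node swap (assuming the new node has already been configured to join the cluster):

    # On the old node being retired: flush memtables to SSTables on disk
    nodetool flush

    # Stream this node's data to the remaining replicas and leave the ring
    nodetool decommission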

Issues clearing data from a keyspace on Cassandra 2.1.3 + Stargate

Our QA team has requested that we completely clear all data within the app's keyspace on our Cassandra 2.1.3 server prior to testing. (Cassandra 2.1.3 is running on an Ubuntu 14.04 LTS Azure D12 instance [4 cores, 28 GB memory].)
We have attempted to TRUNCATE the column families and had problems with both Cassandra and Stargate index corruption afterwards (incorrect or no data returned).
We have attempted to DELETE the data from the column families and had the same problems with indexes and tombstoning.
We were told to use DROP KEYSPACE with snapshots turned off; this resulted in Cassandra shutting down with all remote connections forcibly closed, a partially deleted state on several occasions (we could still access the keyspace via DevCenter, but it did not appear in the schema_keyspaces table), and/or corrupted indexes.
There are fewer than 100,000 records across 30 column families, so not a whole lot of data.
We cannot upgrade Cassandra to the latest version because Stargate only supports C* 2.1.3.
Does anyone have any other recommendations for how we can resolve this problem?
We answered the question internally: remove StarGate. Once we removed StarGate, the TRUNCATE and DROP KEYSPACE functionality began to work properly again (see the sketch below).
We notified StarGate support.
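For reference, the reset flow that works for us post-StarGate (my_ks and my_cf are placeholders; auto_snapshot is a cassandra.yaml setting, and changing it requires a node restart):

    # With auto_snapshot: false in cassandra.yaml, TRUNCATE and DROP skip
    # the automatic pre-deletion snapshot
    cqlsh -e "TRUNCATE my_ks.my_cf;"
    cqlsh -e "DROP KEYSPACE my_ks;"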
