No results when query for old records in Cassandra after bulk restore using SSTableloader

No results when query for old records in Cassandra after bulk restore using SSTableloader - cassandra

I am having a very weird situation.
I was having a 3 node cluster each having arrund 300GB data.
I had to move the cluster to new 5 node cluster.
I used sstableLoader to move the data.
Once sstableloader is completed successfully I can query for the restored records over cqlsh.
But I can not query for those restored records from java code.
Any help to solve this problem would be appreciated.

Related

Cassandra Cluster Migration Issues

Now:
Single Node
cassandra 3.11.3 + kairosdb 1.2
Two data storage path
/data/cassandra/data/kairosdb 4T Old data
/data1/cassandra/data/kaiosdb 1.1T Now Wrting data
target:
Three Node
cassandra 3.11.3 + kairosdb 1.2
One data storage path
/data/cassandra/data/kairosdb
In this case, how to migrate the data in two data directories under a single node to a three-node cluster, each node of this three-node cluster has only one data directory
I understand how to do it (and have practiced it) when migrating a single node to a three-node cluster, but only when there is only one data directory.2 data directories are migrated to 1,I have searched the Internet for a long time, but there is no reference material.

Data directories are something that the individual Cassandra node cares about but the Cluster doesn't.
Usually you'd want to have all nodes share the same configuration but for replication it really doesn't matter where the SSTables are on Disk on each node.
So migrating here would be the same as you've practiced.
That said the process I'd choose would be to add the new nodes as a second DC with the right replication, run a repair to have all the data in sync and then decommission the original node.

Cassandra back up and recovery - drop table / schema alter

I am working on a cassandra backup and recovery strategy for our cassandra system and am trying to understand how the backup and sstable recovery works in cassandra. Here are of my observations and related questions (my need is to setup a standby/backup cluster which would become active if the primary cluster goes down.. so I want to keep them in sync in terms of data, so I want to take periodic backups at my active cluster and recover to the standby cluster)
Took a snapshot backup. Dropped a table in cassandra. Stopped cassandra, recovered from the snapshot backup (copied the sstables to the data/ folder), started cassandra. Ran cqlsh on the node, and I still do not see the table created. Should this work? Am I missing any step ?
In the above scenario, I then tried to re-setup the schema (I take backup of the schema in the snapshot) using the cql commant source . This created the table for me. However it creates a "new version" of table for me. When I recover the snapshot has the older version (different uuid labelled folders for table). After recovery, I still see no data in the table. Possibly because I created a new table?
I was finally able to recover data after running nodetool repair and using sstableloader to restore table data from another node in the cluster.
My question is
a. what is the right way to setup a new (blank- no schema) cluster from a snapshot? How do you setup the schema and recover data?
b. how do you restore a cluster from a backup with table alterations. How do you bring a cluster running an older version of schema to a newer version of schema when recovering from a backup (snapshot or incremental)?
(NOTE: cassandra newbie here)

So if you want to restore a snapshot, you need to copy the snapshot files back to the sstable directory and then run: nodetool refresh. You can read:
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsBackupSnapshotRestore.html
for more information. Also, there are 3rd party tools that can back up your data and then restore it as it was at the time of the backup. We use a tool: Cohesity (formally Talena/Imanis). It has a lot of capabilities (refreshing A to B, restore/rename, etc.). There are other popular ones as well. All of them have costs associated with them.
Hopefully that helps?
-Jim

Cassandra cluster bulk loader hangs during export

I am migrating a simple 4 node Cassandra cluster from one cloud provider to another. The number of nodes in both the clouds are same however the newer cluster is at version 3.11.0 and the older one is at 3.0.11. I am using sstableloader to stream data from one cluster to another (schema has been created on new cluster separately). As per the release notes this should not be a problem.
However, for certain column families with sstableloader I get progress to 100% but then it hangs there for hours (time hang >> time to stream). The total data to stream on each node is below 500 GB. Any help on why this is happening and how to avoid is appreciated.

Create a new node and add to the existing cluster from the new cloud server
Flush tables from the memtable to SSTables on disk
Delete one node from old cloud server. Like wise repeat for each node.

From where cassandra fetch data for select?

I understand the concept of SSTable in Cassandra. I have also tested the different version of files create with insert after nodetool flush.
I have also setup a snapshot backup and incremental back and tested it's working fine.
For testing purpose i deleted all the sstable files from all the nodes. Strangely , am still able to select the data.
Can someone please explain me from where cassandra is fetching the data ?
Regards
Sid

The record you queried was available in the ROW cache aka memtables(memory).
So once you restart your node strangely you will again get back the result because the commit logs got replayed eventually building those SSTables for you.
Clear all the SSTables and commit logs and restart your node .And then you can observe that you get no records for your query .

Copying Cassandra data from one cluster to another

I have a cluster setup in Cassandra on AWS. Now, I need to move my cluster to some other place. Since taking image is not possible, I will create a new cluster exactly a replica of the old one. Now I need to move the data from this cluster to another. How can I do so?
My cluster has 2 data centers and each data center has 3 Cassandra machines with 1 seed machine.

Do you have connectivity between the old and new cluster? If yes, why not linking the cluster and let cassandra replicate the data to the new cluster? After data is transferred, shut down the old cluster. Ideally you wouldn`t even have any downtime.

You can take your SSTables from your data directory and then can use sstableloader in new data center to import the data.
Before doing this activity you might consider doing compaction so that you have only one SSTable per table.
SSTable Loader

Using SFTP server or through some other way, transfer the SSTables from old cluster to new cluster (one DC is enough) and use SSTableLoader. The data replication to another DC will be taken care by Cassandra.

In cassandra there are two type of strategy SimpleStrategy and NetworkTopologyStrategy by using NetworkTopologyStrategy you can replicate in different cluster. see this documentation Data replication
You can use COPY command to export and import csv from one table to another
Simple data importing and exporting with Cassandra

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string