Cassandra restore from snaphot not restoring entire data

Cassandra restore from snaphot not restoring entire data - cassandra

I am trying to implement Cassandra(2.2.6) backup and restore using nodetool commands. Currently I have four nodes in Cassandra cluster out of which two are seed nodes with incremental_backup : enabled
I have followed below steps for backup and restore on one of the nodes:
Creating a snapshot of a keyspace with a sample table:
nodetool snapshot testkeyspace
Drop the test table from the given keyspace : cqlsh> Drop table test
Recreate the schema of the above dropped table
Stop the Cassandra service
Clear cassandra-commitlog directory
Copy the content of backup directory and latest snapshot directory from the old directory of test table into the new test table directory
Start Cassandra service
Run nodetool repair
Run nodetool refresh testkeyspace test
After the restore when I query the table, some of the records are missing. I am not sure what exactly is going wrong.

Related

Is it safe to copy cassandra snapshot files over sstable files in a running node?

Edited after reading nodetool tagged questions.
We take snapshots of our single node cassandra database daily. If I want to restore a snapshot either on that node, or on our staging server which is running a different instance of cassandra, my understanding is I have to:
nodetool disablegossip
nodetool disablebinary
nodetool drain
Copy the sstable files from the snapshot directories to the sstable directories under the keyspace directory.
Run nodetool refresh on each table.
Enable binary & gossip.
Is this sufficient to safely bring the snapshot sstable files in without cassandra overwriting them while I'm doing the refresh?
What is the opposite of nodetool drain?
Another edit: What about sstableloader? Should I use that instead? If so, how? I looked at the "documentation" and am none the wiser.

The steps you outlined isn't quite right. You don't shutdown Cassandra and you shouldn't just copy the files on top of the existing SSTables.
At a high level, the steps to restore table snapshots on a node are:
TRUNCATE the table you want to restore (will remove the SSTables from the data directories).
Copy the SSTables from data/ks_name/table-UUID/snapshots/snapshot_name subdirectory into the "live" data directory data/ks_name/table-UUID.
Run nodetool refresh -- ks_name table_name.
You will need to repeat these steps for each application table you want to restore. NOTE: Do NOT restore system tables, only application tables.
The detailed steps are documented in Restoring from a snapshot in Cassandra.
To restore a snapshot into another cluster, I prefer to refer to this as "cloning". The procedure for cloning snapshots to another cluster depends on whether the source and destination clusters have identical configuration.
If both source and destination clusters are identical, follow the steps I documented here -- https://community.datastax.com/questions/4534/. I've explained what identical configuration means in this post.
If they are not identical, follow the steps I documented here -- https://community.datastax.com/questions/4477/. Cheers!

Cassandra backup system keyspaces

I have 3 node cassandra cluster and I have a script which backup all of the keyspaces, but when it comes to restore on fresh cluster, data keyspaces restored correctly, but system_* keyspaces not.
So it is necessary to backup system keyspaces in cassandra?

You will need to backup the keyspace system_schema at the same time, as it will contain the definition of keyspaces, tables, and columns. The other system* keyspaces should be left untouched.

Fresh cluster makes own setup like token ranges etc based on configurations.you can restore the cluster on new cluster but you need to create schema and configuration same as old cluster. There is many ways to take backup and restore procedure based on requirements as below:-
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsBackupRestore.html

Cassandra back up and recovery - drop table / schema alter

I am working on a cassandra backup and recovery strategy for our cassandra system and am trying to understand how the backup and sstable recovery works in cassandra. Here are of my observations and related questions (my need is to setup a standby/backup cluster which would become active if the primary cluster goes down.. so I want to keep them in sync in terms of data, so I want to take periodic backups at my active cluster and recover to the standby cluster)
Took a snapshot backup. Dropped a table in cassandra. Stopped cassandra, recovered from the snapshot backup (copied the sstables to the data/ folder), started cassandra. Ran cqlsh on the node, and I still do not see the table created. Should this work? Am I missing any step ?
In the above scenario, I then tried to re-setup the schema (I take backup of the schema in the snapshot) using the cql commant source . This created the table for me. However it creates a "new version" of table for me. When I recover the snapshot has the older version (different uuid labelled folders for table). After recovery, I still see no data in the table. Possibly because I created a new table?
I was finally able to recover data after running nodetool repair and using sstableloader to restore table data from another node in the cluster.
My question is
a. what is the right way to setup a new (blank- no schema) cluster from a snapshot? How do you setup the schema and recover data?
b. how do you restore a cluster from a backup with table alterations. How do you bring a cluster running an older version of schema to a newer version of schema when recovering from a backup (snapshot or incremental)?
(NOTE: cassandra newbie here)

So if you want to restore a snapshot, you need to copy the snapshot files back to the sstable directory and then run: nodetool refresh. You can read:
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/operations/opsBackupSnapshotRestore.html
for more information. Also, there are 3rd party tools that can back up your data and then restore it as it was at the time of the backup. We use a tool: Cohesity (formally Talena/Imanis). It has a lot of capabilities (refreshing A to B, restore/rename, etc.). There are other popular ones as well. All of them have costs associated with them.
Hopefully that helps?
-Jim

Not able to restore cassandra data from snapshot

We have a regular backup of our cluster and we store schema and snapshot back up on aws s3 on daily basis.
Somehow we have lost all the data and while recovering the data from backup we are able to recover schema but while copying snapshots files to /var/lib/cassandra/data directory its not showing up the data in the tables.
After copying the data we have done nodetool refresh -- keyspace table but still nothing is working out.
could you please help on this ?

Im new at Apache Cassandra, but my first focus at this topic was the Backup.
If you want to restore from a Snapshot (on new node/cluster) you have to shut down Cassandra on any node and clear any existing data from these folders:
/var/lib/cassandra/data -> If you want to safe your System Keyspaces so delete only your Userkeyspaces folders
/var/lib/cassandra/commitlog
/var/lib/cassandra/hints
/var/lib/cassandra/saved_cashes
After this, you have to start Cassandra again (the whole Cluster). Create the Keyspace like the one you want to restore and the table you want to restore. In Your Snapshot folder you will find a schema.cql script for the creation of the table.
After Creating the Keyspaces an tables again, wait a moment (time depends on the ammount of nodes in your cluster and keypsaces you want to restore.)
Shut down the Cassandra Cluster again.
Copy the Files from the Snapshot folder to the new folders of the tables you want to restore. Do this on ALL NODES!
After copying the files, start the nodes one by one.
If all nodes are running, run the nodetool repair command.
If you try to check the data via CQLSH, so think of the CONSISTENCY LEVEL! (ALL/QUORUM)
Thats the way, wich work at my Cassandra cluster verry well.

The general steps to follow for restoring a snapshot is:
1.Shutdown Cassandra if still running.
2.Clear any existing data in commitlogs, data and saved caches directories
3.Copy snapshots to relevant data directories
4.Copy incremental backups to data directory (if incremental backups are enabled)
If required, set restore_point_in_time parameter in commitlog_archiving.properties to
restore point.
5.Start Cassandra.
6.Run repair
So try running repair after copying data.

cassandra snapshot restore on different cluster on missing schema

I have snapshot backup from cassandra cluster 1, and need to restore the same on cassandra cluster 2. Is it possible to do so, without having schema?

Cassandra table-schema details will be stored as meta data in system keyspaces.
Cassandra snapshot just creates a copy of sstables for the requested keyspace/column_familes. So to restore the snapshot, you need to explicitly create the schema in destination cluster.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string