Cassandra: Simplest way to back up data from one machine and restore on a fresh machine

I have Cassandra running on a single machine. I need to back up a particular keyspace from there and set up the same schema with all the data on my local machine.
I understand that I can run the nodetool snapshot command to take a point-in-time snapshot of the keyspace.
But from the documentation, I understand that restoring a snapshot requires the schema to already exist. Is there no command that can take the backup together with the schema and restore it on another machine? The data is very small, hardly a few MB.

If you have the same version of Cassandra on the single machine and on your local machine, there is a brute-force solution (not to be used in production): copy the entire $CASSANDRA_HOME/data folder (or sometimes /var/lib/cassandra/data) from one machine to the other ...
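As a rough illustration of the brute-force copy, here is a minimal sketch that packs the data folder into a tarball on the source machine and unpacks it on the target. The function names archive_data_dir and unpack_data_dir are made up for this example, and it assumes Cassandra is stopped on both machines while the files are handled.

```shell
# Hypothetical helpers, not part of Cassandra.
# Assumption: Cassandra is stopped on both machines during the copy.

# Pack a data directory (e.g. /var/lib/cassandra/data) into a gzip tarball.
archive_data_dir() {
  local data_dir="$1" archive="$2"
  # -C switches to the parent so the archive holds a relative "data/" tree.
  tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")"
}

# Unpack the tarball under the given parent directory on the target machine.
unpack_data_dir() {
  local archive="$1" target_parent="$2"
  mkdir -p "$target_parent" || return 1
  tar -xzf "$archive" -C "$target_parent"
}

# Example (paths are placeholders):
#   archive_data_dir /var/lib/cassandra/data /tmp/cassandra-data.tar.gz
#   unpack_data_dir /tmp/cassandra-data.tar.gz /var/lib/cassandra
```

The tarball can be moved between machines with whatever file transfer tool you already use.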

Related

Cassandra backup restore

I am restoring a Cassandra 3.10 backup using snapshots. I have taken a backup of all the keyspaces, but there are additional keyspaces in Cassandra such as system_distributed, system_auth, system_schema, and system.
My question is: while restoring, do these also need to be restored?
Below is the link that I followed:
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_backup_snapshot_restore_t.html#ops_backup_snapshot_restore_t
You should NOT restore system keyspaces, except for the system_auth keyspace (you need this for the logins to work). Restoring the other system keyspaces can cause problems. I recently supported a production system that had this type of problem because they backed up and restored the system keyspaces. I can't remember exactly what the issue was, but it had to do with restoring tables that hold values which should not be "recycled."
Cassandra will create the system keyspaces on startup if they don't exist already.
Equally important is to back up the schema version. You will need the schema corresponding to the backup that you are restoring; if your schema has changed since your last backup, the restore will not go well.
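One way to keep the schema and the data together is to dump the keyspace definition at the same moment the snapshot is taken. The sketch below assumes cqlsh and nodetool are on the PATH and can reach the local node without extra connection options; the function name backup_keyspace and the output file naming are hypothetical.

```shell
# Hypothetical helper: save a keyspace's schema next to a tagged snapshot.
# Assumption: cqlsh and nodetool connect to the local node with default settings.
backup_keyspace() {
  local keyspace="$1" tag="$2" out_dir="$3"
  mkdir -p "$out_dir" || return 1
  # Write the CQL statements that recreate the keyspace and its tables.
  cqlsh -e "DESCRIBE KEYSPACE ${keyspace}" \
    > "${out_dir}/${keyspace}-${tag}-schema.cql" || return 1
  # Snapshot the keyspace under the same tag so schema and data can be matched later.
  nodetool snapshot -t "$tag" "$keyspace"
}

# Example (names are placeholders):
#   backup_keyspace my_keyspace 2024-01-01 /backups/schema
```

On the target machine the saved file can be replayed with cqlsh -f before the snapshot files are copied in.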

Cassandra: Unable to read keyspace from data directory

I have a single-node Cassandra setup for my application. To reclaim disk space occupied by deleted records (tombstoned records), I triggered a nodetool compact for my keyspace. Unfortunately, this compaction process got interrupted. Now, when I try to re-start the service, it does not recognise the keyspace (from the data directory configured in cassandra.yaml) for which compaction was in progress when it got interrupted. Other keyspaces like system and system_traces are successfully initiated from the same data directory.
Has anybody encountered a similar issue before? Also, pointers on restoring a keyspace from data files alone would be of great help (no snapshots were maintained).
PS: Upon analysing further it was found that an rm command on the cassandra data directory was issued but immediately cancelled. Most of the data seems to be in place, but there is a chance that the Data.db file of the system keyspace was lost. Is there a way to recover from this state?
It seems you have corrupted your setup by deleting system keyspace files, which would explain why Cassandra no longer picks up your keyspace at boot time.
Try this:
Download the same version of Cassandra again.
Create your keyspace and column family (cf) schemas.
Move whatever old data is left into the new data directory (Cassandra will only load the non-corrupted data):
sudo mv /data/cassandra_old/data/[keyspace]/[cf]-[md5-old]/* /data/cassandra_new/data/[keyspace]/[cf]-[md5-new]/
It should solve it if I understand the problem correctly.
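If the keyspace has many column families, the mv command above has to be repeated once per table. A minimal sketch of a loop that does this is below. It assumes table directories are named [cf]-[id] as in the command above, that the schema was already recreated so the new directories exist, and that the new node is stopped while files are moved. The function name move_sstables is hypothetical.

```shell
# Hypothetical helper: move leftover SSTable files from each old table directory
# into the directory the new install created for the same table.
# Assumption: directories are named <table>-<id> and the new node is stopped.
move_sstables() {
  local old_ks_dir="$1" new_ks_dir="$2"
  local old_table_dir table new_table_dir
  for old_table_dir in "$old_ks_dir"/*/; do
    [ -d "$old_table_dir" ] || continue
    old_table_dir="${old_table_dir%/}"
    # Strip the trailing -<id> to get the table name.
    table="$(basename "$old_table_dir")"
    table="${table%-*}"
    # Locate the matching directory in the new data directory.
    new_table_dir="$(find "$new_ks_dir" -mindepth 1 -maxdepth 1 -type d \
      -name "${table}-*" | head -n 1)"
    if [ -z "$new_table_dir" ]; then
      echo "no new directory for table ${table}, skipping" >&2
      continue
    fi
    # Move only regular files; subdirectories such as snapshots/ stay behind.
    find "$old_table_dir" -maxdepth 1 -type f -exec mv {} "$new_table_dir"/ \;
  done
}

# Example (paths are placeholders):
#   move_sstables /data/cassandra_old/data/my_keyspace /data/cassandra_new/data/my_keyspace
```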

What are different ways to backup and restore cassandra cluster?

I am trying to backup the whole cluster consistently. What are different ways to backup and restore Cassandra cluster?
If you are using the DataStax Enterprise version, then the easiest way is to perform the backups and restore using OpsCenter.
If you are using the DataStax Community or open-source version of Cassandra, then use nodetool snapshot to create backups of tables and/or keyspaces.
Please bear in mind that SSTables are immutable, i.e. they never change once they are written to disk. So unlike RDBMS data files, SSTables are not updated.
To perform a snapshot cluster-wide, use SSH tools such as pssh to perform parallel snapshots on all nodes.
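A minimal sketch of the pssh approach, assuming a text file with one node hostname per line and SSH access that lets the remote user run nodetool. The function name snapshot_all_nodes is made up for illustration; on some distributions the pssh binary is installed as parallel-ssh.

```shell
# Hypothetical helper: run the same tagged snapshot on every node in parallel.
# Assumption: hosts_file lists one hostname per line and SSH keys are set up.
snapshot_all_nodes() {
  local hosts_file="$1" tag="$2"
  # -h reads the host list from a file; -i prints each node's output inline.
  pssh -h "$hosts_file" -i "nodetool snapshot -t ${tag}"
}

# Example (file name and tag are placeholders):
#   snapshot_all_nodes cassandra-hosts.txt nightly
```

Using one tag for all nodes makes it easier to find the matching snapshot directories later. The snapshots start at slightly different times on each node, so this is not a single atomic cluster-wide point in time.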
More information on the snapshot utility is available here.
There are several ways to restore from snapshots. One way is to reload the data using the sstableloader tool, which streams the data back into the cluster. Another way is to copy the SSTables from the snapshot into the table's data directory and run nodetool refresh. Finally, you can replace the existing data with the snapshot and restart the node.
More information on backups and restores is available here.
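The first two restore paths described above can be sketched as follows. This assumes the schema already exists on the target and that nodetool and sstableloader are on the PATH; the function names are hypothetical. In the versions I am familiar with, sstableloader works out the keyspace and table from the last two components of the directory path, so the directory passed to it should end in [keyspace]/[table].

```shell
# Hypothetical helpers; the schema must already exist on the target.

# Option 1: copy snapshot files into the live table directory, then ask the
# node to load the newly placed SSTables without a restart.
restore_with_refresh() {
  local snapshot_dir="$1" table_dir="$2" keyspace="$3" table="$4"
  cp "$snapshot_dir"/* "$table_dir"/ || return 1
  nodetool refresh "$keyspace" "$table"
}

# Option 2: stream the SSTables back into the cluster through a contact node.
# Assumption: sstable_dir ends in <keyspace>/<table>.
restore_with_sstableloader() {
  local contact_host="$1" sstable_dir="$2"
  sstableloader -d "$contact_host" "$sstable_dir"
}

# Example (paths and host are placeholders):
#   restore_with_refresh /backups/my_keyspace/users /var/lib/cassandra/data/my_keyspace/users-<id> my_keyspace users
#   restore_with_sstableloader 10.0.0.1 /backups/my_keyspace/users
```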

Cassandra: how to move a database from one server to another quickly?

It's a backup restore question. We have a single node running Cassandra. We have also rsync'ed the /var/lib/cassandra folder into another (backup) server.
In case of emergency, we can quickly boot a new server and transfer the files there. But the question is: will it work? Let's assume we have the same Cassandra version and the same OS version on both the old and the new server. Is it enough to simply transfer the whole /var/lib/cassandra folder? Backups are critical, so I want to be sure everything will be OK.
(We are currently using the dsc2.0 package from Ubuntu's repos.)
Yes, running a 'normal' cluster with 2-3 nodes would be a better choice, I know. Both performance and reliability would increase. But for now we have what we have - it's a single node. And for some reasons right now we will not switch to a multinode cluster.
Thanks!

Cassandra store Keyspace to new Disk

I just set up a fresh Windows server with a fresh DataStax installation including Cassandra 1.2 and OpsCenter 2.1.3. I've tried finding solutions to these questions on Cassandra wikis and the DataStax website, but I can only find Unix-specific information or DataStax API information.
Cassandra defaults to using the C: drive (I was never asked to select a drive for Cassandra during install).
In the same cassandra instance, can I have keyspaces on separate disks?
If not, how do I migrate the existing keyspace to the new drive? (just reconfiguring cassandra.yaml to use a new directory would lose my opscenter data and may even break opscenter).
If yes, how can I create a new keyspace on a separate drive? cassandra.yaml seems to only have configuration options for a single store location.
Should I be creating a new cluster to store my data in? If I start adding new nodes to the default cluster, that will mean the datastax opscenter data will be getting replicated - that seems like a bad idea.
If there is good documentation on this somewhere, please point me there.
Thanks,
Adam
You cannot get Cassandra to split the keyspaces and store them in different directories. They are all stored under a common data directory that is specified in the cassandra.yaml file.
However, you can use NTFS to mount different drives under the data directory on your server, but this will not be simple or expandable.
If you want to move where Cassandra stores its data, stop the Cassandra daemon/service, change the cassandra.yaml file to point at the new location, then copy or move the entire data directory to that location. THEN start Cassandra back up and it will work fine with the data in the new location. I have done this quite a few times now, and Cassandra comes back up without incident and with no lost data. (If you do not move the data, Cassandra will start with none of it and recreate the directory structure under the new location.)
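The copy step can be sketched as below. This is written for a Unix-like shell, whereas the question is about Windows, where the same steps apply with different commands. It assumes the service is already stopped, the new directory is empty, and cassandra.yaml (the data_file_directories setting) is edited separately to point at the new location. The function name copy_data_dir is hypothetical.

```shell
# Hypothetical helper: copy the whole data directory to a new location and
# check that the number of files matches before Cassandra is restarted.
# Assumption: Cassandra is stopped and new_dir is empty or does not exist yet.
copy_data_dir() {
  local old_dir="$1" new_dir="$2"
  local old_count new_count
  mkdir -p "$new_dir" || return 1
  # -a keeps ownership, permissions and timestamps; "/." copies the contents,
  # including hidden entries, rather than nesting old_dir inside new_dir.
  cp -a "$old_dir"/. "$new_dir"/ || return 1
  old_count="$(find "$old_dir" -type f | wc -l | tr -d '[:space:]')"
  new_count="$(find "$new_dir" -type f | wc -l | tr -d '[:space:]')"
  # Succeed only if every file made it across.
  [ "$old_count" -eq "$new_count" ]
}

# Example (paths are placeholders):
#   copy_data_dir /var/lib/cassandra/data /mnt/bigdisk/cassandra/data
```

Copying instead of moving leaves the old directory intact until the node has come back up cleanly on the new one.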
Data getting replicated is not a bad thing; it is what Cassandra was designed for. I don't know what replication factor OpsCenter uses, but it does not store a massive amount of data, so replication is not a problem.
