Recovering Cassandra Data From Files - cassandra

I have a development machine that had a single-node Cassandra 2.1.2 setup. The main drive failed, but the secondary drive, mounted at /var, is good. I was able to connect this drive to another system with a working OS and Cassandra install and mount it.
I can see the files for my database under /var/cassandra/data// and would like to recover that data. Nothing outside of /var survived -- no configs or binaries.
Is it as simple as copying that directory to the /var/cassandra/data/ directory on the good system, or is there some other more detailed procedure for recovering/importing that data?

The only thing you need to remember from your config files is which partitioner was in use. Save your data files somewhere safe, create another single-node cluster, then follow the procedure described here.
The files in your saved data folder are what those manuals refer to as a snapshot.
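As a rough sketch of the copy step in that kind of restore, assuming the schema has already been recreated on the new node: Cassandra 2.1 names each table directory `<table>-<32 hex characters>`, and the suffix on a freshly created table will usually differ from the one on the saved files, so the files have to be matched by table name. The function names and paths below are made up for illustration, not part of any Cassandra tool.

```python
import re
import shutil
from pathlib import Path

# Assumption: Cassandra 2.1 table directories look like "<table>-<32 hex chars>".
_TABLE_DIR = re.compile(r"^(?P<table>.+)-[0-9a-f]{32}$")


def table_name(dir_name: str) -> str:
    """Strip the table-id suffix, if present, from a table directory name."""
    match = _TABLE_DIR.match(dir_name)
    return match.group("table") if match else dir_name


def restore_keyspace(saved_ks: Path, live_ks: Path) -> dict:
    """Copy SSTable files from saved table dirs into the matching live ones.

    Returns {table: number of files copied}. Saved tables with no matching
    directory under live_ks are skipped, because the schema must exist first.
    """
    live_dirs = {table_name(d.name): d for d in live_ks.iterdir() if d.is_dir()}
    copied = {}
    for src in saved_ks.iterdir():
        if not src.is_dir():
            continue
        name = table_name(src.name)
        dest = live_dirs.get(name)
        if dest is None:
            continue
        count = 0
        for f in src.iterdir():
            if f.is_file():  # skips snapshots/ and backups/ subdirectories
                shutil.copy2(f, dest / f.name)
                count += 1
        copied[name] = count
    return copied
```

After the files are in place, `nodetool refresh <keyspace> <table>` (or a restart of the node) should make Cassandra pick up the copied SSTables.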

Related

Run Spark or Flink on a distributed file system other than HDFS or S3

Is there a way to run Spark or Flink on a distributed file system such as Lustre, or anything other than HDFS or S3?
We are able to create a distributed file system framework using a Unix cluster. Can we run Spark/Flink in cluster mode rather than standalone?
You can use file:/// as a DFS provided every node has access to common paths, and your app is configured to use those common paths for sharing source libraries, source data, intermediate data, and final data.
Things like lustre tend to do that and/or have a specific hadoop filesystem client lib which wraps/extends that.

Cassandra: Simplest way to backup data from one machine and restore on a fresh machine

I have cassandra running on a single machine. I need to backup a particular keyspace from there and setup the same schema with all the data on my local machine.
I understand that I can run the nodetool snapshot command to take a point-in-time snapshot of the keyspace.
But from the documentation, I could understand that it requires the schema to exist. Is there not any command which can take the backup with the schema and restore it to another machine? The data is very small, hardly a few MBs.
If you have the same version of Cassandra on the single machine and on your local machine, there is a brute-force solution (not to be used in production): copy the whole $CASSANDRA_HOME/data folder (or sometimes /var/lib/cassandra/data) from one machine to the other ...

cassandra backup like mysql

I am learning Cassandra at the moment, and I have successfully backed up and restored tables and keyspaces as described at this URL.
But I am looking for the following options:
1) Take a complete backup of a keyspace to a location other than the directory specified in cassandra.yaml. The -t option creates a directory inside the snapshot folder, not at a different HDD location.
2) Or a backup/restore procedure like MySQL's.
Thanks
You have a few options. For small amounts of data, you can use COPY to back up and restore from CSV:
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/copy_r.html
For larger stuff, you've got the right link. You essentially take a snapshot (which puts it in the folder you mention), and then use something like tar to zip the files and output to a different directory. This is what we're doing in production... we clear the previous snapshot, take a snapshot and tar the folder to a network backup.
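The "snapshot, then tar to another location" step can be sketched roughly as follows. It assumes the usual on-disk layout `<data_dir>/<keyspace>/<table>/snapshots/<tag>/` and that `nodetool snapshot -t <tag> <keyspace>` has already been run; the function name and all paths are illustrative, not taken from any Cassandra tooling.

```python
import tarfile
from pathlib import Path


def archive_snapshot(data_dir: Path, keyspace: str, tag: str, dest: Path) -> int:
    """Write every snapshot directory named `tag` in `keyspace` to a .tar.gz.

    Returns the number of table snapshot directories archived.
    """
    snapshot_dirs = sorted((data_dir / keyspace).glob(f"*/snapshots/{tag}"))
    dest.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(dest, "w:gz") as tar:
        for snap in snapshot_dirs:
            # Store paths relative to the data directory so the
            # keyspace/table layout survives inside the archive.
            tar.add(snap, arcname=snap.relative_to(data_dir).as_posix())
    return len(snapshot_dirs)
```

Pointing `dest` at a mounted network share or a second disk gives a backup outside the configured data directory; `nodetool clearsnapshot` can then remove the on-disk copy.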

Cassandra store Keyspace to new Disk

I just setup a fresh windows server with a fresh datastax installation including cassandra 1.2 and opscenter 2.1.3. I've tried finding solutions to these questions on cassandra wikis and datastax website, but I can only find unix specific information or datastax API information.
Cassandra is defaulted to using C: drive (I was never asked to select a drive for cassandra during install).
In the same cassandra instance, can I have keyspaces on separate disks?
If not, how do I migrate the existing keyspace to the new drive? (just reconfiguring cassandra.yaml to use a new directory would lose my opscenter data and may even break opscenter).
If yes, how can I create a new keyspace on a separate drive? cassandra.yaml seems to only have configuration options for a single store location.
Should I be creating a new cluster to store my data in? If I start adding new nodes to the default cluster, that will mean the datastax opscenter data will be getting replicated - that seems like a bad idea.
If there is good documentation on this somewhere, please point me there.
Thanks,
Adam
You cannot get cassandra to split the keyspaces and store them in different directories. They are all stored under a common data directory that is specified in the cassandra.yaml file.
However, you can set this up and use NTFS to mount different drives under the data directory on your server but this will not be simple or expandable.
If you want to move where the data is stored on cassandra, then stop the cassandra daemon/service, change the cassandra.yaml file to store the data at a new location, then copy/move the entirety of the data directory to this new location. THEN start cassandra back up and it will work fine with the data in the new location. I have done this quite a few times now and cassandra comes back up without incident and no lost data (if you do not move the data, then it will lose it all and recreate the directory structure under the new location).
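A minimal sketch of the copy step in that move, to be run only while the Cassandra service is stopped. It copies the tree and then compares relative paths and file sizes, so a partial copy is caught before cassandra.yaml is pointed at the new location. The function names are hypothetical and the size comparison is a cheap sanity check, not a checksum.

```python
import shutil
from pathlib import Path


def file_sizes(root: Path) -> dict:
    """Map each file's path relative to root to its size in bytes."""
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def copy_data_dir(old_dir: Path, new_dir: Path) -> int:
    """Copy old_dir to new_dir and check that every file arrived with the same size.

    Returns the number of files copied; raises RuntimeError on a mismatch.
    new_dir must not exist yet (shutil.copytree creates it).
    """
    shutil.copytree(old_dir, new_dir)
    before, after = file_sizes(old_dir), file_sizes(new_dir)
    if before != after:
        raise RuntimeError("copy is incomplete; keep using the old directory")
    return len(after)
```

Copying rather than moving leaves the old directory untouched until Cassandra has started cleanly against the new one.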
Data getting replicated is not a bad thing - it is what cassandra was designed for. I don't know what replication factor opscenter uses, but it does not store a massive amount of data so replication is not a problem.

Representing Network Drive as Cassandra Data Directory

I am new to Cassandra. To store data, we specify a local directory on the machine where Cassandra is installed, using the data_file_directories property in the cassandra.yaml configuration file. I need to set data_file_directories to a network directory (something like 192.x.x.x/data/files/). I am using only a single-node cluster for rapid data writes (for logging activities). As I don't rely on replication, my replication factor is 1. Can anyone help with defining a network directory as the Cassandra data directory?
Thanks in advance.
1) I have stored Cassandra data on an Amazon EBS volume (a network volume). In the EC2 case it is simple, as we can mount EBS volumes on a machine as if they were local.
2) In other cases you will have to use NFS to configure the network directory. I have never done this, but it looks straightforward.
Cassandra is firmly designed around using local storage instead of EBS or other network-mounted data. This gives you better performance, better reliability, and better cost-effectiveness.
