How to add sstablesplit to an existing Cassandra cluster? - cassandra

I am a beginner with Cassandra and currently have a small cluster with a replication factor of 3 and most parameters set to their defaults.
What I noticed the other day is that the SSTables have become absolutely massive (>1TB), and the logs are starting to complain that they can no longer perform compactions. I've looked into it and decided to switch to LeveledCompactionStrategy (LCS), as well as to run sstablesplit on my existing SSTables.
However, at that point I noticed that sstablesplit did not come with my installation of Cassandra. Is there a way of installing just that tool? All the guides I've seen talk about installing the entire DataStax stack, which would probably interfere with my existing cluster or require a great deal of reinstalling, which I cannot do at the moment. The Cassandra installation was not set up by me.
At the same time, LCS is complaining that it cannot perform the re-compaction because it is trying to recompact all the SSTables at once, and since they now occupy slightly more than 50% of the disk, it cannot find enough free space to do so.
If sstablesplit is impossible (or inadvisable), is there any other way to resolve my issue of having several SSTables which are too massive to be re-compacted into more manageable chunks?
Thanks!

sstablesplit is part of the Cassandra codebase, so you can use it even if it wasn't packaged with your installation. Putting the cassandra-all jar and the jars in lib on the classpath is all you need to run it; that's all the sstablesplit script does: https://github.com/apache/cassandra/blob/trunk/tools/bin/sstablesplit.
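A rough sketch of invoking it by hand, assuming a tarball-style install under /opt/cassandra (the install path, keyspace, and table directory are placeholders; run it with Cassandra stopped on that node, since the splitter works on offline SSTables):
    # Placeholder paths -- point these at your actual install and table directory.
    CASSANDRA_HOME=/opt/cassandra
    CLASSPATH="$CASSANDRA_HOME/conf:$CASSANDRA_HOME/lib/*"
    # StandaloneSplitter is the class the sstablesplit script wraps;
    # -s is the target maximum SSTable size in MB (50 is the tool's default).
    java -ea -cp "$CLASSPATH" \
         -Dlogback.configurationFile=logback-tools.xml \
         org.apache.cassandra.tools.StandaloneSplitter \
         -s 50 /var/lib/cassandra/data/my_keyspace/my_table-*/*-Data.db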
Is this in AWS or some other cloud platform where you can get larger hosts temporarily? The easiest approach is to replace the hosts with new ones that have roughly twice the disk space, migrate to LCS, and then switch back to smaller hosts to keep costs down.

Related

Cassandra directory setup of replica datacenter

I have a Cassandra cluster and plan to add a new datacenter to replicate data. There will be no writes to this datacenter, only reads.
My questions are:
in this case is it still recommended to have separate drives for commit log and data?
if I know that my cluster will receive data only via hints (and lots of them), should I create a separate disk for the hints? I did not find any mention of this.
in this case is it still recommended to have separate drives for commit log and data?
The whole idea of putting your commit log on a separate mount point goes back to spinning disks being a chokepoint for I/O. If your cluster's nodes are backed by SSDs, you shouldn't need to do that.
if I know that my cluster will receive data only via hints (and lots of them), should I create a separate disk for the hints?
Hints only build up when a node is down. When your writes happen, the snitch handles routing them to all of the required replicas. So no, I wouldn't worry about putting your hints directory on a separate mount point, either.
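For reference, these locations are all plain settings in cassandra.yaml, so if you ever did want to split them across mount points it would look something like this (the paths are just examples):
    # cassandra.yaml -- example paths only
    data_file_directories:
        - /data/cassandra/data
    commitlog_directory: /commitlog/cassandra
    hints_directory: /data/cassandra/hints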

Is it possible to recover a Cassandra node without a snapshot?

Offsite backups for Cassandra seem like a challenging thing. You basically have to make yet another copy of ALL your data, including the copies of data that exist due to the replication factor. Snapshots make backups easy when you don't mind storing it on the same disk that your node already uses. I'm curious - in the event of a catastrophic failure of this disk, is it possible to recover the node using the nodes that the data was replicated to?
Yes, you can restore data on a crashed node using the procedure in the documentation - Replacing a dead node or dead seed node. It's written for Cassandra 3.x; pick your Cassandra version from the drop-down menu at the top of the page.
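The core of that procedure is starting the replacement node with the replace_address flag so it streams the dead node's data from the surviving replicas; roughly (the IP is a placeholder):
    # On the replacement node, in cassandra-env.sh (use the dead node's IP);
    # remove the flag again once the node has finished bootstrapping.
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.5"
    # Then start Cassandra and watch it stream data from the replicas.
    sudo service cassandra start
    nodetool netstats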
But please note that you still need to take backups if your data is valuable. If you are using AWS, you can use this project to back up Cassandra to S3 storage.
If you are looking for offsite or off-host backups, you can also look at OpsCenter from DataStax or Talena software (my company). Both provide the ability to back up your database locally or to S3. As you may expect, you also have the ability to restore data in case of hardware failures, user errors, or logical corruptions, which the replicas will not protect you against.
Yes, it is possible. Just run "nodetool repair" in a terminal on the node with the missing data. It can take a lot of time. I would also recommend running a repair on each node every month to keep your data fully replicated, because Cassandra does not repair data automatically (for example, after a node goes down).
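A crontab entry for that monthly repair might look like this (the keyspace name and schedule are just examples; -pr limits each node to its primary token ranges so the work isn't duplicated across nodes):
    # Run a primary-range repair on the 1st of every month at 02:00.
    0 2 1 * * /usr/bin/nodetool repair -pr my_keyspace >> /var/log/cassandra/repair.log 2>&1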

Point-In-Time Cassandra backup & recovery

I have read about Cassandra backup & recovery here, and have a few questions:
Do the native Cassandra CLI commands suffice? I see a lot of people writing scripts and custom-making their own solutions.
What other tools out there would you recommend for Cassandra backup and recovery? I am looking for something that can help me manage the backup images (e.g. with point-in-time)
Do I need to invest significantly more into storage if I opt to backup my Cassandra tables?
Any insights would be appreciated.
Please try to limit your questions to one actual question.
Do the native Cassandra CLI commands suffice?
I assume that you mean nodetool snapshot, so for the most part, "yes." In addition, many users choose to also enable incremental backups. The combination of snapshots and incremental backups (from the linked doc) "provides a dependable, up-to-date backup mechanism."
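Concretely, that combination looks something like this (the keyspace name and snapshot tag are placeholders):
    # Take a tagged snapshot of one keyspace.
    nodetool snapshot -t pre_upgrade my_keyspace
    # Incremental backups are controlled by "incremental_backups: true" in
    # cassandra.yaml, or can be toggled at runtime:
    nodetool enablebackup
    nodetool statusbackup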
I see a lot of people writing scripts and custom-making their own solutions.
I have a backup script that runs on my nodes nightly. There are two reasons for this.
I don't want to have to manually take a snapshot for each keyspace every week, so I have the script do it.
Snapshot and incremental backup files don't remove themselves, so I have the script do that after a certain time threshold.
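A simplified sketch of that kind of nightly script (the keyspace name, data path, and 7-day retention window are assumptions; adjust for your environment):
    #!/usr/bin/env bash
    set -euo pipefail
    KEYSPACE="my_keyspace"
    TAG="nightly_$(date +%Y%m%d)"
    DATA_DIR="/var/lib/cassandra/data"
    # 1. Take tonight's snapshot.
    nodetool snapshot -t "$TAG" "$KEYSPACE"
    # 2. Remove snapshot directories older than 7 days
    #    (they live under <table-dir>/snapshots/<tag>).
    find "$DATA_DIR/$KEYSPACE" -maxdepth 3 -type d -path '*/snapshots/nightly_*' -mtime +7 \
        -print0 | xargs -0 -r rm -rf
    # 3. Prune incremental backup files older than 7 days as well
    #    (they accumulate under <table-dir>/backups/).
    find "$DATA_DIR/$KEYSPACE" -maxdepth 2 -type d -name backups -print0 \
        | xargs -0 -I{} find {} -type f -mtime +7 -delete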
What other tools out there would you recommend for Cassandra backup and recovery?
DataStax OpsCenter allows you to schedule backups, but I believe that is only a valid option in the Enterprise edition. You could also look at Netflix's Cassandra backup/recovery tool called Priam. There's also a company called Talena which claims to provide an extensive enterprise-grade backup solution for Cassandra (I don't know anyone who uses them, but they hit me with a marketing email recently so I thought I'd mention it).
Do I need to invest significantly more into storage if I opt to backup my Cassandra tables?
Incremental backups and snapshots can take up a great deal of space if you don't stay on top of them (deleting and/or archiving them). I would try them both out and keep an eye on your disk usage while you do. If your business requirements specify a recovery window (how far back you need to be able to restore), you should be able to figure out how many days' worth of backups it makes sense to keep around. That should tell you whether or not you need more disk to fulfill those obligations.
Edit 20181205
Do you run nodetool snapshot on each node? What would be the approach if there are three nodes with 100% replication?
Typically yes, nodetool snapshot needs to be run on each node. This helps to ensure backup coverage, as not all of the nodes may be responsible for all of the data.
However, if your cluster runs in a configuration where the number of nodes equals your RF, then each node has a complete copy of the data. In that case, you would only need to run nodetool snapshot on one node, as long as you are confident that repairs are running regularly and your data is consistent.
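If you do run it on every node, a small wrapper keeps the snapshot tag consistent across the cluster (host names, SSH access, and the keyspace are assumptions; Ansible or pssh would work just as well):
    # Use the same tag on every node so the snapshots can be correlated later.
    TAG="pit_$(date +%Y%m%d%H%M)"
    for host in cass-node1 cass-node2 cass-node3; do
        ssh "$host" "nodetool snapshot -t $TAG my_keyspace"
    done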
With regards to point-in-time backup and recovery of Cassandra, there are a few aspects that you need to consider depending on what your needs and limitations are:
Storage Footprint
All the solutions available today will put a big strain on your infrastructure as they would require you to store 3x the data that you absolutely need to, assuming you have a replication factor of 3.
I agree with @Aaron: you need to manage the snapshots yourself because the tools will not do "garbage collection" for you :)
Failure resiliency
All the solutions out there, OpsCenter and others, provide limited failure resiliency. You will lose data if a Cassandra node goes down during a backup window.
This situation is exacerbated when you have incremental backups and a node failure happens during an incremental backup.
Recovery time/speed
Note that you may have to go through a “repair” process during recovery. This is needed because the node level snapshots that the native tools provide are not consistent across the cluster.
Depending on your RTO/RPO needs, this may not be adequate. I suggest you test both the backup and recovery times for your operations before you arrive at any solution.
If you are looking for enterprise grade solution for backup and recovery of Cassandra, you may want to check out the solution offered by “Datos IO”. It reduces your storage footprint by 3x while also providing failure resiliency and cluster consistency.

Cassandra: how to move a database from one server to another quickly?

It's a backup/restore question. We have a single node running Cassandra. We have also rsync'ed the /var/lib/cassandra folder to another (backup) server.
In case of emergency, we can quickly boot a new server and transfer the files there. But the question is: will it work? Let's assume we have the same Cassandra version on both the old and the new server, and the same OS version. Is it enough to simply transfer the whole /var/lib/cassandra folder? As you understand, backups are a critical thing, so I want to be sure everything will be OK.
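For reference, the copy itself is roughly this (host name and paths are placeholders):
    # Flush memtables so the on-disk state is current, then mirror the data directory.
    nodetool flush
    rsync -a --delete /var/lib/cassandra/ backup-host:/backup/cassandra/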
(we are currently using the dsc2.0 package from Ubuntu's repos)
Yes, I know that running a 'normal' cluster with 2-3 nodes would be a better choice; both performance and reliability would increase. But for now we have what we have - it's a single node, and for various reasons we will not switch to a multi-node cluster right now.
Thanks!

Enabling vNodes in Cassandra 1.2.8

I have a 4-node cluster and I have upgraded all the nodes from an older version to Cassandra 1.2.8. The total data in the cluster is about 8 GB. Now I need to enable vnodes on all 4 nodes of the cluster without any downtime. How can I do that?
As Nikhil said, you need to increase num_tokens and restart each node. This can be done one node at a time with no downtime.
However, increasing num_tokens doesn't cause any data to redistribute so you're not really using vnodes. You have to redistribute it manually via shuffle (explained in the link Lyuben posted, which often leads to problems), by decommissioning each node and bootstrapping back (which will temporarily leave your cluster extremely unbalanced with one node owning all the data), or by duplicating your hardware temporarily just like creating a new data center. The latter is the only reliable method I know of but it does require extra hardware.
In conf/cassandra.yaml you will need to comment out the initial_token parameter and enable the num_tokens parameter (256 by default, I believe). Do this for each node, then restart the Cassandra service on each node and wait for the data to be redistributed throughout the cluster. 8 GB should not take too much time (provided your nodes are all in the same cluster), and read requests will still be served, though you might see degraded performance until the redistribution of data is complete.
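The relevant part of cassandra.yaml on each node ends up looking like this (256 is the usual default token count):
    # initial_token:       <- comment out any explicitly assigned token
    num_tokens: 256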
EDIT: Here is a potential strategy to migrate your data:
Decommission two nodes of the cluster. The token-space should get distributed 50-50 between the other two nodes.
On the two decommissioned nodes, remove the existing data and restart the Cassandra daemon with a different cluster name and with the num_tokens parameter enabled.
Migrate the 8 GB of data from the old cluster to the new cluster. You could write a quick script in Python to achieve this (a cqlsh-based alternative is sketched after these steps). Since the volume of data is small, this should not take too much time.
Once the data is migrated in the new cluster, decommission the two old nodes from the old cluster. Remove the data and restart Cassandra on them, with the new cluster name and the num_tokens parameter. They will bootstrap and data will be streamed from the two existing nodes in the new cluster. Preferably, only bootstrap one node at a time.
With these steps, you should never face a situation where your service is completely down. You will be running with reduced capacity for some time, but again since 8GB is not a large volume of data you might be able to achieve this quickly enough.
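For step 3, one simple alternative to a custom Python script is cqlsh's COPY command (keyspace, table, and host names are placeholders; run both commands from the same machine, since COPY reads and writes the CSV on the cqlsh client side; for larger volumes, sstableloader is the more robust choice):
    # Export from the old cluster, then import into the new one.
    cqlsh old-cluster-node -e "COPY my_keyspace.my_table TO '/tmp/my_table.csv'"
    cqlsh new-cluster-node -e "COPY my_keyspace.my_table FROM '/tmp/my_table.csv'"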
TL;DR:
No, you need to restart the servers once the config has been edited.
The problem is that enabling vnodes means a lot of the data is redistributed randomly (the docs say in a vein similar to the classic 'nodetool move').
