Cassandra: how to properly implement "global" back-pressure with multiple applications?

As you know, with Cassandra, overloaded nodes can seriously hurt your production depending on the required consistency: nodes might become unresponsive, the daemon might crash outright, hints might fill up your data mount point, and so on.
So the keyword here is back-pressure.
To apply appropriate back-pressure with Spark on Cassandra, the key properties are:
--conf "spark.cassandra.output.throughputMBPerSec=2"
--total-executor-cores 24
(There are similar back-pressure options in the DataStax drivers and cqlsh. You basically limit the throughput per core to apply some back-pressure.)
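For example, with the DataStax Java driver 4.x, a per-application rate limit can be declared in application.conf via the built-in request throttler (the numbers below are illustrative, not a recommendation):

```
datastax-java-driver {
  advanced.throttler {
    class = RateLimitingRequestThrottler
    # Illustrative values: tune to the capacity you measured on your cluster
    max-requests-per-second = 500
    max-queue-size = 10000
    drain-interval = 10 milliseconds
  }
}
```

This caps one application, which is exactly the limitation the question is about: each application throttles itself, but nothing coordinates the total across applications.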
Let's say I have found the global write throughput of my Cassandra cluster, and I set appropriate settings for my application1, which works fine.
BUT the challenge remains that many developers share a Cassandra cluster. So at a given time, I may have Spark application1, application2, application3, ... running concurrently.
Question: what are my options to ensure that the combined write throughput at a given time (no matter how many applications run concurrently) is globally NOT going to put too much pressure on Cassandra, thus hurting my production workload?
Thank you

What I recommend for separating analytical workloads is to spin up another (logical) data center. Sure, it could be in the same physical data center, but what you want is separate compute and storage to keep the analytics load from interfering with the production traffic.
First, make sure that you're running with the GossipingPropertyFileSnitch (cassandra.yaml) and that your keyspaces are using the NetworkTopologyStrategy. Likewise, you'll want to make sure that your keyspace definition contains a named data center and that your production application/services are configured to use that data center (ex: dc1 as below) as their default DC:
ALTER KEYSPACE product_data WITH
REPLICATION={'class':'NetworkTopologyStrategy',
'dc1':'3'};
Once the new infra is up, install Cassandra and join the nodes to the cluster as a new DC by specifying the new name in the cassandra-rackdc.properties file. Something like:
dc=dc1_analytics
rack=rack1
Next adjust your keyspace(s) to replicate data to the new DC.
ALTER KEYSPACE product_data WITH
REPLICATION={'class':'NetworkTopologyStrategy',
'dc1':'3','dc1_analytics':'3'};
Run a repair/rebuild on the new DC, and then configure the Spark jobs to only use dc1_analytics.
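Those last two steps could look like this (a sketch: dc1 is the source DC to stream from, and the connector property is spark.cassandra.connection.localDC in Spark Cassandra Connector 3.x; check the name for your connector version):

```shell
# On each node in the new DC, stream the existing data over from dc1
nodetool rebuild -- dc1

# Then pin the Spark jobs to the analytics DC (and use a LOCAL_* consistency level)
spark-submit \
  --conf "spark.cassandra.connection.localDC=dc1_analytics" \
  --conf "spark.cassandra.output.throughputMBPerSec=2" \
  ...
```

With the jobs pinned to dc1_analytics and using LOCAL_ONE/LOCAL_QUORUM, the analytics reads and writes stay on the analytics nodes, and only replication traffic crosses into dc1.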

Related

In Cassandra 3.x, is there a way to set a limit on the cluster usage for each keyspace?

I am currently setting up a Cassandra cluster that will be used by different applications, each with its own keyspace (in a multi-tenancy fashion).
So I was wondering if I could limit the usage of my cluster for each keyspace individually.
For example, if keyspace1 is using 65% of the cluster's resources, every new request on that keyspace would be queued so it doesn't impact requests on other keyspaces.
I know I can get statistics on each keyspace using nodetool cfstats, but I don't know how to take countermeasures.
"Cluster resources" is also a term to define, as it could mean total CPU usage, JVM heap usage, or the proportion of reads/writes on each keyspace at a given instant.
Also, if you have strategies to avoid getting into this kind of situation, I'm glad to hear about them!
No, Cassandra doesn't have such functionality. That's why it's recommended to set up separate clusters to isolate yourself from noisy neighbors...
Theoretically you could do this with Docker/Kubernetes/etc., but it could take a lot of resources to build something that works reliably.

How does Cassandra improve performance by adding nodes?

I'm going to build an Apache Cassandra 3.11.x cluster with 44 nodes. Each application server will have one cluster node so that applications do reads/writes locally.
I have a couple of questions running through my mind; kindly answer if possible.
1. How many server IPs should be mentioned in the seed node parameter?
2. How does HA work when all of the mentioned seed nodes go down?
3. What is the disadvantage of mentioning all the server IPs in the seed node parameter?
4. How does Cassandra scale with respect to data, beyond the primary key and tunable consistency? My assumption is that the replication factor can improve the chances of HA but not performance,
so how does performance increase by adding more nodes?
5. Is there any sharding mechanism in Cassandra?
Answers are in order:
It's recommended to point to at least 2 seed nodes per DC.
The seed/contact nodes are used only for the initial bootstrap: when your program reaches any of the listed nodes, it "learns" the topology of the whole cluster, and from then on the driver listens for node status changes and adjusts its list of available hosts. So even if the seed node(s) go down after the connection is established, the driver will still be able to reach the other nodes.
It's usually harder to maintain: you need to keep the configuration parameters for your driver and the list of nodes in sync.
When you have RF > 1, Cassandra may read or write data from/to any replica; the consistency level regulates how many nodes must answer for a read or write operation. When you add a new node, data is redistributed to it, and if you have chosen your partition key well, the new node starts receiving requests in parallel with the old nodes.
The partition key determines which replica(s) hold the data associated with it, so you can think of it as a shard. But be careful when choosing a partition key: it's easy to create partitions that are too big, or partitions that are "hot" (receiving most of the operations in the cluster; for example, using the date as the partition key while always writing and reading today's data).
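To illustrate the hot-partition point with a hypothetical time-series table (all names are made up for the example): partitioning by the raw date funnels all of today's writes to one replica set, while adding a bucket to the partition key spreads them out.

```sql
-- Hot: every write for a given day hits the same partition
CREATE TABLE events_by_day (
    day         date,
    event_time  timestamp,
    payload     text,
    PRIMARY KEY (day, event_time)
);

-- Better: a synthetic bucket (e.g. hash(event_id) % 16, computed at
-- write time) spreads one day's writes across 16 partitions
CREATE TABLE events_by_day_bucketed (
    day         date,
    bucket      int,
    event_time  timestamp,
    payload     text,
    PRIMARY KEY ((day, bucket), event_time)
);
```

The trade-off is that reading a full day now requires querying all 16 buckets, so pick a bucket count that matches your read pattern.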
P.S. I would also recommend reading the DataStax Architecture guide; it contains a lot of information about Cassandra.

Repartitioning of data in Cassandra when adding new servers

Let's imagine I have a Cassandra cluster with 3 nodes, each having 100GB of available hard disk space. The replication factor for this cluster is set to 3, and the R/W CLs are set to 2, meaning I can tolerate one of my nodes going down without sacrificing consistency or availability.
Now imagine my servers have started to fill up (80GB each, as an example) and I would like to add another 3 servers of the same specification to my cluster, maintaining the same CLs and RF.
My question is: after I've added the new nodes to my cluster and run the node repair tool, is it fair to assume that each of my nodes should contain roughly (give or take a few GB) 40GB of data?
If not, how can I add new nodes without the fear of running out of hard disk space?
A little background on why I'm asking: I am developing an app that connects to a server running Cassandra for its data storage. As this is developed only by me, and I have limited money to buy servers, I've decided to buy small, cheap "servers" instead of the more expensive rack options, but I'm really worried about the nodes running out of space if the disk allocation is not (at least partially) homogeneous.
Many thanks for your help.
My question is: after I've added the new nodes to my cluster and run
the node repair tool, is it fair to assume that each of my nodes
should contain roughly (give or take a few GB) 40GB of data?
After also running nodetool cleanup, you should see roughly 40GB of data on each node. Cleanup removes data that a node is no longer responsible for; if you don't run this command, the old data will remain on the machines.
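In other words, the order of operations would be roughly (a sketch):

```shell
# 1. Bootstrap the three new nodes one at a time, then repair:
nodetool repair

# 2. On each of the ORIGINAL three nodes, drop the ranges they no longer own:
nodetool cleanup

# 3. Verify the per-node load is now roughly balanced:
nodetool status
```

Cleanup rewrites SSTables and is I/O-intensive, so run it on one node at a time during a quiet period.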

Configure cassandra to use different network interfaces for data streaming and client connection?

I have a Cassandra cluster of 3 nodes with a replication factor of 3. A lot of data is written to Cassandra on a daily basis (10-15GB). I have provisioned these Cassandra nodes on commodity hardware, as suggested by the "big data community", and I expect nodes to go down frequently, which is handled by the redundancy Cassandra provides.
My problem is that I have observed Cassandra writes slowing down when a new node is provisioned and data is being streamed to it during bootstrap. To overcome this hurdle, we have decided to use a separate network interface for inter-node communication and another for client applications writing data to Cassandra. My question is: how can this be configured, if it is possible at all?
Any help is appreciated.
I think you are chasing the wrong solution.
I am confused by the fact that you only have 3 nodes, yet your concern is slow writes while bootstrapping. Why? Are you planning to grow your cluster regularly? What is your consistency level on writes, as this has a big impact on performance? Obviously if you only have 2 or 3 nodes and you're trying to bootstrap, you will see a slowdown, because you're tying up a significant percentage of your cluster to do the streaming.
Note that "commodity hardware" doesn't mean cheap, low-performance hardware; it just means you don't need the super high-end, database-class machines used for databases like Oracle. You should still use good commodity hardware. You may also need more nodes, as setting RF equal to the cluster size is not typically a great idea.
Having said that, you can set listen_address to the inter-node interface and rpc_address to the client-facing interface if you feel that will help.
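A minimal cassandra.yaml sketch, assuming 10.0.0.x is the private inter-node network and 192.168.1.x is the client-facing one (the addresses are illustrative):

```yaml
# cassandra.yaml (set per node, with that node's own addresses)
listen_address: 10.0.0.11      # gossip, replication and streaming traffic
rpc_address: 192.168.1.11      # CQL client connections (native transport)
```

All nodes must agree on which network is used for inter-node traffic, and clients must only be given the rpc_address side.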

Enabling vNodes in Cassandra 1.2.8

I have a 4-node cluster, and I have upgraded all the nodes from an older version to Cassandra 1.2.8. The total data in the cluster is 8 GB. Now I need to enable vnodes on all 4 nodes of the cluster without any downtime. How can I do that?
As Nikhil said, you need to increase num_tokens and restart each node. This can be done one node at a time with no downtime.
However, increasing num_tokens doesn't cause any data to redistribute, so you're not really using vnodes. You have to redistribute it manually: via shuffle (explained in the link Lyuben posted, which often leads to problems), by decommissioning each node and bootstrapping it back (which temporarily leaves your cluster extremely unbalanced, with one node owning all the data), or by temporarily duplicating your hardware, just like creating a new data center. The latter is the only reliable method I know of, but it does require extra hardware.
In conf/cassandra.yaml you will need to comment out the initial_token parameter and enable the num_tokens parameter (256 by default, I believe). Do this for each node, then restart the Cassandra service on each node and wait for the data to be redistributed throughout the cluster. 8 GB should not take too much time (provided your nodes are all in the same cluster), and read requests will still be served, though you might see degraded performance until the redistribution of data is complete.
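Concretely, the edit on each node's conf/cassandra.yaml looks something like this (256 was the default at the time):

```yaml
# conf/cassandra.yaml
# initial_token:        <- comment out the single-token assignment
num_tokens: 256
```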
EDIT: Here is a potential strategy to migrate your data:
Decommission two nodes of the cluster. The token space should get distributed 50/50 between the other two nodes.
On the two decommissioned nodes, remove the existing data and restart the Cassandra daemon with a different cluster name and with the num_tokens parameter enabled.
Migrate the 8 GB of data from the old cluster to the new one. You could write a quick Python script to achieve this; since the volume of data is small, it should not take too much time.
Once the data has been migrated to the new cluster, decommission the two old nodes from the old cluster. Remove their data and restart Cassandra on them with the new cluster name and the num_tokens parameter. They will bootstrap, and data will be streamed from the two existing nodes in the new cluster. Preferably, bootstrap only one node at a time.
With these steps, you should never face a situation where your service is completely down. You will be running with reduced capacity for some time, but since 8 GB is not a large volume of data, you should be able to get through this quickly enough.
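For the migration step, at this data volume a quick alternative to a custom Python script is cqlsh's COPY command (a sketch; the hosts and keyspace/table names are placeholders):

```shell
# Export each table from the old cluster to CSV...
cqlsh old-node-1 -e "COPY my_ks.my_table TO '/tmp/my_table.csv'"

# ...and load it into the new cluster
cqlsh new-node-1 -e "COPY my_ks.my_table FROM '/tmp/my_table.csv'"
```

COPY is single-machine and CSV-based, so it only makes sense for small datasets like the 8 GB here.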
TL;DR:
No, you need to restart the servers once the config has been edited.
The problem is that enabling vnodes means a lot of the data is redistributed randomly (the docs say in a vein similar to the classic nodetool move).
