Measuring throughput in cassandra cluster - cassandra

This may sound as an elementary question - but I am confused with all the literature.
I have a 3-node cassandra cluster on 3.11.x,with 1 seed node.
We are testing brute force write throughput in this setup with a single threaded client seated outside the cluster.
With nodetool and cqsl at my disposal - how do I go about realistically assessing the following:
How much volume was processed by each node.
How much of the total time was consumed
a)by the actual flush + compaction at each node
b)time taken by the cluster to resolve the node/partition(hashing)
c)network latency in chaperoning the data to the node

The best way would be monitoring JMX port of Cassandra by a tool like Opennms or Zabbix and compare Metrics such as Mutation, compatcion, etc (Hundreds of Metrics there):
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsMonitoring.html

Related

What is the expected ingestion pace for a Cassandra cluster?

I am running a project that requires to load millions of records to cassandra.
I am using kafka connect and doing partitioning and raising 24 workers I only get around 4000 rows per second.
I did a test with pentaho pdi inserting straight to cassandra with jdbc driver and I get a litle bit less rows per second: 3860 (avg)
The cassandra cluster has 24 nodes. What is the expected insertion pace by default? how can i fine tune the ingestion of big loads of data?
There is no magical "default" rate that a Cassandra cluster can ingest data. One cluster can take 100K ops/sec, another can do 10M ops/sec. In theory, it can be limitless.
A cluster's throughput is determined by a lot of moving parts which include (but NOT limited to):
hardware configuration
number of cores, type of CPU
amount of memory, type of RAM
disk bandwidth, disk configuration
network capacity/bandwidth
data model
client/driver configuration
access patterns
cluster topology
cluster size
The only way you can determine the throughput of your cluster is by doing your own test on as close to production loads as you can simulate. Cheers!

when to add nodes to cassandra cluster

What are the symptoms/signs that indicates that the existing cluster nodes are over-capacity and that the cluster would need more nodes to be added? Want to know what are the possible performance symptoms after which more nodes are to be added to the cluster.
It depends a lot on the configuration and your use cases. You may have to take a look at the different metrics from your existing cluster. A few metrics that you should keep an eye on includes.
CPU usage
Query Latency
Memory (Depends how you are using the heap memory)
Disk usage
Based on these metrics, you should make a decision whether to scale out the cluster or not.
These are the common scenarios you should look after for adding a new node
Performance of the cluster is degraded. You are not getting required throughput even after all the tunings.
Require more Disk space- Generally you can increase the disk space by adding a new disk but after a limit(2TB generally) it is advised to add a new node.
You have metrices in your hand to identify that your performance is degrading. For example you can use nodetool tablehistograms to identify read and write latency for a particular table. If read/write latency is under your required latencies then you are good and if you see your system is getting slower with more traffic, then it is sign that you should add a node to the cluster.

In Cassandra 3.x, is there a way to set a limit on the cluster usage for each keyspace?

I am currently working on setting up a Cassandra cluster that will be used by different applications each with their own keyspace (in a multi-tenancy fashion).
So I was wondering if I could limitate the usage of my cluster for each keyspace individually.
For example, if keyspace1 is using 65% of the cluster resources, every new request on that keyspace would be put in a queue so it doesn't impact requests on other keyspaces.
I know I can get statistics on each keyspace using nodetool cfstats but I don't know how to take counter measures.
Cluster resources is also a term to define as it can be total CPU usage, JVM heap usage, or proportion of write/read on each keyspace on the cluster at instant t.
Also, if you have strategies to avoid entering into this kind of situation, I'm glad to hear about it !
No, Cassandra doesn't have such functionality. That's why it's recommended to setup separate clusters to isolate from noisy neighbors...
Theoretically you can do this on Docker/Kubernetes/... but it could take a lot of resources to build something working reliably.

Frequent Compaction of OpsCenter.rollup_state on all the nodes consuming CPU cycles

I am using Datastax Cassandra 4.8.16. With cluster of 8 DC and 5 nodes on each DC on VM's. For last couple of weeks we observed below performance issue
1) Increase drop count on VM's.
2) LOCAL_QUORUM for some write operation not achieved.
3) Frequent Compaction of OpsCenter.rollup_state and system.hints are visible in Opscenter.
Appreciate any help finding the root cause for this.
Presence of dropped mutations means that cluster is heavily overloaded. It could be increase of the main load, so it + load from OpsCenter, overloaded system - you need to look into statistics about number of requests, latencies, etc. per nodes and per tables, to see where increase happened. Please also check the I/O statistics on machines (for example, with iostat) - sizes of the queues, read/write latencies, etc.
Also it's recommended to use a dedicated OpsCenter cluster to store metrics - it could be smaller size, and doesn't require an additional license for DSE. How it said in the OpsCenter's documentation:
Important: In production environments, DataStax strongly recommends storing data in a separate DataStax Enterprise cluster.
Regarding VMs - usually it's not really recommended setup, but heavily depends on what kind of underlying hardware - number of CPUs, RAM, disk system.

Configure cassandra to use different network interfaces for data streaming and client connection?

I have a cassandra cluster deployed with 3 cassandra nodes with replication factor of 3. I have a lot of data being written to cassandra on daily basis (10-15GB). I have provisioned these cassandra on commodity hardware as suggested by "Big data community" and I am expecting the nodes to go down frequently which is handled using redundancy provided by cassandra.
My problem is, I have observed cassandra to slow down with writes when a new node is provisioned and the data is being streamed while bootstrapping. So, to overcome this hurdle, We have decided to have a separate network interface for inter-node communication and for client application to write data to cassandra. My question is how can this be configured, if at all this is possible ?
Any help is appreciated.
I think you are chasing the wrong solution.
I am confused by the fact that you only have 3 nodes, yet your concern is around slow writes while bootstrapping. Why? Are you planning to grow your cluster regularly? What is your consistency level on write, as this has a big impact on performance? Obviously if you only have 2 or 3 nodes and you're trying to bootstrap, you will see a slowdown, because you're tying up a significant percentage of your cluster to do the streaming.
Note that "commodity hardware" doesn't mean cheap, low-performance hardware. It just means you don't need the super high-end database-class machines used for databases like Oracle. You should still use really good commodity hardware. You may also need more nodes, as setting RF equal to cluster size is not typically a great idea.
Having said that, you can set your listen_address to the inter-node interface and rpc_address to the client address if you feel that will help.

Resources