The disk read/write rate and cpu usage of cassandra db intermittently bounce - cassandra

The disk read/write rate and cpu usage of cassandra db intermittently bounce.
Casssandra was installed with docker, and node exporter and process exporter were used for monitoring. Node and process exporter are all installed with Docker.
I checked the process exporter at the time it bounced. The process that consumed the most resources during the bounced time has Java in the groupname. I'm guessing that there might be a problem with cassandra java.
No more special traffic came in at the time of the bounce.
It does not match the compaction cycle.
Clustering is not broken.
cassandra version is 4.0.3

In Cassandra 4 you have the ability to access swiss java knife (sjk) via nodetool and one of the things you get access to is ttop.
If you run the following in your cassandra env during the time your cpu is spiking you can see which threads are the top consumers, which then allows you to dial in on those threads specifically to see if there is an actual problem.
nodetool sjk ttop >> $(hostname -i)_ttop.out
Allow that to run to completion (during a period of reported high cpu), or at least for 5-10min or so if you decide to kill it early. This will collect a new iteration every few seconds, so once complete, parse the results to see which threads are regularly top consumers and what percentage of the cpu they are actually using, then you'll have a targeted approach at where to troubleshoot for potential problems in the jvm.
If nothing good turns up, go for a thread dump next for a more complete look and I recommend the following script:
https://github.com/brendancicchi/collect-thread-dumps

Related

How to increase connection per host while cassandra streaming

While I am running nodetool decommission, I want to use 100% of my network. I set "nodetool setstreamthroughput 0". At the beginning, since the node on which decommission process started sends multiple nodes, The node can send data at speed 900Mbps. Later, since number of nodes that transferred is reducing, the node can send data like 300Mbps.
I see that the node sends one SSTable to one node. I want to increase the parallelism. nodetool says that one connection per hosts. How can I increase this setting. I mean "multiple connection per hosts" while I am streaming?
Most likely Cassandra 3.0 will not be able to utilize 100% of your network regardless of how you set it up. Even with multiple threads you push up against a point where the allocation rate of objects generated in the streaming will exceed what the jvm can clean up and then your GC pauses will only be able give you 100% for short periods. Kind of moot though as you cannot configure it to use more threads.
In cassandra 4.0 you will be able to achieve this: http://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html

Can I "nice" Cassandra nodetool repair

Will any issues arise if I deprioritize the Cassandra "nodetool repair" command using "nice" ? It causes high CPU "user time" load and is having a negative impact on our production systems, causing API timeouts on our Usergrid implementation. I see documentation on limiting network throughput, but iowait does not appear to be the issue. Additionally, are there any good methods for mitigating this problem?
The nodetool command doesn't actually do any work. It just calls a JMX operation in C* to kick off the repair and then listens for updates to print out. Doing nice wont make any difference. There are a couple main phases to the repair
build merkle trees (on each node)
stream changes
compactions
Possibly the validation compaction (on some versions can be controlled with compaction throttle) or the streams (can set stream throughput via nodetool or cassandra.yaml) are burning your CPU. If so can try using the throttles, but in some versions it wont make a difference.
After the repair is completed there are normal compactions that kick off for anti compaction in incremental repairs, and also for full repairs if theres a lot of differences streamed. Some problems are very version specific, so pay attention to logs and when CPU is high to drill down more.

Cassandra CPU performance

I deployed a Cassandra 2.2 ring composed by 4 nodes in the cloud with 8 vCPU and 8GB of ram. I am running some tests now with cassandra-stress and YCSB tools to test its performance. I am mainly interested in read requests with a small amount of write requests (95%/5%).
Running the experiments, I noticed that even setting a high number of threads (or clients) the CPU (and disk) does not saturate, but still always around the 60% of utilisation.
I am trying to figure out where is the bottleneck in my system. From the hardware point of view it seems all ok to me.
dstat
I also looked into the Cassandra configuration file to see if there are some tuning parameters to increase the system throughput. I increase the value of concurrent_read/write parameter, but it doesn't increase the performance.
The log file also does not contain any warning.
What it could be that is limiting my system?
Thanks
You might want to consider running cassandra-stress from outside the cluster and on multiple instances as described in
Usage of the Cassandra tool cassandra-stress

Why is my client CPU utilization so high when I use a cassandra cluster?

This is a follow-on question to Why is my cassandra throughput not improving when I add nodes?. I have configured my client and nodes as closely as I could to what is recommended here: http://docs.datastax.com/en/cassandra/2.1/cassandra/install/installRecommendSettings.html. The whole setup is not exactly world class (the client is on a laptop with 32G of RAM and a modern'ish processor, for example). I am more interested in developing an intuition for the cassandra infrastructure at this point.
I notice that if I shut down all but one of the nodes in the cluster and run my test client against it, I get a throughput of ~120-140 inserts/s and a CPU utilization of ~30-40%. When I crank up all 6 nodes and run this one client against them, I see a throughput of ~110-120 inserts/s and my CPU utilization gets to between ~80-100%.
All my tests are run with a clean DB (I completely delete all DB files and restart from scratch) and I insert 30M rows.
My test client is multi-threaded and each thread writes exclusively to one partition using unlogged batches, as recommend by various sources for a schema like mine (e.g. https://lostechies.com/ryansvihla/2014/08/28/cassandra-batch-loading-without-the-batch-keyword/).
Is this CPU spike expected behavior?

Cassandra compaction tasks stuck

I'm running Datastax Enterprise in a cluster consisting of 3 nodes. They are all running under the same hardware: 2 Core Intel Xeon 2.2 Ghz, 7 GB RAM, 4 TB Raid-0
This should be enough for running a cluster with a light load, storing less than 1 GB of data.
Most of the time, everything is just fine but it appears that sometimes the running tasks related to the Repair Service in OpsCenter sometimes get stuck; this causes an instability in that node and an increase in load.
However, if the node is restarted, the stuck tasks don't show up and the load is at normal levels again.
Because of the fact that we don't have much data in our cluster we're using the min_repair_time parameter defined in opscenterd.conf to delay the repair service so that it doesn't complete too often.
It really seems a little bit weird that the tasks that says that are marked as "Complete" and are showing a progress of 100% don't go away, and yes, we've waited hours for them to go away but they won't; the only way that we've found to solve this is to restart the nodes.
Edit:
Here's the output from nodetool compactionstats
Edit 2:
I'm running under Datastax Enterprise v. 4.6.0 with Cassandra v. 2.0.11.83
Edit 3:
This is output from dstat on a node that behaving normally
This is output from dstat on a node with stucked compaction
Edit 4:
Output from iostat on node with stucked compaction, see the high "iowait"
azure storage
Azure divides disk resources among storage accounts under an individual user account. There can be many storage accounts in an individual user account.
For the purposes of running DSE [or cassandra], it is important to note that a single storage account should not should not be shared between more than two nodes if DSE [or cassandra] is configured like the examples in the scripts in this document. This document configures each node to have 16 disks. Each disk has a limit of 500 IOPS. This yields 8000 IOPS when configured in RAID-0. So, two nodes will hit 16,000 IOPS and three would exceed the limit.
See details here
So, this has been an issue that have been under investigation for a long time now and we've found a solution, however, we aren't sure what the underlaying problem that were causing the issues were but we got a clue even tho that, nothing can be confirmed.
Basically what we did was setting up a RAID-0 also known as Striping consisting of four disks, each at 1 TB of size. We should have seen somewhere 4x one disks IOPS when using the Stripe, but we didn't, so something was clearly wrong with the setup of the RAID.
We used multiple utilities to confirm that the CPU were waiting for the IO to respond most of the time when we said to ourselves that the node was "stucked". Clearly something with the IO and most probably our RAID-setup was causing this. We tried a few differences within MDADM-settings etc, but didn't manage to solve the problems using the RAID-setup.
We started investigating Azure Premium Storage (which still is in preview). This enables attaching disks to VMs whose underlaying physical storage actually are SSDs. So we said, well, SSDs => more IOPS, so let us give this a try. We did not setup any RAID using the SSDs. We are only using one single SSD-disk per VM.
We've been running the Cluster for almost 3 days now and we've stress tested it a lot but haven't been able to reproduce the issues.
I guess we didn't came down to the real cause but the conclusion is that some of the following must have been the underlaying cause for our problems.
Too slow disks (writes > IOPS)
RAID was setup incorrectly which caused the disks to function non-normally
These two problems go hand-in-hand and most likely is that we basically just was setting up the disks in the wrong way. However, SSDs = more power to the people, so we will definitely continue using SSDs.
If someone experience the same problems that we had on Azure with RAID-0 on large disks, don't hesitate to add to here.
Part of the problem you have is that you do not have a lot of memory on those systems and it is likely that even with only 1GB of data per node, your nodes are experiencing GC pressure. Check in the system.log for errors and warnings as this will provide clues as to what is happening on your cluster.
The rollups_60 table in the OpsCenter schema contains the lowest (minute level) granularity time series data for all your Cassandra, OS, and DSE metrics. These metrics are collected regardless of whether you have built charts for them in your dashboard so that you can pick up historical views when needed. It may be that this table is outgrowing your small hardware.
You can try tuning OpsCenter to avoid this kind of issues. Here are some options for configuration in your opscenterd.conf file:
Adding keyspaces (for example the opsc keyspace) to your ignored_keyspaces setting
You can also decrease the TTL on this table by tuning the 1min_ttlsetting
Sources:
Opscenter Config DataStax docs
Metrics Config DataStax Docs

Resources