Frequent Compaction of OpsCenter.rollup_state on all the nodes consuming CPU cycles - cassandra

I am using Datastax Cassandra 4.8.16. With cluster of 8 DC and 5 nodes on each DC on VM's. For last couple of weeks we observed below performance issue
1) Increase drop count on VM's.
2) LOCAL_QUORUM for some write operation not achieved.
3) Frequent Compaction of OpsCenter.rollup_state and system.hints are visible in Opscenter.
Appreciate any help finding the root cause for this.

Presence of dropped mutations means that cluster is heavily overloaded. It could be increase of the main load, so it + load from OpsCenter, overloaded system - you need to look into statistics about number of requests, latencies, etc. per nodes and per tables, to see where increase happened. Please also check the I/O statistics on machines (for example, with iostat) - sizes of the queues, read/write latencies, etc.
Also it's recommended to use a dedicated OpsCenter cluster to store metrics - it could be smaller size, and doesn't require an additional license for DSE. How it said in the OpsCenter's documentation:
Important: In production environments, DataStax strongly recommends storing data in a separate DataStax Enterprise cluster.
Regarding VMs - usually it's not really recommended setup, but heavily depends on what kind of underlying hardware - number of CPUs, RAM, disk system.

Related

when to add nodes to cassandra cluster

What are the symptoms/signs that indicates that the existing cluster nodes are over-capacity and that the cluster would need more nodes to be added? Want to know what are the possible performance symptoms after which more nodes are to be added to the cluster.
It depends a lot on the configuration and your use cases. You may have to take a look at the different metrics from your existing cluster. A few metrics that you should keep an eye on includes.
CPU usage
Query Latency
Memory (Depends how you are using the heap memory)
Disk usage
Based on these metrics, you should make a decision whether to scale out the cluster or not.
These are the common scenarios you should look after for adding a new node
Performance of the cluster is degraded. You are not getting required throughput even after all the tunings.
Require more Disk space- Generally you can increase the disk space by adding a new disk but after a limit(2TB generally) it is advised to add a new node.
You have metrices in your hand to identify that your performance is degrading. For example you can use nodetool tablehistograms to identify read and write latency for a particular table. If read/write latency is under your required latencies then you are good and if you see your system is getting slower with more traffic, then it is sign that you should add a node to the cluster.

Cassandra cluster planning

I would like to know about hardware limitations in cluster planning (in TBs) specific to my use case. I have read few threads and documents related to it but some content seem to be over 5 years old. Thought of giving it a shot again:
Use case: Building a time-series cassandra cluster where there is from time-to-time bulk loading from data sources which are in Gigabytes. However, the end-user will majorly be focused in reading the data from the cluster. Quite rarely will be some update or delete on the rows
I have an initial hardware configuration with me to setup Cassandra cluster:
2*12 Cores
128 GB RAM
HDD SAS 3.27 TB
This is the initial plan that I come up with:
When I now speculate over the setup, and after reading the post:
should I further divide my nodes with lesser RAM, vCPUs and HDD?
If yes, what would be the good fit wrt my case?

Increasing Number of VMs decreases Cassandra Throughput. What can be reason?

I am using YCSB benchmarking tool to benchmark Cassandra cluster.
I am varying the number of Virtual machines in the cluster.
I am using 1 physical host and I am using 1,2,3,4 virtual machines for benchmarking(as shown in attached figure).
The generated workload is same all the time Workload C 10,000,00 operations, 10,000 records
Each VM has 2 GB RAM, 20GB drive
Cassandra - 1 seed node, endpoint_snitch - gossipproperty
Keyspace YCSB - Replication factor 3,
The problem is that when I increase the number of virtual machines in the cluster, the throughput decreases. What can be the reason?
By definition, by increasing compute resources(i.e virtual machines), the cluster should offer better performance, but the opposite is happening as shown in attached figure. Kindly explain what can be the probable reason for this? I am writing my thesis on this topic but I am unable to figure out the reason for this, please help, I will be grateful to you.
Throughput observed by varying number of VMs in Cassandra cluster:
Very likely hitting a disk IO bottleneck. Especially with non ssd drives this is completely expected. Unless you have dedicated disk/cpu per vm the competition for resources will cause contention like this. Also 2gb per vm is not enough to do any kind of performance benchmark with Cassandra since the minimum recommended JVM heap size is 8gb.
Cassandra is great at horizontal scaling (nearly linear), but that doesn't mean that simply adding VMs to one physical host will increase throughput - a single VM on the physical host will have less contention for resources (disk, cpu, memory, network) than 4, so it's likely one VM would perform better than 4.
By definition, if you WERE increasing resources, you SHOULD see it perform better - but you're not, you're simply adding contention to existing resources. If you want to scale cassandra, you need to test it with additional physical resources - more physical machines, not more VMs on the same machine.
Finally, as Chris Lohfink mentions, your VMs are too small to do meaningful tests - 8GB JVM heap is recommended, with another 8GB of vm page cache to support reads - running Cassandra with less than 16G of RAM is typically non-ideal in production.
You're trying to test a jet engine (a distribute database designed for hundreds or thousands of physical nodes) with a gas station level equipment - your benchmark hardware isn't viable for a real production environment, so your benchmark results aren't meaningful.

Cassandra compaction tasks stuck

I'm running Datastax Enterprise in a cluster consisting of 3 nodes. They are all running under the same hardware: 2 Core Intel Xeon 2.2 Ghz, 7 GB RAM, 4 TB Raid-0
This should be enough for running a cluster with a light load, storing less than 1 GB of data.
Most of the time, everything is just fine but it appears that sometimes the running tasks related to the Repair Service in OpsCenter sometimes get stuck; this causes an instability in that node and an increase in load.
However, if the node is restarted, the stuck tasks don't show up and the load is at normal levels again.
Because of the fact that we don't have much data in our cluster we're using the min_repair_time parameter defined in opscenterd.conf to delay the repair service so that it doesn't complete too often.
It really seems a little bit weird that the tasks that says that are marked as "Complete" and are showing a progress of 100% don't go away, and yes, we've waited hours for them to go away but they won't; the only way that we've found to solve this is to restart the nodes.
Edit:
Here's the output from nodetool compactionstats
Edit 2:
I'm running under Datastax Enterprise v. 4.6.0 with Cassandra v. 2.0.11.83
Edit 3:
This is output from dstat on a node that behaving normally
This is output from dstat on a node with stucked compaction
Edit 4:
Output from iostat on node with stucked compaction, see the high "iowait"
azure storage
Azure divides disk resources among storage accounts under an individual user account. There can be many storage accounts in an individual user account.
For the purposes of running DSE [or cassandra], it is important to note that a single storage account should not should not be shared between more than two nodes if DSE [or cassandra] is configured like the examples in the scripts in this document. This document configures each node to have 16 disks. Each disk has a limit of 500 IOPS. This yields 8000 IOPS when configured in RAID-0. So, two nodes will hit 16,000 IOPS and three would exceed the limit.
See details here
So, this has been an issue that have been under investigation for a long time now and we've found a solution, however, we aren't sure what the underlaying problem that were causing the issues were but we got a clue even tho that, nothing can be confirmed.
Basically what we did was setting up a RAID-0 also known as Striping consisting of four disks, each at 1 TB of size. We should have seen somewhere 4x one disks IOPS when using the Stripe, but we didn't, so something was clearly wrong with the setup of the RAID.
We used multiple utilities to confirm that the CPU were waiting for the IO to respond most of the time when we said to ourselves that the node was "stucked". Clearly something with the IO and most probably our RAID-setup was causing this. We tried a few differences within MDADM-settings etc, but didn't manage to solve the problems using the RAID-setup.
We started investigating Azure Premium Storage (which still is in preview). This enables attaching disks to VMs whose underlaying physical storage actually are SSDs. So we said, well, SSDs => more IOPS, so let us give this a try. We did not setup any RAID using the SSDs. We are only using one single SSD-disk per VM.
We've been running the Cluster for almost 3 days now and we've stress tested it a lot but haven't been able to reproduce the issues.
I guess we didn't came down to the real cause but the conclusion is that some of the following must have been the underlaying cause for our problems.
Too slow disks (writes > IOPS)
RAID was setup incorrectly which caused the disks to function non-normally
These two problems go hand-in-hand and most likely is that we basically just was setting up the disks in the wrong way. However, SSDs = more power to the people, so we will definitely continue using SSDs.
If someone experience the same problems that we had on Azure with RAID-0 on large disks, don't hesitate to add to here.
Part of the problem you have is that you do not have a lot of memory on those systems and it is likely that even with only 1GB of data per node, your nodes are experiencing GC pressure. Check in the system.log for errors and warnings as this will provide clues as to what is happening on your cluster.
The rollups_60 table in the OpsCenter schema contains the lowest (minute level) granularity time series data for all your Cassandra, OS, and DSE metrics. These metrics are collected regardless of whether you have built charts for them in your dashboard so that you can pick up historical views when needed. It may be that this table is outgrowing your small hardware.
You can try tuning OpsCenter to avoid this kind of issues. Here are some options for configuration in your opscenterd.conf file:
Adding keyspaces (for example the opsc keyspace) to your ignored_keyspaces setting
You can also decrease the TTL on this table by tuning the 1min_ttlsetting
Sources:
Opscenter Config DataStax docs
Metrics Config DataStax Docs

Cassandra consistency model performance evaluation

Hi I am a student and am trying to evaluate the latency(Insert, read and Upsert) of cassandra for different consistency models and for different replication factors.
I am using Virtual box on my host system and have 10 ubuntu VMs to form a cluster.
When I run the tests, sometimes the average latency comes out lesser for a stronger consistency model.
Also the latency does not increase as I increase the replication factor in some cases which is also not an expected result.
I wanted to know what all could be the possible reasons for such behavior?
There are a few things:
Performance benchmarks using virtual box on a single system will give you very different resutls from a live cluster. For instance, network latencies would be considerably reduced. A real cluster would have different resources available whereas vbox instances are sharing the same resources. Even on a cloud platform, you'd see different numbers.
When a write request comes in, the coordinator sends to all required replicas a write request in parallel. They all process the write and respond. If your lower consistency write went to a busy node, and the higher consistency write went to enough "faster / available" nodes to make a quorum, then the latter will have lower latency. Also, increasing the replication factor means the data is available in more nodes. So reads can be faster (depending on consistency levels).

Resources