Cassandra nodetool netstats: what exactly am I monitoring?

Can anyone please tell me what to look for when monitoring the nodetool netstats command, and what thresholds to use?
When we say we monitor the number of active, pending, and completed commands and responses, what should the threshold be? I was reading a blog that gave thresholds of 5 and 10. I am having trouble understanding whether that means 5 pending commands, 5% of pending commands, or a ratio between pending and active commands.
Sorry if this is silly. I am new to Cassandra.

The numbers you see in the netstats output are actual counts. See this doc link:
http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNetstats.html
Could you elaborate on what you mean by thresholds? Where are you seeing this, or which blog are you referring to?
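To make "actual counts" concrete, here is a minimal Python sketch that reads the Active, Pending, and Completed columns from the pool table at the end of the netstats output. The sample text is an assumed layout modeled on the 2.0-era output with Commands and Responses pools. Pool names and columns differ between Cassandra versions, and parse_pool_counts is a hypothetical helper name, not part of any Cassandra tooling.

```python
# Assumed sample layout; real output varies by Cassandra version.
SAMPLE_NETSTATS = """\
Mode: NORMAL
Not sending any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0           1234
Responses                       n/a         3           5678
"""

def parse_pool_counts(text):
    """Return {pool name: {column: int or None}} from the 'Pool Name' table."""
    pools = {}
    columns = None
    for line in text.splitlines():
        if line.startswith("Pool Name"):
            # Header row: everything after "Pool Name" is a column title.
            columns = line[len("Pool Name"):].split()
            continue
        if columns is None or not line.strip():
            continue  # still above the table, or a blank line
        parts = line.split()
        if len(parts) <= len(columns):
            continue  # too few fields to be a table row
        values = parts[-len(columns):]
        name = " ".join(parts[:-len(columns)])
        pools[name] = {
            col: int(val) if val.isdigit() else None  # "n/a" becomes None
            for col, val in zip(columns, values)
        }
    return pools

if __name__ == "__main__":
    for name, counts in parse_pool_counts(SAMPLE_NETSTATS).items():
        print(name, counts)
```

The parsed values are plain integers, so any alert threshold would be compared against them directly rather than against a percentage or ratio.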

Related

Sudden load spikes in Cassandra cluster

We recently started having problems with our Cassandra cluster. Maybe someone has ideas on how to fix this. We're running Cassandra 3.11.7 on a 40 node cluster. We are using replication factor = 3 and read/write at consistency level QUORUM.
Recently, a single node experienced a sudden spike in CPU load which then lasted for a while. During that period, we could observe a lot of dropped and queued MUTATIONs. If we restart Cassandra on the problematic node, one or two other nodes start to suffer from the same problem. We have examined log files and access patterns and have not yet been able to find the reason.
What could be the most common reasons for such behaviour? Where should we take a closer look? Has anyone already had similar experiences?
"If we restart Cassandra on the problematic node, one or two other nodes start to suffer from the same problem."
First of all, when a single node presents a problem, restarting it generally achieves nothing. If anything, you'll clear the JVM heap...which will be quickly repopulated upon startup. Seriously, don't expect restarting a node to fix anything.
"Has anyone already had similar experiences?"
Yes, several times. For things not Cassandra related:
Are you in a cloud environment? Run iostat and look for things like high percentages of iowait and steal. Sometimes shared resources don't play well with others. If you don't have iostat, get it (yum install -y sysstat).
Check cron for all users. We once had an issue with a file integrity checker getting installed as a part of our base image, and it did exactly what you are talking about.
"What could be the most common reasons for such behaviour? Where should we take a closer look?"
For Cassandra related issues, I see a few possibilities:
Repairs. Check if the node is running a repair. You can see Merkle Tree calculations with nodetool compactionstats and repair streams with nodetool netstats.
Compactions. Check nodetool compactionstats. If this is it, you can try lowering your compaction throughput so that it doesn't affect normal operations.
Garbage Collection. Check the gc.log.* files. If it's GC, it can usually be fixed by reading up on and adjusting the GC settings. If there isn't anyone on your team who is a JVM GC expert, I recommend using G1GC as it removes a lot of the guesswork.
Do note that nothing I mentioned above can be fixed with a reboot. In fact, the problem will likely pick right back up where it left off.
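For the iowait and steal check suggested above, a rough Python sketch follows. Note that it swaps in a different technique: instead of running iostat, it derives the percentages from two samples of the aggregate cpu line in /proc/stat on Linux. The helper names read_cpu_times and iowait_and_steal_percent are hypothetical, and the sampling interval is up to you.

```python
def read_cpu_times(path="/proc/stat"):
    """Return the aggregate 'cpu' counters from /proc/stat as a list of ints."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
    raise RuntimeError("no aggregate cpu line found")

def iowait_and_steal_percent(before, after):
    """Percent of CPU time spent in iowait and steal between two samples.

    Field order in /proc/stat:
    user nice system idle iowait irq softirq steal guest guest_nice
    Only the first eight fields are summed, because guest time is
    already included in user and nice.
    """
    # Pad short samples so indexes 4 (iowait) and 7 (steal) always exist.
    before = (list(before) + [0] * 8)[:8]
    after = (list(after) + [0] * 8)[:8]
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0, 0.0
    return 100.0 * deltas[4] / total, 100.0 * deltas[7] / total
```

To use it, call read_cpu_times() twice a few seconds apart and pass both results to iowait_and_steal_percent(). Persistently high values in either number point at the storage or the hypervisor rather than at Cassandra itself.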

How to know if Cassandra autocompaction is enabled or not

Autocompactions can be enabled or disabled using nodetool enableautocompaction and disableautocompaction. But is there any way to know the status? I do not see any nodetool command which will show the status.
There is currently no mechanism to tell, short of taking a heap dump. The safest option is to just run nodetool enableautocompaction if you want it on regardless, or to set up alerting on pending compaction tasks.
I think you are searching for one of the commands below:
1. nodetool compactionhistory
Provides the history of compaction operations.
2. nodetool compactionstats
Provides statistics about a compaction. The total column shows the total number of uncompressed bytes of SSTables being compacted. The system log lists the names of the SSTables compacted.
As Chris suggested, nodetool compactionstats will probably help. If autocompaction is enabled, you will see some running tasks, and the number of pending tasks may be 0 or any other number. If autocompaction is disabled, you will see many pending tasks and no running tasks in the nodetool compactionstats output.

Cassandra - How to check table data is consistent at a given point in time?

How to find out when a Cassandra table becomes "eventually consistent"? Is there a definitive way to determine this at a given point in time, preferably programmatically through the DataStax driver API? I checked out the responses to the following related questions, but there does not seem to be anything more concrete than "check the nodetool netstats output".
Methods to Verify Cassandra Node Sync
how do i know if nodetool repair is finished
If your system is always online and serving operations, it may never become fully consistent at a single point in time unless you use consistency level ALL.
The repair process writes an error to the log file if it does not get a reply from other replica nodes, for example because they were down or timed out.
You can check the logs: if there are no errors related to AntiEntropy or streaming, your system is almost consistent.

Speed of hinted handoff in Cassandra

Given a particular set of configurations and a particular size of data to be written to a node, can we predict how much time the hinted handoff will take to finish?
In my case, as soon as the node came up, I checked with the 'nodetool statushandoff' command and saw that hinted handoff had started running. However, it seems to be running endlessly. Is there any way, by looking at the configuration, the size of the missing data, and so on, to know how long it will take for the missing data to be written to the node?
You should be able to track the progress with some hint metrics. Have a look at this page: http://cassandra.apache.org/doc/latest/operating/metrics.html#hintedhandoff-metrics
The TotalHintsInProgress will tell you how big the backlog is and TotalHints will tell you the number of hints written on the node since startup. So by tracking these two metrics you should be able to give an estimate (good or bad) on how far it's come.
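As a rough illustration of such an estimate, the Python sketch below extrapolates linearly from two readings of the backlog, treating TotalHintsInProgress as the backlog in the way described above. It assumes you have already fetched the two values yourself (for example over JMX) and that the replay rate stays roughly constant, which may not hold in practice. estimate_hint_eta is a hypothetical helper name.

```python
def estimate_hint_eta(backlog_then, backlog_now, seconds_between):
    """Linear estimate of the seconds left until the hint backlog reaches zero.

    Returns None when the backlog is not shrinking, because no finite
    estimate can be extrapolated from the two samples.
    """
    if seconds_between <= 0:
        raise ValueError("samples must be taken at different times")
    if backlog_now == 0:
        return 0.0
    drained = backlog_then - backlog_now  # hints replayed between the samples
    if drained <= 0:
        return None
    # remaining / (drained / seconds_between), rearranged to avoid rounding
    return backlog_now * seconds_between / drained
```

For example, a backlog that drops from 10000 to 8000 in 60 seconds gives an estimate of 240 more seconds. A result of None means the backlog is flat or growing, which matches the "running endlessly" symptom and is worth investigating on its own.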

Can I avoid running repair while compaction is going on in Cassandra cluster?

I have scheduled an incremental repair to run every day. But while the repair is going on, our monitoring system reports COMPACTIONEXECUTOR_PENDING tasks.
I am wondering whether I can introduce a check to confirm that compaction is not running before I trigger the repair.
I should be able to check whether compaction is running by parsing the output of the nodetool netstats and nodetool compactionstats commands.
I will proceed with the repair if both of the following checks pass:
nodetool netstats output contains Not sending any streams.
nodetool compactionstats output contains pending tasks: 0
But I want to get some expert opinion before I proceed.
Is my understanding correct?
I don't want to get into a situation in which these checks always fail and the repair process never gets triggered.
Thanks.
Compaction is occurring regularly in Cassandra. So I'm a bit scared that only triggering repair when pending_compactions=0 will result in repair not running enough. But it depends on your traffic of course, e.g. if you have few writes you won't do many compactions. You should probably add a max wait time for pending_compactions=0 so that after a specified time if the condition is not true repair will run anyway.
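The wait-with-timeout idea could be sketched in Python as below. It relies on the "pending tasks: N" line quoted in the question. The function names, the 60 second poll interval, and the choice to treat an unparsable output as "not idle" are all assumptions made for illustration, not part of nodetool.

```python
import re
import subprocess
import time

def pending_compactions(compactionstats_output):
    """Extract N from the first 'pending tasks: N' line, or None if absent."""
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    return int(match.group(1)) if match else None

def wait_until_idle_or_timeout(get_pending, max_wait_s, poll_s=60,
                               sleep=time.sleep, clock=time.monotonic):
    """Block until get_pending() returns 0 or max_wait_s has elapsed.

    Returns True if compactions drained and False if the wait timed out.
    Either way, the caller is expected to start the repair afterwards.
    """
    deadline = clock() + max_wait_s
    while True:
        if get_pending() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)

def nodetool_pending():
    """Run nodetool compactionstats locally and parse the pending count."""
    result = subprocess.run(["nodetool", "compactionstats"],
                            capture_output=True, text=True, check=True)
    return pending_compactions(result.stdout)
```

A scheduler could call wait_until_idle_or_timeout(nodetool_pending, max_wait_s=3600) and then trigger the repair regardless of the result, logging whether the wait timed out. That keeps the repair from being skipped indefinitely on a write-heavy cluster.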
To answer your question: nodetool uses JMX to fetch MBeans in Cassandra. You can see all available MBeans here: http://cassandra.apache.org/doc/latest/operating/metrics.html
You want this MBean:
org.apache.cassandra.metrics:type=Compaction,name=PendingTasks
You can create your own JMX Client like this: How to connect to a java program on localhost jvm using JMX?
Or you can use jmxterm: https://github.com/jiaqi/jmxterm
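My understanding is you could use it like this, assuming Cassandra's JMX port is the default 7199 on localhost and that the gauge exposes its reading as the Value attribute:
java -jar jmxterm-1.0.0-uber.jar
open localhost:7199
get -b org.apache.cassandra.metrics:type=Compaction,name=PendingTasks Value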
