When I tried to monitor a Cassandra node with JMX, I ran into a problem.
Specifically, I got a negative value from jmx["org.apache.cassandra.metrics:type=Storage,name=Load","Count"].
In the Cassandra wiki, the definition of this metric is:
Total disk space used (in bytes) for this node
Is it possible to get a negative value from this metric? If so, why?
Yes, it's possible. It depends a bit on the version, but there have been bugs around the Load metric in particular, such as CASSANDRA-8205 and CASSANDRA-7239. If the node is operating as it should, those values will be accurate.
You can always drop down to the OS level and monitor it by running du on the data directory.
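For example, a quick cross-check from the shell (the data directory path below assumes a default package install; adjust it to match your data_file_directories setting):
# total on-disk size of this node's data files
du -sh /var/lib/cassandra/data
# compare with what Cassandra itself reports as "Load"
nodetool info | grep Load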
I was unable to find good documentation or an explanation of what severity indicates in nodetool gossipinfo. I was looking for a detailed explanation but could not find a suitable one.
The severity is a value added to the latency in the dynamic snitch to determine which replica a coordinator will send the read's DATA and DIGEST requests to.
Its value depends on the I/O used by compaction, and it also tries to read /proc/stat (the same source as the iostat utility) to get actual disk statistics for its weight. In versions of Cassandra after 3.10 this was removed in https://issues.apache.org/jira/browse/CASSANDRA-11738. In previous versions you can disable it by setting -Dcassandra.ignore_dynamic_snitch_severity in the JVM options. The issue was that it weighted I/O use the same as latency, so if a node is GC thrashing and not doing much I/O because of it, it could end up being treated as the target of most reads even though it's the worst possible node to send requests to.
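If you are on one of those earlier versions, a minimal sketch of disabling it via cassandra-env.sh (whether the property needs an explicit =true can vary by version, so double-check yours):
# cassandra-env.sh (pre-3.10 only)
JVM_OPTS="$JVM_OPTS -Dcassandra.ignore_dynamic_snitch_severity=true"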
You can still use JMX to set the value (to 1) if you want to exclude a node from being used for reads. An example use case is running nodetool disablebinary so applications won't query the node directly, then setting the severity to 1. That node would then only be queried by the cluster if there's a CL.ALL request or a read repair. It's a way to take a node "offline" for maintenance from a read perspective while still allowing it to receive mutations so it doesn't fall behind.
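A rough sketch of that workflow using the jmxterm CLI (the MBean and attribute names are my assumption based on the DynamicEndpointSnitch MBean, and the jar name depends on your jmxterm download; verify both with jmxterm's info command for your version):
# stop native-protocol clients from querying this node directly
nodetool disablebinary
# set the severity to 1 over JMX so the dynamic snitch steers reads away from it
echo "set -b org.apache.cassandra.db:type=DynamicEndpointSnitch Severity 1.0" | \
  java -jar jmxterm-1.0.2-uber.jar -l localhost:7199 -n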
Severity reports activity that happens on the particular node (compaction, etc.), and this information is then used to decide which node could better handle the request. There is discussion in the original JIRA ticket about this functionality and how this information is used.
P.S. Please see Chris's answer about the changes in post-3.10 versions; I wasn't aware of these changes.
We are writing cluster performance metrics collected using Sensu to InfluxDB on a RHEL VM (16 GB). I want to collect the per-second write rate issued by the influxd process. My device is /dev/vda1 and the data files are at /var/lib/influxDB/data.
The problem:
There is a substantial delay between the time Sensu collects the data and the time the data is written to InfluxDB. We suspect the disk I/O performance of InfluxDB may be the bottleneck, but we do not have concrete data to support that claim.
Things tried:
I have tried iostat, iotop, and a number of other tools.
In iotop, the influxd process shows an average write rate of 35 kB/s, which I am sure is far too low for the load we have. (I suspect it is showing me the physical machine's stats rather than the VM's?)
Question:
1. Is there any other way I can collect the correct write-rate metric for the influxd process?
2. Has anyone else faced a similar issue with Sensu and InfluxDB? How did you solve it?
Thanks
You can use the _internal Influx database. It stores query times, disk usage, writes/reads, measurements, series cardinality and so on.
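For example, you can poke at it with the influx CLI (the "write" measurement name is an assumption; measurement and field names vary by InfluxDB version, so list them first):
# see what the _internal database records on your version
influx -database _internal -execute 'SHOW MEASUREMENTS'
# then inspect recent write statistics
influx -database _internal -execute 'SELECT * FROM "write" WHERE time > now() - 5m LIMIT 5'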
You could also install Telegraf on the data nodes and get disk I/O, disk, CPU, network, memory, and so on from Telegraf's system input plugins.
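A minimal sketch of generating such a config (the exact flag spelling can differ between Telegraf versions, so treat this as illustrative):
# generate a telegraf.conf with a few system input plugins and the InfluxDB output
telegraf --input-filter cpu:mem:diskio:net --output-filter influxdb config > /etc/telegraf/telegraf.conf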
I'm running a two node Datastax AMI cluster on AWS. Yesterday, Cassandra started refusing connections from everything. The system logs showed nothing. After a lot of tinkering, I discovered that the commit logs had filled up all the disk space on the allotted mount and this seemed to be causing the connection refusal (deleted some of the commit logs, restarted and was able to connect).
I'm on DataStax AMI 2.5.1 and Cassandra 2.1.7
If I decide to wipe and restart everything from scratch, how do I ensure that this does not happen again?
You could try lowering the commitlog_total_space_in_mb setting in your cassandra.yaml. The default is 8192MB for 64-bit systems (it should be commented-out in your .yaml file... you'll have to un-comment it when setting it). It's usually a good idea to plan for that when sizing your disk(s).
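In cassandra.yaml that would look something like this (4096 is only an example value; size it against your disk):
# un-comment to override the 8192MB default
commitlog_total_space_in_mb: 4096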
You can verify this by running a du on your commitlog directory:
$ du -d 1 -h ./commitlog
8.1G ./commitlog
Although, a smaller commit log space will cause more frequent flushes (increased disk I/O), so you'll want to keep an eye on that.
Edit 20190318
Just had a related thought (on my 4-year-old answer). I saw that it received some attention recently, and wanted to make sure that the right information is out there.
It's important to note that sometimes the commit log can grow in an "out of control" fashion. Essentially, this can happen because the write load on the node exceeds Cassandra's ability to keep up with flushing the memtables (and thus, removing old commitlog files). If you find a node with dozens of commitlog files, and the number seems to keep growing, this might be your issue.
Essentially, your memtable_cleanup_threshold may be too low. Although this property is deprecated, you can still control how it is calculated by lowering the number of memtable_flush_writers.
memtable_cleanup_threshold = 1 / (memtable_flush_writers + 1)
The documentation has been updated as of 3.x, but used to say this:
# memtable_flush_writers defaults to the smaller of (number of disks,
# number of cores), with a minimum of 2 and a maximum of 8.
#
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
#memtable_flush_writers: 8
...which (I feel) led to many folks setting this value WAY too high.
Assuming a value of 8, the memtable_cleanup_threshold is .111. When the footprint of all memtables exceeds this ratio of total memory available, flushing occurs. Too many flush (blocking) writers can prevent this from happening expediently. With a single /data dir, I recommend setting this value to 2.
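With memtable_flush_writers: 2, for example, the formula gives 1 / (2 + 1) ≈ 0.33, so flushing kicks in once the memtables occupy roughly a third of the available memtable space rather than just over a tenth of it.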
In addition to decreasing the commit log size as suggested by BryceAtNetwork23, a proper solution to ensure it won't happen again is to monitor the disk setup so that you are alerted when it's getting full and have time to act/increase the disk size.
Seeing as you are using DataStax, you could set an alert for this in OpsCenter. Haven't used this within the cloud myself, but I imagine it would work. Alerts can be set by clicking Alerts in the top banner -> Manage Alerts -> Add Alert. Configure the mounts to watch and the thresholds to trigger on.
Or, I'm sure there are better tools to monitor disk space out there.
I just replaced a Cassandra cluster with brand new SSDs instead of spinning disks. What configuration options would you recommend that I review? Feel free to post links to blog posts/presentations if you know of any (yes, I've Googled).
Based on a quick look through cassandra.yaml, there are three that I see right away (a sample excerpt follows the list):
memtable_flush_writers : It is set to 2 by default, but the text above the setting indicates that "If your data directories are backed by SSD, you should increase this to the number of cores."
trickle_fsync : Forces the OS to run an fsync to flush the dirty buffers during sequential writes. The text above the setting indicates that setting it to true is "Almost always a good idea on SSDs; not necessarily on platters."
concurrent_compactors : The number of simultaneous compactions allowed. Just like the memtable_flush_writers setting, the text above indicates that SSD users should set it to the number of system cores.
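For illustration, on an SSD-backed node with 8 cores those settings might look like this in cassandra.yaml (example values only, not a tuned recommendation):
memtable_flush_writers: 8
trickle_fsync: true
concurrent_compactors: 8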
Also, according to the DataStax documentation on Selecting hardware for enterprise implementations:
Unlike with spinning disks, it's all right to store both commit logs and SSTables on the same mount point.
I am looking for documentation or general guidelines on when more Cassandra servers should be added to a ring. Should this be based on disk usage or other monitoring factors?
Currently I have some concerns about CoordinatorReadLatency, ReadLatency, and DroppedMessages.REQUEST_RESPONSE, but again I cannot find a good guide on how to interpret various components that I am monitoring. I can find good guides on performance tuning, but limited information on devops.
I understand that this question may be more relevant to Server Fault, but they don't have tags for Datastax Enterprise.
Thanks in advance
Next steps based on #bcoverston's response
Nodetool provides access to read and write latency metrics: nodetool cfhistograms
See docs here: http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsCFhisto.html?scroll=toolsCFhisto#
Since we want to tie this into pretty graphs, the nodetool source code points us to the right JMX values:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/NodeTool.java#L82
Each cf has write and read latency metrics.
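For example, against a specific keyspace and table (both names below are placeholders):
nodetool cfhistograms my_keyspace my_table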
The question is a little open ended, and it depends on your use case. There are a lot of things to monitor, and it can be overwhelming to look at every possible setting and decide if you need to increase your cluster size.
The general advice here is that you should monitor your read and write latency, decide where your thresholds should be, and plan your capacity accordingly. Because there is no prescriptive hardware for running Cassandra, and your use case can be unique to whatever you're doing, there are only rules of thumb.
Sizing your cluster based on data per node can be helpful, but only if I know how big your working set is and what your latency targets are. In addition, the speed of your storage media also matters.
Sizing your cluster based on latency makes more sense. If you need to do N tx/second, you can test your hardware against your workload and see if it can meet your targets. Keep in mind that when you do this you'll want to run a long-term test to see whether those targets hold up in a sustained manner, and also how long it takes before performance under that load degrades, if it does (a write-heavy workload will degrade over time, and you'll want to add capacity before you start missing your targets).
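One way to run that kind of sustained test is cassandra-stress, which ships with Cassandra (a sketch only; the duration, thread count, and node address below are placeholders for your own workload and cluster):
cassandra-stress write duration=30m -rate threads=100 -node 10.0.0.1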