I have a Cassandra cluster (two nodes) already set up. When I use nodetool to get the tablestats, I get different values depending on which node I run nodetool on. Is there a way to get an output that gives a representation of the entire cluster?
For example, when I run nodetool tablestats thingsboard on node 1, I get a write latency of 0.015 ms, and when I do the same on node 2, I get 0.012 ms.
Is there a way to get an average of this value from within nodetool? Something like: nodetool tablestats (node1 and node2) thingsboard?
Thanks in advance
No, it's not possible out of the box - all nodetool commands use the JMX interface of the node they connect to. To get aggregated results you need to set up a monitoring system, for example by using the Metrics Collector for Apache Cassandra (MCAC), scraping the data with Prometheus, and visualizing it with Grafana. You can use other systems as well - for example, scraping the JMX metrics with Prometheus's JMX exporter, etc.
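If you just need a quick one-off number rather than a full monitoring stack, a rough workaround is to run nodetool on each node yourself and average the figures. This is only a sketch: the hostnames node1 and node2 are placeholders, it assumes SSH access and nodetool on the PATH of each node, and the exact label of the latency line ("Write Latency" at the keyspace level, "Local write latency" per table) can differ between Cassandra versions.

    # Collect tablestats from both nodes and average the write-latency lines.
    # node1/node2 are placeholder hostnames; adjust the keyspace name as needed.
    for host in node1 node2; do
        ssh "$host" nodetool tablestats thingsboard
    done | awk '/Write Latency:|Local write latency:/ {sum += $(NF-1); n++}
                END {if (n) printf "average write latency: %.3f ms\n", sum / n}'

Keep in mind this is just the arithmetic mean of the per-node averages, not a true cluster-wide latency; for anything ongoing, the monitoring approach above is the right tool.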
This is what I have in cassandra.yaml
prepared_statements_cache_size_mb: 500MB
Is it possible to see the actual value of that variable once you're in cqlsh?
Since Cassandra 4.0 you'll be able to do that by reading from the system_views.settings virtual table.
See the blog post from TLP on the topic of virtual tables.
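As a minimal sketch of the 4.0+ approach (the setting name here matches the cassandra.yaml entry above; it may be named slightly differently in newer versions):

    # Query the settings virtual table from cqlsh (Cassandra 4.0+).
    cqlsh -e "SELECT name, value FROM system_views.settings
              WHERE name = 'prepared_statements_cache_size_mb';"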
Because CQL statements are handled by the cluster as a whole (which should be three or more nodes), you can't use CQL to read the settings of an individual node. The value of prepared_statements_cache_size_mb applies only to that node.
You could use JMX to read org.apache.cassandra.config.Config on a node, including prepared_statements_cache_size_mb.
On executing nodetool tablestats, many output metrics are generated for a given table.
Do metrics like SSTable count, Space used (live), and Space used (total) reflect the counts and size of the table on a particular node, or are they an aggregation of metrics from the entire cluster? If they are not cluster-wide, how do we extrapolate "Space used" to reflect cluster-wide usage?
Most of the nodetool commands show data for the current node only, or how the other nodes look from the current node's perspective...
To get cluster-wide statistics you need to collect data from all nodes, for example by scraping the JMX metrics into a monitoring system, or by using something like the Metrics Collector for Apache Cassandra (MCAC) together with Prometheus and Grafana.
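If you only need a rough figure and don't have monitoring in place yet, you can approximate cluster-wide usage by summing the per-node values. A sketch of that idea (hostnames, keyspace, and table names are placeholders; it assumes SSH access to each node):

    # Sum "Space used (live)" for one table across all nodes.
    for host in node1 node2 node3; do
        ssh "$host" nodetool tablestats my_keyspace.my_table
    done | awk '/Space used \(live\):/ {sum += $4}
                END {printf "cluster-wide live space: %d bytes\n", sum}'

Note that the sum counts every replica, so divide by the replication factor to estimate the size of the unique data.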
I have the following datacenter-aware configuration:
Primary Datacenter: 3 node cluster, RF=3
Data size is more than 100GB per node
I would like to add a new data center (Secondary Datacenter: 3 node cluster, RF=3).
I know how to do that.
But the problem is: how do I sync the data from the primary to the secondary quickly?
I tried nodetool repair (with various keys) and nodetool rebuild, but it takes a long time, nearly 10 hours.
I'm using Cassandra version 2.1.15.
nodetool rebuild is usually the fastest way to sync new nodes.
To speed it up, you could try a couple of things:
If you have a lot of network bandwidth between the data centers, try increasing the cassandra.yaml parameter inter_dc_stream_throughput_outbound_megabits_per_sec. This defaults to 200 Mbps, so you could try a higher value.
You could also use a replication factor smaller than 3 in the new data center: for example, start with 1 to get it up and running as quickly as possible, then later alter the keyspace to a higher value and run repair to create the extra replicas. A rough outline of both steps is sketched below.
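Here is what those two steps might look like. The keyspace name my_keyspace and the data center names DC1 (existing) and DC2 (new) are placeholders, and on 2.1 the cassandra.yaml change typically requires a restart:

    # 1. On the nodes of the existing DC, raise the streaming limit in cassandra.yaml
    #    (example value only), then restart:
    #      inter_dc_stream_throughput_outbound_megabits_per_sec: 800

    # 2. Start the new DC with RF=1, rebuild, then raise the RF and repair.
    cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication =
              {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 1};"

    # On each node of the new data center, stream data from the existing DC:
    nodetool rebuild -- DC1

    # Once the rebuild is done, bump the new DC to RF=3 and build the extra replicas:
    cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication =
              {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"
    nodetool repair my_keyspace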
I'm connecting to Cassandra with JMX (host:port) and querying with the ObjectName:
"org.apache.cassandra.metrics:type=Keyspace,keyspace=keySpaceName,name=TotalDiskSpaceUsed"
Then I fetch the attribute "Value", which I suppose is the total disk usage in bytes for the keyspace (in my small example it returns 10516).
In my test I only have a one-node cluster, but what if I have a cluster of hundreds of nodes with a lot of tables with different partition keys: will the value then be for the whole cluster, or just for the node I connect to?
All JMX metrics, including TotalDiskSpaceUsed, are measurements for the local node only; to get a cluster-wide figure you would sum the value across all nodes.
With the objective of speeding up the migration of a full production Cassandra cluster, I would like to know if anyone has tried running Cassandra's sstableloader from two nodes at the same time.
Those nodes would be outside the destination cluster's ring, and each would stream different data into the ring.
Has anyone tried this?
Thank you.
I have tried this with multiple simultaneous sstableloaders without any issue. In my case the SSTable sets were created by a MapReduce job, resulting in one set of SSTables per reducer, which were later loaded via sstableloader.
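For what it's worth, a minimal sketch of such a parallel load is below. The IP addresses and paths are placeholders, the directory passed to sstableloader has to follow the usual <keyspace>/<table> layout, and the -t throttle option is optional (its exact behaviour can vary by version):

    # On loader host A (first half of the data):
    sstableloader -d 10.0.0.1,10.0.0.2,10.0.0.3 -t 200 /data/batch1/my_keyspace/my_table

    # On loader host B, running at the same time (second half of the data):
    sstableloader -d 10.0.0.1,10.0.0.2,10.0.0.3 -t 200 /data/batch2/my_keyspace/my_table

Throttling each loader keeps the two parallel streams from saturating the destination cluster's network and I/O.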