I have set up a new Cassandra 3.3 cluster and I use jvisualvm to monitor Cassandra read/write latency via the MBeans (JMX metrics).
The read/write latency has been completely flat on all nodes for many weeks, even though the read/write request volume in that cluster varies normally (heavier on some days, lighter on others).
When I use jvisualvm to monitor a Cassandra 2.0 cluster, the read/write latency behaves normally: it moves with the read/write request volume.
Why are the read/write latency statistics of Cassandra 3.0+ always flat? I think the result is incorrect. (I have load tested Cassandra v3.3 and v3.7.)
[Updated]
I have found a bug related to this issue:
Flat Cassandra metrics: https://issues.apache.org/jira/browse/CASSANDRA-11752
The ticket says the problem was fixed in C* versions 2.2.8, 3.0.9, and 3.8, but after testing version 3.0.9 the latency result still shows a flat line.
Any ideas?
Thanks.
I have not found any metrics problem when using C* 3.3.
First, try monitoring with jconsole: do you see the same issue there?
Second, which attribute are you looking at, the average value or a percentile? Those values are accumulated from the time the node came up, so it is common for a percentile value to stay the same; that happens less often with the average value. Try restarting the Cassandra node and check the value again.
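If you want to cross-check the numbers outside of jvisualvm, you can read the same MBean programmatically. Below is a minimal sketch using the standard JMX API, assuming the default JMX port 7199, no JMX authentication, and the coordinator read-latency metric name used by Cassandra 2.2+/3.x; the attribute names come from the Dropwizard JMX reporter and may differ slightly between versions.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ReadLatencyProbe {
    public static void main(String[] args) throws Exception {
        // Default Cassandra JMX endpoint; adjust host/port for your node.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Coordinator read latency metric exposed by Cassandra 2.2+/3.x.
            ObjectName readLatency = new ObjectName(
                    "org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency");
            // Count and the rates accumulate from node start; the percentiles come
            // from a decaying reservoir, so they should move with recent traffic.
            for (String attr : new String[] {"Count", "OneMinuteRate", "Mean", "95thPercentile", "99thPercentile"}) {
                System.out.println(attr + " = " + mbs.getAttribute(readLatency, attr));
            }
        }
    }
}

If Count keeps increasing between samples while the percentile attributes barely move, the timer is at least being updated, which narrows the problem down to how the latency values are reported.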
As stated in the title, we are having a problem with our Cassandra cluster. There are 9 nodes with a replication factor of 3 using NetworkTopologyStrategy, all in the same DC and rack. The Cassandra version is 3.11.4 (we plan to move to 3.11.10). The instances have 4 CPUs and 32 GB RAM (we plan to move to 8 CPUs).
Whenever we try to run a repair on our cluster (using Cassandra Reaper on one of our nodes), we lose one node somewhere in the process. We quickly stop the repair, restart the Cassandra service on the node, and wait for it to join the ring. As a result, we are never able to complete a repair these days.
I looked into the problem and realized that it is caused by high CPU usage on some of our nodes (exactly 3). You can see the one-week graph below; the ups and downs follow the usage of the app, and in the mornings it is very low.
I compared the running processes on each node and there is nothing extra on the high CPU nodes. I compared the configurations. They are identical. Couldn't find any difference.
I also realized that these nodes are the ones that take most of the traffic. See the one-week graph below, for both sent and received bytes.
I did some research and found this thread, at the end of which it is recommended to set dynamic_snitch: false in the Cassandra configuration. I looked at our snitch strategy, which is GossipingPropertyFileSnitch. In theory this strategy should work properly, but I guess it doesn't.
The job of a snitch is to provide information about your network topology so that Cassandra can efficiently route requests.
The only observation I have that could be the cause of this issue is a file called cassandra-topology.properties, which is specifically supposed to be removed when using GossipingPropertyFileSnitch:
The rack and datacenter for the local node are defined in cassandra-rackdc.properties and propagated to other nodes via gossip. If cassandra-topology.properties exists, it is used as a fallback, allowing migration from the PropertyFileSnitch.
I did not remove this file, as I couldn't find any hard proof that it is causing the issue. If you have any knowledge of this, or see any other possible cause of my problem, I would appreciate your help.
These two sentences tell me some important things about your cluster:
high CPU usage on some of our nodes (exactly 3).
I also realized that these nodes are the ones that take most of the traffic.
The obvious point is that your replication factor (RF) is 3 (the most common). The not-so-obvious point is that your data model is likely keyed on date or some other natural key, which results in the same (3?) nodes serving all of the traffic for long periods of time. Running repair during those high-traffic periods will likely lead to issues.
Some things to try:
Have a look at the data model, and see if there's a better way to partition the data to distribute traffic over the rest of the cluster. This is often done with a modeling technique known as "bucketing" (adding another component, usually time-based, to the partition key); there is a sketch of this after the list.
Are the partitions large? (Check with nodetool tablehistograms.) By "large," I mean > 10MB. It could also be that large partitions are causing the repair operations to fail; if so, hopefully lowering resource consumption (below) will help.
Does your cluster sustain high amounts of write throughput? If so, it may also be dealing with compactions (nodetool compactionstats). You could try lowering compaction throughput (nodetool setcompactionthroughput) to free up some resources. Repair operations can also invoke compactions.
Likewise, you can also lower the streaming throughput (nodetool setstreamthroughput) during repairs. Repairs will take longer to stream data, but if that's what is really tipping over the node(s), it might be necessary.
In case you're not already, set up another instance and use Cassandra Reaper for repairs. It is so much better than triggering them from cron. Plus, the UI allows for some finely-tuned config which might be necessary here. It also lets you pause and resume repairs, picking up where they left off.
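As a sketch of the bucketing idea from the first point above, here is a hypothetical table created through the DataStax Java driver; the keyspace, table, column names, and the daily bucket are invented for illustration, not taken from your data model:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class BucketedSchemaSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Hypothetical keyspace and DC name; adjust to your topology.
            session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                    + "{'class': 'NetworkTopologyStrategy', 'DC1': 3}");
            // Adding a day bucket to the partition key splits one ever-growing,
            // always-hot partition per sensor into one partition per sensor per day,
            // so reads and writes land on different replica sets as time moves on.
            session.execute("CREATE TABLE IF NOT EXISTS demo.sensor_readings ("
                    + " sensor_id text,"
                    + " day date,"            // the time-based bucket component
                    + " ts timestamp,"
                    + " value double,"
                    + " PRIMARY KEY ((sensor_id, day), ts)"
                    + ") WITH CLUSTERING ORDER BY (ts DESC)");
            // Queries then always supply the bucket, e.g.:
            // SELECT * FROM demo.sensor_readings WHERE sensor_id = 's-42' AND day = '2021-04-01';
        }
    }
}

Whether a daily bucket (or an hourly one, or a synthetic modulo bucket) is the right choice depends on how much data one partition accumulates per period.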
(Single-node cluster) I've got a table with 2 columns, one of type 'text' and the other a 'blob'. I'm using DataStax's C++ driver to perform read/write requests against Cassandra.
The blob stores a C++ structure (size: 7 KB).
Since I was getting less throughput than desired when using Cassandra alone, I tried adding Ignite on top of Cassandra, in the hope that there would be a significant improvement in performance now that the data would be read from RAM instead of hard disks.
However, it turned out that after adding Ignite the performance actually dropped (by roughly 50%!).
Read Throughput when using only Cassandra: 21000 rows/second.
Read Throughput with Cassandra + Ignite: 9000 rows/second.
Since I am storing a C++ structure in Cassandra's blob, the Ignite API uses serialization/deserialization while writing/reading the data. Is this the reason for the drop in performance (considering the size of the structure, i.e. 7 KB), or is such a drop not expected at all, meaning something is probably wrong in the configuration?
Cassandra: 3.11.2
RHEL: 6.5
The configurations for Ignite are the same as the ones given here.
I got a significant improvement in Ignite+Cassandra throughput when I used serialization in raw mode: the throughput increased from 9000 rows/second to 23000 rows/second. But it's still not significantly better than Cassandra alone. I'm still hopeful of finding some more tweaks that will improve this further.
I've added some more details about the configurations and client code on github.
It looks like you do one get per key in this benchmark for Ignite, and you didn't invoke loadCache beforehand. In that case, on each get Ignite will go to Cassandra to fetch the value and only then store it in the cache. So I'd recommend invoking loadCache before benchmarking, or at least testing gets on the same keys, to give Ignite a chance to keep the keys in the cache. If you think you already have all the data in the caches, please share the code where you write data to Ignite too.
Also, you invoke "grid.GetCache" in each thread. It won't take a lot of time, but you should definitely avoid such calls inside the benchmark, once you are already measuring time.
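A rough sketch of that ordering, using the Ignite Java API for illustration (your client is C++, and the cache name, key type, and config file here are assumptions): warm the cache once before any timing, get the cache handle outside the timed section, and only then run the timed gets.

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class WarmedBenchmarkSketch {
    public static void main(String[] args) {
        // Assumed Spring config with a Cassandra-backed cache named "blobCache".
        try (Ignite ignite = Ignition.start("ignite-config.xml")) {
            // Get the cache handle once, outside the timed section.
            IgniteCache<String, byte[]> cache = ignite.cache("blobCache");

            // Warm the cache from Cassandra before any timing starts, so the
            // timed gets below are served from RAM instead of falling through.
            cache.loadCache(null);

            long start = System.nanoTime();
            long hits = 0;
            for (String key : keysUnderTest()) {
                if (cache.get(key) != null) {   // pure in-memory lookup after warm-up
                    hits++;
                }
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(hits + " hits in " + elapsedMs + " ms");
        }
    }

    private static Iterable<String> keysUnderTest() {
        // Placeholder: supply the same key set your benchmark reads.
        return java.util.Arrays.asList("key-1", "key-2", "key-3");
    }
}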
I have a 5 node cluster with around 1 TB of data, vnodes enabled, OpsCenter version 5.12 and DSE 4.6.7. I would like to do a full repair within 10 days and use the repair service in OpsCenter so that I don't put unnecessary load on the cluster.
The problem I'm facing is that the repair service puts too much load on the cluster and works too fast. Its progress is around 30% (according to OpsCenter) in 24h. I even tried changing the target to 40 days, without any difference.
Questions:
Can I trust the percent-complete number in OpsCenter?
The suggested number is something like 0.000006 days. Could that guess be related to the problem?
Are there any settings/tweaks that could be useful to lower the load?
You can use OpsCenter as a guideline about where data is stored and what's going on in the cluster, but it's really more of a dashboard. The real 'tale of the tape' comes from 'nodetool' via the command line on the server nodes, for example:
#shell> nodetool status
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns   Host ID                               Rack
UN  10.xxx.xxx.xx  43.95 GB  256     33.3%  b1e56789-8a5f-48b0-9b76-e0ed451754d4  RAC1
What type of compaction are you using?
You've asked a sort of 'magic bullet' question, as there could be several factors in play. Examples include, but are not limited to:
A. The size of the data, and of whole rows, in Cassandra (you can see these in the per-table entries of nodetool cfstats). Rows with a binary size larger than 16 MB will be seen as "ultra-wide" rows, which might be an indicator that the schema in your data model needs a 'compound' or 'composite' row key.
B. The type of setup you have with respect to replication and network strategy.
C. The data entry point, i.e. how Cassandra gets its data. Are you using Python? PHP? What inputs the data? You can get funky behavior from a cluster with a bad PHP driver, for example.
D. Vnodes are good, but can be bad. What version of Cassandra are you running? You can find out via CQLSH with cqlsh -3 and then typing 'show version'.
E. The type of compaction is a big killer. Are you using SizeTieredCompactionStrategy or LeveledCompactionStrategy?
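As a concrete illustration of point E: the compaction strategy is a per-table setting that you can change with plain CQL. A hypothetical sketch (keyspace, table, and option values are invented), shown here through the DataStax Java driver:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CompactionStrategySketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {   // placeholder keyspace
            // Switch the example table from the default SizeTieredCompactionStrategy
            // to LeveledCompactionStrategy; good for read-heavy tables, but it costs
            // more compaction I/O, so test before applying it in production.
            session.execute("ALTER TABLE my_table WITH compaction = "
                    + "{'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160}");
        }
    }
}

The strategy currently in use for each table shows up in the 'describe keyspace' output mentioned a little further down.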
Start by running 'nodetool cfstats' from the command line on the server any given node is running on. The particular areas of interest would be (at this point):
Compacted row minimum size:
Compacted row maximum size:
More than X amount of bytes in size here on systems with Y amount of RAM can be a significant problem. Be sure Cassandra has enough RAM and that the stack is tuned.
The default performance configuration for Cassandra should normally be enough, so the next step would be to open a CQLSH session to the node with 'cqlsh -3 hostname' and issue the command 'describe keyspaces'. Take the keyspace name you are running, issue 'describe keyspace FOO', and look at your schema. Of particular interest are your primary keys. Are you using "composite row keys" or a "composite primary key" (as described here: http://www.datastax.com/dev/blog/whats-new-in-cql-3-0)? If not, you probably need to, depending on the expected read/write load.
Also check how your application layer is inserting data into Cassandra. Using PHP? Python? What drivers are being used? There are significant bugs in Cassandra versions < 1.2.10 with certain Thrift connectors, such as the Java driver or the PHPcassa driver, so you might need to upgrade Cassandra and make some driver changes.
In addition to these steps, also consider how your nodes were created.
Note that migration from static nodes to virtual nodes (vnodes) has to be handled carefully. You can't simply switch configs on a node that's already been populated. You will want to check your initial_token: settings in /etc/cassandra/cassandra.yaml. The questions I ask myself here are: "What initial tokens are set? (There are no initial tokens for vnodes.) Were the tokens changed after the data was populated?" For the static nodes I typically run, I calculate the tokens using a tool like http://www.geroba.com/cassandra/cassandra-token-calculator/ as I've run into complications with vnodes (though they are much more reliable now than before).
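If you do run static (single-token) nodes, the evenly spaced tokens that such calculators produce can also be computed directly. A small sketch, assuming the default Murmur3Partitioner and a hypothetical 5-node ring:

import java.math.BigInteger;

public class InitialTokenCalculator {
    public static void main(String[] args) {
        int nodeCount = 5; // example cluster size; adjust to your ring

        // Murmur3Partitioner tokens cover [-2^63, 2^63); evenly spaced tokens
        // are i * (2^64 / N) - 2^63 for i = 0 .. N-1.
        BigInteger range = BigInteger.valueOf(2).pow(64);
        BigInteger offset = BigInteger.valueOf(2).pow(63);

        for (int i = 0; i < nodeCount; i++) {
            BigInteger token = range.multiply(BigInteger.valueOf(i))
                                    .divide(BigInteger.valueOf(nodeCount))
                                    .subtract(offset);
            System.out.println("node " + i + ": initial_token = " + token);
        }
    }
}

Tokens like these only apply when vnodes are disabled (a single initial_token per node); with vnodes the tokens are generated automatically.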
I have a Cassandra installation which contains a table with no more than 110k records.
I'm having quite a lot of trouble querying the data using PDI 5.3 (the latest version): I constantly get out-of-memory errors on the Cassandra side.
Granted, the server Cassandra is installed on is not the greatest, with 4 GB RAM and only 2 cores, but I would still expect to perform this simple task without issues.
In Cassandra's conf/cassandra-env.sh, I've configured:
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="200M"
and now the maximum number of rows I can query is 80k.
The documentation suggests setting MAX_HEAP_SIZE to 1/4 of the machine's RAM, but for me that meant 1G and only about 20k rows I could query.
I can tell how many rows I can query by limiting the select with the LIMIT keyword inside the Cassandra Input step in PDI.
Are there any other parameters I can tweak to get better performance? This is a development server; in production I'll be expecting queries with 1mil+ rows.
Server on which Cassandra is installed: Red Hat Enterprise Linux Server release 6.6 (Santiago)
Cassandra version: apache-cassandra-2.1.2
Edit: versions updated.
Sacrifice IO for Memory (since memory is killing you):
lower key / row caches if they are enabled (key cache is on by default)
if you carry out lots of deletes you can lower gc_grace_seconds to remove tombstones sooner (assuming you do many range scans, which you do when fetching 80k rows, this can help)
Some other ideas:
Paginate (select rows 0-10k of the 80k, then 10k-20k, and so on); see the paging sketch after this list.
Check the sizes of the memtables; if they are too large, lower them.
Use tracing to verify what you are retrieving (tombstones can cause lots of overhead)
This thread suggests lowering the commit_log size, but the commit log was heavily revamped and moved offheap in 2.1 and shouldn't be such an issue anymore.
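For the pagination point above: if the reads go through the native protocol, setting a fetch size makes the server return the rows in pages instead of materializing the whole result at once. A minimal sketch with the DataStax Java driver (contact point, keyspace, and table are placeholders; PDI's Cassandra Input step has its own settings, so this only illustrates the idea):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class PagedReadSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_keyspace")) {      // placeholder keyspace

            // Ask the server for 5,000 rows per page instead of the whole table in
            // one response; the driver fetches the next page transparently as we iterate.
            SimpleStatement stmt = new SimpleStatement("SELECT * FROM my_table");
            stmt.setFetchSize(5000);

            ResultSet rs = session.execute(stmt);
            long count = 0;
            for (Row row : rs) {
                count++;                      // process the row here
            }
            System.out.println("Read " + count + " rows in pages of 5,000");
        }
    }
}

The same effect in PDI would come from whatever row-limit or paging option the Cassandra Input step exposes; the point is simply to avoid asking a small-heap node for the entire table in one go.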