JanusGraph warm-up - Cassandra

JanusGraph does some internal work that causes latency spikes during the first 1-2 runs (even at 10 QPS); after that it becomes stable, and once stable no spikes are observed. Is this expected behaviour, i.e. does it need a warm-up run to become stable? The backend is Cassandra via CQL.

Cassandra can exhibit behavior like this on startup. If there are files in the commitlog directory, they will be validated and their data written to disk; if there are a lot of them, there can be a noticeable increase in resource consumption.
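If an explicit warm-up pass is acceptable, here is a minimal sketch (my own illustration, not from the thread), assuming a JanusGraph instance on the CQL backend with a local Cassandra at 127.0.0.1: fire a batch of throwaway traversals before the measured runs, so connection pools, JanusGraph's caches, and the JIT are already warm when real traffic starts.

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class JanusGraphWarmUp {
    public static void main(String[] args) throws Exception {
        // Assumptions: CQL backend, Cassandra reachable on 127.0.0.1.
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "cql")
                .set("storage.hostname", "127.0.0.1")
                .open();
        GraphTraversalSource g = graph.traversal();
        // Throwaway reads: results are discarded, we only want to warm caches and pools.
        for (int i = 0; i < 100; i++) {
            g.V().limit(10).valueMap().toList();
        }
        graph.tx().rollback();
        graph.close();
    }
}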

Related

The disk read/write rate and CPU usage of Cassandra intermittently bounce

The disk read/write rate and CPU usage of the Cassandra DB intermittently bounce.
Cassandra was installed with Docker, and node exporter and process exporter (both also running in Docker) are used for monitoring.
I checked the process exporter at the time of a bounce. The process that consumed the most resources during the bounce has Java in its groupname, so I'm guessing there might be a problem with the Cassandra JVM.
No unusual traffic came in at the time of the bounce.
It does not match the compaction cycle.
Clustering is not broken.
The Cassandra version is 4.0.3.
In Cassandra 4 you have the ability to access Swiss Java Knife (sjk) via nodetool, and one of the things it gives you is ttop.
If you run the following in your Cassandra environment while the CPU is spiking, you can see which threads are the top consumers, which then lets you zero in on those threads specifically to see if there is an actual problem.
nodetool sjk ttop >> $(hostname -i)_ttop.out
Allow that to run to completion (during a period of reported high CPU), or for at least 5-10 minutes if you decide to kill it early. It collects a new iteration every few seconds, so once it is done, parse the results to see which threads are regularly the top consumers and what percentage of the CPU they are actually using; that gives you a targeted place to troubleshoot for potential problems in the JVM.
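If you would rather not watch it interactively, one way to bound the collection to roughly ten minutes (my addition, assuming GNU coreutils' timeout is available on the host) is:
timeout 600 nodetool sjk ttop >> $(hostname -i)_ttop.out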
If nothing useful turns up, take a thread dump next for a more complete look; I recommend the following script:
https://github.com/brendancicchi/collect-thread-dumps

Is the Infinispan cache usable as a cluster cache for small-resource nodes?

Suppose I have a lot of nodes with small resources in terms of memory and CPU, maybe 5 or maybe 20.
These nodes are not really reliable; they may be switched off by the user.
They all use a database for read-only master data, which is delivered by a Kafka topic that each node connects to.
What I want to achieve is to use Infinispan as a distributed [replicated] cache on top of the database used by the nodes, so that every node at any point in time has the same "view" of the read-only database.
Can I get this working, especially with low resources, and if so, is there a link to an example to gain some experience?
Thanks
I don't think you can get a definite answer here; you need to try it out. I wouldn't call 5-20 CPUs small resources; there's not much going on in the background when you're not actively reading/writing the cache, so there shouldn't be any 'constant' overhead - just JGroups' heartbeat messages and such.
When using off-heap memory, Infinispan can be started with pretty small JVM heaps (24 MB IIRC, just for a POC), so you might be fine. However, if you replicate the database onto every node it is going to occupy some memory.
If the nodes often come and go, that could cause some churn on the CPU. In replicated mode, leaves won't matter too much, but when a node joins it will receive all the data (from different nodes).
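For getting hands-on, a minimal embedded sketch (my own, not part of the answer) of a replicated Infinispan cache, assuming a recent Infinispan version with the default JGroups transport; the cache name "masterdata" and the key/value are made up:

import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class ReplicatedMasterDataCache {
    public static void main(String[] args) {
        // Clustered cache manager; default JGroups discovery/transport applies.
        DefaultCacheManager manager = new DefaultCacheManager(
                GlobalConfigurationBuilder.defaultClusteredBuilder().build());

        // Replicated cache: every node holds a full copy of the read-only master data.
        ConfigurationBuilder cfg = new ConfigurationBuilder();
        cfg.clustering().cacheMode(CacheMode.REPL_SYNC);
        manager.defineConfiguration("masterdata", cfg.build());

        Cache<String, String> cache = manager.getCache("masterdata");
        // In the scenario above, the Kafka consumer on each node would do these puts,
        // and every node would then see the same replicated view.
        cache.put("some-key", "some-value");

        manager.stop();
    }
}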

Can I "nice" Cassandra nodetool repair

Will any issues arise if I deprioritize the Cassandra "nodetool repair" command using "nice"? It causes high CPU "user time" load and is having a negative impact on our production systems, causing API timeouts in our Usergrid implementation. I see documentation on limiting network throughput, but iowait does not appear to be the issue. Additionally, are there any good methods for mitigating this problem?
The nodetool command doesn't actually do any work itself. It just calls a JMX operation in C* to kick off the repair and then listens for updates to print out, so nice-ing it won't make any difference. There are a couple of main phases to a repair:
build Merkle trees (on each node)
stream changes
compactions
Possibly the validation compaction (which in some versions can be controlled with the compaction throttle) or the streams (stream throughput can be set via nodetool or cassandra.yaml) are burning your CPU. If so, you can try the throttles shown below, but in some versions they won't make a difference.
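For illustration (the values are arbitrary; tune them to your hardware), both throttles can be changed at runtime:
nodetool setcompactionthroughput 16
nodetool setstreamthroughput 200
The first value is in MB/s (0 disables the compaction throttle), the second in megabits/s; in versions before 4.1 the corresponding cassandra.yaml settings are compaction_throughput_mb_per_sec and stream_throughput_outbound_megabits_per_sec.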
After the repair is completed, normal compactions kick off: for anti-compaction with incremental repairs, and also for full repairs if a lot of differences were streamed. Some problems are very version specific, so pay attention to the logs and to when the CPU is high in order to drill down further.

When does a Cassandra node fail?

How does Cassandra guarantee no failure of a node at any given point in time? I know data is replicated, so there might not be an issue of losing the data.
Cassandra nodes can fail for a lot of reasons: very heavy writes, out-of-memory errors, hardware failures, the 100k tombstone limit error, compaction failures, network errors, and so on.
Cassandra cannot guarantee that a node will never fail, because, just like any other software, it is vulnerable to the components and hardware it depends on.
What it does guarantee is that you won't have data loss, as long as the minimum number of required nodes is up and running, based on the replication factor.
Like any other system, Cassandra cannot guarantee that nodes will never fail, but with a correctly set-up cluster, with enough nodes and replicas configured, the cluster as a whole will remain available and no data will be lost even when some nodes are down; this can be transparent to clients, who will not even notice it.
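To make the replication-factor point concrete, here is a small worked example (my own, not from the answers above), assuming a replication factor of 3 and QUORUM consistency:

public class QuorumMath {
    public static void main(String[] args) {
        int rf = 3;                          // assumed replication factor
        int quorum = rf / 2 + 1;             // QUORUM = floor(RF/2) + 1
        int tolerableDown = rf - quorum;     // replicas that can be down while QUORUM still succeeds
        System.out.printf("RF=%d -> QUORUM=%d, tolerates %d replica(s) down%n",
                rf, quorum, tolerableDown);  // RF=3 -> QUORUM=2, tolerates 1
    }
}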

Cassandra compaction tasks stuck

I'm running DataStax Enterprise in a cluster consisting of 3 nodes, all on the same hardware: 2-core Intel Xeon 2.2 GHz, 7 GB RAM, 4 TB RAID-0.
This should be enough for running a cluster with a light load, storing less than 1 GB of data.
Most of the time everything is just fine, but it appears that the running tasks related to the Repair Service in OpsCenter sometimes get stuck; this causes instability in that node and an increase in load.
However, if the node is restarted, the stuck tasks don't show up and the load is at normal levels again.
Because we don't have much data in our cluster, we're using the min_repair_time parameter defined in opscenterd.conf to delay the repair service so that it doesn't complete too often.
It really seems a little bit weird that tasks which are marked as "Complete" and show a progress of 100% don't go away. And yes, we've waited hours for them to disappear, but they won't; the only way we've found to resolve this is to restart the nodes.
Edit:
Here's the output from nodetool compactionstats
Edit 2:
I'm running under DataStax Enterprise v. 4.6.0 with Cassandra v. 2.0.11.83
Edit 3:
This is output from dstat on a node that is behaving normally
This is output from dstat on a node with stuck compaction
Edit 4:
Output from iostat on a node with stuck compaction; note the high "iowait"
Azure storage
Azure divides disk resources among storage accounts under an individual user account. There can be many storage accounts in an individual user account.
For the purposes of running DSE [or Cassandra], it is important to note that a single storage account should not be shared between more than two nodes if DSE [or Cassandra] is configured like the examples in the scripts in this document. This document configures each node with 16 disks. Each disk has a limit of 500 IOPS, which yields 8,000 IOPS when configured in RAID-0. So two nodes will hit 16,000 IOPS and three would exceed the limit.
See details here
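In numbers, the guidance above works out roughly as in the sketch below (my own illustration; the 20,000 IOPS per-storage-account ceiling is an assumption about the limit the guide refers to):

public class AzureIopsBudget {
    public static void main(String[] args) {
        int disksPerNode = 16;
        int iopsPerDisk = 500;                  // standard-storage disk limit from the guide
        int accountIopsLimit = 20_000;          // assumed per-storage-account ceiling
        int iopsPerNode = disksPerNode * iopsPerDisk;          // 8,000
        System.out.println("Per node: " + iopsPerNode);
        System.out.println("Two nodes: " + 2 * iopsPerNode);   // 16,000 -> within the limit
        System.out.println("Three nodes: " + 3 * iopsPerNode); // 24,000 -> exceeds the limit
        System.out.println("Account limit (assumed): " + accountIopsLimit);
    }
}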
So, this issue has been under investigation for a long time now and we've found a solution; however, we aren't sure what the underlying problem causing the issues was, but we got a clue, even though nothing can be confirmed.
Basically, what we did was set up a RAID-0, also known as striping, consisting of four disks, each 1 TB in size. We should have seen roughly 4x a single disk's IOPS when using the stripe, but we didn't, so something was clearly wrong with the RAID setup.
We used multiple utilities to confirm that the CPU was waiting for the IO to respond most of the time whenever we considered the node "stuck". Clearly something with the IO, most probably our RAID setup, was causing this. We tried a few different mdadm settings etc., but didn't manage to solve the problems via the RAID setup.
We started investigating Azure Premium Storage (which is still in preview). This allows attaching disks to VMs whose underlying physical storage actually consists of SSDs. So we said, well, SSDs => more IOPS, so let us give this a try. We did not set up any RAID with the SSDs; we are only using a single SSD disk per VM.
We've been running the cluster for almost 3 days now and have stress tested it a lot, but haven't been able to reproduce the issues.
I guess we didn't get down to the real cause, but the conclusion is that one of the following must have been the underlying cause of our problems:
Disks that were too slow (write demand exceeded the available IOPS)
The RAID was set up incorrectly, which caused the disks to behave abnormally
These two problems go hand in hand, and most likely we simply set the disks up in the wrong way. However, SSDs = more power to the people, so we will definitely continue using SSDs.
If someone experiences the same problems that we had on Azure with RAID-0 on large disks, don't hesitate to add to this thread.
Part of the problem you have is that you do not have a lot of memory on those systems, and it is likely that, even with only 1 GB of data per node, your nodes are experiencing GC pressure. Check system.log for errors and warnings, as this will provide clues as to what is happening in your cluster.
The rollups_60 table in the OpsCenter schema contains the lowest-granularity (minute-level) time series data for all your Cassandra, OS, and DSE metrics. These metrics are collected regardless of whether you have built charts for them in your dashboard, so that you can pull up historical views when needed. It may be that this table is outgrowing your small hardware.
You can try tuning OpsCenter to avoid this kind of issue. Here are some options for configuration in your opscenterd.conf file:
Adding keyspaces (for example the opsc keyspace) to your ignored_keyspaces setting
You can also decrease the TTL on this table by tuning the 1min_ttl setting (see the sketch below)
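A sketch of what that could look like (my illustration; the section name and values are assumptions - check the docs in the sources for the exact file, section, and defaults in your OpsCenter version):
[cassandra_metrics]
ignored_keyspaces = system, system_traces, OpsCenter
1min_ttl = 86400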
Sources:
OpsCenter Config (DataStax docs)
Metrics Config (DataStax docs)

Resources