Apache Cassandra monitoring - cassandra

What is the best way to monitor if cassandra nodes are up? Due to security reasons JMX and nodetool is out of question. I have cluster metrics monitoring via Rest Api, but I understand that even if a node goes Rest Api will only report on a whole cluster.

Well, I have integrated a system where I can monitor all the metrics regarding to my cluster of all nodes. This seems like complicated but pretty simple to integrate. You will need the following components to build up a monitoring system for cassandra:
jolokia jar
telegraf
influxdb
grafana
I'm writing a short procedure, how it works.
Step 1: copy jolokia jvm jar to install_dir/apache-cassandra-version/lib/ , jolokia jvm agent can be downloaded from anywhere in google.
Step 2: add the following line to install_dir/apache-cassandra-version/conf/cassandra-env.sh
JVM_OPTS="$JVM_OPTS -javaagent:<here_goes_the_path_of_your_jolokia_jar>"
Step 3: install telegraf on each node and configure the metrics you want to monitor. and start telegraf service.
Step 4: install grafana and configure your ip, port, protocol. grafana will give you a dashboard to look after your nodes and start grafana service. Your metrics will be able get visibility here.
Step 5: install influxdb on another server from where you want to store your metrics data which will come through telegraf agent.
Step 6: browse the ip you have mentioned, where you have launched your grafana through browser and add data source ip (influxdb ip), then customize your dashboard.
image source: https://blog.pythian.com/monitoring-cassandra-grafana-influx-db/

This is not for monitoring but only for node state.
Cassandra CQL driver provides info if a particular node is UP or DOWN with Host.StateListener Interface. This info is used by driver to mark a node UP or Down. Thus it could be used if node is down or up if JMX is not accessible.
Java Doc API : https://docs.datastax.com/en/drivers/java/3.3/

I came up with a script which listens for DN nodes in the cluster and reports it to our monitoring setup which is integrated with pagerduty.
The script runs on one of our nodes and executes nodetool status every minute and reports for all down nodes.
Here is the script https://gist.github.com/johri21/87d4d549d05c3e2162af7929058a00d1
[1]:

Related

pull out metrics from spark logs

how do I pull out these metrics from spark history logs? Is there some api I can pull these from?
I tried downloading the json event logs, but I can't grep for the numbers seen in the photo
The spark history server keeps all that information for you. You can access it via a rest API.
If you are on EMR:
You can view the Spark web UIs by following the procedures to create
an SSH tunnel or create a proxy in the section called Connect to the
cluster in the Amazon EMR Management Guide and then navigating to the
YARN ResourceManager for your cluster. Choose the link under Tracking
UI for your application. If your application is running, you see
ApplicationMaster. This takes you to the application master's web UI
at port 20888 wherever the driver is located. The driver may be
located on the cluster's primary node if you run in YARN client mode.
If you are running an application in YARN cluster mode, the driver is
located in the ApplicationMaster for the application on the cluster.
If your application has finished, you see History, which takes you to
the Spark HistoryServer UI port number at 18080 of the EMR cluster's
primary node. This is for applications that have already completed.
You can also navigate to the Spark HistoryServer UI directly at
http://master-public-dns-name:18080/.

Thingsboard cluster setup

Building a Thingsboard cluster
I need help setting up a Thingsboard cluster, the documentation online is very limited.
The cluster will contain 2 Zookeeper nodes and 4 Thingsboard nodes with Cassandra DB.
Should Zookeeper be installed separately?
A step-by-step guide would be much appreciated!
I cannot provide you detailed step-by-step instructions to setup a ThingsBoard cluster. I can point you into the right direction by sharing the different documents you need to do so.
Bottom line, the following tasks must be completed:
Install and configure a ZooKeeper ensemble.
Check the ZooKeeper documentation for further installation details. Keep in mind that you need at least three different ZK-nodes in a clustered environment and that you always need an odd number of ZK nodes (3,5,7,...). It is a very very very bad idea to build a cluster consisting out of two ZK-nodes, check split brain condition that might appear under these circumstances! Basically you setup the number of individual nodes you wish to use and change the configuration file to enable the different nodes as an ensemble. This is documented quite well in the ZK-docs.
Install and configure a Cassandra cluster.
Again you will setup the number of individual nodes you need for your Cassandra cluster and modify the individual configuration files to convert them into a Cassandra cluster. Check Cassandra documentation for details. Be sure to check proper configuration using the nodetool status command as described at the end of the document. All your nodes should be up and running.
Install and configure a ThingsBoard cluster.
Use the instructions provided with ThingsBoard single node setup.
Install Java
Skip External database installation
ThingsBoard service installation
Configure ThingsBoard to use the external database - Cassandra
Go to Cluster setup and apply the configuration steps depicted (ZK, Cassandra and RPC). Keep in mind to point to ALL members of your ZK, Cassandra cluster. You can also use IP-addresses instead of host names.
Return to single node setup and run the installation script at ONE NODE only!
Start ThingsBoard service
If everything went well, you should be able to access your ThingsBoard nodes directly using the URL http://[NODE_IP]:8080. You can verify proper cluster operation by creating a tenant on one node and check its presence on another node.
I don't know if using an even number of ThingsBoard nodes is a good idea. The documentation does not mention anything about this.
One final remark, you could/should consider putting a proxy in front of your ThingsBoard cluster to provide load balancing to your web clients and improve user experience. This way you shouldn't share the individual host addresses with your users and you will prevent node overloading due to the fact that everybody is using the same web-address to access your dashboard(s). You could also proxy your MQTT broker to provide load balancing as well.
Good luck in setting up your cluster!
Zookeeper needs at least 3 nodes to run in a cluster mode. Each node voting and the valid replica count to gain the QUORUM is 3.

Cassandra JMX need to connect all the nodes

I am trying to get to know Cassandra cfstats information from all the machines using JMX. This can be done using OpsCenter, but I do not want to use it. I started building my own utility. For now, my java program connects to JMX and fetching cfstats information such as estimateKeys, No of SSTables ..etc.
My requirement is, This is a java jar file, will run from one Cassandra node and should be able to connect to all the machines and fetch cfstats using their respective JMX per node.
I am planning to use java driver for this, as java driver will be able to connects all the machines in the cluster using system.peers coumnFamily. Once java driver connect to the machines, I will form the service:jmx:rmi using respective hostname and JMX port(7199). Then I will be able to connect to NodeProbe using this information.
My question is, after connecting to the another node using java driver, will I be able to retain state there and after forming service:jmx:rmi url, will this url really connects to the current node JMX and pull cfstats information from the current node. Because JMX host name it will take from the Cassandra-env.sh file. Can some one please help me in this.
Does this idea works or is there another best way to achieve this?
It's possible to use JMX remotely, but that's not the easiest thing to do.
But if you are writing your tool - maybe it's worth to check out a different connection. E.g. you can easily convert JMX calls to HTTP using Jolokia

Issue to start coordinator

I have install 3 arangodb servers. But i have always the same listening port 8529 no 8530 for coordinator so i cannot create a cluster.
tcp 0 0 0.0.0.0:8529 0.0.0.0:* LISTEN 13142/arangod
So when i try to create a cluster via the web interface, i have the following error
ERROR bootstrapping DB servers failed: Could not connect to 'tcp://10.0.0.18:8530' 'connect() failed with #111 - Connection refused'
How can i start and/or configure the corrdinator to have a listen on my servers?
Regards
Dispatcher based clusters
Please note that dispatcher based setups as you asked are intended for evaluation purposes only.
To start a cluster from the dispatcher webfrontend you need to configure all nodes to start the arangod daemon in dispatcher mode:
[cluster]
disable-dispatcher-kickstarter = no
disable-dispatcher-frontend = no
To start a cluster on a single machine you only need to install ArangoDB and reconfigure it once; it will then use the same installation to start the dispatcher and dbserver nodes.
One should know that the initial cluster startup may take a while.
Another side note is that authentication is not supported in this scenario, so you may need to turn it off.
You should now find the log output of the dbserver and coordinator instances under /var/log/arangodb/cluster/ so you can get the actual informations of what went wrong.
Script based cloud install clusters
A better way to get a cluster running in the cloud may be to use one of the scripts we prepared for Digital Ocean, Google Compute Engine, AWS or Azure.
ArangoDB Clusters based on Mesosphere DCOS
The currently recommended way of running an ArangoDB cluster is to use Mesosphere DCOS, as Max describes in these slides using some example configurations.
ArangoDB is an official Mesosphere partner and we offer an official DCOS subcommand to manage an ArangoDB Cluster on Mesosphere DCOS.
Mesosphere adds additional services on top of Mesos and eases management of the Mesos cluster via the dcos-cli.
If you want to use a raw Apache Mesos Cluster, you can use the Mesos framework directly to schedule the neccessary tasks to create ArangoDB cluster.
Meanwhile there is a better article about Running ArangoDB on DC/OS.

Opscenter Agent 4.1.2: Remote Monitoring

The usual method for installing the opscenter agent whether via the opscenter server or manually is to install it on the cassandra node server.
Is the agent configuration flexible enough that it can run anywhere and monitor cassandra remotely? Are there complexities that may not be readily apparent?
tl;dr - No
While there are several config options that would let the agent monitor certain JMX metrics remotely, you would lose critical metrics and functionality such as disk and CPU metrics, and the ability to stop/start/configure the Cassandra process. These all require the agent to be running locally.

Resources