The usual method for installing the opscenter agent whether via the opscenter server or manually is to install it on the cassandra node server.
Is the agent configuration flexible enough that it can run anywhere and monitor cassandra remotely? Are there complexities that may not be readily apparent?
tl;dr - No
While there are several config options that would let the agent monitor certain JMX metrics remotely, you would lose critical metrics and functionality such as disk and CPU metrics, and the ability to stop/start/configure the Cassandra process. These all require the agent to be running locally.
Related
What is the best way to monitor if cassandra nodes are up? Due to security reasons JMX and nodetool is out of question. I have cluster metrics monitoring via Rest Api, but I understand that even if a node goes Rest Api will only report on a whole cluster.
Well, I have integrated a system where I can monitor all the metrics regarding to my cluster of all nodes. This seems like complicated but pretty simple to integrate. You will need the following components to build up a monitoring system for cassandra:
jolokia jar
telegraf
influxdb
grafana
I'm writing a short procedure, how it works.
Step 1: copy jolokia jvm jar to install_dir/apache-cassandra-version/lib/ , jolokia jvm agent can be downloaded from anywhere in google.
Step 2: add the following line to install_dir/apache-cassandra-version/conf/cassandra-env.sh
JVM_OPTS="$JVM_OPTS -javaagent:<here_goes_the_path_of_your_jolokia_jar>"
Step 3: install telegraf on each node and configure the metrics you want to monitor. and start telegraf service.
Step 4: install grafana and configure your ip, port, protocol. grafana will give you a dashboard to look after your nodes and start grafana service. Your metrics will be able get visibility here.
Step 5: install influxdb on another server from where you want to store your metrics data which will come through telegraf agent.
Step 6: browse the ip you have mentioned, where you have launched your grafana through browser and add data source ip (influxdb ip), then customize your dashboard.
image source: https://blog.pythian.com/monitoring-cassandra-grafana-influx-db/
This is not for monitoring but only for node state.
Cassandra CQL driver provides info if a particular node is UP or DOWN with Host.StateListener Interface. This info is used by driver to mark a node UP or Down. Thus it could be used if node is down or up if JMX is not accessible.
Java Doc API : https://docs.datastax.com/en/drivers/java/3.3/
I came up with a script which listens for DN nodes in the cluster and reports it to our monitoring setup which is integrated with pagerduty.
The script runs on one of our nodes and executes nodetool status every minute and reports for all down nodes.
Here is the script https://gist.github.com/johri21/87d4d549d05c3e2162af7929058a00d1
[1]:
How to setup a fail-over node for Cassandra Opscenter. The Opscenter data is stored on Opscenter node itself. So to setup a failover node i need to setup an Opscenter different from current Opscenter and sync Opscenter data and config files between Opscenters.
The stomp_interface on nodes in the cluster are pointed towards Opscenter_1 how will it change automatically to Opscenter_2 when failover occurs??
There are steps on the datastax documentation that have details for doing this. At a minimum:
Mirror the configuration directories stored on the OpsCenter primary to the OpsCenter backup using the method you prefer.
On the backup OpsCenter in the failover directory, create a primary_opscenter_location configuration file that indicates the IP address of the primary OpsCenter daemon to monitor
The stomp_interface setting on the agents gets changed (address.yaml file updated as well) when failover occurs. This is why the documentations recommend making sure there is no 3rd party configuration management on it.
3 things :
If you have firewall on, allow the corresponding ports to communicate (61620,61621,9160,9042,7199)
always verify IF the cassandra is up and running, so agent can actually connect to something.
stop the agent, check again the address.yaml, restart the agent.
I have install 3 arangodb servers. But i have always the same listening port 8529 no 8530 for coordinator so i cannot create a cluster.
tcp 0 0 0.0.0.0:8529 0.0.0.0:* LISTEN 13142/arangod
So when i try to create a cluster via the web interface, i have the following error
ERROR bootstrapping DB servers failed: Could not connect to 'tcp://10.0.0.18:8530' 'connect() failed with #111 - Connection refused'
How can i start and/or configure the corrdinator to have a listen on my servers?
Regards
Dispatcher based clusters
Please note that dispatcher based setups as you asked are intended for evaluation purposes only.
To start a cluster from the dispatcher webfrontend you need to configure all nodes to start the arangod daemon in dispatcher mode:
[cluster]
disable-dispatcher-kickstarter = no
disable-dispatcher-frontend = no
To start a cluster on a single machine you only need to install ArangoDB and reconfigure it once; it will then use the same installation to start the dispatcher and dbserver nodes.
One should know that the initial cluster startup may take a while.
Another side note is that authentication is not supported in this scenario, so you may need to turn it off.
You should now find the log output of the dbserver and coordinator instances under /var/log/arangodb/cluster/ so you can get the actual informations of what went wrong.
Script based cloud install clusters
A better way to get a cluster running in the cloud may be to use one of the scripts we prepared for Digital Ocean, Google Compute Engine, AWS or Azure.
ArangoDB Clusters based on Mesosphere DCOS
The currently recommended way of running an ArangoDB cluster is to use Mesosphere DCOS, as Max describes in these slides using some example configurations.
ArangoDB is an official Mesosphere partner and we offer an official DCOS subcommand to manage an ArangoDB Cluster on Mesosphere DCOS.
Mesosphere adds additional services on top of Mesos and eases management of the Mesos cluster via the dcos-cli.
If you want to use a raw Apache Mesos Cluster, you can use the Mesos framework directly to schedule the neccessary tasks to create ArangoDB cluster.
Meanwhile there is a better article about Running ArangoDB on DC/OS.
I am new to Cassandra and trying to setup monitoring to Cassandra production cluster.
Apart from monitoring using nodetool commands in crontab what else is recommended?
is it a general practice to use ganglia for monitoring?
can you direct me to a good resource on setting up monitoring in production.
we are using apache cassandra so opscenter was not very useful.
The free version of OpsCenter works with OSS Cassandra and most monitoring capabilities are available. You do miss a good amount of cluster management capabilities if you don't have DSE:
http://www.datastax.com/what-we-offer/products-services/datastax-opscenter/compare
I would like to know how OpsCenter communicates with its Agents and Cassandra Nodes.
Does it use Thrift? Is JMX required?
I'll base my answer on the latest released version of OpsCenter (1.3).
The main OpsCenter process can communicate with the agents in two ways. It can query the agents over an http rest api that each agent exposes. It uses this to ask the agent basic things about the cassandra node and also to have the agent send jmx commands to the cassandra process.
The other way is using the STOMP protocol. (http://stomp.github.com//) Agents send messages over STOMP to a message queue in OpsCenter. These generally contain details about the cassandra node and metric information.
Hope that helps.