I am trying to get to know Cassandra cfstats information from all the machines using JMX. This can be done using OpsCenter, but I do not want to use it. I started building my own utility. For now, my java program connects to JMX and fetching cfstats information such as estimateKeys, No of SSTables ..etc.
My requirement is, This is a java jar file, will run from one Cassandra node and should be able to connect to all the machines and fetch cfstats using their respective JMX per node.
I am planning to use java driver for this, as java driver will be able to connects all the machines in the cluster using system.peers coumnFamily. Once java driver connect to the machines, I will form the service:jmx:rmi using respective hostname and JMX port(7199). Then I will be able to connect to NodeProbe using this information.
My question is, after connecting to the another node using java driver, will I be able to retain state there and after forming service:jmx:rmi url, will this url really connects to the current node JMX and pull cfstats information from the current node. Because JMX host name it will take from the Cassandra-env.sh file. Can some one please help me in this.
Does this idea works or is there another best way to achieve this?
It's possible to use JMX remotely, but that's not the easiest thing to do.
But if you are writing your tool - maybe it's worth to check out a different connection. E.g. you can easily convert JMX calls to HTTP using Jolokia
Related
I need to provide similar utility functions such as is available through
nodetool tablestats
I've gone over their source code but didn't find a convenient solution to accessing it through code.
Is there a library available for this?
https://github.com/mariusae/cassandra/blob/master/src/java/org/apache/cassandra/tools/NodeProbe.java
The nodetool utility is connecting Cassandra via JMX and fetch all necessary data from corresponding beans. You can fetch necessary data from your program via JMX as well, but I wouldn't say that this is recommended way to do - it's better to setup some "standard" monitoring solution, like, Prometheus, connect it to Cassandra, and fetch data via it...
The datastax has restricted the use of opscenter for enterprise users only. Is there any way or possibility that even an open source casandra user can have access to the opscenter? Kindly check the follwing image and let me know if there is any possibility of the usage or opscenter?
I am trying to connect it but gettng the following error:
No. However you could consider creating a monitor via the metrics exposed by JMX instead.
In other dbs, we connect to db cluster with load balance IP. How do we connect to cassandra cluster using command line? What socket is used? Is this always a single node and IP?
What if i connect o node1, and node1 goes down. Will this automatically connect to node2 or node3?
You have several options: the easiest one is to use the Cassandra Query Language Shell (CQLSH), which is a python based CQL interpreter to interact with Cassandra. It usually comes with every Cassandra installation, under the /bin folder of the installation directory. If you have ssh access to one of the nodes Cassandra is running onto, this can be an easy option (you will avoid any issues related to firewall blocking incoming connections to your cluster).
You can also use cqlsh to access remotely to the cluster:
cqlsh node_ip 9043
but this will require cqlsh to be present on your machine.
In general, Cassandra uses an initial set of contact nodes and a gossip protocol to contact and learn the cluster composition. You will be assigned a node as coordinator for your query. You may not worry about seed nodes being currently down, provided that at least one is up and running.
Another option to access remotely to the cluster is the Datastax DevCenter,which is a free-to-use grafical interface to execute CQL queries.
Hope this helps
I have a 5 node Cassandra cluster set up on EC2, all in the same region.
If I connect over cqlsh (9160), queries respond in under a second.
When I connect via Dev Center, or using the native Java Driver, both of which use port 9042, the queries take over 20 seconds to respond.
They consistently respond in the same 21 second region. Never fast and then slow.
I have set up a few Cassandra Clusters on EC2 and have seen this before but do not know how to fix the problem. The last time, I scrapped the cluster and built a new one and the response time on port 9042 was fine.
Any help in how to debug or fix this problem would be appreciated, thanks.
The current version of DevCenter was designed to support as main scenario running (longish) CQL scripts (vs an interactive console with queries executed one after another). DevCenter is using as an underlying connector the DataStax Java driver for Cassandra.
For the above mentioned scenario, in order to ensure there are no "conflicts", a new Session is created for each execution. When a Session is initialized, the driver performs an auto-node discovery, creates connection pools, etc. Basically it does a lot of preparation work. Depending on the latency from your client machine to the EC2 nodes, the size of the cluster and also the configuration of these nodes (see the connection requirements), this initialization phase can be quite expensive.
As you can imagine the time spent preparing wouldn't represent a large percentage of running a DDL script and a decent size of inserts/updates. But for an interactive scenario, it will result in a suboptimal behavior (the one you are describing)
The next version(s) of DevCenter will address the interactive scenario and optimize for it so the user experience would be what you'd expect. And supporting this scenario is pretty high on our list of priorities.
The underlying Java driver obtains the whole cluster topology when it initially connects. This enables it to automatically connect to any node in the cluster. On EC2 it only obtains the private addresses, tries each one, and then times out. It then sends the request over the initial connection
I am trying to learn Cassandra and have setup a 2 node Cassandra cluster. I have written a client in Java using cassandra jdbc driver, which currently connects to a hard coded single node in the cluster. Ideally, I would like my client to connect to the "cluster" rather then a specific node.
So that client code automatically connects to other node if the first node is down.
Is this possible using cassandra jdbc driver? Currently using below code to create connection
DriverManager.getConnection("jdbc:cassandra://localhost:9160/testdb");
Yes. If you're using the Datastax Java driver, you can get all of these benefits and more. From the documentation:
The driver has the following features:
connection pooling
node discovery
automatic failover
load balancing
What is your language? If you're using Java, I suggest for Hector framework.
http://hector-client.github.io/hector/build/html/index.html
I think it's very good for correspond on Cassandra db.