I created a Cassandra cluster and session:
import com.datastax.driver.core.Cluster

// Build a cluster handle for the local node and open a session on it.
val newSession = Cluster.builder().addContactPoint("localhost")
  .build().newSession()
newSession.execute("SELECT * from workout.competitorsentity")
return newSession
However, when I open JConsole for this JVM, I don't see the Cassandra JVM metrics.
What am I doing wrong?
You can't see Cassandra metrics through the driver: the driver exposes only its own metrics, and they are under the cluster1-metrics part of the tree (see the driver documentation on metrics for more details).
If you need to see Cassandra metrics, you need to connect JConsole to the Cassandra process(es) themselves. By default JMX is available only on the localhost interface, so you may need to expose it externally if you're connecting from another machine.
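To see why JConsole shows no Cassandra metrics for the client JVM, it can help to list the JMX domains that JVM actually registers. The sketch below uses only the JDK; the class and method names are made up for illustration, and the "org.apache.cassandra" prefix check assumes the server registers its MBeans under domains such as org.apache.cassandra.metrics.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;
import javax.management.MBeanServer;

// Illustrative helper, not part of the driver or of Cassandra.
final class LocalMBeanDomains {
    /** Returns the sorted JMX domains registered in this JVM's platform MBean server. */
    public static List<String> localDomains() {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        String[] domains = server.getDomains();
        Arrays.sort(domains);
        return Arrays.asList(domains);
    }

    /** True if any domain looks like one of Cassandra's server-side domains (assumed prefix). */
    public static boolean hasCassandraDomains(List<String> domains) {
        for (String domain : domains) {
            if (domain.startsWith("org.apache.cassandra")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> domains = localDomains();
        System.out.println("Domains in this JVM: " + domains);
        System.out.println("Cassandra MBeans present: " + hasCassandraDomains(domains));
    }
}
```

Run inside the client application, this should list the JVM's own domains (and the driver's metrics domain if JMX reporting is enabled) but nothing from the Cassandra server, which lives in a separate process.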
Typically, port 9042 is provided along with the Cassandra cluster seed hosts to connect to the cluster for performing CRUD operations.
Does the Cassandra Java client driver have knowledge of port 7000 (used for peer communication) after the client establishes a connection with the Cassandra cluster?
Thanks,
Deepak
The Java driver doesn't make use of the internode communication port 7000 because it doesn't need to participate in gossip with the nodes in the cluster.
Instead, the Java driver establishes a control connection with one of the nodes the first time it connects to the cluster. The driver uses the control connection (1) to query the system tables to discover the cluster's topology and schema, and (2) to listen for topology and schema changes.
It is through point (2) that the driver recognises when nodes are added to or decommissioned from the cluster, and finds out when the schema has been updated. This is the reason the driver doesn't need to gossip with the nodes.
For more information, see Control connection for the Java driver. Cheers!
No, the Java driver doesn't know and should not know about in-cluster node-to-node communications. Why should it, anyway?
Is the integration of open-source Cassandra with open-source Spark through the community edition of the spark-cassandra-connector node-aware, or is this feature reserved for Enterprise editions only?
By node awareness I mean whether Spark will send job execution to the nodes owning the data.
Yes, the Spark connector is node-aware and will function in that manner with both DSE and (open source) Apache Cassandra.
In fact, on a SELECT it knows how to hash the partition keys to a token and send queries on specific token ranges only to the nodes responsible for that data. It can do this because, like the Cassandra Java driver it is built on, it reads the cluster's metadata (it does not take part in node-to-node gossip itself) and can see things like node status (up/down) and token range assignment.
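The idea of routing a token to the node responsible for it can be illustrated with a toy ring. This is only a sketch under simplifying assumptions: the tokens and node names are invented, there is no partitioner (a real cluster hashes the partition key, with Murmur3 by default) and no replication, and it is not the connector's actual code.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy token ring: each node owns the range (previous token, its own token].
final class TokenRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addNode(long token, String node) {
        ring.put(token, node);
    }

    /** Primary owner of a token: the first node at or after it on the ring, wrapping around. */
    public String ownerOf(long token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(token);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Three hypothetical nodes with made-up tokens. */
    public static TokenRingSketch demoRing() {
        TokenRingSketch sketch = new TokenRingSketch();
        sketch.addNode(-100L, "node-a");
        sketch.addNode(0L, "node-b");
        sketch.addNode(100L, "node-c");
        return sketch;
    }

    public static void main(String[] args) {
        TokenRingSketch sketch = demoRing();
        for (long token : new long[] {-150L, -100L, 50L, 150L}) {
            System.out.println("token " + token + " is owned by " + sketch.ownerOf(token));
        }
    }
}
```

A token past the highest node token wraps around to the node with the lowest token, which is why the lookup falls back to the first entry.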
In Spark it is referred to as data locality.
Data locality can only be achieved if both the JVMs for Cassandra and the Spark worker/executor are co-located within the same OSI (operating system instance). By definition, data can only be local if the executors doing the processing are running on the same server (OSI) as the Cassandra node.
During the initial contact with the cluster, the driver retrieves information about the cluster topology -- available nodes, rack/DC configuration, token ownership. Since the driver is aware of where nodes are located, it will always try to connect to the "closest" node in the same (local) data centre.
If the Spark workers/executors are co-located with the Cassandra node, the Spark-Cassandra-connector will process Spark partitions on the nodes that own the data where possible to reduce the amount of data shuffling across the network.
There are methods such as joinWithCassandraTable() which maximise data locality where possible. Additionally, the repartitionByCassandraReplica() method splits the Spark partitions so they are mapped to Cassandra replicas that own the data.
This functionality works for both open-source Cassandra clusters and DataStax Enterprise clusters. Cheers!
How can we use the DataStax Java Driver to detect down nodes in a Cassandra cluster? Is the driver's metadata updated continuously, or do we have to register listeners?
The driver keeps track of node state itself: it learns about status changes from the cluster over its control connection rather than by taking part in gossip. If a node is down, it'll know it and not route traffic to it. No worries about engineering anything to do it yourself.
I am trying to collect Cassandra cfstats information from all the machines using JMX. This can be done using OpsCenter, but I do not want to use it, so I started building my own utility. For now, my Java program connects to JMX and fetches cfstats information such as estimated keys, number of SSTables, etc.
My requirement is that this Java jar file will run from one Cassandra node and should be able to connect to all the machines and fetch cfstats through each node's own JMX.
I am planning to use the Java driver for this, as it can connect to all the machines in the cluster using the system.peers column family. Once the Java driver connects to the machines, I will form the service:jmx:rmi URL using the respective hostname and JMX port (7199). Then I will be able to connect to NodeProbe using this information.
My question is: after connecting to another node using the Java driver, will I be able to retain state there, and after forming the service:jmx:rmi URL, will this URL really connect to the current node's JMX and pull cfstats information from the current node? I ask because the JMX host name is taken from the cassandra-env.sh file. Can someone please help me with this?
Does this idea work, or is there a better way to achieve this?
It's possible to use JMX remotely, but that's not the easiest thing to do.
But if you are writing your own tool, it may be worth considering a different kind of connection. For example, you can easily expose JMX calls over HTTP using Jolokia.
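For the plain remote-JMX route described in the question, a rough sketch of the per-node connection might look like the following. It assumes the conventional service:jmx:rmi:///jndi/rmi://host:port/jmxrmi URL layout, the JMX port 7199 from the question, remote JMX enabled without authentication, and that Cassandra's table MBeans live under the org.apache.cassandra.db domain; the class and method names are invented for illustration. Each JMX connection targets the host named in its URL and is independent of any driver session.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper; host names would come from wherever you discover the nodes.
final class PerNodeJmxSketch {
    static final int JMX_PORT = 7199; // port mentioned in the question

    /** Builds the JMX service URL string for one node (IPv6 literals are not handled). */
    public static String jmxUrlFor(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /** Builds one URL per host, each pointing at that host's own JMX endpoint. */
    public static List<String> jmxUrlsFor(List<String> hosts) {
        List<String> urls = new ArrayList<>();
        for (String host : hosts) {
            urls.add(jmxUrlFor(host, JMX_PORT));
        }
        return urls;
    }

    /** Connects to one node and counts MBeans in the assumed org.apache.cassandra.db domain. */
    public static int countCassandraDbMBeans(String host) throws Exception {
        JMXServiceURL url = new JMXServiceURL(jmxUrlFor(host, JMX_PORT));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            Set<ObjectName> names =
                    connection.queryNames(new ObjectName("org.apache.cassandra.db:*"), null);
            return names.size();
        }
    }

    public static void main(String[] args) throws Exception {
        // Pass node host names as arguments; with none, nothing is contacted.
        for (String host : args) {
            System.out.println(host + ": " + countCassandraDbMBeans(host) + " MBeans");
        }
    }
}
```

Because the URL carries the host, looping over the node list gives one separate connection per node; whether those connections succeed depends on how JMX is exposed on each machine.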
In the Cassandra API, which class starts the JMX server thread?
Is it StandardMBean or StorageService?
The JVM starts JMX services automatically, Cassandra just registers MBeans for different services, which automatically results in them being exposed through JMX.
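The registration step described above can be sketched with the JDK alone. The MBean below is hypothetical (its name, object name, and fixed value are made up and are not Cassandra's); it only shows that registering an object with the platform MBean server is enough to make its getters readable as JMX attributes.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class MBeanRegistrationSketch {
    /** Management interface; a standard MBean needs a public interface named <Class>MBean. */
    public interface TableStatsMBean {
        int getLiveSSTableCount();
    }

    /** Hypothetical service object whose getter becomes a JMX attribute. */
    public static final class TableStats implements TableStatsMBean {
        @Override
        public int getLiveSSTableCount() {
            return 3; // fixed demo value
        }
    }

    /** Registers the demo MBean (if absent) and reads its attribute back through JMX. */
    public static Object registerAndRead() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("com.example.sketch:type=TableStats");
            if (!server.isRegistered(name)) {
                server.registerMBean(new TableStats(), name);
            }
            // The attribute name is derived from the getter: getLiveSSTableCount -> LiveSSTableCount.
            return server.getAttribute(name, "LiveSSTableCount");
        } catch (Exception e) {
            throw new IllegalStateException("JMX registration sketch failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("LiveSSTableCount via JMX = " + registerAndRead());
    }
}
```

No server thread is started by this code; the attribute becomes visible to any JMX client attached to the JVM simply because the object was registered.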