Is it possible to retrieve token-to-node assignment information (a.k.a. the ring state) via the Thrift or CQL API? I am looking for output similar to what the nodetool ring command returns. I need it to optimize a client application so that it goes directly to the node that contains the requested data, thereby saving one network hop.
The Thrift interface has the method describe_ring, which gives you back this information.
In CQL this information is in the system.peers table:
select * from system.peers;
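If you are on the DataStax Java driver, you may not need to query this yourself: the driver already builds a token map from the cluster metadata. Here is a minimal sketch, assuming driver 3.x, a node at 127.0.0.1, and a hypothetical keyspace my_ks with a text partition key:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Metadata;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Set;

public class ReplicaLookup {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build()) {
            Metadata metadata = cluster.getMetadata();

            // A text partition key serializes as its UTF-8 bytes.
            ByteBuffer key = ByteBuffer.wrap("some-key".getBytes(StandardCharsets.UTF_8));

            // The driver derives this from the same ring state that is
            // exposed in system.local and system.peers.
            Set<Host> replicas = metadata.getReplicas("my_ks", key);
            for (Host host : replicas) {
                System.out.println(host.getAddress() + " tokens=" + host.getTokens());
            }
        }
    }
}

In practice the TokenAwarePolicy (discussed further down) does this lookup for you on every request.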
From the documentation it is clear that in Cassandra 4.0 virtual tables are read-only; no writes are allowed.
Currently there are two virtual keyspaces, system_views and system_virtual_schema, which contain 17 tables between them.
They expose data such as clients, caches, settings, etc.
Where exactly does the data in the virtual tables come from?
Here are all vtables: https://github.com/apache/cassandra/tree/64b338cbbce6bba70bda696250f3ccf4931b2808/src/java/org/apache/cassandra/db/virtual
PS: I have gone through cassandra.yaml.
Reference: https://cassandra.apache.org/doc/latest/new/virtualtables.html
The virtual tables expose metrics and metadata that were previously only available via JMX but are now also available via CQL. The data comes from the node's own in-memory state (metrics, caches, settings, and so on) rather than from SSTables, which is also why the tables are read-only.
For example, the system_views.clients table tracks metadata on client connections including (but not limited to):
the remote IP address of the client
logged in user (if auth is enabled)
protocol version
driver name & version
whether SSL is in use, etc.
This data is available via JMX and nodetool clientstats, and is now retrievable via CQL (I wrote about this in https://community.datastax.com/questions/6113/). Cheers!
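For example, here is a minimal sketch of reading the clients table over CQL with the Java driver 3.x (column names as in the Cassandra 4.0 schema; note that virtual tables are node-local, so you only see the clients connected to the node that coordinates the query):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class ClientStats {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            // Served from the coordinator's in-memory connection registry,
            // not from SSTables.
            ResultSet rs = session.execute("SELECT * FROM system_views.clients");
            for (Row row : rs) {
                System.out.printf("%s:%d user=%s driver=%s/%s ssl=%b%n",
                        row.getInet("address"), row.getInt("port"),
                        row.getString("username"), row.getString("driver_name"),
                        row.getString("driver_version"), row.getBool("ssl_enabled"));
            }
        }
    }
}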
I don't understand the load balancing algorithm in Cassandra.
It seems that the TokenAwarePolicy can be used to route my request to a node holding the data. In particular, the documentation (https://docs.datastax.com/en/developer/java-driver/3.6/manual/load_balancing/) states that this works when the driver is able to compute a routing key automatically. If it can, I am routed to a replica holding the data; if not, I am routed to some other node. I can still specify the routing key myself if I really want to reach the data without any extra hop.
What does not make sense to me:
If the driver cannot calculate the routing key automatically, why can the coordinator? Does it have more information than the client driver? Or does the coordinator node then ask every other node in the cluster on my behalf? That would not scale, right?
I thought the gossip protocol is used to share the topology of the ring among all nodes (AND the client driver). The client driver then has the complete ring structure and should be on equal footing with any 'hop' node.
Load balancing makes sense to me when the client driver determines the N replicas holding the data and then prioritizes them (host distance, etc.), but it doesn't make sense to me when I reach a random node that is unlikely to have my data.
Token-aware load balancing happens only for statements that can carry routing information. For example, for prepared queries the driver receives metadata from the cluster about the fields in the query, including the partition key(s), so it is able to calculate the token for the data and select the node. You can also specify the routing key yourself, and the driver will send the request to the corresponding node.
It's all explained in the documentation:
For simple statements, routing information can never be computed automatically
For built statements, the keyspace is available if it was provided while building the query; the routing key is available only if the statement was built using the table metadata, and all components of the partition key appear in the query
For bound statements, the keyspace is always available; the routing key is only available if all components of the partition key are bound as variables
For batch statements, the routing information of each child statement is inspected; the first non-null keyspace is used as the keyspace of the batch, and the first non-null routing key as its routing key
When a statement doesn't have routing information, the request is sent to a node selected by the nested (child) load balancing policy. That node acts as the coordinator: it parses the statement, extracts the necessary information, calculates the token, and forwards the request to the correct node. (The coordinator can always do this because it holds the full schema and token metadata locally; no extra cluster-wide lookup is needed.)
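To make this concrete, here is a minimal sketch, assuming Java driver 3.x and a hypothetical table my_ks.users whose full partition key is the id column. Because the statement is prepared, the driver has the table metadata and can compute the routing key on every execution:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class TokenAwareExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                // TokenAwarePolicy wraps a child policy; the child orders the
                // replicas and serves as the fallback when no routing key is
                // available.
                .withLoadBalancingPolicy(new TokenAwarePolicy(
                        DCAwareRoundRobinPolicy.builder().build()))
                .build();
             Session session = cluster.connect("my_ks")) {

            // Prepare once: the cluster returns metadata telling the driver
            // that "id" is the whole partition key.
            PreparedStatement ps = session.prepare("SELECT * FROM users WHERE id = ?");

            // Every bound execution now carries a routing key, so the request
            // goes straight to a replica instead of a random coordinator.
            session.execute(ps.bind("user-42"));
        }
    }
}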
I'm interested in finding out which Cassandra replica responded to a read or write request performed at the ONE consistency level. Is there some way I can do this?
Running your queries with TRACING ON will get you that information. If you are using the Java driver, most of the trace information can be fetched via the ExecutionInfo class, which you can get by calling ResultSet.getExecutionInfo(). Otherwise, query the system_traces keyspace as the documentation suggests.
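For example, here is a minimal sketch with the Java driver 3.x (table and key are hypothetical): enable tracing on the statement, then read the coordinator and the per-replica trace events from the ExecutionInfo:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ExecutionInfo;
import com.datastax.driver.core.QueryTrace;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class TraceExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {
            SimpleStatement stmt =
                    new SimpleStatement("SELECT * FROM my_ks.users WHERE id = 'user-42'");
            stmt.enableTracing();

            ResultSet rs = session.execute(stmt);
            ExecutionInfo info = rs.getExecutionInfo();
            System.out.println("Coordinator: " + info.getQueriedHost());

            // getQueryTrace() fetches the trace rows from system_traces; the
            // event sources show which replica(s) actually served the request.
            QueryTrace trace = info.getQueryTrace();
            for (QueryTrace.Event event : trace.getEvents()) {
                System.out.println(event.getSource() + ": " + event.getDescription());
            }
        }
    }
}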
We have a 2-node Cassandra cluster. The replication factor is 1 and the consistency level is 1. We are not using replication because each record we insert is very large.
How does Cassandra react when a node is down while a write destined for that node is being performed? We are using the Hector API from a Java client.
My understanding is that Cassandra will perform the write on the other node, the one that is running.
No. With CL.ONE the write will not be performed if the inserted data belongs to the token range of the downed node. The consistency level defines how many replica nodes have to respond for the request to be accepted.
If you want to be able to write even if the replica node is down, you need to use CL.ANY. ANY makes sure that the coordinator stores a hint for the request. Hints are stored in the system.hints table. After the replica comes back up, all hints will be processed and sent to the returning node.
Edit
You will receive the following error:
com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive)
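The question uses Hector, but the idea is the same in any client. Here is a minimal sketch with the DataStax Java driver (which matches the exception above), using a hypothetical my_ks.users table: set the consistency level per statement and handle UnavailableException:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.UnavailableException;

public class WriteDespiteDownNode {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) {
            SimpleStatement insert = new SimpleStatement(
                    "INSERT INTO users (id, name) VALUES ('user-42', 'Jane')");

            // CL.ANY lets the coordinator store a hint if the owning replica
            // is down; the write is replayed when that node comes back.
            insert.setConsistencyLevel(ConsistencyLevel.ANY);

            try {
                session.execute(insert);
            } catch (UnavailableException e) {
                // This is what CL.ONE throws when the only replica (RF = 1)
                // of the token range is down.
                System.err.println("Not enough live replicas: " + e.getMessage());
            }
        }
    }
}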
When using TokenAwarePolicy as the load balancing policy in Cassandra, are all queries automatically sent to the correct node, i.e. the one containing the replica (e.g., will select * from Table where partitionkey = something automatically hash the key and go to the correct replica), or do I have to use the token() function in all my queries?
That is correct: the TokenAwarePolicy will allow the driver to prefer a replica for the given partition key as the coordinator for the request when possible, so you do not need the token() function.
Additional information about load balancing with the Java driver is available on the LoadBalancingPolicy API page.
Specifically, the API documentation for TokenAwarePolicy is available here.
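If you ever hit a case where the driver cannot compute the routing key (e.g., a plain simple statement), you can set it yourself instead of resorting to token(). A minimal sketch, assuming driver 3.x and a hypothetical table my_ks.users with a text partition key id:

import com.datastax.driver.core.SimpleStatement;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ExplicitRoutingKey {
    public static SimpleStatement statementWithRoutingKey() {
        SimpleStatement stmt = new SimpleStatement(
                "SELECT * FROM my_ks.users WHERE id = 'user-42'");
        // A text partition key serializes as its UTF-8 bytes. With the routing
        // key set, TokenAwarePolicy can pick a replica even for a simple statement.
        stmt.setKeyspace("my_ks");
        stmt.setRoutingKey(ByteBuffer.wrap("user-42".getBytes(StandardCharsets.UTF_8)));
        return stmt;
    }
}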