Not able to query two different Cassandra clusters in the same app - cassandra

I created Cassandra sessions for two separate clusters using the DataStax driver in one Java application. Both sessions are created successfully; however, when I query, the query on the first cluster (pick either one) executes fine, while the query on the second cluster always fails with the error below. Please help me resolve this issue.
com.datastax.driver.core.exceptions.DriverInternalError: Tried to execute unknown prepared query 0x5f318143588bfa8c5deb2245224cf2da
Note: I have a requirement to connect to two separate clusters in the same app. Please don't ask why.

From the stack trace, it is likely that you are trying to execute on session 1 a BoundStatement that belongs to session 2. PreparedStatement and BoundStatement instances can only be used with the session that created them.
In your situation, you will need to prepare each statement you plan to use in your application on both sessions.
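For illustration, a minimal sketch against the 3.x driver that the stack trace suggests; the contact points and the query are placeholders:
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

Cluster cluster1 = Cluster.builder().addContactPoint("cluster1-host").build();
Cluster cluster2 = Cluster.builder().addContactPoint("cluster2-host").build();
Session session1 = cluster1.connect();
Session session2 = cluster2.connect();

// Prepare the same CQL separately on each session.
String cql = "SELECT * FROM ks.tbl WHERE id = ?";  // placeholder query
PreparedStatement ps1 = session1.prepare(cql);
PreparedStatement ps2 = session2.prepare(cql);

// A BoundStatement must go to the session that prepared it.
BoundStatement bs1 = ps1.bind(42);
session1.execute(bs1);           // OK
// session2.execute(bs1);        // fails: "unknown prepared query" ID
session2.execute(ps2.bind(42));  // OK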

Related

Which node will respond to "SELECT * FROM system.local" using the Cassandra Java driver?

I am trying to write some synchronization code for a Java app that runs on each of the Cassandra servers in our cluster (so each server has one Cassandra instance plus our app). For this I wanted to write a method that returns the 'local' Cassandra node, using the Java driver.
Every process creates a CqlSession using the local address as its contact point. The driver will figure out the rest of the cluster from that. But my assumption was that the local address would be its 'primary' node, at least for requests against the system.local table. Running the code shows this is not so.
Is there a way in the Java driver to determine which of the x nodes the process is running on?
I tried this code:
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.Row;
import com.datastax.oss.driver.api.core.metadata.Metadata;
import com.datastax.oss.driver.api.core.metadata.Node;
import java.util.Map;
import java.util.UUID;

public static Node getLocalNode(CqlSession cqlSession) {
    // All nodes the driver currently knows about.
    Metadata metadata = cqlSession.getMetadata();
    Map<UUID, Node> allNodes = metadata.getNodes();
    // system.local is answered by whichever node coordinates this query,
    // not necessarily the contact point.
    Row row = cqlSession.execute("SELECT host_id FROM system.local").one();
    UUID localUUID = row.getUuid("host_id");
    for (Node node : allNodes.values()) {
        if (localUUID.equals(node.getHostId())) {
            return node;
        }
    }
    return null;
}
But it seems to return random nodes, which makes sense if it just sends the query to one of the nodes in the cluster. I was hoping to find a way to determine which node the app is running on without hardcoded configuration.
my assumption was that the local address would be its 'primary' node, at least for requests against the system.local table. Running the code shows this is not so.
Correct. When running a query where token range ownership cannot be determined, a coordinator is "selected." There is a random component to that selection. But it does take things like network distance and resource utilization into account.
I'm going to advise reading the driver documentation on Load Balancing. This does a great job of explaining how the load balancing policies work with the newer drivers (>= 4.10).
In that doc you will find that query routing plans:
- are different for each query, in order to balance the load across the cluster;
- only contain nodes that are known to be able to process queries, i.e. neither ignored nor down;
- favor local nodes over remote ones.
As far as being able to tell which apps are connected to which nodes, try using the execution information returned by the result set. You should be able to get the coordinator's endpoint and hostId that way.
ResultSet rs = session.execute("select host_id from system.local");
Row row = rs.one();
System.out.println(row.getUuid("host_id"));
System.out.println();
System.out.println(rs.getExecutionInfo().getCoordinator());
Output:
9788de64-08ee-4ab6-86a6-fdf387a9e4a2
Node(endPoint=/127.0.0.1:9042, hostId=9788de64-08ee-4ab6-86a6-fdf387a9e4a2, hashCode=2625653a)
You are correct. The Java driver connects to random nodes by design.
The Cassandra drivers (including the Java driver) are configured with a load-balancing policy (LBP), which determines which nodes the driver contacts, and in which order, when it runs a query against the cluster.
In your case, you didn't configure a load-balancing policy so it defaults to the DefaultLoadBalancingPolicy. The default policy calculates a query plan (list of nodes to contact) for every single query so each plan is different across queries.
The default policy computes a query plan from the list of available nodes (down or unresponsive nodes are excluded) and prioritises replicas (nodes which own the data) in the local DC over non-replicas, meaning replicas are contacted as coordinators before other nodes. If there are two or more replicas available, they are ordered "healthiest" first. The list in the query plan is also shuffled for randomness, so the driver avoids contacting the same node(s) all the time.
Hopefully this clarifies why your app doesn't always hit the "local" node. For more details on how it works, see Load balancing with the Java driver.
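For completeness, in driver 4.x the local datacenter that the default policy relies on is declared when the session is built. A minimal sketch, with a placeholder contact point and DC name:
import com.datastax.oss.driver.api.core.CqlSession;
import java.net.InetSocketAddress;

CqlSession session = CqlSession.builder()
        .addContactPoint(new InetSocketAddress("127.0.0.1", 9042))
        .withLocalDatacenter("datacenter1")  // placeholder DC name
        .build();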
I gather from your post that you want to circumvent the built-in load-balancing behaviour of the driver. It seems like you have a real edge case that I haven't come across, and I'm not sure what outcome you're after. If you tell us what problem you are trying to solve, we might be able to provide a better answer. Cheers!

Sharing a Hazelcast cache between multiple applications while using write-behind and read-through

Question - Can I share the same Hazelcast cluster (cache) between multiple applications while using the write-behind and read-through functionality via map stores and map loaders?
Details
I have an enterprise environment with multiple applications and want to use a single cache.
I have multiple applications (microservices), i.e. APP_A, APP_B and APP_C, independent of each other.
I am running one instance of each application, and each node is a member node of the cluster.
APP_A has MAP_A, APP_B has MAP_B and APP_C has MAP_C. Each application has a MapStore for its respective map.
If a client sends a command instance.getMap("MAP_A").put("Key","Value"), the behavior is inconsistent: sometimes the data is persisted in the database and sometimes not.
Note - I want to use the same Hazelcast instance across all applications, so that app A can access data from app B and vice versa.
I am assuming this depends on which node handles the request. If the request is handled by node A then it works fine, but it fails if the request is handled by node B or C. I am assuming this is because the MapStore implementation for MAP_A is not available on nodes B and C.
Am I doing something wrong? Is there something we can do to overcome this issue?
Thanks in advance.
Hazelcast is a clustered solution. If you have multiple nodes in the cluster, the data in each may get moved from place to place when data rebalancing occurs.
As a consequence of this, map store and map loader operations can occur from any node.
So all nodes in the cluster need the same ability to connect to the database.
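One way to meet that requirement, assuming the standard MapStoreConfig API (the store class name below is a placeholder): give every member the identical map-store configuration and put all three MapStore classes on every member's classpath.
import com.hazelcast.config.Config;
import com.hazelcast.config.MapStoreConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

Config config = new Config();

// Identical on every member: MAP_A's store, write-behind with a 5 s delay.
MapStoreConfig storeA = new MapStoreConfig()
        .setEnabled(true)
        .setClassName("com.example.MapAStore")  // placeholder store class
        .setWriteDelaySeconds(5);               // 0 would mean write-through
config.getMapConfig("MAP_A").setMapStoreConfig(storeA);
// ...repeat for MAP_B and MAP_C with their own store classes...

HazelcastInstance member = Hazelcast.newHazelcastInstance(config);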

How to query Cassandra from a certain node and get only the data that node owns?

Cassandra uses consistent hashing to manage data, and after we use a Cassandra driver to connect to the cluster, the node we connect to may query other nodes in the cluster to get the result. But for my current situation, I'm doing some testing for my algorithm: I want to specify a token range and query the data in that range on a certain node, and if some data in the token range isn't on that node, I don't want the node to query other nodes for it. Is that possible, and how can I achieve it?
I found Cassandra Python driver: force using a single node, but that solution only makes the client's connection pool connect to a certain node; the node will still query other nodes.
Use the WhiteListRoundRobinPolicy and CL.ONE as linked in the other question.
You can also extend the Statement to include a host, plus a custom load-balancing policy that sends the request to the host in the wrapper. Extend a policy and override make_query_plan, something like this (untested scratch code, treat it as pseudocode):
from cassandra.policies import DCAwareRoundRobinPolicy

class StatementSingleHostRouting(DCAwareRoundRobinPolicy):
    def make_query_plan(self, working_keyspace=None, query=None):
        # If the (extended) statement carries a target host, route only to it.
        if query is not None and getattr(query, "host", None):
            return [query.host]
        return DCAwareRoundRobinPolicy.make_query_plan(
            self, working_keyspace, query)
If that host doesn't own the data it will still query other replicas though.
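For comparison, the Java driver 4.x offers a similar escape hatch: Statement.setNode pins the coordinator for a single statement, though, as above, that coordinator may still fetch data it doesn't own from replicas. A rough sketch, where session, targetNode and the token bounds are assumed to exist and the table name is a placeholder:
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

// targetNode is a Node picked from session.getMetadata().getNodes()
SimpleStatement stmt = SimpleStatement
        .newInstance("SELECT * FROM ks.tbl WHERE token(pk) > ? AND token(pk) <= ?",
                rangeStart, rangeEnd)  // placeholder token bounds
        .setNode(targetNode);          // bypasses the load-balancing policy
session.execute(stmt);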

Need more insight into Hazelcast Client and the ideal scenario to use it

There is already a question on the difference between Hazelcast Instance and Hazelcast client.
And it is mentioned that
HazelcastInstance = HazelcastClient + AnotherFeatures
So is it right to say the client just reads from and writes to the formed cluster without getting involved in it, i.e. the client does not store data?
This is important to know since we can configure JVM memory according to usage: the instances forming the cluster would be allocated more memory than the ones that merely connect as clients.
It is a little bit more complicated than that. The Hazelcast Lite Member is a full-blown cluster member that simply gets no partitions assigned. That means it doesn't store any data, but otherwise it behaves like a normal member.
Clients on the other side are simple proxies that have to forward everything to one cluster member to get any operation done. You can imagine a Hazelcast client to be something like a JDBC client, that has just enough code to connect to the cluster and redirect requests / retrieve responses.
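A minimal sketch of the three flavours side by side, assuming default network configuration:
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

// Full member: joins the cluster and owns partitions (stores data).
HazelcastInstance member = Hazelcast.newHazelcastInstance(new Config());

// Lite member: joins the cluster but is assigned no partitions.
HazelcastInstance lite = Hazelcast.newHazelcastInstance(new Config().setLiteMember(true));

// Client: not a member; forwards every operation to a cluster member.
HazelcastInstance client = HazelcastClient.newHazelcastClient();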

How Datastax PreparedStatements work

When we create a PreparedStatement object, is it cached on the server side? How is that different from a PreparedStatement in the Oracle driver? If a prepared statement is reused, what data is sent to the Cassandra server, only the parameter values?
From what I understand, one Session object in the Java driver holds multiple connections to multiple nodes in the cluster. If we reuse the same prepared statement across multiple threads in our application, does that mean we only use one connection to one Cassandra node? I guess preparing the statement is done on one connection only... And what happens when the routing key is changed by each execute call?
What are the benefits of using prepared statements?
Thank you
Yes, only the statement ID and parameters need to be sent after preparing the statement.
The driver tracks statement IDs for each server in its connection pool; it's transparent to your application.
The benefit is improved performance from not having to re-compile the statement for each query.
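A typical prepare-once, bind-many pattern with the 4.x driver, where session is an existing CqlSession and the schema is a placeholder:
import com.datastax.oss.driver.api.core.cql.BoundStatement;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.ResultSet;

// Prepare once: the server parses the query and returns a statement ID.
PreparedStatement prepared = session.prepare(
        "SELECT * FROM ks.users WHERE user_id = ?");  // placeholder schema

// Execute many times: only the ID and the bound values go over the wire.
BoundStatement bound = prepared.bind(userId);
ResultSet rs = session.execute(bound);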
