Unknown peer xxx, excluding from schema agreement check - Azure

Since upgrading to Cassandra Java driver v4.x, we keep seeing the following messages in the client app logs:
[s1] Unknown peer xxx, excluding from schema agreement check
FWIW, xxx seems like a UUID, not an IP.
We are connecting to Azure Cosmos DB using the Cassandra Java driver v4.6.1. The message seems to be emanating from SchemaAgreementChecker, but it's pretty useless because it doesn't suggest any way to fix the supposed problem. After digging into the code, I think the problem is that the following query returns a new host_id each time it's executed.
SELECT host_id, schema_version FROM system.peers;
SchemaAgreementChecker.java#L143
It seems the driver is trying to match up the host_id received from peer gossip with the nodes received from InternalDriverContext. I'm not a Cassandra or Azure admin, so I'm not sure what the implications of this are, but given that this warning wasn't shown before, some assumption made in the code isn't holding up.
Any ideas on what could be done here to get rid of this message?

This is a problem with Azure Cosmos DB, which isn't a full implementation of Apache Cassandra but provides a CQL-like API.
The SchemaAgreementChecker class was added in Java driver 4.0 (JAVA-1638), and Azure Cosmos DB doesn't appear to be fully compliant with what it expects, so try using Java driver 3.x and it should work. Cheers!
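If downgrading isn't an option, another possibility is to disable the check itself. This is a sketch only, not verified against Cosmos DB: my reading of the driver's reference configuration is that setting the schema agreement timeout to zero makes the driver skip the check entirely, so SchemaAgreementChecker never runs and never logs the warning.
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;
import java.time.Duration;

// Assumption: a zero timeout skips the schema agreement check entirely.
DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
        .withDuration(DefaultDriverOption.CONTROL_CONNECTION_AGREEMENT_TIMEOUT, Duration.ZERO)
        .build();

CqlSession session = CqlSession.builder()
        .withConfigLoader(loader)
        .build();
Failing that, raising the log level for SchemaAgreementChecker in your logging configuration would at least hide the message.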

Related

Dealing with Azure Cosmos DB cross-partition queries in REST API

I'm talking to Cosmos DB via the (SQL) REST API, so existing questions that refer to various SDKs are of limited use.
When I run a simple query on a partitioned container, like
select value count(1) from foo
I run into an HTTP 400 error:
The provided cross partition query can not be directly served by the gateway. This is a first chance (internal) exception that all newer clients will know how to handle gracefully. This exception is traced, but unless you see it bubble up as an exception (which only happens on older SDK clients), then you can safely ignore this message.
How can I get rid of this error? Is it a matter of running separate queries by partition key? If so, would I have to keep track of what the existing key values are?

Persistent storage in JanusGraph using Cassandra

I'm playing with JanusGraph and a Cassandra backend, but I have some questions.
I have a Cassandra server running on my machine (using Docker) and in my API I have this code:
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.janusgraph.core.JanusGraphFactory;

// Open an embedded JanusGraph instance backed by Cassandra over CQL
GraphTraversalSource g = JanusGraphFactory.build()
        .set("storage.backend", "cql")
        .set("storage.hostname", "localhost")
        .open()
        .traversal();
Then, through my API, I'm saving and fetching data using Gremlin. It works fine, and I can see the data saved in the Cassandra database.
The problem comes when I restart my API and try to fetch the data. It is still stored in Cassandra, but the JanusGraph query returns empty results. Why?
Do I need to load the backend storage data into memory or something like that? I'm trying to understand how it works.
EDIT
This is how I add an item:
Vertex vertex = g.addV("User")
.property("username", username)
.property("email", email)
.next();
And to fetch all:
List<Vertex> all = g.V().toList();
Commit your Transactions
You are currently using JanusGraph embedded as a library in your application, which gives you access to the full JanusGraph API. This also means that you have to manage transactions on your own, including committing your transactions in order to persist your modifications to the graph.
You can simply do this by calling:
g.tx().commit();
after you have iterated your traversal with the modifications (the addV() traversal in your case).
Without the commit, the changes are only visible locally inside your transaction, which is why they are gone after you restart your API.
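Applied to the addV() traversal from the question, that looks like this (same g, username, and email as in your code):
Vertex vertex = g.addV("User")
        .property("username", username)
        .property("email", email)
        .next();
g.tx().commit(); // without this, the vertex only exists in the local transaction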
The Recommended Approach: Connecting via Remote
If you don't have a good reason to embed JanusGraph as a library in your JVM application, then it's recommended to deploy it independently as JanusGraph Server to which you can send your traversals for execution.
This has the benefit that you can scale JanusGraph independently of your application and also that you can use it from non-JVM languages.
JanusGraph Server then also manages transactions for you transparently by executing each traversal in its own transaction. If the traversal succeeds, the changes are committed; if an exception occurs, they are rolled back automatically.
The JanusGraph docs contain a section about how to connect to JanusGraph Server from Java but the important part is this code to create a graph traversal source g connected to your JanusGraph Server(s):
// The properties file points the driver at your JanusGraph Server(s)
Graph graph = EmptyGraph.instance();
GraphTraversalSource g = graph.traversal().withRemote("conf/remote-graph.properties");
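Once g is remote, the transaction handling described above is automatic, so a sketch of your two operations needs no explicit commit (labels and properties taken from the question):
// Each of these traversals runs in its own server-side transaction.
g.addV("User").property("username", username).property("email", email).next();
List<Vertex> all = g.V().hasLabel("User").toList(); // still there after a restart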
Of course, you can also start JanusGraph Server as a Docker container (mapping its default port 8182 so your application can reach it):
docker run --rm -p 8182:8182 janusgraph/janusgraph:latest
More information about the JanusGraph Docker image, and how it can be configured to connect to your Cassandra backend, can be found in the JanusGraph Docker documentation.
The part below is no longer directly relevant to this question, given the comments on the first version of this answer. I'm leaving it here in case others have a similar problem where this could actually be the cause.
Persistent Storage with Docker Containers
JanusGraph stores the data in your storage backend, which is Cassandra in your case. That means you have to ensure that Cassandra persists the data. If you start Cassandra in a Docker container, then you have to mount a volume where Cassandra stores its data so that it survives restarts of the container.
Otherwise, the data will be lost once you stop the Cassandra container.
To do this, you can start the Cassandra container for example like this:
docker run -v /my/own/datadir:/var/lib/cassandra -d cassandra
where /my/own/datadir is the directory of your host system where you want the Cassandra data to be stored.
This is explained in the docs of the official Cassandra Docker image under Caveats > Where to Store Data.

Error java.net.UnknownHostException while connecting Cassandra cluster

I am doing a PoC to connect to Cassandra from my Java 8 application code.
I am using Apache Cassandra with Java 8.
To start with, I looked at and started from
https://github.com/lankydan/datastax-java-driver
Trying to connect to my Cassandra cluster: when I download it and try to connect it to my C* cluster, I get Caused by: java.net.UnknownHostException: 10.24.78.22,10.24.78.108,10.24.79.173
Updated CassandraConfig:
.addContactPoints(host)
I updated the application.properties file:
cassandra.host=10.24.78.22,10.24.78.108,10.24.79.173
cassandra.cluster.name=My_Cluster
cassandra.port=9042
cassandra.keyspace=rrr_xxx
So what needs to be fixed, and how do I fix this issue?
The .addContactPoints function accepts an array of strings, InetAddress instances, hosts, etc., while you're passing a single string with multiple addresses inside it. You need to somehow convert this string into an array, or pass only one address.
If you're already modifying the code, it can simply be changed to:
.addContactPoints(host.split(","))
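For illustration, here is the fix in context (a sketch only; the builder usage follows driver 3.x, and the host string mirrors your application.properties):
import com.datastax.driver.core.Cluster;

String host = "10.24.78.22,10.24.78.108,10.24.79.173"; // cassandra.host
Cluster cluster = Cluster.builder()
        .addContactPoints(host.split(",")) // one contact point per address
        .withPort(9042)
        .build();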

Not able to query to two different cassandra cluster in same app

I created Cassandra sessions for two separate clusters using the DataStax driver in one Java application. The sessions are created successfully; however, when I query, the query on the first cluster (pick either one) executes successfully, while the query on the second cluster always fails with the error below. Please help me resolve this issue.
com.datastax.driver.core.exceptions.DriverInternalError: Tried to execute unknown prepared query 0x5f318143588bfa8c5deb2245224cf2da
Note: I have a requirement to connect to two separate clusters in the same app. Please don't ask why.
From the stack trace, it is likely that you are trying to execute on session 1 a BoundStatement that belongs to session 2. PreparedStatement and BoundStatement instances can only be used with the session that created them.
In your situation, you will need to prepare each statement you plan to use in your application on both sessions.
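A minimal sketch of what that looks like with driver 3.x (cluster1, cluster2, the keyspaces, and the CQL are placeholder names):
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

Session session1 = cluster1.connect("keyspace1");
Session session2 = cluster2.connect("keyspace2");

// Prepare the same CQL on each session; the resulting statements are not interchangeable.
PreparedStatement ps1 = session1.prepare("SELECT * FROM users WHERE id = ?");
PreparedStatement ps2 = session2.prepare("SELECT * FROM users WHERE id = ?");

session1.execute(ps1.bind(id)); // statements prepared on session1 run on session1
session2.execute(ps2.bind(id)); // and likewise for session2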

Validating Hector connection for Cassandra

Supposing I've connected to a cluster with
HFactory.getOrCreateCluster(cluster, address)
Is there a way to check later on whether or not I'm still connected? There doesn't seem to be an obvious way of doing that from looking through their javadocs.
One option might be to just fire a simple query at the cluster inside a try-catch block and see whether it returns properly or throws an exception.
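A sketch of that probe using Hector's API (the cluster name and address are placeholders; describeClusterName() is assumed here as a cheap call that round-trips to the cluster):
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.factory.HFactory;

Cluster cluster = HFactory.getOrCreateCluster("my-cluster", "127.0.0.1:9160");
try {
    cluster.describeClusterName(); // any lightweight round trip will do
    // still connected
} catch (Exception e) {
    // connection is gone; reconnect or fail fast here
}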
