Upgrade issue in application querying schema_keyspaces and using RoundRobin policy - Cassandra

In version 1 of our Java application, which runs against Cassandra 2.1, we execute the query "SELECT * from system.schema_keyspaces;" at startup to get keyspace info (if this fails, the application won't start).
The new application version, however, gets the keyspace information from the driver's cluster.metadata instance and runs against Cassandra 3.11.
We are using the DC-aware round-robin load balancing policy of the DataStax Java driver.
Now consider an upgrade scenario with 3 nodes A, B and C: A is upgraded (new application + Cassandra 3.11), the upgrade on B is in progress (its Cassandra is down), C is not yet upgraded (old application + Cassandra 2.1), and the client application on node C restarts.
I get an InvalidQueryException when the old query from the Java client on node C is executed against A (the client distributes queries in a round-robin way). There is no handling for this failure in the old application. How can we resolve this issue?
com.datastax.driver.core.exceptions.InvalidQueryException: un-configured table schema_keyspaces
One way I figured out: remove A's IP from the client application's contact points and from the peers table on C's Cassandra node, restart the client application, and then let Cassandra restore the peers table entry.
The other way is to keep restarting the client application on C until its startup query actually hits the Cassandra 2.1 node and the application starts successfully. But that seems ugly to me.
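(For reference, the metadata-based lookup that the new application version uses looks roughly like this; a sketch against the driver 3.x API, with a placeholder contact point:)
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.KeyspaceMetadata;

// Read keyspace info from the driver's metadata instead of querying system tables
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
for (KeyspaceMetadata ks : cluster.getMetadata().getKeyspaces()) {
    System.out.println(ks.getName());
}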

It's better to explicitly set the protocol version in your application to match Cassandra 2.1, instead of relying on auto-negotiation; the driver's documentation explicitly recommends this.
According to the compatibility matrix you need to explicitly set the protocol version to V3, but this also depends on the driver version, so you may need to stick to V2.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ProtocolVersion;

// Pin the protocol version instead of letting the driver negotiate it
Cluster cluster = Cluster.builder()
        .addContactPoint("xxxx")
        .withProtocolVersion(ProtocolVersion.V3)
        .build();
After the upgrade to 3.11 is complete, you can switch to protocol version V4.

Related

Dotnet client's traffic not evenly distributed while using Hazelcast

I am using Hazelcast 3.11.2 with a 4-node cluster setup. One of my applications is written in Java, another one in .NET. When I check the Java client's cluster, all nodes have almost the same network traffic. On the .NET client's cluster, however, 1 node uses, for example, 200 MB of traffic while the other 3 use only 3-5 MB. The Java cluster is configured for cache; the .NET cluster uses a map.
How can I fix this?
PS: I know 3.11.2 is an old version, but I don't want to upgrade it unless I hit a bug that forces me to.
Mehmet

Does "spring data cassandra" have client side loadbalancing?

I'm operating a project using spring-boot and spring-data-cassandra.
When I set up the project, I configured the Cassandra properties with an IP and port
(following https://www.baeldung.com/spring-data-cassandra-tutorial).
With this setup, if I had 3 Cassandra nodes and 1 of them died, I would expect the project to fail to connect to Cassandra with a 33% probability.
But my project was fine even though 1 Cassandra node was dead (just some errors as the node went down).
Does spring-data-cassandra happen to have a feature like client-side load balancing?
If it has that feature, where can I see the code?
I tried to find it but failed.
Please give me a little clue.
Spring Data Cassandra relies on the functionality of the DataStax Java driver, which is responsible for making everything work. This includes:
establishing the initial connection to the cluster. This is where the contact points play their role. After the driver connects to any of these points, it reads information about the whole cluster and establishes connections to all nodes (by default);
establishing the control connection that is used to receive notifications about changes in the cluster (nodes going up and down, schema changes, etc.). If a node goes down or comes back up, this information is used to update the list of active nodes;
providing load balancing of requests based on replication and node availability: if a node is down, it's excluded from the list of candidates, so queries are never sent to a node that is known to be down. A configuration sketch follows below.
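For illustration, a minimal sketch of how the underlying driver (3.x API, as used elsewhere on this page) is typically configured; the hostnames and the data center name "DC1" are placeholders. Any single live contact point is enough to bootstrap, which is why the project survives one dead node:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

// Only one live contact point is needed; the rest of the cluster is discovered
Cluster cluster = Cluster.builder()
        .addContactPoints("10.0.0.1", "10.0.0.2")
        .withLoadBalancingPolicy(new TokenAwarePolicy(
                DCAwareRoundRobinPolicy.builder().withLocalDc("DC1").build()))
        .build();
Session session = cluster.connect();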

Enable Cassandra PasswordAuthenticator without downtime

I have a Cassandra cluster (DataStax open source) and currently no authentication is configured (i.e., it is using AllowAllAuthenticator), and I want to use PasswordAuthenticator. The official documentation says that I should follow these steps:
1. enable PasswordAuthenticator in cassandra.yaml,
2. restart the Cassandra node, which will create the system_auth keyspace,
3. change the system_auth replication factor,
4. create a new user and password.
However, this is a big problem for me because the cluster is used in production, so we cannot have any downtime. Between steps 2 and 4 no user has been configured yet, so even if the client supplies a username and password, the request would still be rejected, which is not ideal.
I looked into the DataStax Enterprise docs, and DSE has a TransitionalAuthenticator class, which creates the system_auth keyspace without rejecting requests. I wonder if this class can be ported to the open source version? Or are there other ways around this problem? Thanks
Update
This is the Cassandra version I'm using:
cqlsh 4.1.1 | Cassandra 2.0.9 | CQL spec 3.1.1 | Thrift protocol 19.39.0
You should be able to execute steps 2-4 on just one node with zero downtime, assuming proper client configuration, replication, and cluster capacity. After that, it's just a rolling restart of the remaining nodes.
Clients should be set up with credentials ahead of time; they will start using them as nodes with authentication enabled come online (this behavior may depend on the driver, so try it out first).
You might be able to manually generate the schema and data for steps 3-4 before enabling the PasswordAuthenticator, but that shouldn't be necessary. A sketch of steps 3-4 follows below.
What are your concerns about downtime?
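For reference, steps 3 and 4 boil down to two statements that can be run through the Java driver once the first authentication-enabled node is up. A sketch, assuming the default cassandra/cassandra superuser created in step 2; the contact point, data center name, replication factor, and credentials are placeholders:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")
        .withCredentials("cassandra", "cassandra")  // default superuser from step 2
        .build();
Session session = cluster.connect();
// Step 3: raise system_auth replication so every node holds the credentials locally
session.execute("ALTER KEYSPACE system_auth WITH REPLICATION = "
        + "{'class': 'NetworkTopologyStrategy', 'DC1': 3}");
// Step 4: create an application user (CQL syntax for Cassandra 2.0)
session.execute("CREATE USER appuser WITH PASSWORD 'changeme' NOSUPERUSER");
After changing the replication factor, a repair of the system_auth keyspace on each node is usually needed so the new replicas actually receive the credential data.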

HBase Thrift in CDH 5

I'm using the Node.js Thrift API to connect to HBase. Everything was working great until I upgraded from CDH 4.6 to CDH 5. After the upgrade I regenerated the Thrift API for Node.js with this command:
thrift --gen js:node /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hbase/include/thrift/hbase2.thrift
After replacing the original Node.js script with the newly generated one, everything stopped working.
You can view the new script and the basic methods in the demo that I'm trying to run at https://github.com/lgrcyanny/Node-HBase-Thrift2
When I run the 'get' method, it returns "Internal error processing get".
When I run the 'put' method, it returns "Invalid method name: 'put'".
It seems like the new Thrift API is completely incompatible? Am I missing anything here?
There are two Thrift IDL files that come with HBase:
hbase-thrift/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift
hbase-thrift/src/main/resources/org/apache/hadoop/hbase/thrift2/Hbase.thrift
Both have a get() method, but only one of them has a put() method, which is exactly what your error messages above are telling us.
Cited from the package summary page:
There are currently 2 thrift server implementations in HBase, the packages:
org.apache.hadoop.hbase.thrift: This may one day be marked as deprecated.
org.apache.hadoop.hbase.thrift2: i.e. this package. This is intended to closely match the HTable interface and to one day supersede the older thrift (the old thrift mimics an API HBase no longer has).
Also the install guides have a separate section for that scenario:
CDH 5 HBase Compatibility
CDH 5 HBase is [...] not wire compatible with CDH 4 [...]. Consequently, rolling upgrades from CDH 4 to CDH 5 are not possible because existing CDH 4 HBase clients cannot make requests to CDH 5 servers and CDH 5 HBase clients cannot make requests to CDH 4 servers. Clients of the Thrift and REST proxy servers, however, retain wire compatibility between CDH 4 and CDH 5. [...]
The HBase User API (Get, Put, Result, Scanner etc; see Apache HBase API documentation) has evolved and attempts have been made to make sure the HBase Clients are source code compatible and thus should recompile without needing any source code modifications. This cannot be guaranteed however, since with the conversion to ProtoBufs, some relatively obscure APIs have been removed. Rudimentary efforts have also been made to preserve recompile compatibility with advanced APIs such as Filters and Coprocessors. These advanced APIs are still evolving and our guarantees for API compatibility are weaker here.
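In practice, the fix is to point the client at a Thrift2 server and use the thrift2-generated bindings. A minimal Java sketch of a thrift2 get() call (the Java classes are generated from the same thrift2 IDL as the Node.js ones; host, port, table, and row are placeholders):
import java.nio.ByteBuffer;
import org.apache.hadoop.hbase.thrift2.generated.THBaseService;
import org.apache.hadoop.hbase.thrift2.generated.TGet;
import org.apache.hadoop.hbase.thrift2.generated.TResult;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

// Connect to a thrift2 server (default port 9090; the host name is a placeholder)
TTransport transport = new TSocket("thrift2-host", 9090);
transport.open();
THBaseService.Client client = new THBaseService.Client(new TBinaryProtocol(transport));

// thrift2 exposes get() and put() directly, unlike the older thrift1 service
TGet get = new TGet(ByteBuffer.wrap("row1".getBytes("UTF-8")));
TResult result = client.get(ByteBuffer.wrap("mytable".getBytes("UTF-8")), get);
transport.close();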

Connecting to Cassandra Cluster instead of specific node

I am trying to learn Cassandra and have set up a 2-node Cassandra cluster. I have written a client in Java using the cassandra-jdbc driver, which currently connects to a hard-coded single node in the cluster. Ideally, I would like my client to connect to the "cluster" rather than a specific node,
so that the client code automatically connects to another node if the first one is down.
Is this possible using the cassandra-jdbc driver? I am currently using the code below to create the connection:
DriverManager.getConnection("jdbc:cassandra://localhost:9160/testdb");
Yes. If you're using the Datastax Java driver, you can get all of these benefits and more. From the documentation:
The driver has the following features:
connection pooling
node discovery
automatic failover
load balancing
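A minimal sketch (DataStax Java driver 3.x; the hostnames are placeholders, and the keyspace matches the JDBC URL above). Give the driver more than one contact point, and node discovery plus automatic failover handle the rest:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// More than one contact point, so startup survives a single dead node;
// the driver discovers the remaining nodes and fails over automatically
Cluster cluster = Cluster.builder()
        .addContactPoints("node1", "node2")
        .build();
Session session = cluster.connect("testdb");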
What is your language? If you're using Java, I suggest the Hector framework:
http://hector-client.github.io/hector/build/html/index.html
I think it works very well with Cassandra.
