Connecting to Cassandra Cluster instead of specific node - cassandra

I am trying to learn Cassandra and have set up a 2-node Cassandra cluster. I have written a client in Java using the Cassandra JDBC driver, which currently connects to a single hard-coded node in the cluster. Ideally, I would like my client to connect to the "cluster" rather than a specific node, so that the client code automatically connects to another node if the first node is down.
Is this possible using the Cassandra JDBC driver? I am currently using the code below to create the connection:
DriverManager.getConnection("jdbc:cassandra://localhost:9160/testdb");

Yes. If you use the DataStax Java driver, you get all of these benefits and more. From the documentation:
The driver has the following features:
connection pooling
node discovery
automatic failover
load balancing
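
To make the "automatic failover" item concrete, here is a minimal sketch of what a client would otherwise have to do by hand with plain JDBC: try a list of contact points in order and keep the first one that answers. The names ContactPoints, firstReachable and Opener, and the host names node1 and node2, are hypothetical; only the URL format, port and keyspace come from the question. This is not how the DataStax driver implements failover internally, it only illustrates the idea.

```java
import java.sql.SQLException;
import java.util.List;

final class ContactPoints {
    /** Opens something (for example a JDBC Connection) for a given URL. */
    interface Opener<T> {
        T open(String url) throws SQLException;
    }

    /**
     * Tries each URL in order and returns the first result that opens
     * without an SQLException. If every URL fails, the last failure is
     * attached as the cause of the thrown exception.
     */
    static <T> T firstReachable(List<String> urls, Opener<T> opener) throws SQLException {
        SQLException last = null;
        for (String url : urls) {
            try {
                return opener.open(url);
            } catch (SQLException e) {
                last = e; // remember the failure and move on to the next node
            }
        }
        throw new SQLException("No contact point reachable: " + urls, last);
    }
}
```

With a JDBC driver on the classpath, the call might look like ContactPoints.firstReachable(urls, DriverManager::getConnection), where urls holds entries such as "jdbc:cassandra://node1:9160/testdb" and "jdbc:cassandra://node2:9160/testdb".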

Which language are you using? If you're using Java, I suggest the Hector framework.
http://hector-client.github.io/hector/build/html/index.html
I think it works very well for communicating with a Cassandra database.

Related

How to use Datastax java driver for knowing a node is down in the Cassandra cluster?

How can we use the DataStax Java Driver to find out which nodes are down in a Cassandra cluster? Is the driver's metadata updated continuously, or do we have to register listeners?
The driver learns node state from the cluster, which tracks it via gossip. If a node is down, the driver will know and will not route traffic to it. You do not need to engineer anything to do this yourself.

How to connect multiple Cassandra instances using a single ODBC driver (from SAS ETL)

We are having trouble connecting to multiple Cassandra instances using a single ODBC driver. We have a SAS ETL server from which we want to connect to multiple Cassandra instances, but we have not been able to figure out how to do this.
If you have the ODBC driver installed, you can connect to different Cassandra clusters as long as you configure the appropriate ODBC URL/DSN connection for each cluster.
If, for example, you want to configure the driver to use multiple contact points, you can only do so when connecting to a DataStax Enterprise cluster, since that is an enterprise-only feature of the Simba Spark ODBC driver, which connects to the AlwaysOn SQL Service in DSE. Cheers!

Cassandra JMX need to connect all the nodes

I am trying to collect Cassandra cfstats information from all the machines using JMX. This can be done with OpsCenter, but I do not want to use it, so I started building my own utility. For now, my Java program connects to JMX and fetches cfstats information such as estimateKeys, the number of SSTables, etc.
My requirement is that this Java jar file runs from one Cassandra node and is able to connect to all the machines and fetch cfstats through each node's own JMX endpoint.
I am planning to use the Java driver for this, as the Java driver can discover all the machines in the cluster through the system.peers column family. Once the Java driver has connected to the machines, I will form the service:jmx:rmi URL using the respective hostname and JMX port (7199). Then I will be able to connect to NodeProbe using this information.
My question is: after connecting to another node using the Java driver, will I be able to retain state there, and after forming the service:jmx:rmi URL, will this URL really connect to the current node's JMX and pull cfstats information from the current node? I ask because the JMX host name is taken from the cassandra-env.sh file. Can someone please help me with this?
Does this idea work, or is there a better way to achieve this?
It's possible to use JMX remotely, but that's not the easiest thing to do.
If you are writing your own tool, it may be worth considering a different kind of connection. For example, you can easily expose JMX calls over HTTP using Jolokia.
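
For the direct JMX route described in the question, a minimal sketch using only the JDK's javax.management.remote API might look like this. It assumes each node exposes JMX over RMI at the conventional /jmxrmi path on port 7199 without authentication or SSL; NodeJmx, urlFor and readAttribute are hypothetical names, and the MBean name and attribute are left as parameters because they depend on the Cassandra version. A JMX connection opened this way goes to the host named in the URL and is independent of any Java driver session.

```java
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class NodeJmx {
    /** Builds the service:jmx:rmi URL for one node (assumes the /jmxrmi path). */
    static JMXServiceURL urlFor(String host, int port) {
        try {
            return new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad JMX host or port: " + host + ":" + port, e);
        }
    }

    /** Connects to the node behind the URL, reads one MBean attribute, then disconnects. */
    static Object readAttribute(JMXServiceURL url, String objectName, String attribute)
            throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            return mbeans.getAttribute(new ObjectName(objectName), attribute);
        }
    }
}
```

Calling urlFor once per host returned by the driver and then readAttribute on each URL would collect the same statistic from every node in turn.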

Spring Cassandra vs. Astyanax performance

I am trying to evaluate the performance of Astyanax and Spring Cassandra. I wrote a program to measure insertion and read times. It turned out that with large data, Astyanax showed an insertion rate up to 600 times faster than Spring Cassandra. I believe Spring Cassandra uses the DataStax driver to communicate with Cassandra, whereas Astyanax uses Thrift. Can anyone with good knowledge of the Cassandra client APIs give me more information on their performance? Does anything look wrong in my analysis?
Astyanax and the Thrift protocol are deprecated in Cassandra. Netflix, who contributed Astyanax, has ceased all new development in favor of the Datastax Java driver.
SDC* uses the DataStax Java Driver, which uses the latest protocol, and is very fast in the production environments I have deployed into.
Without your test, it is impossible to tell you why you are seeing what you are seeing.
Are you testing reads or writes?
Are you using the spring-data-cassandra or spring-cql module?
Are you explicitly setting the ConsistencyLevel in your SDC* tests?
Which methods of the template or repository are you using for your test?
We can perform 10K writes per second PER NODE in a C* cluster using the DS java driver.
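
Since the answer hinges on how the measurement was made, a minimal timing harness might look like the following sketch. It runs the same operation through a warm-up phase before timing it, so that JIT compilation and connection setup are less likely to distort the comparison. ThroughputTimer and opsPerSecond are hypothetical names; the operation passed in would be a single insert or read through either client.

```java
final class ThroughputTimer {
    /**
     * Runs op warmupOps times untimed, then measuredOps times timed,
     * and returns the measured rate in operations per second.
     */
    static double opsPerSecond(Runnable op, int warmupOps, int measuredOps) {
        for (int i = 0; i < warmupOps; i++) {
            op.run(); // untimed: lets the JVM and connections warm up
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredOps; i++) {
            op.run();
        }
        // Guard against a zero interval on very fast operations.
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return measuredOps * 1_000_000_000.0 / elapsedNanos;
    }
}
```

Running both clients through the same harness, with the same consistency level and the same data, makes a difference as large as 600x easier to attribute to the client rather than to the test setup.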

Differences between Hector Cassandra and JDBC

I'm starting a project that uses Apache Cassandra, so I'm interested in accessing my Cassandra database from Java. For that, I'm using Hector. However, I have some doubts about the differences between access via Hector and via Cassandra JDBC (specifically this: https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/).
I believe the following (although I'm not sure I'm right):
the two could be APIs of different levels (I consider Hector to be a higher-level API than Cassandra JDBC)
Cassandra JDBC uses CQL for accessing and modifying the database, while Hector doesn't use CQL (it only uses the methods it provides for that)
I'd be thankful if someone could tell me whether I'm right or wrong in the previous lines, and point out further differences between Hector and Cassandra JDBC.
Thanks in advance!
Official Cassandra Java Driver (https://github.com/datastax/java-driver) is probably the best (IMHO, the only) choice for a new project for several reasons:
New features
All other Cassandra clients (Hector, Astyanax, etc) are based on legacy Thrift RPC protocol. RPC "One response per one request" model has severe limitations, for example it doesn't allow processing several requests at the same time in a single connection or streaming large ResultSets.
So, DataStax developed a new protocol that doesn't have RPC limitations. Thrift API won't be getting new features, it's only kept for backward-compatibility. In contrast, Java Driver is actively developed to incorporate the new features of Cassandra 2.0, like conditional updates, batching prepared statements, etc. The overview of new features is here: http://www.datastax.com/dev/blog/cql-in-cassandra-2-0
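
The difference from "one response per request" can be pictured with a small conceptual model. In the native protocol each frame carries a stream id, so a client can keep several requests in flight on one connection and match responses to requests as they arrive, in any order. The sketch below models only that bookkeeping; MultiplexedConnection, Request, send and onResponse are hypothetical names, no network I/O happens, and this is not the Java driver's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class MultiplexedConnection {
    /** One in-flight request: its stream id and a future for the response body. */
    static final class Request {
        final int streamId;
        final CompletableFuture<String> response = new CompletableFuture<>();

        Request(int streamId) {
            this.streamId = streamId;
        }
    }

    private final AtomicInteger nextStreamId = new AtomicInteger();
    private final Map<Integer, Request> inFlight = new ConcurrentHashMap<>();

    /** Registers a request; a real client would also write a frame with the id and query. */
    Request send(String query) {
        Request request = new Request(nextStreamId.getAndIncrement());
        inFlight.put(request.streamId, request);
        return request;
    }

    /** Called when a response frame arrives; completes the matching request, if any. */
    void onResponse(int streamId, String body) {
        Request request = inFlight.remove(streamId);
        if (request != null) {
            request.response.complete(body);
        }
    }

    int inFlightCount() {
        return inFlight.size();
    }
}
```

Because the caller holds a future per request, a slow query does not block the responses of later, faster queries on the same connection.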
Convenience
In early Cassandra days (0.7) in our company we have used in-house low-level Thrift client. Later on we have used Hector, Pelops and Astyanax in various projects. I can say that the clients based on Java Driver look the most simple and clean to me.
Performance
We have made some performance testing of Cassandra Java Driver vs other clients. In most scenarios the performance is roughly the same. However, there are certain situations when Cassandra Java Driver significantly outperforms other clients due to its asynchronous nature.
Btw, there's a couple of related questions with excellent answers:
Advantages of using cql over thrift
Cassandra Client Java API's
EDIT: When I wrote this, I wasn't aware that Achilles (https://github.com/doanduyhai/Achilles), mentioned in another answer, has a CQL implementation that works via the Java Driver. For the sake of completeness I must say that Achilles' DAO on top of CQL might be (or might one day become) a viable alternative to plain CQL via the Java Driver.
Why restrict yourself to Hector and cassandra-jdbc if you're starting a new project?
There are many other interesting choices:
Astyanax as Martin mentioned (Thrift & CQL3)
FireBrand (Thrift via Hector)
Achilles I've just developed (CQL3 & Cassandra 2.0 via Java driver core)
Java Driver Core for plain CQL3
Hector is indeed a higher-level API. Internally it will use Cassandra's Thrift API to execute its functions. It will not convert them to equivalent CQL calls. But its API also provides access to CQL. In this case it will pass the CQL (via Thrift) to Cassandra's APIs for CQL.
CQL in Cassandra is a SQL-like language that works via the Cassandra APIs. So it does not provide any additional capability in the use of Cassandra than the APIs but does make it easier at times to use. If you are considering using Hector I would also look at Astyanax which is a newer take on a high-level Java API to Cassandra.
Since you are starting a new project, it is best to start with CQL via the native Java driver:
http://www.datastax.com/documentation/developer/java-driver/1.0/webhelp/index.html#common/drivers/introduction/introArchOverview_c.html
Per DataStax, it is 10-15% faster than Thrift APIs, as it uses Binary Protocol.

Resources