Options for kundera.client.lookup.class with Cassandra

To configure Kundera for Cassandra, I see three possible values for kundera.client.lookup.class:
com.impetus.client.cassandra.pelops.PelopsClientFactory
com.impetus.kundera.client.cassandra.dsdriver.DSClientFactory
com.impetus.client.cassandra.thrift.ThriftClientFactory
I am not sure of the pros and cons of these three, so I don't know which one to use. Please help me decide.

I suggest you use com.impetus.client.cassandra.thrift.ThriftClientFactory. It is the implementation that uses only Cassandra's Thrift API.
PelopsClient is not in active development.
DSClient is built on top of the DataStax driver for Cassandra.
Neither DSClient nor ThriftClient has a real advantage over the other.

After further research, I found the following:
Don't use PelopsClient, as it's not in active development, as mentioned by @karthik, but more importantly because of the issue reported here.
The DataStax driver is better than the Thrift client because it overcomes a few limitations of Thrift and uses a different, Cassandra-specific binary protocol that gives better performance. Refer to "Datastax java driver support for Cassandra using Kundera".
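As a rough illustration of switching between the factories discussed above, here is a minimal sketch that builds the Kundera properties as a plain Java map. The factory class names are copied from the question. The host, keyspace name and the persistence unit name in the comment are placeholders, and supplying the properties programmatically (rather than in persistence.xml) is an assumption about your setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KunderaClientProps {
    // Factory class names copied from the question above.
    static final String THRIFT = "com.impetus.client.cassandra.thrift.ThriftClientFactory";
    static final String DS_DRIVER = "com.impetus.kundera.client.cassandra.dsdriver.DSClientFactory";

    // Builds property overrides for a persistence unit.
    // host, port and keyspace are placeholders chosen by the caller.
    static Map<String, String> build(String factoryClass, String host, int port, String keyspace) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("kundera.client.lookup.class", factoryClass);
        props.put("kundera.nodes", host);
        props.put("kundera.port", String.valueOf(port));
        props.put("kundera.keyspace", keyspace);
        return props;
    }

    public static void main(String[] args) {
        // 9042 is Cassandra's default native-protocol port; Thrift defaults to 9160.
        Map<String, String> props = build(DS_DRIVER, "localhost", 9042, "demo_ks");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
        // With Kundera on the classpath, this map could be passed as the second argument to
        // javax.persistence.Persistence.createEntityManagerFactory("your_pu", props),
        // where "your_pu" is a hypothetical persistence unit name.
    }
}
```

Keeping the factory name in one place like this makes it easy to try the Thrift and DataStax clients against the same entities and compare them yourself.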

Related

Spring Cassandra vs. Astyanax performance

I am trying to evaluate the performance of Astyanax and Spring Cassandra. I wrote a program to measure insertion and read times. With large data sets, Astyanax showed an insertion rate up to 600 times faster than Spring Cassandra. I believe Spring Cassandra uses the DataStax driver to communicate with Cassandra, whereas Astyanax uses Thrift. Can anyone with good knowledge of the Cassandra client APIs give me more information on their relative performance? Does anything look wrong in my analysis?
Astyanax and the Thrift protocol are deprecated in Cassandra. Netflix, who contributed Astyanax, has ceased all new development in favor of the Datastax Java driver.
SDC* uses the Datastax Java Driver, which uses the latest protocol, and is very fast in the production environments I have deployed into.
Without your test, it is impossible to tell you why you are seeing what you are seeing.
Are you testing reads or writes?
Are you using the spring-data-cassandra or spring-cql module?
Are you explicitly setting the ConsistencyLevel in your SDC* tests?
Which methods of the template or repository are you using for your test?
We can perform 10K writes per second PER NODE in a C* cluster using the DS java driver.

Java API for handling collections in Cassandra CQL

I am looking for a Java API that can handle collections in Cassandra, with methods to read/update/insert/delete collections like list/set/map in a column value. I am using the Hector client now, and I did not find any methods that meet this requirement. The API should be able to handle mixed column types (for example, one column value can be utf8 and another can be a collection). Any example or tutorial would be appreciated as well.
C* collections are part of the CQL spec v.3. The only Java driver, that I'm aware of, supporting this spec completely is the open source DataStax Java driver. The driver offers 2 ways of working with CQL statements: one based on Statements/PreparedStatements/etc. and one using a fluent API.
If you are using Cassandra 1.2.x, look for version 1.x of the driver. If you are on Cassandra 2.0.x, look for version 2.0 of the driver (this is currently RC2, soon to go final).
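To give a feel for what collection handling looks like in CQL3, here is a small sketch that holds a few statements as Java strings. The table and column names (users, emails, phones, attrs) are hypothetical. With the DataStax driver on the classpath, strings like these could be passed to Session.execute, and collection values read back from a Row with getSet, getList and getMap. The sketch itself only uses the JDK so that it runs on its own.

```java
public class CollectionCql {
    // Hypothetical table mixing a plain text column with set, list and map columns.
    static final String CREATE_TABLE =
        "CREATE TABLE users (id uuid PRIMARY KEY, name text, "
        + "emails set<text>, phones list<text>, attrs map<text, text>)";

    // Add an element to a set column.
    static final String ADD_EMAIL =
        "UPDATE users SET emails = emails + {'a@example.com'} WHERE id = ?";

    // Append an element to a list column.
    static final String APPEND_PHONE =
        "UPDATE users SET phones = phones + ['555-0100'] WHERE id = ?";

    // Set a single map entry, then delete it again.
    static final String PUT_ATTR =
        "UPDATE users SET attrs['lang'] = 'en' WHERE id = ?";
    static final String DELETE_ATTR =
        "DELETE attrs['lang'] FROM users WHERE id = ?";

    // Read scalar and collection columns in one query.
    static final String SELECT_ALL =
        "SELECT name, emails, phones, attrs FROM users WHERE id = ?";

    public static void main(String[] args) {
        String[] statements = {CREATE_TABLE, ADD_EMAIL, APPEND_PHONE, PUT_ATTR, DELETE_ATTR, SELECT_ALL};
        for (String cql : statements) {
            System.out.println(cql + ";");
        }
    }
}
```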

Does CQL3 client support native protocol?

Is it possible to use the CQL3 java client (the one using execute_prepared_cql3_query etc) with the native protocol in 2.0?
Or is the Datastax java client the only one that supports the native protocol?
Is there a performance benefit in using the native protocol especially when inserting 1MB blobs?
I have existing applications that are using the CQL3 client, hence would prefer not to port unless the performance benefit is large.
The DataStax Java Driver for Apache Cassandra (https://github.com/datastax/java-driver) is the only Java Driver to support the CQL Native Protocol so far.
To clarify details about this new protocol: the CQL Native Protocol v1 has been introduced in Cassandra 1.2 and then enhanced as CQL Native Protocol v2 in Cassandra 2.0. While the Thrift interface will remain supported for a while in upcoming versions of Cassandra, you can expect some features to be only available with the CQL Native Protocol in the future, which is already the case with Automatic Paging in Cassandra 2.0 (see http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0).
On the performance side, there's still no clear, publicly available benchmark comparing the efficiency of Thrift vs. the CQL Native Protocol. That's something we wish to see soon. Keep in mind though that there's no one-size-fits-all answer to this performance question, as it will depend heavily on the use case and workload being considered. That's why I would definitely advise you to run your own performance tests and see how you can improve the efficiency of your application. I would just note that for the 1MB blob case that you've mentioned, I don't expect much difference, as the payload will be much bigger than the protocol overhead here.
As a consequence, I wouldn't say that it's urgent for you to upgrade to a driver supporting the CQL Native Protocol, but it's something you should start experimenting with, as most of the investment will happen there in the years to come.
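The point about the 1MB blob can be checked with quick arithmetic. The sketch below computes what fraction of the bytes on the wire would be protocol overhead for a 1MB payload. The per-request overhead values are hypothetical, picked only for illustration; they are not measurements of either Thrift or the native protocol.

```java
public class OverheadSketch {
    // Fraction of transmitted bytes that are protocol overhead rather than payload.
    static double overheadFraction(long payloadBytes, long overheadBytes) {
        return (double) overheadBytes / (payloadBytes + overheadBytes);
    }

    public static void main(String[] args) {
        long blobBytes = 1024L * 1024L;          // the 1MB blob from the question
        long[] assumedOverheads = {50, 200, 1000}; // hypothetical per-request overheads in bytes
        for (long overhead : assumedOverheads) {
            System.out.printf("overhead=%d bytes -> %.4f%% of traffic%n",
                overhead, 100.0 * overheadFraction(blobBytes, overhead));
        }
    }
}
```

Even with a full kilobyte of assumed overhead per request, the overhead stays below a tenth of a percent of the traffic, which is why the choice of protocol is unlikely to dominate for payloads of this size.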

Differences between Hector Cassandra and JDBC

I'm currently starting a project that uses Apache Cassandra, and I'm interested in accessing my Cassandra database from Java. For that, I'm using Hector. However, I have some doubts about the differences between access via Hector and via Cassandra JDBC (specifically this: https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/).
I believe the following (although I'm not sure if I'm right):
One difference could be that they are APIs at different levels (I consider Hector a higher-level API than Cassandra JDBC).
Cassandra JDBC uses CQL for accessing/modifying the database, while Hector doesn't use CQL (it only uses the methods it provides for that).
I'd be thankful if someone could tell me whether I'm right or wrong in the lines above, and point out more differences between the two (Hector and Cassandra JDBC).
Thanks in advance!
The official Cassandra Java Driver (https://github.com/datastax/java-driver) is probably the best (IMHO, the only) choice for a new project, for several reasons:
New features
All other Cassandra clients (Hector, Astyanax, etc) are based on legacy Thrift RPC protocol. RPC "One response per one request" model has severe limitations, for example it doesn't allow processing several requests at the same time in a single connection or streaming large ResultSets.
So, DataStax developed a new protocol that doesn't have RPC limitations. Thrift API won't be getting new features, it's only kept for backward-compatibility. In contrast, Java Driver is actively developed to incorporate the new features of Cassandra 2.0, like conditional updates, batching prepared statements, etc. The overview of new features is here: http://www.datastax.com/dev/blog/cql-in-cassandra-2-0
Convenience
In the early Cassandra days (0.7), our company used an in-house low-level Thrift client. Later on we used Hector, Pelops and Astyanax in various projects. I can say that the clients based on the Java Driver look the simplest and cleanest to me.
Performance
We did some performance testing of the Cassandra Java Driver vs. other clients. In most scenarios the performance is roughly the same. However, there are certain situations where the Cassandra Java Driver significantly outperforms other clients due to its asynchronous nature.
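To illustrate why having several requests in flight can matter, here is a JDK-only sketch that compares issuing simulated queries one at a time with issuing them all before waiting for any result. fakeQuery is a hypothetical stand-in for a network round trip, not a driver call, and the sketch uses a thread pool rather than a multiplexed connection, so it models only the latency effect and not how the native protocol is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InFlightSketch {
    // Hypothetical stand-in for one request/response round trip.
    static int fakeQuery(int id, long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return id * 2;
    }

    // One request at a time: total time grows with n * latency.
    static List<Integer> sequential(int n, long latencyMillis) {
        List<Integer> results = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            results.add(fakeQuery(i, latencyMillis));
        }
        return results;
    }

    // All requests issued up front, results collected afterwards.
    static List<Integer> concurrent(int n, long latencyMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<CompletableFuture<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final int id = i;
                futures.add(CompletableFuture.supplyAsync(() -> fakeQuery(id, latencyMillis), pool));
            }
            List<Integer> results = new ArrayList<>();
            for (CompletableFuture<Integer> future : futures) {
                results.add(future.join()); // joined in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        sequential(10, 50);
        long mid = System.nanoTime();
        concurrent(10, 50);
        long end = System.nanoTime();
        System.out.println("sequential ms: " + (mid - start) / 1_000_000);
        System.out.println("concurrent ms: " + (end - mid) / 1_000_000);
    }
}
```

Both variants return the same results; only the waiting pattern differs, which is the kind of workload where an asynchronous client would be expected to pull ahead.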
Btw, there's a couple of related questions with excellent answers:
Advantages of using cql over thrift
Cassandra Client Java API's
EDIT: When I wrote this, I wasn't aware that Achilles (https://github.com/doanduyhai/Achilles), mentioned in another answer, has a CQL implementation that works via the Java Driver. For the sake of completeness I must say that Achilles' DAO on top of CQL might be (or might one day become) a viable alternative to plain CQL via the Java Driver.
@mol
Why do you restrict yourself to Hector and cassandra-jdbc if you're starting a new project?
There are many other interesting choices:
Astyanax as Martin mentioned (Thrift & CQL3)
FireBrand (Thrift via Hector)
Achilles I've just developed (CQL3 & Cassandra 2.0 via Java driver core)
Java Driver Core for plain CQL3
Hector is indeed a higher-level API. Internally it will use Cassandra's Thrift API to execute its functions. It will not convert them to equivalent CQL calls. But its API also provides access to CQL. In this case it will pass the CQL (via Thrift) to Cassandra's APIs for CQL.
CQL in Cassandra is a SQL-like language that works via the Cassandra APIs. So it does not provide any additional capability in the use of Cassandra than the APIs but does make it easier at times to use. If you are considering using Hector I would also look at Astyanax which is a newer take on a high-level Java API to Cassandra.
Since you are starting a new project, it is best to start with CQL and the native Java driver:
http://www.datastax.com/documentation/developer/java-driver/1.0/webhelp/index.html#common/drivers/introduction/introArchOverview_c.html
Per DataStax, it is 10-15% faster than Thrift APIs, as it uses Binary Protocol.

Creating Secondary Indexes in Cassandra using Thrift and php

I am after any examples of how to create secondary indexes on new or existing columns in a Cassandra db using the Thrift API. The documentation surrounding Thrift is very sparse. Can anyone help a brother out?
A second question I was wondering about: are there any negatives to using phpcassa as an interface to Cassandra? My understanding is that it sits on top of Thrift, so are there any performance drawbacks in this scenario?
I'm using Cassandra 0.8, Thrift 2.0, and php 5.2.9.
If you're using phpcassa, you can use SystemManager.create_index().
If you're using the PHP CQL driver it will look like this: http://www.datastax.com/docs/0.8/references/cql#create-index
The performance overhead of phpcassa or the CQL driver is quite small, and is only worth worrying about in the most extreme of circumstances. Generally, the network latency and DB latency for your queries are much larger.

Resources