datastax driver vs spring-data-cassandra - cassandra

I am new to Cassandra and familiar with Spring's JdbcTemplate.
Can anyone explain the difference between the two? Which one would you suggest using?
Thanks.

spring-data-cassandra uses DataStax's java-driver, so the real decision is whether you need the functionality of spring-data.
Some features of spring-data that may be useful to you (documented here):
* Spring XML configuration for configuring your Cluster instance (especially useful if you are already using Spring).
* An object mapping component.
The java-driver also has a mapping component that is worth exploring.
In my opinion, if you are already using Spring, it is worth looking into spring-data-cassandra. Otherwise, it would be good to start off with just the DataStax java-driver.

Related

Is there a simple Jmeter performance test case for Cassandra

We are creating a JMeter performance benchmark for our Cassandra installation.
For this we have been referring to the default Cassandra plugin mentioned on the site.
This plugin does not take any Cassandra server connection parameters for the "put", and there is not much help available on how to use it.
Can someone who knows how to configure the Cassandra connection help me with this plugin?
Because of this, we switched to an article on testing Cassandra with Groovy. (Link here)
That article asks us to add multiple JARs; some of them are bundles, and we cannot find the exact JARs:
snappy-java-1.0.5
netty-transport-4.0.33.Final
netty-handler-4.0.33.Final
netty-common-4.0.33.Final
netty-codec-4.0.33.Final
netty-buffer-4.0.33.Final
metrics-core-3.1.2
lz4-1.2.0
HdrHistogram-2.1.4
guava-16.0.1
Can someone help me with a simpler way to performance-test Cassandra?
For correct performance testing of Cassandra, it's better to use specialized tools such as NoSQLBench, which was developed specifically for that task. Generic tools won't give you real performance numbers. Please read the NoSQLBench documentation on how to test Cassandra correctly, taking into account things like compaction, repairs, etc.
Have you tried reading the documentation? It mentions the CassandraProperties configuration element, where you can define your server connection parameters.
If you want full control and do not want to be limited to what others have implemented, consider following the instructions in the Cassandra Load Testing with Groovy article.

jCache providers features

I'm going to use a JCache implementation for my J2EE application (Java 8), and I want to know the differences between the providers and their features:
Hazelcast
Ehcache
Infinispan
Can anyone help me choose one of them (in terms of cluster support, ease of use, performance, ...)?
JCache is a specification, so all implementations behave the same way with regard to the caching features.
However, a key differentiator when evaluating the products is whether you want the cache to be distributed. The open-source version of Hazelcast is distributed; this is not the case for Ehcache.
Disclaimer: I work for Hazelcast.

Thrift, CQL3 or what?

Recently I noticed that Cassandra and DataStax are pushing CQL3 more. A new Java driver has even been released, and it does not use Thrift at all.
And if you are not going to use "compact storage", you will not be able to use Thrift with your application. Thus, I believe that Thrift is fading out of Cassandra.
My question is: for a new application, should I go ahead and use CQL3? I still prefer Thrift because I want to know what's going on underneath, but on the other hand I do not want to use something that is fading out and becoming legacy. What do you recommend?
My company recently went through the same thought process and ended up using CQL3 over thrift.
Although there is a slight lack of transparency with the additional layer of abstraction going on with CQL3, the ease and familiarity of writing SQL style statements makes the code much more readable and intuitive in my opinion. Plus we found the cqlsh interface far more user friendly than cassandra-cli for debugging and general db maintenance (the auto-complete is fab in cqlsh!).
Once you understand the underlying data structure and how CQL3 represents that data, the extra layer of abstraction pales into insignificance, really.
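To make the point about the underlying data structure a little more concrete, here is a minimal sketch of how several CQL3 rows that share a partition key can be pictured as cells in one sorted storage row. It is a simplified model of the older (pre-3.0, Thrift-era) storage layout as I understand it, not Cassandra code: the `events` table, its columns, and the `Cql3StorageSketch` class are made up for illustration, and the ":" separator only stands in for Cassandra's composite cell names.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative model only, not Cassandra code: shows how several CQL3 rows
// that share a partition key end up as cells in one sorted storage row.
class Cql3StorageSketch {
    // partition key -> (cell name -> value), with cells kept sorted by name
    private final Map<String, TreeMap<String, String>> storage = new TreeMap<>();

    // Models an INSERT into a hypothetical table
    //   events(user, day, type, note) with PRIMARY KEY (user, day)
    void insert(String user, String day, String type, String note) {
        TreeMap<String, String> cells = storage.computeIfAbsent(user, k -> new TreeMap<>());
        cells.put(day + ":", "");        // row marker: records that the CQL row exists
        cells.put(day + ":note", note);  // the clustering value prefixes each cell name
        cells.put(day + ":type", type);
    }

    // Returns the cells of one storage row, roughly what a Thrift client would see.
    Map<String, String> storageRow(String user) {
        return storage.getOrDefault(user, new TreeMap<>());
    }

    public static void main(String[] args) {
        Cql3StorageSketch sketch = new Cql3StorageSketch();
        sketch.insert("alice", "2013-01-01", "login", "first visit");
        sketch.insert("alice", "2013-01-02", "logout", "bye");
        // Two CQL rows, but a single storage row holding six cells.
        sketch.storageRow("alice")
              .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

In this model the partition key selects the storage row and the clustering column becomes a prefix of every cell name, which is why CQL3 can present a wide row as many narrow ones.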
DataStax is encouraging developers to use CQL3 for newer applications. From the Thrift to CQL3 Guide:
…we believe that CQL3 is a simpler and overall better API for Cassandra than the thrift API is. Therefore, new projects/applications are encouraged to use CQL3 (though remember that CQL3 is not final yet, and so this statement will only be fully valid with Cassandra 1.2). But the thrift API is not going anywhere.
Thrift won't be getting newer features (unless they are requested a lot), so it's safe to say that CQL3 is the better choice for new apps (of course there are exceptions: if you need low-level access, you need Thrift). My only pain point is that DataStax's driver does not yet support SSL, but that is in the pipeline and will hopefully be a committed feature soon.

JSF 2.0 & Cassandra - How to get started

Here are a few newbie questions on using Cassandra with JSF 2.0. I'm at the start of a web application project and want to use Cassandra as the backend. My app should be deployed in different regions, and hence the same data should be available in (replicated to) all regions.
I would like to have general information about best practices.
I have the following setup:
Maven2
JSF 2.0 (currently using Managed Beans and JPA)
Glassfish 3.2.1
Which driver would you recommend?
cassandra-jdbc driver implemented with datastax community edition?
Or Hector, which seems to be the most advanced client?
If Hector how would I use Hector properly within a JSF project?
Hector JPA integration (could not find any information on how to use)?
Using Hector directly from Managed Beans without JPA?
How would I use the driver best in a JSF 2.0 web application?
Managed Beans?
Singleton Bean?
POJOs?
(I'm not very familiar with Java EE yet, but in the process of building up knowledge)
How would I structure the classes/beans so that connections can be pooled?
Is there maybe an open source example that uses JSF/Java EE or JSF/POJO, to see how such a setup is done in best practice?
What are your opinions on how to start such a project from scratch?
I'm thankful for any hints you can give me. I've been struggling for a month to find enough information to get started on this project.
Which driver would you recommend?
* cassandra-jdbc driver implemented with datastax community edition?
* Or Hector, which seems to be the most advanced client?
I would probably suggest Hector, as the project is actively maintained. There are many Q&As, articles, and manuals online that help greatly when you are starting to learn this new technology. That does not mean the cassandra-jdbc driver lacks online material; this is my personal preference, having developed with the Hector library from version 0.8 up to the current latest version.
If Hector how would I use Hector properly within a JSF project?
* Hector JPA integration (could not find any information on how to use)?
* Using Hector directly from Managed Beans without JPA?
How would I use the driver best in a JSF 2.0 web application?
* Managed Beans?
* Singleton Bean?
* POJOs?
(I'm not very familiar with Java EE yet, but in the process of
building up knowledge)
How would I structure the classes/beans so that connections can be
pooled?
Is there maybe an open source example that uses JSF/Java EE or
JSF/POJO, to see how such a setup is done in best practice?
You should really read and study these repositories:
cassandra-queue-spring
hector object mapper
I don't have experience with JPA, so I can't comment further, but it seems that Hector has the JPA classes that you will be interested in. Be sure to check out this Hector Object Mapper introduction.
What are your opinions on how to start such a project from scratch?
It's probably difficult to start everything from scratch if you are new to Java EE, but this link should help you when you start your own project.
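On the question of how to structure beans so that connections can be pooled, one common approach is to create the client (and the pool it owns) once per application and let every bean share it. The sketch below shows that idea in plain Java only. `PooledClient` and `ClientHolder` are hypothetical names, not Hector classes; in a JSF 2.0 application an application-scoped managed bean could play the role of the holder instead.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Application-wide holder: every bean asks this class for the one shared client.
class ClientHolder {
    private ClientHolder() {}

    // Initialization-on-demand holder idiom: the JVM creates INSTANCE once,
    // lazily and thread-safely, the first time get() is called.
    private static class Lazy {
        static final PooledClient INSTANCE = new PooledClient();
    }

    static PooledClient get() {
        return Lazy.INSTANCE;
    }

    public static void main(String[] args) {
        PooledClient first = ClientHolder.get();
        PooledClient second = ClientHolder.get();
        // Both references point at the same object, so the pool is built only once.
        System.out.println("same instance: " + (first == second));
        System.out.println("instances created: " + PooledClient.instancesCreated.get());
    }
}

// Hypothetical stand-in for a client object that owns a connection pool.
class PooledClient {
    static final AtomicInteger instancesCreated = new AtomicInteger();

    PooledClient() {
        // A real client would open its pooled connections here, which is expensive.
        instancesCreated.incrementAndGet();
    }

    String query(String statement) {
        return "result of: " + statement; // placeholder for a real round trip
    }
}
```

Request-scoped or view-scoped beans would then call `ClientHolder.get()` rather than constructing their own client, so the number of connections does not grow with the number of requests.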

About Java Cassandra Client, which one is better? How about CQL? [closed]

Closed 8 years ago.
I am trying to develop an application using Hive as the database, and I have also been looking at NoSQL solutions as an alternative.
Having now decided to develop with Cassandra, my next problem is which client to use. Which one is better: Hector, a pure Java solution, or Kundera, with its JPA-like development?
I prefer Hector, but I am curious about Kundera. Is anyone using Kundera? Which is better?
I'm also curious about CQL (Cassandra Query Language). Can it integrate with Hector?
Hector is slowly moving towards CQL integration. The first steps have been made, but after experiencing an unstable API, the developers seem to have postponed a new release. The CQL API is rather new and is meant to be nearly equivalent to SQL syntax. I took some basic steps with CRUD operations to verify that data could be written and read via CQL.
Nevertheless, the CQL JAR is not yet usable out of the box like a standard JDBC driver, and it is missing some important features. Looking at the rather hard-to-understand Thrift API and the not much simpler Hector API, I am convinced that CQL will become established as the state-of-the-art access API for Cassandra in versions 0.8.1 and 1.0, while Thrift will remain the native, raw access path for some time.
The competition between both APIs has nothing to do with the decision of Hector. Hector itself provides additional services like failure and connection handling in the cluster. These are features being addressed by neither thrift nor CQL.
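As a rough illustration of what client-side failure handling means here, the following sketch tries an operation against each host in turn and returns the first successful result. It is a simplified model only and not Hector's implementation; `FailoverSketch`, `executeWithFailover`, and the host names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class FailoverSketch {
    // Runs the operation against each host in turn and returns the first
    // successful result; a failing host is skipped rather than failing the call.
    static <T> T executeWithFailover(List<String> hosts, Function<String, T> operation) {
        RuntimeException lastFailure = null;
        for (String host : hosts) {
            try {
                return operation.apply(host);
            } catch (RuntimeException e) {
                lastFailure = e; // remember why this host failed, then try the next one
            }
        }
        throw new IllegalStateException("no host could serve the request", lastFailure);
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("node1", "node2", "node3"); // hypothetical hosts
        String result = executeWithFailover(hosts, host -> {
            if (host.equals("node1")) {
                throw new RuntimeException(host + " is down"); // simulated outage
            }
            return "served by " + host;
        });
        System.out.println(result);
    }
}
```

A real client would add more than this, for example marking hosts as down, retrying them later, and balancing load across healthy hosts, but the caller-facing idea is the same: the application issues one call and the client library decides which node answers it.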
I don't really believe in all other O/R mappers, or even those claiming to provide a full-fledged JPA. I cannot imagine how this should work.
Answering your question about clients - Hector essentially provides access to the Cassandra native API (columns, column families, rows etc) whereas Kundera aims to hide these details and provide object-database mapping.
Kundera therefore probably makes it easier to quickly persist a range of Java objects into Cassandra - but may not provide an efficient mapping, perhaps losing some of the performance that noSQL approaches provide.
Hector expects you to adapt to the Cassandra data model - this will be harder work, but is likely to deliver more performance.
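The difference between the two styles can be sketched without either library. In the example below, `explicitColumns` stands for the native style, where the caller chooses every column name and value, and `mappedColumns` stands for the object-mapping style, where columns are derived from an object's fields. All names here (`MappingSketch`, `User`, both methods) are hypothetical and show only the general idea, not how Hector or Kundera are implemented.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

class MappingSketch {
    // Hypothetical entity used only for this illustration.
    static class User {
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // "Native" style: the caller chooses each column name and value explicitly.
    static Map<String, String> explicitColumns(User user) {
        Map<String, String> columns = new TreeMap<>();
        columns.put("name", user.name);
        columns.put("age", Integer.toString(user.age));
        return columns;
    }

    // "Mapper" style: columns are derived from the object's fields by reflection.
    static Map<String, String> mappedColumns(Object entity) {
        Map<String, String> columns = new TreeMap<>();
        for (Field field : entity.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue; // skip anything that is not per-object state
            }
            try {
                field.setAccessible(true);
                columns.put(field.getName(), String.valueOf(field.get(entity)));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read field " + field.getName(), e);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        User user = new User("ann", 30);
        System.out.println(explicitColumns(user));
        System.out.println(mappedColumns(user));
    }
}
```

The mapper saves code, but it also decides the column layout for you, which is where the possible loss of efficiency mentioned above comes from.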
There is now a new client, Astyanax, released by Netflix in January 2012.
"Astyanax is a Java Cassandra client. It borrows many concepts from
Hector but diverges in the connection pool implementation as well as
the client API. One of the main design considerations was to provide a
clean abstraction between the connection pool and Cassandra API so
that each may be customized and improved separately. Astyanax provides
a fluent style API which guides the caller to narrow the query from
key to column as well as providing queries for more complex use cases
that we have encountered. The operational benefits of Astyanax over
Hector include lower latency, reduced latency variance, and better
error handling."
The source code for Astyanax is hosted at Github: https://github.com/Netflix/astyanax
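To show what "a fluent style API which guides the caller to narrow the query from key to column" can look like, here is a small self-contained sketch over an in-memory map. It is not the Astyanax API: `FluentQuerySketch`, `RowQuery`, `ColumnQuery`, and the method names are hypothetical and only mimic the key-then-column narrowing described in the quote.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

class FluentQuerySketch {
    // row key -> (column name -> value); stands in for a column family
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    void put(String key, String column, String value) {
        rows.computeIfAbsent(key, k -> new TreeMap<>()).put(column, value);
    }

    // Step 1: narrow the query to one row key.
    RowQuery getKey(String key) {
        return new RowQuery(rows.getOrDefault(key, Collections.emptyMap()));
    }

    static class RowQuery {
        private final Map<String, String> columns;

        RowQuery(Map<String, String> columns) {
            this.columns = columns;
        }

        // Execute at row level: returns every column of the row.
        Map<String, String> execute() {
            return columns;
        }

        // Step 2 (optional): narrow further to a single column.
        ColumnQuery getColumn(String name) {
            return new ColumnQuery(columns.get(name));
        }
    }

    static class ColumnQuery {
        private final String value;

        ColumnQuery(String value) {
            this.value = value;
        }

        // Execute at column level: returns the value, or null if it is absent.
        String execute() {
            return value;
        }
    }

    public static void main(String[] args) {
        FluentQuerySketch store = new FluentQuerySketch();
        store.put("user1", "name", "Ann");
        store.put("user1", "email", "ann@example.com");
        System.out.println(store.getKey("user1").execute());
        System.out.println(store.getKey("user1").getColumn("name").execute());
    }
}
```

Because each step returns a type that offers only the next valid operations, the compiler and IDE auto-completion guide the caller through the query.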
For details about using CQL with Cassandra and Hector, see:
https://github.com/rantav/hector/wiki/Using-CQL
The following mail list thread is a good discussion on where we will be going with CQL as an API:
http://groups.google.com/group/hector-users/browse_thread/thread/540dc9c3908fbb44/f5ee488f2178e2f4
For the sake of completeness I think the Pelops library should be mentioned too. Hector seems to be the most used, but Pelops has a simpler API. Pelops does not support CQL.
Coming from Ruby I find both to be extremely verbose and imperative, though.
Kundera no longer relies on Solandra for indexing. It now lets you use the secondary index support provided by Cassandra, and it gives you a way to run JPA queries over OPP (range queries etc.). We are working on native CQL support.
Take a look at:
http://mevivs.wordpress.com/2012/02/13/how-to-crud-and-jpa-association-handling-using-kundera/
for more details.
-Vivek
There is no Java client on the same level as Hector; Hector is the best, and there is work in progress on the Hector side to support CQL. I saw CQL commits for Hector on GitHub this month, but I don't know their final state. You can ask the Hector users group: http://groups.google.com/group/hector-users
Also, there is a very simple object mapper in Hector:
https://github.com/rantav/hector/wiki/Hector-Object-Mapper-%28HOM%29
My Best,
Serdar Irmak
Kundera 2.0.4 released:
Major Changes in this release:
Cross-datastore persistence (easy to migrate an existing MySQL app to NoSQL).
Support for relational databases (e.g. MySQL).
Solandra replaced with Lucene-based indexing.
Support added for bi-directional associations.
Performance improvement fixes.
In our tests, 1 million inserts with proper indexing completed in 6 minutes.
Vivek
I have yet to try Hector, but I am involved in the latest Kundera 2.0.1 release. I suggest you give it a try. It has changed a great deal since its inception, with many new features added and bugs fixed. Currently it supports JPA 1.0 and Cassandra 0.7.6, but we are planning to add support for Cassandra 0.8 and JPA 2.0 very soon. There is a pretty good example here: https://github.com/impetus-opensource/Kundera/wiki/Getting-started that may help you get started.
The Astyanax API produces human-readable code and includes connection pooling.
CQL support for Cassandra has been integrated into Kundera 2.0.6 (yet to be released). It allows CQL to be executed as a native query.
-Vivek

Resources