Apache Cassandra vs Datastax Cassandra

Is Datastax Cassandra the only available Cassandra that can be used in a production environment? Are there any free alternatives available? What about the Cassandra available on the Apache site?

DataStax Community Edition is also free; it includes a basic version of OpsCenter: http://planetcassandra.org/cassandra/
Here is the difference between the Community edition and DSE:
http://www.datastax.com/download/dse-vs-dsc

They can both be used in production. DataStax Enterprise comes with a bunch of extra features on top of Apache Cassandra, and also comes with support.

DataStax is a commercial company that supports C* (Cassandra). The base source code is taken from the Apache repositories, and DataStax then merges in some of its own code. In addition, as others have already mentioned, the DataStax version comes with some additional tools for maintaining a Cassandra cluster.
One of the benefits of DataStax Enterprise is its seamless integration with Solr, another great Apache Foundation project.
Cassandra comes with a query language called CQL (Cassandra Query Language), which is "similar" to SQL. You should, however, think of CQL as a cousin of SQL, not a brother.
One of the great features of the Enterprise edition is that you can query a Solr index through its CQL integration. The Cassandra cluster also shares its resources with Solr, so you don't need a second cluster for Solr.
You could set up either Apache or DataStax Cassandra and get almost the same thing. But if you need something similar to the SQL LIKE operator (not natively available in Cassandra), or you have a heavily denormalized database and need search capabilities, then DataStax Enterprise (DSE) is your only viable choice.
As someone has already mentioned, DSE is free for startups until they reach annual revenue of 3m USD or funding of 30m. This should give everybody the opportunity to leverage the power of NoSQL and use one of the most reliable databases for big data out there.

For the Cassandra product, you can use the Apache open source offering in production, if your organisation is comfortable with open source.
You can also use the Datastax Community version of Cassandra, which is also open source and free to deploy; that gives you a bit more assurance from DataStax who offer commercial support.
Then there is DataStax Enterprise, which is the version that you pay to use, with a support model included. This still uses open source Cassandra, with additional code from DataStax. They have also put this release through their internal test processes, so that they are happy to support it. That generally means its releases will lag behind the Apache and Community versions, if that matters to you.
The DataStax 'Dev Center' product is a GUI tool that allows you to enter CQL commands against a Cassandra installation - it is free to use against any release. You may find it useful, though the CQLSH command-line should offer much of what you may need (and Cassandra CLI).
The DataStax 'Ops Center' product is available in a free version, which can run against any Cassandra with the associated 'DataStax Agent' used to collect data from each node. The Enterprise version of Ops Center includes additional functionality; that is available if you purchase the fully supported DSE (DataStax Enterprise) stack.
Hope that helps. Much more information available at Planet Cassandra and the DataStax web sites.

Besides Apache Cassandra, there's Scylla, which is a drop-in replacement for Cassandra written in C++. It claims to be 10 times faster than Apache Cassandra. However, Scylla is still in alpha, and you should stay away from it in a production environment.
Scylla aims to support all Cassandra features together with the tooling. It also supports JMX monitoring.

Apache Cassandra has all the same features as the Community edition of DataStax, so you can run Apache Cassandra in a production environment.

Another good feature of DSE is the ability to do backup and recovery of your Cassandra database which I would think is very important if you are planning to use this in a production setup.

Related

Datastax Studio License

I was looking for a visualization client for Cassandra to use on a Mac when I found DataStax Studio, which can be set up locally to view Cassandra metadata and tables visually. I downloaded DataStax Studio using the link below:
https://academy.datastax.com/quick-downloads
I looked at a few links but couldn't understand the licensing.
Can someone please confirm whether it is free to use for a non-production environment within an organization?
Thanks!
The DataStax Studio License Terms can be found here https://www.datastax.com/terms/datastax-studio-license-terms.
I won't try to interpret or summarize, but I will highlight this portion:
...only in conjunction with the use of other DataStax commercial software on a trial basis, or for which you have an active, paid subscription.
DataStax Studio will only work with DataStax DSE clusters; connections to OSS Cassandra are not supported. For more details beyond that, I'd recommend asking through DataStax Support.

Which Apache cassandra version to use for production

We are exploring Apache Cassandra and are going to use it in production soon.
We will most likely use the DataStax Community edition of Apache Cassandra.
But after reading:
http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/
https://www.pythian.com/blog/cassandra-version-production/
and in particular this sentence from the second blog post, “If you don’t mind facing serious bugs and contribute to the development pick 3.x”,
I am confused about which version to choose for our production deployment.
I just need to know whether 3.5.0 and 3.0.6 are production ready.
Datastax community : 3.5.0 from http://www.planetcassandra.org/cassandra/
Datastax community : 3.0.6 from
http://www.planetcassandra.org/archived-versions-of-datastaxs-distribution-of-apache-cassandra/
or
Datastax community : 2.2.6 from
http://www.planetcassandra.org/archived-versions-of-datastaxs-distribution-of-apache-cassandra/
The version provided by DataStax is supposed to be stable and production ready. You get an application to monitor your cluster, which is useful if you have no ops staff who already know Cassandra, and you can pay for support.
However, you don't get the latest version of Cassandra, so you can miss interesting features.
As for Cassandra 3.x, as said above, you get more features (for example JSON support) and better performance, but if you find a critical bug and can't fix it yourself, you can only file a ticket and hope it is handled quickly. Still, it is production ready and could work well for you.
In conclusion, go for the latest version only if you need a specific feature, or if you have developers on your team to back that choice. Go for DataStax if you want something that works with less effort.

YCSB for Cassandra 3.0 Benchmarking

I have a Cassandra cluster running on Ubuntu virtual machines and need to benchmark it.
I am trying to do it with Yahoo's YCSB (without using Maven if possible).
I use Cassandra 3.0.1 but I can't find a suitable version of YCSB.
I don't want to switch to an older version of Cassandra (the latest YCSB Cassandra binding is for Cassandra 2.x).
What should I do?
As suggested here, although Cassandra 3.x is not officially supported, you can use the cassandra-cql binding.
For instance:
./bin/ycsb load cassandra-cql -threads 4 -P workloads/workloada
I just tested it on Cassandra 3.11.0 and it works for both load and run.
That said, the benchmark software to use depends on your test plan. If you want to benchmark only Cassandra, then @gsteiner's solution might be the best. If you want to benchmark different databases using the same tool to avoid variability, then YCSB is the right one.
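A minimal sketch of the full YCSB sequence for the cassandra-cql binding, written in Python only to keep the pieces together. It builds the commands without running them. The `ycsb` keyspace, the `usertable` layout, and the `hosts` property are what I recall from the binding's README, so verify them against your YCSB version; the helper names, the replication factor of 1, and the example host are assumptions made for illustration.

```python
import shlex


def setup_cql(field_count=10, replication_factor=1):
    """Return CQL for the keyspace and table the binding expects by default.

    Assumption: defaults (keyspace `ycsb`, table `usertable`, key `y_id`,
    columns field0..fieldN) follow the binding's README; check your version.
    """
    fields = ",\n".join(f"  field{i} varchar" for i in range(field_count))
    return (
        "CREATE KEYSPACE IF NOT EXISTS ycsb WITH REPLICATION = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};\n"
        "CREATE TABLE IF NOT EXISTS ycsb.usertable (\n"
        "  y_id varchar PRIMARY KEY,\n"
        f"{fields}\n"
        ");"
    )


def ycsb_command(phase, hosts, workload="workloads/workloada",
                 threads=4, ycsb="bin/ycsb"):
    """Build the argument list for one YCSB phase ('load' or 'run')."""
    if phase not in ("load", "run"):
        raise ValueError(f"unknown YCSB phase: {phase!r}")
    return [
        ycsb, phase, "cassandra-cql",
        "-threads", str(threads),
        "-P", workload,
        "-p", f"hosts={hosts}",  # comma-separated contact points
    ]


if __name__ == "__main__":
    print(setup_cql())  # run this in cqlsh first
    for phase in ("load", "run"):
        print(shlex.join(ycsb_command(phase, "127.0.0.1")))
```

Run the printed CQL in cqlsh before the load phase; the load phase populates the table and the run phase executes the workload against it.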
I would recommend using cassandra-stress to run a load/performance test on your Cassandra cluster. It is very customizable, to the point that you can test distributions with different data models as well as specify how hard you want to push your cluster.
Here is a link to the DataStax documentation, which covers how to use the tool in depth:
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCStress_t.html
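As a rough illustration of what a basic cassandra-stress invocation looks like, the sketch below assembles a command line without executing it. The `n=`, `-rate threads=` and `-node` options are standard cassandra-stress options; the helper name, the default thread count, and the node addresses are hypothetical values chosen for the example.

```python
import shlex


def stress_command(op, n, nodes, threads=50, binary="cassandra-stress"):
    """Build a cassandra-stress argument list.

    op    : 'write', 'read' or 'mixed' (run a write pass before reading)
    n     : number of operations to perform
    nodes : list of contact-point addresses
    """
    if op not in ("write", "read", "mixed"):
        raise ValueError(f"unsupported operation: {op!r}")
    if not nodes:
        raise ValueError("at least one node is required")
    return [
        binary, op, f"n={n}",
        "-rate", f"threads={threads}",  # how hard to push the cluster
        "-node", ",".join(nodes),
    ]


if __name__ == "__main__":
    # Populate first, then read back the same number of rows.
    for op in ("write", "read"):
        print(shlex.join(stress_command(op, 1_000_000, ["10.0.0.1", "10.0.0.2"])))
```

Custom data models are driven by a YAML profile rather than these simple commands; see the linked documentation for that mode.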

Connecting open-source Cassandra to Tableau Desktop

I am trying to use the Business Intelligence (BI) software Tableau Desktop to see into a local Cassandra cluster. The Cassandra cluster is the open-source version and not the proprietary version that one pays to use. The version of Cassandra I am using is 2.2.x.
I can successfully connect Cassandra and Tableau after configuring the 64-bit ODBC driver. However, actually querying the tables in the NoSQL database throws errors. For instance, in the 'Data Source' view, selecting 'Update Now' results in an error from a SQL statement that starts with SELECT 1... I do not think Cassandra can understand or process SELECT 1 statements.
Errors are also thrown when trying to build graphs of the data as this also results in failed queries.
In the 'Advanced Options' for the ODBC driver I selected to use CQL as the 'Query Mode' and still there were problems with the queries Tableau was sending to Cassandra.
Does anyone know how to get these two technologies to work together? I found this tutorial, but it was written almost a year ago, as of this writing, and in my experience it does not work. Please see the link to what I am talking about here: http://www.datastax.com/dev/blog/datastax-odbc-cql-connector-apache-cassandra-datastax-enterprise In this article they say to download the driver from here: https://academy.datastax.com/downloads/download-drivers?dxt=DX I am wondering whether this specific version of the ODBC driver is the problem.
I also read a previous post on this, and it was not helpful, as in my experience it is also obsolete. The post I am referring to is: Connecting cassandra to Tableau Software. The first answer is probably the obsolete one, but the second one recommends using the Simba driver, which is some type of proprietary driver. My current hypothesis is that the Simba driver may be needed to use Tableau and Cassandra together.
Thank-you for reading this.
DataStax licenses Simba Technologies' ODBC driver, but the version on their website may be behind the latest version available from Simba. Please download a free evaluation version of the driver and see if you have the same issue: http://www.simba.com/drivers/cassandra-odbc-jdbc/
'SELECT 1' is not a valid CQL query (http://docs.datastax.com/en/cql/3.1/cql/cql_reference/select_r.html).
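To make the point concrete: a CQL SELECT always needs a FROM clause, so a SQL-style connectivity probe such as SELECT 1 is rejected. The sketch below is a deliberately naive illustration, not something Tableau or the ODBC driver does. It detects a FROM-less SELECT and substitutes a query against the `system.local` table, which exists on every node. The function names are hypothetical, and the regex check would be fooled by the word "from" inside a string literal.

```python
import re

_SELECT = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
_FROM = re.compile(r"\bFROM\b", re.IGNORECASE)

# A valid CQL statement that can serve as a lightweight connectivity check.
CQL_PROBE = "SELECT release_version FROM system.local"


def is_from_less_select(query):
    """True for SELECT statements with no FROM clause (invalid in CQL)."""
    return bool(_SELECT.match(query)) and not _FROM.search(query)


def to_cql_probe(query):
    """Replace a SQL-style probe like 'SELECT 1' with a valid CQL query."""
    return CQL_PROBE if is_from_less_select(query) else query


if __name__ == "__main__":
    for q in ("SELECT 1", "SELECT * FROM ks.users"):
        print(f"{q!r} -> {to_cql_probe(q)!r}")
```

This only helps in code you control, such as your own health check; when the BI tool generates the statement, the fix has to come from the driver's SQL-to-CQL translation.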

For Cassandra kundera.client.lookup.class options

To configure Kundera for Cassandra, I noticed there are 3 possible options for kundera.client.lookup.class:
com.impetus.client.cassandra.pelops.PelopsClientFactory
com.impetus.kundera.client.cassandra.dsdriver.DSClientFactory
com.impetus.client.cassandra.thrift.ThriftClientFactory
I am not sure of the pros and cons of these 3 and hence not sure which one to use. Please help me decide.
I suggest you use com.impetus.client.cassandra.thrift.ThriftClientFactory. It is the implementation that uses just Cassandra's Thrift API.
PelopsClient is not in active development.
DSClient is built on top of the DataStax driver for Cassandra.
Neither DSClient nor ThriftClient has a real advantage over the other.
After further research, I found the following
Don't use PelopsClient, as it's not in active development, as mentioned by @karthik, but more importantly because of the issue reported here.
The DataStax driver is better than the Thrift client because it overcomes a few limitations of Thrift and uses a different binary protocol, specific to Cassandra, which gives better performance. Refer to: Datastax java driver support for Cassandra using Kundera
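For reference, here is a small sketch of how the choice shows up in configuration. It renders the `<property>` lines you would place in a persistence.xml persistence unit. The factory class names are copied from the question. The `kundera.nodes`, `kundera.port` and `kundera.keyspace` property names are as I recall them from Kundera's documentation and should be double-checked. The ports are Cassandra's defaults (9160 for Thrift, 9042 for the native protocol), and the helper names and keyspace value are hypothetical.

```python
from xml.sax.saxutils import quoteattr

# Factory classes exactly as listed in the question.
CLIENT_FACTORIES = {
    "thrift": "com.impetus.client.cassandra.thrift.ThriftClientFactory",
    "datastax": "com.impetus.kundera.client.cassandra.dsdriver.DSClientFactory",
    "pelops": "com.impetus.client.cassandra.pelops.PelopsClientFactory",
}

# Cassandra default ports: Thrift RPC vs. native binary protocol.
DEFAULT_PORTS = {"thrift": 9160, "pelops": 9160, "datastax": 9042}


def kundera_properties(client, keyspace, nodes="localhost", port=None):
    """Return the Kundera properties for the chosen client as a dict."""
    if client not in CLIENT_FACTORIES:
        raise ValueError(f"unknown client: {client!r}")
    return {
        "kundera.nodes": nodes,
        "kundera.port": port if port is not None else DEFAULT_PORTS[client],
        "kundera.keyspace": keyspace,
        "kundera.client.lookup.class": CLIENT_FACTORIES[client],
    }


def property_lines(props):
    """Render properties as persistence.xml <property> elements."""
    return [
        f"<property name={quoteattr(name)} value={quoteattr(str(value))} />"
        for name, value in props.items()
    ]


if __name__ == "__main__":
    print("\n".join(property_lines(kundera_properties("datastax", "demo_keyspace"))))
```

Switching clients then means changing one property value and, when moving between Thrift and the DataStax driver, the port.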
