How to choose Cassandra version? - cassandra

I'm a SQL developer and new to Cassandra. Which version should I choose, Cassandra 3.x or Cassandra 2.x?
We already run cassandra-2.1.17, but it does not support aggregate functions at all. I want to upgrade to Cassandra 3.x, but I don't know much about the differences, so I would appreciate some advice.

Usually it's best to go with the latest version.
Cassandra 2.x: If you need stability, go for 2.2.x or 2.1.x, but keep in mind that those will go out of support once Cassandra 4.0 is released.
The latest releases are 2.2.11 (changes, release notes) and 2.1.19 (changes, release notes), both released on 2017.10.05. Go for 2.2.11 if you want something that works well and you like playing it safe; 2.2.11 is low risk.
Cassandra 3.0: It's probably better to skip this line, because it's pretty buggy, especially in the storage layer. If you do plan to pick 3.0, you should definitely go with 3.0.15. You can check the changes and the release notes.
Also keep in mind that materialized views are still in an experimental phase or, better said, the community might start marking them as experimental.
It's probably best to skip 3.0 and go directly to 3.11.x.
Cassandra 3.x: The latest version should probably be the most stable. 3.11.1 was released on 2017.10.10. You can check the changes and the release notes.
Final thoughts: in the end it's you who decides, based on your use cases and data, on whether there are features that you need or don't need, and on whether there are bugs that you can cope with.

The latest stable release of Cassandra is 3.11.1, and it would be good to upgrade to it. I have tested and used 3.11.0 with satisfaction. You can check the latest release and changelog here: http://cassandra.apache.org/

While upgrading to 3.0.15 or 3.11.1, please check the open JIRA tickets below and see whether they affect your use case.
3.0.15:
CASSANDRA-13900 (Major): Massive GC suspension increase after updating to 3.0.14 from 2.1.18
3.11.1:
CASSANDRA-13929 (Major): BTree$Builder / io.netty.util.Recycler$Stack leaking memory
CASSANDRA-13221 (Minor): Connection reset handling not working in case of protocol errors

Related

Cassandra cluster upgrade from 4.0.1 to 4.0.5, looking for any official documentation about it

I'm surprised not to find official guidelines on upgrades for the community edition of Cassandra on their website.
https://cassandra.apache.org/doc/4.0/index.html
I see that DataStax provides some guidelines for their enterprise product, but not really for the community versions. Maybe I'm not looking at the right sites?
Googling around, I see different how-tos and differing advice on upgrading minor versions in a cluster, but nothing specific to the jump from 4.0.1 to 4.0.5 (which is the latest RPM available in their official repos).
[context]
About a year ago, I put together a very simple Cassandra cluster with 3 seed nodes and 2 normal nodes. The cluster has been running 4.0.1 since then without any issues, and now I'm looking to upgrade a few minor versions to 4.0.5.
Besides replacing a node from time to time, the maintenance is pretty simple, and to be honest I cannot complain about the Cassandra software itself.
[/context]
My understanding from other Stack Overflow questions is that for minor versions (I saw a lot of questions about 3.11 minor upgrades) the risk is low, and sometimes, depending on the versions, you can get away without even having to upgrade the SSTables, but I cannot find out whether this applies to 4.0.1 to 4.0.5.
I would like to understand how the community handles this lack of official guidelines for upgrades in C*, so I'm looking for the sites or docs that the great Stack Overflow heroes recommend as a reference.
I'm still in the research phase before the upgrade. I was thinking of running some sort of "online" upgrade, updating node by node to avoid downtime. I know this means having mixed versions in the cluster for a while, but I understand the other option is to bring the whole cluster down, perform the upgrade on all nodes, and then start it back up.
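The node-by-node approach described above could be planned roughly as in the sketch below. This is only an illustration, not official guidance: the host names, the `cassandra` service name, and the `yum` package name are assumptions, and whether `nodetool upgradesstables` is needed for 4.0.1 to 4.0.5 is left as an explicit flag because the question itself leaves that open. The script only builds the ordered list of commands; it does not run anything.

```python
from typing import List, Tuple


def rolling_upgrade_plan(
    nodes: List[str],
    target_version: str,
    rewrite_sstables: bool = False,
) -> List[Tuple[str, str]]:
    """Return ordered (node, command) steps, finishing one node before the next."""
    plan: List[Tuple[str, str]] = []
    for node in nodes:
        steps = [
            # Flush memtables to disk and stop accepting new connections.
            "nodetool drain",
            # Assumption: Cassandra runs as a systemd service named "cassandra".
            "systemctl stop cassandra",
            # Assumption: RPM install with a versioned package name.
            f"yum install -y cassandra-{target_version}",
            "systemctl start cassandra",
            # Check that the node reports Up/Normal before moving on.
            "nodetool status",
        ]
        if rewrite_sstables:
            # Optional: rewrite SSTables to the current on-disk format.
            steps.append("nodetool upgradesstables")
        plan.extend((node, cmd) for cmd in steps)
    return plan


if __name__ == "__main__":
    # Hypothetical host names for a 5-node cluster.
    hosts = ["node1", "node2", "node3", "node4", "node5"]
    for host, cmd in rolling_upgrade_plan(hosts, "4.0.5"):
        print(f"{host}: {cmd}")
```

Generating the plan first makes it easy to review the order of operations, or to hand the same list to whatever tool actually executes commands on the nodes.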

Is there a compatibility matrix for Hadoop components?

I wonder if there is a compatibility matrix for the various components of the Hadoop ecosystem?
Each Hadoop upgrade has a big compatibility impact, e.g.:
Apache Spark 2.4 does not support Hadoop v3,
Hadoop does not support Java 9 and 10,
and so on...
I know that vendors such as Hortonworks publish component lists with each version of their distribution, but these are not meant for the public at large because they include patched components.
Does one have to go through the Jira bug tracker of each tool to find out about compatibility problems?
One of the key things that a company like Cloudera/Hortonworks does is take all the open source projects that make up Hadoop and make sure that they work well together. From both a functional and a security perspective, a lot of testing and tweaking is done to ensure that everything together forms a proper release.
Now that you have some insight into how much effort goes into the release of just one distribution, with a comparatively strong focus on recent versions, you might understand that there will not be a general overview of 'how everything works with everything' beyond these distributions.
Full disclosure: I am an employee of Cloudera, but even without this I would still recommend that you work with a distribution where possible.

Version-Compatibility for PouchDB-Replication with CouchDB

I have an Angular 6 app with PouchDB 7, and I plan to replicate to a CouchDB server. The only option currently available is a CouchDB server in version 1.6.
So the question is whether replication from PouchDB 7 to this CouchDB 1.6 can work (as two-way replication, so that different Angular clients can exchange changes through this CouchDB server).
I can't find any compatibility list on the net for this topic...
Some hints would be appreciated.
CouchDB and PouchDB use the same replication protocol. There are some optimizations introduced in CouchDB 2.x, but PouchDB 7 will still be able to sync without a problem with CouchDB 1.6.
However, you absolutely should not use CouchDB 1.6! 1.6 had some very serious security flaws, which can essentially allow anyone to execute arbitrary code on your server. These were fixed in 1.7 and later. So please upgrade to at least 1.7.1 immediately!
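A quick way to confirm which version a server actually runs is to look at the JSON that CouchDB returns from its root URL, which includes a `version` field. The sketch below only parses such a response body and compares it against the 1.7.1 minimum recommended above; fetching the body over HTTP is left out, and the sample bodies are made up for illustration.

```python
import json
from typing import Tuple

# Minimum version recommended in the answer above.
MIN_SAFE_VERSION = (1, 7, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '1.6.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def needs_upgrade(welcome_body: str) -> bool:
    """Return True if the server's reported version is below MIN_SAFE_VERSION."""
    info = json.loads(welcome_body)
    return parse_version(info["version"]) < MIN_SAFE_VERSION


if __name__ == "__main__":
    # Illustrative response bodies, not captured from a real server.
    old_server = '{"couchdb": "Welcome", "version": "1.6.1"}'
    new_server = '{"couchdb": "Welcome", "version": "2.3.0"}'
    print(needs_upgrade(old_server))
    print(needs_upgrade(new_server))
```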

Is it already possible to do joins from Mongo 3.2 in Node.js application?

As in the title: I did not find any information on whether Mongoose has already added this feature. Has anyone tried using it in a Node.js application? Is the performance OK, and is it stable enough for production use?
Yes, the $lookup feature is available from Mongoose 4.3 onwards, see this ticket: https://github.com/Automattic/mongoose/issues/3532. As Mongo 3.2 is now a stable release and no big changes were required to implement this in Mongoose, the feature should be relatively stable. Regarding performance, it is hard to comment without looking at a concrete example.
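For readers coming from SQL, it may help to see what a `$lookup` stage does to the documents. The sketch below writes the stage as a plain data structure (the same shape is passed to an aggregation pipeline from any driver) and then emulates its left-outer-join behaviour in plain Python. The collection and field names (`orders`, `inventory`, `item`, `sku`) are hypothetical, and the emulation is a simplification for scalar fields, not how MongoDB implements the stage.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]

# The $lookup stage itself, using hypothetical collection and field names.
pipeline = [
    {
        "$lookup": {
            "from": "inventory",
            "localField": "item",
            "foreignField": "sku",
            "as": "inventory_docs",
        }
    }
]


def lookup(
    local_docs: List[Doc],
    foreign_docs: List[Doc],
    local_field: str,
    foreign_field: str,
    as_field: str,
) -> List[Doc]:
    """Emulate $lookup: attach to each local doc the list of matching foreign docs."""
    joined = []
    for doc in local_docs:
        key = doc.get(local_field)
        matches = [f for f in foreign_docs if f.get(foreign_field) == key]
        # Unmatched documents are kept, with an empty list (left outer join).
        joined.append({**doc, as_field: matches})
    return joined


orders = [{"_id": 1, "item": "almonds"}, {"_id": 2, "item": "pecans"}]
inventory = [{"sku": "almonds", "instock": 120}]
spec = pipeline[0]["$lookup"]
result = lookup(orders, inventory, spec["localField"], spec["foreignField"], spec["as"])
```

Here the first order gets the matching inventory document in its `inventory_docs` list, while the second order is kept with an empty list.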

Which Cassandra version is more stable for Production deployment? And which Cassandra driver is better?

In my organisation we are planning to use Cassandra, and these days we are running some experimental tests against a custom configuration to find the better and more stable version of Cassandra. We are using the DataStax drivers.
We are running tests with INSERT INTO and SELECT * FROM CQL statements in a very tight loop under high load, around 10K qps.
Does anyone have experience with which Cassandra version is better and more stable, and which drivers should be used?
Thanks in advance.
You cannot go wrong with the latest 2.0 release (2.0.9). You can get that version from either the Apache Cassandra project or DataStax. The Apache Cassandra download page also has links for the latest release candidates (RC5 is the latest) of 2.1, but those are still in development, so consider that before installing them.
As for the driver, there are drivers available for more than a dozen languages. Chances are that you probably know or use one of them. There is no one driver (at least that I am aware of) that significantly out-performs all of the others. So pick the driver for the language that either:
You have the most thorough knowledge of.
Complies with the usage standards of your team.
For instance, you could make an argument for using Java. After all, Cassandra is written in Java and all of the examples on the original DataStax Academy are done with the Java CQL Driver. But that argument loses ground quickly if you have never done Java before. Or if your team is a .Net shop, and there's nobody else who understands Java. InfoWorld's Andrew Oliver put it best when he wrote:
The lesson to be learned here is: Don't solve a simple problem with a
completely unfamiliar technology and apply it to use cases it isn't
especially appropriate for.
Again, you cannot go wrong with using a "DataStax Supported Driver" from their downloads page.
“You should not deploy a Cassandra version X.Y.Z to production where Z <= 5.”
Source:
https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
Hence go with 2.0.x. Currently it's 2.0.10.
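The rule of thumb quoted above is easy to apply mechanically. The sketch below is a minimal check, assuming versions are written as X.Y.Z (anything after the patch number, such as a release-candidate suffix, is ignored); the function name is made up for illustration.

```python
import re


def passes_patch_rule(version: str, max_rejected_patch: int = 5) -> bool:
    """Return True if the patch number Z in X.Y.Z is greater than max_rejected_patch."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not an X.Y.Z version: {version!r}")
    patch = int(match.group(3))
    return patch > max_rejected_patch


if __name__ == "__main__":
    # Versions mentioned in the answers on this page.
    for v in ["2.0.9", "2.0.10", "2.1.19", "2.2.11", "3.0.15", "3.11.1"]:
        verdict = "ok" if passes_patch_rule(v) else "too early"
        print(f"{v}: {verdict}")
```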
