Cassandra and hector upgrade to 1.0 and unit testing - cassandra

We have been using Cassandra 0.7, and since the stable 1.0.0 release of Cassandra is out, we planned to upgrade to it. It's low-risk since we are not in production yet. We were using Hector 0.7-29, which had a testutils package and an EmbeddedServerHelper class that we used to start an embedded server in all our unit tests.
However, the upgraded Hector 1.0-1 (which targets Cassandra 1.0.x) has removed this package (me.prettyprint.cassandra.testutils) from its core distribution.
I would like to know the plan moving forward for unit testing with the new Hector 1.0-1 API client. Is there still a way to start an embedded Cassandra server?
Thanks for your help.

There is a new 'test' module which holds EmbeddedSchemaLoader and EmbeddedServerHelper. We took them out of core so they could be used outside of Hector (the module now has no direct dependency on Hector core).
https://github.com/rantav/hector/tree/master/test
Let us know how everything works out.
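If you use Maven, pulling the module in as a test-scoped dependency along these lines should restore the old setup. The coordinates below are an assumption — verify the groupId/artifactId/version against the module's pom.xml in the repository above:

```xml
<!-- Assumed coordinates: check the test module's pom.xml before using -->
<dependency>
  <groupId>me.prettyprint</groupId>
  <artifactId>hector-test</artifactId>
  <version>1.0-1</version>
  <scope>test</scope>
</dependency>
```

With that in test scope, the existing EmbeddedServerHelper-based unit tests should compile again without dragging the test utilities into your production classpath.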

Upgrade issue in Application querying schema_keyspaces and using RoundRobin policy

In version 1 of our Java application, which uses Cassandra 2.1, at startup we execute the query "SELECT * FROM system.schema_keyspaces;" to get keyspace info (if this fails, the application won't start).
In the new code, which uses Cassandra 3.11, we get the keyspace information from the driver's cluster.metadata instance.
We are using the DC-aware round-robin load balancing policy of the DataStax Java driver.
Now consider an upgrade scenario with three nodes A, B and C, where A is upgraded (new application + Cassandra 3.11), the upgrade on B is in progress (Cassandra is down there), and C is not upgraded (old application + Cassandra 2.1), and the client application on node C restarts.
I get an InvalidQueryException if the old query from the Java client on node C is executed against A (the client sends queries round-robin). When it fails there is no error handling in the old application. How can we resolve this issue?
com.datastax.driver.core.exceptions.InvalidQueryException: un-configured table schema_keyspaces
One way I figured out is to remove A's IP from the client application's contact points and from the peers table on C's Cassandra node, restart the client application, and then let Cassandra restore the peers table entry.
The other way is to keep restarting the client application on C until its query actually hits the Cassandra 2.1 node and the application starts successfully. But that seems ugly to me.
In your application it's better to explicitly set the protocol version to match Cassandra 2.1, instead of relying on auto-negotiation. The driver's documentation explicitly recommends this.
According to the compatibility matrix you need to explicitly set protocol version V3, but this also depends on the driver version, so you may need to stick with protocol version 2.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ProtocolVersion;

// Pin the protocol version instead of letting the driver negotiate it.
Cluster cluster = Cluster.builder()
        .addContactPoint("xxxx")
        .withProtocolVersion(ProtocolVersion.V3)
        .build();
After the upgrade to 3.11 is done you can switch to protocol version 4.
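Putting both points together, here is a minimal sketch (assuming the DataStax Java driver 3.x; the contact point is a placeholder, and it needs a live cluster to actually run) that pins the protocol version and reads keyspace information from the driver's metadata rather than querying system.schema_keyspaces directly:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.KeyspaceMetadata;
import com.datastax.driver.core.ProtocolVersion;

public class KeyspaceInfo {
    public static void main(String[] args) {
        // Pin the protocol version so the same client works against
        // both Cassandra 2.1 and 3.11 nodes during the rolling upgrade.
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1") // placeholder contact point
                .withProtocolVersion(ProtocolVersion.V3)
                .build()) {
            // The driver populates cluster metadata on connect, so no
            // direct read of system.schema_keyspaces is needed and the
            // version-specific system table layout no longer matters.
            for (KeyspaceMetadata ks : cluster.getMetadata().getKeyspaces()) {
                System.out.println(ks.getName());
            }
        }
    }
}
```

This is a sketch, not a drop-in replacement; your real contact points, error handling and startup checks still apply.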

Upgrading from gridgain to Apache Ignite

We're currently running GridGain 6.2.1. Is there an existing upgrade guide for transitioning to Apache Ignite?
There is no such guide, and it highly depends on which parts of GridGain you're using. All functionality that existed in 6.x was migrated to Ignite with a slightly different API, so I suggest updating the version and fixing compilation errors step by step.

Which Apache cassandra version to use for production

We are exploring Apache Cassandra and are going to use it in production soon.
We will mostly use the DataStax Community edition of Apache Cassandra.
But after reading:
http://www.planetcassandra.org/blog/cassandra-2-2-3-0-and-beyond/
https://www.pythian.com/blog/cassandra-version-production/
and this sentence from the latter blog, "If you don't mind facing serious bugs and contribute to the development pick 3.x", I am confused about which version to choose for our production deployment.
I just need to know whether 3.5.0 and 3.0.6 are production-ready.
Datastax community : 3.5.0 from http://www.planetcassandra.org/cassandra/
Datastax community : 3.0.6 from
http://www.planetcassandra.org/archived-versions-of-datastaxs-distribution-of-apache-cassandra/
or
Datastax community : 2.2.6 from
http://www.planetcassandra.org/archived-versions-of-datastaxs-distribution-of-apache-cassandra/
The version provided by DataStax is supposed to be stable and production-ready. You get an application to monitor your cluster, which is nice if you don't have any ops people who know Cassandra, and you can pay for support.
However, you don't get the latest version of Cassandra, and you can miss interesting features.
As for Cassandra 3.x, as said above, you get more features (for example JSON support) and better performance, but if you find a critical bug and can't fix it, you can only file a ticket and hope it will be taken care of quickly. Still, it is production-ready and could work well for you.
In conclusion, go for the latest version only if you need a specific feature, or if you have the developers on your team to back your choice. Go for DataStax if you want something that works with less effort.

YCSB for Cassandra 3.0 Benchmarking

I have a Cassandra cluster on Ubuntu virtual machines and need to benchmark it.
I'm trying to do it with Yahoo's YCSB (without using Maven, if possible).
I use Cassandra 3.0.1, but I can't find a suitable version of YCSB.
I don't want to switch to an older version of Cassandra (YCSB's latest Cassandra binding is for Cassandra 2.x).
What should I do?
As suggested here, although Cassandra 3.x is not officially supported, you can use the cassandra-cql binding.
For instance:
./bin/ycsb load cassandra-cql -threads 4 -P workloads/workloada
I just tested it on Cassandra 3.11.0 and it works for both load and run.
That said, the benchmarking software to use depends on your test goals. If you want to benchmark only Cassandra, then #gsteiner's solution might be the best. If you want to benchmark different databases using the same tool to avoid variability, then YCSB is the right one.
I would recommend using Cassandra-stress to perform a load/performance test on your Cassandra cluster. It is very customizable, to the point that you can test distributions with different data models as well as specify how hard you want to push your cluster.
Here is a link to the Datastax documentation for it that goes into how to use the tool in depth.
https://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsCStress_t.html
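As a starting point, a typical cassandra-stress run looks like the following. The node address and row counts are placeholders, and the commands need a running cluster; schema, distribution and rate options are covered in the linked documentation:

```shell
# Write 100,000 rows with 50 client threads against one node...
cassandra-stress write n=100000 -rate threads=50 -node 192.168.1.100

# ...then replay a 1:1 read/write mix over the same data.
cassandra-stress mixed ratio\(write=1,read=1\) n=100000 -rate threads=50 -node 192.168.1.100
```

Run the write phase first so the mixed phase has data to read back.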

Hbase Thrift in CDH 5

I'm using Node.js Thrift API to connect to Hbase. Everything was working great until I upgraded CDH 4.6 to CDH 5. After upgrading I regenerated the Thrift API for Node.js with this script:
thrift --gen js:node /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hbase/include/thrift/hbase2.thrift
After replacing the original Node.js script with the newly generated script, everything stopped working.
You can view the new script and the basic methods in the demo that I'm trying to run on https://github.com/lgrcyanny/Node-HBase-Thrift2
When I run the 'get' method, it returns "Internal error processing get".
When I run the 'put' method, it returns "Invalid method name: 'put'".
It seems like the new Thrift API is completely incompatible? Am I missing anything here?
There are two Thrift IDL files that come with HBase:
hbase-thrift/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift
hbase-thrift/src/main/resources/org/apache/hadoop/hbase/thrift2/Hbase.thrift
Both have a get() method, but only one of them has a put() method, which is exactly what your error messages above are telling us.
Cited from the package summary page:
There are currently 2 thrift server implementations in HBase, the packages:
org.apache.hadoop.hbase.thrift: This may one day be marked as deprecated.
org.apache.hadoop.hbase.thrift2: i.e. this package. This is intended to closely match the HTable interface and to one day supersede the older thrift (the old thrift mimics an API HBase no longer has).
Also the install guides have a separate section for that scenario:
CDH 5 HBase Compatibility
CDH 5 HBase is [...] not wire compatible with CDH 4 [...]. Consequently, rolling upgrades from CDH 4 to CDH 5 are not possible because existing CDH 4 HBase clients cannot make requests to CDH 5 servers and CDH 5 HBase clients cannot make requests to CDH 4 servers. Clients of the Thrift and REST proxy servers, however, retain wire compatibility between CDH 4 and CDH 5. [...]
The HBase User API (Get, Put, Result, Scanner etc; see Apache HBase API documentation) has evolved and attempts have been made to make sure the HBase Clients are source code compatible and thus should recompile without needing any source code modifications. This cannot be guaranteed however, since with the conversion to ProtoBufs, some relatively obscure APIs have been removed. Rudimentary efforts have also been made to preserve recompile compatibility with advanced APIs such as Filters and Coprocessors. These advanced APIs are still evolving and our guarantees for API compatibility are weaker here.
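Practically: if your Node.js code was written against the older interface (the one that has put()), regenerate the bindings from the thrift (not thrift2) IDL and make sure the matching server implementation is running. A sketch, assuming CDH 5 ships the older IDL alongside hbase2.thrift in the same parcel directory (verify the file name on your install):

```shell
# Regenerate Node.js bindings from the older Thrift interface
# (file name is an assumption -- check the parcel's include/thrift directory).
thrift --gen js:node /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/hbase/include/thrift/hbase1.thrift

# Start the matching server implementation on the HBase side:
hbase thrift start    # old interface (get + put)
# vs.
hbase thrift2 start   # thrift2 interface (what hbase2.thrift describes)
```

Whichever interface you generate against, the server must be the corresponding one; mixing them produces exactly the "Invalid method name" errors you are seeing.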
