Cassandra (DataStax v3.5) using Stratio Lucene Index plugin - Windows

I'm trying to use the Stratio Lucene index plugin with a Windows installation of Cassandra (DataStax v3.5), but I can't get Cassandra to recognize it.
I'm aware that the plugin version must correspond to the Cassandra version, and I have tried 3.0.5 and 3.5, both with the same result. The service is stopped, the index .jar file is copied to the lib directory, and the service is restarted. Then, using CQLSH, I can create the relevant keyspace and table (as described in the Stratio documentation), but attempting to create the index fails with the following message:
Query invalid because of configuration issue: message="Unable to find custom indexer class 'com.stratio.cassandra.lucene.Index'"
https://github.com/Stratio/cassandra-lucene-index/tree/branch-3.5
Does anyone have any idea how to get this implemented & working?
Is there a central forum or a point of contact for Stratio's Lucene index support?
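For reference, here is a minimal sketch of the kind of index-creation statement involved, issued through the DataStax Java driver rather than cqlsh; the keyspace, table, and schema options are illustrative and simply follow the pattern shown in the Stratio documentation:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    public class CreateStratioIndex {
        public static void main(String[] args) {
            // Connect to a local node; adjust the contact point for your cluster.
            try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {

                // Illustrative keyspace and table, as in the Stratio examples.
                session.execute("CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
                        + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
                session.execute("CREATE TABLE IF NOT EXISTS demo.tweets (id int PRIMARY KEY, body text)");

                // This is the statement that fails with "Unable to find custom indexer class"
                // when the plugin jar has not been picked up from the lib directory.
                session.execute("CREATE CUSTOM INDEX IF NOT EXISTS tweets_index ON demo.tweets () "
                        + "USING 'com.stratio.cassandra.lucene.Index' "
                        + "WITH OPTIONS = {'refresh_seconds': '1', "
                        + "'schema': '{fields: {body: {type: \"text\", analyzer: \"english\"}}}'}");
            }
        }
    }

If the jar in lib is actually being loaded, this statement should succeed; the error above indicates the class was never found on Cassandra's classpath.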

This resource https://github.com/Stratio/cassandra-lucene-index/issues/118#issuecomment-211796434 suggests that only open-source Apache Cassandra is officially supported by this plugin; it might or might not work with DSE. I checked that version 3.5.0 works on Linux with Apache Cassandra but does not work on Windows with DSE :( According to the DataStax docs, DSE should support custom secondary indexes, so it may be that the plugin simply does not run on Windows.

Related

HDInsight and Talend Open Studio for Big Data

I am currently working on a project in which I need to connect Talend Open Studio for Big Data (v6.3.1) to an Azure HDInsight (3.5) Hadoop cluster. So far, I am trying a simple example that consists of creating a Hive table.
For that, I am using the following diagram:
The Hive connection was configured as follows:
… and please find below the specifications of the tHiveCreateTable_1 node:
By running this process:
· The specified container and deployment blob are created (see image below), which makes me believe that everything is OK with the Windows Storage configuration
· However, the tHiveCreateTable_1 node fails with an error (see image below)
· I strongly believe that it's something related to the hostname and port
· I tried using the hostname of the cluster and the hostname of the Hive server that we can find in Ambari (see image below)
· But none of them worked as expected.
Has anyone tried something similar to this?
Note: it seems worth mentioning that the HDInsight version supported by Talend is 3.4, whereas I am using 3.5; that might be the cause.
Many thanks for your help in advance.
According to the official document about the differences between Hadoop components and versions available with HDInsight, HDInsight 3.5 is based on Hortonworks Data Platform (HDP) 2.5, while HDI 3.4 is based on HDP 2.4. However, there is no big version difference in their Hive or other components. So my suggestion is to try creating an HDI 3.4 cluster that uses the same Azure Storage account as your current HDI 3.5 cluster; that should cover your needs without much extra effort.

Upgrade Pig on HDInsight Emulator

I am currently using the HDInsight Hadoop Emulator, which comes with Pig version 0.12. Our problem involves parsing XML files, and I'd like to use the XPath function from PiggyBank, but it is only available with Pig version 0.13.
a. Can I upgrade Pig in the emulator? How would I go about doing that?
b. Is the version of Pig really critical, or could I just get the latest version of the PiggyBank.jar file and use that?
Currently there is no way to update component versions for the HDInsight emulator (or at least that is very hard to do).
I have never used PiggyBank, but from the introduction page (https://cwiki.apache.org/confluence/display/PIG/PiggyBank) it seems that it is a collection of UDFs that should work with Pig 0.12. So I guess using the jar directly (and of course registering it in Pig) should work.
Also, we are looking at an updated story for the HDInsight emulator, so feel free to reach us at hdivstool at microsoft dot com if you have any thoughts, comments, or requirements.
Xiaoyong Zhu from the HDInsight team

External Authentication for Cassandra in DSE 4.7

We are trying to implement external authentication for Cassandra on DSE 4.7. We followed a few guides that say to extend the IAuthenticator class, but beyond that there is little documentation on how to integrate it.
Is it more plug-and-play, where we extend the IAuthenticator class, build a jar, place it in lib (/usr/share/dse/resources/cassandra/lib), and change the yaml file accordingly, or do we have to take the source code from GitHub, build the entire tree, and then use that?
If so, is DataStax's Cassandra available on GitHub?
What do we need to do to build external authentication other than LDAP and Kerberos in DSE 4.7?
extend IAuthenticator class, build a jar and place it in lib (/usr/share/dse/resources/cassandra/lib) and change the yaml file accordingly
^^ yes, this is the right approach.
Datastax's Cassandra available on Github?
Not exactly. You'll see the version of C* that ships with DSE in the release notes; you can check the source in the apache/cassandra GitHub repo and it will match (up to and excluding the build number). The exact C* build under DSE has some critical patches from future versions, and that exact source code is not available. However, the dot release in apache/cassandra is good enough for all intents and purposes.
E.g., look at https://github.com/apache/cassandra/tree/cassandra-2.1.8 for DSE 4.7.1.
As mentioned by #Mikea, we need to override ISaslAwareAuthenticator, and when using Cassandra within DSE we need to be very sure of the Cassandra version and then dig into the appropriate GitHub repo.
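As a rough sketch of what such a class can look like: the method signatures below come from the open-source Cassandra 2.1 IAuthenticator interface (the line DSE 4.7 is based on) and should be verified against the exact source matching your DSE release; the external check itself is just a placeholder. For SASL authentication over the native protocol, the same tree also has ISaslAwareAuthenticator (which extends IAuthenticator), as implemented by PasswordAuthenticator.

    import java.util.Collections;
    import java.util.Map;
    import java.util.Set;

    import org.apache.cassandra.auth.AuthenticatedUser;
    import org.apache.cassandra.auth.IAuthenticator;
    import org.apache.cassandra.auth.IResource;
    import org.apache.cassandra.exceptions.AuthenticationException;
    import org.apache.cassandra.exceptions.ConfigurationException;

    // Sketch of a custom authenticator against the Cassandra 2.1 IAuthenticator interface.
    // Build it into a jar, drop the jar into /usr/share/dse/resources/cassandra/lib, and set
    // the "authenticator" option in cassandra.yaml to the fully qualified class name.
    public class ExternalAuthenticator implements IAuthenticator {

        @Override
        public boolean requireAuthentication() {
            return true;
        }

        @Override
        public Set<Option> supportedOptions() {
            return Collections.emptySet();
        }

        @Override
        public Set<Option> alterableOptions() {
            return Collections.emptySet();
        }

        @Override
        public AuthenticatedUser authenticate(Map<String, String> credentials) throws AuthenticationException {
            String username = credentials.get(USERNAME_KEY);
            String password = credentials.get(PASSWORD_KEY);
            // Placeholder: call out to whatever external system you authenticate against.
            if (!externalSystemAccepts(username, password)) {
                throw new AuthenticationException("External authentication failed for " + username);
            }
            return new AuthenticatedUser(username);
        }

        @Override
        public void create(String username, Map<Option, Object> options) {
            // User management is left to the external system in this sketch.
        }

        @Override
        public void alter(String username, Map<Option, Object> options) {
        }

        @Override
        public void drop(String username) {
        }

        @Override
        public Set<? extends IResource> protectedResources() {
            return Collections.emptySet();
        }

        @Override
        public void validateConfiguration() throws ConfigurationException {
            // Validate any settings your authenticator reads from configuration.
        }

        @Override
        public void setup() {
        }

        private boolean externalSystemAccepts(String username, String password) {
            return false; // placeholder for the real external check
        }
    }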

How to Use Apache Drill with Cassandra

I am trying to query Cassandra using Apache Drill. The only connector I could find is here:
http://www.confusedcoders.com/bigdata/apache-drill/sql-on-cassandra-querying-cassandra-via-apache-drill
However, this does not build; it fails with an artifact-not-found error. I also had another developer who is more versed in these tools take a stab at it, but he had no luck either.
I tried contacting the developer of the plugin I referenced, but the blog does not work and won't let me post comments. Has anyone got this plugin to work (and if so, how?), or is there another plugin or method I can use to connect Apache Drill to Cassandra? If anyone could show me how to connect and execute a simple SQL query, that would be much appreciated.
I looked at the latest Cassandra storage plugin patch and the latest Apache Drill source. The Drill code has changed and the patch can no longer be applied.
I then manually took the patch apart (it is mostly diff output). Most of the patch was new classes, which I could easily add to the latest Drill source tree, and most of the other updates were easy to insert into the current source. There were two specific classes that required some minor code modifications/extensions. I rebuilt the distribution from the modified source and installed the Drill servers on a 3-node cluster. The Cassandra schema failed to initialize properly, throwing a null pointer exception in one of the new classes. This leads me to believe that the (latest) modified storage plugin is incompatible with the latest version of Cassandra. Since the author of the original storage plugin is unreachable and no one else is stepping up to support the code, this is a dead horse. Beat it if you must.
I was the author of the patch, written a year back. I could not get it merged into Drill then, and later got occupied with other stuff :(
With so many changes to Drill internals, I am not sure what amount of welding would be needed at this point to get it working. Please use the code just as a reference for writing a Drill storage plugin.
I have added a banner at the top of the blog post to save fellow developers' hours.
I don't know if anyone is still interested in this topic, but I've been experimenting with this plugin and got it to work with Drill 1.18-SNAPSHOT. Here is a link to my branch with this code: 1. My plan is to submit this as a PR for Drill, but it still needs some work. This code will successfully query Cassandra 3.11.5 (the latest stable version).
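For anyone experimenting with that branch, here is a rough sketch of connecting from Java through Drill's JDBC driver and running a simple query; the "cassandra" storage plugin name and the keyspace/table path are assumptions that depend on how the plugin is registered in your Drill storage configuration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DrillCassandraQuery {
        public static void main(String[] args) throws Exception {
            // Connect directly to a drillbit; use "jdbc:drill:zk=<zk-hosts>" for a clustered setup.
            // Requires the Drill JDBC driver jar on the classpath.
            try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost");
                 Statement stmt = conn.createStatement();
                 // "cassandra" is whatever name the storage plugin was registered under;
                 // the keyspace and table names are placeholders.
                 ResultSet rs = stmt.executeQuery(
                         "SELECT * FROM cassandra.my_keyspace.my_table LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

The same query can also be run from Drill's own sqlline shell or its web UI once the storage plugin is enabled.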

Is there any way to migrate the built-in usergrid db

I've been running usergrid-stack commit 08f26cc865c96185d11990bf622730beff59971a for a while using the built-in Cassandra DB.
I need to migrate to another server and thought I'd take the chance to update.
I tried copying the folder standalone/target/tmp to the new installation but doing so gives me the error below.
null; Can't construct a java object for tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create property=hinted_handoff_throttle_delay_in_ms for JavaBean=org.apache.cassandra.config.Config#6b7b9f29; Unable to find property 'hinted_handoff_throttle_delay_in_ms' on class: org.apache.cassandra.config.Config
Invalid yaml; unable to start server. See log for stacktrace.
Is there any way to migrate the db to a newer version?
If not, is there at least a way to migrate the db using the old version?
IIRC, hinted_handoff_throttle_delay_in_ms is a setting from Cassandra 1.1 that was removed in Cassandra 1.2, so you might need to edit the config file (cassandra.yaml) and remove this setting.
