External Authentication for Cassandra in DSE 4.7

We are trying to implement external authentication to Cassandra on DSE 4.7. We followed a few guides that say to extend the IAuthenticator class, but beyond that there is little documentation on how to integrate it.
Is it plug and play, where we extend the IAuthenticator class, build a jar, place it in lib (/usr/share/dse/resources/cassandra/lib) and change the YAML file accordingly? Or do we have to take the source code from GitHub, build the entire tree and then use that?
If the latter, is DataStax's Cassandra available on GitHub?
What do we need to do to build external authentication other than LDAP and Kerberos in DSE 4.7?

extend IAuthenticator class build a jar and place it in lib(/usr/share/dse/resources/cassandra/lib) and change the yaml file accordingly
Yes, this is the right approach.
Is DataStax's Cassandra available on GitHub?
Not exactly. The DSE release notes list the version of C* that ships with DSE; you can check that source in the apache/cassandra GitHub repository and it will match (up to, but excluding, the build number). The exact C* build under DSE includes some critical patches from later versions, and that exact source code is not available. However, the matching dot release in apache/cassandra is good enough for all intents and purposes.
For example, for DSE 4.7.1 look at https://github.com/apache/cassandra/tree/cassandra-2.1.8

As mentioned by Mikea, we need to implement ISaslAwareAuthenticator. When using Cassandra in DSE, be very sure of the Cassandra version and then dig into the appropriate GitHub repository.
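To make the integration point concrete, here is a minimal Java sketch of the credential check at the heart of such an authenticator. It assumes clients use the SASL PLAIN mechanism (RFC 4616), whose token is [authzid] NUL username NUL password; as far as we know, this is the format Cassandra's built-in PasswordAuthenticator accepts from drivers. In Cassandra 2.1 this logic would sit inside the SaslAuthenticator returned by ISaslAwareAuthenticator.newAuthenticator(), but the sketch deliberately imports no Cassandra classes, and ExternalUserStore is a hypothetical placeholder for your own identity service.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Sketch only: parses a SASL PLAIN token and delegates the check to an external store.
// No Cassandra classes are used; ExternalUserStore is a hypothetical placeholder.
class PlainSaslSketch {

    /** Username/password pair extracted from a SASL PLAIN token. */
    static final class Credentials {
        final String username;
        final String password;

        Credentials(String username, String password) {
            this.username = username;
            this.password = password;
        }
    }

    /** Hypothetical hook for an external identity service (REST call, database, ...). */
    interface ExternalUserStore {
        boolean verify(String username, String password);
    }

    /** Parses "[authzid] NUL authcid NUL passwd"; empty for malformed tokens. */
    static Optional<Credentials> parsePlainToken(byte[] token) {
        int first = -1;
        int second = -1;
        for (int i = 0; i < token.length; i++) {
            if (token[i] != 0) {
                continue;
            }
            if (first < 0) {
                first = i;
            } else if (second < 0) {
                second = i;
            } else {
                return Optional.empty(); // more than two NUL separators
            }
        }
        if (second < 0) {
            return Optional.empty(); // fewer than two NUL separators
        }
        String user = new String(token, first + 1, second - first - 1, StandardCharsets.UTF_8);
        String pass = new String(token, second + 1, token.length - second - 1, StandardCharsets.UTF_8);
        if (user.isEmpty() || pass.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new Credentials(user, pass));
    }

    /** True only if the token parses and the external store accepts the credentials. */
    static boolean authenticate(byte[] token, ExternalUserStore store) {
        return parsePlainToken(token)
                .map(c -> store.verify(c.username, c.password))
                .orElse(false);
    }

    public static void main(String[] args) {
        // Empty authzid, then username, then password.
        byte[] token = "\0alice\0s3cret".getBytes(StandardCharsets.UTF_8);
        ExternalUserStore store = (u, p) -> "alice".equals(u) && "s3cret".equals(p);
        System.out.println(authenticate(token, store)); // true
    }
}
```

Once the real Cassandra interface is implemented around logic like this and compiled into a jar, deployment is the plug-and-play step described above: copy the jar into the lib directory and set the authenticator option in cassandra.yaml to the fully qualified class name.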

Related

How do I build Cassandra from GitHub source?

I found this repo on GitHub: https://github.com/apache/cassandra
I would like to import it into IntelliJ and build it, so that I can run some code locally that I want to build on top of it. But there are no instructions for building it.
Where are the instructions?
Thank you for taking an interest in writing Cassandra code.
The instructions for building Cassandra from source, including IDE integration, are documented on the Contributing to Cassandra page of the official Apache Cassandra website. There are instructions for IntelliJ, NetBeans and Eclipse.
It's not as straightforward as we would like because everyone's laptop/desktop is different, so I would recommend you join the ASF Slack to get help in real time from other Cassandra contributors in the #cassandra-dev channel. For details, see the Community section of the Cassandra website. Cheers!

Spark does not resolve Ivy-specified repositories after upgrade from 2.2.1 to 2.3

We have a Spark configuration that uses spark.jars.ivySettings to customize jar resolution.
Our Spark jobs run in an environment without internet access, so we want to skip calls to Maven Central and use our own repositories.
In Spark 2.2.1 everything worked fine, but since we upgraded to 2.3, the repositories specified in the Ivy settings are ignored. As a result, our jobs fail due to missing dependencies.
Specifying our repositories with the new spark.jars.repositories option makes them visible to Spark, but does not change the resolution order (Spark always checks Maven Central first, which we cannot allow).
Is this a bug introduced in the new version, or am I doing something wrong?
OK, I found the problem. Apparently the way spark.jars.ivySettings is read changed in 2.3: it is now taken from system properties:
sys.props.get("spark.jars.ivySettings")
This change was not reflected in the documentation, and to me it looks like a bug.
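The difference can be illustrated with a small Java sketch. This is not Spark's code; it only contrasts looking a key up in an explicit configuration map (a stand-in for the Spark configuration) with looking it up in JVM system properties, which is what Scala's sys.props.get does under the hood. The ivysettings.xml path is a made-up example.

```java
import java.util.Map;
import java.util.Optional;

// Illustration only: a conf-map lookup versus a JVM system-property lookup.
class IvySettingsLookup {

    static final String KEY = "spark.jars.ivySettings";

    /** Looks the key up in an explicit configuration map (stand-in for the Spark conf). */
    static Optional<String> fromConf(Map<String, String> conf) {
        return Optional.ofNullable(conf.get(KEY));
    }

    /** Java equivalent of Scala's sys.props.get(KEY): sees only -D / System.setProperty values. */
    static Optional<String> fromSystemProperties() {
        return Optional.ofNullable(System.getProperty(KEY));
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(KEY, "/opt/ivy/ivysettings.xml"); // hypothetical path
        System.out.println("conf lookup:          " + fromConf(conf));
        System.out.println("system-prop lookup:   " + fromSystemProperties());
    }
}
```

Unless the process that resolves dependencies was started with the key as a JVM system property, the second lookup comes back empty even though the configuration map contains the value, which matches the symptom of the Ivy settings being silently ignored.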

.NET Core alternative for Microsoft.Azure.Management.HDInsight.Job?

I'm working on converting a library from the full .NET Framework to .NET Core.
I'm looking for a replacement for Microsoft.Azure.Management.HDInsight.Job, which hasn't been updated in over a year and is not compatible with .NET Core. I was hoping that the functionality would be rolled up into the much more recently updated and .NET Core-compatible Microsoft.Azure.Management.HDInsight, but that doesn't appear to be the case.
I'm willing to use the REST API, but I haven't been able to find the same functionality there. Any guidance would be appreciated.
You could try installing a prerelease version of Microsoft.Azure.Management.HDInsight.Job with the Package Manager, so that its dependencies do not conflict with your ASP.NET Core project.
I tested them: even though they are previews, they have the functionality you want.
Run the following in the Package Manager Console:
Install-Package Microsoft.Azure.Management.HDInsight.Job -Version 1.0.7-preview
Only versions up to and including 1.0.7-preview can be installed this way; later versions may fail to install.
For more detail, you could refer to this article.
I found the REST API I was looking for. It is the WebHCat API, not an Azure API.
MapReduce job creation: https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+MapReduceJar
Pig job creation: https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Pig
Hive job creation: https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Hive
Sqoop job creation: https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-sqoop-curl and https://sqoop.apache.org/docs/1.99.3/RESTAPI.html
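As a starting point, here is a minimal Java 11+ sketch of submitting a Hive job through WebHCat's POST templeton/v1/hive endpoint, using the execute, statusdir and user.name parameters from the WebHCat Hive reference above. It assumes an HDInsight-style base URL (https://CLUSTERNAME.azurehdinsight.net) and HTTP Basic authentication with the cluster login; the cluster name, credentials, query and status directory are placeholders, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: submit a Hive query through WebHCat (Templeton). Base URL, credentials and
// status directory are placeholders; Basic auth is an assumption based on HDInsight.
class WebHCatHiveJob {

    /** application/x-www-form-urlencoded body from ordered key/value pairs. */
    static String formEncode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Builds POST {baseUrl}/templeton/v1/hive with the query in the "execute" parameter. */
    static HttpRequest buildHiveJobRequest(String baseUrl, String user, String password,
                                           String hiveQuery, String statusDir) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("user.name", user);
        params.put("execute", hiveQuery);
        params.put("statusdir", statusDir); // directory where WebHCat writes the job status
        String basic = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/templeton/v1/hive"))
                .header("Authorization", "Basic " + basic)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(params)))
                .build();
    }

    /** Sends the request and returns the JSON response body (expected to contain the job id). */
    static String submit(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("WebHCat returned HTTP " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }
}
```

Usage would be along the lines of submit(buildHiveJobRequest("https://CLUSTERNAME.azurehdinsight.net", "admin", "PASSWORD", "SHOW TABLES;", "/example/hivestatus")); the MapReduce, Pig and Sqoop endpoints follow the same form-POST pattern with the parameters listed on their respective reference pages.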
Hopefully they will release 3.0.0 soon
https://github.com/Azure/azure-sdk-for-net/issues/9219

Monitor/Log slow running queries in Apache Cassandra 2.2.X

How can I monitor/log slow-running queries in Apache Cassandra 2.2.x without using any external monitoring tools? Is there a parameter we can set in the YAML file to log slow-running queries, or is there another approach?
Also, in CASSANDRA-12403 I see they added the parameter "slow_query_log_timeout_in_ms: 500" for this purpose. Can we add this parameter to the cassandra.yaml file in 2.2.x, or do we need to apply this patch to 2.2.x to make it work?
It's a feature of a newer version: you can upgrade, or apply the patch and run a custom build. 2.2.x has no built-in support for it.
It's a bit of a long shot, but you might be able to get https://github.com/smartcat-labs/cassandra-diagnostics working with its slow query module (https://github.com/smartcat-labs/cassandra-diagnostics/blob/dev/cassandra-diagnostics-core/COREMODULES.md#slow-query-module). However, it only supports 2.1 and 3.0; I don't see 2.2 listed.
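If neither upgrading nor patching is possible, a different technique is to apply the same threshold idea on the client side. The Java sketch below is not a Cassandra feature and not what CASSANDRA-12403 implements: it simply times each query callback in the application and reports the ones above a threshold (500 ms here, mirroring the value quoted above). The class name is hypothetical, and the measured time includes network round trip and driver overhead, not just server-side execution.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Client-side sketch of the slow-query-threshold idea (hypothetical helper, not a Cassandra API).
// Elapsed time includes network and driver overhead, not only server-side execution.
class SlowQueryTimer {

    private final long thresholdNanos;
    private final BiConsumer<String, Long> slowQueryLog; // (query text, elapsed millis)

    SlowQueryTimer(long thresholdMillis, BiConsumer<String, Long> slowQueryLog) {
        this.thresholdNanos = TimeUnit.MILLISECONDS.toNanos(thresholdMillis);
        this.slowQueryLog = slowQueryLog;
    }

    /** Runs the query callback; reports it if it exceeded the threshold, even if it threw. */
    <T> T time(String query, Callable<T> execution) throws Exception {
        long start = System.nanoTime();
        try {
            return execution.call();
        } finally {
            long elapsed = System.nanoTime() - start;
            if (elapsed > thresholdNanos) {
                slowQueryLog.accept(query, TimeUnit.NANOSECONDS.toMillis(elapsed));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        SlowQueryTimer timer = new SlowQueryTimer(500, (q, ms) ->
                System.err.printf("SLOW QUERY (%d ms): %s%n", ms, q));
        // Stand-in for executing the statement through whichever driver you use.
        timer.time("SELECT * FROM ks.tbl WHERE id = 1", () -> {
            Thread.sleep(600);
            return null;
        });
    }
}
```

Because it only sees what the client sees, this cannot tell slow server-side execution apart from network latency, so it complements rather than replaces the server-side log added in newer versions.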

How to Use Apache Drill with Cassandra

I am trying to query Cassandra using Apache Drill. The only connector I could find is here:
http://www.confusedcoders.com/bigdata/apache-drill/sql-on-cassandra-querying-cassandra-via-apache-drill
However this does not build. It comes up with an artifact not found error. I also had another developer who is more versed in these tools take a stab at it, but he also had no luck.
I tried contacting the developer of the plugin I referenced, but the blog does not work and won't let me post comments. Has anyone got this plugin to work (if so, how?), or is there another plugin or method I can use to connect Apache Drill to Cassandra? If anyone could show me how to connect and execute a simple SQL query, that would be much appreciated.
I looked at the latest Cassandra storage plugin patch and the latest apache drill source. The drill code has changed and the patch can no longer be applied.
I then manually took the patch apart (it is mostly diff output). Most of the patch consisted of new classes, which I could easily add to the latest Drill source tree, and most of the other changes were easy to insert into the current source. Two specific classes required some minor code modifications/extensions. I rebuilt the distribution from the modified source and installed the Drill servers on a 3-node cluster. The Cassandra schema failed to initialize properly, throwing a NullPointerException in one of the new classes. This leads me to believe that the (latest) modified storage plugin is incompatible with the latest version of Cassandra. Since the author of the original storage plugin is unreachable and no one else is stepping up to support the code, this is a dead horse. Beat it if you must.
I am the author of the patch, which I wrote a year ago. I could not get it merged into Drill then, and later got occupied with other things :(
With so many changes to Drill internals, I am not sure what amount of welding would be needed at this point to get it working. Please use the code just as a reference for writing a Drill storage plugin.
I have added a banner at the top of the blog post to save fellow developers' hours.
I don't know if anyone is still interested in this topic but I've been experimenting with this plugin and got it to work with Drill 1.18-SNAPSHOT. Here is a link to my branch with this code: 1. My plan is to submit this as a PR for Drill, but it still needs some work. This code will successfully query Cassandra 3.11.5 (latest stable version).
