hadoop aws versions compatibility - apache-spark

Is there any reference as to what sets of versions are compatible between aws java sdk, hadoop, hadoop-aws bundle, hive, spark?
For example, I know Spark is not compatible with Hive versions above 2.1.1.

You cannot drop in a later version of the AWS SDK than the one which hadoop-aws was built with and expect the s3a connector to work. Ever. That is now written down quite clearly in the S3A troubleshooting docs.
Whatever problem you have, changing the AWS SDK version will not fix things, only change the stack traces you see.
This may seem frustrating, given the rate at which the AWS team push out a new SDK, but you have to understand that (a) the API often changes incompatibly between versions (as you have seen), and (b) every release introduces/moves bugs which end up causing problems.
Here is the 3.x timeline of things which broke on updates of the AWS SDK.
Move to 1.11.86 and some tests hang under load.
Fix: move to 1.11.134; now the logs are full of AWS telling us off for deliberately calling abort() on a read.
Fix: move to 1.11.199; now the logs are full of stack traces.
Fix: move to 1.11.271 and the shaded JAR pulls in netty unshaded.
Every upgrade of the AWS SDK JAR causes a problem somewhere. Sometimes it is an edit to the code and a recompile; most commonly it is logs filling up with false-alarm messages, dependency problems, threading quirks, etc. Things which can take time to surface.
What you get with a Hadoop release is not just the aws-sdk JAR it was compiled against; you get a hadoop-aws JAR which contains the workarounds and fixes for whatever problems that SDK release introduced, identified in the minimum of four weeks of testing before the Hadoop release ships.
Which is why, no, you shouldn't be changing JARs unless you plan to do a complete end-to-end retest of the s3a client code, including load tests. You are encouraged to do that; the Hadoop project always welcomes more testing of our pre-release code, and the Hadoop 3.1 binaries are ready to play with. But trying to do it yourself by changing JARs? Sadly, an isolated exercise in pain.

The Hadoop documentation states that adding the hadoop-aws JAR to the build dependencies will pull in a compatible aws-sdk JAR.
So I created a dummy Maven project with these dependencies to download the compatible versions:
<properties>
    <!-- Your exact Hadoop version here -->
    <hadoop.version>3.3.1</hadoop.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-aws</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>
Then I checked the dependency versions it resolved, used them in my project, and it worked.
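For reference, the compatible SDK arrives transitively as the single shaded com.amazonaws:aws-java-sdk-bundle artifact that hadoop-aws declares. The snippet below is what I believe Hadoop 3.3.1 resolves to, but treat the version as an assumption and confirm it with mvn dependency:tree in your own build:
<!-- pulled in transitively by hadoop-aws 3.3.1; the version shown is believed
     correct for that release - verify with mvn dependency:tree -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-bundle</artifactId>
    <version>1.11.901</version>
</dependency>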

Related

How to exclude package from the jar which is native?

I'm using avro-tools 1.9.2 in my project and, for various reasons, can't update it. I see that avro-tools 1.9.2 uses the old log4j 1.x API natively (it's not a transitive dependency; the classes are included in the jar itself). Is there any way to exclude a package from the jar when using it at runtime? I know it's a rather unfair/weird question, but I really need to get going.
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-tools</artifactId>
    <version>1.9.2</version>
</dependency>
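One possible workaround, sketched below, is to repackage the application with the maven-shade-plugin and filter the bundled log4j 1.x classes out of the resulting jar. The plugin version and the org/apache/log4j/** pattern are assumptions to adapt to your build, and this produces a new jar at package time rather than excluding classes at runtime:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.4</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <filters>
                    <!-- drop the log4j 1.x classes bundled inside avro-tools -->
                    <filter>
                        <artifact>org.apache.avro:avro-tools</artifact>
                        <excludes>
                            <exclude>org/apache/log4j/**</exclude>
                        </excludes>
                    </filter>
                </filters>
            </configuration>
        </execution>
    </executions>
</plugin>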

Is WildFly affected by the log4j 2 vulnerability CVE-2021-44228?

We are using WildFly 10 and 16 in production, and a zero-day exploit (CVE-2021-44228) exists for some versions of log4j.
How can I be sure that none of the code and libraries use a log4j lib that has that issue?
I do not use any log4j property file nor do I add a dependency by myself.
Any help would be greatly appreciated!
The affected log4j versions are:
Versions Affected: all log4j-core versions >=2.0-beta9 and <=2.14.1
WildFly uses log4j shaded via its log4j-jboss-logmanager module. Even the latest 1.2.2.Final version depends on log4j 1.2.17.
This means WildFly <22 is definitely not affected.
There is a log4j2-jboss-logmanager as well - but only WildFly 22+ has it. And as this doc explains:
This will be an implementation of the log4j2 API only. The core log manager for log4j2 will not be supported.
Usage of any org.apache.logging.log4j:log4j-core API’s or implementations will not be supported. In other words the log4j2 log manager implementation, including configuration files, will not be supported.
You can see that the current latest 1.0.0.Final release does not depend on log4j-core at all, only log4j-api.
So WildFly versions >=22 are not affected either.
The official tweet confirms this.
But what about WFCORE-5743 raising the log4j-core version? Look in the pom:
<!-- This is a test only dependency -->
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>${version.org.apache.logging.log4j}</version>
    <scope>test</scope>
</dependency>
It's not bundled with WildFly, only used in WildFly's build for tests.
Fixed in WildFly Core 18.0.0, to be included in WildFly 26.0.0.Final:
https://issues.redhat.com/browse/WFCORE-5743
https://issues.redhat.com/browse/WFLY-15807
If you need to use WildFly 10 or 16 in production, you should use JBoss EAP instead:
https://access.redhat.com/articles/112673#EAP_7
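For the application side of the question (making sure none of your own dependencies drag in a vulnerable log4j-core), one option is a build-time check. Here is a minimal sketch with the maven-enforcer-plugin; the plugin version and the banned version range are assumptions to adjust as new fixed releases appear:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-enforcer-plugin</artifactId>
    <version>3.0.0</version>
    <executions>
        <execution>
            <id>ban-vulnerable-log4j2</id>
            <goals>
                <goal>enforce</goal>
            </goals>
            <configuration>
                <rules>
                    <bannedDependencies>
                        <!-- fail the build if any affected log4j-core appears, directly or transitively -->
                        <excludes>
                            <exclude>org.apache.logging.log4j:log4j-core:[2.0-beta9,2.15.0)</exclude>
                        </excludes>
                    </bannedDependencies>
                </rules>
            </configuration>
        </execution>
    </executions>
</plugin>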

Some issues with using DeepLearning4J dlls

I am working with DeepLearning4J, using the 1.0.0-beta7 release. I am getting two errors at run time.
jnind4jcpu.dll unsupported jni version 0xffffffff
no nd4jcpu in java.library.path
I set up a path to a folder where I have a few other DLLs for this effort. I am using Java JVM 1.8.
So what version of the JVM should I use for error #1, and where in the nd4j Maven project can I find the second one? I tried the uber jar for nd4j and still get the same errors.
Thanks for any help!
Your issue doesn't have anything to do with the Java version. Make sure you're not mixing versions of dl4j.
You don't really need to dig into the internals or deal with any of the manual workarounds that you normally see in JNI-based libraries.
All you need to do is include nd4j-native-platform in your classpath:
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native-platform</artifactId>
    <version>1.0.0-beta7</version>
</dependency>
Nd4j/dl4j is based on javacpp and takes care of all of that for you.
To give you even more targeted advice, I would have to know more about your environment (ideally reproducible on GitHub).
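To avoid mixing versions, one common pattern is to drive every dl4j/nd4j artifact from a single Maven property; a small sketch (the artifact list is illustrative, not exhaustive):
<properties>
    <dl4j.version>1.0.0-beta7</dl4j.version>
</properties>
<dependencies>
    <!-- keep all dl4j/nd4j artifacts on the same version to avoid native library mismatches -->
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native-platform</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
</dependencies>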

Cassandra difference between com.datastax.oss -> java-driver-core and com.datastax.cassandra -> cassandra-driver-core

I wrote a program with the following dependency:
<dependency>
    <groupId>com.datastax.cassandra</groupId>
    <artifactId>cassandra-driver-core</artifactId>
    <version>3.6.0</version>
</dependency>
But when I change it to 4.0.0, some critical classes are missing.
I searched and read on the DataStax site that I must use the following package:
<dependency>
    <groupId>com.datastax.oss</groupId>
    <artifactId>java-driver-core</artifactId>
    <version>4.0.0</version>
</dependency>
It has totally different classes, so what is the right package from DataStax to use in Java for connecting to and working with Cassandra? Which one is better to use, and what is the main difference?
This really depends on your requirements:
if you're developing a completely new application, then you need to use the 4.x versions of the driver, as 3.x won't get new functionality, only critical bug fixes. The latest version right now is 4.9.0 - check the documentation for the exact Maven definition; it should match your second snippet (see the coordinates after this list). Please take into account that this driver is quite different from the 3.x versions, so copy-pasting old examples won't work, but you can follow the upgrade guide to translate them to the new version.
if you have existing code, then you may stick with the latest 3.x version (3.10 right now), because porting to 4.x could mean a significant rewrite of the application due to architectural changes in the new version. Consult the upgrade guide for the details of porting. Also, you can check the following repository for examples of code ported to 4.x.
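For reference, here is roughly what the two options above look like as Maven coordinates (the 4.x version is the one named in this answer; the 3.x patch version is indicative, so check Maven Central for the current one):
<!-- Option 1: new applications - the 4.x driver -->
<dependency>
    <groupId>com.datastax.oss</groupId>
    <artifactId>java-driver-core</artifactId>
    <version>4.9.0</version>
</dependency>
<!-- Option 2: existing code bases staying on 3.x - latest 3.10.x maintenance release -->
<dependency>
    <groupId>com.datastax.cassandra</groupId>
    <artifactId>cassandra-driver-core</artifactId>
    <version>3.10.0</version>
</dependency>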

gwt-maven-plugin appends "-linux" when getting gwt-dev artifact

I've been trying to get the gwt-maven-plugin to work for me. Hopefully someone can help me.
I'm using gwt-maven-plugin 1.2 and trying to get it to work with gwt 2.1.0.M3. We have a nexus repo at work and I've put the latest gwt jars there. The plugin fails when trying to download the gwt-dev jar.
The gwt-dev jar is located at 2.1.0.M3/gwt-dev-2.1.0.M3.jar
The plugin tries to download 2.1.0.M3/gwt-dev-2.1.0.M3-linux.jar.
I don't have a dependency anywhere on my pom for gwt-dev, the plugin takes care of that. How can I stop it from appending "-linux" to it? I'm aware that I could just change the name of the jar in my repo or set it up manually on my local machine, but I want to figure out how to get this working on nexus because we have several developers working on this at the same time.
Thanks!
You should upgrade your gwt-maven-plugin to version 1.2, which has some support for GWT-2.0.
As of GWT-2.0, the gwt-dev jars are no longer distributed separately per platform - a fact of which the older versions of the gwt-maven-plugin are not aware.
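To illustrate: since GWT 2.0 the dev jar has a single set of coordinates for all platforms, so a plugin that resolves it correctly requests the plain artifact with no classifier (the version below simply mirrors the milestone build from the question):
<!-- one gwt-dev for every platform; no -linux / -windows / -mac classifier -->
<dependency>
    <groupId>com.google.gwt</groupId>
    <artifactId>gwt-dev</artifactId>
    <version>2.1.0.M3</version>
</dependency>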
I am having the same problem. Here's what worked for me, which I gleaned from this sample pom: http://code.google.com/p/google-web-toolkit/source/browse/trunk/samples/expenses/pom.xml
Add this to your plugin repos:
<pluginRepositories>
    <pluginRepository>
        <id>gwt-plugin-repo</id>
        <url>http://google-web-toolkit.googlecode.com/svn/2.1.0.M3/gwt/maven</url>
        <name>Google Web Toolkit Plugin Repository</name>
    </pluginRepository>
</pluginRepositories>
Set your gwt-maven-plugin to 1.3.1.google
<plugins>
    <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>gwt-maven-plugin</artifactId>
        <version>1.3.1.google</version>
    </plugin>
</plugins>
I was trying to upgrade my project to GWT 2.1, and with the things that Bohemian said plus this repository:
<repository>
    <id>googlecode</id>
    <url>http://code.google.com/p/google-web-toolkit/source/browse/#svn/2.1.0/gwt/maven</url>
</repository>
I succeeded in doing so.
