Correct configuration for Maven to load native Hadoop libraries - Linux

I am trying to run a Mahout project that I wrote in Eclipse using the Mahout and Hadoop libraries. It loads a dataset and runs the FP-Growth algorithm. I set up the following run configuration for the project:
mvn exec:java -Dexec.mainClass=com.patternmatching.RecommendApp.TopPatternMatches
After running the program, I get the following error message:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I researched this issue and realized that the native Hadoop libraries either have to be compiled or downloaded from Apache (see Hadoop "Unable to load native-hadoop library for your platform" warning). I downloaded the libraries on a Cloudera Quickstart VM, on which I set up Mahout and Maven along with my project package. After running the project in Cloudera, I get the same error. I also ran the hadoop checknative -a command, which verifies that the native libraries are available:
[root@quickstart /]# hadoop checknative -a
16/10/22 19:32:16 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
16/10/22 19:32:16 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
snappy: true /usr/lib/hadoop/lib/native/libsnappy.so.1
lz4: true revision:99
bzip2: true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so
The output of the command verifies that the libraries are available, but they are not being loaded by the program or picked up on the classpath. I am not sure how to configure Maven so that it loads the Hadoop native libraries when running the program. This is the dependencies section of the Maven pom.xml file:
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.0.0-alpha1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.mahout</groupId>
    <artifactId>mahout-core</artifactId>
    <version>0.9</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
</dependencies>
and the command I run to execute my Mahout Java program is
mvn exec:java -Dexec.mainClass=com.patternmatching.RecommendApp.TopPatternMatches
How can I configure Maven to see these libraries so they are used in the program?
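One thing worth checking, sketched here under the assumption that the libraries really do live in /usr/lib/hadoop/lib/native as the checknative output reports: exec:java runs the main class inside the same JVM as Maven itself, so the native search path has to be given to that JVM at startup, for example through MAVEN_OPTS (or LD_LIBRARY_PATH), rather than through the pom's dependencies section:
export MAVEN_OPTS="-Djava.library.path=/usr/lib/hadoop/lib/native"
mvn exec:java -Dexec.mainClass=com.patternmatching.RecommendApp.TopPatternMatches
If the warning disappears, the export can be made permanent in the shell profile; the directory may differ on installs other than the Quickstart VM.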

Related

m2e logback-test.xml not found during incremental build

In a simple Maven project that I am editing with Eclipse 2022-12 and installing and testing with m2e, I am getting this warning in the Eclipse Error Log window:
!ENTRY org.eclipse.m2e.logback.appender 2 0 2023-02-09 16:30:22.592
!MESSAGE sourceFile /home/rsc/share/eclipse-workspace/HelloClient/src/test/resources/logback-test.xml not found during incremental build
My pom.xml specifies slf4j with log4j2:
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>2.0.6</version>
</dependency>
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-core</artifactId>
  <version>2.19.0</version>
</dependency>
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-slf4j2-impl</artifactId>
  <version>2.19.0</version>
</dependency>
When I run mvn test from the command line, no such warning occurs, so logback should not be needed.
Why does m2e look for logback, and how can I avoid it?
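One workaround that may help, offered as an assumption rather than something verified against the m2e internals: give the incremental build the file it is probing for by adding a minimal stub at src/test/resources/logback-test.xml (a valid, effectively empty logback configuration):
<!-- hypothetical stub: src/test/resources/logback-test.xml -->
<configuration>
  <root level="INFO"/>
</configuration>
Since the project logs through log4j2 and logback-classic is not a dependency, the stub is inert during mvn test runs.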

Cassandra driver 3.4 is not compatible with Guava 30

We have a Java 8 standalone application that reads from Cassandra tables; the driver version we're currently using is 3.4.0. The application should also support reading from Google Cloud Storage, but once we added the GCS dependencies to the pom file we started seeing exceptions when reading from Cassandra. It seems the 3.4 driver uses Guava 19, while GCS uses Guava 30. Is it possible to make them both live together in the same Java process? Trying to exclude Guava from cassandra-driver-core 3.4 causes the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/util/concurrent/FutureFallback
at com.datastax.driver.core.GuavaCompatibility.selectImplementation(GuavaCompatibility.java:136)
at com.datastax.driver.core.GuavaCompatibility.<clinit>(GuavaCompatibility.java:52)
at com.datastax.driver.core.Cluster.<clinit>(Cluster.java:68)
at com.myorg.infra.cassandra.CassandraConnector.basicBuilder(CassandraConnector.java:32)
at com.myorg.infra.cassandra.CassandraConnector.connect(CassandraConnector.java:61)
at com.myorg.aggregator.cassandra.analytics.repository.CategoryDetailsRepository.main(CategoryDetailsRepository.java:56)
Caused by: java.lang.ClassNotFoundException: com.google.common.util.concurrent.FutureFallback
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Cassandra dependencies:
<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-core</artifactId>
  <version>3.4.0</version>
  <exclusions>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-mapping</artifactId>
  <version>3.4.0</version>
  <exclusions>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
  </exclusions>
</dependency>
GCS dependencies:
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>20.1.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
<dependency>
  <groupId>com.google.cloud</groupId>
  <artifactId>google-cloud-storage</artifactId>
</dependency>
I had a similar issue and found that upgrading the Cassandra driver to 3.11.0 was the best solution. It works with Guava 30 while keeping most of the driver interfaces.
Note that Cassandra driver 4.0+ is not binary compatible, meaning you cannot just drop it in and hope it works; it requires a substantial rewrite of application code.
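For reference, a sketch of the corresponding pom change if you go the 3.11.0 route (the Guava exclusions shown above could then be dropped, since that driver line works with the newer Guava):
<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-core</artifactId>
  <version>3.11.0</version>
</dependency>
<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-mapping</artifactId>
  <version>3.11.0</version>
</dependency>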
As to your question,
Is it possible to make them both live together in the same Java process?
It's possible with multiple classloaders but you may not want to do that.

Giving maven dependency error in Spark 2.3

I am building Spark 2.3 Scala code using Maven and getting the following error:
error: missing or invalid dependency detected while loading class file SparkSession.class.
This is a snippet of the pom file; please advise.
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.3.0</version>
</dependency>
You might want to check your Java and Scala versions; they should be 1.6 or higher and 2.11, respectively. It could also be a version mismatch with other Spark dependencies such as spark-sql. Make sure you use the same Spark version across all of them.
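For example, if spark-sql is also used (assumed here only for illustration), a consistent pair would be:
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.3.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.3.0</version>
</dependency>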

Does logback need groovy.jar or groovy-all.jar?

I'm looking to minimize the size of my software distribution, and groovy-all.jar is by far the biggest JAR. Groovy is used for logback configuration[1]. At the bottom of the Groovy download page there's a section on the split Groovy distribution.
Which modules / JAR files does logback need to function properly? Is just groovy.jar sufficient?
[1] Yes, I realize I could configure logback with XML, eliminating the need for Groovy support. That is not my question.
I haven't found a source, but as of logback version 1.0.13 my tests show that groovy-jsr223 is needed as well. If I import only groovy in my pom.xml, logback complains about missing classes. The error message is
ERROR in ch.qos.logback.classic.LoggerContext[default] - Groovy classes are not available on the class path. ABORTING INITIALIZATION.
My dependency configuration that works is:
<dependency>
  <groupId>ch.qos.logback</groupId>
  <artifactId>logback-classic</artifactId>
  <version>1.0.13</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.codehaus.groovy</groupId>
  <artifactId>groovy</artifactId>
  <version>2.5.8</version>
  <type>pom</type>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.codehaus.groovy</groupId>
  <artifactId>groovy-jsr223</artifactId>
  <version>2.5.8</version>
  <type>pom</type>
  <scope>test</scope>
</dependency>

NoClassDefFoundError - datastax java driver for Cassandra

I am currently unable to connect to my Cassandra database using the DataStax driver. I am getting the following error:
com.datastax.driver.core.TransportException: [/127.0.0.1] Unexpected exception triggered (java.lang.NoSuchMethodError: com.google.common.collect.ImmutableSet.copyOf(Ljava/util/Collection;)Lcom/google/common/collect/ImmutableSet;)
at com.datastax.driver.core.Connection$Dispatcher.exceptionCaught(Connection.java:556)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:122)
Caused by: java.lang.NoSuchMethodError: com.google.common.collect.ImmutableSet.copyOf(Ljava/util/Collection;)Lcom/google/common/collect/ImmutableSet;
at com.datastax.driver.core.DataType.<clinit>(DataType.java:144)
at com.datastax.driver.core.Codec.<clinit>(Codec.java:31)
However, I have included the guava artefact in my pom.xml as follows:
<!-- Datastax driver -->
<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-core</artifactId>
  <version>1.0.4</version>
</dependency>
<!-- Cassandra -->
<dependency>
  <groupId>org.apache.cassandra</groupId>
  <artifactId>cassandra-all</artifactId>
  <version>1.2.9</version>
</dependency>
<!-- Guava -->
<dependency>
  <groupId>com.google.guava</groupId>
  <artifactId>guava</artifactId>
  <version>15.0</version>
</dependency>
Full pom.xml: http://pastebin.ubuntu.com/6358603/
Am I missing a dependency?
According to its POM, version 1.0.4 of cassandra-driver-core uses version 14.0.1 of Guava, not version 15.0. I'm guessing you are seeing a version clash. Even if that version difference is not the cause of this problem, it might cause other problems.
You do not usually need to include transitive dependencies in POMs; Maven takes care of them for you. Or does your own code use Guava directly?
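A quick way to see which Guava version actually wins on the classpath is to inspect the dependency tree, for example:
mvn dependency:tree -Dincludes=com.google.guava
Whatever version Maven resolves there is the one the driver sees at runtime.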
Based on the advice in this question: no such method error: ImmutableList.copyOf(),
I had to exclude the Google Collections jar:
<dependency>
  <groupId>org.zkoss.zk</groupId>
  <artifactId>zkspring-core</artifactId>
  <version>3.1</version>
  <exclusions>
    <exclusion>
      <groupId>com.google.collections</groupId>
      <artifactId>google-collections</artifactId>
    </exclusion>
  </exclusions>
</dependency>