Spark job failing on jackson dependencies - apache-spark

I have a Spark job that started failing after upgrading CDH from 5.5.4 (which had Spark 1.5.0) to CDH 5.13.0 (which has Spark 1.6.0).
The job runs with the new Spark dependencies, but for one Spark job I see strange behavior:
1) sometimes its Oozie launcher is marked as succeeded and other times as killed,
2) the Spark job itself fails on jackson-databind:
2018-01-05 19:07:17,672 [Driver] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - User class threw exception: java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    org/apache/spark/metrics/sink/MetricsServlet.<init>(Ljava/util/Properties;Lcom/codahale/metrics/MetricRegistry;Lorg/apache/spark/SecurityManager;)V #116: invokevirtual
  Reason:
    Type 'com/codahale/metrics/json/MetricsModule' (current frame, stack[2]) is not assignable to 'com/fasterxml/jackson/databind/Module'

The error you are getting is a Java Bytecode Verification Error.
This happens right before the class is loaded into the JVM by the classloader.
The purpose of this step is to ensure that the code didn't come from a malicious compiler, but indeed follows the Java language rules.
Read more about it here: http://www.oracle.com/technetwork/java/security-136118.html
Now, to your problem: this error is also thrown when your code finds different jars/classes at runtime than the ones that were used at compile time.
The MetricsServlet class in the spark-core lib tries to instantiate an object of type MetricsModule, which is packaged inside the metrics-json jar.
Then it tries to register this object (within its ObjectMapper) as a generic Module object.
Note: MetricsModule extends the Module class from the jackson-databind jar.
So, in simple terms, an object of type MetricsModule is being type-cast to the parent class Module.
However, the MetricsModule class in your environment is not loaded from the metrics-json jar but from some other foreign jar or third-party library, where it extends a different Module parent class.
That jar must have been compiled against some.other.package.Module rather than the original com.fasterxml.jackson.databind.Module from jackson-databind.
For example, the uber JAR for the CosmosDB connector for Spark packages both the MetricsModule and Module classes, but the latter is relocated under "cosmosdb_connector_shaded.jackson.databind.Module", giving the exact same error:
"Type 'com/codahale/metrics/json/MetricsModule' (current frame,
stack[2]) is not assignable to
'com/fasterxml/jackson/databind/Module'"
To resolve this class conflict you need to find the JAR from which the MetricsModule class is actually being loaded. Use the -verbose:class JVM option on your Spark driver JVM (for example via spark.driver.extraJavaOptions when using spark-submit) to track this.

@sg1's explanation is accurate. In my case, I fixed this error by adding the jars to spark.driver.extraClassPath instead of copying them into Spark's jars/ directory. You can also try shading the conflicting dependency, such as Jackson, in your uber jar; see the sketch below.
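As a hedged sketch of the shading approach (not taken from the original answer), a maven-shade-plugin configuration along these lines relocates your bundled copy of Jackson inside the uber jar so it cannot clash with the Jackson classes Spark ships; the shaded package prefix and plugin version are illustrative:
<!-- goes under <build><plugins> in your pom.xml -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.3</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <relocations>
                    <!-- Illustrative relocation: move our bundled Jackson into a private
                         package so the driver keeps loading Spark's own jackson-databind -->
                    <relocation>
                        <pattern>com.fasterxml.jackson</pattern>
                        <shadedPattern>myapp.shaded.com.fasterxml.jackson</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>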

Since Spark already ships the metrics-json jar, we can mark its scope as provided, which resolves the conflict.
<!-- Metrics -->
<dependency>
    <groupId>io.dropwizard.metrics</groupId>
    <artifactId>metrics-core</artifactId>
    <version>${metrics.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>io.dropwizard.metrics</groupId>
    <artifactId>metrics-json</artifactId>
    <version>${metrics.version}</version>
    <scope>provided</scope>
</dependency>

Related

Groovy-Eclipse 2.5.2: java.lang.ClassNotFoundException: picocli.CommandLine$ParameterException

I'm using Eclipse 4.5 with the Groovy-Eclipse 2.9.2/4.5 plugin, which I thought was supposed to include the Groovy 2.5 compiler. However, it didn't have any picocli support, so I added groovy-cli-picocli-2.5.2-indy.jar to my classpath and was able to compile. However (again), when trying to run the script via Eclipse I get:
java.lang.ClassNotFoundException: picocli.CommandLine$ParameterException
It looks like groovy-cli-picocli-2.5.2-indy.jar does not contain the CommandLine class at all.
I would just throw jars from the full-blown picocli distribution at this, but I'm under the impression they all have to somehow wrap nicely into the Eclipse Groovy library via groovy.cli.picocli.CliBuilder.
Is my Groovy 2.5.2 missing this, or am I somehow missing the boat on how it's supposed to work? picocli is not working for me in this configuration. Thanks!
You are correct: groovy-cli-picocli-2.5.2.jar (and groovy-cli-picocli-2.5.2-indy.jar) do not contain the picocli classes.
You need to add the picocli jar to the classpath.
If you use Maven, the groovy-all POM should include all dependencies.
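If you manage dependencies yourself rather than through the groovy-all POM, declaring picocli explicitly should also work; a minimal sketch, with an illustrative version (pick whatever release matches your setup):
<dependency>
    <groupId>info.picocli</groupId>
    <artifactId>picocli</artifactId>
    <version>3.9.6</version>
</dependency>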
(My original answer mentioned picocli classes that are shaded into the groovy-2.5.x.jar under the groovyjarjarpicocli package but these are intended for use internally by Groovy and not meant to be used by applications.)

java.lang.LinkageError in Spark Streaming

I am using Spark 2.2 on CDH 5.10 cluster with Scala 2.11.8. Everything was working fine but then I suddenly started getting this in the Driver code:
Exception in thread "main" java.lang.LinkageError: loader constraint violation: when resolving method
"org.apache.spark.streaming.StreamingContext$.getOrCreate(Ljava/lang/String;Lscala/Function0;Lorg/apache/hadoop/conf/Configuration;Z)Lorg/apache/spark/streaming/StreamingContext;"
the class loader (instance of org/apache/spark/util/ChildFirstURLClassLoader) of the current class, com/hp/hawkeye/driver/StreamingDriver$,
and the class loader (instance of sun/misc/Launcher$AppClassLoader)
for the method's defining class, org/apache/spark/streaming/StreamingContext$,
have different Class objects for the type scala/Function0 used in the signature
Any ideas how I can fix this?
Figured out the solution: there was a class loader conflict caused by manually placing a dependency jar on the cluster. These helped:
rm -rf ~/.sbt
rm -rf ~/.ivy2/cache
Then I restarted IDEA. spark-submit on the cluster was fine. But placing an extra dependent jar in lib (spark-avro-assembly-4.0.0-snapshot) brought back the issue. Somehow that jar, which has a fix for Spark 2.2 with spark-avro 3.2, was creating the problem.

May I use jxls and apache poi together?

I'm making an application to analyze some data, and the results must be presented in Excel files. For that I started to use Apache POI (3.11). Because some reports consume a lot of time and memory to produce, I did some investigation and found jxls; after some tests I thought it was the solution. But now I have found a problem: I can't make both frameworks work together.
I had to update Apache POI from 3.11 to 3.14 in order to work with jxls-2.3.0.
I made an extra package in order to run my tests with jxls, no problem.
I tried to migrate one of my classes from Apache POI to jxls, and I got this error: java.lang.IllegalStateException: Cannot load XLS transformer. Please make sure a Transformer implementation is in classpath. This is the code of my method:
private void prepareNewReport(File excelFile) {
    List perforaciones = makePerforacionReport
            .makePerforacionData(escenario);
    try (InputStream is = ReportePerforacionTotalDialog.class
            .getResourceAsStream("PerforacionTotal_template.xls")) {
        try (OutputStream os = new FileOutputStream(excelFile)) {
            Context context = new Context();
            context.putVar("perforaciones", perforaciones);
            JxlsHelper.getInstance().processTemplate(is, os, context);
            LOGGER.logger.log(Level.INFO, "Archivo de perfortacion generado con éxito");
        }
    } catch (IOException e) {
        LOGGER.logger.log(Level.SEVERE, "Problemas buscando el archivo", e);
    }
}
How could this be possible? In the same project I have my test class, just in another package, and it's working fine. As you can see, it's not much different from the example on the jxls page, and the imports are the same.
But even worse, when I tried to do a clean & build of my project, I got this other error:
java.lang.RuntimeException: com.sun.tools.javac.code.Symbol$CompletionFailure: class file for org.openxmlformats.schemas.officeDocument.x2006.docPropsVTypes.CTArray not found
I looked at every library that I imported in order to work with jxls and Apache POI, and that's right, that class is not there. Just to see if there was a conflict between these two frameworks, I removed from the classpath all the libraries needed to use jxls. Clean & build again, and no problem: I have my .jar file to send to my customer, but it is incomplete.
I could try to replace all the classes that use Apache POI, but that means a lot of work, since POI is used in my project many times to read Excel files with data and to write many other files to Excel. I planned to use jxls in order to take advantage of templates.
I will appreciate any help or suggestions.
For the first error, it would appear that the JXLS transformer for Apache POI is missing from your classpath when running the application. Check the JXLS getting started info here: http://jxls.sourceforge.net/getting_started.html
As explained in the Transformers section (see Main Concepts), the Jxls core module does not depend on any specific Java-Excel library and works with Excel exclusively through a predefined interface. Currently Jxls supplies two implementations of this interface in separate modules, based on the well-known Apache POI and Java Excel API libraries.
If you're using Maven, be sure to include in your pom.xml the jxls-poi dependency listed on the JXLS getting started page:
<dependency>
    <groupId>org.jxls</groupId>
    <artifactId>jxls-poi</artifactId>
    <version>1.0.9</version>
</dependency>
For the second issue, org.openxmlformats.schemas.officeDocument.x2006.docPropsVTypes.CTArray is not in the Apache POI ooxml-schemas jar files for either 3.11 (poi-ooxml-schemas-3.11-20141221.jar) or 3.14 (poi-ooxml-schemas-3.14-20160307.jar). POI uses a stripped-down set of OOXML schema classes; you will need to get the complete ooxml-schemas jar from http://central.maven.org/maven2/org/apache/poi/ooxml-schemas/1.3/ or, if you're using Maven (or another build tool), get the dependency for your build from https://mvnrepository.com/artifact/org.apache.poi/ooxml-schemas/1.3
e.g. for Maven:
<!-- https://mvnrepository.com/artifact/org.apache.poi/ooxml-schemas -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.3</version>
</dependency>
Be sure to remove the poi-ooxml-schemas dependency from your Maven pom.xml so that the ooxml-schemas dependency above takes precedence instead.
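If the stripped-down schemas are coming in transitively through poi-ooxml rather than as a direct dependency, one way to keep them off the classpath is a Maven exclusion; a hedged sketch (the 3.14 version matches the one mentioned above):
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.14</version>
    <exclusions>
        <!-- Exclude the stripped-down schema classes so the full
             ooxml-schemas artifact above is used instead -->
        <exclusion>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml-schemas</artifactId>
        </exclusion>
    </exclusions>
</dependency>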

ASM's Frame class has no generic type

The ASM documentation (PDF) says that the Frame class has a generic type parameter, giving an example of usage: Frame<BasicValue> (at p. 119, if needed).
Looking at the source, we can see its declaration as Frame<V extends Value>.
But for some reason, when I specify the Maven dependencies in my project,
<dependency>
    <groupId>org.ow2.asm</groupId>
    <artifactId>asm</artifactId>
    <version>4.2</version>
</dependency>
<dependency>
    <groupId>org.ow2.asm</groupId>
    <artifactId>asm-analysis</artifactId>
    <version>4.2</version>
</dependency>
or just download the corresponding artifacts manually from the repository, any attempt to use Frame<...> ends with the error:
Type org.objectweb.asm.tree.analysis.Frame doesn't have type parameters
And the IntelliJ IDEA decompiler confirms that Frame really has no type parameters.
The same issue occurs with the Analyzer and Interpreter classes.
How can I beat that?
Complementing the answer by @dejvuth:
asm-debug-all happens to be compiled for Java 5.0 and contains all the generic types. Moreover, it is binary-compatible with the plain asm library, which has no generics.
From ASM FAQ
14. What is the earliest JDK required to use ASM?
...
The asm.util and asm.tree packages require JDK 1.2, ...
and History of ASM 4.0 RC1
generified the API to use generics and varargs. However, almost all jars are still small and 1.2 compatible.
Basically, when jarred, ASM optimizes the bytecode, which (among other things) makes it backward-compatible with 1.2 by changing the class file's major version to 46 (see org.objectweb.asm.optimizer.ClassOptimizer).
I guess there are two options available: use it without generics or compile the source by yourself.
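As a complement to the asm-debug-all note above, depending on that artifact instead of the optimized jars is one way to get the generified Frame/Analyzer API; a minimal sketch using the same 4.2 version as above:
<dependency>
    <groupId>org.ow2.asm</groupId>
    <artifactId>asm-debug-all</artifactId>
    <version>4.2</version>
</dependency>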

What driver are you expected to provide for Reflections.collect() to work from a Groovy script?

I have the following snippet of scratch code
import com.google.appengine.api.datastore.Entity
import org.reflections.Reflections
Reflections r = Reflections.collect()
Set<Class<?>> entities = r.getTypesAnnotatedWith(Entity.class)
print entities
that throws the following exception:
org.xml.sax.SAXException: Can't create default XMLReader; is system property org.xml.sax.driver set?
at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(Unknown Source)
at org.dom4j.io.SAXHelper.createXMLReader(SAXHelper.java:83)
Googling for org.xml.sax.SAXException: Can't create default XMLReader; is system property org.xml.sax.driver set? brings up questions, mostly about Android, with link-only answers or code-based answers that do not actually address the issue of providing the correct system property value.
The same code works as Java code from the same IDE project.
So what do I have to supply to get this to work as a Groovy script?
I have this script in src/test/groovy in my Maven project, so I added:
<dependency>
    <groupId>org.apache.servicemix.bundles</groupId>
    <artifactId>org.apache.servicemix.bundles.crimson</artifactId>
    <version>1.1.3_2</version>
    <scope>test</scope>
</dependency>
to my pom.xml
And I added -Dorg.xml.sax.driver=org.apache.crimson.parser.XMLReaderImpl to the VM Options: in the Run/Debug Configuration for the script.
This makes it work, but I would still like to know what I can use, without having to add a dependency, to get things in the test scope to run, since things in the main scope work without this dependency.
