sbt run works but ./spark-submit does not - apache-spark

I want to work with the lift-json parser using an sbt build. My build.sbt file has the following contents:
name := "MyProject"
version := "1.0"
scalaVersion := "2.10.0"
// https://mvnrepository.com/artifact/net.liftweb/lift-json_2.10
libraryDependencies += "net.liftweb" % "lift-json_2.10" % "3.0-M1"
val lift_json = "net.liftweb" %% "lift-json_2.10" % "3.0-M1"
//val json4sNative = "org.json4s" %% "json4s-native" % "3.3.0"
//libraryDependencies += "org.scala-lang" % "scala-library" % "2.9.1"
lazy val gitclonefile = "/root/githubdependencies/lift"
lazy val g = RootProject(file(gitclonefile))
lazy val root = project in file(".") dependsOn g
My code is this:
package org.inno.parsertest
import net.liftweb.json._
//import org.json4s._
//import org.json4s.native.JsonMethods._
object parser {
  def main(args: Array[String]) {
    val x = parse(""" { "numbers" : [1, 2, 3, 4] } """)
    println(x)
    val x1 = "jaimin is awesome"
    println(x1)
  }
}
sbt package followed by sbt run works, but when I try to run this using spark-submit, I get the following error:
Error: application failed with exception
java.lang.NoClassDefFoundError: net/liftweb/json/package$
at org.inno.parsertest.parser$.main(jsonparser.scala:7)
at org.inno.parsertest.parser.main(jsonparser.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:367)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:77)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: net.liftweb.json.package$
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 9 more
How can I make ./spark-submit work?

As soon as the Spark driver starts running your app (when you submit it), it has to resolve the import net.liftweb.json._ line, which means it will look for these classes on its classpath.
But Spark does not ship with the liftweb jar, so the lookup misses and you end up with a ClassNotFoundException.
So you need to provide the required jars alongside your application. There are several ways to do that, discussed at length elsewhere.
You might start with the Spark documentation:
Bundling Your Application’s Dependencies
If your code depends on other projects, you will need to package them alongside your application in order to distribute the code to a Spark cluster. To do this, create an assembly jar (or “uber” jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating assembly jars, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime. Once you have an assembled jar you can call the bin/spark-submit script as shown here while passing your jar.
One might suggest:
Package your application as what is often called an "uber jar" or "fat jar", with e.g. sbt's "assembly" plugin or the Maven Shade plugin, depending on your preference. This strategy merges all of the classes and resources of all dependencies into a single JAR, the one you submit (a minimal sbt sketch follows this list).
Add arguments to the spark-submit call. There are several ways; an easy one is the --jars argument, followed by a comma-separated list of the jar files you need. These jars will be added by Spark to the actual driver/worker classpath before launching your jobs.
Tell spark-submit to "bind" to a Maven repository:
Users may also include any other dependencies by supplying a comma-delimited list of maven coordinates with --packages. All transitive dependencies will be handled when using this command. Additional repositories (or resolvers in SBT) can be added in a comma-delimited fashion with the flag --repositories.
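For the first option, here is a minimal sketch of what the sbt side could look like. The plugin version and the Spark coordinates below are assumptions; adjust them to the Spark and Scala versions of the cluster you submit to.

// project/plugins.sbt -- brings in the sbt-assembly plugin (version is an assumption)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt -- Spark is marked "provided" so it is not bundled; lift-json is bundled
name := "MyProject"

version := "1.0"

scalaVersion := "2.10.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.3" % "provided",
  "net.liftweb" % "lift-json_2.10" % "3.0-M1"
)

Running sbt assembly then produces a single fat jar under target/scala-2.10/ that you pass to spark-submit instead of the jar produced by sbt package.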
But a full discussion of all the options would be rather long, and I suggest you search for "package spark applications" on Google or Stack Overflow to get a better overview.
Side note: submitting an app to Spark that does not use a SparkContext seems pointless, but I guess you're just experimenting at this point.
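For what it's worth, here is a minimal sketch of what such an experiment usually grows into once a SparkContext is involved; the app name and the local master are illustrative and not part of the original question.

package org.inno.parsertest

import net.liftweb.json._
import org.apache.spark.{SparkConf, SparkContext}

object parser {
  def main(args: Array[String]): Unit = {
    // local[*] is only for local runs; spark-submit normally supplies the master
    val sc = new SparkContext(new SparkConf().setAppName("parser-test").setMaster("local[*]"))
    // the original lift-json experiment, unchanged
    val x = parse(""" { "numbers" : [1, 2, 3, 4] } """)
    println(x)
    // a trivial use of the context, just so the submission actually exercises Spark
    println(sc.parallelize(Seq(1, 2, 3, 4)).count())
    sc.stop()
  }
}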

Related

Troubleshooting with Export TableView to excel JavaFx

Recently I found a Stack Overflow page, "JavaFx: Export TableView to excel with name of columns". I ran that code and got the following issue:
import org.apache.poi.ss.usermodel.Row;
^
TableViewExample.java:13: error: package org.apache.poi.ss.usermodel does not exist
import org.apache.poi.ss.usermodel.Sheet;
^
TableViewExample.java:14: error: package org.apache.poi.ss.usermodel does not exist
import org.apache.poi.ss.usermodel.Workbook;
^
TableViewExample.java:42: error: cannot find symbol
Workbook workbook = new HSSFWorkbook();
^
symbol: class Workbook
location: class TableViewExample
TableViewExample.java:43: error: cannot find symbol
Sheet spreadsheet = workbook.createSheet("sample");
^
symbol: class Sheet
location: class TableViewExample
TableViewExample.java:45: error: cannot find symbol
Row row = spreadsheet.createRow(0);
^
symbol: class Row
location: class TableViewExample
6 errors
According to what is described there, a library named Apache POI is being used.
I used this library, poi-3.0.2.jar, to run the code in this way:
javac -classpath ".:poi-3.0.2.jar" TableViewExample.java
using the command line on Linux (Ubuntu); I had previously compiled archives.java from the command line with great success.
Other people have used jakarta-poi-3.0.2.jar, but that jar no longer exists, so I used poi-3.0.2.jar instead.
Now I used the new poi-5.1.0.jar and I got the following error message:
Exception in Application start method
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.javafx.application.LauncherImpl.launchApplicationWithArgs(LauncherImpl.java:389)
at com.sun.javafx.application.LauncherImpl.launchApplication(LauncherImpl.java:328)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.launcher.LauncherHelper$FXHelper.main(LauncherHelper.java:767)
Caused by: java.lang.RuntimeException: Exception in Application start method
at com.sun.javafx.application.LauncherImpl.launchApplication1(LauncherImpl.java:917)
at com.sun.javafx.application.LauncherImpl.lambda$launchApplication$1(LauncherImpl.java:182)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/io/output/UnsynchronizedByteArrayOutputStream
at TableViewExample.start(TableViewExample.java:42)
at com.sun.javafx.application.LauncherImpl.lambda$launchApplication1$8(LauncherImpl.java:863)
at com.sun.javafx.application.PlatformImpl.lambda$runAndWait$7(PlatformImpl.java:326)
at com.sun.javafx.application.PlatformImpl.lambda$null$5(PlatformImpl.java:295)
at java.security.AccessController.doPrivileged(Native Method)
at com.sun.javafx.application.PlatformImpl.lambda$runLater$6(PlatformImpl.java:294)
at com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java:95)
at com.sun.glass.ui.gtk.GtkApplication._runLoop(Native Method)
at com.sun.glass.ui.gtk.GtkApplication.lambda$null$10(GtkApplication.java:245)
... 1 more
Caused by: java.lang.ClassNotFoundException: org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 10 more
Related to "java.lang.ClassNotFoundException: org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream"
any information about how to solve this issue and make this piece of code work?
Thank you in advance
The most recent Apache POI release is 5.1.0; I do not recommend using old versions. The download is here, though I don't recommend downloading libraries directly either; instead, add them as dependencies using a build tool like Maven or Gradle.
When you look at the POI project file, it has lots of dependencies, so simply adding the POI jar to the classpath is not enough; instead, let your build tool work them out.
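The question compiles with plain javac, but the point stands for any build tool: a single coordinate pulls in POI's whole dependency tree, including the commons-io jar whose class is reported missing in the stack trace above. Purely as an illustration, in sbt syntax (the coordinates are POI's published ones for 5.1.0):

libraryDependencies += "org.apache.poi" % "poi" % "5.1.0"

A single dependency entry in Maven or Gradle does the same thing.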
Modern JavaFX distributions (which you should be using) are modular, so you need to either have a module-info file or command-line arguments for the module path and modules added (which you don't have in your example).
If you have a module-info, then you may need to require modules for poi to work (I don't know the command for that).
It may be easier to run without a module-info as long as you have correct module specifications for JavaFX on your command line (as defined by the openjfx.io getting started doc for non-modular applications).
FAQ
I am now thinking about other solutions for converting a JavaFX TableView to Excel.
I agree; don't use POI, instead write the data to a CSV file. You can search for solutions on how to write data in CSV format using Java. That is what I would do.
Now I added the new POI library poi-5.1.0.jar and I got the error message:
java.lang.ClassNotFoundException: org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream
At runtime, you are still missing the transitive dependencies required to run the application.
If you use POI, one way to get the required dependencies is to ask Maven to copy them into a lib directory:
Make Maven to copy dependencies into target/lib
then when you execute your app jar, have the lib directory for your app on your classpath and the JavaFX SDK lib directory on your module path.
I can create .csv files, but unfortunately the client asks for .xls files.
POI can create the .xls files directly, or you could run an external tool to do the CSV-to-XLS conversion:
Convert .CSV to .XLSX using command line
The tool could be included in your application packaging and installation and invoked from java:
https://www.baeldung.com/run-shell-command-in-java

Databricks-connect: sparkContext.wholeTextFiles

I have set up databricks-connect version 5.5.0. This runtime includes Scala 2.11 and Spark 2.4.3. All the Spark code I have written has executed correctly and without any issues until I tried calling sparkContext.wholeTextFiles (a minimal sketch of the call follows the stack trace). The error that I get is the following:
Exception in thread "main" java.lang.NoClassDefFoundError: shaded/databricks/v20180920_b33d810/com/google/common/base/Preconditions
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.ensureAuthority(AzureBlobFileSystem.java:775)
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:94)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:500)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:469)
at org.apache.spark.SparkContext$$anonfun$wholeTextFiles$1.apply(SparkContext.scala:997)
at org.apache.spark.SparkContext$$anonfun$wholeTextFiles$1.apply(SparkContext.scala:992)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:820)
at org.apache.spark.SparkContext.wholeTextFiles(SparkContext.scala:992)
...
Caused by: java.lang.ClassNotFoundException: shaded.databricks.v20180920_b33d810.com.google.common.base.Preconditions
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
... 20 more
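For context, the call I am making is essentially of this shape; the SparkSession setup and the abfss path below are placeholders rather than my real values.

import org.apache.spark.sql.SparkSession

object WholeTextFilesRepro {
  def main(args: Array[String]): Unit = {
    // databricks-connect exposes the remote cluster behind a regular SparkSession
    val spark = SparkSession.builder().appName("wholeTextFiles-repro").getOrCreate()
    // read a directory of small files as (path, content) pairs
    val files = spark.sparkContext.wholeTextFiles("abfss://<container>@<account>.dfs.core.windows.net/some/path")
    println(files.count())
    spark.stop()
  }
}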
One attempt at solving the problem was to move to the latest Databricks runtime, which at the time of this writing is 6.5. That didn't help. I then went back through versions 6.4 and 6.3, since they use different Spark versions, but to no avail.
Another thing that I tried was adding "com.google.guava" % "guava" % "23.0" as a dependency to my build.sbt. That in itself results in errors like:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: abfss://abc-fs#cloud.dfs.core.windows.net/paths/something, expected: file:///
I feel that going down the road of satisfying each and every dependency that somehow is not included in the jar may not be the best option.
I wonder if someone has had a similar experience and, if so, how they solved it.
I am willing to give more context if that is necessary.
Thank you!

Spark job failing on jackson dependencies

I have a Spark job that is failing after the upgrade of CDH from 5.5.4 (which had Spark 1.5.0) to CDH 5.13.0 (which has Spark 1.6.0).
The job runs with the new Spark dependencies, but I see strange behavior for this one Spark job:
1) sometimes its Oozie launcher is marked as success and other times as killed,
2) the Spark job itself fails on jackson-databind:
2018-01-05 19:07:17,672 [Driver] ERROR org.apache.spark.deploy.yarn.ApplicationMaster - User class threw exception: java.lang.VerifyError: Bad type on operand stack
Exception Details:
  Location:
    org/apache/spark/metrics/sink/MetricsServlet.<init>(Ljava/util/Properties;Lcom/codahale/metrics/MetricRegistry;Lorg/apache/spark/SecurityManager;)V #116: invokevirtual
  Reason:
    Type 'com/codahale/metrics/json/MetricsModule' (current frame, stack[2]) is not assignable to 'com/fasterxml/jackson/databind/Module'
The error you are getting is a Java Bytecode Verification Error.
This happens right before the class can be loaded onto the JVM by the classloader.
The purpose of this step is to ensure that the code didn't come from a malicious compiler, but indeed follows the Java language rules.
Read more about it here: http://www.oracle.com/technetwork/java/security-136118.html
Now, to address your problem: this error is also thrown when your code finds different jars/classes at runtime than the ones used at compile time.
The MetricsServlet class in the spark-core lib tries to instantiate an object of type MetricsModule, which is packaged inside the metrics-json jar.
Then it tries to register this object (within its ObjectMapper) as a generic Module object.
Note: MetricsModule extends the Module class from the jackson-databind jar.
So, in simple terms, an object of type MetricsModule is being upcast to the parent class Module.
However, the MetricsModule class in your environment is not loaded from the metrics-json jar, but from some other foreign jar or third-party library, where it extends a different Module parent class.
That jar must have been compiled against some.other.package.Module rather than the original com.fasterxml.jackson.databind.Module from jackson-databind.
For example, the uber JAR for the CosmosDB connector for Spark packages both the MetricsModule and Module classes, but the latter is packaged under "cosmosdb_connector_shaded.jackson.databind.Module", giving the exact same error:
"Type 'com/codahale/metrics/json/MetricsModule' (current frame,
stack[2]) is not assignable to
'com/fasterxml/jackson/databind/Module'"
To resolve this class conflict, you need to find the JAR that actually loaded the MetricsModule class. Use the -verbose:class JVM option on your Spark driver JVM to track this.
@sg1's explanation is accurate. For me, I fixed this error by adding the jars to spark.driver.extraClassPath instead of copying them into Spark's jars/ directory. You can also try shading the particular dependency, such as Jackson, in your uber jar.
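If you go the shading route and build your uber jar with sbt-assembly, a rename rule along the following lines keeps the Jackson classes you bundle from clashing with the ones Spark ships; the target package prefix is purely illustrative.

// build.sbt, assuming the sbt-assembly plugin is already on the build
assemblyShadeRules in assembly := Seq(
  // rename the bundled Jackson classes so they cannot collide with Spark's copies
  ShadeRule.rename("com.fasterxml.jackson.**" -> "myshaded.jackson.@1").inAll
)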
Since Spark already ships the metrics-json jar, we can mark the scope as provided, which will resolve the conflict.
<!-- Metrics -->
<dependency>
  <groupId>io.dropwizard.metrics</groupId>
  <artifactId>metrics-core</artifactId>
  <version>${metrics.version}</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>io.dropwizard.metrics</groupId>
  <artifactId>metrics-json</artifactId>
  <version>${metrics.version}</version>
  <scope>provided</scope>
</dependency>
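If your build is sbt rather than Maven, the equivalent would be something like the following; the version shown is an assumption, use whatever your Spark distribution actually ships.

// build.sbt -- keep the Dropwizard metrics artifacts out of the assembled jar
libraryDependencies ++= Seq(
  "io.dropwizard.metrics" % "metrics-core" % "3.1.5" % "provided",
  "io.dropwizard.metrics" % "metrics-json" % "3.1.5" % "provided"
)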

java.lang.LinkageError in Spark Streaming

I am using Spark 2.2 on a CDH 5.10 cluster with Scala 2.11.8. Everything was working fine, but then I suddenly started getting this in the driver code:
Exception in thread "main" java.lang.LinkageError: loader constraint violation: when resolving method
"org.apache.spark.streaming.StreamingContext$.getOrCreate(Ljava/lang/String;Lscala/Function0;Lorg/apache/hadoop/conf/Configuration;Z)Lorg/apache/spark/streaming/StreamingContext;"
the class loader (instance of org/apache/spark/util/ChildFirstURLClassLoader) of the current class, com/hp/hawkeye/driver/StreamingDriver$,
and the class loader (instance of sun/misc/Launcher$AppClassLoader)
for the method's defining class, org/apache/spark/streaming/StreamingContext$,
have different Class objects for the type scala/Function0 used in the signature
Any ideas how I can fix this?
I figured out the solution: there was a class loader conflict caused by manually placing a dependency jar on the cluster. These commands helped:
rm -rf ~/.sbt
rm -rf ~/.ivy2/cache
Then I restarted IDEA. spark-submit on the cluster was fine. But placing an extra dependency jar in lib (spark-avro-assembly-4.0.0-snapshot) brought the issue back. Somehow that jar, which has a fix for Spark 2.2 with spark-avro 3.2, was creating the problem.

getting ClassCastException in OSGi standalone framework with JAXB

I am running the OSGi framework from the command line as below:
java -jar org.eclipse.osgi_3.6.2.R36x_v20110210.jar -console
My plugins run fine, but when running a plugin that requires the JAXB packages of the system library (JavaSE 1.6.x) to parse an XML file, I get the exception trace below:
Exception in thread "DummyProgram" java.lang.ExceptionInInitializerError
at javax.xml.bind.DatatypeConverter.<clinit>(DatatypeConverter.java:78)
at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$3.run(JAXBContextImpl.java:262)
at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$3.run(JAXBContextImpl.java:260)
at java.security.AccessController.doPrivileged(Native Method)
at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.<init>(JAXBContextImpl.java:260)
at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1100)
at com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:143)
at com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:110)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:202)
at javax.xml.bind.ContextFinder.find(ContextFinder.java:376)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:574)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:522)
at com.entities.conf.JAXBMTSConfig.unmarshalApps(JAXBMTSConfig.java:113)
20 more..
Caused by: java.lang.ClassCastException: org.apache.xerces.jaxp.datatype.DatatypeFactoryImpl cannot be cast to javax.xml.datatype.DatatypeFactory
at javax.xml.datatype.DatatypeFactory.newInstance(Unknown Source)
at javax.xml.bind.DatatypeConverterImpl.<clinit>(DatatypeConverterImpl.java:742)
I think there is a conflict between different versions of the javax.xml.bind.* packages. I guess they are exported both by the system library and by a Xerces (?) jar.
So you'll need to find out which bundle exports those packages, and resolve the conflict.
regards, Frank
A ClassCastException in OSGi is typically caused by the fact that every bundle has its own class loader.
It is possible that two bundles load the same class from different sources (because it is exported twice). Since each bundle has its own class loader, the class is loaded by two different class loaders, and Java does not treat the two copies as the same class.
There are two workarounds:
- Check whether the class is exported twice. If so, try to solve this by exporting it only once. This may not be possible, because bundle A may need version 1.4 and bundle B version 1.7.
- If that is the case, have each bundle also import the package it exports.
So, for example:
Bundle A exports xyz-1.4 and imports xyz-1.4
Bundle B exports xyz-1.7 and imports xyz-1.7
Now the framework can decide which version is used. If bundle A runs alone, 1.4 will be used.
Otherwise, if bundles A and B are both needed by a bundle C, 1.7 will be used (provided it is backward compatible with 1.4).