Databricks-connect: sparkContext.wholeTextFiles - apache-spark

I have set up databricks-connect version 5.5.0. This runtime includes Scala 2.11 and Spark 2.4.3. All the Spark code I have written has executed correctly and without any issues until I tried calling sparkContext.wholeTextFiles. The error that I get is the following:
Exception in thread "main" java.lang.NoClassDefFoundError: shaded/databricks/v20180920_b33d810/com/google/common/base/Preconditions
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.ensureAuthority(AzureBlobFileSystem.java:775)
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.initialize(AzureBlobFileSystem.java:94)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:500)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:469)
at org.apache.spark.SparkContext$$anonfun$wholeTextFiles$1.apply(SparkContext.scala:997)
at org.apache.spark.SparkContext$$anonfun$wholeTextFiles$1.apply(SparkContext.scala:992)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:820)
at org.apache.spark.SparkContext.wholeTextFiles(SparkContext.scala:992)
...
Caused by: java.lang.ClassNotFoundException: shaded.databricks.v20180920_b33d810.com.google.common.base.Preconditions
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
... 20 more
One attempt at solving the problem was to move to the latest Databricks runtime, which at the time of this writing is 6.5. That didn't help. I then went back through versions 6.4 and 6.3, since they use different Spark versions, but to no avail.
Another thing that I tried was adding "com.google.guava" % "guava" % "23.0" as a dependency to my build.sbt. That in itself results in errors like:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: abfss://abc-fs#cloud.dfs.core.windows.net/paths/something, expected: file:///
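For reference, the build.sbt addition from that attempt was just this single line:
libraryDependencies += "com.google.guava" % "guava" % "23.0"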
I feel that going down the road of satisfying each and every dependency that is somehow not included in the jar may not be the best option.
I wonder if someone has had a similar experience and, if so, how they solved it.
I am willing to give more context if that is necessary.
Thank you!

Related

Troubleshooting with Export TableView to excel JavaFx

Recently I found the Stack Overflow page "JavaFx: Export TableView to excel with name of columns". I ran the code from it and got the following issue:
TableViewExample.java:12: error: package org.apache.poi.ss.usermodel does not exist
import org.apache.poi.ss.usermodel.Row;
^
TableViewExample.java:13: error: package org.apache.poi.ss.usermodel does not exist
import org.apache.poi.ss.usermodel.Sheet;
^
TableViewExample.java:14: error: package org.apache.poi.ss.usermodel does not exist
import org.apache.poi.ss.usermodel.Workbook;
^
TableViewExample.java:42: error: cannot find symbol
Workbook workbook = new HSSFWorkbook();
^
symbol: class Workbook
location: class TableViewExample
TableViewExample.java:43: error: cannot find symbol
Sheet spreadsheet = workbook.createSheet("sample");
^
symbol: class Sheet
location: class TableViewExample
TableViewExample.java:45: error: cannot find symbol
Row row = spreadsheet.createRow(0);
^
symbol: class Row
location: class TableViewExample
6 errors
According to what is described there, a library named Apache POI is being used.
I used this library, poi-3.0.2.jar, to compile and run the code in this way:
javac -classpath ".:poi-3.0.2.jar" TableViewExample.java
from the command line in Linux Ubuntu; I had previously compiled archives.java from the command line with great success.
Other people have used jakarta-poi-3.0.2.jar, but that jar file no longer exists, so I used poi-3.0.2.jar instead.
Now I have used the new poi-5.1.0.jar and I got the following error message:
Exception in Application start method
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.javafx.application.LauncherImpl.launchApplicationWithArgs(LauncherImpl.java:389)
at com.sun.javafx.application.LauncherImpl.launchApplication(LauncherImpl.java:328)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at sun.launcher.LauncherHelper$FXHelper.main(LauncherHelper.java:767)
Caused by: java.lang.RuntimeException: Exception in Application start method
at com.sun.javafx.application.LauncherImpl.launchApplication1(LauncherImpl.java:917)
at com.sun.javafx.application.LauncherImpl.lambda$launchApplication$1(LauncherImpl.java:182)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/io/output/UnsynchronizedByteArrayOutputStream
at TableViewExample.start(TableViewExample.java:42)
at com.sun.javafx.application.LauncherImpl.lambda$launchApplication1$8(LauncherImpl.java:863)
at com.sun.javafx.application.PlatformImpl.lambda$runAndWait$7(PlatformImpl.java:326)
at com.sun.javafx.application.PlatformImpl.lambda$null$5(PlatformImpl.java:295)
at java.security.AccessController.doPrivileged(Native Method)
at com.sun.javafx.application.PlatformImpl.lambda$runLater$6(PlatformImpl.java:294)
at com.sun.glass.ui.InvokeLaterDispatcher$Future.run(InvokeLaterDispatcher.java:95)
at com.sun.glass.ui.gtk.GtkApplication._runLoop(Native Method)
at com.sun.glass.ui.gtk.GtkApplication.lambda$null$10(GtkApplication.java:245)
... 1 more
Caused by: java.lang.ClassNotFoundException: org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 10 more
This looks related to "java.lang.ClassNotFoundException: org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream".
Any information about how to solve this issue and make this piece of code work?
Thank you in advance
The most recent Apache POI library is 5.1.0; I do not recommend using old versions. The download is available from the Apache POI site, though I don't recommend directly downloading libraries either; instead, add them as dependencies using a build tool like Maven or Gradle.
When you look at the project file for POI, it has lots of dependencies, so simply adding the POI jar to the classpath is not enough; instead, let your build tool work it out.
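For example, with Maven a dependency declaration along these lines pulls in POI together with its transitive dependencies (a sketch; the plain poi artifact is enough for the HSSFWorkbook used in the example code):
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>5.1.0</version>
</dependency>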
Modern JavaFX distributions (which you should be using) are modular, so you need to either have a module-info file or command-line arguments for the module path and modules added (which you don't have in your example).
If you have a module-info, then you may need to require modules for poi to work (I don't know the command for that).
It may be easier to run without a module-info as long as you have correct module specifications for JavaFX on your command line (as defined by the openjfx.io getting started doc for non-modular applications).
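A minimal sketch of such a non-modular launch, where the JavaFX SDK path and the lib directory of dependency jars are placeholders for your setup:
java --module-path /path/to/javafx-sdk-17/lib --add-modules javafx.controls -cp ".:lib/*" TableViewExample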
FAQ
I am now thinking about other solutions for how to convert a JavaFX TableView to Excel.
I agree: don't use POI; instead, write the data to a CSV file. You can search for solutions on how to write data in CSV format using Java. That is what I would do.
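A minimal sketch of that idea, assuming a hypothetical Person row type with name and age properties (no quoting or escaping of values is handled here):
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import javafx.scene.control.TableView;

class CsvExport {
    // Person is a hypothetical row type with getName()/getAge()
    static void exportCsv(TableView<Person> table) throws FileNotFoundException {
        try (PrintWriter out = new PrintWriter("table.csv")) {
            out.println("name,age"); // header row
            for (Person p : table.getItems()) {
                out.println(p.getName() + "," + p.getAge()); // one line per table row
            }
        }
    }
}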
Now I have implemented the new POI library poi-5.1.0.jar and I got the error message
java.lang.ClassNotFoundException: org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream
During execution you are still missing the transitive dependencies required to run the application.
If you use poi, one way to get the required dependencies is to ask maven to copy them to a lib directory:
Make Maven to copy dependencies into target/lib
then when you execute your app jar, have the lib directory for your app on your classpath and the JavaFX SDK lib directory on your module path.
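A rough sketch of those two steps from the command line (jar names and paths are placeholders):
mvn dependency:copy-dependencies -DoutputDirectory=target/lib
java --module-path /path/to/javafx-sdk-17/lib --add-modules javafx.controls -cp "target/myapp.jar:target/lib/*" TableViewExample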
I can create .csv files, but unfortunately the client asks for .xls files.
POI can create the .xls files directly, or you could run an external tool to do the CSV to XLS conversion:
Convert .CSV to .XLSX using command line
The tool could be included in your application packaging and installation and invoked from Java:
https://www.baeldung.com/run-shell-command-in-java
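A minimal sketch of that invocation, where csv2xls stands in for whatever hypothetical converter tool you bundle; this fragment belongs in a method that declares IOException and InterruptedException:
// Launch the external converter and wait for it to finish
Process p = new ProcessBuilder("csv2xls", "table.csv", "table.xls")
        .inheritIO() // share the parent's stdin/stdout/stderr
        .start();
int exit = p.waitFor(); // 0 usually means success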

Issue with Bouncy Castle versions

My code, which signs and verifies a string, works fine when using Bouncy Castle bcprov-jdk16-1.46.jar.
When I upgraded the jar to bcprov-jdk15on-1.66.jar, my code started throwing the error below.
Exception in thread "main" java.lang.NoClassDefFoundError: org/bouncycastle/asn1/DEREncodable
at com.esb.cms.CmsCryptographyEngine.sign(CmsCryptographyEngine.java:124)
at com.esb.cms.CmsCryptographyEngine.main(CmsCryptographyEngine.java:53)
Caused by: java.lang.ClassNotFoundException: org.bouncycastle.asn1.DEREncodable
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 2 more
The restriction is that I have to use bcprov-jdk15on-1.66.jar. Any idea how to fix it on this version?
Thanks
Thanks all, I was able to solve this issue. A code change was required when moving from 1.46 to 1.66.
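For anyone hitting the same NoClassDefFoundError: org.bouncycastle.asn1.DEREncodable was removed when the ASN.1 class hierarchy was reworked in release 1.47, and ASN1Encodable (with toASN1Primitive() replacing getDERObject()) is its successor. A sketch of the kind of change involved, not the actual code from the question:
import org.bouncycastle.asn1.ASN1Encodable;
import org.bouncycastle.asn1.ASN1Primitive;
import org.bouncycastle.asn1.DERUTF8String;

public class Bc147Migration {
    public static void main(String[] args) {
        // 1.46 style: DEREncodable enc = new DERUTF8String("hello");
        //             DERObject der = enc.getDERObject();
        // 1.47+ style (works with bcprov-jdk15on-1.66):
        ASN1Encodable enc = new DERUTF8String("hello");
        ASN1Primitive der = enc.toASN1Primitive();
        System.out.println(der);
    }
}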

sbt run works but ./spark-submit does not

I want to work with the lift-json parser using an sbt build. My build.sbt file has the following contents:
name := "MyProject"
version := "1.0"
scalaVersion := "2.10.0"
// https://mvnrepository.com/artifact/net.liftweb/lift-json_2.10
libraryDependencies += "net.liftweb" % "lift-json_2.10" % "3.0-M1"
val lift_json = "net.liftweb" %% "lift-json_2.10" % "3.0-M1"
//val json4sNative = "org.json4s" %% "json4s-native" % "3.3.0"
//libraryDependencies += "org.scala-lang" % "scala-library" % "2.9.1"
lazy val gitclonefile = "/root/githubdependencies/lift"
lazy val g = RootProject(file(gitclonefile))
lazy val root = project in file(".") dependsOn g
My code is this:
package org.inno.parsertest

import net.liftweb.json._
//import org.json4s._
//import org.json4s.native.JsonMethods._

object parser {
  def main(args: Array[String]) {
    val x = parse(""" { "numbers" : [1, 2, 3, 4] } """)
    println(x)
    val x1 = "jaimin is awesome"
    println(x1)
  }
}
sbt package and then sbt run work. But when I try to run this using spark-submit, I get the following error:
Error: application failed with exception
java.lang.NoClassDefFoundError: net/liftweb/json/package$
at org.inno.parsertest.parser$.main(jsonparser.scala:7)
at org.inno.parsertest.parser.main(jsonparser.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:367)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:77)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: net.liftweb.json.package$
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 9 more
How can I make ./spark-submit work?
As soon as the Spark driver starts working on your app (when you submit it), it has to deal with the import net.liftweb.json._ line, which means it will look for those classes on its classpath.
But Spark does not ship with liftweb's jar, so it is a miss, and you end up with a ClassNotFoundException.
So you need to provide the required jars with your application. There are several ways, discussed at length elsewhere, to do that.
You might start with the Spark documentation:
Bundling Your Application’s Dependencies
If your code depends on other projects, you will need to package them alongside your application in order to distribute the code to a Spark cluster. To do this, create an assembly jar (or “uber” jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating assembly jars, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime. Once you have an assembled jar you can call the bin/spark-submit script as shown here while passing your jar.
One might suggest:
Package your application as what is often called an "uber jar" or "fat jar", with e.g. sbt's "assembly" plugin or Maven Shade, depending on your preference. This strategy merges all of the classes and resources of all dependencies into a single JAR, the one you submit.
Add arguments to the spark-submit call. There are several ways; an easy one is to use the --jars argument, followed by a comma-separated list of the jar files you need. These jars will be added by Spark to the actual driver/worker classpath before launching your jobs (see the example commands after this list).
Tell spark-submit to "bind" to a Maven repository:
Users may also include any other dependencies by supplying a comma-delimited list of maven coordinates with --packages. All transitive dependencies will be handled when using this command. Additional repositories (or resolvers in SBT) can be added in a comma-delimited fashion with the flag --repositories.
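As a sketch, assuming the project from the question (the jar names are what sbt would produce by default; paths are placeholders):
# Option 1: build a fat jar (e.g. with the sbt-assembly plugin), then submit only that
spark-submit --class org.inno.parsertest.parser target/scala-2.10/MyProject-assembly-1.0.jar

# Option 2: ship the dependency jars explicitly
spark-submit --class org.inno.parsertest.parser \
  --jars /path/to/lift-json_2.10-3.0-M1.jar \
  target/scala-2.10/myproject_2.10-1.0.jar

# Option 3: let spark-submit resolve the dependency from Maven Central
spark-submit --class org.inno.parsertest.parser \
  --packages net.liftweb:lift-json_2.10:3.0-M1 \
  target/scala-2.10/myproject_2.10-1.0.jar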
But a full discussion of all options is a rather long one, and I suggest you google "package spark applications" or search StackOverflow with these subjects to gain a better overview.
Side note: submitting an app that does not use a SparkContext to Spark seems pointless, but I guess you're just experimenting at this point.

java.lang.NoClassDefFoundError: Could not initialize class org.apache.poi.openxml4j.opc.internal.marshallers.ZipPackagePropertiesMarshaller

I am trying to generate Excel files using the XSSF API because its memory footprint is small.
It works fine on my local machine, which has JDK 1.7.
But when I try to run it on UNIX, where the Java version is 1.6.0_75, it gives me the following error:
java.lang.NoClassDefFoundError: Could not initialize class org.apache.poi.openxml4j.opc.internal.marshallers.ZipPackagePropertiesMarshaller
I have following jars in my classpath
poi-3.11-20141221.jar
poi-excelant-3.11-20141221.jar
poi-ooxml-3.11-20141221.jar
poi-ooxml-schemas-3.11-20141221.jar
xmlbeans-2.6.0.jar
xercesImpl.jar
I have verified that poi-3.11-20141221.jar has the ZipPackagePropertiesMarshaller class.
Seems that some jar is missing.
Am I missing something?
I have found a solution to my own problem.
I replaced poi-3.11-20141221.jar with poi-ooxml-3.9.jar. That worked.
Java version 1.6.0_75 does not exist; I suppose you made a typo. The last public update of Java 6 is update 45 (6u45).
The class ZipPackagePropertiesMarshaller is certainly loaded at run time. A NoClassDefFoundError of the form "Could not initialize class" occurs during the class initialization phase; if the exception had been ClassNotFoundException, it would have been a different problem (a class missing from the classpath rather than a failed initialization).
The class ZipPackagePropertiesMarshaller is unaltered between versions 3.9 and 3.11, but the class PackagePropertiesMarshaller, which ZipPackagePropertiesMarshaller extends, did change: the main change concerns the use of StAX in the newer version.
A distribution of StAX comes with Java 6, but Java 6 update 18 (http://www.oracle.com/technetwork/java/javase/6u18-142093.html) introduced the StAX 1.2 API.
Consider using Java 6u18 or newer. This should solve your problem.
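To verify which StAX implementation a given runtime actually resolves, a quick check along these lines may help (a sketch):
public class StaxCheck {
    public static void main(String[] args) {
        // Prints the concrete factory classes, which reveal the StAX implementation in use
        System.out.println(javax.xml.stream.XMLInputFactory.newInstance().getClass().getName());
        System.out.println(javax.xml.stream.XMLOutputFactory.newInstance().getClass().getName());
    }
}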
In the official FAQ there are some indications about a similar problem: https://poi.apache.org/faq.html#faq-N1017E.
Moreover, the workaround you found is not the best one; see the last FAQ entry of POI.

getting ClassCastException in OSGi standalone framework with JAXB

I am running the OSGi framework from the CLI with the command below:
java -jar org.eclipse.osgi_3.6.2.R36x_v20110210.jar -console
My plugins run fine, but while running a plugin that requires the JAXB packages of the system library (JavaSE 1.6.xx) to parse an XML file, I get the exception trace below:
Exception in thread "DummyProgram" java.lang.ExceptionInInitializerError
at javax.xml.bind.DatatypeConverter.<clinit>(DatatypeConverter.java:78)
at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$3.run(JAXBContextImpl.java:262)
at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$3.run(JAXBContextImpl.java:260)
at java.security.AccessController.doPrivileged(Native Method)
at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.<init>(JAXBContextImpl.java:260)
at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1100)
at com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:143)
at com.sun.xml.internal.bind.v2.ContextFactory.createContext(ContextFactory.java:110)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:202)
at javax.xml.bind.ContextFinder.find(ContextFinder.java:376)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:574)
at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:522)
at com.entities.conf.JAXBMTSConfig.unmarshalApps(JAXBMTSConfig.java:113)
... 20 more
Caused by: java.lang.ClassCastException: org.apache.xerces.jaxp.datatype.DatatypeFactoryImpl cannot be cast to javax.xml.datatype.DatatypeFactory
at javax.xml.datatype.DatatypeFactory.newInstance(Unknown Source)
at javax.xml.bind.DatatypeConverterImpl.<clinit>(DatatypeConverterImpl.java:742)
I think there is a conflict with different versions of javax.xml.bind.* packages. I guess they are exported by the System library and a xerces (?) jar?
So you'll need to find out which bundle exports those packages, and resolve the conflict.
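In the Equinox console, the packages command can show which bundles export and import a given package (exact output varies by version), e.g.:
osgi> packages javax.xml.bind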
regards, Frank
A class-cast exception in OSGi is typically caused by the fact that every bundle has its own class loader.
It is possible that two bundles load the same class from different sources (because it is exported twice). Because every bundle has its own class loader, the two copies are loaded by two different class loaders, so Java does not accept them as the same class.
There are two workarounds:
- Check if the class is exported twice. If this is the case, try to solve it by exporting the class only once. This may not be possible, because bundle A may need version 1.4 and bundle B version 1.7.
- If that is the case, have each bundle also import the class it exports.
So, for example:
Bundle A exports xyz-1.4 and imports xyz-1.4
Bundle B exports xyz-1.7 and imports xyz-1.7
Now the framework can decide which class is used. If Bundle A runs alone, 1.4 will be used.
If, on the other hand, Bundle A and Bundle B are both needed by a Bundle C, 1.7 will be used (in case it is downward compatible with 1.4).
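In MANIFEST.MF terms this export-plus-import pattern looks roughly like the following, with xyz as the placeholder package name from the example:
Bundle A's MANIFEST.MF (excerpt):
Export-Package: xyz;version="1.4"
Import-Package: xyz;version="[1.4,2.0)"

Bundle B's MANIFEST.MF (excerpt):
Export-Package: xyz;version="1.7"
Import-Package: xyz;version="[1.7,2.0)"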
