Setting up Spark on IntelliJ for contributing to Spark - apache-spark

I have lost a lot of time trying to set up Spark in IntelliJ on my local machine.
Goal: to run SparkPi.scala without any errors.
Steps Taken:
git clone `https://github.com/apache/spark`
Import the project into IntelliJ as a Maven project
build/mvn -DskipTests clean package
Navigate to the examples folder and modify pom.xml (change occurrences of the `provided` and `test` scopes to `compile`)
Open SparkPi.scala and add `.master("local[4]")` to the SparkSession builder (see the sketch after these steps)
Right-click and run SparkPi
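For reference, here is a sketch of the change in SparkPi.scala (only the SparkSession builder is shown; the rest of the file is unchanged):

val spark = SparkSession
  .builder
  .appName("Spark Pi")
  .master("local[4]") // added so the example runs directly from the IDE
  .getOrCreate()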
The error I am faced with:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/MapMaker
at org.apache.spark.SparkContext.<init>(SparkContext.scala:271)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2257)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:822)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:814)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:814)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.MapMaker
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more

You need to rebuild the project inside IntelliJ. Sad but true: IntelliJ is unable to simply reuse the Maven build infrastructure.
However, it does use part of the command-line Maven structure: you do need to run the mvn build first.
As for the Google MapMaker class: the error means the dependencies are not being downloaded properly and are not available on the runtime classpath. This should be resolved after the full rebuild.
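In practice the sequence looks like this (the second step is done from IntelliJ's menu, not the terminal):

# from the repository root, run the command-line Maven build once
build/mvn -DskipTests clean package
# then, inside IntelliJ: Build > Rebuild Project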

Related

Spark job can't connect to Cassandra when run from a jar

I have a Spark job that writes data to Cassandra (Cassandra is on GCP). When I run this from IntelliJ IDEA (my IDE) it works perfectly fine. The data is sent and written to Cassandra correctly. However, this fails when I package my project into a fat jar and run it.
Here is an example of how I run it.
spark-submit --class com.testing.Job --master local out/artifacts/SparkJob_jar/SparkJob.jar 1 0
However, this fails for me and gives me the following errors:
Caused by: java.io.IOException: Failed to open native connection to Cassandra at {X.X.X:9042} :: 'com.datastax.oss.driver.api.core.config.ProgrammaticDriverConfigLoaderBuilder com.datastax.oss.driver.api.core.config.DriverConfigLoader.programmaticBuilder()'
Caused by: java.lang.NoSuchMethodError: 'com.datastax.oss.driver.api.core.config.ProgrammaticDriverConfigLoaderBuilder com.datastax.oss.driver.api.core.config.DriverConfigLoader.programmaticBuilder()'
My artifact does include the Spark Cassandra connector jars:
spark-cassandra-connector-driver_2.12-3.0.0-beta.jar
spark-cassandra-connector_2.12-3.0.0-beta.jar
Why is this happening, and how can I fix it?
The problem is that besides those two jars you need more: the full Java driver and its dependencies. You have the following possibilities to fix that:
Make sure that these artifacts are packaged into the resulting jar (a so-called "fat jar" or "assembly") using Maven, SBT, or any other build tool;
specify the Maven coordinates com.datastax.spark:spark-cassandra-connector_2.12:3.0.0-beta with --packages, like this: --packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.0-beta (see the command sketch after this list);
download the spark-cassandra-connector-assembly artifact to the node from which you're doing spark-submit, and then use that file name with --jars.
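Applied to the submit command from the question, the --packages variant would look roughly like this (jar path and arguments are taken from the question above):

spark-submit --class com.testing.Job --master local \
  --packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.0-beta \
  out/artifacts/SparkJob_jar/SparkJob.jar 1 0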
See the documentation for Spark Cassandra Connector for more details.

log4j runtime NoClassDefFoundError

I am using log4j in an Android jar module. I can build the jar file successfully and run it in Android Studio successfully.
My Gradle config is:
implementation 'log4j:log4j:1.2.17'
But when I run the jar file from the command line:
java -jar test.jar
I get the error below:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/Logger
at com.yeetor.Main.<clinit>(Main.java:39)
Caused by: java.lang.ClassNotFoundException: org.apache.log4j.Logger
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 1 more
Why does it run in Android Studio but not from the command line?
Refer to this SO answer on generating a single jar file that includes its dependencies. The dependency should be declared with "compile" rather than "implementation"; then you will get a bigger jar file that includes all dependencies. Usually Gradle does not bundle dependencies into the jar file, but leaves each dependency as its own separate jar. A sketch follows below.
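A minimal sketch of such a fat-jar task (assuming an older Gradle version where the compile configuration is still resolvable; the Main-Class value is taken from the stack trace above):

jar {
    manifest {
        attributes 'Main-Class': 'com.yeetor.Main'
    }
    // unpack every compile dependency into the jar itself
    from {
        configurations.compile.collect { it.isDirectory() ? it : zipTree(it) }
    }
}

After gradle jar, java -jar test.jar should then find org.apache.log4j.Logger inside the jar itself.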

spark-submit dependency conflict

I'm trying to submit a jar to Spark, but my jar contains dependencies that conflict with Spark's built-in jars (snakeyaml and others).
Is there a way to tell Spark to prefer whatever dependencies my project has over the jars inside /jars?
UPDATE
When I run spark-submit, I get the following exception:
Caused by: java.lang.NoSuchMethodError: javax.validation.BootstrapConfiguration.getClockProviderClassName()Ljava/lang/String;
at org.hibernate.validator.internal.xml.ValidationBootstrapParameters.<init>(ValidationBootstrapParameters.java:63)
at org.hibernate.validator.internal.engine.ConfigurationImpl.parseValidationXml(ConfigurationImpl.java:540)
at org.hibernate.validator.internal.engine.ConfigurationImpl.buildValidatorFactory(ConfigurationImpl.java:337)
at javax.validation.Validation.buildDefaultValidatorFactory(Validation.java:110)
at org.hibernate.cfg.beanvalidation.TypeSafeActivator.getValidatorFactory(TypeSafeActivator.java:501)
at org.hibernate.cfg.beanvalidation.TypeSafeActivator.activate(TypeSafeActivator.java:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.hibernate.cfg.beanvalidation.BeanValidationIntegrator.integrate(BeanValidationIntegrator.java:132)
... 41 more
which is caused by Spark having an older version of validation-api (validation-api-1.1.0.Final.jar).
My project has a dependency on the newer version, and it does get bundled with my jar (javax.validation:validation-api:jar:2.0.1.Final:compile).
I submit using this command:
/spark/bin/spark-submit --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true
but I still get the same exception.
If you are building your jar using SBT, you need to exclude those classes which are already on the cluster, for example like below:
"org.apache.spark" %% "spark-core" % "2.2.0" % "provided"
You do that by adding "provided", which means these classes are already provided in the environment where you run the job.
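For context, the relevant build.sbt fragment might look like this (a sketch; the module list and version are assumptions based on the line above):

libraryDependencies ++= Seq(
  // mark anything the cluster already ships as "provided"
  "org.apache.spark" %% "spark-core" % "2.2.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.2.0" % "provided"
)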
Not sure if you are using SBT, but I used this in build.sbt via assembly, as I had all sorts of dependency conflicts at one stage. See below; maybe this will help.
This is controlled by setting the following confs to true:
spark.driver.userClassPathFirst
spark.executor.userClassPathFirst
I had issues with two jars, and this is what I ended up doing: I copied the required jars to a directory and used the extraClassPath option:
spark-submit --conf spark.driver.extraClassPath="C:\sparkjars\validation-api-2.0.1.Final.jar;C:\sparkjars\gson-2.8.6.jar" myspringbootapp.jar
From the documentation, spark.driver.extraClassPath is "Extra classpath entries to prepend to the classpath of the driver."

Apache POI Word and IntelliJ

How do I add the jar files to IntelliJ properly? The program works in IntelliJ but not as a .jar file.
I have tried adding the files and exporting them. I don't know what to do. I've been going around in circles for the last 6 hours. It's obviously something to do with the runtime, but I don't know anything about that, and either there is not much information on it or (most probably) I'm not googling the right things.
It's this error:
Caused by: java.lang.NoClassDefFoundError: org/apache/poi/xwpf/usermodel/XWPFDocument
at Methods.word_output.createDocx(word_output.java:28)
at Controllers.mainController.createReport(mainController.java:466)
... 53 more
Caused by: java.lang.ClassNotFoundException: org.apache.poi.xwpf.usermodel.XWPFDocument
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 55 more
A good option is to use a Gradle build.gradle file in the project main directory and then import the project in IntelliJ via "Import project" and choosing that file. This way IntelliJ resolves all the necessary dependencies for you.
A sample minimal Gradle build file is:
apply plugin: 'java'
repositories {
mavenCentral()
}
dependencies {
compile 'org.apache.poi:poi:3.15-beta1'
compile 'org.apache.poi:poi-ooxml:3.15-beta1'
testCompile "junit:junit:[4.12,)"
}
Ideally you would follow the layout of Gradle builds and put your sources in src/main/java and your tests in src/test/java.
As a bonus you gain the possibility to build the project on the commandline/CI/... whatever!
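For example, with the build file above in place, a plain command-line build would be:

gradle build   # compiles, runs the tests and leaves the jar under build/libs/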

java.lang.ClassNotFoundException: javax.servlet.jsp.jstl.core.Config [duplicate]

This question already has answers here:
Deploying a JSF webapp results in java.lang.ClassNotFoundException: javax.servlet.jsp.jstl.core.Config [duplicate]
(3 answers)
Closed 7 years ago.
When I run my application after entering the URL, this exception comes up. I am using Eclipse and Tomcat 7.0.35. I also added jstl.jar and jstl-1.2.jar.
The stack trace is:
java.lang.ClassNotFoundException: javax.servlet.jsp.jstl.core.Config
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1714)
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1559)
at org.apache.myfaces.view.jsp.JspViewDeclarationLanguage.buildView(JspViewDeclarationLanguage.java:91)
at org.apache.myfaces.lifecycle.RenderResponseExecutor.execute(RenderResponseExecutor.java:78)
at org.apache.myfaces.lifecycle.LifecycleImpl.render(LifecycleImpl.java:241)
at javax.faces.webapp.FacesServlet.service(FacesServlet.java:199)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
The error is telling you it cannot find the class because it is not available in your application.
If you are using Maven, make sure you have the dependency for the jstl artifact:
<dependency>
<groupId>javax.servlet</groupId>
<artifactId>jstl</artifactId>
<version>1.2</version>
</dependency>
If you are not using it, just make sure you include the JAR in your classpath. See this answer for download links.
Download the following jars and add them to your WEB-INF/lib directory:
jsp-api-2.0.jar
jstl-1.2.jar
By default, the Tomcat container doesn't contain any JSTL library. To fix it, declare the jstl dependency in your Maven pom.xml file if you are working in a Maven project, or add the jar to your application's classpath:
<dependency>
<groupId>javax.servlet</groupId>
<artifactId>jstl</artifactId>
<version>1.2</version>
</dependency>
Probably the JSTL libraries are missing from your classpath or not accessible by Tomcat.
You need to add at least the following jar files in your WEB-INF/lib directory:
jsf-impl.jar
jsf-api.jar
jstl.jar
Add jstl jar to your application classpath.
Just a quick comment: sometimes Maven does not copy the JSTL jar to the WEB-INF folder even if the pom.xml has the entry for it.
I had to manually copy the JSTL jar to /WEB-INF/lib on the file system. That resolved the problem. The issue may be related to the Maven war packaging plugin.
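One quick way to verify whether the jar actually ended up inside the war (the war name here is hypothetical):

jar tf target/myapp.war | grep jstl   # should list WEB-INF/lib/jstl-1.2.jar if packaging worked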
Add the JSTL library as a dependency to your project (javax.servlet.jsp.jstl.core.Config is part of this package).
For example, if you were using Gradle, you could write in a build.gradle:
dependencies {
compile 'javax.servlet:jstl:1.2'
}
I had the same problem. Go to Project Properties -> Deployment Assembly and add the jstl jar.
