Writing a simple program in Spark 1.4.0 - apache-spark

I'm new to Spark. I installed JDK 8 and Eclipse (Mars) on Debian 8, installed Spark 1.4.0, and used the sbt/sbt assembly command to build everything required. Could anyone tell me how to write a simple hello-world program in Spark using the Eclipse IDE, coded in Java, or point me to a URL that covers this? I need step-by-step help.
Thank you in advance

You can create a Maven project and add the Spark 1.4 Maven dependency as follows:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.4.0</version>
</dependency>
Then start coding in the Eclipse IDE.
You can follow this or this, and here is the Java word count example code in Spark.
The link is for Scala, but the same applies to Java. Hope it helps.
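For a concrete starting point, here is a minimal Java word count sketch against the Spark 1.4 Java API. Run it as a plain Java application from Eclipse with the spark-core dependency above on the classpath; the input path is a placeholder, so point it at any local text file.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class SimpleWordCount {
    public static void main(String[] args) {
        // Run locally with all available cores; no cluster required.
        SparkConf conf = new SparkConf().setAppName("HelloSpark").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // "input.txt" is a placeholder path.
        JavaRDD<String> lines = sc.textFile("input.txt");

        // Split lines into words, pair each word with 1, then sum the counts per word.
        JavaRDD<String> words = lines.flatMap(line -> Arrays.asList(line.split(" ")));
        JavaPairRDD<String, Integer> pairs = words.mapToPair(word -> new Tuple2<>(word, 1));
        JavaPairRDD<String, Integer> counts = pairs.reduceByKey((a, b) -> a + b);

        List<Tuple2<String, Integer>> result = counts.collect();
        for (Tuple2<String, Integer> t : result) {
            System.out.println(t._1() + ": " + t._2());
        }

        sc.stop();
    }
}

Note that the lambdas require Java 8; with Java 7 you would write the same logic with anonymous FlatMapFunction, PairFunction, and Function2 classes.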

Related

Running tests with Sikuli 2.x on Linux gives a runtime error: `GLIBC_2.27' not found

I have Selenium tests with Sikuli.
When I was using Sikuli 1.x, everything worked fine.
The issue started when I moved to Sikuli 2.x.
I import Sikuli via Maven:
<dependency>
    <groupId>com.sikulix</groupId>
    <artifactId>sikulixapi</artifactId>
    <version>2.0.5</version>
</dependency>
I tried to run this with Jenkins on CentOS 7 and on AWS Linux.
On both I got a runtime error:
`GLIBC_2.27' not found (required by /root/.Sikulix/SikulixLibs/libopencv_java430.so)
Has anyone seen and solved this issue?
Thanks ahead for any help,
Lior

Log4j 2.15.0 update issue

My application is using Log4j 2.11.1 now. Because of the Log4j security vulnerabilities reported a couple of days ago, I need to update Log4j to 2.15.0. But it fails when I deploy my application on a Linux server.
Here is the error message:
[ERROR] Failed to execute goal on project ***: Could not resolve
dependencies for project ***:1.0-SNAPSHOT: Failed to collect
dependencies at org.apache.logging.log4j:log4j-api:jar:2.15.0: Failed
to read artifact descriptor for
org.apache.logging.log4j:log4j-api:jar:2.15.0: Could not transfer
artifact org.apache.logging.log4j:log4j-api:pom:2.15.0 from/to central
(https://repo1.maven.org/maven2):
sun.security.validator.ValidatorException: PKIX path building failed:
sun.security.provider.certpath.SunCertPathBuilderException: unable to
find valid certification path to requested target -> [Help 1]
I've added the Maven Central certificate to my Java keystore, but it does not work. My Java version is 1.8.181.
I had log4j-core and log4j-api that needed to be updated. It is a similar case to yours, a deployment on a Linux server, and it works for me.
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-1.2-api</artifactId>
    <version>2.15.0</version>
</dependency>
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.15.0</version>
</dependency>
I added these dependencies and updated the Maven project (in the Eclipse IDE: right-click the project → Maven → Update Project).
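If you want to confirm which log4j-core actually ends up on the runtime classpath after the update, a minimal sketch like the following reads the version from the jar's manifest (the official Log4j jars normally carry an Implementation-Version entry) and shows which jar the class was loaded from:

import org.apache.logging.log4j.core.LoggerContext;

public class Log4jVersionCheck {
    public static void main(String[] args) {
        // The Implementation-Version comes from the log4j-core jar manifest.
        Package core = LoggerContext.class.getPackage();
        System.out.println("log4j-core version: " + core.getImplementationVersion());

        // Also print where the class was loaded from, in case several log4j jars are present.
        System.out.println("loaded from: "
                + LoggerContext.class.getProtectionDomain().getCodeSource().getLocation());
    }
}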
Log4j Vulnerability issue with later versions
Log4j 2.15.0 was released as a possible fix for this critical vulnerability, but the Apache Software Foundation later found that this version was still vulnerable.
Solution: Log4j 2.16.0 fixes this issue by removing support for message lookup patterns and disabling JNDI functionality by default.
You can have a look at Maven, Ivy, Gradle, and SBT Artifacts.
In my case I had to switch from version 1.2.x to 2.16.0.
You can try using this dependency:
<dependencies>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-1.2-api</artifactId>
        <version>2.16.0</version>
    </dependency>
</dependencies>
As Log4j security vulnerabilities are addressed in log4j-core, please try the latest version of log4j-core using a Maven dependency:
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-core</artifactId>
    <version>2.16.0</version>
</dependency>
Reference - Maven Repository: log4j
Regarding the exception you are facing, this may help you -
Java: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Specifically for a "PKIX path building failed" error in Maven:
If you are on Windows and your IT folks have added transparent proxies that intercept SSL traffic, you'll want to set MAVEN_OPTS to the following:
-Djavax.net.ssl.keyStoreType=Windows-MY -Djavax.net.ssl.trustStoreType=Windows-ROOT
This will direct Maven to use the Windows trust store when vetting SSL certificates issued internally by your IT staff.
If it is a transparent proxy inspecting SSL but you are not on Windows, you may need to add that certificate to your JVM's trusted keystore, as the JVM options above only work on Windows.
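Outside Windows, a typical approach is importing the proxy's certificate into the JVM's cacerts trust store with keytool. To double-check that the import actually landed in the trust store your build JVM uses, a minimal sketch along these lines lists the aliases it contains; the path and password below are the JDK defaults and may differ in your setup:

import java.io.FileInputStream;
import java.security.KeyStore;
import java.util.Collections;

public class TrustStoreCheck {
    public static void main(String[] args) throws Exception {
        // Default location and password of the JDK trust store; adjust if yours differs.
        String path = System.getProperty("java.home") + "/lib/security/cacerts";
        char[] password = "changeit".toCharArray();

        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        try (FileInputStream in = new FileInputStream(path)) {
            trustStore.load(in, password);
        }

        // List every alias so you can confirm the proxy / internal CA certificate is present.
        for (String alias : Collections.list(trustStore.aliases())) {
            System.out.println(alias);
        }
    }
}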
I faced the same problem with the following dependency:
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>2.15.0</version>
</dependency>
I replaced the above dependency with:
<dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-1.2-api</artifactId>
    <version>2.15.0</version>
</dependency>
It works OK.

Spark netty Version Mismatch on HDInsight Cluster

I am currently having an issue when running my Spark job remotely on an HDInsight cluster.
My project has a dependency on netty-all, and here is what I explicitly specify for it in the POM file:
<dependency>
    <groupId>io.netty</groupId>
    <artifactId>netty-all</artifactId>
    <version>4.1.51.Final</version>
</dependency>
The final built jar includes this package with the specified version and running the Spark job on my local machine works fine. However, when I try to run it in the remote HDInsight cluster, the job throws the following exception:
java.lang.NoSuchMethodError: io.netty.handler.ssl.SslProvider.isAlpnSupported(Lio/netty/handler/ssl/SslProvider;)Z
I believe this is due to a Netty version mismatch: Spark picks up the old Netty version (netty-all-4.1.17) from its default system classpath on the remote cluster rather than the newer Netty package bundled in the uber jar.
I have tried different ways to resolve this issue but they don't seem to work well:
Relocating classes using Maven Shade plugin:
More details and its issues are here - Missing Abstract Class using Maven Shade Plugin for Relocating Classes
Spark configurations
spark.driver.extraClassPath=<path to netty-all-4.1.50.Final.jar>
spark.executor.extraClassPath=<path to netty-all-4.1.50.Final.jar>
I would like to know if there are any other solutions to this issue, or whether any steps are missing here.
You will need to ensure you only have Netty 4.1.50.Final or higher on the classpath.
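One way to see which Netty actually wins at runtime on the driver or the executors is to log it from inside the job. A minimal sketch, assuming netty-common's io.netty.util.Version utility is available (it is bundled in netty-all):

import java.util.Map;

import io.netty.handler.ssl.SslProvider;
import io.netty.util.Version;

public class NettyVersionCheck {
    public static void main(String[] args) {
        // Netty reports the versions of its artifacts found on the classpath.
        for (Map.Entry<String, Version> e : Version.identify().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }

        // Also show which jar the problematic class was actually loaded from.
        System.out.println("SslProvider loaded from: "
                + SslProvider.class.getProtectionDomain().getCodeSource().getLocation());
    }
}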

How to install adminlte on netbeans 11

I have just upgraded from NetBeans 8.2 to Apache NetBeans 11 in order to start my first web app. I would like to use AdminFaces and I am stuck.
I first used the admin-starter project and cloned it from GitHub. In the NetBeans IDE I always got error page 500. After that I tried from scratch: New Project → Java with Maven → Web Application.
This got me a nice demo page with my Tomcat 9.0 server. I then tried all sorts of things but did not get it to work.
How to Install Adminlte template on netbeans
Basically this is what I was looking for, but it no longer works on NetBeans 11.
Any help is kindly appreciated.
Well, first of all I went back to the original GitHub project. Needless to say, the problem has to do with Java 11 and some deprecated or removed APIs.
So, for people who want to use AdminFaces with NetBeans 11, Java 11 (OpenJDK), and Tomcat 9.0:
Download admin-starter-tomcat-master.zip from GitHub.
Import it into NetBeans. Rename the directory to admin-starter-tomcat.
Add the dependency for javax.annotation 1.3.2 in pom.xml:
<dependency>
    <groupId>javax.annotation</groupId>
    <artifactId>javax.annotation-api</artifactId>
    <version>1.3.2</version>
</dependency>
Add the dependencies for javax.xml.bind.JAXBContext in pom.xml (see the note after these steps on why they are needed):
<dependency>
    <groupId>com.sun.xml.bind</groupId>
    <artifactId>jaxb-core</artifactId>
    <version>2.3.0.1</version>
</dependency>
<dependency>
    <groupId>javax.xml.bind</groupId>
    <artifactId>jaxb-api</artifactId>
    <version>2.3.1</version>
</dependency>
<dependency>
    <groupId>com.sun.xml.bind</groupId>
    <artifactId>jaxb-impl</artifactId>
    <version>2.3.1</version>
</dependency>
This worked for me after wasting a week of my time.
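For context on why the JAXB artifacts are needed: the java.xml.bind module was removed from the JDK in Java 11, so code along the following lines only compiles and runs once jaxb-api plus an implementation (jaxb-impl/jaxb-core) are on the classpath. A minimal sketch; the Note class is just a hypothetical example:

import java.io.StringWriter;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlRootElement;

public class JaxbSmokeTest {

    // Hypothetical payload class, only here to exercise JAXB.
    @XmlRootElement
    public static class Note {
        public String text = "hello";
    }

    public static void main(String[] args) throws Exception {
        // On Java 11 this fails with a missing javax.xml.bind class
        // unless the JAXB dependencies above are on the classpath.
        JAXBContext context = JAXBContext.newInstance(Note.class);

        Marshaller marshaller = context.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);

        StringWriter out = new StringWriter();
        marshaller.marshal(new Note(), out);
        System.out.println(out);
    }
}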

Error while writing DataFrame into HDFS path [duplicate]

I am beginning to test Spark.
I installed Spark on my local machine and run a local cluster with a single worker. When I try to execute my job from my IDE, setting the SparkConf as follows:
final SparkConf conf = new SparkConf().setAppName("testSparkfromJava").setMaster("spark://XXXXXXXXXX:7077");
final JavaSparkContext sc = new JavaSparkContext(conf);
final JavaRDD<String> distFile = sc.textFile(Paths.get("").toAbsolutePath().toString() + "dataSpark/datastores.json");
I got this exception:
java.lang.RuntimeException: java.io.InvalidClassException: org.apache.spark.rpc.netty.RequestMessage; local class incompatible: stream classdesc serialVersionUID = -5447855329526097695, local class serialVersionUID = -2221986757032131007
It can be caused by several incompatibilities, listed below:
Hadoop version;
Spark version;
Scala version;
...
For me, it was the Scala version; I was using 2.11.x in my IDE, but the official docs say:
Spark runs on Java 7+, Python 2.6+ and R 3.1+. For the Scala API, Spark 1.6.1 uses Scala 2.10. You will need to use a compatible Scala version (2.10.x).
and that x cannot be smaller than 3 if you are using the latest Java (1.8), which is what causes this.
Hope it will help you!
Got it all working with the combination of versions below.
Installed Spark 1.6.2
(verify with bin/spark-submit --version)
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.2</version>
</dependency>
and
Scala 2.10.6 and Java 8.
Note that it did NOT work, and had a similar class-incompatibility issue, with the versions below:
Scala 2.11.8 and Java 8
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.6.2</version>
</dependency>
It looks like your installed Spark version is not the same as the Spark version used in your IDE.
If you are using Maven, just compare the version of the dependency declared in pom.xml with the output of bin/spark-submit --version and make sure they are the same.
I faced this issue because my Spark jar dependency was 2.1.0 but the installed Spark engine version was 2.0.0, hence the version mismatch, so it throws this exception.
The root cause of this problem is a version mismatch between the Spark jar dependency in the project and the installed Spark engine on which the Spark job is executed.
Hence, verify both versions and make them identical.
Example: if the spark-core jar version is 2.1.0, the Spark computation engine version must be 2.1.0; if the spark-core jar version is 2.0.0, the engine version must be 2.0.0.
It's working for me perfectly.
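A quick way to catch such a mismatch early is to print the version from inside the job and compare it with the output of bin/spark-submit --version. A minimal sketch with the Java API, using a local master just for the check:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkVersionCheck {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("versionCheck")
                .setMaster("local[*]");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // Should match the output of bin/spark-submit --version on the cluster.
        System.out.println("Spark version on the classpath: " + sc.version());
        sc.stop();
    }
}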
I had this problem.
When I run the code with spark-submit it works (instead of running it from the IDE):
./bin/spark-submit --master spark://HOST:PORT target/APP-NAME.jar
