How to install Spark 2.1.0 on Windows 7 64-bit? - apache-spark

I'm on Windows 7 64-bit and am following this blog to install Spark 2.1.0.
So I tried to build Spark from the sources that I'd cloned from https://github.com/apache/spark to C:\spark-2.1.0.
When I run sbt assembly or sbt -J-Xms2048m -J-Xmx2048m assembly, I get:
[info] Loading project definition from C:\spark-2.1.0\project
[info] Compiling 3 Scala sources to C:\spark-2.1.0\project\target\scala-2.10\sbt-0.13\classes...
java.lang.StackOverflowError
at java.security.AccessController.doPrivileged(Native Method)
at java.io.PrintWriter.<init>(Unknown Source)
at java.io.PrintWriter.<init>(Unknown Source)
at scala.reflect.api.Printers$class.render(Printers.scala:168)
at scala.reflect.api.Universe.render(Universe.scala:59)
at scala.reflect.api.Printers$class.show(Printers.scala:190)
at scala.reflect.api.Universe.show(Universe.scala:59)
at scala.reflect.api.Printers$class.treeToString(Printers.scala:182)
...
I adapted sbt's memory settings as suggested, but they seem to be ignored anyway. Any ideas?

The linked blog post was "Posted on April 29, 2015", which makes it two years old now; it should only be read to learn how things have changed since then (I'm not even going to link to it, to avoid directing more people to the site).
The 2017 way of installing Spark on Windows is as follows:
Download Spark from http://spark.apache.org/downloads.html.
Read the official documentation starting from Downloading.
That's it.
Installing Spark on Windows
Windows is known to cause problems due to Hadoop's requirements (and Spark uses the Hadoop API under the covers).
You'll have to install the winutils binary, which you can find in the https://github.com/steveloughran/winutils repository.
TIP: Select the version of Hadoop the Spark distribution was compiled with, e.g. use hadoop-2.7.1 for Spark 2.1.0.
Save the winutils.exe binary to a directory of your choice, e.g. c:\hadoop\bin, and set HADOOP_HOME to c:\hadoop (a minimal command-line sketch follows below).
See Running Spark Applications on Windows for further steps.
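As a rough sketch (the paths are just examples, not requirements), the whole setup from a Windows command prompt could look like this, assuming you unpacked the pre-built distribution to c:\spark-2.1.0-bin-hadoop2.7 and saved winutils.exe to c:\hadoop\bin:
rem point HADOOP_HOME at the directory that contains bin\winutils.exe
set HADOOP_HOME=c:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%
rem start the shell from the unpacked Spark distribution
cd c:\spark-2.1.0-bin-hadoop2.7
bin\spark-shell
Use setx (or the System Properties dialog) instead of set if you want HADOOP_HOME to survive beyond the current command prompt.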

The following settings worked for me (sbtconfig.txt):
# Set the java args to high
-Xmx1024M
-XX:MaxPermSize=2048m
-Xss2M
-XX:ReservedCodeCacheSize=128m
# Set the extra SBT options
-Dsbt.log.format=true
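On Windows this file usually lives at conf\sbtconfig.txt under the sbt installation directory. If the launcher you use ignores that file, an alternative (a sketch, assuming a launcher that honours the SBT_OPTS environment variable) is to pass the same JVM options through the environment:
rem the memory flags from sbtconfig.txt above, passed via the environment instead
set SBT_OPTS=-Xmx1024M -Xss2M -XX:ReservedCodeCacheSize=128m
sbt assembly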

Related

java.lang.NoSuchMethodError when loading external FAT-JARs in Zeppelin

While trying to run a piece of code that uses some FAT JARs (which share some common submodules) built using sbt assembly, I'm running into this nasty java.lang.NoSuchMethodError.
The JAR is built on EMR itself (and not uploaded from some other environment), so a version conflict in libraries / Spark / Scala etc. is unlikely.
My EMR environment:
Release label: emr-5.11.0
Hadoop distribution: Amazon 2.7.3
Applications: Spark 2.2.1, Zeppelin 0.7.3, Ganglia 3.7.2, Hive 2.3.2, Livy 0.4.0, Sqoop 1.4.6, Presto 0.187
Project configurations:
Scala 2.11.11
Spark 2.2.1
SBT 1.0.3
It turned out that the real culprits were the shared submodules in those jars.
Two fat jars built out of projects containing common submodules were leading to this conflict. Removing one of those jars resolved the issue.
I'm not sure whether this conflict happens only under some particular circumstances or would always occur when uploading such jars (that contain the same submodules) to the Zeppelin interpreter, so I'm still waiting for a proper explanation.
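For illustration only, here is a hypothetical sbt multi-project layout (not the actual build from the question) showing why two such fat jars clash: both assemblies bundle their own copy of the shared submodule's classes, so once both jars are added to the Zeppelin interpreter a single classloader sees two copies, whichever loads first wins, and callers compiled against the other copy can hit NoSuchMethodError.
// build.sbt -- hypothetical sketch; requires sbt-assembly in project/plugins.sbt
lazy val common = project                         // shared submodule with common utility code

lazy val jobA = (project in file("jobA"))
  .dependsOn(common)                              // common's classes get folded into this fat jar
  .settings(assemblyJarName in assembly := "jobA-assembly.jar")

lazy val jobB = (project in file("jobB"))
  .dependsOn(common)                              // ...and, independently, into this one too
  .settings(assemblyJarName in assembly := "jobB-assembly.jar")

// If the two assemblies were built from different revisions of `common`, loading both
// into one Zeppelin interpreter leaves only one copy of those classes visible, which is
// why removing one of the jars made the NoSuchMethodError go away.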

Issues building sbt tool while installing Spark

While installing Spark in standalone mode on Ubuntu, I am facing an issue when running the sbt/sbt assembly command: it says "No such file or directory". I did the installation from scratch, covering the installation of Java, Scala and Git, and finally building Spark using the sbt tool. I followed the tutorial below for the installation.
https://www.youtube.com/watch?v=eQ0nPdfVfc0

How to connect Zeppelin to Spark 1.5 built from the sources?

I pulled the latest source from the Spark repository and built locally. It works great from an interactive shell like spark-shell or spark-sql.
Now I want to connect Zeppelin to my Spark 1.5, according to this install manual. I published the custom Spark build to the local maven repository and set the custom Spark version in the Zeppelin build command. The build process finished successfully but when I try to run basic things like sc inside notebook, it throws:
akka.ConfigurationException: Akka JAR version [2.3.11] does not match the provided config version [2.3.4]
Version 2.3.4 is set in pom.xml and spark/pom.xml, but simply changing them won’t even let me get a build.
If I rebuild Zeppelin with the standard -Dspark.version=1.4.1, everything works.
Update 2016-01
Spark 1.6 support has landed in master and is available under the -Pspark-1.6 profile.
Update 2015-09
Spark 1.5 support has landed in master and is available under the -Pspark-1.5 profile.
Work on supporting Spark 1.5 in Apache Zeppelin (incubating) was done under the PR apache/incubator-zeppelin#269, which will land in master soon.
For now, building from the Spark_1.5 branch with -Pspark-1.5 should do the trick.
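For reference, a rough sketch of that interim build (repository and branch names as referenced above; the exact Maven flags may differ for your environment):
# clone the incubating Zeppelin repo and build it against the Spark 1.5 profile
git clone https://github.com/apache/incubator-zeppelin.git
cd incubator-zeppelin
git checkout Spark_1.5                # the interim branch mentioned above
mvn clean package -Pspark-1.5 -DskipTests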

How can spark-shell work without installing Scala beforehand?

I have downloaded Spark 1.2.0 (pre-built for Hadoop 2.4). In its quick start doc, it says:
It is available in either Scala or Python.
What confuses me is that my computer didn't have Scala installed separately beforehand (OS X 10.10), but when I type spark-shell it works fine, and the output shows:
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_25)
I didn't install any Scala distribution before.
How can spark-shell run without Scala?
tl;dr Scala binaries are included in Spark already (to make Spark users' life easier).
Under Downloading in Spark Overview you can read about what is required to run Spark:
Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS).
It’s easy to run locally on one machine — all you need is to have java
installed on your system PATH, or the JAVA_HOME environment variable
pointing to a Java installation.
Spark runs on Java 6+ and Python 2.6+. For the Scala API, Spark 1.2.0
uses Scala 2.10. You will need to use a compatible Scala version
(2.10.x).
Scala programs, including spark-shell, are compiled to Java bytecode, which can be run by the Java Virtual Machine (JVM). Therefore, as long as you have a JVM installed, i.e. the java command, you can run the Spark-related tools written in Scala.
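You can see this for yourself in the pre-built download: the Spark assembly jar ships the Scala standard library classes. A quick check, assuming the default directory and jar names of the 1.2.0 / Hadoop 2.4 package (adjust the names for your download):
# list the Scala runtime classes bundled inside the Spark assembly jar
cd spark-1.2.0-bin-hadoop2.4
jar tf lib/spark-assembly-1.2.0-hadoop2.4.0.jar | grep 'scala/Predef'
# prints scala/Predef.class and friends -- the Scala library is already on board,
# so spark-shell only needs a JVM (the java command), not a separate Scala install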

Run Presto on JDK 6

I tried to run the launcher but encountered this error:
Exception in thread "main" java.lang.UnsupportedClassVersionError: sun/misc/FloatingDecimal : Unsupported major.minor version 51.0
at java.lang.Double.toString(Double.java:196)
at java.lang.String.valueOf(String.java:2985)
at java.security.Provider.putId(Provider.java:433)
at java.security.Provider.<init>(Provider.java:137)
at sun.security.jca.ProviderList$1.<init>(ProviderList.java:71)
at sun.security.jca.ProviderList.<clinit>(ProviderList.java:70)
at sun.security.jca.Providers.<clinit>(Providers.java:56)
at sun.security.util.ManifestEntryVerifier.<clinit>(ManifestEntryVerifier.java:47)
at java.util.jar.JarFile.initializeVerifier(JarFile.java:335)
at java.util.jar.JarFile.getInputStream(JarFile.java:410)
at sun.misc.URLClassPath$JarLoader$2.getInputStream(URLClassPath.java:721)
at sun.misc.Resource.cachedInputStream(Resource.java:77)
at sun.misc.Resource.getByteBuffer(Resource.java:160)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:266)
at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: com.facebook.presto.server.PrestoServer. Program will exit.
I think that this is because I ran the launcher on JDK 6 (whereas it requires JDK 7). Is there any version of Presto that can run on JDK 6? I currently want to run it on my Cloudera Hadoop cluster, and Cloudera seems to only work well with JDK 6.
Thanks.
Presto is only compatible with Java 7.
You should be able to install both Java 6 and 7 on the same machine. You just need to make sure Java 7's bin directory comes first in your PATH before you start the Presto launcher.
Presto will definitely not work with JDK 6. In addition to heavily using features like try-with-resources, the bytecode compiler for queries is all based on invokedynamic. JDK 7 is substantially faster, not to mention that JDK 6 has been end of life since February.
That said, you can easily have both JDKs installed on the same machine and use JDK 6 for Hadoop and JDK 7 for Presto. The Presto launcher will simply use whichever java is first in PATH, so put JDK 7 first in your PATH before running the launcher.
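A minimal sketch of that, assuming JDK 7 is installed under /usr/lib/jvm/java-7-oracle (adjust the path to your installation):
# put JDK 7 first on PATH only for the Presto launcher; Hadoop keeps using JDK 6
export JAVA7_HOME=/usr/lib/jvm/java-7-oracle
PATH="$JAVA7_HOME/bin:$PATH" ./bin/launcher start
# the launcher simply runs whichever `java` it finds first on PATH, so nothing else
# on the machine is affected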
