How to build Spark 1.1.0 on Windows 8? - apache-spark

I'm attempting to build Apache Spark 1.1.0 on Windows 8.
I've installed all prerequisites (except Hadoop) and ran sbt/sbt assembly while in the root directory. After downloading many files, I'm getting an error after the line:
"Set current project to root <in build file:C:/.../spark-0.9.0-incubating/>". The error is:
[error] Not a valid command: /
[error] /sbt
[error] ^
How to build Spark on Windows?

NOTE Please see my comment about the differences in versions.
The error Not a valid command: / comes from sbt itself: it was executed and then tried to interpret / (the first character of the /sbt argument) as a command. It can only mean that you have the sbt shell script available on PATH (possibly installed separately, outside the current working directory) or in the current working directory.
Just execute sbt assembly and it should build Spark fine.
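For example, in a Windows command prompt at the Spark source root (this assumes the sbt launcher script is already on PATH):
rem call sbt directly; "sbt/sbt assembly" would make sbt try to run "/" as a command
sbt assembly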
According to the main page of Spark:
If you’d like to build Spark from scratch, visit building Spark with Maven.
which clearly states that the official build tool for Spark is now Maven (unfortunately).
You should be able to build a Spark package with the following command:
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
It worked fine for me.
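If the Maven build itself runs out of memory, the Spark build documentation for the 1.x line recommends raising MAVEN_OPTS first; a sketch for a Windows command prompt (adjust the values to your machine):
set MAVEN_OPTS=-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package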

Related

Maven Release plugin not working properly in Windows 10 but works fine on Macbook Pro

I have been happily using the mvn release:prepare command on my MacBook Pro for some time and really like it. Unfortunately, due to circumstances, I need the projects to be maintained on a Windows 10 machine. After cloning a project (where release:prepare worked fine) and doing a release:prepare, I now get an error that the pom.xml cannot be found:
[ERROR] The goal you specified requires a project to execute but there is no POM in this directory (<XXXXX>). Please verify you invoked Maven from the correct directory. -> [Help 1]
This is correct because <XXXXX> is a different directory than where I execute the maven command. I have tried to forcibly set the location of the pom.xml file using the following line in the configuration of the release plugin:
<pomFileName>${project.basedir}\pom.xml</pomFileName>
This does not solve my issue. Am I the only one that has this problem on Windows 10?
I am using:
maven version: 3.6.3
release plugin version: 2.5.3
java version: 11.0.17
Weirdest part is that mvn release:clean works fine.
I am not sure if this is a release plugin issue, because the following Maven goal has a similar problem: the tests fail because they cannot find a file in the project folder (the folder the command is executed in).
mvn clean verify
I was expecting the release:prepare goal to work the same on Windows 10 as it did on macOS.
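For what it's worth, the base directory Maven actually resolves can be printed with the standard Maven Help plugin; this is only a diagnostic sketch, not a fix:
mvn help:evaluate -Dexpression=project.basedir -q -DforceStdout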

How to build Apache POI - Java API To Access Microsoft Format Files from sources?

I need to build https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml-schemas/4.1.2 project from sources. Since I didn't find any sources except these https://archive.apache.org/dist/poi/release/src/poi-src-4.1.2-20200217.zip I am trying to build them.
gradle build
...
* What went wrong:
Execution failed for task ':ooxml:ant-fetch-ooxml-xsds'.
> Can't get https://www.ecma-international.org/publications/files/ECMA-ST/Office%20Open%20XML%201st%20edition%20Part%204%20(PDF).zip to /home/katya/tmp_work/poi-4.1.2/ooxml-lib/OfficeOpenXML-Part4.zip
Could you tell me where to get working sources of Apache POI version 4.1.2? I must build this library from the version 4.1.2 sources.
Apache POI's source code can be obtained from the official SVN repository or from the mirror Git repository on GitHub.
Apache POI Developer Guide should give you enough information to get started. There's also a general readme on GitHub.
The "Apache POI - How To Build" section from the Apache POI Developer Guide:
Apache POI - How To Build
JDK Version
POI 4.0 and later require JDK version 1.8 or later.
POI 3.11 and later 3.x versions require JDK version 1.6 or later.
POI 3.5 to 3.10 required the JDK version 1.5 or later. Versions prior to 3.5 required JDK 1.4+.
Install Apache Ant
The POI build system requires Apache Ant version 1.8 - 1.9.x for Java 1.6 and higher. Newer versions (1.10.x) require Java 8 or higher.
The current source code has been tested to work with Ant version 1.9.x and 1.10.x.
Remember to set the ANT_HOME environment variable to where Ant was installed/extracted and add ANT_HOME/bin to your shell's PATH.
If you are unsure about the correct value for ANT_HOME, search your file system for "ant.jar". This file will be in the directory %ANT_HOME%/lib. For example, if the path to ant.jar is "C:/Programs/Ant/lib/ant.jar", then set ANT_HOME to "C:/Programs/Ant".
Install Apache Forrest
The POI build system requires Apache Forrest to build the documentation.
Specifically, the build has been tested to work with Forrest 0.90.
Remember to set the FORREST_HOME environment variable.
Building Targets with Ant
The main targets of interest to our users are:
Ant Target   Description
clean        Erase all build work products (i.e. everything in the build directory).
compile      Compiles all files from main, ooxml and scratchpad.
test         Run all unit tests from main, ooxml and scratchpad.
jar          Produce jar files.
assemble     Produce .zip and tar.gz distribution packages.
docs         Generate all documentation (requires Apache Forrest).
jenkins      Runs the tests which Jenkins, our Continuous Integration system, does. This includes the unit tests and various code quality checks.
Working with Eclipse
Apache POI includes a pre-defined Eclipse project file which can be used to quickly get set up in the Eclipse IDE.
First make sure that Java is set up properly and that you can execute the 'javac' executable in your shell.
Next, open Eclipse and create either a local SVN repository, or a copy of the Git repository, and import the project into Eclipse.
Right-click on "build.xml", and select "Run As / Ant Build...". The "Edit Configuration" dialog should appear. In the "Targets" tab, select the "Compile" target and click on "Run".
Note: when executing junit tests from within Eclipse, you might need to set the system property "POI.testdata.path" to the actual location of the 'test-data' directory to make the test framework find the required test-files. A simple value of 'test-data' usually works.
Working with IntelliJ Idea
Import the Gradle project into your IDE. Execute a build to get all the dependencies and generated code in place.
Note: when executing junit tests from within IntelliJ, you might need to set the system property "POI.testdata.path" to the actual location of the 'test-data' directory to make the test framework find the required test-files. A simple value of 'test-data' usually works.
Using Maven
Building Apache POI using Maven is not currently officially supported, and we strongly suggest continuing to use the official Ant build.
However, including Apache POI within your own Maven project is fully supported, and widely used. Please see the Components Page for details of the Maven artifacts available.
Setting environment variables
Linux: help.ubuntu.com, unix.stackexchange.com
Windows: en.wikipedia.org
by Glen Stampoultzis, Tetsuya Kitahata, David Fisher
-- https://poi.apache.org/devel/#Using+Maven
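Putting the quoted steps together, a rough command sketch for a Linux shell (the Ant path is only an assumption; point ANT_HOME at wherever Ant 1.9.x/1.10.x was actually extracted):
export ANT_HOME=/opt/apache-ant-1.9.16     # adjust to your Ant install
export PATH="$ANT_HOME/bin:$PATH"
cd poi-4.1.2                               # unpacked poi-src-4.1.2-20200217.zip
ant jar                                    # or: ant compile, ant test, ant assemble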

How to install Spark 2.1.0 on Windows 7 64-bit?

I'm on Windows 7 64-bit and am following this blog to install Spark 2.1.0.
So I tried to build Spark from the sources that I'd cloned from https://github.com/apache/spark to C:\spark-2.1.0.
When I run sbt assembly or sbt -J-Xms2048m -J-Xmx2048m assembly, I get:
[info] Loading project definition from C:\spark-2.1.0\project
[info] Compiling 3 Scala sources to C:\spark-2.1.0\project\target\scala-2.10\sbt-0.13\classes...
java.lang.StackOverflowError
at java.security.AccessController.doPrivileged(Native Method)
at java.io.PrintWriter.<init>(Unknown Source)
at java.io.PrintWriter.<init>(Unknown Source)
at scala.reflect.api.Printers$class.render(Printers.scala:168)
at scala.reflect.api.Universe.render(Universe.scala:59)
at scala.reflect.api.Printers$class.show(Printers.scala:190)
at scala.reflect.api.Universe.show(Universe.scala:59)
at scala.reflect.api.Printers$class.treeToString(Printers.scala:182)
...
I adapted the memory settings of sbt as suggested, but they seem to be ignored anyway. Any ideas?
The linked blog post was "Posted on April 29, 2015", which is two years old now, and it should only be read to learn how things have changed since then (I'm not even going to link the blog post, to avoid directing people to the site).
The 2017 way of installing Spark on Windows is as follows:
Download Spark from http://spark.apache.org/downloads.html.
Read the official documentation starting from Downloading.
That's it.
Installing Spark on Windows
Windows is known to give you problems due to Hadoop's requirements (and Spark uses the Hadoop API under the covers).
You'll have to install the winutils binary, which you can find in the https://github.com/steveloughran/winutils repository.
TIP: You should select the version of Hadoop the Spark distribution was compiled with, e.g. use hadoop-2.7.1 for Spark 2.1.0.
Save the winutils.exe binary to a directory of your choice, e.g. c:\hadoop\bin, and set HADOOP_HOME to c:\hadoop.
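A minimal sketch for a Windows command prompt, assuming winutils.exe was saved to c:\hadoop\bin (use setx or the System Properties dialog to make the variables permanent):
rem settings for the current cmd session only
set HADOOP_HOME=c:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%
rem quick smoke test, run from the unpacked Spark directory
bin\spark-shell.cmd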
See Running Spark Applications on Windows for further steps.
The following settings worked for me (sbtconfig.txt):
# Set the java args to high
-Xmx1024M
-XX:MaxPermSize=2048m
-Xss2M
-XX:ReservedCodeCacheSize=128m
# Set the extra SBT options
-Dsbt.log.format=true

Issues building sbt tool while installing Spark

While installing Spark in standalone mode on Ubuntu, I am facing an issue while running the sbt/sbt assembly command: it says "No such file or directory". I did the installation from scratch, which covers installation of Java, Scala, Git and finally building Spark using the sbt tool. I followed the tutorial below for the installations.
https://www.youtube.com/watch?v=eQ0nPdfVfc0
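A couple of quick checks from the Spark source root (only a sketch; it assumes the checkout ships the bundled sbt launcher script, as Spark did at the time):
ls sbt/sbt          # the launcher script must exist here for sbt/sbt assembly to work
chmod +x sbt/sbt    # and it must be executable
./sbt/sbt assembly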

How to build latest version of Spark for a MapR version?

I am running an old MapR cluster, mapr3.
How can I build a custom distribution for Spark 1.5.x for mapr3?
What I understand is that getting the right hadoop.version is the key step to making everything work.
I went back to Spark 1.3.1 and found the mapr3 profile, which had hadoop.version=1.0.3-mapr-3.0.3. To build a complete distribution, the following command will work if you have JAVA_HOME set already:
./make-distribution.sh --name custom-spark --tgz -Dhadoop.version=1.0.3-mapr-3.0.3 -Phadoop-1 -DskipTests
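For example (the JDK path is only an assumption; point JAVA_HOME at whatever JDK is actually installed on the build machine):
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64   # adjust to your JDK
./make-distribution.sh --name custom-spark --tgz -Dhadoop.version=1.0.3-mapr-3.0.3 -Phadoop-1 -DskipTests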
