sbt, ivy, offline work, and weirdness - apache-spark

I'm trying to work on an sbt project offline (again). Things almost seem to be ok, but there are strange things that I'm baffled by. Here's what I'm noticing:
I've created an empty sbt project and am considering the following dependencies in build.sbt:
name := "sbtSand"
version := "1.0"
scalaVersion := "2.11.7"
libraryDependencies ++= Seq(
  "joda-time" % "joda-time" % "2.9.1",
  "org.apache.spark" %% "spark-core" % "1.5.2"
)
I've built the project while online, and can see all the packages in [userhome]/.ivy2/cache. The project builds fine. I then turn off wifi, run sbt clean, and attempt to build. The build fails. I comment out the spark dependency (keeping the joda-time one); still offline, I run sbt compile, and the project builds fine. I put the spark dependency back in and run sbt clean; it again fails to build. Once I get back online, I can build again.
The sbt output for the failed builds is here: https://gist.github.com/ashic/9e5ebc39ff4eb8c41ffb
The key part of it is:
[info] Resolving org.apache.hadoop#hadoop-mapreduce-client-app;2.2.0 ...
[warn] Host repo1.maven.org not found. url=https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-app/2.2.0/hadoop-mapreduce-client-app-2.2.0.pom
[info] You probably access the destination server through a proxy server that is not well configured.
It's interesting that sbt manages to use joda-time from the ivy cache, but for the spark-core package (or rather its dependencies) it wants to reach out to the internet and fails the build. Could anybody please help me understand this, and tell me what I can do to get this working while fully offline?

It seems the issue is resolved in 0.13.9. I was using 0.13.8. [The 0.13.9 MSI for Windows seemed to give me 0.13.8, while the 0.13.9.2 MSI installed the right version. Existing projects need to be updated manually to 0.13.9 in project/build.properties.]
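For reference, the sbt version an existing project uses is pinned in project/build.properties, so that file is what needs updating. A minimal sketch of its content, matching the fix above:
sbt.version=0.13.9
sbt also has an offline setting that asks it to avoid remote resolution where possible; adding it to build.sbt (or a global .sbt file) may help when working disconnected, though this is a general sbt feature rather than part of the fix above:
offline := true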

Related

Spark doesn't find BLAS in MKL dll

I'm working in IntelliJ and specified this parameter for my JVM:
-Dcom.github.fommil.netlib.BLAS=mkl_rt.dll (my mkl folder is in the Path)
However, I still get the following warning:
WARN BLAS: Failed to load implementation from: mkl_rt.dll
Any help?
I finally solved this issue; here are the complete steps to make it work in IntelliJ IDEA on Windows:
First create an SBT project and make sure to put the following line in build.sbt:
libraryDependencies ++= Seq("com.github.fommil.netlib" % "all" % "1.1.1" pomOnly())
Refresh the project; after that you should have the libraries available. If that doesn't work for some reason, you can go here: http://repo1.maven.org/maven2/com/github/fommil/netlib/ and download the necessary resources for your system directly.
Copy your mkl_rt.dll twice and rename the copies libblas3.dll and liblapack3.dll. Make sure the folders containing all the DLLs are in the PATH environment variable.
Finally, go to Run -> Edit Configurations and in the VM options put:
-Dcom.github.fommil.netlib.BLAS=mkl_rt.dll
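To verify that netlib-java actually picked up the native library rather than falling back to the pure-Java implementation, a small check (a sketch, assuming the netlib dependency above is on the classpath) is to print the BLAS class that was loaded:
// Prints the BLAS implementation netlib-java resolved at runtime.
// If the native DLLs were found you should see a native implementation
// (e.g. NativeSystemBLAS) rather than F2jBLAS.
object BlasCheck extends App {
  println(com.github.fommil.netlib.BLAS.getInstance().getClass.getName)
}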

unresolved dependency: com.eed3si9n#sbt-assembly;0.13.0: not found

I did lots of searching, saw many people having similar issues, and tried various suggested solutions. None worked.
Can someone help me?
resolvers += Resolver.url("bintray-sbt-plugins", url("http://dl.bintray.com/sbt/sbt-plugin-releases"))(Resolver.ivyStylePatterns)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
The file is inside the project folder.
Instead of version 0.13.0, I used version 0.14.0.
I fixed this by adding the POM file, which I downloaded from
https://dl.bintray.com/sbt/sbt-plugin-releases/com.eed3si9n/sbt-assembly/scala_2.10/sbt_0.13/0.14.4/ivys/
to my local ivy folder under .ivy2/local (if not present, create the local folder).
Once it was there I ran the build and it downloaded the jar.
You need to add a [root_dir]/project/plugins.sbt file with the following content:
// packager
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")
Even better - don't use sbt-assembly at all! Flat jars cause conflicts during merging, which need to be resolved with assemblyMergeStrategy.
Use the binary distribution plugin that sbt offers, which enables you to distribute as a binary script, dmg, msi, or tar.gz.
Check out sbt-native-packager
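A minimal sketch of wiring it in (the plugin version here is an assumption; check the project's README for the current one): add the plugin to project/plugins.sbt and enable a packaging archetype in build.sbt.
// project/plugins.sbt
addSbtPlugin("com.typesafe.sbt" % "sbt-native-packager" % "1.3.4")
// build.sbt
enablePlugins(JavaAppPackaging)
After that, tasks such as universal:packageBin produce a distributable archive instead of a fat jar.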

Aerospark build (and running with PySpark)

I am having trouble building the Aerospark connector at https://github.com/aerospike/aerospark
I have created a fresh Ubuntu box; installed the JDK, SBT, Maven, and Scala 2.10; and then followed the steps on the GitHub page above. During the build I get this error, and I am not sure of the most direct way to build this properly...
[error] 1 error was encountered during merge
java.lang.RuntimeException: deduplicate: different file contents found in the following:
/opt/astools/aerospark/lib/aerospike-helper-java-1.0.6.jar:META-INF/maven/com.aerospike/aerospike-client/pom.xml
/root/.m2/repository/com/aerospike/aerospike-helper-java/1.0.6/aerospike-helper-java-1.0.6.jar:META-INF/maven/com.aerospike/aerospike-client/pom.xml
/root/.ivy2/cache/com.aerospike/aerospike-client/jars/aerospike-client-3.3.1.jar:META-INF/maven/com.aerospike/aerospike-client/pom.xml
Are there any updated instructions for building this? Incidentally, the git link in the instructions seems outdated.
As an aside, has anyone tried this with PySpark?
There was an issue with the build file which affected Ubuntu. Can you pull the latest changes and go through the build process again?
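If the updated build still hits the same merge error, a common workaround for this kind of deduplicate failure is to tell sbt-assembly how to handle the colliding META-INF entries. A hedged sketch (the exact strategy depends on what the connector's build needs):
// build.sbt: discard duplicate META-INF metadata, fall back to the default strategy otherwise
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}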

Unresolved dependency when assembly Spark 1.2.0

I'm trying to build Spark 1.2.0 on Ubuntu but I'm getting dependency issues.
I basically download the files, extract the folder, and run sbt/sbt assembly.
sbt = 0.13.6
scala = 2.10.4
sbt.ResolveException: unresolved dependency: org.apache.spark#spark-network-common_2.10;1.2.0: configuration not public in org.apache.spark#spark-network-common_2.10;1.2.0: 'test'. It was required from org.apache.spark#spark-network-shuffle_2.10;1.2.0 test
This sbt issue seems to explain it: this would be a consequence of trying to get a test->test dependency when the same version has been resolved out of a public Maven repository.
A workaround would be using git SHA versioning or SNAPSHOT for non-final builds of that test dependency, but we won't know more unless we get an idea of how you got into a 'bad' ivy cache state.
TL;DR: try clearing your cache of Spark artefacts before building.
Edit: This is fixed in sbt 0.13.10-RC1 (https://github.com/sbt/sbt/pull/2345). Please update.

SBT assembly jar exclusion

I'm using Spark (via the Java API) and require a single jar that can be pushed to the cluster; however, the jar itself should not include Spark. The app that deploys the jobs should, of course, include Spark.
I would like:
sbt run - everything should be compiled and executed
sbt smallAssembly - create a jar without spark
sbt assembly - create an uber jar with everything (including spark) for ease of deployment.
I have 1. and 3. working. Any ideas on how I can do 2.? What code would I need to add to my build.sbt file?
The question is not only relevant to Spark, but to any other dependency that I may wish to exclude as well.
% "provided" configuration
The first option for excluding a jar from the fat jar is to use the "provided" configuration on the library dependency. "provided" comes from Maven's provided scope, which is defined as follows:
This is much like compile, but indicates you expect the JDK or a container to provide the dependency at runtime. For example, when building a web application for the Java Enterprise Edition, you would set the dependency on the Servlet API and related Java EE APIs to scope provided because the web container provides those classes. This scope is only available on the compilation and test classpath, and is not transitive.
Since you're deploying your code to a container (in this case Spark), contrary to your comment you'd probably need the Scala standard library and other library jars (e.g. Dispatch, if you used it). This won't affect run or test.
packageBin
If you just want your source code, and no Scala standard library or other library dependencies, that would be packageBin, built into sbt. This packaged jar can be combined with a dependency-only jar that you can make using sbt-assembly's assemblyPackageDependency.
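In practice that is two invocations; package builds the jar containing only your classes, and assemblyPackageDependency (provided by sbt-assembly) builds a companion jar containing only the dependencies:
sbt package
sbt assemblyPackageDependency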
excludedJars in assembly
The final option is to use excludedJars in assembly:
excludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter {_.data.getName == "spark-core_2.9.3-0.8.0-incubating.jar"}
}
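A slightly more general variant (an assumption about jar naming, not part of the original answer) filters by prefix so every Spark jar on the classpath is excluded rather than hard-coding one file name:
// exclude any jar whose file name starts with "spark-"
excludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp filter { jar => jar.data.getName.startsWith("spark-") }
}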
For beginners like me, simply add % Provided to the Spark dependencies to exclude them from the uber-jar:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.0" % Provided
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.4.0" % Provided
in build.sbt.
