JavaSparkContext - jarOfClass or jarOfObject doesn't work - apache-spark

Hi, I am trying to run my Spark service against a cluster. As it turns out, I have to call setJars and set my application jar in there. If I do it using a physical path like the following, it works:
conf.setJars(new String[]{"/path/to/jar/Sample.jar"});
but if I try to use the JavaSparkContext (or SparkContext) API jarOfClass or jarOfObject, it doesn't work. Basically the API can't find the jar itself.
The following returns empty:
JavaSparkContext.jarOfObject(this);
JavaSparkContext.jarOfClass(this.getClass());
It would be an excellent API if only it worked! Has anyone else been able to make use of it?

[I have included an example for Scala. I am sure it will work the same way for Java.]
It will work if you do:
SparkContext.jarOfObject(this.getClass)
Surprisingly, this works for a Scala object as well as a Scala class.

How are you running the app? If you are running it from an IDE or a build tool such as sbt, then the jar is not packaged at run time.
If you have packaged it once before, then your /path/to/jar/Sample.jar exists, which is why the hard-coded path works; but the classes the running JVM is actually using were not loaded from that jar, so the API cannot find it.
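To make the explanation above concrete: jarOfClass is essentially a lookup of the jar a class was loaded from. A minimal sketch of the same idea in plain Java (no Spark required; the class and method names here are made up for illustration) shows why it comes back empty when classes are loaded from an IDE or sbt output directory instead of a jar:

```java
import java.security.CodeSource;
import java.util.Optional;

public class JarOf {
    // Mirrors the idea behind SparkContext.jarOfClass:
    // report the jar a class was loaded from, if any.
    static Optional<String> jarOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            // Bootstrap/platform classes have no usable code source
            return Optional.empty();
        }
        String location = src.getLocation().toString();
        // Classes loaded from a build-output directory yield a directory URL
        // here, not a .jar path -- so there is no jar to report.
        return location.endsWith(".jar") ? Optional.of(location) : Optional.empty();
    }

    public static void main(String[] args) {
        // Empty unless this class itself was run from a packaged jar:
        System.out.println(jarOf(JarOf.class));
        // Empty: java.lang.Object is loaded by the bootstrap loader:
        System.out.println(jarOf(Object.class));
    }
}
```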

Related

How to run Spark processes in develop environment using a cluster?

I'm implementing different Apache Spark solutions using IntelliJ IDEA, Scala and SBT; however, each time I want to run my implementation I need to do the following steps after creating the jar:
Amazon: send the .jar to the master node over SSH, then run it from the spark-shell command line.
Azure: I'm using the Databricks CLI, so each time I want to upload a jar I uninstall the old library, remove the jar stored on the cluster, and finally upload and install the new .jar.
So I was wondering if it is possible to do all of this in one click, using the IntelliJ IDEA Run button for example, or with some other method that makes it all simpler. I was also thinking about Jenkins as an alternative.
Basically, I'm looking for easier deployment options.
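For the Amazon case, one possible sketch of the "one click" idea is a custom Gradle task that packages the jar and ships it to the master node, which an IntelliJ run configuration could then invoke. The host, main class, and paths below are placeholders, and the task simply shells out to scp/ssh, so treat it as a starting point rather than a finished deployment pipeline:

```groovy
// build.gradle -- hypothetical deploy task; adjust host, key, and paths to your cluster
task deployToSparkMaster(dependsOn: jar) {
    doLast {
        def master = 'ec2-user@my-spark-master'   // placeholder host
        def jarFile = jar.archivePath             // the jar built by the `jar` task
        exec { commandLine 'scp', jarFile.absolutePath, "${master}:/home/ec2-user/app.jar" }
        exec {
            commandLine 'ssh', master,
                'spark-submit --class com.example.Main /home/ec2-user/app.jar'  // placeholder main class
        }
    }
}
```

Running `gradle deployToSparkMaster` (or binding that task to an IDE run configuration) then collapses the package-copy-submit cycle into one step.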

Spring-boot-devtools does not restart when Groovy files change

In my project I work with Spring Boot and Groovy. I am also using spring-boot-devtools, which is a good tool that helps me.
But when I change Groovy files, the server is not restarted; I have to change other files instead.
As an IDE I use IntelliJ IDEA. The project is assembled and run with the command
gradle bootRun
Has anyone experienced this and found a solution?
In my case the pattern !?*.groovy was absent from Resource patterns in the Compiler settings.
Try with <CTRL+F9>
I am using Spring Boot 1.3.0-RELEASE with groovy only.
Maybe this auto-generated demo project can help you compare and see what went wrong in your project, using the Spring CLI (installed via SDKMAN):
run in command line: spring init --dependencies=devtools,web --type=gradle-project --language=groovy example
Import to IDEA
run gradle bootRun
change source and hit <CTRL+F9>
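As a sanity check against the generated project, the devtools wiring it produces is minimal. A build.gradle dependencies block along these lines (Spring Boot 1.3.x era; versions are managed by the Boot plugin, and this is an illustrative fragment, not a complete build file) is what makes restart-on-rebuild possible once the IDE recompiles the changed .groovy files:

```groovy
// build.gradle fragment (illustrative; as generated by `spring init` above)
dependencies {
    compile 'org.codehaus.groovy:groovy'
    compile 'org.springframework.boot:spring-boot-starter-web'
    compile 'org.springframework.boot:spring-boot-devtools'
}
```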

Apache Spark app workflow

How do you organize your Spark development workflow?
My way:
Local hadoop/yarn service.
Local spark service.
Intellij on one screen
Terminal with running sbt console
After I change the Spark app code, I switch to the terminal and run "package" to compile the code into a jar, then "submitSpark", an sbt task that runs spark-submit.
Wait for an exception in the sbt console :)
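The "submitSpark" task mentioned above is not shown; a sketch of what such a task could look like in build.sbt (the main class and master URL are placeholders, and the task just shells out to spark-submit after packaging) might be:

```scala
// build.sbt -- sketch of a custom task that runs spark-submit on the packaged jar
import scala.sys.process._

lazy val submitSpark = taskKey[Unit]("Package the app and submit it to Spark")

submitSpark := {
  val jar = (packageBin in Compile).value      // forces `package` to run first
  val cmd = Seq(
    "spark-submit",
    "--class", "com.example.Main",             // placeholder main class
    "--master", "yarn",                        // placeholder master URL
    jar.getAbsolutePath
  )
  require(cmd.! == 0, "spark-submit failed")   // surface a non-zero exit code
}
```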
I also tried to work with spark-shell:
Run shell and load previously written app.
Write line in shell
Evaluate it
If it's fine copy to IDE
After a few rounds of steps 2-4, paste the code into the IDE, compile the Spark app, and start again.
Is there any way to develop Spark apps faster?
I develop the core logic of our Spark jobs using an interactive environment for rapid prototyping. We use the Spark Notebook running against a development cluster for that purpose.
Once I've prototyped the logic and it's working as expected, I "industrialize" the code in a Scala project, with the classical build lifecycle: create tests; build, package and create artifacts by Jenkins.
I found that writing scripts and using :load / :copy streamlined things a bit, since I didn't need to package anything. If you do use sbt, I suggest you start it and use ~ package so that it automatically packages the jar when changes are made. Eventually, of course, everything will end up in an application jar; this is just for prototyping and exploring.
Local Spark
Vim
Spark-Shell
APIs
Console
We develop our applications using an IDE (IntelliJ, because we write our Spark applications in Scala), with ScalaTest for testing.
In those tests we use local[*] as the Spark master in order to allow debugging.
For integration testing we use Jenkins, and we launch an "end to end" script as a Scala application.
I hope this is useful.

Packaging a Groovy application

I want to package a Groovy CLI application in a form that's easy to distribute, similar to what Java does with JARs. I haven't been able to find anything that seems to be able to do this. I've found a couple of things like this that are intended for one-off scripts, but nothing that can compile an entire Groovy application made up of a lot of separate Groovy files and resource data.
I don't necessarily need to have the Groovy standalone executable be a part of it (though that would be nice), and this is not a library intended to be used by other JVM languages. All I want is a simply packaged version of my application.
EDIT:
Based on the couple of responses I got, I don't think I was being clear enough about my goal. What I'm looking for is basically an archive format that Groovy can support. The goal here is to make the application easier to distribute. Right now, the best way is to ZIP it up, have the user unzip it, and then modify a batch/shell file to start it. I was hoping to find a way to make this more like an executable JAR file, where the user just has to run a single file.
I know that Groovy compiles down to JVM-compatible byte-code, but I'm not trying to run this as plain Java code. I do some dynamic loading of Groovy classes at runtime based on the user's configuration, and Java alone can't handle that. As I said in the original post, having the Groovy executable included in the archive is a nice-to-have; however, I do actually need the Groovy runtime to execute the application, not just Java.
The Gradle Cookbook shows how to make a "fat jar" from a groovy project: http://wiki.gradle.org/display/GRADLE/Cookbook#Cookbook-Creatingafatjar
This bundles up all the dependencies, including groovy. The resulting jar file can be run on the command line like:
java -jar myapp.jar
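A minimal version of that cookbook recipe for a Gradle build might look like the following; the main class name is a placeholder, and the Shadow plugin is generally a more robust alternative to hand-rolling the jar task like this:

```groovy
// build.gradle -- minimal fat-jar sketch (the Shadow plugin is more robust)
apply plugin: 'groovy'

jar {
    manifest {
        attributes 'Main-Class': 'com.example.Main'   // placeholder main class
    }
    // Unpack every runtime dependency (including groovy) into the application jar
    from {
        configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) }
    }
    duplicatesStrategy = DuplicatesStrategy.EXCLUDE
}
```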
I've had a lot of success using a combination of the eclipse Fat Jar plugin and Yet Another Java Service Wrapper.
Essentially this becomes a 'Java' problem, not a Groovy problem. Fat Jar is painless to use. It might take you a couple of tries to get your single jar right, but once all the dependencies are flattened into a single jar you are off and running it at the command line with
java -jar application.jar
I then wrap these jars as services. I often develop standalone Groovy-based services that perform some task. I set them up as services on Windows Server using Yet Another Java Service Wrapper, and schedule them using various techniques for interacting with Windows services.

How can I get external jars added to the classpath of a executable jar created by GroovyWrapper

I have a simple groovy script that executes some sql and plays with the results. It runs quite happily from Eclipse when I add the SQL Server jar to the classpath. However, I now wish to hand it over to a co-worker as an executable jar.
I found the (GroovyWrapper) script which works great, so long as the script doesn't have any extra dependencies.
I can put all the jars together manually and pass them via the -cp option, which works, but you can't use -cp with -jar, so I needed some other solution.
I tried adding an optional parameter to the GroovyWrapper script to embed the SQL Server classes, but that didn't work in the end, as the SQL Server classes are signed and therefore can't be copied in.
I then tried adding a Class-Path manifest entry pointing at the sqljdbc4.jar in the current directory. I have done similar things previously when creating standalone jars from Java without issues, but for some reason it still doesn't work here.
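For reference, a MANIFEST.MF for this situation would look like the fragment below (the Main-Class name is a placeholder). Note that Class-Path entries are space-separated and resolved relative to the directory containing the jar, that manifest lines are wrapped at 72 bytes, and that the file must end with a newline; any of these is a common reason the entry silently fails:

```
Main-Class: Launcher
Class-Path: sqljdbc4.jar
```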
I don't want to play around with fat jars, custom class loaders or the like if I can avoid it as the script is nice and simple at the moment.
Does anyone have a solution? Have I missed something obvious?
