Error running the Livy Spark server in Hue - apache-spark

When I run the following command
hue livy_server
the following error is shown:
Failed to run spark-submit executable: java.io.IOException: Cannot run program "spark-submit": error=2, No such file or directory
I have set SPARK_HOME=/home/amandeep/spark

If you run Livy in local mode, it will expect to find the spark-submit script in its environment. Check your shell PATH variable.
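For example, assuming the SPARK_HOME from the question, something like this in the environment that launches Livy should make spark-submit resolvable:

export SPARK_HOME=/home/amandeep/spark   # path from the question
export PATH=$SPARK_HOME/bin:$PATH        # so livy_server can find spark-submit
hue livy_server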

Related

Running Python app on Spark with Conda dependencies

I am trying to run a Python script in Spark. I am running Spark in client mode (i.e. single node) with a Python script that has some dependencies (e.g. pandas) installed via Conda. There are various resources which cover this use case, for example:
https://conda.github.io/conda-pack/spark.html
https://databricks.com/blog/2020/12/22/how-to-manage-python-dependencies-in-pyspark.html
Using those as an example, I run Spark via the following command in the Spark bin directory, where /tmp/env.tar is the Conda environment packed by conda-pack:
export PYSPARK_PYTHON=./environment/bin/python
./spark-submit --archives=/tmp/env.tar#environment script.py
Spark throws the following exception:
java.io.IOException: Cannot run program "./environment/bin/python": error=2, No such file or directory
Why does this not work? I am also curious about the ./ in the Python path, as it's not clear where Spark unpacks the tar file. I assumed I did not need to load the tar file into HDFS since this is all running on a single node (but perhaps I do for cluster mode?).
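A sketch of one common fix, assuming the issue is that in client mode the driver runs on the local machine while the archive is only unpacked into the executors' working directories: give the driver a local interpreter and keep the relative path for the executors.

# Driver runs locally in client mode, so it needs a local interpreter;
# executors use the python unpacked from the archive.
export PYSPARK_DRIVER_PYTHON=python3
export PYSPARK_PYTHON=./environment/bin/python
./spark-submit --archives=/tmp/env.tar#environment script.py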

Run spark from source code on Windows - no such file or directory error

I would like to run Spark from source code on my Windows machine. I did the following steps:
git clone https://github.com/apache/spark
Added the SPARK_HOME variable into the user variables.
Added %SPARK_HOME%\bin to the PATH variable.
./build/mvn -DskipTests clean package
./bin/spark-shell
The last command returns a "no such file or directory" error.
What should I do to fix the error?
First, refer to the link below for the solution; the top-voted answer there gave me a working script for this problem:
Failed to start master for Spark in Windows
The reason is that the Spark launch scripts do not support Windows. The Spark documentation (https://spark.apache.org/docs/1.2.0/spark-standalone.html) says that Windows users have to start the master and workers manually. So you need to first run the master and then run spark-shell.
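A rough sketch of what that looks like on Windows (default host and ports assumed; run from the Spark directory):

rem Start the master manually; its log (or the UI at http://localhost:8080)
rem shows the spark:// URL to connect to.
bin\spark-class org.apache.spark.deploy.master.Master
rem Then, in another prompt, point spark-shell at the master:
bin\spark-shell --master spark://localhost:7077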

spark-shell in Windows 7 throws \spark-2.4.2-bin-hadoop2.7\bin\..' was unexpected at this time

I downloaded the Spark 2.4.2 distribution and placed it under C:\Program Files (x86). I also set SPARK_HOME and added it to the path. Now when I try to open spark-shell from the command prompt, it throws the following error:
C:\Users\Admin>spark-shell
\spark-2.4.2-bin-hadoop2.7\bin..' was unexpected at this time.
I expect it to launch the Spark shell.
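As the answers further down this page suggest, the likely culprit is the spaces and parentheses in C:\Program Files (x86). A sketch of the usual fix (target path assumed):

rem Move Spark to a path without spaces or parentheses and update SPARK_HOME:
move "C:\Program Files (x86)\spark-2.4.2-bin-hadoop2.7" C:\spark-2.4.2-bin-hadoop2.7
setx SPARK_HOME "C:\spark-2.4.2-bin-hadoop2.7"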

Unable to access pyspark

When I execute the command bin/pyspark from the path where Spark is installed, I get the error
-bash: bin/pyspark: No such file or directory
I tried to build Spark using the command sbt/sbt assembly. That also failed, with the message
-bash: sbt/sbt: No such file or directory
I am able to access Scala by executing the command bin/spark-shell from the same path.
I am using Red Hat 4.4.5-6.
Below is a screenshot of the bin folder and the error messages.

Why do spark-submit and spark-shell fail with "Failed to find Spark assembly JAR. You need to build Spark before running this program."?

I was trying to run spark-submit and I get
"Failed to find Spark assembly JAR.
You need to build Spark before running this program."
When I try to run spark-shell I get the same error.
What do I have to do in this situation?
On Windows, I found that if Spark is installed in a directory that has a space in the path (C:\Program Files\Spark), the installation will fail. Move it to the root or another directory with no spaces.
Your Spark package doesn't include compiled Spark code; that's why you get this error message from the spark-submit and spark-shell scripts.
You have to download one of the pre-built versions from the "Choose a package type" section of the Spark download page.
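For example (version chosen to match the questions above; pick the package type that fits your Hadoop version):

wget https://archive.apache.org/dist/spark/spark-2.4.2/spark-2.4.2-bin-hadoop2.7.tgz
tar -xzf spark-2.4.2-bin-hadoop2.7.tgz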
Try running mvn -DskipTests clean package first to build Spark.
If your Spark binaries are in a folder whose name contains spaces (for example, "Program Files (x86)"), it won't work. I changed it to "Program_Files", and then the spark-shell command worked in cmd.
In my case, I installed Spark with pip3 install pyspark on macOS, and the error was caused by an incorrect SPARK_HOME variable. It works when I run a command like the one below:
PYSPARK_PYTHON=python3 SPARK_HOME=/usr/local/lib/python3.7/site-packages/pyspark python3 wordcount.py a.txt
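If you are unsure where pip put the package, a quick way to find the right SPARK_HOME (the exact path depends on your Python version):

python3 -c "import pyspark, os; print(os.path.dirname(pyspark.__file__))"
export SPARK_HOME=/usr/local/lib/python3.7/site-packages/pyspark   # output from the command above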
Go to SPARK_HOME. Note that your SPARK_HOME variable should not include /bin at the end. Keep that in mind when you're adding it to the path, like this: export PATH=$SPARK_HOME/bin:$PATH
Run export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g" to allocate more memory to Maven.
Run ./build/mvn -DskipTests clean package and be patient. It took my system 1 hour and 17 minutes to finish this.
Run ./dev/make-distribution.sh --name custom-spark --pip. This is just for Python/PySpark; you can add more flags for Hive, Kubernetes, etc.
Running pyspark or spark-shell will now start pyspark and spark respectively.
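Collected together, the steps above look like this (run from the Spark source root):

export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g"   # more memory for Maven
./build/mvn -DskipTests clean package                     # build Spark; expect an hour or more
./dev/make-distribution.sh --name custom-spark --pip      # runnable distribution with PySpark support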
If you have downloaded the binary and are getting this exception, check whether your SPARK_HOME path contains spaces, like "apache spark"/bin. Just removing the spaces will fix it.
Just to add to jurban1997's answer:
If you are running Windows, make sure that the SPARK_HOME and SCALA_HOME environment variables are set up correctly. SPARK_HOME should be set so that {SPARK_HOME}\bin\spark-shell.cmd exists.
For a Windows machine with the pre-built version (as of today, 21.01.2022):
In order to verify all the edge cases you may have and avoid tedious guesswork about what exactly is not configured properly:
Find spark-class2.cmd and open it with a text editor.
Inspect the arguments of the commands starting with call or if exists by typing the arguments in Command Prompt, like this:
Open Command Prompt. (For PowerShell you need to print the variable another way.)
Copy-paste %SPARK_HOME%\bin\ as is and press Enter.
If you see something like bin\bin in the path displayed, then you have appended \bin in your %SPARK_HOME% environment variable.
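A hypothetical Command Prompt session showing the bin\bin symptom (paths are examples only):

C:\Users\Admin>echo %SPARK_HOME%
C:\spark-2.4.2-bin-hadoop2.7\bin
C:\Users\Admin>echo %SPARK_HOME%\bin\
C:\spark-2.4.2-bin-hadoop2.7\bin\bin\

The doubled bin at the end means SPARK_HOME itself should not end in \bin.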
Now you have to add the path to spark/bin to your PATH variable, or it will not find the spark-submit command.
Try out and correct every path variable that the script in this file uses, and you should be good to go.
After that, run spark-submit .... You may now encounter a missing Hadoop winutils.exe; for that problem, get the tool and place it where spark-submit.cmd is located.
Spark Installation:
For a Windows machine:
Download spark-2.1.1-bin-hadoop2.7.tgz from https://spark.apache.org/downloads.html
Unzip it, place the spark folder on the C:\ drive, and set the environment variable.
If you don't have Hadoop,
you need to create a Hadoop folder, create a bin folder inside it, and copy the winutils.exe file into it.
Download the winutils file from https://codeload.github.com/gvreddy1210/64bit/zip/master
Paste winutils.exe into the Hadoop\bin folder and add C:\hadoop\bin to the environment variable.
Create a temp\hive folder on the C:\ drive and give full permissions to this folder, like:
C:\Windows\system32>C:\hadoop\bin\winutils.exe chmod 777 /tmp/hive
Open a command prompt, first run winutils.exe from C:\hadoop\bin, then navigate to C:\spark\bin and
run spark-shell
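A consolidated sketch of the setup above (paths as in the steps; setx writes user-level variables that new command prompts pick up):

setx SPARK_HOME "C:\spark"
setx HADOOP_HOME "C:\hadoop"
rem Also add %SPARK_HOME%\bin and %HADOOP_HOME%\bin to PATH, then in a new prompt:
C:\hadoop\bin\winutils.exe chmod 777 /tmp/hive
spark-shell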
