Why do I get a "spark-shell: Permission denied" error when setting up Spark? - apache-spark

I am new to Apache Spark and am trying to set it up on my MacBook.
I downloaded "spark-2.4.0-bin-hadoop2.7" from the official Apache Spark website.
When I try to run ./bin/spark-shell or ./bin/pyspark, I get a "Permission denied" error.
I just want to run Spark on my local machine.
I also tried granting permissions on all the folders, but that did not help.
Why do I get this error?

This should solve your problem:
chmod +x /Users/apple/spark-2.4.0-bin-hadoop2.7/bin/*
Then try running bin/pyspark (the Spark shell in Python) or bin/spark-shell (the Spark shell in Scala).
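To confirm that missing execute bits are really the cause before changing anything, a minimal sketch along these lines may help. The helper name `list_non_executable` and the throwaway directory are assumptions made for illustration; in practice you would point the helper at your own Spark `bin` directory.

```shell
#!/bin/sh
# Hypothetical helper: print the regular files in a directory
# that do not have the execute bit set.
list_non_executable() {
    for f in "$1"/*; do
        if [ -f "$f" ] && [ ! -x "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Throwaway directory standing in for spark-2.4.0-bin-hadoop2.7/bin.
demo_bin=$(mktemp -d)
printf '#!/bin/sh\necho "fake spark-shell"\n' > "$demo_bin/spark-shell"
chmod 644 "$demo_bin/spark-shell"   # no execute bit, regardless of umask

before=$(list_non_executable "$demo_bin")   # lists the fake spark-shell
chmod +x "$demo_bin"/*                      # the fix from the answer above
after=$(list_non_executable "$demo_bin")    # nothing left to list
```

If the helper prints nothing for your real `bin` directory, the scripts are already executable and the "Permission denied" error has another cause.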

I solved this issue by adding the /libexec folder to the Spark home path.
Set $SPARK_HOME to
/usr/local/Cellar/apache-spark/<your_spark_version>/libexec

Related

Can't find Spark Submit when using Spark shell

I installed Spark and am trying to run a file, 'train.py', in the directory '/home/xxx/Desktop/BD_Project' from the shell using the following command:
$SPARK_HOME/bin/spark-submit /home/xxx/Desktop/BD_Project/train.py > output.txt
My teammates, who used the same page that I did to install Spark, have no problem running this. However, it throws the following error for me:
bash: /bin/spark-submit: No such file or directory
You need to set SPARK_HOME to the directory where Spark is installed, for example /usr/local/spark, so that $SPARK_HOME/bin/spark-submit resolves to the script. The error mentions /bin/spark-submit, which means SPARK_HOME is currently empty.
Before you set it, confirm where Spark is installed by going to that directory.
You can set it like this before running your command:
export SPARK_HOME=/usr/local/spark
If you are a Homebrew user, setting SPARK_HOME to
/opt/homebrew/Cellar/apache-spark/3.3.1/libexec
should solve it. Sorry for the late response; I hope this helps someone with this odd error.
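The error `bash: /bin/spark-submit` suggests that `$SPARK_HOME` expanded to an empty string. A small check like the following can catch both an unset variable and a value that points at the wrong level of the installation. The helper name `is_spark_home` is hypothetical; the only thing it assumes about a Spark installation is that it contains an executable `bin/spark-submit`.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given directory looks like a
# Spark home, i.e. it contains an executable bin/spark-submit.
is_spark_home() {
    [ -n "$1" ] && [ -x "$1/bin/spark-submit" ]
}

if is_spark_home "${SPARK_HOME:-}"; then
    echo "SPARK_HOME looks valid: $SPARK_HOME"
else
    echo "SPARK_HOME is unset or has no bin/spark-submit: '${SPARK_HOME:-}'"
fi
```

Note that the check fails when the variable points at the spark-submit script itself rather than at the installation directory, since `bin/spark-submit` is then looked up underneath a file.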

What is the use of winutils.exe?

I am running Apache Spark on Windows (locally) using IntelliJ.
I chose enableHiveSupport when creating the Spark session object.
I converted a DataFrame into a temp view and ran some queries.
Initially I got an error that tmp/hive does not exist, so I created it on the C: drive.
Then I got an error that tmp/hive is not writable.
So I changed the permissions in the file properties, but I still got the same error.
After some research I found the solution, i.e. use winutils.exe to change the permissions.
So what exactly is winutils.exe? Where is it used in Spark? The tmp/hive/username folder was empty after I ran the application.
Thank you.
I advise you to run on Linux, but if you are using Spark on Windows to access Hadoop, then
cmd> winutils.exe chmod -R 777 D:\tmp\hive
allows you to read and write to this pseudo-Hadoop.

Unable to launch spark using spark-shell

I am trying to set up Spark 2 on my Cloudera cluster. For that, I have JDK 1.8:
I have installed Scala 2.11.8 using the rpm file:
I have downloaded and extracted Spark 2.2.0 into my home directory: /home/cloudera.
I made changes to the PATH variable in .bashrc as below:
But when I try to execute spark-shell from the home directory, /home/cloudera, it says "No such file or directory", as can be seen below:
[cloudera@quickstart ~]$ spark-shell
/home/cloudera/spark/bin/spark-class: line 71: /usr/java/jdk1.7.0_67-cloudera/bin/java: No such file or directory
[cloudera@quickstart ~]$
Could anyone tell me how I can fix the problem and configure it properly?
Java/JVM applications (and spark-shell in particular) use the java binary to launch themselves. They therefore need to know where it is located, which is usually done via the JAVA_HOME environment variable.
In your case it is not set explicitly, so the value from Cloudera's default Java distribution is used (even though it points to a nonexistent location).
You need to set JAVA_HOME to point to the correct Java distribution directory for the user under which you want to launch spark-shell and other applications.
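To see which java binary a launcher script would end up calling, a sketch like this can help. It illustrates the usual lookup order (JAVA_HOME first, then java on the PATH); it is not Spark's actual code, and the helper name `resolve_java` is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the java binary a launcher would use.
# Prefers $1 (a JAVA_HOME candidate) and falls back to java on the PATH.
resolve_java() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1/bin/java"
    elif command -v java >/dev/null 2>&1; then
        command -v java
    else
        return 1
    fi
}

runner=$(resolve_java "${JAVA_HOME:-}") || runner=""
if [ -n "$runner" ] && [ -x "$runner" ]; then
    echo "java found: $runner"
else
    echo "no usable java (resolved to: '$runner'); check JAVA_HOME"
fi
```

With a stale JAVA_HOME such as /usr/java/jdk1.7.0_67-cloudera, the helper still returns a path under that directory, which is why the launcher fails even when a working JDK is installed elsewhere.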

"Cannot find hadoop installation : $HADOOP_HOME .. " getting this error while trying to run hive on spark.

I followed this guide: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started#HiveonSpark:GettingStarted-Configurationpropertydetails
I executed:
set spark.home=/location/to/sparkHome;
set hive.execution.engine=spark;
set spark.master= Spark-Master-URL
However, when I run ./hive I get the above error:
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path
I do not have Hadoop installed, and I want to run Hive on top of Spark running in standalone mode.
Is it mandatory to have Hadoop set up in order to run Hive over Spark?
IMHO Hive cannot run without Hadoop. There may be VMs that have everything preinstalled. Hive runs on top of Hadoop, so you first need to install Hadoop and then you can try Hive.
Please refer to https://stackoverflow.com/a/21339399/5756149.
Anyone, please correct me if I am wrong.
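The error message names three ways Hadoop can be located. A sketch that mirrors those three conditions, so you can test your environment before starting Hive, might look like this. The helper name `hadoop_locatable` is hypothetical, and the check is based only on the wording of the error message, not on Hive's own startup script.

```shell
#!/bin/sh
# Hypothetical helper mirroring the three conditions in the error message:
# HADOOP_HOME is set, or HADOOP_PREFIX is set, or hadoop is on the PATH.
hadoop_locatable() {
    [ -n "${HADOOP_HOME:-}" ] ||
        [ -n "${HADOOP_PREFIX:-}" ] ||
        command -v hadoop >/dev/null 2>&1
}

if hadoop_locatable; then
    echo "A Hadoop installation can be located."
else
    echo "Set HADOOP_HOME or HADOOP_PREFIX, or add hadoop to the PATH."
fi
```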

Will copying the Apache Spark installation folder to another system work properly?

I am using Apache Spark, and it works properly in a cluster of 3 machines. Now I want to install Spark on another 3 machines.
What I did: I just copied the Spark folder that I am currently using.
Problem: ./bin/spark-shell and all other Spark commands do not work and throw the error 'No Such Command'.
Questions: 1. Why is it not working?
2. Is it possible to build a Spark installation on one machine and then distribute it to the other machines?
I am using Ubuntu.
We looked into the problem and found that the copied Spark installation folder contained the .sh files, but they were not executable. We made the files executable, and now Spark is running.
Yes, it would work, but you should ensure that you have set all the environment variables required for Spark to work, such as SPARK_HOME, WEBUI_PORT, etc.
Also use a Hadoop-integrated Spark build, which comes with the supported versions of Hadoop.
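Since the copied scripts lost their execute bits, one way to avoid the problem is to copy the installation with a tool that records file modes. The sketch below uses tar on a throwaway directory that stands in for a real Spark installation; all paths and file names in it are assumptions for illustration.

```shell
#!/bin/sh
# Stand-in for an existing Spark installation with an executable launcher.
src=$(mktemp -d)
mkdir -p "$src/bin"
printf '#!/bin/sh\necho "fake spark-shell"\n' > "$src/bin/spark-shell"
chmod 755 "$src/bin/spark-shell"

# Pack with tar, which records file modes in the archive.
archive=$(mktemp -d)/spark.tar
tar -cf "$archive" -C "$src" .

# Unpack on the "other machine"; -p restores the recorded permissions.
dst=$(mktemp -d)
tar -xpf "$archive" -C "$dst"

if [ -x "$dst/bin/spark-shell" ]; then
    echo "execute bit preserved"
else
    echo "execute bit lost; run chmod +x on the bin scripts"
fi
```

When copying over the network, options such as `scp -p` or `rsync -a` serve the same purpose of preserving permissions.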
