Issue exporting path to SPARK_HOME on Ubuntu - apache-spark

I’m installing Spark and PySpark on my Ubuntu server. I’m trying to set the SPARK_HOME path and I’m getting the error below. Does anyone see what the issue might be?
code:
export SPARK_HOME='/home/username/spark-2.4.3-bin-hadoop2.7'
export PATH$SPARK_HOME:$PATH
output:
-bash: export: `PATH/home/username/spark-2.4.3-bin-hadoop2.7:/home/username/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin': not a valid identifier

export PATH=$SPARK_HOME:$PATH
You are missing the equals sign (=) after PATH.
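For reference, a corrected pair of exports might look like this (same install location as above; note that it is usually the bin subdirectory of SPARK_HOME that you want on PATH so that spark-submit and pyspark are found):
export SPARK_HOME='/home/username/spark-2.4.3-bin-hadoop2.7'
export PATH=$SPARK_HOME/bin:$PATH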

Related

Wrong JAVA_HOME in hadoop for spark-shell

I needed to install Hadoop in order to have Spark running on my WSL2 Ubuntu for school projects. I installed Hadoop 3.3.1 and Spark 3.2.1 following these two tutorials:
Hadoop Tutorial on Kontext.tech
Spark Tutorial on Kontext.tech
I correctly set up the environment variables in my .bashrc:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
export PATH=$PATH:$JAVA_HOME
export HADOOP_HOME=~/hadoop/hadoop-3.3.1
export SPARK_HOME=~/hadoop/spark-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:/usr/local/hadoop/bin/
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$SPARK_HOME/bin:$PATH
# Configure Spark to use Hadoop classpath
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
As well as in ~/hadoop/spark-3.2.1/conf/spark-env.sh.template:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/
However, when I launch spark-shell, I get this error:
/home/adrien/hadoop/spark-3.2.1/bin/spark-class: line 71: /usr/lib/jvm/java-8-openjdk-amd64/jre/bin//bin/java: No such file or directory
/home/adrien/hadoop/spark-3.2.1/bin/spark-class: line 96: CMD: bad array subscript
There seems to be a mix-up in a redefinition of the $PATH variable, but I can't figure out where it is. Can you help me solve it, please? I don't know Hadoop; I know Spark fairly well, but I have never had to install either of them.
First, certain Spark packages come with Hadoop bundled, so you don't need to download Hadoop separately. More specifically, Spark is built against Hadoop 3.2 for now, so using the latest Hadoop version might cause its own problems.
For your problem, JAVA_HOME should not end in /bin or /bin/java. Check the linked post again...
If you used apt install for java, you shouldn't really need to set JAVA_HOME or the PATH for Java, either, as the package manager will do this for you. Or you can use https://sdkman.io
Note: Java 11 is preferred
You also need to remove .template from any config files for them to actually be used... However, JAVA_HOME is automatically detected by spark-submit, so it's completely optional in spark-env.sh
Same applies for hadoop-env.sh
Also remove /usr/local/hadoop/bin/ from your PATH since it doesn't appear you've put anything in that location
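Putting those points together, a cleaned-up .bashrc might look roughly like this sketch (paths taken from the question; JAVA_HOME assumes the apt-installed OpenJDK 8 location, so adjust it to your actual JVM directory):
# JAVA_HOME points at the JDK root, not at .../jre/bin or .../bin/java
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=~/hadoop/hadoop-3.3.1
export SPARK_HOME=~/hadoop/spark-3.2.1
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Only the bin directories go on PATH; the unused /usr/local/hadoop/bin/ is dropped
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin
# Configure Spark to use the Hadoop classpath
export SPARK_DIST_CLASSPATH=$(hadoop classpath)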

Can't find Spark Submit when using Spark shell

I installed Spark and am trying to run a file, 'train.py', located in '/home/xxx/Desktop/BD_Project', from the shell using the following command:
$SPARK_HOME/bin/spark-submit /home/xxx/Desktop/BD_Project/train.py > output.txt
My teammates, who used the same page I did for the Spark installation, have no problem running this. However, it throws the following error for me:
bash: /bin/spark-submit: No such file or directory
You need to set SPARK_HOME to the directory where Spark is installed (the installation root, not the path to the spark-submit binary itself), typically something like /usr/local/spark, so that $SPARK_HOME/bin/spark-submit exists.
Before you set it, confirm where Spark is installed by going to the directory.
You can set it like this before running your command:
export SPARK_HOME=/usr/local/spark
If you are a Homebrew user, setting your SPARK_HOME to
/opt/homebrew/Cellar/apache-spark/3.3.1/libexec
would solve it. Sorry for responding so late; hoping this helps someone with this odd error.
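As a rough illustration, with SPARK_HOME pointing at an installation root (the Homebrew path above is just one example location), the original command resolves as expected:
export SPARK_HOME=/opt/homebrew/Cellar/apache-spark/3.3.1/libexec
$SPARK_HOME/bin/spark-submit /home/xxx/Desktop/BD_Project/train.py > output.txt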

Pyspark command not recognized (Ubuntu)

I have successfully installed PySpark using Anaconda and configured the paths in the .bashrc file.
After typing the pyspark command, it opens a Jupyter notebook in which plain Python code works properly, e.g. print("Hello").
But when I execute PySpark commands like collect(), take(5), etc., it gives the error "Cannot run program '/usr/bin/Python-3.7.4': Permission denied".
It is referring to the wrong directory, as Python 3.7.4 is installed in the Anaconda directory.
Is there any configuration or step I need to perform to resolve this issue?
Try updating the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables to point at the correct Python 3 distribution path.
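For example, assuming Python lives under a per-user Anaconda install (the path below is an assumption; use whatever `which python3` reports inside your Anaconda environment), something like this in .bashrc may resolve the permission error:
# Tell PySpark which Python interpreter the driver and workers should use
# (adjust the Anaconda path to your actual installation)
export PYSPARK_PYTHON=/home/username/anaconda3/bin/python3
export PYSPARK_DRIVER_PYTHON=/home/username/anaconda3/bin/python3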

NEO4J : ERROR: Unable to find java. (Cannot execute /usr/lib/jvm/java-7-oracle/jre/bin/java/bin/java)

I am running into problems with the Neo4j server under Ubuntu 16.04.
I wanted to install version 2.3.3 of Neo4j. When I tried to start the server, I got this error:
➜ ~ sudo /var/lib/neo4j/bin/neo4j start
ERROR: Unable to find java. (Cannot execute /usr/lib/jvm/java-7-oracle/jre/bin/java/bin/java)
* Please use Oracle(R) Java(TM) 7 or OpenJDK(TM) to run Neo4j Server.
* Please see http://docs.neo4j.org/ for Neo4j Server installation instructions.
The thing is that my JAVA_HOME is not the path given by the error:
➜ ~ echo $JAVA_HOME
/usr/lib/jvm/java-8-oracle/
Any idea about the root cause of this issue?
When you run it with 'sudo' you are running it as the root user, not as your own user. As a result, your JAVA_HOME (which is set in your user environment) won't apply.
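One way to work around that, keeping the Java path from the question (whether Neo4j 2.3.3 is happy with Java 8 is a separate question), is to hand JAVA_HOME to the sudo environment explicitly:
# Pass JAVA_HOME through to the root environment for this one command
sudo JAVA_HOME=/usr/lib/jvm/java-8-oracle/ /var/lib/neo4j/bin/neo4j start
# or preserve your own environment wholesale (subject to sudoers policy)
sudo -E /var/lib/neo4j/bin/neo4j start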

Unable to start beeline client

I installed spark-1.5.1-bin-without-hadoop and am trying to start beeline using the following command from the Spark install directory.
./bin/beeline
I get "Error: Could not find or load main class org.apache.hive.beeline.BeeLine".
Not sure why the classpath is not working.

I ran into the same issue and ended up running java directly with the jars under the lib_managed directory. Note that the verbose option is used because no errors are shown in some NoClassDef cases.
java -cp lib_managed/jars/hive-exec-1.2.1.spark.jar:lib_managed/jars/hive-metastore-1.2.1.spark.jar:lib_managed/jars/httpcore-4.3.1.jar:lib_managed/jars/httpclient-4.3.2.jar:lib_managed/jars/libthrift-0.9.2.jar:lib_managed/jars/hive-beeline-1.2.1.spark.jar:lib_managed/jars/jline-2.12.jar:lib_managed/jars/commons-cli-1.2.jar:lib_managed/jars/super-csv-2.2.0.jar:lib_managed/jars/commons-logging-1.1.3.jar:lib_managed/jars/hive-jdbc-1.2.1.spark.jar:lib_managed/jars/hive-cli-1.2.1.spark.jar:lib_managed/jars/hive-service-1.2.1.spark.jar:assembly/target/scala-2.10/spark-assembly-1.5.3-SNAPSHOT-hadoop2.2.0.jar org.apache.hive.beeline.BeeLine -u jdbc:hive2://<thrift server public address>:10000/default --verbose=true
I had exactly the same problem. For me, setting the SPARK_HOME environment variable did it!
export SPARK_HOME=/Users/../Downloads/spark-2.1.1-bin-hadoop2.7
This is because, if you actually open the "bin/beeline" script file, you'll find this comment:
# Figure out if SPARK_HOME is set
So, after setting SPARK_HOME to the proper location, beeline started working fine.
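For context, that part of a Spark bin/beeline script looks roughly like the sketch below (a paraphrase, not the verbatim script; contents vary by Spark version), which is why BeeLine's classpath falls apart when SPARK_HOME does not point at a real installation:
# Figure out if SPARK_HOME is set; otherwise fall back to the script's parent directory
if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "$(dirname "$0")"/..; pwd)"
fi
# BeeLine is launched through spark-class, which builds the classpath from SPARK_HOME
exec "${SPARK_HOME}/bin/spark-class" org.apache.hive.beeline.BeeLine "$@"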
