Spark in Yarn Web UI not getting displayed - apache-spark

I am unable to view the Spark history through the YARN UI (web address on port 8088, as set in yarn-site.xml). The Spark job completed successfully.
The Spark application was run from the datanode shell with deploy mode set to cluster.
When I click History, it redirects to http://namenode:18088/history/application_1472647811761_0001/1 and says the page cannot be displayed.
Hadoop Version: 2.7.0
Spark Version: 2.0.0
Cluster: one namenode and one datanode
spark-defaults.conf
spark.eventLog.dir=hdfs://namenode:9000/user/spark/applicationHistory
spark.eventLog.enabled=true
spark.yarn.historyServer.address=namenode:18088
spark.history.fs.logDirectory=hdfs://namenode:9000/shared/spark-logs
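Note the mismatch in the config above: spark.eventLog.dir and spark.history.fs.logDirectory point to different HDFS paths, so the history server reads a directory that applications never write to. Also, the history server's web UI listens on port 18080 by default (spark.history.ui.port), while the address above points at 18088. A minimal consistent sketch, assuming you keep the applicationHistory path and start the server with sbin/start-history-server.sh:

spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://namenode:9000/user/spark/applicationHistory
spark.history.fs.logDirectory=hdfs://namenode:9000/user/spark/applicationHistory
spark.yarn.historyServer.address=namenode:18088
spark.history.ui.port=18088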

Related

Error when running spark application with zeppelin

When I run the above Spark application with Zeppelin on a YARN cluster in cluster mode, I get the following error:
Where might the problem be? Thanks

spark.shuffle.service.enabled=true cluster.YarnScheduler: Initial job has not accepted any resources

I am trying to run a PySpark job on YARN with the spark.shuffle.service.enabled=true option, but the job never completes.
Without the option, the job works well:
user@e7524bf7f996:~$ pyspark --master yarn
Using Python version 3.9.7 (default, Sep 16 2021 13:09:58)
Spark context Web UI available at http://e7524bf7f996:4040
Spark context available as 'sc' (master = yarn, app id = application_1644937120225_0004).
SparkSession available as 'spark'.
>>> sc.parallelize(range(10)).sum()
45
With the option --conf spark.shuffle.service.enabled=true:
user@e7524bf7f996:~$ pyspark --master yarn --conf spark.shuffle.service.enabled=true
Using Python version 3.9.7 (default, Sep 16 2021 13:09:58)
Spark context Web UI available at http://e7524bf7f996:4040
Spark context available as 'sc' (master = yarn, app id = application_1644937120225_0005).
SparkSession available as 'spark'.
>>> sc.parallelize(range(10)).sum()
2022-02-15 15:10:14,591 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2022-02-15 15:10:29,590 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2022-02-15 15:10:44,591 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Are there other options in Spark or YARN that should be enabled to make spark.shuffle.service.enabled work?
I am running Spark 3.1.2, Python 3.9.7, and Hadoop 3.2.1.
Thank you,
Bertrand
You need to configure the external shuffle service on the YARN cluster by following these steps:
1. Build Spark with the YARN profile. Skip this step if you are using a pre-packaged distribution.
2. Locate the spark-<version>-yarn-shuffle.jar. This should be under $SPARK_HOME/common/network-yarn/target/scala-<version> if you are building Spark yourself, and under yarn if you are using a distribution.
3. Add this jar to the classpath of all NodeManagers in your cluster.
4. In the yarn-site.xml on each node, add spark_shuffle to yarn.nodemanager.aux-services, then set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService (see the sketch after this list).
5. Increase the NodeManager's heap size by setting YARN_HEAPSIZE (1000 by default) in etc/hadoop/yarn-env.sh to avoid garbage collection issues during shuffle.
6. Restart all NodeManagers in your cluster.
For details, please refer to https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service
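For step 4, the yarn-site.xml fragment might look like this (the property names come from the Spark docs linked above; keeping mapreduce_shuffle in the list is an assumption that the MapReduce shuffle handler is also in use, as it typically is by default):

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>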
If it still does not work, check the following:
Check the YARN UI to ensure enough resources are available.
Try --deploy-mode cluster to ensure the driver can communicate with the YARN cluster for scheduling.
Thanks, Warren, for your help.
Here is the setup that worked for me:
https://github.com/BertrandBrelier/SparkYarn/blob/main/yarn-site.xml
echo "export YARN_HEAPSIZE=2000" >> /home/user/hadoop-3.2.1/etc/hadoop/yarn-env.sh
ln -s /home/user/spark-3.1.2-bin-hadoop3.2/yarn/spark-3.1.2-yarn-shuffle.jar /home/user/hadoop-3.2.1/share/hadoop/yarn/lib/.
echo "spark.shuffle.service.enabled true" >> /home/user/spark-3.1.2-bin-hadoop3.2/conf/spark-defaults.conf
After restarting Hadoop and Spark, I was able to start a PySpark session:
pyspark --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true
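To make these settings the defaults rather than per-session flags, the same pair can live in spark-defaults.conf (a sketch mirroring the echo command above; dynamic allocation is optional, but it is the usual reason to run the external shuffle service):

spark.shuffle.service.enabled true
spark.dynamicAllocation.enabled true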

Spark on YARN: Job Submitted vs. Accepted?

I am running a Spark job in YARN cluster mode. What is the difference between the YARN Accepted and YARN Submitted statuses?
We submit the Spark job using spark-submit (YARN cluster mode).
YARN Submitted: the job has been submitted to the YARN scheduler queue (FIFO/Fair scheduler) and is waiting for its turn.
YARN Accepted: YARN has started executing the job, but only the ApplicationMaster is running; the ApplicationMaster has not yet obtained resources from the ResourceManager to run the job.
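You can watch an application move through these states from the shell; the yarn CLI filters applications by state, which is a quick way to see where a job is stuck:

yarn application -list -appStates SUBMITTED,ACCEPTED,RUNNING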

Start Spark master on the IP instead of Hostname

I'm trying to set up a remote Spark 2.4.5 cluster on Ubuntu 18. After I start ./sbin/start-master.sh, the web UI is available at <INSTANCE-IP>:8080, but it shows "Spark Master at spark://spark-master:7077", where spark-master is the hostname of the remote machine.
I'm able to start a worker only with ./sbin/start-slave.sh spark://spark-master:7077, but <INSTANCE-IP>:4040 doesn't work. When I try ./sbin/start-slave.sh spark://<INSTANCE-IP>:7077, I can see the process, but the worker is not visible in the web UI.
As a result, I cannot connect to the cluster from my local machine with spark-shell --master spark://<INSTANCE-IP>:7077. The error is:
StandaloneAppClient$ClientEndpoint: Failed to connect to master <INSTANCE-IP>:7077
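One likely cause: the standalone master binds to SPARK_MASTER_HOST if it is set, otherwise to the machine's hostname, so a common fix is to set it explicitly in conf/spark-env.sh on the master and restart (a sketch; whether this also resolves the worker-visibility issue in this particular setup is an assumption). Note too that a worker's own web UI defaults to port 8081; 4040 is the per-application UI and only exists while an application is running.

# conf/spark-env.sh on the master machine
SPARK_MASTER_HOST=<INSTANCE-IP>

Then run ./sbin/stop-master.sh && ./sbin/start-master.sh and point workers at spark://<INSTANCE-IP>:7077.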

Running spark application in local mode

I'm trying to start my Spark application in local mode using spark-submit. I am using Spark 2.0.2, Hadoop 2.6 & Scala 2.11.8 on Windows. The application runs fine from within my IDE (IntelliJ), and I can also start it on a cluster with actual, physical executors.
The command I'm running is
spark-submit --class [MyClassName] --master local[*] target/[MyApp]-jar-with-dependencies.jar [Params]
Spark starts up as usual but then terminates with
java.io.IOException: Failed to connect to /192.168.88.1:56370
What am I missing here?
Check which port you are using: if on a cluster, log in to the master node and include:
--master spark://XXXX:7077
You can always find it in the Spark UI on port 8080.
Also check your SparkSession builder config: if you have already set the master there, it takes priority over the command line when launching, e.g.:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder
  .appName("myapp")
  .master("local[*]") // a master set here overrides the --master flag
  .getOrCreate()
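If local mode still fails with a connection error to an address like 192.168.88.1 (often a virtual network adapter on Windows), another thing worth trying, as an assumption about this particular setup, is pinning Spark's bind address to loopback via the documented SPARK_LOCAL_IP environment variable (on Spark 2.1+, the spark.driver.bindAddress property serves the same purpose):

set SPARK_LOCAL_IP=127.0.0.1
spark-submit --class [MyClassName] --master local[*] target/[MyApp]-jar-with-dependencies.jar [Params]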
