Spark standalone master HA jobs in WAITING status - apache-spark

We are trying to set up HA on the Spark standalone master using ZooKeeper.
We have two ZooKeeper hosts, which we are using for Spark HA as well.
We configured the following in spark-env.sh:
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk_server1:2181,zk_server2:2181"
We started both masters.
We started a spark-shell, and the status of the job is RUNNING.
master1 is in ALIVE status and master2 is in STANDBY status.
We killed master1; master2 took over and all the workers appeared alive in master2.
The shell that was already running was moved to the new master. However, the application is in WAITING status and the executors are in LOADING status.
There is no error in the worker or executor logs, apart from a notification that they connected to the new master.
I can see that the workers re-registered, but the executors do not seem to start. Is there anything I am missing?
My Spark version is 1.5.0.
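For reference, the Spark standalone HA documentation has applications register with both masters by listing them in a single master URL, so a shell in a setup like the one above would typically be started along these lines (the master1/master2 host names mirror the question and are assumptions):

./bin/spark-shell --master spark://master1:7077,master2:7077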

Related

On Kubernetes my Spark worker pod is trying to access thrift pod by name

Okay. Where to start? I am deploying a set of Spark applications to a Kubernetes cluster. I have one Spark Master, 2 Spark Workers, MariaDB, a Hive Metastore (that uses MariaDB - and it's not a full Hive install - it's just the Metastore), and a Spark Thrift Server (that talks to Hive Metastore and implements the Hive API).
So this setup is working pretty well for everything except the Thrift Server job (start-thriftserver.sh in the Spark sbin directory on the thrift server pod). By "working well" I mean that from outside my cluster I can create Spark jobs and submit them to the master, and then, using the Web UI, I can see my test app run to completion utilizing both workers.
Now the problem. When you launch start-thriftserver.sh it submits a job to the cluster with itself as the driver (I believe - which is correct behavior). When I look at the related Spark job via the Web UI I see it has workers, and they repeatedly get launched and then exit shortly thereafter. When I look at the workers' stderr logs I see that every worker launches and tries to connect back to the thrift server pod at the spark.driver.port, which I believe is also correct behavior. The gotcha is that the connection fails with an unknown host exception: it uses the raw Kubernetes pod name of the thrift server pod (not a service name, and with no IP in the name) and says it can't find the thrift server that initiated the connection. Kubernetes DNS registers service names, and pod names only in their IP-prefixed form; in other words, the raw name of the pod (without an IP) is never registered with DNS. That is simply not how Kubernetes works.
So my question: I am struggling to figure out why the Spark worker pod is using a raw pod name to try to find the thrift server. It seems it should never do this, and that it should be impossible to ever satisfy that request. I have wondered if there is some Spark config setting that would tell the workers that the (thrift) driver they need to be searching for is actually spark-thriftserver.my-namespace.svc, but I can't find anything, having done much searching.
There are so many settings that go into a cluster like this that I don't want to barrage you with info. One thing that might clarify my setup: the following string is dumped at the top of a worker log that fails. Notice the raw pod name of the thrift server for driver-url. If anyone has any clue what steps to take to fix this please let me know. I'll edit this post and share settings etc as people request them. Thanks for helping.
Spark Executor Command: "/usr/lib/jvm/java-1.8-openjdk/jre/bin/java" "-cp" "/spark/conf/:/spark/jars/*" "-Xmx512M" "-Dspark.master.port=7077" "-Dspark.history.ui.port=18081" "-Dspark.ui.port=4040" "-Dspark.driver.port=41617" "-Dspark.blockManager.port=41618" "-Dspark.master.rest.port=6066" "-Dspark.master.ui.port=8080" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler#spark-thriftserver-6bbb54768b-j8hz8:41617" "--executor-id" "12" "--hostname" "172.17.0.6" "--cores" "1" "--app-id" "app-20220408001035-0000" "--worker-url" "spark://Worker#172.17.0.6:37369"
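One direction worth sketching, as an assumption rather than a confirmed fix: Spark's spark.driver.host property controls the address executors use to reach back to the driver, and it defaults to the driver's hostname (here, the raw pod name). Pointing it at the Kubernetes Service for the Thrift Server (the spark-thriftserver.my-namespace.svc name mentioned above) would look roughly like this; the spark://spark-master:7077 URL is a placeholder for this cluster's actual master URL:

./sbin/start-thriftserver.sh \
  --master spark://spark-master:7077 \
  --conf spark.driver.host=spark-thriftserver.my-namespace.svc \
  --conf spark.driver.bindAddress=0.0.0.0

spark.driver.bindAddress keeps the driver listening on all interfaces inside the pod while the service name is what gets advertised to the executors.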

Spark app fails after ACCEPTED state for a long time. Log says Socket timeout exception

I have Hadoop 3.2.2 running on a cluster with 1 name node, 2 data nodes and 1 resource manager node. I tried to run the SparkPi example in cluster mode; the spark-submit is done from my local machine. YARN accepts the job, but the application UI shows it stuck in the ACCEPTED state. In the terminal where I submitted the job it says:
2021-06-05 13:10:03,881 INFO yarn.Client: Application report for application_1622897708349_0001 (state: ACCEPTED)
This continues to print until it fails. Upon failure it prints a socket timeout exception.
I tried increasing spark.executor.heartbeatInterval to 3600 seconds, still with no luck. I also tried running the code from the name node, thinking there must be some connection issue with my local machine, but I'm still unable to run it.
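For context, the submission being described is roughly of this shape (a sketch; the exact example jar path and Spark/Scala versions are assumptions, and HADOOP_CONF_DIR on the local machine must point at the cluster's configuration):

./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.1.2.jar 100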
I found the answer, albeit I don't know why it works: adding the private IP address to the security group in AWS did the trick.

Why does start-slave.sh require master URL?

I'm wondering why the client, using apache-spark/sbin/start-slave.sh <master's URL>, has to indicate this master's URL, since the master already indicates it in apache-spark/sbin/start-master.sh --master spark://my-master:7077, e.g.?
Is it because the client must wait for the master to receive the submit sent by the master? If yes, then why must the master specify --master spark://.... in its submit?
start-slave.sh <master's URL> starts a standalone Worker (formerly a slave) that the standalone Master available at <master's URL> uses to offer resources to Spark applications.
The standalone Master manages workers, and it is the workers' job to register themselves with a master and offer their CPU and memory for resource offering.
From Starting a Cluster Manually:
You can start a standalone master server by executing:
./sbin/start-master.sh
Once started, the master will print out a spark://HOST:PORT URL for itself, which you can use to connect workers to it, or pass as the “master” argument to SparkContext. You can also find this URL on the master’s web UI, which is http://localhost:8080 by default.
Similarly, you can start one or more workers and connect them to the master via:
./sbin/start-slave.sh <master-spark-URL>
since the master already indicates it in apache-spark/sbin/start-master.sh --master spark://my-master:7077
You can specify the URL of the standalone Master, which defaults to spark://my-master:7077, but that URL is not announced on the network, so no one could know it unless it is specified on the command line.
why the master must specify --master spark://.... in its submit
It does not. Standalone Master and submit are different "tools", i.e. the former is a cluster manager for Spark applications while the latter is to submit Spark applications to a cluster manager for execution (that could be on any of the three supported cluster managers: Spark Standalone, Apache Mesos and Hadoop YARN).
See Submitting Applications.
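For completeness, a minimal sketch of the other option the quoted docs mention, passing the spark://HOST:PORT URL as the "master" argument to a SparkContext (the Java API is used here purely for illustration; the spark://my-master:7077 URL is taken from the question):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ConnectToStandaloneMaster {
    public static void main(String[] args) {
        // Use the URL that start-master.sh printed (also shown on the master's web UI)
        SparkConf conf = new SparkConf()
                .setAppName("master-url-example")
                .setMaster("spark://my-master:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println("Connected; default parallelism = " + sc.defaultParallelism());
        sc.stop();
    }
}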

Shut down local client of Hazelcast executor service

We are using a hazelcast executor service to distribute tasks across our cluster of servers.
We want to shut down one of our servers and take it out of the cluster, but allow it to keep working for a period to finish what it is doing while not accepting any new tasks from the Hazelcast executor service.
I don't want to shut down the hazelcast instance because the current tasks may need it to complete their work.
Shutting down the hazelcast executor service is not what I want. That shuts down the executor cluster-wide.
I would like to continue processing the tasks in the local queue until it is empty and then shut down.
Is there a way for me to let a node in the cluster continue to use hazelcast but tell it to stop accepting new tasks from the executor service?
Not that easily. However, you have member attributes (Member::setX/::getX), and you could set an attribute to signal "no new tasks please". When you submit a task, you either preselect a member to execute on based on that attribute, or you use the submit overload that takes a MemberSelector.
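A minimal sketch of that idea, assuming the Hazelcast 3.x attribute API (Member::setStringAttribute / ::getStringAttribute); the attribute name "draining" and the executor name "tasks" are chosen here for illustration:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IExecutorService;
import com.hazelcast.core.Member;
import com.hazelcast.core.MemberSelector;

import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;

public class DrainingMemberExample {

    // Tasks sent to other members must be serializable
    static class PingTask implements Callable<String>, Serializable {
        @Override
        public String call() {
            return "task executed";
        }
    }

    public static void main(String[] args) throws Exception {
        // Two members for demonstration: one draining, one still accepting work
        HazelcastInstance drainingMember = Hazelcast.newHazelcastInstance();
        HazelcastInstance activeMember = Hazelcast.newHazelcastInstance();

        // On the member that should stop receiving new executor tasks
        drainingMember.getCluster().getLocalMember().setStringAttribute("draining", "true");

        // Submitters skip members that advertise the attribute
        MemberSelector notDraining = new MemberSelector() {
            @Override
            public boolean select(Member member) {
                return !"true".equals(member.getStringAttribute("draining"));
            }
        };

        IExecutorService executor = activeMember.getExecutorService("tasks");
        Future<String> result = executor.submit(new PingTask(), notDraining);
        System.out.println(result.get());

        Hazelcast.shutdownAll();
    }
}

The draining member can then keep working through whatever is already queued locally and shut its HazelcastInstance down once it is idle.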

Apache Spark behavior when a node in a cluster fails.

What's the behavior when a partition is sent to a node and the node crashes right before executing a job? If a new node is introduced into the cluster, what's the entity that detects the addition of this new machine? Does the new machine get assigned the partition that didn't get processed?
The master considers a worker to have failed if it has not received a heartbeat message from it for the past 60 seconds (set by spark.worker.timeout). In that case the partition is assigned to another worker (remember that a partitioned RDD can be reconstructed even if a partition is lost).
As for whether a new node introduced into the cluster is detected: the standalone master does not automatically discover new machines. Before the application is submitted, sbin/start-master.sh starts the master and sbin/start-slaves.sh reads the conf/slaves file (containing the IP addresses of all slaves) on the spark-master machine and starts a slave instance on each machine listed. The master does not re-read this file after startup, so entries added to conf/slaves later are not picked up automatically; a worker on a new machine has to be started by hand and pointed at the master, at which point it registers itself, as in the sketch below.
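An illustrative sketch (the host name is an assumption; use whatever URL the master printed at startup) of registering a machine added later, and of tuning the heartbeat timeout mentioned above:

# on the new machine, start a worker and register it with the running master
./sbin/start-slave.sh spark://my-master:7077

# optionally, in conf/spark-env.sh on the master, adjust how long it waits for heartbeats (default 60 s)
SPARK_MASTER_OPTS="-Dspark.worker.timeout=120"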
