How does spark choose nodes to run executors?(spark on yarn) - apache-spark

How does spark choose nodes to run executors?(spark on yarn)
We use spark on yarn mode, with a cluster of 120 nodes.
Yesterday one spark job create 200 executors, while 11 executors on node1,
10 executors on node2, and other executors distributed equally on the other nodes.
Since there are so many executors on node1 and node2, the job run slowly.
How does spark select the node to run executors?
according to yarn resourceManager?

As you mentioned Spark on Yarn:
Yarn Services choose executor nodes for spark job based on the availability of the cluster resource. Please check queue system and dynamic allocation of Yarn. the best documentation https://blog.cloudera.com/blog/2016/01/untangling-apache-hadoop-yarn-part-3/

Cluster Manager allocates resources across the other applications.
I think the issue is with bad optimized configuration. You need to configure Spark on the Dynamic Allocation. In this case Spark will analyze cluster resources and add changes to optimize work.
You can find all information about Spark resource allocation and how to configure it here: http://site.clairvoyantsoft.com/understanding-resource-allocation-configurations-spark-application/

Are all 120 nodes having identical capacity?
Moreover the jobs will be submitted to a suitable node manager based on the health and resource availability of the node manager.
To optimise spark job, You can use dynamic resource allocation, where you do not need to define the number of executors required for running a job. By default it runs the application with the configured minimum cpu and memory. Later it acquires resource from the cluster for executing tasks. It will release the resources to the cluster manager once the job has completed and if the job is idle up to the configured idle timeout value. It reclaims the resources from the cluster once it starts again.

Related

Does executors in spark nd application master in yarn do the same job?

Does executors in spark nd application master in yarn do the same
In Spark, there is a Driver and Executors. I'm not gonna go into detail of what driver and executors are but in a one-liner, the driver manages the job flow and schedules tasks, and Executors are worker nodes processes in charge of running individual tasks.
YARN is basically a resource manager which allocates memory to compute engines. Now, this compute engine can be Spark/Tez/Map-reduce. What you need to understand here is when YARN successfully allocates memory they are called containers.
Now when Spark Job is deployed in YARN, Assuming that YARN has sufficient memory for the spark job to run, Yarn first allocates resources as containers for Spark Application Master which will have the driver program (in case of cluster mode). This Application Master will further requests resources for spark executors which YARN will further allocate as containers. So spark job will have multiple containers, one for the driver program and n containers for n executors. So you see in the computing sense the fundamental difference between spark running in a spark cluster and spark running in YARN is the use of containers.
So executors and application master in YARN run inside containers and do the same thing as spark on spark clusters.

Spark job on Kubernetes Under Resource Starvation Wait Indefinitely For SPARK_MIN_EXECUTORS

I am using Spark 3.0.1 and working on a project spark deployment on Kubernetes where Kubernetes acting cluster manager for spark job and spark submits the job using client mode. In case Cluster does not have sufficient resource (CPU/ Memory ) for minimum number of executors , the executors goes in Pending State for indefinite time until the resource gets free.
Suppose, Cluster Configurations are:
total Memory=204Gi
used Memory=200Gi
free memory= 4Gi
SPARK.EXECUTOR.MEMORY=10G
SPARK.DYNAMICALLOCTION.MINEXECUTORS=4
SPARK.DYNAMICALLOCATION.MAXEXECUTORS=8
Here job should not be submitted as executors allocated are less than MIN_EXECUTORS.
How can driver abort the job in this scenario?
Firstly would like to mention that, spark dynamic allocation not supported for kubernetes yet(as of version 3.0.1), its in pipeline for future release Link
while for the requirement you have posted, you could address by running a resource monitor code snippet before the job initialized and terminate the initialization pod itself with error.
if you want to run this from CLI you could use kubectl describe nodes/ kube-capacity utility to monitor the resources

GCP Dataproc : CPUs and Memory for Spark Job

I am completely new to GCP. Is it the user who must manage the amount of allocated memory for the driver and workers and the number of CPUs to run the Spark job in a Dataproc cluster? If yes, what are the aspects of Elasticity for Dataproc usage?
Thank you.
Usually you don't, resources of a Dataproc cluster are managed by YARN, Spark jobs are automatically configured to make use of them. In particular, Spark dynamic allocation is enabled by default. But your application code still matters, e.g., you need to specify the appropriate number of partitions.

How does spark dynamic resource allocation work on YARN (with regards to NodeManagers)?

Let's assume that I have 4 NM and I have configured spark in yarn-client mode. Then, I set dynamic allocation to true to automatically add or remove a executor based on workload. If I understand correctly, each Spark executor runs as a Yarn container.
So, if I add more NM will the number of executors increase ?
If I remove a NM while a Spark application is running, something will happen to that application?
Can I add/remove executors based on other metrics ? If the answer is yes, there is a function, preferably in python,that does that ?
If I understand correctly, each Spark executor runs as a Yarn container.
Yes. That's how it happens for any application deployed to YARN, Spark including. Spark is not in any way special to YARN.
So, if I add more NM will the number of executors increase ?
No. There's no relationship between the number of YARN NodeManagers and Spark's executors.
From Dynamic Resource Allocation:
Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. This means that your application may give resources back to the cluster if they are no longer used and request them again later when there is demand.
As you may have guessed correctly by now, it is irrelevant how many NMs you have in your cluster and it's by the workload when Spark decides whether to request new executors or remove some.
If I remove a NM while a Spark application is running, something will happen to that application?
Yes, but only when Spark uses that NM for executors. After all, NodeManager gives resources (CPU and memory) to a YARN cluster manager that will in turn give them to applications like Spark applications. If you take them back, say by shutting the node down, the resource won't be available anymore and the process of a Spark executor simply dies (as any other process with no resources to run).
Can I add/remove executors based on other metrics ?
Yes, but usually it's Spark job (no pun intended) to do the calculation and requesting new executors.
You can use SparkContext to manage executors using killExecutors, requestExecutors and requestTotalExecutors methods.
killExecutor(executorId: String): Boolean Request that the cluster manager kill the specified executor.
requestExecutors(numAdditionalExecutors: Int): Boolean Request an additional number of executors from the cluster manager.
requestTotalExecutors(numExecutors: Int, localityAwareTasks: Int, hostToLocalTaskCount: Map[String, Int]): Boolean Update the cluster manager on our scheduling needs.

Is it possible to run multiple Spark applications on a mesos cluster?

I have a Mesos cluster with 1 Master and 3 slaves (with 2 cores and 4GB RAM each) that has a Spark application already up and running. I wanted to run another application on the same cluster, as the CPU and Memory utilization isn't high. Regardless, when I try to run the new Application, I get the error:
16/02/25 13:40:18 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
I guess the new process is not getting any CPU as the old one occupies all 6.
I have tried enabling dynamic allocation, making the spark app Fine grained. Assigning numerous combinations of executor cores and number of executors. What I am missing here? Is it possible to run a Mesos Cluster with multiple Spark Frameworks at all?
You can try setting spark.cores.max to limit the number of CPUs used by each Spark driver, which will free up some resources.
Docs: https://spark.apache.org/docs/latest/configuration.html#scheduling

Resources