Spark with YARN: Is there any point in providing Spark resource-related parameters? - apache-spark

I am reading through literature about Spark and resource management, i.e. YARN in my case.
I think I understand the basic concept and how YARN encapsulates the Spark master/workers in containers.
Is there any point in still providing resource parameters such as --driver-memory, --executor-memory or --num-executors? Shouldn't the YARN application master (the Spark master) figure out the demand and request new resources accordingly?
Or is it wise to interfere in the resource negotiation process by providing these parameters?

Spark needs to negotiate the resources from YARN. Providing the resource-parameters tells Spark how many resources to request from YARN.
For executors on YARN:
Spark applications use a fixed number of executors (default = 2).
The --num-executors flag for spark-submit, spark-shell, etc. sets the number of executors as expected.
For memory management on YARN:
Set the memory used by each executor using --executor-memory.
Setting --executor-cores tells Spark how many cores to claim from YARN.
Set the amount of memory for the driver process with --driver-memory.
Some general Spark-on-YARN notes:
Use the --queue option if your YARN cluster schedules application into queues.
Spark is optimized for in-memory computation, so ask YARN for a smaller number of memory-heavy executors (with multiple cores and more memory). Be careful if you have set memory caps within YARN.
The Spark on YARN Documentation has more details.
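As a rough sketch of how these flags map onto Spark configuration keys, here is a minimal Scala example; the app name, queue and resource values are placeholder assumptions, and --driver-memory has no programmatic equivalent here because the driver JVM is already running by the time this code executes:

import org.apache.spark.sql.SparkSession

// Sketch only: builder settings that mirror the spark-submit flags above.
// --executor-memory 4g -> spark.executor.memory
// --executor-cores 2   -> spark.executor.cores
// --num-executors 10   -> spark.executor.instances
// --queue default      -> spark.yarn.queue
val spark = SparkSession.builder()
  .appName("resource-flags-sketch")          // placeholder name
  .master("yarn")                            // requires HADOOP_CONF_DIR / YARN_CONF_DIR to point at the cluster config
  .config("spark.executor.memory", "4g")
  .config("spark.executor.cores", "2")
  .config("spark.executor.instances", "10")
  .config("spark.yarn.queue", "default")
  .getOrCreate()

The same keys can of course also go into spark-defaults.conf or directly onto the spark-submit command line.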

Related

How do the YARN and Spark parameters interplay?

There are parameters that decide the maximum, minimum and total amount of memory and CPU that YARN can allocate via containers, for example:
yarn.nodemanager.resource.memory-mb
yarn.scheduler.maximum-allocation-mb
yarn.scheduler.minimum-allocation-mb
yarn.nodemanager.resource.cpu-vcores
yarn.scheduler.maximum-allocation-vcores
yarn.scheduler.minimum-allocation-vcores
There are also Spark-side parameters that would seemingly control similar kinds of allocations:
spark.executor.instances
spark.executor.memory
spark.executor.cores
etc
What happens when the two sets of parameters are infeasible according to the bounds set by the other? For example, what if yarn.scheduler.maximum-allocation-mb is set to 1G and spark.executor.memory is set to 2G? Similar conflicts and infeasibilities could be imagined for the other parameters as well.
What happens in such cases? And what is the suggested way to set these parameters?
When running Spark on YARN, each Spark executor runs as a YARN container.
So take spark.executor.memory as an example:
If spark.executor.memory is 2G and yarn.scheduler.maximum-allocation-mb is 1G, then YARN cannot allocate a container that large, so the executor request is rejected and the application fails to get its executors.
If spark.executor.memory is 2G and yarn.scheduler.minimum-allocation-mb is 4G, then the container YARN grants is rounded up to 4G, much bigger than the Spark application needs.
Suggestions for setting these parameters depend on your hardware resources and on the other services running on the machines. You can start with the default values and then adjust them by monitoring machine resources.
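To make the interplay concrete, here is a small back-of-the-envelope sketch in Scala; the YARN values are assumptions for illustration, and the overhead follows the usual Spark-on-YARN default of max(384 MB, 10% of executor memory):

// Illustrative container-sizing arithmetic; all values are assumptions.
val executorMemoryMb = 2048                                            // spark.executor.memory = 2g
val memoryOverheadMb = math.max(384, (executorMemoryMb * 0.10).toInt)  // default off-heap overhead
val requestedMb      = executorMemoryMb + memoryOverheadMb             // what Spark asks YARN for per executor

val yarnMinAllocationMb = 1024                                         // yarn.scheduler.minimum-allocation-mb (assumed)
val yarnMaxAllocationMb = 8192                                         // yarn.scheduler.maximum-allocation-mb (assumed)

// YARN typically rounds each request up to a multiple of the minimum allocation
// and refuses anything above the maximum allocation.
val grantedMb =
  if (requestedMb > yarnMaxAllocationMb)
    sys.error(s"executor request of ${requestedMb} MB exceeds yarn.scheduler.maximum-allocation-mb")
  else
    math.ceil(requestedMb.toDouble / yarnMinAllocationMb).toInt * yarnMinAllocationMb

println(s"Spark requests ${requestedMb} MB per executor, YARN grants a ${grantedMb} MB container")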
This excellent Cloudera Community thread, https://community.cloudera.com/t5/Support-Questions/Yarn-container-size-flexible-to-satisfy-what-application-ask/m-p/115458, and the question "Difference between `yarn.scheduler.maximum-allocation-mb` and `yarn.nodemanager.resource.memory-mb`?" should give you the basics. Additionally, here is a good related SO answer: "Spark on YARN resource manager: Relation between YARN Containers and Spark Executors".
TL;DR
As you are not talking about Kubernetes, YARN as the resource / cluster manager allocates executors with the needed resources, based on the Spark parameters / defaults, which must fit within the YARN parameters set for the containers.
1 container = 1 executor. Some state, incorrectly, that 1 container holds N executors; that is not so.
There are minimum and maximum allocations of resources, based on those YARN parameters. So YARN will provide executors with some wastage of resources if it can, or executors of restricted size.
With non-dynamic resource allocation, apps either start with fewer resources or have to wait to get all the requested resources, and the resources acquired are not available to other applications.
There is also a fair scheduler for smoother, more uniform throughput across many concurrent apps.

Spark Standalone vs YARN

What features of YARN make it better than Spark Standalone mode for a multi-tenant cluster running only Spark applications? Maybe besides authentication.
There are a lot of answers on Google, and most of them sound wrong to me, so I'm not sure where the truth is.
For example:
DZone, Deep Dive Into Spark Cluster Management
Standalone is good for small Spark clusters, but it is not good for
bigger clusters (there is an overhead of running Spark daemons —
master + slave — in cluster nodes)
But other cluster managers also require running agents on cluster nodes. E.g., YARN's slaves are called NodeManagers. They may consume even more memory than Spark's slaves (Spark's default is 1 GB).
This answer
The Spark standalone mode requires each application to run an executor
on every node in the cluster; whereas with YARN, you choose the number
of executors to use
against Spark Standalone # executor/cores control, which shows how you can specify the number of consumed resources in Standalone mode.
Spark Standalone Mode documentation
The standalone cluster mode currently only supports a simple FIFO
scheduler across applications.
Against the fact that Standalone mode can use dynamic allocation, where you can specify spark.dynamicAllocation.minExecutors & spark.dynamicAllocation.maxExecutors. Also, I haven't found any note saying that Standalone doesn't support the FairScheduler.
This answer
YARN directly handles rack and machine locality
How would YARN know anything about data locality in my job? Suppose I'm storing file locations in AWS Glue (used by EMR as the Hive metastore). Inside the Spark job I'm querying some-db.some-table. How would YARN know which executor is better for the job assignment?
UPD: found another mention of YARN and data locality: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-data-locality.html. It still doesn't matter in the case of S3, for example.

How does Spark choose nodes to run executors? (Spark on YARN)

We use Spark in YARN mode, with a cluster of 120 nodes.
Yesterday one Spark job created 200 executors, with 11 executors on node1, 10 executors on node2, and the other executors distributed equally across the remaining nodes.
Since there were so many executors on node1 and node2, the job ran slowly.
How does Spark select the nodes to run executors on?
According to the YARN ResourceManager?
As you mentioned Spark on YARN:
YARN chooses the executor nodes for a Spark job based on the availability of cluster resources. Please check the queue system and dynamic allocation of YARN. The best documentation: https://blog.cloudera.com/blog/2016/01/untangling-apache-hadoop-yarn-part-3/
The cluster manager allocates resources across all the applications.
I think the issue is a badly optimized configuration. You need to configure Spark for dynamic allocation. In that case Spark will analyze the cluster resources and adjust to optimize the work.
You can find all the information about Spark resource allocation and how to configure it here: http://site.clairvoyantsoft.com/understanding-resource-allocation-configurations-spark-application/
Are all 120 nodes of identical capacity?
Moreover, jobs will be submitted to a suitable NodeManager based on the health and resource availability of that NodeManager.
To optimise a Spark job, you can use dynamic resource allocation, where you do not need to define the number of executors required to run a job. By default the application starts with the configured minimum CPU and memory, and later acquires more resources from the cluster to execute tasks. It releases the resources back to the cluster manager once the job has completed, or if the job stays idle up to the configured idle timeout, and it reclaims resources from the cluster once work starts again.
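As a sketch of the dynamic-allocation setup described above (the bounds and timeout are illustrative assumptions; on YARN the external shuffle service also has to be enabled as an auxiliary service in the NodeManagers):

import org.apache.spark.sql.SparkSession

// Illustrative dynamic-allocation configuration; all values are placeholders.
val spark = SparkSession.builder()
  .appName("dynamic-allocation-sketch")
  .master("yarn")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")               // keeps shuffle files alive when executors are removed
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.initialExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "50")
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")  // idle executors are released after this timeout
  .getOrCreate()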

How does Spark dynamic resource allocation work on YARN (with regard to NodeManagers)?

Let's assume that I have 4 NodeManagers and I have configured Spark in yarn-client mode. Then I set dynamic allocation to true to automatically add or remove executors based on workload. If I understand correctly, each Spark executor runs as a YARN container.
So, if I add more NodeManagers, will the number of executors increase?
If I remove a NodeManager while a Spark application is running, will something happen to that application?
Can I add/remove executors based on other metrics? If the answer is yes, is there a function, preferably in Python, that does that?
If I understand correctly, each Spark executor runs as a Yarn container.
Yes. That's how it happens for any application deployed to YARN, Spark included. Spark is not in any way special to YARN.
So, if I add more NM will the number of executors increase ?
No. There's no relationship between the number of YARN NodeManagers and Spark's executors.
From Dynamic Resource Allocation:
Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. This means that your application may give resources back to the cluster if they are no longer used and request them again later when there is demand.
As you may have guessed by now, it is irrelevant how many NodeManagers you have in your cluster; it is the workload that determines when Spark decides whether to request new executors or remove some.
If I remove a NM while a Spark application is running, something will happen to that application?
Yes, but only when Spark uses that NM for executors. After all, a NodeManager gives resources (CPU and memory) to the YARN cluster manager, which in turn gives them to applications like Spark applications. If you take them back, say by shutting the node down, the resources won't be available anymore, and the process of a Spark executor simply dies (like any other process with no resources to run on).
Can I add/remove executors based on other metrics ?
Yes, but usually it's Spark's job (no pun intended) to do the calculation and request new executors.
You can use SparkContext to manage executors using killExecutors, requestExecutors and requestTotalExecutors methods.
killExecutor(executorId: String): Boolean Request that the cluster manager kill the specified executor.
requestExecutors(numAdditionalExecutors: Int): Boolean Request an additional number of executors from the cluster manager.
requestTotalExecutors(numExecutors: Int, localityAwareTasks: Int, hostToLocalTaskCount: Map[String, Int]): Boolean Update the cluster manager on our scheduling needs.
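A minimal sketch of driving these calls from application code, assuming a running SparkSession named spark (as in spark-shell); the executor ID is a hypothetical placeholder, and the calls only have an effect on cluster managers that support dynamic executor management, such as YARN:

// Sketch only: developer-API calls for manual executor management.
val sc = spark.sparkContext

// Ask the cluster manager for two additional executors.
val granted: Boolean = sc.requestExecutors(2)

// Ask the cluster manager to kill one specific executor.
// "3" is a hypothetical executor ID; real IDs are visible in the Spark UI.
val killed: Boolean = sc.killExecutor("3")

// Or declare the total number of executors wanted, with no locality preferences.
sc.requestTotalExecutors(numExecutors = 10, localityAwareTasks = 0,
  hostToLocalTaskCount = Map.empty[String, Int])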

Spark in fine-grained mode holds resources even when it is idle and not performing any actions

I start Spark in fine-grained mode with the Mesos cluster manager.
spark-shell.sh --conf 'spark.mesos.coarse=false' --executor-memory 20g --driver-memory 5g
And I can see on the Mesos UI that it doesn't use any resources, which is fine. Then I perform some action, and while the action is running Spark uses all the cluster resources, which is also fine.
But when the action is done, Spark still holds some CPU and memory forever.
Why does Spark still need some resources when it is idle and not performing any actions, and how can I release all resources while it is idle?
Try configuring e.g. spark.mesos.mesosExecutor.cores = 0.5, which limits the number of cores used by each executor (in fine-grained mode).
You might also consider lowering executor-memory, depending on how your job behaves.
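For example, a minimal sketch of starting a session with that setting (the master URL and values are placeholder assumptions):

import org.apache.spark.sql.SparkSession

// Illustrative only: cap the CPU share each fine-grained Mesos executor keeps while idle.
val spark = SparkSession.builder()
  .appName("mesos-fine-grained-sketch")
  .master("mesos://mesos-master:5050")                // placeholder Mesos master URL
  .config("spark.mesos.coarse", "false")              // fine-grained mode, as in the question
  .config("spark.mesos.mesosExecutor.cores", "0.5")   // cores held per Mesos executor when no task is running
  .config("spark.executor.memory", "20g")
  .getOrCreate()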

Resources