Is it possible to run multiple Spark applications on a mesos cluster? - apache-spark

I have a Mesos cluster with 1 Master and 3 slaves (with 2 cores and 4GB RAM each) that has a Spark application already up and running. I wanted to run another application on the same cluster, as the CPU and Memory utilization isn't high. Regardless, when I try to run the new Application, I get the error:
16/02/25 13:40:18 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
I guess the new process is not getting any CPU as the old one occupies all 6.
I have tried enabling dynamic allocation, making the spark app Fine grained. Assigning numerous combinations of executor cores and number of executors. What I am missing here? Is it possible to run a Mesos Cluster with multiple Spark Frameworks at all?

You can try setting spark.cores.max to limit the number of CPUs used by each Spark driver, which will free up some resources.
Docs: https://spark.apache.org/docs/latest/configuration.html#scheduling

Related

Can two executors / drivers from different Spark applications run on same node in cluster mode?

I read an article in Medium which claims that the number of executors + 1 (for driver), should be a multiple of 3, to efficiently utilize the core on a machine (16 cores, in this case, i.e, 5 per executor and 1 will be reserved for OS and node manager)
I am unable to validate this statement using experimenting on the cluster due to practical reasons. Did anybody try this? or have reference to code/documentation stating Yarn nodes will/not share cluster resources between another Spark application?
It's a big question, but in short - basing on the title and YARN in the text:
You get resources allocated by YARN that you requested via Spark(submit).
A Node has many Executors.
You cannot share an Executor at the same time, but the Executor can be relinquished if YARN Dynamic Resource Allocation is in effect, after a Stage has completed.
As a Node has many Executors, many Spark Apps can run their Tasks concurrently on the same Node, Worker, if they were granted those resources.

How to know the memory require for the job in spark in java?

I'm trying to run an application in spark (2.3.1) for java,the inconvenient is that every time I try to run the spark throw a message "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resource" after a few tries (in all of those tries spark add and remove executor in the same worker but in the same port). So anyone has an idea about how to solve this?
I'm using a master in the computer A and the worker in the computer B. Set the computer A with driver memory of 3g and the worker with 2g (this is because the app doesn't require too much memory) and 4 cores to use in the executor.
I check other similar questions an most of them were network or memory issues, I discard the network issue because other application I can run with the worker.
If you can run your application with one worker ,then assign less memory to your driver.
Driver should have 1g and the worker with 4g,this should done.
Also what all transformations and actions are you doing ?

Spark job in Dataproc dynamic vs static allocation

I have a Dataproc cluster:
master - 6cores| 32g
worker{0-7} - 6cores| 32g
Maximum allocation: memory:24576, vCores:6
Have two spark-streaming jobs to submit, one after another
In the first place, I tried to submit with default configurations spark.dynamicAllocation.enabled=true
In 30% of cases, I saw that the first job caught almost all available memory and the second was queued and waited for resources for ages. (This is a streaming job which took a small portion of resources every batch ).
My second try was to change a dynamic allocation. I submitted the same two jobs with identical configurations:
spark.dynamicAllocation.enabled=false
spark.executor.memory=12g
spark.executor.cores=3
spark.executor.instances=6
spark.driver.memory=8g
Surprisingly in Yarn UI I saw:
7 Running Containers with 84g Memory allocation for the first job.
3 Running Containers with 36g Memory allocation and 72g Reserved Memory for the second job
In Spark UI there are 6 executors and driver for the first job and 2 executors and driver for the second job
After retrying(deleting previous jobs and submitting the same jobs) without dynamic allocation and same configurations, I got a totally different result:
5 containers 59g Memory allocation for both jobs and 71g Reserved Memory for the second job. In spark UI I see 4 executors and driver in both cases.
I have a couple of questions:
If dynamicAllocation=false, why the number of yarn containers is
different from the number of executors? (Firstly I thought that
additional yarn container is a driver, but it differs in memory.)
If dynamicAllocation=false, Why Yarn doesn't create containers by my
exact requirements- 6 containers(spark executors) for both jobs. Why two different attempts with the same configuration lead to different results?
If dynamicAllocation=true - how may it be possible that low consuming memory spark job takes control of all Yarn resources
Thanks
Spark and YARN scheduling are pretty confusing. I'm going to answer the questions in reverse order:
3) You should not be using dynamic allocation in Spark streaming jobs.
The issue is that Spark continuously asks YARN for more executors as long as there's a backlog of tasks to run. Once a Spark job gets an executor, it keeps it until the executor is idle for 1 minute (configurable, of course). In batch jobs, this is okay because there's generally a large, continuous backlog of tasks.
However, in streaming jobs, there's a spike of tasks at the start of every micro-batch, but executors are actually idle most of the time. So a streaming job will grab a lot of executors that it doesn't need.
To fix this, the old streaming API (DStreams) has its own version of dynamic allocation: https://issues.apache.org/jira/browse/SPARK-12133. This JIRA has more background on why Spark's batch dynamic allocation algorithm isn't a good fit for streaming.
However, Spark Structured Streaming (likely what you're using) does not support dynamic allocation: https://issues.apache.org/jira/browse/SPARK-24815.
tl;dr Spark requests executors based on its task backlog, not based on memory used.
1 & 2) #Vamshi T is right. Every YARN application has an "Application Master", which is responsible for requesting containers for the application. Each of your Spark jobs has an app master that proxies requests for containers from the driver.
Your configuration doesn't seem to match what you're seeing in YARN, so not sure what's going on there. You have 8 workers with 24g given to YARN. With 12g executors, you should have 2 executors per node, for a total of 16 "slots". An app master + 6 executors should be 7 containers per application, so both applications should fit within the 16 slots.
We configure the app master to have less memory, that's why total memory for an application isn't a clean multiple of 12g.
If you want both applications to schedule all their executors concurrently, you should set spark.executor.instances=5.
Assuming you're using structured streaming, you could also just run both streaming jobs in the same Spark application (submitting them from different threads on the driver).
Useful references:
Running multiple jobs in one application: https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
Dynamic allocation: https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
Spark-on-YARN: https://spark.apache.org/docs/latest/running-on-yarn.html
I have noticed similar behavior in my experience as well and here is what I observed. Firstly the resource allocation by yarn depends on available resources on cluster when the job is submitted. When both jobs are submitted at almost the same time with same config, yarn distributes the available resources equally between the jobs. Now when you throw dynamic allocation in to the mix, things get a little confusing/complex. Now in your case below:
7 Running Containers with 84g Memory allocation for the first job.
--You got 7 containers because you requested 6 executors, one container for each executor and the extra one container is for the application Master
3 Running Containers with 36g Memory allocation and 72g Reserved Memory for the second job
--Since the second job was submitted after some time, Yarn allocated the remaining resources...2 containers, one for each executor and the extra one for your application master.
Your containers will never match the executors you requested and will always be one more than the number of executors you requested because you need one container to run your application master.
Hope that answers part of your question.

How does spark choose nodes to run executors?(spark on yarn)

How does spark choose nodes to run executors?(spark on yarn)
We use spark on yarn mode, with a cluster of 120 nodes.
Yesterday one spark job create 200 executors, while 11 executors on node1,
10 executors on node2, and other executors distributed equally on the other nodes.
Since there are so many executors on node1 and node2, the job run slowly.
How does spark select the node to run executors?
according to yarn resourceManager?
As you mentioned Spark on Yarn:
Yarn Services choose executor nodes for spark job based on the availability of the cluster resource. Please check queue system and dynamic allocation of Yarn. the best documentation https://blog.cloudera.com/blog/2016/01/untangling-apache-hadoop-yarn-part-3/
Cluster Manager allocates resources across the other applications.
I think the issue is with bad optimized configuration. You need to configure Spark on the Dynamic Allocation. In this case Spark will analyze cluster resources and add changes to optimize work.
You can find all information about Spark resource allocation and how to configure it here: http://site.clairvoyantsoft.com/understanding-resource-allocation-configurations-spark-application/
Are all 120 nodes having identical capacity?
Moreover the jobs will be submitted to a suitable node manager based on the health and resource availability of the node manager.
To optimise spark job, You can use dynamic resource allocation, where you do not need to define the number of executors required for running a job. By default it runs the application with the configured minimum cpu and memory. Later it acquires resource from the cluster for executing tasks. It will release the resources to the cluster manager once the job has completed and if the job is idle up to the configured idle timeout value. It reclaims the resources from the cluster once it starts again.

Why is the number of cores for driver and executors on YARN different from the number requested?

I deploy a spark job in the cluster mode with the following
Driver core - 1
Executor cores - 2
Number of executors - 2.
My understanding is that this application should occupy 5 cores in the cluster (4 executor cores and 1 driver core) but i dont observe this in the RM and Spark UIs.
On the resource manager UI, i see only 4 cores used for this application.
Even in Spark UI (on click of ApplicationMaster URL from RM), under the executors tab, the driver cores is shown as zero.
Am i missing something?
The cluster manager is YARN.
My understanding is that this application should occupy 5 cores in the cluster (4 executor cores and 1 driver core)
That's the perfect situation in YARN where it could give you 5 cores off the CPUs it manages.
but i dont observe this in the RM and Spark UIs.
Since the perfect situation does not occur often something it's nice to have as many cores as we could get from YARN so the Spark application could ever start.
Spark could just wait indefinitely for the requested cores, but that could not always be to your liking, could it?
That's why Spark on YARN has an extra check (aka minRegisteredRatio) that's the minimum of 80% of cores requested before the application starts executing tasks. You can use spark.scheduler.minRegisteredResourcesRatio Spark property to control the ratio. That would explain why you see less cores in use than requested.
Quoting the official Spark documentation (highlighting mine):
spark.scheduler.minRegisteredResourcesRatio
0.8 for YARN mode
The minimum ratio of registered resources (registered resources / total expected resources) (resources are executors in yarn mode, CPU cores in standalone mode and Mesos coarsed-grained mode ['spark.cores.max' value is total expected resources for Mesos coarse-grained mode] ) to wait for before scheduling begins. Specified as a double between 0.0 and 1.0. Regardless of whether the minimum ratio of resources has been reached, the maximum amount of time it will wait before scheduling begins is controlled by config spark.scheduler.maxRegisteredResourcesWaitingTime.

Resources