Apache Spark running on YARN with fixed allocation - apache-spark

What's happening right now is that YARN simply takes a number of executors from one Spark job and gives them to another Spark job. As a result, the first Spark job encounters an error and dies.
Is there a way, or an existing configuration, to give a certain Spark job running on YARN a fixed resource allocation?

Fixed resource allocation is an old concept and doesn't give the benefit of proper resource utilization. Dynamic resource allocation is an advanced and expected feature of YARN. So I recommend that you check what is actually happening. If a job is already running, YARN doesn't take its resources and give them to others. If resources are not available, the second job is queued, and resources are not pulled abruptly from the first job. The reason is that a container is a combination of memory and CPU. If its memory were allocated to another job, the JVM of the first job would be lost for good. YARN doesn't do what you have described.

Related

What will happen if my driver or executor is lost in Spark while running a Spark application?

Three similar questions:
What will happen if one of my executors is lost?
What will happen if my driver is lost?
What will happen in case of stage failure?
In all the above cases, is the job recoverable? If yes, how do I recover? Is there any option in "SparkConf" that can prevent these failures?
Thanks.
Spark uses job scheduling. The scheduler works together with the cluster manager (Standalone, YARN, Mesos), and failed tasks can be re-scheduled.
For example, if you use YARN, try tweaking spark.yarn.maxAppAttempts and yarn.resourcemanager.am.max-attempts. Also, you can try to manually track jobs using the HTTP API: https://community.hortonworks.com/articles/28070/starting-spark-jobs-directly-via-yarn-rest-api.html
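As a rough sketch of the retry settings mentioned above, the snippet below assembles `--conf` flags for a spark-submit command line. The helper name `build_submit_args`, the file name `my_job.py`, and the numeric values are illustrative assumptions. `spark.task.maxFailures` is an extra Spark setting not mentioned above, added only to show a per-task retry limit. `yarn.resourcemanager.am.max-attempts` is a cluster-side YARN setting and is not set here.

```python
def build_submit_args(app, conf):
    """Return a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit", "--master", "yarn"]
    # Sort keys so the generated command line is stable between runs.
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# Illustrative values only; tune them for your own cluster.
retry_conf = {
    "spark.yarn.maxAppAttempts": 4,  # application-level retries on YARN
    "spark.task.maxFailures": 8,     # per-task retries (assumption, see lead-in)
}

submit_args = build_submit_args("my_job.py", retry_conf)
print(" ".join(submit_args))
```

The effective number of application attempts is capped by the cluster-wide YARN setting, so raising only the Spark value may not be enough.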
If you want to recover from logical errors, you can try checkpointing (saving records to HDFS for later use): https://mallikarjuna_g.gitbooks.io/spark/content/spark-streaming/spark-streaming-checkpointing.html. (For really long and important pipelines I recommend saving your data in normal files instead of checkpoints!)
Configuring highly available clusters is a more complex task than tweaking one setting in SparkConf. You can try to implement different scenarios and come back with more detailed questions. As a first step, you can try to run everything on YARN.

Spark job on Kubernetes Under Resource Starvation Waits Indefinitely For SPARK_MIN_EXECUTORS

I am using Spark 3.0.1 and working on a Spark deployment on Kubernetes, where Kubernetes acts as the cluster manager for the Spark job and Spark submits the job in client mode. If the cluster does not have sufficient resources (CPU/memory) for the minimum number of executors, the executors stay in the Pending state indefinitely until resources are freed.
Suppose the cluster configuration is:
total Memory=204Gi
used Memory=200Gi
free memory= 4Gi
spark.executor.memory=10G
spark.dynamicAllocation.minExecutors=4
spark.dynamicAllocation.maxExecutors=8
Here the job should not be submitted, because the number of executors that can be allocated is less than MIN_EXECUTORS.
How can the driver abort the job in this scenario?
First, I would like to mention that Spark dynamic allocation is not supported on Kubernetes yet (as of version 3.0.1); it is in the pipeline for a future release (Link).
For the requirement you have posted, you could run a resource-monitoring code snippet before the job is initialized and terminate the initialization pod itself with an error if resources are insufficient.
If you want to do this from the CLI, you could use kubectl describe nodes or the kube-capacity utility to monitor the resources.
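A minimal sketch of such a pre-flight check, using the numbers from the question. The function name `can_start` is hypothetical, and the free-memory figure is passed in by hand; in practice it would come from whatever monitoring you use (for example the output of kubectl describe nodes). The check ignores per-executor memory overhead and CPU, so treat it as a lower bound on what the job needs.

```python
def can_start(free_gib, executor_memory_gib, min_executors):
    """Return True if free memory covers the minimum number of executors.

    Simplification: memory overhead and CPU requests are ignored.
    """
    return free_gib >= executor_memory_gib * min_executors


# Numbers from the question: 204Gi total, 200Gi used, 10G executors, min 4.
total_gib, used_gib = 204, 200
free_gib = total_gib - used_gib

if can_start(free_gib, executor_memory_gib=10, min_executors=4):
    print("enough memory, submitting job")
else:
    # A real wrapper script would exit non-zero here instead of submitting.
    print("not enough memory for the minimum executors, aborting")
```

With 4Gi free and 40G required, the check fails and the job is never submitted, so no executors are left in the Pending state.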

Spark Jobs in ACCEPTED state though there are resources

We are running Spark jobs on AWS EMR, and we quite frequently face an issue where jobs stay in the ACCEPTED state and don't move to the RUNNING state, even though resources are available or there are no running jobs. It's also quite strange to see that Memory Used is around 400 GB, which should have been released given that there are currently no running jobs.
What steps or configuration changes are needed to resolve this issue?
Note: Jobs are running with dynamic allocation, as the cluster is scalable.
Scheduler Type: Fair Scheduler
Please let me know if any additional information is required.

Spark job in Dataproc dynamic vs static allocation

I have a Dataproc cluster:
master - 6cores| 32g
worker{0-7} - 6cores| 32g
Maximum allocation: memory:24576, vCores:6
I have two Spark Streaming jobs to submit, one after another.
At first, I tried submitting with the default configuration, spark.dynamicAllocation.enabled=true.
In 30% of cases, I saw that the first job grabbed almost all available memory and the second was queued and waited for resources for ages. (This is a streaming job that takes a small portion of resources every batch.)
My second attempt was to turn off dynamic allocation. I submitted the same two jobs with identical configurations:
spark.dynamicAllocation.enabled=false
spark.executor.memory=12g
spark.executor.cores=3
spark.executor.instances=6
spark.driver.memory=8g
Surprisingly, in the YARN UI I saw:
7 Running Containers with 84g Memory allocation for the first job.
3 Running Containers with 36g Memory allocation and 72g Reserved Memory for the second job
In Spark UI there are 6 executors and driver for the first job and 2 executors and driver for the second job
After retrying (deleting the previous jobs and submitting the same jobs again) without dynamic allocation and with the same configuration, I got a totally different result:
5 containers with 59g Memory allocation for both jobs and 71g Reserved Memory for the second job. In the Spark UI I see 4 executors and a driver in both cases.
I have a couple of questions:
1) If dynamicAllocation=false, why is the number of YARN containers different from the number of executors? (At first I thought the additional YARN container was the driver, but it differs in memory.)
2) If dynamicAllocation=false, why doesn't YARN create containers matching my exact requirements, that is, 6 containers (Spark executors) for both jobs? Why do two different attempts with the same configuration lead to different results?
3) If dynamicAllocation=true, how is it possible that a Spark job with low memory consumption takes control of all YARN resources?
Thanks
Spark and YARN scheduling are pretty confusing. I'm going to answer the questions in reverse order:
3) You should not be using dynamic allocation in Spark streaming jobs.
The issue is that Spark continuously asks YARN for more executors as long as there's a backlog of tasks to run. Once a Spark job gets an executor, it keeps it until the executor is idle for 1 minute (configurable, of course). In batch jobs, this is okay because there's generally a large, continuous backlog of tasks.
However, in streaming jobs, there's a spike of tasks at the start of every micro-batch, but executors are actually idle most of the time. So a streaming job will grab a lot of executors that it doesn't need.
To fix this, the old streaming API (DStreams) has its own version of dynamic allocation: https://issues.apache.org/jira/browse/SPARK-12133. This JIRA has more background on why Spark's batch dynamic allocation algorithm isn't a good fit for streaming.
However, Spark Structured Streaming (likely what you're using) does not support dynamic allocation: https://issues.apache.org/jira/browse/SPARK-24815.
tl;dr Spark requests executors based on its task backlog, not based on memory used.
1 & 2) @Vamshi T is right. Every YARN application has an "Application Master", which is responsible for requesting containers for the application. Each of your Spark jobs has an app master that proxies requests for containers from the driver.
Your configuration doesn't seem to match what you're seeing in YARN, so not sure what's going on there. You have 8 workers with 24g given to YARN. With 12g executors, you should have 2 executors per node, for a total of 16 "slots". An app master + 6 executors should be 7 containers per application, so both applications should fit within the 16 slots.
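The slot arithmetic in the paragraph above can be sketched as follows. The function names are hypothetical, and the model is deliberately simplified: it ignores the executor memory overhead that YARN adds on top of spark.executor.memory and counts the app master as a full slot, so real clusters may fit fewer containers per node.

```python
def executor_slots(workers, yarn_mem_per_node_gb, executor_mem_gb):
    """Executor-sized slots across the cluster (memory overhead ignored)."""
    return workers * (yarn_mem_per_node_gb // executor_mem_gb)


def containers_per_app(executor_instances):
    """One container per executor plus one for the application master."""
    return executor_instances + 1


# Numbers from the question: 8 workers, 24g for YARN per node, 12g executors.
slots = executor_slots(workers=8, yarn_mem_per_node_gb=24, executor_mem_gb=12)
needed = 2 * containers_per_app(executor_instances=6)

print(slots, needed, needed <= slots)  # prints: 16 14 True
```

Under this simplified model both applications fit; the mismatch with what the YARN UI shows suggests that something the model leaves out, such as memory overhead, is reducing the real number of slots.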
We configure the app master to have less memory, which is why the total memory for an application isn't a clean multiple of 12g.
If you want both applications to schedule all their executors concurrently, you should set spark.executor.instances=5.
Assuming you're using structured streaming, you could also just run both streaming jobs in the same Spark application (submitting them from different threads on the driver).
Useful references:
Running multiple jobs in one application: https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
Dynamic allocation: https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
Spark-on-YARN: https://spark.apache.org/docs/latest/running-on-yarn.html
I have noticed similar behavior in my experience as well, and here is what I observed. First, resource allocation by YARN depends on the resources available on the cluster when the job is submitted. When both jobs are submitted at almost the same time with the same config, YARN distributes the available resources equally between the jobs. When you throw dynamic allocation into the mix, things get a little more confusing. Now, in your case:
7 Running Containers with 84g Memory allocation for the first job.
--You got 7 containers because you requested 6 executors, one container for each executor and the extra one container is for the application Master
3 Running Containers with 36g Memory allocation and 72g Reserved Memory for the second job
--Since the second job was submitted after some time, Yarn allocated the remaining resources...2 containers, one for each executor and the extra one for your application master.
The container count will never match the number of executors you requested; it will always be one more, because one container is needed to run your application master.
Hope that answers part of your question.

Zeppelin persists job in YARN

When I run a Spark job from Zeppelin, the job finishes successfully, but it stays in the RUNNING state in YARN.
The problem is that the job keeps holding resources in YARN. I think that Zeppelin persists the job in YARN.
How can I resolve this problem?
Thank you
There are two solutions.
The quick one is to use the "restart interpreter" functionality, which is misnamed, since it merely stops the interpreter, which in this case is the Spark job in YARN.
The elegant one is to configure Zeppelin to use dynamic allocation with Spark. In that case the Yarn application master will continue running, and with it the Spark driver, but all executors (which are the real resource hog) can be freed by Yarn, when they're not in use.
The easiest and most straightforward solution is to restart the Spark interpreter.
But as Rick mentioned, if you use Spark dynamic allocation, an additional step is required: enabling the Spark shuffle service on all agent nodes (it is disabled by default).
Just close your Spark context so that the Spark job gets the status FINISHED.
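A small sketch of the dependency described above: if dynamic allocation is turned on, the shuffle service should be turned on too. The function name `check_dynamic_allocation` is hypothetical, and the settings are modeled as a plain dict of strings rather than read from a real Zeppelin interpreter configuration. It only checks the Spark-side flags and does not verify that the shuffle service is actually installed on the nodes.

```python
def check_dynamic_allocation(conf):
    """Return a list of problems found in a dict of Spark settings."""
    problems = []
    dynamic = conf.get("spark.dynamicAllocation.enabled", "false") == "true"
    shuffle = conf.get("spark.shuffle.service.enabled", "false") == "true"
    if dynamic and not shuffle:
        problems.append(
            "spark.dynamicAllocation.enabled=true but "
            "spark.shuffle.service.enabled is not true"
        )
    return problems


# Hypothetical interpreter settings with the shuffle service left at its default.
interpreter_conf = {"spark.dynamicAllocation.enabled": "true"}
for problem in check_dynamic_allocation(interpreter_conf):
    print("config problem:", problem)
```

Adding "spark.shuffle.service.enabled": "true" to the dict makes the check pass.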
Your memory should be released.
