Spark driver pod eviction on Kubernetes - apache-spark

What would be the recommended approach to wait for the Spark driver pod to complete its currently running job before it gets evicted to a new node while maintenance (kernel upgrade, hardware maintenance, etc.) is going on on the current node, using the drain command? A sketch of the workflow I mean is below.
I don't think I can use a PodDisruptionBudget, as the Spark pods' deployment YAML(s) are handled by Kubernetes.
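For reference, a minimal sketch of that workflow, assuming a recent kubectl; the node, pod and namespace names are placeholders, and --delete-emptydir-data was called --delete-local-data on older kubectl versions:

# Stop new pods from landing on the node, but leave running pods alone.
kubectl cordon <node>

# Poll until the Spark driver pod completes (Succeeded/Failed) or disappears.
while phase=$(kubectl get pod <spark-driver-pod> -n <namespace> \
        -o jsonpath='{.status.phase}' 2>/dev/null) \
      && [ "$phase" != "Succeeded" ] && [ "$phase" != "Failed" ]; do
  sleep 30
done

# Now it is safe to evict everything else and start maintenance.
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data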

Related

Spark job on Kubernetes Under Resource Starvation Wait Indefinitely For SPARK_MIN_EXECUTORS

I am using Spark 3.0.1 and working on a project that deploys Spark on Kubernetes, where Kubernetes acts as the cluster manager for the Spark job and Spark submits the job using client mode. In case the cluster does not have sufficient resources (CPU/memory) for the minimum number of executors, the executors go into a Pending state indefinitely until resources get freed.
Suppose the cluster configuration is:
total memory = 204Gi
used memory = 200Gi
free memory = 4Gi
spark.executor.memory=10g
spark.dynamicAllocation.minExecutors=4
spark.dynamicAllocation.maxExecutors=8
Here the job should not be submitted, as the executors that can be allocated are fewer than minExecutors.
How can the driver abort the job in this scenario?
Firstly, I would like to mention that Spark dynamic allocation is not supported for Kubernetes yet (as of version 3.0.1); it's in the pipeline for a future release (link).
As for the requirement you have posted, you could address it by running a resource-monitor code snippet before the job is initialized and terminating the initialization pod itself with an error.
If you want to run this from the CLI, you could use kubectl describe nodes or the kube-capacity utility to monitor the resources.
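For example (the grep window is approximate, and kube-capacity is the third-party plugin from https://github.com/robscott/kube-capacity, not a built-in):

# Allocatable vs. requested resources per node:
$ kubectl describe nodes | grep -A 7 "Allocated resources"

# Or the same at a glance, including utilization:
$ kube-capacity --util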

Spark Executors PODS in Pending State on Kubernetes Deployment

I deployed a simple Spark application on Kubernetes with the following configuration:
spark.executor.instances=2;spark.executor.memory=8g;
spark.dynamicAllocation.enabled=true;spark.dynamicAllocation.shuffleTracking.enabled=true;
spark.executor.cores=2;spark.dynamicAllocation.minExecutors=2;spark.dynamicAllocation.maxExecutors=2;
The memory requirements of the executor pods are more than what is available on the Kubernetes cluster, and because of this the Spark executor pods stay in Pending state, as below.
$ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/spark-k8sdemo-6e66d576f655b1f5-exec-1 0/1 Pending 0 10m
pod/spark-k8sdemo-6e66d576f655b1f5-exec-2 0/1 Pending 0 10m
pod/spark-master-6d9bc767c6-qsk8c 1/1 Running 0 10m
I know the reason is non-availability of resources, as shown by the kubectl describe command:
$ kubectl describe pod/spark-k8sdemo-6e66d576f655b1f5-exec-1
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 28s (x12 over 12m) default-scheduler 0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.
On the other hand, the driver pod keeps waiting forever for the executor pods to get ample resources, as below.
$ kubectl logs pod/spark-master-6d9bc767c6-qsk8c
21/01/12 11:36:46 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
21/01/12 11:37:01 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
21/01/12 11:37:16 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Now my question is whether there is some way to make the driver wait only for some time/retries and, if the executors still don't get resources, have the driver pod auto-die with a proper message/log, e.g. "application aborted as there were no resources in cluster".
I went through all the Spark configurations for the above requirement but couldn't find any. In YARN we have spark.yarn.maxAppAttempts, but nothing similar was found for Kubernetes.
If no such configuration is available in Spark, is there a way in the Kubernetes pod definition to achieve the same?
This is Apache Spark 3.0.1 here. No idea if things get any different in the upcoming 3.1.1.
tl;dr I don't think there's built-in support for "driver waits only for some time/retries and, if executors still don't get resources, the driver pod auto-dies with a proper message/log".
My very basic understanding of Spark on Kubernetes lets me claim that there is no such feature to "auto-die" the driver pod when there are no resources for executor pods.
There are the podCreationTimeout (based on the spark.kubernetes.allocation.batch.delay configuration property) and spark.kubernetes.executor.deleteOnTermination configuration properties that make Spark on Kubernetes delete executor pods that were requested but never created, but that's not really what you want.
Dynamic Allocation of Executors could make things a bit more complex, but it does not really matter in this case.
A workaround could be to use spark-submit --status to request the status of a Spark application, check whether it's up and running, and --kill it after a certain time threshold (you could achieve a similar thing using kubectl directly too).
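A rough sketch of that workaround; the <namespace>:<driver-pod-name> submission ID format follows the application-management examples in the Spark docs, and the "certain time threshold" policy is left to you:

MASTER="k8s://https://<k8s-apiserver-host>:6443"

# Ask Spark for the application status...
spark-submit --master "$MASTER" --status "<namespace>:<driver-pod-name>"

# ...and once your time threshold has passed without progress, kill it.
spark-submit --master "$MASTER" --kill "<namespace>:<driver-pod-name>"

# Roughly the same thing with kubectl directly:
kubectl delete pod <driver-pod-name> -n <namespace>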
Just FYI, and to make things a bit more interesting: you should also review two other Spark on Kubernetes-specific configuration properties:
spark.kubernetes.driver.request.cores
spark.kubernetes.executor.request.cores
There could be others.
After a lot of digging, we were finally able to use a SparkListener to check whether the Spark application has started and an ample number of executors have been registered. If this condition is met, we proceed with the Spark jobs; otherwise we return with a warning stating that there are not ample resources in the Kubernetes cluster to run that Spark job.
Is there a way in the Kubernetes pod definition to achieve the same?
You could use an init container in your Spark driver pods that confirms the Spark executor pods are available.
The init container could be a simple shell script, written so that it only tries so many times and/or times out, as in the sketch below.
In a pod definition, all init containers must succeed before any of the pod's other containers are started.
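A minimal sketch of such a script, assuming the init-container image ships kubectl, the pod's service account is allowed to list pods, and Spark's default spark-role=executor label is in place; the expected executor count, retry budget and sleep interval are arbitrary:

#!/bin/sh
# Retry a fixed number of times waiting for running executor pods, then fail.
MAX_TRIES=10
i=0
while [ "$i" -lt "$MAX_TRIES" ]; do
  running=$(kubectl get pods -l spark-role=executor \
      --field-selector=status.phase=Running --no-headers 2>/dev/null | wc -l)
  if [ "$running" -ge 2 ]; then
    exit 0  # enough executors registered; let the main container start
  fi
  i=$((i + 1))
  sleep 15
done
echo "application aborted as there were no resources in cluster" >&2
exit 1  # a failed init container keeps the pod's main container from starting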

Spark on Kubernetes: Is it possible to keep the crashed pods when a job fails?

I have the strange problem that a Spark job run on Kubernetes fails with a lot of "Missing an output location for shuffle X" errors in jobs where a lot of shuffling is going on. Increasing executor memory does not help. However, the same job run on just a single node of the Kubernetes cluster in local[*] mode runs fine, so I suspect it has to do with Kubernetes or the underlying Docker.
When an executor dies, its pod is deleted immediately, so I cannot track down why it failed. Is there an option that keeps failed pods around so I can view their logs?
You can view the logs of the previous (terminated) container instance of a pod like this:
kubectl logs -p <terminated pod name>
Also consider the spec.ttlSecondsAfterFinished field of a Job, as mentioned here.
Executor pods are deleted by default on any failure, and you cannot do anything about that unless you customize the Spark on K8s code or use some advanced K8s tooling.
What you can do (and it is probably the easiest approach to start with) is configure an external log collector, e.g. Grafana Loki (which can be deployed with one click to any K8s cluster) or some ELK stack components. These will help you persist logs even after the pods are deleted.
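For instance, one way to stand up Loki with its log-shipping agent via Helm (the chart and repo names are assumptions; check the current Grafana documentation):

$ helm repo add grafana https://grafana.github.io/helm-charts
$ helm install loki grafana/loki-stack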
There is a deleteOnTermination setting in the Spark application YAML. See the spark-on-kubernetes README.md:
deleteOnTermination - (Optional)
DeleteOnTermination specifies whether executor pods should be deleted in case of failure or normal termination. Maps to spark.kubernetes.executor.deleteOnTermination, which is available since Spark 3.0.
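With plain spark-submit (outside the operator), the same behavior comes from setting the Spark property directly; a minimal sketch where the apiserver address, image, class and jar are placeholders:

# false keeps executor pods around after failure so their logs stay inspectable.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --name keep-failed-executors \
  --class com.example.MyApp \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  --conf spark.kubernetes.executor.deleteOnTermination=false \
  local:///opt/app/my-app.jar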

What's the most elegant/right way to stop a spark job running on a Kubernetes cluster?

I'm new to Apache Spark and I'm trying to run a Spark job using spark-submit on my Kubernetes cluster. I was wondering if there's a right way to stop Spark jobs once the driver and executor pods are spawned. Would deleting the pods themselves be enough?
Thanks!
If you delete an executor pod, it will be recreated and the Spark application will keep working. However, if you delete the driver pod, it will stop the application.
So killing the driver pod is actually the way to stop a Spark application during execution.
As you are new to Spark and want to run it on Kubernetes, you should check this tutorial.
At present the only way to stop a Spark job running on Kubernetes is to delete the driver pod (unless you have an app controlling the Spark context that is able to manipulate it). Since all other job-related resources are linked to the Spark driver pod via so-called ownerReferences, they are removed automatically by Kubernetes when the driver pod is deleted.
Kubernetes should likewise clean things up automatically when the job completes.
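For example (the pod name and namespace are placeholders):

# Deleting the driver stops the application...
$ kubectl delete pod <spark-driver-pod> -n <namespace>

# ...and the executor pods, services, configmaps etc. owned by it are
# garbage-collected via their ownerReferences; watch them go:
$ kubectl get pods -n <namespace> -w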

REST API for Spark 2.3 submit on Kubernetes (version 1.8.*) cluster

I'm using a Kubernetes cluster on AWS to run Spark jobs, and I'm using Spark 2.3. Now I want to run spark-submit from an AWS Lambda function against the K8s master. Is there any REST interface to run spark-submit on the K8s master?
Unfortunately, it is not possible for Spark 2.3 if you are using native Kubernetes support.
Based on the description in the deployment instructions, the submission process consists of several steps:
Spark creates a Spark driver running within a Kubernetes pod.
The driver creates executors, which also run within Kubernetes pods.
The driver connects to them and executes application code.
When the application completes, executor pods terminate and are cleaned up, but the driver pod persists its logs and remains in “completed” state in the Kubernetes API until it’s eventually garbage collected or manually cleaned up.
So, in fact, you have no place to submit a job to until you start the submission process, which launches the first Spark pod (the driver) for you. Only once the application completes is everything terminated.
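In other words, the only entry point in Spark 2.3 is spark-submit itself talking to the Kubernetes API server; a sketch of the invocation documented for that version (the apiserver address, image and jar path are placeholders):

$ bin/spark-submit \
    --master k8s://https://<k8s-apiserver-host>:<port> \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar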
Please also see the similar answer to this question under the link.
