How to run multiple spark jobs on k8s cluster with simple scheduler - apache-spark

My main Intent is to get Spark (3.3) running on k8s with HDFS.
I went though the Spark website and got the spark pi program running on a k8s cluster in the form of spark-submit command. I read that if we submit multiple jobs to k8s cluster the k8s may end up starving all pods -- meaning there is no queueing in place, no scheduler (like yarn) which keeps a check on resources and arranges the tasks across the nodes.
So, my question is: what is the simplest way to write a scheduler in k8s? I read about volcano -- but it's not yet in GA. I read about Gang, Younikorn -- but I don't see much community support.

Related

Monitor Spark with Prometheus when Spark clusters are spined up just when needed

We run spark over Kubernetes and we spin up a spark driver and executors for a lot of our tasks (not a spark task). After the task is finished we spin the cluster (on Kubernetes) down and spin up another one when needed (There could be a lot running simultaneously).
The problem I have is that I can't monitor it with Prometheus because I do not have a diver that is always "alive" that I can pull information on the executors from.
Is there a solution for that kind of architecture?

Spark job on Kubernetes Under Resource Starvation Wait Indefinitely For SPARK_MIN_EXECUTORS

I am using Spark 3.0.1 and working on a project spark deployment on Kubernetes where Kubernetes acting cluster manager for spark job and spark submits the job using client mode. In case Cluster does not have sufficient resource (CPU/ Memory ) for minimum number of executors , the executors goes in Pending State for indefinite time until the resource gets free.
Suppose, Cluster Configurations are:
total Memory=204Gi
used Memory=200Gi
free memory= 4Gi
SPARK.EXECUTOR.MEMORY=10G
SPARK.DYNAMICALLOCTION.MINEXECUTORS=4
SPARK.DYNAMICALLOCATION.MAXEXECUTORS=8
Here job should not be submitted as executors allocated are less than MIN_EXECUTORS.
How can driver abort the job in this scenario?
Firstly would like to mention that, spark dynamic allocation not supported for kubernetes yet(as of version 3.0.1), its in pipeline for future release Link
while for the requirement you have posted, you could address by running a resource monitor code snippet before the job initialized and terminate the initialization pod itself with error.
if you want to run this from CLI you could use kubectl describe nodes/ kube-capacity utility to monitor the resources

How to run a Spark Standalone master on Kubernetes that will use the Kubernetes Cluser Manager to start workers

I have an application that currently uses Standalone Mode locally to use spark functionality via the SparkContext. We are not using spark-submit to upload our jobs, we are running our application in a container on kubernetes so we would like to take advantage of the dynamic scheduling that kubernetes provides to run the jobs.
We started out looking for a helm chart to create stand alone cluster running on kubernetes similar to how you would have run a standalone cluster on machines ( vms or actual machines ) a few years ago and came across the following
https://github.com/helm/charts/tree/master/stable/spark
Issues:
very old instances of spark
not using the containers provided by spark
this setup wastes a bunch of resources if you need to have large worker nodes reserved and running all the time regardless of your need
Next we started looking at the spark-operator approach here https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
Issues:
Doesn't support the way we interact with spark, takes the approach that all the apps are standalone apps that are pushed to the cluster to run
No longstanding master that allows us to take advantage of cached resources in the cluster
Along this journey we discovered that spark now supports a kubernetes cluster manager ( similar to the way it does with yarn, mesos ) so we are looking that this might be the best approach, but this still does not provide a standalone master that would allow for the in memory caching. I have looked to see if there was a way that I could get the org.apache.spark.deploy.master.Master to start and use the
org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager
So I guess what I'm trying to ask is does anyone have any experience in trying to run a Standalone Master, that would use the kubernetes backend such as "KubernetesClusterManager" in order to have the worker nodes dynamically created as pods and running executors while having a permanent Standalone Master that would allow a SparkContext to connect to it remotely in client mode.

Kill Spark Job or terminate EMR Cluster if job takes longer than expected

I have a spark job that periodically hangs, leaving my AWS EMR cluster in a state where an application is RUNNING but really the cluster is stuck. I know that if my job doesn't get stuck, it'll finish in 5 hours or less. If it's still running after that, it's a sign that the job is stuck. Yarn and the Spark UI is still responsive, the it's just that an executor gets stuck on a task.
Background: I'm using an ephemeral EMR cluster that performs only one step before terminating, so it's not a problem to kill it off if I notice this job is hanging.
What's the easiest way to kill the task, job, or cluster in this case? Ideally this would not involve setting up some extra service to monitor the job -- ideally there would be some kind of spark / yarn / emr setting I could use.
Note: I've tried using spark speculation to unblock the stuck spark job, but that doesn't help.
EMR has a Bootstrap Actions feature where you can run scripts that start up when initializing the cluster. I've used this feature along with a startup script that monitors how long the cluster has been online and terminates itself after a certain time.
I use a script based off this one for the bootstrap action. https://github.com/thomhopmans/themarketingtechnologist/blob/master/6_deploy_spark_cluster_on_aws/files/terminate_idle_cluster.sh
Basically make a script that checks /proc/uptime to see how long the EC2 machine has been online and after uptime surpasses your time limit you can send a shutdown command to the cluster.

running multiple Spark jobs on a Mesos cluster

I would like to run multiple spark jobs on my Mesos cluster, and have all spark jobs share the same spark framework. Is this possible?
I have tried running the MesosClusterDispatcher and have the spark jobs connect to the dispatcher, but each spark job launches its own "Spark Framework" (I have tried running both client-mode and cluster-mode).
Is this the expected behaviour?
Is it possible to share the same spark-framework among multiple spark jobs?
It is normal and it's the expected behaviour.
In Mesos as far as I know, SparkDispatcher is in charge of allocate resources for your Spark Driver which will act as a framework. Once Spark driver has been allocated, it is responsible for talk to Mesos and accept offers to allocate the executors where tasks will be executed.

Resources