Kubernetes: shut down node between job runs - cron

I run a Kubernetes CronJob every week on a Kubernetes cluster that has a single node in it. It runs on Google Compute Engine.
I would like to shut down the node completely between two jobs, for billing purposes (we pay as if the machine were used for the whole week, but it is actually useful for only a few hours).
Is it possible to boot the node, run the job, then shut down the node?

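The cluster autoscaler can help with this: it adds nodes when pods cannot be scheduled and removes nodes that are no longer needed. Whether a node pool can shrink all the way to zero depends on the provider and the node pool configuration.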
https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler

Related

Kubernetes Jobs or Pods for completion Jobs with auto scaling

I have CPU-intensive jobs/tasks that I need to run in Kubernetes. The process for each job/task is:
We get a request from a queue or an API call.
A pod should be created to process the task (some jobs run for minutes, some for hours).
The pod is deleted once the task completes.
This should happen at scale: if there are more jobs in the queue, create more jobs (up to a maximum of 10, 20, 30; we should be able to define it).
I used KEDA. The pod is created, but after the job completes it goes into CrashLoopBackOff. This is the default behaviour in the pod life cycle, because Kubernetes tries to recreate the pod since the restart policy is set to Always. There are other options such as OnFailure and Never, but I have read that Kubernetes Jobs are more suitable.
Which is the better option for the above task, Kubernetes Pods or Jobs? We need to consider scaling the pods and also scaling the Kubernetes nodes (cloud vendors support this) based on usage and the number of tasks in the queue.
KEDA ScaledJobs are the best fit for such scenarios and can be triggered by a queue, storage, etc. (the currently available scalers are listed in the KEDA documentation).

Scale up the spark worker nodes using code

I want to scale up the Spark cluster so that all the worker nodes are up and running before I start my processing. The issue is that autoscaling of worker nodes does not happen immediately under load, which leads to worker node crashes. The cluster has 32 nodes, but the load lands on only 4 nodes, which then crash. What I am trying to do is write a few lines of code at the start of the Python notebook that will kick-start the remaining nodes so that 24 nodes are up and running, and then do the actual data processing. Is this possible using code? Please advise.
In general, autoscaling is for interactive workloads. I've rarely seen it provide benefits for jobs, though marketing does a good job of selling it as a cost-saving feature.
You can use Databricks jobs to create an automated cluster. When you run a job on a new automated cluster, the cluster is created for the job and terminated when the job is complete.
If you know better than autoscaling when the scale-up should happen, you can use the resize API: https://docs.databricks.com/dev-tools/api/latest/clusters.html#resize
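As a rough illustration of calling the resize API from the top of a notebook, here is a minimal Python sketch. It assumes the Clusters API 2.0 endpoint /api/2.0/clusters/resize with a personal access token; the function names, workspace URL, token, and cluster ID are placeholders invented for this example, not values from the question.

```python
import json
import urllib.request


def build_resize_request(host, token, cluster_id, num_workers):
    """Build (but do not send) a POST request for the clusters/resize endpoint.

    host, token and cluster_id are placeholders supplied by the caller.
    """
    payload = json.dumps({"cluster_id": cluster_id, "num_workers": num_workers})
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/clusters/resize",
        data=payload.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def resize_cluster(host, token, cluster_id, num_workers):
    """Send the resize request; urlopen raises HTTPError on a non-2xx response."""
    request = build_resize_request(host, token, cluster_id, num_workers)
    with urllib.request.urlopen(request) as response:
        return response.status
```

For the case in the question this would be something like resize_cluster(workspace_url, token, cluster_id, 24). The cluster has to be running for a resize to be accepted, and the call most likely returns before the new workers have joined, so the notebook would probably still need to wait until they are available before starting the heavy processing.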

How to run a Spark Standalone master on Kubernetes that will use the Kubernetes Cluster Manager to start workers

I have an application that currently uses Standalone Mode locally to use Spark functionality via the SparkContext. We are not using spark-submit to upload our jobs. We run our application in a container on Kubernetes, so we would like to take advantage of the dynamic scheduling that Kubernetes provides to run the jobs.
We started out looking for a Helm chart to create a standalone cluster running on Kubernetes, similar to how you would have run a standalone cluster on machines (VMs or physical machines) a few years ago, and came across the following:
https://github.com/helm/charts/tree/master/stable/spark
Issues:
very old instances of spark
not using the containers provided by spark
this setup wastes a bunch of resources if you need to have large worker nodes reserved and running all the time regardless of your need
Next we started looking at the spark-operator approach here https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
Issues:
Doesn't support the way we interact with spark, takes the approach that all the apps are standalone apps that are pushed to the cluster to run
No longstanding master that allows us to take advantage of cached resources in the cluster
Along this journey we discovered that Spark now supports a Kubernetes cluster manager (similar to the way it does with YARN and Mesos), so we think this might be the best approach, but it still does not provide a standalone master that would allow for in-memory caching. I have looked to see if there was a way that I could get the org.apache.spark.deploy.master.Master to start and use the
org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager
So what I'm trying to ask is: does anyone have experience running a Standalone Master that uses the Kubernetes backend, such as "KubernetesClusterManager", so that worker nodes are dynamically created as pods running executors, while a permanent Standalone Master allows a SparkContext to connect to it remotely in client mode?

Kill Spark Job or terminate EMR Cluster if job takes longer than expected

I have a Spark job that periodically hangs, leaving my AWS EMR cluster in a state where an application is RUNNING but the cluster is really stuck. I know that if my job doesn't get stuck, it'll finish in 5 hours or less. If it's still running after that, it's a sign that the job is stuck. YARN and the Spark UI are still responsive; it's just that an executor gets stuck on a task.
Background: I'm using an ephemeral EMR cluster that performs only one step before terminating, so it's not a problem to kill it off if I notice this job is hanging.
What's the easiest way to kill the task, job, or cluster in this case? Ideally this would not involve setting up some extra service to monitor the job -- ideally there would be some kind of spark / yarn / emr setting I could use.
Note: I've tried using spark speculation to unblock the stuck spark job, but that doesn't help.
EMR has a Bootstrap Actions feature where you can run scripts that start up when initializing the cluster. I've used this feature along with a startup script that monitors how long the cluster has been online and terminates itself after a certain time.
I use a script based on this one for the bootstrap action: https://github.com/thomhopmans/themarketingtechnologist/blob/master/6_deploy_spark_cluster_on_aws/files/terminate_idle_cluster.sh
Basically, make a script that checks /proc/uptime to see how long the EC2 machine has been online, and once the uptime exceeds your time limit, send a shutdown command to the cluster.
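The uptime check described above could look roughly like the following Python sketch. It is not the linked script. The function names and the 6-hour limit are made up for illustration, and it assumes the check runs on the master node under a user that is allowed to run sudo shutdown.

```python
import subprocess


def parse_uptime_seconds(proc_uptime_text):
    """Return seconds since boot from the contents of /proc/uptime.

    The file holds two numbers: uptime and cumulative idle time.
    """
    return float(proc_uptime_text.split()[0])


def over_time_limit(uptime_seconds, limit_hours):
    """True once the machine has been up longer than the allowed number of hours."""
    return uptime_seconds > limit_hours * 3600


def check_and_shutdown(limit_hours=6.0, uptime_path="/proc/uptime"):
    """Shut the machine down if it has exceeded the limit (hypothetical 6 h default)."""
    with open(uptime_path) as f:
        uptime = parse_uptime_seconds(f.read())
    if over_time_limit(uptime, limit_hours):
        # Assumption: halting this node is enough to bring the ephemeral cluster down.
        subprocess.run(["sudo", "shutdown", "-h", "now"], check=True)
        return True
    return False
```

check_and_shutdown would need to be called periodically, for example from a background loop started by the bootstrap action or from cron. A limit slightly above the expected 5-hour runtime leaves some margin for slow but healthy runs.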

Sudden surge in number of YARN apps on HDInsight cluster

For some reason the cluster sometimes misbehaves, and I suddenly see a surge in the number of YARN jobs. We are using a Linux-based HDInsight Hadoop cluster. We run Azure Data Factory jobs that basically execute Hive scripts against this cluster. On average there are about 50 running and 40-50 pending YARN apps at any given time. No one uses this cluster for ad-hoc query execution. But once every few days we notice something weird: the number of YARN apps suddenly starts increasing, both running and pending, but especially pending. The number of running apps goes above 100, and the number of pending apps exceeds 400, sometimes even 500. We have a script that kills all YARN apps one by one, but it takes a long time, and that is not really a solution either. In our experience, the only fix when this happens is to delete and recreate the cluster. It may be that the cluster's response time is delayed for a while (the Hive component especially). But in that case, even if ADF keeps retrying a failing slice several times, is it possible that the cluster is storing all the slice execution requests that ADF considers failed in a pool and trying to run them when it can? That is the only explanation I can think of. Has anyone faced this issue?
Check if all the running jobs in the default queue are Templeton jobs. If so, then your queue is deadlocked.
Azure Data Factory uses WebHCat (Templeton) to submit jobs to HDInsight. WebHCat spins up a parent Templeton job, which then submits a child job that is the actual Hive script you are trying to run. The YARN queue can get deadlocked if so many parent jobs run at one time that they fill up the cluster capacity and no child job (the actual work) is able to spin up an Application Master, so no work is actually being done. Note that if you kill the Templeton job, Data Factory will mark the time slice as completed even though it obviously was not.
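To make the check above concrete, here is a hedged Python sketch that looks at the running apps reported by the YARN ResourceManager REST API (GET /ws/v1/cluster/apps). The function names are invented for this example, the "TempletonControllerJob" name prefix for parent jobs is an assumption to verify on your own cluster, and so are the ResourceManager URL and the queue name (some schedulers may report it as root.default).

```python
import json
import urllib.request

# Assumption: parent WebHCat jobs show up in YARN under this application name.
TEMPLETON_PREFIX = "TempletonControllerJob"


def fetch_running_apps(rm_url):
    """Fetch running apps from the ResourceManager, e.g. rm_url='http://<rm-host>:8088'."""
    with urllib.request.urlopen(f"{rm_url}/ws/v1/cluster/apps?states=RUNNING") as resp:
        return json.load(resp)


def only_templeton_running(apps_response, queue="default"):
    """True if the queue has running apps and every one of them is a Templeton parent."""
    # The API returns {"apps": null} when there are no matching applications.
    apps = (apps_response.get("apps") or {}).get("app") or []
    running = [
        a for a in apps
        if a.get("state") == "RUNNING" and a.get("queue") == queue
    ]
    if not running:
        return False
    return all(a.get("name", "").startswith(TEMPLETON_PREFIX) for a in running)
```

If only_templeton_running(fetch_running_apps(rm_url)) returns True while many apps are pending, that matches the deadlock pattern described here: only parent jobs hold resources and no child job can start.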
If you are already in a deadlock, you can try raising the Maximum AM Resource from the default 33% to something higher and/or scaling up your cluster. The goal is to allow some of the pending child jobs to run and slowly drain the queue.
As a proper long-term fix, you need to configure WebHCat so that the parent Templeton job is submitted to a separate YARN queue. You can do this by (1) creating a separate YARN queue and (2) setting templeton.hadoop.queue.name to the newly created queue.
To create the queue, use Ambari > Yarn Queue Manager.
To update the WebHCat config via Ambari, go to the Hive tab > Advanced > Advanced WebHCat-site and update the config value there.
More info on WebHCat config:
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure

Resources