Is it possible to add Fair scheduler pools programmatically in Spark? - apache-spark

I'm developing an application where several users use the same SparkContext to launch their queries to a Spark Cluster.
As the Spark documentation states (https://spark.apache.org/docs/2.2.0/job-scheduling.html#fair-scheduler-pools), with the Fair scheduler, you can assign a different pool for every user and they'll get a fair share of the cluster resources but every pool will be set up with the default pool configuration (scheduling mode FIFO, weight 1, and minShare 0).
Given that we don't know in advance which users can connect to the application, we can't set up a configuration file for the fair scheduler pools for all the users.
So, in order to give every user their own pool dynamically and set each pool up with the FAIR scheduling mode, I think there are two options:
1. Change the behaviour of the default pool so that its scheduling mode becomes FAIR (a sketch of one possible approach follows below). Is it possible? How?
2. Generate a scheduler pool dynamically and programmatically, so that a pool with the FAIR scheduling mode is added the first time a user connects to the application. Is it possible? How?
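For concreteness, option 1 might look roughly like the sketch below (PySpark used only for brevity; it assumes that redefining a pool named default in the allocation file overrides the built-in default pool, and all names and paths are placeholders):
# Sketch only: generate a fair-scheduler allocation file at runtime and point
# Spark at it before the SparkContext is created. Assumes a pool named
# "default" in this file overrides the built-in default pool's settings.
import tempfile
from pyspark import SparkConf, SparkContext

ALLOCATIONS = """<?xml version="1.0"?>
<allocations>
  <pool name="default">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>0</minShare>
  </pool>
</allocations>
"""

with tempfile.NamedTemporaryFile(mode="w", suffix=".xml", delete=False) as f:
    f.write(ALLOCATIONS)
    allocation_path = f.name

conf = (SparkConf()
        .setAppName("shared-context-app")              # placeholder app name
        .setMaster("local[*]")                         # placeholder master
        .set("spark.scheduler.mode", "FAIR")
        .set("spark.scheduler.allocation.file", allocation_path))
sc = SparkContext(conf=conf)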
Thanks in advance

Related

Optimizing Apache Spark on Kubernetes using custom plugins and the scheduling framework

My goal is to optimally run Spark applications alongside the stateless workload in my cluster to make the best use of my cluster resources.
Since Spark applications can suffer from partial scheduling (the driver pods start first and block resources while they request the executor pods), a simple strategy to prevent this would be the much-discussed gang/co-scheduling: only start the driver pod if we can guarantee that all the executors can be started later, for example through some kind of reservation design that lets the driver reserve resources for the executors that will be started in the future.
Also, this reservation definition/implementation must be visible to all the other non-Spark pods as well, since they would also have to log their resource requests just like the Spark pods, so that we have a clear picture of cluster resource utilization.
The current implementations include running a new custom scheduler or implementing a scheduler extender, but I was wondering whether we could achieve this by writing custom scheduler plugins. Additionally, which extension points in the scheduling framework would the plugins have to use to optimize the scheduling of Spark jobs in a multi-tenant environment (with different kinds of workloads), so that my default profile can continue to schedule the stateless workload while the custom profile that uses these plugins schedules Spark applications?
Finally, would this be the best way to optimize the scheduling of Spark and stateless workloads in a multi-tenant environment? What would the drawbacks of this approach (using custom plugins) be, given that all the profiles share a single queue?
It sounds like what you would like to have is Gang Scheduling 📆. If you'd like to have that capability, I suggest you use Volcano to schedule/run 🏃 your jobs in Kubernetes with Gang Scheduling.
Another approach is to create your own scheduler using the scheduler extender as described here or use the Palantir gang scheduler extender.
✌️

Create a Spark pool per user by default in a Zeppelin Notebook

I am working with Spark inside Zeppelin in a collaborative environment, so we have only one interpreter and many users sharing it. For this reason, I defined it with per-user instantiation in scoped mode.
With this configuration, one user's jobs have to wait for the resources allocated to other users' jobs.
To change this behavior and allow jobs from different users to execute at the same time, I set the Spark configuration spark.scheduler.mode to FAIR (in the Zeppelin interpreter configuration). For this to have the desired effect, each user has to manually define, in their notebook, their own Spark pool (jobs from different pools can be executed at the same time: https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application) with this code:
sc.setLocalProperty("spark.scheduler.pool", "pool1")
P.S.: After one hour the interpreter shuts down, and if users forget to execute this command the next time, they fall back into the default pool, which is not good.
What I want to know: is it possible to set a user's Spark pool automatically when they execute their paragraphs, without manual effort every time?
If there is another way to do this, please let me know.
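For illustration, the kind of automatic per-user assignment I am asking about might look roughly like this (only a sketch; getpass.getuser() is just a stand-in for however Zeppelin exposes the logged-in user):
import getpass

# Sketch: derive the pool name from the current user instead of hard-coding it.
# getpass.getuser() is only a placeholder for the real notebook user.
username = getpass.getuser()
sc.setLocalProperty("spark.scheduler.pool", "pool_" + username)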

Airflow: how to specify quantitative usage of a resource pool?

I am looking at several open source workflow schedulers for a DAG of jobs with heterogeneous RAM usage. The scheduler should not only schedule less than a maximum number of threads, but should also keep the total amount of RAM of all concurrent tasks below the available memory.
In this Luigi Q&A, it was explained that:
You can set how many of the resource is available in the config, and then how many of the resource the task consumes as a property on the task. This will then limit you to running n of that task at a time.
in config:
[resources]
api=1
in code for Task:
resources = {"api": 1}
For Airflow, I haven't been able to find the same functionality in its docs. The closest thing seems to be the ability to specify a number of available slots in a resource pool and to mark a task instance as using a single slot in that pool. However, there appears to be no way to specify that a task instance uses more than one slot in a pool.
Question: specifically for Airflow, how can I specify a quantitative resource usage of a task instance?
Assuming you're using CeleryExecutor, then starting from Airflow version 1.9.0 you can manage Celery's task concurrency. This is not exactly the memory management you've been asking about, but rather the number of concurrent worker threads executing tasks.
The tweakable parameter is called CELERYD_CONCURRENCY, and here it is explained very nicely how to manage Celery-related config in Airflow.
[Edit]
Actually, Pools can also be used to limit concurrency.
Let's say you want to limit a resource-hungry task_id so that only 2 instances run at the same time. The only things you need to do are:
1. Create a pool (in the UI: Admin -> Pools), give it a name, e.g. my_pool, and define the task concurrency in the Slots field (in this case 2).
2. When instantiating the Operator that will execute this task_id, pass the defined pool name (pool=my_pool), as in the sketch below.
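For example, step 2 might look like the following sketch (Airflow 1.x-style imports; the DAG, task, and command are placeholders, and the my_pool pool is assumed to already exist with 2 slots):
from datetime import datetime
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Sketch: tasks assigned to "my_pool" (2 slots) run at most 2 at a time.
dag = DAG("example_pool_dag", start_date=datetime(2018, 1, 1), schedule_interval=None)

heavy_task = BashOperator(
    task_id="resource_hungry_task",
    bash_command="echo running heavy job",  # placeholder command
    pool="my_pool",                          # must match the pool created in Admin -> Pools
    dag=dag,
)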

How to enable Fair scheduler?

I'd like to understand the internals of Spark's FAIR scheduling mode. The thing is that it does not seem as fair as one would expect from the official Spark documentation:
Starting in Spark 0.8, it is also possible to configure fair sharing between jobs. Under fair sharing, Spark assigns tasks between jobs in a “round robin” fashion, so that all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can start receiving resources right away and still get good response times, without waiting for the long job to finish. This mode is best for multi-user settings.
It seems like jobs are not handled equally and are actually managed in FIFO order.
To give more information on the topic:
I am using Spark on YARN with the Java API of Spark. To enable the fair mode, the code is:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Enable the FAIR scheduling mode before creating the context
SparkConf conf = new SparkConf();
conf.set("spark.scheduler.mode", "FAIR");
conf.setMaster("yarn-client").setAppName("MySparkApp");
JavaSparkContext sc = new JavaSparkContext(conf);
Did I miss something?
It appears that you didn't set up the pools and all your jobs end up in a single default pool as described in Configuring Pool Properties:
Specific pools’ properties can also be modified through a configuration file.
and later
A full example is also available in conf/fairscheduler.xml.template. Note that any pools not configured in the XML file will simply get default values for all settings (scheduling mode FIFO, weight 1, and minShare 0).
It can also be that you didn't set the local property that selects the pool to use for a given job (or jobs), as described in Fair Scheduler Pools:
Without any intervention, newly submitted jobs go into a default pool, but jobs’ pools can be set by adding the spark.scheduler.pool “local property” to the SparkContext in the thread that’s submitting them.
Finally, it can mean that you use a single default FIFO pool, in which case one pool in FIFO mode changes nothing compared to FIFO without pools.
Only you can know the real answer :)

Does Spark's Fair Scheduler pool provide inter- or intra-application scheduling?

I am quite confused, because these pools get created for each Spark application, and also, if I set minShare for a pool to more than the total cores of the cluster, the pool still gets created.
So if these pools are intra-application, do I need to assign different pools to different Spark jobs manually? Because if I use sparkContext.setLocalProperty to set the pool, all the stages of that application go to that pool.
The point is: can jobs from two different applications go into the same pool? If I have application a1 that sets its pool to p1 with sparkContext.setLocalProperty, and another application a2 that also sets its pool to p1, will jobs from both applications go to the same pool p1, or is p1 for a1 different from p1 for a2?
As described in Spark's official documentation in Scheduling Within an Application:
Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads.
and later in the same document:
Starting in Spark 0.8, it is also possible to configure fair sharing between jobs. Under fair sharing, Spark assigns tasks between jobs in a “round robin” fashion, so that all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can start receiving resources right away and still get good response times, without waiting for the long job to finish. This mode is best for multi-user settings.
With that, the scheduling happens within the resources given to a Spark application, and how much it gets depends on the CPUs/vcores and memory available in the cluster manager.
The Fair Scheduler mode is essentially for Spark applications with parallel jobs.
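To make the intra-application point concrete, a rough PySpark sketch of two jobs submitted from separate threads of one SparkContext, each pinned to its own pool, could look like this (pool names p1 and p2 are placeholders; it also assumes a recent PySpark where each Python thread maps to its own JVM thread):
import threading

# Sketch: spark.scheduler.pool is a per-thread ("local") property, so each
# thread's jobs land in the pool set by that thread.
def run_job(pool_name):
    sc.setLocalProperty("spark.scheduler.pool", pool_name)
    total = sc.parallelize(range(1000000)).count()  # placeholder job
    print(pool_name, total)

threads = [threading.Thread(target=run_job, args=(p,)) for p in ("p1", "p2")]
for t in threads:
    t.start()
for t in threads:
    t.join()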
