Which YARN Scheduler to use for streaming queries? - apache-spark

I have about 5 Spark Structured Streaming applications running on YARN, but I don't know which scheduler is best for them:
FIFO Scheduler
Capacity Scheduler
Fair Scheduler
I think the Capacity Scheduler is more suitable for this, but I don't know how to create queues for all the streaming applications.
Should I create a queue/sub-queue for each streaming application with the Capacity Scheduler?
Also, I will later add Spark batch applications on the same cluster/resources, so I want to manage all cluster resources in the most effective way using a scheduler.
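One common layout is a single queue shared by all the streaming apps (rather than one queue per app) plus a separate batch queue. A sketch of `capacity-scheduler.xml`, where the queue names and percentages are purely illustrative:

```xml
<!-- capacity-scheduler.xml (sketch): one shared queue for streaming apps,
     one for batch; names and capacities are illustrative, not prescriptive -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>streaming,batch</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.streaming.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.batch.capacity</name>
    <value>40</value>
  </property>
  <property>
    <!-- let batch borrow idle capacity beyond its guaranteed share -->
    <name>yarn.scheduler.capacity.root.batch.maximum-capacity</name>
    <value>100</value>
  </property>
</configuration>
```

Each application can then be pointed at a queue at submit time with `spark-submit --queue streaming ...`.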

Related

Optimizing Apache Spark on Kubernetes using custom plugins and the scheduling framework

My goal is to optimally run Spark applications alongside the stateless workload in my cluster to make the best use of my cluster resources.
Since Spark applications can suffer from partial scheduling (the driver blocks the executors, because the driver pod is started first and then requests the executor pods), a simple strategy to prevent this would be to implement the much-talked-about gang/co-scheduling: only start the driver pod if we can guarantee that all the executors can be started later, for example through some kind of reservation design in which the driver reserves resources for the executors that will be started in the future.
This reservation definition/implementation would also have to be visible to all the other, non-Spark pods, since they too would have to log their resource requests like the Spark pods do, so that we have a clear picture of cluster resource utilization.
The current implementations involve running a new custom scheduler or implementing a scheduler extender, but I was wondering whether we can achieve this by writing custom scheduler plugins instead. Additionally, which extension points in the scheduling framework would those plugins have to use to optimize the scheduling of Spark jobs in a multi-tenant environment (with different kinds of workloads), so that my default profile can continue to schedule the stateless workload while a custom profile that uses these plugins schedules the Spark applications?
Finally, would this be the best way to optimize scheduling Spark and stateless workloads in a multi-tenant environment? What would be the drawbacks of this approach (custom plugins), given that there is only a single queue that all the profiles must share?
It sounds like what you would like to have is Gang Scheduling 📆. If you'd like to have that capability, I suggest you use Volcano to schedule/run 🏃 your jobs in Kubernetes with Gang Scheduling.
Another approach is to create your own scheduler using the scheduler extender as described here or use the Palantir gang scheduler extender.
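To make the gang-scheduling idea concrete, Volcano groups a driver and its executors into a `PodGroup` and only starts them once the whole group can be placed. A sketch (the API group/version follows Volcano's CRDs; the name and sizes are illustrative):

```yaml
# Volcano gang-scheduling sketch: the Spark driver and executor pods
# reference this PodGroup, and none are scheduled until minMember pods
# (and minResources) can be satisfied at once.
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: spark-app-pg
spec:
  minMember: 4          # e.g. 1 driver + 3 executors must all fit
  minResources:
    cpu: "8"
    memory: "16Gi"
```

This is the reservation-style guarantee described in the question, without writing a scheduler extender yourself.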
✌️

Is it possible to run Hive on Spark with YARN capacity scheduler?

I use Apache Hive 2.1.1-cdh6.2.1 (Cloudera distribution) with MR as execution engine and YARN's Resource Manager using Capacity scheduler.
I'd like to try Spark as an execution engine for Hive. While going through the docs, I found a strange limitation:
Instead of the capacity scheduler, the fair scheduler is required. This fairly distributes an equal share of resources for jobs in the YARN cluster.
Since I have all the queues set up properly, that's very undesirable for me.
Is it possible to run Hive on Spark with YARN capacity scheduler? If not, why?
I'm not sure you can execute Hive using the Spark engine in that setup. I highly recommend configuring Hive to use Tez (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Tez), which is faster than MR and quite similar to Spark in that it uses a DAG as its task execution model.
We run Hive on Spark at work from Beeline, as described at https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started, simply by writing this at the beginning of the SQL file:
set hive.execution.engine=spark;
select ... from table....
We are not using the Capacity Scheduler because hundreds of jobs run in each YARN queue, and when jobs are resource-hungry we have other queues to let them run in. That also allows us to design a per-queue configuration based on job consumption that better matches the actual needs of each group of jobs.
Hope this helps
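If you do switch to Tez, you can still route work to a specific queue per session. A sketch (the queue name is illustrative; `tez.queue.name` is Tez's queue-routing setting):

```sql
-- Session-level sketch: pick the engine and the target YARN queue.
set hive.execution.engine=tez;
set tez.queue.name=etl;
```

With that, the Capacity Scheduler queue layout you already have keeps working unchanged.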

how does resource offer works with mesos for a spark streaming application?

As I understand from reading the Mesos documentation, a resource offer is made to an application/framework, and it is up to the application to accept or reject the offer.
I have a "never-ending" Spark Streaming app where I configured the executors/cores/memory I need for parallelism. Aren't these resources acquired only once, when my Spark app starts up? That is, if my executors are idle, are they handed back to Mesos?
Do resource offer and acceptance happen only once in the case of Spark Streaming?
The same question can be extended to other long-running frameworks such as Cassandra or YARN on Mesos.
My understanding is that when Spark Streaming runs in the coarse-grained model, the resource exchange happens once and resources are dedicated to the executors for the lifetime of the Spark app.
The best source for Spark on Mesos would be the spark docs site here.
In the coarse-grained section you can see the following which answers your question:
The benefit of coarse-grained mode is much lower startup overhead, but at the cost of reserving Mesos resources for the complete duration of the application. To configure your job to dynamically adjust to its resource requirements, look into Dynamic Allocation.
If you look into dynamic resource allocation, you can potentially release and reacquire executor resources via the external Spark shuffle service. This can be achieved via the script Spark provides, or via Marathon.
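As a sketch of what enabling that looks like in `spark-defaults.conf` (the property names are Spark's dynamic-allocation settings; the limits are illustrative):

```properties
# Dynamic allocation with the external shuffle service, so idle
# executors can be released back to the cluster (limits illustrative).
spark.dynamicAllocation.enabled              true
spark.shuffle.service.enabled                true
spark.dynamicAllocation.minExecutors         1
spark.dynamicAllocation.maxExecutors         10
spark.dynamicAllocation.executorIdleTimeout  60s
```

The external shuffle service must be running on each node so that released executors' shuffle files remain available.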

Spark on Mesos - running multiple Streaming jobs

I have 2 spark streaming jobs that I want to run, as well as keeping some available resources for batch jobs and other operations.
I evaluated Spark Standalone cluster manager, but I realized that I would have to fix the resources for two jobs, which would leave almost no computing power to batch jobs.
I started evaluating Mesos, because it has "fine grained" execution model, where resources are shifted between Spark applications.
1) Does it mean that a single core can be shifted between 2 streaming applications?
2) Although I have spark & cassandra, in order to exploit data locality, do I need to have dedicated core on each of the slave machines to avoid shuffling?
3) Would you recommend running streaming jobs in "fine grained" or "coarse grained" mode? I know the logical answer is coarse grained (to minimize the latency of the streaming apps), but what about when total cluster resources are limited (a cluster of 3 nodes, 4 cores each, with 2 streaming applications to run and occasional batch jobs)?
4) In Mesos, when I run a Spark Streaming job in cluster mode, will it occupy 1 core permanently (like the Standalone cluster manager does), or will that core run the driver process and sometimes act as an executor?
Thank you
Fine-grained mode is actually deprecated now. Even with it, each core is allocated to a task until completion, but in Spark Streaming each processing interval is a new job, so tasks only last as long as the time it takes to process each interval's data. Hopefully that time is less than the interval time, or your processing will back up, eventually running out of memory to store all those RDDs waiting for processing.
Note also that you'll need to have one core dedicated to each stream's Reader. Each will be pinned for the life of the stream! You'll need extra cores in case the stream ingestion needs to be restarted; Spark will try to use a different core. Plus you'll have a core tied up by your driver, if it's also running on the cluster (as opposed to on your laptop or something).
Still, Mesos is a good choice, because it will allocate the tasks to nodes that have capacity to run them. Your cluster sounds pretty small for what you're trying to do, unless the data streams are small themselves.
If you use the Datastax connector for Spark, it will try to keep input partitions local to the Spark tasks. However, I believe that connector assumes it will manage Spark itself, using Standalone mode. So, before you adopt Mesos, check to see if that's really all you need.
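One way to keep the two streaming apps from starving the batch jobs in coarse-grained mode is to cap each app's share of the 12 total cores. A config sketch (the master URL and numbers are illustrative):

```properties
# Per-app cap on a 3-node x 4-core Mesos cluster (coarse-grained mode).
# e.g. 4 cores per streaming app leaves 4 cores free for batch work.
spark.master            mesos://zk://mesos-master:2181/mesos
spark.cores.max         4
spark.executor.memory   2g
```

`spark.cores.max` bounds the total cores the app will accept from offers, so the remainder stays available to other frameworks.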

Multiple spark streaming contexts on one worker

I have single node cluster with 2 CPUs, where I want to run 2 spark streaming jobs.
I also want to use submit mode "cluster". I am using Standalone cluster manager.
When I submit one application, I see that driver consumes 1 core, and worker 1 core.
Does it mean that there are no cores available for other streaming job? Can 2 streaming jobs reuse executors?
It is totally confusing me, and I don't find it really clear in documentation.
Srdjan
Does it mean that there are no cores available for other streaming job?
If you have a single worker with 2 CPUs and you're deploying in cluster mode, then you'll have no available cores, as the worker has to use a dedicated core for the driver process running on your worker machine.
Can 2 streaming jobs reuse executors?
No, each job needs to allocate dedicated resources given by the cluster manager. If one job is running with all available resources, the next scheduled job will be in WAITING state until the first completes. You can see it in the Spark UI.
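The arithmetic behind the WAITING state can be sketched like this (a toy model of the core accounting in this scenario, not Spark's actual scheduler code):

```python
# Toy core accounting for a single 2-core Standalone worker.
# Each cluster-mode job pins 1 core for its driver plus its executor cores.
TOTAL_CORES = 2

def cores_left_after(jobs):
    """Return free worker cores after the given cluster-mode jobs start."""
    used = sum(1 + job["executor_cores"] for job in jobs)
    return TOTAL_CORES - used

# First streaming job: 1 driver core + 1 executor core -> worker is full,
# so a second submitted job sits in WAITING until this one finishes.
print(cores_left_after([{"executor_cores": 1}]))  # 0
```

With zero cores left, the second streaming job can never leave WAITING on this box, which is why client mode (driver off-cluster) or a bigger worker is needed to run both.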
