Can the number of executor cores be greater than the total number of spark tasks? [duplicate] - apache-spark

What happens when the number of Spark tasks is greater than the number of executor cores? How does Spark handle this scenario?

Is this related to this question?
Anyway, you can check this Cloudera How-to. The "Tuning Resource Allocation" section explains that a Spark application can request executors by turning on the dynamic allocation property. It's also important to set properties such as num-executors, executor-cores, and executor-memory so that Spark's requests fit into what your resource manager has available.

Yes, this scenario can happen. In this case some of the cores will be idle. Scenarios where this can happen:
You call coalesce or repartition with a number of partitions smaller than the number of cores.
You use the default value of spark.sql.shuffle.partitions (200) and you have more than 200 cores available. This will be an issue for joins, sorting and aggregation. In this case you may want to increase spark.sql.shuffle.partitions.
Note that even if you have enough tasks, some (or most) of them could be empty. This can happen if you have a large data skew or you do something like groupBy() or Window without a partitionBy. In this case the empty partitions finish immediately, leaving most of your cores idle.
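The idle-core arithmetic described above can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not Spark code: the function name stage_utilization is made up for illustration, and it assumes every task needs exactly one core and all tasks take about the same time.

```python
import math


def stage_utilization(num_tasks, total_cores):
    """Rough model of one stage: how many cores are busy, how many sit idle.

    Hypothetical helper for illustration; assumes one core per task.
    """
    if num_tasks < 0 or total_cores <= 0:
        raise ValueError("num_tasks must be >= 0 and total_cores must be > 0")
    # At most one task runs per core at any moment.
    busy_cores = min(num_tasks, total_cores)
    # Number of "waves" needed if all tasks take about the same time.
    waves = math.ceil(num_tasks / total_cores)
    return {
        "waves": waves,
        "busy_cores": busy_cores,
        "idle_cores": total_cores - busy_cores,
    }


# 200 shuffle partitions (the default) on a hypothetical 256-core allocation.
print(stage_utilization(200, 256))
```

With 200 tasks and 256 cores the model reports one wave and 56 idle cores, which is the situation the second scenario describes.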

I think the question is a little off the mark: what you describe is unlikely in practice. Why?
With a lot of data you will have many partitions, and you may repartition.
Say you have 10,000 partitions, which equates to 10,000 tasks.
An executor core serves one partition, which is effectively one task (a 1:1 mapping), and when it finishes it moves on to the next task until all tasks in the stage are done. Then the next stage starts (if there is one in the plan / DAG).
At most places you will not have a cluster with 10,000 executor cores available to your application, although there are sites that do.
If you have more cores allocated than needed, they remain idle and cannot be used by others. With dynamic resource allocation, however, executors can be relinquished. I have worked with YARN and Spark Standalone; how this works on Kubernetes I am not sure.
Transformations alter what you need in terms of resources. E.g. an order by may result in fewer partitions and thus may contribute to idleness.
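The "core takes a task, finishes it, takes the next one" behaviour can be illustrated with a small simulation. This is a simplified sketch, not Spark's scheduler: it ignores data locality, stragglers and speculation, and the name simulate_stage and the task durations are made up for the example.

```python
import heapq


def simulate_stage(task_durations, total_cores):
    """Greedy model: each free core picks up the next pending task.

    Returns (time at which the stage finishes, number of tasks run per core).
    """
    # Heap entries are (time at which the core becomes free, core id).
    cores = [(0.0, core_id) for core_id in range(total_cores)]
    heapq.heapify(cores)
    tasks_per_core = [0] * total_cores
    makespan = 0.0
    for duration in task_durations:
        free_at, core_id = heapq.heappop(cores)  # earliest available core
        finish = free_at + duration
        tasks_per_core[core_id] += 1
        makespan = max(makespan, finish)
        heapq.heappush(cores, (finish, core_id))
    return makespan, tasks_per_core


# 10,000 equally sized tasks on a hypothetical 100-core allocation.
makespan, per_core = simulate_stage([1.0] * 10_000, total_cores=100)
print(makespan, min(per_core), max(per_core))

# Fewer tasks than cores: the surplus cores never receive work.
print(simulate_stage([1.0] * 4, total_cores=8))
```

In the second call four of the eight cores end with a task count of zero, which is the idle-core case discussed above.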

Related

Can Spark executor be enabled for multithreading more than CPU cores?

I understand that if executor-cores is set to more than 1, the executor will run tasks in parallel. However, from my experience, the number of parallel processes in the executor is always equal to the number of CPUs in the executor.
For example, suppose I have a machine with 48 cores and set executor-cores to 4; then there will be 12 executors.
What we need is to run 8 or more threads per executor (so 2 or more threads per CPU). The reason is that the tasks are quite lightweight and CPU usage is quite low, around 10%, so we want to boost CPU usage by running multiple threads per CPU.
So I'm asking whether this can be achieved through the Spark configuration. Thanks a lot!
Spark executors process tasks, which are derived from the execution plan/code and the partitions of the DataFrame. Each core on an executor processes only one task at a time, so an executor runs at most as many concurrent tasks as it has cores. Running more concurrent tasks in one executor, as you are asking for, is not possible.
You should look at code changes instead: minimize the number of shuffles (no inner joins; use windows instead) and check for skew in your data that leads to non-uniformly sized partitions (DataFrame partitions, not storage partitions).
WARNING:
If, however, you are alone on your cluster and you do not want to change your code, you can change the YARN settings for the server so that it advertises more than 48 cores, even though there are just 48. This can lead to severe instability of the system, since executors are now sharing CPUs. (And your OS also needs CPU power.)
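To make the numbers in this thread concrete, here is a small arithmetic sketch. It is not Spark or YARN code; executor_layout is a hypothetical helper, and advertised_cores stands for whatever core count the resource manager is configured to report (on YARN this is presumably the node manager's vcore setting).

```python
def executor_layout(advertised_cores, executor_cores, physical_cores):
    """How many executors and task slots fit into the advertised core count.

    Hypothetical helper; assumes one task slot per executor core.
    """
    executors = advertised_cores // executor_cores
    task_slots = executors * executor_cores
    return {
        "executors": executors,
        "task_slots": task_slots,
        # Values above 1.0 mean several tasks compete for one physical CPU.
        "slots_per_physical_core": task_slots / physical_cores,
    }


# The setup from the question: 48 cores, 4 cores per executor.
print(executor_layout(48, 4, physical_cores=48))

# Advertising 96 cores on the same 48-core machine, 8 cores per executor.
print(executor_layout(96, 8, physical_cores=48))
```

The second call gives 12 executors with 8 slots each, i.e. two tasks per physical CPU, which is the oversubscription the warning above is about.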
This answer is meant as a complement to @Telijas' answer, because in general I agree with it. It's just to give that tiny bit of extra information.
There are some configuration parameters with which you can set the number of threads for certain parts of Spark. There is, for example, a section in the Spark docs that discusses some of them (for all of this I'm looking at the latest Spark version at the time of writing this post: version 3.3.1):
Depending on jobs and cluster configurations, we can set number of threads in several places in Spark to utilize available resources efficiently to get better performance. Prior to Spark 3.0, these thread configurations apply to all roles of Spark, such as driver, executor, worker and master. From Spark 3.0, we can configure threads in finer granularity starting from driver and executor. Take RPC module as example in below table. For other modules, like shuffle, just replace “rpc” with “shuffle” in the property names except spark.{driver|executor}.rpc.netty.dispatcher.numThreads, which is only for RPC module.
Property name: spark.{driver|executor}.rpc.io.serverThreads
  Default: Fall back on spark.rpc.io.serverThreads
  Meaning: Number of threads used in the server thread pool
Property name: spark.{driver|executor}.rpc.io.clientThreads
  Default: Fall back on spark.rpc.io.clientThreads
  Meaning: Number of threads used in the client thread pool
Property name: spark.{driver|executor}.rpc.netty.dispatcher.numThreads
  Default: Fall back on spark.rpc.netty.dispatcher.numThreads
  Meaning: Number of threads used in RPC message dispatcher thread pool
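The fallback rule in the table can be illustrated with a short sketch. Spark performs this resolution internally; the function resolve_threads below is a hypothetical stand-in that only mirrors the rule as the docs describe it, and the thread counts in the example are invented.

```python
def resolve_threads(conf, role, module, setting):
    """Return the effective thread count for a role, or None if unset.

    role:    "driver" or "executor"
    module:  "rpc" or "shuffle"
    setting: e.g. "io.serverThreads" or "io.clientThreads"
    """
    specific = f"spark.{role}.{module}.{setting}"
    general = f"spark.{module}.{setting}"
    if specific in conf:
        return int(conf[specific])  # role-specific value wins
    if general in conf:
        return int(conf[general])  # otherwise fall back to the shared value
    return None  # nothing set here; Spark would apply its own default


# Invented values, only to show the precedence.
conf = {
    "spark.rpc.io.serverThreads": "8",
    "spark.executor.rpc.io.serverThreads": "16",
}
print(resolve_threads(conf, "executor", "rpc", "io.serverThreads"))
print(resolve_threads(conf, "driver", "rpc", "io.serverThreads"))
print(resolve_threads(conf, "driver", "shuffle", "io.serverThreads"))
```

Note that these settings size Spark's internal thread pools (RPC, shuffle I/O); they do not change how many tasks an executor runs at once.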
Here follows a non-exhaustive list, in no particular order (I've just been looking through the source code), of some other configuration parameters related to the number of threads:
spark.sql.streaming.fileSource.cleaner.numThreads
spark.storage.decommission.shuffleBlocks.maxThreads
spark.shuffle.mapOutput.dispatcher.numThreads
spark.shuffle.push.numPushThreads
spark.shuffle.push.merge.finalizeThreads
spark.rpc.connect.threads
spark.rpc.io.threads
spark.rpc.netty.dispatcher.numThreads (will be overridden by the driver/executor-specific ones from the table above)
spark.resultGetter.threads
spark.files.io.threads
I didn't add the meaning of these parameters to this answer because that's a different question and quite "Googleable". This is just meant as an extra bit of info.

Number of Executor Cores and benefits or otherwise - Spark

Some run-time clarifications are requested.
In a thread elsewhere I read, it was stated that a Spark Executor should only have a single Core allocated. However, I wonder if this is really always true. Reading the various SO-questions and the likes of, as well as Karau, Wendell et al, it is clear that there are equal and opposite experts who state one should in some cases specify more Cores per Executor, but the discussion tends to be more technical than functional. That is to say, functional examples are lacking.
My understanding is that a Partition of an RDD or DF, DS, is serviced by a single Executor. Fine, no issue, makes perfect sense. So, how can the Partition benefit from multiple Cores?
If I have a map followed by, say, a filter, these are not two Tasks that can be interleaved (as in what Informatica does), as my understanding is that they are fused together. This being so, what is an example of a benefit from an assigned Executor running more Cores?
From JL: In other (more technical) words, a Task is a computation on the records in a RDD partition in a Stage of a RDD in a Spark Job. What does it mean functionally speaking, in practice?
Moreover, can an Executor be allocated if not all of its Cores can be acquired? I presume there is a wait period and that after a while it may be allocated in a more limited capacity. True?
From a highly rated answer on SO, What is a task in Spark? How does the Spark worker execute the jar file?, the following is stated: When you create the SparkContext, each worker starts an executor. From another SO question: When a SparkContext is created, each worker node starts an executor.
Not sure I follow these assertions. If Spark does not know the number of partitions etc. in advance, why allocate Executors so early?
I ask this, as even this excellent post How are stages split into tasks in Spark? does not give a practical example of multiple Cores per Executor. I can follow the post clearly and it fits in with my understanding of 1 Core per Executor.
My understanding is that a Partition (...) serviced by a single Executor.
That's correct; however, the opposite is not true: a single executor can handle multiple partitions / tasks across multiple stages or even multiple RDDs.
then what is an example of benefit from an assigned Executor running more Cores?
First and foremost, processing multiple tasks at the same time. Since each executor is a separate JVM, which is a relatively heavy process, it might be preferable to keep only one instance for a number of threads. Additionally, it can provide further advantages, like exposing shared memory that can be used across multiple tasks (for example to store broadcast variables).
A secondary application is applying multiple threads to a single partition when the user invokes multi-threaded code. That is, however, not something that is done by default (see Number of CPUs per Task in Spark).
See also What are the benefits of running multiple Spark tasks in the same JVM?
If Spark does not know the number of partitions etc. in advance, why allocate Executors so early?
Pretty much by extension of the points made above: executors are not created to handle a specific task / partition. They are long-running processes, and as long as dynamic allocation is not enabled, they are intended to last for the full lifetime of the corresponding application / driver (preemption or failures, as well as the already mentioned dynamic allocation, can affect that, but that's the basic model).
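The two uses of executor cores mentioned in this answer (several tasks at once, or several threads for one task) can be sketched as slot arithmetic. This is a simplified model, not Spark code: concurrent_tasks is a made-up helper, and task_cpus stands for the spark.task.cpus setting, which defaults to 1.

```python
def concurrent_tasks(num_executors, executor_cores, task_cpus=1):
    """Upper bound on tasks running at once across all executors.

    Hypothetical helper; task_cpus models spark.task.cpus.
    """
    if task_cpus < 1 or task_cpus > executor_cores:
        raise ValueError("task_cpus must be between 1 and executor_cores")
    # Each task reserves task_cpus cores, so an executor offers this many slots.
    slots_per_executor = executor_cores // task_cpus
    return num_executors * slots_per_executor


# 12 executors with 4 cores each, one core per task: 4 tasks share each JVM.
print(concurrent_tasks(12, 4))

# Same executors, but each task reserves 2 cores for its own threads.
print(concurrent_tasks(12, 4, task_cpus=2))
```

Reserving more cores per task halves the number of concurrent tasks in this example, which is the trade-off behind the "secondary application" above.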

How to rebalance RDD during processing time for unbalanced executor workloads

Suppose I have an RDD with 1,000 elements and 10 executors. Right now I parallelize the RDD with 10 partitions and process 100 elements by each executor (assume 1 task per executor).
My difficulty is that some of these partitioned tasks may take much longer than others, so say 8 executors will be done quickly, while the remaining 2 will be stuck doing something for longer. So the master process will be waiting for the 2 to finish before moving on, and 8 will be idling.
What would be a way to make the idling executors 'take' some work from the busy ones? Unfortunately I can't anticipate ahead of time which ones will end up 'busier' than others, so can't balance the RDD properly ahead of time.
Can I somehow make executors communicate with each other programmatically? I was thinking of sharing a DataFrame with the executors, but based on what I see I cannot manipulate a DataFrame inside an executor?
I am using Spark 2.2.1 and JAVA
Try using Spark dynamic resource allocation, which scales the number of executors registered with the application up and down based on the workload.
You can enable it with the properties below:
spark.dynamicAllocation.enabled = true
spark.shuffle.service.enabled = true
You can also consider configuring the properties below:
spark.dynamicAllocation.executorIdleTimeout
spark.dynamicAllocation.maxExecutors
spark.dynamicAllocation.minExecutors
Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. This means that your application may give resources back to the cluster if they are no longer used and request them again later when there is demand. This feature is particularly useful if multiple applications share resources in your Spark cluster.
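As a rough sketch of how the properties above might be passed to an application, the snippet below turns a settings dictionary into spark-submit --conf arguments. The helper to_submit_args, the jar name app.jar, and the executor counts and timeout are all placeholders chosen for illustration, not recommended values.

```python
def to_submit_args(conf):
    """Turn a dict of Spark properties into spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):  # sorted only to make the output stable
        args += ["--conf", f"{key}={conf[key]}"]
    return args


dynamic_allocation = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
    # Placeholder values; tune them for your own cluster.
    "spark.dynamicAllocation.minExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "10",
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}

# app.jar is a hypothetical application jar.
command = ["spark-submit"] + to_submit_args(dynamic_allocation) + ["app.jar"]
print(" ".join(command))
```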

What factors affect how many Spark jobs can run concurrently?

We recently set up the Spark Job Server, to which our Spark jobs are submitted. But we found out that our 20-node Spark cluster (8 cores / 128 GB memory per node) can only run 10 Spark jobs concurrently.
Can someone share some detailed info about what factors would actually affect how many spark jobs can be run concurrently? How can we tune the conf so that we can take full advantage of the cluster?
Question is missing some context, but first - it seems like Spark Job Server limits the number of concurrent jobs (unlike Spark itself, which puts a limit on number of tasks, not jobs):
From application.conf
# Number of jobs that can be run simultaneously per context
# If not set, defaults to number of cores on machine where jobserver is running
max-jobs-per-context = 8
If that's not the issue (you set the limit higher, or are using more than one context), then the total number of cores in the cluster (8*20 = 160) is the maximum number of concurrent tasks. If each of your jobs creates 16 tasks, Spark would queue the next incoming job waiting for CPUs to be available.
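The arithmetic in the last two paragraphs can be sketched as follows. This is a simplified model rather than anything Spark or the job server computes: max_concurrent_jobs is a made-up helper, it assumes one core per task and that every job keeps all of its tasks running at once, and the figure of 16 tasks per job is the hypothetical one used above.

```python
def max_concurrent_jobs(nodes, cores_per_node, tasks_per_job,
                        max_jobs_per_context=None):
    """Estimate how many jobs can run at once before tasks start queueing."""
    total_cores = nodes * cores_per_node  # upper bound on concurrent tasks
    by_cores = total_cores // tasks_per_job
    if max_jobs_per_context is None:
        return by_cores
    # The job server limit applies per context; one context is assumed here.
    return min(by_cores, max_jobs_per_context)


# 20 nodes x 8 cores = 160 task slots; 16 tasks per job.
print(max_concurrent_jobs(20, 8, tasks_per_job=16))

# Same cluster with max-jobs-per-context = 8 and a single context.
print(max_concurrent_jobs(20, 8, tasks_per_job=16, max_jobs_per_context=8))
```

Under these assumptions the core count alone would cap the cluster at 10 concurrent jobs, which happens to match the number reported in the question.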
Spark creates a task per partition of the input data, and the number of partitions is decided according to the partitioning of the input on disk, or by calling repartition or coalesce on the RDD/DataFrame to manually change the partitioning. Some other actions that operate on more than one RDD (e.g. union) may also change the number of partitions.
Some things that could limit the parallelism that you're seeing:
If your job consists only of map operations (or other shuffle-less operations), it will be limited to the number of partitions of data you have. So even if you have 20 executors, if you have 10 partitions of data, it will only spawn 10 tasks (unless the data is splittable, in something like Parquet, LZO-indexed text, etc.).
If you're performing a take() operation (without a shuffle), it performs an exponential take, using only one task and then growing until it collects enough data to satisfy the take operation. (Another question similar to this)
Can you share more about your workflow? That would help us diagnose it.

Resources