I'm working with a Spark/YARN cluster that limits the resources I can allocate to 8GB memory and 1 core per container, but I can allocate hundreds, perhaps even thousands of executors to run my application on.
However, since the driver has similar resource limitations (8GB memory, 4 cores), I'm concerned that too many executors may overwhelm the driver and cause timeouts.
Is there a rule of thumb for sizing the driver memory and cores to handle large numbers of executors?
There are rules on how to size your "executors".
A driver with 8GB and 4 cores should be able to handle thousands of executors easily, as it only maintains bookkeeping metadata about the executors.
This assumes you are not calling actions like collect() in your Spark code, which pull data back to the driver.
This Spark code analysis will help you understand which Spark operations are performed where: http://bytepadding.com/big-data/spark/spark-code-analysis/
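For illustration, here is a minimal sketch of the distinction being drawn; the S3 paths and the "value" column are made-up placeholders, not anything from the question:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("driver-safety-sketch").getOrCreate()
import spark.implicits._

// Hypothetical input path, purely for illustration.
val df = spark.read.parquet("s3://bucket/large-dataset/")

// Fine for a small driver: only bookkeeping and tiny results come back to it.
val rowCount = df.count()
val preview  = df.limit(100).collect()   // small, bounded result set

// Risky on an 8GB driver: pulls the whole distributed dataset into driver memory.
// val everything = df.collect()

// Prefer pushing large results back to distributed storage instead.
df.filter($"value" > 0).write.parquet("s3://bucket/output/")
```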
Related
There are several highly voted threads that I am having difficulty interpreting, perhaps because the jargon of 2016 differs from that of today (or I am just not getting it).
Apache Spark: The number of cores vs. the number of executors
How to tune spark executor number, cores and executor memory?
Azure/Databricks offers some best practices on cluster sizing: https://learn.microsoft.com/en-us/azure/databricks/clusters/cluster-config-best-practices
So for my workload, let's say I am interested in (using current Databricks jargon):
1 Driver: 64 GB of memory and 8 cores
1 Worker: 256 GB of memory and 64 cores
Drawing on the Microsoft link above, fewer workers should in turn lead to less shuffling, which is among the most costly Spark operations.
So, I have 1 driver and 1 worker. How, then, do I translate these terms into what is discussed here on SO in terms of "nodes" and "executors"?
Ultimately, I would like to set my Spark config "correctly" so that cores and memory are used as optimally as possible.
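Roughly speaking, in the SO terminology your Databricks worker is a node, and each node hosts one or more executor JVMs. As a hedged sketch (not a Databricks default), that single 256 GB / 64-core worker could be carved into executors using the ~5-cores-per-executor heuristic from the threads linked above; all of the numbers below are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative assumption: leave ~1 core for the OS/daemons, split the
// remaining ~63 cores into executors of ~5 cores each => about 12 executors,
// each with roughly 256 GB / 12 ≈ 21 GB minus ~10% overhead ≈ 19 GB.
val spark = SparkSession.builder()
  .appName("cluster-sizing-sketch")
  .config("spark.executor.instances", "12")
  .config("spark.executor.cores", "5")
  .config("spark.executor.memory", "19g")
  .config("spark.driver.memory", "48g")   // the driver node has 64 GB in this example
  .getOrCreate()
```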
As per my research, whenever we run a Spark job we should not configure executors with more than 5 cores; if we increase the cores beyond that limit, the job will suffer due to poor I/O throughput.
My doubt is: if we increase the number of executors and reduce the cores per executor, those executors will still end up on the same physical machine, reading from and writing to the same disks. Why does this not cause the same I/O throughput issue?
You can consider
Apache Spark: The number of cores vs. the number of executors
as a reference use case.
The cores within an executor are like threads. So just as more work gets done when we increase parallelism, we should always keep in mind that there is a limit to it, because we have to gather the results from those parallel tasks.
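A minimal sketch of that idea, assuming a hypothetical 4-executor / 5-core layout (the sizes are made up): the "cores" are just task slots, and the driver still has to merge the results of the action at the end:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sizing: 4 executors with 5 "cores" (task slots) each,
// so at most 4 * 5 = 20 tasks run concurrently.
val spark = SparkSession.builder()
  .appName("task-slots-sketch")
  .config("spark.executor.instances", "4")
  .config("spark.executor.cores", "5")
  .getOrCreate()

// 200 partitions are queued over those 20 slots; when the action finishes,
// the per-task results still have to be merged back on the driver.
val counts = spark.sparkContext
  .parallelize(1 to 1000000, numSlices = 200)
  .map(_ % 10)
  .countByValue()
```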
As is well known, it is possible to increase the number of cores when submitting our application. Actually, I'm trying to allocate all available cores on the server to the Spark application. I'm wondering what will happen to performance: will it get worse or better than usual?
The first thing that might come to mind about allocating cores (--executor-cores) is that more cores per executor means more parallelism, more tasks executed concurrently, and better performance. But that's not true in the Spark ecosystem. After leaving 1 core for the OS and other applications running on the worker, studies have shown that it's optimal to allocate about 5 cores per executor.
For example, if you have a worker node with 16 cores, the optimal number of executors and cores per executor will be --num-executors 3 and --executor-cores 5 (as 5*3=15), respectively.
Optimal resource allocation is not the only thing that brings better performance; it also depends on how transformations and actions are performed on your DataFrames. More shuffling of data between executors hampers performance.
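A minimal sketch of that 16-core example expressed as application configuration; the memory figure is an illustrative assumption, since it depends on the node's RAM and overhead settings:

```scala
import org.apache.spark.sql.SparkSession

// The 16-core worker-node example above: 1 core left for the OS, and the
// remaining 15 cores split into 3 executors of 5 cores each.
// Equivalent spark-submit flags: --num-executors 3 --executor-cores 5
val spark = SparkSession.builder()
  .appName("five-core-executor-sketch")
  .config("spark.executor.instances", "3")
  .config("spark.executor.cores", "5")
  .config("spark.executor.memory", "18g")  // illustrative; depends on node RAM and overhead
  .getOrCreate()
```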
Your operating system always needs resources for its own basic needs.
It is good to keep 1 core and 1 GB of memory aside for the operating system and other applications.
If you allocate all resources to Spark, it will not improve your performance; your other applications will starve for resources.
I don't think it is a good idea to allocate all resources to Spark alone.
Follow the post below if you want to tune your Spark cluster:
How to tune spark executor number, cores and executor memory?
I am talking about Spark standalone mode.
Let's say SPARK_WORKER_CORES=200 and there are only 4 cores available on the node where I am trying to start the worker. Will the worker get 4 cores and continue, or will it not start at all?
Similarly, what if I set SPARK_WORKER_MEMORY=32g and there is only 2 GB of memory actually available on that node?
"Cores" in Spark is sort of a misnomer. "Cores" actually corresponds to the number of threads created to process data. So, you could have an arbitrarily large number of cores without an explicit failure. That being said, overcommitting by 50x will likely lead to incredibly poor performance due to context switching and overhead costs. This means that for both workers and executors you can arbitrarily increase this number. In practice in Spark Standalone, I've generally seen this overcommitted no more than 2-3x the number of logical cores.
When it comes to specifying worker memory, once again, you can in theory increase it to an arbitrarily large number. This is because, for a worker, the memory amount specifies how much it is allowed to allocate for executors, but it doesn't explicitly allocate that amount when you start the worker. Therefore you can make this value much larger than physical memory.
The caveat here is that when you start up an executor, if you set the executor memory to be greater than the amount of physical memory, your executors will fail to start. This is because executor memory directly corresponds to the -Xmx setting of the java process for that executor.
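As a hedged sketch of how this looks from the application side (the master URL and sizes here are hypothetical): the worker can advertise far more cores and memory than the node has, but spark.executor.memory is what becomes the executor's -Xmx, so that is the value that has to fit in real RAM:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone worker started with SPARK_WORKER_CORES=200 and
// SPARK_WORKER_MEMORY=32g on a 4-core / 2 GB node: the worker registers fine,
// because those values are only an allowance, not an up-front allocation.
val spark = SparkSession.builder()
  .master("spark://master-host:7077")       // hypothetical master URL
  .appName("standalone-overcommit-sketch")
  // spark.executor.memory becomes the executor JVM's -Xmx, so this is the
  // value that has to fit within the node's real 2 GB of RAM.
  .config("spark.executor.memory", "1g")
  .config("spark.executor.cores", "2")
  .getOrCreate()
```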
When I submit a Spark job to the YARN cluster and look at the Spark UI, I see 4 stages, but memory usage is very low on all nodes; it says 0 out of 4 GB used. I guess that might be because I left the partitioning at the default.
File sizes range between 1 MB and 100 MB in S3. There are around 2700 files totalling 26 GB, and exactly 2700 tasks were running in stage 2.
Is it worth repartitioning to something around 640 partitions? Would it improve performance? Or
does it not matter if the partitioning is more granular than actually required? Or
do my submit parameters need to be addressed?
Cluster details,
Cluster with 10 nodes
Overall memory 500 GB
Overall vCores 64
--executor-memory 16g
--num-executors 16
--executor-cores 1
Actually it runs on 17 cores out of 64. I don't want to increase the number of cores since others might be using the cluster.
You partition, and repartition, for the following reasons:
To make sure we have enough work to distribute to the distinct cores in our cluster (nodes * cores_per_node). Obviously we need to tune the number of executors, cores per executor, and memory per executor to make that happen as intended.
To make sure we evenly distribute work: the smaller the partitions, the lesser the chance that one core has much more work to do than all the other cores. Skewed distribution can have a huge effect on total elapsed time if the partitions are too big.
To keep partitions at manageable sizes: not too big and not too small, so we don't overtax the GC. Bigger partitions can also cause issues when the work per partition grows non-linearly (non-linear big-O).
Too-small partitions will create too much processing overhead.
As you might have noticed, there will be a goldilocks zone. Testing will help you determine ideal partition size.
Note that it is OK to have many more partitions than we have cores; partitions queuing up to be assigned a task slot is something that I design for.
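As a minimal sketch of the arithmetic for the numbers in the question (the paths and file format are placeholders): 26 GB spread over about 640 partitions gives roughly 40 MB per partition:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("repartition-sketch").getOrCreate()

// ~2700 small files (~26 GB total) typically produce roughly one input
// partition per file, hence the ~2700 tasks seen in stage 2.
val df = spark.read.parquet("s3://bucket/input/")   // placeholder path and format

// ~26 GB / 640 partitions ≈ 40 MB per partition. coalesce avoids a full
// shuffle when only reducing the partition count; use repartition(640)
// instead if the data also needs to be rebalanced across partitions.
val evened = df.coalesce(640)

evened.write.parquet("s3://bucket/output/")         // placeholder output path
```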
Also make sure you configure your Spark job properly, otherwise:
Make sure you do not have too many executors. One or very few executors per node is more than enough. Fewer executors have less overhead, as they work in a shared memory space and individual tasks are handled by threads instead of processes. There is a huge amount of overhead to starting up a process, but threads are pretty lightweight.
Tasks need to talk to each other. If they are in the same executor, they can do that in-memory. If they are in different executors (processes), then that happens over a socket (overhead). If that is over multiple nodes, that happens over a traditional network connection (more overhead).
Assign enough memory to your executors. When using YARN as the scheduler, it will by default place executors based on their declared memory, not on the CPU you declare to use.
I do not know what your situation is (you made the node names invisible), but if you only have a single node with 15 cores, then 16 executors do not make sense. Instead, set it up with one executor and 16 cores per executor.
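As a hedged sketch of that advice applied to the cluster described in the question (10 nodes, 500 GB, 64 vCores overall): the per-node figures below are simply the stated totals divided by 10 and are assumptions, not measurements or a definitive recommendation:

```scala
import org.apache.spark.sql.SparkSession

// Roughly 6 cores and 50 GB per node if the totals are spread evenly.
// One executor per node keeps its tasks in one JVM (threads sharing memory)
// instead of many single-core processes.
// Equivalent flags: --num-executors 10 --executor-cores 6 --executor-memory 40g
val spark = SparkSession.builder()
  .appName("few-fat-executors-sketch")
  .config("spark.executor.instances", "10")  // one per node (assumption)
  .config("spark.executor.cores", "6")       // ~all usable cores on the node
  .config("spark.executor.memory", "40g")    // leaves headroom for YARN/OS overhead
  .getOrCreate()
```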