Tuning Spark: number of executors per node when cores available are uneven - apache-spark

I have read that having 5 cores per executor in Spark achieves the optimal read/write throughput, so setting spark.executor.cores = 5 is usually desired. I have also read that you should subtract one core per node to allow the underlying daemon processes to run.
So, determining the number of executors per node follows this formula:
executors per node = (cores per node - 1) / 5 cores per executor
However, what is the best approach in a scenario where you have 8 cores in each node machine?
1.4 executors per node = (8 - 1) / 5
First question - will Spark/yarn have an executor spanning multiple nodes?
If not - then I need to round. Which way should I go? It seems my options are:
1.) round down to 1 - meaning I'd only have 1 executor per node. I could increase the cores per executor, though I don't know whether I would get any benefit from that.
2.) round up to 2 - that means I'd have to decrease the cores per executor to 3 (8 cores available, minus 1 for the daemons, and you can't have half a core), which could decrease their efficiency.

Here spark.executor.cores = 5 is not a hard-and-fast value. The rule of thumb is to use a number of cores equal to or less than 5.
We need 1 core for OS & other Hadoop daemons. We are left with 7 cores per node.
Remember that out of all the executors, we need to set aside the equivalent of 1 executor for the YARN Application Master.
With spark.executor.cores = 4, we cannot spare an executor for YARN, so I would not take that value.
With spark.executor.cores = 3 or spark.executor.cores = 2, after setting one executor aside for YARN, we are still always left with at least 1 executor per node.
Now, which of these is most efficient for your code? That cannot be answered in the abstract; it depends on multiple other factors such as the amount of data used, the number of joins used, etc.
This is based on my understanding. It provides a starting point for exploring the other options.
NOTE: If you are using external Java libraries & Datasets in your code, you might need 1 core per executor to preserve type safety.
Hope it helps ...
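
For what it's worth, here is a minimal sketch of the arithmetic above in Scala; the 8-core node, the reserved core, and the "round up to 2 executors of 3 cores" choice come from the discussion above, while the 4-node cluster size at the end is only an assumption for illustration:

import org.apache.spark.SparkConf

// Worked arithmetic for an 8-core node, following the formula above.
val coresPerNode     = 8
val reservedCores    = 1                        // OS / Hadoop daemons
val coresPerExecutor = 3                        // "round up" option: 2 executors of 3 cores
val executorsPerNode = (coresPerNode - reservedCores) / coresPerExecutor   // = 2

// How those numbers would be handed to Spark (standard property names).
val conf = new SparkConf()
  .set("spark.executor.cores", coresPerExecutor.toString)
  // total executors = executors per node * number of worker nodes (4 nodes assumed here)
  .set("spark.executor.instances", (executorsPerNode * 4).toString)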

Related

EMR and Spark tuning

Currently, I am running a Spark job through EMR and working on Spark tuning. I read about the number of executors per instance, memory calculation, etc., and I got confused given the current setup.
So currently it uses spark.dynamicAllocation.enabled set to true by default from EMR, and spark.executor.cores set to 4 (not set by me, so I assume it is the default). It also uses one r6.xlarge instance (32 GiB of memory, 4 vCPUs) for the master node and two of the same type for the core nodes.
In this case, based on the formula: number of executors per instance = (total number of virtual cores per instance - 1) / spark.executor.cores, that is (4 - 1) / 4 = 0. Would that be correct?
When I check the Spark UI, however, it has added many executors. What information am I missing here?
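
One way to sanity-check what the job is actually running with is to read the effective configuration and executor list at runtime. This is only a diagnostic sketch (it assumes an existing SparkSession named spark); note that with dynamic allocation enabled, the executor count grows and shrinks with load rather than being fixed by the formula above:

val sc = spark.sparkContext
println(spark.conf.get("spark.dynamicAllocation.enabled", "false"))
println(spark.conf.get("spark.executor.cores", "not set"))
println(spark.conf.get("spark.dynamicAllocation.maxExecutors", "unbounded"))
// One entry per active executor (plus the driver).
sc.getExecutorMemoryStatus.keys.foreach(println)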

What is the theoretical max parallelism of tasks in my Foundry Job?

I know there's indications of parallelism (Task Concurrency) in my job's Spark Details page, but I'm wondering how this number is calculated since it doesn't match the number of Executors my job is running with?
There are 3 things that influence this:
TASK_CPUs (T)
Task CPUs control the number of cores given to an individual task. A typical setup will have 1 for this setting, meaning each task will operate with a single core.
EXECUTOR_CORES (C)
The number of cores allocated to each Executor running your job. A typical setup will have 2 cores per Executor for your job.
NUM_EXECUTORS (E)
The number of executors allocated to your job. A typical setup will have 2 Executors for your job.
These are used together in your Foundry job like so:
The total number of cores available to do work in your cluster is C * E, so it is typically 4 (2 * 2).
The amount of parallelism is the total number of cores divided by the cores per task, or (C * E) / T, so typically (2 * 2) / 1 = 4.
You will therefore typically see a max parallelism of 4 in your jobs; increasing your Executor count will boost your max parallelism. Be wary of boosting your cores per Executor, as you may encounter problems.
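
As a worked example of the formula above (the variable names mirror T, C, and E as described; the values are the "typical" ones from this answer, not anything read from Foundry itself):

// Max task parallelism = (cores per executor * number of executors) / cores per task
val taskCpus      = 1   // T: cores given to each individual task
val executorCores = 2   // C: cores per Executor
val numExecutors  = 2   // E: Executors allocated to the job
val totalCores     = executorCores * numExecutors    // 4
val maxParallelism = totalCores / taskCpus           // 4 concurrent tasks
println(s"max concurrent tasks: $maxParallelism")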

In spark, can I define more executors than available cores?

Say I have a total of 4 cores,
What happens if I define the number of executors as 8?
Can a single core be shared between 2 executors?
Can the number of cores for an executor be a fraction?
What is the impact on performance of this kind of config?
This is what I observed in spark standalone mode:
The total cores of my system are 4
If I run the spark-shell command with spark.executor.cores=2,
then 2 executors are created with 2 cores each.
But if I configure the number of executors to be more than the available cores,
then only one executor is created, with the maximum cores the system has.
The number of cores will never be a fractional value.
If you assign a fractional value in the configuration, you will end up with an exception.
Feel free to edit/correct the post if anything is wrong.
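
For reference, a minimal sketch of the kind of standalone-mode setup described above (the master URL and app name are placeholders; the over-subscription behaviour is the observation reported above, not something this snippet enforces):

import org.apache.spark.sql.SparkSession

// Standalone-mode sketch on a 4-core machine: ask for 2 cores per executor
// and cap the application at 4 cores, yielding 2 executors of 2 cores each.
val spark = SparkSession.builder()
  .appName("core-allocation-sketch")
  .master("spark://localhost:7077")       // placeholder standalone master
  .config("spark.executor.cores", "2")    // must be a whole number
  .config("spark.cores.max", "4")         // total cores granted to this app
  .getOrCreate()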

Spark UI on Google Dataproc: numbers interpretation

I'm running a Spark job on a Google Dataproc cluster (3 worker nodes of type n1-highmem-4, so 4 cores and 26 GB each, with the same type for the master).
I have a few questions about the information displayed on the Hadoop and Spark UIs:
When I check the Hadoop UI I get this:
My question here is: my total RAM is supposed to be 84 GB (3 x 26), so why is only 60 GB displayed here? Is 24 GB used for something else?
2)
This is the screen showing currently launched executors.
My questions are:
Why are only 10 cores used? Shouldn't we be able to launch a 6th executor using the 2 remaining cores, since we have 12 and 2 seem to be used per executor?
Why 2 cores per executor? Does it change anything if we run 12 executors with 1 core each instead?
What is the "Input" column? The total volume of data each executor received to analyze?
3)
This is a screenshot of the "Storage" panel. I see the dataframe I'm working on.
I don't understand the "Size in Memory" column. Is it the total RAM used to cache the dataframe? It seems very low compared to the size of the raw files I load into the dataframe (500 GB+). Is that a wrong interpretation?
Thanks to anyone who will read this !
If you take a look at this answer, it mostly covers your questions 1 and 2.
To sum up, the total memory is less because some memory is reserved to run the OS and system daemons or the Hadoop daemons themselves, e.g. NameNode and NodeManager.
It is similar for cores: in your case, 3 nodes each run 2 executors, and each executor uses 2 cores, except for the node the application master lives on. That node runs only one executor, and the remaining cores are given to the master. That's why you see only 5 executors and 10 cores.
For your 3rd question, that number should be the memory used by the partitions of that RDD, which is approximately equal to the memory allocated to each executor, in your case ~13 GB.
Note that Spark doesn't load your 500 GB of data at once; it loads data in partitions, and the number of concurrently loaded partitions depends on the number of cores you have available.
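
If you want to cross-check the "Size in Memory" figure from inside the job, one option (a sketch, assuming a live SparkContext named sc) is to compare each executor's maximum storage memory with what remains free:

// For each executor (and the driver): maxMem is the storage memory limit and
// remaining is what is still free; the difference is roughly what cached blocks occupy.
val usedStorageGB = sc.getExecutorMemoryStatus.map { case (executor, (maxMem, remaining)) =>
  executor -> ((maxMem - remaining) / (1024.0 * 1024 * 1024))
}
usedStorageGB.foreach { case (executor, gb) => println(f"$executor used $gb%.2f GB for storage") }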

How to tune spark executor number, cores and executor memory?

Where do you start to tune the above-mentioned params? Do we start with executor memory and get the number of executors, or do we start with cores and get the executor number? I followed the link and got a high-level idea, but I am still not sure how or where to start and arrive at a final conclusion.
The following answer covers the 3 main aspects mentioned in the title - number of executors, executor memory and number of cores. There may be other parameters, like driver memory, which I have not addressed as of this answer but would like to add in the near future.
Case 1 Hardware - 6 nodes, each node with 16 cores and 64 GB RAM
Each executor is a JVM instance, so we can have multiple executors in a single node.
First, 1 core and 1 GB are needed for the OS and Hadoop daemons, so 15 cores and 63 GB RAM are available on each node.
Start with how to choose number of cores:
Number of cores = number of concurrent tasks an executor can run
So we might think that more concurrent tasks per executor would give better performance. But research shows that
any application with more than 5 concurrent tasks per executor tends to perform poorly. So stick to 5.
This number comes from an executor's ability to run concurrent tasks, not from how many cores the system has. So the number 5 stays the same
even if you have double (32) the cores in the CPU.
Number of executors:
Coming to the next step: with 5 cores per executor and 15 total available cores per node (CPU), we arrive at
3 executors per node.
So with 6 nodes and 3 executors per node, we get 18 executors. Out of those 18 we need 1 executor (Java process) for the YARN Application Master, leaving 17 executors.
This 17 is the number we give to Spark using --num-executors when running from the spark-submit shell command.
Memory for each executor:
From the step above, we have 3 executors per node, and the available RAM is 63 GB.
So the memory for each executor is 63/3 = 21 GB.
However, a small memory overhead is also needed to determine the full memory request to YARN for each executor.
The formula for that overhead is max(384 MB, 0.07 * spark.executor.memory).
Calculating that overhead: 0.07 * 21 (here 21 is the 63/3 calculated above)
= 1.47
Since 1.47 GB > 384 MB, the overhead is 1.47 GB.
Subtracting that from each 21 GB above => 21 - 1.47 ~ 19 GB
So executor memory - 19 GB
Final numbers - Executors - 17, Cores 5, Executor Memory - 19 GB
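
Expressed as a submit-time configuration, Case 1's final numbers might look like the sketch below (shown here as a SparkConf; the property names are the standard equivalents of --num-executors, --executor-cores and --executor-memory, and the app name is a placeholder):

import org.apache.spark.SparkConf

// Case 1 final numbers: 17 executors, 5 cores each, 19 GB heap per executor.
val conf = new SparkConf()
  .setAppName("case1-sizing-sketch")       // placeholder
  .set("spark.executor.instances", "17")   // --num-executors 17
  .set("spark.executor.cores", "5")        // --executor-cores 5
  .set("spark.executor.memory", "19g")     // --executor-memory 19g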
Case 2 Hardware - the same 6 nodes, but 32 cores and 64 GB RAM each
5 cores per executor stays the same for good concurrency
Number of executors for each node = 31/5 ~ 6 (again reserving 1 core per node)
So total executors = 6 * 6 nodes = 36. The final number is then 36 - 1 for the AM = 35
Executor memory: with 6 executors per node, 63/6 ~ 10 GB. The overhead is 0.07 * 10 = 700 MB. Rounding the overhead up to 1 GB, we get 10 - 1 = 9 GB
Final numbers - Executors - 35, Cores 5, Executor Memory - 9 GB
Case 3
The above scenarios start by accepting the number of cores per executor as fixed and move on to the number of executors and memory.
Now for the first case, if we think we don't need 19 GB, and just 10 GB is sufficient, then the numbers are as follows:
cores 5
# of executors for each node = 3
At this stage, this would lead to 21 GB, and then 19 GB, as per our first calculation. But since we decided 10 GB is OK (assuming a little overhead), we can't simply switch the number of executors
per node to 6 (as 63/10 would suggest), because with 6 executors per node and 5 cores each that comes to 30 cores per node, when we only have 16 cores. So we also need to change the number of
cores for each executor.
So calculating again,
The magic number 5 becomes 3 (any number less than or equal to 5 works). So with 3 cores per executor and 15 available cores, we get 5 executors per node, and so (5 * 6) - 1 = 29 executors.
Memory is then 63/5 ~ 12 GB. The overhead is 12 * 0.07 = 0.84 GB.
So executor memory is 12 - 1 GB = 11 GB.
Final Numbers are 29 executors, 3 cores, executor memory is 11 GB
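
All three cases follow the same arithmetic, so a small helper can reproduce them. This is only a sketch: the function name and the rounding choices are mine, and the max(384 MB, 7%) overhead rule mirrors the formula used above.

// Derive (number of executors, cores per executor, executor memory in GB) from node specs,
// following the steps above: reserve 1 core + 1 GB per node, then 1 executor for the YARN AM.
def sizeExecutors(nodes: Int, coresPerNode: Int, ramPerNodeGB: Int,
                  coresPerExecutor: Int = 5): (Int, Int, Int) = {
  val usableCores  = coresPerNode - 1
  val usableRamGB  = ramPerNodeGB - 1
  val execsPerNode = usableCores / coresPerExecutor
  val totalExecs   = execsPerNode * nodes - 1                  // minus 1 for the AM
  val memPerExecGB = usableRamGB / execsPerNode
  val overheadGB   = math.max(0.384, 0.07 * memPerExecGB)      // max(384 MB, 7%)
  (totalExecs, coresPerExecutor, (memPerExecGB - overheadGB).toInt)
}

println(sizeExecutors(6, 16, 64))     // Case 1: (17, 5, 19)
println(sizeExecutors(6, 32, 64))     // Case 2: (35, 5, 9)
println(sizeExecutors(6, 16, 64, 3))  // Case 3: (29, 3, 11)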
Dynamic Allocation:
Note: the upper bound for the number of executors when dynamic allocation is enabled is effectively infinity. So this says that a Spark application can eat away all the resources if needed. In
a cluster where other applications are running and they also need cores to run their tasks, please make sure you handle this at the cluster level. I mean, you can allocate a
specific number of cores for YARN based on user access. So you can create a spark_user, say, and then give cores (min/max) to that user. These limits are for sharing between Spark and the other applications which run on YARN.
spark.dynamicAllocation.enabled - when this is set to true, we need not specify the number of executors. The reason is below:
The static numbers we give at spark-submit apply for the entire job duration. However, if dynamic allocation comes into the picture, there are different stages, like:
What to start with:
The initial number of executors (spark.dynamicAllocation.initialExecutors) to start with.
How many:
Then, based on load (tasks pending), how many to request. This would eventually be the number we would otherwise give at spark-submit in the static way. So once the initial executor number is set, we move between the min (spark.dynamicAllocation.minExecutors) and max (spark.dynamicAllocation.maxExecutors) numbers.
When to ask or give:
When do we request new executors (spark.dynamicAllocation.schedulerBacklogTimeout)? There have been pending tasks for this much time, so request. The number of executors requested in each round increases exponentially from the previous round; for instance, an application will add 1 executor in the first round, and then 2, 4, 8 and so on executors in the subsequent rounds. At a specific point, the max above comes into the picture.
When do we give away an executor (spark.dynamicAllocation.executorIdleTimeout)? When an executor has been idle for this much time, it is released.
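
A hedged sketch of what a dynamic-allocation configuration along these lines could look like (the concrete numbers and timeouts are illustrative, not recommendations; on YARN, dynamic allocation typically also needs the external shuffle service enabled):

import org.apache.spark.SparkConf

// Illustrative dynamic-allocation settings; all values below are placeholders.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")                  // typically required on YARN
  .set("spark.dynamicAllocation.initialExecutors", "2")          // what to start with
  .set("spark.dynamicAllocation.minExecutors", "2")              // lower bound
  .set("spark.dynamicAllocation.maxExecutors", "20")             // upper bound
  .set("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")  // when to ask for more
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")     // when to give one back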
Please correct me if I missed anything. The above is my understanding based on the blog I shared in the question and some online resources. Thank you.
References:
http://site.clairvoyantsoft.com/understanding-resource-allocation-configurations-spark-application/
http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation
http://spark.apache.org/docs/latest/job-scheduling.html#resource-allocation-policy
Also, it depends on your use case; an important config parameter is:
spark.memory.fraction (fraction of (heap space - 300 MB) used for execution and storage), from http://spark.apache.org/docs/latest/configuration.html#memory-management.
If you don't use cache/persist, set it to 0.1 so you have all the memory for your program.
If you use cache/persist, you can check the memory taken by:
sc.getExecutorMemoryStatus.map(a => (a._2._1 - a._2._2)/(1024.0*1024*1024)).sum
Do you read data from HDFS or from HTTP?
Again, tuning depends on your use case.
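
For example, a sketch of lowering spark.memory.fraction for a job that never caches anything (0.1 is the value suggested above; whether it is appropriate depends on your workload, and the default is 0.6):

import org.apache.spark.sql.SparkSession

// Shrink the unified (execution + storage) memory region so that more of the heap
// is left for user data structures when nothing is cached or persisted.
val spark = SparkSession.builder()
  .appName("memory-fraction-sketch")       // placeholder
  .config("spark.memory.fraction", "0.1")
  .getOrCreate()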
