Managing cluster and yarn utilization - apache-spark

In our cluster, the minimum container size is 8 GB.
Most of the Hive queries use 1 container (but they may well not use all the memory allocated to it).
Some of the Spark jobs use just 2 GB or 4 GB.
From what we have observed, most of our queries do not need that much memory, yet all containers still get used up.
So, is there any way we can manage this more effectively?
We have a total of 30 vcores and a total of 275 GB of memory.
Since we have to allocate 1 vcore per container, that caps us at 30 containers.
Is there a way I can efficiently leverage all 8 GB of each container?
Or increase the container count, or do something else?
Any suggestions will help.
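For reference, here is the arithmetic behind those numbers as a minimal Scala sketch; the property names in the comments are the standard YARN settings, and the 2 GB figure is just an illustrative alternative minimum, not a recommendation:

// Illustrative arithmetic for the numbers in the question, not a fix by itself.
val totalVcores    = 30
val totalMemGb     = 275
val minContainerGb = 8
val containersByMemory = totalMemGb / minContainerGb        // 34
val containersByVcores = totalVcores                        // 30, the real ceiling today
// Lowering yarn.scheduler.minimum-allocation-mb (e.g. to 2048) lets a 2-4 GB request get a
// 2-4 GB container instead of an 8 GB one, so memory is no longer stranded inside oversized
// containers. The container count, however, stays capped at 30 until
// yarn.nodemanager.resource.cpu-vcores is raised (only sensible if the hardware allows it).
val containersAt2Gb = math.min(totalMemGb / 2, totalVcores) // still 30

In other words, shrinking the minimum container size mainly stops memory from being wasted; getting more concurrent containers also requires relaxing the vcore limit.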

Related

In Spark, is it better to have many small workers or few bigger workers

A Spark cluster consists of a driver that distributes tasks to multiple worker nodes. Each worker can take on a number of tasks equal to the number of cores it has available, so I'd expect the speed at which a task finishes to depend on the total number of available cores.
Consider the following cluster configurations, using AWS EC2 as an example:
2 m5.4xlarge (16 vCPU/cores, 64GB RAM) workers for a total of 32 cores / 128GB RAM
OR
8 m5.xlarge (4 vCPU/cores, 16GB RAM) workers for a total of 32 cores / 128GB RAM
I'm using those instances as an example; it's not about these instances specifically but about the general idea that you can reach the same total cores + RAM with different configurations. Would there be any performance difference between those two cluster configurations? Both have the same total number of cores and RAM, and the same RAM-per-core ratio. For what kind of job would you choose one over the other? Some thoughts I have on this myself:
The configuration with 8 smaller instances might have a higher total network bandwidth, since each worker has its own connection.
The configuration with 2 bigger instances might be more efficient when shuffling, since more cores share memory on the same worker instead of having to shuffle across the network, resulting in lower network overhead.
The configuration with 8 smaller instances has better resiliency, since if one worker fails it's only one out of eight rather than one out of two.
Do you agree with the statements above? What other considerations would you make when choosing between configurations with equal total RAM and cores?
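One way to reason about this, assuming a YARN or standalone setup where you control executor sizing (the 4-core / 12 GB shape below is purely illustrative): if you fix the executor shape, both layouts give you the same number of executors, and the differences reduce to the network, shuffle-locality and resiliency points above.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("cluster-shape-example")
  .config("spark.executor.cores", "4")
  .config("spark.executor.memory", "12g") // leaves headroom for memoryOverhead and the OS
  .getOrCreate()

// 2 x m5.4xlarge (16 cores / 64 GB each): 4 executors per node * 2 nodes = 8 executors
// 8 x m5.xlarge  (4 cores  / 16 GB each): 1 executor  per node * 8 nodes = 8 executors

Keeping the executor shape constant is what makes the comparison fair; the remaining differences are exactly the ones listed above.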

How to calculate the executors, cores and memory based on a given input file size in spark?

Let's say I have a 5 GB input file and a cluster of 3 data nodes, each with 25 cores (75 cores total) and 72 GB of memory (216 GB total).
How do I calculate the number of executors, number of cores and executor memory for this file size and cluster configuration?
How many blocks will be created in HDFS for this file?
Method for the executor resource calculation:
Allocate 1 core and 1 GB per node for YARN and the OS, which leaves us with 72 cores and 213 GB of memory.
The remaining resources have a core-to-memory ratio of roughly 1:2.95 (GB per core).
The optimal CPU count per executor is 5, so to prevent underutilisation of either CPU or memory, the optimal resources per executor are about 14.7 GB (5 * 2.95) of memory and 5 cores.
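A minimal Scala sketch of that arithmetic (the 1 core / 1 GB reservation per node and the 5-cores-per-executor figure are the rules of thumb from this answer, not hard limits):

val totalCores = 75 - 3                              // 1 core per node reserved for YARN/OS
val totalMemGb = 216 - 3                             // 1 GB per node reserved
val coresPerExecutor = 5                             // rule of thumb for good HDFS throughput
val memPerCore = totalMemGb.toDouble / totalCores    // ~2.95 GB per core
val memPerExecutor = coresPerExecutor * memPerCore   // ~14.7 GB
val numExecutors = totalCores / coresPerExecutor     // 14 (leave one slot for the YARN AM/driver)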
You should keep the HDFS block size at 128 MB and use the same value for the Spark parameter:
spark.sql.files.maxPartitionBytes=134217728.
For a 5 GB file this results in 40 blocks, and roughly 40 input partitions.
You can refer to this article for more details: https://devendraparhate.medium.com/apache-spark-job-aws-emr-cluster-s3-yarn-and-hdfs-tuning-77514afb9ce8
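As a small spark-shell sketch of the partition-size setting (the input path is hypothetical, and the exact split count can vary slightly with file layout and compression):

spark.conf.set("spark.sql.files.maxPartitionBytes", 134217728L) // 128 MB per input partition
val df = spark.read.parquet("/data/input_5gb")                  // hypothetical 5 GB input
// 5 GB / 128 MB = 40, so the scan produces roughly 40 input partitions
println(df.rdd.getNumPartitions)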

Spark shuffle partitions - if I have fewer shuffle partitions than cores, what happens?

I am using Databricks on Azure, so I don't have a way to specify the number of executors or the memory per executor.
Let's say I have the following configuration:
10 worker nodes, each with 4 cores and 10 GB of memory.
It's a standalone configuration.
The input read size is 100 GB.
Now, if I set my shuffle partitions to 10 (less than the 40 total cores), what would happen?
Will it create a total of 10 executors, one per node, with each executor occupying all the cores and all the memory?
If you don't use dynamic allocation, you will end up leaving most cores unused during execution. Think of it this way: you have 40 "slots" available for computation, but only 10 tasks to process, so 30 slots will just sit idle.
I have to add that the above is a very simplified picture. In reality, you can have multiple stages running in parallel, so depending on your query you may still have all 40 cores utilized (see e.g. Does stages in an application run parallel in spark?).
Note also that spark.sql.shuffle.partitions is not the only parameter that determines the number of tasks/partitions. You can get a different number of partitions for
reading files
or if you modify your query using repartition, e.g. when using:
df
  .repartition(100, $"key")
  .groupBy($"key")
  .count()
your value of spark.sql.shuffle.partitions=10 will be overridden by 100 in this exchange step.
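Here is a small spark-shell sketch of both behaviours (the DataFrame is synthetic, and AQE is disabled only to make the partition counts predictable on Spark 3.x):

spark.conf.set("spark.sql.adaptive.enabled", "false")
spark.conf.set("spark.sql.shuffle.partitions", "10")

val df = spark.range(0, 1000000).withColumn("key", $"id" % 100)

// Plain groupBy: the shuffle stage runs with only 10 tasks (10 partitions)
println(df.groupBy($"key").count().rdd.getNumPartitions)                           // 10

// Explicit repartition by the grouping key: the 100-way hash partitioning is reused,
// so the aggregation runs over 100 partitions instead of 10 (as described above)
println(df.repartition(100, $"key").groupBy($"key").count().rdd.getNumPartitions)  // 100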
What you're describing as an expectation is called dynamic allocation in Spark. You provide a minimum and maximum number of executors, and the framework scales up or down depending on the number of partitions: https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation
But with only 10 partitions for a 100 GB file (roughly 10 GB per partition) you will likely run into OutOfMemoryErrors.
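A minimal sketch of the dynamic allocation settings mentioned there (the executor counts are illustrative; on Databricks you would instead configure autoscaling on the cluster itself):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dynamic-allocation-example")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "10")
  // Needs either an external shuffle service or, on Spark 3.0+, shuffle tracking:
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .getOrCreate()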

EMR Cluster utilization

I have a 20-node c4.4xlarge cluster to run a Spark job. Each node is a 16 vCore, 30 GiB memory, EBS-only storage (32 GiB EBS) machine.
Since each node has 16 vCores, I understand that the maximum number of executors is 16*20 = 320 (at one core per executor). The total memory available is 20 (nodes) * 30 ≈ 600 GB. Assigning one third to system operations, I have about 400 GB of memory to process my data in-memory. Is this the right understanding?
Also, the Spark History UI shows a non-uniform distribution of input and shuffle data, so I believe the processing is not distributed evenly across executors. I pass these config parameters in my spark-submit:
> --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.minExecutors=20
The executor summary from the Spark History UI also shows that the data distribution is completely skewed, and I am not using the cluster in the best way. How can I distribute my load better?
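One common mitigation for this kind of skew (a sketch, not necessarily the right fix for this particular job) is an explicit repartition after the read, so downstream tasks see evenly sized partitions; the paths and the 640 target (roughly 2x the 320 vCores above) are illustrative:

val df = spark.read.parquet("s3://my-bucket/input/")   // hypothetical input path
val evened = df.repartition(640)                       // round-robin repartition evens out partition sizes
evened.write.parquet("s3://my-bucket/output/")         // hypothetical output path

If the skew comes from a hot join or groupBy key rather than from uneven input files, repartitioning alone will not help; salting the key or using a broadcast join is the usual next step.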

Max possible number of executors in cluster

Let's say I have 5 worker nodes in a cluster and each node has 48 cores and 256 GB RAM.
What is the maximum number of executors possible in the cluster?
Will the cluster have 5*48 = 240 executors, or only 5 executors?
Or are there other factors that decide the number of executors in a cluster? If so, what are they?
Thanks.
The number of executors is related to the amount of parallelism your application needs. You could create 5*48 = 240 executors with 1 core each, but there are other processes to account for, such as memory overhead, cluster management processes and the scheduler, so you may need to reserve 2-5 cores per node for those.
I don't know what architecture your cluster uses, but this article is a good starting point if you are using Hadoop: https://spoddutur.github.io/spark-notes/distribution_of_executors_cores_and_memory_for_spark_application.html
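Applying the usual sizing rules of thumb to these nodes, as a rough Scala sketch (the reservations and the 5-cores-per-executor figure are conventions, not limits):

val nodes = 5
val usableCoresPerNode = 48 - 3                                // reserve a few cores for OS/NodeManager
val usableMemPerNode   = 256 - 16                              // reserve some GB for OS and overheads (illustrative)
val coresPerExecutor   = 5
val executorsPerNode   = usableCoresPerNode / coresPerExecutor // 9
val memPerExecutorGb   = usableMemPerNode / executorsPerNode   // ~26 GB (before spark.executor.memoryOverhead)
val maxExecutors       = nodes * executorsPerNode - 1          // 44, keeping one slot for the driver/AM

So the practical answer sits between the two extremes in the question: far more than 5, far fewer than 240.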
