Understanding spark.yarn.executor.memoryOverhead - apache-spark

I am running a Spark application on YARN with the driver and executor memory settings --driver-memory 4G --executor-memory 2G
When I run the application, an exception is thrown complaining: Container killed by YARN for exceeding memory limits. 2.5 GB of 2.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
What does this 2.5 GB mean here (overhead memory, executor memory, or overhead + executor memory)? I ask because when I change the memory settings to:
--driver-memory 4G --executor-memory 4G --conf --driver-memory 4G --conf spark.yarn.executor.memoryOverhead=2048
then the exception disappears.
Although I have boosted the overhead memory to 2G, it is still under 2.5G, so why does it work now?

Let us understand how memory is divided among various regions in spark.
Executor MemoryOverhead :
spark.yarn.executor.memoryOverhead = max(384 MB, 0.07 * spark.executor.memory).
In your first case, memoryOverhead = max(384 MB, 0.07 * 2 GB) = max(384 MB, 143.36 MB) = 384 MB. Hence 384 MB is reserved in each executor, assuming you have assigned a single core per executor.
Execution and Storage Memory :
By default spark.memory.fraction = 0.6, which implies that execution and storage, as a unified region, occupy 60% of the remaining memory, i.e. about 998 MB. There is no strict boundary between the two regions unless you enable spark.memory.useLegacyMode; otherwise they share a moving boundary.
User Memory :
This is the memory pool that remains after the allocation of Execution and Storage Memory, and it is completely up to you how to use it. You can store your own data structures there for use in RDD transformations. For example, you can rewrite a Spark aggregation with a mapPartitions transformation that maintains a hash table for the aggregation. This pool comprises the remaining 40% of the memory left after MemoryOverhead. In your case it is ~660 MB.
If your job needs more than any of the above allocations provides, it is highly likely to end up with OOM problems.
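The overhead formula above, and how it feeds into the limit YARN enforces, can be sketched in Python. This is only an illustration under stated assumptions: the container request is taken to be executor memory plus overhead (as another answer on this page describes), the function names are made up for the example, and YARN may round the request up further.

```python
MIN_OVERHEAD_MB = 384  # minimum overhead quoted in the answer above


def default_overhead_mb(executor_memory_mb, factor=0.07):
    """Default overhead: max(384 MB, factor * executor memory)."""
    return max(MIN_OVERHEAD_MB, factor * executor_memory_mb)


def container_request_mb(executor_memory_mb, overhead_mb=None):
    """Assumed container request: executor memory plus overhead."""
    if overhead_mb is None:
        overhead_mb = default_overhead_mb(executor_memory_mb)
    return executor_memory_mb + overhead_mb


# First run: --executor-memory 2G with the default overhead.
first = container_request_mb(2048)
# Second run: --executor-memory 4G with memoryOverhead=2048.
second = container_request_mb(4096, overhead_mb=2048)

print(first)   # 2432
print(second)  # 6144
```

Under these assumptions the first run asks for about 2.4 GB, which is in line with the 2.5 GB limit in the error message, while the second run asks for 6 GB in total, so the limit itself moved well above 2.5 GB.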

Related

Allocated Memory and Reserved Memory of Application Master

I am trying to understand the 'Allocated Memory' and 'Reserved Memory' columns that are present in the screenshot (screenshot from the Application Master in the YARN UI).
The cluster settings that I have configured in YARN are:
yarn_nodemanager_resource_memory-mb: 16GB
yarn_scheduler_minimum-allocation-mb: 256MB
yarn_scheduler_increment-allocation-mb: 500MB
yarn_scheduler_maximum-allocation-mb: 16GB
It is a single node cluster having 32GB of memory in total and 6 vCores.
Now, you can see from the screenshot that the 'Allocated Memory' is 8500MB. I would like to know how this is getting calculated.
One more thing - the driver memory specified is spark.driver.memory=10g
Allocated memory is determined by one of the following:
The memory available in your cluster
The memory available in your queue
vCores * (executor memory + executor overhead)
In your case it looks like your allocated memory is limited by the third option. I'm guessing you didn't set spark.executor.memory or spark.executor.memoryOverhead because the memory you are getting is right in line with the default values. The Spark docs show that default values are:
spark.executor.memory = 1g
spark.executor.memoryOverhead = 0.1 * executor memory with a minimum of 384mb
This gives about 1400 MB per core (1024 MB + 384 MB = 1408 MB), which multiplied by your 6 cores lines up with the Allocated Memory you are seeing.
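As a rough check of that arithmetic, here is a small Python sketch. It assumes the defaults quoted above (1g executor memory, overhead of 10% with a 384 MB minimum) and one executor per vCore; the function name is illustrative only.

```python
def executor_container_mb(executor_memory_mb=1024,
                          overhead_factor=0.10,
                          min_overhead_mb=384):
    """Executor memory plus overhead, using the defaults quoted above."""
    overhead = max(min_overhead_mb, overhead_factor * executor_memory_mb)
    return executor_memory_mb + overhead


per_core = executor_container_mb()  # 1024 + 384
total = 6 * per_core                # six vCores on the single node

print(per_core)  # 1408
print(total)     # 8448
```

The result, 8448 MB, is close to the 8500 MB shown in the UI; the remaining difference may come from how YARN rounds container sizes.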

The actual executor memory does not match the executor-memory I set

I have a Spark 2.0.1 cluster with 1 master (slaver1) and 2 workers (slaver2, slaver3); every machine has 2 GB of RAM. I run the command
./bin/spark-shell --master spark://slaver1:7077 --executor-memory 500m
When I check the executor memory in the web UI (slaver1:4040/executors/), I find it is 110 MB.
The memory you are talking about is storage memory. Spark divides its memory (called Spark Memory) into two regions: storage memory and execution memory.
The total of the two can be calculated with this formula:
("Java Heap" - "Reserved Memory") * spark.memory.fraction
As an overview, the storage memory pool is used both for storing Apache Spark cached data and as temporary space for "unrolling" serialized data. All "broadcast" variables are also stored there as cached blocks.
If you want to check the total memory provided, go to the Spark master UI at Spark-Master-Ip:8080 (the default port). Near the top you will find a section called MEMORY, which shows the total memory used by Spark.
Thanks
From Spark 1.6 onward, the memory is divided as follows.
There is no hard boundary between execution and storage memory. If storage memory needs more, it takes it from execution memory, and vice versa.
Execution and storage memory together are given by (ExecutorMemory - 300 MB) * spark.memory.fraction.
In your case, with spark.memory.fraction = 0.75, that is (500 - 300) * 0.75 = 150 MB. There will be a 3 to 5% error in the executor memory that is actually allocated.
300 MB is the reserved memory.
User memory = (ExecutorMemory - 300 MB) * (1 - spark.memory.fraction).
In your case (500 - 300) * 0.25 = 50 MB.
Java Memory : Runtime.getRuntime().maxMemory()
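The two formulas above can be put into a short Python sketch. The function name is hypothetical, and the heap size passed in is the nominal --executor-memory value; the heap the JVM actually reports can be somewhat smaller, so real numbers in the UI will differ.

```python
RESERVED_MB = 300  # reserved memory, as stated in the answer above


def memory_regions_mb(heap_mb, memory_fraction):
    """Return (execution + storage, user) memory in MB for a given heap."""
    usable = heap_mb - RESERVED_MB
    unified = usable * memory_fraction
    user = usable * (1 - memory_fraction)
    return unified, user


unified, user = memory_regions_mb(500, 0.75)
print(unified)  # 150.0
print(user)     # 50.0
```

Calling the same function with a different fraction, for example memory_regions_mb(500, 0.6), shows how strongly the result depends on spark.memory.fraction.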

Spark: executor memory exceeds physical limit

My input dataset is about 150G.
I am setting
--conf spark.cores.max=100
--conf spark.executor.instances=20
--conf spark.executor.memory=8G
--conf spark.executor.cores=5
--conf spark.driver.memory=4G
but since the data is not evenly distributed across executors, I keep getting
Container killed by YARN for exceeding memory limits. 9.0 GB of 9 GB physical memory used
Here are my questions:
1. Did I not set up enough memory in the first place? I think 20 * 8G > 150G, but it's hard to get a perfect distribution, so some executors will suffer.
2. I am thinking about repartitioning the input DataFrame, so how can I determine how many partitions to set? The higher the better, or?
3. The error says "9 GB physical memory used", but I only set 8G as executor memory. Where does the extra 1G come from?
Thank you!
When using yarn, there is another setting that figures into how big to make the yarn container request for your executors:
spark.yarn.executor.memoryOverhead
It defaults to 0.1 * your executor memory setting. It defines how much extra overhead memory to ask for in addition to what you specify as your executor memory. Try increasing this number first.
Also, YARN won't give you a container of an arbitrary size. It will only return containers whose memory size is a multiple of its minimum allocation size, which is controlled by this setting:
yarn.scheduler.minimum-allocation-mb
Setting that to a smaller number will reduce the risk of you "overshooting" the amount you asked for.
I also typically set the below key to a value larger than my desired container size to ensure that the spark request is controlling how big my executors are, instead of yarn stomping on them. This is the maximum container size yarn will give out.
yarn.nodemanager.resource.memory-mb
The 9GB is composed of the 8GB executor memory which you pass as a parameter, plus spark.yarn.executor.memoryOverhead, which defaults to 0.1 of the executor memory. So the total memory of the container is spark.executor.memory + (0.1 * spark.executor.memory), which is 8GB + (0.1 * 8GB) ≈ 9GB.
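To see how the request and YARN's rounding to a multiple of the minimum allocation combine, here is a hedged Python sketch. The 1024 MB minimum allocation and the 384 MB overhead floor are assumptions for the example (the question does not state the cluster's values), and the function name is made up.

```python
import math


def yarn_container_mb(executor_memory_mb,
                      overhead_factor=0.10,
                      min_overhead_mb=384,
                      min_allocation_mb=1024):
    """Executor memory plus overhead, rounded up to the allocation step."""
    overhead = max(min_overhead_mb, int(executor_memory_mb * overhead_factor))
    requested = executor_memory_mb + overhead
    # Round up to the next multiple of the assumed minimum allocation.
    return math.ceil(requested / min_allocation_mb) * min_allocation_mb


size = yarn_container_mb(8 * 1024)
print(size)         # 9216
print(size / 1024)  # 9.0
```

With these assumed values the request of 8192 + 819 = 9011 MB is rounded up to 9216 MB, which matches the 9 GB limit in the error message.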
You could run the entire process using a single executor, but this would take ages. To understand this you need to know the notions of partitions and tasks. The number of partitions is determined by your input and the actions. For example, if you read a 150 GB CSV from HDFS and your HDFS block size is 128 MB, you will end up with 150 * 1024 / 128 = 1200 partitions, which map directly to 1200 tasks in the Spark UI.
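The partition estimate above is simple enough to sketch in Python. It assumes one partition per HDFS block of a splittable file; the function name is illustrative.

```python
import math


def input_partitions(input_size_gb, block_size_mb=128):
    """Estimate input partitions as one per HDFS block (splittable input)."""
    return math.ceil(input_size_gb * 1024 / block_size_mb)


print(input_partitions(150))  # 1200
```

A non-splittable input (for example a single gzip file) would not follow this estimate, which is one reason to check splittability.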
Every single task will be picked up by an executor. You never need to hold all 150 GB in memory. For example, with a single executor you obviously won't benefit from the parallel capabilities of Spark, but it will simply start with the first task, process the data, save it back to the DFS, and then start working on the next task.
What you should check:
How big are the input partitions? Is the input file splittable at all? If a single executor has to load a massive amount of data, it will run out of memory for sure.
What kind of actions are you performing? For example, if you do a join on a key with very low cardinality, you end up with massive partitions, because all the rows with a specific value end up in the same partition.
Are any very expensive or inefficient actions performed, such as a cartesian product?
Hope this helps. Happy sparking!

Spark on YARN: Less executor memory than set via spark-submit

I'm using Spark in a YARN cluster (HDP 2.4) with the following settings:
1 Masternode
64 GB RAM (48 GB usable)
12 cores (8 cores usable)
5 Slavenodes
64 GB RAM (48 GB usable) each
12 cores (8 cores usable) each
YARN settings
memory of all containers (of one host): 48 GB
minimum container size = maximum container size = 6 GB
vcores in cluster = 40 (5 x 8 cores of workers)
minimum #vcores/container = maximum #vcores/container = 1
When I run my Spark application with the command spark-submit --num-executors 10 --executor-cores 1 --executor-memory 5g ... Spark should give each executor 5 GB of RAM, right? (I set memory to only 5g to leave room for ~10% overhead memory.)
But when I had a look in the Spark UI, I saw that each executor only has 3.4 GB of memory, see screenshot:
Can someone explain why so little memory is allocated?
The storage memory column in the UI displays the amount of memory used for execution and RDD storage. By default, this equals (HEAP_SPACE - 300MB) * 75%. The rest of the memory is used for internal metadata, user data structures and other things.
You can control this amount by setting spark.memory.fraction (not recommended). See more in Spark's documentation
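A quick Python sketch shows how the formula above lands near the 3.4 GB seen in the UI. It assumes, as another answer on this page suggests, that the heap actually available is roughly 3 to 5% smaller than the configured 5g; that range and the function name are assumptions for illustration.

```python
def ui_storage_memory_mb(heap_mb, memory_fraction=0.75, reserved_mb=300):
    """(HEAP_SPACE - 300MB) * fraction, as in the formula above."""
    return (heap_mb - reserved_mb) * memory_fraction


configured_heap_mb = 5 * 1024

nominal = ui_storage_memory_mb(configured_heap_mb)
low = ui_storage_memory_mb(configured_heap_mb * 0.95)   # heap 5% smaller
high = ui_storage_memory_mb(configured_heap_mb * 0.97)  # heap 3% smaller

print(nominal / 1024)
print(low / 1024, high / 1024)
```

The nominal value is about 3.5 GB, and with the assumed shortfall the range brackets 3.4 GB, so the number in the UI is consistent with a 5g heap.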

How does Spark occupy the memory

Suppose my server has 50GB of memory and HBase is using 40GB. When I run Spark, I set the memory with --executor-memory 30G. Will Spark grab some memory from HBase, since there is only 10GB left?
Another question: if Spark only needs 1GB of memory but I give it 10G, will Spark occupy 10GB of memory?
The behavior differs depending on the deployment mode. If you are using local mode, --executor-memory will not change anything, because you only have one executor and that is your driver, so you need to increase the memory of your driver.
If you are using Standalone mode and submitting your job in cluster mode, the following applies:
--executor-memory is the memory required per executor. It is the executor's heap size. By default, 60% of the configured --executor-memory is used to cache RDDs; the remaining 40% is available for any objects created during task execution. It is equivalent to -Xms and -Xmx, so if you provide more memory than is available, your executors will show errors about insufficient memory.
When you give a Spark executor 30G of memory, the OS will not immediately give it actual physical memory. But as and when your executor requires actual memory, for caching or processing, this will cause your other processes, such as HBase, to go into swap. If your system's swap is set to zero, you will face an OOM error.
The OS swaps out idle parts of a process, which can make that process very slow.
