Where to set "spark.yarn.executor.memoryOverhead" - apache-spark

I am getting the following error while running my Spark Scala program.
YarnSchedulerBackend$YarnSchedulerEndpoint: Container killed by YARN for exceeding memory limits. 2.6 GB of 2.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
I have set spark.yarn.executor.memoryOverhead in the program while creating SparkSession.
My question is: is it OK to set "spark.yarn.executor.memoryOverhead" while creating the SparkSession, or should it be passed with spark-submit when the job is launched?

You have to set spark.yarn.executor.memoryOverhead at the time of SparkSession creation. This parameter specifies the amount of off-heap memory (in megabytes) to be allocated per executor. It is memory that accounts for things like VM overheads, interned strings, and other native overheads, and it tends to grow with the executor size (typically 6-10%).
This allocation can only be done when the executors are requested, not at runtime, so the value must be in place before the application starts.
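For illustration, a minimal sketch of both approaches (the 1024 MiB value is just a placeholder, and the builder setting only takes effect if no SparkContext is already running in the JVM):

import org.apache.spark.sql.SparkSession

// Option 1: set the overhead before the SparkSession (and its SparkContext) is created.
val spark = SparkSession.builder()
  .appName("memory-overhead-example")
  .config("spark.yarn.executor.memoryOverhead", "1024")  // MiB; spark.executor.memoryOverhead in newer versions
  .getOrCreate()

// Option 2: pass it with spark-submit instead, which keeps cluster sizing out of the code:
//   spark-submit --conf spark.yarn.executor.memoryOverhead=1024 --class ... your-app.jar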

Related

In Spark, what is the meaning of the spark.executor.pyspark.memory configuration option?

The documentation describes it as follows:
The amount of memory to be allocated to PySpark in each executor, in MiB unless otherwise specified. If set, PySpark memory for an executor will be limited to this amount. If not set, Spark will not limit Python's memory use, and it is up to the application to avoid exceeding the overhead memory space shared with other non-JVM processes. When PySpark is run in YARN or Kubernetes, this memory is added to executor resource requests.
Note: This feature is dependent on Python's resource module; therefore, the behaviours and limitations are inherited. For instance, Windows does not support resource limiting, and actual resource is not limited on macOS.
There are two other configuration options: one controlling the amount of memory allocated to each executor, spark.executor.memory, and another controlling the amount of memory that each Python process within an executor can use before it starts to spill over to disk, spark.python.worker.memory.
Can someone please explain what then is the behaviour and use of spark.executor.pyspark.memory configuration and in what ways is it different from spark.executor.memory and spark.python.worker.memory?
I extended my answer a little bit. Please follow the links at the end of the answer; they are pretty useful and include pictures that help to understand the whole picture of Spark memory management.
We should dig into Spark memory management (MM) to figure out what spark.executor.pyspark.memory is.
First of all, there are two big parts of Spark MM:
Memory inside JVM;
Memory outside JVM.
Memory inside the JVM is divided into 4 parts:
Storage memory - this memory is for Spark cached data, broadcast variables, etc.;
Execution memory - this memory is for storing data required during the execution of Spark tasks;
User memory - this memory is for user purposes; you can store your custom data structures, UDFs, UDAFs, etc. here;
Reserved memory - this memory is for Spark's own purposes and is hardcoded to 300MB as of Spark 1.6.
Memory outside the JVM is divided into 2 parts:
Off-heap memory - memory that lives outside the JVM but is used for JVM purposes, e.g. by Project Tungsten;
External process memory - memory specific to SparkR or PySpark, used by processes that reside outside the JVM.
So, the parameter spark.executor.memory (or --executor-memory for spark-submit) determines how much memory is allocated inside the JVM heap per executor. This memory is split between reserved memory, user memory, execution memory, and storage memory. To control this split we need 2 more parameters: spark.memory.fraction and spark.memory.storageFraction.
According to the Spark documentation:
spark.memory.fraction is responsible for the fraction of heap used for execution and storage;
spark.memory.storageFraction is responsible for the amount of storage memory immune to eviction, expressed as a fraction of the size of the region set aside by spark.memory.fraction. If storage memory isn't used, execution memory may acquire all the available memory and vice versa. This parameter controls how much memory execution can evict if necessary.
More details here.
Please look at the pictures of the heap memory regions here.
Finally, the heap is split in the following way:
Reserved memory is hardcoded to 300MB.
User memory is calculated as (spark.executor.memory - reserved memory) * (1 - spark.memory.fraction).
Spark memory (which consists of storage memory and execution memory) is calculated as (spark.executor.memory - reserved memory) * spark.memory.fraction. This memory is then split between storage memory and execution memory by the spark.memory.storageFraction parameter; a small worked example follows below.
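As a worked example of these formulas (assuming a 4096MB executor heap and the Spark 1.6 defaults spark.memory.fraction = 0.75 and spark.memory.storageFraction = 0.5; newer versions default the fraction to 0.6, so plug in your own values):

object HeapSplitExample extends App {
  val executorMemoryMb = 4096.0   // spark.executor.memory
  val reservedMemoryMb = 300.0    // hardcoded reserved memory
  val memoryFraction   = 0.75     // spark.memory.fraction (0.6 in Spark 2.x+)
  val storageFraction  = 0.5      // spark.memory.storageFraction

  val usableMb    = executorMemoryMb - reservedMemoryMb   // 3796
  val userMb      = usableMb * (1 - memoryFraction)       // 949
  val sparkMb     = usableMb * memoryFraction             // 2847
  val storageMb   = sparkMb * storageFraction             // 1423.5 (soft boundary)
  val executionMb = sparkMb - storageMb                   // 1423.5 (can borrow from storage)

  println(f"user=$userMb%.0f MB, storage=$storageMb%.0f MB, execution=$executionMb%.0f MB")
}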
The next parameter you asked about is spark.executor.pyspark.memory. It is part of the external process memory and is responsible for how much memory the Python daemon is able to use. The Python daemon is used, for example, for executing UDFs written in Python.
And the last one is spark.python.worker.memory. In this article I found the following explanation: the JVM process and the Python process communicate with each other through the Py4J bridge, which exposes objects between the JVM and Python. So spark.python.worker.memory controls how much memory can be occupied by Py4J for creating objects before spilling them to disk.
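A hedged configuration sketch putting the three settings side by side (the sizes are placeholders only, not recommendations):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("pyspark-memory-example")
  .config("spark.executor.memory", "4g")          // JVM heap per executor
  .config("spark.executor.pyspark.memory", "1g")  // cap for the Python workers per executor; added to the YARN/K8s container request
  .config("spark.python.worker.memory", "512m")   // per Python worker before spilling to disk
  .getOrCreate()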
You can read more about Spark memory management in the following articles:
Memory management inside JVM;
Decoding Memory in Spark — Parameters that are often confused;
One more SO answer explaining off-heap memory configuration;
How to tune Apache Spark jobs.

Difference between "spark.yarn.executor.memoryOverhead" and "spark.memory.offHeap.size"

I am running Spark on YARN. I don't understand the difference between the following settings: spark.yarn.executor.memoryOverhead and spark.memory.offHeap.size. Both seem to be settings for allocating off-heap memory to the Spark executor. Which one should I use? Also, what is the recommended setting for executor off-heap memory?
Many thanks!
TL;DR: For Spark 1.x and 2.x, total off-heap memory = spark.executor.memoryOverhead (spark.memory.offHeap.size is included within it).
For Spark 3.x, total off-heap memory = spark.executor.memoryOverhead + spark.memory.offHeap.size (credit to this page).
Detailed explanation:
spark.executor.memoryOverhead is used by resource managers like YARN, whereas spark.memory.offHeap.size is used by Spark core (the memory manager). The relationship is a bit different depending on the version.
Spark 2.4.5 and before:
spark.executor.memoryOverhead should include spark.memory.offHeap.size. This means that if you specify offHeap.size, you need to add that portion to memoryOverhead manually for YARN. As you can see from the code below from YarnAllocator.scala, when YARN requests resources, it does not know anything about offHeap.size:
private[yarn] val resource = Resource.newInstance(
  executorMemory + memoryOverhead + pysparkWorkerMemory,
  executorCores)
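As a back-of-the-envelope sketch of what that request adds up to on Spark 2.x (assuming the documented default overhead of max(384MB, 10% of executor memory), and remembering that any spark.memory.offHeap.size must be folded into the overhead by hand):

// Rough Spark 2.x YARN container estimate in MB; a sketch, not Spark's actual code.
def yarnContainerMb2x(executorMemoryMb: Long, memoryOverheadMb: Option[Long] = None): Long = {
  val overhead = memoryOverheadMb.getOrElse(math.max(384L, (executorMemoryMb * 0.10).toLong))
  executorMemoryMb + overhead
}

println(yarnContainerMb2x(2048))  // 2048 + max(384, 204) = 2432 MB requested from YARN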
However, this behavior changed in Spark 3.0:
spark.executor.memoryOverhead no longer includes spark.memory.offHeap.size. YARN will include offHeap.size for you when requesting resources. From the new documentation:
Note: Additional memory includes PySpark executor memory (when spark.executor.pyspark.memory is not configured) and memory used by other non-executor processes running in the same container. The maximum memory size of container to running executor is determined by the sum of spark.executor.memoryOverhead, spark.executor.memory, spark.memory.offHeap.size and spark.executor.pyspark.memory.
And from the code you can also tell:
private[yarn] val resource: Resource = {
  val resource = Resource.newInstance(
    executorMemory + executorOffHeapMemory + memoryOverhead + pysparkWorkerMemory,
    executorCores)
  ResourceRequestHelper.setResourceRequests(executorResourceRequests, resource)
  logDebug(s"Created resource capability: $resource")
  resource
}
For more details of this change you can refer to this Pull Request.
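A similar sketch for Spark 3.x, following the documentation quoted above (plain arithmetic, not the real YarnAllocator logic):

// Spark 3.x: the container request is the sum of the four settings, in MB.
def yarnContainerMb3x(executorMemoryMb: Long,
                      memoryOverheadMb: Long,
                      offHeapSizeMb: Long = 0L,
                      pysparkMemoryMb: Long = 0L): Long =
  executorMemoryMb + memoryOverheadMb + offHeapSizeMb + pysparkMemoryMb

println(yarnContainerMb3x(4096, 1024, 2048))  // 4 GB heap + 1 GB overhead + 2 GB off-heap = 7168 MB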
For your second question, what is the recommended setting for executor off-heap memory? It depends on your application, and you need to do some testing. I found this page helpful in explaining it further:
Off-heap memory is a great way to reduce GC pauses because it's not in the GC's scope. However, it brings an overhead of serialization and deserialization. The latter in its turn makes that the off-heap data can be sometimes put onto heap memory and hence be exposed to GC. Also, the new data format brought by Project Tungsten (array of bytes) helps to reduce the GC overhead. These 2 reasons make that the use of off-heap memory in Apache Spark applications should be carefully planned and, especially, tested.
BTW, spark.yarn.executor.memoryOverhead is deprecated and has been renamed to spark.executor.memoryOverhead, which is common to YARN and Kubernetes.
spark.yarn.executor.memoryOverhead is used with the StaticMemoryManager, which was used in older Spark versions like 1.2.
The amount of off heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%).
You can find this in older Spark docs, like the Spark 1.2 docs:
https://spark.apache.org/docs/1.2.0/running-on-yarn.html
spark.memory.offHeap.size is used in the UnifiedMemoryManager, which is the default since version 1.6.
The absolute amount of memory in bytes which can be used for off-heap allocation. This setting has no impact on heap memory usage, so if your executors' total memory consumption must fit within some hard limit then be sure to shrink your JVM heap size accordingly. This must be set to a positive value when spark.memory.offHeap.enabled=true.
You can find this in more recent Spark docs, like the Spark 2.4 docs:
https://spark.apache.org/docs/2.4.4/configuration.html
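If you do want Spark-managed off-heap memory through the UnifiedMemoryManager, a minimal configuration sketch looks like this (sizes are examples only):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("offheap-example")
  .config("spark.memory.offHeap.enabled", "true")
  .config("spark.memory.offHeap.size", "2g")  // must be positive when enabled
  // On Spark 2.x, also raise spark.executor.memoryOverhead by roughly this amount for YARN;
  // on Spark 3.x, YARN adds offHeap.size to the container request automatically.
  .getOrCreate()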

The actual executor memory does not match the executor-memory I set

I have a Spark 2.0.1 cluster with 1 master (slaver1) and 2 workers (slaver2, slaver3); every machine has 2GB RAM. When I run the command
./bin/spark-shell --master spark://slaver1:7077 --executor-memory 500m
and check the executor memory in the web UI (slaver1:4040/executors/), I find it is 110MB.
The memory you are talking about is storage memory. Spark actually divides its memory [called Spark memory] into 2 regions: the first is storage memory and the second is execution memory.
The total Spark memory can be calculated by this formula:
(“Java Heap” – “Reserved Memory”) * spark.memory.fraction
Just to give you an overview, storage memory is the pool used both for storing Apache Spark cached data and as temporary space for serialized data “unroll”; all the “broadcast” variables are also stored there as cached blocks.
If you want to check the total memory provided, you can go to the Spark UI at Spark-Master-Ip:8080 [default port]; at the top you can find a section called MEMORY, which is the total memory used by Spark.
Thanks
From Spark version 1.6 onward, the memory is divided according to the following picture.
There is no hard boundary between execution and storage memory. If storage memory needs more, it takes it from execution memory, and vice versa.
Execution and storage memory together are given by (ExecutorMemory - 300MB) * spark.memory.fraction.
In your case, (500 - 300) * 0.75 = 150MB; there will be a 3 to 5% discrepancy in the executor memory that is actually reported.
300MB is the reserved memory.
User memory = (ExecutorMemory - 300MB) * (1 - spark.memory.fraction).
In your case, (500 - 300) * 0.25 = 50MB.
Java Memory : Runtime.getRuntime().maxMemory()
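Putting those numbers into a small sketch you can run on the driver or inside an executor (0.75 was the Spark 1.6 default for spark.memory.fraction; Spark 2.x defaults to 0.6, which is part of why the UI figure differs from a quick hand calculation):

object StorageMemoryEstimate extends App {
  val reservedMb     = 300L
  val memoryFraction = 0.75  // use 0.6 on Spark 2.x+
  val heapMb         = Runtime.getRuntime.maxMemory() / (1024 * 1024)  // what the JVM actually reports
  val sparkMemoryMb  = (heapMb - reservedMb) * memoryFraction

  println(s"heap=$heapMb MB, Spark (storage + execution) memory ~= ${sparkMemoryMb.toLong} MB")
}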

What kind of data do spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead store?

I was wondering:
What kind of data does Spark use spark.yarn.driver.memoryOverhead or spark.yarn.executor.memoryOverhead to store?
And in which cases should I boost the value of spark.yarn.driver.memoryOverhead or spark.yarn.executor.memoryOverhead?
In YARN terminology, executors and application masters run inside "containers". Spark offers YARN-specific properties so you can run your application:
spark.yarn.executor.memoryOverhead is the amount of off-heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%).
spark.yarn.driver.memoryOverhead is the amount of off-heap memory (in megabytes) to be allocated per driver in cluster mode, with the same meaning as the executor's memoryOverhead.
So it's not about storing data; it's just the resources needed for YARN to run your application properly.
In some cases, e.g. if you enable dynamicAllocation, you might want to set these properties explicitly, along with the maximum number of executors (spark.dynamicAllocation.maxExecutors) that can be created during the process, which can otherwise easily overwhelm YARN by asking for thousands of executors and thus losing the already running executors.
spark.dynamicAllocation.maxExecutors is set to infinity by default, which sets the upper bound for the number of executors if dynamic allocation is enabled. [ref. http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation]
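A hedged sketch of pinning those bounds explicitly (the numbers are examples, not recommendations):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dynamic-allocation-example")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")       // needed for dynamic allocation on YARN
  .config("spark.dynamicAllocation.minExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "50")  // cap it instead of the infinite default
  .config("spark.yarn.executor.memoryOverhead", "1024")  // explicit per-executor overhead in MB
  .getOrCreate()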
According to the code documentation : [ref.https://github.com/apache/spark/blob/8ef3399aff04bf8b7ab294c0f55bcf195995842b/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala#L43]
Increasing the target number of executors happens in response to backlogged tasks waiting to be scheduled. If the scheduler queue is not drained in N seconds, then new executors are added. If the queue persists for another M seconds, then more executors are added and so on. The number added in each round increases exponentially from the previous round until an upper bound has been reached. The upper bound is based both on a configured property and on the current number of running and pending tasks, as described above.
This can lead to an exponential increase in the number of executors in some cases, which can break the YARN resource manager. In my case:
16/03/31 07:15:44 INFO ExecutorAllocationManager: Requesting 8000 new executors because tasks are backlogged (new desired total will be 40000)
This doesn't cover all the use cases for these properties, but it gives a general idea of how they are used.

How does Spark occupy the memory

My server has 50GB of memory and HBase is using 40GB of it. When I run Spark I set the memory with --executor-memory 30G. Will Spark grab some memory from HBase, since there is only 10GB left?
Another question: if Spark only needs 1GB of memory but I give it 10GB, will Spark occupy all 10GB?
The behavior will be different depending upon the deployment mode. If you are using local mode, then --executor-memory will not change anything, as you only have 1 executor, which is your driver, so you need to increase the driver's memory instead.
If you are using standalone mode and submitting your job in cluster mode, then the following applies:
--executor-memory is the memory required per executor. It is the executor's heap size. By default, 60% of the configured --executor-memory is used to cache RDDs; the remaining 40% is available for objects created during task execution. It is equivalent to -Xms and -Xmx, so if you ask for more memory than is available, your executors will show errors regarding insufficient memory.
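For completeness, a sketch of the equivalent settings in code; spark.storage.memoryFraction is the legacy static-memory-manager knob behind the 60% figure mentioned above (defaults shown, adjust for your cluster):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("executor-memory-example")
  .config("spark.executor.memory", "4g")          // executor JVM heap, same as --executor-memory
  .config("spark.storage.memoryFraction", "0.6")  // legacy cache fraction (pre-1.6 static memory manager)
  .getOrCreate()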
When you give a Spark executor 30G of memory, the OS will not immediately give it actual physical memory. But as soon as your executor requires actual memory, either to cache or to process data, it will force your other processes, like HBase, to swap. If your system's swap is set to zero, you will face an OOM error.
The OS swaps out the idle parts of a process, which can make that process behave very slowly.
