In Spark, what is the meaning of the spark.executor.pyspark.memory configuration option? - apache-spark

Documentation explanation is given as:
The amount of memory to be allocated to PySpark in each executor, in MiB unless otherwise specified. If set, PySpark memory for an executor will be limited to this amount. If not set, Spark will not limit Python's memory use, and it is up to the application to avoid exceeding the overhead memory space shared with other non-JVM processes. When PySpark is run in YARN or Kubernetes, this memory is added to executor resource requests.
Note: This feature is dependent on Python's resource module; therefore, the behaviours and limitations are inherited. For instance, Windows does not support resource limiting, and actual resource is not limited on macOS.
There are two other related configuration options: spark.executor.memory, which controls the amount of memory allocated to each executor, and spark.python.worker.memory, which controls the amount of memory each Python process within an executor can use before it starts to spill data to disk.
Can someone please explain the behaviour and use of the spark.executor.pyspark.memory configuration, and in what ways it differs from spark.executor.memory and spark.python.worker.memory?

I extended my answer a little bit. Please follow the links at the end of the article; they are pretty useful and have some pictures that help to understand the whole picture of Spark memory management.
We should dig into Spark memory management (MM) to figure out what spark.executor.pyspark.memory is.
So, first of all, there are two big parts of Spark MM:
Memory inside the JVM;
Memory outside the JVM.
Memory inside the JVM is divided into 4 parts:
Storage memory - this memory is for Spark cached data, broadcast variables, etc.;
Execution memory - this memory is for storing data required during the execution of Spark tasks;
User memory - this memory is for user purposes. You can store your custom data structures, UDFs, UDAFs, etc. here;
Reserved memory - this memory is for Spark's own purposes and is hardcoded to 300MB as of Spark 1.6.
Memory outside the JVM is divided into 2 parts:
Off-heap memory - this memory is outside the JVM but is used for JVM purposes, i.e. for Project Tungsten;
External process memory - this memory is specific to SparkR or PySpark and is used by processes that reside outside of the JVM.
So, the parameter spark.executor.memory (or --executor-memory for spark-submit) controls how much memory will be allocated inside the JVM heap per executor. This memory is split between reserved memory, user memory, execution memory and storage memory. To control this split we need 2 more parameters: spark.memory.fraction and spark.memory.storageFraction.
According to spark documentation:
spark.memory.fraction is responsible for the fraction of heap used for execution and storage;
spark.memory.storageFraction is responsible for the amount of storage memory immune to eviction, expressed as a fraction of the size of the region set aside by spark.memory.fraction. So if storage memory isn't used, execution memory may acquire all the available memory and vice versa. This parameter controls how much memory execution can evict if necessary.
More details here
Please look at the pictures of the heap memory parts here
Finally, the heap will be split in the following way:
Reserved memory is hardcoded to 300MB;
User memory is calculated as (spark.executor.memory - reserved memory) * (1 - spark.memory.fraction);
Spark memory (which consists of storage memory and execution memory) is calculated as (spark.executor.memory - reserved memory) * spark.memory.fraction. All of this memory is then split between storage memory and execution memory according to the spark.memory.storageFraction parameter.
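Here is a minimal sketch of that arithmetic, assuming the Spark 2.x+ defaults spark.memory.fraction=0.6 and spark.memory.storageFraction=0.5 and an illustrative 4g executor (the numbers are only examples, not values read from a real cluster):

# Rough sketch of the unified memory manager split; assumed default fractions.
RESERVED_MB = 300  # reserved memory, hardcoded since Spark 1.6

def heap_split(executor_memory_mb, memory_fraction=0.6, storage_fraction=0.5):
    usable = executor_memory_mb - RESERVED_MB
    user_memory = usable * (1 - memory_fraction)
    spark_memory = usable * memory_fraction
    storage_memory = spark_memory * storage_fraction   # soft boundary, can borrow from execution
    execution_memory = spark_memory - storage_memory
    return {
        "reserved_mb": RESERVED_MB,
        "user_mb": user_memory,
        "spark_mb": spark_memory,
        "storage_mb": storage_memory,
        "execution_mb": execution_memory,
    }

print(heap_split(4096))  # e.g. --executor-memory 4g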
The next parameter you asked about is spark.executor.pyspark.memory. It's a part of external process memory and it's responsible for how much memory the Python daemon will be able to use. The Python daemon is used, for example, for executing UDFs written in Python.
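As an illustration only (the limit values and app name below are made up, not taken from the question), this is roughly how the option can be set when building a PySpark session; on YARN or Kubernetes the amount is also added to the container request, as the documentation quoted above notes:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType

# Hypothetical sizes, purely for illustration.
spark = (
    SparkSession.builder
    .appName("pyspark-memory-demo")
    .config("spark.executor.memory", "4g")           # JVM heap per executor
    .config("spark.executor.pyspark.memory", "1g")   # cap for the Python worker processes
    .getOrCreate()
)

# A Python UDF like this runs in the Python daemon/worker, i.e. in the memory
# governed by spark.executor.pyspark.memory, not in the JVM heap.
@udf(returnType=IntegerType())
def str_len(s):
    return len(s) if s is not None else 0

spark.range(5).selectExpr("cast(id as string) as s").select(str_len("s")).show()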
And the last one is spark.python.worker.memory. In this article I found the following explanation: the JVM process and the Python process communicate with each other through the Py4J bridge, which exposes objects between the JVM and Python. So spark.python.worker.memory controls how much memory can be occupied by Py4J for creating objects before spilling them to disk.
You can read more about Spark MM in the following articles:
Memory management inside the JVM;
Decoding Memory in Spark — Parameters that are often confused;
One more SO answer explaining off-heap memory configuration;
How to tune Apache Spark jobs.

Related

Difference between "spark.yarn.executor.memoryOverhead" and "spark.memory.offHeap.size"

I am running Spark on YARN. I don't understand the difference between the following settings: spark.yarn.executor.memoryOverhead and spark.memory.offHeap.size. Both seem to be settings for allocating off-heap memory to the Spark executor. Which one should I use? Also, what is the recommended setting for executor off-heap memory?
Many thanks!
TL;DR: For Spark 1.x and 2.x, Total Off-Heap Memory = spark.executor.memoryOverhead (spark.memory.offHeap.size included within).
For Spark 3.x, Total Off-Heap Memory = spark.executor.memoryOverhead + spark.memory.offHeap.size (credit to this page).
Detailed explanation:
spark.executor.memoryOverhead is used by resource managers like YARN, whereas spark.memory.offHeap.size is used by Spark core (the memory manager). The relationship is a bit different depending on the version.
Spark 2.4.5 and before:
spark.executor.memoryOverhead should include spark.memory.offHeap.size. This means that if you specify offHeap.size, you need to manually add this portion to memoryOverhead for YARN. As you can see from the code below from YarnAllocator.scala, when YARN requests resources, it does not know anything about offHeap.size:
private[yarn] val resource = Resource.newInstance(
  executorMemory + memoryOverhead + pysparkWorkerMemory,
  executorCores)
However, the behavior is changed in Spark 3.0:
spark.executor.memoryOverhead does not include spark.memory.offHeap.size anymore. YARN will include offHeap.size for you when requesting resources. From the new documentation:
Note: Additional memory includes PySpark executor memory (when spark.executor.pyspark.memory is not configured) and memory used by other non-executor processes running in the same container. The maximum memory size of container to running executor is determined by the sum of spark.executor.memoryOverhead, spark.executor.memory, spark.memory.offHeap.size and spark.executor.pyspark.memory.
And from the code you can also tell:
private[yarn] val resource: Resource = {
  val resource = Resource.newInstance(
    executorMemory + executorOffHeapMemory + memoryOverhead + pysparkWorkerMemory,
    executorCores)
  ResourceRequestHelper.setResourceRequests(executorResourceRequests, resource)
  logDebug(s"Created resource capability: $resource")
  resource
}
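To make the difference concrete, here is a small arithmetic sketch of the container size YARN would request in each case; the sizes are invented for illustration, not taken from a real job:

# Illustrative container-size arithmetic only; all numbers are made up.
executor_memory_mb = 4096   # spark.executor.memory
memory_overhead_mb = 410    # spark.executor.memoryOverhead
offheap_mb         = 1024   # spark.memory.offHeap.size
pyspark_memory_mb  = 512    # spark.executor.pyspark.memory

# Spark 2.4.5 and before: YARN only sees memoryOverhead, so you must bump it
# yourself to cover off-heap usage.
container_2x = executor_memory_mb + (memory_overhead_mb + offheap_mb) + pyspark_memory_mb

# Spark 3.0+: the off-heap size is added to the request automatically.
container_3x = executor_memory_mb + offheap_mb + memory_overhead_mb + pyspark_memory_mb

print(container_2x, container_3x)  # same total, but who adds offHeap.size differs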
For more details of this change you can refer to this Pull Request.
For your second question, what is the recommended setting for executor off-heap memory? It depends on your application and you need to do some testing. I found this page helpful in explaining it further:
Off-heap memory is a great way to reduce GC pauses because it's not in the GC's scope. However, it brings an overhead of serialization and deserialization. The latter in its turn makes that the off-heap data can be sometimes put onto heap memory and hence be exposed to GC. Also, the new data format brought by Project Tungsten (array of bytes) helps to reduce the GC overhead. These 2 reasons make that the use of off-heap memory in Apache Spark applications should be carefully planned and, especially, tested.
BTW, spark.yarn.executor.memoryOverhead is deprecated and has been renamed to spark.executor.memoryOverhead, which is common to YARN and Kubernetes.
spark.yarn.executor.memoryOverhead is used in StaticMemoryManager. This is used in older Spark versions like 1.2.
The amount of off heap memory (in megabytes) to be allocated per executor. This is memory that accounts for things like VM overheads, interned strings, other native overheads, etc. This tends to grow with the executor size (typically 6-10%).
You can find this in older Spark docs, like the Spark 1.2 docs:
https://spark.apache.org/docs/1.2.0/running-on-yarn.html
spark.memory.offHeap.size is used in UnifiedMemoryManager, which is used by default after version 1.6.
The absolute amount of memory in bytes which can be used for off-heap allocation. This setting has no impact on heap memory usage, so if your executors' total memory consumption must fit within some hard limit then be sure to shrink your JVM heap size accordingly. This must be set to a positive value when spark.memory.offHeap.enabled=true.
You can find this in the latest Spark docs, like the Spark 2.4 docs:
https://spark.apache.org/docs/2.4.4/configuration.html
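If you do decide to experiment with off-heap memory, a hedged configuration sketch looks roughly like the following; the sizes are placeholders, not recommendations, and on Spark 2.x you would also raise spark.executor.memoryOverhead by the off-heap amount yourself:

from pyspark.sql import SparkSession

# Placeholder sizes for illustration only; tune and test for your own workload.
spark = (
    SparkSession.builder
    .appName("offheap-demo")
    .config("spark.executor.memory", "4g")
    .config("spark.memory.offHeap.enabled", "true")   # must be true for offHeap.size to take effect
    .config("spark.memory.offHeap.size", "1g")
    .config("spark.executor.memoryOverhead", "512m")  # non-heap overhead (VM, native libs, ...)
    .getOrCreate()
)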

Spark Driver Memory calculation

I know how to calculate executor cores and memory. But can anyone explain on what basis spark.driver.memory is calculated?
Operations on Datasets such as collect and take require moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with an OutOfMemoryError.
You increase spark.driver.memory when you collect large volumes to the driver.
As per High Performance Spark by Holden Karau and Rachel Warren (O’Reilly):
most of the computational work of a Spark query is performed by the executors, so increasing the size of the driver rarely speeds up a computation. However, jobs may fail if they collect too much data to the driver or perform large local computations. Thus, increasing the driver memory and correspondingly the value of spark.driver.maxResultSize may prevent the out-of-memory errors in the driver.
A good heuristic for setting the Spark driver memory is simply the lowest possible value that does not lead to memory errors in the driver, i.e., which gives the maximum possible resources to the executors.
Spark driver memory is the amount of memory to use for the driver process, i.e. the process running the main() function of the application and where SparkContext is initialized, in the same format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t") (e.g. 512m, 2g).
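For completeness, a small sketch of how these settings are typically applied; the 4g and 2g values are arbitrary examples, and in client mode the driver JVM may already be running by the time user code executes, so the option is normally passed via spark-submit (--driver-memory) or spark-defaults.conf rather than set at runtime:

from pyspark.sql import SparkSession

# Example values only; pick the smallest driver memory that avoids OOM errors,
# as the quoted heuristic suggests. In client mode prefer spark-submit's
# --driver-memory, since the driver JVM may already be started at this point.
spark = (
    SparkSession.builder
    .appName("driver-memory-demo")
    .config("spark.driver.memory", "4g")
    .config("spark.driver.maxResultSize", "2g")  # cap on the total size of collected results
    .getOrCreate()
)

# collect() pulls the whole result to the driver, which is what drives these settings.
small_sample = spark.range(1000).limit(10).collect()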
JVM memory is divided into separate parts. At broad level, JVM Heap memory is physically divided into two parts – Young Generation and Old Generation.
Young generation is the place where all the new objects are created. When young generation is filled, garbage collection is performed. This garbage collection is called Minor GC.
Old Generation memory contains the objects that are long lived and survived after many rounds of Minor GC. Usually garbage collection is performed in Old Generation memory when it’s full. Old Generation Garbage Collection is called Major GC and usually takes longer time.
Java garbage collection is the process of identifying and removing unused objects from memory and freeing space to be allocated to objects created in future processing. One of the best features of the Java programming language is automatic garbage collection, unlike other programming languages such as C, where memory allocation and deallocation is a manual process.
The garbage collector is a program running in the background that looks at all the objects in memory and finds the objects that are not referenced by any part of the program. All these unreferenced objects are deleted and the space is reclaimed for allocation to other objects.
Sources:
https://spark.apache.org/docs/latest/configuration.html
https://www.journaldev.com/2856/java-jvm-memory-model-memory-management-in-java#java-memory-model-8211-permanent-generation

The actual executor memory does not match the executor-memory I set

I have a Spark 2.0.1 cluster with 1 master (slaver1) and 2 workers (slaver2, slaver3); every machine has 2GB RAM. When I run the command
./bin/spark-shell --master spark://slaver1:7077 --executor-memory 500m
and check the executor memory in the web UI (slaver1:4040/executors/), I find it is 110MB.
The memory you are talking about is storage memory. Actually, Spark divides the memory [called Spark memory] into 2 regions: the first is storage memory and the second is execution memory.
The total memory can be calculated with this formula:
(“Java Heap” – “Reserved Memory”) * spark.memory.fraction
Just to give you an overview, storage memory is the pool used both for storing Apache Spark cached data and as temporary space for serialized data “unroll”. All the “broadcast” variables are also stored there as cached blocks.
If you want to check the total memory provided, you can go to the Spark UI at Spark-Master-Ip:8080 [default port]; at the start you can find a section called MEMORY, which is the total memory used by Spark.
Thanks
From Spark version 1.6, the memory is divided according to the following picture.
There is no hard boundary between execution and storage memory. If more storage memory is required, it takes it from execution memory and vice versa.
Execution and storage memory is given by (ExecutorMemory - 300MB) * spark.memory.fraction.
In your case: (500 - 300) * 0.75 = 150MB. There will be a 3 to 5% error in the executor memory that is allocated.
300MB is the reserved memory.
User memory = (ExecutorMemory - 300MB) * (1 - spark.memory.fraction).
In your case: (500 - 300) * 0.25 = 50MB.
Java Memory : Runtime.getRuntime().maxMemory()
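If you want to inspect the actual JVM heap yourself, one hedged way from PySpark is to go through the Py4J gateway, as sketched below. This reads the driver JVM rather than an executor, but it illustrates why the number in the UI is smaller than the raw --executor-memory value once reserved memory and the fractions above are accounted for:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("heap-check").getOrCreate()
sc = spark.sparkContext

# Runtime.getRuntime().maxMemory() reported by the driver JVM, via the Py4J gateway.
max_heap_bytes = sc._jvm.java.lang.Runtime.getRuntime().maxMemory()
print("Driver JVM max heap: %.0f MB" % (max_heap_bytes / (1024 * 1024)))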

spark spilling independent of executor memory assigned

I've noticed strange behavior when running a PySpark application with Spark 2.0. In the first step of my script, which involves a reduceByKey (and thus shuffle) operation, I observe that the amount of shuffle write is roughly in line with my expectations, but that much more spilling occurs than I had expected. I tried to avoid these spills by increasing the amount of memory assigned per executor up to 8x the original amount, but see basically no difference in the amount spilled. Strangely, I also see that while this stage is running, hardly any of the assigned storage memory is used (as reported in the executors tab in the Spark web UI).
I saw this earlier question, which led me to believe that increasing executor memory might help avoid the spills: How to optimize shuffle spill in Apache Spark application
. This leads me to believe that some hard limit is leading to the spills, and not the spark.shuffle.memoryFraction parameter. Does such a hard limit exist, possibly among HDFS parameters? Otherwise, what could be done to avoid spills besides increasing executor memory?
Many thanks, R
Spilling behavior in PySpark is controlled using spark.python.worker.memory:
Amount of memory to use per python worker process during aggregation, in the same format as JVM memory strings (e.g. 512m, 2g). If the memory used during aggregation goes above this amount, it will spill the data into disks.
which is by default set to 512MB. Moreover, PySpark uses its own reducing mechanism with External(GroupBy|Sorter|Merger) and exhibits slightly different behavior than its native counterpart.
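Based on that, here is a hedged sketch of the kind of change that usually reduces Python-side spilling for an RDD aggregation; the 2g figure is an arbitrary placeholder and has to fit inside the executor's overhead budget (or spark.executor.pyspark.memory, where available):

from pyspark.sql import SparkSession

# Illustrative only: raise the per-Python-worker aggregation budget so that
# reduceByKey spills to disk less often. The value is a placeholder.
spark = (
    SparkSession.builder
    .appName("python-worker-memory-demo")
    .config("spark.python.worker.memory", "2g")
    .getOrCreate()
)
sc = spark.sparkContext

pairs = sc.parallelize(range(1_000_000)).map(lambda i: (i % 1000, 1))
counts = pairs.reduceByKey(lambda a, b: a + b)  # aggregation happens in the Python workers
print(counts.take(5))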

How is spark dealing with driver memory? It is using more space then allocated

How is it possible that the storage memory (of the driver), which is limited to 5.2g as shown in the image, is exceeded by Spark to 36.5g?
How does this memory allocation take place? Does Spark use disk apart from RAM with default settings? As indicated in the UI, it uses 16.5g of disk space (what is the limit on disk space use?).
Your first question:
How does this memory allocation take place? Does Spark use disk apart from RAM with default settings?
Yes. Please read about block manager here. As it points out,
BlockManager manages the storage for blocks that can represent cached RDD partitions, intermediate shuffle outputs, broadcasts, etc. It is also a BlockEvictionHandler that drops a block from memory and storing it on a disk if applicable.
Your second question:
What is the limit on disk space use?
As far as I know, there is no limit. Executors start dying if they don't have enough space. You can refer to this for more details.
