How to run a memory-intensive shell script from PySpark rdd.mapPartitions - apache-spark

Let's say I have a Spark cluster whose nodes have 32GB of RAM. 1G of executor memory is enough for processing any of my data.
There is a Linux shell program (program) that I need to run for each partition. It would sound easy if it were a simple Linux pipe script, but the program requires 10GB of memory for each run. My initial assumption was that I could just increase executor memory to 11GB, and Spark would use 1G per executor for the partition and leave the other 10G for the program running in the context of that executor. But that's not what happens: it uses the full 11GB for 1G of Spark data and then runs the 10GB program in whatever node memory is left.
So I changed the executor memory back to 1GB and decided to play with cores, instances and YARN. I tried using:
--executor-memory 1G
--driver-memory 1G
--executor-cores 1
--num-executors 1
and for YARN: 32GB - (10GB * 2 programs running per node) = 12GB, minus 4GB for the OS = 8GB = 8192MB:
"yarn.nodemanager.resource.memory-mb": "8192",
"yarn.scheduler.maximum-allocation-mb": "8192"
Because I'm using 1G per executor, Spark starts 8192 / (1024 * 1.18 overhead) ≈ 6 executors per node. Clearly, if each of those executors starts a 10GB program, there won't be enough RAM for them all. I increased executor memory to 3GB to bring the number of executors per node down to 2.
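Spelled out, the per-node estimate I'm relying on looks like this (just a rough sketch; the 1.18 factor is my approximation of the YARN memory overhead):
node_manager_mb = 8192                # yarn.nodemanager.resource.memory-mb
overhead_factor = 1.18                # approximate container overhead on top of executor memory
print(int(node_manager_mb // (1024 * overhead_factor)))  # 1G executors -> ~6 per node
print(int(node_manager_mb // (3072 * overhead_factor)))  # 3G executors -> 2 per node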
Now it runs 2 executors per node, but the program still fails with an Out of Memory exception.
I've added code to check the available memory right before starting the program:
import os

total_memory, used_memory, free_memory, shared_memory, cache, available_memory = map(
    int, os.popen('free -t -m | grep Mem:').readlines()[0].split()[1:])
But even when available_memory is > 10G, the program starts and then runs out of memory in the middle (it runs for about 4 mins).
Is there a way to allocate memory for an external script on the executor nodes? Or maybe there is a workaround for this?
I would appreciate any HELP!!!
Thanks in advance,
Orka

The answer is simple. My resource calculation is correct. All I changed was:
spark.dynamicAllocation.enabled=false
It is true by default, and Spark was trying to start as many executors as it could on each node.
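For reference, this can be set either with --conf spark.dynamicAllocation.enabled=false on spark-submit or in the SparkConf; here is a minimal sketch (the other values are just the ones from my setup):
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .set("spark.dynamicAllocation.enabled", "false")  # stop Spark from packing extra executors onto each node
        .set("spark.executor.instances", "1")             # fixed executor count, same as --num-executors
        .set("spark.executor.memory", "1g")
        .set("spark.executor.cores", "1"))
sc = SparkContext(conf=conf)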

Related

How does Spark manage physical memory, virtual memory and executor memory?

I have been working with Spark for a few days and I'm getting confused about Spark memory management. I see terms like physical memory, virtual memory, executor memory and memory overhead, and these values don't add up properly according to my current understanding. Can someone explain these things in terms of Spark in a simple way?
E.g., I'm running a Spark job with the following configuration in cluster mode:
from pyspark import SparkConf

spark_conf = SparkConf() \
    .set("spark.executor.memory", "10g") \
    .set("spark.executor.cores", 4) \
    .set("spark.executor.instances", 30) \
    .set("spark.dynamicAllocation.enabled", False)
But I get an error like this:
Failing this attempt.Diagnostics: [2020-08-18 11:57:54.479]
Container [pid=96571,containerID=container_1588672785288_540114_02_000001]
is running 62357504B beyond the 'PHYSICAL' memory limit.
Current usage: 1.6 GB of 1.5 GB physical memory used;
3.7 GB of 3.1 GB virtual memory used. Killing container.
How are physical and virtual memory allocated with respect to executor memory and memory overhead?
Also, when I run the same job in client mode with the same configuration, it runs successfully. Why is that? The only thing that changes in client mode is the driver, and I don't have any code that aggregates data to the driver.
When you look at the option value
yarn.nodemanager.vmem-pmem-ratio 2.1
you can see that the default ratio between physical and virtual memory is 2.1. You can estimate the physical memory per container by taking the total memory of the YARN ResourceManager and dividing it by the number of containers, i.e. the executors (leaving the driver aside).
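As a rough check against the error above (a sketch that assumes the killed container is the application master, i.e. the driver in cluster mode, with the default spark.driver.memory of 1g, the 384MB minimum overhead, and YARN rounding requests up in 512MB increments):
driver_mb = 1024                               # assumed default spark.driver.memory = 1g
overhead_mb = max(384, int(0.10 * driver_mb))  # default overhead rule: max(384MB, 10%)
requested_mb = driver_mb + overhead_mb         # 1408 MB
physical_mb = 1536                             # 1408 rounded up to a 512MB increment -> "1.5 GB physical memory"
virtual_mb = physical_mb * 2.1                 # yarn.nodemanager.vmem-pmem-ratio -> ~3226 MB ~= "3.1 GB virtual memory"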
Here is an article, but there are other good articles on how YARN allocates physical memory.

Spark on YARN resource manager: Relation between YARN Containers and Spark Executors

I'm new to Spark on YARN and don't understand the relation between the YARN containers and the Spark executors. I tried out the following configuration, based on the results of the yarn-utils.py script, which can be used to find an optimal cluster configuration.
The Hadoop cluster (HDP 2.4) I'm working on:
1 Master Node:
CPU: 2 CPUs with 6 cores each = 12 cores
RAM: 64 GB
SSD: 2 x 512 GB
5 Slave Nodes:
CPU: 2 CPUs with 6 cores each = 12 cores
RAM: 64 GB
HDD: 4 x 3 TB = 12 TB
HBase is installed (this is one of the parameters for the script below)
So I ran python yarn-utils.py -c 12 -m 64 -d 4 -k True (c=cores, m=memory, d=hdds, k=hbase-installed) and got the following result:
Using cores=12 memory=64GB disks=4 hbase=True
Profile: cores=12 memory=49152MB reserved=16GB usableMem=48GB disks=4
Num Container=8
Container Ram=6144MB
Used Ram=48GB
Unused Ram=16GB
yarn.scheduler.minimum-allocation-mb=6144
yarn.scheduler.maximum-allocation-mb=49152
yarn.nodemanager.resource.memory-mb=49152
mapreduce.map.memory.mb=6144
mapreduce.map.java.opts=-Xmx4915m
mapreduce.reduce.memory.mb=6144
mapreduce.reduce.java.opts=-Xmx4915m
yarn.app.mapreduce.am.resource.mb=6144
yarn.app.mapreduce.am.command-opts=-Xmx4915m
mapreduce.task.io.sort.mb=2457
I applied these settings via the Ambari interface and restarted the cluster. The values also roughly match what I had calculated manually beforehand.
I now have problems:
finding the optimal settings for my spark-submit parameters --num-executors, --executor-cores & --executor-memory,
understanding the relation between the YARN containers and the Spark executors,
understanding the hardware information in my Spark History UI (it shows less memory than I set when I sum it up over all worker nodes),
understanding the concept of vcores in YARN, for which I couldn't find any useful examples yet.
However, I found the post What is a container in YARN?, but it didn't really help, as it doesn't describe the relation to the executors.
Can someone help with one or more of these questions?
I will report my insights here step by step:
The first important thing is this fact (source: this Cloudera documentation):
When running Spark on YARN, each Spark executor runs as a YARN container. [...]
This means the number of containers will always be the same as the number of executors created by a Spark application, e.g. via the --num-executors parameter of spark-submit.
Governed by yarn.scheduler.minimum-allocation-mb, every container always allocates at least this amount of memory. This means that if the --executor-memory parameter is set to, e.g., only 1g but yarn.scheduler.minimum-allocation-mb is, e.g., 6g, the container is much bigger than the Spark application needs.
The other way round, if --executor-memory is set to something higher than the yarn.scheduler.minimum-allocation-mb value, e.g. 12g, the container will allocate more memory, but only if the requested amount is smaller than or equal to the yarn.scheduler.maximum-allocation-mb value.
The value of yarn.nodemanager.resource.memory-mb determines how much memory can be allocated in total by all containers on one host!
=> So setting yarn.scheduler.minimum-allocation-mb to a small value allows you to run smaller containers, e.g. for smaller executors (otherwise memory would be wasted).
=> Setting yarn.scheduler.maximum-allocation-mb to the maximum value (e.g. equal to yarn.nodemanager.resource.memory-mb) allows you to define bigger executors (more memory is allocated if needed, e.g. via the --executor-memory parameter).
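To make that concrete, here is a sketch of how one executor's container size comes out under the settings above (it assumes the usual Spark overhead rule of max(384MB, 10% of executor memory) and that YARN rounds each request up to a multiple of the minimum allocation):
def container_size_mb(executor_memory_mb,
                      min_allocation_mb=6144,    # yarn.scheduler.minimum-allocation-mb from above
                      max_allocation_mb=49152):  # yarn.scheduler.maximum-allocation-mb from above
    overhead = max(384, int(0.10 * executor_memory_mb))               # assumed default memory overhead rule
    requested = executor_memory_mb + overhead
    rounded = -(-requested // min_allocation_mb) * min_allocation_mb  # round up to the minimum-allocation step
    if rounded > max_allocation_mb:
        raise ValueError("request exceeds yarn.scheduler.maximum-allocation-mb")
    return rounded

print(container_size_mb(1024))    # --executor-memory 1g  -> 6144 MB container (the minimum allocation)
print(container_size_mb(12288))   # --executor-memory 12g -> 18432 MB container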

executor memory setting in spark

I set up a standalone cluster and wanted to find the fastest way to process my app.
My machine has 12g of RAM. Here are some results I got.
Test A (took 15 mins)
1 worker node
spark.executor.memory = 8g
spark.driver.memory = 6g
Test B (took 8 mins)
2 worker nodes
spark.executor.memory = 4g
spark.driver.memory = 6g
Test C (took 6 mins)
2 worker nodes
spark.executor.memory = 6g
spark.driver.memory = 6g
Test D (took 6 mins)
3 worker nodes
spark.executor.memory = 4g
spark.driver.memory = 6g
Test E (took 6 mins)
3 worker nodes
spark.executor.memory = 6g
spark.driver.memory = 6g
Compared to Test A, Test B just added one more worker (with the same total memory: 4g * 2 = 8g), yet it made the app faster. Why did that happen?
Tests C, D and E tried to use more memory than the machine actually has, but they still worked, and were even faster. Is the configured memory size just an upper limit?
Adding more worker nodes doesn't keep making it faster, either. How should I determine the right number of workers and the right executor memory size?
In Test B, your application was running in parallel on 2 CPUs, so the total time was almost halved.
Regarding memory - the memory setting defines an upper limit. Setting it too small will make your app perform more GC, and if your heap eventually fills up, you'll get an OutOfMemoryException.
Regarding the most suitable configuration - well, it depends. If your task does not consume much RAM, configure Spark to have as many executors as you have CPUs.
Otherwise, configure your executors to match the appropriate amount of RAM required.
Keep in mind that these limits should not be treated as constants; they may change with your application's requirements.
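As a concrete illustration of that advice (just a sketch for a 12g standalone setup like the one above; the exact values are placeholders to tune):
from pyspark import SparkConf

# CPU-bound job: many small executors, roughly one per core
cpu_bound_conf = (SparkConf()
                  .set("spark.executor.cores", "1")
                  .set("spark.executor.memory", "1g"))

# Memory-hungry job: fewer, larger executors
memory_bound_conf = (SparkConf()
                     .set("spark.executor.cores", "4")
                     .set("spark.executor.memory", "6g"))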

How does Spark occupy the memory

My server has 50GB of memory and HBase is using 40GB of it. When I run Spark I set --executor-memory 30G. Will Spark grab some memory from HBase, since there is only 10GB left?
Another question: if Spark only needs 1GB of memory but I give it 10GB, will it occupy the whole 10GB?
The behavior differs depending on the deployment mode. If you are using local mode, then --executor-memory will not change anything, as you only have one executor and that is your driver, so you need to increase the driver's memory instead.
If you are using standalone mode and submitting your job in cluster mode, then the following applies:
--executor-memory is the memory requested per executor; it is the executor's heap size. By default, 60% of the configured --executor-memory is used to cache RDDs, and the remaining 40% is available for objects created during task execution. It is equivalent to -Xms and -Xmx, so if you ask for more memory than is available, your executors will report errors about insufficient memory.
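For example, with --executor-memory 30G that default split works out roughly like this (a sketch of the 60/40 rule described above):
executor_memory_gb = 30
storage_gb = executor_memory_gb * 0.60   # ~18 GB reserved for caching RDDs
working_gb = executor_memory_gb * 0.40   # ~12 GB for objects created during task execution
print(storage_gb, working_gb)            # 18.0 12.0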
When you give a Spark executor 30G of memory, the OS does not hand it actual physical memory up front. But as and when your executor needs real memory for caching or processing, this will push your other processes, such as HBase, into swap. If your system's swap is set to zero, you will face an OOM error.
The OS swaps out idle parts of processes, which can make those processes very slow.

How to set executor number by memory in YARN mode?

I did some testing on an r3.8xlarge cluster; each instance has 32 cores and 244G of memory.
If I set spark.executor.cores=16 and spark.executor.memory=94G, there are 2 executors per instance, but when I set spark.executor.memory larger than 94G, there is only one executor per instance.
If I set spark.executor.cores=8 and spark.executor.memory=35G, there are 4 executors per instance, but when I set spark.executor.memory larger than 35G, there are no more than 3 executors per instance.
So, my question is: how is the number of executors derived from the memory setting? What's the formula? I thought Spark simply allocated 70% of the physical memory to the executors, but it seems I'm wrong...
In YARN mode you need to set the number of executors with --num-executors and the executor memory with --executor-memory. Here's an example:
spark-submit --master yarn-cluster --executor-memory 6G --num-executors 31 --executor-cores 32 example.jar Example
Now each executor requests a container from YARN with 6G plus memory overhead, and 32 cores.
More info in the Spark documentation.
Regarding the behavior you're seeing: it sounds like the amount of memory available to your YARN NodeManagers is actually less than the 244GB available to the OS. To verify this, take a look at your YARN ResourceManager web UI; there you can see how much memory is available in total across the cluster. This is determined by yarn.nodemanager.resource.memory-mb in yarn-site.xml.
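A rough sketch of the per-node arithmetic that produces counts like "2 executors at 94G, 1 above that" (the max(384MB, 10%) overhead rule is assumed, and the ~210 GB NodeManager figure below is only a placeholder; check the real yarn.nodemanager.resource.memory-mb in the ResourceManager UI):
def executors_per_node(executor_memory_gb, nodemanager_memory_gb):
    # Rough upper bound from memory alone; the executor-cores setting imposes its own limit.
    overhead_gb = max(0.384, 0.10 * executor_memory_gb)   # assumed default overhead rule
    return int(nodemanager_memory_gb // (executor_memory_gb + overhead_gb))

print(executors_per_node(94, 210))    # -> 2 executors per node
print(executors_per_node(100, 210))   # -> 1 executor per node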
To answer your question about how the number of executors is determined: in YARN, if you're running Spark with spark.dynamicAllocation.enabled set to true, the number of executors is bounded below by spark.dynamicAllocation.minExecutors and above by spark.dynamicAllocation.maxExecutors.
Other than that, you're subject to YARN's resource allocation, which, for most schedulers, will allocate resources to fill up the queue your job runs in.
In a situation where you have a completely unutilized cluster with one YARN queue and you submit a job to it, the Spark job will keep adding executors with the given number of cores and amount of memory until the entire cluster is full (or there are not enough cores/memory left to allocate an additional executor).
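If dynamic allocation is what is driving your executor count, the bounds it works within look roughly like this (a sketch; the numbers are placeholders):
from pyspark import SparkConf

conf = (SparkConf()
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.dynamicAllocation.minExecutors", "2")    # never fewer than this
        .set("spark.dynamicAllocation.maxExecutors", "40")   # never more than this
        .set("spark.shuffle.service.enabled", "true"))       # external shuffle service, needed for dynamic allocation on YARN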
