Spark on Yarn: driver memory is checked on the client side? - apache-spark

I thought I understood the Spark on YARN architecture quite well, but now I wonder: when I launch
spark-submit --master yarn-cluster --class com.domain.xxx.ddpaction.DdpApp --num-executors 24 --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 --conf "spark.yarn.jar=/spark/lib/spark-assembly-1.1.0-hadoop2.4.0.jar" ddpaction-3.1.0.jar yarn-cluster config.yml
it fails with a
# Native memory allocation (malloc) failed to allocate 2863333376 bytes for committing reserved memory
The server from which I launch spark-submit has less than 2GB of free memory, and this causes the error, but the resource manager, where the driver should execute, has far more than the 4GB set as the driver-memory parameter.
Why is the driver-memory value, which in my understanding should only be checked and allocated on the YARN cluster by the resource manager, allocated on the server that launches spark-submit in yarn-cluster mode?

This is a bug that was fixed in Spark 1.4.0. See SPARK-3884.

It looks like there is a bad simplification in the spark-submit script:
elif [ "$1" = "--driver-memory" ]; then
export SPARK_SUBMIT_DRIVER_MEMORY=$2
So the driver-memory value is used by spark-submit to size its own allocated memory; this is only correct and needed in yarn-client mode, not in yarn-cluster mode.
I solved my problem by replacing those lines with:
elif [ "$1" = "--spark-submit-memory" ]; then
export SPARK_SUBMIT_DRIVER_MEMORY=$2
so now I can set (if needed) the memory allocated to spark-submit to a value different from the driver's.
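For reference, a minimal sketch of the kind of deploy-mode guard the upstream fix presumably applies; the DEPLOY_MODE variable and the structure below are illustrative, not the actual patched script:
elif [ "$1" = "--driver-memory" ]; then
  # Only size the local spark-submit JVM from --driver-memory when the driver
  # actually runs in this process (client mode); in cluster mode the value is
  # passed through to YARN instead of being exported here.
  if [ "$DEPLOY_MODE" != "cluster" ]; then
    export SPARK_SUBMIT_DRIVER_MEMORY=$2
  fi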

Related

Spark: use of driver-memory parameter

When I submit this command, my job fails with the error "Container is running beyond physical memory limits".
spark-submit --master yarn --deploy-mode cluster --executor-memory 5G --total-executor-cores 30 --num-executors 15 --conf spark.yarn.executor.memoryOverhead=1000
But when I add the --driver-memory parameter set to 5GB (or higher), the job ends without error.
spark-submit --master yarn --deploy-mode cluster --executor-memory 5G --total-executor-cores 30 --num-executors 15 --driver-memory 5G --conf spark.yarn.executor.memoryOverhead=1000
Cluster info: 6 nodes with 120GB of Memory. YARN Container Memory Minimum: 1GB
The question is: what is the difference in using or not this parameter?
If increasing the driver memory helps you successfully complete the job, it means the driver is receiving a lot of data from the executors. Typically, the driver program is responsible for collecting results back from each executor after the tasks are executed. So in your case it seems that increasing the driver memory allowed more results to be stored in the driver's memory.
If you read up on executor memory, driver memory, and the way the driver interacts with the executors, you will get better clarity on the situation you are in.
Hope it helps to some extent.
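As a hedged illustration of the same idea: when a job pulls large results back to the driver, you typically raise the driver heap and, if needed, the cap on collected results. The class name, jar, and the 6g/2g values below are placeholders, not recommendations.
# spark.driver.maxResultSize caps the total size of results collected back to
# the driver (it defaults to 1g); --driver-memory sizes the driver heap itself.
spark-submit --master yarn --deploy-mode cluster \
  --driver-memory 6g \
  --conf spark.driver.maxResultSize=2g \
  --executor-memory 5G --num-executors 15 \
  --class com.example.MyJob my-job.jar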

Spark client mode - YARN allocates a container for driver?

I am running Spark on YARN in client mode, so I expect that YARN will allocate containers only for the executors. Yet, from what I am seeing, it seems like a container is also allocated for the driver, and I don't get as many executors as I was expecting.
I am running spark-submit on the master node. Parameters are as follows:
sudo spark-submit --class ... \
--conf spark.master=yarn \
--conf spark.submit.deployMode=client \
--conf spark.yarn.am.cores=2 \
--conf spark.yarn.am.memory=8G \
--conf spark.executor.instances=5 \
--conf spark.executor.cores=3 \
--conf spark.executor.memory=10G \
--conf spark.dynamicAllocation.enabled=false \
While running this application, Spark UI's Executors page shows 1 driver and 4 executors (5 entries in total). I would expect 5, not 4 executors.
At the same time, YARN UI's Nodes tab shows that on the node that isn't actually used (at least according to Spark UI's Executors page...) there's a container allocated, using 9GB of memory. The rest of the nodes have containers running on them, 11GB of memory each.
Because in my Spark Submit the driver has 2GB less memory than executors, I think that the 9GB container allocated by YARN is for the driver.
Why is this extra container allocated? How can I prevent this?
(Spark UI and YARN UI screenshots omitted.)
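For what it's worth, a rough sketch of the arithmetic that presumably produces those container sizes; the max(384m, 10%) overhead rule and YARN rounding allocations up to whole GBs are assumptions here, not something stated in this question.
# Approximate container size = requested memory + memory overhead, rounded up by YARN.
overhead_total() { awk -v m="$1" 'BEGIN { o = m * 0.10; if (o < 0.384) o = 0.384; printf "%.1f\n", m + o }'; }
echo "AM container:       ~$(overhead_total 8) GB before rounding (observed: 9 GB)"
echo "Executor container: ~$(overhead_total 10) GB before rounding (observed: 11 GB)"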
Update after answer by Igor Dvorzhak
I was wrongly assuming that the AM would run on the master node and that it would contain the driver app (so the spark.yarn.am.* settings would relate to the driver process).
So I've made the following changes:
set the spark.yarn.am.* settings to defaults (512m of memory, 1 core)
set the driver memory through spark.driver.memory to 8g
did not try to set driver cores at all, since spark.driver.cores is only valid in cluster mode
Because the AM with default settings takes up 512m plus 384m of overhead, its container fits into the spare 1GB of free memory on a worker node.
Spark gets the 5 executors it requested, and the driver memory matches the 8g setting. Everything works as expected now.
(Updated Spark UI and YARN UI screenshots omitted.)
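A minimal sketch of the submit command matching the changes described above; the class name and jar path are placeholders:
# Keep the YARN AM at its small defaults and size the driver JVM separately
# via spark.driver.memory (client mode, so the driver runs in this process).
sudo spark-submit --class com.example.Main \
  --conf spark.master=yarn \
  --conf spark.submit.deployMode=client \
  --conf spark.driver.memory=8g \
  --conf spark.executor.instances=5 \
  --conf spark.executor.cores=3 \
  --conf spark.executor.memory=10G \
  --conf spark.dynamicAllocation.enabled=false \
  app.jar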
The extra container is allocated for the YARN application master:
In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
Even though in client mode the driver runs in the client process, the YARN application master still runs on YARN and requires a container allocation.
There is no way to prevent the container allocation for the YARN application master.
For reference, a similar question was asked some time ago: Resource Allocation with Spark and Yarn.
You can specify the driver memory and the number of executors in spark-submit as below.
spark-submit --jars..... --master yarn --deploy-mode cluster --driver-memory 2g --driver-cores 4 --num-executors 5 --executor-memory 10G --executor-cores 3
Hope it helps you.

Spark-submit in Spark stand alone - all memory gone to the drivers

I have setup a Spark standalone cluster, where I can submit jobs with spark-submit:
spark-submit \
--class blah.blah.MyClass \
--master spark://myaddress:6066 \
--executor-memory 8G \
--deploy-mode cluster \
--total-executor-cores 12 \
/path/to/jar/myjar.jar
The problem is that when I send multiple jobs at the same time, say over 20 in one go, the first few finish successfully. All the others are then stuck waiting for resources. I noticed that all the available memory has gone to the drivers, so in the drivers section they are all RUNNING, but in the running-applications section they are all in the WAITING state.
How can I tell Spark standalone to allocate memory to the WAITING executors first, instead of to the SUBMITTED drivers?
thank you
Below is an extract of my spark-defaults.conf
spark.master spark://address:7077
spark.eventLog.enabled true
spark.eventLog.dir /path/tmp/sparkEventLog
spark.driver.memory 5g
spark.local.dir /path/tmp
spark.ui.port xxx
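For illustration, with spark.driver.memory 5g in spark-defaults.conf, twenty cluster-mode drivers alone reserve roughly 20 x 5g = 100GB before any executor is scheduled. A hedged sketch of submitting with a smaller explicit driver heap (the 1g value is arbitrary, not an accepted fix from this thread):
spark-submit \
  --class blah.blah.MyClass \
  --master spark://myaddress:6066 \
  --deploy-mode cluster \
  --driver-memory 1g \
  --executor-memory 8G \
  --total-executor-cores 12 \
  /path/to/jar/myjar.jar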

Spark ignores SPARK_WORKER_MEMORY?

I'm using standalone cluster mode, 1.5.2.
Even though I'm setting SPARK_WORKER_MEMORY in spark-env.sh, it looks like this setting is ignored.
I can't find any indication in the scripts under bin/sbin that -Xms/-Xmx are being set.
If I use the ps command on the worker pid, it looks like the memory is set to 1G:
[hadoop@sl-env1-hadoop1 spark-1.5.2-bin-hadoop2.6]$ ps -ef | grep 20232
hadoop 20232 1 0 02:01 ? 00:00:22 /usr/java/latest//bin/java
-cp /workspace/3rd-party/spark/spark-1.5.2-bin-hadoop2.6/sbin/../conf/:/workspace/
3rd-party/spark/spark-1.5.2-bin-hadoop2.6/lib/spark-assembly-1.5.2-hadoop2.6.0.jar:/workspace/
3rd-party/spark/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar:/workspace/
3rd-party/spark/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/workspace/
3rd-party/spark/spark-1.5.2-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/workspace/
3rd-party/hadoop/2.6.3//etc/hadoop/ -Xms1g -Xmx1g org.apache.spark.deploy.worker.Worker
--webui-port 8081 spark://10.52.39.92:7077
spark-defaults.conf:
spark.master spark://10.52.39.92:7077
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.executor.memory 2g
spark.executor.cores 1
spark-env.sh:
export SPARK_MASTER_IP=10.52.39.92
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=12g
Am I missing something?
Thanks.
When using spark-shell or spark-submit, use the --executor-memory option.
When configuring it for a standalone jar, set the system property programmatically before creating the SparkContext.
System.setProperty("spark.executor.memory", executorMemory)
You are using the wrong setting for cluster mode.
SPARK_EXECUTOR_MEMORY is the right option to set Executor memory in cluster mode.
SPARK_WORKER_MEMORY works only in standalone deploy mode.
Another way to set executor memory from the command line: -Dspark.executor.memory=2g
Have a look at one more related SE question regarding these settings:
Spark configuration, what is the difference of SPARK_DRIVER_MEMORY, SPARK_EXECUTOR_MEMORY, and SPARK_WORKER_MEMORY?
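To make the split concrete, a short sketch of where each setting normally lives in a standalone deployment; the 12g/2g values just mirror the question above.
spark-env.sh (read by the worker daemon):
# Total memory this worker is allowed to hand out to executors on the machine.
export SPARK_WORKER_MEMORY=12g
spark-defaults.conf (read per application):
# Heap given to each executor launched for the application; must fit within SPARK_WORKER_MEMORY.
spark.executor.memory 2g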
This is my configuration in cluster mode, in spark-defaults.conf:
spark.driver.memory 5g
spark.executor.memory 6g
spark.executor.cores 4
Do you have something like this?
If you don't add this configuration (with your own values), the Spark executor will get 1GB of RAM by default.
Otherwise you can add these options to ./spark-submit like this:
# Run on a YARN cluster
export HADOOP_CONF_DIR=XXX
./bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \ # can be client for client mode
--executor-memory 20G \
--num-executors 50 \
/path/to/examples.jar \
1000
Try checking master(ip/name of master):8080 when you run an application to see whether resources have been allocated correctly.
I've encountered the same problem as yours. The reason is that, in standalone mode, spark.executor.memory is actually ignored. What has an effect is spark.driver.memory, because the executor lives inside the driver.
So what you can do is to set spark.driver.memory as high as you want.
This is where I've found the explanation:
How to set Apache Spark Executor memory

How to prevent Spark Executors from getting Lost when using YARN client mode?

I have one Spark job which runs fine locally with less data, but when I schedule it to execute on YARN I keep getting the following error; slowly all executors get removed from the UI and my job fails:
15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 8 on myhost1.com: remote Rpc client disassociated
15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 6 on myhost2.com: remote Rpc client disassociated
I use the following command to schedule Spark job in yarn-client mode
./spark-submit --class com.xyz.MySpark --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M" --driver-java-options -XX:MaxPermSize=512m --driver-memory 3g --master yarn-client --executor-memory 2G --executor-cores 8 --num-executors 12 /home/myuser/myspark-1.0.jar
What is the problem here? I am new to Spark.
I had a very similar problem: many executors were being lost no matter how much memory we allocated to them.
The solution, if you're using YARN, was to set --conf spark.yarn.executor.memoryOverhead=600; alternatively, if your cluster uses Mesos, you can try --conf spark.mesos.executor.memoryOverhead=600 instead.
In Spark 2.3.1+ the configuration option is now --conf spark.executor.memoryOverhead=600.
It seems we were not leaving sufficient memory for YARN itself, and containers were being killed because of it. After setting that, we had different out-of-memory errors, but not the same lost-executor problem.
You can follow this AWS post to calculate memory overhead (and other spark configs to tune): best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr
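As a hedged sketch of the rule of thumb usually quoted for that overhead (roughly max(384m, 10% of executor memory); see the AWS post above for the exact guidance), applied to the 2G executors from this question:
# Default overhead for a 2G executor is about max(384m, 0.10 * 2048m) = 384m;
# an explicit 600m adds headroom for off-heap use and YARN accounting.
./spark-submit --class com.xyz.MySpark \
  --master yarn-client \
  --driver-memory 3g \
  --executor-memory 2G --executor-cores 8 --num-executors 12 \
  --conf spark.yarn.executor.memoryOverhead=600 \
  /home/myuser/myspark-1.0.jar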
When I had the same issue, deleting logs and freeing up more HDFS space worked.

Resources