In my Spark cluster the following configuration is set:
spark.executor.memory=2g
I would like to know whether this 2G of RAM is shared by all executors, or whether each executor on each worker machine uses its own 2G.
This setting causes each executor on every one of your Worker nodes to get 2G of memory. It doesn't mean "share 2G of memory between all executors"; it means "give each executor 2G of memory".
This is explicitly stated in the documentation (emphasis mine):
spark.executor.memory | 1g | Amount of memory to use per executor process (e.g. 2g, 8g).
If you have multiple executors per Worker node, this means that each one of these executors will consume 2G of memory.
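For reference, a minimal sketch of the two usual ways to set this (the value, class name and jar are illustrative placeholders):
# in conf/spark-defaults.conf
spark.executor.memory 2g
# or per submission on the command line
spark-submit --executor-memory 2g --class com.example.WordCount app.jar
Either way, every executor the application launches gets its own 2G, not a share of a common pool.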
Related
I am learning Spark and trying to execute a simple wordcount application. I am using:
spark 2.4.7 (spark-2.4.7-bin-hadoop2.7)
scala 2.12
java 8
The Spark cluster, with 1 master and 2 worker nodes, is running as a standalone cluster.
The Spark config is:
spark.master spark://localhost:7077
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.driver.memory 500M
master start script is ${SPARK_HOME}/sbin/start-master.sh
slave start script is ${SPARK_HOME}/sbin/start-slave.sh spark://localhost:7077 -c 1 -m 50M
I want to start the driver in cluster mode:
${SPARK_HOME}/bin/spark-submit --master spark://localhost:7077 --deploy-mode cluster --driver-memory 500M --driver-cores 8 --executor-memory 50M --executor-cores 4 <absolute path to the jar file having code>
Note: The completed driver/apps are the ones I had to kill
I used the above params after reading the Spark documentation and checking several blogs.
But after I submit the job, the driver does not run; the UI always shows the worker as none. I have read multiple blogs and checked the documentation to find out how to submit the job in cluster mode, and I tweaked different params for spark-submit, but it does not execute. The interesting thing is that when I submit in client mode it works.
Can you help me fix this issue?
Take a look at CPU and memory configurations of your workers and the driver.
Your application requires 500 MB of RAM and one CPU core to run the driver, and 50 MB and one core to run computational jobs, so you need 550 MB of RAM and two cores. These resources must be provided by a worker when you run your driver in cluster mode. But each worker is allowed to use only one CPU core and 50 MB of RAM (-c 1 -m 50M), so the worker does not have enough resources to run your driver.
You have to allocate to your Spark cluster as many resources as your job needs:
Worker Cores >= Driver Cores + Executor Cores
Worker Memory >= Driver Memory + Executor Memory
Perhaps you have to increase the amount of memory for both the driver and the executor. Try running the Worker with 1 GB of memory and giving your driver 512 MB via --driver-memory, along with a matching --executor-memory.
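For example, with the scripts from the question, a sketch could look like this (the core and memory values are illustrative; use whatever your machine can spare):
# give the worker enough resources to host both the driver and one executor
${SPARK_HOME}/sbin/start-slave.sh spark://localhost:7077 -c 2 -m 1G
# request no more than the worker offers: 512M + 512M of memory, 1 + 1 cores
${SPARK_HOME}/bin/spark-submit --master spark://localhost:7077 --deploy-mode cluster \
  --driver-memory 512M --driver-cores 1 \
  --executor-memory 512M --executor-cores 1 \
  <absolute path to the jar file having code>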
We are running Spark in cluster mode. As per the Spark documentation we can set spark.executor.memory = 3g to change the executor memory size, or we can pass spark-shell --executor-memory 3g. But with both approaches, when I check the Spark UI it shows each executor having 530 MB of memory. Any ideas how to raise the memory above 530 MB?
When adjusting the memory for the executors (e.g., by setting --executor-memory 2g) and setting the master to a local deployment (local[4]), does each local thread receive 2 GB of memory, or are the 2 GB set in total for the local run?
spark.executor.memory is set per executor process and this amount is shared between executor threads.
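As a sketch for the local case (the class name and jar are placeholders): local[4] means one JVM with 4 worker threads, so the memory applies once to that single process and is shared by the threads rather than multiplied by them. Note that in local mode the executor lives inside the driver JVM, so --driver-memory is the setting that actually sizes the heap.
spark-submit --master "local[4]" --driver-memory 2g --class com.example.App app.jar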
A container is an abstract notion in YARN. When running Spark on YARN, each Spark executor runs as a YARN container. How many YARN containers can be launched in each Node Manager, by each client-submitted application?
You can run as many executors on a single NodeManager as you want, as long as you have the resources. If you have a server with 20 GB of RAM and 10 cores, you can run ten 2 GB, 1-core executors on that NodeManager. It wouldn't be advisable to run multiple executors on the same NodeManager, though, as there is an overhead cost in shuffling data between executors, even when the processes are running on the same machine.
Each executor runs in a YARN container.
Spark decides how many executors are needed in total, and how many to launch per worker node, depending on how big your YARN cluster is, how your data is spread out among the worker nodes (for data locality), how many executors you requested for your application, how much resource (cores and memory per executor) you requested, and whether you have enabled dynamic resource allocation.
If you request resources that the YARN cluster cannot accommodate, your request will be rejected.
The following are the properties to look out for when making a spark-submit request (see the sketch after this list):
--num-executors - total number of executors you need
--executor-cores - number of cores per executor; a maximum of 5 is recommended
--executor-memory - amount of memory per executor
spark.dynamicAllocation.enabled - passed via --conf; lets Spark scale the number of executors up and down
spark.dynamicAllocation.maxExecutors - passed via --conf; upper bound on the number of executors when dynamic allocation is enabled
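A hedged sketch of a submission using those properties (all values, the class name and the jar are illustrative; depending on your Spark version, dynamic allocation may also require the external shuffle service to be enabled):
spark-submit --master yarn --deploy-mode cluster \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --class com.example.App app.jar
With dynamic allocation on, --num-executors is treated as the initial number of executors and maxExecutors caps how far Spark can scale up.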
I have a cluster of 3 nodes; each has 12 cores, and they have 30G, 20G and 10G of RAM respectively. When I run my application, I set the executor memory to 20G, which prevents an executor from being launched on the 10G machine since it exceeds that slave's memory threshold, and it also underutilizes the resources on the 30G machine. I searched but didn't find any way to set the executor memory dynamically based on the capacity of the node, so how can I configure the cluster or my Spark job to fully utilize the resources of the cluster?
The solution is to have more executors, each with less memory. You can use all of the memory by having six 10G executors (1 on the 10G node, 2 on the 20G node, 3 on the 30G node), or by having twelve 5G executors, etc.
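A hedged sketch of the first option on a standalone cluster (illustrative values; the class name, jar and master host are placeholders, and how many executors each worker actually hosts depends on the memory and cores the worker advertises, which by default is a bit less than the machine's full RAM):
spark-submit --master spark://<master-host>:7077 \
  --executor-memory 10G \
  --executor-cores 4 \
  --conf spark.cores.max=24 \
  --class com.example.App app.jar
With 12 cores per node, the 4-core executors are limited by memory rather than cores, so the 10G, 20G and 30G nodes end up hosting roughly 1, 2 and 3 executors respectively.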