we are submitting multiple sparks jobs in yarn-cluster mode to yarn queue simultaneously. The problem which we are facing currently is that drivers are getting initialized for all the jobs and no resources are left for the executor's initializations.
<property>
<name>yarn.scheduler.capacity.maximum-am-resource-percent</name> <value>0.5</value>
<description>
Maximum percent of resources in the cluster which can be used to run
application masters i.e. controls a number of concurrently running
applications.
</description>
</property>
According to this property, only 50% resources are available for application master but in our case, this is not working.
Any suggestions to tackle this problem.
Related
How does yarn allocate resources for spark applications and how it is done when spark runs in standalone mode?
You define the driver memory size, deployment mode, number of executors and their memory sizes when you run spark-submit. If no options are provided, the defaults from spark-env and/or yarn-site.xml are used. Then that amount of resources will be scheduled.
If dynamic executor execution is enabled, and you're reading data from HDFS, for example, then more or less executors may start, depending on how many file blocks the data contains
How does spark choose nodes to run executors?(spark on yarn)
We use spark on yarn mode, with a cluster of 120 nodes.
Yesterday one spark job create 200 executors, while 11 executors on node1,
10 executors on node2, and other executors distributed equally on the other nodes.
Since there are so many executors on node1 and node2, the job run slowly.
How does spark select the node to run executors?
according to yarn resourceManager?
As you mentioned Spark on Yarn:
Yarn Services choose executor nodes for spark job based on the availability of the cluster resource. Please check queue system and dynamic allocation of Yarn. the best documentation https://blog.cloudera.com/blog/2016/01/untangling-apache-hadoop-yarn-part-3/
Cluster Manager allocates resources across the other applications.
I think the issue is with bad optimized configuration. You need to configure Spark on the Dynamic Allocation. In this case Spark will analyze cluster resources and add changes to optimize work.
You can find all information about Spark resource allocation and how to configure it here: http://site.clairvoyantsoft.com/understanding-resource-allocation-configurations-spark-application/
Are all 120 nodes having identical capacity?
Moreover the jobs will be submitted to a suitable node manager based on the health and resource availability of the node manager.
To optimise spark job, You can use dynamic resource allocation, where you do not need to define the number of executors required for running a job. By default it runs the application with the configured minimum cpu and memory. Later it acquires resource from the cluster for executing tasks. It will release the resources to the cluster manager once the job has completed and if the job is idle up to the configured idle timeout value. It reclaims the resources from the cluster once it starts again.
We are running a spark streaming job using yarn as cluster manager, i have dedicated 7 cores per node to each node ...via yarn-site.xml as shown in the pic below
when the job is running ..it's only using 2 vcores and 5 vcores are left alone and the job is slow with lot of batches queued up ..
how can we make it use all the 7 vcores ..that's available to it this is usage when running so that it speed's up our job
Would greatly appreciate if any of the experts in the community will help out as we are new to Yarn & Spark
I searched many answers for this question. Finally, it worked after changing a yarn config file: capacity-scheduler.xml
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
Don't forget to restart your yarn
At spark level you can control yarn application master's cores by using parameters spark.yarn.am.cores.
For spark executors you need to pass --executor-cores to spark-submit.
However from spark, you cannot control what(vcores/memory) yarn chooses to allocate to the container that it spawns which is right, as you are running spark over yarn.
In order to control that you will need to change yarn vcore parameters like yarn.nodemanager.resource.cpu-vcores, yarn.scheduler.minimum-allocation-vcores. More you can find here https://www.cloudera.com/documentation/enterprise/5-3-x/topics/cdh_ig_yarn_tuning.html#configuring_in_cm
I use a server with 16 cores, 64 GB ram, 2.5 TB disk and I want to execute a Giraph program. I have installed hadoop-2.7.2 and I don't know how can configure hadoop to use only a partial amount of server resources because the server used by many users.
Requirements: Hadoop must use max 12 cores (=> 4 cores for NameNode, DataNode, JobTracker, TaskTracker and max 8 for tasks) and max 28GB ram (i.e., 4*3GB + 8*2GB).
My Yarn-site resources configuration:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>28672</value>
<description>Physical memory, in MB, to be made available to running containers</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>12</value>
<description>Number of CPU cores that can be allocated for containers.</description>
</property>
</configuration>
When I try to execute Giraph program, in http://localhost:8088 Yarn Application state is: ACCEPTED: waiting for AM container to be allocated, launched and register with RM.
I think some configurations are missing from my Yarn-site.xml in order to adapt the above requirements.
Before assigning resources to the services take a look at Yarn tuning Guide file from Cloudera, you will get idea how much resources should be allocated to OS, Hadoop daemons, etc
As you mentioned
Yarn Application state is: ACCEPTED: waiting for AM container to be allocated, launched and register with RM
If there is no available resources for a job, then it will be in ACCEPTED state until it get resources. So in your case, check how many jobs are submitting at the same time and check the resources utilisation for those jobs.
If you want to configure no waiting for your jobs, you have to consider creating scheduler queues
We are trying to run our spark cluster on yarn. We are having some performance issues especially when compared to the standalone mode.
We have a cluster of 5 nodes with each having 16GB RAM and 8 cores each. We have configured the minimum container size as 3GB and maximum as 14GB in yarn-site.xml. When submitting the job to yarn-cluster we supply number of executor = 10, memory of executor =14 GB. According to my understanding our job should be allocated 4 container of 14GB. But the spark UI shows only 3 container of 7.2GB each.
We are unable to ensure the container number and resources allocated to it. This causes detrimental performance when compared to the standalone mode.
Can you drop any pointer on how to optimize yarn performance?
This is the command I use for submitting the job:
$SPARK_HOME/bin/spark-submit --class "MyApp" --master yarn-cluster --num-executors 10 --executor-memory 14g target/scala-2.10/my-application_2.10-1.0.jar
Following the discussion I changed my yarn-site.xml file and also the spark-submit command.
Here is the new yarn-site.xml code :
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hm41</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>14336</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>2560</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>13312</value>
</property>
And the new command for spark submit is
$SPARK_HOME/bin/spark-submit --class "MyApp" --master yarn-cluster --num-executors 4 --executor-memory 10g --executor-cores 6 target/scala-2.10/my-application_2.10-1.0.jar
With this I am able to get 6 cores on each machine but the memory usage of each node is still around 5G. I have attached the screen shot of SPARKUI and htop.
The memory (7.2GB) you see in the SparkUI is the spark.storage.memoryFraction, which by default is 0.6. As for your missing executors, you should look in the YARN resource manager logs.
Withing yarn-site.xml check that yarn.nodemanager.resource.memory-mb is set the right way. In my understanding of your cluster it should be set to 14GB. This setting is responsible for giving the YARN know how much memory it can use on this specific node
If you have this set right and you have 5 servers running YARN NodeManager, then your job submission command is wrong. First, --num-executors is the number of YARN containers would be started for executing on the cluster. You specify 10 containers with 14GB RAM each, but you don't have this many resources on your cluster! Second, you specify --master yarn-cluster, which means that Spark Driver would run inside of the YARN Application Master that would require a separate container.
In my opinion it shows 3 containers because out of 5 nodes in the cluster you have only 4 of them running YARN NodeManager + you request to allocate 14GB for each of the containers, so YARN first starts Application Master and then polls the NM for available resources and see that it can start only 3 containers. Regarding heap size you see, after starting the Spark find its JVM containers and see the parameters of their start - you should have many -Xmx flags in a single line - one correct and one wrong, you should find its origin in config files (Hadoop or Spark)
Before submitting an application to the cluster, start the spark-shell with the same settings (replace yarn-cluster with yarn-client) and check how it is started, check WebUI and JVMs started
Just because YARN "thinks" it has 70GB (14GBx5), doesn't mean at run time there is 70GB available on the cluster. You could be running other Hadoop components (hive, HBase, flume, solr, or your own app, etc.) which consume memory. So the run-time decision YARN makes is based on what's currently available -- and it had only 52GB (3x14GB) available to you. By the way, the GB numbers are approximate because it is really computed as 1024MB per GB...so you will see decimals.
Use nmon or top to see what else is using memory on each node.