launched executors are less than number of executors specified - apache-spark

I have an EMR cluster with the following configuration:
Node            Cores   RAM (GB)   yarn.nodemanager.resource.memory-mb (MB)
Master          4       15         11532
core (slave1)   16      30         23040
core (slave2)   16      30         23040
core (slave3)   16      30         23040
core (slave4)   16      30         23040
I am starting a Spark application (one job, which gets divided into 2 stages) using --master yarn-client with one of the following configurations:
--num-executors 12 --executor-cores 5 --executor-memory 7G ---->(1)
--num-executors 12 --executor-cores 5 --executor-memory 6G ---->(2)
I have not modified any other parameter, so the spark.storage.* and spark.shuffle.* fractions are at their defaults.
The calculations I performed to arrive at the above configuration are as follows. (The master node performs no computation apart from serving as the driver; I verified this using Ganglia.)
1. Allocated 15 cores per node to YARN and started 3 executors per node, which implies 4 (number of slave nodes) * 3 = 12 executors.
2. 15 cores / 3 executors = 5 cores per executor.
3. 23040 * (1 - 0.07) ~ 21G. Dividing this among three executors gives 21 / 3 = 7G.
With configuration (1), fewer than 12 executors are launched, whereas with (2) all 12 start. The memory per executor appears to be available, so why is it unable to launch 12 executors in case (1)?

What does your memory utilization look like? Have you checked yarn-site.xml on the node manager hosts to see whether all of that memory and CPU is being exposed via the node manager configuration?
You can run yarn node -list for a list of nodes and then yarn node -status (I believe) to see what resources a given node exposes to YARN.
Consult yarn logs -applicationId with your application ID for a detailed log of your application's interaction with YARN, including captured output.
Finally, look at the YARN logs on the resource manager host to see if there are any issues there.
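One way to re-check the sizing arithmetic in the question is to account for the memory overhead that Spark adds on top of --executor-memory when it requests a YARN container. The sketch below is a rough estimate under stated assumptions, not a diagnosis. The 7% overhead factor is taken from the question's own calculation, the 384 MB floor is Spark's documented minimum overhead, and the 1024 MB allocation increment (standing in for yarn.scheduler.minimum-allocation-mb) is a guess that should be replaced with the value from the cluster's yarn-site.xml. The function names are hypothetical, and the Application Master container is ignored.

```python
import math


def container_mb(executor_memory_gb, overhead_factor=0.07,
                 min_overhead_mb=384, increment_mb=1024):
    """Estimate the YARN container size for one executor, in MB.

    increment_mb is an ASSUMED allocation increment; read the real value
    from yarn-site.xml on the cluster.
    """
    heap_mb = executor_memory_gb * 1024
    overhead_mb = max(min_overhead_mb, int(heap_mb * overhead_factor))
    requested_mb = heap_mb + overhead_mb
    # YARN rounds each request up to a multiple of its allocation increment.
    return math.ceil(requested_mb / increment_mb) * increment_mb


def executors_per_node(executor_memory_gb, node_memory_mb=23040,
                       node_vcores=15, executor_cores=5, **kwargs):
    """Executors that fit on one node, limited by memory and by cores."""
    by_memory = node_memory_mb // container_mb(executor_memory_gb, **kwargs)
    by_cores = node_vcores // executor_cores
    return min(by_memory, by_cores)


for gb in (7, 6):
    per_node = executors_per_node(gb)
    print(f"{gb}G: container {container_mb(gb)} MB, "
          f"{per_node} per node, {per_node * 4} on 4 nodes")
```

Under these assumptions a 7G executor needs an 8192 MB container, so only 2 fit in 23040 MB (8 executors on 4 nodes), while a 6G executor needs 7168 MB and 3 fit (12 executors). With a different allocation increment the numbers change, so the actual values from yarn node -status are what matter.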

Related

Spark thrift server use only 2 cores

Google Dataproc one-node cluster, VCores Total = 8. As user spark, I tried:
/usr/lib/spark/sbin/start-thriftserver.sh --num-executors 2 --executor-cores 4
I also tried changing /usr/lib/spark/conf/spark-defaults.conf,
and tried executing
export SPARK_WORKER_INSTANCES=6
export SPARK_WORKER_CORES=8
before start-thriftserver.sh.
No success. In the YARN UI I can see that the thrift app uses only 2 cores, with 6 cores still available.
UPDATE1:
Environment tab in the Spark UI:
spark.submit.deployMode client
spark.master yarn
spark.dynamicAllocation.minExecutors 6
spark.dynamicAllocation.maxExecutors 10000
spark.executor.cores 4
spark.executor.instances 1
It depends on which YARN mode the app is running in.
In yarn-client mode, 1 core goes to the Application Master (the app itself runs on the machine where you ran start-thriftserver.sh).
In yarn-cluster mode, the driver runs inside the AM container, so you can tune its cores with spark.driver.cores. The remaining cores are used by executors (1 core per executor by default).
Beware that --num-executors 2 --executor-cores 4 won't work: you have 8 cores at most, and 1 more is needed for the AM container (9 in total).
You can check core usage in the Spark UI: http://sparkhistoryserverip:18080/history/application_1534847473069_0001/executors/
Options below are only for Spark standalone mode:
export SPARK_WORKER_INSTANCES=6
export SPARK_WORKER_CORES=8
Please review all configs here - Spark Configuration (latest)
In your case you can edit spark-defaults.conf and add:
spark.executor.cores 3
spark.executor.instances 2
Or use local[8] mode as you have only one node anyway.
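The core accounting in this answer can be written out as a small check. This is a minimal sketch that takes the answer's rule at face value (executors times cores per executor, plus 1 vcore for the AM container). The function names are hypothetical, and whether YARN actually enforces vcores depends on the scheduler's resource calculator.

```python
def required_vcores(num_executors, executor_cores, am_cores=1):
    """Vcores requested in total: all executors plus the AM container."""
    return num_executors * executor_cores + am_cores


def fits(total_vcores, num_executors, executor_cores, am_cores=1):
    """True if the whole request fits within the node's vcores."""
    return required_vcores(num_executors, executor_cores, am_cores) <= total_vcores


TOTAL_VCORES = 8  # "VCores Total" reported in the question

for executors, cores in [(2, 4), (2, 3)]:
    needed = required_vcores(executors, cores)
    print(f"{executors} executors x {cores} cores + AM = {needed} vcores, "
          f"fits in {TOTAL_VCORES}: {fits(TOTAL_VCORES, executors, cores)}")
```

The first combination is the one from the question (9 vcores, which does not fit in 8); the second is the spark-defaults.conf suggestion above (7 vcores, which fits).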
If you want YARN to show the actual number of cores allocated to executors, change this value in capacity-scheduler.xml:
yarn.scheduler.capacity.resource-calculator
from:
org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
to:
org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
Otherwise, no matter how many cores you request for your executors, YARN will show only one core per container.
Note that this setting changes resource allocation behavior, not just what is displayed. More details: https://hortonworks.com/blog/managing-cpu-resources-in-your-hadoop-yarn-clusters/

Spark-submit create only 1 executor when pyspark interactive shell create 4 (both using yarn-client)

I'm using the Cloudera QuickStart VM (CDH 5.10.1) with PySpark (1.6.0) and YARN (MR2 Included) to aggregate numerical data per hour. I have 1 CPU with 4 cores and 32 GB of RAM.
I have a file named aggregate.py, but until today I had never submitted the job with spark-submit; I used the pyspark interactive shell and copy-pasted the code to test it.
When starting the pyspark interactive shell I used:
pyspark --master yarn-client
I followed the job in the web UI at quickstart.cloudera:8088/cluster and could see that YARN created 3 executors and 1 driver with one core each. (Not a good configuration, but the main purpose is a proof of concept until we move to a real cluster.)
When submitting the same code with spark-submit:
spark-submit --verbose \
--master yarn \
--deploy-mode client \
--num-executors 2 \
--driver-memory 3G \
--executor-memory 6G \
--executor-cores 2 \
aggregate.py
I only have the driver, which also executes the tasks. Note that spark.dynamicAllocation.enabled is set to true in the environment tab, and spark.dynamicAllocation.minExecutors is set to 2.
I also tried running just spark-submit aggregate.py, and still got only the driver acting as executor. I can't get more than 1 executor with spark-submit, yet it works in the interactive shell!
My YARN configuration is as follows:
yarn.nodemanager.resource.memory-mb = 17 GiB
yarn.nodemanager.resource.cpu-vcores = 4
yarn.scheduler.minimum-allocation-mb = 3 GiB
yarn.scheduler.maximum-allocation-mb = 16 GiB
yarn.scheduler.minimum-allocation-vcores = 1
yarn.scheduler.maximum-allocation-vcores = 2
If someone could explain what I'm doing wrong, it would be a great help!
You have to set the driver memory and executor memory in spark-defaults.conf.
It's located at
$SPARK_HOME/conf/spark-defaults.conf
If there is only a file named
spark-defaults.conf.template
then rename it to
spark-defaults.conf
and set the number of executors, the executor memory, and the number of executor cores there. You can take examples from the template file or check this link:
https://spark.apache.org/docs/latest/configuration.html
Alternatively:
With pyspark you were using the default executor memory, but in spark-submit you set executor-memory to 6G. I think you should reduce the memory or remove this option so that the default is used.
Just a guess: as you said earlier, "Yarn created 3 executors and 1 driver with one core each", so you have 4 cores in total.
Now, per your spark-submit command:
cores = num-executors 2 * executor-cores 2 + for_driver 1 = 5
But you have only 4 cores in total, so YARN is unable to give you the executors (only 3 cores are left after the driver).
Check whether this is the issue.

SPARK_WORKER_INSTANCES setting not working in Spark Standalone Windows

I'm trying to set up a standalone Spark 2.0 server to process an analytics function in parallel. To do this I want to run 8 workers, with a single core per worker. However, the Spark Master/Worker UI doesn't seem to reflect my configuration.
I'm using :
Standalone Spark 2.0
8 cores, 24 GB RAM
Windows Server 2008
pyspark
spark-env.sh file is configured as follows:
SPARK_WORKER_INSTANCES = 8
SPARK_WORKER_CORES = 1
SPARK_WORKER_MEMORY = 2g
spark-defaults.conf is configured as follows:
spark.cores.max = 8
I start the master:
spark-class org.apache.spark.deploy.master.Master
I start the workers by running this command 8 times within a batch file:
spark-class org.apache.spark.deploy.worker.Worker spark://10.0.0.10:7077
The problem is that the UI shows up as follows:
As you can see, each worker has 8 cores instead of the 1 core I assigned via the SPARK_WORKER_CORES setting. The memory also reflects the entire machine's memory, not the 2g assigned to each worker. How can I configure Spark to run with 1 core and 2g per worker in standalone mode?
I fixed this by adding the cores and memory arguments to the worker command itself.
start spark-class org.apache.spark.deploy.worker.Worker --cores 1 --memory 2g spark://10.0.0.10:7077
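Since the question starts 8 workers from a batch file, the lines for that file can be generated rather than typed by hand. The sketch below only builds the text of the commands, using the same --cores and --memory arguments as the fix above; the helper name worker_batch_lines is hypothetical, and nothing is launched.

```python
def worker_batch_lines(master_url, workers=8, cores=1, memory="2g"):
    """Return one 'start spark-class ...' batch line per worker."""
    line = (
        "start spark-class org.apache.spark.deploy.worker.Worker "
        f"--cores {cores} --memory {memory} {master_url}"
    )
    return [line] * workers


# Print the lines so they can be pasted into the batch file.
print("\n".join(worker_batch_lines("spark://10.0.0.10:7077")))
```

Each generated line is identical to the command in the answer, so 8 workers at 1 core and 2g each add up to the 8 cores the question set as spark.cores.max.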

Spark executor GC taking long

I am running a Spark job on a standalone cluster, and I noticed that after some time GC starts taking long and the scary red color begins to show.
Here are the resources available:
Cores in use: 80 Total, 76 Used
Memory in use: 312.8 GB Total, 292.0 GB Used
Job details:
spark-submit --class com.mavencode.spark.MonthlyReports \
--master spark://192.168.12.14:7077 \
--deploy-mode cluster --supervise \
--executor-memory 16G --executor-cores 4 \
--num-executors 18 --driver-cores 8 \
--driver-memory 20G montly-reports-assembly-1.0.jar
How do I fix the GC time taking so long?
I had the same problem and resolved it by using Parallel GC instead of G1GC. You can add the following options to the executors' additional Java options in the submit request:
-XX:+UseParallelGC -XX:+UseParallelOldGC
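The answer does not say which setting carries these flags. A reasonable reading, and an assumption here, is Spark's spark.executor.extraJavaOptions property passed with --conf. The sketch below only assembles the argument list, with the class name, master URL, and jar copied from the question; the helper name executor_gc_conf is hypothetical.

```python
import shlex

# GC flags from the answer above.
PARALLEL_GC_FLAGS = ["-XX:+UseParallelGC", "-XX:+UseParallelOldGC"]


def executor_gc_conf(flags=PARALLEL_GC_FLAGS):
    """Build the --conf pair that passes JVM flags to every executor.

    ASSUMPTION: the answer means spark.executor.extraJavaOptions.
    """
    return ["--conf", "spark.executor.extraJavaOptions=" + " ".join(flags)]


submit_args = [
    "spark-submit",
    "--class", "com.mavencode.spark.MonthlyReports",
    "--master", "spark://192.168.12.14:7077",
    *executor_gc_conf(),
    "montly-reports-assembly-1.0.jar",
]

# shlex.join quotes the --conf value, which contains a space.
print(shlex.join(submit_args))
```

Keeping the arguments as a list avoids quoting mistakes around the space between the two flags; the list can be handed to subprocess.run as-is, or printed and pasted into a shell.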

spark-sql on yarn hangs when number of executors is increased - v1.3.0

I am running spark-sql on a hive table.
It runs successfully when the spark-shell is started with the following parameters,
"--driver-memory 8G --executor-memory 10G --executor-cores 1 --num-executors 30"
however the job hangs when the spark-shell is started with
"--driver-memory 8G --executor-memory 10G --executor-cores 1 --num-executors 40"
The difference is only in the number of executors (30 vs 40).
In the second case I see that there is 1 task active on each executor, but it does not run. I do not see any "task completed" messages in the spark-shell.
The job also runs successfully with fewer than 30 executors.
My YARN cluster has 42 nodes, with 30 cores and about 50G of memory per node.
Any pointers on where I should look?
I compared the debug-level logs from both runs. The good runs had a bunch of lines like the following; the runs that appeared to hang had none.
"org.apache.spark.storage.BlockManager logDebug - Level for block broadcast_0_piece0 is StorageLevel(true, true, false, false, 1)"
"org.apache.spark.storage.BlockManager logDebug - Level for block broadcast_1 is StorageLevel(true, true, false, true, 1)"
This turned out to be a classpath issue: I was including older versions of some dependencies, and removing them resolved the problem.
