Why does a Spark Application launch with only a single executor on DC/OS?

Why does a Spark Application launch with only a single executor on DC/OS? - apache-spark

I have Spark installed, but when I launch, there is always only one executor allocated to the application (and that is the driver one). I’ve tried everything, but haven’t been able to find out why this is happening.
Here’s the command I used to launch, to give you an idea of all the parameters:
dcos spark run --submit-args='--class <class-name> --executor-memory 6g --total-executor-cores 32 --driver-memory 6g <jar-file-source> <application-command-line-params>

Related

Spark: use of driver-memory parameter

When I submit this command, my job failed with error "Container is running beyond physical memory limits".
spark-submit --master yarn --deploy-mode cluster --executor-memory 5G --total-executor-cores 30 --num-executors 15 --conf spark.yarn.executor.memoryOverhead=1000
But adding the parameter: --driver-memory to 5GB (or upper), the job ends without error.
spark-submit --master yarn --deploy-mode cluster --executor-memory 5G --total executor-cores 30 --num-executors 15 --driver-memory 5G --conf spark.yarn.executor.memoryOverhead=1000
Cluster info: 6 nodes with 120GB of Memory. YARN Container Memory Minimum: 1GB
The question is: what is the difference in using or not this parameter?

If increasing the driver memory is helping you to successfully complete the job then it means that driver is having lots of data coming into it from executors. Typically, the driver program is responsible for collecting results back from each executor after the tasks are executed. So, in your case it seems that increasing the driver memory helped to store more results back into the driver memory.
If you read the some points on executor memory, driver memory and the way Driver interacts with executors then you will get better clarity on the situation you are in.
Hope it helps to some extent.

Spark client mode - YARN allocates a container for driver?

I am running Spark on YARN in client mode, so I expect that YARN will allocate containers only for the executors. Yet, from what I am seeing, it seems like a container is also allocated for the driver, and I don't get as many executors as I was expecting.
I am running spark submit on the master node. Parameters are as follows:
sudo spark-submit --class ... \
--conf spark.master=yarn \
--conf spark.submit.deployMode=client \
--conf spark.yarn.am.cores=2 \
--conf spark.yarn.am.memory=8G \
--conf spark.executor.instances=5 \
--conf spark.executor.cores=3 \
--conf spark.executor.memory=10G \
--conf spark.dynamicAllocation.enabled=false \
While running this application, Spark UI's Executors page shows 1 driver and 4 executors (5 entries in total). I would expect 5, not 4 executors.
At the same time, YARN UI's Nodes tab shows that on the node that isn't actually used (at least according to Spark UI's Executors page...) there's a container allocated, using 9GB of memory. The rest of the nodes have containers running on them, 11GB of memory each.
Because in my Spark Submit the driver has 2GB less memory than executors, I think that the 9GB container allocated by YARN is for the driver.
Why is this extra container allocated? How can i prevent this?
Spark UI:
YARN UI:
Update after answer by Igor Dvorzhak
I was falsely assuming that the AM will run on the master node, and that it will contain the driver app (so setting spark.yarn.am.* settings will relate to the driver process).
So I've made the following changes:
set the spark.yarn.am.* settings to defaults (512m of memory, 1 core)
set the driver memory through spark.driver.memory to 8g
did not try to set driver cores at all, since it is only valid for cluster mode
Because AM on default settings takes up 512m + 384m of overhead, its container fits into the spare 1GB of free memory on a worker node.
Spark gets the 5 executors it requested, and the driver memory is appropriate to the 8g setting. All works as expected now.
Spark UI:
YARN UI:

Extra container is allocated for YARN application master:
In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
Even though in client mode driver runs in the client process, YARN application master is still running on YARN and requires container allocation.
There are no way to prevent container allocation for YARN application master.
For reference, similar question asked time ago: Resource Allocation with Spark and Yarn.

You can specify the driver memory and number of executors in spark submit as below.
spark-submit --jars..... --master yarn --deploy-mode cluster --driver-memory 2g --driver-cores 4 --num-executors 5 --executor-memory 10G --executor-cores 3
Hope it helps you.

Run Spark application with custom number of cores and memory size

I'm totally new at this, so I don't understand really well how it's doing.
I need to run spark on my machine (login with ssh) and set up memory 60g, and 6 cores for execution.
This is what I've tried.
spark-submit --master yarn --deploy-mode cluster --executor-memory 60g --executor-cores 6
And this is what I got:
SPARK_MAJOR_VERSION is set to 2, using Spark2
Exception in thread "main" java.lang.IllegalArgumentException: Missing application resource.
at org.apache.spark.launcher.CommandBuilderUtils.checkArgument(CommandBuilderUtils.java:253)
at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitArgs(SparkSubmitCommandBuilder.java:160)
at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildSparkSubmitCommand(SparkSubmitCommandBuilder.java:276)
at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:151)
at org.apache.spark.launcher.Main.main(Main.java:87)
So, I guess there is some things to add to this code line for running and I have no idea what.

Here:
spark-submit --master yarn --deploy-mode cluster --executor-memory 60g --executor-cores 6
you don't specify which the entry point and your application!
Check the spark-submit documentation, which states:
Some of the commonly used options are:
--class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi)
--master: The master URL for the cluster (e.g. spark://23.195.26.187:7077)
--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)
†
--conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces wrap “key=value” in quotes (as shown).
application-jar: Path to a bundled jar including your application and
all dependencies. The URL must be globally visible inside of your
cluster, for instance, an hdfs:// path or a file:// path that is
present on all nodes.
application-arguments: Arguments passed to the main method of your
main class, if any
For Python applications, simply pass a .py file in the place of <application-jar> instead of a JAR, and add Python .zip, .egg or .py files to the search path with --py-files.
Here is an example that takes some JARs and a python file (I didn't include your additional parameters for simplicity):
./spark-submit --jars myjar1.jar,myjar2.jar --py-files path/to/my/main.py arg1 arg2
I hope I can entry to spark shell (with that much memory and cores) and type code in there
Then you need pyspark, not spark-submit! What is the difference between spark-submit and pyspark?
So what you really want to do is this:
pyspark --master yarn --deploy-mode cluster --executor-memory 60g --executor-cores 6

If I understand your question correctly, you total number of cores=6 and total memory is 60GB. The parameeters
--executor-memory
--executor-cores
are actually for each executor inside spark. probably you should try
--executor-memory 8G
--executor-cores 1
this will create about 6 executors of 8Gb each (total 6*8 = 48GB). The rest 12 GB for operating system processing and metadata.

spark on yarn,only one executor on one node works and the allocation is random

i run spark-shell with this command：
./bin/spark-shell --master yarn --num-executors 16
--executor-memory 14G --executor-cores 8
i have four nodes，every node has 16G memory and 4cores
after i changed num-executors,the spark webUi tell me it worked,
BUT only one executor on node which named "slave" is running
we could see TRANSFER1 and TRANSFER2 is empty
how can i solve this, when i submit job the situation has not changed
should i change worker-instances or num-executors and so on

How to prevent Spark Executors from getting Lost when using YARN client mode?

I have one Spark job which runs fine locally with less data but when I schedule it on YARN to execute I keep on getting the following error and slowly all executors get removed from UI and my job fails
15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 8 on myhost1.com: remote Rpc client disassociated
15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 6 on myhost2.com: remote Rpc client disassociated
I use the following command to schedule Spark job in yarn-client mode
./spark-submit --class com.xyz.MySpark --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M" --driver-java-options -XX:MaxPermSize=512m --driver-memory 3g --master yarn-client --executor-memory 2G --executor-cores 8 --num-executors 12 /home/myuser/myspark-1.0.jar
What is the problem here? I am new to Spark.

I had a very similar problem. I had many executors being lost no matter how much memory we allocated to them.
The solution if you're using yarn was to set --conf spark.yarn.executor.memoryOverhead=600, alternatively if your cluster uses mesos you can try --conf spark.mesos.executor.memoryOverhead=600 instead.
In spark 2.3.1+ the configuration option is now --conf spark.yarn.executor.memoryOverhead=600
It seems like we were not leaving sufficient memory for YARN itself and containers were being killed because of it. After setting that we've had different out of memory errors, but not the same lost executor problem.

You can follow this AWS post to calculate memory overhead (and other spark configs to tune): best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr

When I had the same issue, deleting logs and free up more hdfs space worked.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string