Spark Submit Command Configurations - apache-spark

Can Someone Please help me to understand below spark-submit configuration in detail.What is the use of each configuration and how exactly spark utilize these configuration for better performance.
--num-executors 3
--master yarn
--deploy-mode cluster
--driver-cores 3
--driver-memory 1G
--executor-cores 3
--executor-memory 1G

Related

Why does spark-shell/pyspark gets less resources compared to spark-submit

I am running a PySpark code which runs on a single node due to code requirements.
But when I run the code using PySpark
pyspark --master yarn --executor-cores 1 --driver-memory 60g --executor-memory 5g --conf spark.driver.maxResultSize=4g
I get Segment error and the code fails, when I checked in Yarn I saw that my job was not getting resources even when they were available.
But when I use spark-submit
spark-submit --master yarn --executor-cores 1 --driver-memory 60g --executor-memory 5g --conf spark.driver.maxResultSize=4g code.py
the code gets all the resources it needs and the code runs perfectly.
Am I missing some fundamental aspect of spark or why is this happening?

Spark fail if not all resources are allocated

Does spark or yarn has any flag to fail fast job if we can't allocate all resoucres?
For example if i run
spark-submit --class org.apache.spark.examples.SparkPi
--master yarn-client
--num-executors 7
--driver-memory 512m
--executor-memory 4g
--executor-cores 1
/usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 1000
For now if spark can allocate only 5 executors it just will go with 5. Can we make to run it only with 7 or fail in other case?
You can set a spark.dynamicAllocation.minExecutors config in your job. For it you need to set spark.dynamicAllocation.enabled=true, detailed in this doc

Does spark-submit parameters work in local mode?

When I run spark-submit --master local[10] --num-executors 8 --executor-cores 5 --executor-memory 5g foo.jar,which means I am running an application in local mode,will --num-executors 8 --executor-cores 5 --executor-memory work together with local[10]? If not,which parameters will decide the resources allocation?
In other words,does --num-executors 8 --executor-cores 5 --executor-memory 5g only works on yarn?In local mode,only local[K] works?
No, the spark-submit parameters num-executors, executor-cores, executor-memory won't work in local mode because these parameters are to be used when you deploy your spark job on a cluster and not a single machine, these will only work in case you run your job in client or cluster mode.
Please refer here for more information on different ways to submit a spark application.

spark scala memory management issues

I am trying to submit a spark scala job with below configuration:
spark-submit --class abcd --queue new --master yarn --executor-cores 1 --executor-memory 4g --driver-memory 2g --num-executors 1
The allocated space for the queue is 700GB and it is taking entire 700GB and running.
Is there a way to restrict to 100GB only?
Thanks in advance.

Spark Job using more executors than allocated in jobs

I have following settings in my Spark job:
--num-executors 2
--executor-cores 1
--executor-memory 12G
--driver memory 16G
--conf spark.streaming.dynamicAllocation.enabled=false \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.streaming.receiver.writeAheadLog.enable=false
--conf spark.executor.memoryOverhead=8192
--conf spark.driver.memoryOverhead=8192'
My understanding is job should run with 2 executors however it is running with 3. This is happening to multiple of my jobs. Could someone please explain the reason?

Resources