Show number of executors and executor memory - apache-spark

I am running a PySpark job using the command
spark-submit ./exp-1.py --num-executors 8 --executor-memory 4G
Is there a way to confirm that these configurations are actually applied during execution?

spark-submit has a --verbose flag that prints the parsed configuration when the job is submitted. Note that spark-submit options must be placed before the application script; anything after ./exp-1.py is passed to the application as its own arguments.
spark-submit --verbose --num-executors 8 --executor-memory 4G ./exp-1.py
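Beyond --verbose, the effective values can also be checked from inside the job at runtime, since --num-executors and --executor-memory surface as spark.executor.instances and spark.executor.memory. A minimal sketch, assuming the script builds its own SparkSession (the app name here is illustrative):

```python
from pyspark.sql import SparkSession

# Print the executor settings the running job actually received.
spark = SparkSession.builder.appName("exp-1").getOrCreate()
conf = spark.sparkContext.getConf()
for key in ("spark.executor.instances", "spark.executor.memory"):
    print(key, "=", conf.get(key, "not set"))
```

The same values also appear in the Environment tab of the Spark web UI while the job runs.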

Related

Why does spark-shell/pyspark get fewer resources than spark-submit

I am running PySpark code that must run on a single node due to code requirements.
When I run the code using the pyspark shell
pyspark --master yarn --executor-cores 1 --driver-memory 60g --executor-memory 5g --conf spark.driver.maxResultSize=4g
I get a segmentation error and the code fails. When I checked in YARN, I saw that my job was not getting resources even though they were available.
But when I use spark-submit
spark-submit --master yarn --executor-cores 1 --driver-memory 60g --executor-memory 5g --conf spark.driver.maxResultSize=4g code.py
the code gets all the resources it needs and runs perfectly.
Am I missing some fundamental aspect of Spark, or why is this happening?

Spark fail if not all resources are allocated

Does Spark or YARN have a flag to fail the job fast if not all resources can be allocated?
For example, if I run
spark-submit --class org.apache.spark.examples.SparkPi
--master yarn-client
--num-executors 7
--driver-memory 512m
--executor-memory 4g
--executor-cores 1
/usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 1000
Currently, if Spark can allocate only 5 executors, it will just run with 5. Can we make it run only with all 7, or fail otherwise?
You can set the spark.dynamicAllocation.minExecutors config on your job. For that you also need to set spark.dynamicAllocation.enabled=true, as detailed in the documentation.
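As a sketch, the two settings from the answer can be passed on the spark-submit command line via --conf, or set programmatically when the session is built (shown here in PySpark). Whether the job actually fails fast when executors cannot be allocated still depends on the cluster manager's scheduling behaviour:

```python
from pyspark.sql import SparkSession

# Enable dynamic allocation and ask the scheduler for at least 7 executors.
# On classic YARN setups, dynamic allocation also requires the external
# shuffle service to be running on the cluster nodes.
spark = (SparkSession.builder
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.dynamicAllocation.minExecutors", "7")
         .config("spark.shuffle.service.enabled", "true")
         .getOrCreate())
```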

Does spark-submit parameters work in local mode?

When I run spark-submit --master local[10] --num-executors 8 --executor-cores 5 --executor-memory 5g foo.jar, which means I am running the application in local mode, will --num-executors 8 --executor-cores 5 --executor-memory 5g work together with local[10]? If not, which parameters decide the resource allocation?
In other words, does --num-executors 8 --executor-cores 5 --executor-memory 5g only work on YARN, and in local mode does only local[K] apply?
No, the spark-submit parameters num-executors, executor-cores and executor-memory do not apply in local mode; they are meant for deploying a Spark job on a cluster rather than a single machine, and only take effect when you submit to a cluster manager in client or cluster mode.
Please refer here for more information on different ways to submit a spark application.
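To illustrate the answer: in local mode the only knobs that matter are the thread count in local[K] and the driver's own memory, since driver and executor run in a single JVM. A minimal PySpark sketch (the memory value is illustrative):

```python
from pyspark.sql import SparkSession

# local[10]: one JVM with 10 worker threads; executor-* settings are ignored.
spark = (SparkSession.builder
         .master("local[10]")
         .config("spark.driver.memory", "5g")  # heap of the single process
         .getOrCreate())
print(spark.sparkContext.defaultParallelism)  # matches the K in local[K]
```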

spark scala memory management issues

I am trying to submit a Spark Scala job with the following configuration:
spark-submit --class abcd --queue new --master yarn --executor-cores 1 --executor-memory 4g --driver-memory 2g --num-executors 1
The allocated capacity for the queue is 700GB, and the job takes the entire 700GB while running.
Is there a way to restrict it to 100GB only?
Thanks in advance.

Spark on YARN: only one executor on one node works, and the allocation is random

I run spark-shell with this command:
./bin/spark-shell --master yarn --num-executors 16
--executor-memory 14G --executor-cores 8
I have four nodes; every node has 16G of memory and 4 cores.
After I changed num-executors, the Spark web UI tells me it worked,
but only one executor, on the node named "slave", is running.
We can see that TRANSFER1 and TRANSFER2 are empty.
How can I solve this? The situation does not change when I submit a job.
Should I change worker-instances, num-executors, or something else?
