I am running PySpark code that, due to code requirements, must run on a single node.
When I run the code using the pyspark shell
pyspark --master yarn --executor-cores 1 --driver-memory 60g --executor-memory 5g --conf spark.driver.maxResultSize=4g
I get a segmentation error and the code fails. When I checked in YARN, I saw that my job was not getting resources even though they were available.
But when I use spark-submit
spark-submit --master yarn --executor-cores 1 --driver-memory 60g --executor-memory 5g --conf spark.driver.maxResultSize=4g code.py
the code gets all the resources it needs and runs perfectly.
Am I missing some fundamental aspect of Spark, or is there another reason this is happening?
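One way to narrow this down, offered as a sketch rather than a diagnosis: dump the configuration Spark actually resolved in each session and compare the two. In the pyspark shell, sc is the SparkContext the shell creates for you:

# Run inside the pyspark shell (or inside code.py under spark-submit)
# to print every setting Spark resolved for this session.
for key, value in sorted(sc.getConf().getAll()):
    print(key, "=", value)

If the two dumps differ on spark.master or spark.driver.memory, one launch path is picking up defaults (for example from spark-defaults.conf) that the other overrides.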
Related
Can someone please help me understand the spark-submit configuration below in detail? What is the use of each setting, and how exactly does Spark use these settings for better performance?
--num-executors 3
--master yarn
--deploy-mode cluster
--driver-cores 3
--driver-memory 1G
--executor-cores 3
--executor-memory 1G
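As a rough worked example of what that submission asks YARN for, assuming the default memory overhead of max(384 MB, 10% of the heap) and no dynamic allocation:

driver container: 3 cores, 1 GB heap + 384 MB overhead (--driver-cores and --driver-memory apply here because --deploy-mode cluster puts the driver in a YARN container)
executor containers: 3 x (3 cores, 1 GB heap + 384 MB overhead), i.e. --num-executors times --executor-cores and --executor-memory
total ask: 12 cores and roughly 5.5 GB across the cluster

Each executor can run up to --executor-cores tasks concurrently, so this job runs at most 3 x 3 = 9 tasks in parallel; that parallelism, together with how much data each task must fit into its share of executor memory, is what these settings trade off for performance.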
Does Spark or YARN have any flag to fail a job fast if all requested resources cannot be allocated?
For example, if I run
spark-submit --class org.apache.spark.examples.SparkPi
--master yarn-client
--num-executors 7
--driver-memory 512m
--executor-memory 4g
--executor-cores 1
/usr/hdp/current/spark2-client/examples/jars/spark-examples_*.jar 1000
For now, if Spark can allocate only 5 executors, it will just go ahead with 5. Can we make it run only with all 7, or fail otherwise?
You can set the spark.dynamicAllocation.minExecutors config in your job. For it to take effect you also need to set spark.dynamicAllocation.enabled=true, as detailed in the Spark dynamic allocation documentation.
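A minimal sketch of that answer in PySpark, assuming the cluster runs the external shuffle service (which dynamic allocation on YARN requires):

from pyspark.sql import SparkSession

# Floor the executor count at 7; Spark keeps requesting containers
# from YARN until the minimum is satisfied.
spark = (SparkSession.builder
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.shuffle.service.enabled", "true")
         .config("spark.dynamicAllocation.minExecutors", "7")
         .getOrCreate())

One caveat: minExecutors is a floor on what Spark requests, not a fail-fast switch; if YARN cannot provide 7 executors, the job waits for them rather than aborting.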
When I run spark-submit --master local[10] --num-executors 8 --executor-cores 5 --executor-memory 5g foo.jar, which means I am running the application in local mode, will --num-executors 8 --executor-cores 5 --executor-memory 5g work together with local[10]? If not, which parameters decide the resource allocation?
In other words, do --num-executors 8 --executor-cores 5 --executor-memory 5g only work on YARN? In local mode, is local[K] the only setting that matters?
No, the spark-submit parameters num-executors, executor-cores, and executor-memory won't work in local mode. These parameters apply when you deploy your Spark job on a cluster rather than a single machine, so they only take effect when you run your job in client or cluster mode.
Please refer here for more information on the different ways to submit a Spark application.
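A quick sketch of what local mode actually honors: the 10 in local[10] is the number of worker threads, and memory comes from the driver, since driver and executor share a single JVM:

# pyspark --master local[10] --driver-memory 5g
print(sc.master)              # local[10]
print(sc.defaultParallelism)  # 10 -- one task slot per local thread

So in local mode the settings that matter are local[K] and --driver-memory; the executor flags are accepted but silently ignored.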
I am trying to submit a Spark Scala job with the configuration below:
spark-submit --class abcd --queue new --master yarn --executor-cores 1 --executor-memory 4g --driver-memory 2g --num-executors 1
The allocated capacity for the queue is 700 GB, and the job takes the entire 700 GB while running.
Is there a way to restrict it to 100 GB only?
Thanks in advance.
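One possible cause, stated as an assumption since the submission above doesn't show it: if dynamic allocation is enabled on the cluster (common on YARN distributions), Spark keeps requesting executors until the queue is exhausted, regardless of --num-executors. In that case you can cap the total with spark.dynamicAllocation.maxExecutors; roughly, each executor container costs 4.4 GB (4 GB heap + default overhead), so 22 executors plus the driver stay just under 100 GB:

spark-submit --class abcd --queue new --master yarn --executor-cores 1 --executor-memory 4g --driver-memory 2g --conf spark.dynamicAllocation.maxExecutors=22

Alternatively, disabling dynamic allocation with --conf spark.dynamicAllocation.enabled=false makes the job honor --num-executors.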
I am running a PySpark job using the command
spark-submit ./exp-1.py --num-executors 8 --executor-memory 4G
Is there a way to confirm that these configurations are actually applied during execution?
There is a --verbose flag for checking the configuration when a Spark job is submitted.
spark-submit --verbose --num-executors 8 --executor-memory 4G ./exp-1.py
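Note the option ordering above: spark-submit treats everything after the application file as arguments to the application itself, so in the original command --num-executors 8 --executor-memory 4G were being passed to exp-1.py rather than to Spark. Besides --verbose, a sketch of a runtime check (assuming exp-1.py creates a SparkSession named spark):

# Inside exp-1.py, after the SparkSession is created: print every
# setting Spark actually resolved for this run.
for key, value in sorted(spark.sparkContext.getConf().getAll()):
    print(key, "=", value)

The Environment tab of the Spark UI shows the same resolved properties while the job is running.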