Spark Standalone: application gets 0 cores - apache-spark

I seem to be unable to assign cores to an application. This leads to the following (apparently common) error message:
Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
I have one master and two slaves in a Spark cluster. All are 8-core i7s with 16GB of RAM.
I have left spark-env.sh virtually untouched on all three machines, specifying only the master's IP address.
My spark-submit is the following:
nohup ./bin/spark-submit \
--jars ./ikoda/extrajars/ikoda_assembled_ml_nlp.jar,./ikoda/extrajars/stanford-corenlp-3.8.0.jar,./ikoda/extrajars/stanford-parser-3.8.0.jar \
--packages datastax:spark-cassandra-connector:2.0.1-s_2.11 \
--class ikoda.mlserver.Application \
--conf spark.cassandra.connection.host=192.168.0.33 \
--conf spark.cores.max=4 \
--driver-memory 4g --num-executors 2 --executor-memory 2g --executor-cores 2 \
--master spark://192.168.0.141:7077 ./ikoda/ikodaanalysis-mlserver-0.1.0.jar 1000 > ./logs/nohup.out &
I suspect I am conflating the SparkConf initialization in my code with the spark-submit command. I need both because the app involves Spark Streaming, which can require reinitializing the SparkContext.
The sparkConf setup is as follows:
val conf = new SparkConf().setMaster(s"spark://$sparkmaster:7077").setAppName("MLPCURLModelGenerationDataStream")
conf.set("spark.streaming.stopGracefullyOnShutdown", "true")
conf.set("spark.cassandra.connection.host", sparkcassandraconnectionhost)
conf.set("spark.driver.maxResultSize", sparkdrivermaxResultSize)
conf.set("spark.network.timeout", sparknetworktimeout)
conf.set("spark.jars.packages", "datastax:spark-cassandra-connector:"+datastaxpackageversion)
conf.set("spark.cores.max", sparkcoresmax)
The Spark UI showed the application with 0 cores assigned (screenshot omitted).

OK, this is definitely a case of programmer error.
But maybe others will make a similar error. The master had previously been used as a local Spark instance. I had put some executor settings in spark-defaults.conf and then, months later, had forgotten about them.
There is a cascading precedence hierarchy: SparkConf settings take precedence over spark-submit arguments, which in turn take precedence over spark-defaults.conf; spark-defaults.conf itself overrides the defaults shipped by the Apache Spark team.
Once I removed the settings from spark-defaults, all was fixed.
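As a minimal sketch of that precedence (the keys and values here are illustrative assumptions, not taken from the cluster above):

# 1. conf/spark-defaults.conf -- lowest precedence, but still overrides Spark's built-in defaults
spark.cores.max    2

# 2. spark-submit argument -- overrides spark-defaults.conf
./bin/spark-submit --conf spark.cores.max=4 ...

# 3. SparkConf in application code -- highest precedence, overrides both of the above
conf.set("spark.cores.max", "8")

A stale line left in spark-defaults.conf therefore only bites when neither the submit command nor the application code sets the same key.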

It can also be caused by exceeding the physical memory of your workers.
The Spark UI shows 14.6GB of memory per worker, so you must request less than 14.6GB of memory per executor. To do this, add a setting like the following to your SparkConf:
conf.set("spark.executor.memory", "10g")
If you request more memory than is physically available, Spark does not allocate any CPU cores to your job, displays 0 under Cores in the Spark UI, and runs nothing.

Related

What is happening when starting a Spark application on Kubernetes

I read this: Running Spark on Kubernetes.
I want to know more details about the interaction between Kubernetes Controller/Scheduler and Spark runtime when launching a Spark job on K8s.
Specifically, assume we launch a Spark app with:
bin/spark-submit \
--master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--..............
My question is: Kubernetes may not be able to allocate all 5 executors (i.e., containers/pods) immediately, because cluster resources are unavailable at the moment the Spark app is launched. Which behavior does the Spark app follow? (1) Spark starts running tasks as soon as at least one executor has been allocated, or (2) Spark launches no tasks until all 5 executors have been allocated.
If you know Hadoop YARN, it would be great if you could also answer the question for a Spark app running on Hadoop YARN (dynamic allocation disabled) and point out the difference.
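(For reference, neither behavior is unconditional: two documented Spark settings govern when task scheduling begins relative to executor registration. The snippet below is a sketch using those settings, not an authoritative answer to the question.)

# spark.scheduler.minRegisteredResourcesRatio -- fraction of requested executors
#   that must register before task scheduling begins (documented default: 0.8
#   for YARN and Kubernetes, 0.0 for standalone)
# spark.scheduler.maxRegisteredResourcesWaitingTime -- maximum wait before
#   scheduling starts regardless of the ratio (default: 30s)
bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --conf spark.executor.instances=5 \
  --conf spark.scheduler.minRegisteredResourcesRatio=1.0 \
  --conf spark.scheduler.maxRegisteredResourcesWaitingTime=120s \
  ...

At the defaults, Spark begins scheduling once 80% of the requested executors have registered or 30 seconds have elapsed, whichever comes first; forcing the ratio to 1.0 approximates behavior (2).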

Spark client mode - YARN allocates a container for driver?

I am running Spark on YARN in client mode, so I expect that YARN will allocate containers only for the executors. Yet, from what I am seeing, it seems like a container is also allocated for the driver, and I don't get as many executors as I was expecting.
I am running spark submit on the master node. Parameters are as follows:
sudo spark-submit --class ... \
--conf spark.master=yarn \
--conf spark.submit.deployMode=client \
--conf spark.yarn.am.cores=2 \
--conf spark.yarn.am.memory=8G \
--conf spark.executor.instances=5 \
--conf spark.executor.cores=3 \
--conf spark.executor.memory=10G \
--conf spark.dynamicAllocation.enabled=false \
While running this application, Spark UI's Executors page shows 1 driver and 4 executors (5 entries in total). I would expect 5, not 4 executors.
At the same time, YARN UI's Nodes tab shows that on the node that isn't actually used (at least according to Spark UI's Executors page...) there's a container allocated, using 9GB of memory. The rest of the nodes have containers running on them, 11GB of memory each.
Because in my Spark Submit the driver has 2GB less memory than executors, I think that the 9GB container allocated by YARN is for the driver.
Why is this extra container allocated? How can I prevent this?
(Spark UI and YARN UI screenshots omitted.)
Update after answer by Igor Dvorzhak
I was falsely assuming that the AM will run on the master node, and that it will contain the driver app (so setting spark.yarn.am.* settings will relate to the driver process).
So I've made the following changes:
set the spark.yarn.am.* settings to defaults (512m of memory, 1 core)
set the driver memory through spark.driver.memory to 8g
did not try to set driver cores at all, since it is only valid for cluster mode
Because AM on default settings takes up 512m + 384m of overhead, its container fits into the spare 1GB of free memory on a worker node.
Spark gets the 5 executors it requested, and the driver memory is appropriate to the 8g setting. All works as expected now.
(Spark UI and YARN UI screenshots omitted.)
An extra container is allocated for the YARN application master:
In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
Even though in client mode the driver runs in the client process, the YARN application master still runs on YARN and requires a container allocation.
There is no way to prevent container allocation for the YARN application master.
For reference, a similar question was asked some time ago: Resource Allocation with Spark and Yarn.
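A minimal sketch of keeping that unavoidable AM container small in client mode, mirroring the update above (the specific values are illustrative):

# In client mode the AM only negotiates resources, so it can run at the
# defaults (512m, 1 core); the driver is sized separately via spark.driver.memory.
sudo spark-submit --class ... \
  --conf spark.master=yarn \
  --conf spark.submit.deployMode=client \
  --conf spark.yarn.am.memory=512m \
  --conf spark.yarn.am.cores=1 \
  --conf spark.driver.memory=8g \
  --conf spark.executor.instances=5 \
  --conf spark.executor.cores=3 \
  --conf spark.executor.memory=10G \
  --conf spark.dynamicAllocation.enabled=false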
You can specify the driver memory and the number of executors in spark-submit as below.
spark-submit --jars..... --master yarn --deploy-mode cluster --driver-memory 2g --driver-cores 4 --num-executors 5 --executor-memory 10G --executor-cores 3
Hope it helps you.

SparkConf settings not used when running Spark app in cluster mode on YARN

I wrote a Spark application which sets some configuration via a SparkConf instance, like this:
SparkConf conf = new SparkConf().setAppName("Test App Name");
conf.set("spark.driver.cores", "1");
conf.set("spark.driver.memory", "1800m");
conf.set("spark.yarn.am.cores", "1");
conf.set("spark.yarn.am.memory", "1800m");
conf.set("spark.executor.instances", "30");
conf.set("spark.executor.cores", "3");
conf.set("spark.executor.memory", "2048m");
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> inputRDD = sc.textFile(...);
...
When I run this application with the command (master=yarn & deploy-mode=client)
spark-submit --class spark.MyApp --master yarn --deploy-mode client /home/myuser/application.jar
everything seems to work fine, and the Spark History UI shows the correct executor information (screenshot omitted).
But when running it with master=yarn & deploy-mode=cluster, the Spark UI shows wrong executor information: ~512 MB instead of ~1400 MB (screenshot omitted).
Also, my app name equals Test App Name when running in client mode, but is spark.MyApp when running in cluster mode. It seems that some default settings are used when running in cluster mode. What am I doing wrong here? How can I make these settings take effect in cluster mode?
I'm using Spark 1.6.2 on a HDP 2.5 cluster, managed by YARN.
OK, I think I found the problem! In short: there is a difference between how Spark settings work in standalone mode and in YARN-managed mode!
When you run Spark applications in standalone mode, you can rely on the Configuration documentation of Spark, see http://spark.apache.org/docs/1.6.2/configuration.html
You can use the following settings for Driver & Executor CPU/RAM (just as explained in the documentation):
spark.executor.cores
spark.executor.memory
spark.driver.cores
spark.driver.memory
BUT: When running Spark inside a YARN-managed Hadoop environment, you have to be careful with the following settings and consider the following points:
refer to the "Spark on YARN" documentation rather than the Configuration documentation linked above: http://spark.apache.org/docs/1.6.2/running-on-yarn.html (the properties explained there take priority over the ones in the Configuration docs, which seem to describe only the standalone cluster vs. client mode, not the YARN cluster vs. client mode!)
you can't use SparkConf to set properties in yarn-cluster mode! Instead use the corresponding spark-submit parameters:
--executor-cores 5
--executor-memory 5g
--driver-cores 3
--driver-memory 3g
In yarn-client mode you can't use the spark.driver.cores and spark.driver.memory properties! You have to use the corresponding AM properties in a SparkConf instance:
spark.yarn.am.cores
spark.yarn.am.memory
There are no dedicated spark-submit parameters for these AM properties!
To set executor resources in yarn-client mode you can use
spark.executor.cores and spark.executor.memory in SparkConf
--executor-cores and --executor-memory parameters in spark-submit
if you set both, the SparkConf settings overwrite the spark-submit parameter values!
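To make these rules concrete, here is a hedged sketch of both submit variants, reusing the values from the question (an illustration, not the only valid layout):

# yarn-cluster mode: size driver and executors on the command line, since
# SparkConf values are applied too late to influence container allocation.
spark-submit --class spark.MyApp --master yarn --deploy-mode cluster \
  --driver-cores 1 --driver-memory 1800m \
  --num-executors 30 --executor-cores 3 --executor-memory 2048m \
  /home/myuser/application.jar

# yarn-client mode: the driver is the local client process; size the AM via
# spark.yarn.am.* in SparkConf, and set executor resources in SparkConf or as flags.
spark-submit --class spark.MyApp --master yarn --deploy-mode client \
  /home/myuser/application.jar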
The points above are the textual form of my notes (the screenshot of them is omitted). Hope these findings help somebody else...
Just to add on to D. Müller's answer:
The same issue happened to me, and I tried the settings in several different combinations. I am running PySpark 2.0.0 on a YARN cluster.
I found that driver-memory must be set during spark-submit, but executor-memory can be set in the script (i.e., via SparkConf) and the application will still work.
My application dies if driver-memory is less than 2g. The errors are:
ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
ERROR yarn.ApplicationMaster: User application exited with status 143
CASE 1:
- driver & executor both written in SparkConf
spark = (SparkSession
.builder
.appName("driver_executor_inside")
.enableHiveSupport()
.config("spark.executor.memory","4g")
.config("spark.executor.cores","2")
.config("spark.yarn.executor.memoryOverhead","1024")
.config("spark.driver.memory","2g")
.getOrCreate())
spark-submit --master yarn --deploy-mode cluster myscript.py
(With driver memory set only in SparkConf, which is not applied in cluster mode, the driver fell back to the default and the job died with the errors above.)
CASE 2:
- driver in spark submit
- executor in SparkConf in script
spark = (SparkSession
.builder
.appName("executor_inside")
.enableHiveSupport()
.config("spark.executor.memory","4g")
.config("spark.executor.cores","2")
.config("spark.yarn.executor.memoryOverhead","1024")
.getOrCreate())
spark-submit --master yarn --deploy-mode cluster --conf spark.driver.memory=2g myscript.py
The job finished with SUCCEEDED status, and the executor memory was set correctly.
CASE 3:
- driver in spark submit
- executor not written
spark = (SparkSession
.builder
.appName("executor_not_written")
.enableHiveSupport()
.config("spark.executor.cores","2")
.config("spark.yarn.executor.memoryOverhead","1024")
.getOrCreate())
spark-submit --master yarn --deploy-mode cluster --conf spark.driver.memory=2g myscript.py
Apparently the executor memory was not set here, which means CASE 2 really did pick up the executor memory settings even though they were written inside SparkConf.

Spark-submit in Spark standalone - all memory gone to the drivers

I have set up a Spark standalone cluster, where I can submit jobs with spark-submit:
spark-submit \
--class blah.blah.MyClass \
--master spark://myaddress:6066 \
--executor-memory 8G \
--deploy-mode cluster \
--total-executor-cores 12 \
/path/to/jar/myjar.jar
The problem is that when I send multiple jobs at the same time, say over 20 in one go, the first few finish successfully while all the others are stuck waiting for resources. I noticed that all the available memory has gone to the drivers, so in the drivers section they are all running, but in the running applications section they are all in the WAITING state.
How can I tell Spark standalone to allocate memory to the WAITING executors first, instead of to the SUBMITTED drivers?
thank you
Below is an extract of my spark-defaults.conf
spark.master spark://address:7077
spark.eventLog.enabled true
spark.eventLog.dir /path/tmp/sparkEventLog
spark.driver.memory 5g
spark.local.dir /path/tmp
spark.ui.port xxx
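One hedged observation, assuming the spark.driver.memory 5g default above is the culprit: 20 concurrent cluster-mode drivers at 5g each would reserve roughly 100g before any executor starts. Overriding the driver memory per job leaves room for executors:

# --driver-memory 1g overrides the spark.driver.memory 5g from spark-defaults.conf
spark-submit \
  --class blah.blah.MyClass \
  --master spark://myaddress:6066 \
  --deploy-mode cluster \
  --driver-memory 1g \
  --executor-memory 8G \
  --total-executor-cores 12 \
  /path/to/jar/myjar.jar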

How to prevent Spark Executors from getting Lost when using YARN client mode?

I have a Spark job which runs fine locally with less data, but when I schedule it to execute on YARN I keep getting the following errors, and slowly all executors get removed from the UI and my job fails:
15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 8 on myhost1.com: remote Rpc client disassociated
15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 6 on myhost2.com: remote Rpc client disassociated
I use the following command to submit the Spark job in yarn-client mode:
./spark-submit --class com.xyz.MySpark --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M" --driver-java-options -XX:MaxPermSize=512m --driver-memory 3g --master yarn-client --executor-memory 2G --executor-cores 8 --num-executors 12 /home/myuser/myspark-1.0.jar
What is the problem here? I am new to Spark.
I had a very similar problem: many executors were being lost no matter how much memory we allocated to them.
The solution, if you're using YARN, was to set --conf spark.yarn.executor.memoryOverhead=600; alternatively, if your cluster uses Mesos, you can try --conf spark.mesos.executor.memoryOverhead=600 instead.
In Spark 2.3.1+ the configuration option was renamed to --conf spark.executor.memoryOverhead=600
It seems we were not leaving sufficient memory for YARN itself, and containers were being killed because of it. After setting the overhead we had different out-of-memory errors, but not the same lost-executor problem.
You can follow this AWS post to calculate memory overhead (and other spark configs to tune): best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr
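As a worked example using Spark's documented default formula (overhead = max(384 MiB, 10% of executor memory); the numbers reuse the question's command):

# With --executor-memory 2G the default overhead is max(384m, 0.10 * 2048m) = 384m,
# so each YARN container request is roughly 2048m + 384m = 2432m. Raising the
# overhead explicitly gives off-heap allocations more headroom:
./spark-submit --class com.xyz.MySpark \
  --master yarn-client \
  --num-executors 12 --executor-cores 8 --executor-memory 2G \
  --conf spark.yarn.executor.memoryOverhead=600 \
  /home/myuser/myspark-1.0.jar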
When I had the same issue, deleting logs and freeing up more HDFS space worked.
