I have a Spark Standalone cluster that is currently doing nothing. It has the following properties:
spark.executor.memory 5g
spark.driver.memory 5g
spark.cores.max 10
spark.deploy.defaultCores 5
And I have an app that creates a SparkContext (pointing to my cluster) and then applies some action to an RDD. It fails after the first action with this extremely popular error:
Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
OK. As I understand it, this error appears when you ask for more cores/memory than the cluster can provide. That would be fine, except I don't ask for any resources in my app (I specify neither --executor-memory nor --total-executor-cores). So what can it be?
PS: The cluster seems to be fine, because I can submit a jar through ./bin/spark-submit and it works. But this app does not even appear in the "Running Applications" section of the master's web interface.
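For reference, the context is created roughly like this (a simplified sketch only; the master URL is illustrative):

import org.apache.spark.{SparkConf, SparkContext}

// No explicit resource settings here; everything comes from spark-defaults.conf above
val conf = new SparkConf()
  .setAppName("my-app")
  .setMaster("spark://master-host:7077") // points to the standalone master
val sc = new SparkContext(conf)
sc.parallelize(1 to 100).count() // first action -- this is where the warning appears
sc.stop()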
You can check your firewall settings.
The firewall on the host where I ran my PySpark shell rejected the connection attempts coming back from the worker nodes. After allowing all traffic between all nodes involved, the problem was resolved. The driver host was another VM in the same OpenStack project, so allowing all traffic between the VMs in the same project was fine security-wise.
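If opening all traffic is not an option, a narrower alternative (just a sketch; the port numbers are arbitrary examples) is to pin the driver-side ports in spark-defaults.conf on the driver host and open only those in the firewall, along with the driver address itself:

spark.driver.host 192.168.xx.xxx
spark.driver.port 40000
spark.driver.blockManager.port 40001
spark.port.maxRetries 16

The workers then only ever connect back to that fixed address and small port range.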
TL;DR: How do executors communicate with each other? Is passwordless SSH between workers (i.e., the machines hosting executors) necessary for executor-to-executor communication?
While setting up a Spark cluster, I only enabled passwordless SSH between the driver node and the worker nodes on which executors run. I am surprised that executors running on different nodes in the cluster are able to communicate (e.g. for a shuffle operation) without me having configured anything explicitly. I am following the book 'Spark - The Definitive Guide' (1st edition), and Fig 15-6 in the book shows red arrows between the driver and executors, but not between executors. Am I missing something in my understanding? I believe executors should communicate among themselves, but the picture does not show it, nor did I ever configure inter-executor communication, yet somehow everything works fine.
I see a related question here, but I still don't fully see how executors communicate among themselves without me setting up passwordless SSH. Thank you.
I'm trying to run a Java application on Spark (2.3.1). The problem is that every time I try to run it, Spark throws the message "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources" after a few tries (during those tries Spark adds and removes an executor on the same worker, on the same port). Does anyone have an idea how to solve this?
I'm running the master on computer A and the worker on computer B. I set the driver memory on computer A to 3g and the worker memory to 2g (the app doesn't require much memory), with 4 cores for the executor.
I checked other similar questions and most of them were network or memory issues. I rule out a network issue because I can run other applications with this worker.
If you can run your application with one worker, then assign less memory to your driver. Give the driver 1g and the worker 4g; that should do it.
Also, what transformations and actions are you performing?
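For example, following that suggestion (the class name, jar, and master address are placeholders), you could raise the worker's memory in spark-env.sh on computer B and pass the per-application request on the command line:

export SPARK_WORKER_MEMORY=4g   # spark-env.sh on computer B

./bin/spark-submit --class "a.b.App" --master spark://computerA:7077 --driver-memory 1g --executor-memory 2g --total-executor-cores 4 app.jar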
I'm trying to fetch some data from Cloudera's Quick Start Hadoop distribution (a Linux VM, in our case) into our SAP HANA database using the SAP Spark Controller. Every time I trigger the job in HANA, it gets stuck, and I see the following warning logged continuously every 10-15 seconds in the Spark Controller's log file unless I kill the job.
WARN org.apache.spark.scheduler.cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Although it's logged as a warning, it looks like a problem that prevents the job from executing on Cloudera. From what I've read, it's either an issue with resource management on Cloudera, or an issue with blocked ports. In our case we don't have any blocked ports, so it must be the former.
Our Cloudera is running a single node and has 16GB RAM with 4 CPU cores.
Looking at the overall configuration I have a bunch of warnings, but I can't determine if they are relevant to the issue or not.
Here's also how the RAM is distributed on Cloudera
It would be great if you could help me pinpoint the cause of this issue, because I've been trying various combinations of things over the past few days without any success.
Thanks,
Dimitar
You're trying to use the Cloudera Quickstart VM for a purpose beyond its capacity. It's really meant for someone to play around with Hadoop and CDH, and should not be used for any production-level work.
Your NodeManager only has 5 GB of memory to use for compute resources. In order to do any work, you need to create an Application Master (AM) and a Spark executor, and then reserve memory for your executors, which you won't have on a Quickstart VM.
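If you still want to try, one option (the numbers are purely illustrative; the actual knobs live in the Spark Controller and YARN configuration) is to request a minimal footprint so everything fits inside those 5 GB:

spark.yarn.am.memory 512m
spark.executor.instances 1
spark.executor.memory 1g

With the default per-container overhead (the larger of 384 MB or 10%), that comes to roughly 1.4 GB for the single executor plus under 1 GB for the AM, which a 5 GB NodeManager can accommodate.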
Cluster specifications: Apache Spark on top of Mesos with 5 VMs and HDFS as storage.
spark-env.sh
export SPARK_LOCAL_IP=192.168.xx.xxx #to set the IP address Spark binds to on this node
export MESOS_NATIVE_JAVA_LIBRARY="/home/xyz/tools/mesos-1.0.0/build/src/.libs/libmesos-1.0.0.so" #to point to your libmesos.so if you use Mesos
export SPARK_EXECUTOR_URI="hdfs://vm8:9000/spark-2.0.0-bin-hadoop2.7.tgz"
HADOOP_CONF_DIR="/usr/local/tools/hadoop" #To point Spark towards Hadoop configuration files
spark-defaults.conf
spark.executor.uri hdfs://vm8:9000/spark-2.0.0-bin-hadoop2.7.tgz
spark.driver.host 192.168.xx.xxx
spark.rpc netty
spark.rpc.numRetries 5
spark.ui.port 48888
spark.driver.port 48889
spark.port.maxRetries 32
I did some experiments submitting a word-count Scala application in cluster mode. I observed that it executes successfully only when the driver program (containing the main method) ends up on the VM it was submitted from. As far as I know, scheduling of resources (VMs) is handled by Mesos. For example, if I submit my application from vm12 and, coincidentally, Mesos also schedules vm12 to execute the application, then it runs successfully. In contrast, it fails if the Mesos scheduler decides to allocate, say, vm15. I checked the logs in stderr in the Mesos UI and found this error:
16/09/27 11:15:49 ERROR SparkContext: Error initializing SparkContext.
Besides, I tried looking into the configuration aspects of Spark at the following link:
http://spark.apache.org/docs/latest/configuration.html
I tried setting the RPC properties, as it seemed necessary to keep the driver program reachable from the worker nodes on the LAN.
But I couldn't get much insight.
I also tried uploading my code (application) to HDFS and submitting the application jar file from HDFS. I got the same results.
I connected Apache Spark with Mesos according to the documentation at the following link: http://spark.apache.org/docs/latest/running-on-mesos.html
I also tried configuring spark-defaults.conf and spark-env.sh on other VMs to check whether it would run successfully from at least two VMs. That didn't work out either.
Am I missing some conceptual clarity here?
So how can I make my application run successfully regardless of which VM I submit it from?
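One setup consistent with these observations (only a sketch; the dispatcher host, class name, and jar path are placeholders) is to let Mesos place the driver itself: start the MesosClusterDispatcher, keep the jar in HDFS, and submit in cluster deploy mode so whichever VM ends up running the driver can fetch the jar:

./sbin/start-mesos-dispatcher.sh --master mesos://192.168.xx.xxx:5050
./bin/spark-submit --class "WordCount" --master mesos://dispatcher-host:7077 --deploy-mode cluster hdfs://vm8:9000/wordcount.jar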
I'm trying to submit a spark job to a remote master from my notebook. I've got a local spark installation, so I can run
./bin/spark-submit --class "a.b.C" --master spark://198.51.100.1:7077 app.jar (...)
Due to firewall policy, NAT, etc., I can reach the Spark master (198.51.100.1) from my notebook (192.168.0.1), but not the other way around.
The problem is that my local Spark installation tries to distribute the code to the workers:
SparkContext: Added JAR file:/path/to/app.jar at http://192.168.0.1:52605/jars/app.jar with timestamp 1439369933876
which must fail, because the workers have no route to my notebook
WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkDriver@192.168.0.1:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters.
So, how can I submit my application to the master and force the master to distribute my code to the workers?
Or did I get this all wrong and there's another reason for my problem here?
You can upload your app.jar to a location that is visible from inside your cluster (e.g. HDFS) and use cluster deploy mode when launching your app:
./bin/spark-submit --deploy-mode cluster .... hdfs://path/to.jar
See Submitting Applications for more details.
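Putting that together with the command from the question (the HDFS path is a placeholder), the submit line would look roughly like:

./bin/spark-submit --class "a.b.C" --master spark://198.51.100.1:7077 --deploy-mode cluster hdfs://namenode:8020/path/to/app.jar (...)

In cluster mode the driver is launched inside the cluster, so the workers never need a route back to the notebook at 192.168.0.1.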