My Spark worker cannot connect to the master. Something wrong with Akka? - apache-spark

I want to set up a Spark standalone cluster on two virtual machines.
With spark-0.9.1-bin-hadoop1, I can run spark-shell successfully on each VM. Following the official documentation, I configured one VM (ip: xx.xx.xx.223) as both master and worker, and the other (ip: xx.xx.xx.224) as a worker only.
But the 224 VM cannot connect to the 223 VM.
Here is the master log on 223 (master):
[#tc-52-223 logs]# tail -100f spark-root-org.apache.spark.deploy.master.Master-1-tc-52-223.out
Spark Command: /usr/local/jdk/bin/java -cp :/data/test/spark-0.9.1-bin-hadoop1/conf:/data/test/spark-0.9.1-bin-hadoop1/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip 10.11.52.223 --port 7077 --webui-port 8080
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/04/14 22:17:03 INFO Master: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/04/14 22:17:03 INFO Master: Starting Spark master at spark://10.11.52.223:7077
14/04/14 22:17:03 INFO MasterWebUI: Started Master web UI at http://tc-52-223:8080
14/04/14 22:17:03 INFO Master: I have been elected leader! New state: ALIVE
14/04/14 22:17:06 INFO Master: Registering worker tc-52-223:20599 with 1 cores, 4.0 GB RAM
14/04/14 22:17:06 INFO Master: Registering worker tc_52_224:21371 with 1 cores, 4.0 GB RAM
14/04/14 22:17:06 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$RegisteredWorker] from Actor[akka://sparkMaster/user/Master#1972530850] to Actor[akka://sparkMaster/deadLetters] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/04/14 22:17:26 INFO Master: Registering worker tc_52_224:21371 with 1 cores, 4.0 GB RAM
14/04/14 22:17:26 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$RegisterWorkerFailed] from Actor[akka://sparkMaster/user/Master#1972530850] to Actor[akka://sparkMaster/deadLetters] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/04/14 22:17:46 INFO Master: Registering worker tc_52_224:21371 with 1 cores, 4.0 GB RAM
14/04/14 22:17:46 INFO RemoteActorRefProvider$RemoteDeadLetterActorRef: Message [org.apache.spark.deploy.DeployMessages$RegisterWorkerFailed] from Actor[akka://sparkMaster/user/Master#1972530850] to Actor[akka://sparkMaster/deadLetters] was not delivered. [3] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/04/14 22:18:06 INFO Master: akka.tcp://sparkWorker#tc_52_224:21371 got disassociated, removing it.
14/04/14 22:18:06 INFO Master: akka.tcp://sparkWorker#tc_52_224:21371 got disassociated, removing it.
14/04/14 22:18:06 INFO LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%4010.11.52.224%3A61550-1#646150938] was not delivered. [4] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/04/14 22:18:06 INFO Master: akka.tcp://sparkWorker#tc_52_224:21371 got disassociated, removing it.
14/04/14 22:18:06 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster#10.11.52.223:7077] -> [akka.tcp://sparkWorker#tc_52_224:21371]: Error [Association failed with [akka.tcp://sparkWorker#tc_52_224:21371]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkWorker#tc_52_224:21371]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: tc_52_224/10.11.52.224:21371
]
14/04/14 22:18:06 INFO Master: akka.tcp://sparkWorker#tc_52_224:21371 got disassociated, removing it.
14/04/14 22:18:06 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster#10.11.52.223:7077] -> [akka.tcp://sparkWorker#tc_52_224:21371]: Error [Association failed with [akka.tcp://sparkWorker#tc_52_224:21371]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkWorker#tc_52_224:21371]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: tc_52_224/10.11.52.224:21371
]
14/04/14 22:18:06 ERROR EndpointWriter: AssociationError [akka.tcp://sparkMaster#10.11.52.223:7077] -> [akka.tcp://sparkWorker#tc_52_224:21371]: Error [Association failed with [akka.tcp://sparkWorker#tc_52_224:21371]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkWorker#tc_52_224:21371]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: tc_52_224/10.11.52.224:21371
]
14/04/14 22:18:06 INFO Master: akka.tcp://sparkWorker#tc_52_224:21371 got disassociated, removing it.
14/04/14 22:19:03 WARN Master: Removing worker-20140414221705-tc_52_224-21371 because we got no heartbeat in 60 seconds
14/04/14 22:19:03 INFO Master: Removing worker worker-20140414221705-tc_52_224-21371 on tc_52_224:21371
Here is the worker log on 223 (worker):
14/04/14 22:17:06 INFO Worker: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/04/14 22:17:06 INFO Worker: Starting Spark worker tc-52-223:20599 with 1 cores, 4.0 GB RAM
14/04/14 22:17:06 INFO Worker: Spark home: /data/test/spark-0.9.1-bin-hadoop1
14/04/14 22:17:06 INFO WorkerWebUI: Started Worker web UI at http://tc-52-223:8081
14/04/14 22:17:06 INFO Worker: Connecting to master spark://xx.xx.52.223:7077...
14/04/14 22:17:06 INFO Worker: Successfully registered with master spark://xx.xx.52.223:7077
Here is the worker log on 224 (worker):
Spark Command: /usr/local/jdk/bin/java -cp :/data/test/spark-0.9.1-bin-hadoop1/conf:/data/test/spark-0.9.1-bin-hadoop1/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://10.11.52.223:7077 --webui-port 8081
========================================
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
14/04/14 22:17:06 INFO Worker: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/04/14 22:17:06 INFO Worker: Starting Spark worker tc_52_224:21371 with 1 cores, 4.0 GB RAM
14/04/14 22:17:06 INFO Worker: Spark home: /data/test/spark-0.9.1-bin-hadoop1
14/04/14 22:17:06 INFO WorkerWebUI: Started Worker web UI at http://tc_52_224:8081
14/04/14 22:17:06 INFO Worker: Connecting to master spark://xx.xx.52.223:7077...
14/04/14 22:17:26 INFO Worker: Connecting to master spark://xx.xx.52.223:7077...
14/04/14 22:17:46 INFO Worker: Connecting to master spark://xx.xx.52.223:7077...
14/04/14 22:18:06 ERROR Worker: All masters are unresponsive! Giving up.
Here is my spark-env.sh:
JAVA_HOME=/usr/local/jdk
export SPARK_MASTER_IP=tc-52-223
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_MEMORY=4g
export MASTER=spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}
export SPARK_LOCAL_IP=tc-52-223
I have googled many solutions, but none of them work. Please help me.

I'm not sure if this is the same issue I encountered, but you may want to try setting SPARK_MASTER_IP to the same address Spark binds to. In your example it looks like that would be 10.11.52.223, not tc-52-223.
It should be the same as what you see when you visit the master node's web UI on port 8080, something like: Spark Master at spark://ec2-XX-XX-XXX-XXX.compute-1.amazonaws.com:7077
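For instance, with the addresses from the question (a sketch; it assumes 10.11.52.223 is the address the master actually binds to), the spark-env.sh change would be:
export SPARK_MASTER_IP=10.11.52.223   # was: tc-52-223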

If you are getting a "Connection refused" exception, you can resolve it by checking:
=> that the master is running on the specific host:
netstat -at | grep 7077
You should get something similar to:
tcp 0 0 akhldz.master.io:7077 *:* LISTEN
If that is the case, then from your worker machine run:
host akhldz.master.io (replace akhldz.master.io with your master host; if it does not resolve, add a host entry in your /etc/hosts file)
telnet akhldz.master.io 7077 (if this does not connect, your worker won't connect either)
=> adding a host entry in /etc/hosts:
Open /etc/hosts on your worker machine and add an entry like the following (example):
192.168.100.20 akhldz.master.io
P.S.: in the case above, Pillis had two IP addresses with the same host name, e.g.:
192.168.100.40 s1.machine.org
192.168.100.41 s1.machine.org
Hope that helps.

There are a lot of answers and possible solutions here, and this question is a bit old, but in the interest of completeness: there is a known Spark bug about hostnames resolving to IP addresses. I am not presenting this as the complete answer in all cases, but I suggest trying a baseline of using only IP addresses, with the single config SPARK_MASTER_IP. With just those two practices my clusters work; all the other configs, or using hostnames, just seem to muck things up.
So in your spark-env.sh, get rid of SPARK_LOCAL_IP (SPARK_WORKER_IP in some setups) and change SPARK_MASTER_IP to an IP address, not a hostname.
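A minimal sketch of that baseline, using the master IP from the question:
# spark-env.sh, identical on both machines (sketch)
export SPARK_MASTER_IP=10.11.52.223
# no SPARK_LOCAL_IP / SPARK_WORKER_IP lines at all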
I have treated this more at length in this answer.
For completeness, here's part of that answer:
Can you ping the box where the Spark master is running? Can you ping the worker from the master? More importantly, can you password-less ssh to the worker from the master box? Per the 1.5.2 docs you need to be able to do that with a private key AND have the worker entered in the conf/slaves file. I copied the relevant paragraph at the end.
You can get a situation where the worker can contact the master but the master can't get back to the worker, so it looks like no connection is being made. Check both directions.
I think the slaves file on the master node, and password-less ssh, can lead to errors similar to what you are seeing.
Per the answer I crosslinked, there's also an old bug, but it's not clear how that bug was resolved.

Set the port for the Spark worker as well, e.g. SPARK_WORKER_PORT=5078 ... check the spark-installation link for the correct installation steps.
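For example, in spark-env.sh (a sketch; the port numbers are arbitrary, choose ones your firewall permits):
export SPARK_WORKER_PORT=5078         # fixed worker port instead of a random one
export SPARK_WORKER_WEBUI_PORT=8081   # worker web UI port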

Basically, your ports are blocked, so communication from the master to the worker is cut off. Check here: https://spark.apache.org/docs/latest/configuration.html#networking
In the "Networking" section, you can see that some of the ports are random by default. You can set them to fixed values of your choice, like below:
import org.apache.spark.SparkConf

// Pin the normally-random ports to fixed values (e.g. to open them in a firewall).
// `master` is your master URL string, e.g. "spark://10.11.52.223:7077".
val conf = new SparkConf()
  .setMaster(master)
  .setAppName("namexxx")
  .set("spark.driver.port", "51810")
  .set("spark.fileserver.port", "51811")
  .set("spark.broadcast.port", "51812")
  .set("spark.replClassServer.port", "51813")
  .set("spark.blockManager.port", "51814")
  .set("spark.executor.port", "51815")

In my case, I could overcome the problem by adding an entry with the hostname and IP address of the local host to the /etc/hosts file, as follows:
For a cluster, the master has the following /etc/hosts content:
127.0.0.1 master.yourhost.com localhost localhost4 localhost.localdomain
192.168.1.10 slave1.yourhost.com
192.168.1.9 master.yourhost.com # this line solved the problem
Then I did the SAME THING on the slave1.yourhost.com machine.
Hope this helps.

I faced the same issue. You can resolve it with the procedure below.
First, go to the /etc/hosts file and comment out the 127.0.1.1 address.
Then go to the spark/sbin directory and start the Spark session with this command:
./start-all.sh
or you can use ./start-master.sh and ./start-slave.sh for the same effect. Now if you run spark-shell or pyspark or any other Spark component, it will automatically create the Spark context object sc for you.
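To verify the cluster came up (a sketch; the master address is a placeholder for your own), point a shell at the master and check sc:
./bin/spark-shell --master spark://<your-master-ip>:7077
scala> sc.master    // should print spark://<your-master-ip>:7077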

Related

Spark - Master: got disassociated, removing it

I am deploying a Spark cluster with 1 master node and 3 worker nodes. Within moments of deploying the master and worker nodes, the master starts spamming the logs with the following messages:
19/07/17 12:56:51 INFO Master: I have been elected leader! New state: ALIVE
19/07/17 12:56:56 INFO Master: Registering worker 172.26.140.209:35803 with 1 cores, 2.0 GB RAM
19/07/17 12:56:57 INFO Master: 172.26.140.163:59146 got disassociated, removing it.
19/07/17 12:56:58 INFO Master: 172.26.140.132:56252 got disassociated, removing it.
19/07/17 12:56:58 INFO Master: 172.26.140.194:62135 got disassociated, removing it.
19/07/17 12:57:02 INFO Master: Registering worker 172.26.140.169:44249 with 1 cores, 2.0 GB RAM
19/07/17 12:57:02 INFO Master: 172.26.140.163:59202 got disassociated, removing it.
19/07/17 12:57:03 INFO Master: 172.26.140.132:56355 got disassociated, removing it.
19/07/17 12:57:03 INFO Master: 172.26.140.194:62157 got disassociated, removing it.
19/07/17 12:57:07 INFO Master: 172.26.140.163:59266 got disassociated, removing it.
19/07/17 12:57:08 INFO Master: 172.26.140.132:56376 got disassociated, removing it.
19/07/17 12:57:08 INFO Master: Registering worker 172.26.140.204:43921 with 1 cores, 2.0 GB RAM
19/07/17 12:57:08 INFO Master: 172.26.140.194:62203 got disassociated, removing it.
19/07/17 12:57:12 INFO Master: 172.26.140.163:59342 got disassociated, removing it.
19/07/17 12:57:13 INFO Master: 172.26.140.132:56392 got disassociated, removing it.
19/07/17 12:57:13 INFO Master: 172.26.140.194:62268 got disassociated, removing it.
19/07/17 12:57:17 INFO Master: 172.26.140.163:59417 got disassociated, removing it.
19/07/17 12:57:18 INFO Master: 172.26.140.132:56415 got disassociated, removing it.
19/07/17 12:57:18 INFO Master: 172.26.140.194:62296 got disassociated, removing it.
19/07/17 12:57:22 INFO Master: 172.26.140.163:59472 got disassociated, removing it.
19/07/17 12:57:23 INFO Master: 172.26.140.132:56483 got disassociated, removing it.
19/07/17 12:57:23 INFO Master: 172.26.140.194:62323 got disassociated, removing it.
The worker nodes seem to be connected to the master correctly and are logging the following:
19/07/17 12:56:56 INFO Utils: Successfully started service 'sparkWorker' on port 35803.
19/07/17 12:56:56 INFO Worker: Starting Spark worker 172.26.140.209:35803 with 1 cores, 2.0 GB RAM
19/07/17 12:56:56 INFO Worker: Running Spark version 2.4.3
19/07/17 12:56:56 INFO Worker: Spark home: /opt/spark
19/07/17 12:56:56 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
19/07/17 12:56:56 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://spark-worker-0.spark-worker-service.default.svc.cluster.local:8081
19/07/17 12:56:56 INFO Worker: Connecting to master spark-master-service.default.svc.cluster.local:7077...
19/07/17 12:56:56 INFO TransportClientFactory: Successfully created connection to spark-master-service.default.svc.cluster.local/10.0.179.236:7077 after 49 ms (0 ms spent in bootstraps)
19/07/17 12:56:56 INFO Worker: Successfully registered with master spark://172.26.140.196:7077
But the master still logs the disassociation message, for three separate nodes, every 5 seconds.
What is strange is that the IP addresses listed in the master's logs all belong to the kube-proxy service:
kube-system kube-proxy-5vp9r 1/1 Running 0 39h 172.26.140.163 aks-agentpool-31454219-2 <none> <none>
kube-system kube-proxy-kl695 1/1 Running 0 39h 172.26.140.132 aks-agentpool-31454219-1 <none> <none>
kube-system kube-proxy-xgjws 1/1 Running 0 39h 172.26.140.194 aks-agentpool-31454219-0 <none> <none>
My questions are two-fold:
1) Why are the kube-proxy nodes connecting to the master? Or why does the master node think that the kube-proxy nodes are taking part in this cluster?
2) What setting do I need to change in order to clear this message from my log files?
Here are the contents of my spark-defaults.conf file:
spark.master=spark://spark-master-service:7077
spark.submit.deploy-mode=cluster
spark.executor.cores=1
spark.driver.memory=500m
spark.executor.memory=500m
spark.eventLog.enabled=true
spark.eventLog.dir=/mnt/eventLog
I cannot find any meaningful reason why this is occurring and any assistance would be greatly appreciated.
I had the same problem with my Spark cluster in Kubernetes; I tested Spark 2.4.3 and 2.4.4, and also Kubernetes 1.16 and 1.13.
This is the solution.
This is how I created my Spark object at first:
spark = SparkSession.builder.appName('Kubernetes-Spark-app').getOrCreate()
The issue was resolved by using the cluster IP of the Spark master instead:
spark = SparkSession.builder.master('spark://10.0.106.83:7077').appName('Kubernetes-Spark-app').getOrCreate()
This works with this chart:
helm install microsoft/spark --generate-name

Why am I getting "Removing worker because we got no heartbeat in 60 seconds" on Spark master

I think I might have stumbled across a bug and wanted to get other people's input. I am running a pyspark application using Spark 2.2.0 in standalone mode. I am doing a somewhat heavy transformation in Python inside a flatMap, and the driver keeps killing the workers.
Here is what I am seeing:
After 60s of not seeing any heartbeat message from the workers, the master prints this message to the log:
Removing worker [worker name] because we got no heartbeat in 60 seconds
Removing worker [worker name] on [IP]:[port]
Telling app of lost executor: [executor number]
I then see in the driver log the following message:
Lost executor [executor number] on [executor IP]: worker lost
The worker then terminates and I see this message in its log:
Driver commanded a shutdown
I have looked at the Spark source code and, from what I can tell, as long as the executor is alive it should send heartbeat messages back, since it uses a ThreadUtils.newDaemonSingleThreadScheduledExecutor to do this.
One other thing that I noticed while running top on one of the workers is that the executor JVM seems to be suspended throughout this process. There are as many Python processes as I specified in the SPARK_WORKER_CORES env variable, and each is consuming close to 100% of the CPU.
Anyone have any thoughts on this?
I was facing this same issue; increasing the timeout intervals worked.
Excerpt from the start-all.sh logs:
INFO Utils: Successfully started service 'sparkMaster' on port 7077.
INFO Master: Starting Spark master at spark://master:7077
INFO Master: Running Spark version 3.0.1
INFO Utils: Successfully started service 'MasterUI' on port 8080.
INFO MasterWebUI: Bound MasterWebUI to 0.0.0.0, and started at http://master:8080
INFO Master: I have been elected leader! New state: ALIVE
INFO Master: Registering worker slave01:41191 with 16 cores, 15.7 GiB RAM
INFO Master: Registering worker slave02:37853 with 16 cores, 15.7 GiB RAM
WARN Master: Removing worker-20210618205117-slave01-41191 because we got no heartbeat in 60 seconds
INFO Master: Removing worker worker-20210618205117-slave01-41191 on slave01:41191
INFO Master: Telling app of lost worker: worker-20210618205117-slave01-41191
WARN Master: Removing worker-20210618204723-slave02-37853 because we got no heartbeat in 60 seconds
INFO Master: Removing worker worker-20210618204723-slave02-37853 on slave02:37853
INFO Master: Telling app of lost worker: worker-20210618204723-slave02-37853
WARN Master: Got heartbeat from unregistered worker worker-20210618205117-slave01-41191. This worker was never registered, so ignoring the heartbeat.
WARN Master: Got heartbeat from unregistered worker worker-20210618204723-slave02-37853. This worker was never registered, so ignoring the heartbeat.
Solution: add the following configs to $SPARK_HOME/conf/spark-defaults.conf:
spark.network.timeout 50000
spark.executor.heartbeatInterval 5000
spark.worker.timeout 5000
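For the master to pick up the new worker timeout, restart the cluster after editing the file (assuming the standard sbin scripts):
$SPARK_HOME/sbin/stop-all.sh
$SPARK_HOME/sbin/start-all.sh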

How to resolve the "master IP got disassociated" issue?

17/04/29 11:39:36 INFO Master: Launching driver driver-20170429113936-0000 on worker worker-20170429113809-192.168.5.197-7078
17/04/29 11:39:42 INFO Master: 192.168.5.5:35660 got disassociated, removing it.
17/04/29 11:39:42 INFO Master: 192.168.5.5:35658 got disassociated, removing it.
17/04/29 11:39:42 INFO Master: 192.168.5.5:39706 got disassociated, removing it.
I get this most of the time when I start the Spark standalone cluster. What could be the reason the master IP always gets disassociated, and how do I resolve this?

Spark Shell Yarn Client Mode - Akka AssociationError

When I launch Spark Shell using:
spark-shell --master yarn --deploy-mode client
I'm getting the following error:
16/03/21 20:52:29 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver#ipaddress10:47915] -> [akka.tcp://sparkExecutor#hostname02:48703]: Error [Association failed with [akka.tc
p://sparkExecutor#hostname02:48703]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor#hostname02:48703]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: No route to host
]
akka.event.Logging$Error$NoCause$
16/03/21 20:52:29 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver#ipaddress10:47915] -> [akka.tcp://sparkExecutor#hostname02:48703]: Error [Association failed with [akka.tc
p://sparkExecutor#hostname02:48703]] [
akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor#hostname02:48703]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: No route to host
]
akka.event.Logging$Error$NoCause$
16/03/21 20:52:32 ERROR YarnScheduler: Lost executor 3 on hostname01: remote Rpc client disassociated
16/03/21 20:52:32 INFO DAGScheduler: Executor lost: 3 (epoch 0)
16/03/21 20:52:32 INFO BlockManagerMasterEndpoint: Trying to remove executor 3 from BlockManagerMaster.
16/03/21 20:52:32 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(3, hostname01, 37497)
16/03/21 20:52:32 INFO BlockManagerMaster: Removed 3 successfully in removeExecutor
16/03/21 20:52:32 INFO ExecutorAllocationManager: Existing executor 3 has been removed (new total is 0)
Firewall & Iptables are turned off. Machines in the cluster are mutually ping-able on all the ports.
But i'm puzzled why I'm still getting "akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: No route to host"
Any help please.
Probably you have a name resolution issue. You should try using IP addresses in your settings (for instance in the slaves file) rather than names, to confirm this hypothesis.
I have experienced the same problem before. I found that I had mistyped some environment variables regarding SPARK_LOCAL_IP and SPARK_PUBLIC_DNS.
To resolve your problem, you have to:
In all your NodeManager nodes, check in the .bashrc and .bash_profile files that you have set the env variables SPARK_LOCAL_IP and SPARK_PUBLIC_DNS to the right values, then restart your NodeManager(s).
On your client machine (where you issue the spark-shell command), set the values of the same env variables to your client machine's IP and hostname.
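As a sketch (both values are placeholders; use each machine's own address and a hostname the other nodes can resolve), the relevant lines in .bashrc would look like:
export SPARK_LOCAL_IP=192.168.1.15            # this machine's own IP (placeholder)
export SPARK_PUBLIC_DNS=node01.example.com    # externally resolvable hostname (placeholder)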

Spark streaming application on a single virtual machine, standalone mode

I have created a Spark Streaming application, which worked fine when the deploy mode was client.
On my virtual machine I have a master and only one worker.
When I tried to change the mode to "cluster", it failed. In the web UI, I see that the driver is running, but the application has failed.
EDITED
In the log, I see the following content:
16/03/23 09:06:25 INFO Master: Driver submitted org.apache.spark.deploy.worker.DriverWrapper
16/03/23 09:06:25 INFO Master: Launching driver driver-20160323090625-0001 on worker worker-20160323085541-10.0.2.15-36648
16/03/23 09:06:32 INFO Master: metering.dev.enerbyte.com:37168 got disassociated, removing it.
16/03/23 09:06:32 INFO Master: 10.0.2.15:59942 got disassociated, removing it.
16/03/23 09:06:32 INFO Master: metering.dev.enerbyte.com:37166 got disassociated, removing it.
16/03/23 09:06:46 INFO Master: Registering app wibeee-pipeline
16/03/23 09:06:46 INFO Master: Registered app wibeee-pipeline with ID app-20160323090646-0007
16/03/23 09:06:46 INFO Master: Launching executor app-20160323090646-0007/0 on worker worker-20160323085541-10.0.2.15-36648
16/03/23 09:06:50 INFO Master: Received unregister request from application app-20160323090646-0007
16/03/23 09:06:50 INFO Master: Removing app app-20160323090646-0007
16/03/23 09:06:50 WARN Master: Got status update for unknown executor app-20160323090646-0007/0
16/03/23 09:06:50 INFO Master: metering.dev.enerbyte.com:37172 got disassociated, removing it.
16/03/23 09:06:50 INFO Master: 10.0.2.15:45079 got disassociated, removing it.
16/03/23 09:06:51 INFO Master: Removing driver: driver-20160323090625-0001
So what happens is that the master launches the driver on the worker, the application gets registered, and then an executor is launched on the same worker, which fails (although I have only one worker!).
EDIT
Could the issue be related to the fact that I use checkpointing, since I have an "updateStateByKey" transformation in my code? The checkpoint directory is set to "/tmp", but I always get a warning that when run in cluster mode, "/tmp" needs to change. How should I set it?
Can that be the reason for my problem?
Thank you
According to the log you have provided, it may not be because of a properties file, but check this:
spark-submit only copies the jar file to the driver when running in cluster mode, so if your application tries to read a properties file kept on the machine from which you run spark-submit, the driver cannot find it when running in cluster mode.
Reading from a properties file works in client mode because the driver starts on the same machine where you execute spark-submit.
You can copy the properties file to the same directory on all nodes, or keep it in a file system all nodes can read (e.g. the Cassandra file system) and read it from there.
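For the first option, a sketch of copying the file to the same path on every node (the hostnames and path are placeholders):
for host in worker1 worker2; do                  # your worker hostnames
  scp /opt/app/app.properties $host:/opt/app/    # same absolute path on every node
done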
