Failed to connect remotely to Spark master node inside a Docker container - apache-spark

I created a Spark cluster based on this link.
Everything went smoothly, but the problem is that after the cluster was created, I tried to use pyspark from another machine to connect remotely to the container running inside the host.
I'm receiving 18/04/04 17:14:48 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master xxxx.xxxx:7077, even though I can connect to port 7077 on that host through telnet!
What might I be missing?
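For reference, a minimal PySpark sketch of the kind of remote connection being attempted (the master URL reuses the question's placeholder, and spark.driver.host is an assumption worth checking: the master must be addressed with the exact URL it advertises on its web UI, port 7077 must be published by the container, and the cluster must be able to reach the driver back on a routable address):
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         # use the exact URL shown on the master web UI; the container must publish port 7077
         .master("spark://xxxx.xxxx:7077")
         .appName("remote-connect-test")
         # an address the cluster can reach this client machine on (placeholder)
         .config("spark.driver.host", "<client-machine-ip>")
         .getOrCreate())

print(spark.range(10).count())
spark.stop()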

Related

Apache PySpark - Failed to connect to master 7077

I set up Spark and HDFS after watching this video. The only difference is that I did it on a server (Ubuntu) and not in a VM.
On the server, everything works perfectly. Now I want to access it from my local machine (Windows) with PySpark.
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("spark://ubuntu-spark:7077").appName("test").getOrCreate()
spark.stop()
However, here I get the following error messages:
22/11/12 10:38:35 WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException:
java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see
https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
setLogLevel(newLevel).
22/11/12 10:38:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
22/11/12 10:38:37 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master
ubuntu-spark:7077
org.apache.spark.SparkException: Exception thrown in awaitResult: ...
According to other posts, the DNS should be correct. I got this from the Spark master web UI (at port 8080):
URL: spark://ubuntu-spark:7077
Alive Workers: 1
Cores in use: 2 Total, 0 Used
Memory in use: 6.8 GiB Total, 0.0 B Used
Resources in use:
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
The ports are open. I also don't understand the following message: "HADOOP_HOME and hadoop.home.dir are unset." Hadoop is configured on the server. Why should I do the same thing locally again? My expectation was that I could use Spark like an API, or am I wrong?
Thank you very much for your help. If you need any configuration files, I can provide them.
Hadoop should not be necessary for the code shown since you're not using HDFS, but the warning says Spark is looking for those settings on your Windows machine.
DNS needs to work between your Windows machine and wherever your server is running (a VM can still be a server, so it's unclear where you're running this). Start debugging with ping ubuntu-spark to check, and you should also be able to open ubuntu-spark:8080 from a browser on Windows.
If you only want to run Spark code, and don't care if it's distributed, you could just use Docker on Windows - https://github.com/jupyter/docker-stacks
Or set up PyCharm locally for the same purpose, as in the sketch below.
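For the "just run Spark code locally" option mentioned above, a minimal local-mode sketch; no standalone master, DNS, or HDFS is involved, and the winutils warning on Windows is usually harmless for simple jobs like this:
from pyspark.sql import SparkSession

# local[*]: the driver and executors run in a single JVM on this machine
spark = (SparkSession.builder
         .master("local[*]")
         .appName("local-test")
         .getOrCreate())

spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"]).show()
spark.stop()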

Start Spark master on the IP instead of Hostname

I'm trying to set up a remote Spark 2.4.5 cluster on Ubuntu 18. After I start ./sbin/start-master.sh, the web UI is available at <INSTANCE-IP>:8080, but it shows "Spark Master at spark://spark-master:7077", where spark-master is the hostname of the remote machine.
I'm only able to start a worker with ./sbin/start-slave.sh spark://spark-master:7077, but <INSTANCE-IP>:4040 doesn't work. When I try ./sbin/start-slave.sh spark://<INSTANCE-IP>:7077, I can see the process, but the worker is not visible in the web UI.
As a result, I cannot connect to the cluster from my local machine with spark-shell --master spark://<INSTANCE-IP>:7077. The error is:
StandaloneAppClient$ClientEndpoint: Failed to connect to master <INSTANCE-IP>:7077
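One common workaround, sketched below under the assumption that the master keeps advertising its hostname: map spark-master to <INSTANCE-IP> in the local machine's hosts file and connect with exactly the URL the master web UI shows, since a standalone master tends to reject connections addressed to a host other than the one it advertises. (The alternative is to set SPARK_MASTER_HOST to the IP in conf/spark-env.sh before starting the master.) Shown here with PySpark; the same applies to spark-shell --master.
from pyspark.sql import SparkSession

# Assumes the local hosts file maps spark-master to <INSTANCE-IP>.
spark = (SparkSession.builder
         .master("spark://spark-master:7077")   # same URL as shown on the master web UI
         .appName("connectivity-check")
         .getOrCreate())

print(spark.range(5).count())
spark.stop()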

Can't access SparkUI through YARN

I'm building a Docker image to run Zeppelin or spark-shell locally against a production Hadoop cluster with YARN. Edit: the environment is macOS.
I can execute jobs or a spark-shell fine, but when I try to access the Tracking URL on YARN while the job is running, the YARN UI hangs for exactly 10 minutes. YARN keeps working, and if I connect via SSH I can execute yarn commands.
If I don't access the Spark UI (directly or through YARN), nothing happens: jobs are executed and the YARN UI does not hang.
More info:
Local, on Docker: Spark 2.1.2, Hadoop 2.6.0-cdh5.4.3
Production: Spark 2.1.0, Hadoop 2.6.0-cdh5.4.3
If I execute it locally (--master local[*]) it works, and I can connect to the Spark UI through port 4040.
Spark config:
spark.driver.bindAddress 172.17.0.2 #docker_eth0_ip
spark.driver.host 192.168.XXX.XXX #local_ip
spark.driver.port 5001
spark.ui.port 4040
spark.blockManager.port 5003
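For illustration, the same bind/advertise split can be set when the session is created; a rough PySpark sketch using the question's own placeholder addresses (spark.driver.bindAddress is where the driver listens inside the container, spark.driver.host is what is advertised to YARN and the executors):
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("yarn")   # assumes HADOOP_CONF_DIR/YARN_CONF_DIR point at the cluster config
         .appName("docker-driver")
         .config("spark.driver.bindAddress", "172.17.0.2")   # docker eth0 IP inside the container
         .config("spark.driver.host", "192.168.XXX.XXX")     # local IP reachable from the cluster
         .config("spark.driver.port", "5001")
         .config("spark.ui.port", "4040")
         .config("spark.blockManager.port", "5003")
         .getOrCreate())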
Yes, the ApplicationMaster and the nodes can reach my local Spark UI and driver (verified with telnet).
As I said, I can execute jobs, so the ports exposed by Docker and their binding are working. Some logs proving it:
INFO ApplicationMaster: Driver now available: 192.168.XXX.XXX:5001
INFO TransportClientFactory: Successfully created connection to /192.168.XXX.XXX:5001 after 65 ms (0 ms spent in bootstraps)
INFO ApplicationMaster$AMEndpoint: Add WebUI Filter. AddWebUIFilter(org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,Map(PROXY_HOSTS -> jobtracker.hadoop, PROXY_URI_BASES -> http://jobtracker.hadoop:8088/proxy/application_000_000),/proxy/application_000_000)
Any ideas, or pointers on where I can look to see what's happening?
The problem was related to how Docker manages incoming IP requests when it runs on macOS.
When YARN, which is running inside the Docker container, receives a request, it doesn't see the original IP; it sees the internal Docker proxy IP (in my case 172.17.0.1).
When a request is sent to my local container's Spark UI, it automatically redirects the request to the Hadoop master (that's how YARN works), because it sees that the request is not coming from the Hadoop master, and it only accepts requests from that source.
When the master receives the forwarded request, it tries to send it to the Spark driver (my local Docker container), which again forwards the request back to the Hadoop master, because it sees that the source IP is not the master's but the proxy IP.
This loop takes up all the threads reserved for the UI; until those threads are released, the YARN UI stays hung.
I "solved" it by changing the YARN configuration in Docker:
<property>
<name>yarn.web-proxy.address</name>
<value>172.17.0.1</value>
</property>
This allows the Spark UI to handle any request made to the Docker container.

Datastax Spark worker is always looking for master at 127.0.0.1

I am trying to bring up DataStax Cassandra in analytics mode by using "dse cassandra -k -s". I am using the DSE 5.0 sandbox on a single-node setup.
I have configured spark-env.sh with SPARK_MASTER_IP as well as SPARK_LOCAL_IP pointing to my LAN IP.
export SPARK_LOCAL_IP="172.40.9.79"
export SPARK_MASTER_HOST="172.40.9.79"
export SPARK_WORKER_HOST="172.40.9.79"
export SPARK_MASTER_IP="172.40.9.79"
All of the above variables are set in spark-env.sh.
Despite this, the worker will not come up. It is always looking for a master at 127.0.0.1. This is the error I am seeing in /var/log/cassandra/system.log:
WARN [worker-register-master-threadpool-8] 2016-10-04 08:02:45,832 SPARK-WORKER Logging.scala:91 - Failed to connect to master 127.0.0.1:7077
java.io.IOException: Failed to connect to /127.0.0.1:7077
The result from dse client-tool also shows 127.0.0.1:
$ dse client-tool -u cassandra -p cassandra spark master-address
spark://127.0.0.1:7077
However, I am able to access the Spark web UI at the LAN IP 172.40.9.79.
Spark Web UI screenshot
Any help is greatly appreciated.
Try adding this parameter in the spark-defaults.conf file: spark.master local[*] or spark.master spark://172.40.9.79:7077. Maybe this solves your problem.

Apache Spark standalone cluster network troubles

I have an Ubuntu server in the Azure cloud with spark-1.6.0-bin-hadoop2.6 installed on it. I want to start a standalone cluster.
I start the master by executing ./start-master.sh -h 10.0.0.4 (its internal IP), and after that I can access the web UI at http://[master-public-ip]:8080.
Next, I also have an Ubuntu server as a slave in a different network. I start it with ./start-slave.sh spark://[master-public-ip]:7077. I can see the successfully registered slave in the master web UI.
When I submit an application on the master with the following command:
./spark-submit --class com.MyClass --deploy-mode client \
--master spark://[master-public-ip]:7077 \
/home/user/my_jar.jar
I get the following error in the slave web UI:
Exception in thread "main" java.io.IOException: Failed to connect to /10.0.0.4:33742
So, the master can connect to the slave, but the slave can't connect back. How can I change the configuration so that the slave connects back to the master machine through its public IP?
Setting SPARK_MASTER_IP to the public IP doesn't work.
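One direction worth trying (a sketch under assumptions, not a verified fix for this Azure setup): the slave is failing to reach the driver on the master's internal address and a random port, so give the driver an address the slave can route to and pin the driver ports so they can be opened in the network security group. Shown with PySpark for illustration; the same settings can be passed to spark-submit with --conf. Note that Spark 1.6 has no separate spark.driver.bindAddress, so spark.driver.host must be an address the VM can actually bind to; if the public IP is NAT'd, running with --deploy-mode cluster so the driver runs on a worker is another option.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("spark://[master-public-ip]:7077")
        .setAppName("cross-network-test")
        # an address the remote slave can route back to (assumes it is bindable on this VM)
        .set("spark.driver.host", "[master-public-ip]")
        # fixed ports instead of random ones, so they can be opened in the firewall
        .set("spark.driver.port", "5001")
        .set("spark.blockManager.port", "5003"))

sc = SparkContext(conf=conf)
print(sc.parallelize(range(10)).count())
sc.stop()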
