When SSL is enabled, Spark UI doesn't use HTTPS, but forwards to port 0 over HTTP - apache-spark

We are using Spark 2.0.2 and Hadoop 3.2.1. I have gotten SSL configured across Hadoop without any trouble. But Spark is having some trouble.
Without SSL, I can launch a job and view the Spark UI, proxied through Yarn.
When I enable SSL, I can still launch a job and run it to completion, but I cannot access the Spark Web UI. YARN links to an ApplicationMaster proxy that fails with java.net.ConnectException: Connection refused (Connection refused).
Digging into the job's stdout, I see lines like this:
2020-04-14 16:21:57,258 INFO server.ServerConnector: Started ServerConnector#1484944f{HTTP/1.1}{0.0.0.0:40809}
2020-04-14 16:21:57,303 INFO server.ServerConnector: Started ServerConnector#799e8b3a{SSL-HTTP/1.1}{0.0.0.0:36015}
2020-04-14 16:21:57,303 INFO server.Server: Started #5614ms
2020-04-14 16:21:57,304 INFO util.Utils: Successfully started service 'SparkUI' on port 40809.
2020-04-14 16:21:57,309 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at https://10.10.75.40:40809
If I call up https://10.10.75.40:40809 I get an error that it isn't SSL.
If I call up http://10.10.75.40:40809 I get served a bogus redirect to 10.10.75.40:0 (port 0).
If I call up https://10.10.75.40:36015 and ignore certificate errors, or if I call via the correct hostname, I get redirected to the RM's :8090/proxy/redirect which throws the 500 error again.
My configuration consists of:
spark.ssl.enabled true
spark.ssl.keyStore /etc/sslmate/STAR.mtv.qxxxxxxxxd.com.jks
spark.ssl.keyStorePassword _the password_
spark.ssl.trustStore /etc/sslmate/STAR.mtv.qxxxxxxxxd.com.jks
spark.ssl.trustStorePassword _the password_
spark.ssl.protocol TLSv1.2
I have also experimented with setting the ports explicitly, to no avail:
spark.ssl.ui.port 4440
spark.ssl.history.port 18480
spark.ui.driver.port 2020
spark.ssl.ui.driver.port 2420

I could not find this documented by Apache, but I found a note from MapR that got it working:
Spark UI SSL is not needed when running Spark on YARN because encryption is provided by the YARN protocol. To disable Spark SSL, add spark.ssl.ui.enabled false to the spark-defaults.conf file on each spark node.
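In spark-defaults.conf that amounts to one extra line alongside the SSL settings listed above (a sketch of just the relevant lines, not the full file):
spark.ssl.enabled true
spark.ssl.ui.enabled false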
When configured in this manner, the ResourceManager correctly provides the Spark UI via a :8090/proxy/redirect type URL. In the stderr I see output along these lines:
2020-04-14 17:36:38,098 INFO server.ServerConnector: Started ServerConnector#7ef020c8{HTTP/1.1}{0.0.0.0:45951}
2020-04-14 17:36:38,098 INFO server.Server: Started #7871ms
2020-04-14 17:36:38,107 INFO util.Utils: Successfully started service 'SparkUI' on port 45951.
2020-04-14 17:36:38,113 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.10.31.230:45951

Related

Can't access SparkUI through YARN

I'm building a Docker image to run Zeppelin or spark-shell locally against a production Hadoop cluster with YARN. (Edit: the environment is macOS.)
I can execute jobs or a spark-shell fine, but when I try to access the Tracking URL in YARN while the job is running, the YARN UI hangs for exactly 10 minutes. YARN itself keeps working, and if I connect via ssh I can still execute yarn commands.
If I don't access the SparkUI (directly or through YARN), nothing happens: jobs run and the YARN UI does not hang.
More info:
Local, on Docker: Spark 2.1.2, Hadoop 2.6.0-cdh5.4.3
Production: Spark 2.1.0, Hadoop 2.6.0-cdh5.4.3
If I execute it locally (--master local[*]) it works and I can connect to the SparkUI through port 4040.
Spark config:
spark.driver.bindAddress 172.17.0.2 #docker_eth0_ip
spark.driver.host 192.168.XXX.XXX #local_ip
spark.driver.port 5001
spark.ui.port 4040
spark.blockManager.port 5003
Yes, the ApplicationMaster and the nodes can reach my local SparkUI and driver (verified with telnet).
As I said, I can execute jobs, so the Docker-exposed ports and their bindings are working. Some logs proving it:
INFO ApplicationMaster: Driver now available: 192.168.XXX.XXX:5001
INFO TransportClientFactory: Successfully created connection to /192.168.XXX.XXX:5001 after 65 ms (0 ms spent in bootstraps)
INFO ApplicationMaster$AMEndpoint: Add WebUI Filter. AddWebUIFilter(org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,Map(PROXY_HOSTS -> jobtracker.hadoop, PROXY_URI_BASES -> http://jobtracker.hadoop:8088/proxy/application_000_000),/proxy/application_000_000)
Any ideas, or pointers on where I can look to see what's happening?
The problem was related to how Docker handles incoming IP requests when it runs on macOS.
When YARN, which is running inside the Docker container, receives a request, it doesn't see the original IP; it sees the internal Docker proxy IP (in my case 172.17.0.1).
When a request is sent to my local container's SparkUI, it automatically redirects the request to the Hadoop master (that is how YARN works), because it sees that the request is not coming from the Hadoop master and it only accepts requests from that source.
When the master receives the forwarded request, it tries to send it to the Spark driver (my local Docker container), which again forwards the request to the Hadoop master because it sees that the source IP is not the master's but the proxy IP.
This loop consumes all the threads reserved for the UI, and until those threads are released the YARN UI stays hung.
I "solved" it by changing the Docker YARN configuration:
<property>
<name>yarn.web-proxy.address</name>
<value>172.17.0.1</value>
</property>
This allows the SparkUI to handle any request made to the Docker container.

Using RStudio-sparklyr to connect to local Spark provided by IntelliJ

Good morning,
this may sound like a stupid question, but I would like to access a temporary table in Spark from RStudio. I don't have any Spark cluster; I only run everything locally on my PC.
When I start Spark through IntelliJ, the instance runs fine:
17/11/11 10:11:33 INFO Utils: Successfully started service 'sparkDriver' on port 59505.
17/11/11 10:11:33 INFO SparkEnv: Registering MapOutputTracker
17/11/11 10:11:33 INFO SparkEnv: Registering BlockManagerMaster
17/11/11 10:11:33 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/11/11 10:11:33 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/11/11 10:11:33 INFO DiskBlockManager: Created local directory at C:\Users\stephan\AppData\Local\Temp\blockmgr-7ca4e8fb-9456-4063-bc6d-39324d7dad4c
17/11/11 10:11:33 INFO MemoryStore: MemoryStore started with capacity 898.5 MB
17/11/11 10:11:33 INFO SparkEnv: Registering OutputCommitCoordinator
17/11/11 10:11:33 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/11/11 10:11:34 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.25.240.1:4040
17/11/11 10:11:34 INFO Executor: Starting executor ID driver on host localhost
17/11/11 10:11:34 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 59516.
17/11/11 10:11:34 INFO NettyBlockTransferService: Server created on 172.25.240.1:59516
But I am not sure which port I have to choose in RStudio/sparklyr:
sc <- spark_connect(master = "spark://localhost:7077", spark_home = "C://Users//stephan//Downloads//spark//spark-2.2.0-bin-hadoop2.7", version = "2.2.0")
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
cannot open file 'C:\Users\stephan\AppData\Local\Temp\Rtmp61Ejow\file2fa024ce51af_spark.log': Permission denied
I tried different ports, like 59516, 4040, ... but all led to the same result. I guess the permission denied message can be ignored, since the log file is actually written fine:
17/11/11 01:07:30 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master localhost:7077
Can anyone please help me establish a connection between a locally running Spark instance and RStudio, without RStudio starting another Spark instance of its own?
Thanks
Stephan
Running a standalone Spark cluster is not the same thing as running Spark in local mode in your IDE, which is likely what is happening here. Local mode doesn't create any persistent services.
To run your own "pseudodistributed" cluster:
Download Spark binaries.
Start the Spark master using the $SPARK_HOME/sbin/start-master.sh script.
Start a Spark worker using the $SPARK_HOME/sbin/start-slave.sh script, passing the master URL.
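For example, a minimal sketch (the install path and the use of localhost are assumptions for illustration; 7077 is the standalone master's default port):
export SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.7          # illustrative path to the unpacked binaries
$SPARK_HOME/sbin/start-master.sh                          # master comes up on spark://<host>:7077
$SPARK_HOME/sbin/start-slave.sh spark://localhost:7077    # worker registers with that master
With the master running, the spark_connect(master = "spark://localhost:7077", ...) call from the question has something to connect to.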
To be able to share tables you'll also need a proper metastore (not Derby).

Issue while opening Spark shell

I am trying to open Spark using the command
$ spark-shell
but I am getting a warning. How do I fix it?
Warning:
WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
You can change the Spark UI port anytime when launching from the command prompt:
[hadoop#localhost ~]$ spark-shell --conf spark.ui.port=4041
By default, Spark runs on port 4040.
By default, Spark will try to bind port 4040.
In your case there is already a Spark process running on 4040.
The following message is not an error, as Spark will simply run on port 4041:
WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
From Spark Documentation:
Every SparkContext launches a web UI, by default on port 4040, that
displays useful information about the application. This includes:
If multiple SparkContexts are running on the same host, they will bind to successive ports
beginning with 4040 (4041, 4042, etc).
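As a quick illustration of that behaviour (a sketch; it assumes nothing else is already holding these ports):
$ spark-shell    # first shell binds the SparkUI to port 4040
$ spark-shell    # second shell logs the WARN above and binds to 4041 instead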
The previous answer also helped me start a spark-shell. On further research, I found that Spark makes 16 attempts to auto-allocate a port.
Refer to Spark Documentation
Helpfully, Spark also suggests how to configure a new unused port and start the spark-shell on that port:
java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries (starting from 4040)! Consider explicitly setting the appropriate port for the service 'SparkUI' (**for example spark.ui.port for SparkUI**) to an available port or increasing spark.port.maxRetries.
Why: When a spark-shell is open, it checks port 4040 for availability. If this port is already in use then it checks the next one "4041", and so on.
Solution: Launch spark-shell with the options below. Try a different UI port (you can use any random unused, non-CDH port number):
$ spark-shell --conf spark.ui.port=14058 --conf spark.port.maxRetries=30
To find the pid of the process running on port 4040:
ss -lptn 'sport = :4040'
To terminate it:
kill -9 <pid>
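Alternatively, if lsof is available (an assumption; it is not part of the original answer), the lookup and the kill can be combined:
kill -9 $(lsof -ti tcp:4040)   # -t prints only PIDs, -i filters by TCP port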

Can we connect to Spark cluster from remote host via java program?

I am trying to connect to a Spark cluster from a remote system. My Java code is shown below.
JAVA Code
SparkConf conf = new SparkConf()
        .setAppName("Java API demo")
        .setMaster("spark://192.168.XX.XX:7077")
        .set("spark.driver.host", "192.168.XX.XX")
        .set("spark.driver.port", "9929");
JavaSparkContext sc = new JavaSparkContext(conf);
It gives me this error:
Error Message
16/04/14 16:08:42 ERROR NettyTransport: failed to bind to /192.168.XX.XX:9929, shutting down Netty transport
16/04/14 16:08:42 WARN Utils: Service 'sparkDriver' could not bind on port 9929. Attempting port 9930.
16/04/14 16:08:42 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
First of all, is this possible in Spark? I am using Spark 1.4. Thanks in advance.

Spark hangs on authentication with a Docker Mesos cluster

I'm trying to simulate a multi-node Mesos cluster using Docker and Zookeeper and trying to run a simple (py)Spark job on top of it. These Docker containers and the pyspark script are all run on the same machine. However, when I execute my Spark script, it hangs at:
No credentials provided. Attempting to register without authentication
The Mesos slave constantly outputs:
I0929 14:59:32.925915 62 slave.cpp:1959] Asked to shut down framework 20150929-143802-1224741292-5050-33-0060 by master#172.17.0.73:5050
W0929 14:59:32.926035 62 slave.cpp:1974] Cannot shut down unknown framework 20150929-143802-1224741292-5050-33-0060
And the Mesos master constantly outputs:
I0929 14:38:15.169683 39 master.cpp:2094] Received SUBSCRIBE call for framework 'test' at scheduler-2f4e1e52-a04a-401f-b9aa-1253554fe73b#127.0.1.1:46693
I0929 14:38:15.169845 39 master.cpp:2164] Subscribing framework test with checkpointing disabled and capabilities [ ]
E0929 14:38:15.170361 42 socket.hpp:174] Shutdown failed on fd=15: Transport endpoint is not connected [107]
I0929 14:38:15.170409 36 hierarchical.hpp:391] Added framework 20150929-143802-1224741292-5050-33-0000
I0929 14:38:15.170534 39 master.cpp:1051] Framework 20150929-143802-1224741292-5050-33-0000 (test) at scheduler-2f4e1e52-a04a-401f-b9aa-1253554fe73b#127.0.1.1:46693 disconnected
I0929 14:38:15.170549 39 master.cpp:2370] Disconnecting framework 20150929-143802-1224741292-5050-33-0000 (test) at scheduler-2f4e1e52-a04a-401f-b9aa-1253554fe73b#127.0.1.1:46693
I0929 14:38:15.170555 39 master.cpp:2394] Deactivating framework 20150929-143802-1224741292-5050-33-0000 (test) at scheduler-2f4e1e52-a04a-401f-b9aa-1253554fe73b#127.0.1.1:46693
E0929 14:38:15.170560 42 socket.hpp:174] Shutdown failed on fd=16: Transport endpoint is not connected [107]
I0929 14:38:15.170593 39 master.cpp:1075] Giving framework 20150929-143802-1224741292-5050-33-0000 (test) at scheduler-2f4e1e52-a04a-401f-b9aa-1253554fe73b#127.0.1.1:46693 0ns to failover
W0929 14:38:15.170835 41 master.cpp:4482] Master returning resources offered to framework 20150929-143802-1224741292-5050-33-0000 because the framework has terminated or is inactive
I0929 14:38:15.170855 36 hierarchical.hpp:474] Deactivated framework 20150929-143802-1224741292-5050-33-0000
I0929 14:38:15.170990 37 hierarchical.hpp:814] Recovered cpus(*):8; mem(*):31092; disk(*):443036; ports(*):[31000-32000] (total: cpus(*):8; mem(*):31092; disk(*):443036; ports(*):[31000-32000], allocated: ) on slave 20150929-051336-1224741292-5050-19-S0 from framework 20150929-143802-1224741292-5050-33-0000
I0929 14:38:15.171820 41 master.cpp:4469] Framework failover timeout, removing framework 20150929-143802-1224741292-5050-33-0000 (test) at scheduler-2f4e1e52-a04a-401f-b9aa-1253554fe73b#127.0.1.1:46693
I0929 14:38:15.171835 41 master.cpp:5112] Removing framework 20150929-143802-1224741292-5050-33-0000 (test) at scheduler-2f4e1e52-a04a-401f-b9aa-1253554fe73b#127.0.1.1:46693
I0929 14:38:15.172130 41 hierarchical.hpp:428] Removed framework 20150929-143802-1224741292-5050-33-0000
The Mesos master Docker image is built with the following Dockerfile
FROM ubuntu:14.04
ENV MESOS_V 0.24.0
# update
RUN apt-get update
RUN apt-get upgrade -y
# dependencies
RUN apt-get install -y wget openjdk-7-jdk build-essential python-dev python-boto libcurl4-nss-dev libsasl2-dev maven libapr1-dev libsvn-dev
# mesos
RUN wget http://www.apache.org/dist/mesos/${MESOS_V}/mesos-${MESOS_V}.tar.gz
RUN tar -zxf mesos-*.tar.gz
RUN rm mesos-*.tar.gz
RUN mv mesos-* mesos
WORKDIR mesos
RUN mkdir build
RUN ./configure
RUN make
RUN make install
RUN ldconfig
EXPOSE 5050
ENTRYPOINT ["/bin/bash"]
And I manually execute the mesos-master command:
LIBPROCESS_IP=${MASTER_IP} mesos-master --registry=in_memory --ip=${MASTER_IP} --zk=zk://172.17.0.75:2181/mesos --advertise_ip=${MASTER_IP}
The Mesos slave Docker image is built using the same Dockerfile except port 5051 is exposed instead. Then I run the following command in its container:
LIBPROCESS_IP=172.17.0.72 mesos-slave --master=zk://172.17.0.75:2181/mesos
The pyspark script is:
import os
import pyspark
src = 'file:///{}/README.md'.format(os.environ['SPARK_HOME'])
leader_ip = '172.17.0.75'
conf = pyspark.SparkConf()
conf.setMaster('mesos://zk://{}:2181/mesos'.format(leader_ip))
conf.set('spark.executor.uri', 'http://d3kbcqa49mib13.cloudfront.net/spark-1.5.0-bin-hadoop2.6.tgz')
conf.setAppName('my_test_app')
sc = pyspark.SparkContext(conf=conf)
lines = sc.textFile(src)
words = lines.flatMap(lambda x: x.split(' '))
word_count = (words.map(lambda x: (x, 1)).reduceByKey(lambda x, y: x+y))
print(word_count.collect())
Here is the complete output of the pyspark script:
15/09/29 11:07:59 INFO SparkContext: Running Spark version 1.5.0
15/09/29 11:07:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/29 11:07:59 WARN Utils: Your hostname, hubble resolves to a loopback address: 127.0.1.1; using 192.168.1.2 instead (on interface em1)
15/09/29 11:07:59 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/09/29 11:07:59 INFO SecurityManager: Changing view acls to: ftseng
15/09/29 11:07:59 INFO SecurityManager: Changing modify acls to: ftseng
15/09/29 11:07:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ftseng); users with modify permissions: Set(ftseng)
15/09/29 11:08:00 INFO Slf4jLogger: Slf4jLogger started
15/09/29 11:08:00 INFO Remoting: Starting remoting
15/09/29 11:08:00 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver#192.168.1.2:38860]
15/09/29 11:08:00 INFO Utils: Successfully started service 'sparkDriver' on port 38860.
15/09/29 11:08:00 INFO SparkEnv: Registering MapOutputTracker
15/09/29 11:08:00 INFO SparkEnv: Registering BlockManagerMaster
15/09/29 11:08:00 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-28695bd2-fc83-45f4-b0a0-eefcfb80a3b5
15/09/29 11:08:00 INFO MemoryStore: MemoryStore started with capacity 530.3 MB
15/09/29 11:08:00 INFO HttpFileServer: HTTP File server directory is /tmp/spark-89444c7a-725a-4454-87db-8873f4134580/httpd-341c3da9-16d5-43a4-93ee-0e8b47389fdb
15/09/29 11:08:00 INFO HttpServer: Starting HTTP Server
15/09/29 11:08:00 INFO Utils: Successfully started service 'HTTP file server' on port 51405.
15/09/29 11:08:00 INFO SparkEnv: Registering OutputCommitCoordinator
15/09/29 11:08:00 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/09/29 11:08:00 INFO SparkUI: Started SparkUI at http://192.168.1.2:4040
15/09/29 11:08:00 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO#log_env#712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO#log_env#716: Client environment:host.name=hubble
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO#log_env#723: Client environment:os.name=Linux
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO#log_env#724: Client environment:os.arch=3.19.0-25-generic
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO#log_env#725: Client environment:os.version=#26-Ubuntu SMP Fri Jul 24 21:17:31 UTC 2015
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO#log_env#733: Client environment:user.name=ftseng
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO#log_env#741: Client environment:user.home=/home/ftseng
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO#log_env#753: Client environment:user.dir=/home/ftseng
2015-09-29 11:08:00,651:32221(0x7fc09e17c700):ZOO_INFO#zookeeper_init#786: Initiating client connection, host=172.17.0.75:2181 sessionTimeout=10000 watcher=0x7fc0962b7176 sessionId=0 sessionPasswd=<null> context=0x7fc078001860 flags=0
I0929 11:08:00.651923 32328 sched.cpp:164] Version: 0.24.0
2015-09-29 11:08:00,652:32221(0x7fc06bfff700):ZOO_INFO#check_events#1703: initiated connection to server [172.17.0.75:2181]
2015-09-29 11:08:00,657:32221(0x7fc06bfff700):ZOO_INFO#check_events#1750: session establishment complete on server [172.17.0.75:2181], sessionId=0x150177fcfc40014, negotiated timeout=10000
I0929 11:08:00.658051 32322 group.cpp:331] Group process (group(1)#127.0.1.1:48692) connected to ZooKeeper
I0929 11:08:00.658083 32322 group.cpp:805] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0929 11:08:00.658100 32322 group.cpp:403] Trying to create path '/mesos' in ZooKeeper
I0929 11:08:00.659600 32326 detector.cpp:156] Detected a new leader: (id='2')
I0929 11:08:00.659904 32325 group.cpp:674] Trying to get '/mesos/json.info_0000000002' in ZooKeeper
I0929 11:08:00.661052 32326 detector.cpp:481] A new leading master (UPID=master#172.17.0.73:5050) is detected
I0929 11:08:00.661201 32320 sched.cpp:262] New master detected at master#172.17.0.73:5050
I0929 11:08:00.661798 32320 sched.cpp:272] No credentials provided. Attempting to register without authentication
After a lot more experimentation, it looks like it was an issue with the IP address of the host machine (using its local network address, 192.168.xx.xx) when it should have been using its Docker host IP (172.17.xx.xx).
I managed to get things running with:
LIBPROCESS_IP=172.17.xx.xx python test_spark.py
I'm now hitting a different error, but it seems unrelated, so I think this command solves my problem.
I'm not familiar enough with Mesos/Spark yet to understand why this fixes things, so if someone wants to add an explanation, that would be very helpful.
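For completeness, the workaround spelled out as environment exports (LIBPROCESS_IP is the part that actually fixed it here; SPARK_LOCAL_IP is only an assumption, prompted by the "Set SPARK_LOCAL_IP if you need to bind to another address" warning in the log above):
export LIBPROCESS_IP=172.17.xx.xx    # the Docker bridge address of the host, not the 192.168.xx.xx LAN address
export SPARK_LOCAL_IP=172.17.xx.xx   # assumption: may also be needed so the driver binds the same address
python test_spark.py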

Resources