I may have a dumb question. I am running Spark on a remote EC2 instance and I would like to use the web UI it offers. According to the official doc https://spark.apache.org/docs/latest/spark-standalone.html
I need to open http://localhost:8080 in my local browser. But when I do that, my Airflow UI opens instead. How do I point it to Spark? Any help is appreciated.
Also, according to this doc https://spark.apache.org/docs/latest/monitoring.html, I tried http://localhost:18080 but it did not work (I did all the settings needed to see the history server).
edit:
I have also tried the command sc.uiWebUrl in Spark, which gives a private DNS 'http://ip-***-**-**-***.ap-northeast-1.compute.internal:4040', but I am not sure how to use it.
I assume you SSH-ed into your EC2 instance with a command like this:
ssh -i /path/my-key-pair.pem my-instance-user-name@my-instance-public-dns-name
To connect to the Spark UI, you can add a port forwarding option to ssh:
ssh -L 8080:localhost:8080 -i /path/my-key-pair.pem my-instance-user-name@my-instance-public-dns-name
and then you can just open a browser on your local machine and go to http://localhost:8080.
If you need to forward multiple ports, you can chain the -L arguments, e.g.
ssh -L 8080:localhost:8080 -L 8081:localhost:8081 -i /path/my-key-pair.pem my-instance-user-name@my-instance-public-dns-name
Note: check that the Spark port number is correct; it is sometimes 4040 and sometimes 8080, depending on how you deployed Spark.
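If you're not sure which port your deployment actually used, one way to check (assuming the standalone daemons write to the default $SPARK_HOME/logs directory; adjust the path if yours differs) is to grep the logs on the EC2 instance for the lines Spark prints when it binds its web UIs:
# On the remote EC2 instance: find the UI addresses Spark logged at startup.
# The log directory is an assumption; point it at wherever your logs actually live.
grep -R -iE "SparkUI|WebUI" "$SPARK_HOME/logs/" 2>/dev/null
# A driver started via pyspark/spark-shell logs a similar line to stderr, e.g.
#   INFO SparkUI: Started SparkUI at http://ip-xxx-xx-xx-xxx...:4040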
Related
I am running a stand-alone Spark 3.2.1 locally on my mac, installed via brew. This is for low-cost (free) unit-testing purposes. I am starting this instance via the pyspark command from the terminal and I am able to access the instance's web UI.
I am also trying to run spark-submit locally (from the same mac) to run a pyspark script against the instance described above. When specifying --master with port 7077 I am getting a "connection refused" error. It does not look like port 7077 is open on my mac.
How do I open port 7077 on my mac such that I can access it from my mac via spark-submit, but other machines on the same network cannot?
Could someone share clear steps with explanations?
Much appreciated :)
Michael
Check that your Spark master process is running.
The output of jps should look like the following:
jps
$PID Master
$PID Worker
If the Spark processes are not running,
run the script $SPARK_HOME/sbin/start-master.sh in your shell first,
and also $SPARK_HOME/sbin/start-worker.sh.
Then check whether a process is listening on port 7077 with the following command:
sudo lsof -nP -i:7077 | grep LISTEN
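On the second part of the question (making port 7077 reachable from your own mac but not from other machines on the network), one option, assuming a standalone master started via the sbin scripts above, is to bind the master to the loopback address. This is just a sketch, and my_script.py is a placeholder:
# Bind the standalone master (and its 7077 port) to loopback only,
# so other machines on the network cannot connect to it.
export SPARK_MASTER_HOST=127.0.0.1
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-worker.sh spark://127.0.0.1:7077
# Submit against the loopback master URL from the same machine:
spark-submit --master spark://127.0.0.1:7077 my_script.py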
I'm running my Spark application in an OpenShift container. The application runs for almost 2-4 hours. I do get the message that the Spark UI started at http://hostname:4040, but when I click on it I get "webpage not found", even though the application is still running.
You should try port forwarding: https://docs.openshift.com/enterprise/3.0/dev_guide/port_forwarding.html
oc port-forward -p mypod 4040:4040
then you can use the same URL, with localhost (or whatever the hostname is).
Maybe this will help.
While running your application, you can specify the port you want the web ui to run on.
For example,
spark2-shell --conf spark.ui.port=4040
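The same property can be passed to spark-submit if that's how the job is launched; a minimal sketch, where your_app.py is just a placeholder:
# If the chosen port is busy, Spark retries the next ports (4041, 4042, ...).
spark-submit --conf spark.ui.port=4040 your_app.py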
I am running an AWS EMR cluster with Spark (1.3.1) installed via the EMR console dropdown. Spark is up and processing data, but I am trying to find out which port has been assigned to the web UI. I've tried port forwarding both 4040 and 8080 with no connection. I'm forwarding like so:
ssh -i ~/KEY.pem -L 8080:localhost:8080 hadoop@EMR_DNS
1) How do I find out what the Spark WebUI's assigned port is?
2) How do I verify the Spark WebUI is running?
Spark on EMR is configured for YARN, thus the Spark UI is available via the application URL provided by the YARN ResourceManager (http://spark.apache.org/docs/latest/monitoring.html). So the easiest way to get to it is to set up your browser with SOCKS using a port opened by SSH, then from the EMR console open the ResourceManager and click the Application Master URL shown to the right of the running application. The Spark History Server is available at the default port 18080.
An example of SOCKS with EMR is at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-web-interfaces.html
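If you do go the SOCKS route, the tunnel itself is just a dynamic port forward over SSH; here is a sketch (8157 is only the local port used in the AWS doc, and the key path and DNS name are placeholders):
# Open a SOCKS proxy on local port 8157 through the EMR master node.
ssh -i ~/KEY.pem -N -D 8157 hadoop@EMR_MASTER_PUBLIC_DNS
# Then point your browser's SOCKS v5 proxy at localhost:8157 (e.g. with FoxyProxy,
# as described in the AWS doc above) and open the ResourceManager / Application Master
# links from the EMR console.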
Here is an alternative if you don't want to deal with the browser setup with SOCKS as suggested in the EMR docs.
Open an SSH tunnel to the master node with port forwarding to the machine running the Spark UI:
ssh -i path/to/aws.pem -L 4040:SPARK_UI_NODE_URL:4040 hadoop@MASTER_URL
MASTER_URL (EMR_DNS in the question) is the URL of the master node, which you can get from the EMR Management Console page for the cluster.
SPARK_UI_NODE_URL can be seen near the top of the stderr log. The log line will look something like:
16/04/28 21:24:46 INFO SparkUI: Started SparkUI at http://10.2.5.197:4040
Point your browser to localhost:4040
Tried this on EMR 4.6 running Spark 1.6.1
Glad to announce that this feature is finally available on AWS. You won't need to run any special commands (or configure an SSH tunnel):
By clicking on the link to the Spark history server UI, you'll be able to see the old applications' logs, or to access the running Spark job's UI:
For more details: https://docs.aws.amazon.com/emr/latest/ManagementGuide/app-history-spark-UI.html
I hope it helps!
Just run the following command:
ssh -i /your-path/aws.pem -N -L 20888:ip-172-31-42-70.your-region.compute.internal:20888 hadoop#ec2-xxx.compute.amazonaws.com.cn
There are 3 places you need to change:
your .pem file
your internal master node IP
your public DNS domain.
Finally, on the YARN UI you can click your Spark Application Tracking URL, then just replace the URL:
"http://your-internal-ip:20888/proxy/application_1558059200084_0002/"
->
"http://localhost:20888/proxy/application_1558059200084_0002/"
It worked for EMR 5.x
Simply use an SSH tunnel
On your local machine do:
ssh -i /path/to/pem -L 3000:ec2-xxxxcompute-1.amazonaws.com:8088 hadoop#ec2-xxxxcompute-1.amazonaws.com
On your local machine browser hit:
localhost:3000
I successfully deployed a CouchDB cartridge to WSO2 Stratos and the member got activated successfully. For the implementation of the Dockerfile I used this git code, which includes the lines below that I have no idea why they are there! Can someone explain the code below?
RUN printf "[httpd]\nport = 8101\nbind_address = 0.0.0.0" > /usr/local/etc/couchdb/local.d/docker.ini
EXPOSE 8101
CMD ["/usr/local/bin/couchdb"]
I tried pointing to the http://127.0.0.1:5984/_utils/spec/run.html URL and it's working perfectly.
I just SSH into the docker container and start couchdb:
root#instance-00000001:/usr/local/etc/couchdb/local.d# couchdb couchdb
Apache CouchDB 1.6.1 (LogLevel=info) is starting.
Apache CouchDB has started. Time to relax.
[info] [<0.32.0>] Apache CouchDB has started on http://0.0.0.0:8101/
Then I try pointing the browser to http://0.0.0.0:8101/ and http://127.0.0.1:5984/_utils/index.html, but neither of them is working.
Can someone tell me why I can't view my databases and the create-database window?
For your first question about what those lines do:
# Set port and address for couchdb to bind too.
# Remember these are addresses inside the container
# and not necessarily publicly available.
# See http://docs.couchdb.org/en/latest/config/http.html
RUN printf "[httpd]\nport = 8101\nbind_address = 0.0.0.0" > /usr/local/etc/couchdb/local.d/docker.ini
# Tell docker that this port needs to be exposed.
# You still need to run -P when running container
EXPOSE 8101
# This is the command which is run automatically when container is run
CMD ["/usr/local/bin/couchdb"]
As for why you cannot access it: what does your docker run command look like? Did you expose the port? i.e.
docker run -p 8101:8101 ....
Are you by any chance testing on OSX? If so, try http://192.168.59.103:8101/. On OSX, Docker runs inside a VirtualBox VM because Docker cannot run natively on OSX. The IP of the virtual machine can be looked up using boot2docker ip and is normally 192.168.59.103.
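To check whether the container is actually answering on that port, CouchDB returns a small JSON welcome document on its root URL; a quick sketch from the OSX host (use localhost instead if you are not on OSX):
# boot2docker ip prints the VM's address (typically 192.168.59.103).
curl http://$(boot2docker ip):8101/
# Expected output, roughly: {"couchdb":"Welcome","version":"1.6.1",...}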
I have been using proftpd on Ubuntu inside a Docker container. It logs in successfully but fails to get the directory listing.
Here is the screenshot of Filezilla
And a screenshot of the proftpd log file
Any help?
The problem is that proftpd advertises the internal IP address (172....), so the client cannot connect to it.
You can solve this by setting (in the proftpd.conf)
MasqueradeAddress externalIP
or by running the container using:
docker run --net=host .....
This option uses the host's network stack, so passive mode will work fine.
Make sure to expose the configured passive ports (e.g. PassivePorts 60000 65534) on the running container to allow incoming connections.
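Putting the two pieces together, here is a rough sketch of how the config and the published ports might line up; the public IP, config file path, image name, and the (deliberately small) passive range are placeholders:
# Tell proftpd which address to advertise and which passive port range to use.
# /etc/proftpd/proftpd.conf is a common default path; adjust to your image.
printf "MasqueradeAddress YOUR.PUBLIC.IP\nPassivePorts 60000 60100\n" >> /etc/proftpd/proftpd.conf
# Publish the control port and the same passive range when starting the container.
docker run -d -p 21:21 -p 60000-60100:60000-60100 my-proftpd-image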
It looks like the ftpd is having a permissions problem changing the running user of some sort.
Try setting the ftpd to run as the user you are logging in with, using Docker's USER instruction (e.g. USER userftp, see https://docs.docker.com/reference/builder/#user) in your Dockerfile.
Remember that you can make it listen on a port > 1024 inside the container and use -p 21:2121 when starting the container to make it reachable on port 21 from the outside world.
It would be helpful if you posted the Dockerfile and configuration you are using so we can test this out ourselves.