Hadoop components do not start on the slave machine - Linux

I am trying to set up a multi-node Hadoop cluster with my two laptops, following Michael Noll's tutorial. The OS on both machines is Ubuntu 14.04.
I managed to set up a single-node cluster on each of the two laptops, but when I try to start the multi-node cluster (after all the necessary modifications as instructed in the tutorial) using sbin/start-all.sh on my master, the slave does not react at all. All five components start on the master, but not a single one starts on the slave.
My /etc/hosts on both PCs looks like this:
127.0.0.1 localhost
192.168.178.01 master
192.168.178.02 slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
(Furthermore, in /usr/local/hadoop/etc/hadoop there was no file called master, so I created it using: touch /usr/local/hadoop/etc/hadoop/master)
Then, when I run sbin/start-all.sh, I see the following:
hduser@master:/usr/local/hadoop$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
15/05/17 21:21:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-master.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-master.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-master.out
15/05/17 21:21:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-master.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-master.out
hduser@master:/usr/local/hadoop$ jps
3716 DataNode
3915 SecondaryNameNode
4522 Jps
3553 NameNode
4210 NodeManager
4073 ResourceManager
hduser@master:/usr/local/hadoop$
What is interesting here is that line 6 of that output says localhost. Shouldn't it say master?
I can connect from the master to the slave password-lessly using ssh slave and control the slave machine, but sbin/start-all.sh still does not start any Hadoop components on the slave.
Very interestingly, if I run sbin/start-all.sh on the slave, it starts the NameNode on the master (!!!) and starts the NodeManager and ResourceManager on the slave itself.
Can someone help me to properly start the multi-node cluster?
P.S.: I looked at this, but in my case the location of the Hadoop home directory is identical on both machines.

There can be several things to check:
Check that you can connect password-lessly with ssh from the slave to the master. Here is a link that teaches how to do it.
Is the hostname of each machine correct?
Is the /etc/hosts file identical on both the master and the slave?
Have you checked the IP of both machines with ifconfig -a? Are they the ones you expected?
Have you changed all the configuration files on the slave machine, so that instead of localhost they now say the master's hostname? You should search for the word localhost (and the like) in all the files in your $HADOOP_HOME directory, because there are several files for configuring all sorts of things and it is very easy to forget one. Something like this: sudo grep -Ril "localhost" /usr/local/hadoop/etc/hadoop
Check the same on the master, so that instead of localhost it says the master's hostname.
You should remove the localhost entry from the /etc/hosts file on the slave machine. Sometimes that entry, so typical of the Hadoop tutorials, can lead to problems.
On the slave host, the masters and slaves files should say only "slave"; on the master host, the masters file should say "master" and the slaves file should say "slave" (see the sketch after this list).
You should format your filesystem on both nodes before starting Hadoop.
Those are all the problems I remember having when I did the same thing you are doing right now. Check whether some of them help you!
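To make that checklist concrete, here is a minimal sketch of what the relevant files could look like for a two-node setup like yours. The property names and ports follow stock Hadoop 2.x defaults and are only an illustration; keep whatever values your tutorial uses.
/usr/local/hadoop/etc/hadoop/slaves (on the master; one worker hostname per line, keep master here only if it should also run a DataNode):
master
slave
/usr/local/hadoop/etc/hadoop/core-site.xml (on BOTH nodes, so the slave knows where the NameNode runs):
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
/usr/local/hadoop/etc/hadoop/yarn-site.xml (on BOTH nodes):
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
Then, from the master only, reformat HDFS and restart:
bin/hdfs namenode -format
sbin/start-dfs.sh && sbin/start-yarn.sh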

Related

Start Spark master on the IP instead of Hostname

I'm trying to set up a remote Spark 2.4.5 cluster on Ubuntu 18. After I start ./sbin/start-master.sh, the WebUI is available at <INSTANCE-IP>:8080, but it shows "Spark Master at spark://spark-master:7077", where spark-master is my hostname on the remote machine.
I'm able to start a worker only with ./sbin/start-slave.sh spark://spark-master:7077, but <INSTANCE-IP>:4040 doesn't work. When I try ./sbin/start-slave.sh spark://<INSTANCE-IP>:7077 I can see the process, but the worker is not visible in the WebUI.
As a result, I cannot connect to the cluster from my local machine with spark-shell --master spark://<INSTANCE-IP>:7077. The error is:
StandaloneAppClient$ClientEndpoint: Failed to connect to master <INSTANCE-IP>:7077
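For context, the standalone scripts read conf/spark-env.sh on start-up, so the documented way to pin the master to an address rather than the hostname is the SPARK_MASTER_HOST variable. A minimal sketch, with <INSTANCE-IP> as a placeholder:
# conf/spark-env.sh on the remote instance
export SPARK_MASTER_HOST=<INSTANCE-IP>
# then start the master and a worker against that address
./sbin/start-master.sh
./sbin/start-slave.sh spark://<INSTANCE-IP>:7077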

Can't access SparkUI through YARN

I'm building a Docker image to run Zeppelin or spark-shell locally against a production Hadoop cluster with YARN. Edit: the environment was macOS.
I can execute jobs or a spark-shell fine, but when I try to access the Tracking URL on YARN while a job is running, the YARN UI hangs for exactly 10 minutes. YARN keeps working, and if I connect via ssh I can execute yarn commands.
If I don't access the SparkUI (directly or through YARN), nothing happens: jobs are executed and the YARN UI does not hang.
More info:
Local, on Docker: Spark 2.1.2, Hadoop 2.6.0-cdh5.4.3
Production: Spark 2.1.0, Hadoop 2.6.0-cdh5.4.3
If I execute it locally (--master local[*]) it works and I can connect to the SparkUI through port 4040.
Spark config:
spark.driver.bindAddress 172.17.0.2 #docker_eth0_ip
spark.driver.host 192.168.XXX.XXX #local_ip
spark.driver.port 5001
spark.ui.port 4040
spark.blockManager.port 5003
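For context, this setup relies on the container publishing those driver/UI/block manager ports on the host; a hedged sketch of the kind of docker run mapping involved (the image name is a placeholder):
docker run -it \
  -p 5001:5001 -p 4040:4040 -p 5003:5003 \
  my-spark-client-image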
Yes, the ApplicationMaster and the nodes have visibility of my local SparkUI and driver (telnet test).
As I said, I can execute jobs, so the ports exposed by Docker and their binding are working. Some logs proving it:
INFO ApplicationMaster: Driver now available: 192.168.XXX.XXX:5001
INFO TransportClientFactory: Successfully created connection to /192.168.XXX.XXX:5001 after 65 ms (0 ms spent in bootstraps)
INFO ApplicationMaster$AMEndpoint: Add WebUI Filter. AddWebUIFilter(org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,Map(PROXY_HOSTS -> jobtracker.hadoop, PROXY_URI_BASES -> http://jobtracker.hadoop:8088/proxy/application_000_000),/proxy/application_000_000)
Any ideas, or pointers on where I can look to see what's happening?
The problem was related to how Docker manages incoming IP requests when it is executed on macOS.
When YARN, which is running inside the Docker container, receives a request, it doesn't see the original IP; it sees the internal Docker proxy IP (in my case 172.17.0.1).
When a request is sent to my local container's SparkUI, it automatically redirects the request to the Hadoop master (that is how YARN works), because it sees that the request is not coming from the Hadoop master, and it only accepts requests from that source.
When the master receives the forwarded request, it tries to send it to the Spark driver (my local Docker container), which forwards the request to the Hadoop master again, because it sees that the source IP is not the master but the proxy IP.
This takes up all the threads reserved for the UI. Until the threads are released, the YARN UI stays hung.
I "solved" it by changing the Docker YARN configuration:
<property>
<name>yarn.web-proxy.address</name>
<value>172.17.0.1</value>
</property>
This allows the SparkUI to handle any request made to the Docker container.
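For anyone hitting the same issue: that property goes in yarn-site.xml inside the container (yarn.web-proxy.address is a standard YARN setting), and the ResourceManager, or the standalone web proxy if you run one, has to be restarted for the change to take effect.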

Failure to connect remotely to a Spark master node inside a Docker container

I created a Spark cluster based on this link.
Everything went smoothly, but the problem is that after the cluster was created, I'm trying to use pyspark to connect remotely, from another machine, to the container inside the host.
I'm receiving 18/04/04 17:14:48 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master xxxx.xxxx:7077 even though I can connect through telnet to port 7077 on that host!
What may I be missing?
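For reference, the kind of remote connection being attempted looks roughly like this (a sketch with placeholder values; the spark.driver.host setting is only illustrative and matters when the cluster needs to reach back to the client machine):
pyspark --master spark://xxxx.xxxx:7077 \
  --conf spark.driver.host=<CLIENT-IP>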

Error initialising SparkContext when running Spark on Bash on Ubuntu on Windows

I am attempting to install and configure Spark 2.0.1 on Bash on Ubuntu on Windows. I followed the instructions at Apache Spark - Installation and everything seemed to install OK; however, when I run spark-shell this happens:
16/11/06 11:25:47 ERROR SparkContext: Error initializing SparkContext.
java.net.SocketException: Invalid argument
at sun.nio.ch.Net.listen(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:224)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
Immediately prior to that error I see a warning which may or may not be related:
16/11/06 11:25:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/06 11:25:47 WARN Utils: Your hostname, DESKTOP-IKGIG97 resolves to a loopback address: 127.0.0.1; using 151.127.0.0 instead (on interface wifi0)
16/11/06 11:25:47 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
I'm a bit of a noob with Linux, I must admit, so I'm a bit clueless as to what to do next. In case it matters, here are the contents of /etc/hosts:
127.0.0.1 localhost
127.0.0.1 DESKTOP-IKGIG97
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
Hoping someone here can identify my issue. What do I need to do to investigate and fix this error?
As the error suggests, set SPARK_LOCAL_IP in the conf/spark-env.sh script in the directory where Spark is installed.
http://spark.apache.org/docs/2.0.1/configuration.html
Environment Variables
Certain Spark settings can be configured through environment variables, which are read from the conf/spark-env.sh script in the directory where Spark is installed (or conf/spark-env.cmd on Windows). In Standalone and Mesos modes, this file can give machine specific information such as hostnames. It is also sourced when running local Spark applications or submission scripts.
Note that conf/spark-env.sh does not exist by default when Spark is installed. However, you can copy conf/spark-env.sh.template to create it. Make sure you make the copy executable.
The following variables can be set in spark-env.sh:
SPARK_LOCAL_IP - IP address of the machine to bind to.
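A minimal sketch of what that might look like (192.168.1.10 is just a placeholder; use the address your machine actually has):
cp conf/spark-env.sh.template conf/spark-env.sh
# then add to conf/spark-env.sh:
export SPARK_LOCAL_IP=192.168.1.10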
If it doesn't solve your problem, please share the output of running the following Unix command:
ifconfig

Datastax Spark worker is always looking for master at 127.0.0.1

I am trying to bring up DataStax Cassandra in analytics mode by using "dse cassandra -k -s". I am using the DSE 5.0 sandbox on a single-node setup.
I have configured the spark-env.sh with SPARK_MASTER_IP as well as SPARK_LOCAL_IP to point to my LAN IP.
export SPARK_LOCAL_IP="172.40.9.79"
export SPARK_MASTER_HOST="172.40.9.79"
export SPARK_WORKER_HOST="172.40.9.79"
export SPARK_MASTER_IP="172.40.9.79"
All the above variables are set in spark-env.sh.
Despite this, the worker will not come up. It is always looking for a master at 127.0.0.1. This is the error I am seeing in /var/log/cassandra/system.log:
WARN [worker-register-master-threadpool-8] 2016-10-04 08:02:45,832 SPARK-WORKER Logging.scala:91 - Failed to connect to master 127.0.0.1:7077
java.io.IOException: Failed to connect to /127.0.0.1:7077
Result from dse client-tool shows 127.0.0.1
$ dse client-tool -u cassandra -p cassandra spark master-address
spark://127.0.0.1:7077
However, I am able to access the Spark web UI at the LAN IP 172.40.9.79:
Spark Web UI screenshot
Any help is greatly appreciated
Try adding this parameter in the spark-defaults.conf file: spark.master local[*] or spark.master 172.40.9.79. Maybe this solves your problem.
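For reference, in conf/spark-defaults.conf that would be one of the following lines; note that for a standalone master the value Spark expects is the full spark:// URL (a sketch based on the IP from the question):
spark.master  local[*]
spark.master  spark://172.40.9.79:7077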
