connection exception when using hadoop2 (YARN) - linux

I have set up Hadoop (YARN) on Ubuntu. The resource manager appears to be running. When I run the hadoop fs -ls command, I receive the following error:
14/09/22 15:52:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: Call From ubuntu-8.abcd/xxx.xxx.xxx.xxxx to ubuntu-8.testMachine:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
I checked the URL suggested in the error message but could not figure out how to resolve the issue. I have tried setting the external IP address (as opposed to localhost) in my core-site.xml file (in etc/hadoop), but that has not resolved the issue. IPv6 has been disabled on the box. I am running the process as hduser (which has read/write access to the directory). Any thoughts on fixing this? I am running this on a single node.
bashrc
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_INSTALL=/usr/local/hadoop/hadoop-2.5.1
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_YARN_HOME=$HADOOP_INSTALL ##added because I was not sure about the line below
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END

Your issue is not related to YARN; it is an HDFS-level problem.
There is a question here with a similar situation: the person who asked it had port 9000 listening on the external IP interface while the configuration pointed to localhost. I'd advise first checking whether anything is listening on port 9000 at all, and on which interface. It looks like you have the service listening on one IP interface while you are connecting to another. From your logs, your client is trying ubuntu-8.testMachine:9000. To what IP does that name resolve? If /etc/hosts maps it to 127.0.0.1, you could have the situation from the question I mentioned - the client tries 127.0.0.1 while the service is listening on the external IP - or the other way around. There is also a good default port mapping table for Hadoop services.
Indeed, many similar cases share the same root cause: wrongly configured host interfaces. People often set their workstation's hostname and map that hostname to localhost in /etc/hosts. Worse, they write the short name first and only then the FQDN, which means the IP resolves to the short hostname while the FQDN resolves to the IP (non-symmetric).
This in turn leads to situations where services are started on the local 127.0.0.1 interface and people run into serious connectivity issues (are you surprised? :-) ).
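If you want to check both things at once, here is a minimal Python sketch (assuming Python 3 on the box; the hostname and port are taken from your error message) that prints what the NameNode hostname resolves to and whether anything accepts TCP connections there:
# Minimal check: what does the NameNode hostname resolve to, and does anything
# accept TCP connections on port 9000 at that address?
import socket

host, port = "ubuntu-8.testMachine", 9000   # taken from the error message

ip = socket.gethostbyname(host)
print(f"{host} resolves to {ip}")           # 127.0.0.1 here often explains the refusal

try:
    with socket.create_connection((ip, port), timeout=3):
        print(f"something is listening on {ip}:{port}")
except OSError as exc:
    print(f"connection to {ip}:{port} failed: {exc}")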
The right approach (at least the one I encourage, based on experience):
Assign at least one external interface that is visible to your cluster clients. If you use DHCP and don't want a static IP, bind the IP to the MAC address so it effectively stays constant.
Write the local hostname into /etc/hosts so it matches the external interface: FQDN first, then the short name.
If you can, make your DNS resolver resolve your FQDN to your IP. Don't worry about the short name.
For example, if your external IP interface is 1.2.3.4 and your FQDN (fully qualified domain name) is myhost.com, then your /etc/hosts record MUST look like:
1.2.3.4 myhost.com myhost
And yes, it's better if your DNS resolver knows about your name. Check both forward and reverse resolution with:
host myhost.com
host 1.2.3.4
Yes, clustering is not so easy in terms of network administration ;-). It never has been and never will be.

Be sure that you have started everything necessary: type start-all.sh; this command starts all the services needed to connect to Hadoop.
After that, type jps to see all the services running under Hadoop, and finally check the ports these services have opened with netstat -plnet | grep java.
Hope this solves your issue.

Related

getting hostname of remote computers on the local network not set up in /etc/hosts

I learned something new while trying to get a hostname using Python's socket module.
From my MacBook, I ran the code below:
socket.gethostbyaddr("192.168.1.111")
and I got ('rock64', [], ['192.168.1.111']). Then I tried the IP address of a computer that used to be on the network but isn't anymore:
socket.gethostbyaddr("192.168.1.189")
and it returned ('mint', [], ['192.168.1.189']). Then I realised it's coming from the /etc/hosts file.
Now, in that hosts file I also have this entry:
/etc/hosts
172.217.25.3 google.com.hk
But if I try to get the host from a WAN IP address, I get a different result than expected!
socket.gethostbyaddr("172.217.25.3")
That returns ('hkg07s24-in-f3.1e100.net', ['3.25.217.172.in-addr.arpa'], ['172.217.25.3']).
So I am now wondering where, in the latter case of the WAN IP address, the hostname comes from, and why for local computer IPs I get the hostname from the configured /etc/hosts file?
How can we get the hostnames of computers on the local network without socket.gethostbyaddr having to look into the /etc/hosts file, or by some other means?
This is an opinion-based answer to the question "how to build a registry of network devices on your local network?"
The best way to build a registry of devices on your local network is to set up ntopng on your gateway. It uses DPI (Deep Packet Inspection) techniques to collect information about hosts.
ntopng has a nice user interface and displays host names (when possible).
You can assign aliases to specific hosts that do not leak their host names via any protocol.
For some reason the ntopng developers did not include the alias in the JSON response for the request http://YOUR-SERVER:3000/lua/host_get_json.lua?ifid=2&host=IP-OF-DEVICE .
You can add it manually by adding the lines require "mac_utils" and hj["alias"]=getDeviceName(hj["mac_address"]) to the file /usr/share/ntopng/scripts/lua/host_get_json.lua.
You can use the REST API to interrogate ntopng and use the information it provides to build any script you need.
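As a rough illustration of that REST call, here is a hedged Python sketch; it assumes the endpoint above, interface id 2, an example device IP, and that your ntopng instance accepts HTTP basic authentication (adjust the URL and credentials for your setup):
# Query the host_get_json.lua endpoint mentioned above for one device.
import requests

NTOPNG = "http://YOUR-SERVER:3000"             # placeholder host from the URL above
params = {"ifid": 2, "host": "192.168.1.111"}  # example device IP

resp = requests.get(f"{NTOPNG}/lua/host_get_json.lua",
                    params=params,
                    auth=("admin", "admin"),   # replace with your ntopng credentials
                    timeout=5)
resp.raise_for_status()
info = resp.json()
print(info.get("mac_address"), info.get("alias"))  # "alias" only appears with the patch above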

Alternative to glibc Library call res_ninit for getting DNS details over DHCP

Is there a good API alternative to res_ninit?
There are problems with this call because res->_u.ext.nscount6 and res->nscount do not reflect the correct counts. Adding an IPv6 address to /etc/resolv.conf still results in nscount increasing where you would have expected nscount6 to increase.
An older glibc version seems to increase both nscount and nscount6 for an IPv6 address in /etc/resolv.conf.
I am currently parsing resolv.conf directly because I cannot depend on the res_ninit call. This is fine for manually configured DNS.
When it comes to DNS obtained over DHCP, though, I need an API to give me the result. There is no other way (that I can think of) to determine the DNS IP addresses assigned over DHCP.
I have tried posting in other places on the board but with no help so far, e.g.
Retrieve IPv4 and IPv6 nameservers programmatically
res_ninit and res_init only ever read name server information from /etc/resolv.conf. You can always get the same name servers by parsing /etc/resolv.conf yourself and examining the nameserver lines. If there is no nameserver line, the default 127.0.0.1 is used.
I don't think it is necessary to provide an API for that because the file format is so simple that it is likely more difficult to use an API than to read the file directly.
Name server assignment over DHCP is implemented by rewriting /etc/resolv.conf if there is no local caching resolver running on the machine. The exact mechanism used for that is distribution-specific, e.g. Debian uses resolvconf if it is installed.
If a local caching resolver is running on the system (such as dnsmasq or Unbound), name servers over DHCP can be directly configured in that caching resolver. In this case, /etc/resolv.conf will keep pointing to the same name server, typically by listing nameserver 127.0.0.1 or no name server information at all (which is the default).
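For the manual parsing suggested above, here is a minimal sketch (Python only for brevity; the same logic is trivial in C): collect the addresses from the nameserver lines and fall back to 127.0.0.1 when none are present, matching the resolver's default.
# Read the "nameserver" lines from resolv.conf; default to 127.0.0.1 like the resolver does.
def read_nameservers(path="/etc/resolv.conf"):
    servers = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            # blank lines and comments do not start with the "nameserver" keyword
            if len(parts) >= 2 and parts[0] == "nameserver":
                servers.append(parts[1])
    return servers or ["127.0.0.1"]

print(read_nameservers())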

Spark: Unable to Limit WebUI to localhost interface

I've searched around the mailing list and the SO spark tag, but it would seem that (nearly?) everyone has the opposite problem to mine. I took a stab at looking in the source for an answer, but I figured I might as well see if anyone else has run into the same problem.
I'm trying to limit my Master/Worker UI to run only on localhost. As it stands, I have the following two environment variables set in my spark-env.sh:
SPARK_LOCAL_IP=127.0.0.1
SPARK_MASTER_IP=127.0.0.1
and my slaves file contains one line: 127.0.0.1
The problem is that when I run start-all.sh, I can nmap my box's public interface and get the following:
PORT STATE SERVICE
22/tcp open ssh
8080/tcp open http-proxy
8081/tcp open blackice-icecap
Furthermore, I can go to my box's public IP at port 8080 in my browser and get the master node's UI. The UI even reports the URL/REST URL to be 127.0.0.1:
Spark Master at spark://127.0.0.1:7077
URL: spark://127.0.0.1:7077
REST URL: spark://127.0.0.1:6066 (cluster mode)
I'd rather not have spark available in any way to the outside world without an explicit SSH tunnel.
There are variables to do with setting the Web UI port, but I'm not concerned with the port, only the network interface to which the Web UI binds.
Any help would be greatly appreciated.
For Spark 1.6, do the following:
open core/src/main/scala/org/apache/spark/ui/WebUI.scala
find the line 'serverInfo = Some(startJettyServer("0.0.0.0", port, handlers, conf, name))'
change "0.0.0.0" in that line to a hostname you have defined on your LAN
After this you can access the WebUI over the LAN or through an SSH tunnel.
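If you want to verify the result after restarting, here is a small Python sketch (the public IP below is a placeholder for your box's external address) that checks whether the master UI port answers on loopback but not on the public interface:
# Probe port 8080 on each interface and report whether it accepts connections.
import socket

def is_open(host, port, timeout=2):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

PUBLIC_IP = "192.0.2.10"                       # placeholder: your box's external IP
for host in ("127.0.0.1", PUBLIC_IP):
    print(host, 8080, "open" if is_open(host, 8080) else "closed")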

Restcomm cluster: nodes list is empty in sip-balancer

I'm trying to create a Restcomm cluster: a sip-balancer plus a few Restcomm instances, but I can't connect a Restcomm node to the sip-loadbalancer.
I followed this tutorial - http://docs.telestax.com/sip-servlets-clustering-high-availability/ - but haven't gotten any result.
It seems there should be two steps:
change the path-name attribute in standalone/configuration/standalone-sip.xml
add org.mobicents.ha.javax.sip.BALANCERS to standalone/configuration/mss-sip-stack.properties
As I understand it, the node and the loadbalancer use RMI as the channel. I can see (using netstat) that the server listens on port 2000 and the node establishes a connection to it.
But when I try to use the loadbalancer from a SIP client, it returns "error 500 - no available nodes".
I also used a remote debugger - the nodes list is empty.
Have I missed something?
P.S. I used a Docker Restcomm instance and the sip-loadbalancer on the same machine.
Thanks,
So, I have found my issue.
According to the log file on the Restcomm node, it can't connect to the balancer over RMI.
The connection error is very strange - Connection refused to host: 127.0.0.1 and sometimes Connection refused to host: 127.0.1.1.
Yesterday I tried to specify java.rmi.server.hostname, but it did not help.
Today I created a small RMI client for the balancer, and it worked from my local machine (the balancer is hosted on it too). However, the app did not work from the virtual machine, so I added more logging to the code and found:
the app can look up the remote bean
the remote endpoint of this bean is 127.0.0.1, but it should be the IP address of the remote machine
After that I specified externalHost and public-ip for my sip-balancer and got a bean endpoint address of 127.0.1.1.
So the issue was found: Ubuntu uses this "local" IP address for your machine name.
You can find it in /etc/hosts.
The sip-balancer (a Java application) picks it up as the endpoint IP address for its services.
My fix: change 127.0.1.1 to 127.0.0.1 in /etc/hosts. After that, the sip-balancer provides the real IP address of your machine for remote objects.
Conclusion: my issue was the wrong operating system :)
General solution: developers should check the address type and not use loopback addresses.
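As a sketch of that advice (a hypothetical helper, not part of the balancer code): resolve the local hostname and refuse to advertise it if the result is a loopback address.
# Refuse to advertise an endpoint whose hostname resolves to a loopback address.
import ipaddress
import socket

def advertised_address(hostname):
    ip = ipaddress.ip_address(socket.gethostbyname(hostname))
    if ip.is_loopback:
        raise ValueError(f"{hostname} resolves to loopback address {ip}; "
                         "fix /etc/hosts or configure an external address explicitly")
    return str(ip)

print(advertised_address(socket.gethostname()))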

UnknownHostException on tasktracker in Hadoop cluster

I have set up a pseudo-distributed Hadoop cluster (with jobtracker, a tasktracker, and namenode all on the same box) per tutorial instructions and it's working fine. I am now trying to add in a second node to this cluster as another tasktracker.
When I examine the logs on Node 2, all the logs look fine except for the tasktracker. I'm getting an infinite loop of the error message listed below. It seems that the Task Tracker is trying to use the hostname SSP-SANDBOX-1.mysite.com rather than the IP address. This hostname is not in /etc/hosts, so I'm guessing this is where the problem is coming from. I do not have root access to add it to /etc/hosts.
Is there any property or configuration I can change so that it will stop trying to connect using the hostname?
Thanks very much,
2011-01-18 17:43:22,896 ERROR org.apache.hadoop.mapred.TaskTracker:
Caught exception: java.net.UnknownHostException: unknown host: SSP-SANDBOX-1.mysite.com
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
at org.apache.hadoop.ipc.Client.call(Client.java:720)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1033)
at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
This blog posting might be helpful:
http://western-skies.blogspot.com/2010/11/fix-for-exceeded-maxfaileduniquefetches.html
The short answer is that Hadoop performs reverse hostname lookups even if you specify IP addresses in your configuration files. In your environment, in order for you to make Hadoop work, SSP-SANDBOX-1.mysite.com must resolve to the IP address of that machine, and the reverse lookup for that IP address must resolve to SSP-SANDBOX-1.mysite.com.
So you'll need to talk to whoever is administering those machines to either fudge the hosts file or to provide a DNS server that will do the right thing.
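If it helps, here is a minimal Python sketch of that check (the hostname is the one from the question; substitute your own): the name must resolve to the node's IP, and the reverse lookup of that IP must give the name back.
# Forward lookup, then reverse lookup, then compare.
import socket

hostname = "SSP-SANDBOX-1.mysite.com"          # from the question; substitute your own

ip = socket.gethostbyname(hostname)            # forward: name -> IP
reverse, _, _ = socket.gethostbyaddr(ip)       # reverse: IP -> name
print(f"{hostname} -> {ip} -> {reverse}")

if reverse.rstrip(".").lower() != hostname.lower():
    print("forward and reverse resolution do not match; fix the hosts file or DNS")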
