How to analyze pbs_server and pbs_mom log entries on the server and worker node - pbs

How can I analyze these log entries? I want to understand their format.
1) PBS_Server;LOG_ERROR::Cannot assign requested address (99) in send_job, send_job failed to d23818f7 port 15002
2) pbs_mom;LOG_ALERT::mom_server_valid_message_source, bad connect from 210.56.24.244:1023 - unauthorized server

The name in /var/spool/torque/server_name on the compute hosts and on the server should match, and should resolve to the same address. If it doesn't, check /etc/hosts, nsswitch.conf, and DNS to get that worked out.
EDIT #1: Also, I'd be sure to put the output of the hostname command on the server into the server_name file on all machines.
EDIT #2: Also be aware that $pbsserver in /var/spool/torque/mom_priv/config on the compute nodes will override the value in the server_name file (so it's best not to use $pbsserver).
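A quick way to verify that the resolution is consistent is a short Python sketch run on both the server and each compute node (the path below is the standard Torque layout; adjust it if your installation differs):

import socket

# Read the configured server name and check what it resolves to.
# Run this on the pbs_server host and on every compute node; the
# printed address should be the same everywhere.
with open("/var/spool/torque/server_name") as f:
    server_name = f.read().strip()
try:
    print(server_name, "resolves to", socket.gethostbyname(server_name))
except socket.gaierror as exc:
    print("cannot resolve", server_name, "-", exc)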

Related

getting hostname of remote computers on the local network not setup in /etc/hosts

I learned something new while trying to get a hostname using Python's socket module.
From my MacBook I ran the code below:
socket.gethostbyaddr("192.168.1.111")
and got ('rock64', [], ['192.168.1.111']). Then I tried the IP address of a computer that used to be on the network but isn't anymore:
socket.gethostbyaddr("192.168.1.189")
It returned ('mint', [], ['192.168.1.189']), and then I realised the result is coming from the /etc/hosts file.
Now, in that hosts file I also have this entry:
/etc/hosts
172.217.25.3 google.com.hk
But if I try to get the host for a WAN IP address, I get a different result than expected:
socket.gethostbyaddr("172.217.25.3")
That returns ('hkg07s24-in-f3.1e100.net', ['3.25.217.172.in-addr.arpa'], ['172.217.25.3']).
So I am now wondering where the hostname in the latter case of the WAN IP address comes from, and why for the local computers' IPs I get the hostname from the configured /etc/hosts file.
How can we get the hostnames of computers on the local network without socket.gethostbyaddr having to look in the /etc/hosts file, or by some other means?
This is an opinion-based answer to the question "how do you build a registry of network devices on your local network?"
The best way to build a registry of devices on your local network is to set up ntopng on your gateway. It uses DPI (Deep Packet Inspection) techniques to collect information about hosts.
ntopng has a nice user interface and displays host names (when possible).
You can assign aliases to specific hosts that do not leak their host names via any protocol.
For some reason the ntopng developers did not include the alias in the JSON response for the request http://YOUR-SERVER:3000/lua/host_get_json.lua?ifid=2&host=IP-OF-DEVICE .
You can add it manually by adding the lines require "mac_utils" and hj["alias"]=getDeviceName(hj["mac_address"]) to the file /usr/share/ntopng/scripts/lua/host_get_json.lua
You can use the REST API to interrogate ntopng and use the information it provides to build any script you need.
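As a rough sketch of that REST call (the host, port 3000, interface id, and endpoint come from the URL above; the credentials and any JSON keys other than mac_address and alias are assumptions that depend on your ntopng setup):

import requests

# Hedged example: ask ntopng for the JSON description of one device.
resp = requests.get(
    "http://YOUR-SERVER:3000/lua/host_get_json.lua",
    params={"ifid": 2, "host": "192.168.1.111"},   # substitute your device's IP
    auth=("admin", "admin"),                       # assumption: default ntopng login
)
resp.raise_for_status()
hj = resp.json()
print(hj.get("mac_address"), hj.get("alias"))      # alias appears only with the patch above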

Getting an error while connecting to helix.perforce - "Connect to server failed; check $P4PORT. No such host is known."

I am trying to connect to the Helix Perforce cloud through the P4 client and am getting the error below.
C:\Users\sagaraa>p4 -Cauto -p ssl:TestContinuousDelivery.sagaraagre.helix.perforce.com:1667 trust
Perforce client error:
Connect to server failed; check $P4PORT.
No such host is known.
Please note that I am working behind a company proxy, so I am not sure whether that could be the issue. Outside the company premises it works perfectly fine.
Please advise if anyone is facing a similar issue or has a resolution.
Running the command p4 set gives the output below:
P4EDITOR=C:\Windows\SysWOW64\notepad.exe (set)
P4PORT=perforce:1666 (set)
P4USER=sagaraagre (set)
For your P4PORT value, perforce is the server's hostname and 1666 is the port number.
If you ping that server I bet it won't find the host either:
C:\Windows\System32>ping perforce
Ping request could not find host perforce. Please check the name and try again.
If that's the case for you, you can either use the actual hostname or IP address for your Perforce server in P4PORT:
C:\Windows\System32>p4 set P4PORT=TestContinuousDelivery.sagaraagre.helix.perforce.com:1666
Or, if you're set on using perforce there, you can add it as an alias in your hosts file:
C:\Windows\System32>echo 1.2.3.4 TestContinuousDelivery.sagaraagre.helix.perforce.com perforce >> drivers\etc\hosts
1.2.3.4 would be the IP address corresponding to TestContinuousDelivery.sagaraagre.helix.perforce.com, and you can always ping that hostname to find the IP.
As an aside, there does seem to be an ongoing issue where p4 set's P4PORT doesn't keep in sync with .p4qt\connectionmap.xml's P4Port value. Resetting P4PORT, like what was done above, is one way to resolve that.
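Incidentally, you can reproduce the same resolution failure without p4 or ping, for example with a couple of lines of Python (the hostnames below are the ones from this question; adjust to your own):

import socket

# 'perforce' is the bare name from P4PORT; the second name is the full Helix hostname.
for name in ("perforce",
             "TestContinuousDelivery.sagaraagre.helix.perforce.com"):
    try:
        print(name, "->", socket.gethostbyname(name))
    except socket.gaierror as exc:
        print(name, "-> cannot resolve:", exc)   # the same failure p4 reports as "No such host is known"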

Modify pg_hba.conf file to allow me access

I keep getting an error when trying to connect to a PostgreSQL database: "connection closed by remote host". I have tried modifying the pg_hba.conf file to allow the IP of my computer to have access, but I still get the same error. What am I doing wrong? Do I have to restart the server or something?
host all all <ip>/32 md5
Also, I have seen /24 used instead of /32; how do I know which number to use?
The notation "/32" refers to a single IP address whereas the notation "192.168.1.0/24" refers to all addresses on the 192.168.1.x network.
And yes, you will probably have to restart (or at least reload) the PostgreSQL service, something like:
service postmaster restart
But make sure your IP address is restrictive so that hackers won't be visiting your database all day. Use "localhost" if you can (127.0.0.1).

CentOS takes a very long time to resolve nearby servers on the local network

I have a few CentOS 5.1 servers, and recently they have been taking a very long time to communicate with each other. It looks like every request looks up the local server in public DNS. Is there any way to set an option in /etc/resolv.conf to disable DNS lookups for some IP addresses?
Add the server names and their IP addresses to the file /etc/hosts, e.g.
10.0.0.100 server1 server1-alias
10.0.0.101 server2
and then make sure that you list the keyword files before the keyword dns for the hosts entry in /etc/nsswitch.conf, i.e. that file should have a line that looks something like this:
hosts: files dns
After that, any attempts to resolve hostnames or IP addresses will first consult the /etc/hosts file, and only if that is unsuccessful go on to do a DNS lookup.
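To confirm the change took effect, you can time a couple of lookups; this sketch assumes the server1/server2 names from the example above:

import socket
import time

for name in ("server1", "server2"):   # names from the /etc/hosts example above
    start = time.time()
    addr = socket.gethostbyname(name)
    print("%s -> %s in %.3f s" % (name, addr, time.time() - start))  # should be near-instant once /etc/hosts is consulted first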

UnknownHostException on tasktracker in Hadoop cluster

I have set up a pseudo-distributed Hadoop cluster (with jobtracker, a tasktracker, and namenode all on the same box) per tutorial instructions and it's working fine. I am now trying to add in a second node to this cluster as another tasktracker.
When I examine the logs on Node 2, all the logs look fine except for the tasktracker. I'm getting an infinite loop of the error message listed below. It seems that the Task Tracker is trying to use the hostname SSP-SANDBOX-1.mysite.com rather than the ip address. This hostname is not in /etc/hosts so I'm guessing this is where the problem is coming from. I do not have root access in order to add this to /etc/hosts.
Is there any property or configuration I can change so that it will stop trying to connect using the hostname?
Thanks very much,
2011-01-18 17:43:22,896 ERROR org.apache.hadoop.mapred.TaskTracker:
Caught exception: java.net.UnknownHostException: unknown host: SSP-SANDBOX-1.mysite.com
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
at org.apache.hadoop.ipc.Client.call(Client.java:720)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1033)
at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
This blog posting might be helpful:
http://western-skies.blogspot.com/2010/11/fix-for-exceeded-maxfaileduniquefetches.html
The short answer is that Hadoop performs reverse hostname lookups even if you specify IP addresses in your configuration files. In your environment, in order for you to make Hadoop work, SSP-SANDBOX-1.mysite.com must resolve to the IP address of that machine, and the reverse lookup for that IP address must resolve to SSP-SANDBOX-1.mysite.com.
So you'll need to talk to whoever is administering those machines to either fudge the hosts file or to provide a DNS server that will do the right thing.
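You can check both directions of the lookup the answer describes with a short Python snippet run on the failing node (the hostname is the one from the log above):

import socket

host = "SSP-SANDBOX-1.mysite.com"   # hostname from the TaskTracker log
try:
    ip = socket.gethostbyname(host)                            # forward lookup must succeed
    print("forward:", host, "->", ip)
    print("reverse:", ip, "->", socket.gethostbyaddr(ip)[0])   # and should map back to the same name
except socket.error as exc:
    print("lookup failed:", exc)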
