Nagios check_ntp_peer not working - linux

I am running a virtualized (VMware) Debian guest (2.6.26-2-686), which I monitor through Nagios. Lately, I have been getting the following critical error (reported by the _check_ntp_peer_ script):
NTP CRITICAL: Server not synchronized, Offset unknown
I notice that none of the lines output by the _ntpq -no_ command has a star (*):
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 200.144.121.33  193.204.114.232  2 u    1   64    1  187.298  -34742.  32.024
 146.164.53.65   200.20.186.75    2 u    2   64    1  185.574  -34716.   0.001
 200.160.0.8     200.160.7.186    2 u    1   64    1  186.229  -34734.   0.001
 187.49.33.13    .INIT.          16 u    -   64    0    0.000    0.000   0.001
Any clue?
Here is the ntp.conf
tinker panic 0
driftfile /var/lib/ntp/ntp.drift
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
server 0.debian.pool.ntp.org iburst dynamic
server 1.debian.pool.ntp.org iburst dynamic
server 2.debian.pool.ntp.org iburst dynamic
server 3.debian.pool.ntp.org iburst dynamic
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery
restrict 127.0.0.1
restrict ::1
So, any idea of what the problem could be?
Thanks in advance,
Wilmer

I had a similar problem with ntp on Ubuntu: the time was drifting dramatically and Nagios reported NTP CRITICAL: Offset unknown.
Check the status of your VMware timesync:
#vmware-toolbox-cmd timesync status
Disabled
Enable it if it turns out to be disabled:
#vmware-toolbox-cmd timesync enable
Enabled
That helped in my case and may help in yours too. I don't think it follows VMware best practices, but it works.
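For reference, the "no star" observation from the question can be checked automatically. This is only a minimal sketch, not how check_ntp_peer itself works: the helper name `selected_peer` is made up for illustration, and the sample text is a shortened copy of the peer table from the question. It looks for a line starting with `*`, which ntpq uses to mark the peer the daemon is currently synchronized to.

```python
def selected_peer(ntpq_output):
    """Return the remote marked '*' in ntpq peer output, or None if there is none.

    Hypothetical helper for illustration; it only inspects the first column.
    """
    for line in ntpq_output.splitlines():
        # Skip the header row and the '=====' separator.
        if line.lstrip().startswith(("remote", "=")):
            continue
        if line.startswith("*"):
            fields = line[1:].split()
            if fields:
                return fields[0]
    return None


# Shortened copy of the table from the question: no line carries a '*'.
unsynced = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 200.144.121.33  193.204.114.232  2 u    1   64    1  187.298  -34742.  32.024
 187.49.33.13    .INIT.          16 u    -   64    0    0.000    0.000   0.001
"""

# Same table with the first peer marked as selected, for comparison.
synced = unsynced.replace(" 200.144.121.33", "*200.144.121.33")

print(selected_peer(unsynced))  # None
print(selected_peer(synced))    # 200.144.121.33
```

Running it against the real `ntpq` output after enabling timesync (or after ntpd has had a few polling cycles) would show whether a peer ever gets selected.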

Related

WHM Server receiving lots of "FAILED: cphulk"

I have a WHM server on GoDaddy.
I'm receiving quite a lot of emails (3-4 a day) about a process failing and recovering by itself. It happens mostly to "cphulkd" but also to "lfd".
My server:
WHM version v68.0.33. Contains two websites (One Moodle and one Wordpress). 2GB Ram, 60GB HD.
This is the whole mail:
Server: s50-62-22-123.secureserver.net
Primary IP Address: 50.62.22.123
Service Name: cphulkd
Service Status: failed ⛔
Notification: The service “cphulkd” appears to be down.
Service Check Method: The system’s command to check or to restart this service failed.
Number of Restart Attempts: 1
Service Check Raw Output: (XID ejd2e7) The “cphulkd” service is down. The subprocess “/usr/local/cpanel/scripts/restartsrv_cphulkd” reported error number 255 when it ended.
Startup Log: Starting cPHulkd... Started. Starting PID 3789: cPhulkd - processor - dormant mode - accepting connections
Memory Information: Used 2.43 GB, Available 1.57 GB, Installed 4 GB
Load Information: 0.17 0.19 0.18
Uptime: 2 days, 18 hours, 59 minutes, and 37 seconds
IOStat Information:
avg-cpu: %user %nice %system %iowait %steal %idle
0.62 0.11 0.12 0.17 0.00 98.99
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
Top Processes:
PID Owner CPU % Memory % Command
18850 root 2.45 2.29 spamd child
3452 root 0.94 2.35 /usr/local/cpanel/3rdparty/perl/524/bin/perl -T -w /usr/local/cpanel/3rdparty/bin/spamd --max-spare=1 --max-children=3 --allowed-ips=127.0.0.1,::1 --pidfile=/var/run/spamd.pid --listen=5
1488 mysql 0.52 7.49 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=s50-62-22-179.secureserver.net.err --open-files-limit=10000 --pid-file=/var/lib/mysql/s50-62-22-179.secureserver.net.pid
18854 dovecot 0.31 0.06 dovecot/auth
20291 root 0.07 0.71 lfd - sleeping
Any ideas?
What's weird is that the mail says I have 4GB of memory, but I only have 2GB.

Debian Linux Raspbian- Raspberry Pi time offset is 65s ahead of UTC

For some strange reason unknown to me, my RPi appears to have been set incorrectly to UTC +65s. The output I receive is the following:
sudo ntpd -gq
ntpd: time set -65.706156s
I have tried stopping and restarting ntp server (no effect).
When I check the sync servers using the following command, I do receive a ping back so it's not a case of the servers not responding, or a firewall issue:
grep -P "^server" /etc/ntp.conf
server 0.debian.pool.ntp.org iburst
server 1.debian.pool.ntp.org iburst
server 2.debian.pool.ntp.org iburst
server 3.debian.pool.ntp.org iburst
ping -c 1 0.debian.pool.ntp.org
PING 0.debian.pool.ntp.org (193.1.219.116) 56(84) bytes of data.
64 bytes from tbag.heanet.ie (193.1.219.116): icmp_req=1 ttl=51 time=18.8 ms
--- 0.debian.pool.ntp.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 18.818/18.818/18.818/0.000 ms
I'm at a loss as to how to correct this.
UPDATE:
Running the ntpq -p command yields the following info:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*adsl-172-10-0-1 117.70.*.110     4 u    2   64    7    0.617   -0.070   0.109
Is this the NTP server that I'm syncing to? That IP belongs to CHINANET (I don't know how or why).
I also tried to manually set the RPi time, after stopping ntp service, setting the time correctly and restarting the service.
What I noticed was that the time was correctly set for a good 5 seconds, before reverting back to its 65s offset. So it appears that this is the issue.
Found the solution as described in post 6 of the link:
http://forum.openmediavault.org/index.php/Thread/13035-Raspberry-Pi-NTP-service-not-using-etc-ntp-conf/
Basically, when the RPi connects to the network, the DHCP server acts as the NTP server and a copy of the ntp.conf file is created at /var/lib/ntp/ntp.conf.dhcp
This file overrides the default /etc/ntp.conf file, so deleting it, stopping the ntp service, performing a resync, and then starting the service again is the only way to resolve this.
The command for resync is:
sudo ntpdate -b pool.ntp.org
The original issue was that the ntp service was syncing with a CHINANET server, causing a 65s offset, which I suspect is down to a misconfigured DHCP/NTP server on our network.
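To check quickly whether a machine is affected, the override rule described above can be written as a small sketch. It assumes the behaviour reported in the linked thread (the DHCP-generated file wins whenever it exists); the function name `effective_ntp_conf` is made up for illustration, and the demo uses a temporary directory instead of the real paths.

```python
import os
import tempfile


def effective_ntp_conf(dhcp_conf="/var/lib/ntp/ntp.conf.dhcp",
                       default_conf="/etc/ntp.conf"):
    """Return the config path that wins under the override rule described above.

    Assumption: the DHCP-generated file takes precedence whenever it exists.
    """
    return dhcp_conf if os.path.exists(dhcp_conf) else default_conf


# Demo in a scratch directory so nothing on the real system is touched.
with tempfile.TemporaryDirectory() as scratch:
    dhcp = os.path.join(scratch, "ntp.conf.dhcp")
    default = os.path.join(scratch, "ntp.conf")

    print(effective_ntp_conf(dhcp, default) == default)  # True: no DHCP copy yet

    open(dhcp, "w").close()  # simulate the DHCP client writing its copy
    print(effective_ntp_conf(dhcp, default) == dhcp)     # True: DHCP copy wins
```

Calling `effective_ntp_conf()` with no arguments on the Pi itself reports which of the two real paths would be used under that assumption.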

AWS - EC2 - MongoDB replica set time sync issue - NTP - replication lag

We are encountering clock drift issues with our MongoDB replica set running on AWS. This seemed to start recently, after we added additional data to the set; before then we did not really notice the issue unless the system was under heavy load. The following error is logged sporadically in the mongod.log file, even when the system is not under load.
To test this we have isolated a set of machines with the same dataset that are not in use by our web application, but the error is still occurring:
2014-12-12T13:33:51.333+0000 [rsBackgroundSync] changing sync target
because current sync target's most recent OpTime is Dec 12 13:32:42:c
which is more than 30 seconds behind member mongo1:27017 whose most
recent OpTime is 1418391230
The timestamps above show that one of the MongoDB replica set members is over a minute behind. The worst we have seen is 12 minutes out of sync.
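The gap in that log line can be worked out directly from the two OpTime values. This is just a sketch of the arithmetic: it assumes both values are in UTC (the log timestamp carries +0000), that the year is 2014, and it ignores the trailing ":c" on the first OpTime. The function name is made up for illustration.

```python
from datetime import datetime, timezone


def optime_gap_seconds(member_epoch, target_optime):
    """Seconds by which target_optime trails the member's epoch-based OpTime."""
    member_optime = datetime.fromtimestamp(member_epoch, tz=timezone.utc)
    return (member_optime - target_optime).total_seconds()


# mongo1's most recent OpTime, as printed in the log (seconds since the epoch).
mongo1_epoch = 1418391230
# The sync target's most recent OpTime, "Dec 12 13:32:42" (year assumed 2014).
target = datetime(2014, 12, 12, 13, 32, 42, tzinfo=timezone.utc)

print(datetime.fromtimestamp(mongo1_epoch, tz=timezone.utc).isoformat())
# 2014-12-12T13:33:50+00:00
print(optime_gap_seconds(mongo1_epoch, target))
# 68.0
```

So in this particular entry the sync target's last applied operation is 68 seconds older than mongo1's.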
This error in turn causes replication lag and we receive the notification about this from the Mongo Monitoring Service although it does correct itself.
The setup is 3 x r3.xlarge AWS Linux instances, one in each availability zone of the EU-West-1 region. The machines have been set up using the Mongo recommended settings with a RAID array and the CloudFormation scripts provided by Mongo. The data is around 4GB in size.
We think the issue is related to the NTP sync. By default, on the AWS Linux Amazon Machine Image the ntpd service is configured to use a pool of AWS NTP servers hosted on www.pool.ntp.org.
To try and rule this out we set up our own NTP server on AWS that the MongoDB servers could sync to. The issue still occurred, so we changed the maxpoll and minpoll time for the ntpd service on the mongo machines to sync the time every 16 seconds from the NTP server, but the error is still occurring.
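One thing worth double-checking with minpoll and maxpoll: in ntp.conf they are base-2 exponents, not seconds, so a 16-second interval is written as 4. A small sketch of the conversion follows (the helper names are made up; the server name is the placeholder from our config, and the limits ntpd enforces on these values depend on the version):

```python
def poll_interval_seconds(exponent):
    """Convert an ntp.conf minpoll/maxpoll exponent to seconds (2 ** exponent)."""
    return 2 ** exponent


def poll_exponent(seconds):
    """Convert a power-of-two interval in seconds back to the ntp.conf exponent."""
    if seconds <= 0 or seconds & (seconds - 1):
        raise ValueError("poll intervals must be a power of two, got %r" % seconds)
    return seconds.bit_length() - 1


exp = poll_exponent(16)
print(exp)                          # 4
print(poll_interval_seconds(10))    # 1024
# Example server line using the exponent for a 16-second interval.
print("server time-server.domain.com iburst minpoll %d maxpoll %d" % (exp, exp))
```

For comparison, a `poll` column of 1024 in ntpq output corresponds to an exponent of 10, so it is worth confirming that the running daemon actually picked up the shorter interval.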
We increased the MongoDB OpLog size as well to see if that would make any difference but it didn’t.
Does anyone else encounter this type of issue? Is there something we are missing?
Cheers,
Colin.
ps -ef |grep ntp;
mongodb1
ntp 5163 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 15865 15839 0 09:31 pts/2 00:00:00 grep ntp
mongodb2
ntp 4834 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 19056 19029 0 09:31 pts/0 00:00:00 grep ntp
mongodb3
ntp 5795 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 26199 26173 0 09:31 pts/0 00:00:00 grep ntp
cat /etc/ntp.conf;
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict -6 ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.amazon.pool.ntp.org iburst dynamic
#server 1.amazon.pool.ntp.org iburst dynamic
#server 2.amazon.pool.ntp.org iburst dynamic
#server 3.amazon.pool.ntp.org iburst dynamic
server time-server.domain.com iburst
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
# Enable additional logging.
logconfig =clockall =peerall =sysall =syncall
# Listen only on the primary network interface.
interface listen eth0
interface ignore ipv6
ntpq -npcrv;
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*172.31.14.137   91.*.*.*         3 u  557 1024  377    1.121   -0.264   0.161
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5#1.2349-o Sat Mar 23 00:37:31 UTC 2013 (1)",
processor="x86_64", system="Linux/3.14.23-22.44.amzn1.x86_64", leap=00,
stratum=4, precision=-23, rootdelay=23.597, rootdisp=109.962,
refid=172.31.14.137,
reftime=d83a757a.175b5fa1 Tue, Dec 16 2014 9:10:18.091,
clock=d83a77a7.82431efa Tue, Dec 16 2014 9:19:35.508, peer=27361,
tc=10, mintc=3, offset=-0.264, frequency=-13.994, sys_jitter=0.000,
clk_jitter=0.358, clk_wander=0.053
After upgrading to MongoDB 3 using the WiredTiger storage engine we do not see this issue any more.

Can't start HAProxy on Cygwin

I'm trying to start up HAProxy on Cygwin. When I do so, I get the following response:
$ /usr/local/sbin/haproxy -f /usr/local/sbin/haproxy.cfg
[ALERT] 313/180006 (4008) : cannot change UNIX socket ownership
(/tmp/haproxy.socket). Aborting.
[ALERT] 313/180006 (4008) : [/usr/local/sbin/haproxy.main()]
Some protocols failed to start
their listeners! Exiting.
It looks like it's due to the following line in my config file; when I remove it, HAProxy starts up:
stats socket /tmp/haproxy.socket uid haproxy mode 770 level admin
The entire config:
global
log 127.0.0.1 local0 info
stats socket /tmp/haproxy.socket uid haproxy mode 770 level admin
maxconn 1000
daemon
defaults
log global
mode tcp
option tcplog
option dontlognull
retries 3
option redispatch
maxconn 1000
timeout connect 5s
timeout client 120s
timeout server 120s
listen rabbitmq_local_cluster 127.0.0.1:5555
mode tcp
balance roundrobin
server rabbit_0 127.0.0.1:5673 check inter 5000 rise 2 fall 3
server rabbit_1 127.0.0.1:5674 check inter 5000 rise 2 fall 3
listen private_monitoring 127.0.0.1:8100
mode http
option httplog
stats enable
stats uri /stats
stats refresh 5s
Any ideas would be appreciated, Thanks!
Simple answer, as I expected. The user "haproxy", which is referenced in the problematic line:
stats socket /tmp/haproxy.socket uid haproxy mode 770 level admin
did not have the necessary permissions on the local machine. Once this was set up, it started up fine.
Nice to know that it still works on Cygwin. What version of HAProxy is this? I did not know that UNIX sockets were supported on Windows, BTW. Or maybe they're emulated via named pipes?

NTP on secondary/redundant system does not sync time from primary

I have two systems. One acts as Primary/Active, has an Internet connection, and gets its time from an NTP server. The second system is Secondary/Passive and has no connection to the external world.
Primary and Secondary are connected on a private network interface eth1.
Primary has IP 169.254.10.10 Subnet: 255.255.255.248 Broadcast: 169.254.10.15
Secondary has IP 169.254.10.11 Subnet: 255.255.255.248 Broadcast: 169.254.10.15
Primary has the following ntp.conf Configuration
driftfile /var/lib/ntp/ntp.drift
statistics loopstats peerstats clockstats
filegen loopstats file loopstats type day enable
filegen peerstats file peerstats type day enable
filegen clockstats file clockstats type day enable
server 91.189.94.4
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery
restrict 127.0.0.1
restrict ::1
restrict 169.254.10.0 mask 255.255.255.248
broadcast 169.254.10.15
disable auth
broadcastclient
I sync time on Secondary with ntpdate only and do not run the ntpd daemon there.
On Secondary I run ntpdate -b -t 4 -p 4 -u 169.254.10.10 (the Primary interface IP).
The ntpd server is running on Primary with the configuration above.
The time on Secondary is not updated and ntpdate reports this error:
ntpdate[3636]: no server suitable for synchronization found
Thanks
Visu
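One detail in the configuration above that may be worth checking is the address arithmetic: with a 255.255.255.248 mask, the restrict line for 169.254.10.0 and the hosts 169.254.10.10 and 169.254.10.11 fall into different /29 blocks. A minimal sketch using Python's standard ipaddress module, with the values copied from the post; whether this is what makes ntpdate fail is not established here:

```python
import ipaddress

# Network named on the "restrict 169.254.10.0 mask 255.255.255.248" line.
restrict_net = ipaddress.ip_network("169.254.10.0/255.255.255.248")

# Interface addresses of the two systems, with the same mask.
primary = ipaddress.ip_interface("169.254.10.10/255.255.255.248")
secondary = ipaddress.ip_interface("169.254.10.11/255.255.255.248")

print(restrict_net)                                   # 169.254.10.0/29
print(primary.network)                                # 169.254.10.8/29
print(primary.network.broadcast_address)              # 169.254.10.15
print(secondary.ip in restrict_net)                   # False
print(secondary.ip in primary.network)                # True
```

The broadcast address 169.254.10.15 is consistent with the 169.254.10.8/29 block, so a restrict line meant to cover both hosts would name 169.254.10.8 rather than 169.254.10.0.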

Resources