Mounting ceph fails with "mount error 5 = Input/output error" - linux

I have tried to create a ceph filesystem on a single host, for testing purposes, with the following conf file:
[global]
log file = /var/log/ceph/$name.log
pid file = /var/run/ceph/$name.pid
[mon]
mon data = /srv/ceph/mon/$name
[mon.mio]
host = penny
mon addr = 127.0.0.1:6789
[mds]
[mds.mio]
host = penny
[osd]
osd data = /srv/ceph/osd/$name
osd journal = /srv/ceph/osd/$name/journal
osd journal size = 1000 ; journal size, in megabytes
[osd.0]
host = penny
devs = /dev/loop1
/dev/loop1 is formatted with XFS and is actually backed by a 500 MB file (although that shouldn't matter much). Everything works pretty much OK, and health shows:
sudo ceph -s
2013-12-12 21:14:44.387240 pg v111: 198 pgs: 198 active+clean; 8730 bytes data, 79237 MB used, 20133 MB / 102 GB avail
2013-12-12 21:14:44.388542 mds e6: 1/1/1 up {0=mio=up:active}
2013-12-12 21:14:44.388605 osd e3: 1 osds: 1 up, 1 in
2013-12-12 21:14:44.388738 log 2013-12-12 21:14:32.739326 osd.0 127.0.0.1:6801/8834 181 : [INF] 2.30 scrub ok
2013-12-12 21:14:44.388922 mon e1: 1 mons at {mio=127.0.0.1:6789/0}
but when I try to mount the filesystem
sudo mount -t ceph penny:/ /mnt/ceph
mount error 5 = Input/output error
Usual answers point to ceph-mds not running, but it's actually working:
root 8771 0.0 0.0 574092 4376 ? Ssl 20:43 0:00 /usr/bin/ceph-mds -i mio -c /etc/ceph/ceph.conf
In fact, I previously managed to make it work by following these instructions http://blog.bob.sh/2012/02/basic-ceph-storage-kvm-virtualisation.html verbatim, but when I tried again I ran into the same problem. Any idea of what might have failed?
Update: as indicated in the comments, dmesg shows a problem
[ 6715.712211] libceph: mon0 [::1]:6789 connection failed
[ 6725.728230] libceph: mon1 127.0.1.1:6789 connection failed

Try using 127.0.0.1 explicitly. It looks like the kernel client is resolving the hostname itself: 127.0.1.1 is an odd address to end up with (typically a distribution's /etc/hosts entry for the local hostname), and the monitor may not be answering on the IPv6 loopback [::1] at all.
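As a minimal sketch of that suggestion, the mount can be pointed at the monitor's configured address (127.0.0.1:6789 from ceph.conf) instead of the hostname:
sudo mount -t ceph 127.0.0.1:6789:/ /mnt/ceph
If this works, the problem is name resolution (penny resolving to 127.0.1.1 and ::1), not the MDS.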

Related

GlusterFS on Freebsd 11.1 / Mount issue

I want to use GlusterFS as distributed file storage on FreeBSD 11.1.
Documentation is poor, so I followed some howtos on the net.
I could create the GlusterFS volume, but I have trouble mounting it on another client machine. Here is what I did so far:
I have three hosts, all in the same subnet.
10.0.0.21 Webserver
10.0.0.31 gluster1
10.0.0.32 gluster2
I added the above entries in the /etc/hosts files on all of the three hosts.
I modified /etc/rc.conf on gluster1 and gluster2 with:
glusterd_enable="YES"
on gluster1 I did:
gluster peer probe gluster2
(succeeded)
gluster1 and gluster2 each have the following hard drive: /dev/da1
It is partitioned (BSD label) and mounted on gluster1 and gluster2 as /datastore.
cat /etc/fstab on both gluster1 and gluster2 gives:
# Device Mountpoint FStype Options Dump Pass#
/dev/da0a / ufs rw 1 1
/dev/da1a /datastore ufs rw 2 2
I created the gluster volume1:
gluster volume create volume1 replica 2 transport tcp gluster1:/datastore gluster2:/datastore force
(I'm aware of the split-brain risk; this is a simple test scenario.)
I started the volume1 with:
gluster volume start volume1
A check of the volume1 with:
gluster volume info
gives me back:
Type: Replicate
Volume ID: a760c545-1cc9-47a4-bc9e-51f6180e4d7a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/datastore
Brick2: gluster2:/datastore
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
So far everything worked, and seems to be fine.
Now my trouble starts when I try to mount and use this on the client/consumer machine (Webserver).
I read in several places that the glusterfs volume1 should be mountable with:
mount -t glusterfs gluster1:/volume1 /mnt
This simply gives me back the following error:
mount: gluster1:/volume1: Operation not supported by device
As I normally do before asking "silly" questions, I googled a lot for this.
I also played around with installing glusterfs on the client (pkg install glusterfs), enabling it in the client's /etc/rc.conf, and adding stuff for FUSE, but I could not get it to work.
I feel quite annoyed, because I know it must be a very small thing I'm missing here!?
Can anyone shed some light into my issue?
Update: gluster volume status volume1 shows:
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster1:/datastore N/A N/A N N/A
Brick gluster2:/datastore N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A N 55181
Self-heal Daemon on gluster2 N/A N/A N 30318
Task Status of Volume volume1
------------------------------------------------------------------------------
There are no active volume tasks
So, I enabled NFS with this:
gluster volume set volume1 nfs.disable off
There was a warning that GlusterFS NFS is deprecated and NFS-Ganesha should be used instead; I ignored the warning for this test.
Now I restarted the volume:
gluster volume stop volume1
gluster volume start volume1
To check I did:
gluster volume info
which showed me now:
Volume Name: volume1
Type: Replicate
Volume ID: a760c545-1cc9-47a4-bc9e-51f6180e4d7a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/datastore
Brick2: gluster2:/datastore
Options Reconfigured:
nfs.disable: off
transport.address-family: inet
So nfs.disable was set to off. NFS should be on now, right?
But
gluster volume status volume1
still shows no NFS running:
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster1:/datastore N/A N/A N N/A
Brick gluster2:/datastore N/A N/A N N/A
NFS Server on localhost N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A N 99115
NFS Server on gluster2 N/A N/A N N/A
Self-heal Daemon on gluster2 N/A N/A N 37075
Task Status of Volume volume1
------------------------------------------------------------------------------
There are no active volume tasks
What is also disturbing (besides the NFS server showing Online: N) is that both bricks appear to be offline as well (Online indicated as N)?!
So I'm really stuck and could use some help.
Finally it is working:
/usr/local/sbin/mount_glusterfs gluster1:/volume1 /mnt
did the trick...
The client also needs to have the net/glusterfs package installed, and the following statement in /boot/loader.conf:
fuse_load="YES"
Cheers
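Putting the pieces together, a minimal sketch of the FreeBSD client-side setup described above (package name and mount point as used in this test; kldload is only needed to avoid rebooting after editing loader.conf):
pkg install glusterfs
echo 'fuse_load="YES"' >> /boot/loader.conf
kldload fuse
/usr/local/sbin/mount_glusterfs gluster1:/volume1 /mnt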
I think the issue may be with the UFS file system: does it fully support extended attributes?
GlusterFS requires a filesystem with extended attribute support (XFS is one).
From the link (https://access.redhat.com/articles/1273933):
As the Red Hat Storage makes extensive use of extended attributes, an XFS inode size of 512 bytes works better with Red Hat Storage than the default XFS inode size of 256 bytes. So, inode size for XFS must be set to 512 bytes, while formatting the Red Hat Storage bricks. To set the inode size, you need to use -i size option with the mkfs.xfs command.
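For illustration (the quoted guidance targets Red Hat Storage bricks on Linux, so the device name below is only a placeholder), formatting a brick with the larger inode size would look roughly like:
mkfs.xfs -i size=512 /dev/sdb1
mount /dev/sdb1 /datastore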

Docker Devmapper space issue - increase size

I have the same issue as in space issue on docker devmapper and CentOS7
It only explains how to clean up, not how to increase the space, and I don't have any images to clean. I tried several things with dm.min_free_space, but nothing worked, and I want to increase the space.
OS Version/build: Red Hat Enterprise Linux Server release 7.3 (Maipo)
App version:
Client:
Version: 1.12.6
API version: 1.24
Package version: docker-common-1.12.6-11.el7.centos.x86_64
Go version: go1.7.4
Git commit: 96d83a5/1.12.6
Built: Tue Mar 7 09:23:34 2017
OS/Arch: linux/amd64
Server:
Version: 1.12.6
API version: 1.24
Package version: docker-common-1.12.6-11.el7.centos.x86_64
Go version: go1.7.4
Git commit: 96d83a5/1.12.6
Built: Tue Mar 7 09:23:34 2017
OS/Arch: linux/amd64
Steps to reproduce
I have no containers running currently and have some docker images pertaining to Kubernetes which will be used by the Kubernetes service.
sudo docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
[kubeuser4@kubenode4 Employee]$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/busybox latest 00f017a8c2a6 5 days ago 1.11 MB
registry.access.redhat.com/rhel7/pod-infrastructure latest 34d3450d733b 6 weeks ago 205 MB
docker.io/java 8 d23bdf5b1b1b 8 weeks ago 643.1 MB
gcr.io/google_containers/heapster_grafana v2.6.0-2 b43443930626 12 months ago 230 MB
When I try to build a Docker image for my application, I get the below error.
devmapper: Thin Pool has 8783 free data blocks which is less than minimum required 163840 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior
I tried the cleanup mentioned in the other forums, but it did not help much and I am still getting the same error. When I tried to run sudo docker --storage-opt dm.min_free_space=0%, it seems to start as a daemon, but it still failed with another error, "docker-runc not installed on system", and I don't want to run it as a daemon anyway.
Below are some command outputs
sudo dmsetup status
localvg00-lv_home: 0 20971520 linear
localvg00-lv_home: 20971520 20971520 linear
docker-251:5-134039-pool: 0 209715200 thin-pool 924 848/524288 1629226/1638400 - rw discard_passdown queue_if_no_space
localvg00-lv_tmp: 0 4194304 linear
localvg00-lv_swap: 0 8388608 linear
localvg00-lv_root: 0 2097152 linear
localvg00-lv_root: 2097152 20971520 linear
localvg00-lv_usr: 0 16777216 linear
localvg00-lv_var: 0 8388608 linear
localvg00-lv_var: 8388608 62914560 linear
sudo docker info
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 4
Server Version: 1.12.6
Storage Driver: devicemapper
Pool Name: docker-251:5-134039-pool
Pool Blocksize: 65.54 kB
Base Device Size: 10.74 GB
Backing Filesystem: xfs
Data file: /dev/loop0
Metadata file: /dev/loop1
Data Space Used: 106.8 GB
Data Space Total: 107.4 GB
Data Space Available: 601.2 MB
Metadata Space Used: 3.473 MB
Metadata Space Total: 2.147 GB
Metadata Space Available: 2.144 GB
Thin Pool Minimum Free Space: 10.74 GB
Udev Sync Supported: true
Deferred Removal Enabled: false
Deferred Deletion Enabled: false
Deferred Deleted Device Count: 0
Data loop file: /var/lib/docker/devicemapper/devicemapper/data
WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
Library Version: 1.02.135-RHEL7 (2016-11-16)
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
Volume: local
Network: overlay null bridge host
Swarm: inactive
Runtimes: runc docker-runc
Default Runtime: docker-runc
Security Options: seccomp
Kernel Version: 4.1.12-61.1.28.el7uek.x86_64
Operating System: Oracle Linux Server 7.3
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 2
Total Memory: 7.545 GiB
Name: kubenode4
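(As an aside, an illustrative back-of-the-envelope check of the numbers above, assuming the default dm.min_free_space of 10%: with a 107.4 GB pool and 65536-byte blocks, the required free space works out to the 163840 blocks in the error message, while only about 8783 blocks are actually free.)
echo $(( 163840 * 65536 ))   # 10737418240 bytes = 10.74 GB, the "Thin Pool Minimum Free Space" above
echo $((   8783 * 65536 ))   # ~576 MB, close to the 601.2 MB reported as available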
I had also tried increasing the physical volume size and the logical volume size (lv_var) on my Linux machine, but it still doesn't work.
sudo lvs
[sudo] password for kubeuser4:
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
lv_home localvg00 -wi-ao---- 20.00g
lv_root localvg00 -wi-ao---- 11.00g
lv_swap localvg00 -wi-ao---- 4.00g
lv_tmp localvg00 -wi-ao---- 2.00g
lv_usr localvg00 -wi-ao---- 8.00g
lv_var localvg00 -wi-ao---- 34.00g
sudo ls -lsh /var/lib/docker/devicemapper/devicemapper/data
2.3G -rw------- 1 root root 100G Mar 14 22:16 /var/lib/docker/devicemapper/devicemapper/data
Could someone please let me know how this can be done?
Thanks,
It is better to move away from devicemapper, for a few reasons.
devicemapper in loopback mode has an unrecoverable storage issue: https://github.com/docker/docker/issues/3182 ("devicemapper not recommended for production use").
I found it easy enough to switch to the overlay storage driver; YMMV of course, but hopefully not by too much. 'rm -rf /var/lib/docker' is somewhat optional when switching, but it is easy and I would highly recommend it, as long as you can load your images back in. http://www.projectatomic.io/blog/2015/06/notes-on-fedora-centos-and-docker-storage-drivers/
systemctl stop docker
rm -rf /var/lib/docker
# if these files do not already exist, create them; otherwise edit them by hand
# (you can also just add -s overlay in the docker systemd/startup script)
ls /etc/sysconfig/docker /etc/sysconfig/docker-storage
[[ $? != 0 ]] && {
echo OPTIONS='--selinux-enabled=false' > /etc/sysconfig/docker
echo "DOCKER_STORAGE_OPTIONS= -s overlay" > /etc/sysconfig/docker-storage
}
systemctl start docker
systemctl status docker
docker images
more reading:
https://docs.docker.com/engine/userguide/storagedriver/selectadriver/
https://integratedcode.us/2016/08/30/storage-drivers-in-docker-a-deep-dive/
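On more recent Docker releases, the usual place for this setting is /etc/docker/daemon.json rather than /etc/sysconfig/docker-storage; a minimal sketch (overlay2 assumes a suitably recent kernel, and the restart step is the standard one, not specific to this setup):
{
  "storage-driver": "overlay2"
}
then systemctl restart docker.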
Was able to get it working and have mentioned it in
https://forums.docker.com/t/devmapper-space-issue/29786/3

Node state=down with TORQUE v6.1.0 on a Workstation

I was installing Torque 6.1.0 on an Ubuntu 16.04 workstation, but the installation doesn't seem to recognize how many cores and threads the machine has. The only node I set up shows a status of "state = down", and any job triggers an error saying "not enough of the right type of nodes". In fact, the workstation has 56 threads (28 physical cores on 2 processors), and I only want to use 54 threads (27 physical cores) for the shared computing jobs. I realized this might be related to the cgroup or NUMA configuration introduced in Torque 6.0, and I am not sure I set it up correctly during installation. I did have cgroups enabled, but I am not sure whether I also need to enable the NUMA-aware functionality. Below are some outputs of the current configs. What should I do? Thanks.
$ pbsnodes
node1
state = down
power_state = Running
np = 54
ntype = cluster
mom_service_port = 15002
mom_manager_port = 15003
total_sockets = 0
total_numa_nodes = 0
total_cores = 0
total_threads = 0
dedicated_sockets = 0
dedicated_numa_nodes = 0
dedicated_cores = 0
dedicated_threads = 0
$ lssubsys -am
cpuset /sys/fs/cgroup/cpuset
cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct
blkio /sys/fs/cgroup/blkio
memory /sys/fs/cgroup/memory
devices /sys/fs/cgroup/devices
freezer /sys/fs/cgroup/freezer
net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio
perf_event /sys/fs/cgroup/perf_event
hugetlb /sys/fs/cgroup/hugetlb
pids /sys/fs/cgroup/pids
There is also a fishy part: it seems the server cannot see the node I already defined in the server's configuration file. This can be seen in the /var/spool/torque/server_logs log file:
12/27/2016 15:48:33.147;01;PBS_Server.2692;Svr;PBS_Server;LOG_ERROR::get_node_from_str, Node node1 is reporting on node NapaValley, which pbs_server doesn't know about
12/27/2016 15:49:18.232;01;PBS_Server.2692;Svr;PBS_Server;LOG_ERROR::get_node_from_str, Node node1 is reporting on node NapaValley, which pbs_server doesn't know about
12/27/2016 15:49:25.491;08;PBS_Server.2696;Job;0.NapaValley;Job deleted at request of cquic@localhost
12/27/2016 15:49:27.023;08;PBS_Server.2657;Job;0.NapaValley;on_job_exit valid pjob: 0.NapaValley (substate=59)
12/27/2016 15:49:32.996;256;PBS_Server.2657;Job;0.NapaValley;dequeuing from batch, state COMPLETE
12/27/2016 15:49:59.722;256;PBS_Server.2696;Job;1.NapaValley;enqueuing into batch, state 1 hop 1
12/27/2016 15:49:59.722;08;PBS_Server.2696;Job;perform_commit_work;job_id: 1.NapaValley
12/27/2016 15:49:59.722;02;PBS_Server.2696;node;close_conn;Closing connection 9 and calling its accompanying function on close
12/27/2016 15:49:59.795;64;PBS_Server.2692;Req;node_spec;job allocation request exceeds currently available cluster nodes, 1 requested, 0 available
12/27/2016 15:49:59.796;08;PBS_Server.2692;Job;1.NapaValley;Job Modified at request of root@localhost
12/27/2016 15:50:03.312;01;PBS_Server.2696;Svr;PBS_Server;LOG_ERROR::get_node_from_str, Node node1 is reporting on node NapaValley, which pbs_server doesn't know about
On my /etc/hosts, I have
127.0.0.1 localhost node1
127.0.0.1 NapaValley
PS: I have tried to mount cpu and the other controllers under the /var/spool/torque/cgroup directories, but lssubsys -am still shows the same information as above. I assume they should have been mounted?
A node will report to the server with a name returned by the gethostbyname call. Based on the log lines you posted, the server and the node don't agree on that name. You can have pbs_mom return a different name by starting it with the -H option:
http://docs.adaptivecomputing.com/torque/6-0-2/adminGuide/help.htm#topics/torque/commands/pbs_mom.htm#-h
"-H hostname Sets the MOM's hostname. This can be useful on multi-homed networks."
This is equivalent to setting $mom_host node1 in /var/spool/torque/mom_priv/config.
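A hedged sketch of the two equivalent fixes, assuming the server knows the node as node1 (the name in its nodes file):
# start the MOM with an explicit hostname
pbs_mom -H node1
# or, per the answer above, set it in /var/spool/torque/mom_priv/config
$mom_host node1
After either change, restart pbs_mom and check pbsnodes again so the node re-reports under the expected name.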

How do I access a USB drive on a OSX host from inside a docker container?

I have an application that I eventually want to run on a cloud computing service (e.g., AWS or Google Cloud), packaged inside a docker image. The reason the application will need to run in the cloud is that it's designed to process large data files, but before I actually deploy, I'd like to test it first on a local laptop, using a single large data file that I've stored (for test and development purposes) on an external USB drive.
My development machine is an OSX laptop, and I'm using a recent version of docker:
stachyra> uname -a
Darwin Andrews-MacBook-Pro-76.local 14.5.0 Darwin Kernel Version 14.5.0: Tue Sep 1 21:23:09 PDT 2015; root:xnu-2782.50.1~1/RELEASE_X86_64 x86_64
stachyra> docker --version
Docker version 1.10.2, build c3959b1
OSX has mounted my external USB drive, device /dev/disk2s2, as /Volumes/MGR DATA:
stachyra> df
Filesystem 512-blocks Used Available Capacity iused ifree %iused Mounted on
/dev/disk1 974770480 435721376 538537104 45% 54529170 67317138 45% /
devfs 375 375 0 100% 650 0 100% /dev
map -hosts 0 0 0 100% 0 0 100% /net
map auto_home 0 0 0 100% 0 0 100% /home
/dev/disk2s2 3906291632 3869523640 36767992 100% 483690453 4595999 99% /Volumes/MGR DATA
/dev/disk3s1 196608 193160 3448 99% 24143 431 98% /Volumes/VirtualBox
stachyra> diskutil list
/dev/disk0
#: TYPE NAME SIZE IDENTIFIER
0: GUID_partition_scheme *500.3 GB disk0
1: EFI EFI 209.7 MB disk0s1
2: Apple_CoreStorage 499.4 GB disk0s2
3: Apple_Boot Recovery HD 650.0 MB disk0s3
/dev/disk1
#: TYPE NAME SIZE IDENTIFIER
0: Apple_HFS Macintosh HD *499.1 GB disk1
Logical Volume on disk0s2
DB70B91A-3B57-4C82-A758-C4BDEA4160FD
Unlocked Encrypted
/dev/disk2
#: TYPE NAME SIZE IDENTIFIER
0: GUID_partition_scheme *2.0 TB disk2
1: EFI EFI 209.7 MB disk2s1
2: Apple_HFS MGR DATA 2.0 TB disk2s2
/dev/disk3
#: TYPE NAME SIZE IDENTIFIER
0: GUID_partition_scheme *100.7 MB disk3
1: Apple_HFS VirtualBox 100.7 MB disk3s1
It should also be noted that the drive has several directories and data visible inside it, at least when viewed directly through OSX:
stachyra> ls -l /Volumes/MGR\ DATA
total 0
drwxr-xr-x 6 stachyra staff 204 Apr 14 2015 1000genomes
drwxr-xr-x 5 stachyra staff 170 Oct 12 17:41 GIAB
drwxr-xr-x 4 stachyra staff 136 Apr 28 2015 genome_browser_tracks
drwxr-xr-x 24 stachyra staff 816 Oct 6 14:00 mitty
I have tried to follow the advice from this question, which describes how to mount a USB drive in docker when docker is running within a linux host. But my local laptop is OSX, not linux, so it doesn't seem to work.
Explicitly, when attempting to follow the advice of the accepted answer, I obtain the following result:
stachyra> docker run -i -t --privileged -v /dev/disk2s2:/dev/foo ubuntu bash
root@8da7b492a707:/# uname -a
Linux 8da7b492a707 4.1.18-boot2docker #1 SMP Sat Feb 20 08:24:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@8da7b492a707:/# ls -l /dev/foo
total 0
root@8da7b492a707:/#
Based upon the response, one can see that docker does indeed launch a linux container correctly, and it also creates a volume /dev/foo inside the container as requested, but the actual contents of the USB drive are not accessible via that location: the ls -l command claims there are no files or directories there.
I also tried the second method described in an alternate response to the same question, and that fails even worse:
stachyra> docker run -i -t --device=/dev/disk2s2 ubuntu bash
docker: Error response from daemon: error gathering device information while adding custom device "/dev/disk2s2": not a device node.
stachyra>
I have found another discussion thread on stackoverflow which suggests that raw USB access is handled quite differently in OSX than in linux, which I suspect is probably the reason why both of the above attempts at USB access are failing.
But, what should I actually do about it? That is to say, what is the correct sequence of actions or commands to allow docker to access a USB device mounted on an OSX host, rather than linux?
I was finally able to access my USB drive from /var/media inside my container by using the machine-diskutil.sh script mentioned in warmoverflow's comment, like so:
machine-diskutil.sh mount my-machine-name /Volumes/my-usb-drive
and then starting the container like so
docker run -v /Volumes/my-usb-drive:/var/media -it my/image:latest bash
Because I had tried to add /Volumes/my-usb-drive as a shared folder manually in VirtualBox, I first got this error.
Error: The shared folder /Volumes/Seagate already exists on the
docker machine, please unmount it first.
So I removed it manually and re-ran the machine-diskutil.sh mount command without any problems. Great stuff!
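For completeness, a hedged sketch of the full sequence on a docker-machine/VirtualBox setup (the machine name, image name, and mount points are the placeholders used above; the shared-folder name "Seagate" is only a guess based on the error message, and the VM may need to be stopped before a permanent shared folder can be removed):
# remove the manually added shared folder that conflicts with the script
VBoxManage sharedfolder remove my-machine-name --name Seagate
# mount the USB volume into the docker-machine VM, then bind it into the container
machine-diskutil.sh mount my-machine-name /Volumes/my-usb-drive
docker run -v /Volumes/my-usb-drive:/var/media -it my/image:latest bash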
As per @pgayvallet's comment on GitHub:
As the daemon runs inside a VM in Docker Desktop, it is not possible to actually share a mac host device with the container inside the VM, and this will most definitely never be possible.

AWS - EC2 - MongoDB replica set time sync issue - NTP - replication lag

We are encountering clock drift issues with our MongoDB replica set running on AWS. This seemed to start happening recently, after we added additional data to the set; before then we did not really notice the issue unless the system was under heavy load. The following error is logged in the mongod.log file sporadically, even when the system is not under load.
To test this, we have isolated a set of machines with the same dataset that are not in use by our web application, though the error is still occurring:
2014-12-12T13:33:51.333+0000 [rsBackgroundSync] changing sync target
because current sync target's most recent OpTime is Dec 12 13:32:42:c
which is more than 30 seconds behind member mongo1:27017 whose most
recent OpTime is 1418391230
From the above, the timestamp shows that one of the MongoDB replica set members is over a minute behind. The worst we have seen is 12 minutes out of sync.
This error in turn causes replication lag, and we receive a notification about it from the Mongo Monitoring Service, although it does correct itself.
The setup is 3 x r3.xlarge Amazon Linux instances, one in each availability zone of the eu-west-1 region. The machines have been set up using the Mongo recommended settings, with a RAID array and the CloudFormation scripts provided by Mongo. The data is around 4 GB in size.
We think the issue is related to NTP sync: by default, the ntpd service on the Amazon Linux AMI is configured to use a pool of Amazon NTP servers from pool.ntp.org.
To try and rule this out, we set up our own NTP server on AWS for the MongoDB servers to sync to. The issue still occurred, so we changed the maxpoll and minpoll settings for the ntpd service on the Mongo machines to sync the time every 16 seconds from the NTP server, but the error is still occurring.
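(For reference, ntpd expresses minpoll/maxpoll as powers of two seconds, so a 16-second interval corresponds to a value of 4; a sketch of that server line, using the placeholder hostname from the config below:)
server time-server.domain.com iburst minpoll 4 maxpoll 4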
We also increased the MongoDB oplog size to see if that would make any difference, but it didn't.
Does anyone else encounter this type of issue? Is there something we are missing?
Cheers,
Colin.
ps -ef |grep ntp;
mongodb1
ntp 5163 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 15865 15839 0 09:31 pts/2 00:00:00 grep ntp
mongodb2
ntp 4834 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 19056 19029 0 09:31 pts/0 00:00:00 grep ntp
mongodb3
ntp 5795 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 26199 26173 0 09:31 pts/0 00:00:00 grep ntp
cat /etc/ntp.conf;
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict -6 ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.amazon.pool.ntp.org iburst dynamic
#server 1.amazon.pool.ntp.org iburst dynamic
#server 2.amazon.pool.ntp.org iburst dynamic
#server 3.amazon.pool.ntp.org iburst dynamic
server time-server.domain.com iburst
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
# Enable additional logging.
logconfig =clockall =peerall =sysall =syncall
# Listen only on the primary network interface.
interface listen eth0
interface ignore ipv6
ntpq -npcrv;
remote refid st t when poll reach delay offset jitter
==============================================================================
*172.31.14.137 91.*.*.* 3 u 557 1024 377 1.121 -0.264 0.161
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5#1.2349-o Sat Mar 23 00:37:31 UTC 2013 (1)",
processor="x86_64", system="Linux/3.14.23-22.44.amzn1.x86_64", leap=00,
stratum=4, precision=-23, rootdelay=23.597, rootdisp=109.962,
refid=172.31.14.137,
reftime=d83a757a.175b5fa1 Tue, Dec 16 2014 9:10:18.091,
clock=d83a77a7.82431efa Tue, Dec 16 2014 9:19:35.508, peer=27361,
tc=10, mintc=3, offset=-0.264, frequency=-13.994, sys_jitter=0.000,
clk_jitter=0.358, clk_wander=0.053
After upgrading to MongoDB 3 with the WiredTiger storage engine, we no longer see this issue.
