Node state=down with TORQUE v6.1.0 on a Linux workstation

I was installing Torque 6.1.0 on an Ubuntu 16.04 workstation, but the
installation doesn't seem to recognize how many cores and threads the
machine has. The only node I set up shows "state=down", and
any job triggers the error "not enough of the right type
of nodes". The workstation actually has 56 threads (28 physical cores
across 2 processors), and I only want to use 54 threads (27 physical cores)
for the shared computing jobs. I suspect this is related to the cgroup or NUMA configuration that Torque requires starting with v6.0, which I am not sure I handled correctly during installation. I did have cgroups enabled, but I am not sure whether I also need to enable the NUMA-aware functionality. Below are some outputs of the current configuration. What should I do? Thanks.
$ pbsnodes
node1
state = down
power_state = Running
np = 54
ntype = cluster
mom_service_port = 15002
mom_manager_port = 15003
total_sockets = 0
total_numa_nodes = 0
total_cores = 0
total_threads = 0
dedicated_sockets = 0
dedicated_numa_nodes = 0
dedicated_cores = 0
dedicated_threads = 0
$ lssubsys -am
cpuset /sys/fs/cgroup/cpuset
cpu,cpuacct /sys/fs/cgroup/cpu,cpuacct
blkio /sys/fs/cgroup/blkio
memory /sys/fs/cgroup/memory
devices /sys/fs/cgroup/devices
freezer /sys/fs/cgroup/freezer
net_cls,net_prio /sys/fs/cgroup/net_cls,net_prio
perf_event /sys/fs/cgroup/perf_event
hugetlb /sys/fs/cgroup/hugetlb
pids /sys/fs/cgroup/pids
Another fishy part is that the server does not seem to recognize the node I already defined in its configuration file. This shows up in the /var/spool/torque/server_logs log file:
12/27/2016 15:48:33.147;01;PBS_Server.2692;Svr;PBS_Server;LOG_ERROR::get_node_from_str, Node node1 is reporting on node NapaValley, which pbs_server doesn't know about
12/27/2016 15:49:18.232;01;PBS_Server.2692;Svr;PBS_Server;LOG_ERROR::get_node_from_str, Node node1 is reporting on node NapaValley, which pbs_server doesn't know about
12/27/2016 15:49:25.491;08;PBS_Server.2696;Job;0.NapaValley;Job deleted at request of cquic@localhost
12/27/2016 15:49:27.023;08;PBS_Server.2657;Job;0.NapaValley;on_job_exit valid pjob: 0.NapaValley (substate=59)
12/27/2016 15:49:32.996;256;PBS_Server.2657;Job;0.NapaValley;dequeuing from batch, state COMPLETE
12/27/2016 15:49:59.722;256;PBS_Server.2696;Job;1.NapaValley;enqueuing into batch, state 1 hop 1
12/27/2016 15:49:59.722;08;PBS_Server.2696;Job;perform_commit_work;job_id: 1.NapaValley
12/27/2016 15:49:59.722;02;PBS_Server.2696;node;close_conn;Closing connection 9 and calling its accompanying function on close
12/27/2016 15:49:59.795;64;PBS_Server.2692;Req;node_spec;job allocation request exceeds currently available cluster nodes, 1 requested, 0 available
12/27/2016 15:49:59.796;08;PBS_Server.2692;Job;1.NapaValley;Job Modified at request of root@localhost
12/27/2016 15:50:03.312;01;PBS_Server.2696;Svr;PBS_Server;LOG_ERROR::get_node_from_str, Node node1 is reporting on node NapaValley, which pbs_server doesn't know about
On my /etc/hosts, I have
127.0.0.1 localhost node1
127.0.0.1 NapaValley
PS: I have tried mounting cpu and the other cgroup subsystems under the /var/spool/torque/cgroup directories, but lssubsys -am still shows the same output as above. I assume they should have been mounted there?
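For reference, the node definition I put in the server's nodes file is just a single line (sketched from memory; the np value matches what pbsnodes reports above):
# /var/spool/torque/server_priv/nodes
node1 np=54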

A node reports to the server with the name returned by the gethostbyname call. Based on the log lines you posted, the server and the node don't agree on that name: pbs_server knows the node as node1, but the MOM is reporting as NapaValley. You can have pbs_mom report a different name by starting it with the -H option:
http://docs.adaptivecomputing.com/torque/6-0-2/adminGuide/help.htm#topics/torque/commands/pbs_mom.htm#-h
"-H hostname Sets the MOM's hostname. This can be useful on multi-homed networks."
This is equivalent to setting $mom_host node1 in /var/spool/torque/mom_priv/config.
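For example, either of these should make the MOM report as node1 (a rough sketch; paths assume the default /var/spool/torque layout):
# start the MOM with an explicit hostname
$ sudo pbs_mom -H node1
# or put the equivalent setting in /var/spool/torque/mom_priv/config
$mom_host node1
Restart pbs_mom afterwards and check pbsnodes again; the node should leave state=down once the name it reports matches the entry in server_priv/nodes.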

Related

GlusterFS on Freebsd 11.1 / Mount issue

I want to use GlusterFS as distributed file storage on FreeBSD 11.1.
Documentation is poor, so I followed some howtos on the net.
I could create the GlusterFS volume, but I have trouble mounting it on another client machine. Here is what I did so far:
I have three hosts, all in the same subnet.
10.0.0.21 Webserver
10.0.0.31 gluster1
10.0.0.32 gluster2
I added the above entries in the /etc/hosts files on all of the three hosts.
I modified /etc/rc.conf on gluster1 and gluster2 with:
glusterd_enable="YES"
on gluster1 I did:
gluster peer probe gluster2
(succeeded)
each gluster1 and gluster2 has the following harddrives: /dev/da1
they are partitioned (BSD Label) and mounted on gluster1 and gluster2 as /datastore
"cat /etc/fstab" gives on both gluster1 and gluster2:
# Device Mountpoint FStype Options Dump Pass#
/dev/da0a / ufs rw 1 1
/dev/da1a /datastore ufs rw 2 2
I created the gluster volume1:
gluster volume create volume1 replica 2 transport tcp gluster1:/datastore gluster2:/datastore force
(I'm aware of the split-brain risk; this is a simple test scenario)
I started the volume1 with:
gluster volume start volume1
A check of the volume1 with:
gluster volume info
gives me back:
Type: Replicate
Volume ID: a760c545-1cc9-47a4-bc9e-51f6180e4d7a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/datastore
Brick2: gluster2:/datastore
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
So far everything worked, and seems to be fine.
Now my trouble starts when mounting and using this on the client/consumer machine (Webserver).
I read at several places that the glusterfs volume1 should be mountable with:
mount -t glusterfs gluster1:/volume1 /mnt
This simply gives me back the following error:
mount: gluster1:/volume1: Operation not supported by device
As I normally do before I ask "silly" questions, I googled a lot for this.
I played around with installing glusterfs on the client as well (pkg install glusterfs), enabling it in the client's /etc/rc.conf, and adding stuff for FUSE, but I could not get it to work.
I feel quite annoyed, because I know it must be a very small thing I'm missing here!?
Can anyone shed some light into my issue?
Checking with gluster volume status volume1 shows:
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster1:/datastore N/A N/A N N/A
Brick gluster2:/datastore N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A N 55181
Self-heal Daemon on gluster2 N/A N/A N 30318
Task Status of Volume volume1
------------------------------------------------------------------------------
There are no active volume tasks
So, I enabled NFS with this:
gluster volume set volume1 nfs.disable off
There was a warning that GlusterFS NFS is deprecated and that NFS-Ganesha should be used instead. I ignored the warning for this test.
Then I restarted the volume:
gluster volume stop volume1
gluster volume start volume1
To check I did:
gluster volume info
which showed me now:
Volume Name: volume1
Type: Replicate
Volume ID: a760c545-1cc9-47a4-bc9e-51f6180e4d7a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/datastore
Brick2: gluster2:/datastore
Options Reconfigured:
nfs.disable: off
transport.address-family: inet
So nfs.disable was set to off. NFS should be on now, right?
But
gluster volume status volume1
still shows no NFS running:
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick gluster1:/datastore N/A N/A N N/A
Brick gluster2:/datastore N/A N/A N N/A
NFS Server on localhost N/A N/A N N/A
Self-heal Daemon on localhost N/A N/A N 99115
NFS Server on gluster2 N/A N/A N N/A
Self-heal Daemon on gluster2 N/A N/A N 37075
Task Status of Volume volume1
------------------------------------------------------------------------------
There are no active volume tasks
What is also disturbing here (besides the NFS server showing Online = N) is that both bricks seem to be offline too (Online indicated as N)?!
So I'm really stuck and could use some help.
Finally it is working:
/usr/local/sbin/mount_glusterfs gluster1:/volume1 /mnt
did the trick...
the client also needs to have the net/glusterfs package installed, and the following statement in /boot/loader.conf:
fuse_load="YES"
Cheers
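Putting the client-side pieces together, roughly this sequence works on the Webserver host (a sketch; package and module names as on FreeBSD 11.1):
# on the client
pkg install glusterfs                        # provides /usr/local/sbin/mount_glusterfs
echo 'fuse_load="YES"' >> /boot/loader.conf  # load FUSE at boot
kldload fuse                                 # or reboot so loader.conf takes effect
/usr/local/sbin/mount_glusterfs gluster1:/volume1 /mnt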
I think the issue may be with the UFS file system. Does it fully support extended attributes?
GlusterFS requires a filesystem with extended attribute support (XFS is one).
From the link: (https://access.redhat.com/articles/1273933)
As the Red Hat Storage makes extensive use of extended attributes, an XFS inode size of 512 bytes works better with Red Hat Storage than the default XFS inode size of 256 bytes. So, inode size for XFS must be set to 512 bytes, while formatting the Red Hat Storage bricks. To set the inode size, you need to use -i size option with the mkfs.xfs command.
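For illustration, formatting a brick that way would look like this on a system where mkfs.xfs is available (/dev/sdb1 is just a hypothetical device):
# create an XFS filesystem with 512-byte inodes for the brick
mkfs.xfs -i size=512 /dev/sdb1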

cgroup limit reached - no space left on device

We have two servers running Ubuntu 14.04 with Docker. Every other month, when starting or building a container, we get the message:
container_linux.go:247: starting container process caused "process_linux.go:258: applying cgroup configuration for process caused
\"mkdir /sys/fs/cgroup/memory/docker/cf657a58a1382e62976b4d339946f07e8a40f22f18b52822f884834f78830806: no space left on device\""
The disks still have plenty of space, but cat /proc/cgroups gives this (num_cgroups keeps increasing):
#subsys_name hierarchy num_cgroups enabled
cpuset 1 65805 1
cpu 2 65807 1
cpuacct 3 65803 1
blkio 4 65803 1
memory 5 65535 1
devices 6 65805 1
freezer 7 65803 1
net_cls 8 65803 1
perf_event 9 65803 1
net_prio 10 65803 1
hugetlb 11 65803 1
Restarting the server has always helped so far, but we don't want to restart a server every few months.
So I started some research and found a directory under the /sys/fs/cgroup/*/user path.
/sys/fs/cgroup/systemd/user/998.user itself holds 65662 subdirectories, all named something like 36309.session (the number increases).
Is there a way to see what process is creating those cgroups?
I thought it was process 998, but that doesn't even exist.
I ran into this same problem with AWS Batch. I have no solution, but I found this discussion: https://github.com/moby/moby/issues/29638. It seems the problem is some kind of leak in the kernel and/or Docker.
I encountered the same issue. You probably have a lot of dangling images/containers, which causes Docker's cgroups to run out of space. Check with:
docker images -a
docker ps -a
You need to clean it up. One solution is to remove all images/containers/etc that are not being used at the moment:
docker system prune -a
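If it turns out the space is consumed by the stale *.session directories from the question rather than Docker leftovers, a rough way to inspect and clear the empty ones (a sketch, assuming the path from the question; rmdir only removes cgroups that have no tasks or children, so it fails harmlessly otherwise):
# count the leaked session cgroups
find /sys/fs/cgroup/systemd/user/998.user -maxdepth 1 -type d -name '*.session' | wc -l
# attempt to remove the empty ones
find /sys/fs/cgroup/systemd/user/998.user -maxdepth 1 -type d -name '*.session' -exec rmdir {} + 2>/dev/null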

How do I access a USB drive on an OSX host from inside a docker container?

I have an application that I eventually want to run on a cloud computing service (e.g., AWS or Google Cloud), packaged inside a docker image. The application needs to run in the cloud because it is designed to process large data files, but before I actually deploy, I'd like to test it first on a local laptop, using a single large data file that I've stored (for test and development purposes) on an external USB drive.
My development machine is an OSX laptop, and I'm using a recent version of docker:
stachyra> uname -a
Darwin Andrews-MacBook-Pro-76.local 14.5.0 Darwin Kernel Version 14.5.0: Tue Sep 1 21:23:09 PDT 2015; root:xnu-2782.50.1~1/RELEASE_X86_64 x86_64
stachyra> docker --version
Docker version 1.10.2, build c3959b1
OSX has mounted my external USB drive, device /dev/disk2s2, as /Volumes/MGR DATA:
stachyra> df
Filesystem 512-blocks Used Available Capacity iused ifree %iused Mounted on
/dev/disk1 974770480 435721376 538537104 45% 54529170 67317138 45% /
devfs 375 375 0 100% 650 0 100% /dev
map -hosts 0 0 0 100% 0 0 100% /net
map auto_home 0 0 0 100% 0 0 100% /home
/dev/disk2s2 3906291632 3869523640 36767992 100% 483690453 4595999 99% /Volumes/MGR DATA
/dev/disk3s1 196608 193160 3448 99% 24143 431 98% /Volumes/VirtualBox
stachyra> diskutil list
/dev/disk0
#: TYPE NAME SIZE IDENTIFIER
0: GUID_partition_scheme *500.3 GB disk0
1: EFI EFI 209.7 MB disk0s1
2: Apple_CoreStorage 499.4 GB disk0s2
3: Apple_Boot Recovery HD 650.0 MB disk0s3
/dev/disk1
#: TYPE NAME SIZE IDENTIFIER
0: Apple_HFS Macintosh HD *499.1 GB disk1
Logical Volume on disk0s2
DB70B91A-3B57-4C82-A758-C4BDEA4160FD
Unlocked Encrypted
/dev/disk2
#: TYPE NAME SIZE IDENTIFIER
0: GUID_partition_scheme *2.0 TB disk2
1: EFI EFI 209.7 MB disk2s1
2: Apple_HFS MGR DATA 2.0 TB disk2s2
/dev/disk3
#: TYPE NAME SIZE IDENTIFIER
0: GUID_partition_scheme *100.7 MB disk3
1: Apple_HFS VirtualBox 100.7 MB disk3s1
It should also be noted that the drive has several directories and data files visible inside it, at least when viewed directly through OSX:
stachyra> ls -l /Volumes/MGR\ DATA
total 0
drwxr-xr-x 6 stachyra staff 204 Apr 14 2015 1000genomes
drwxr-xr-x 5 stachyra staff 170 Oct 12 17:41 GIAB
drwxr-xr-x 4 stachyra staff 136 Apr 28 2015 genome_browser_tracks
drwxr-xr-x 24 stachyra staff 816 Oct 6 14:00 mitty
I have tried to follow the advice from this question, which describes how to mount a USB drive in Docker when Docker is running on a Linux host. But my local laptop runs OSX, not Linux, so it doesn't seem to work.
Explicitly, when attempting to follow the advice of the accepted answer, I obtain the following result:
stachyra> docker run -i -t --privileged -v /dev/disk2s2:/dev/foo ubuntu bash
root@8da7b492a707:/# uname -a
Linux 8da7b492a707 4.1.18-boot2docker #1 SMP Sat Feb 20 08:24:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@8da7b492a707:/# ls -l /dev/foo
total 0
root@8da7b492a707:/#
Based on this output, one can see that docker does indeed launch a Linux container correctly and creates the volume /dev/foo inside the container as requested, but the actual contents of the USB drive are not accessible at that location--the ls -l command claims there are no files or directories there.
I also tried the second method described in an alternate response to the same question, and that fails even worse:
stachyra> docker run -i -t --device=/dev/disk2s2 ubuntu bash
docker: Error response from daemon: error gathering device information while adding custom device "/dev/disk2s2": not a device node.
stachyra>
I have found another discussion thread on stackoverflow which suggests that raw USB access is handled quite differently in OSX than in linux, which I suspect is probably the reason why both of the above attempts at USB access are failing.
But, what should I actually do about it? That is to say, what is the correct sequence of actions or commands to allow docker to access a USB device mounted on an OSX host, rather than linux?
I was finally able to access my USB drive from /var/media inside my container by using the machine-diskutil.sh script mentioned in warmoverflow's comment like so
machine-diskutil.sh mount my-machine-name /Volumes/my-usb-drive
and then starting the container like so
docker run -v /Volumes/my-usb-drive:/var/media -it my/image:latest bash
Because I had tried to add /Volumes/my-usb-drive as a shared folder manually in VirtualBox, I first got this error.
Error: The shared folder /Volumes/Seagate already exists on the
docker machine, please unmount it first.
So I removed it manually and re-ran the machine-diskutil.sh mount command without any problems. Great stuff!
As per @pgayvallet's comment on GitHub:
As the daemon runs inside a VM in Docker Desktop, it is not possible to actually share a mac host device with the container inside the VM, and this will most definitely never be possible.

AWS - EC2 - MongoDB replica set time sync issue - NTP - replication lag

We are encountering clock drift issues with our MongoDB replica set running on AWS. This seemed to start happening recently, after we added additional data to the set; before then we did not really notice this issue unless the system was under heavy load. The following error is logged in the mongod.log file sporadically, even though the system is not under load.
To test this we have isolated a set of machines with the same dataset that are not in use by our web application, yet the error still occurs:
2014-12-12T13:33:51.333+0000 [rsBackgroundSync] changing sync target
because current sync target's most recent OpTime is Dec 12 13:32:42:c
which is more than 30 seconds behind member mongo1:27017 whose most
recent OpTime is 1418391230
From the above, the timestamp shows that one of the MongoDB replica set members is over a minute behind. The worst we have seen is 12 minutes out of sync.
This error in turn causes replication lag and we receive the notification about this from the Mongo Monitoring Service although it does correct itself.
The setup is 3 x r3.xlarge AWS Linux instances, one in each availability zone of the eu-west-1 region. The machines have been set up using the Mongo-recommended settings with a RAID array and the CloudFormation scripts provided by Mongo. The data is around 4 GB in size.
We think the issue is related to NTP sync: by default on the AWS Linux AMI, the ntpd service is configured to use a pool of Amazon NTP servers hosted on pool.ntp.org.
To try and rule this out, we set up our own NTP server on AWS for the MongoDB servers to sync to. The issue still occurred, so we changed the maxpoll and minpoll settings for the ntpd service on the mongo machines to sync the time every 16 seconds from the NTP server, but the error keeps occurring.
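For anyone wanting to reproduce the 16-second polling, the server line in /etc/ntp.conf looks roughly like this (a sketch of the change described; minpoll/maxpoll are exponents of 2, so 4 means 2^4 = 16 seconds, and the ntp.conf pasted below does not show these options):
server time-server.domain.com iburst minpoll 4 maxpoll 4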
We increased the MongoDB OpLog size as well to see if that would make any difference but it didn’t.
Does anyone else encounter this type of issue? Is there something we are missing?
Cheers,
Colin.
ps -ef |grep ntp;
mongodb1
ntp 5163 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 15865 15839 0 09:31 pts/2 00:00:00 grep ntp
mongodb2
ntp 4834 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 19056 19029 0 09:31 pts/0 00:00:00 grep ntp
mongodb3
ntp 5795 1 0 Dec11 ? 00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 26199 26173 0 09:31 pts/0 00:00:00 grep ntp
cat /etc/ntp.conf;
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).
driftfile /var/lib/ntp/drift
# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery
# Permit all access over the loopback interface. This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict -6 ::1
# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.amazon.pool.ntp.org iburst dynamic
#server 1.amazon.pool.ntp.org iburst dynamic
#server 2.amazon.pool.ntp.org iburst dynamic
#server 3.amazon.pool.ntp.org iburst dynamic
server time-server.domain.com iburst
#broadcast 192.168.1.255 autokey # broadcast server
#broadcastclient # broadcast client
#broadcast 224.0.1.1 autokey # multicast server
#multicastclient 224.0.1.1 # multicast client
#manycastserver 239.255.254.254 # manycast server
#manycastclient 239.255.254.254 autokey # manycast client
# Enable public key cryptography.
#crypto
includefile /etc/ntp/crypto/pw
# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys
# Specify the key identifiers which are trusted.
#trustedkey 4 8 42
# Specify the key identifier to use with the ntpdc utility.
#requestkey 8
# Specify the key identifier to use with the ntpq utility.
#controlkey 8
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
# Enable additional logging.
logconfig =clockall =peerall =sysall =syncall
# Listen only on the primary network interface.
interface listen eth0
interface ignore ipv6
ntpq -npcrv;
remote refid st t when poll reach delay offset jitter
==============================================================================
*172.31.14.137 91.*.*.* 3 u 557 1024 377 1.121 -0.264 0.161
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p5#1.2349-o Sat Mar 23 00:37:31 UTC 2013 (1)",
processor="x86_64", system="Linux/3.14.23-22.44.amzn1.x86_64", leap=00,
stratum=4, precision=-23, rootdelay=23.597, rootdisp=109.962,
refid=172.31.14.137,
reftime=d83a757a.175b5fa1 Tue, Dec 16 2014 9:10:18.091,
clock=d83a77a7.82431efa Tue, Dec 16 2014 9:19:35.508, peer=27361,
tc=10, mintc=3, offset=-0.264, frequency=-13.994, sys_jitter=0.000,
clk_jitter=0.358, clk_wander=0.053
After upgrading to MongoDB 3 using the WiredTiger storage engine we do not see this issue any more.

Mounting ceph fails with "mount error 5 = Input/output error"

I have tried to create a Ceph filesystem on a single host, for testing purposes, with the following conf file:
[global]
log file = /var/log/ceph/$name.log
pid file = /var/run/ceph/$name.pid
[mon]
mon data = /srv/ceph/mon/$name
[mon.mio]
host = penny
mon addr = 127.0.0.1:6789
[mds]
[mds.mio]
host = penny
[osd]
osd data = /srv/ceph/osd/$name
osd journal = /srv/ceph/osd/$name/journal
osd journal size = 1000 ; journal size, in megabytes
[osd.0]
host = penny
devs = /dev/loop1
/dev/loop1 is formatted with XFS and is actually a 500 MB file (although that shouldn't matter much). Everything works pretty much OK, and the health shows:
sudo ceph -s
2013-12-12 21:14:44.387240 pg v111: 198 pgs: 198 active+clean; 8730 bytes data, 79237 MB used, 20133 MB / 102 GB avail
2013-12-12 21:14:44.388542 mds e6: 1/1/1 up {0=mio=up:active}
2013-12-12 21:14:44.388605 osd e3: 1 osds: 1 up, 1 in
2013-12-12 21:14:44.388738 log 2013-12-12 21:14:32.739326 osd.0 127.0.0.1:6801/8834 181 : [INF] 2.30 scrub ok
2013-12-12 21:14:44.388922 mon e1: 1 mons at {mio=127.0.0.1:6789/0}
but when I try to mount the filesystem
sudo mount -t ceph penny:/ /mnt/ceph
mount error 5 = Input/output error
Usual answers point to ceph-mds not running, but it's actually working:
root 8771 0.0 0.0 574092 4376 ? Ssl 20:43 0:00 /usr/bin/ceph-mds -i mio -c /etc/ceph/ceph.conf
In fact, I previously managed to make this work by following these instructions http://blog.bob.sh/2012/02/basic-ceph-storage-kvm-virtualisation.html verbatim, but when I tried again I ran into the same problem. Any idea what might have failed?
Update: as indicated in the comments, dmesg shows a problem:
[ 6715.712211] libceph: mon0 [::1]:6789 connection failed
[ 6725.728230] libceph: mon1 127.0.1.1:6789 connection failed
Try using 127.0.0.1 explicitly. It looks like the kernel is resolving the hostname, but 127.0.1.1 is odd, and it may not be responding on the IPv6 loopback either.
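For example (a sketch; 6789 is the default monitor port from your config, and you may also need the usual -o name=...,secret=... options if cephx authentication is enabled):
sudo mount -t ceph 127.0.0.1:6789:/ /mnt/ceph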
