Kubernetes NFS PV: Lock reclaim failed - linux

Configuration:
The NFS server and the k8s cluster (a single-node cluster) run on two separate machines that use the same OS and NFS software, as shown below:
[root@test-2 ~]# yum info nfs-utils
Failed to set locale, defaulting to C
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.tuna.tsinghua.edu.cn
* extras: mirrors.bfsu.edu.cn
* updates: mirrors.huaweicloud.com
Installed Packages
Name : nfs-utils
Arch : x86_64
Epoch : 1
Version : 1.3.0
Release : 0.68.el7
Size : 1.1 M
Repo : installed
From repo : base
Summary : NFS utilities and supporting clients and daemons for the kernel NFS server
URL : http://sourceforge.net/projects/nfs
License : MIT and GPLv2 and GPLv2+ and BSD
Description : The nfs-utils package provides a daemon for the kernel NFS server and
: related tools, which provides a much higher level of performance than the
: traditional Linux NFS server used by most users.
:
: This package also contains the showmount program. Showmount queries the
: mount daemon on a remote host for information about the NFS (Network File
: System) server on the remote host. For example, showmount can display the
: clients which are mounted on that host.
:
: This package also contains the mount.nfs and umount.nfs program.
[root@test-2 ~]# cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
[root@test-2 ~]# uname -a
Linux test-2 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@test-2 ~]# cat /etc/exports
/home/nfs 192.168.0.0/24(rw,sync,no_root_squash,no_subtree_check,insecure)
K8S version: v1.17.9
Problems:
The application (a StatefulSet) running on k8s uses a PV that was dynamically provisioned by the k8s-nfs-provisioner; the PV is backed by a directory on the remote NFS server. The application stays in "CrashLoopBackOff" because, after only a few seconds of running, it constantly hits "input/output error" when writing data to the PV.
Meanwhile, I saw a lot of errors in /var/log/messages:
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:12:05 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:12:05 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:41 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:41 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:41 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:41 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:42 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
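To see how the failures cluster over time, the kernel messages above can be grouped per minute. This is a minimal sketch, assuming syslog-style lines like the ones shown; the function name and the regular expression are illustrative and not part of any NFS tooling.

```python
import re
from collections import Counter

# Captures "Mon D HH:MM" from a syslog line that reports a failed lock reclaim.
RECLAIM_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s.*Lock reclaim failed!"
)


def reclaim_failures_per_minute(lines):
    """Count 'Lock reclaim failed!' kernel messages per minute."""
    counts = Counter()
    for line in lines:
        match = RECLAIM_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the contents of /var/log/messages (for example via `open("/var/log/messages")`) gives one counter entry per minute, which can then be compared with the pod restart times.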
I captured traffic with tcpdump until "Lock reclaim failed" appeared in the system log, and found many NFS errors such as:
NFS4ERR_BADSESSION (10052)
NFS4ERR_STALE_CLIENTID (10022)
NFS4ERR_NO_GRACE (10033)
I'm not sure whether they are related to the "Lock reclaim failed" messages or to the "input/output error".
I have encountered this problem on different machines from time to time, and it really annoys me.
Does anyone know the root cause or how to fix it? Many thanks in advance.
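For reference, the three status codes seen in the capture all concern NFSv4 state recovery. The lookup below is a small sketch; the descriptions are paraphrased from the NFSv4.1 specification (RFC 5661), and the table and function names are made up for illustration. It does not by itself establish the root cause.

```python
# Descriptions paraphrased from RFC 5661; names are illustrative only.
NFS4_STATUS = {
    10022: ("NFS4ERR_STALE_CLIENTID",
            "the client ID is not recognized by the server"),
    10033: ("NFS4ERR_NO_GRACE",
            "state reclaim was attempted outside the server's grace period"),
    10052: ("NFS4ERR_BADSESSION",
            "the session ID is unknown to the server"),
}


def describe_status(code):
    """Return a one-line description for an NFSv4 status code."""
    name, meaning = NFS4_STATUS.get(code, ("UNKNOWN", "not in this table"))
    return f"{name} ({code}): {meaning}"
```

Calling `describe_status(10033)` returns the NFS4ERR_NO_GRACE entry. A reclaim rejected with that status would be consistent with the "Lock reclaim failed!" kernel messages, though that link is an inference rather than something the capture proves.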
Screenshots
application pod log
NFS errors in tcpdump
nfsstat -m output on k8s
nfsstat -c output on k8s. Note the high open_noat value.
NFS server configuration (my k8s node is 111.1.30.16)

Related

Remote I/O error while read/write to NFS share for specific user and client

Setup:
NFS server: NFSServerHOST
NFS Share: NFSServerHOST:/MYSHARE
NFS Client1: CLNT1
NFS Client2: CLNT2
NFS Client3: CLNT3
Client OS: RHEL 7 and 8
Client users: User1, User2
Local Mount on Client: /var/NFSSHARE
mount -t nfs4 NFSServerHOST:/MYSHARE /var/NFSSHARE
Mount created successfully on all clients. Both User1 and User2 can read/write on /var/NFSSHARE from all 3 Clients.
Then something happens on Client1 (we have yet to find out whether it is related to a server patch or some cron job): User1 can no longer read/write to /var/NFSSHARE, but only on Client1. User2 can still read/write to the NFS share on Client1, and both users can still read/write on Client2 and Client3.
Error while performing read/write on Client1 for User1: Remote I/O error
If we reboot Client1, the issue goes away and User1 can again perform I/O operations on the NFS share from Client1.
Some of the things we checked:
No version mismatch: both the NFS client and the NFS server are configured for NFSv4.
Nothing wrong with whitelisting: all 3 client IPs are whitelisted on the NFS server.
We have checked inode and lsof usage, which are well within the limits.
nfs4_getfacl /var/NFSSHARE
# file: /var/NFSSHARE
A::EVERYONE@:rwaDxtTnNcy
Running getfacl /var/NFSSHARE as User1 on Client1:
# file: var/MQHA/
# owner: nobody
# group: nobody
user::rwx
group::rwx
other::rwx
Comparing the rpcdebug logs while performing an I/O operation on Client1 (FAILURE) vs. Client2 (SUCCESS):
kernel: NFS: nfs_update_inode(0:57/3963604504 fh_crc=0xbf9e74c8 ct=2 info=0x427e7f)
kernel: NFS: (0:57/3963604504) revalidation complete
kernel: NFS: permission(0:57/3963604504), mask=0x1, res=0
kernel: NFS: permission(0:57/3963604504), mask=0x3, res=0
kernel: NFS: atomic_open(0:57/3963604504), Abhi
kernel: --> nfs_put_client({2})
kernel: --> nfs4_alloc_slot used_slots=0002 highest_used=1 max_slots=1024
kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=0
The logs change after this point. Before it, the SUCCESS and FAILURE logs are more or less the same; only the numeric values differ.
Client1 (FAILURE)
kernel: nfs4_free_slot: slotid 0 highest_used_slotid 1
kernel: NFS: permission(0:57/3963604504), mask=0x81, res=-10
kernel: --> nfs4_alloc_slot used_slots=0002 highest_used=1 max_slots=1024
kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=0
kernel: decode_attr_type: type=00
Client2 (SUCCESS)
kernel: decode_attr_type: type=0100000
kernel: decode_attr_change: change attribute=7148460619683717735
kernel: decode_attr_size: file size=0
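The numeric fields in these debug lines can be decoded to make the two traces easier to compare. The sketch below assumes the mask bits are the Linux VFS MAY_* flags, that res is a negated errno, and that decode_attr_type prints the file type in octal; the helper names are hypothetical.

```python
import errno
import stat

# Assumed to match the MAY_* permission flags in the Linux kernel headers.
MAY_FLAGS = {
    0x01: "MAY_EXEC", 0x02: "MAY_WRITE", 0x04: "MAY_READ", 0x08: "MAY_APPEND",
    0x10: "MAY_ACCESS", 0x20: "MAY_OPEN", 0x40: "MAY_CHDIR",
    0x80: "MAY_NOT_BLOCK",
}


def decode_mask(mask):
    """List the MAY_* flag names set in a permission() mask."""
    return [name for bit, name in sorted(MAY_FLAGS.items()) if mask & bit]


def decode_res(res):
    """Translate a kernel-style result (0 or negated errno) to a name."""
    return "OK" if res == 0 else errno.errorcode.get(-res, "unknown")


def decode_type(octal_text):
    """Interpret the octal value printed by decode_attr_type."""
    mode = int(octal_text, 8)
    if mode == 0:
        return "no type decoded"
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISDIR(mode):
        return "directory"
    return "other"
```

Under these assumptions, mask=0x81 decodes to MAY_EXEC plus MAY_NOT_BLOCK and res=-10 to ECHILD, which the kernel commonly uses to request a retry outside the non-blocking lookup path, so it may not be the actual failure. The more telling difference may be type=00 on Client1 (no file type decoded) versus type=0100000 (a regular file) on Client2.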
We are looking for suggestions to diagnose this issue. What else can we do to enable more verbose logging, on either the client or the server side, to learn more about the error?
Thanks

JxBrowser 7.7 startup timeout error on Red Hat Enterprise Linux Server release 7.6 (Maipo)

Linux:
[root@localhost bin]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)
[root@localhost bin]# cat /proc/version
Linux version 4.14.0-115.5.1.el7a.06.aarch64 (mockbuild@arm-buildhost1) (gcc version 4.8.5 20150623 (NeoKylin 4.8.5-36) (GCC)) #1 SMP Tue Jun 18 10:34:55 CST 2019
[root@localhost bin]# file /bin/bash
/bin/bash: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 3.7.0, BuildID[sha1]=8a346ec01d611062313a5a4ed2b0201ecc9d9fa1, stripped
JxBrowser 7.7:
I used this demo; line 55 is: Browser browser = engine.newBrowser();
public static void main(String[] args) {
    Engine engine = Engine.newInstance(
            EngineOptions.newBuilder(OFF_SCREEN).build());
    Browser browser = engine.newBrowser();
}
[root@localhost bin]# java -jar test.jar
Exception in thread "main" com.teamdev.jxbrowser.navigation.TimeoutException: Failed to execute task withing 45 seconds.
at com.teamdev.jxbrowser.navigation.internal.NavigationImpl.loadAndWait(NavigationImpl.java:248)
at com.teamdev.jxbrowser.navigation.internal.NavigationImpl.loadUrlAndWait(NavigationImpl.java:105)
at com.teamdev.jxbrowser.navigation.internal.NavigationImpl.loadUrlAndWait(NavigationImpl.java:82)
at com.teamdev.jxbrowser.navigation.internal.NavigationImpl.loadUrlAndWait(NavigationImpl.java:74)
at com.teamdev.jxbrowser.engine.internal.EngineImpl.newBrowser(EngineImpl.java:458)
at com.pinnet.HelloWorld.main(HelloWorld.java:55)
Linux logs from /var/log/messages:
May 22 09:48:53 localhost dbus[8661]: [system] Activating via systemd: service name='org.bluez' unit='dbus-org.bluez.service'
May 22 09:48:54 localhost abrt-hook-ccpp: Process 90562 (chromium) of user 0 killed by SIGABRT - dumping core
May 22 09:48:54 localhost abrt-hook-ccpp: Process 90566 (chromium) of user 0 killed by SIGABRT - ignoring (repeated crash)
May 22 09:48:54 localhost abrt-hook-ccpp: Process 90561 (chromium) of user 0 killed by SIGABRT - ignoring (repeated crash)
May 22 09:48:54 localhost abrt-hook-ccpp: Process 90593 (chromium) of user 0 killed by SIGABRT - ignoring (repeated crash)
May 22 09:48:55 localhost abrt-hook-ccpp: Process 90624 (chromium) of user 0 killed by SIGABRT - ignoring (repeated crash)
May 22 09:48:55 localhost abrt-hook-ccpp: Process 90623 (chromium) of user 0 killed by SIGABRT - ignoring (repeated crash)
May 22 09:48:56 localhost abrt-server: Duplicate: core backtrace
May 22 09:48:56 localhost abrt-server: DUP_OF_DIR: /var/spool/abrt/ccpp-2020-05-21-16:55:06-33694
May 22 09:48:56 localhost abrt-server: Deleting problem directory ccpp-2020-05-22-09:48:54-90562 (dup of ccpp-2020-05-21-16:55:06-33694)
May 22 09:48:56 localhost abrt-server: /bin/sh: reporter-mailx: command not found
May 22 09:49:18 localhost dbus[8661]: [system] Failed to activate service 'org.bluez': timed out

After suspend guest OS hangs when using vagrant with nfs

Host OS Ubuntu 15.10
Guest OS Ubuntu 14.10
Using Vagrant with nfs and Virtualbox and static ip on the private network.
It works perfectly, except that after the host OS has been suspended and resumed, the entire guest OS becomes unusable.
This does not happen when using the normal virtualbox shared folders.
It's not only the nfs shared folder that is unusable, the entire OS is hanging.
Even syslog does not seem to see much action.
This is syslog on the guest, from waking up until vagrant halt is completed.
Feb 26 07:15:33 vagrant kernel: [ 8375.252989] e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Feb 26 07:16:11 vagrant kernel: [ 8413.109832] nfs: server 192.168.33.1 not responding, still trying
Feb 26 07:16:38 vagrant kernel: [ 8440.687476] nfs: server 192.168.33.1 not responding, still trying
Feb 26 07:17:01 vagrant CRON[3776]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Feb 26 07:20:33 vagrant rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="753" x-info="http://www.rsyslog.com"] exiting on signal 15.
How can this be fixed?
How should I debug it?

Is there a gcloud API to detect when a Compute Engine server is completely up?

I create a VM instance. I can connect to it as soon as the SSH daemon has started, but this is too early because kernel startup is only about 30% complete. Is there a gcloud or other API to get the VM state once the kernel has finished starting up?
Nov 18 10:58:51 image-name google: No startup script found in metadata.
Nov 18 10:58:53 image-name kernel: [ 27.491829] aufs au_opts_verify:1570:docker[2414]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:58:53 image-name kernel: [ 27.703142] aufs au_opts_verify:1570:docker[2414]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:58:53 image-name kernel: [ 27.735867] aufs au_opts_verify:1570:docker[2414]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:58:53 image-name kernel: [ 27.771732] aufs au_opts_verify:1570:docker[2260]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:58:53 image-name kernel: [ 27.797540] device vethfa3ab85 entered promiscuous mode
Nov 18 10:58:53 image-name kernel: [ 27.804420] IPv6: ADDRCONF(NETDEV_UP): vethfa3ab85: link is not ready
Nov 18 10:58:53 image-name kernel: [ 28.028306] IPv6: ADDRCONF(NETDEV_CHANGE): vethfa3ab85: link becomes ready
Nov 18 10:58:53 image-name kernel: [ 28.035505] docker0: port 1(vethfa3ab85) entered forwarding state
Nov 18 10:58:53 image-name kernel: [ 28.041963] docker0: port 1(vethfa3ab85) entered forwarding state
Nov 18 10:58:53 image-name kernel: [ 28.048532] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
Nov 18 10:58:54 image-name kernel: [ 28.980082] IPv6: eth0: IPv6 duplicate address fe80::42:acff:fe11:1 detected!
->>> about here I can SSH to the server
Nov 18 10:59:08 image-name kernel: [ 43.068094] docker0: port 1(vethfa3ab85) entered forwarding state
Nov 18 10:59:53 image-name kernel: [ 87.944452] aufs au_opts_verify:1570:docker[2864]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:59:53 image-name kernel: [ 88.001012] aufs au_opts_verify:1570:docker[2864]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:59:53 image-name kernel: [ 88.049510] aufs au_opts_verify:1570:docker[2815]: dirperm1 breaks the protection by the permission bits on the lower branch
->>> I want to know about this point in the startup process
My problem is that I can connect over SSH when kernel progress is below 30% and some processes have not yet started. I want to detect when the server has completed startup. Alternatively, is there a script that I can push to the server (through the GCE APIs) to notify me when the server is completely up?
gcloud compute instances describe image-name returns the same output from the moment the instance is started until kernel startup is complete.
(In my case I use the Node.js GCE API, but this should not make any difference.)
At present I am not aware of any native Google API that reports the progress of an instance start.
However, here is a quick workaround; check whether it fits your requirement.
You can use either the Google startup script or the native Linux rc.local. The concept is the same, so I will explain it for rc.local [as it is generic and not tied to Google].
We know that the last thing that runs in the boot sequence is rc.local. Any command, script or call placed in rc.local [which is itself a sh or bash script] is executed at the end of the boot process.
So the idea is to have, in the Google image's rc.local, a script or a call that sends you a notification or writes the "boot complete" state to a central system such as a KV store or cloud storage.
Similar to Kamran, but here is how I get this done. It depends on using a google startup script and an image where gcloud is installed by default (though you could rework this to just use curl and API calls)
On instance creation/configuration, I set a custom metadata flag: serverready=False
At the end of my google startup script, I have this:
sudo gcloud compute instances add-metadata $(hostname) \
    --metadata serverready=True \
    --zone $(curl \
        "http://metadata.google.internal/computeMetadata/v1/instance/zone" \
        -H "Metadata-Flavor: Google" | cut -d/ -f4)
When I run the instance creation, I can just poll the metadata for the serverready key and have my app wait until it sees serverready=True.
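The polling side could look roughly like the sketch below. It assumes the instance description is fetched as JSON (for example with gcloud compute instances describe NAME --zone ZONE --format=json) and that custom metadata appears under metadata.items as key/value pairs; the function names, timeout and interval are illustrative choices, not part of the answer above.

```python
import json
import subprocess
import time


def metadata_value(describe_json, key):
    """Return the value of a custom metadata key, or None if absent."""
    data = json.loads(describe_json)
    for item in data.get("metadata", {}).get("items", []):
        if item.get("key") == key:
            return item.get("value")
    return None


def describe_instance(name, zone):
    """Fetch the instance description as JSON text via the gcloud CLI."""
    result = subprocess.run(
        ["gcloud", "compute", "instances", "describe", name,
         "--zone", zone, "--format=json"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def wait_until_ready(fetch, key="serverready", timeout=300.0,
                     interval=5.0, sleep=time.sleep):
    """Poll fetch() until the key equals "True" or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if metadata_value(fetch(), key) == "True":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A caller would pass something like `lambda: describe_instance("image-name", zone)` as `fetch`, where the zone is whatever the instance was created in. Injecting `fetch` and `sleep` keeps the loop testable without calling gcloud.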

Creating dm-cache using dmsetup Kernel 3.9.6

I am trying to create a dm-cache device in a virtual machine. I have three disks labelled Cache_disk, Device_to_Cache and meta_data at /dev/sdb, /dev/sdc and /dev/sdd respectively. To create the cache, I run the command:
dmsetup create my_cache --table '0 16775168 cache /dev/sdd1 /dev/sdb1 /dev/sdc1 512 1 writeback default 0'
as instructed on the dm-cache documentation page.
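To make the positional fields of that table line explicit, here is a small parsing sketch. It assumes the field order given in the kernel's dm-cache documentation (metadata device, cache device, origin device, block size in sectors, feature arguments, policy, policy arguments) and 512-byte sectors; the class and function names are invented for illustration.

```python
from dataclasses import dataclass

SECTOR_BYTES = 512  # device-mapper tables are expressed in 512-byte sectors


@dataclass
class CacheTable:
    start: int
    length: int
    metadata_dev: str
    cache_dev: str
    origin_dev: str
    block_size: int
    features: list
    policy: str
    policy_args: list


def parse_cache_table(table):
    """Split a dm-cache table line into named fields."""
    f = table.split()
    if len(f) < 10 or f[2] != "cache":
        raise ValueError("not a dm-cache table line")
    n_feat = int(f[7])                      # number of feature arguments
    policy_at = 8 + n_feat                  # policy name follows the features
    n_pargs = int(f[policy_at + 1])         # number of policy arguments
    return CacheTable(
        start=int(f[0]), length=int(f[1]),
        metadata_dev=f[3], cache_dev=f[4], origin_dev=f[5],
        block_size=int(f[6]),
        features=f[8:policy_at],
        policy=f[policy_at],
        policy_args=f[policy_at + 2:policy_at + 2 + n_pargs],
    )
```

For the command above this yields /dev/sdd1 as the metadata device, /dev/sdb1 as the cache and /dev/sdc1 as the origin, with a mapped length of 16775168 sectors (8588886016 bytes) and a cache block size of 512 sectors (262144 bytes).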
I have enabled dm-cache in the kernel but am catching this error:
device-mapper: reload ioctl failed: Invalid or incomplete multibyte or wide character
command failed
Looking at dmesg, the device-mapper cache metadata fails its sb_check:
root@msali014-VirtualBox:/home/msali014# dmesg
[ 5432.738603] device-mapper: cache-policy-mq: version 1.0.0 loaded
[ 5432.794852] device-mapper: cache metadata: sb_check failed: magic 0: wanted 1623043
[ 5432.794862] device-mapper: block manager: superblock validator check failed for block 0
[ 5432.794867] device-mapper: cache metadata: couldn't read lock superblock
[ 5432.797952] device-mapper: table: 252:0: cache: Error creating metadata object
/var/log/syslog shows similar output:
Jun 28 11:17:01 msali014-VirtualBox CRON[2935]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 28 11:33:08 msali014-VirtualBox kernel: [ 5432.738603] device-mapper: cache-policy-mq: version 1.0.0 loaded
Jun 28 11:33:08 msali014-VirtualBox kernel: [ 5432.794852] device-mapper: cache metadata: sb_check failed: magic 0: wanted 1623043
Jun 28 11:33:08 msali014-VirtualBox kernel: [ 5432.794862] device-mapper: block manager: superblock validator check failed for block 0
Jun 28 11:33:08 msali014-VirtualBox kernel: [ 5432.794867] device-mapper: cache metadata: couldn't read lock superblock
Jun 28 11:33:08 msali014-VirtualBox kernel: [ 5432.797952] device-mapper: table: 252:0: cache: Error creating metadata object
Jun 28 11:33:08 msali014-VirtualBox kernel: [ 5432.797960] device-mapper: ioctl: error adding target to table
Jun 28 11:33:08 msali014-VirtualBox udevd[619]: inotify_add_watch(6, /dev/dm-0, 10) failed: No such file or directory
Jun 28 11:33:08 msali014-VirtualBox udevd[619]: inotify_add_watch(6, /dev/dm-0, 10) failed: No such file or directory
[ 5432.797960] device-mapper: ioctl: error adding target to table
How can I change the value of sb->magic to make dm-cache successfully load? Any help would be greatly appreciated.
The multibyte or wide character error message is troubling to me, and I don't have any direct advice for working around it.
I assume /dev/sdd1 and /dev/sdb1 are your metadata and data storage block devices? Do they contain any data?
Have you tried zeroing out the metadata volume (dd if=/dev/zero of=/dev/sdd1)? I had issues with this when setting up dm-cache a while back.
In a nutshell I do the following (on Ubuntu 13.04 + Linux 3.10 release):
dmsetup create ssd-metadata --table '0 19370 linear /dev/disk/by-id/scsi-SATA_OCZ-AGILITY2_f2d200034-part6 0'
dmsetup create ssd-blocks --table '0 189008982 linear /dev/disk/by-id/scsi-SATA_OCZ-AGILITY2_f2d200034-part6 19370'
dmsetup create home-cached --table '0 1048576000 cache /dev/mapper/ssd-metadata /dev/mapper/ssd-blocks /dev/vg0/spindle 512 1 writeback default 0'
On a side note, I ran the 3.9.6 and a few earlier 3.9 kernels without issue on Ubuntu 12.11 and 13.04.
If all else fails, I have a working solution for my setup on my blog with more details; you might want to check it out for a step-by-step tutorial.
I had the exact same problem:
[ 968.960618] device-mapper: cache metadata: sb_check failed: blocknr 985712174465152: wanted 0
Writing zeros to my metadata device fixed the problem:
dd bs=64k if=/dev/zero of=/dev/md1
Thanks @Kyle.
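Since dd over the metadata device is destructive, it can help to first check whether its first block already consists of zeros. The sketch below is a hedged illustration: the 4096-byte block size is an assumption about the dm-cache metadata format, the function name is made up, and reading a real block device usually requires root.

```python
METADATA_BLOCK_BYTES = 4096  # assumption: size of the dm-cache metadata block


def first_block_is_zero(path, block_bytes=METADATA_BLOCK_BYTES):
    """Return True if the first block of the device or file is all zero bytes."""
    with open(path, "rb") as dev:
        block = dev.read(block_bytes)
    # A short read means the block could not be fully checked.
    return len(block) == block_bytes and not any(block)
```

For example, `first_block_is_zero("/dev/sdd1")` returning False would be consistent with the sb_check failures quoted above, where stale data on the metadata device is rejected by the superblock validator.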
