Remote I/O error while reading/writing to an NFS share for a specific user and client - Linux

Setup:
NFS server: NFSServerHOST
NFS Share: NFSServerHOST:/MYSHARE
NFS Client1: CLNT1
NFS Client2: CLNT2
NFS Client3: CLNT3
Client OS: RHEL 7 and 8
Client users: User1, User2
Local Mount on Client: /var/NFSSHARE
mount -t nfs4 NFSServerHOST:/MYSHARE /var/NFSSHARE
Mount created successfully on all clients. Both User1 and User2 can read/write on /var/NFSSHARE from all 3 Clients.
Now something happens on Client1 (we have yet to find out whether it is related to a server patch or some cron job): User1 cannot read/write to /var/NFSSHARE, but only on Client1. User2 can still read/write to the NFS share on Client1, and both users can still read/write on Client2 and Client3.
Error while performing read/write on Client1 for User1: Remote I/O error
If we reboot Client1, the issue goes away and User1 can again perform I/O operations on the NFS share from Client1.
Some of the things we checked:
No version mismatch: both the NFS client and the NFS server are configured for NFSv4.
Nothing wrong with whitelisting: all 3 client IPs are whitelisted on the NFS server.
Checked inode and open-file (lsof) usage, which is well within limits.
nfs4_getfacl /var/NFSSHARE
# file: /var/NFSSHARE
A::EVERYONE@:rwaDxtTnNcy
Running getfacl /var/NFSSHARE as User1 on Client1:
# file: var/NFSSHARE/
# owner: nobody
# group: nobody
user::rwx
group::rwx
other::rwx
Comparing rpcdebug logs while performing the I/O operation on Client1 (FAILURE) vs Client2 (SUCCESS):
kernel: NFS: nfs_update_inode(0:57/3963604504 fh_crc=0xbf9e74c8 ct=2 info=0x427e7f)
kernel: NFS: (0:57/3963604504) revalidation complete
kernel: NFS: permission(0:57/3963604504), mask=0x1, res=0
kernel: NFS: permission(0:57/3963604504), mask=0x3, res=0
kernel: NFS: atomic_open(0:57/3963604504), Abhi
kernel: --> nfs_put_client({2})
kernel: --> nfs4_alloc_slot used_slots=0002 highest_used=1 max_slots=1024
kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=0
The logs change after this point. Before it, the SUCCESS and FAILURE traces are more or less the same; only the numeric values differ.
Client1 (FAILURE)
kernel: nfs4_free_slot: slotid 0 highest_used_slotid 1
kernel: NFS: permission(0:57/3963604504), mask=0x81, res=-10
kernel: --> nfs4_alloc_slot used_slots=0002 highest_used=1 max_slots=1024
kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=0
kernel: decode_attr_type: type=00
Client2 (SUCCESS)
kernel: decode_attr_type: type=0100000
kernel: decode_attr_change: change attribute=7148460619683717735
kernel: decode_attr_size: file size=0
Looking for suggestions to diagnose this issue. What more can we do to enable more verbose logging, on either the client or the server side, to learn more about the error?
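For reference, a minimal sketch of how we could raise the verbosity further the next time the error appears (assuming root access and the rpcdebug tool from nfs-utils; the interface name and capture path are placeholders):
# on the failing client: enable all NFS and SUNRPC debug flags (very chatty, output goes to dmesg/journal)
rpcdebug -m nfs -s all
rpcdebug -m rpc -s all
# on the NFS server: the equivalent for the kernel server module
rpcdebug -m nfsd -s all
# capture the wire traffic for the same failing operation
tcpdump -i eth0 -s 0 -w /tmp/nfs-failure.pcap host NFSServerHOST and port 2049
# turn debugging back off afterwards
rpcdebug -m nfs -c all
rpcdebug -m rpc -c all
rpcdebug -m nfsd -c all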
Thanks

Related

Filesystem stats not available after CIFS reconnect

I am using a Windows Server 2019 with SMBServer shares, which get mounted on a SLES host via CIFS.
When Windows does its periodic system cleanup (closing idle SMBServer sessions), the Linux host reconnects with the following kernel message:
CIFS: VFS: \\filer.example.com has not responded in 180 seconds. Reconnecting...
The reconnect seems to be successful, as reading and writing files to the mount is possible.
But querying disk stats is not possible anymore.
Bad file descriptor on df:
user@suse:~$ df -h
df: /mnt/test: Bad file descriptor
Filesystem Size Used Avail Use% Mounted on
...
Wrong data on stat:
user@suse:~$ stat /mnt/test
File: /mnt/test
Size: 0 Blocks: 0 IO Block: 1048576 directory
Device: 38h/56d Inode: 281474976710700 Links: 2
Access: (0755/drwxr-xr-x) Uid: ( 1100/ application-user) Gid: ( 80/ application-group)
Access: 2023-02-15 11:04:55.977807600 +0100
Modify: 2023-02-15 11:04:55.977807600 +0100
Change: 2023-02-16 09:50:07.638662000 +0100
Birth: 2023-02-10 14:07:14.408836200 +0100
I noticed the same problem when mounting subdirectories of a single share multiple times. The SLES host does mount multiple shares, but each share only once.
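A hedged way to pin down which syscall actually fails after the reconnect (assuming strace is installed; /mnt/test as in the question, and the exact syscall name may vary between statfs/fstatfs depending on libc):
# trace the statfs family of calls that df makes; after a reconnect it is expected to fail with EBADF here
strace -e trace=statfs,fstatfs df -h /mnt/test
# compare with the stat/statx call made by stat on the same path
strace -e trace=stat,statx stat /mnt/test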

Kubernetes NFS PV: Lock reclaim failed

Configuration:
The NFS server and the k8s cluster (a single-node cluster) run on two machines and use the same OS and NFS software, as below:
[root@test-2 ~]# yum info nfs-utils
Failed to set locale, defaulting to C
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.tuna.tsinghua.edu.cn
* extras: mirrors.bfsu.edu.cn
* updates: mirrors.huaweicloud.com
Installed Packages
Name : nfs-utils
Arch : x86_64
Epoch : 1
Version : 1.3.0
Release : 0.68.el7
Size : 1.1 M
Repo : installed
From repo : base
Summary : NFS utilities and supporting clients and daemons for the kernel NFS server
URL : http://sourceforge.net/projects/nfs
License : MIT and GPLv2 and GPLv2+ and BSD
Description : The nfs-utils package provides a daemon for the kernel NFS server and
: related tools, which provides a much higher level of performance than the
: traditional Linux NFS server used by most users.
:
: This package also contains the showmount program. Showmount queries the
: mount daemon on a remote host for information about the NFS (Network File
: System) server on the remote host. For example, showmount can display the
: clients which are mounted on that host.
:
: This package also contains the mount.nfs and umount.nfs program.
[root@test-2 ~]# cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
[root@test-2 ~]# uname -a
Linux test-2 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
[root@test-2 ~]# cat /etc/exports
/home/nfs 192.168.0.0/24(rw,sync,no_root_squash,no_subtree_check,insecure)
K8S version: v1.17.9
Problems:
The application (a StatefulSet) running on k8s is using a PV that was dynamically provisioned by the k8s nfs-provisioner; the PV is actually backed by a directory on the remote NFS server. The application keeps going into "CrashLoopBackOff" because it constantly hits "input/output error" when writing data to the PV after only a few seconds of running.
Meanwhile, I saw a lot of errors in /var/log/messages:
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:11:36 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:12:05 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:12:05 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:41 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:41 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:41 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:41 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Dec 2 17:21:42 localhost kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
I took a tcpdump up until "Lock reclaim failed" appeared in the system log, and found many NFS errors, as below:
NFS4ERR_BADSESSION (10052)
NFS4ERR_STALE_CLIENTID (10022)
NFS4ERR_NO_GRACE (10033)
I'm not sure if they're related to the "lock reclaim failed" or the "input/output" error.
I have encountered this problem on different machines from time to time and it really annoys me.
Does anyone know the root cause or how to fix it? Big thanks in advance.
Screenshots
application pod log
NFS errors in tcpdump
nfsstat -m output on k8s
nfsstat -c output on k8s; NOTE the high open_noat value.
NFS server configuration (my k8s node is 111.1.30.16)
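For capturing the same evidence on the next occurrence, a minimal sketch (assumes tcpdump and nfs-utils are installed on the k8s node; interface and file names are placeholders):
# capture NFS traffic on the node until the error reappears, then stop with Ctrl-C
tcpdump -i any -s 0 -w /tmp/nfs-lock-reclaim.pcap port 2049
# client-side NFSv4 operation counters, to watch open/lock activity over time
nfsstat -c
# mount options actually in effect for the PV-backed mount
nfsstat -m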

After suspend guest OS hangs when using vagrant with nfs

Host OS Ubuntu 15.10
Guest OS Ubuntu 14.10
Using Vagrant with NFS and VirtualBox and a static IP on the private network.
It works perfectly, except that after the host OS has been suspended, the entire guest OS becomes unusable.
This does not happen when using the normal VirtualBox shared folders.
It's not only the NFS shared folder that is unusable; the entire guest OS hangs.
Even syslog does not seem to see much activity.
This is syslog on the guest, from waking up until vagrant halt is completed.
Feb 26 07:15:33 vagrant kernel: [ 8375.252989] e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Feb 26 07:16:11 vagrant kernel: [ 8413.109832] nfs: server 192.168.33.1 not responding, still trying
Feb 26 07:16:38 vagrant kernel: [ 8440.687476] nfs: server 192.168.33.1 not responding, still trying
Feb 26 07:17:01 vagrant CRON[3776]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Feb 26 07:20:33 vagrant rsyslogd: [origin software="rsyslogd" swVersion="7.4.4" x-pid="753" x-info="http://www.rsyslog.com"] exiting on signal 15.
How can this be fixed?
How should I debug it?
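A hedged first debugging step after resuming, run inside the guest (assumes the host NFS export is at 192.168.33.1, as in the log above):
# check whether the host's NFS service is reachable from the guest after resume
rpcinfo -p 192.168.33.1
# show the NFS mounts and the options/timeouts they were mounted with
nfsstat -m
# list processes stuck in uninterruptible sleep (D state), typically blocked on the dead NFS mount
ps axo pid,stat,wchan:30,comm | awk '$2 ~ /D/'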

Is there a gcloud API to detect when a Compute Engine server is completely up?

I create a VM instance. I can connect to it as soon as the SSH daemon is started. But this is too early, because kernel startup is only at approx. 30%. Is there a gcloud or other API to get the VM state once the kernel has finished startup?
Nov 18 10:58:51 image-name google: No startup script found in metadata.
Nov 18 10:58:53 image-name kernel: [ 27.491829] aufs au_opts_verify:1570:docker[2414]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:58:53 image-name kernel: [ 27.703142] aufs au_opts_verify:1570:docker[2414]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:58:53 image-name kernel: [ 27.735867] aufs au_opts_verify:1570:docker[2414]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:58:53 image-name kernel: [ 27.771732] aufs au_opts_verify:1570:docker[2260]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:58:53 image-name kernel: [ 27.797540] device vethfa3ab85 entered promiscuous mode
Nov 18 10:58:53 image-name kernel: [ 27.804420] IPv6: ADDRCONF(NETDEV_UP): vethfa3ab85: link is not ready
Nov 18 10:58:53 image-name kernel: [ 28.028306] IPv6: ADDRCONF(NETDEV_CHANGE): vethfa3ab85: link becomes ready
Nov 18 10:58:53 image-name kernel: [ 28.035505] docker0: port 1(vethfa3ab85) entered forwarding state
Nov 18 10:58:53 image-name kernel: [ 28.041963] docker0: port 1(vethfa3ab85) entered forwarding state
Nov 18 10:58:53 image-name kernel: [ 28.048532] IPv6: ADDRCONF(NETDEV_CHANGE): docker0: link becomes ready
Nov 18 10:58:54 image-name kernel: [ 28.980082] IPv6: eth0: IPv6 duplicate address fe80::42:acff:fe11:1 detected!
->>> about here I can SSH to the server
Nov 18 10:59:08 image-name kernel: [ 43.068094] docker0: port 1(vethfa3ab85) entered forwarding state
Nov 18 10:59:53 image-name kernel: [ 87.944452] aufs au_opts_verify:1570:docker[2864]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:59:53 image-name kernel: [ 88.001012] aufs au_opts_verify:1570:docker[2864]: dirperm1 breaks the protection by the permission bits on the lower branch
Nov 18 10:59:53 image-name kernel: [ 88.049510] aufs au_opts_verify:1570:docker[2815]: dirperm1 breaks the protection by the permission bits on the lower branch
->>> I want to know about this point in the startup process
My problem is that I can connect to it using SSH while kernel progress is below 30% and some processes are not yet started. I want to detect somehow whether the server has completed startup. Or is there a script that I can push to the server (through the GCE APIs) to notify me when the server is completely up?
gcloud compute instances describe image-name returns the same output from the moment the instance is started until kernel startup is complete.
(In my case I use the Node.js GCE API, but this should not make any difference.)
Presently I am not aware of any Google-native API that can report the progress of an instance start.
However, here is a quick workaround; check whether it fits your requirement.
You can either use a Google startup script or the native Linux rc.local. The concept is the same, so I'll explain it for the case of rc.local [as it is generic and not tied to Google].
We know that rc.local is the last thing that runs in the boot sequence. Any command, script, or call placed in rc.local [which is itself an sh or bash script] will be executed at the end of the boot process.
So the idea would be: in the Google image, have a script or a call in rc.local which sends you a notification, or writes the "boot-up is done" state to a central system such as a KV store or Cloud Storage.
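A minimal sketch of what that rc.local could look like (the notification URL is hypothetical; substitute your own endpoint, or a gcloud/gsutil write to Cloud Storage):
#!/bin/sh
# /etc/rc.local - runs at the end of the boot sequence
# hypothetical endpoint: replace with your own notification target
curl -fsS -X POST -d "host=$(hostname)&state=boot-complete" https://example.com/boot-status
exit 0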
Similar to Kamran's answer, but here is how I get this done. It depends on using a Google startup script and an image where gcloud is installed by default (though you could rework this to just use curl and API calls).
On instance creation/configuration, I set a custom metadata flag: serverready=False
At the end of my google startup script, I have this:
sudo gcloud compute instances add-metadata $(hostname) \
--metadata serverready=True \
--zone $(curl \
"http://metadata.google.internal/computeMetadata/v1/instance/zone" \
-H "Metadata-Flavor: Google"|cut -d/ -f4)
When I run the instance creation, I can just poll the metadata for the serverready key and make my app wait until it sees serverready=True.
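A rough sketch of such a polling loop (instance name and zone are placeholders; the exact --format expression may need adjusting for your gcloud version):
# poll until the serverready metadata flag flips to True
until gcloud compute instances describe image-name --zone us-central1-a \
      --format="value(metadata.items)" | grep -q "serverready.*True"; do
  sleep 5
done
echo "instance reports startup complete"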

BLKRASET: Inappropriate ioctl for device

I'm receiving this error
BLKRASET: Inappropriate ioctl for device
when trying to run
sudo blockdev --setra 256 /data
on my Linux server. The server is being used as a MongoDB server and /data is where it stores its data.
I initially ran this command because I received the following warning when starting my MongoDB shell:
Wed Mar 20 22:40:49.850 [initandlisten]
Wed Mar 20 22:40:49.850 [initandlisten] ** WARNING: Readahead for
/data/db is set to 2048KB
Wed Mar 20 22:40:49.850 [initandlisten] ** We suggest setting it to
256KB (512 sectors) or less
Wed Mar 20 22:40:49.850 [initandlisten] **
http://dochub.mongodb.org/core/readahead
The blockdev --setra command is supposed to set the readahead value and resolve the warning, but instead it fails with the error above.
The blockdev command operates on block devices (disks), not directories. You need to pass it the name of the device in /dev/ on which your data directory is stored. If you run df /data, it will tell you which device is currently mounted there. Then you can run blockdev --setra 512 /dev/whatever.
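A short worked example (the device name /dev/sdb1 is hypothetical; use whatever df reports for /data):
df /data                              # e.g. shows /dev/sdb1 mounted on /data
sudo blockdev --setra 512 /dev/sdb1   # 512 sectors x 512 bytes = 256KB readahead
sudo blockdev --getra /dev/sdb1       # verify the new value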
