flock and NFS -- what happens upon unexpected shutdown? - linux

I am using flock within an HPC application on a file system shared among many machines via NFS. Locking works fine as long as all machines behave as expected (Quote from http://en.wikipedia.org/wiki/File_locking: "Kernel 2.6.12 and above implement flock calls on NFS files using POSIX byte-range locks. These locks will be visible to other NFS clients that implement fcntl-style POSIX locks").
I would like to know what is expected to happen if one of the machines that has acquired a certain lock shuts down unexpectedly, e.g. due to a power outage. I am not sure where to look this up. My guess is that this is entirely up to NFS and its way of dealing with NFS handles of non-responsive machines. I could imagine that the other clients will still see the lock until a timeout occurs and the NFS server declares all NFS handles of the machine that timed out invalid. Is that correct? What would that timeout be? What happens if the machine comes back up within the timeout? Can you recommend a definitive reference where I can look all of this up?
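For context, this is roughly the locking pattern in use, as a minimal C sketch (the lock-file path is hypothetical):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/file.h>
    #include <unistd.h>

    int main(void)
    {
        /* Hypothetical lock file on the NFS-mounted shared file system. */
        int fd = open("/shared/hpc/job.lock", O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Blocks until the exclusive lock is granted; on kernels >= 2.6.12
         * this is emulated over NFS with POSIX byte-range locks. */
        if (flock(fd, LOCK_EX) < 0) {
            perror("flock");
            return 1;
        }

        /* ... critical section: use the shared resource ... */

        flock(fd, LOCK_UN);
        close(fd);
        return 0;
    }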
Thanks!

When you use NFS v4 (!), the file will be unlocked when the server hasn't heard from the client for a certain amount of time. This lease period defaults to 90 seconds.

There is a good explanation in the O'Reilly book on NFS and NIS, chapter 11.2. To sum up quickly: since NFS (through v3) is stateless, the server has no way of knowing that a client has crashed; the client is responsible for clearing its locks after it reboots.
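If you have access to the NFS server itself, you can read the configured lease period from /proc/fs/nfsd/nfsv4leasetime (as the fcntl(2) man page quoted further down also notes). A minimal sketch, to be run on the server, not a client:

    #include <stdio.h>

    int main(void)
    {
        /* The NFSv4 lease period in seconds; defaults to 90 on Linux. */
        FILE *f = fopen("/proc/fs/nfsd/nfsv4leasetime", "r");
        if (!f) {
            perror("fopen");
            return 1;
        }
        int lease = 0;
        if (fscanf(f, "%d", &lease) == 1)
            printf("NFSv4 lease time: %d seconds\n", lease);
        fclose(f);
        return 0;
    }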

Related

High system load due to .vscode-server processes

On our Linux servers we observe quite a few .vscode-server processes (basically $PREFIX/.vscode-server/bin/$ID/node) from developers using the vscode-remote-ssh extension. Unfortunately, these processes put considerable load on the systems because they wait on I/O (state "D", uninterruptible sleep).
All affected filesystems are NFS (v3 and v4.0) mounted shares. There's nothing we can do on the fileserver end.
Why exactly do these processes require so much I/O? The .vscode-server processes sometimes generate more load than some of the data-processing jobs on these servers.
Is this a known problem of vscode-remote-ssh and/or is there a way to solve or work around this I/O problem?

What happens when I lock a file located on remote storage via fcntl?

I just wonder. I have two processes on two different servers.
Those processes write to the same file and use fcntl locking for synchronization. What happens if one of the processes is aborted while it owns the file lock? How will the NFS server be notified that this process died?
Read http://man7.org/linux/man-pages/man2/fcntl.2.html
Record locking and NFS

Before Linux 3.12, if an NFSv4 client loses contact with the server for a period of time (defined as more than 90 seconds with no communication), it might lose and regain a lock without ever being aware of the fact. (The period of time after which contact is assumed lost is known as the NFSv4 leasetime. On a Linux NFS server, this can be determined by looking at /proc/fs/nfsd/nfsv4leasetime, which expresses the period in seconds. The default value for this file is 90.) This scenario potentially risks data corruption, since another process might acquire a lock in the intervening period and perform file I/O.

Since Linux 3.12, if an NFSv4 client loses contact with the server, any I/O to the file by a process which "thinks" it holds a lock will fail until that process closes and reopens the file. A kernel parameter, nfs.recover_lost_locks, can be set to 1 to obtain the pre-3.12 behavior, whereby the client will attempt to recover lost locks when contact is reestablished with the server. Because of the attendant risk of data corruption, this parameter defaults to 0 (disabled).
If the process terminates, all locks held by it are released.
I think this is the answer you were expecting.
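To make this concrete, here is a minimal C sketch of taking a whole-file write lock with fcntl (the path is hypothetical); if the process dies while holding the lock, the kernel releases it as described above:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/mnt/nfs/shared.dat", O_RDWR);  /* hypothetical path */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        struct flock fl = {
            .l_type   = F_WRLCK,   /* exclusive write lock */
            .l_whence = SEEK_SET,
            .l_start  = 0,
            .l_len    = 0,         /* 0 means: lock the whole file */
        };

        /* F_SETLKW blocks until the lock is granted. */
        if (fcntl(fd, F_SETLKW, &fl) == -1) {
            perror("fcntl(F_SETLKW)");
            return 1;
        }

        /* ... write to the file ... */

        fl.l_type = F_UNLCK;   /* also released automatically if the process dies */
        fcntl(fd, F_SETLK, &fl);
        close(fd);
        return 0;
    }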

mmap file shared via nfs?

Scenario A:
To share a read/write block of memory between two processes running on the same host, Joe mmaps the same local file from both processes.
Scenario B:
To share a read/write block of memory between two processes running on two different hosts, Joe shares a file via nfs between the hosts, and then mmaps the shared file from both processes.
Has anyone tried Scenario B? What are the extra problems that arise in Scenario B that do not apply to Scenario A?
mmap will not share data across hosts without some additional steps.
If you change data in an mmapped part of a file, the changes are initially stored only in memory. They are not flushed to the filesystem (local or remote) until msync or munmap or close is called, or the OS kernel and its FS decide to write them back on their own.
When using NFS, locking and storing data will be slower than on a local FS, and flush timing and the duration of file operations will vary more.
On the sister site, people say that NFS may have a poor caching policy, so there will be many more I/O requests to the NFS server than with a local FS.
You will need byte-range locks for correct behavior; they are available in NFS >= v4.0.
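To illustrate the flushing point, a minimal C sketch (the path is hypothetical, and real code would also take a byte-range lock around the write):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define REGION 4096

    int main(void)
    {
        /* Hypothetical file on an NFS mount, assumed to be >= REGION bytes. */
        int fd = open("/mnt/nfs/shared.bin", O_RDWR);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        char *p = mmap(NULL, REGION, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        memcpy(p, "hello", 5);

        /* Without an explicit msync, this change may sit in the local page
         * cache indefinitely and never become visible on the other host. */
        if (msync(p, REGION, MS_SYNC) == -1)
            perror("msync");

        munmap(p, REGION);
        close(fd);
        return 0;
    }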
I'd say Scenario B has all kinds of problems (assuming it works as suggested in the comments). The most obvious is the standard concurrency issue: two processes sharing one resource with no form of locking, etc. That could lead to problems... I'm not sure whether NFS has its own peculiar quirks in this regard or not.
Assuming you can get around the concurrency issues somehow, you are now reliant on maintaining a stable (and speedy) network connection. Obviously, if the network drops out, you might miss some changes. Whether this matters depends on your architecture.
My thought is that it sounds like an easy way to share a block of memory between different machines, but I can't say I've heard of it being done, which makes me think it isn't so good. When I think of sharing data between processes, I think of DBs, messaging, or a dedicated server. In this case, if you made one process the master (to handle concurrency and own the concept, i.e. whatever it says is the best copy of the data), it might work...

Track changes in nfs / sync nfs over multiple datacenters

We have two datacenters, each with a number of Linux servers that share a large EMC-based NFS mount.
The challenge is to keep the two NFS shares in sync. For the moment, assume that writes will only occur to nfs1, which then has to propagate the changes to nfs2.
Periodic generic rsyncs have proved too slow - each rsync takes several hours to complete, even with -az. We need to do specific syncs when a file or directory actually changes.
So then the problem is, how do we know when a file or directory has changed? inotify is the obvious answer, but it famously does not work with nfs. (There is some chatter about inotify possibly working if it is installed on the nfs server, but that isn't an option for us - we only have control of the clients, not the server.)
Does the linux nfs client allow you to capture all the changes it sends to the server, in a logfile or otherwise? Or could we hack the client to do this? We could then collect the changes from each client and periodically kick off targeted rsyncs.
Any other ideas welcome. Thanks!
If you need to keep the two EMC servers in sync, it might be better to look into EMC-specific mirroring capabilities to achieve this. Typically these are block-based updates, for high performance and low bandwidth utilization. For example, SnapMirror on NetApp could achieve this. I'm not as familiar with EMC, but a quick Google search revealed EMC MirrorView or EMC SRDF as possible options.

How can I manage use of a shared resource used by several Perl programs?

I am looking for a good way to manage access to an external FTP server from various programs on a single server.
Currently I am working with a lock file, so that only one process can use the FTP server at a time. What would be a good way to allow 2-3 parallel processes to access the FTP server simultaneously? Unfortunately, the provider does not allow more sessions and locks my account for a day if too many processes access their server.
The platforms used are Solaris and Linux; all FTP access is encapsulated in a single library, so there is only one function I need to change. It would be nice if there were something on CPAN.
I'd look into perlipc(1) for SysV semaphores, or modules like POSIX::RT::Semaphore for POSIX semaphores. I'd create a semaphore with a resource count of 2-3, and then have each process try to acquire the semaphore.
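As a minimal sketch of that idea, here it is in C using POSIX named semaphores (POSIX::RT::Semaphore wraps the same primitives; the semaphore name and the count of 3 are assumptions):

    #include <fcntl.h>
    #include <semaphore.h>
    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical name; a count of 3 matches the 2-3 allowed sessions. */
        sem_t *sem = sem_open("/ftp_slots", O_CREAT, 0644, 3);
        if (sem == SEM_FAILED) {
            perror("sem_open");
            return 1;
        }

        sem_wait(sem);   /* block until one of the 3 slots is free */

        /* ... talk to the FTP server ... */

        sem_post(sem);   /* release the slot */
        sem_close(sem);
        return 0;
    }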
Instead of making a bunch of programs wait in line, could you create one local program that handled all the remote communication while the local programs talked to it? You effectively create a proxy and push that complexity away from your programs so you don't have to deal with it in every program.
I don't know the other constraints on your problem, but this has worked for me on similar issues.
