Linux Shell Script: How to detect NFS Mount-point (or the Server) is dead? - linux

Generally on NFS Client, how to detect the Mounted-Point is no more available or DEAD from Server-end, by using the Bash Shell Script?
Normally i do:
if ls '/var/data' 2>&1 | grep 'Stale file handle';
then
echo "failing";
else
echo "ok";
fi
But the problem is, when especially the NFS Server is totally dead or stopped, even the, ls command, into that directory, at Client-side is hanged or died. Means, the script above is no more usable.
Is there any way to detect this again please?

"stat" command is a somewhat cleaner way:
statresult=`stat /my/mountpoint 2>&1 | grep -i "stale"`
if [ "${statresult}" != "" ]; then
#result not empty: mountpoint is stale; remove it
umount -f /my/mountpoint
fi
Additionally, you can use rpcinfo to detect whether the remote nfs share is available:
rpcinfo -t remote.system.net nfs > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo Remote NFS share available.
fi
Added 2013-07-15T14:31:18-05:00:
I looked into this further as I am also working on a script that needs to recognize stale mountpoints. Inspired by one of the replies to "Is there a good way to detect a stale NFS mount", I think the following may be the most reliable way to check for staleness of a specific mountpoint in bash:
read -t1 < <(stat -t "/my/mountpoint")
if [ $? -eq 1 ]; then
echo NFS mount stale. Removing...
umount -f -l /my/mountpoint
fi
"read -t1" construct reliably times out the subshell if stat command hangs for some reason.
Added 2013-07-17T12:03:23-05:00:
Although read -t1 < <(stat -t "/my/mountpoint") works, there doesn't seem to be a way to mute its error output when the mountpoint is stale. Adding > /dev/null 2>&1 either within the subshell, or in the end of the command line breaks it. Using a simple test: if [ -d /path/to/mountpoint ] ; then ... fi also works, and may preferable in scripts. After much testing it is what I ended up using.
Added 2013-07-19T13:51:27-05:00:
A reply to my question "How can I use read timeouts with stat?" provided additional detail about muting the output of stat (or rpcinfo) when the target is not available and the command hangs for a few minutes before it would time out on its own. While [ -d /some/mountpoint ] can be used to detect a stale mountpoint, there is no similar alternative for rpcinfo, and hence use of read -t1 redirection is the best option. The output from the subshell can be muted with 2>&-. Here is an example from CodeMonkey's response:
mountpoint="/my/mountpoint"
read -t1 < <(stat -t "$mountpoint" 2>&-)
if [[ -n "$REPLY" ]]; then
echo "NFS mount stale. Removing..."
umount -f -l "$mountpoint"
fi
Perhaps now this question is fully answered :).

The final answers give by Ville and CodeMonkey are almost correct. I'm not sure how no one noticed this, but a $REPLY string having content is a success, not a failure. Thus, an empty $REPLY string means the mount is stale. Thus, the conditional should use -z, not -n:
mountpoint="/my/mountpoint"
read -t1 < <(stat -t "$mountpoint" 2>&-)
if [ -z "$REPLY" ] ; then
echo "NFS mount stale. Removing..."
umount -f -l "$mountpoint"
fi
I have ran this multiple times with a valid and invalid mount point and it works. The -n check gave me reverse results, echoing the mount was stale when it was absolutely valid.
Also, the double bracket isn't necessary for a simple string check.

Building off the answers here, I found some issues in testing that would output bad data due to how the $REPLY var would get updated (or not, if the result was empty), and the inconsistency of the stat command as provided in the answers.
This uses the stat command to check the FS type which responds to changes pretty fast or instant, and checks the contents of $REPLY to make sure the fs is NFS [ ref: https://unix.stackexchange.com/questions/20523/how-to-determine-what-filesystem-a-directory-exists-on ]
read -t1 < <(timeout 1 stat -f -c %T "/mnt/nfsshare/" 2>&-);if [[ ! "${REPLY}" =~ "nfs" ]];then echo "NFS mount NOT WORKING...";fi

Related

loop to test all NFS mount point

this code work very well
mountpoint="/mnt/testnfs"
read -t1 < <(stat -t "$mountpoint" 2>&-)
if [ -z "$REPLY" ] ; then
echo "NFS mount stale. Removing..."
fi
If I try to put it into a loop for :
declare -a nfs_array=( "/mnt/testnfs1" "/mnt/testnfs2/" )
for i in "${nfs_array[#]}"
do
read -t1 < <(stat -t "$nfs_array" 2>&-)
if [ -z "$REPLY" ] ; then
echo "NFS dead"
fi
done
Aim is to test all mounts points, this code test and read only the first entries from nfs_array. If I swapped testnfs1 with testnfs2 this code will test testnfs2 mount point and forget testnfs1 :-(
In your loop it should be:
read -r -t1 < <(stat -t "$i" 2>&-)
Currently it's just reading the first array value and $i isn't used.
If you really want to list all nfs mounts (the title tells so), then use either:
mount | grep ' type nfs' | ...
This may have false positives because a mount point or mounted path contains type nfs.
If the /proc/ file system is available, this is a better way:
awk '$3 ~ /^nfs/ {print}' /proc/mounts | ...
Here I'm not sure what happens, if mount point or mounted path contains a space -- I never had this situation.

Linux: Reading the output of readlink /proc/pid/exe within a Bash Script

So I am writing a bash script which will run through all of the process ids in /proc/[pid] and read the executable that was used to run it.
From what I have had a looked at, the /proc filesystem contains the /proc/[pid]/exe symbolic link. Within the bash script I am trying work out how to read the value of "readlink /proc/[pid]/exe" to check if (deleted) or nothing is returned to find out whether the original executable exists on the disk or not.
Is there a way of doing this, so far I have?
#!/bin/bash
pid = "0"
while [ $pid -lt 32769 ]
do
if [-d /proc/$pid]; then
if [-f /proc/$pid/exe]; then
echo $pid
readlink /proc/$pid/exe
fi
fi
pid = $[$pid+1]
done
This fails to work and always returns nothing.I am trying to list all of the processes that no longer have their executables available on disk.
Will this work for you?
#!/bin/bash
for i in $(ls /proc | awk '/^[[:digit:]]+/{print $1}'); do
if [ -h /proc/$i/exe ]; then
echo -n "$i: "
if readlink /proc/$i/exe >/dev/null 2>&1 ; then
echo "executable exists"
else
echo "executable not found"
fi
fi
done
I've updated your script to make it work. Notice that -f checks whether a file name represents a regular file. I would return false for a symbolic link:
pid="0"
while [ $pid -lt 32769 ]
do
if [ -d /proc/$pid ]; then
if [ -h /proc/$pid/exe ]; then
echo $pid
readlink /proc/$pid/exe
fi
fi
pid=$[$pid+1]
done
you can read returned value after any command in shell by printing $? variable:
readlink
echo $?
if link is invalid, $? will be bigger than 0.
however if link exist and actual file is deleted, you can use something like:
ls `readlink somelink`
readlink -f `ls --dereference /proc/$pid/exe`

scp: how to find out that copying was finished

I'm using scp command to copy file from one Linux host to another.
I run scp commend on host1 and copy file from host1 to host2. File is quite big and it takes for some time to copy it.
On host2 file appears immediately as soon as copying was started. I can do everything with this file even if copying is still in progress.
Is there any reliable way to find out if copying was finished or not on host2?
Off the top of my head, you could do something like:
touch tinyfile
scp bigfile tinyfile user#host:
Then when tinyfile appears you know that the transfer of bigfile is complete.
As pointed out in the comments, this assumes that scp will copy the files one by one, in the order specified. If you don't trust it, you could do them one by one explicitly:
scp bigfile user#host:
scp tinyfile user#host:
The disadvantage of this approach is that you would potentially have to authenticate twice. If this were an issue you could use something like ssh-agent.
On sending side (host1) use script like this:
#!/bin/bash
echo 'starting transfer'
scp FILE USER#DST_SERVER:DST_PATH
OUT=$?
if [ $OUT = 0 ]; then
echo 'transfer successful'
touch successful
scp successful USER#DST_SERVER:DST_PATH
else
echo 'transfer faild'
fi
On receiving side (host2) make script like this:
#!/bin/bash
SLEEP_TIME=30
MAX_CNT=10
CNT=0
while [[ ! -e successful && $CNT < $MAX_CNT ]]; do
((CNT++))
sleep($SLEEP_TIME);
done;
if [[ -e successful ]]; then
echo 'successful'
rm successful
# do somethning with FILE
fi
With CNT and MAX_CNT you disable endless loop (in case file successful isn't transferred).
Product MAX_CNT and SLEEP_TIME should be equal or greater expected transfer time. In my example expected transfer time is less than 300 seconds.
A checksum (md5sum, sha256sum ,sha512sum) of the local and remote files would tell you if they're identical.
For the situation where you don't have SSH access to the remote system - like an FTP server - you can download the file after it's uploaded and compare the checksums. I do this for files I send from production scripts at work. Below is a snippet from the script in which I do this.
MD5SRC=$(md5sum $LOCALFILE | cut -c 1-32)
MD5TESTFILE=$(mktemp -p /ramdisk)
curl \
-o $MD5TESTFILE \
-sS \
-u $FTPUSER:$FTPPASS \
ftp://$FTPHOST/$REMOTEFILE
MD5DST=$(md5sum $MD5TESTFILE | cut -c 1-32)
if [ "$MD5SRC" == "$MD5DST" ]
then
echo "+Local and Remote files match!"
else
echo "-Local and Remote files don't match"
fi
if you use inotify-tools,
then the solution will looks like this:
while ! inotifywait -e close $(dirname ${bigfile_fullname}) 2>/dev/null | \
grep -Eo "CLOSE $(basename ${bigfile_fullname})$">/dev/null
do true
done
echo "File ${bigfile_fullname} closed"
After some investigation, and discussion of the problem on other forums I have found one more solution. Maybe it can help somebody.
There is a command "lsof". It lists open files. During copying the file will be opened, so the command
lsof | grep filename
will return non empty result.
So you might want to make a while loop to wait until lsof returns nothing and proceed with your task.
Example:
# provide your file name here
f=<nameOfYourFile>
lsofresult=`lsof | grep $f | wc -l`
while [ $lsofresult != 0 ]; do
echo still copying file $f...
sleep 5
lsofresult=`lsof | grep $f | wc -l`
done; echo copying file $f is finished: `ls $f`
For the duplicate question, How to check if file has been scp 100% to the remote location , which was for an expect script, to know if a file is transferred completely, we can add expect 100% .. .. i.e something like this ...
expect -c "
set timeout 1
spawn scp user#$REMOTE_IP:/tmp/my.file user#$HOST_IP:/home/.
expect yes/no { send yes\r ; exp_continue }
expect password: { send $SCP_PASSWORD\r }
expect 100%
sleep 1
exit
"
if [ -f "/home/my.file" ]; then
echo "Success"
fi
If avoiding a second SSH handshake is important, you can use something like the following:
ssh host cat \> bigfile \&\& touch complete < bigfile
Then wait for the "complete" file to get created on the remote end.

Check if NFS Directory Mounted without Large Hangs on Failure

I'm writing a shell script to check whether or not certain nfs mounts can be seen by nodes in a cluster.
The script works by doing an ls /nfs/"machine" |wc -l and if it's greater than 0 it will pass the test. My main concern with this solution is how long ls will hang if a disk is not mounted.
I had a go at the solution in this question "bash checking directory existence hanging when NFS mount goes down" but the results did not correspond with what was actually mounted.
I also tried doing a df -h /nfs/"machine" but that has a large hang if the disk isn't mounted.
Basically, is there an alternate way which can let me know if a disk is mounted or not without large hangs?
Alternatively, is there a way of restricting the time that a command can be executed for?
Thanks in advance!
Ok I managed to solve this using the timeout command, I checked back here to see that BroSlow updated his answer with a very similar solution. Thank you BroSlow for your help.
To solve the problem, the code I used is:
if [[ `timeout 5s ls /nfs/machine |wc -l` -gt 0 ]] ; then
echo "can see machine"
else
echo "cannot see machine"
fi
I then reduced this to a single line command so that it could be run through ssh and put inside of a loop (to loop through hosts and execute this command).
Couple of possibilities:
1)
find /nfs/machine -maxdepth 0 -empty should be a lot faster than ls /nfs/machine, though I'm not sure that's the problem in this case (also not sure sleep is needed, but might be some offset.
if [[ $(find /nfs/machine -maxdepth 0 -empty 2> /dev/null) == "" ]]; then
sleep 1 && [[ $(mount) == *"/nfs/machine"* ]] && echo "mounted" || echo "not mounted"
else
echo "mounted"
fi
2)
timeout 10 ls -A /nfs/machine | wc -l
if [[ $? > 0 ]]; then
echo "mounted"
else
echo "not mounted"
fi

Bash: Create a file if it does not exist, otherwise check to see if it is writeable

I have a bash program that will write to an output file. This file may or may not exist, but the script must check permissions and fail early. I can't find an elegant way to make this happen. Here's what I have tried.
set +e
touch $file
set -e
if [ $? -ne 0 ]; then exit;fi
I keep set -e on for this script so it fails if there is ever an error on any line. Is there an easier way to do the above script?
Why complicate things?
file=exists_and_writeable
if [ ! -e "$file" ] ; then
touch "$file"
fi
if [ ! -w "$file" ] ; then
echo cannot write to $file
exit 1
fi
Or, more concisely,
( [ -e "$file" ] || touch "$file" ) && [ ! -w "$file" ] && echo cannot write to $file && exit 1
Rather than check $? on a different line, check the return value immediately like this:
touch file || exit
As long as your umask doesn't restrict the write bit from being set, you can just rely on the return value of touch
You can use -w to check if a file is writable (search for it in the bash man page).
if [[ ! -w $file ]]; then exit; fi
Why must the script fail early? By separating the writable test and the file open() you introduce a race condition. Instead, why not try to open (truncate/append) the file for writing, and deal with the error if it occurs? Something like:
$ echo foo > output.txt
$ if [ $? -ne 0 ]; then die("Couldn't echo foo")
As others mention, the "noclobber" option might be useful if you want to avoid overwriting existing files.
Open the file for writing. In the shell, this is done with an output redirection. You can redirect the shell's standard output by putting the redirection on the exec built-in with no argument.
set -e
exec >shell.out # exit if shell.out can't be opened
echo "This will appear in shell.out"
Make sure you haven't set the noclobber option (which is useful interactively but often unusable in scripts). Use > if you want to truncate the file if it exists, and >> if you want to append instead.
If you only want to test permissions, you can run : >foo.out to create the file (or truncate it if it exists).
If you only want some commands to write to the file, open it on some other descriptor, then redirect as needed.
set -e
exec 3>foo.out
echo "This will appear on the standard output"
echo >&3 "This will appear in foo.out"
echo "This will appear both on standard output and in foo.out" | tee /dev/fd/3
(/dev/fd is not supported everywhere; it's available at least on Linux, *BSD, Solaris and Cygwin.)

Resources