This code works very well:
mountpoint="/mnt/testnfs"
read -t1 < <(stat -t "$mountpoint" 2>&-)
if [ -z "$REPLY" ] ; then
echo "NFS mount stale. Removing..."
fi
But when I put it into a for loop:
declare -a nfs_array=( "/mnt/testnfs1" "/mnt/testnfs2/" )
for i in "${nfs_array[@]}"
do
read -t1 < <(stat -t "$nfs_array" 2>&-)
if [ -z "$REPLY" ] ; then
echo "NFS dead"
fi
done
The aim is to test all mount points, but this code only ever tests the first entry of nfs_array. If I swap testnfs1 with testnfs2, it tests the testnfs2 mount point and forgets testnfs1 :-(
In your loop it should be:
read -r -t1 < <(stat -t "$i" 2>&-)
As written, "$nfs_array" expands to the first element of the array only, so every iteration tests the same path and $i is never used.
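A minimal sketch of the corrected loop, wrapped in a function so it can be tried on any list of paths. The name check_mounts is made up for illustration, and stat -t is assumed to be the GNU coreutils terse format used in the question:

```shell
#!/usr/bin/env bash
# check_mounts is a hypothetical helper name, not part of the original script.
# It prints one line for every path whose stat produced no output within 1 second.
check_mounts() {
    local mp REPLY
    for mp in "$@"; do
        REPLY=""    # make sure a value from the previous iteration cannot leak through
        read -r -t1 < <(stat -t "$mp" 2>/dev/null)
        if [ -z "$REPLY" ]; then
            echo "NFS dead: $mp"
        fi
    done
}
```

With the array from the question it would be called as check_mounts "${nfs_array[@]}".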
If you really want to list all nfs mounts (the title tells so), then use either:
mount | grep ' type nfs' | ...
This may give false positives if a mount point or mounted path itself contains the text " type nfs".
If the /proc/ file system is available, this is a better way:
awk '$3 ~ /^nfs/ {print}' /proc/mounts | ...
I'm not sure what happens here if a mount point or mounted path contains a space; I never had this situation.
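On the question of spaces: Linux writes a space inside a /proc/mounts field as the octal escape \040, so the fields stay whitespace-separated. A rough sketch that lists NFS mount points and decodes that escape follows; list_nfs_mounts is a hypothetical name, and the optional file argument exists only so the function can be tried on sample data:

```shell
#!/usr/bin/env bash
# list_nfs_mounts is a hypothetical helper; it reads /proc/mounts by default.
list_nfs_mounts() {
    local mounts_file="${1:-/proc/mounts}" dev mp fstype rest
    while read -r dev mp fstype rest; do
        case "$fstype" in
            # matches nfs, nfs4, ...; %b turns \040 back into a space
            nfs*) printf '%b\n' "$mp" ;;
        esac
    done < "$mounts_file"
}
```

Each output line is one mount point, which can then be fed to whatever staleness check you prefer.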
Suppose I have an iSCSI device /dev/sdat, how do I know the IP address of its target?
The target driver is SCST, and the initiator is iSCSI. All I know is a device named /dev/sdat and nothing more. So how do I get the IP address of its target?
Well, I'm not proud of this, but it gets the job done. At least for some definitions of getting the job done.
The basic idea is this. You can get the target IQN from the output of lsscsi -t. (You'll need the lsscsi program if you don't already have it. I think you'll find it's essential in any kind of SCSI environment.)
# lsscsi -t
[2:0:0:0] disk iqn.2009-12.com.blockbridge:t-pjxfzufjkp-illoghjk,t,0x1 /dev/sda
[3:0:0:0] disk iqn.2009-12.com.blockbridge:t-pjxfzuecga-eajejghg,t,0x1 /dev/sdb
[4:0:0:0] disk iqn.2009-12.com.blockbridge:t-pjxfzufjjo-pokqaja,t,0x1 /dev/sdd
[5:0:0:0] disk iqn.2009-12.com.blockbridge:t-pjxfzufnfg-cqikkgl,t,0x1 /dev/sdc
Then, you can feed the target IQN into iscsiadm and grep around in the output for the target address.
# iscsiadm -m node -T iqn.2009-12.com.blockbridge:t-pjxfzufjkp-illoghjk | egrep 'node.conn.+address'
node.conn[0].address = 172.16.5.148
Putting it all together, you get a script like this. Of course, this is absent all kinds of error handling, and probably doesn't handle about 23 different cases. But, hey... It works in my environment!
#!/usr/bin/bash
if [[ -z $1 ]]; then
>&2 echo "Usage: devip.sh <device>"
exit 1
fi
iqn=$(sudo lsscsi -t | grep "$1" | grep iqn | awk '{print $3}' | awk -F , '{print $1}')
if [[ -z "$iqn" ]]; then
>&2 echo "IQN not found for \"$1\"."
exit 1
fi
sudo iscsiadm -m node -T "$iqn" | grep -E 'node.conn.+address' | awk -F ' *= *' '{print $2}'
exit $?
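One of the unhandled cases is that grep "$1" matches substrings, so /dev/sda would also match a /dev/sdat line. A possible tightening, sketched under the assumption that lsscsi -t keeps the column layout shown above (device node in the last field, IQN first in the comma-separated third field); iqn_for_device is a made-up name:

```shell
#!/usr/bin/env bash
# iqn_for_device is a hypothetical helper. It reads `lsscsi -t` style lines on stdin
# and prints the IQN of the line whose last field is exactly the given device.
iqn_for_device() {
    awk -v dev="$1" '$NF == dev { split($3, parts, ","); print parts[1] }'
}
```

It would be used as sudo lsscsi -t | iqn_for_device /dev/sdat, replacing the chain of grep and awk calls.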
EDIT: Working script below
I have used this site MANY times to get answers, but I am a little stumped with this.
I am tasked with writing a script, in bash, to log into roughly 2000 Unix servers (Solaris, AIX, Linux) and check the size of OS filesystems, most notable /var /usr /opt.
I have set some variables, which may be where I am going wrong right off the bat.
1.) First I am connecting to another server that has a list of all hosts in the infrastructure. Then I parse this data with some sed commands to get a list I can use properly.
2.) Then I do a ping test to see if the server is alive or has been decommissioned. The idea behind this is that if the server is not pingable, I don't want it reported on, or any attempt made to connect to it, as that is just wasting time. I feel I am doing this wrong, but don't know how to do it correctly (a recurring theme you will hear in this post lol).
If any FS is over 80% mark, then it should output to a text file with the servername, filesystem, size on one line <== very important for me
If the FS is under 80% full, then I don't want it in my output; it can be omitted completely.
I have created something that I will post below, and am hoping to get some help in figuring out where I am going wrong. I am very new to bash scripting, but have experience as a Unix admin (I have never been good at scripting).
Can anyone provide some direction and teach me where I am going wrong?
I will upload my script that I can confirm is working, hopefully tomorrow. Thanks everyone for your input on this!
Here is my "disk usage" Linux script; I hope it helps you.
#!/bin/sh
# NR > 1 skips the df header line, which would otherwise break the numeric test below.
df -H | awk 'NR > 1 { print $5 " " $6 }' | while read -r output;
do
echo "$output"
usep=$(echo "$output" | awk '{ print $1 }' | cut -d'%' -f1)
partition=$(echo "$output" | awk '{ print $2 }')
if [ "$usep" -ge 90 ]; then
echo "Running out of space \"$partition ($usep%)\" on $(hostname) as on $(date)" |
mail -s "Warning! There is no space on the disk: $usep%" root@domain.com
fi
done
Some trouble is here:
ping -c 1 -W 3 $i > /dev/null 2>&1
if [ $? -ne 0 ]; then
echo "$i is offline" >> $LOG
fi
You need a continue statement inside that if. Your program isn't really treating non-pingable hosts differently, just logging they're not pingable.
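A small sketch of what that looks like inside the host loop. The function names is_alive and check_hosts are invented for illustration, and the "checking" line stands in for the real ssh work:

```shell
#!/usr/bin/env bash
# is_alive and check_hosts are hypothetical names used only for this sketch.
is_alive() {
    ping -c 1 -W 3 "$1" > /dev/null 2>&1
}

check_hosts() {
    local host
    for host in "$@"; do
        if ! is_alive "$host"; then
            echo "$host is offline"
            continue    # skip the rest of the loop body for this host
        fi
        echo "checking $host"    # placeholder for the ssh/df work
    done
}
```

Without the continue, the "checking" step would still run for hosts that failed the ping.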
Okay, now I'm looking a little deeper, and there's more naive stuff in here. These shouldn't work:
SOLVARFS=$(df -h /var |cut -f5 |grep -v capacity |awk '{print $5}')
SOLUSRFS=$(df -h /usr |cut -f5 |grep -v capacity |awk '{print $5}')
SOLOPTFS=$(df -h /opt |cut -f5 |grep -v capacity |awk '{print $5}')
etc...
The problem with these lines is, the command substitution gets assigned to the variables before the ssh session happens. So the content of each variable is the command's result on your local system, not the command itself. Since you're doing command substitution around your ssh calls, it might well work just to rewrite these lines as (note the backslash escapes on $5):
SOLVARFS="df -h /var |cut -f5 |grep -v capacity |awk '{print \$5}'"
SOLUSRFS="df -h /usr |cut -f5 |grep -v capacity |awk '{print \$5}'"
SOLOPTFS="df -h /opt |cut -f5 |grep -v capacity |awk '{print \$5}'"
etc...
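To see the difference in isolation, here is a small sketch that uses sh -c as a local stand-in for the ssh call (run_remote and the sample text are assumptions made for the demo, not part of the original script):

```shell
#!/usr/bin/env bash
# Command substitution: the pipeline runs right here, right now,
# and only its output is stored.
local_now=$(printf '%s\n' 'capacity 42%' | awk '{print $2}')

# Plain string: nothing runs yet. The \$2 keeps $2 literal for awk.
remote_cmd="printf '%s\n' 'capacity 42%' | awk '{print \$2}'"

# run_remote is a hypothetical stand-in for: ssh "$host" "$1"
run_remote() {
    sh -c "$1"
}
```

Calling run_remote "$remote_cmd" executes the stored text at that point, which is what happens on the far side of an ssh session.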
The part where you're contacting another server has some more stuff to correct. You don't need three if statements per server, and there's no reason to echo anything to /dev/null. Here's a rewrite for the SunOS section. For each directory you're checking, it outputs the host name, the command name (so you can see which dir was being checked), and the result:
if [[ $UNAME = "SunOS" ]]; then
for SSH_COMMAND in SOLVARFS SOLUSRFS SOLOPTFS ; do
RESULT=`ssh -o PasswordAuthentication=no -o BatchMode=yes -o StrictHostKeyChecking=no -o ConnectTimeout=2 -o GSSAPIAuthentication=no -q $i ${!SSH_COMMAND}`
if [ "${RESULT%\%}" -gt 80 ] ; then
echo "$i, $SSH_COMMAND, $RESULT" >> $LOG
fi
done
fi
Note that the ${!BLAH} construction is variable indirection. "Give me the contents of the variable named by BLAH".
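A tiny demonstration of the indirection, using two of the variable names from above with shortened, made-up values (show_commands is a hypothetical helper):

```shell
#!/usr/bin/env bash
SOLVARFS="df -h /var"
SOLUSRFS="df -h /usr"

# show_commands is a hypothetical helper for this sketch.
show_commands() {
    local name
    for name in "$@"; do
        # ${!name} expands to the value of the variable whose name is stored in $name.
        echo "$name -> ${!name}"
    done
}
```

Calling show_commands SOLVARFS SOLUSRFS prints each variable name next to the command text it holds.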
Your original script does a bunch of things less-than-optimally. Rather than running an almost-identical block of code for each filesystem and each operating system, the thing to do would be to record the differences in a way that a SINGLE piece of code can iterate over all your objects, adapting as required.
Here's my take on this. Commands should appear ONCE, but
they get run multiple times by loops, and
they get run multiple ways using arrays.
The following script passes lint checks, but obviously this is untested, as I don't have your environment to test in.
You might still want to think about how your logging and notifications work.
#!/bin/bash
# Assign temp file, remove it automatically upon successful exit.
tmpfile=$(mktemp /tmp/${0##*/}.XXXX)
trap "rm '$tmpfile'" 0
#NOW=$(date +"%Y-%m-%d-%T")
NOW=$(date +"%F")
LOG=/usr/scripts/disk_usage/Unix_df_issues-$NOW.txt
printf '' > "$LOG"
# Use variables to refer to commonly accessed files. If you change a name, just do it once.
rawhostlist=all_vms.txt
host_os=${rawhostlist}_OS
# Commonly-used options need only be declared once. Use an array for easier management.
declare -a ssh_opts=()
ssh_opts+=(-o PasswordAuthentication=no)
ssh_opts+=(-o BatchMode=yes)
ssh_opts+=(-o StrictHostKeyChecking=no) # Eliminate prompts on new hosts
ssh_opts+=(-o ConnectTimeout=2) # This should make your `ping` unnecessary.
ssh_opts+=(-o GSSAPIAuthentication=no) # This is default. Do we really need it?
# Note: Associative arrays require Bash 4.x.
declare -A df_opts=(
[SunOS]="-h"
[Linux]="-hP"
[AIX]=""
)
declare -A df_column=(
[SunOS]=5
[Linux]=5
[AIX]=4
)
# Fetch host list from configserver, stripping /^adm/ on the remote end.
ssh "${ssh_opts[@]}" -q configserver "sed 's/^adm//' /reports/*/HOSTNAME" > "$rawhostlist"
# Confirm that our host_os cache is up to date and process any missing hosts.
awk '
NR==FNR { h[$1]; next } # Add everything in rawhostlist to an array...
{ delete h[$1] } # Then remove any entries that exist in host_os.
END {
for (i in h) print i # And print whatever remains.
}' "$rawhostlist" "$host_os" |
# -n stops ssh from reading the loop's stdin, which would swallow the rest of the host list.
while read h; do
printf '%s\t%s\n' "$h" "$(ssh -n "${ssh_opts[@]}" -q "$h" uname -s)"
done >> "$host_os"
# Next, step through the host list and collect data.
while read host os; do
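# -n again keeps ssh off the loop's stdin; sub() strips the % so the later -gt tests see a number.
ssh -n "${ssh_opts[@]}" "$host" df "${df_opts[$os]}" /var /usr /opt |
awk -v column="${df_column[$os]}" -v host="$host" 'NR>1 { sub(/%/, "", $column); print host,$1,$column }'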
done < "$host_os" > "$tmpfile"
# Now that we have all our data, check for warning/critical levels.
while read host filesystem usage; do
if [ "$usage" -gt 80 ]; then
status="CRITICAL"
elif [ "$usage" -gt 70 ]; then
status="WARNING"
else
continue
fi
# Log our results to our log file, AND send them to stderr.
printf "[%s] %s: %s:%s at %d%%\n" "$(date +"%F %T")" "$status" "$host" "$filesystem" "$usage" | tee -a "$LOG" >&2
done < "$tmpfile"
# Email and record our results.
if [ -s "$LOG" ]; then
mail -s "Daily Unix /var Report - $NOW" unixsystems@examplle.com < "$LOG"
mv "$LOG" /var/log/vm_reports/
fi
Consider this example code. If you like the way it looks, your next task is to debug it, or open new questions for parts that you're having trouble debugging. :-)
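For the "servername, filesystem, size on one line" requirement specifically, the filtering step can be tried on its own. This is only a sketch: report_full is a made-up name, and it assumes POSIX-format df -P output on stdin (capacity in column 5, mount point in column 6):

```shell
#!/usr/bin/env bash
# report_full is a hypothetical helper: report_full <host> [limit]
# Reads `df -P` output on stdin and prints "host mountpoint pct%" for
# every filesystem above the limit (default 80).
report_full() {
    local host=$1 limit=${2:-80}
    awk -v host="$host" -v limit="$limit" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 > limit + 0) print host, $6, pct "%"
        }'
}
```

It could be fed with something like ssh -n "$host" df -P /var /usr /opt | report_full "$host" 80, with the output appended to the report file.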
So I am writing a bash script which will run through all of the process ids in /proc/[pid] and read the executable that was used to run each one.
From what I have looked at, the /proc filesystem contains the /proc/[pid]/exe symbolic link. Within the bash script I am trying to work out how to read the value of "readlink /proc/[pid]/exe" and check whether (deleted) or nothing is returned, to find out whether the original executable still exists on disk.
Is there a way of doing this? So far I have:
#!/bin/bash
pid = "0"
while [ $pid -lt 32769 ]
do
if [-d /proc/$pid]; then
if [-f /proc/$pid/exe]; then
echo $pid
readlink /proc/$pid/exe
fi
fi
pid = $[$pid+1]
done
This fails to work and always returns nothing. I am trying to list all of the processes that no longer have their executables available on disk.
Will this work for you?
#!/bin/bash
for i in $(ls /proc | awk '/^[[:digit:]]+/{print $1}'); do
if [ -h /proc/$i/exe ]; then
echo -n "$i: "
if readlink /proc/$i/exe >/dev/null 2>&1 ; then
echo "executable exists"
else
echo "executable not found"
fi
fi
done
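I've updated your script to make it work. Notice that -f tests whether a name resolves to a regular file (it follows symbolic links), whereas -h tests for the symbolic link itself, which is what /proc/$pid/exe is. The spaces inside [ ] and the removal of the spaces around = are also required: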
pid="0"
while [ $pid -lt 32769 ]
do
if [ -d /proc/$pid ]; then
if [ -h /proc/$pid/exe ]; then
echo $pid
readlink /proc/$pid/exe
fi
fi
pid=$[$pid+1]
done
You can read the exit status of any shell command from the $? variable:
readlink somelink
echo $?
If the link is invalid, $? will be greater than 0.
However, if the link exists but the actual file has been deleted, you can use something like:
ls `readlink somelink`
readlink -f `ls --dereference /proc/$pid/exe`
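Another way to approach the original goal, sketched here rather than tested on every kernel: on Linux, the target of /proc/[pid]/exe has " (deleted)" appended when the executable has been unlinked, so the text returned by readlink can be checked directly. The function names below are made up for illustration:

```shell
#!/usr/bin/env bash
# exe_is_deleted and list_deleted_exes are hypothetical helper names.

# Succeeds if a readlink result carries the " (deleted)" suffix.
exe_is_deleted() {
    case "$1" in
        *" (deleted)") return 0 ;;
        *) return 1 ;;
    esac
}

# Walks the numeric entries in /proc and prints "pid: target" for each deleted executable.
list_deleted_exes() {
    local dir target
    for dir in /proc/[0-9]*; do
        # readlink fails for kernel threads and for processes we may not inspect
        target=$(readlink "$dir/exe" 2>/dev/null) || continue
        if exe_is_deleted "$target"; then
            echo "${dir#/proc/}: $target"
        fi
    done
}
```

Globbing /proc/[0-9]* also avoids counting through every possible pid. Run list_deleted_exes as root to see processes owned by other users.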
On an NFS client, how can a Bash script detect that a mount point is no longer available, or that the server end is dead?
Normally I do:
if ls '/var/data' 2>&1 | grep 'Stale file handle';
then
echo "failing";
else
echo "ok";
fi
But the problem is that when the NFS server is completely dead or stopped, even the ls command on that directory hangs on the client side, which means the script above is no longer usable.
Is there any other way to detect this?
"stat" command is a somewhat cleaner way:
statresult=`stat /my/mountpoint 2>&1 | grep -i "stale"`
if [ "${statresult}" != "" ]; then
#result not empty: mountpoint is stale; remove it
umount -f /my/mountpoint
fi
Additionally, you can use rpcinfo to detect whether the remote nfs share is available:
rpcinfo -t remote.system.net nfs > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo Remote NFS share available.
fi
Added 2013-07-15T14:31:18-05:00:
I looked into this further as I am also working on a script that needs to recognize stale mountpoints. Inspired by one of the replies to "Is there a good way to detect a stale NFS mount", I think the following may be the most reliable way to check for staleness of a specific mountpoint in bash:
read -t1 < <(stat -t "/my/mountpoint")
if [ $? -eq 1 ]; then
echo NFS mount stale. Removing...
umount -f -l /my/mountpoint
fi
The "read -t1" construct reliably times out the subshell if the stat command hangs for some reason.
Added 2013-07-17T12:03:23-05:00:
Although read -t1 < <(stat -t "/my/mountpoint") works, there doesn't seem to be a way to mute its error output when the mountpoint is stale. Adding > /dev/null 2>&1 either within the subshell or at the end of the command line breaks it. Using a simple test: if [ -d /path/to/mountpoint ] ; then ... fi also works, and may be preferable in scripts. After much testing it is what I ended up using.
Added 2013-07-19T13:51:27-05:00:
A reply to my question "How can I use read timeouts with stat?" provided additional detail about muting the output of stat (or rpcinfo) when the target is not available and the command hangs for a few minutes before it would time out on its own. While [ -d /some/mountpoint ] can be used to detect a stale mountpoint, there is no similar alternative for rpcinfo, and hence use of read -t1 redirection is the best option. The output from the subshell can be muted with 2>&-. Here is an example from CodeMonkey's response:
mountpoint="/my/mountpoint"
read -t1 < <(stat -t "$mountpoint" 2>&-)
if [[ -n "$REPLY" ]]; then
echo "NFS mount stale. Removing..."
umount -f -l "$mountpoint"
fi
Perhaps now this question is fully answered :).
The final answers given by Ville and CodeMonkey are almost correct. I'm not sure how no one noticed this, but a $REPLY string having content is a success, not a failure. Thus, an empty $REPLY string means the mount is stale, and the conditional should use -z, not -n:
mountpoint="/my/mountpoint"
read -t1 < <(stat -t "$mountpoint" 2>&-)
if [ -z "$REPLY" ] ; then
echo "NFS mount stale. Removing..."
umount -f -l "$mountpoint"
fi
I have run this multiple times with a valid and an invalid mount point and it works. The -n check gave me reversed results, echoing that the mount was stale when it was absolutely valid.
Also, the double bracket isn't necessary for a simple string check.
Building off the answers here, I found some issues in testing that would output bad data due to how the $REPLY var would get updated (or not, if the result was empty), and the inconsistency of the stat command as provided in the answers.
This uses the stat command to check the FS type which responds to changes pretty fast or instant, and checks the contents of $REPLY to make sure the fs is NFS [ ref: https://unix.stackexchange.com/questions/20523/how-to-determine-what-filesystem-a-directory-exists-on ]
read -t1 < <(timeout 1 stat -f -c %T "/mnt/nfsshare/" 2>&-);if [[ ! "${REPLY}" =~ "nfs" ]];then echo "NFS mount NOT WORKING...";fi
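The same filesystem-type check can be written as a function with a return code, which is easier to reuse in a larger script. This is a sketch only: nfs_mount_ok is a made-up name, it drops the read -t1 wrapper, and whether timeout can actually interrupt a hung stat may depend on how the share is mounted, so treat it as a starting point:

```shell
#!/usr/bin/env bash
# nfs_mount_ok is a hypothetical helper. It returns 0 only if stat reports an
# nfs* filesystem type for the path within one second.
nfs_mount_ok() {
    local fstype
    fstype=$(timeout 1 stat -f -c %T "$1" 2>/dev/null) || return 1
    case "$fstype" in
        nfs*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: nfs_mount_ok /mnt/nfsshare/ || echo "NFS mount NOT WORKING...".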
I am writing a small script to clear space on my Linux machine every day via cron if the cache directory grows too large.
Since I am really green at bash scripting, I will need a little bit of help from you Linux gurus out there.
Here is basically the logic (pseudo-code)
if ( Drive Space Left < 5GB )
{
change directory to '/home/user/lotsa_cache_files/'
if ( current working directory = '/home/user/lotsa_cache_files/')
{
delete files in /home/user/lotsa_cache_files/
}
}
Getting drive space left
I plan to get the drive space left from the 'df /dev/sda5' command.
It returns the following value, for your info:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda5 225981844 202987200 11330252 95% /
So a little regex might be necessary to get the '11330252' out of the returned value
A little paranoia
The 'if ( current working directory = /home/user/lotsa_cache_files/)' part is just a defensive mechanism for the paranoia within me. I wanna make sure that I am indeed in '/home/user/lotsa_cache_files/' before I proceed with the delete command which is potentially destructive if the current working directory is not present for some reason.
Deleting files
The deletion of files will be done with the command below instead of the usual rm -f:
find . -name "*" -print | xargs rm
This is due to the inherent inability of linux systems to 'rm' a directory if it contains too many files, as I have learnt in the past.
Just another proposal (comments within code):
FILESYSTEM=/dev/sda1 # or whatever filesystem to monitor
CAPACITY=95 # delete if FS is over 95% of usage
CACHEDIR=/home/user/lotsa_cache_files/
# Proceed if filesystem capacity is over the value of CAPACITY (using df POSIX syntax)
# using [ instead of [[ for better error handling.
if [ $(df -P $FILESYSTEM | awk '{ gsub("%",""); capacity = $5 }; END { print capacity }') -gt $CAPACITY ]
then
# let's do some secure removal (if $CACHEDIR is empty or is not a directory, find will exit
# with an error, which is quite safe for misruns):
find "$CACHEDIR" -maxdepth 1 -type f -exec rm -f {} \;
# use this instead (no "-maxdepth 1 -type f") if you want a recursive removal of files and dirs:
# find "$CACHEDIR" -mindepth 1 -delete
fi
Call the script from crontab to do scheduled cleanings.
I would do it this way:
# get the available space left on the device
size=$(df -k /dev/sda5 | tail -1 | awk '{print $4}')
# check if the available space is smaller than 5GB (5000000kB)
if (($size<5000000)); then
# find all files under /home/user/lotsa_cache_files and delete them
find /home/user/lotsa_cache_files -type f -delete
fi
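The same idea with the "paranoia" check from the question built in, as a rough sketch. The function names are invented, and -mindepth/-delete assume GNU or BSD find:

```shell
#!/usr/bin/env bash
# available_kb and purge_cache_if_low are hypothetical helper names.

# Prints the available space, in 1K blocks, of the filesystem holding the path.
available_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# purge_cache_if_low <dir> <min_kb>: delete regular files under <dir>
# only when less than <min_kb> kilobytes are available.
purge_cache_if_low() {
    local dir=$1 min_kb=$2 avail
    [ -d "$dir" ] || return 1    # refuse to run on a missing or empty path
    avail=$(available_kb "$dir") || return 1
    if [ "$avail" -lt "$min_kb" ]; then
        # The path is passed explicitly, so nothing depends on the current directory.
        find "$dir" -mindepth 1 -type f -delete
    fi
}
```

A cron job could then call purge_cache_if_low /home/user/lotsa_cache_files 5000000.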
Here's the script I use to delete old files in a directory to free up space...
#!/bin/bash
#
# prune_dir - prune directory by deleting files if we are low on space
#
DIR=$1
CAPACITY_LIMIT=$2
if [ "$DIR" == "" ]
then
echo "ERROR: directory not specified"
exit 1
fi
if ! cd "$DIR"
then
echo "ERROR: unable to chdir to directory '$DIR'"
exit 2
fi
if [ "$CAPACITY_LIMIT" == "" ]
then
CAPACITY_LIMIT=95 # default limit
fi
CAPACITY=$(df -k . | awk '{gsub("%",""); capacity=$5}; END {print capacity}')
if [ $CAPACITY -gt $CAPACITY_LIMIT ]
then
#
# Get list of files, oldest first.
# Delete the oldest files until
# we are below the limit. Just
# delete regular files, ignore directories.
#
ls -rt | while IFS= read -r FILE
do
if [ -f "$FILE" ]
then
if rm -f -- "$FILE"
then
echo "Deleted $FILE"
CAPACITY=$(df -k . | awk '{gsub("%",""); capacity=$5}; END {print capacity}')
if [ $CAPACITY -le $CAPACITY_LIMIT ]
then
# we're below the limit, so stop deleting
exit
fi
fi
fi
done
fi
To get the usage of a filesystem, I use this:
df -k $FILESYSTEM | tail -1 | awk '{print $5}'
That gives me the usage percentage of the filesystem, so I don't need to compute it :)
If you use bash, you can use pushd/popd to change directory and be sure you are actually in it.
pushd '/home/user/lotsa_cache_files/'
do the stuff
popd
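pushd returns a non-zero status when the directory change fails, so "being sure" comes down to checking that status before doing anything destructive. A sketch, with in_dir as a made-up helper name:

```shell
#!/usr/bin/env bash
# in_dir is a hypothetical helper: in_dir <dir> <command> [args...]
# Runs the command inside <dir> and always returns to the previous directory.
in_dir() {
    local dir=$1
    shift
    pushd "$dir" > /dev/null || return 1    # bail out if we could not get in
    "$@"
    local status=$?
    popd > /dev/null
    return "$status"
}
```

For example, in_dir /home/user/lotsa_cache_files find . -type f -delete only runs find if the directory change succeeded.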
Here's what I do:
while IFS= read -r f; do rm -rf -- "$f"; done < movies-to-delete.txt