EDIT: Working script below
I have used this site MANY times to get answers, but I am a little stumped with this.
I am tasked with writing a script, in bash, to log into roughly 2000 Unix servers (Solaris, AIX, Linux) and check the size of OS filesystems, most notably /var, /usr and /opt.
I have set some variables, which may be where I am going wrong right off the bat.
1.) First I am connecting to another server that has a list of all hosts in the infrastructure. Then I parse this data with some sed commands to get a list I can use properly.
2.) Then I do a ping test to see if the server is alive or has been decommissioned. The idea is that if a server is not pingable, I don't want it reported on or any attempt made to connect to it, as that just wastes time. I feel I am doing this wrong, but don't know how to do it correctly (a recurring theme you will hear in this post lol).
If any FS is over the 80% mark, then it should output to a text file with the servername, filesystem, size on one line <== very important for me
If the FS is under 80% full, then I don't want it in my output; it can be omitted completely.
I have created something that I will post below, and am hoping to get some help in figuring out where I am going wrong. I am very new to bash scripting, but have experience as a Unix admin (I have never been good at scripting).
Can anyone provide some direction and teach me where I am going wrong?
I will upload my script that I can confirm is working, hopefully tomorrow. Thanks everyone for your input on this!
Here is my "disk usage" Linux script; I hope it helps you.
#!/bin/sh
# NR>1 skips the df header line, which would otherwise break the numeric test below.
df -H | awk 'NR>1 { print $5 " " $6 }' | while read output;
do
echo $output
usep=$(echo $output | awk '{ print $1}' | cut -d'%' -f1 )
partition=$(echo $output | awk '{ print $2 }' )
if [ $usep -ge 90 ]; then
echo "Running out of space \"$partition ($usep%)\" on $(hostname) as on $(date)" |
mail -s "Warning! There is no space on the disk: $usep%" root@domain.com
fi
done
Some trouble is here:
ping -c 1 -W 3 $i > /dev/null 2>&1
if [ $? -ne 0 ]; then
echo "$i is offline" >> $LOG
fi
You need a continue statement inside that if. Your program isn't really treating non-pingable hosts differently, just logging they're not pingable.
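To make the point about continue concrete, here is a minimal sketch of the loop shape I mean. The function name check_hosts and the PING_CMD variable are my own inventions for illustration (PING_CMD only exists so the probe can be swapped out when trying this without a network); adapt them to your script.

```shell
#!/bin/bash
# PING_CMD is a hypothetical knob so the probe can be swapped out when testing.
PING_CMD=${PING_CMD:-ping -c 1 -W 3}

check_hosts() {
    local host
    for host in "$@"; do
        # $PING_CMD is deliberately unquoted so its options split into words.
        if ! $PING_CMD "$host" > /dev/null 2>&1; then
            echo "$host is offline"
            continue    # jump straight to the next host
        fi
        echo "$host is alive, the df checks would run here"
    done
}
```

Without the continue, the second echo (standing in for your ssh/df work) would still run for hosts that failed the ping.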
Okay, now I'm looking a little deeper, and there's more naive stuff in here. These shouldn't work:
SOLVARFS=$(df -h /var |cut -f5 |grep -v capacity |awk '{print $5}')
SOLUSRFS=$(df -h /usr |cut -f5 |grep -v capacity |awk '{print $5}')
SOLOPTFS=$(df -h /opt |cut -f5 |grep -v capacity |awk '{print $5}')
etc...
The problem with these lines is, the command substitution gets assigned to the variables before the ssh session happens. So the content of each variable is the command's result on your local system, not the command itself. Since you're doing command substitution around your ssh calls, it might well work just to rewrite these lines as (note the backslash escapes on $5):
SOLVARFS="df -h /var |cut -f5 |grep -v capacity |awk '{print \$5}'"
SOLUSRFS="df -h /usr |cut -f5 |grep -v capacity |awk '{print \$5}'"
SOLOPTFS="df -h /opt |cut -f5 |grep -v capacity |awk '{print \$5}'"
etc...
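If the difference between the two forms is not obvious, this small sketch may help. It uses bash -c as a stand-in for the ssh session (that substitution is my assumption, purely so it can run on one machine), and the marker variable and run_remote function are made up for the demo.

```shell
#!/bin/bash
marker=local

# Command substitution runs right now, in this shell: the variable holds OUTPUT.
now=$(echo "marker is $marker")

# Single quotes store the text of the command: nothing runs yet.
later='echo "marker is $marker"'

# bash -c stands in for "ssh host"; the child shell sees marker=remote.
run_remote() {
    marker=remote bash -c "$1"
}

echo "$now"             # evaluated locally, before any "remote" session existed
run_remote "$later"     # evaluated by the child shell
```

The first echo reports the local value, the second reports whatever the other side sees, which is the behaviour you want from the df commands.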
The part where you're contacting another server has some more stuff to correct. You don't need three if statements per server, and there's no reason to echo anything to /dev/null. Here's a rewrite for the SunOS section. For each directory you're checking, it outputs the host name, the command name (so you can see which dir was being checked), and the result:
if [[ $UNAME = "SunOS" ]]; then
for SSH_COMMAND in SOLVARFS SOLUSRFS SOLOPTFS ; do
RESULT=`ssh -o PasswordAuthentication=no -o BatchMode=yes -o StrictHostKeyChecking=no -o ConnectTimeout=2 -o GSSAPIAuthentication=no -q $i ${!SSH_COMMAND}`
if [ "${RESULT%\%}" -gt 80 ] ; then
echo "$i, $SSH_COMMAND, $RESULT" >> $LOG
fi
done
fi
Note that the ${!BLAH} construction is variable indirection. "Give me the contents of the variable named by BLAH".
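A tiny standalone illustration of that indirection, in case it is new to you. The two variables mirror the ones above; the loop variable called name is just my choice.

```shell
#!/bin/bash
SOLVARFS="df -h /var"
SOLUSRFS="df -h /usr"

for name in SOLVARFS SOLUSRFS; do
    # ${!name} expands to the value of the variable whose name is stored in $name.
    echo "$name holds: ${!name}"
done
```

This is a bash feature, so it will not work under a plain POSIX sh.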
Your original script does a bunch of things less-than-optimally. Rather than running an almost-identical block of code for each filesystem and each operating system, the thing to do would be to record the differences in a way that a SINGLE piece of code can iterate over all your objects, adapting as required.
Here's my take on this. Commands should appear ONCE, but
they get run multiple times by loops, and
they get run multiple ways using arrays.
The following script passes lint checks, but obviously this is untested, as I don't have your environment to test in.
You might still want to think about how your logging and notifications work.
#!/bin/bash
# Assign temp file, remove it automatically upon successful exit.
tmpfile=$(mktemp /tmp/${0##*/}.XXXX)
trap "rm '$tmpfile'" 0
#NOW=$(date +"%Y-%m-%d-%T")
NOW=$(date +"%F")
LOG=/usr/scripts/disk_usage/Unix_df_issues-$NOW.txt
printf '' > "$LOG"
# Use variables to refer to commonly accessed files. If you change a name, just do it once.
rawhostlist=all_vms.txt
host_os=${rawhostlist}_OS
# Commonly-used options need only be declared once. Use an array for easier management.
declare -a ssh_opts=()
ssh_opts+=(-o PasswordAuthentication=no)
ssh_opts+=(-o BatchMode=yes)
ssh_opts+=(-o StrictHostKeyChecking=no) # Eliminate prompts on new hosts
ssh_opts+=(-o ConnectTimeout=2) # This should make your `ping` unnecessary.
ssh_opts+=(-o GSSAPIAuthentication=no) # This is default. Do we really need it?
# Note: Associative arrays require Bash 4.x.
declare -A df_opts=(
[SunOS]="-h"
[Linux]="-hP"
[AIX]=""
)
declare -A df_column=(
[SunOS]=5
[Linux]=5
[AIX]=4
)
# Fetch host list from configserver, stripping /^adm/ on the remote end.
ssh "${ssh_opts[@]}" -q configserver "sed 's/^adm//' /reports/*/HOSTNAME" > "$rawhostlist"
# Confirm that our host_os cache is up to date and process any missing hosts.
awk '
NR==FNR { h[$1]; next } # Add everything in rawhostlist to an array...
{ delete h[$1] } # Then remove any entries that exist in host_os.
END {
for (i in h) print i # And print whatever remains.
}' "$rawhostlist" "$host_os" |
while read h; do
# -n stops ssh from reading stdin, which would otherwise swallow the rest of the host list.
printf '%s\t%s\n' "$h" "$(ssh -n "${ssh_opts[@]}" -q "$h" uname -s)"
done >> "$host_os"
# Next, step through the host list and collect data.
while read host os; do
# -n keeps ssh off the loop's stdin; +0 in awk turns "85%" into 85 for the numeric tests below.
ssh -n "${ssh_opts[@]}" "$host" df "${df_opts[$os]}" /var /usr /opt |
awk -v column="${df_column[$os]}" -v host="$host" 'NR>1 { print host,$1,$column+0 }'
done < "$host_os" > "$tmpfile"
# Now that we have all our data, check for warning/critical levels.
while read host filesystem usage; do
if [ "$usage" -gt 80 ]; then
status="CRITICAL"
elif [ "$usage" -gt 70 ]; then
status="WARNING"
else
continue
fi
# Log our results to our log file, AND send them to stderr.
printf "[%s] %s: %s:%s at %d%%\n" "$(date +"%F %T")" "$status" "$host" "$filesystem" "$usage" | tee -a "$LOG" >&2
done < "$tmpfile"
# Email and record our results.
if [ -s "$LOG" ]; then
mail -s "Daily Unix /var Report - $NOW" unixsystems@examplle.com < "$LOG"
mv "$LOG" /var/log/vm_reports/
fi
Consider this example code. If you like the way it looks, your next task is to debug it, or open new questions for parts that you're having trouble debugging. :-)
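If you want to test the parsing and threshold logic without touching any servers, something like the following sketch can be fed canned df output. The function name report_usage, the host name web01 and the sample figures are all invented for the demo, and printing the mount point via $NF assumes a POSIX-style layout where the mount point is the last field.

```shell
#!/bin/bash
# report_usage HOST COLUMN: read df output on stdin and print
# "STATUS host mountpoint percent" for anything above the warning level.
report_usage() {
    awk -v host="$1" -v col="$2" -v warn=70 -v crit=80 '
        NR > 1 {
            pct = $col + 0                      # "85%" becomes 85
            if (pct > crit)      print "CRITICAL", host, $NF, pct
            else if (pct > warn) print "WARNING", host, $NF, pct
        }'
}

# Canned sample in Linux "df -hP" layout (made-up numbers).
printf '%s\n' \
    'Filesystem Size Used Avail Use% Mounted on' \
    '/dev/sda1 20G 17G 3G 85% /var' \
    '/dev/sda2 20G 5G 15G 25% /usr' \
    '/dev/sda3 20G 15G 5G 75% /opt' |
    report_usage web01 5
```

Filesystems under the warning level produce no output at all, which matches the "omit it completely" requirement.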
I am working on a bash script that uses pssh to run external commands, then join the output of the commands with the IP of each server. pssh has an option -o that writes a file for each server into a specified directory, but if the commands do not run, you just have an empty file. What I am having issues with is updating these empty files with something like "Server Unreachable" so that I know there was a connection issue reaching the server and to not cause problems with the rest of the script.
Here is what I have so far:
#!/bin/bash
file="/home/user/tools/test-host"
now=$(date +"%F")
folder="./cnxhwinfo-$now/"
empty="$(find ./cnxhwinfo-$now/ -maxdepth 1 -type f -name '*' -size 0 -printf '%f%2d')"
command="echo \$(uptime | awk -F'( |,|:)+' '{d=h=m=0; if (\$7==\"min\") m=\$6; else {if (\$7~/^day/) {d=\$6;h=\$8;m=\$9} else {h=\$6;m=\$7}}} {print d+0,\"days\",h+0,\"hours\",m+0,\"minutes\"}'), \$(hostname | awk '{print \$1}'), \$(sudo awk -F '=' 'FNR == 2 {print \$2}' /etc/connex-release/version.txt), \$(lscpu | awk -F: 'BEGIN{ORS=\", \";} NR==4 || NR==6 || NR==15 {print \$2}' | sed 's/ *//g') \$(free -k | awk '/Mem:/{print \$2}'), \$(df -Ph | awk '/var_lib/||/root/ {print \$2,\",\"\$5,\",\"}')"
pssh -h $file -l user -t 10 -i -o /home/user/tools/cnxhwinfo-$now -x -tt $command
echo "Server Unreachable" | tee "./cnxhwinfo-$now/$empty"
ls ./cnxhwinfo-$now >> ./cnx-data-$now
cat ./cnxhwinfo-$now/* >> ./cnx-list-$now
paste -d, ./cnx-data-$now ./cnx-list-$now >>./cnx-data-"$(date +"%F").csv"
I was trying to use find to locate the empty files and write "Server Unreachable" to them using tee with this:
echo "Server Unreachable" | tee "./cnxhwinfo-$now/$empty"
If the folder specified doesn't already exist, I get this error:
tee: ./cnxhwinfo-2019-09-03/: Is a directory
And if it does exist (i.e., I run the script again), it instead creates a single file named after the IP addresses returned by the find command, like this:
192.168.1.2 192.168.1.3 192.168.1.4 1
I've also tried:
echo "Server Unreachable" | tee <(./cnxhwinfo-$now/$empty)
The find command outputs the IP addresses on a single line with a space in between each one, so I thought that would be fine for tee to use, but I feel like I am either running into syntax issues, or am going about this the wrong way. I have another version of this same script that uses regular ssh and works great, just much slower than using pssh.
empty should be an array, assuming none of the file names will contain any whitespace in their names.
readarray -t empty < <(find ...)
echo "Server unreachable" | (cd ./cnxhwinfo-$now/; tee "${empty[@]}" > /dev/null)
Otherwise, you are building a single file name by concatenating the empty file names.
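Put together, the array approach might look like this sketch. The function name mark_empty_files is hypothetical, the find options assume GNU find (for -printf), and readarray needs Bash 4 or later. It has to run after pssh has written its output files, not before.

```shell
#!/bin/bash
# mark_empty_files DIR: write a placeholder into every zero-length file in DIR.
mark_empty_files() {
    local dir=$1
    local -a empty
    # One file name per line, so readarray gets one array element per file.
    readarray -t empty < <(find "$dir" -maxdepth 1 -type f -size 0 -printf '%f\n')
    if [ "${#empty[@]}" -gt 0 ]; then
        # tee accepts several file names and writes the same text to each.
        ( cd "$dir" && echo "Server Unreachable" | tee "${empty[@]}" > /dev/null )
    fi
}
```

Usage would be along the lines of mark_empty_files "./cnxhwinfo-$now" placed right after the pssh line.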
I am building a new CentOS 6.4 server.
I was wondering if there is a way I can receive a warning email when the use of any partition exceeds 80% in the server.
EDIT:
As Aaron Digulla pointed out, this question is better suited for Server Fault.
Please view or answer this question in the following post in Server Fault.
https://serverfault.com/questions/570647/linux-how-to-receive-warning-email-from-a-server-when-not-much-hard-drive-space
EDIT:
Server Fault put my post on hold. I guess I have no choice but continue this post here.
As Sayajin suggested, the following script can do the trick.
usage=$(df | awk '{print $1,$5}' | tail -n +2 | tr -d '%');
echo "$usage" | while read FS PERCENT; do [ "$PERCENT" -ge "80" ] && echo "$FS has used ${PERCENT}% Disk Space"; done
This is exactly what I want to do. However for my case, the df output looks something like this:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup-LogVol01
197836036 5765212 182021288 4% /
As you can see, the filesystem name and Use% are not on the same line, so $1 and $5 are not the values I want. Any idea how to fix this?
Thanks.
EDIT:
The trick is
df -P
I also found shell script example in the following link doing exactly the same thing:
http://bash.cyberciti.biz/monitoring/shell-script-monitor-unix-linux-diskspace/
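For anyone landing here, this is roughly how the loop looks once it reads df -P output, where each filesystem stays on one line. It is only a sketch: the function name over_limit is mine, and it reads from stdin so it can be tried on pasted output.

```shell
#!/bin/bash
# over_limit LIMIT: read `df -P` output on stdin and report filesystems
# whose use percentage is at or above LIMIT.
over_limit() {
    local limit=$1 fs blocks used avail pct mount
    while read -r fs blocks used avail pct mount; do
        [ "$fs" = "Filesystem" ] && continue    # skip the header line
        pct=${pct%\%}                           # "90%" becomes 90
        if [ "$pct" -ge "$limit" ]; then
            echo "$fs has used ${pct}% Disk Space"
        fi
    done
}

# Typical use:  df -P | over_limit 80
```

With -P the long /dev/mapper name no longer pushes the numbers onto a second line, so the fields line up with the variable names.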
Install a monitoring service like Nagios.
You could always create a bash script & then have it email you:
usage=$(df | awk '{print $1,$5}' | tail -n +2 | tr -d '%');
echo "$usage" | while read FS PERCENT; do [ "$PERCENT" -ge "80" ] && echo "$FS has used ${PERCENT}% Disk Space"; done
Obviously instead of the && echo "$FS has used ${PERCENT}% Disk Space" you would send the warning email.
For people who do not have a monitoring system like Nagios (as suggested by @Aaron Digulla), this simple script can do the job:
#!/bin/bash
CURRENT=$(df / | grep / | awk '{ print $5}' | sed 's/%//g')
THRESHOLD=90
if [ "$CURRENT" -gt "$THRESHOLD" ] ; then
mail -s 'Disk Space Alert' mailid@domainname.com << EOF
Your root partition remaining free space is critically low. Used: $CURRENT%
EOF
fi
Then just add a cron job.
I bought a NAS box which has a cut-down version of Debian on it.
It ran out of space the other day and I did not realise. I am basically wanting to write a bash script that will alert me whenever the disk gets over 90% full.
Is anyone aware of a script that will do this or give me some advice on writing one?
#!/bin/bash
source /etc/profile
# Device to check
devname="/dev/sdb1"
let p=`df -k $devname | grep -v ^File | awk '{printf ("%i",$3*100 / $2); }'`
if [ $p -ge 90 ]
then
df -h $devname | mail -s "Low on space" my@email.com
fi
Crontab this to run however often you want an alert
EDIT: For multiple disks
#!/bin/bash
source /etc/profile
# Devices to check
devnames="/dev/sdb1 /dev/sda1"
for devname in $devnames
do
let p=`df -k $devname | grep -v ^File | awk '{printf ("%i",$3*100 / $2); }'`
if [ $p -ge 90 ]
then
df -h $devname | mail -s "$devname is low on space" my@email.com
fi
done
I tried to use Erik's answer but had issues with devices that have long names, which wrap the numbers onto the next line and cause the script to fail. Also, the math looked wrong to me and didn't match the percentages reported by df itself.
Here's an update to his script:
#!/bin/bash
source /etc/profile
# Devices to check
devnames="/dev/sda1 /dev/md1 /dev/mapper/vg1-mysqldisk1 /dev/mapper/vg4-ctsshare1 /dev/mapper/vg2-jbossdisk1 /dev/mapper/vg5-ctsarchive1 /dev/mapper/vg3-muledisk1"
for devname in $devnames
do
let p=`df -Pk $devname | grep -v ^File | awk '{printf ("%i", $5) }'`
if [ $p -ge 70 ]
then
df -h $devname | mail -s "$devname is low on space" my@email.com
fi
done
The key changes: df -k became df -Pk to avoid line wrapping, and the awk now uses df's pre-calculated percentage instead of recalculating it.
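On the math mismatch, a worked example may help. As far as I understand GNU df, its Use% is used / (used + available) rounded up, which leaves out blocks reserved for root, whereas the earlier script divides used by the total size. The figures below are taken from the df output quoted in the CentOS question on this page; the variable names are mine.

```shell
#!/bin/bash
# 1K-block figures from the df output quoted earlier on this page.
total=197836036
used=5765212
avail=182021288

# The earlier script's approach: used as a share of the whole device, truncated.
naive=$(( used * 100 / total ))

# My understanding of GNU df: used / (used + available), rounded up.
df_style=$(( (used * 100 + used + avail - 1) / (used + avail) ))

echo "naive=${naive}% df_style=${df_style}%"
```

That gives 2% versus 4%, and 4% is what df printed for that filesystem, so reading column 5 directly avoids the discrepancy.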
You could also use Monit for this kind of job. It's a "free open source utility for managing and monitoring, processes, programs, files, directories and filesystems on a UNIX system".
Based on @Erik's answer, here is my version with variables:
#!/bin/bash
DEVNAMES="/ /home"
THRESHOLD=80
EMAIL=you@email.com
host=$(hostname)
for devname in $DEVNAMES
do
current=$(df $devname | grep / | awk '{ print $5}' | sed 's/%//g')
if [ "$current" -gt "$THRESHOLD" ] ; then
mail -s "Disk space alert on $host" "$EMAIL" << EOF
WARNING: partition $devname on $host is $current% !!
To list big files (>100 MB):
find $devname -xdev -type f -size +100M
EOF
fi
done
And if you do not have the mail command on your server, you can send email via SMTP with swaks:
swaks --from "$EMAIL" --to "$EMAIL" --server "TheServer" --auth LOGIN --auth-user "TheUser" --auth-password "ThePasswrd" --h-Subject "Disk space alert on $host" --body - << EOF
#!/bin/bash
DEVNAMES=$(df --output=source | grep ^/dev)
THRESHOLD=90
EMAIL=your@email
HOST=$(hostname)
for devname in $DEVNAMES
do
current=$(df $devname | awk 'NR>1 {printf "%i",$5}')
[ "$current" -gt "$THRESHOLD" ] && warn="WARNING: partition $devname on $HOST is $current% !! \n$warn"
done
[ "$warn" ] && echo -e "$warn" | mail -s "Disk space alert on $HOST" $EMAIL
Based on previous answers, here's my version with following changes:
Automatically checks all mounted devices
Sends only one mail per check, regardless of how many devices are over the threshold
Code generally tidied up
I'm looking for alternatives for measuring the ping time between two machines (mA and mB) and reporting this back to Nagios (on mC).
My current thoughts are to write a BASH script that will ping the machines in a cron job, output the data to a file then have another bash script that Nagios can use to read that file. This doesn't feel like the best/right way to do this though?
Here's the script I plan to run in the cron job:
#!/bin/bash
if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ] || [ -z "$4" ]
then
echo "$0: usage: $0 file? ip? pingcount? deadline?"
exit 126
else
FILE=$1
IP=$2
PCOUNT=$3
DLINE=$4
while read line
do
if [[ $line == rtt* ]]
then
#replace forward slash with underscore
line=${line////_}
#replace spaces with underscore
line=${line// /_}
#get the 8 item when splitting string on underscore
#echo $line| cut -d'_' -f 8 >> $FILE #Append
#echo $line| cut -d'_' -f 8 > $FILE #Overwrite
echo $line| cut -d'_' -f 8
fi
done < <(ping $IP -c $PCOUNT -q -w $DLINE) #-q output summary / -w deadline / -c ping count
fi
I thought about using traceroute, but I think that would produce a slower ping? Is there another way to achieve what I want?
Note: I know Nagios can directly ping a machine, but this isn't what I want to do and won't tell me what I want. Also this is my second script ever, so it's probably rubbish. Also, what alternative would I have if ICMP was blocked?
Have you looked at NRPE and check_ping? This would allow the nagios machine (mC) to ask mA to ping mB and then mA would report the results to mC. You would need to install and configure NRPE and the nagios-plugins on mA for this to work.
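If you do end up running your own script through NRPE instead of check_ping, Nagios plugins report state through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A rough sketch of that shape follows; the function names and the thresholds in the usage line are made up, and the rtt parsing assumes the usual "rtt min/avg/max/mdev = ..." summary line from ping -q.

```shell
#!/bin/bash
# avg_rtt: read ping summary output on stdin, print the average round trip in ms.
avg_rtt() {
    # Splitting on "/" and spaces puts the average in field 8.
    awk -F'[/ ]+' '/^(rtt|round-trip)/ { print $8 }'
}

# nagios_state AVG WARN CRIT: print a status line and return the Nagios exit code.
nagios_state() {
    awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (v == "")        { print "UNKNOWN - no rtt found";  exit 3 }
        if (v + 0 >= c + 0) { print "CRITICAL - rtt " v " ms"; exit 2 }
        if (v + 0 >= w + 0) { print "WARNING - rtt " v " ms";  exit 1 }
        print "OK - rtt " v " ms"
    }'
}

# Typical use on mA:  nagios_state "$(ping -c 5 -q mB | avg_rtt)" 100 500
```

Field 8 is the same value your cut -d'_' -f 8 picks out, just without the two substitution steps.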
From time to time I run bash scripts on my servers. I am trying to write a script that monitors log folders and compresses log files if a folder exceeds a defined capacity. I know there are better ways of doing what I am currently trying to do, and your suggestions are more than welcome. The script below is throwing an "unexpected end of file" error. Below is my script.
dir_base=$1
size_ok=5000000
cd $dir_base
curr_size=`du -s -D | awk '{print $1}' | sed 's/%//g'`
zipname=archive`date +%Y%m%d`
if (( $curr_size > $size_ok ))
then
echo "Compressing and archiving files, Logs folder has grown above 5G"
echo "oldest to newest selected."
targfiles=( `ls -1rt` )
echo "Process files."
for tfile in ${targfiles[@]}
do
let `du -s -D | awk '{print $1}' | sed 's/%//g' | tail -1`
if [ $curr_size -lt $size_ok ];
then
echo "$size_ok has been reached. Stopping processes"
break
else if [ $curr_size -gt $size_ok ];
then
zip -r $zipname $tfile
rm -f $tfile
echo "Added ' $tfile ' to archive`date +%Y%m%d`.zip and removed"
else [ $curr_size -le $size_ok ];
echo "files in $dir_base are less than 5G, not archiving"
fi
Look into logrotate. Here is an example of putting it to use.
With what you give us, you lack a "done" to end the for loop and a "fi" to end the main if. Please reformat your code and you will get more precise answers ...
EDIT :
Looking at your reformatted script, it is as I said: the "unexpected end of file" comes from the fact that you have closed neither your "for" loop nor your "if".
As it seems that you are mimicking logrotate's behaviour, check it out as suggested by @Hank...
my2c
My du -s -D does not show % sign. So you can just do.
curr_size=$(du -s -D)
set -- $curr_size
curr_size=$1
This saves you some overhead compared with du -s -D | awk '{print $1}' | sed 's/%//g'.
If it does show % sign, you can get rid of it like this
du -s -D | awk '{print $1+0}'. No need to use sed.
Use $() syntax instead of backticks whenever possible
For targfiles=( `ls -1rt` ), you can omit the -1. So it can be
targfiles=( $(ls -rt) )
Use quotes around your variables whenever possible. eg "$zipname" , "$tfile"
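Pulling those suggestions together, a small sketch of the size check with $(), set -- and quoted variables could look like this. The function names dir_kb and check_dir are made up, and I use du -sk here (size in kilobytes) rather than du -s -D so the unit is explicit.

```shell
#!/bin/bash
# dir_kb DIR: print the size of DIR in kilobytes.
dir_kb() {
    # du -sk prints "<kilobytes><TAB><path>"; after set --, $1 is the size.
    set -- $(du -sk "$1")
    echo "$1"
}

# check_dir DIR LIMIT_KB: say whether DIR is above or within the limit.
check_dir() {
    local dir=$1 limit_kb=$2
    if [ "$(dir_kb "$dir")" -gt "$limit_kb" ]; then
        echo "$dir is above ${limit_kb} KB"
    else
        echo "$dir is within ${limit_kb} KB"
    fi
}
```

Note that the if is closed with fi inside the function, which is exactly what the original script was missing.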