I got this script working earlier, but then it stopped working. Now I always get the error below, and the log gives no information: the .log file is empty.
The script worked fine until I changed { df -k ${fileSystem}|tail -n1 } to { quota -u |tail -n1 }, because quota shows the usage assigned to me instead of the usage of the entire seedbox.
Check Entire Seedbox disk free
user@hera:~/scripts$ df -k /home20/<user>|tail -n1
/dev/sdu1 15616058976 3311158640 12303321368 22% /home20
Check Seedbox Slot only disk free
user@hera:~/scripts$ quota -u <user>|tail -n1
/dev/sdu1 1629501728 1953497088 1953497088 4344 0 0
Log file
tail: deluge-disk-check.log: file truncated
[empty]
Error message
./deluge-disk-check.sh: line 23: let: freeSpacePct=100*/: syntax error: operand expected (error token is "/")
my script
#!/bin/bash
exec 3>&1 4>&2
trap 'exec 2>&4 1>&3' 0 1 2 3
exec 1>/home/<user>/scripts/deluge-disk-check.log 2>&1
# Adjust these parameters to your system
fileSystem="/home/<user>" # Filesystem you want to monitor
minFreeSpace1="78" # Seedbox Free space 1st threshold percentage
minFreeSpace2="77" # Seedbox Free space 2nd threshold percentage
checkInterval="3600" # Interval between checks in seconds
SERVICE='deluged'
while (:); do
# Get the output of df -k and put in a variable for parsing
dfOutPut=$(df -k ${fileSystem}|tail -n1)
# Exctract the fields containing total and available 1K-blocks
totalBlocksKb=$(echo "${dfOutPut}" | awk '{print $2}')
availableBlocksKb=$(echo "${dfOutPut}" | awk '{print $4}')
# Calculate percentage of free space
let freeSpacePct=100\*${availableBlocksKb}/${totalBlocksKb}
# Check if free space percentage is below threshold value
if [ "${freeSpacePct}" -lt "${minFreeSpace1}" ]; then
date +'%Y-%m-%d %H:%M:%S'
echo "You only have ${freeSpacePct}% free space on seedbox"
# Check whether the instance of thread exists:
if ps ax | grep -v grep | grep $SERVICE > /dev/null
then
echo "Deluge is running, Is there free space on device?"
pkill deluge
echo -e "No, Trying to stop Deluge...\nDeluge stopped, exiting"
else
if [ "${freeSpacePct}" -lt "${minFreeSpace2}" ]; then
echo "refreshing data..."
echo "Deluge is not running, Is there free space on device?"
echo "Yes, Threshold value is now ${minFreeSpace2}%"
echo "Trying to restart Deluge ..." && app-deluge restart
echo "Deluge is running, exiting"
sleep ${checkInterval}
fi
fi
fi
done
On line 23:
let freeSpacePct=100\*${availableBlocksKb}/${totalBlocksKb}
For arithmetic evaluations, use $(( ... )) like this:
let freeSpacePct=$(( 100 * ${availableBlocksKb} / ${totalBlocksKb} ))
or more simply (without ${...} around variable names):
let freeSpacePct=$(( 100 * availableBlocksKb / totalBlocksKb ))
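Note that the error token in the message (100*/) suggests that both variables expanded to empty strings, so it is worth validating the parsed fields before doing any arithmetic. A minimal sketch, assuming the quota line has the layout shown in the question (filesystem, used blocks, soft quota, hard limit, ...); the function name quota_free_pct is made up for illustration:

```shell
#!/bin/bash
# Sketch: free-space percentage from one line of quota-style output.
# Assumed columns: filesystem, used blocks, soft quota, hard limit, ...
quota_free_pct() {
    local line=$1 used quota
    read -r _ used quota _ <<< "$line"
    # Refuse to do arithmetic on empty or non-numeric fields
    if ! [[ $used =~ ^[0-9]+$ && $quota =~ ^[0-9]+$ ]] || (( quota == 0 )); then
        echo "cannot parse usage from: '$line'" >&2
        return 1
    fi
    echo $(( 100 * (quota - used) / quota ))
}
```

In the monitoring loop this could be called as freeSpacePct=$(quota_free_pct "$(quota -u | tail -n1)") || exit 1, so an empty or unexpected line stops the script with a readable message instead of a syntax error.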
I have this code, which basically does a loop inside the DF command to see which disks are more than 90% full.
df -H | sed 1d | awk '{ print $5 " " $1 }' | while read -r dfh;
do
#echo "$output"
op=$(echo "$dfh" | awk '{ print $1}' | cut -d'%' -f1 )
partition=$(echo "$dfh" | awk '{ print $2 }' )
if [ $op -ge 90 ];
then
echo ">> ### WARNING! Running out of space on \"$partition ($op%)\" on $(hostname) as on $(date)" >> LOGFILE
echo -e ">> ### WARNING! Running out of space on \"$partition ($op%)\" on $(hostname) as on $(date)"
echo ">> There is not enough left storage in the disk to perform the upgrade, exiting..." >> LOGFILE
echo -e ">> There is not enough left storage in the disk to perform the upgrade, exiting..."
exit 1
elif [ $op -ge 85 ];
then
echo -e ">> ### WARNING! Running out of space on \"$partition ($op%)\" on $(hostname) as on $(date)" >> LOGFILE
echo ">> ### WARNING! Running out of space on \"$partition ($op%)\" on $(hostname) as on $(date)"
echo ">> There enough left storage in the disk to perform the upgrade, but it is recommended to first increase the size of the disk $partition" >> LOGFILE
echo -e ">> There enough left storage in the disk to perform the upgrade, but it is recommended to first increase the size of the disk $partition"
fi
done
if [ "$?" -eq 1 ];
then
exit
else
echo -e ">> There is enough left storage in the disk to continue with the upgrade, OK"
fi
I want it to exit only if at least one disk is more than 90% full; that's the purpose of the last if statement.
The problem is that the loop exits at the first disk found to be more than 90% full. This is bad because if I have 3 disks at more than 90%, it will only report one of them (the first one in the loop) and then exit. Basically, I want the script to report all the disks that are at 90% or more (and the ones that are at 85% too, but without exiting, as you can read).
Is this possible?
Thank you in advance
Your script looks like it's screaming loudly to be refactored into Awk. But here is a more gentle refactoring.
rc=0
df -H |
awk 'NR>1 { n=$5; sub(/%/, "", n); print n, $1 }' |
while read -r op partition;
do
if [ $op -ge 90 ];
then
tee -a LOGFILE <<-________EOF
>> ### WARNING! Running out of space on "$partition ($op%)" on $(hostname) as on $(date)
>> There is not enough left storage in the disk to perform the upgrade, exiting...
________EOF
rc=1
elif [ $op -ge 85 ];
then
tee -a LOGFILE <<-________EOF
>> ### WARNING! Running out of space on "$partition ($op%)" on $(hostname) as on $(date)
>> There enough left storage in the disk to perform the upgrade, but it is recommended to first increase the size of the disk $partition
________EOF
fi
[ "$rc" -eq 0 ]
done &&
echo ">> There is enough left storage in the disk to continue with the upgrade, OK" ||
exit 1
This keeps track of rc while looping over all the partitions, then proceeds with the final condition only when the loop is done.
Generally, avoid echo -e in favor of printf, though here, since the -e wasn't doing anything useful anyway, a here document worked better.
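In bash, another way to keep rc visible after the loop is to feed the loop from process substitution, so the loop body runs in the current shell rather than in a pipeline subshell. A minimal sketch; list_usage is a hypothetical stand-in for the df | awk stage so that the example runs anywhere:

```shell
#!/bin/bash
# Hypothetical helper standing in for: df -H | awk 'NR>1 { print $5, $1 }'
list_usage() { printf '%s\n' '95% /dev/a' '40% /dev/b' '91% /dev/c'; }

rc=0
while read -r op partition; do
    op=${op%\%}                       # strip the trailing percent sign
    if [ "$op" -ge 90 ]; then
        echo "FAIL $partition ($op%)"
        rc=1                          # survives: the loop is not in a subshell
    fi
done < <(list_usage)
echo "rc=$rc"
```

Process substitution is a bash feature, so this variant does not work under plain /bin/sh.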
You could do something like:
#!/bin/sh
warn=${1-85}
fail=${2-90}
test "$warn" -lt "$fail" || { echo Invalid parameters >&2; exit 1; }
check_disk(){
op=${1%%%}
partition=${2}
if test "$op" -ge "$warn"; then
tee -a LOGFILE <<-EOF
>> ### WARNING! Running out of space on "$partition ($op%)" on $(hostname) as on $(date)
>> There is not enough left storage in the disk to perform the upgrade, exiting...
EOF
fi
if test "$op" -ge "$fail"; then
return 1
fi
} 2> /dev/null
rv=0
df -H | awk 'NR > 1{ print $5 " " $1 }' \
| { while read -r op partition; do
if ! check_disk "$op" "$partition"; then rv=1; fi
done
test "$rv" -eq 0
} || exit 1
A few notes:
You must have literal hard tabs as indentation if you want the <<- to suppress the indentation in the output.
You need to put the while/done inside the braces so that rv is still in scope when it is checked. If you just do df ... | while do/done, the loop runs in a subshell and any change it makes to the variable is lost once the pipeline ends.
This is still a bit kludgy, and it would probably be better to do the whole thing in awk instead of having the while/do loop in the shell at all, but these are some ideas. In particular, you definitely do not want to be writing the same echo line 4 times!
Also note that parsing the output of df is really fragile. On my box, there are lines in which the filesystem column contains whitespace, so that the 5th column is not the current usage of the mountpoint. This script will probably not work as desired on such output.
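As a rough illustration of the "do the whole thing in awk" idea, here is a sketch that classifies df -P style input in one pass. It assumes the POSIX output format of df -P (the capacity is the second-to-last field and the mount point is the last one); the function name check_usage and the FAIL/WARN wording are made up for this example:

```shell
#!/bin/sh
# Sketch: classify df -P style input in a single awk pass.
# Reads stdin, so it can be fed real `df -P` output or canned text.
check_usage() {
    awk -v warn="${1:-85}" -v fail="${2:-90}" '
        NR > 1 {
            pct = $(NF - 1); sub(/%/, "", pct)   # capacity column, percent sign removed
            if (pct + 0 >= fail)      { printf "FAIL %s (%d%%)\n", $NF, pct; bad = 1 }
            else if (pct + 0 >= warn) { printf "WARN %s (%d%%)\n", $NF, pct }
        }
        END { exit bad + 0 }                     # non-zero when any disk reached the fail level
    '
}
```

Usage would be something like df -P | check_usage 85 90 | tee -a LOGFILE. Counting fields from the end tolerates whitespace in the filesystem column, but a mount point containing spaces would still confuse it.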
You can do it in one check, like this:
$ a=10 b=5 c=7 ; (( a>=10 && b>=10 && c>=10 )) && echo 'all >= 10'
$ a=10 b=5 c=10; (( a>=10 && b>=10 && c>=10 )) && echo 'all >= 10'
$ a=10 b=10 c=10; (( a>=10 && b>=10 && c>=10 )) && echo 'all >= 10'
all >= 10
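The same idea extends to any number of values with a small loop. A sketch, where all_at_least is a name made up here:

```shell
#!/bin/bash
# all_at_least MIN VALUE... : succeed only if every VALUE is >= MIN
all_at_least() {
    local min=$1 v
    shift
    for v in "$@"; do
        (( v >= min )) || return 1
    done
}
```

For example, all_at_least 10 "$a" "$b" "$c" && echo 'all >= 10'.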
In my custom bash script for server monitoring, which is meant to make my CentOS server take some actions and alert me if resources are overloaded for longer than expected, I get the following error:
line 17: [: 5.74: integer expression expected *
All iostat results are floating-point numbers by definition, and I already use awk in my iostat command (WAIT), so how can I make my bash script expect a float instead of an integer?
* The value 5.74 is the current iostat result.
#!/bin/bash
if [[ "`pidof -x $(basename $0) -o %PPID`" ]]; then
# echo "Script is already running with PID `pidof -x $(basename $0) -o %PPID`"
exit
fi
UPTIME=`cat /proc/uptime | awk '{print $1}' | cut -d'.' -f1`
WAIT=`iostat -c | head -4 |tail -1 | awk '{print $4}' |cut -d',' -f1`
LOAD=`cat /proc/loadavg |awk '{print $2}' | cut -d'.' -f1`
if [ "$UPTIME" -gt 600 ]
then
if [ "$WAIT" -gt 50 ]
then
if [ "$LOAD" -gt 4 ]
then
#action to take (reboot, restart service, save state sleep retry)
MAIL_TXT="System Status: iowait:"$WAIT" loadavg5:"$LOAD" uptime:"$UPTIME"!"
echo $MAIL_TXT | mail -s "Server Alert Status" "mymail@foe.foe"
/etc/init.d/httpd stop
# /etc/init.d/mysql stop
sleep 10
# /etc/init.d/mysql start
/etc/init.d/httpd start
fi
fi
fi
CentOS release 6.8 (Final) 2.6.32-642.13.1.el6.x86_64
Normally, you'd need to use something other than native shell math, as described in BashFAQ #22. However, since you're comparing to integers, this is easy: You can just truncate at the decimal point.
[ "${UPTIME%%.*}" -gt 600 ] # truncates your UPTIME at the decimal point
[ "${WAIT%%.*}" -gt 50 ] # likewise
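Truncation is enough when the threshold is an integer. If you ever need a true floating-point comparison, one option is to let awk do it and use its exit status. A small sketch (float_gt is a hypothetical helper name):

```shell
#!/bin/sh
# float_gt A B : exit status 0 when A > B, compared numerically by awk
float_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}
```

It could then be used as if float_gt "$WAIT" 50; then ...; fi in place of the [ ... -gt ... ] test.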
I want to make a process monitor that runs in the background and does not take up a bunch of memory. I will be logging in to the remote computer via SSH and if at all possible would like to run the script on my local computer. I need it to throw an alert (audible?) when any running process goes above a predefined limit for CPU and MEM.
Is there any way to get values from 'top'? I have tried several 'ps' commands, but without much luck.
ps should give you cpu and memory usage of a pid.
ps -p <pid> -o %cpu,%mem
Results :
%CPU %MEM
12.9 0.9
Something to get you going. This script (test.bash) prints a message if CPU usage is above 50% or memory usage is above 20%. It takes a PID as an argument.
#!/bin/bash
pid=$1
clim=50
mlim=20
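# Keep only the value line; it may start with spaces, so allow them before the first digit
ps -p "$pid" -o %cpu,%mem | grep "^ *[0-9]" > /tmp/test.txt
while read -r cpu mem
do
    # bc prints 1 when the comparison is true, 0 otherwise
    if [ "$(bc <<< "$cpu > $clim")" = 1 ]; then
        echo "CPU ($cpu) is above limit for PID:$pid"
    fi
    if [ "$(bc <<< "$mem > $mlim")" = 1 ]; then
        echo "MEM ($mem) is above limit for PID:$pid"
    fi
done < /tmp/test.txt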
Run the script:
$ ./test.bash 1918
CPU (50.2) is above limit for PID:1918
MEM (20.1) is above limit for PID:1918
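Since the question asks about any running process rather than a single PID, the same check can be applied to a whole process listing. A sketch, assuming input lines of the form "pid cpu mem command"; over_limit is a made-up name, and the ps invocation in the comment assumes a ps that accepts these -o format names (as Linux procps does):

```shell
#!/bin/sh
# over_limit CPU_LIMIT MEM_LIMIT : read "pid cpu mem command" lines on stdin
# and print the processes that exceed either limit
over_limit() {
    awk -v c="$1" -v m="$2" '
        $2 + 0 > c || $3 + 0 > m { print $1, $4, "cpu=" $2, "mem=" $3 }
    '
}
# Possible usage (empty headers via "name=" suppress the title line):
#   ps -e -o pid= -o pcpu= -o pmem= -o comm= | over_limit 50 20
```

Run over SSH, this keeps the remote side down to a single ps call, with the filtering done wherever the pipe ends.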
Here is the script I pieced together. Hope someone finds this useful. You can change the max_percent variable at the top to suit your needs. The sleeper variable is the interval (in seconds) at which the loop executes.
#!/bin/bash
# alarm.sh
max_percent=94
sleeper=1
frequency=1000
duration=300
# To make the script executable:
# chmod u+x alarm.sh
# get the total available memory:
function total_memory {
echo "Total memory available: "
TOTAL_MEM=$(grep MemTotal /proc/meminfo | awk '{print $2}')
#Another way of doing that:
#total_mem=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
echo "---------- $TOTAL_MEM ---------------"
}
# alarm function params: frequency, duration
# Example:
# _alarm 400 200
_alarm() {
( \speaker-test --frequency $1 --test sine )&
pid=$!
\sleep 0.${2}s
\kill -9 $pid
}
function total_available_memory {
total_available_mem=$(</proc/meminfo grep MemTotal | grep -Eo '[0-9]+')
total_free_mem=$(</proc/meminfo grep MemFree | grep -Eo '[0-9]+')
total_used_mem=$((total_available_mem - total_free_mem))
#percent_used=$((total_available_mem / total_free_mem))
# print the free memory
# customize the unit based on the format of your /proc/meminfo
percent_used=$(printf '%i %i' $total_used_mem $total_available_mem | awk '{ pc=100*$1/$2; i=int(pc); print (pc-i<0.5)?i:i+1 }')
if [ $percent_used -gt $max_percent ]; then
echo "TOO MUCH MEMORY IS BEING USED!!!!!!!! KILL IT!"
_alarm $frequency $duration
fi
echo "Available: $total_available_mem kb - Used: $total_used_mem kb - Free: $total_free_mem kb - Percent Used: $percent_used %"
}
# RUN THE FUNCTIONS IN AN INFINITE LOOP:
# total_memory
echo "Press [CTRL+C] to stop.."
while :
do
total_available_memory
sleep $sleeper
done
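The rounded percentage can also be computed with shell integer arithmetic alone, without the printf | awk round trip. A minimal sketch; this percent_used function is a hypothetical replacement for the awk expression and rounds half up in the same way:

```shell
#!/bin/bash
# percent_used USED TOTAL : integer percentage, rounded half up
percent_used() {
    local used=$1 total=$2
    (( total > 0 )) || return 1          # avoid division by zero
    # Adding total/2 before dividing turns truncation into rounding
    echo $(( (100 * used + total / 2) / total ))
}
```

Inside total_available_memory it would be called as percent_used=$(percent_used "$total_used_mem" "$total_available_mem").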
I need to do the following to make sure my application server is up:
Tail a log file for a specific string
Remain blocked until that string is printed
However, if the string is not printed for about 20 minutes, quit and throw an exception message like "Server took more than 20 mins to be up"
If the string is printed in the log file, quit the loop and proceed.
Is there a way to include timeouts in a while loop?
#!/bin/bash
tail -f logfile | grep 'certain_word' | read -t 1200 dummy_var
[ $? -eq 0 ] && echo 'ok' || echo 'server not up'
This reads anything written to logfile and searches for certain_word. It echoes ok if the word appears; otherwise, after waiting 1200 seconds (20 minutes), it complains.
You can do it like this:
start_time=$(date +"%s")
while true
do
elapsed_time=$(($(date +"%s") - $start_time))
if [[ "$elapsed_time" -gt 1200 ]]; then
break
fi
sleep 1
if [[ $(grep -c "specific string" /path/to/log/file.log) -ge 1 ]]; then
break
fi
done
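The loop above leaves in both cases without saying which one happened. A sketch of the same polling idea wrapped in a function that reports success or timeout through its exit status; wait_for_string is a made-up name and the 1-second poll interval is an arbitrary choice:

```shell
#!/bin/bash
# wait_for_string FILE PATTERN TIMEOUT_SECONDS
# Returns 0 as soon as PATTERN appears in FILE, 1 if TIMEOUT_SECONDS pass first.
wait_for_string() {
    local file=$1 pattern=$2 timeout=$3
    local deadline=$(( SECONDS + timeout ))   # SECONDS counts seconds since the shell started
    while (( SECONDS < deadline )); do
        grep -q -- "$pattern" "$file" 2>/dev/null && return 0
        sleep 1
    done
    return 1
}
```

A caller could then write wait_for_string /path/to/log/file.log "specific string" 1200 || { echo "Server took more than 20 mins to be up" >&2; exit 1; }. Like the loop above, it matches text that was already in the file before the wait started.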
You can use signal handlers from shell scripts (see http://www.ibm.com/developerworks/aix/library/au-usingtraps/index.html).
Basically, you'd define a handler to be called on a signal, say SIGUSR1, then put a helper in the background that will send that signal at some later time:
timeout() {
    sleep 1200
    kill -USR1 "$1"
}
watch_for_input() {
    tail -f file | grep item
}
trap 'echo "Not found"; exit' USR1
timeout $$ &
# bash runs a trap only after the current foreground command returns,
# so watch in the background and block in wait, which a trapped signal interrupts
watch_for_input &
wait $!
Then, if you reach 1200 seconds, the trap handler is called and you can choose what to do (for example, signal the tail/grep combo that is watching for your pattern in order to kill it).
time=0
found=0
while [ $time -lt 1200 ]; do
out=$(tail logfile)
if [[ $out =~ specificString ]]; then
found=1
break;
fi
let time++
sleep 1
done
echo $found
The accepted answer doesn't work and will never exit: although read -t exits, the earlier commands in the pipeline (tail -f | grep) only learn that it has exited when they next try to write output, which never happens unless the string matches.
A one-liner is probably feasible, but here are scripted (working) approaches.
Logic is the same for each one, they use kill to terminate the current script after the timeout.
Perl is probably more widely available than gawk/read -t
#!/bin/bash
FILE="$1"
MATCH="$2"
# Uses read -t, kill after timeout
#tail -f "$FILE" | grep "$MATCH" | (read -t 1 a ; kill $$)
# Uses gawk read timeout ability (not available in awk)
#tail -f "$FILE" | grep "$MATCH" | gawk "BEGIN {PROCINFO[\"/dev/stdin\", \"READ_TIMEOUT\"] = 1000;getline < \"/dev/stdin\"; system(\"kill $$\")}"
# Uses perl & alarm signal
# (the backticks are escaped so that perl, not bash, runs the kill when the alarm fires)
#tail -f "$FILE" | grep "$MATCH" | perl -e "\$SIG{ALRM} = sub { \`kill $$\`;exit; };alarm(1);<>;"
I am a shell scripting newbie trying to understand some code, but some lines are too complex for me. The piece of code I'm talking about can be found here: https://gist.github.com/447191
Its purpose is to start, stop and restart a server. That's pretty standard stuff, so it's worth taking some time to understand it. I commented the lines whose meaning I am unsure about or don't understand at all, hoping that someone could give me some explanation.
#!/bin/bash
#
BASE=/tmp
PID=$BASE/app.pid
LOG=$BASE/app.log
ERROR=$BASE/app-error.log
PORT=11211
LISTEN_IP='0.0.0.0'
MEM_SIZE=4
CMD='memcached'
# Does this mean, that the COMMAND variable can adopt different values, depending on
# what is entered as parameter? "memcached" is chosen by default, port, ip address and
# memory size are options, but what is -v?
COMMAND="$CMD -p $PORT -l $LISTEN_IP -m $MEM_SIZE -v"
USR=user
status() {
echo
echo "==== Status"
if [ -f $PID ]
then
echo
echo "Pid file: $( cat $PID ) [$PID]"
echo
# ps -ef: Display uid, pid, parent pid, recent CPU usage, process start time,
# controlling tty, elapsed CPU usage, and the associated command of all other processes
# that are owned by other users.
# The rest of this line I don't understand, especially grep -v grep
ps -ef | grep -v grep | grep $( cat $PID )
else
echo
echo "No Pid file"
fi
}
start() {
if [ -f $PID ]
then
echo
echo "Already started. PID: [$( cat $PID )]"
else
echo "==== Start"
# Lock file that indicates that no 2nd instance should be started
touch $PID
# COMMAND is called as background process and ignores SIGHUP signal, writes its
# output to the LOG file.
if nohup $COMMAND >>$LOG 2>&1 &
# The pid of the last background is saved in the PID file
then echo $! >$PID
echo "Done."
echo "$(date '+%Y-%m-%d %X'): START" >>$LOG
else echo "Error... "
/bin/rm $PID
fi
fi
}
# I don't understand this function :-(
kill_cmd() {
SIGNAL=""; MSG="Killing "
while true
do
LIST=`ps -ef | grep -v grep | grep $CMD | grep -w $USR | awk '{print $2}'`
if [ "$LIST" ]
then
echo; echo "$MSG $LIST" ; echo
echo $LIST | xargs kill $SIGNAL
# Why this sleep command?
sleep 2
SIGNAL="-9" ; MSG="Killing $SIGNAL"
if [ -f $PID ]
then
/bin/rm $PID
fi
else
echo; echo "All killed..." ; echo
break
fi
done
}
stop() {
echo "==== Stop"
if [ -f $PID ]
then
if kill $( cat $PID )
then echo "Done."
echo "$(date '+%Y-%m-%d %X'): STOP" >>$LOG
fi
/bin/rm $PID
kill_cmd
else
echo "No pid file. Already stopped?"
fi
}
case "$1" in
'start')
start
;;
'stop')
stop
;;
'restart')
stop ; echo "Sleeping..."; sleep 1 ;
start
;;
'status')
status
;;
*)
echo
echo "Usage: $0 { start | stop | restart | status }"
echo
exit 1
;;
esac
exit 0
1)
COMMAND="$CMD -p $PORT -l $LISTEN_IP -m $MEM_SIZE -v": in Unix tradition, -v is very often a shortcut for --verbose. All those dollar signs are variable expansions (their text values are inserted into the string assigned to the new variable COMMAND), so COMMAND does not depend on any parameter passed to the script.
2)
ps -ef | grep -v grep | grep $( cat $PID ) - it's a pipe: ps redirects its output to grep which outputs to another grep and the end result is printed to the standard output.
grep -v grep means "take all lines that do not contain 'grep'" (grep itself is a process, so you need to exclude it from output of ps). $( $command ) is a way to run command and insert its standard output into this place of script (in this case: cat $PID will show contents of file with name $PID).
3) kill_cmd.
This function is an endless loop that tries to kill the LIST of PIDs of 'memcached' processes. On the first pass it sends the TERM signal (politely asking each process in $LIST to quit, save its work and shut down correctly) and gives the processes 2 seconds (sleep 2) to do their shutdown work. It also removes the PID file if it still exists. On every later pass it uses the KILL signal (-9), which makes the OS terminate the process immediately: a process that has not shut down within 2 seconds is considered hung. The loop quits as soon as no matching process is left.
ps -ef | grep -v grep | grep $CMD | grep -w $USR | awk '{print $2}' prints the PIDs of all processes whose ps line contains $CMD ('memcached') and the user $USR ('user'). The -w option of grep means 'match whole words only'; here it is applied to $USR, so it excludes lines where the user name is only part of a longer word, like 'fakeuser'. awk is a little interpreter most often used to take word number N from every line of input (you can consider it a selector for a column of a text table). In this case, it prints the second word of every ps output line, which is the PID.
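The "ask politely, then force" pattern described for kill_cmd can be sketched for a single PID as follows. This is only an illustration of the escalation, not the gist's code; stop_gracefully and its grace-period argument are made up for the example:

```shell
#!/bin/bash
# stop_gracefully PID [GRACE_SECONDS]
# Send TERM, wait up to GRACE_SECONDS for the process to exit, then send KILL.
stop_gracefully() {
    local pid=$1 grace=${2:-2} i
    kill "$pid" 2>/dev/null || return 0         # nothing to signal: already gone
    for (( i = 0; i < grace * 10; i++ )); do
        kill -0 "$pid" 2>/dev/null || return 0  # exited after TERM
        sleep 0.1
    done
    kill -9 "$pid" 2>/dev/null                  # still there after the grace period
    return 0
}
```

kill -0 sends no signal at all; it only checks whether the PID can be signalled, which is why it works as an "is it still running?" test here.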
If you have any other questions, I'll add answers below.
Here is an explanation of the pieces of code you do not understand:
1.
# Does this mean, that the COMMAND variable can adopt different values, depending on
# what is entered as parameter? "memcached" is chosen by default, port, ip address and
# memory size are options, but what is -v?
COMMAND="$CMD -p $PORT -l $LISTEN_IP -m $MEM_SIZE -v"
From the man page, under -v:
$ man memcached
...
-v Be verbose during the event loop; print out errors and warnings.
...
2.
# ps -ef: Display uid, pid, parent pid, recent CPU usage, process start time,
# controlling tty, elapsed CPU usage, and the associated command of all other processes
# that are owned by other users.
# The rest of this line I don't understand, especially grep -v grep
ps -ef | grep -v grep | grep $( cat $PID )
Print all processes details (ps -ef), exclude the line with grep (grep -v grep) (since you are running grep it will display itself in the process list) and filter by the text found in the file named $PID (/tmp/app.pid) (grep $( cat $PID )).
3.
# I don't understand this function :-(
kill_cmd() {
SIGNAL=""; MSG="Killing "
while true
do
## create a list with all the pid numbers filtered by command (memcached) and user ($USR)
LIST=`ps -ef | grep -v grep | grep $CMD | grep -w $USR | awk '{print $2}'`
## if $LIST is not empty... proceed
if [ "$LIST" ]
then
echo; echo "$MSG $LIST" ; echo
## kill all the processes in the $LIST (xargs will get the list from the pipe and put it at the end of the kill command; something like this < kill $SIGNAL $LIST > )
echo $LIST | xargs kill $SIGNAL
# Why this sleep command?
## some processes might take one or two seconds to perish
sleep 2
SIGNAL="-9" ; MSG="Killing $SIGNAL"
## if the file $PID still exists, delete it
if [ -f $PID ]
then
/bin/rm $PID
fi
## if list is empty
else
echo; echo "All killed..." ; echo
## get out of the while loop
break
fi
done
}
This function will kill all the processes related to memcached slowly and painfully (actually quite the opposite).
Above are the explanations.
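As a side note, the ps -ef | grep -v grep | grep PID combination in status() only serves to find out whether the process from the PID file is still running. A sketch of a more direct check, assuming the PID file holds a single number; pid_alive is a hypothetical helper, not part of the original script:

```shell
#!/bin/bash
# pid_alive PIDFILE : succeed if PIDFILE names a process we can signal
pid_alive() {
    local pidfile=$1 pid
    [ -f "$pidfile" ] || return 1
    pid=$(< "$pidfile")
    # Reject empty or non-numeric content before handing it to kill
    [[ $pid =~ ^[0-9]+$ ]] && kill -0 "$pid" 2>/dev/null
}
```

Note that kill -0 also fails for a live process owned by another user, so this check is only reliable for processes started by the same user (or when run as root).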