Suspend/Pause process when out of space? - linux

I was wondering if it is possible to suspend/pause a process in Bash when the disk is running out of space. For example, if the free disk space on the server I am working on falls below 100 GB, I would like to pause the process instead of having it crash when it reaches 0 available disk space.
I couldn't find any question similar to mine, but if there is one, kindly link it. I am very new to informatics, as I have recently started a thesis in Genomics, so I am not even sure if this is possible.
Cheers
David

Sure, why not - you can "pause" any process in Linux. Here is the idea with a state variable; the liveness check uses kill -0 and the free-space check uses df (GNU df and the root filesystem are assumed, so point it at whatever filesystem your process writes to):
#!/bin/bash
bash some_bash_process &
pid=$!
trap 'kill "$pid"' EXIT

paused=false
# Loop while the background process is still alive.
while kill -0 "$pid" 2>/dev/null; do
    # Available space (in 1K blocks) on the filesystem the process writes to.
    avail=$(df --output=avail / | tail -n1)
    if [ "$avail" -lt $((100 * 1024 * 1024)) ]; then   # below 100 GB
        if ! "$paused"; then
            paused=true
            kill -s SIGSTOP "$pid"
        fi
    else
        if "$paused"; then
            paused=false
            kill -s SIGCONT "$pid"
        fi
    fi
    sleep 1
done
For more information, research running processes in the background from a shell, signals in general, and the behavior of SIGSTOP and SIGCONT in particular.
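If you want to see the two signals in action before wiring them into a loop, you can try them by hand on a throwaway job (a sketch; sleep just stands in for a real process):
sleep 600 &                     # throwaway background job
pid=$!
kill -s SIGSTOP "$pid"          # freeze it
ps -o pid,stat,comm -p "$pid"   # STAT shows "T" (stopped)
kill -s SIGCONT "$pid"          # let it run again
ps -o pid,stat,comm -p "$pid"   # STAT back to "S" (sleeping)
kill "$pid"                     # clean up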

Use df to check the disk usage periodically. If you find that there is not enough space, pause the process by sending a signal.
processThatMightRunOutOfDiskSpace &
pid=$!
while
    [ -d "/proc/$pid" ] &&
    [ "$(df --output=avail / | tail -n1)" -gt 1048576 ]
do
    sleep 5
done
kill -TSTP "$pid"
Make sure that you specify the correct file system for df. Here I used /, but if your process writes data to another file system (for instance a separate file system mounted on /home), then you have to specify that instead.
df reports the free space in 1 KB blocks. 1048576 is 1024*1024, the number of 1 KB blocks in 1 GB; for the 100 GB threshold from your question, use 100*1048576 = 104857600 instead.
Here we used the signal SIGTSTP because I assumed that your process is a bash script. SIGTSTP is the signal a foreground process receives when you hit Ctrl+Z. However, this signal can be caught or ignored. If it doesn't work, use SIGSTOP (kill -STOP) instead.
In both cases, to continue the stopped process use kill -CONT "$pid".
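If you also want the job to resume automatically once space has been freed, you can follow the stop with a second waiting loop in the same style (a sketch; the 1 GB threshold and the / filesystem are the same assumptions as above):
# After kill -TSTP: wait until enough space is free again, then resume.
while [ -d "/proc/$pid" ] &&
      [ "$(df --output=avail / | tail -n1)" -le 1048576 ]
do
    sleep 5
done
kill -CONT "$pid"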

Related

Bash subprocess won't be killed when main process is killed

Hi everyone.
I have written a bash script to monitor CPU, memory and network information. Everything is just fine with the CPU and memory parts, but when it comes to the network part, things become interesting.
I use "ifstat" to monitor the network. "ifstat" blocks and continuously prints network I/O to the screen. My bash script looks like this:
#!/bin/bash
#ignore other less important codes
......
ifstat > network.info &
while true
do
...
done
I use
bash xx.sh
to run it and Ctrl+C to kill it. And here is the odd thing: although the bash process has been killed, the ifstat process is still running in the background. I use
ps -e | grep ifstat
to check it out. It's always there until I kill it manually.
In my opinion, the ifstat process is a subprocess of xx.sh, so I expect it to be killed when I kill xx.sh. But obviously it is not!
Can somebody tell me why?
And how can I kill it automatically when I kill the xx.sh process?
Trap termination and propagate the kill:
#ignore other less important codes
ifstat > network.info &
IFSTAT_PID=$!
# On exit (0) or on TERM/INT/HUP, kill the background ifstat (and this script).
trap "kill $IFSTAT_PID $$" TERM INT HUP 0
while true
do
    ...
done
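To check that the trap works, run the script, interrupt it with Ctrl+C, and repeat the check from the question; ifstat should be gone:
bash xx.sh
# ... press Ctrl+C ...
ps -e | grep ifstat    # should print nothing now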

Check if process runs if not execute script.sh

I am trying to find a way to monitor a process. If the process is not running, it should be checked again to make sure it has really crashed. If it has really crashed, run a script (start.sh).
I have tried monit with no success. I have also tried adding this script to crontab; I made it executable with chmod +x monitor.sh.
The actual program is called program1:
case "$(pidof program | wc -w)" in
0) echo "Restarting program1: $(date)" >> /var/log/program1_log.txt
/home/user/files/start.sh &
;;
1) # all ok
;;
*) echo "Removed double program1: $(date)" >> /var/log/program1_log.txt
kill $(pidof program1 | awk '{print $1}')
;;
esac
The problem is that this script does not work. I added it to crontab and set it to run every 2 minutes, but if I close the program, it won't restart.
Is there any other way to check a process, and run start.sh when it has crashed?
Not to be rude, but have you considered a more obvious solution?
When a shell (e.g. bash or tcsh) starts a subprocess, by default it waits for that subprocess to complete.
So why not have a shell that runs your process in a while(1) loop? Whenever the process terminates, for any reason, legitimate or not, it will automatically restart your process.
I ran into this same problem with mythtv. The backend keeps crashing on me. It's a Heisenbug. Happens like once a month (on average). Very hard to track down. So I just wrote a little script that I run in an xterm.
The, ahh, onintr business means that Ctrl+C will terminate the subprocess and not my (parent-process) script. Similarly, the sleep is in there so I can hit Ctrl+C several times to kill the subprocess and then kill the parent-process script while it's sleeping...
Coredumpsize is limited just because I don't want to fill up my disk with corefiles that I cannot use.
#!/bin/tcsh -f
limit coredumpsize 0
while( 1 )
    echo "`date`: Running mythtv-backend"
    # Now we cannot control-c this (tcsh) process...
    onintr -
    # This will let /bin/ls directory-sort my logfiles based on day & time.
    # It also keeps the logfile names pretty unique.
    mythbackend |& tee /....../mythbackend.log.`date "+%Y.%m.%d.%H.%M.%S"`
    # Now we can control-c this (tcsh) process.
    onintr
    echo "`date`: mythtv-backend exited. Sleeping for 30 seconds, then restarting..."
    sleep 30
end
p.s. That sleep will also save you in the event your subprocess dies immediately. Otherwise the constant respawning without delay will drive your IO and CPU through the roof, making it difficult to correct the problem.
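Since the rest of this thread is bash, here is a minimal bash sketch of the same restart-loop idea, assuming start.sh runs program1 in the foreground (if start.sh backgrounds it, the loop will spin):
#!/bin/bash
# Restart program1 whenever it exits, for whatever reason.
while true; do
    echo "Starting program1: $(date)" >> /var/log/program1_log.txt
    /home/user/files/start.sh      # assumed to block until program1 exits
    echo "program1 exited: $(date), restarting in 30 seconds" >> /var/log/program1_log.txt
    sleep 30                       # avoid a tight respawn loop if it dies immediately
done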

Automating Killall then Killall level 9

Sometimes I want to killall a certain process, but running killall doesn't work. So when I try to start the process again, it fails because the previous session is still running, and then I have to tediously run killall -9 on it. So to simplify my life, I created a realkill script, and it goes like this:
PIDS=$(ps aux | grep -i "$*" | awk '{ print $2 }') # Get matching pid's.
kill $PIDS 2> /dev/null # Try to kill all pid's.
sleep 3
kill -9 $PIDS 2> /dev/null # Force quit any remaining pid's.
So, is this the best way to be doing this? In what ways can I improve this script?
Avoid killall if you can since there is not a consistent implementation across all UNIX platforms. Proctools' pkill and pgrep are preferable:
for procname; do
    pkill "$procname"
done
sleep 3
for procname; do
    # Why check if the process exists if you're just going to `SIGKILL` it?
    pkill -9 "$procname"
done
(Edit) If you have processes that are supposed to restart after being killed, you may not want to blindly kill them, so you can gather the PIDs first:
pids=()
for procname; do
    pids+=($(pgrep "$procname"))
done
# then proceed with `kill`
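To complete the picture, the gathered PIDs can then be killed with the same two-step escalation as the original script (a sketch reusing the 3-second grace period from above):
kill "${pids[@]}" 2> /dev/null      # polite SIGTERM first
sleep 3
kill -9 "${pids[@]}" 2> /dev/null   # force-quit whatever is still around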
That said, you should really try to avoid using SIGKILL if you can. It does not give software a chance to clean up after itself. If a program won't quit shortly after receiving a SIGTERM it is probably waiting for something. Find out what it's waiting for (hardware interrupt? open file?) and fix that, and you can let it close cleanly.
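A few standard tools help answer the "what is it waiting for" question (a sketch; $pid is a hypothetical stuck process, and you may need root for processes you don't own):
pid=1234                          # hypothetical PID of the stuck process
ps -o pid,stat,wchan -p "$pid"    # process state and kernel wait channel
lsof -p "$pid"                    # open files, sockets and pipes
strace -p "$pid"                  # attach and show the blocking system call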
Without understanding what exactly the process does, I would say it probably isn't ideal, because you may have a situation where the processes you are killing are really doing some useful shutdown/cleanup work. Forcing them down with kill -9 may short-circuit that work and could cause corruption if your process is in fact writing data.
Assuming there is no danger of data corruption and it's OK to short-circuit the shutdown, can you just kill -9 the process the first time and be done with it? Do you have access to the developers of the process you are killing, to understand what might be preventing the shutdown from happening? The process might be blocking INT and TERM for good reason.
It is unlikely, but it is possible that in that 3 second wait, a new process could have taken over that PID and the second kill would kill it.

syncing a shell script with kernel operations

For stopping activity in my embedded Linux system, I have the following shell script (interpreted by busybox):
#!/bin/sh
pkill usefulp_program
swapoff /home/.swapfile
umount -l /home
sleep 3 # While I can't find a way to sync, sleep
If I take off the sleep line, the script returns immediately, without even waiting for the umount (which is lazy, as for some reason it refuses to unmount otherwise). Do you know how I can wait for all three operations to complete before finishing the script? Resorting to an arbitrary sleep does not look like a good solution.
Also, any hint on why I cannot umount without the -l?
You need to wait for the killed process to terminate. As per your comment...
wait <pid>
...doesn't work! So, you could loop along the lines of:
while ps -p <pid> > /dev/null; do sleep 1; done
to wait for the killed process to terminate before doing the swapoff and umount.
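Putting that together with the original script, a sketch of the whole shutdown sequence could look like this (it assumes a single instance of usefulp_program and that your busybox ps supports -p, as used in the loop above):
#!/bin/sh
pid=$(pidof usefulp_program)      # grab the PID before asking it to exit
pkill usefulp_program
# Wait for the killed process to actually terminate.
while [ -n "$pid" ] && ps -p "$pid" > /dev/null 2>&1; do
    sleep 1
done
swapoff /home/.swapfile
umount /home                      # should no longer need -l once nothing holds /home open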
As others already mentioned, you should only need the -l when the process has not yet terminated. If it takes too long, or simply ignores your polite request to stop itself, one option is to use a different signal: pass -9 to the kill/killall/pkill command to send SIGKILL instead of SIGTERM. If you don't want to use the hammer on your first try, you could do something like
pkill your_programm
sleep 10
pkill -9 your_programm

Limiting the time a program runs in Linux

In Linux I would like to run a program but only for a limited time, like 1 second. If the program exceeds this running time I would like to kill the process and show an error message.
Ah well. timeout(1).
DESCRIPTION
Start COMMAND, and kill it if still running after DURATION.
Stack Overflow won't allow me to delete my answer since it's the accepted one, and it keeps garnering down-votes since it sits at the top of the list with a better solution below it. If you're on a GNU system, please use timeout instead, as suggested by @wRAR. So, in the hope that you'll stop down-voting, here's how it works:
timeout 1s ./myProgram
You can use s, m, h or d for seconds (the default if omitted), minutes, hours or days. A nifty feature here is that you may specify another option -k 30s (before the 1s above) in order to kill it with a SIGKILL after another 30 seconds, should it not respond to the original SIGTERM.
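For example, combining both options, and using GNU timeout's exit status 124 (returned when the command was timed out) to print the error message the question asks for:
timeout -k 30s 1s ./myProgram     # SIGTERM after 1s, SIGKILL 30s later if needed
if [ $? -eq 124 ]; then
    echo "myProgram was killed because it took too long" >&2
fi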
A very useful tool. Now scroll down and up-vote @wRAR's answer.
For posterity, this was my original, inferior, suggestion; it might still be of some use to someone.
A simple bash-script should be able to do that for you
./myProgram &
sleep 1
kill $! 2>/dev/null && echo "myProgram didn't finish"
That ought to do it.
$! expands to the last backgrounded process (through the use of &), and kill returns false if it didn't kill any process, so the echo is only executed if it actually killed something.
2>/dev/null redirects kill's stderr, otherwise it would print something telling you it was unable to kill the process.
You might want to add a -KILL or whichever signal you want to use to get rid of your process too.
EDIT
As ephemient pointed out, there's a race here: if your program finishes and some other process snatches the PID, that process will get killed instead. To reduce the probability of this happening, you can react to SIGCHLD and not try to kill it if the child has already exited. There's still a chance of killing the wrong process, but it's very remote.
trapped=""
trap 'trapped=yes' SIGCHLD
./myProgram &
sleep 1
[ -z "$trapped" ] && kill $! 2>/dev/null && echo '...'
Maybe CPU time limit (ulimit -t/setrlimit(RLIMIT_CPU)) will help?
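Note that RLIMIT_CPU limits CPU time, not wall-clock time, so a program that sleeps or waits on I/O won't be stopped by it. A sketch, run in a subshell so the limit doesn't affect your interactive shell:
(
    ulimit -t 1    # at most ~1 second of CPU time; the kernel signals the process past that
    ./myProgram
) || echo "myProgram failed or was killed" >&2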
you could launch it in a shell script using &
your_program &
pid=$!
sleep 1
# pgrep matches process names, not PIDs, so check the PID directly.
if ps -p "$pid" > /dev/null
then
    kill "$pid"
    echo "killed $pid because it took too long."
fi
Hope you get the idea; I'm not sure this is correct, my shell skills need some refreshing :)
tail -f file & pid=$!
sleep 10
kill $pid 2>/dev/null && echo '...'
If you have the sources, you can fork() early in main() and then have the parent process measure the time and possibly kill the child process. Just use standard system calls fork(), waitpid(), kill(), ... maybe some standard Unix signal handling. Not too complicated but takes some effort.
You can also script something on the shell although I doubt it will be as accurate with respect to the time of 1 second.
If you just want to measure the time, type time <cmd ...> on the shell.
Ok, so just write a short C program that forks, calls execlp or something similar in the child, measures the time in the parent and kills the child. Should be easy ...
