Defusing fork bomb: kill forking processes - linux

I want to execute user-provided (potentially unsafe) code in a shell using Docker. Every run should be limited in time, say to 10 seconds, so after 10 seconds I want to stop the command and all the forks it has made.
I've set an appropriate limit on the maximum number of processes with ulimit -u 30, so a fork bomb no longer does any damage to the system.
I run docker as the user with id=1000, and the user inside the container has id=3000, so the user who starts the command isn't blocked by the fork bomb.
Now I want to deal with the timeout. Unfortunately, timeout 10s docker run -u 3000 ... forkbomb doesn't work. What I have to do is kill all the processes owned by the user with id=3000, but these processes fork endlessly and I'm worried that killall -KILL -u 3000 won't always work. killall isn't atomic, is it?
So I'm looking for a way to kill all the processes owned by a specific user even while those processes are forking. I found a blog post which has the following code:
cd /proc;
for p in [0-9]*; do read CMDLINE < $p/cmdline; if [[ $CMDLINE == "processWithBombName" ]];
then kill -s SIGSTOP $p; fi; done
for p in [0-9]*; do read CMDLINE < $p/cmdline; if [[ $CMDLINE == "processWithBombName" ]];
then kill -s SIGKILL $p; fi; done
First, doesn't the code above have concurrency issues? For example, suppose the stop loop has read the PID list {1, 2, 3}, and while it is stopping pid=1, the process with pid=2 dies and pid=4 is created. Will pid=4 be killed?
Second, how can I kill all the processes of a user and be sure that no new processes have been created?
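The two-pass idea from the blog post can be sketched on an explicit PID list. This is only an illustration under assumptions: freeze_then_kill is a hypothetical helper name, and three sleep processes stand in for the forking processes. It covers only the PIDs known when the first pass starts, so a real cleanup would probably have to repeat the scan until no new PIDs show up.

```shell
#!/bin/bash
# freeze_then_kill is a hypothetical helper illustrating the two-pass idea.
freeze_then_kill() {
    local pid
    # Pass 1: SIGSTOP cannot be caught or ignored, and a stopped process cannot fork.
    for pid in "$@"; do kill -STOP "$pid" 2>/dev/null; done
    # Pass 2: SIGKILL terminates processes even while they are stopped.
    for pid in "$@"; do kill -KILL "$pid" 2>/dev/null; done
    return 0
}

# Demo with three harmless stand-ins for the forking processes.
sleep 1000 & a=$!
sleep 1000 & b=$!
sleep 1000 & c=$!
freeze_then_kill "$a" "$b" "$c"

# Collect the exit status of every child (128 + signal number for a signal death).
statuses=""
for pid in "$a" "$b" "$c"; do
    status=0; wait "$pid" || status=$?
    statuses="$statuses $status"
done
echo "exit statuses:$statuses"
```

Each child reports 137 (128 + 9), which shows that the SIGKILL from the second pass reached a process that was already stopped.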

How to kill child processes in Bash? [duplicate]

I am trying to burn CPU on Linux using openssl speed.
This is my code, taken from Netflix's Simian Army:
#!/bin/bash
# Script for BurnCpu Chaos Monkey
cat << EOF > /tmp/infiniteburn.sh
#!/bin/bash
while true;
do openssl speed;
done
EOF
# 32 parallel 100% CPU tasks should hit even the biggest EC2 instances
for i in {1..32}
do
nohup /bin/bash /tmp/infiniteburn.sh &
done
This executes properly, but the issue is that I cannot kill all 32 processes. I tried everything:
pkill -f pid/process name
killall -9 pid/process name
etc.
The only way I managed to kill the processes was by user:
pkill -u username
How can I kill these processes without using the username?
Any help is greatly appreciated.
Finally, I found a solution to my own question:
kill -- -$(ps -o pgid= $PID | grep -o '[0-9]*')
where PID is the process ID of any one of the running processes. This works fine, but I am open to hearing about other options.
source: http://fibrevillage.com/sysadmin/237-ways-to-kill-parent-and-child-processes-in-one-command
Killing a process doesn't automatically kill its children. Killing your bash script won't kill the openssl speed processes.
You can either cast a wider net with your kill call, which is what you're doing with pkill -u, or use trap in your script and add an exit handler.
cleanup() {
    # kill every background job this script started
    kill $(jobs -p) 2>/dev/null
}
trap cleanup EXIT
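A fuller sketch of the trap approach, under assumptions: a sleep stands in for the real child, and pidfile is a hypothetical temporary file used only so the demo can report the child's PID after the subshell has exited. The subshell plays the role of the script.

```shell
#!/bin/bash
pidfile=$(mktemp)                 # hypothetical file used only to report the child PID

(
    cleanup() {
        kill "$worker" 2>/dev/null           # stop the background child
        wait "$worker" 2>/dev/null || true   # reap it; its signal status is not an error here
    }
    trap cleanup EXIT
    sleep 1000 &
    worker=$!
    echo "$worker" > "$pidfile"
    exit 0                        # leaving the subshell fires the EXIT trap
)

# Back in the parent: check whether the child survived the subshell.
worker=$(cat "$pidfile")
if kill -0 "$worker" 2>/dev/null; then result=alive; else result=gone; fi
echo "worker $worker is $result"
rm -f "$pidfile"
```

The handler waits for the child after killing it, so nothing is left behind as a zombie when the script exits.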
I had a similar problem, where I needed to kill a Node.js server after some amount of time.
To do this, I enabled job control and killed the background processes by process group ID, using jobs:
set -m
./node_modules/.bin/node src/index.js &
sleep 2
kill -- -$(jobs -p)
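The same job-control approach can be tried without a Node.js server. In this sketch, which is an illustration only, a subshell with two sleep children stands in for the server and its workers, and the variable name leader is an assumption of the example.

```shell
#!/bin/bash
set -m                      # enable job control: each background job gets its own process group

# Stand-in for a server that starts children of its own.
( sleep 1000 & sleep 1000 & wait ) &
leader=$!                   # with job control on, this PID is also the job's process group ID

kill -TERM -- "-$leader"    # a negative PID signals every process in that group
status=0; wait "$leader" || status=$?
echo "job exited with status $status"
```

The job leader reports 143 (128 + 15). Because the signal goes to the whole group, any sleep children already started receive it too.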

Killing process started by bash script but not script itself

So basically I have one script that keeps a server alive. It starts the server process and starts it again after the process stops. Sometimes, however, the server becomes unresponsive. For that I want another script which pings the server and kills the process if it doesn't respond within 60 seconds.
The problem is that if I kill the server process, the bash script also gets terminated.
The start script is just while do: sh Server.sh. It calls another shell script that has additional parameters for starting the server. The server uses Java, so it starts a java process. If the server hangs I use kill -9 pid, because nothing else stops it. If the server doesn't hang and does its usual restart, it stops gracefully and the bash script starts the next loop iteration.
Doing The Right Thing
Use a real process supervision system -- your Linux distribution almost certainly includes one.
Directly monitoring the supervised process by PID
An awful, ugly, moderately buggy approach (for instance, able to kill the wrong process in the event of a PID collision) is the following:
while :; do
./Server.sh & server_pid=$!
echo "$server_pid" > server.pid
wait "$server_pid"
done
...and, to kill the process:
#!/bin/bash
# ^^^^ - DO NOT run this with "sh scriptname"; it must be "bash scriptname".
server_pid="$(<server.pid)"; [[ $server_pid ]] || exit
# request a clean shutdown, then allow 5 seconds for it -- adjust to taste
kill "$server_pid" 2>/dev/null
for (( i=0; i<5; i++ )); do
if kill -0 "$server_pid"; then
sleep 1
else
exit 0 # server exited gracefully, nothing else to do
fi
done
# escalate to a SIGKILL
kill -9 "$server_pid"
Note that we're storing the PID of the server in our pidfile, and killing that directly -- thus, avoiding inadvertently targeting the supervision script.
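The "ask nicely, then escalate" pattern can be wrapped in a function. This is a sketch under assumptions: stop_gracefully is a hypothetical helper, the polling interval and retry count are arbitrary, and the demo child is a sleep that ignores SIGTERM so that the escalation path is exercised.

```shell
#!/bin/bash
# stop_gracefully is a hypothetical helper: SIGTERM first, SIGKILL after a deadline.
stop_gracefully() {
    local pid=$1 tries=$2
    kill -TERM "$pid" 2>/dev/null || return 0      # process is already gone
    while [ "$tries" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0     # it exited on its own
        sleep 0.1
        tries=$((tries - 1))
    done
    kill -KILL "$pid" 2>/dev/null                  # deadline passed: escalate
    return 0
}

# Demo: a child that ignores SIGTERM, so escalation is required.
trap '' TERM          # ignored signals are inherited by child processes
sleep 1000 &
child=$!
trap - TERM           # restore default SIGTERM handling in this shell

stop_gracefully "$child" 5
status=0; wait "$child" || status=$?
echo "child exit status: $status"
```

The child ends with status 137, meaning it was killed by SIGKILL after ignoring the SIGTERM.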
Monitoring the supervised process and all children via lockfile
Note that this uses some Linux-specific tools -- but your question is tagged linux.
A more robust approach -- which works across reboots and is not affected by PID reuse -- is to use a lockfile:
while :; do
flock -x Server.lock sh Server.sh
done
...and, on the other end:
#!/bin/bash
# ask all programs having a handle on Server.lock to exit
# (fuser -k sends SIGKILL unless another signal is given)
fuser -k -TERM Server.lock
for ((i=0; i<5; i++)); do
if fuser -s Server.lock; then
sleep 1
else
exit 0
fi
done
fuser -k -KILL Server.lock

nohup node service using cron job on CentOS 7 [duplicate]

I have a python script that'll be checking a queue and performing an action on each item:
# checkqueue.py
while True:
    check_queue()
    do_something()
How do I write a bash script that will check if it's running, and if not, start it. Roughly the following pseudo code (or maybe it should do something like ps | grep?):
# keepalivescript.sh
if processidfile exists:
    if processid is running:
        exit, all ok
run checkqueue.py
write processid to processidfile
I'll call that from a crontab:
# crontab
*/5 * * * * /path/to/keepalivescript.sh
Avoid PID-files, crons, or anything else that tries to evaluate processes that aren't their children.
There is a very good reason why in UNIX, you can ONLY wait on your children. Any method (ps parsing, pgrep, storing a PID, ...) that tries to work around that is flawed and has gaping holes in it. Just say no.
Instead you need the process that monitors your process to be the process' parent. What does this mean? It means only the process that starts your process can reliably wait for it to end. In bash, this is absolutely trivial.
until myserver; do
echo "Server 'myserver' crashed with exit code $?. Respawning.." >&2
sleep 1
done
The above piece of bash code runs myserver in an until loop. The first line starts myserver and waits for it to end. When it ends, until checks its exit status. If the exit status is 0, it means it ended gracefully (which means you asked it to shut down somehow, and it did so successfully). In that case we don't want to restart it (we just asked it to shut down!). If the exit status is not 0, until will run the loop body, which emits an error message on STDERR and restarts the loop (back to line 1) after 1 second.
Why do we wait a second? Because if something's wrong with the startup sequence of myserver and it crashes immediately, you'll have a very intensive loop of constant restarting and crashing on your hands. The sleep 1 takes away the strain from that.
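The behaviour of the until loop can be tried without a real server. In this sketch, flaky_server is a hypothetical stand-in for myserver that fails on its first two runs and exits cleanly on the third, and the pause is shortened to keep the demo quick.

```shell
#!/bin/bash
# Hypothetical stand-in for 'myserver': fails twice, then exits cleanly.
attempts=0
flaky_server() {
    attempts=$((attempts + 1))
    [ "$attempts" -ge 3 ]      # exit status 0 only from the third run onward
}

until flaky_server; do
    echo "Server crashed with exit code $?. Respawning.." >&2
    sleep 0.1                  # short pause so a crash loop cannot spin the CPU
done
echo "clean exit after $attempts attempts"
```

The loop body runs twice, once per failed run, and the loop ends as soon as the stand-in returns 0.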
Now all you need to do is start this bash script (asynchronously, probably), and it will monitor myserver and restart it as necessary. If you want to start the monitor on boot (making the server "survive" reboots), you can schedule it in your user's cron(1) with an @reboot rule. Open your cron rules with crontab:
crontab -e
Then add a rule to start your monitor script:
@reboot /usr/local/bin/myservermonitor
Alternatively, look at inittab(5) and /etc/inittab. You can add a line in there to have myserver start at a certain init level and be respawned automatically.
Edit.
Let me add some information on why not to use PID files. While they are very popular; they are also very flawed and there's no reason why you wouldn't just do it the correct way.
Consider this:
1. PID recycling (killing the wrong process):
   - /etc/init.d/foo start: start foo, write foo's PID to /var/run/foo.pid
   - A while later: foo dies somehow.
   - A while later: any random process that starts (call it bar) takes a random PID; imagine it taking foo's old PID.
   - You notice foo's gone: /etc/init.d/foo restart reads /var/run/foo.pid, checks to see if it's still alive, finds bar, thinks it's foo, kills it, starts a new foo.
2. PID files go stale. You need over-complicated (or should I say, non-trivial) logic to check whether the PID file is stale, and any such logic is again vulnerable to 1.
3. What if you don't even have write access or are in a read-only environment?
4. It's pointless overcomplication; see how simple my example above is. No need to complicate that, at all.
See also: Are PID-files still flawed when doing it 'right'?
By the way; even worse than PID files is parsing ps! Don't ever do this.
ps is very unportable. While you find it on almost every UNIX system; its arguments vary greatly if you want non-standard output. And standard output is ONLY for human consumption, not for scripted parsing!
Parsing ps leads to a LOT of false positives. Take the ps aux | grep PID example, and now imagine someone starting a process with a number somewhere as an argument that happens to be the same as the PID you started your daemon with! Imagine two people starting an X session and you grepping for X to kill yours. It's just all kinds of bad.
If you don't want to manage the process yourself; there are some perfectly good systems out there that will act as monitor for your processes. Look into runit, for example.
Have a look at monit (http://mmonit.com/monit/). It handles start, stop and restart of your script and can do health checks plus restarts if necessary.
Or do a simple script:
while true
do
/your/script
sleep 1
done
In-line:
while true; do <your-bash-snippet> && break; done
This restarts <your-bash-snippet> continuously for as long as it fails; && break stops the loop once <your-bash-snippet> exits gracefully (return code 0).
To restart <your-bash-snippet> in all cases:
while true; do <your-bash-snippet>; done
e.g. #1
while true; do openconnect x.x.x.x:xxxx && break; done
e.g. #2
while true; do docker logs -f container-name; sleep 2; done
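The easiest way to do it is using flock on a file. In a Python script you'd do
import fcntl, os, sys
lf = open('/tmp/script.lock', 'a')  # append mode: do not wipe a running instance's PID
try:
    fcntl.flock(lf, fcntl.LOCK_EX | fcntl.LOCK_NB)
except OSError:  # flock raises on failure instead of returning non-zero
    sys.exit('other instance already running')
lf.truncate(0)
lf.write('%d\n' % os.getpid())
lf.flush()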
In shell you can actually test if it's running:
if flock -xn /tmp/script.lock -c true; then
    echo "it's not running"
    # restart it here
else
    echo -n "it's already running with PID "
    cat /tmp/script.lock
fi
But of course you don't have to test, because if it's already running and you restart it, it'll exit with 'other instance already running'
When a process dies, all its file descriptors are closed and all its locks are automatically released.
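The release-on-close behaviour can be observed from the shell alone. This sketch assumes the flock(1) utility from util-linux is installed; the temporary lock file and the use of file descriptor 9 are arbitrary choices for the demo.

```shell
#!/bin/bash
lock=$(mktemp)                       # hypothetical lock file path for the demo

exec 9>"$lock"                       # open the lock file on file descriptor 9
flock -n 9 || { echo "other instance already running" >&2; exit 1; }

# While fd 9 holds the lock, a second non-blocking attempt is refused.
if flock -n "$lock" -c true; then second_try=acquired; else second_try=refused; fi

exec 9>&-                            # closing the descriptor releases the lock
if flock -n "$lock" -c true; then third_try=acquired; else third_try=refused; fi

echo "while held: $second_try, after release: $third_try"
rm -f "$lock"
```

The lock is tied to the open descriptor, so nothing has to be cleaned up by hand: once the holder closes the file or exits, the next attempt succeeds.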
You should use monit, a standard unix tool that can monitor different things on the system and react accordingly.
From the docs: http://mmonit.com/monit/documentation/monit.html#pid_testing
check process checkqueue.py with pidfile /var/run/checkqueue.pid
if changed pid then exec "checkqueue_restart.sh"
You can also configure monit to email you when it does do a restart.
if ! test -f "$PIDFILE" || ! kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    restart_process
    # Write PIDFILE
    echo $! > "$PIDFILE"
fi
watch "yourcommand"
It will restart the process if/when it stops (after a 2s delay).
watch -n 0.1 "yourcommand"
To restart it after 0.1s instead of the default 2 seconds
watch -e "yourcommand"
To stop restarts if the program exits with an error.
Advantages:
built-in command
one line
easy to use and remember.
Drawbacks:
Only display the result of the command on the screen once it's finished
I'm not sure how portable it is across operating systems, but you might check if your system contains the 'run-one' command, i.e. "man run-one".
Specifically, this set of commands includes 'run-one-constantly', which seems to be exactly what is needed.
From man page:
run-one-constantly COMMAND [ARGS]
Note: obviously this could be called from within your script, but also it removes the need for having a script at all.
I've used the following script with great success on numerous servers:
pid=`jps -v | grep $INSTALLATION | awk '{print $1}'`
echo $INSTALLATION found at PID $pid
while [ -e /proc/$pid ]; do sleep 0.1; done
notes:
It's looking for a java process, so I can use jps, which is much more consistent across distributions than ps
$INSTALLATION contains enough of the process path that it's totally unambiguous
Use sleep while waiting for the process to die, avoid hogging resources :)
This script is actually used to shut down a running instance of tomcat, which I want to shut down (and wait for) at the command line, so launching it as a child process simply isn't an option for me.
I use this for my npm Process
#!/bin/bash
for (( ; ; ))
do
date +"%T"
echo Start Process
cd /toFolder
sudo process
date +"%T"
echo Crash
sleep 1
done

BASH - why the infinite loop is not infinite and failing to restart the crashed process? [duplicate]


Sleep in a while loop gets its own pid

I have a bash script that does some parallel processing in a loop. I don't want the parallel process to spike the CPU, so I use a sleep command. Here's a simplified version.
(while true;do sleep 99999;done)&
So I execute the above line from a bash prompt and get something like:
[1] 12345
Where [1] is the job number and 12345 is the process ID (pid) of the while loop. I do a kill 12345 and get:
[1]+ Terminated ( while true; do
sleep 99999;
done )
It looks like the entire script was terminated. However, I do a ps aux|grep sleep and find the sleep command is still going strong but with its own pid! I can kill the sleep and everything seems fine. However, if I were to kill the sleep first, the while loop starts a new sleep pid. This is such a surprise to me since the sleep is not parallel to the while loop. The loop itself is a single path of execution.
So I have two questions:
Why did the sleep command get its own process ID?
How do I easily kill the while loop and the sleep?
Sleep gets its own PID because it is a process running and just waiting. Try which sleep to see where it is.
You can use ps -uf to see the process tree on your system. From there you can find the PPID (parent PID) of the sleep, which is the shell running the loop.
Because "sleep" is a separate process, not a built-in function or similar.
You could do the following:
(while true;do sleep 99999;done)&
whilepid=$!
kill -- -$whilepid
The above code kills the process group, because the PID is specified as a negative number (e.g. -123 instead of 123). In addition, it uses the variable $!, which stores the PID of the most recently started background process.
Note:
When you run a process in the background in interactive mode (i.e. from the command line prompt), the shell creates a new process group for it, which is what is happening here. That makes it relatively easy to "kill 'em all", because you just have to kill the whole process group. When the same is done within a script, however, no new group is created: all new processes stay in the script's process group, even if they are run in the background, because job control is disabled by default. To enable job control in a script, put the following at the beginning of the script:
#!/bin/bash
set -m
Have you tried doing kill %1, where 1 is the number you get after launching the command in background?
I did it right now after launching (while true;do sleep 99999;done)& and it correctly terminated it.
"ps --ppid" selects all processes with the specified parent pid, eg:
$ (while true;do sleep 99999;done)&
[1] 12345
$ ppid=12345 ; kill -9 $ppid $(ps --ppid $ppid -o pid --no-heading)
You can kill the process group.
To find the process group of your process run:
ps --no-headers -o "%r" -p 15864
Then kill the process group using:
kill -- -[PGID]
You can do it all in one command. Let's try it out:
$ (while true;do sleep 99999;done)&
[1] 16151
$ kill -- -$(ps --no-headers -o "%r" -p 16151 | tr -d ' ')
[1]+ Terminated ( while true; do
sleep 99999;
done )
To kill the while loop and the sleep using $! you can also use a trap signal handler inside the subshell.
(trap 'kill ${!}; exit' TERM; while true; do sleep 99999 & wait ${!}; done)&
kill -TERM ${!}
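A slightly longer sketch of the same trap technique, written so that it can be run and checked. The names worker, child and ready are assumptions of this example; the marker file only lets the parent wait until the trap is installed before sending the signal.

```shell
#!/bin/bash
dir=$(mktemp -d)
ready="$dir/ready"        # hypothetical marker file: created once the trap is installed

worker() {
    child=
    # On SIGTERM: kill the current sleep (if one was started), then leave cleanly.
    trap '[ -n "$child" ] && kill "$child" 2>/dev/null; exit 0' TERM
    : > "$ready"
    while true; do
        sleep 1000 & child=$!
        wait "$child"     # 'wait' is interruptible, so the trap runs at once
    done
}

worker &
wpid=$!
until [ -e "$ready" ]; do sleep 0.05; done   # wait until the trap is in place
kill -TERM "$wpid"
status=0; wait "$wpid" || status=$?
rm -rf "$dir"
echo "worker exit status: $status"
```

Running sleep in the background and waiting on it is what makes this work. With a foreground sleep, bash would only run the handler after the sleep had finished.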
