How to check if a daemon successfully started as part of a script? - linux

Let's say I have this script:
start daemon1 &
start daemon2 &
echo "Running..."
daemon2 should be started only if daemon1 started successfully.
If daemon1 did not start successfully, the script must be aborted.
"Running..." should be displayed only if daemon2 started successfully.
If daemon2 did not start successfully, the script must be aborted.
How can I do this in a shell script?

You can check the PID of the started process to see if it is running
start daemon1 &
P=$!
if kill -0 $P > /dev/null 2>&1 ; then
start daemon2 &
P=$!
if kill -0 $P > /dev/null 2>&1 ; then
echo "Running..."
fi
fi
Untested code. Comment if something is not right

I propose capturing the daemon's PID (process ID) and then checking whether that PID still exists after a short delay, in case daemon1 takes a while to start and then crashes. Here is one way of achieving that (on Linux; I'm ignoring the 'start' in your commands, since I'm not familiar with the Windows command-line environment):
start daemon1 &
pid1=$!
sleep 3 # give daemon1 some time to get going
if kill -0 $pid1 2>/dev/null
then
    start daemon2 &
    pid2=$!
    sleep 3 # give daemon2 some time to get going
    if kill -0 $pid2 2>/dev/null
    then
        echo "Running..."
    fi
fi
The necessary ingredients for this recipe are:
$! returns the child's pid (of the last background process run)
kill -0 <pid> is a way of determining if a pid is valid (in the process table)
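The two ingredients above can be combined into a small helper. This is only a sketch: started_ok is a hypothetical name, and sleep and false stand in for a daemon that stays up and one that dies at once. It assumes bash, which reaps finished background children on its own so that kill -0 fails for them.

```shell
#!/bin/bash
# started_ok is a hypothetical helper: it waits a grace period (seconds),
# then reports whether the PID is still in the process table.
started_ok() {
    local pid=$1 grace=${2:-3}
    sleep "$grace"
    kill -0 "$pid" 2>/dev/null
}

sleep 5 &     # stand-in for a daemon that keeps running
good=$!
false &       # stand-in for a daemon that exits right away
bad=$!

if started_ok "$good" 1; then good_result=running; else good_result=dead; fi
if started_ok "$bad" 0; then bad_result=running; else bad_result=dead; fi
echo "daemon1: $good_result, daemon2: $bad_result"

kill "$good" 2>/dev/null   # clean up the stand-in
```

In a real script the else branches would call exit 1 to abort, as the question requires. Note that a live PID only shows the process exists, not that the daemon finished initializing.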

Related

How to detect PID

I am trying to detect the PID in my shell script.
#!/bin/bash
npm run serve-prod &
pid=$(pgrep serve-prod)
echo $pid
Then a while loop:
echo "waiting for webserver"
while ! nc -z localhost 9000; do
sleep 1 # wait 1 second before checking again
done
Then I call a script:
npm run print-map
And finally I kill the process:
pkill serve-prod
The problem is that the process is still running.
Try this to detect the PID:
#!/bin/bash
pid=0
npm run serve-prod &
pid=$!
echo $pid
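Once the PID is saved from $!, it can be used to stop the server instead of pkill. The following is a minimal sketch, with sleep 30 standing in for npm run serve-prod. One assumption to keep in mind: npm may start the real server as a child process of its own, in which case signalling only the saved PID might not be enough.

```shell
#!/bin/bash
sleep 30 &                  # stand-in for: npm run serve-prod &
server_pid=$!               # PID of the background job just started

# ... wait for the port and run the dependent step here ...

kill "$server_pid"          # send SIGTERM to the saved PID
wait "$server_pid" 2>/dev/null
server_status=$?            # 128 + signal number when killed by a signal
echo "server exited with status $server_status"
```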

Bash: Killing all processes in subprocess

In bash I can get the process ID (pid) of the last subprocess through the $! variable. I can then kill this subprocess before it finishes:
(sleep 5) & pid=$!
kill -9 $pid
This works as advertised. If I now extend the subprocess with more commands after the sleep, the sleep command continues after the subprocess is killed, even though the other commands never get executed.
As an example, consider the following, which spins up a subprocess and monitors its assassination using ps:
# Start subprocess and get its pid
(sleep 5; echo done) & pid=$!
# grep for subprocess
echo "grep before kill:"
ps aux | grep "$pid\|sleep 5"
# Kill the subprocess
echo
echo "Killing process $pid"
kill -9 $pid
# grep for subprocess
echo
echo "grep after kill:"
ps aux | grep "$pid\|sleep 5"
# Wait for sleep to finish
sleep 6
# grep for subprocess
echo
echo "grep after sleep is finished:"
ps aux | grep "$pid\|sleep 5"
If I save this to a file named filename and run it, I get this printout:
grep before kill:
username 7464 <...> bash filename
username 7466 <...> sleep 5
username 7467 <...> grep 7464\|sleep 5
Killing process 7464
grep after kill:
username 7466 <...> sleep 5
username 7469 <...> grep 7464\|sleep 5
grep after sleep is finished:
username 7472 <...> grep 7464\|sleep 5
where unimportant information from the ps command is replaced with <...>. It looks like the kill has killed the overall bash execution of filename, while leaving sleep running.
How can I correctly kill the entire subprocess?
You can set a trap in the subshell to kill any active jobs before exiting:
(trap 'kill $(jobs -p)' EXIT; sleep 5; echo done ) & pid=$!
I don't know exactly why that sleep process gets orphaned. In any case, instead of kill you can use pkill with the -P flag, which signals the children of the given process:
pkill -TERM -P $pid
EDIT:
pkill -P signals only the children, not $pid itself, so in order to kill a process and all its children you should use this instead:
CPIDS=`pgrep -P $pid` # gets pids of child processes
kill -9 $pid
for cpid in $CPIDS ; do kill -9 $cpid ; done
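Here is the same idea as a self-contained sketch, using SIGTERM rather than SIGKILL and assuming pgrep is installed. The parent is signalled first so that it cannot move on to its next command once its child dies.

```shell
#!/bin/bash
( sleep 30; echo done ) &       # subshell with a child of its own
pid=$!
sleep 1                         # give the subshell time to fork its sleep

children=$(pgrep -P "$pid")     # PIDs whose parent is $pid
kill "$pid" 2>/dev/null         # parent first, so "echo done" never runs
kill $children 2>/dev/null      # then the children (unquoted: one word per PID)
wait "$pid" 2>/dev/null
status=$?
echo "subshell $pid (children: $children) ended with status $status"
```

This only reaches direct children; grandchildren would need a recursive walk or a process group.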
You can have a look at rkill, which seems to meet your requirements:
http://www.unix.com/man-page/debian/1/rkill/
rkill [-SIG] pid/name...
When invoked as rkill, this utility does not display information about the processes, but
sends them all a signal instead. If not specified on the command line, a terminate
(SIGTERM) signal is sent.

Cleanup after the background process finished its work on Linux

I have a script-launcher (bash) that executes Python scripts in the background, so I can start it and then close terminal/ssh connection, leaving the script working.
It accepts the name of the script to run and optional arguments to pass there. Then it starts the Python script (detached) and creates a file with PID (of the Python script) in the same directory, so I can later reconnect to the server and kill this background process by using the PID from this file.
This PID file is also used to prevent the same script from being started if it's already running (singleton).
The problem is that I can't figure out how to delete this PID file after the Python script finished its work. I need this to be implemented in bash script, no Python solutions (since I want to use it for all cases) or screen tool. This supervisor (that will delete PID file after the script finished work) also should be run in the background (!), so I can do the same thing: close terminal session.
What I've tried so far:
#!/bin/bash
PIDFILE=$1.pid
if [ -f $PIDFILE ]; then
echo "Process is already running, PID: $(< $PIDFILE)"
exit 1
else
nohup python $1 "${@:2}" > /dev/null 2>&1 &
PID=$!
echo $PID > $PIDFILE
# supervisor
nohup sh -c "wait $PID; rm -f $PIDFILE" > /dev/null 2>&1 &
fi
In this example the PID file is deleted immediately, because wait command returns immediately (I think it's because the new process isn't a child of the current one, so wait doesn't work in this case as I expect).
Do you have any thoughts about how it can be implemented?
Basically, I need something to replace this line
nohup sh -c "wait $PID; rm -f $PIDFILE" > /dev/null 2>&1 &
with something that waits until the previous script (the Python one in this case) finishes its work and then deletes the PID file.
UPD: OK, the problem was with wait command, because it can't wait for non-child processes. The working solution is to replace it with while loop:
#!/bin/bash
function cleanup {
while [ -e /proc/$1 ]; do
sleep 1;
done
rm -f $PIDFILE
}
PIDFILE=$1.pid
if [ -f $PIDFILE ]; then
echo "Process is already running, PID: $(< $PIDFILE)"
exit 1
else
python $1 "${@:2}" > /dev/null 2>&1 &
PID=$!
echo $PID > $PIDFILE
cleanup $PID > /dev/null 2>&1 &
disown
fi
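Since wait only works on children of the calling shell, another option is to make the supervisor the parent of the job. The following is a rough sketch of that idea, not the poster's solution: sleep 2 stands in for the Python script and pidfile is a temporary path created for the demo.

```shell
#!/bin/bash
pidfile=$(mktemp)        # hypothetical PID file path for this demo
(
    sleep 2 &            # stand-in for: python "$1" "${@:2}" &
    echo $! > "$pidfile"
    wait $!              # valid here: the job is a child of this subshell
    rm -f "$pidfile"
) > /dev/null 2>&1 &
supervisor=$!

sleep 1
if [ -s "$pidfile" ]; then during=present; else during=missing; fi
wait "$supervisor"
if [ -e "$pidfile" ]; then after=present; else after=missing; fi
echo "while running: $during, after exit: $after"
```

In the launcher itself the final wait would be dropped and the subshell disowned (or started under nohup), so that closing the terminal does not take it down.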
For shell scripts, use traps:
#!/bin/bash
function finish {
wait $PID
rm $PIDFILE > /dev/null 2>&1 &
}
trap finish EXIT
trap "finish; exit 2" SIGINT
PIDFILE=$1.pid
if [ -f $PIDFILE ]; then
echo "Process is already running, PID: $(< $PIDFILE)"
exit 1
else
nohup python $1 "${@:2}" > /dev/null 2>&1 &
PID=$!
echo $PID > $PIDFILE
fi
Traps allow you to catch signals and respond to them. In the code above, the EXIT pseudo-signal (raised whenever the script exits) runs finish, which removes $PIDFILE. On SIGINT (the user pressed Ctrl-C), the script removes $PIDFILE and exits with status 2.
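To see an EXIT trap in isolation, here is a minimal sketch. The subshell stands in for a whole script, and scratch is a temporary file playing the role of the PID file.

```shell
#!/bin/bash
scratch=$(mktemp)                  # stand-in for the PID file
(
    trap 'rm -f "$scratch"' EXIT   # runs whenever this subshell exits
    exit 3                         # simulate the script ending with an error
)
sub_status=$?
if [ -e "$scratch" ]; then leftover=yes; else leftover=no; fi
echo "exit status: $sub_status, file left behind: $leftover"
```

The trap does not change the exit status, so callers still see the original code.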
Directly in Python: if you want to handle it manually, take a look at atexit, which registers cleanup functions to run at normal interpreter termination (they do not run if the process is killed by a signal that Python does not handle):
import atexit
import os
def cleanup():
    os.unlink(pidfile)
atexit.register(cleanup)
Or, to automate pidfile handling, check out the pid package, which prevents simultaneous execution all on its own:
from pid import PidFile
with PidFile():
    do_something()
or better yet
from pid.decorator import pidfile
@pidfile()
def main():
    pass
if __name__ == "__main__":
    main()

Wait for arbitrary process and get its exit code in Linux

Is there a way to wait until a process finishes if I'm not the one who started it?
e.g. if I ran "ps -ef" and pick any PID (assuming I have rights to access process information) - is there a way I can wait until the PID completes and get its exit code?
You could use strace, which tracks signals and system calls. The following command waits until a program is done, then prints its exit code:
$ strace -e none -e exit_group -p $PID # process calls exit(1)
Process 23541 attached - interrupt to quit
exit_group(1) = ?
Process 23541 detached
$ strace -e none -e exit_group -p $PID # ^C at the keyboard
Process 22979 attached - interrupt to quit
--- SIGINT (Interrupt) @ 0 (0) ---
Process 22979 detached
$ strace -e none -e exit_group -p $PID # kill -9 $PID
Process 22983 attached - interrupt to quit
+++ killed by SIGKILL +++
Signals from ^Z, fg and kill -USR1 get printed too. Either way, you'll need to use sed if you want to use the exit code in a shell script.
If that's too much shell code, you can use a program I hacked together in C a while back. It uses ptrace() to catch signals and exit codes of pids. (It has rough edges and may not work in all situations.)
I hope that helps!
is there a way I can wait until the PID completes and get its exit code
Yes, if the process is not being ptraced by somebody else, you can PTRACE_ATTACH to it, and get notified about various events (e.g. signals received), and about its exit.
Beware, this is quite complicated to handle properly.
If you can live without the exit code:
tail --pid=$pid -f /dev/null
If you know the process ID and the process is a child of the current shell, you can use the wait command, which is a bash builtin (it cannot wait for processes that the shell did not start):
wait PID
You can get the PID of the last background command run in bash using $!. Or you can grep for it in the output of ps.
In fact, the wait command is a useful way to run commands in parallel in bash. Here's an example:
# Start the processes in parallel...
./script1.sh 1>/dev/null 2>&1 &
pid1=$!
./script2.sh 1>/dev/null 2>&1 &
pid2=$!
./script3.sh 1>/dev/null 2>&1 &
pid3=$!
./script4.sh 1>/dev/null 2>&1 &
pid4=$!
# Wait for processes to finish...
echo -ne "Commands sent... "
wait $pid1
err1=$?
wait $pid2
err2=$?
wait $pid3
err3=$?
wait $pid4
err4=$?
# Do something useful with the return codes...
if [ $err1 -eq 0 -a $err2 -eq 0 -a $err3 -eq 0 -a $err4 -eq 0 ]
then
echo "pass"
else
echo "fail"
fi

Get exit code of a background process

I have a command CMD called from my main bourne shell script that takes forever.
I want to modify the script as follows:
Run the command CMD in parallel as a background process (CMD &).
In the main script, have a loop to monitor the spawned command every few seconds. The loop also echoes some messages to stdout indicating progress of the script.
Exit the loop when the spawned command terminates.
Capture and report the exit code of the spawned process.
Can someone give me pointers to accomplish this?
1: In bash, $! holds the PID of the last background process that was executed. That will tell you what process to monitor, anyway.
4: wait <n> waits until the process with PID <n> is complete (it will block until the process completes, so you might not want to call this until you are sure the process is done), and then returns the exit code of the completed process.
2, 3: ps or ps | grep " $! " can tell you whether the process is still running. It is up to you how to understand the output and decide how close it is to finishing. (ps | grep isn't idiot-proof. If you have time you can come up with a more robust way to tell whether the process is still running).
Here's a skeleton script:
# simulate a long process that will have an identifiable exit code
(sleep 15 ; /bin/false) &
my_pid=$!
while ps | grep " $my_pid " # might also need | grep -v grep here
do
echo $my_pid is still in the ps output. Must still be running.
sleep 3
done
echo Oh, it looks like the process is done.
wait $my_pid
# The variable $? always holds the exit code of the last command to finish.
# Here it holds the exit code of $my_pid, since wait exits with that code.
my_status=$?
echo The exit status of the process was $my_status
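The ps | grep check in the skeleton above can match unrelated lines. A more robust variant, sketched here under the assumption that the monitored process is a child of the script, polls with kill -0 instead. The subshell stands in for CMD.

```shell
#!/bin/bash
( sleep 2; exit 3 ) &      # stand-in for the long-running CMD
my_pid=$!

checks=0
while kill -0 "$my_pid" 2>/dev/null; do   # succeeds while the PID exists
    checks=$((checks + 1))
    echo "still running (check $checks)"
    sleep 1
done

wait "$my_pid"             # collect the exit code the shell saved for this child
my_status=$?
echo "exit status: $my_status"
```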
This is how I solved it when I had a similar need:
# Some function that takes a long time to process
longprocess() {
# Sleep up to 14 seconds
sleep $((RANDOM % 15))
# Randomly exit with 0 or 1
exit $((RANDOM % 2))
}
pids=""
# Run five concurrent processes
for i in {1..5}; do
( longprocess ) &
# store PID of process
pids+=" $!"
done
# Wait for all processes to finish, will take max 14s
# as it waits in order of launch, not order of finishing
for p in $pids; do
if wait $p; then
echo "Process $p success"
else
echo "Process $p fail"
fi
done
The pid of a backgrounded child process is stored in $!.
You can store all child processes' pids into an array, e.g. PIDS[].
wait [-n] [jobspec or pid …]
Wait until the child process specified by each process ID pid or job specification jobspec exits and return the exit status of the last command waited for. If a job spec is given, all processes in the job are waited for. If no arguments are given, all currently active child processes are waited for, and the return status is zero. If the -n option is supplied, wait waits for any job to terminate and returns its exit status. If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.
Using the wait command, you can wait for all child processes to finish, capture each child's exit status via $?, and store it in STATUS[]. You can then act on each status.
I have tried the following 2 solutions and they run well. solution01 is more concise, while solution02 is a little more complicated.
solution01
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
process=(a.sh b.sh c.sh)
for app in ${process[@]}; do
    ./${app} &
    PIDS+=($!)
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
for pid in ${PIDS[@]}; do
    echo "pid=${pid}"
    wait ${pid}
    STATUS+=($?)
done
# after all processes finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[@]}; do
    if [[ ${st} -ne 0 ]]; then
        echo "$i failed"
    else
        echo "$i finish"
    fi
    ((i+=1))
done
solution02
#!/bin/bash
# start 3 child processes concurrently, and store each pid into array PIDS[].
i=0
process=(a.sh b.sh c.sh)
for app in ${process[@]}; do
    ./${app} &
    pid=$!
    PIDS[$i]=${pid}
    ((i+=1))
done
# wait for all processes to finish, and store each process's exit code into array STATUS[].
i=0
for pid in ${PIDS[@]}; do
    echo "pid=${pid}"
    wait ${pid}
    STATUS[$i]=$?
    ((i+=1))
done
# after all processes finish, check their exit codes in STATUS[].
i=0
for st in ${STATUS[@]}; do
    if [[ ${st} -ne 0 ]]; then
        echo "$i failed"
    else
        echo "$i finish"
    fi
    ((i+=1))
done
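The -n option from the wait description quoted above handles jobs in the order they finish rather than the order they were launched. Here is a small sketch, assuming bash 4.3 or newer; the three subshells are stand-ins for a.sh, b.sh and c.sh.

```shell
#!/bin/bash
# Requires bash 4.3 or newer for wait -n.
( sleep 3; exit 0 ) &
( sleep 1; exit 7 ) &
( sleep 2; exit 0 ) &

failures=0
for n in 1 2 3; do
    # wait -n returns as soon as any one job ends, with that job's status
    wait -n || failures=$((failures + 1))
done
echo "failed jobs: $failures"
```

The trade-off is that wait -n reports a status without saying which PID it belongs to, so the per-PID loops above are still the better fit when that matters.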
As far as I can see, almost all answers use external utilities (mostly ps) to poll the state of the background process. There is a more Unix-like solution: catching the SIGCHLD signal. The signal handler has to check which child process stopped. This can be done with the kill -0 <PID> built-in (universal), by checking for the existence of the /proc/<PID> directory (Linux-specific), or with the jobs built-in (bash-specific; jobs -l also reports the PID, and the third field of its output can be Stopped, Running, Done or Exit).
Here is my example.
The launched process is called loop.sh. It accepts -x or a number as an argument. For -x it exits with exit code 1. For a number it waits num*5 seconds, printing its PID every 5 seconds.
The launcher process is called launch.sh:
#!/bin/bash
handle_chld() {
    local tmp=()
    for((i=0;i<${#pids[@]};++i)); do
        if [ ! -d /proc/${pids[i]} ]; then
            wait ${pids[i]}
            echo "Stopped ${pids[i]}; exit code: $?"
        else tmp+=(${pids[i]})
        fi
    done
    pids=(${tmp[@]})
}
set -o monitor
trap "handle_chld" CHLD
# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)
# Wait until all background processes are stopped
while [ ${#pids[@]} -gt 0 ]; do echo "WAITING FOR: ${pids[@]}"; sleep 2; done
echo STOPPED
For more explanation see: Starting a process from bash script failed
#!/bin/bash
# program to monitor
tail -f /var/log/messages >> /tmp/log&
# background cmd pid
pid=$!
# loop to monitor running background cmd
while :
do
ps ax | grep $pid | grep -v grep
ret=$?
if test "$ret" != "0"
then
echo "Monitored pid ended"
break
fi
sleep 5
done
wait $pid
echo $?
I would change your approach slightly. Rather than checking every few seconds if the command is still alive and reporting a message, have another process that reports every few seconds that the command is still running and then kill that process when the command finishes. For example:
#!/bin/sh
cmd() { sleep 5; exit 24; }
cmd & # Run the long running process
pid=$! # Record the pid
# Spawn a process that continually reports that the command is still running
while echo "$(date): $pid is still running"; do sleep 1; done &
echoer=$!
# Set a trap to kill the reporter when the process finishes
trap 'kill $echoer' 0
# Wait for the process to finish
if wait $pid; then
echo "cmd succeeded"
else
echo "cmd FAILED!! (returned $?)"
fi
Our team had the same need with a remote SSH-executed script which was timing out after 25 minutes of inactivity. Here is a solution with the monitoring loop checking the background process every second, but printing only every 10 minutes to suppress an inactivity timeout.
long_running.sh &
pid=$!
# Wait on a background job completion. Query status every 10 minutes.
declare -i elapsed=0
# `ps -p ${pid}` works on macOS and CentOS. On both OSes `ps ${pid}` works as well.
while ps -p ${pid} >/dev/null; do
sleep 1
if ((++elapsed % 600 == 0)); then
echo "Waiting for the completion of the main script. $((elapsed / 60))m and counting ..."
fi
done
# Return the exit code of the terminated background process. This works in Bash 4.4 despite what Bash docs say:
# "If neither jobspec nor pid specifies an active child process of the shell, the return status is 127."
wait ${pid}
A simple example, similar to the solutions above. This doesn't require monitoring any process output. The next example uses tail to follow output.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'sleep 30; exit 5' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh &
[1] 7454
$ pid=$!
$ wait $pid
[1]+ Exit 5 ./tmp.sh
$ echo $?
5
Use tail to follow process output and quit when the process is complete.
$ echo '#!/bin/bash' > tmp.sh
$ echo 'i=0; while let "$i < 10"; do sleep 5; echo "$i"; let i=$i+1; done; exit 5;' >> tmp.sh
$ chmod +x tmp.sh
$ ./tmp.sh
0
1
2
^C
$ ./tmp.sh > /tmp/tmp.log 2>&1 &
[1] 7673
$ pid=$!
$ tail -f --pid $pid /tmp/tmp.log
0
1
2
3
4
5
6
7
8
9
[1]+ Exit 5 ./tmp.sh > /tmp/tmp.log 2>&1
$ wait $pid
$ echo $?
5
Another solution is to monitor processes via the proc filesystem (safer than the ps/grep combo). When you start a process, it has a corresponding directory /proc/$pid, so the solution could be:
#!/bin/bash
....
doSomething &
pid=$!
while [ -d /proc/$pid ]; do # While the directory exists, the process is running
    doSomethingElse
    ....
done
# When the directory is removed from /proc, the process has ended
wait $pid
exit_status=$?
....
Now you can use the $exit_status variable however you like.
With this method, your script doesn't have to wait for the background process; you only have to monitor a temporary file for the exit status.
FUNCmyCmd() { sleep 3;return 6; };
export retFile=$(mktemp);
FUNCexecAndWait() { FUNCmyCmd;echo $? >$retFile; };
FUNCexecAndWait&
Now your script can do anything else, and you just have to keep monitoring the contents of retFile (it can also contain any other information you want, such as the exit time).
PS: by the way, I wrote this with bash in mind.
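The monitoring side is not shown above, so here is one way it might look. This is a sketch with hypothetical names (my_cmd, ret_file) and a simple polling loop, not necessarily what the poster had in mind.

```shell
#!/bin/bash
my_cmd() { sleep 1; return 6; }        # stand-in for the real command
ret_file=$(mktemp)                     # will hold the exit status

{ my_cmd; echo $? > "$ret_file"; } &   # run in the background, record status

# The main script is free to do other work; here it just polls the file.
while [ ! -s "$ret_file" ]; do
    sleep 1
done
status=$(cat "$ret_file")
rm -f "$ret_file"
echo "background command returned $status"
```

The -s test waits until the file is non-empty, because mktemp creates it empty before the command has finished.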
My solution was to use an anonymous pipe to pass the status to a monitoring loop. No temporary files are used to exchange status, so there is nothing to clean up. If you were uncertain about the number of background jobs, the break condition could be [ -z "$(jobs -p)" ].
#!/bin/bash
exec 3<> <(:)
{ sleep 15 ; echo "sleep/exit $?" >&3 ; } &
while read -u 3 -t 1 -r STAT CODE || STAT="timeout" ; do
echo "stat: ${STAT}; code: ${CODE}"
if [ "${STAT}" = "sleep/exit" ] ; then
break
fi
done
how about ...
# run your stuff
unset PID
for process in one two three four
do
    ( sleep $((RANDOM%20)); echo hello from process $process; exit $((RANDOM%3)); ) 2>&1 &
    PID+=($!)
done
# (optional) report on the status of that stuff as it exits
for pid in "${PID[@]}"
do
    ( wait "$pid"; echo "process $pid completed with exit status $?") &
done
# (optional) while we wait, monitor that stuff
while ps --pid "${PID[*]}" --ppid "${PID[*]}" --format pid,ppid,command,pcpu
do
    sleep 5
done | xargs -i date '+%x %X {}'
# return non-zero if any are non zero
SUCCESS=0
for pid in "${PID[@]}"
do
    # explicit if: ((SUCCESS++)) returns a failure status while SUCCESS is 0
    if wait "$pid"; then SUCCESS=$((SUCCESS+1)); echo "$pid OK"; else echo "$pid returned $?"; fi
done
echo "success for $SUCCESS out of ${#PID[@]} jobs"
exit $(( ${#PID[@]} - SUCCESS ))
This may go beyond your question, but if you're concerned about how long processes run, you may want to check the status of running background processes after an interval of time. It's easy enough to check which child PIDs are still running using pgrep -P $$; however, I came up with the following solution to check the exit status of those PIDs that have already exited:
cmd1() { sleep 5; exit 24; }
cmd2() { sleep 10; exit 0; }
pids=()
cmd1 & pids+=("$!")
cmd2 & pids+=("$!")
lasttimeout=0
for timeout in 2 7 11; do
echo -n "interval-$timeout: "
sleep $((timeout-lasttimeout))
# you can only wait on a pid once
remainingpids=()
for pid in ${pids[*]}; do
if ! ps -p $pid >/dev/null ; then
wait $pid
echo -n "pid-$pid:exited($?); "
else
echo -n "pid-$pid:running; "
remainingpids+=("$pid")
fi
done
pids=( ${remainingpids[*]} )
lasttimeout=$timeout
echo
done
which outputs:
interval-2: pid-28083:running; pid-28084:running;
interval-7: pid-28083:exited(24); pid-28084:running;
interval-11: pid-28084:exited(0);
Note: You could change $pids to a string variable rather than array to simplify things if you like.
