Linux: get the exit code from the kill command

If I send a SIGTERM signal to a process using the kill command, I expect an exit code, but I always get 0 (zero) when running the below command after killing a process:
echo $?
According to the answer in this post, I should get 143 when sending a SIGTERM to a process: Always app Java end with "Exit 143" Ubuntu
But I don't get that exit code. Why?

The exit code you get is for the kill command itself. 0 means it succeeded, i.e. the other process got the signal you sent it. kill just doesn't report the exit status for the process, since it can't even be sure the other process exited as a result of the signal it sent.
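The two statuses can be seen side by side in a shell (a minimal sketch; sleep stands in for the process being killed):

```shell
# Background a long-running process (sleep stands in for any program).
sleep 60 &
pid=$!

kill -TERM "$pid"   # ask the process to terminate
kill_rc=$?          # exit status of kill itself: 0 = signal was delivered

wait "$pid"         # wait reports the terminated process's own status
proc_rc=$?          # 128 + 15 (SIGTERM) = 143 for an uncaught SIGTERM

echo "kill: $kill_rc, process: $proc_rc"
```

So `echo $?` right after `kill` shows kill's own 0; it is `wait` on the killed PID that surfaces the 143.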

Related

The Linux timeout command and exit codes

In a Linux shell script I would like to use the timeout command to end another command if some time limit is reached. In general:
timeout -s SIGTERM 100 command
But I also want my shell script to exit when the command fails for some reason. If the command fails early enough, the time limit will not be reached, and timeout will exit with exit code 0. Thus the error cannot be trapped with trap or set -e; at least I have tried it and it did not work. How can I achieve what I want to do?
Your situation isn't very clear because you haven't included your code in the post.
timeout does exit with the exit code of the command if it finishes before the timeout value.
For example:
timeout 5 ls -l non_existent_file
# outputs: ls: cannot access non_existent_file: No such file or directory
echo $?
# outputs 2 (which is the exit code of ls)
From man timeout:
If the command times out, and --preserve-status is not set, then
exit with status 124. Otherwise, exit with the status of COMMAND. If
no signal is specified, send the TERM signal upon timeout. The TERM
signal kills any process that does not block or catch that signal.
It may be necessary to use the KILL (9) signal, since this signal
cannot be caught, in which case the exit status is 128+9 rather than
124.
See BashFAQ/105 to understand the pitfalls of set -e.
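Both cases from the man page excerpt can be checked quickly (a sketch; the nonexistent path is made up for the demo):

```shell
# Case 1: the command fails before the limit -> its own code is passed through.
timeout 5 ls /nonexistent_path_for_demo 2>/dev/null
early_rc=$?   # ls's own exit code (2 on GNU coreutils)

# Case 2: the command outlives the limit -> timeout kills it and exits 124.
timeout 1 sleep 10
late_rc=$?

echo "early: $early_rc, late: $late_rc"
```

So a failing command is trappable after all: it is only a timeout that produces the special 124.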

ctrl+c to kill a bash script with child processes

I have a script whose internals boil down to:
trap "exit" SIGINT SIGTERM
while :
do
mplayer sound.mp3
sleep 3
done
(yes, it is a bit more meaningful than the above, but that's not relevant to the problem). Several instances of the script may be running at the same time.
Sometimes I want to ^C the script... but that does not succeed. As I understand it, when ^C kills mplayer, the script moves on to sleep, and when ^C kills sleep, it moves on to mplayer, so I never happen to catch it in between; the trap just never fires.
How do I terminate the script?
You can get the PID of mplayer and upon trapping send the kill signal to mplayer's PID.
function clean_up {
    # Perform program exit housekeeping
    kill $MPLAYER_PID
    exit
}
trap clean_up SIGHUP SIGINT SIGTERM
mplayer sound.mp3 &
MPLAYER_PID=$!
wait $MPLAYER_PID
mplayer returns 1 when it is stopped with Ctrl-C so:
mplayer sound.mp3 || break
will do the work.
One issue with this method is that if mplayer exits 1 for another reason (e.g., the sound file has a bad format), the loop will exit anyway, which may not be the desired behaviour.
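One way around that ambiguity is the 128+N convention: a process killed by signal N exits with status 128+N, so statuses above 128 can be treated as "interrupted" and everything else as an ordinary error. A sketch (classify is a hypothetical helper, not part of the original script):

```shell
# Classify an exit status using the 128+N signal convention.
classify() {
    if [ "$1" -gt 128 ]; then
        echo "signal $(( $1 - 128 ))"   # e.g. 130 -> signal 2 (SIGINT)
    elif [ "$1" -eq 0 ]; then
        echo "success"
    else
        echo "error $1"
    fi
}

classify 130   # what an uncaught Ctrl-C typically produces
classify 1     # an ordinary failure, e.g. a bad sound file
```

Note the caveat: this only works for commands that die from the signal. mplayer catches SIGINT itself and exits 1, which is exactly why the answer above tests for 1 instead.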

Starting a process from bash script failed

I have a central server where I periodically start a script (from cron) which checks remote servers. The check is performed serially, so first one server, then another, and so on.
This script (on the central server) starts another script (let's call it update.sh) on the remote machine, and that script (on the remote machine) does something like this:
processID=`pgrep "processName"`
kill $processID
startProcess.sh
The process is killed and then started in the script startProcess.sh like this:
pidof "processName"
if [ ! $? -eq 0 ]; then
    nohup "processName" "processArgs" >> "processLog" &
    pidof "processName"
    if [ ! $? -eq 0 ]; then
        echo "Error: failed to start process"
        ...
update.sh, startProcess.sh and the actual binary of the process that it starts are on an NFS share mounted from the central server.
Now what happens sometimes is that the process that I try to start within startProcess.sh is not started and I get the error. The strange part is that it is random: sometimes the process on one machine starts, and another time on that same machine it doesn't. I'm checking about 300 servers and the errors are always random.
There is another thing: the remote servers are at 3 different geo locations (2 in America and 1 in Europe), while the central server is in Europe. From what I have discovered so far, the servers in America have many more errors than those in Europe.
First I thought that the error had something to do with kill, so I added a sleep between the kill and startProcess.sh, but that didn't make any difference.
Also, it seems that the process from startProcess.sh is not started at all, or something happens to it right as it is being started, because there is no output in the logfile, and there should be.
So here I'm asking for help. Has anybody had this kind of problem, or does anyone know what might be wrong?
Thanks for any help
(Sorry, but my original answer was fairly wrong... Here is the correction)
Using $? to get the exit status of the background process in startProcess.sh leads to the wrong result. man bash states:
Special Parameters
? Expands to the status of the most recently executed foreground
pipeline.
As you mentioned in your comment, the proper way of getting the background process's exit status is using the wait builtin. But for this, bash has to process the SIGCHLD signal.
I made a small test environment for this to show how it can work:
Here is a script loop.sh to run as a background process:
#!/bin/bash
[ "$1" == -x ] && exit 1;
cnt=${1:-500}
while ((++c<=cnt)); do echo "SLEEPING [$$]: $c/$cnt"; sleep 5; done
If the arg is -x, it exits with status 1 to simulate an error. If the arg is a number, it waits num*5 seconds, printing SLEEPING [<PID>]: <counter>/<max_counter> to stdout.
The second is the launcher script. It starts 3 loop.sh scripts in the background and prints their exit status:
#!/bin/bash
handle_chld() {
    local tmp=()
    for i in ${!pids[@]}; do
        if [ ! -d /proc/${pids[i]} ]; then
            wait ${pids[i]}
            echo "Stopped ${pids[i]}; exit code: $?"
            unset pids[i]
        fi
    done
}
set -o monitor
trap "handle_chld" CHLD
# Start background processes
./loop.sh 3 &
pids+=($!)
./loop.sh 2 &
pids+=($!)
./loop.sh -x &
pids+=($!)
# Wait until all background processes are stopped
while [ ${#pids[@]} -gt 0 ]; do echo "WAITING FOR: ${pids[@]}"; sleep 2; done
echo STOPPED
The handle_chld function will handle the SIGCHLD signals. Setting the monitor option enables a non-interactive script to receive SIGCHLD. Then the trap is set for the SIGCHLD signal.
Then the background processes are started. All of their PIDs are remembered in the pids array. When SIGCHLD is received, the /proc/ directories are checked to see which child process stopped (the missing one); this could also be checked with the kill -0 <PID> bash builtin. After wait, the exit status of the background process is available in the famous $? pseudo-variable.
The main script waits for all PIDs to stop (otherwise it could not get the exit status of its children) and then it stops itself.
An example output:
WAITING FOR: 13102 13103 13104
SLEEPING [13103]: 1/2
SLEEPING [13102]: 1/3
Stopped 13104; exit code: 1
WAITING FOR: 13102 13103
WAITING FOR: 13102 13103
SLEEPING [13103]: 2/2
SLEEPING [13102]: 2/3
WAITING FOR: 13102 13103
WAITING FOR: 13102 13103
SLEEPING [13102]: 3/3
Stopped 13103; exit code: 0
WAITING FOR: 13102
WAITING FOR: 13102
WAITING FOR: 13102
Stopped 13102; exit code: 0
STOPPED
It can be seen that the exit codes are reported correctly.
I hope this can help a bit!
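As an aside, the kill -0 <PID> probe mentioned above sends no signal at all: it only tests whether the PID can currently be signalled, i.e. whether the process still exists. A quick sketch:

```shell
# kill -0 probes a PID without delivering a real signal.
sleep 2 &
pid=$!

kill -0 "$pid" 2>/dev/null && alive=yes || alive=no   # still running

wait "$pid"   # let it finish and be reaped

kill -0 "$pid" 2>/dev/null && gone=no || gone=yes     # PID no longer exists

echo "alive=$alive gone=$gone"
```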

Signal handling in a shell script

Following is a shell script (myscript.sh) I have:
#!/bin/bash
sleep 500 &
Aprogram arg1 arg2 # Aprogram is a program which runs for an hour.
echo "done"
I launched this in one terminal, and from another terminal I issued 'kill -INT 12345'. 12345 is the pid of myscript.sh.
After a while I can see that both myscript.sh and Aprogram have been dead. However 'sleep 500 &' is still running.
Can anyone explain why is this behavior?
Also, when I issued the SIGINT signal to myscript.sh, what exactly happened? Why is 'Aprogram' getting killed and not 'sleep'? How is the INT signal transmitted to its child processes?
You need to use trap to catch signals:
To just ignore SIGINT use:
trap '' 2
If you want to specify some special action for this, you can do it inline:
trap 'some commands here' 2
or, better, wrap it in a function:
function do_for_sigint() {
...
}
trap 'do_for_sigint' 2
and if you wish to allow your script to finish all its tasks first:
keep_running="yes"
trap 'keep_running="no"' 2
while [ "$keep_running" = "yes" ]; do
    # main body of your script here
done
You start sleep in the background. As such, it is not killed when you kill the script.
If you want to kill sleep too when the script is terminated, you'd need to trap it.
sleep 500 &
sid=($!)                  # Capture the PID of sleep
trap 'kill ${sid[@]}' INT # Define handler for SIGINT (single quotes: expand at trap time)
Aprogram arg1 arg2 &      # Aprogram is a program which runs for an hour.
sid+=($!)
wait                      # Keep the script alive so the trap can fire
echo "done"
Now sending SIGINT to your script would cause sleep to terminate as well.
After a while I can see that both myscript.sh and Aprogram have been dead. However 'sleep 500 &' is still running.
As soon as Aprogram finishes, myscript.sh prints "done" and is also finished. sleep 500 gets the process with PID 1 as its parent. That is it.
Can anyone explain why is this behavior?
SIGINT is not delivered to Aprogram when myscript.sh gets it. Use strace to verify that Aprogram does not receive the signal.
Also, when I issued SIGINT signal to the 'myscript.sh' what exactly is happening?
I first thought it was the situation where a user presses Ctrl-C, and read this: http://www.cons.org/cracauer/sigint.html. But it is not exactly the same situation. In your case the shell received SIGINT but the child process didn't. However, the shell had a child process at that moment, did nothing, and kept waiting for the child. This is the strace output on my computer after sending SIGINT to a shell script waiting for a child process:
>strace -p 30484
Process 30484 attached - interrupt to quit
wait4(-1, 0x7fffc0cd9abc, 0, NULL) = ? ERESTARTSYS (To be restarted)
--- SIGINT (Interrupt) @ 0 (0) ---
rt_sigreturn(0x2) = -1 EINTR (Interrupted system call)
wait4(-1,
Why is 'Aprogram' getting killed and why not 'sleep' ? How is the signal INT getting transmitted to it's child processes?
As far as I can see with strace a child program like your Aprogram is not getting killed. It did not receive SIGINT and finished normally. As soon as it finished your shell script also finished.

Detect when process quits or is being killed due out of memory

My bash script runs some program in the background and waits for it to stop with the wait command. But there is a high chance that the background process will be killed because it takes too much memory. I want my script to react differently to a process that ended gracefully and to one that was killed. How do I check this condition?
Make sure your command signals success (with exit code 0) when it succeeds, and failure (non-zero) when it fails.
When a process is killed with SIGKILL by the OOM killer, signaling failure is automatic. (The shell will consider the exit code of signal terminated processes to be 128 + the signal number, so 128+9=137 for SIGKILL).
You then use the fact that wait somepid exits with the same code as the command it waits on in an if statement:
yourcommand &
pid=$!
....
if wait $pid
then
echo "It exited successfully"
else
echo "It exited with failure"
fi
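Since the OOM killer uses SIGKILL, the specific status 137 (128+9) can be singled out. A sketch that simulates the kill (sleep and the variable names are only for the demo; 137 proves SIGKILL, not necessarily the OOM killer):

```shell
# Simulate an OOM kill: SIGKILL cannot be caught, so the process dies with 137.
sleep 60 &
pid=$!
kill -KILL "$pid"
wait "$pid"
rc=$?   # 128 + 9 = 137

if [ "$rc" -eq 137 ]; then
    verdict="killed by SIGKILL (possibly the OOM killer)"
elif [ "$rc" -eq 0 ]; then
    verdict="exited cleanly"
else
    verdict="failed with code $rc"
fi
echo "$verdict"
```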
Usually they shut down with a signal; try to have a signal handler function for unpredictable shutdowns, or, worst case, have another monitoring process, like a task manager.
Did you try anything?
By the way, some signals can't be handled at all, such as SIGKILL (which is what the OOM killer sends).
A simpler solution is to capture the exit code immediately (note that $? is overwritten by the [ ] test itself, so it must be saved first):
yourcommand
rc=$?
if [ $rc -eq 0 ] ; then
    echo "It exited successfully"
else
    echo "It exited with failure, exit code $rc"
fi
