How to know if a thread uses die in Perl - multithreading

I am creating Perl threads in a "master" script that each call a "slave" Perl script through system calls. If this is a bad approach, feel free to enlighten me. Sometimes the slave script being called will fail and die. How can I detect this in the master script so that I can kill the master?
Is there a way to return a message to the master thread indicating that the slave completed properly? I understand it is not good practice to use exit in a thread, though. Please help.
==================================================================================
Edit:
For clarification, I have about 8 threads that each run once. There are dependencies between them, so I have barriers that prevent certain threads from running before the initial threads are complete.
Also the system calls are done with tee, so that might be part of the reason the return value is difficult to get at.
system("((" . $cmd . " 2>&1 1>&3 | tee -a $error_log) 3>&1) > $log; echo done | tee -a $log"

The way you have described your problem, I don't think using threads is the way to go. I would be more inclined to fork. Calling 'system' is going to fork anyway.
use POSIX ":sys_wait_h";

my $childPid = fork();
die "fork failed: $!" unless defined $childPid;

if (!$childPid) {
    # This is executed in the child
    # use exec rather than system, so that the child process is replaced, rather than
    # forking yet another subprocess (or maybe even a shell) to run your child script
    exec("/my/child/script") or die "Failed to run child script: $!";
}

# Code here is executed in the parent process.
# You can find out what happened to the child process by calling wait
# or waitpid. If you want to be able to continue processing in the
# parent process then call waitpid with second argument WNOHANG,
# e.g. inside some event loop, do this
if (waitpid($childPid, WNOHANG)) {
    # $? now contains the exit status of the child process
    warn "Child had a problem: $?" if $?;
}

There is probably a CPAN module that is well suited for what you are trying to do. Maybe Proc::Daemon (Run Perl program(s) as a daemon process).

Related

Concurrency with shell scripts in failure-prone environments

Good morning all,
I am trying to implement concurrency in a very specific environment, and keep getting stuck. Maybe you can help me.
This is the situation:
- I have N nodes that can read/write in a shared folder.
- I want to execute an application in one of them. This can be anything, like a shell script, an installed program, or whatever.
- To do so, I have to send the same command to all of them. The first one should start the execution, and the rest should see that somebody else is running the desired application and exit.
- The execution of the application can be killed at any time. This is important because it means we cannot rely on any cleanup step after the execution.
- If the application gets killed, the user may want to execute it again. He would then send the very same command.
My current approach is to create a shell script that wraps the command to be executed. This could also be implemented in C, but not in Python or other languages, to avoid library dependencies.
#!/bin/sh
# (folder structure simplified for legibility)
mutex(){
    lockdir=".lock"
    firstTask=1 #false
    if mkdir "$lockdir" > /dev/null 2>&1
    then
        controlFile="controlFile"
        #if this is the first node, start coordinator
        if [ ! -f $controlFile ]; then
            firstTask=0 #true
            #tell the rest of nodes that I am in control
            echo "some info" > $controlFile
        fi
        # remove control file when script finishes
        trap 'rm $controlFile' EXIT
    fi
    return $firstTask
}

#The basic idea is that one task executes the desired command, stated as arguments to this script. The rest do nothing.
if ! mutex ;
then
    exit 0
fi

#I am the first node and the only one reaching this point, so I execute whatever was passed in
"$@"
If there are no failures, this wrapper works great. The problem is that if the script is killed before the execution completes, the trap is not executed and the control file is not removed. Then, when we execute the wrapper again to restart the task, it won't work, as every node will think that somebody else is running the application.
A possible solution would be to remove the control file just before the "$@" call, but that would lead to a race condition.
Any suggestion or idea?
Thanks for your help.
edit: edited with correct solution as future reference
Your trap syntax looks wrong. According to POSIX, it should be:
trap [action condition ...]
e.g.:
trap 'rm $controlFile' HUP INT TERM
trap 'rm $controlFile' 1 2 15
Note that $controlFile will not be expanded until the trap is executed if you use single quotes.
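The question's other problem - a SIGKILL runs no trap at all, so the control file can be left behind - is usually handled by making the lock self-describing: record who owns it, and let a later run detect that the owner is gone. A minimal sketch of that idea, assuming the competing processes run on the same host so that kill -0 can test the owner's PID (nodes sharing a folder would need a different liveness test, such as lock age):
#!/bin/sh
# Sketch: mkdir-based lock that records its owner's PID, so a stale lock
# left behind by a killed run can be detected and taken over.
lockdir=".lock"

take_lock() {
    if mkdir "$lockdir" 2>/dev/null; then
        echo "$$" > "$lockdir/pid"                # record the owner
        return 0
    fi
    owner=$(cat "$lockdir/pid" 2>/dev/null)
    if [ -n "$owner" ] && ! kill -0 "$owner" 2>/dev/null; then
        # owner is gone (e.g. killed with SIGKILL): steal the lock
        rm -rf "$lockdir"
        mkdir "$lockdir" 2>/dev/null && echo "$$" > "$lockdir/pid" && return 0
    fi
    return 1
}

take_lock || exit 0                               # somebody else is (apparently) running the task
trap 'rm -rf "$lockdir"' EXIT HUP INT TERM
"$@"                                              # run the wrapped command
There is still a small window between detecting the stale lock and re-taking it, so two waiting nodes could both think they won; closing that window completely needs an atomic test-and-set, which is hard to do with plain shell over a shared folder.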

How to kill shell script without killing currently executed line

I am running a shell script, something like sh script.sh in bash. The script contains many lines, some of which take seconds and others take days to execute. How can I kill the sh command but not kill its command currently running (the current line from the script)?
You haven't specified exactly what should happen when you 'kill' your script, but I'm assuming that you'd like the currently executing line to complete and the script to exit before doing any more work.
This is probably best achieved by coding your script to receive such a kill command and respond in an appropriate way - I don't think there is any magic to do this in Linux.
For example, you could:
- trap a signal and then set a variable, or
- check for the existence of a file (e.g. touch /var/tmp/trigger).
Then after each line in your script, you'd need to check whether the trap had been called (or your trigger file created) - and then exit. If the trigger has not been set, you continue on and do the next piece of work (a sketch of the trigger-file variant follows below).
To the best of my knowledge, you can't trap a SIGKILL (-9) - if someone sends that to your process, then it will die.
HTH, Ace
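A minimal sketch of the trigger-file variant; the /var/tmp/trigger path and the step commands are placeholders, not anything from the question:
#!/bin/bash
# Sketch: finish the current step, then stop if a trigger file has appeared.
trigger=/var/tmp/trigger        # create this file ("touch /var/tmp/trigger") to request a stop

check_stop() {
    if [ -e "$trigger" ]; then
        echo "Stop requested; exiting after the current step." >&2
        rm -f "$trigger"
        exit 0
    fi
}

step() { echo "working on $1..."; sleep 2; }    # stand-in for the real long-running commands

step part1
check_stop
step part2
check_stop
step part3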
The only way I can think of achieving this is for the parent process to trap the kill signal, set a flag, and then repeatedly check for this flag before executing another command in your script.
The subprocesses also need to be immune to the kill signal. However, bash seems to behave differently from ksh in this respect, and the script below seems to work fine.
#!/bin/bash
QUIT=0
trap "QUIT=1;echo 'term'" TERM

function terminated {
    if ((QUIT==1))
    then
        echo "Terminated"
        exit
    fi
}

function subprocess {
    typeset -i N
    while ((N++<3))
    do
        echo $N
        sleep 1
    done
}

while true
do
    subprocess
    terminated
    sleep 3
done
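A usage sketch, assuming the script above is saved as script.sh:
./script.sh &
pid=$!
sleep 5
kill -TERM "$pid"    # the trap prints 'term' once the current command finishes;
                     # the script then exits at the next "terminated" check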
I assume you have your script running for days and you don't just want to kill it without knowing whether one of its children has finished.
Find the pid of your process, using ps.
Then
child=$(pgrep -P "$pid")
while kill -s 0 "$child" 2>/dev/null
do
    sleep 1
done
kill "$pid"

BASH: How monitor a script for execution failure

I'm using Linux and want to watch a script's execution so that it is respawned when it runs into an execution failure. Here is a simple script which should help demonstrate the problem.
Here's my script
#!/bin/bash
echo '**************************************'
echo '* Run IRC Bot *'
echo '**************************************'
echo '';
if [ -z "$1" ]
then
    echo 'Example usage: ' $0 'intelbot'
fi

until `php $1.php`;
do
    echo "IRC bot '$1' crashed with the code $?. Respawning.." >&2;
    sleep 5
done;
What kill option should I use to tell until: hey, I want this process to be killed, and I want you to get it working again!
Edit
The aim here was to manually check for a script-execution failure so the IRC Bot can be re-spawned. The posted answer is very detailed so +1 to the contributor - a supervisor is indeed the best way to tackle this problem.
First -- don't do this at all; use a proper process supervision system to automate restarting your program for you, not a shell script. Your operating system will ship with one, be it SysV init's /etc/inittab (which, yes, will restart programs so listed when they exit if given an appropriate flag), or the more modern upstart (shipped with Ubuntu), systemd (shipped with current Fedora and Arch Linux), runit, daemontools, supervisord, launchd (shipped with MacOS X), etc.
Second: The backticks actually make your code behave in unpredictable ways; so does the lack of quotes on an expansion.
`php $1.php`
...does the following:
Substitutes the value of $1 into a string; let's say it's my * code.php.
String-splits that value; in this case, it would change it into three separate arguments: my, *, and code.php
Glob-expands those arguments; in this case, the * would be replaced with a separate argument for each file in the current directory
Runs the resulting program
Reads the output that program wrote to stdout, and runs that output as a separate command
Returns the exit status of that separate command.
Instead:
until php "$1.php"; do
    echo "IRC bot '$1' crashed with the code $?. Respawning.." >&2
    sleep 5
done
Now, the exit status returned by PHP when it receives a SIGTERM is something that can be controlled by PHP's signal handler -- unless you tell us how your PHP code is written, only signals which can't be handled (such as SIGKILL) will behave in a manner that's entirely consistent, and because they can't be handled, they're dangerous if your program needs to do any kind of safe shutdown or cleanup.
If you want your PHP code to install a signal handler, so you can control its exit status when signaled, see http://php.net/manual/en/function.pcntl-signal.php
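If, on top of respawning after crashes, you also want a way to stop the loop for good, one option (an assumption about the desired behaviour, not something stated in the question) is to key off the exit status: the shell reports 128+N when a command dies from signal N, so an untrapped SIGTERM shows up as 143. A sketch:
#!/bin/bash
# Sketch: respawn the bot on failure, but stop if it exited cleanly (0)
# or was deliberately killed with SIGTERM (143 = 128 + 15).
while true; do
    php "$1.php"
    status=$?
    if [ "$status" -eq 0 ] || [ "$status" -eq 143 ]; then
        echo "Bot exited (status $status); not respawning." >&2
        break
    fi
    echo "IRC bot '$1' crashed with status $status. Respawning.." >&2
    sleep 5
done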

Bash: Why does parent script not terminate on SIGINT when child script traps SIGINT?

script1.sh:
#!/bin/bash
./script2.sh
echo after-script
script2.sh:
#!/bin/bash
function handler {
exit 130
}
trap handler SIGINT
while true; do true; done
When I start script1.sh from a terminal, and then use Ctrl+C to send SIGINT to its process group, the signal is trapped by script2.sh and when script2.sh terminates, script1.sh prints "after-script". However, I would have expected script1.sh to immediately terminate after the line that invokes script2.sh. Why is this not the case in this example?
Additional remarks (edit):
As script1.sh and script2.sh are in the same process group, SIGINT gets sent to both scripts when Ctrl+C is pressed on the command line. That's why I wouldn't expect script1.sh to continue when script2.sh exits.
When the line "trap handler SIGINT" in script2.sh is commented out, script1.sh does exit immediately after script2.sh exits. I want to know why it behaves differently then, since script2.sh produces the same exit code (130) in both cases.
New answer:
This question is far more interesting than I originally suspected. The answer is essentially given here:
What happens to a SIGINT (^C) when sent to a perl script containing children?
Here's the relevant tidbit. I realize you're not using Perl, but I assume Bash is using C's convention.
Perl’s builtin system function works just like the C system(3)
function from the standard C library as far as signals are concerned.
If you are using Perl’s version of system() or pipe open or backticks,
then the parent — the one calling system rather than the one called by
it — will IGNORE any SIGINT and SIGQUIT while the children are
running.
This explanation is the best I've seen about the various choices that can be made. It also says that Bash takes the WCE approach. That is, when a parent process receives SIGINT, it waits until its child process returns. If that process exited due to SIGINT, the parent also exits with SIGINT. If the child exited any other way, it ignores the SIGINT.
There is also a way that the calling shell can tell whether the called
program exited on SIGINT and if it ignored SIGINT (or used it for
other purposes). As in the WUE way, the shell waits for the child to
complete. It figures whether the program was ended on SIGINT and if
so, it discontinue the script. If the program did any other exit, the
script will be continued. I will call the way of doing things the
"WCE" (for "wait and cooperative exit") for the rest of this document.
I can't find a reference to this in the Bash man page, but I'll keep looking in the info docs. But I'm 99% confident this is the correct answer.
Old answer:
A nonzero exit status from a command in a Bash script does not terminate the program. If you do an echo $? after ./script2.sh it will show 130. You can terminate the script by using set -e as phs suggests.
$ help set
...
-e Exit immediately if a command exits with a non-zero status.
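A sketch of script1.sh with that suggestion applied (just the question's script plus set -e):
#!/bin/bash
set -e            # exit immediately if a command exits with a non-zero status
./script2.sh      # returns 130 after Ctrl+C, so set -e stops script1.sh here
echo after-script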
The second part of @seanmcl's updated answer is correct and the link to http://www.cons.org/cracauer/sigint.html is a really good one to read through carefully.
From that link, "You cannot 'fake' the proper exit status by an exit(3) with a special numeric value, even if you look up the numeric value for your system". In fact, that's what is being attempted in @Hermann Speiche's script2.sh.
One answer is to modify function handler in script2.sh as follows:
function handler {
    # ... do stuff ...
    trap INT       # reset the INT trap to the default
    kill -2 $$     # resend SIGINT to ourselves so the process really dies "by SIGINT"
}
This effectively removes the signal handler and "rethrows" the SIGINT, causing the bash process to exit with the appropriate flags such that its parent bash process then correctly handles the SIGINT that was originally sent to it. This way, using set -e or any other hack is not actually required.
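A quick way to check the behaviour, assuming the handler above has been added to script2.sh:
./script1.sh
# press Ctrl+C while script2.sh is looping:
# - without the rethrow, script2.sh exits 130 but "after-script" is still printed
# - with the rethrow, script1.sh stops too and never prints "after-script"
echo $?    # typically 130 (128 + SIGINT) in the rethrow case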
It's also worth noting that if you have an executable that behaves incorrectly when sent a SIGINT (it doesn't conform to "How to be a proper program" in the above link, e.g. it exits with a normal return-code), one way of working around this is to wrap the call to that process with a script like the following:
#!/bin/bash
function handler {
    trap INT
    kill -2 $$
}
trap handler INT
badprocess "$@"
The reason your script1.sh doesn't terminate is that script2.sh is running in a subshell. To make the former script exit, you can either set -e as suggested by phs and seanmcl or force script2.sh to run in the same shell by saying:
. ./script2.sh
in your first script. What you're observing would be apparent if you were to do set -x before executing your script. help set says:
-x Print commands and their arguments as they are executed.
You can also have your second script send a terminating signal to its parent script, using SIGHUP or another safe, usable signal such as SIGQUIT, which the parent script can consider or trap as well (sending SIGINT doesn't work).
script1.sh:
#!/bin/bash
trap 'exit 0' SIGQUIT ## We could also just accept SIGHUP if we like, without traps, but that sends a message to the screen.
./script2.sh ## or "bash script2.sh", or "( . ./script2.sh; )" which would run it in another process
echo after-script
script2.sh:
#!/bin/bash
SLEEPPID=''
PID=$BASHPID
read PPID_ < <(exec ps -p "$PID" -o ppid=)

function handler {
    [[ -n $SLEEPPID ]] && kill -s SIGTERM "$SLEEPPID" &>/dev/null
    kill -s SIGQUIT "$PPID_"
    exit 130
}

trap handler SIGINT

# better do some sleeping:
for (( ;; )); do
    [[ -n $SLEEPPID ]] && kill -s 0 "$SLEEPPID" &>/dev/null || {
        sleep 20 &
        SLEEPPID=$!
    }
    wait
done
The last line of your script1.sh could also just be like one of the following, depending on your script's intended implementation.
./script2.sh || exit
...
Or
./script2.sh
[[ $? -eq 130 ]] && exit
...
The correct way this should work is through setpgrp(). All children of the shell should be placed in the same pgrp. When SIGINT is signaled by the tty driver, it will be summarily delivered to all of those processes. The shell at any level should note the receipt of the signal, wait for its children to exit, and then kill itself with SIGINT again, with no signal handler installed, so that its exit code is correct.
Additionally, when SIGINT is set to ignore at startup by the parent process, the children should ignore SIGINT.
A shell should not "check if a child exited with SIGINT" as any part of the logic. The shell should always just honor the signal it receives directly as the reason to act, and then exit.
Back in the day of real UNIX, SIGINT stopped the shell and all subprocesses with a single keystroke. There was never any problem with the shell exiting and child processes continuing to run, unless they themselves had set SIGINT to ignore.
For any shell pipeline, there should be a child-process relationship created from the pipeline going right to left. The rightmost command is the immediate child of the shell, since that's the last process to exit normally. Each command before that is a child of the process immediately to the right of the next pipe symbol or && or || symbol. There are obvious groups of children around && and || which fall out naturally.
In the end, process groups keep things clean, so that nohup works and all children receive SIGINT, SIGQUIT, SIGHUP, or other tty driver signals.

Making linux "Wait" command wait for ALL child processes

Wait is not waiting for all child processes to stop. This is my script:
#!/bin/bash
titlename=`echo "$@"|sed 's/\..\{3\}$//'`
screen -X title "$titlename"
/usr/lib/process.bash -verbose $@
wait
bash -c "mail.bash $@"
screen -X title "$titlename.Done"
I don't have access to /usr/lib/process.bash, but it is a script that changes frequently, so I would like to reference it... but in that script:
#!/bin/ksh
#lots of random stuff
/usr/lib/runall $path $auto $params > /dev/null 2>&1&
My problem is that runall creates a log file... and mail.bash is supposed to mail me that log file, but wait isn't waiting for runall to finish; it seems to only be waiting for process.bash to finish. Is there any way, without access to process.bash, or without trying to keep my own up-to-date version of process.bash, to make the wait properly wait for runall to finish? (The log file overwrites the previous run, so I can't just check for the presence of the log file, since there is always one there.)
Thanks,
Dan
(
    . /usr/lib/process.bash -verbose $@
    wait
)
Instead of letting the OS start process.bash, this creates a subshell, runs all the commands in process.bash as if they were entered into our shell script, and waits within that subshell.
There are some caveats to this, but it should work if you're not doing anything unusual.
wait only waits for direct children; if any children spawn their own children, it won't wait for them.
The main problem is that because process.bash has exited the runall process will be orphaned and owned by init (PID 1). If you look at the process list runall won't have any visible connection to your process any more since the intermediate process.bash script exited. There's no way to use ps --ppid or anything similar to search for this "grandchild" process once it's orphaned.
You can wait on a specific PID. Do you know the PID of the runall process? If there's only one such process you could try this, which will wait for all running runalls:
wait `pidof runall`
You could retrieve the PID of the process you want to wait for, and then pass this PID as an argument to the wait command.
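If wait then complains that the PID is not a child of the shell (the built-in can only wait for its own children, and runall is an orphaned grandchild here), polling the PID is the usual fallback. A sketch, reusing the runall and mail.bash names from the question:
#!/bin/bash
# Sketch: poll until every process named "runall" has finished, since the
# shell's built-in wait can only wait for its own children.
for pid in $(pidof runall); do
    while kill -0 "$pid" 2>/dev/null; do
        sleep 1
    done
done
# the log written by runall should now be complete, so it is safe to mail it
bash -c "mail.bash $@"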

Resources