Timing traps in a shell script - Linux

I have a shell script background process that runs nohupped. This process is supposed to receive signals in a trap, but while playing around with some code I noticed that some signals are ignored if the interval between them is too small: the execution of the trap function takes too long, so the subsequent signal goes unserved. Unfortunately the trap command doesn't have any kind of signal queue, which is why I am asking: what is the best way to solve this problem?
A simple example:
receive_signal()
{
    # create one file per received signal, named after the current timestamp
    local TIMESTAMP=$(date '+%Y%m%d%H%M%S')
    echo "some text" > "$TIMESTAMP"
}
trap receive_signal USR1
while :
do
sleep 5
done

The easiest change, without redesigning your approach, is to use realtime signals, which queue.
This is not portable. Realtime signals are an optional extension, and shell and utility support for them is not required by the extension in any case. However, it so happens that the relevant GNU utilities on Linux, bash(1) and kill(1), do support realtime signals in a commonsense way. So, you can say:
trap sahandler RTMIN+1
and, elsewhere:
$ kill -s RTMIN+1 $pid_of_my_process
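Putting the two together, a minimal sketch of the queued variant of the example above (sahandler's body is illustrative):

sahandler()
{
    echo "some text" > "$(date '+%Y%m%d%H%M%S')"
}
trap sahandler RTMIN+1
while :
do
    sleep 5
done

Signals sent as RTMIN+1 while the handler is busy are queued by the kernel instead of being coalesced, so none of them should go unserved.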

Did you consider multiple one-line trap statements, one for each signal you want to block or process?
trap dosomething 15
trap segfault SEGV
Also, you want to have as little code as possible in a signal handler, for the reason you just encountered.
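One common way to follow that advice in a shell script is to have the trap only set a flag and let the main loop do the slow work. A sketch, not drop-in code (non-realtime signals can still coalesce in the kernel; this only shortens the handler itself):

got_usr1=0
receive_signal() { got_usr1=1; }   # bare minimum inside the handler
trap receive_signal USR1
while :
do
    if [ "$got_usr1" -eq 1 ]; then
        got_usr1=0
        echo "some text" > "$(date '+%Y%m%d%H%M%S')"   # slow work outside the trap
    fi
    sleep 1
done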
Edit: for bash, you can also code your own error handling / signal handling in C (or anything else) using modern signal semantics, via dynamically loadable modules:
http://cfajohnson.com/shell/articles/dynamically-loadable/

Related

Can a function that executes both periodically and on receipt of a signal be a critical region?

I have a script executed as a service with a function do_firewall that runs every 10 seconds. The same function can also be called when the script's process receives a SIGUSR1 signal. Is this function a critical region?
The code looks like this:
do_firewall() {
    # manage some iptables commands implemented in an idempotent
    # mechanism
    # is this a critical region?
    :
}
trap do_firewall SIGUSR1
###################################################################
# Main block
###################################################################
while true; do
    # If there is no do_firewall process running
    do_firewall
    sleep 10
done
Now, I have already verified that when SIGUSR1 arrives, it is handled by the same process that runs the main loop. My doubt is whether race conditions could occur inside do_firewall if SIGUSR1 is received while I am in the middle of that function.
I think not, but I am not 100% sure.
If so, do I have to add something like this in the do_firewall function:
do_firewall() {
(
flock -e 200
# critical region
) 200>/var/run/lockfile
}
Moreover, is 200 a safe file descriptor to use?
Yes, it's entirely possible for SIGUSR1 to arrive while executing do_firewall. It needs to be reentrant if you're going to use it as both a regular function call and a signal handler.
Moreover, is 200 a safe file descriptor to use?
Sure, use whatever you want; you don't have to go that high. It's fine to use 3. File descriptor numbers are under your script's control. 3 is the most common choice; some people start at 10. Bash starts at 63 and counts down when you use its {var}>file syntax to auto-assign FD numbers.
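For example, with the asker's flock pattern, both a fixed low descriptor and an auto-assigned one work. A sketch (the {var}> form needs bash 4.1+; /var/run/lockfile is the asker's path):

# fixed descriptor: any number your script isn't already using
do_firewall() {
    (
        flock -e 9
        # critical region: iptables commands go here
    ) 9>/var/run/lockfile
}

# or let bash pick the number itself
do_firewall() {
    (
        flock -e "$lock_fd"
        # critical region
    ) {lock_fd}>/var/run/lockfile
}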

What is the actual signal behind ERR

I've read in several places (including SO) that -e is considered "poor form" and is unreliable for exiting a script on any error. A better way to handle errors seems to be using trap, as such:
trap "echo there was an error; exit 1;" ERR
I can't seem to locate in the man pages which signal ERR actually is. I'm assuming it's SIGQUIT, but I can't find out for sure.
man 7 signal
only has the normal signals you would expect: SIGTERM, SIGQUIT, SIGINT, etc.
man trap
has references to the ERR signal, but doesn't seem to define it.
ex: "A trap on ERR, if set, is executed before the shell exits."
man bash
is similar to man trap in that it makes references to ERR but doesn't define it, from what I've seen.
What is the actual signal behind the shortcut ERR (in terms of the normal signals listed in man 7 signal)?
I'd prefer to trap the actual signal name instead of the shorthand version, although I realize they would produce the same result (catching any error from a command in a script and passing it to the handler).
There is no signal corresponding to the trap signal specification ERR.
ERR is one of the signal specifications implemented internally by bash. [Note 1] If a trap on ERR is set, bash will invoke the corresponding handler in exactly the same cases as it would have exited had set -e been enabled. (Consequently, it is no more "reliable" than set -e, but it is a lot more flexible.)
Other special trap names which do not correspond to any signal are EXIT, DEBUG and RETURN.
help trap will explain the meaning of these signal specifications.
Notes:
Actually, all of the signal specifications are implemented by bash, but for real signals the implementation consists of bash trapping the signal and then executing your handler; the special ones (ERR, EXIT, DEBUG, RETURN) just involve executing the handler at the appropriate internal point.
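A quick sketch showing that set -e equivalence in practice (the messages are illustrative):

#!/bin/bash
trap 'echo "ERR trap: command failed with status $? near line $LINENO"' ERR
false            # fires the ERR trap, just as set -e would have exited here
false || true    # does not fire it; set -e would not have exited here either
echo "still running"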

why is POSIX::SigSet is needed here?

#!/usr/bin/env perl
use POSIX;
my $sig_set = POSIX::SigSet->new(POSIX::SIGINT);
my $sig_act = POSIX::SigAction->new(sub { print "called\n"; exit 0 },$sig_set);
POSIX::sigaction(SIGINT,$sig_act);
sleep(15);
Why do I need to use POSIX::SigSet if I already tell POSIX::sigaction that I want SIGINT?
Basically, I'm trying to respond with my coderef to each of the signals I add to the SigSet. Looking at POSIX::sigaction's signature, it must accept a signal as the first parameter, which doesn't seem reasonable to me if I already tell POSIX::SigAction about my POSIX::SigSet.
I'm sure I am missing something here.
Thanks,
The answer to your question
The POSIX::SigSet specifies additional signals to mask off (to ignore) during the execution of your signal handler sub. It corresponds to the sa_mask member of the underlying struct passed to the C version of sigaction.
Now, SIGINT (well, the first argument to sigaction) will be masked off by default, unless you explicitly request otherwise via the SA_NODEFER flag.
A better approach?
However, if all you want to do is register a signal handler whose execution won't be interrupted by the signal for which it was registered (e.g., don't allow SIGINT during your SIGINT handler), you can skip the POSIX module entirely:
$SIG{INT} = sub { print "called\n"; exit 0; }; # Won't be interrupted by SIGINT
Where it can, Perl's signal dispatching emulates the traditional UNIX semantics of blocking a signal during its handler execution. (And on Linux, it certainly can. sigprocmask() is called before executing the handler, and then a scope-guard function is registered to re-allow that signal at the end of the user-supplied sub.)

when a process is killed is this information recorded anywhere?

Question:
When a process is killed, is this information recorded anywhere (i.e., by the kernel), such as in syslog (or can it be configured to be recorded, via syslog.conf)?
Is information such as the killer's PID, the time and date of the kill, and the reason recorded?
Update: you have all given me some insight, thank you very much!
If your Linux kernel is compiled with the process accounting (CONFIG_BSD_PROCESS_ACCT) option enabled, you can start recording process accounting info using the accton(8) command and use sa(8) to access the recorded info. The recorded information includes the 32-bit exit code, which includes the signal number.
(This stuff is not widely known / used these days, but I still remember it from the days of 4.x BSD on VAXes ...)
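A sketch of what that looks like in practice (paths and package names vary by distribution; this assumes the acct/psacct package):

accton /var/log/account/pacct   # start writing accounting records to this file
lastcomm some_command           # per-process records; an X flag means killed by a signal
sa                              # summarized accounting statistics
accton off                      # stop accounting (GNU accton syntax)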
Amended:
In short, the OS kernel does not care whether the process is killed; whether anything is recorded depends on whether the process itself logs something. All the kernel cares about at this stage is reclaiming memory. But read on for how to catch it and log it...
As caf and Stephen C mention in their comments...
If you are running the BSD accounting module in the kernel, everything gets logged. Thanks to Stephen C for pointing this out! I did not realize that functionality, as I have it switched off/disabled.
In hindsight, as per caf's comment: the two signals that cannot be caught are SIGKILL and SIGSTOP; also, the atexit example I described should have ended with exit(0)... oops. Thanks caf!
Original
The best way to catch a kill is to use a signal handler that handles a few signals, not just one: SIGTERM (what the kill command sends by default), SIGABRT (abort), SIGQUIT (terminal program quit) and SIGHUP (hangup). (As amended above, SIGKILL and SIGSTOP cannot be caught.) Together, those signals cover what the kill command sends from the command line. The signal handler can then log the information in /var/log/messages (environment dependent or Linux distribution dependent). For further reference, see here.
Also, see here for an example of how to use a signal handler using the sigaction function.
It would also be a good idea to adopt the atexit function: when the code exits at runtime, the runtime will execute the registered function last, before returning to the command line. A reference for atexit is here.
When the C function exit is called, the atexit machinery executes the registered function pointer, as in the example below. Thanks caf for this!
An example usage of atexit as shown:
#include <stdio.h>
#include <stdlib.h>

void myexitfunc(void)
{
    fprintf(stdout, "Goodbye cruel world...\n");
}

int main(int argc, char **argv)
{
    atexit(myexitfunc); /* register early, right after declarations */
    /* Rest of code */
    exit(0);            /* myexitfunc runs now, before the process ends */
}
Hope this helps,
Best regards,
Tom.
I don't know of any logging of signals sent to processes, unless the OOM killer is doing it.
If you're writing your own program, you can catch the kill signal and write to a logfile before actually dying. This doesn't work with kill -9, though, just the normal kill.
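For a shell script, that idea is a one-liner. A minimal sketch using logger(1) to write to syslog (the message text and exit code are illustrative; 143 follows the 128 + signal number convention for SIGTERM):

trap 'logger "myscript[$$]: caught SIGTERM, exiting"; exit 143' TERM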
You can see some details over thisaway.
If you use sudo, it will be logged. Other than that, the killed process can log some information (unless it's being terminated with extreme prejudice). You could even hack the kernel to log signals.
As for recording the reason a process was killed, I've yet to see a psychic program.
Kernel hacking is not for the weak of heart, but hella fun. You'd need to patch the signal dispatch routines to log information using printk(9) when kill(2), sigqueue(3) or the like is called. Read "The Linux Signals Handling Model" for more information on how signals are handled.
If the process is getting it via kill(2), then unless the process is already logging it, the only external trace would be a kernel mod. It's pretty simple: just do a printk(); it's like printf(). Find the output in dmesg.
If the process is getting it via /bin/kill, then it would be a relatively easy matter to install a wrapper executable that did logging. But this (signal delivery via /bin/kill) is unlikely because kill is also a bash built-in.
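A sketch of such a wrapper (a hypothetical script installed ahead of /bin/kill in PATH; as noted, it won't see the shell's kill builtin):

#!/bin/sh
# log who asked to signal what, then hand off to the real kill
logger "kill-wrapper: uid=$(id -u) args: $*"
exec /bin/kill "$@"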
By the way, when a process is killed by a signal, that fact is reported by the kernel to the parent process via the wait(2) system call. The status value returned by this call encodes both the child's exit status and signal-related info when the process has been killed; decode it with the WIFEXITED/WIFSIGNALED/WTERMSIG macros rather than relying on the byte layout. See wait(2) for more information.
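Shells surface the same information as an exit status of 128 + the signal number. A quick bash demo:

sleep 100 &
kill -TERM $!
wait $!
echo $?    # prints 143, i.e. 128 + 15 (SIGTERM)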

Broken pipe no longer ends programs?

When you pipe two processes together and kill the one at the "output" end of the pipe, the first process used to receive the "Broken Pipe" signal, which usually terminated it as well. E.g., running
$> do_something_intensive | less
and then exiting less used to return you immediately to a responsive shell, on SuSE 8 or earlier releases.
When I try that today, do_something_intensive is obviously still running until I kill it manually. It seems that something has changed (glibc? shell?) that makes programs ignore "broken pipes"...
Does anyone have hints on this? How can I restore the former behaviour? Why has it been changed (or why have multiple semantics always existed)?
edit : further tests (using strace) reveal that "SIGPIPE" is generated, but that the program is not interrupted. A simple
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    while (1)
        printf("dumb test\n");
    exit(0); /* never reached; the default SIGPIPE action should kill the process first */
}
will go on with an endless
--- SIGPIPE (Broken pipe) # 0 (0) ---
write(1, "dumb test\ndumb test\ndumb test\ndu"..., 1024) = -1 EPIPE (Broken pipe)
when less is killed. I could of course install a signal handler in my program and ensure it terminates, but I'm looking more for an environment variable or a shell option that would force programs to terminate on SIGPIPE.
edit again: it seems to be a tcsh-specific issue (bash handles it properly) and terminal-dependent (Eterm 0.9.4)
Well, if there is an attempt to write to a pipe after the reader has gone away, a SIGPIPE signal gets generated. The application has the ability to catch this signal, but if it doesn't, the process is killed.
The SIGPIPE won't be generated until the writing process actually attempts to write, so if there's no more output, it won't be generated.
Has "do something intensive" changed at all?
As Daniel has mentioned, SIGPIPE is not a magic "your pipe went away" signal but rather a "nice try, you can no longer read/write that pipe" signal.
If you have control of "do something intensive", you could change it to write out some "progress indicator" output as it spins. This would raise the SIGPIPE in a timely fashion.
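You can watch this from bash itself; a quick demo (PIPESTATUS is bash-specific):

yes | head -n 1            # head exits after one line; yes dies on its next write
echo "${PIPESTATUS[0]}"    # prints 141, i.e. 128 + 13 (SIGPIPE)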
Thanks for your advice, the solution is getting closer...
According to the manpage of tcsh, "non-login shells inherit the terminate behavior from their parents. Other signals have the values which the shell inherited from its parent."
This suggests my terminal is actually the root of the problem: if it ignores SIGPIPE, the shell itself will ignore SIGPIPE as well...
edit: I have definitive confirmation that the problem only arises with Eterm+tcsh, and I found a suspiciously missing signal(SIGPIPE, SIG_DFL) in the Eterm source code. I think that closes the case.
