Broken pipe no longer ends programs? - linux

When you pipe two processes together and kill the one at the "output" end of the pipe, the first process used to receive the "Broken pipe" signal (SIGPIPE), which usually terminated it as well. E.g. running
$> do_something_intensive | less
and then exiting less used to return you immediately to a responsive shell on SuSE 8 and earlier releases.
When I try that today, do_something_intensive apparently keeps running until I kill it manually. It seems that something has changed (glib? shell?) that makes programs ignore "broken pipes"...
Does anyone have hints on this? How can I restore the former behaviour? Why was it changed (or why have there always been multiple semantics)?
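Edit: further tests (using strace) reveal that SIGPIPE is generated, but that the program is not interrupted. A simple
#include <stdio.h>
#include <stdlib.h>   /* for exit() */
int main(void)
{
    /* Write forever; once the reader is gone, every write fails with EPIPE. */
    while (1) printf("dumb test\n");
    exit(0);   /* never reached */
}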
will go on with an endless
--- SIGPIPE (Broken pipe) # 0 (0) ---
write(1, "dumb test\ndumb test\ndumb test\ndu"..., 1024) = -1 EPIPE (Broken pipe)
when less is killed. I could of course add a signal handler to my program and make sure it terminates, but I'm really looking for an environment variable or a shell option that would force programs to terminate on SIGPIPE.
Edit again: it seems to be a tcsh-specific issue (bash handles it properly) and terminal-dependent (Eterm 0.9.4).

Well, if there is an attempt to write to a pipe after the reader has gone away, a SIGPIPE signal gets generated. The application has the ability to catch this signal, but if it doesn't, the process is killed.
The SIGPIPE won't be generated until the calling process attempts to write, so if there's no more output, it won't be generated.
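The two outcomes described above (killed by SIGPIPE, or a failed write with EPIPE when the signal is ignored) can be reproduced with a small sketch. It is written in Python rather than C and assumes a POSIX system with fork(); the helper name write_to_broken_pipe and the exit code 42 are made up for the illustration.

```python
import os
import signal


def write_to_broken_pipe(disposition):
    """Fork a child that writes to a reader-less pipe; return its wait status."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the "reader" goes away before anything is written
    pid = os.fork()
    if pid == 0:
        # Child: choose how SIGPIPE is handled, then attempt the write.
        code = 1
        try:
            signal.signal(signal.SIGPIPE, disposition)
            os.write(write_fd, b"dumb test\n")
            code = 0
        except BrokenPipeError:
            code = 42  # hypothetical marker: SIGPIPE ignored, write failed with EPIPE
        finally:
            os._exit(code)
    os.close(write_fd)
    _, status = os.waitpid(pid, 0)
    return status


if __name__ == "__main__":
    killed = write_to_broken_pipe(signal.SIG_DFL)
    print("default disposition: killed by signal", os.WTERMSIG(killed))
    ignored = write_to_broken_pipe(signal.SIG_IGN)
    print("SIGPIPE ignored: exited with code", os.WEXITSTATUS(ignored))
```

With the default disposition the child never returns from the write; with SIGPIPE ignored it survives and sees the error, which matches the endless EPIPE lines in the strace output from the question.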

Has "do something intensive" changed at all?
As Daniel has mentioned, SIGPIPE is not a magic "your pipe went away" signal but rather a "nice try, you can no longer read/write that pipe" signal.
If you have control of "do something intensive" you could change it to write out some "progress indicator" output as it spins. This would raise the SIGPIPE in a timely fashion.

Thanks for your advice, the solution is getting closer...
According to the manpage of tcsh, "non-login shells inherit the terminate behavior from their parents. Other signals have the values which the shell inherited from its parent."
Which suggests my terminal is actually the root of the problem: if it ignores SIGPIPE, the shell itself will ignore SIGPIPE as well...
Edit: I have definitive confirmation that the problem only arises with Eterm+tcsh, and I found a suspiciously missing signal(SIGPIPE,SIG_DFL) in the Eterm source code. I think that closes the case.
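The inheritance effect can be sketched from Python, which happens to be a parent that ignores SIGPIPE (CPython sets it to SIG_IGN at startup). This is only an illustration under a few assumptions: Python 3, a `yes` binary on the PATH that exits when its write fails, and a made-up helper name. The `restore_signals` flag of subprocess.Popen plays the role of the missing signal(SIGPIPE,SIG_DFL).

```python
import signal
import subprocess


def run_yes_with_reader_gone(restore_signals):
    """Start `yes`, close the read end of its stdout pipe, return its exit status."""
    proc = subprocess.Popen(
        ["yes"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        # True: SIGPIPE is reset to SIG_DFL in the child before exec.
        # False: the child inherits the parent's ignored SIGPIPE.
        restore_signals=restore_signals,
    )
    proc.stdout.close()  # the reader goes away
    try:
        return proc.wait(timeout=10)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        raise


if __name__ == "__main__":
    print("SIGPIPE restored:", run_yes_with_reader_gone(True))
    print("SIGPIPE inherited as ignored:", run_yes_with_reader_gone(False))
```

A negative status means the child was killed by that signal number. A positive one means it noticed the failed write itself and exited, which a program that never checks its writes would not do.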

Related

In Linux, how can I wait until a process I didn't start finishes?

I have a monitoring program that I'd like to check on various processes in the system, and know when they terminate. I'd also like to know their exit code, in case they crash. However, my program is not a parent of the processes to be monitored.
In Windows, this is easy: OpenProcess for SYNCHRONIZE rights, WaitForMultipleObjectsEx to wait for any of them to terminate, then GetExitCodeProcess to find out why it terminated (with NTSTATUS error codes if the reason was an exception).
But in Linux, the equivalent of these, waitpid, only works on your own child processes, not unrelated processes. We tried ptrace, but this caused its own issues, such as greatly slowing down signal processing.
This program is intended to run as root.
Is there a way to implement this, other than just polling /proc/12345 until it disappears?
I can't think of an easy way to collect the termination statuses, but for simple death events you can, as root, inject an open call on a file whose other end you hold, and then select on your end of the file descriptor.
When the other end dies, it generates a close event on the file descriptor you hold.
A (very ugly) example:
mkfifo /tmp/fifo #A channel to communicate death events
sleep 1000 & #Simulate your victim process
echo $! #Make note of the pid you want
#In another terminal
sudo gdb -ex "attach $thePid" -ex ' call open("/tmp/fifo",0,0)' -ex 'quit'
exec 3>/tmp/fifo
ruby -e 'fd = IO.select([IO.for_fd(3)]); puts "died" '
#In yet another terminal
kill $thePid #the previous terminal will print `died` immediately
#even though it's not the parent of $thePid
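The FIFO half of that trick can be tried without gdb. In the sketch below, a helper process that opens the FIFO for writing stands in for the victim after the injected open(); the gdb injection itself is not shown. The names VICTIM and detect_death_via_fifo are hypothetical, and although the demo kills its own child for convenience, the detection relies only on the FIFO, not on being the parent.

```python
import os
import select
import subprocess
import sys
import tempfile

# Stand-in for the victim after the injected open(): hold the write end and idle.
VICTIM = "import os, sys, time\nos.open(sys.argv[1], os.O_WRONLY)\ntime.sleep(1000)\n"


def detect_death_via_fifo(timeout=10.0):
    """Return (quiet_while_alive, eof_after_death) for a victim holding a FIFO open."""
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "fifo")
        os.mkfifo(fifo)
        victim = subprocess.Popen([sys.executable, "-c", VICTIM, fifo])
        # Blocks until the victim has opened the write end.
        rfd = os.open(fifo, os.O_RDONLY)
        try:
            # While the victim lives, nothing is readable.
            ready, _, _ = select.select([rfd], [], [], 0.2)
            quiet_while_alive = not ready
            victim.kill()
            # Once the last writer is gone, the read end reports end-of-file.
            ready, _, _ = select.select([rfd], [], [], timeout)
            eof_after_death = bool(ready) and os.read(rfd, 1) == b""
        finally:
            os.close(rfd)
            victim.kill()
            victim.wait()
        return quiet_while_alive, eof_after_death


if __name__ == "__main__":
    print(detect_death_via_fifo())
```

As in the original answer, this reports only that the process died, not its exit status.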

Signal handling: printing something and then taking the default behaviour

My requirement is that whenever a program terminates in any way other than normal completion [i.e. an exit() call at the end], I need to handle it (say, hook in a print "Hello" statement) before it actually terminates.
For example, when I hit Ctrl+C while running a program, it should print Hello and then continue the way SIGINT would normally have been handled.
If I install a custom signal handler (containing the print statement) in my source code, it alters the default behaviour, i.e. how SIGINT would normally have terminated the program.
1) Can anyone help me achieve both of these? Which other signals that can terminate a running process do I need to handle explicitly (maybe SIGTERM)?
2) How can I generate/test them? (Say, SIGINT can be generated by hitting Ctrl+C in Linux.)
Several signals are supported in Unix/Linux.
All of them except SIGKILL and SIGSTOP can be caught and handled.
The process of registering a handler is the same for any signal number.
We can use the kill command to send signals to other processes.
For example, this sends the TERM signal to process ID 1234:
kill -s TERM 1234
The sigaction(2) man page has some useful info. For one thing, every signal but SIGKILL and SIGSTOP can be caught.
In your signal handler, you have two options:
1) puts(3) and then manually do something (exit() or raise(SIGSTOP) or something).
2) puts(3) and then try to get the default signal behaviour by setting the handler back to SIG_DFL, and sending the signal to yourself with raise(3). I'm not sure whether you can just sigaction() to restore your signal handler right after raise() from inside that signal handler, and whether that would be portable even if it happens to work on Linux.
List all signals with kill -l
Send a signal with kill -INT 1234, or in the shell you started a background process from: kill -INT %1. Or to avoid copy/pasting a PID every time: pkill -INT process_name (pkill and pgrep are related.)
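The second option (print, restore SIG_DFL, re-send the signal) might look like the following sketch. It is in Python rather than C, so os.write and os.kill stand in for puts(3) and raise(3). The function names and the choice of signals are illustrative assumptions, and the demo runs the handler in a forked child so that the caller can inspect how the child died.

```python
import os
import signal
import time


def hello_then_default(signum, frame):
    """Print a message, then let the signal's default action take over."""
    os.write(1, b"Hello\n")                # unbuffered write to stdout
    signal.signal(signum, signal.SIG_DFL)  # restore the default disposition
    os.kill(os.getpid(), signum)           # re-send: this time it terminates us


def install_handlers(signals=(signal.SIGINT, signal.SIGTERM,
                              signal.SIGHUP, signal.SIGQUIT)):
    for signum in signals:
        signal.signal(signum, hello_then_default)


def demo(signum=signal.SIGTERM):
    """Run the handler in a forked child; return (captured output, wait status)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.dup2(write_fd, 1)          # child's stdout goes into the pipe
            install_handlers()
            os.kill(os.getpid(), signum)  # stands in for Ctrl+C or `kill`
            time.sleep(5)                 # give the handler a chance to run
        finally:
            os._exit(1)                   # only reached if the signal did not kill us
    os.close(write_fd)
    _, status = os.waitpid(pid, 0)
    output = os.read(read_fd, 100)
    os.close(read_fd)
    return output, status


if __name__ == "__main__":
    out, status = demo()
    print(out, "terminated by signal", os.WTERMSIG(status))
```

Because the process really is terminated by the re-sent signal, a parent or shell sees the same wait status it would have seen without the handler.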

Preventing threaded subprocess.popen from terminating my main script when child is killed?

Python 2.7.3 on Solaris 10
Questions
When my subprocess has an internal Segmentation Fault(core) issue or a user externally kills it from the shell with a SIGTERM or SIGKILL, my main program's signal handler handles a SIGTERM(-15) and my parent program exits. Is this real? or is it a bad python build?
Background and Code
I have a python script that first spawns a worker management thread. The worker management thread then spawns one or more worker threads. I have other stuff going on in my main thread that I cannot block. My management thread stuff and worker threads are rock-solid. My services run for years without restarts but then we have this subprocess.Popen scenario:
In the run method of the worker thread, I am using:
class workerThread(threading.Thread):
    def __init__(self):
        super(workerThread, self).__init__()
        ...
    def run(self):
        ...
        atempfile = tempfile.NamedTemporaryFile(delete=False)
        myprocess = subprocess.Popen(['third-party-cmd', 'with', 'arguments'], shell=False, stdin=subprocess.PIPE, stdout=atempfile, stderr=subprocess.STDOUT, close_fds=True)
        ...
I need to use myprocess.poll() to check for process termination because I need to scan the atempfile until I find relevant information (the file may be > 1 GiB) and I need to terminate the process because of user request or because the process has been running too long. Once I find what I am looking for, I will stop checking the stdout temp file. I will clean it up after the external process is dead and before the worker thread terminates. I need the stdin PIPE in case I need to inject a response to something interactive in the child's stdin stream.
In my main program, I set a SIGINT and SIGTERM handler for me to perform cleanup, if my main python program is terminated with SIGTERM or SIGINT(Ctrl-C) if running from the shell.
Does anyone have a solid 2.x recipe for child signal handling in threads?
ctypes sigprocmask, etc.
Any help would be very appreciated. I am just looking for an 'official' recipe or the BEST hack, if one even exists.
Notes
I am using a restricted build of Python. I must use 2.7.3. Third-party-cmd is a program I do not have source for - modifying it is not possible.
There are many things in your description that look strange. First, you have a couple of different threads and processes. Which one is crashing, which one is receiving SIGTERM, which one is receiving SIGKILL, and due to which operations?
Second: why does your parent receive SIGTERM? It can't be sent implicitly. Someone is calling kill on your parent process, either directly or indirectly (for example, by killing the whole parent process group).
Third: how is your program terminating when you're handling SIGTERM? By definition, the program terminates if the signal is not handled. If it's handled, it's not terminated. What's really happening?
Suggestions:
$ cat crsh.c
#include <stdio.h>
int main(void)
{
    int *f = 0x0;
    puts("Crashing");
    *f = 0;
    puts("Crashed");
    return 0;
}
$ cat a.py
import subprocess, sys
print('begin')
p = subprocess.Popen('./crsh')
a = raw_input()
print(a)
p.wait()
print('end')
$ python a.py
begin
Crashing
abcd
abcd
end
This works. No signal delivered to the parent. Did you isolate the problem in your program ?
If the problem is a signal sent to multiple processes: can you use setpgid to set up a separate process group for the child ?
Is there any reason for creating the temporary file? That means files of 1 GB or more being created in your temporary directory. Why not pipe stdout?
If you're really sure you need to handle signals in your parent program (why didn't you try/except KeyboardInterrupt, for example?): could the unspecified behaviour of signal() in multithreaded programs be causing those problems (for example, a signal being dispatched to a thread that does not handle signals)?
NOTES
The effects of signal() in a multithreaded process are unspecified.
Anyway, try to explain with more precision what are the threads and process of your program, what they do, how were the signal handlers set up and why, who is sending signals, who is receiving, etc, etc, etc, etc, etc.
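For the process-group suggestion, a minimal sketch follows. It uses Python 3's `start_new_session=True`; on Python 2.7, `preexec_fn=os.setsid` is the usual equivalent. A sleeping Python child stands in for third-party-cmd, and the helper names are hypothetical. The point is that a signal aimed at the child's group cannot reach the parent, and that the parent learns the cause of death from the return code.

```python
import os
import signal
import subprocess
import sys


def start_isolated(argv):
    """Start argv in its own session, and therefore its own process group."""
    return subprocess.Popen(argv, start_new_session=True)


def describe(returncode):
    """Popen reports death by signal N as a return code of -N."""
    if returncode < 0:
        return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def demo():
    child = start_isolated([sys.executable, "-c", "import time; time.sleep(1000)"])
    separate_group = os.getpgid(child.pid) != os.getpgrp()
    # Signal the child's whole group; the parent is not a member of it.
    os.killpg(os.getpgid(child.pid), signal.SIGKILL)
    returncode = child.wait()
    return separate_group, returncode


if __name__ == "__main__":
    separate, rc = demo()
    print("separate group:", separate, "/ child", describe(rc))
```

If the parent still receives SIGTERM with the child isolated like this, something is signalling the parent explicitly.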

Timing traps in a shell script

I have a shell script background process that runs "nohupped". This process shall receive signals in a trap, but when playing around with some code, I noticed that some signals are ignored if the interval between them is too small. The execution of the trap function takes too much time and therefore the subsequent signal goes unserved. Unfortunately, the trap command doesn't have some kind of signal queue, that's why I am asking: What is the best way to solve this problem?
A simple example:
function receive_signal()
{
    local TIMESTAMP=`date '+%Y%m%d%H%M%S'`
    echo "some text" > "$TIMESTAMP"
}
trap receive_signal USR1
while :
do
    sleep 5
done
The easiest change, without redesigning your approach, is to use realtime signals, which queue.
This is not portable. Realtime signals themselves are an optional extension, and shell and utility support for them are not required by the extension in any case. However, it so happens that the relevant GNU utilities on Linux — bash(1) and kill(1) — do support realtime signals in a commonsense way. So, you can say:
trap sahandler RTMIN+1
and, elsewhere:
$ kill -s RTMIN+1 $pid_of_my_process
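The difference in queueing can be observed directly. The sketch below is in Python rather than shell and assumes Linux, since signal.sigtimedwait and signal.SIGRTMIN are not available everywhere. It blocks a signal in a forked child, sends it three times, and counts how many instances are actually delivered; the helper name count_delivered is made up.

```python
import os
import signal


def count_delivered(signum, sends=3):
    """Block signum in a forked child, send it `sends` times, count deliveries."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
            for _ in range(sends):
                os.kill(os.getpid(), signum)  # all arrive while blocked
            count = 0
            # Poll (timeout 0) until no instance of the signal is pending.
            while signal.sigtimedwait({signum}, 0) is not None:
                count += 1
            os.write(write_fd, bytes([count]))
        finally:
            os._exit(0)
    os.close(write_fd)
    os.waitpid(pid, 0)
    data = os.read(read_fd, 1)
    os.close(read_fd)
    return data[0]


if __name__ == "__main__":
    print("SIGUSR1 delivered:", count_delivered(signal.SIGUSR1))
    print("SIGRTMIN+1 delivered:", count_delivered(signal.SIGRTMIN + 1))
```

A standard signal such as SIGUSR1 is only recorded as pending once, while each realtime signal is queued separately, which is why the trap in the question loses signals that arrive close together.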
Did you consider multiple one line trap statements? One for each signal you want to block or process?
trap dosomething 15
trap segfault SEGV
Also you want to have the least possible code in a signal handler for the reason you just encountered.
Edit - for bash you can code your own error handling / signal handling in C, or anything else using modern signal semantics if you want with dynamically loadable modules:
http://cfajohnson.com/shell/articles/dynamically-loadable/

when a process is killed is this information recorded anywhere?

Question:
When a process is killed, is this information recorded anywhere (i.e., in the kernel), such as in syslog (or can it be configured to be recorded via syslog.conf)?
Specifically: the killer's PID, the time and date of the kill, and the reason.
Update: you have all given me some insight, thank you very much.
If your Linux kernel is compiled with the process accounting (CONFIG_BSD_PROCESS_ACCT) option enabled, you can start recording process accounting info using the accton(8) command and use sa(8) to access the recorded info. The recorded information includes the 32-bit exit code, which includes the signal number.
(This stuff is not widely known / used these days, but I still remember it from the days of 4.x BSD on VAXes ...)
Amended:
In short, the OS kernel does not care if the process is killed. Whether anything is recorded depends on whether the process itself logs it. All the kernel cares about at this stage is reclaiming memory. But read on for how to catch it and log it...
As per caf's and Stephen C's comments...
If you are running the BSD accounting module in the kernel, everything gets logged. Thanks to Stephen C for pointing this out! I did not know about that functionality, as I have it switched off/disabled.
In hindsight, as per caf's comment: the two signals that cannot be caught are SIGKILL and SIGSTOP. Also, where I mentioned atexit and showed it in the code, the return should have been exit(0). Oops, thanks caf!
Original
The best way to catch the kill signal is to use a signal handler that handles several signals, because SIGKILL on its own will not suffice: SIGABRT (abort), SIGQUIT (terminal program quit), SIGSTOP and SIGHUP (hangup). Together, those signals are what would catch the kill command on the command line (though, as amended above, SIGKILL and SIGSTOP themselves cannot be caught). The signal handler can then log the information to /var/log/messages (the location depends on the environment or Linux distribution). For further reference, see here.
Also, see here for an example of how to use a signal handler using the sigaction function.
Also it would be a good idea to adopt the usage of atexit function, then when the code exits at runtime, the runtime will execute the last function before returning back to the command line. Reference for atexit is here.
When the C function exit is used, and executed, the atexit function will execute the function pointer where applied as in the example below. - Thanks caf for this!
An example usage of atexit as shown:
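#include <stdio.h>
#include <stdlib.h>
static void myexitfunc(void);   /* atexit() expects a void (*)(void) */
int main(int argc, char **argv){
    atexit(myexitfunc); /* Beginning, immediately right after declaration(s) */
    /* Rest of code */
    exit(0);   /* runs myexitfunc before the process ends */
}
static void myexitfunc(void){
    fprintf(stdout, "Goodbye cruel world...\n");
}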
Hope this helps,
Best regards,
Tom.
I don't know of any logging of signals sent to processes, unless the OOM killer is doing it.
If you're writing your own program you can catch the kill signal and write to a logfile before actually dying. This doesn't work with kill -9 though, just the normal kill.
You can see some details over thisaway.
If you use sudo, it will be logged. Other than that, the killed process can log some information (unless it's being terminated with extreme prejudice). You could even hack the kernel to log signals.
As for recording the reason a process was killed, I've yet to see a psychic program.
Kernel hacking is not for the weak of heart, but hella fun. You'd need to patch the signal dispatch routines to log information using printk(9) when kill(3), sigsend(2) or the like is called. Read "The Linux Signals Handling Model" for more information on how signals are handled.
If the process is getting it via kill(2), then unless the process is already logging the only external trace would be a kernel mod. It's pretty simple; just do a printk(), it's like printf(). Find the output in dmesg.
If the process is getting it via /bin/kill, then it would be a relatively easy matter to install a wrapper executable that did logging. But this (signal delivery via /bin/kill) is unlikely because kill is also a bash built-in.
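By the way, when a process is killed by a signal, the kernel reports this to the parent process via the wait(2) system call. The status value returned by this call encodes either the child's exit code or, if the child was killed, the number of the terminating signal. On Linux the signal number sits in the low seven bits and the exit code in the next byte, so decode it with the WIFEXITED/WEXITSTATUS and WIFSIGNALED/WTERMSIG macros rather than by hand. See wait(2) for more information.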
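As a rough illustration of what the parent can learn from wait(2), the sketch below forks a child, lets it either exit or kill itself, and decodes the resulting status. It uses Python's os module wrappers around the wait macros; the helper names describe_status and run_child are made up for the example.

```python
import os
import signal


def describe_status(status):
    """Turn a raw wait status into a human-readable reason."""
    if os.WIFSIGNALED(status):
        return "killed by signal %d" % os.WTERMSIG(status)
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "stopped or continued"


def run_child(action):
    """Fork, run action() in the child, and return the raw wait status."""
    pid = os.fork()
    if pid == 0:
        try:
            action()
        finally:
            os._exit(0)  # fallback if action() returns or raises
    _, status = os.waitpid(pid, 0)
    return status


if __name__ == "__main__":
    print(describe_status(run_child(lambda: os._exit(3))))
    print(describe_status(run_child(lambda: os.kill(os.getpid(), signal.SIGKILL))))
```

Note that this tells the parent which signal ended the child, but not who sent it.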

Resources