What was the reason my process disappeared?

I am seeing a problem with my program at a customer site: the process suddenly disappears, and I'm trying to find out why. The program is written in C++ and runs on modern Linux systems (RHEL/CentOS).
What I have checked so far:
The program prints nothing on standard output or standard error. It would if an exception were thrown, because my handler prints a backtrace before aborting.
dmesg does not include anything meaningful (such as an OOM killer message or any other indication that the process was killed).
I have very limited access to the customer environment, so I asked them to run gdb and provide us with the log. The gdb script attaches to the process, catches throw and the signals SIGTERM, SIGUSR1, SIGUSR2, SIGINT, SIGSEGV, SIGABRT, SIGBUS, SIGILL and SIGQUIT, and also puts breakpoints on exit and _exit. The gdb log contains no sign of the process hitting any of these, nor of it receiving SIGKILL (which I believe would normally be logged).
Any ideas about what else I could check?

Related

Disabling SIGABRT for a program run (Valgrind)

I have been tasked with debugging a program using Valgrind. The program becomes very slow when run under Valgrind. This is a problem, because the program has a watcher thread that kills slow threads with SIGABRT if they spend too much time in certain functions. The program is in a valid state when it exits in that way, so I would like to keep it running even if SIGABRT is raised. I cannot change the source code to switch off the watcher thread.
Now my question:
Does Valgrind, or a tool compatible with Valgrind, give me the option to say to the program: "If you receive SIGABRT, treat it as a null op and go on?"

You might achieve what you want by running your program under valgrind + gdb, using vgdb.
With gdb, you can then control what to do with the SIGABRT signal.
For example, launch your program with:
valgrind --vgdb-stop-at=startup your_program
In another window, launch gdb:
(gdb) handle SIGABRT nostop print nopass
(gdb) target remote | vgdb
(gdb) continue
See http://www.valgrind.org/docs/manual/manual-core-adv.html#manual-core-adv.gdbserver for more information.

Long Running Python Script in VSCode Exits with 'Polite quit request'

I have a long running Python script which is running in Visual Studio Code.
After a while the script stops running, there are no errors just this statement:
"fish: “/usr/bin/python3 /home/ubuntu/.…” terminated by signal SIGTERM (Polite quit request)"
What is happening here?

If a process receives SIGTERM, some other process sent that signal. That is what happened in your case.
The SIGTERM signal is sent to a process to request its termination. Unlike the SIGKILL signal, it can be caught and interpreted or ignored by the process. This allows the process to perform nice termination releasing resources and saving state if appropriate. SIGINT is nearly identical to SIGTERM.
SIGTERM is not sent automatically by the system. There are a few signals that are sent automatically like SIGHUP when a terminal goes away, SIGSEGV/SIGBUS/SIGILL when a process does things it shouldn't be doing, SIGPIPE when it writes to a broken pipe/socket, etc.
SIGTERM is the signal that is typically used to administratively terminate a process.
That's not a signal that the kernel would send, but that's the signal a process would typically send to terminate (gracefully) another process. It is sent by default by the kill, pkill, killall, fuser -k commands.
Possible reasons why your process received this signal include:
a supervising process decided that execution was taking too long
a supervising process reacted to insufficient memory or other system resources
These are only possibilities; in your case, the root cause might be something different. You can make the process ignore SIGTERM, but doing so is not recommended.
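Rather than ignoring SIGTERM, a long-running Python script can catch it, finish the current unit of work and exit cleanly. The following is only a minimal sketch: the names stop_requested, on_sigterm and run are made up for illustration, and the script sends SIGTERM to itself purely to demonstrate the effect that an external kill would have.

```python
import os
import signal
import time

stop_requested = False  # hypothetical flag, set by the handler below


def on_sigterm(signum, frame):
    """Only record the request; the main loop decides when to stop."""
    global stop_requested
    stop_requested = True


def run(max_iterations=50):
    """Work loop that ends early once SIGTERM has been seen."""
    done = 0
    while not stop_requested and done < max_iterations:
        time.sleep(0.01)  # stand-in for one unit of real work
        done += 1
    return done


signal.signal(signal.SIGTERM, on_sigterm)

# Demonstration only: send SIGTERM to ourselves, as `kill <pid>` would.
os.kill(os.getpid(), signal.SIGTERM)

iterations = run()
print(f"stopped after {iterations} iterations, stop_requested={stop_requested}")
```

Logging the time of arrival inside the handler can also help to correlate the signal with whatever sent it.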

SIGABRT vs. ENOTSUP

On OpenBSD:
I want to harden an OpenBSD install. For this, IMHO:
sysctl -w kern.wxabort=1
would be more secure; the default is 0.
W^X violations are no longer permitted by default. A kernel log message
is generated, and mprotect/mmap return ENOTSUP. If the sysctl(8) flag
kern.wxabort is set then a SIGABRT occurs instead, for gdb use or
coredump creation.
so:
SIGABRT Abnormal termination
ENOTSUP Operation not supported (POSIX.1)
So for me (not a programmer) this means that maybe SIGABRT is better, since it will kill (?) the process rather than just produce an informational message. From a security perspective, killing the badly behaving process is more secure.
Question: Is this true? Is using SIGABRT more secure? Does SIGABRT really kill the process? Or are they (SIGABRT vs. ENOTSUP) almost the same, with neither killing the process?

Preventing the operation is where you get security. Killing the process is bonus punishment. We're talking about processes not people, though, so punishment isn't necessary.
The question is whether the processes you're interested in handle errors well. If getting an error code back causes them to derail and do undesirable things, then you may want to send them a signal. Or, as the documentation says, if you want a coredump or want to break in with a debugger, SIGABRT would be useful.
Keep in mind that SIGABRT can be caught. Processes can ignore the signal if they want.
Bottom line, there's no real added security from enabling this option.

Where are Linux signals sent or processed inside the kernel?

How is the signalling (interrupt) mechanism handled in the kernel? The reason I ask is that my application somehow receives a SIGABRT signal and I want to find out where it comes from.

You should be looking in your application for the cause, not in the kernel.
Usually a process receives SIGABRT when it directly calls abort or when an assert fails. Finding exactly the piece of the kernel that delivers the signal will gain you nothing.
In conclusion, your code or a library your code is using is causing this. See abort(3) and assert.
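To see what "killed by its own abort" looks like from the outside, here is a small sketch in Python. It is not the asker's application: it starts a throwaway child that calls os.abort() (which calls the C library's abort) and inspects how the child ended.

```python
import signal
import subprocess
import sys

# Child process that aborts itself, much like abort(3) or a failed assert in C.
child = subprocess.run(
    [sys.executable, "-c", "import os; os.abort()"],
    stderr=subprocess.DEVNULL,
)

# subprocess reports death by signal as a negative return code.
killed_by = signal.Signals(-child.returncode)
print(f"child terminated by {killed_by.name}")
```

A parent or supervisor that records the exit status this way can at least tell an abort apart from a normal exit.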

cnicutar's answer is the best guess IMHO.
It is possible that the signal was emitted by another process, although in the case of SIGABRT it is most likely emitted by the same process that receives it, via the abort(3) libc function.
If in doubt, you can run your application with strace -e trace=kill,tkill,tgkill yourapp your args ... to quickly check whether a signal-sending system call is indeed invoked from within your program or dependent libraries (glibc's abort and raise typically use tgkill rather than kill). Or use gdb's catch syscall.
Note that in some cases the kernel itself can emit signals, such as a SIGKILL when the infamous "OOM killer" goes into action.
BTW, signals are delivered asynchronously; they disrupt the normal workflow of your program. This is why they're painful to trace. Besides machinery such as SystemTap, I don't know how to trace or log signal emission and delivery within the kernel.
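If you can modify the receiving program, the kernel will also tell you who sent a signal through the siginfo data (si_pid, si_uid, si_code); in C that means installing the handler with sigaction and SA_SIGINFO. The sketch below shows the same information from Python via signal.sigtimedwait, which is available on Linux but not on every platform. The choice of SIGUSR1 and sending the signal to ourselves are assumptions made only to keep the demonstration self-contained.

```python
import os
import signal

SIG = signal.SIGUSR1  # arbitrary choice for the demonstration

# Block the signal so it stays pending instead of triggering its default action.
signal.pthread_sigmask(signal.SIG_BLOCK, {SIG})

# Stand-in for "some process sends us a signal": here we send it to ourselves.
os.kill(os.getpid(), SIG)

# Collect the pending signal together with its siginfo (5 second timeout).
info = signal.sigtimedwait({SIG}, 5)

if info is not None:
    print(f"signal={signal.Signals(info.si_signo).name} "
          f"sender_pid={info.si_pid} sender_uid={info.si_uid}")

signal.pthread_sigmask(signal.SIG_UNBLOCK, {SIG})
```

When the sender is the process itself (as with abort), si_pid equals the process's own PID, which quickly settles the "inside or outside" question.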

How does SIGINT relate to the other termination signals such as SIGTERM, SIGQUIT and SIGKILL?

On POSIX systems, termination signals usually have the following order (according to many man pages and the POSIX spec):
SIGTERM - politely ask a process to terminate. It shall terminate gracefully, cleaning up all resources (files, sockets, child processes, etc.), deleting temporary files and so on.
SIGQUIT - more forceful request. It shall terminate ungracefully, still cleaning up resources that absolutely need cleanup, but maybe not delete temporary files, maybe write debug information somewhere; on some systems a core dump will also be written (regardless of whether the signal is caught by the app or not).
SIGKILL - most forceful request. The process is not even asked to do anything, but the system will clean up the process, whether it likes that or not. Most likely a core dump is written.
How does SIGINT fit into that picture? A CLI process is usually terminated by SIGINT when the user hits CTRL+C; however, a background process can also be terminated by SIGINT using the kill utility. What I cannot see in the specs or the header files is whether SIGINT is more or less forceful than SIGTERM, or whether there is any difference between SIGINT and SIGTERM at all.
UPDATE:
The best description of termination signals I found so far is in the GNU LibC Documentation. It explains very well that there is an intended difference between SIGTERM and SIGQUIT.
It says about SIGTERM:
It is the normal way to politely ask a program to terminate.
And it says about SIGQUIT:
[...] and produces a core dump when it terminates the process, just like a program error signal.
You can think of this as a program error condition “detected” by the user. [...]
Certain kinds of cleanups are best omitted in handling SIGQUIT. For example, if the program
creates temporary files, it should handle the other termination requests by deleting the temporary
files. But it is better for SIGQUIT not to delete them, so that the user can examine them in
conjunction with the core dump.
And SIGHUP is also explained well enough. SIGHUP is not really a termination signal; it just means the "connection" to the user has been lost, so the app cannot expect the user to read any further output (e.g. stdout/stderr output) and there is no input to expect from the user any longer. For most apps that means they had better quit. In theory an app could also decide to go into daemon mode when a SIGHUP is received and run as a background process from then on, writing output to a configured log file. For most daemons already running in the background, SIGHUP usually means that they should reexamine their configuration files, so you send it to background processes after editing config files.
However, there is no useful explanation of SIGINT on this page, other than that it is sent by CTRL+C. Is there any reason why one would handle SIGINT in a different way than SIGTERM? If so, what reason would this be and how would the handling be different?

SIGTERM and SIGKILL are intended for general purpose "terminate this process" requests. SIGTERM (by default) and SIGKILL (always) will cause process termination. SIGTERM may be caught by the process (e.g. so that it can do its own cleanup if it wants to), or even ignored completely; but SIGKILL cannot be caught or ignored.
SIGINT and SIGQUIT are intended specifically for requests from the terminal: particular input characters can be assigned to generate these signals (depending on the terminal control settings). The default action for SIGINT is the same sort of process termination as the default action for SIGTERM and the unchangeable action for SIGKILL; the default action for SIGQUIT is also process termination, but additional implementation-defined actions may occur, such as the generation of a core dump. Either can be caught or ignored by the process if required.
SIGHUP, as you say, is intended to indicate that the terminal connection has been lost, rather than to be a termination signal as such. But, again, the default action for SIGHUP (if the process does not catch or ignore it) is to terminate the process in the same way as SIGTERM etc.
There is a table in the POSIX definitions for signal.h which lists the various signals and their default actions and purposes, and the General Terminal Interface chapter includes a lot more detail on the terminal-related signals.
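The "can be caught" versus "cannot be caught" distinction is easy to check for yourself. This is just an illustrative sketch: it tries to install a do-nothing handler for several signals and records which attempts the operating system rejects. The handler and the results dictionary are names invented for the example.

```python
import signal


def handler(signum, frame):
    pass  # never invoked for SIGKILL or SIGSTOP


results = {}
for sig in (signal.SIGTERM, signal.SIGINT, signal.SIGQUIT, signal.SIGHUP,
            signal.SIGKILL, signal.SIGSTOP):
    try:
        previous = signal.signal(sig, handler)
        # Restore whatever was there before (None means "not set from Python").
        signal.signal(sig, previous if previous is not None else signal.SIG_DFL)
        results[sig.name] = "catchable"
    except OSError:
        results[sig.name] = "cannot be caught"

for name, outcome in results.items():
    print(f"{name:8} {outcome}")
```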

man 7 signal
This is the convenient non-normative manpage of the Linux man-pages project that you often want to look at for Linux signal information.
Version 3.22 mentions interesting things such as:
The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.
and contains the table:
Signal     Value     Action   Comment
----------------------------------------------------------------------
SIGHUP        1       Term    Hangup detected on controlling terminal
                              or death of controlling process
SIGINT        2       Term    Interrupt from keyboard
SIGQUIT       3       Core    Quit from keyboard
SIGILL        4       Core    Illegal Instruction
SIGABRT       6       Core    Abort signal from abort(3)
SIGFPE        8       Core    Floating point exception
SIGKILL       9       Term    Kill signal
SIGSEGV      11       Core    Invalid memory reference
SIGPIPE      13       Term    Broken pipe: write to pipe with no
                              readers
SIGALRM      14       Term    Timer signal from alarm(2)
SIGTERM      15       Term    Termination signal
SIGUSR1   30,10,16    Term    User-defined signal 1
SIGUSR2   31,12,17    Term    User-defined signal 2
SIGCHLD   20,17,18    Ign     Child stopped or terminated
SIGCONT   19,18,25    Cont    Continue if stopped
SIGSTOP   17,19,23    Stop    Stop process
SIGTSTP   18,20,24    Stop    Stop typed at tty
SIGTTIN   21,21,26    Stop    tty input for background process
SIGTTOU   22,22,27    Stop    tty output for background process
which summarizes the signal Action that distinguishes e.g. SIGQUIT from SIGINT, since SIGQUIT has action Core and SIGINT has action Term.
The actions are documented in the same document:
The entries in the "Action" column of the tables below specify the default disposition for each signal, as follows:
Term   Default action is to terminate the process.
Ign    Default action is to ignore the signal.
Core   Default action is to terminate the process and dump core (see core(5)).
Stop   Default action is to stop the process.
Cont   Default action is to continue the process if it is currently stopped.
I cannot see any difference between SIGTERM and SIGINT from the point of view of the kernel since both have action Term and both can be caught. It seems that is just a "common usage convention distinction":
SIGINT is what happens when you do CTRL-C from the terminal
SIGTERM is the default signal sent by kill
Some signals are ANSI C and others not
A considerable difference is that:
SIGINT and SIGTERM are ANSI C, thus more portable
SIGQUIT and SIGKILL are not
They are described on section "7.14 Signal handling " of the C99 draft N1256:
SIGINT receipt of an interactive attention signal
SIGTERM a termination request sent to the program
which makes SIGINT a good candidate for an interactive Ctrl + C.
POSIX 7
POSIX 7 documents the signals with the signal.h header: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html
This page also has the following table of interest which mentions some of the things we had already seen in man 7 signal:
Signal    Default Action   Description
SIGABRT   A                Process abort signal.
SIGALRM   T                Alarm clock.
SIGBUS    A                Access to an undefined portion of a memory object.
SIGCHLD   I                Child process terminated, stopped, or continued.
SIGCONT   C                Continue executing, if stopped.
SIGFPE    A                Erroneous arithmetic operation.
SIGHUP    T                Hangup.
SIGILL    A                Illegal instruction.
SIGINT    T                Terminal interrupt signal.
SIGKILL   T                Kill (cannot be caught or ignored).
SIGPIPE   T                Write on a pipe with no one to read it.
SIGQUIT   A                Terminal quit signal.
SIGSEGV   A                Invalid memory reference.
SIGSTOP   S                Stop executing (cannot be caught or ignored).
SIGTERM   T                Termination signal.
SIGTSTP   S                Terminal stop signal.
SIGTTIN   S                Background process attempting read.
SIGTTOU   S                Background process attempting write.
SIGUSR1   T                User-defined signal 1.
SIGUSR2   T                User-defined signal 2.
SIGTRAP   A                Trace/breakpoint trap.
SIGURG    I                High bandwidth data is available at a socket.
SIGXCPU   A                CPU time limit exceeded.
SIGXFSZ   A                File size limit exceeded.
BusyBox init
In BusyBox 1.29.2, the default reboot command sends SIGTERM to all processes, sleeps for a second, and then sends SIGKILL. This seems to be a common convention across different distros.
When you shut down a BusyBox system with:
reboot
it sends a signal to the init process.
Then, the init signal handler ends up calling:
static void run_shutdown_and_kill_processes(void)
{
	/* Run everything to be run at "shutdown".  This is done _prior_
	 * to killing everything, in case people wish to use scripts to
	 * shut things down gracefully... */
	run_actions(SHUTDOWN);
	message(L_CONSOLE | L_LOG, "The system is going down NOW!");
	/* Send signals to every process _except_ pid 1 */
	kill(-1, SIGTERM);
	message(L_CONSOLE, "Sent SIG%s to all processes", "TERM");
	sync();
	sleep(1);
	kill(-1, SIGKILL);
	message(L_CONSOLE, "Sent SIG%s to all processes", "KILL");
	sync();
	/*sleep(1); - callers take care about making a pause */
}
which prints to the terminal:
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Here is a minimal concrete example of that.
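The same "SIGTERM first, SIGKILL after a grace period" convention can be sketched for a single child process. This is not BusyBox code, only an illustration in Python: CHILD_CODE is a hypothetical stubborn child that ignores SIGTERM, and terminate_then_kill is a helper name invented here.

```python
import signal
import subprocess
import sys

# Hypothetical stubborn child: ignores SIGTERM, reports readiness, then sleeps.
CHILD_CODE = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def terminate_then_kill(proc, grace_seconds=1.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL."""
    proc.terminate()  # sends SIGTERM
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL, which cannot be ignored
        return proc.wait()


proc = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                        stdout=subprocess.PIPE, text=True)
proc.stdout.readline()  # wait until the child has installed SIG_IGN
status = terminate_then_kill(proc)
proc.stdout.close()
print(f"child ended by {signal.Signals(-status).name}")
```

A well-behaved child would exit during the grace period and never see the SIGKILL.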
Signals sent by the kernel
SIGKILL:
OOM killer: What is RSS and VSZ in Linux memory management

As DarkDust noted, many signals have the same results, but processes can attach different actions to them by distinguishing how each signal is generated. Looking at the FreeBSD kernel source code (kern_sig.c), I see that the two signals are handled in the same way: they terminate the process and are delivered to any thread.
SA_KILL|SA_PROC, /* SIGINT */
SA_KILL|SA_PROC, /* SIGTERM */

After a quick Google search for sigint vs sigterm, it looks like the only intended difference between the two is whether it was initiated by a keyboard shortcut or by an explicit call to kill.
As a result, you could, for example, intercept sigint and do something special with it, knowing that it was likely sent by a keyboard shortcut. Perhaps refresh the screen or something, instead of dying (not recommended, as people expect ^C to kill the program, just an example).
I also learned that ^\ should send sigquit, which I may start using myself. Looks very useful.

Using kill (both the system call and the utility) you can send almost any signal to any process, given you've got the permission. A process using a plain signal handler cannot distinguish how a signal came to life or who sent it; only a handler installed with sigaction and SA_SIGINFO receives that information (si_code, si_pid).
That being said, SIGINT really is meant to signal the Ctrl-C interruption, while SIGTERM is the general termination signal. There is no concept of a signal being "more forceful", with the only exception that there are signals that cannot be blocked or handled (SIGKILL and SIGSTOP, according to the man page).
A signal can only be "more forceful" than another signal with respect to how a receiving process handles the signal (and what the default action for that signal is). For example, by default, both SIGTERM and SIGINT lead to termination. But if you ignore SIGTERM then it will not terminate your process, while SIGINT still will.
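That last point can be checked with a throwaway child process. The sketch below is only an illustration: the child (CHILD_CODE, a made-up name) ignores SIGTERM and explicitly keeps the default action for SIGINT, and the parent then sends both signals in turn.

```python
import signal
import subprocess
import sys
import time

# Hypothetical child: ignores SIGTERM, keeps the default action for SIGINT.
CHILD_CODE = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "signal.signal(signal.SIGINT, signal.SIG_DFL)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)

proc = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                        stdout=subprocess.PIPE, text=True)
proc.stdout.readline()  # wait until both dispositions are in place

proc.send_signal(signal.SIGTERM)
time.sleep(0.2)  # give the signal time to (not) take effect
survived_sigterm = proc.poll() is None

proc.send_signal(signal.SIGINT)
status = proc.wait(timeout=5)
proc.stdout.close()

print(f"survived SIGTERM: {survived_sigterm}")
print(f"ended by: {signal.Signals(-status).name}")
```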

With the exception of a few signals, signal handlers can catch the various signals, or the default behavior upon receipt of a signal can be modified. See the signal(7) man page for details.
