I installed a SIGSEGV and SIGABRT signal handler which forks a child process that:
1. stops its parent process with SIGSTOP.
2. invokes gdb on the parent process to gather crash diagnostics.
The problem is that fork() is not async-signal-safe on glibc, because ptmalloc installs pthread_atfork handlers. My signal handler therefore has the potential to deadlock: fork() tries to allocate memory, which in turn may grab a mutex that's already locked.
I want to work around this problem by calling the fork system call directly, bypassing any libc wrappers and therefore bypassing any atfork handlers. How do I do that? The following code works on Linux, but doesn't seem to work on OS X. It always returns the child PID, never 0; or is it supposed to do that? I'm also not sure whether I'm capturing the return value correctly, because syscall() is declared as int syscall(...) while fork() returns an integer of type pid_t.
pid = syscall(SYS_fork);
My app runs on many platforms, including Linux and OS X.
EDIT: fix typo: s/thread safe/async signal safe/.
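For the Linux side of this, here is a minimal sketch of the raw-syscall route; it assumes that a raw clone(SIGCHLD) is an acceptable fork() substitute inside a signal handler (it bypasses the glibc wrapper and therefore the pthread_atfork handlers), and it is not a portable answer for OS X:

#include <signal.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Linux-only sketch: a raw clone(SIGCHLD) behaves like fork() but skips
   glibc's fork() wrapper, so no pthread_atfork handlers run. */
static pid_t raw_fork(void)
{
#ifdef __linux__
    /* The remaining clone() arguments (stack, parent_tid, child_tid, tls) are unused here. */
    return (pid_t) syscall(SYS_clone, SIGCHLD, 0, 0, 0, 0);
#else
    return fork(); /* fallback; still subject to atfork handlers */
#endif
}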
On Linux, if you just want to automatically fire a debugger on the core-dumping signals, you could pipe core dumps into a script of your own: according to core(5), you just need /proc/sys/kernel/core_pattern to start with a | (a pipe character, followed by the command to run).
This trick avoids any extra programming (apart from the script you write for it).
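For example (the handler path here is hypothetical, shown only to illustrate the core(5) pipe syntax), the pattern can be set like this:

/* Illustrative sketch: make the kernel pipe core dumps to a (hypothetical)
   handler.  Shell equivalent, as root:
     echo '|/usr/local/bin/crash-handler %p %s %e' > /proc/sys/kernel/core_pattern
   %p = PID of the dumping process, %s = signal number, %e = executable name
   (see core(5) for the full list of specifiers). */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "w"); /* needs root */
    if (!f)
        return 1;
    fputs("|/usr/local/bin/crash-handler %p %s %e", f);
    return fclose(f) == 0 ? 0 : 1;
}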
Related
To verify the behavior of a third-party binary-distributed software I'd like to use, I'm implementing a kernel module whose objective is to keep track of each child this software creates and terminates.
The target binary is produced by Golang, and it is heavily multithreaded.
The kernel module I wrote installs hooks on the kernel functions _do_fork() and do_exit() to keep track of each process/thread this binary creates and terminates.
The LKM works, more or less.
During some conditions, however, I have a scenario I'm not able to explain.
It seems like a process/thread could terminate without passing through do_exit().
The evidence I collected by adding printk() calls shows the process creation but does not show the process termination.
I'm aware that printk() can be slow, and I'm also aware that messages can be lost in such situations.
To prevent message loss due to a slow console (for this particular application, a 115200-baud serial tty is used), I implemented a quicker console and collected the messages using netconsole.
The described setup seems to confirm that a process can terminate without passing through the do_exit() function.
But because I wasn't sure my messages couldn't be lost in the printk() infrastructure, I decided to repeat the same test, replacing printk() with trace_printk(), which should be a leaner alternative to printk().
Still the same result: occasionally I see processes that do not pass through do_exit(), and when I verify whether the PID is currently running, I have to face the fact that it is not.
Also note that I placed my hook at the very first instruction of do_exit(), to ensure the flow cannot terminate inside a function called further down.
My question is then the following:
Can a Linux process terminate without its flow passing through the do_exit() function?
If so, can someone give me a hint about what that scenario could be?
After a long debug session, I'm finally able to answer my own question.
That's not all; I'm also able to explain why I saw the strange behavior I described in my scenario.
Let's start from the beginning: while monitoring a heavily multithreaded application, I observed rare cases where a PID suddenly stopped existing without its flow ever passing through the Linux kernel's do_exit() function.
Hence my original question:
Can a Linux process terminate without passing through the do_exit() function?
To the best of my current knowledge, which I would by now consider reasonably extensive, a Linux process cannot end its execution without passing through the do_exit() function.
But this answer contradicted my observations, and the problem that led me to this question was still there.
Someone here suggested that the strange behavior I observed was due to my observations somehow being wrong, implying that my method was inaccurate and so were my conclusions.
My observations were correct: the process I watched terminated without ever passing through do_exit().
To explain this phenomenon, I want to put on the table another question that I think internet searchers may find somewhat useful:
Can two processes share the same PID?
If you'd asked me this a month ago, I'd surely have answered: "definitely not, two processes cannot share the same PID."
Linux is more complex, though.
There's a situation in which, in a Linux system, two different processes can share the same PID!
https://elixir.bootlin.com/linux/v4.19.20/source/fs/exec.c#L1141
Surprisingly, this behavior does not harm anyone; when this happens, one of these two processes is a zombie.
updated to correct an error
The circumstances of this duplicate PID are more intricate than those described previously. The situation arises when a thread of a multithreaded process, other than the thread group leader, invokes execve(). Before the new program can run, the kernel must flush the old exec context: flush_old_exec() calls de_thread(), which eliminates every thread of the process except the one performing the execve(). The calling thread's PID is changed to that of the leader, and from that point on it keeps running under the leader's PID, while the old leader is left as a zombie until it is released; during that window, two schedulable entities carry the same PID.
end of the update
That is what I was watching: the PID I was monitoring never showed up in do_exit() because, by the time the corresponding thread terminated, it no longer had the PID it started with; it had its leader's.
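To make the mechanism concrete, here is a minimal sketch (my own illustration, assuming Linux and a glibc recent enough to provide gettid()): a non-leader thread calls execve(), and the program it executes ends up running under the old leader's PID, so the thread's original tid vanishes without a do_exit() of its own.

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *worker(void *arg)
{
    (void) arg;
    printf("worker: pid=%d tid=%d, now calling exec\n", getpid(), (int) gettid());
    /* After the exec, the shell below reports the old leader's PID as its own. */
    execl("/bin/sh", "sh", "-c", "echo new program: pid=$$", (char *) 0);
    return NULL;
}

int main(void)
{
    pthread_t t;
    printf("leader: pid=%d tid=%d\n", getpid(), (int) gettid());
    pthread_create(&t, NULL, worker, NULL);
    pause(); /* never returns: the execve() in the worker replaces the whole process */
    return 0;
}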
For people who know the Linux kernel's mechanics very well, this is nothing to be surprised about; the behavior is intended and hasn't changed since 2.6.17.
The current 5.10.3 still behaves this way.
Hoping this is useful to internet searchers, I'd also like to add that it answers the following questions:
Question: Can a Linux process/thread terminate without passing through do_exit()? Answer: No, do_exit() is the only way a process can end its execution, whether intentional or unintentional.
Question: Can two processes share the same PID? Answer: Normally, no. There are rare cases in which two schedulable entities have the same PID.
Question: Does the Linux kernel have scenarios where a process changes its PID? Answer: Yes, there is at least one scenario in which a process changes its PID.
Can a Linux process terminate without its flow passing through the do_exit() function?
Probably not, but you should study the source code of the Linux kernel to be sure. Ask on KernelNewbies. Kernel threads and udev or systemd related things (or perhaps modprobe or the older hotplug) are probable exceptions. When your /sbin/init of pid 1 terminates (which should not happen), strange things would happen.
The LKM works, more or less.
What does that mean? How could a kernel module half-work?
And in real life, it does sometimes happen that your Linux kernel panics or crashes (which could happen with your LKM, if it has not been peer-reviewed by the Linux kernel community). In such a case there is no longer any notion of processes, since they are an abstraction provided by a living Linux kernel.
See also dmesg(1), strace(1), proc(5), syscalls(2), ptrace(2), clone(2), fork(2), execve(2), waitpid(2), elf(5), credentials(7), pthreads(7)
Look also inside the source code of your libc, e.g. GNU libc or musl-libc
Of course, see Linux From Scratch and Advanced Linux Programming
And verifying if the PID is currently running,
This can be done in user land with /proc/, or by using kill(2) with a 0 signal (and maybe also pidfd_send_signal(2)...).
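A minimal sketch of the kill(2)-with-signal-0 check (my illustration, not from the original answer):

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
    pid_t pid = (pid_t) atoi(argc > 1 ? argv[1] : "1");
    if (kill(pid, 0) == 0)
        printf("pid %d exists (and we may signal it)\n", (int) pid);
    else if (errno == ESRCH)
        printf("pid %d does not exist\n", (int) pid);
    else
        perror("kill"); /* e.g. EPERM: it exists but we lack permission */
    return 0;
}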
PS. I still don't understand why you need to write a kernel module or change the kernel code. My intuition would be to avoid doing that when possible.
After attaching to a pthread using its PID and manipulating the contents of its debug registers, while waiting with waitpid(-1, &status, __WALL), I would like to be able to stop that thread and make additional manipulations (defining another breakpoint, etc.).
When I try sending a signal using kill() and waiting for the thread to be ready for additional ptrace requests, it works fine for a single target thread. On the other hand, when the number of traced threads increases, I get stuck in the waitpid() call and never get unblocked.
Is there a safe and fast mechanism to stop a running attached thread so that I can make additional modifications?
cheers.
When sending a signal to a thread, do not use the pid. Sending a signal to a process (which is what you are doing) sends it to some random thread within that process, which is almost certainly not what you would like to do. The tool to send threads signals is pthread_kill.
That's where things become a little more hairy. The ptrace interface uses "thread IDs" (or tids). These are framed in the same context as process IDs, i.e. integers. pthread_kill, on the other hand, uses the pthread_t type, which is opaque and not the same thing.
Since using ptrace means you are in dark magic land already, the simplest solution is to use tgkill. Just place your tid and pid in the relevant fields, and you're golden.
Of course, tgkill is not an exported function. You'll need to wrap it in syscall in order to invoke it.
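A minimal sketch of that wrapping (pid and tid are whatever you obtained when attaching; SIGSTOP is just one example of a signal that stops the thread):

#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Send a signal to one specific thread (tid) of a process (pid = thread group id). */
static int stop_one_thread(pid_t pid, pid_t tid)
{
    return (int) syscall(SYS_tgkill, pid, tid, SIGSTOP);
}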
I am writing an SDK in Go that integrators will communicate with via a local socket connection.
From the integrating application I need a way to start the SDK as a process, but more importantly, I need to be able to cancel that process when the main application closes.
This question is language-agnostic (I think), as the challenge seems Linux-related: how to start a program and cancel it at a later stage.
Some possible approaches:
I am thinking that it's a case of starting the program via exec, getting its PID or some other ID, then using that to kill it later. Sudo may be required to do this, which is not ideal. It's also not good practice, as you would effectively be force-closing the SDK, giving it no time for cleanup.
Start the program via any means. Once ready to close, just send a "shutdown" command via the SDK API which will allow the SDK to cleanup, manage state then exit the application.
What is best practice for this please?
Assuming you're using Linux or similar Unix:
You are on the right track. You won't need sudo. The comments thus far are pointing in the right direction, but not spelling out the details.
See section 2 of the manual pages (man 2 ...) for details on the functions mentioned here. They are documented for calling from C. I don't have experience with Go to help determine how to use them there.
The integrator application will be called the "parent" process. The SDK-as-a-process will be called the "child" process. A process creates a child and becomes its parent by calling fork(). The new process is a duplicate of the parent, running the same code, and having for the most part all the same state (data in memory). But fork() returns different values to parent and child, so each can determine its role in the relationship. This includes informing the parent of the process identifier (pid) of the child. Hang on to this value. Then, the child uses exec() to execute a different program within the existing process, i.e. your SDK binary. An alternative to fork-then-exec is posix_spawn(), which has rather involved parameters (but gives greater control if you need it).
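A minimal sketch of that fork()/exec() sequence in C (the path to the SDK binary is hypothetical):

#include <sys/types.h>
#include <unistd.h>

pid_t launch_sdk(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: replace this process image with the SDK binary. */
        execl("/usr/local/bin/sdk-process", "sdk-process", (char *) 0);
        _exit(127); /* only reached if exec failed */
    }
    /* Parent: pid is the child's process id (or -1 on failure); keep it for later. */
    return pid;
}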
Designing the child to shut down in response to a signal, rather than a command through the API, will allow processes other than the parent to initiate clean shutdown in standard fashion. For example, this might be useful for the administrator or user; it enables sending the shutdown signal from the shell command-line or a script.
The child installs a signal handler function, which will be called when the child process receives a signal, by calling signal() (or the more complex sigaction(), recommended for its portability). There are different signals that can be sent/received, identified by different integer values (and also given names like SIGTERM). You indicate which you're interested in receiving when calling signal(). When your signal handler function is invoked, you've received the signal, and can initiate clean shutdown.
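In the child, that registration might look like this (a sketch; the handler just records that shutdown was requested so the main loop can clean up):

#include <signal.h>

static volatile sig_atomic_t shutdown_requested = 0;

static void on_sigterm(int sig)
{
    (void) sig;
    shutdown_requested = 1; /* the main loop checks this and cleans up */
}

static void install_handler(void)
{
    struct sigaction sa = { 0 };
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGTERM, &sa, NULL);
}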
When the parent wants the child to shut down cleanly, the parent sends a signal to the child using the unfortunately named kill(). Unfortunately named because signals can be used for other purposes. Anyway, you pass to kill() the pid (returned by fork()) and the specific signal (e.g. SIGTERM) you want to send.
The parent can also determine when the child has completely shut down by calling waitpid(), again passing the pid returned by fork(); or alternatively by registering to receive the signal SIGCHLD. Register to receive SIGCHLD before fork()/exec() or you might miss the signal.
Actually, it's important that you do call waitpid(), optionally after receiving SIGCHLD, in order to deallocate a resource holding the child process's exit status, so the OS can cleanup that last remnant of the process. Failing to do so keeps the child as a "zombie" process, unable to be fully reclaimed. Too many zombies and the OS will be unable to launch new processes.
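Putting the parent side together, a sketch of the clean-shutdown sequence (child_pid is whatever fork() returned earlier):

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

void shutdown_child(pid_t child_pid)
{
    int status;
    kill(child_pid, SIGTERM);       /* ask the child to shut down cleanly */
    waitpid(child_pid, &status, 0); /* block until it exits; this reaps the zombie */
}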
If a process refuses to shut down cleanly or as quickly as you require, you may force it to quit (without executing its cleanup code) by sending the signal SIGKILL.
There are variants of exec(), waitpid() and posix_spawn(), with different names and behaviors, mentioned in their man pages.
Is there any way to list all the killed processes on a Linux device?
I saw this answer suggesting:
check in:
/var/log/kern.log
but it is not generic. Is there any other way to do it?
What I want to do:
list a thread/process if it got killed. Which function in the kernel should I edit to list all killed tids/pids and their names, or alternatively, is there a sysfs entry that already does this?
The opposite of do_fork is do_exit, here:
do_exit kernel source
I'm not able to find when threads are exiting, other than:
release_task
I believe "task" and "thread" are (almost) synonymous in Linux.
First, task and thread contexts are different in the kernel.
A task (using the tasklet API) runs in software-interrupt context (meaning you cannot sleep while you are in that context), while a thread (using the kthread API, or the workqueue API) runs its handler in process context (i.e. a sleepable context).
In both cases, if a thread hangs in the kernel, you cannot kill it.
If you run the "ps" command from the shell, you can see it there (normally between "[" and "]" brackets), but any attempt to kill it won't work.
Since the kernel is trusted code, such a situation shouldn't happen; if it does, it indicates a kernel (or kernel-module) bug.
Normally the whole machine will hang after a while because the core running that thread is not responding (you will see a message with more info in /var/log/messages or on the console). In other cases the machine may survive, but that specific core is dead. It depends on the kernel configuration.
How is the signalling (interrupt) mechanism handled in the kernel? The reason I ask is that somehow a SIGABRT signal is received by my application and I want to find out where it comes from.
You should be looking in your application for the cause, not in the kernel.
Usually a process receives SIGABRT when it directly calls abort or when an assert fails. Finding exactly the piece of the kernel that delivers the signal will gain you nothing.
In conclusion, your code or a library your code is using is causing this. See abort(3) and assert.
cnicutar's answer is the best guess IMHO.
It is possible that the signal was emitted by another process, although in the case of SIGABRT it is most likely emitted by the same process that receives it, via the abort(3) libc function.
If in doubt, you can run your application with strace -e kill yourapp your args ... to quickly check whether the kill system call is indeed invoked from within your program or its dependent libraries. Or use gdb's catch syscall.
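Another quick check (my own sketch, not from the original answer): install the SIGABRT handler with SA_SIGINFO, so that siginfo_t tells you the sender's PID and UID when the signal was sent with kill() or tgkill():

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void on_abrt(int sig, siginfo_t *si, void *ctx)
{
    (void) ctx;
    /* printf is not async-signal-safe; acceptable only as a quick diagnostic. */
    printf("signal %d, si_code=%d, sender pid=%d uid=%d\n",
           sig, si->si_code, (int) si->si_pid, (int) si->si_uid);
    _exit(1);
}

int main(void)
{
    struct sigaction sa = { 0 };
    sa.sa_sigaction = on_abrt;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGABRT, &sa, NULL);

    abort(); /* here the "sender" is the process itself, via raise() */
}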
Note that in some cases the kernel itself can emit signals, such as a SIGKILL when the infamous "OOM killer" goes into action.
BTW, signals are delivered asynchronously; they disrupt the normal workflow of your program, which is why they're painful to trace. Besides machinery such as SystemTap, I don't know how to trace or log signal emission and delivery within the kernel.