Signal handler (SIGSEGV) unable to complete before device crashes - Linux

I have installed a handler (say, crashHandler()) which has a bit of file-output functionality. It is a Linux thread which registers for SIGSEGV with crashHandler(). The file writing is required, as it stores the stack trace to persistent storage.
It works most of the time. But in one specific scenario, crashHandler() only partly executes (I can see the logs) and then the device reboots. Can someone suggest a way to deal with this?

The first question to ask here is why the device rebooted. Normally having an ordinary application crash won't cause a kernel-level or hardware-level reboot. Most likely, you're either hitting a watchdog timer before the crash handler completes (in which case you should extend the watchdog timeout - do NOT reset the timer from within the crash handler though, as then you're risking problems in the crash handler itself preventing a reboot), or this is pid 1 and it's crashing within the SIGSEGV handler, causing a kernel panic due to pid 1 (init) dying.
If it's the latter, you need to be more careful with what you do in that crash handler. Remember, you just crashed. You know memory is corrupt, but you don't know how it's corrupt. It may be corrupt in ways that affect the crash handler itself - e.g. if you corrupt the heap metadata, you may be unable to allocate memory without crashing for real this time. You should keep what you do in that handler to a bare minimum - in particular, avoid calling any library functions that are not documented as being async-signal-safe and avoid using any complex (pointer-containing) data structures or dynamically allocated memory. For the highest level of safety, limit yourself to just fork() and exec()ing another process that will use debugger APIs (ptrace() and /proc/$PID/mem) to perform memory dumps or whatever else you might need.

Related

Why can't a Linux process with status 'D' be killed? [duplicate]

Sometimes when I write a program in Linux and it crashes due to a bug of some sort, it becomes an uninterruptible process and continues running forever until I restart my computer (even if I log out). My questions are:
What causes a process to become uninterruptible?
How do I stop that from happening?
This is probably a dumb question, but is there any way to interrupt it without restarting my computer?
An uninterruptible process is a process which happens to be in a system call (kernel function) that cannot be interrupted by a signal.
To understand what that means, you need to understand the concept of an interruptible system call. The classic example is read(). This is a system call that can take a long time (seconds) since it can potentially involve spinning up a hard drive, or moving heads. During most of this time, the process will be sleeping, blocking on the hardware.
While the process is sleeping in the system call, it can receive an asynchronous Unix signal (say, SIGTERM), and then the following happens:
The system call exits prematurely, and is set up to return -EINTR to user space.
The signal handler is executed.
If the process is still running, it gets the return value from the system call, and it can make the same call again.
Returning early from the system call enables the user space code to immediately alter its behavior in response to the signal. For example, terminating cleanly in reaction to SIGINT or SIGTERM.
On the other hand, some system calls are not allowed to be interrupted in this way. If such a system call stalls for some reason, the process can remain in this unkillable state indefinitely.
LWN ran a nice article that touched on this topic in July.
To answer the original question:
How to prevent this from happening: figure out which driver is causing you trouble, and either stop using it, or become a kernel hacker and fix it.
How to kill an uninterruptible process without rebooting: somehow make the system call terminate. Frequently the most effective manner to do this without hitting the power switch is to pull the power cord. You can also become a kernel hacker and make the driver use TASK_KILLABLE, as explained in the LWN article.
When a process is in user mode, it can be interrupted at any time (switching to kernel mode). When the kernel returns to user mode, it checks whether there are any signals pending (including the ones used to kill the process, such as SIGTERM and SIGKILL). This means a process can be killed only on return to user mode.
The reason a process cannot be killed in kernel mode is that it could potentially corrupt the kernel structures used by all the other processes in the same machine (the same way killing a thread can potentially corrupt data structures used by other threads in the same process).
When the kernel needs to do something which could take a long time (waiting on a pipe written by another process or waiting for the hardware to do something, for instance), it sleeps by marking itself as sleeping and calling the scheduler to switch to another process (if there is no non-sleeping process, it switches to a "dummy" process which tells the CPU to slow down a bit and sits in a loop — the idle loop).
If a signal is sent to a sleeping process, it has to be woken up before it will return to user space and thus process the pending signal. Here we have the difference between the two main types of sleep:
TASK_INTERRUPTIBLE, the interruptible sleep. If a task is marked with this flag, it is sleeping, but can be woken by signals. This means the code which marked the task as sleeping is expecting a possible signal, and after it wakes up will check for it and return from the system call. After the signal is handled, the system call can potentially be automatically restarted (and I won't go into details on how that works).
TASK_UNINTERRUPTIBLE, the uninterruptible sleep. If a task is marked with this flag, it is not expecting to be woken up by anything other than whatever it is waiting for, either because it cannot easily be restarted, or because programs are expecting the system call to be atomic. This can also be used for sleeps known to be very short.
TASK_KILLABLE (mentioned in the LWN article linked to by ddaa's answer) is a new variant.
This answers your first question. As to your second question: you can't avoid uninterruptible sleeps; they are a normal thing (one happens, for instance, every time a process reads/writes from/to the disk); however, they should last only a fraction of a second. If they last much longer, it usually means a hardware problem (or a device driver problem, which looks the same to the kernel), where the device driver is waiting for the hardware to do something which will never happen. It can also mean you are using NFS and the NFS server is down (it is waiting for the server to recover; you can also use the "intr" option to avoid the problem).
Finally, the reason you cannot recover is the same reason the kernel waits until return to user mode to deliver a signal or kill the process: it would potentially corrupt the kernel's data structures (code waiting on an interruptible sleep can receive an error which tells it to return to user space, where the process can be killed; code waiting on an uninterruptible sleep is not expecting any error).
Uninterruptible processes are USUALLY waiting for I/O following a page fault.
Consider this:
The thread tries to access a page which is not in core (either an executable which is demand-loaded, a page of anonymous memory which has been swapped out, or a mmap()'d file which is demand loaded, which are much the same thing)
The kernel is now (trying to) load it in
The process can't continue until the page is available.
The process/task cannot be interrupted in this state, because it can't handle any signals; if it did, another page fault would happen and it would be back where it was.
When I say "process", I really mean "task", which under Linux (2.6) roughly translates to "thread", which may or may not have an individual "thread group" entry in /proc.
In some cases, it may be waiting for a long time. A typical example of this would be where the executable or mmap'd file is on a network filesystem where the server has failed. If the I/O eventually succeeds, the task will continue. If it eventually fails, the task will generally get a SIGBUS or something.
To your 3rd question:
I think you can kill uninterruptible processes by running
sudo kill -HUP 1.
It will restart init without ending the running processes, and after running it, my uninterruptible processes were gone.
If you are talking about a "zombie" process (which is designated as "zombie" in ps output), then this is a harmless record in the process list waiting for someone to collect its return code, and it can be safely ignored.
Could you please describe what an "uninterruptible process" is for you? Does it survive "kill -9" and happily chug along? If that is the case, then it's stuck on some syscall, which is stuck in some driver, and you are stuck with this process until reboot (and sometimes it's better to reboot soon) or until the relevant driver is unloaded (which is unlikely to happen). You could try to use strace to find out where your process is stuck and avoid it in the future.

Can I safely access potentially unallocated memory addresses?

I'm trying to create a memcpy-like function that will fail gracefully (i.e. return an error instead of segfaulting) when given an address in memory that is part of an unallocated page. I think the right approach is to install a SIGSEGV signal handler, and do something in the handler to make the memcpy function stop copying.
But I'm not sure what happens in the case my program is multithreaded:
Is it possible for the signal handler to execute in another thread?
What happens if a segfault isn't related to any memcpy operation?
How does one handle two threads executing memcpy concurrently?
Am I missing something else? Am I looking for something that's impossible to implement?
Trust me, you do not want to go down that road. It's a can of worms for many reasons. Correct signal handling is hard enough in single-threaded environments, let alone in multithreaded code.
First of all, returning from a signal handler that was caused by an exception condition is undefined behavior - it works in Linux, but it's still undefined behavior nevertheless, and it will give you problems sooner or later.
From man 2 sigaction:
The behaviour of a process is undefined after it returns normally from
a signal-catching function for a SIGBUS, SIGFPE, SIGILL or SIGSEGV
signal that was not generated by kill(), sigqueue() or raise().
(Note: this does not appear on the Linux manpage; but it's in SUSv2)
This is also specified in POSIX. While it works in Linux, it's not good practice.
Below the specific answers to your questions:
Is it possible for the signal handler to execute in another thread?
Yes, it is. A signal is delivered to any thread that is not blocking it (but is delivered only to one, of course), although in Linux and many other UNIX variants, exception-related signals (SIGILL, SIGFPE, SIGBUS and SIGSEGV) are usually delivered to the thread that caused the exception. This is not required though, so for maximum portability you shouldn't rely on it.
You can use pthread_sigmask(2) to block signals in every thread but one; that way you make sure that every signal is always delivered to the same thread. This makes it easy to have a single thread dedicated to signal handling, which in turn allows you to do synchronous signal handling, because the thread may use sigwait(2) (note that multithreaded code should use sigwait(2) rather than sigsuspend(2)) until a signal is delivered and then handle it synchronously. This is a very common pattern.
What happens if a segfault isn't related to any memcpy operation?
Good question. The signal is delivered, and there is no (trivial) way to portably differentiate a genuine segfault from a segfault in memcpy(3).
If you have one thread taking care of every signal, like I mentioned above, you could use sigwaitinfo(2), and then examine the si_addr field of siginfo_t once sigwaitinfo(2) returned. The si_addr field is the memory location that caused the fault, so you could compare that to the memory addresses passed to memcpy(3).
But some platforms, most notably Mac OS, do not implement sigwaitinfo(2) or its cousin sigtimedwait(2).
So there's no way to do it portably.
How does one handle two threads executing memcpy concurrently?
I don't really understand this question, what's so special about multithreaded memcpy(3)? It is the caller's responsibility to make sure regions of memory being read from and written to are not concurrently accessed; memcpy(3) isn't (and never was) thread-safe if you pass it overlapping buffers.
Am I missing something else? Am I looking for something that's
impossible to implement?
If you're concerned with portability, I would say it's pretty much impossible. Even if you just focus on Linux, it will be hard. If this was something easy to do, by this time someone would have probably done it already.
I think you're better off building your own allocator and force user code to rely on it. Then you can store state and manage allocated memory, and easily tell if the buffers passed are valid or not.
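A hedged sketch of that suggestion; every name here (tracked_malloc, safe_memcpy, the fixed-size table) is illustrative, not a real library API. The wrapper records each live allocation so the copy can be validated up front instead of trapping SIGSEGV after the fact.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_ALLOCS 1024

/* Table of live allocations; a real allocator would use a faster structure. */
static struct { void *ptr; size_t len; } live[MAX_ALLOCS];

void *tracked_malloc(size_t len)
{
    void *p = malloc(len);
    if (!p) return NULL;
    for (int i = 0; i < MAX_ALLOCS; i++)
        if (!live[i].ptr) { live[i].ptr = p; live[i].len = len; break; }
    return p;
}

void tracked_free(void *p)
{
    for (int i = 0; i < MAX_ALLOCS; i++)
        if (live[i].ptr == p) { live[i].ptr = NULL; break; }
    free(p);
}

/* Returns 1 iff [p, p+len) lies entirely inside one tracked allocation. */
int tracked_valid(const void *p, size_t len)
{
    const char *c = p;
    for (int i = 0; i < MAX_ALLOCS; i++) {
        const char *base = live[i].ptr;
        if (base && c >= base && c + len <= base + live[i].len)
            return 1;
    }
    return 0;
}

int safe_memcpy(void *dst, const void *src, size_t len)
{
    if (!tracked_valid(dst, len) || !tracked_valid(src, len))
        return -1;              /* fail gracefully instead of faulting */
    memcpy(dst, src, len);
    return 0;
}
```

The obvious limitation: it only validates memory obtained through the tracked allocator, which is exactly the "force user code to rely on it" condition above.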

What happens if a process crashes when flushing a mapped file?

I'm using boost::interprocess::managed_mapped_file to do IPC under Linux. In short, one process can write objects into files (method construct) for another process to read (method find). But what if the process crashes while writing? Will boost handle this automatically, or do I have to add a mechanism to detect such a failure?
If the process crashes, the result is undefined: nothing can know how much I/O it had completed. But I would expect the OS to do I/O in some unit, probably at least one block (512 bytes) or one page (4 KB).

Can I coredump a process that is blocking on disk activity (preferably without killing it)?

I want to dump the core of a running process that is, according to /proc/<pid>/status, currently blocking on disk activity. Actually, it is busy doing work on the GPU (should be 4 hours of work, but it has taken significantly longer now). I would like to know how much of the process's work has been done, so it'd be good to be able to dump the process's memory. However, as far as I know, "blocking on disk activity" means that it's not possible to interrupt the process in any way, and coredumping a process e.g. using gdb requires interrupting and temporarily stopping the process in order to attach via ptrace, right?
I know that I could just read /proc/<pid>/{maps,mem} as root to get the (maybe inconsistent) memory state, but I don't know any way to get hold of the process's userspace CPU register values... they stay the same while the process is inside the kernel, right?
You can probably run gcore on your program. It's basically a wrapper around GDB that attaches, uses the gcore command, and detaches again.
This might interrupt your IO (as if it received a signal, which it will), but your program can likely restart it if written correctly (and this may occur in any case, due to default handling).

Is there a point to trapping "segfault"?

I know that, given enough context, one could hope to recover constructively from a segfault condition.
But, is the effort worth it? If yes, in what situation(s) ?
You can't really hope to recover from a segfault. You can detect that it happened, and dump out relevant application-specific state if possible, but you can't continue the process. This is because (amongst other reasons):
The thread which failed cannot be continued, so your only options are longjmp or terminating the thread. Neither is safe in most cases.
Either way, you may leave a mutex / lock in a locked state which causes other threads to wait forever
Even if that doesn't happen, you may leak resources
Even if you don't do either of those things, the thread which segfaulted may have left the internal state of the application inconsistent when it failed. An inconsistent internal state can cause data errors or further bad behaviour, which causes more problems than simply quitting
So in general, there is no point in trapping it and doing anything EXCEPT terminating the process in a fairly abrupt fashion. There's no point in attempting to write (important) data back to disc, or continue to do other useful work. There is some point in dumping out state to logs - which many applications do - and then quitting.
A possibly useful thing to do might be to exec() your own process, or have a watchdog process which restarts it in the case of a crash. (NB: exec does not always have well defined behaviour if your process has >1 thread)
A number of reasons:
To provide more application specific information to debug a crash. For instance, I crashed at stage 3 processing file 'x'.
To probe whether certain memory regions are accessible. This was mostly to satisfy an API for an embedded system. We would try to write to the memory region and catch the segfault that told us that the memory was read-only.
The segfault usually originates as a fault from the MMU, which the operating system uses to swap in pages of memory when necessary. If the OS doesn't have that page of memory, it forwards the fault on to the application as a signal.
A segmentation fault is really accessing memory that you do not have permission to access (either because it's not mapped, you don't have the right permissions, the virtual address is invalid, etc.).
Depending on the underlying reason, you may want to trap and handle the segmentation fault. For instance, if your program is passed an invalid virtual address, it may log that segfault and then do some damage control.
A segfault does not necessarily mean that the program heap is corrupted. Reading an invalid address ( eg. null pointer ) may result in a segfault, but that does not mean that the heap is corrupted. Also, an application can have multiple heaps depending on the C runtime.
There are very advanced techniques that one might implement by catching a segmentation fault, if you know the segmentation fault isn't an error. For example, you can protect pages so that you can't read from them, and then trap the SIGSEGV to perform "magical" behavior before the read completes. (See Tomasz Węgrzanowski "Segfaulting own programs for fun and profit" for an example of what you might do, but usually the overhead is pretty high so it's not worth doing.)
A similar principle applies to trapping an illegal-instruction exception (usually in the kernel) to emulate an instruction that's not implemented on your processor.
To log a crash stack trace, for example.
No. I think it is a waste of time: a segfault indicates there is something wrong in your code, and you will be better advised to find this by examining a core dump and/or your source code. The one time I tried trapping a segfault led me off into a hall of mirrors which I could have avoided by simply thinking about the source code. Never again.