Can I safely access potentially unallocated memory addresses? - multithreading

I'm trying to create a memcpy-like function that will fail gracefully (i.e. return an error instead of segfaulting) when given an address that is part of an unallocated page. I think the right approach is to install a SIGSEGV signal handler, and do something in the handler to make the memcpy function stop copying.
But I'm not sure what happens in the case my program is multithreaded:
Is it possible for the signal handler to execute in another thread?
What happens if a segfault isn't related to any memcpy operation?
How does one handle two threads executing memcpy concurrently?
Am I missing something else? Am I looking for something that's impossible to implement?

Trust me, you do not want to go down that road. It's a can of worms for many reasons. Correct signal handling is already hard in single-threaded environments, let alone in multithreaded code.
First of all, returning from a signal handler that was caused by an exception condition is undefined behavior - it works in Linux, but it's still undefined behavior nevertheless, and it will give you problems sooner or later.
From man 2 sigaction:
The behaviour of a process is undefined after it returns normally from
a signal-catching function for a SIGBUS, SIGFPE, SIGILL or SIGSEGV
signal that was not generated by kill(), sigqueue() or raise().
(Note: this does not appear on the Linux manpage; but it's in SUSv2)
This is also specified in POSIX. While it works in Linux, it's not good practice.
Below are specific answers to your questions:
Is it possible for the signal handler to execute in another thread?
Yes, it is. A signal is delivered to any thread that is not blocking it (but is delivered only to one, of course), although in Linux and many other UNIX variants, exception-related signals (SIGILL, SIGFPE, SIGBUS and SIGSEGV) are usually delivered to the thread that caused the exception. This is not required though, so for maximum portability you shouldn't rely on it.
You can use pthread_sigmask(2) to block signals in every thread but one; that way you make sure that every signal is always delivered to the same thread. This makes it easy to have a single thread dedicated to signal handling, which in turn allows you to do synchronous signal handling, because the thread may use sigwait(2) (note that multithreaded code should use sigwait(2) rather than sigsuspend(2)) until a signal is delivered and then handle it synchronously. This is a very common pattern.
What happens if a segfault isn't related to any memcpy operation?
Good question. The signal is delivered, and there is no (trivial) way to portably differentiate a genuine segfault from a segfault in memcpy(3).
If you have one thread taking care of every signal, like I mentioned above, you could use sigwaitinfo(2), and then examine the si_addr field of siginfo_t once sigwaitinfo(2) returned. The si_addr field is the memory location that caused the fault, so you could compare that to the memory addresses passed to memcpy(3).
But some platforms, most notably Mac OS, do not implement sigwaitinfo(2) or its cousin sigtimedwait(2).
So there's no way to do it portably.
How does one handle two threads executing memcpy concurrently?
I don't really understand this question, what's so special about multithreaded memcpy(3)? It is the caller's responsibility to make sure regions of memory being read from and written to are not concurrently accessed; memcpy(3) isn't (and never was) thread-safe if you pass it overlapping buffers.
Am I missing something else? Am I looking for something that's impossible to implement?
If you're concerned with portability, I would say it's pretty much impossible. Even if you just focus on Linux, it will be hard. If this was something easy to do, by this time someone would have probably done it already.
I think you're better off building your own allocator and force user code to rely on it. Then you can store state and manage allocated memory, and easily tell if the buffers passed are valid or not.

How to ensure a signal handler never yields to a thread within the same process group?

This is a bit of a meta question since I think I have a solution that works for me, but it has its own downsides and upsides. I need to do a fairly common thing, catch SIGSEGV on a thread (no dedicated crash handling thread), dump some debug information and exit.
The catch here is the fact that upon crash, my application runs llvm-symbolizer which takes a while (relatively speaking) and causes a yield (either because of clone + execve or exceeding the time quanta for the thread, I've seen latter happen when doing symbolication myself in-process using libLLVM). The reason for doing all this is to get a stack trace with demangled symbols and with line/file information (stored in a separate DWP file). For obvious reasons I do not want a yield happening across my SIGSEGV handler since I intend to terminate the application (thread group) after it has executed and never return from the signal handler.
I'm not that familiar with Linux signal handling or with glibc's wrappers doing magic around it. I know the basic gotchas, but there isn't much information on the specifics of handling signals, such as whether synchronous signal handlers get any kind of special priority in terms of scheduling.
Brainstorming, I had a few ideas and downsides to them:
pthread_kill(<every other thread>, SIGSTOP) - Cumbersome with more threads, interacts with signal handlers which seems like it could have unintended side effects. Also requires intercepting thread creation from other libraries to keep track of the thread list and an increasing chance of pre-emption with every system call. Possibly even change their contexts once they're stopped to point to a syscall exit stub or flat out use SIGKILL.
A global flag to serve as a cancellation point for all threads (kind of like pthread_cancel/pthread_testcancel). Safer, but it requires a lot of maintenance and across a large codebase it can be hellish, in addition to a mild performance overhead. The global flag could also cause the error to cascade, since the program is already in an unpredictable state, so letting any other thread run there is already not great.
"Abusing" the scheduler, which is my current pick, with my implementation as one of the answers: switching to the FIFO scheduling policy and raising the priority, thereby becoming the only runnable thread in that group.
Core dumps are not an option, since the goal here was to avoid them in the first place. I would prefer not requiring a helper program aside from the symbolizer as well.
Environment is a typical glibc based Linux (4.4) distribution with NPTL.
I know that crash handlers are fairly common now, so I believe none of the ways I picked are that great, especially considering I've never seen the scheduler "hack" ever get used that way. So with that: does anyone have a better alternative that is cleaner and less risky than the scheduler "hack", and am I missing any important points in my general ideas about signals?
Edit: It seems that I haven't really considered MP in this equation (as per the comments) and the fact that other threads are still runnable in an MP situation and can happily continue running alongside the FIFO thread on a different processor. I can, however, change the affinity of the process to execute only on the same core as the crashing thread, which will effectively freeze all other threads at schedule boundaries. However, that still leaves the "FIFO thread yielding due to blocking IO" scenario open.
It seems like the FIFO + SIGSTOP option is the best one, though I do wonder if there are any other tricks that can make a thread unschedulable short of using SIGSTOP. From the documentation it seems it's not possible to set a thread's CPU affinity to zero (leaving it in a limbo state where it's technically runnable except no processors are available for it to run on).
upon crash, my application runs llvm-symbolizer
That is likely to cause deadlocks. I can't find any statement about llvm-symbolizer being async-signal safe. It's likely to call malloc, and if so will surely deadlock if the crash also happens inside malloc (e.g. due to heap corruption elsewhere).
Switching to FIFO scheduling policy and raising priority therefore becoming the only runnable thread in that group.
I believe you are mistaken: a SCHED_FIFO thread will run so long as it is runnable (i.e. does not issue any blocking system calls). If the thread does issue such a call (which it has to: to e.g. open the separate .dwp file), it will block and other threads will become runnable.
TL;DR: there is no easy way to achieve what you want, and it seems unnecessary anyway: what do you care that other threads continue running while the crashing thread finishes its business?
This is the best solution I could come up with (parts omitted for brevity, but it shows the principle), my basic assumption being that in this situation the process runs as root. This approach can lead to resource starvation if things go really bad, and requires privileges (if I understand the sched(7) man page correctly). I run the part of the signal handler that causes preemptions under the OSSplHigh guard and exit the scope as soon as I can. This is not strictly C++-related, since the same could be done in C or any other native language.
void spl_get(spl_t& O)
{
    os_assert(syscall(__NR_sched_getattr,
                      0, &O, sizeof(spl_t), 0) == 0);
}

void spl_set(spl_t& N)
{
    os_assert(syscall(__NR_sched_setattr,
                      0, &N, 0) == 0);
}

void splx(uint32_t PRI, spl_t& O)
{
    spl_get(O);                     // save the old scheduling attributes
    spl_t PL = {0};
    PL.size = sizeof(PL);
    PL.sched_policy = SCHED_FIFO;
    PL.sched_priority = PRI;
    spl_set(PL);
}

class OSSplHigh {
    os::spl_t OldPrioLevel;
public:
    OSSplHigh()  { os::splx(2, OldPrioLevel); }
    ~OSSplHigh() { os::spl_set(OldPrioLevel); }
};
The handler itself is quite trivial using sigaltstack and sigaction, though I do not block SIGSEGV on any thread. Also, oddly enough, the sched_setattr and sched_getattr syscalls and the struct definition weren't exposed through glibc, contrary to the documentation.
Late edit: The best solution involved sending SIGSTOP to all threads, keeping a ledger of all running threads by intercepting pthread_create via the linker's --wrap option. Thank you for the suggestion in the comments.

What are the possible threats that a waiting pthread_mutex might encounter?

If a pthread is locking a shared resource, is there any threat that a waiting pthread_mutex might encounter? Something like a limit on the number of parallel pthreads, a time limit, an event, ...
As you can see in the specification, for example here, pthread_mutex_lock() has a int return value. Apart from the trivial/obvious error causes, such as "invalid argument" etc, there is one which can actually be considered a "threat". Especially a threat to people who do not check return values.
This threat is the return value EAGAIN, which if not caught properly can cause your program to become faulty: it may access the resource the mutex is supposed to protect without having acquired the mutex. EAGAIN can happen, for example, if the process received a System V "signal" and this thread is affected by it.
In general, using Unix System V constructs (such as signals) in parallel with Posix threads is at least dangerous. In Unix System V, threads did not exist and it was clear that the single main thread of a process was "interrupted" and used to handle the signal (using a stack-switch to the signal stack). Any kernel side blocking of the main thread got interrupted, the blocking function returns with EAGAIN and has to re-issue its call after handling the signal.
Hence, unfortunately the only fool-proof way of coding on Posix/Unix systems involves an abundance of while loops around anything which might block.
while( EAGAIN == pthread_mutex_lock(...) );
Not doing that would mean that your code can only be used in applications which clearly exert full control over signal behavior. Such as disabling all signals or using other means to ensure that the thread executing this code will not be affected.
Apart from this, Mutexes are system resources (kernel objects) and the amount available is not infinite, yet not usually something to worry about. I hope for other answers to elaborate on such limits.
EDIT It seems the documentation has changed in the past few years. Now they state that EAGAIN is related to the limit of recursive locks and that EINTR shall not happen. In the past, at least, there were systems/documentations which conformed with my explanation above.
Also new (at least to me):
If a signal is delivered to a thread waiting for a mutex, upon return from the signal handler the thread shall resume waiting for the mutex as if it was not interrupted.
Well, maybe they learned something since I last was forced to work with such systems.

Why can a signal handler not run during system calls?

In Linux a process that is waiting for IO can be in either of the states TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE.
The latter, for example, is the case when the process waits until the read from a regular file completes. If this read takes very long or forever (for whatever reasons), it leads to the problem that the process cannot receive any signals during that time.
The former is the case if, for example, one waits for a read from a socket (which may take an unbounded amount of time). If such a read call gets interrupted, however, it requires the programmer/user to handle the errno == EINTR condition correctly, which he might forget.
My question is: Wouldn't it be possible to allow all system calls to be interrupted by signals and, moreover, in such a way that after the signal handler ran, the original system call simply continues its business? Wouldn't this solve both problems? If it is not possible, why?
The two halves of the question are related but I may have digested too much spiked eggnog to determine if there is so much a one-to-one as opposed to a one-to-several relationship here.
On the kernel side, I think a TASK_KILLABLE state was added a couple of years back. The consequence was that while the process blocked in a system call (for whatever reason), if the kernel encountered a signal that was going to kill the process anyway, it would let nature take its course. That reduces the chances of a process being permanently stuck in an uninterruptible state. What progress has been made in changing individual system calls to avail themselves of that option, I do not know.
On the user side, SA_RESTART and its convenience function cousin siginterrupt go some distance to alleviating the application programmer pain. But the fact that these are specified per signal and aren't honored by all system calls is a pretty good hint that there are reasons why a blanket interruption scheme like you propose can't be implemented, or at least easily implemented. Just thinking about the possible interactions of the matrices of signals, individual system calls, calls that have historically expected and backwardly supported behavior (and possibly multiple historical semantics based on their system bloodline!), is a bit dizzying. But maybe that is the eggnog.

Is there a point to trapping "segfault"?

I know that, given enough context, one could hope to recover constructively from a segfault condition.
But, is the effort worth it? If yes, in what situation(s) ?
You can't really hope to recover from a segfault. You can detect that it happened, and dump out relevant application-specific state if possible, but you can't continue the process. This is because (amongst other reasons):
The thread which failed cannot be continued, so your only options are longjmp or terminating the thread. Neither is safe in most cases.
Either way, you may leave a mutex / lock in a locked state which causes other threads to wait forever
Even if that doesn't happen, you may leak resources
Even if you don't do either of those things, the thread which segfaulted may have left the internal state of the application inconsistent when it failed. An inconsistent internal state could cause data errors or further bad behaviour subsequently which causes more problems than simply quitting
So in general, there is no point in trapping it and doing anything EXCEPT terminating the process in a fairly abrupt fashion. There's no point in attempting to write (important) data back to disc, or continue to do other useful work. There is some point in dumping out state to logs- which many applications do - and then quitting.
A possibly useful thing to do might be to exec() your own process, or have a watchdog process which restarts it in the case of a crash. (NB: exec does not always have well defined behaviour if your process has >1 thread)
A number of the reasons:
To provide more application specific information to debug a crash. For instance, I crashed at stage 3 processing file 'x'.
To probe whether certain memory regions are accessible. This was mostly to satisfy an API for an embedded system. We would try to write to the memory region and catch the segfault that told us that the memory was read-only.
The segfault usually originates with a signal from the MMU, which is used by the operating system to swap in pages of memory if necessary. If the OS doesn't have that page of memory, it then forwards the signal onto the application.
A segmentation fault is really accessing memory that you do not have permission to access (either because it's not mapped, you don't have permissions, it's an invalid virtual address, etc.).
Depending on the underlying reason, you may want to trap and handle the segmentation fault. For instance, if your program is passed an invalid virtual address, it may log that segfault and then do some damage control.
A segfault does not necessarily mean that the program heap is corrupted. Reading an invalid address ( eg. null pointer ) may result in a segfault, but that does not mean that the heap is corrupted. Also, an application can have multiple heaps depending on the C runtime.
There are very advanced techniques that one might implement by catching a segmentation fault, if you know the segmentation fault isn't an error. For example, you can protect pages so that you can't read from them, and then trap SIGSEGV to perform "magical" behavior before the read completes. (See Tomasz Węgrzanowski's "Segfaulting own programs for fun and profit" for an example of what you might do, but usually the overhead is pretty high, so it's not worth doing.)
A similar principle applies to trapping an illegal instruction exception (usually in the kernel) to emulate an instruction that's not implemented on your processor.
To log a crash stack trace, for example.
No. I think it is a waste of time: a segfault indicates there is something wrong in your code, and you will be better advised to find this by examining a core dump and/or your source code. The one time I tried trapping a segfault led me off into a hall of mirrors which I could have avoided by simply thinking about the source code. Never again.

How to stop long executing threads gracefully?

I have a threading problem with Delphi. I guess this is common in other languages too. I have a long process which I run in a thread, and which fills a list in the main window. But if some parameters change in the meantime, I should stop the currently executing thread and start from the beginning. Delphi suggests terminating a thread by setting Terminated := True and checking this variable's value in the thread. However, my problem is that the long-executing part is buried in a library call, and inside this call I cannot check the Terminated variable. Therefore I have to wait for this library call to finish, which affects the whole program.
What is the preferred way to do in this case? Can I kill the thread immediately?
The preferred way is to modify the code so that it doesn't block without checking for cancellation.
Since you can't modify the code, you can't do that; you either have to live with the background operation (but you can disassociate it from any UI, so that its completion will be ignored); or alternatively, you can try terminating it (TerminateThread API will rudely terminate any thread given its handle). Termination isn't clean, though, like Rob says, any locks held by the thread will be abandoned, and any cross-thread state protected by such locks may be in a corrupted state.
Can you consider calling the function in a separate executable? Perhaps using RPC (pipes, TCP, rather than shared memory owing to same lock problem), so that you can terminate a process rather than terminating a thread? Process isolation will give you a good deal more protection. So long as you aren't relying on cross-process named things like mutexes, it should be far safer than killing a thread.
The threads need to co-operate to achieve a graceful shutdown. I am not sure if Delphi offers a mechanism to abort another thread, but such mechanisms are available in .NET and Java, but should be considered an option of last resort, and the state of the application is indeterminate after they have been used.
If you can kill a thread at an arbitrary point, then you may kill it while it is holding a lock in the memory allocator (for example). This will leave your program open to hanging when your main thread next needs to access that lock.
If you can't modify the code to check for termination, then just set its priority really low, and ignore it when it returns.
I wrote this in reply to a similar question:
I use an exception-based technique that's worked pretty well for me in a number of Win32 applications.
To terminate a thread, I use QueueUserAPC to queue a call to a function which throws an exception. However, the exception that's thrown isn't derived from the type "Exception", so it will only be caught by my thread's wrapper procedure.
I've used this with C++Builder apps very successfully. I'm not aware of all the subtleties of Delphi vs C++ exception handling, but I'd expect it could easily be modified to work.
