How do sandboxing environments recover from faults?

In JavaScript runtimes and other JIT-compiled environments where code needs to be sandboxed, how can the larger program recover when the dynamically loaded code faults?
For instance, the code that SpiderMonkey generates for WebAssembly contains instructions that raise SIGILL when they are executed (e.g., as part of a bounds check). When such a fault is raised, I suppose the signal handler of the JavaScript engine is called. But such a signal handler is very restricted in what it may do. To jump out of the faulting dynamically generated code it would typically use siglongjmp, but if I understand the notes at the bottom of man signal-safety correctly, once that happens the program may never again call a function that is not async-signal-safe. Such a restriction is of course unreasonable for something like a JS runtime.
So how do sandboxing environments recover from child faults without losing abilities (in particular, becoming restricted to only async-signal-safe functions)?
One thought I had is that perhaps the sandboxed code is run in another thread with its own signal handler, so that the main thread does not lose any abilities when the child thread receives a signal. But I don't know whether that is correct; I find it hard to find documentation on how exactly signals and threads interact.
As you will see this question is tagged posix, but I am in principle also interested in the Windows approach.


Main thread context returning in a different hardware thread, undefined behaviour?

I'm currently working on a concurrency library for C++ with green threads using work stealing for load balancing between schedulers on multiple hardware threads.
I've pinned the main context to its hardware thread, meaning it cannot be stolen and therefore migrate to other hardware threads.
I believe I read somewhere that if the main context were to be stolen and then return (exit) on a different hardware thread from the one it originated on, the behaviour is undefined. But I cannot find any sources on this now.
Is this undefined behaviour? Citing sources would be perfect.
First of all, there are no green threads in standard C++. So the whole thing with making your own stacks and replacing them is undefined behaviour. UB doesn't mean that something will not work. It means that there is no guarantee that it will work on any standard compliant system.
Green threads/coroutines balance on the edge of UB and need a separate implementation for every supported platform. They work because they were written for specific platforms and contain kludges to overcome the different pitfalls on those platforms.

Why can I not block the main thread in WinRT (Windows Store App)?

This question is not about "should I block my main thread", as it is generally a bad idea to block a main/STA/UI thread that handles messaging and UI operations, but about why WinRT C++/CX doesn't allow any blocking of the main thread, in contrast to iOS, Android, and even C# (await doesn't actually block, though).
Is there a fundamental difference in the way Android or iOS block the main thread? Why is WinRT the only platform that doesn't allow any form of blocking synchronization?
EDIT: I'm aware of co_await in VS2015, but for backward compatibility my company still uses VS2013.
Big topic, at break-neck speed. This continues a tradition that started a long time ago in COM. WinRT inherits about all of the same concepts, it did get cleaned-up considerably. The fundamental design consideration is that thread-safety is one of the most difficult aspects of library design. And that any library has classes that are fundamentally thread-unsafe and if the consumer of the library is not aware of it then he'll easily create a nasty bug that is excessively difficult to diagnose.
This is an ugly problem for a company that relies on a closed-source business model and a 1-800 support phone number. Such phone calls can be very unpleasant, threading bugs invariably require telling a programmer "you can't do that, you'll have to rewrite your code". Rarely an acceptable answer, not at SO either :)
So thread-safety is not treated as an afterthought that the programmer needs to get right by himself. A WinRT class explicitly specifies whether or not it is thread-safe (the ThreadingModel attribute) and, if it is used in an unsafe way anyway, what should happen to make it thread-safe (the MarshalingBehavior attribute). Mostly a runtime detail, do note how compiler warning C4451 can even make these attributes produce a compile-time diagnostic.
The "used in an unsafe way anyway" clause is what you are asking about. WinRT can make a class that is not thread-safe safe by itself but there is one detail that it can't figure out by itself. To make it safe, it needs to know whether the thread that creates an object of the class can support the operating system provided way to make the object safe. And if the thread doesn't then the OS has to create a thread by itself to give the object a safe home. Solves the problem but that is pretty inefficient since every method call has to be marshalled.
You have to make a promise, cross-your-heart-hope-to-die style. The operating system can avoid creating a thread if your thread solves the producer-consumer problem. Better known as "pumping the message loop" in Windows vernacular. Something the OS can't figure out by itself since you typically don't start to pump until after you created a thread-unsafe object.
And just one more promise you make: you also promise that the consumer doesn't block and stop accepting messages from the message queue. Blocking is bad, the implication being that worker threads can't continue while the consumer is blocking. And worse, much worse, blocking is pretty likely to cause deadlock. The threading problem that's always a significant risk when there are two synchronization objects involved. One that you block on, the other that's hidden inside the OS that is waiting for the call to complete. Diagnosing a deadlock when you can't see the state of one of the sync objects that caused the deadlock is generally unpleasant.
Emphasis on promise, there isn't anything the OS can do if you break the promise and block anyway. It will let you, and it doesn't necessarily have to be fatal. It often isn't and doesn't cause anything more than an unresponsive UI. Different in managed code that runs on the CLR, if it blocks then the CLR will pump. Mostly works, but can cause some pretty bewildering re-entrancy bugs. That mechanism doesn't exist in native C++. Deadlock isn't actually that hard to diagnose, but you do have to track down the thread that's waiting for the STA thread to get back to business. Its stack trace tells the tale.
Do beware of these attributes when you use C++/CX. Unless you explicitly provide them, you'll create a class that's always considered thread-safe (ThreadingModel = Both, MarshalingType = Standard). An aspect that is not often actually tested, it will be the client code that ruins that expectation. Well, you'll get a phone call and you have to give an unpleasant answer :) Also note that OSX and Android are hardly the only examples of runtime systems that don't provide the WinRT guarantees, the .NET Framework does not either.
In a nutshell: because the policy for WinRT apps was "thou shalt not block the UI thread" and the C++ PPL runtime enforces this policy whilst the .NET runtime does not -- look at ppltasks.h and search for prevent Windows Runtime STA threads from blocking the UI. (Note that although .NET doesn't enforce this policy, it lets you accidentally deadlock yourself instead).
If you have to block the thread, there are ways to do it using Win32 IPC mechanisms (like waiting on an event that will be signaled by your completion handler) but the general guidance is still "don't do that" because it has a poor UX.

Can I safely access potentially unallocated memory addresses?

I'm trying to create a memcpy-like function that will fail gracefully (i.e. return an error instead of segfaulting) when given an address in memory that is part of an unallocated page. I think the right approach is to install a SIGSEGV signal handler, and do something in the handler to make the memcpy function stop copying.
But I'm not sure what happens in the case my program is multithreaded:
Is it possible for the signal handler to execute in another thread?
What happens if a segfault isn't related to any memcpy operation?
How does one handle two threads executing memcpy concurrently?
Am I missing something else? Am I looking for something that's impossible to implement?
Trust me, you do not want to go down that road. It's a can of worms for many reasons. Correct signal handling is already hard in single-threaded environments, let alone in multithreaded code.
First of all, returning from a signal handler that was caused by an exception condition is undefined behavior - it works in Linux, but it's still undefined behavior nevertheless, and it will give you problems sooner or later.
From man 2 sigaction:
The behaviour of a process is undefined after it returns normally from
a signal-catching function for a SIGBUS, SIGFPE, SIGILL or SIGSEGV
signal that was not generated by kill(), sigqueue() or raise().
(Note: this does not appear on the Linux manpage; but it's in SUSv2)
This is also specified in POSIX. While it works in Linux, it's not good practice.
Below the specific answers to your questions:
Is it possible for the signal handler to execute in another thread?
Yes, it is. A signal is delivered to any thread that is not blocking it (but is delivered only to one, of course), although in Linux and many other UNIX variants, exception-related signals (SIGILL, SIGFPE, SIGBUS and SIGSEGV) are usually delivered to the thread that caused the exception. This is not required though, so for maximum portability you shouldn't rely on it.
You can use pthread_sigmask(3) to block signals in every thread but one; that way you make sure that every signal is always delivered to the same thread. This makes it easy to have a single thread dedicated to signal handling, which in turn allows you to do synchronous signal handling, because the thread may use sigwait(3) (note that multithreaded code should use sigwait(3) rather than sigsuspend(2)) to block until a signal is delivered and then handle it synchronously. This is a very common pattern.
What happens if a segfault isn't related to any memcpy operation?
Good question. The signal is delivered, and there is no (trivial) way to portably differentiate a genuine segfault from a segfault in memcpy(3).
If you have one thread taking care of every signal, like I mentioned above, you could use sigwaitinfo(2), and then examine the si_addr field of siginfo_t once sigwaitinfo(2) returns. The si_addr field is the memory location that caused the fault, so you could compare that to the memory addresses passed to memcpy(3).
But some platforms, most notably Mac OS, do not implement sigwaitinfo(2) or its cousin sigtimedwait(2).
So there's no way to do it portably.
How does one handle two threads executing memcpy concurrently?
I don't really understand this question, what's so special about multithreaded memcpy(3)? It is the caller's responsibility to make sure regions of memory being read from and written to are not concurrently accessed; memcpy(3) isn't (and never was) thread-safe if you pass it overlapping buffers.
Am I missing something else? Am I looking for something that's impossible to implement?
If you're concerned with portability, I would say it's pretty much impossible. Even if you just focus on Linux, it will be hard. If this was something easy to do, by this time someone would have probably done it already.
I think you're better off building your own allocator and force user code to rely on it. Then you can store state and manage allocated memory, and easily tell if the buffers passed are valid or not.
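A minimal sketch of that allocator idea, assuming a single-threaded caller (a real version would need a lock). The names tracked_alloc, tracked_free, is_tracked and checked_copy are hypothetical; the point is only that a copy can be refused up front when either buffer is not a known allocation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One node per live allocation handed out by tracked_alloc(). */
typedef struct region {
    uintptr_t start;
    size_t size;
    struct region *next;
} region;

static region *regions;

void *tracked_alloc(size_t size) {
    region *r = malloc(sizeof *r);
    void *p = malloc(size);
    if (r == NULL || p == NULL) {
        free(r);
        free(p);
        return NULL;
    }
    r->start = (uintptr_t)p;
    r->size = size;
    r->next = regions;
    regions = r;
    return p;
}

void tracked_free(void *p) {
    for (region **link = &regions; *link != NULL; link = &(*link)->next) {
        if ((*link)->start == (uintptr_t)p) {
            region *dead = *link;
            *link = dead->next;
            free(dead);
            free(p);
            return;
        }
    }
}

/* True if [p, p + n) lies entirely inside one tracked allocation. */
int is_tracked(const void *p, size_t n) {
    uintptr_t a = (uintptr_t)p;
    for (const region *r = regions; r != NULL; r = r->next)
        if (a >= r->start && n <= r->size && a - r->start <= r->size - n)
            return 1;
    return 0;
}

/* Returns 0 on success, -1 if either buffer is not a tracked range. */
int checked_copy(void *dst, const void *src, size_t n) {
    if (!is_tracked(dst, n) || !is_tracked(src, n))
        return -1;
    memmove(dst, src, n);
    return 0;
}
```

The linear list keeps the sketch short; with many allocations an ordered structure would make the lookup cheaper.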

"Multi-process" vs. "single-process multi-threading" for software modules communicating via messaging

We need to build a software framework (or middleware) that will enable messaging between different software components (or modules) running on a single machine. This framework will provide the following features:
Communication between modules is through 'messaging'.
Each module will have its own message queue and message handler thread that will synchronously handle each incoming message.
With the above requirements, which of the following approaches is the correct one (and why)?
Implementing modules as processes, and messaging through shared memory
Implementing modules as threads in a single process, and messaging by pushing message objects to the destination module's message queue.
Of course, there are some apparent pros and cons:
In Option-2, if one module causes segmentation fault, the process (thus the whole application) will crash. And one module can access/mutate another module's memory directly, which can lead to difficult-to-debug runtime errors.
But with Option-1, you need to take care of the states where a module you need to communicate with has just crashed. If there are N modules in the software, there can be 2^N alive/crashed states of the system that affect the algorithms running on the modules.
Again in Option-1, sender cannot assume that the receiver has received the message, because it might have crashed at that moment. (But the system can alert all the modules that a particular module has crashed; that way, sender can conclude that the receiver will not be able to handle the message, even though it has successfully received it)
I am in favor of Option-2, but I am not sure whether my arguments are solid enough or not. What are your opinions?
EDIT: Upon requests for clarification, here are more specification details:
This is an embedded application that is going to run on Linux OS.
Unfortunately, I cannot tell you about the project itself, but I can say that there are multiple components of the project, each component will be developed by its own team (of 3-4 people), and it has been decided that the communication between these components/modules is through some kind of messaging framework.
C/C++ will be used as programming language.
What the 'Module Interface API' will automatically provide to the developers of a module is: (1) a message/event handler thread loop, (2) a synchronous message queue, (3) a function pointer member variable where you can set your message handler function.
Here is what I could come up with:
Multi-process(1) vs. Single-process, multi-threaded(2):
Impact of segmentation faults: In (2), if one module causes a segmentation fault, the whole application crashes. In (1), modules have different memory regions and thus only the module that causes the segmentation fault will crash.
Message delivery guarantee: In (2), you can assume that message delivery is guaranteed. In (1) the receiving module may crash before receipt or during handling of the message.
Sharing memory between modules: In (2), the whole memory is shared by all modules, so you can directly send message objects. In (1), you need to use 'Shared Memory' between modules.
Messaging implementation: In (2), you can send message objects between modules, in (1) you need to use either of network socket, unix socket, pipes, or message objects stored in a Shared Memory. For the sake of efficiency, storing message objects in a Shared Memory seems to be the best choice.
Pointer usage between modules: In (2), you can use pointers in your message objects. The ownership of heap objects (accessed by pointers in the messages) can be transferred to the receiving module. In (1), you need to manually manage the memory (with custom malloc/free functions) in the 'Shared Memory' region.
Module management: In (2), you are managing just one process. In (1), you need to manage a pool of processes each representing one module.
Sounds like you're implementing Communicating Sequential Processes. Excellent!
Tackling threads vs processes first, I would stick to threads; the context switch times are faster (especially on Windows where process context switches are quite slow).
Second, shared memory vs a message queue; if you're doing full synchronous message passing it'll make no difference to performance. The shared memory approach involves a shared buffer that gets copied to by the sender and copied from by the reader. That's the same amount of work as is required for a message queue. So for simplicity's sake I would stick with the message queue.
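To make the in-process option concrete, here is a minimal sketch of one module with its own queue and handler thread, assuming pthreads. All names (module, module_start, module_send, module_stop, sum_handler) are hypothetical and not taken from any real framework. Note that this bounded queue is buffered: the sender only blocks when the queue is full, so it is not a zero-length rendezvous.

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8

typedef void (*message_handler)(int message, void *user);

typedef struct {
    int items[QUEUE_CAPACITY];
    size_t head, count;
    int closed;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
    message_handler handler;
    void *user;
    pthread_t thread;
} module;

/* Handler thread: takes one message at a time and handles it in order. */
static void *module_loop(void *arg) {
    module *m = arg;
    for (;;) {
        pthread_mutex_lock(&m->lock);
        while (m->count == 0 && !m->closed)
            pthread_cond_wait(&m->not_empty, &m->lock);
        if (m->count == 0) {                 /* closed and fully drained */
            pthread_mutex_unlock(&m->lock);
            return NULL;
        }
        int message = m->items[m->head];
        m->head = (m->head + 1) % QUEUE_CAPACITY;
        m->count--;
        pthread_cond_signal(&m->not_full);
        pthread_mutex_unlock(&m->lock);
        m->handler(message, m->user);        /* run outside the lock */
    }
}

int module_start(module *m, message_handler handler, void *user) {
    m->head = m->count = 0;
    m->closed = 0;
    m->handler = handler;
    m->user = user;
    pthread_mutex_init(&m->lock, NULL);
    pthread_cond_init(&m->not_empty, NULL);
    pthread_cond_init(&m->not_full, NULL);
    return pthread_create(&m->thread, NULL, module_loop, m);
}

/* Blocks the sender while the destination queue is full. */
void module_send(module *m, int message) {
    pthread_mutex_lock(&m->lock);
    while (m->count == QUEUE_CAPACITY)
        pthread_cond_wait(&m->not_full, &m->lock);
    m->items[(m->head + m->count) % QUEUE_CAPACITY] = message;
    m->count++;
    pthread_cond_signal(&m->not_empty);
    pthread_mutex_unlock(&m->lock);
}

/* Lets the handler thread drain the queue, then joins it. */
void module_stop(module *m) {
    pthread_mutex_lock(&m->lock);
    m->closed = 1;
    pthread_cond_signal(&m->not_empty);
    pthread_mutex_unlock(&m->lock);
    pthread_join(m->thread, NULL);
}

/* Example handler: accumulates the message values into *user. */
void sum_handler(int message, void *user) {
    *(int *)user += message;
}
```

Messages here are plain ints to keep the sketch short; a real module would carry a message struct or a pointer whose ownership passes to the receiver.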
In fact you might like to consider using a pipe instead of a message queue. You have to write code to make the pipe synchronous (they're normally asynchronous, which would be Actor Model; message queues can often be set to zero length which does what you want for it to be synchronous and properly CSP), but then you could just as easily use a socket instead. Your program can then become multi-machine distributed should the need arise, but you've not had to change the architecture at all. Also named pipes between processes are an equivalent option, so on platforms where process context switch times are good (e.g. Linux) the whole thread vs process question goes away. So working a bit harder to use a pipe gives you very significant scalability options.
Regarding crashing; if you go the multiprocess route and you want to be able to gracefully handle the failure of a process you're going to have to do a bit of work. Essentially you will need a thread at each end of the messaging channel simply to monitor the responsiveness of the other end (perhaps by bouncing a keep-awake message back and forth between themselves). These threads need to feed status info into their corresponding main thread to tell it when the other end has failed to send a keep-awake on schedule. The main thread can then act accordingly. When I did this I had the monitor thread automatically reconnect as and when it could (e.g. the remote process has come back to life), and tell the main thread that too. This means that bits of my system can come and go and the rest of it just copes nicely.
Finally, your actual application processes will end up as a loop, with something like select() at the top to wait for message inputs from all the different channels (and monitor threads) that it is expecting to hear from.
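The top of such a loop might look roughly like this on a Unix system. It is only a sketch: wait_for_message is a hypothetical helper, and the channels are assumed to be plain file descriptors (pipes or sockets) with values below FD_SETSIZE.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Waits up to timeout_ms for any channel to become readable.
 * Returns the index of the first readable channel, or -1 on
 * timeout or error. */
int wait_for_message(const int *fds, int count, int timeout_ms) {
    fd_set readable;
    FD_ZERO(&readable);

    int maxfd = -1;
    for (int i = 0; i < count; i++) {
        FD_SET(fds[i], &readable);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = select(maxfd + 1, &readable, NULL, NULL, &tv);
    if (ready <= 0)
        return -1;

    for (int i = 0; i < count; i++)
        if (FD_ISSET(fds[i], &readable))
            return i;
    return -1;
}
```

The application loop would call this repeatedly, read one message from the channel it returns, and dispatch it; a monitor thread can feed its status through one more pipe in the same set.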
By the way, this sort of thing is frustratingly hard to implement in Windows. There's just no proper equivalent of select() anywhere in any Microsoft language. There is a select() for sockets, but you can't use it on pipes, etc. like you can in Unix. The Cygwin guys had real problems implementing their version of select(). I think they ended up with a polling thread per file descriptor; massively inefficient.
Good luck!
Your question lacks a description of how the "modules" are implemented and what they do, and possibly a description of the environment in which you are planning to implement all of this.
For example:
If the modules themselves have some requirements which makes them hard to implement as threads (e.g. they use non-thread-safe 3rd party libraries, have global variables, etc.), your message delivery system will also not be implementable with threads.
If you are using an environment such as Python which does not handle thread parallelism very well (because of its global interpreter lock), and running on Linux, you will not gain any performance benefits with threads over processes.
There are more things to consider. If you are just passing data between modules, who says your system needs to use either multiple threads or multiple processes? There are other architectures which do the same thing without either of them, such as event-driven with callbacks (a message receiver can register a callback with your system, which is invoked when a message generator generates a message). This approach will be absolutely the fastest in any case where parallelism isn't important and where receiving code can be invoked in the execution context of the caller.
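As an illustration of the callback variant, here is a minimal single-threaded sketch. The names bus_subscribe, bus_publish and add_payload are hypothetical, and the fixed-size table is just a simplification.

```c
#include <stddef.h>

#define MAX_SUBSCRIPTIONS 32

typedef void (*message_callback)(int topic, const void *payload, void *user);

typedef struct {
    int topic;
    message_callback fn;
    void *user;
} subscription;

static subscription subscriptions[MAX_SUBSCRIPTIONS];
static size_t subscription_count;

/* Registers fn for a topic. Returns 0 on success, -1 if the table is full. */
int bus_subscribe(int topic, message_callback fn, void *user) {
    if (fn == NULL || subscription_count == MAX_SUBSCRIPTIONS)
        return -1;
    subscriptions[subscription_count++] = (subscription){ topic, fn, user };
    return 0;
}

/* Invokes every matching callback in the caller's own execution context.
 * Returns the number of receivers that were called. */
int bus_publish(int topic, const void *payload) {
    int delivered = 0;
    for (size_t i = 0; i < subscription_count; i++) {
        if (subscriptions[i].topic == topic) {
            subscriptions[i].fn(topic, payload, subscriptions[i].user);
            delivered++;
        }
    }
    return delivered;
}

/* Example receiver: adds the int payload to the int pointed to by user. */
void add_payload(int topic, const void *payload, void *user) {
    (void)topic;
    *(int *)user += *(const int *)payload;
}
```

There is no queue and no context switch: bus_publish returns only after every receiver has run, which is why this is fast but gives no parallelism.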
tl;dr: you have only scratched the surface with your question :)

How do you detect whether the calling thread of a function is already RTAI real-time?

I am working on a big project that uses RTAI both in kernel and user spaces. I won't get into the details of the project, but here is briefly where a problem arises.
In user-space, my project provides a library used by other people to write some software. Those programs themselves may have RTAI real-time threads.
Now, some functions in RTAI require that their calling thread has already called rt_thread_init, so if I want to use them in a function in the library, I need to temporarily make the calling thread real-time by calling rt_thread_init and later rt_task_delete.
Now here's the problem:
If the calling thread of my function IS already real-time, then I call rt_thread_init again, which I assume simply fails, but then I call rt_task_delete and make that thread non-real-time (besides the fact that RTAI crashes when the thread itself, assuming I changed nothing, calls rt_task_delete again).
If the calling thread of my function IS not real-time, everything is ok.
For now, I resorted to taking a parameter in the function so that the calling function tells the library if it is real-time or not. However, I wanted to know if RTAI has a function or something similar that I could use to automatically detect whether the current thread is real-time or not.
Don't know if there are any RTAI users here (I certainly didn't see the RTAI tag), but hope there would be.
Never tried it myself, so this is a guess - but did you consider using rt_whoami?
Get the task pointer of the current task.
I would imagine it will fail (return NULL?) if you are in a non RT task...
