Main thread context returning in a different hardware thread, undefined behaviour? - multithreading

I'm currently working on a concurrency library for C++ with green threads using work stealing for load balancing between schedulers on multiple hardware threads.
I've pinned the main context to its hardware thread, meaning it cannot be stolen and therefore cannot migrate to other hardware threads.
I believe I've read somewhere that if the main context were to be stolen and return (exit) on a different hardware thread from the one it originated on, the behaviour is undefined. But I cannot seem to find any sources on this now.
Is this undefined behaviour? Citing sources would be perfect.

First of all, there are no green threads in standard C++. So the whole business of making your own stacks and switching between them is undefined behaviour. UB doesn't mean that something will not work. It means that there is no guarantee that it will work on any standard-compliant system.
Green threads/coroutines balance on the edge of UB, requiring a separate implementation for every supported platform. They work because they were written for specific platforms and have kludges to overcome the different pitfalls on those platforms.

Related

Doing UI on a background thread

The SDL documentation for threading states:
NOTE: You should not expect to be able to create a window, render, or receive events on any thread other than the main one.
The glfw documentation for glfwCreateWindow states:
Thread safety: This function must only be called from the main thread.
I have read about issues from people who have tried to run the GLUT windowing functions on a second thread.
I could go on with these examples, but I think you get the point I'm trying to make. A lot of cross-platform libraries don't allow you to create a window on a background thread.
Now, two of the libraries I mentioned are designed with OpenGL in mind, and I get that OpenGL is not designed for multithreading and you shouldn't do rendering on multiple threads. That's fine. The thing that I don't understand is why the rendering thread (the single thread that does all the rendering) has to be the main one of the application.
As far as I know, neither Windows nor Linux nor macOS imposes any restrictions on which threads can create windows. I do know that windows have affinity to the thread that creates them (only that thread can receive input for them, etc.); but still, that thread does not need to be the main one.
So, I have three questions:
Why do these libraries impose such restrictions? Is it because there is some obscure operating system that mandates that all windows be created on the main thread, and so all operating systems have to pay the price? (Or did I get it wrong?)
Why do we have this imposition that you should not do UI on a background thread? What do threads have to do with windowing, anyways? Is it not a bad abstraction to tie your logic to a specific thread?
If this is what we have and can't get rid of it, how do I overcome this limitation? Do I make a ThreadManager class and yield the main thread to it so it can schedule what needs to be done in the main thread and what can be done in a background thread?
It would be amazing if someone could shed some light on this topic. All the advice I see thrown around is to just do input and UI both on the main thread. But that's just an arbitrary restriction if there isn't a technical reason why it isn't possible to do otherwise.
PS: Please note that I am looking for a cross platform solution. If it can't be found, I'll stick to doing UI on the main thread.
While I'm not quite up to date on the latest releases of MacOS/iOS, as of 2020 Apple UIKit and AppKit were not thread safe. Only one thread can safely change UI objects, and unless you go to a lot of trouble that's going to be the main thread. Even if you do go to all the trouble of closing the window manager connection etc etc you're still going to end up with one thread only doing UI. So the limitation still applies on at least one major system.
While it's possibly unsafe to directly modify the contents of a window from any other thread, you can do software rendering to an offscreen bitmap image from any thread you like, taking as long as you like. Then hand the finished image over to the main thread for rendering. (The "possibly" is why cross-platform toolkits disallow it or tell you not to do it. Sometimes it might work, but you can't say why, or even that it will keep working.)
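To make that hand-off concrete, here is a minimal sketch (illustrative names only, no particular toolkit assumed): a worker thread software-renders into its own pixel buffer and publishes the finished frame for the main thread to pick up and draw:

#include <chrono>
#include <cstdint>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

struct FrameMailbox {
    std::mutex m;
    std::optional<std::vector<std::uint32_t>> latest;   // e.g. ARGB pixels
};

void render_worker(FrameMailbox &box, int w, int h) {
    std::vector<std::uint32_t> pixels(static_cast<std::size_t>(w) * h, 0xFF000000u);
    // ... fill 'pixels' with a software-rendered frame; take as long as needed ...
    std::lock_guard<std::mutex> lock(box.m);
    box.latest = std::move(pixels);                      // publish the finished frame
}

int main() {
    FrameMailbox box;
    std::thread worker(render_worker, std::ref(box), 640, 480);

    // Main (UI) thread: poll the mailbox once per UI tick, blit when a frame arrives.
    for (;;) {
        {
            std::lock_guard<std::mutex> lock(box.m);
            if (box.latest) { /* blit *box.latest into the window here */ break; }
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(16));
    }
    worker.join();
    return 0;
}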
With Vulkan and DirectX 12 (and I think but am not sure Metal) you can render from multiple threads. Woohoo! Of course now you have to figure out how to do all the coordination and locking and cross-synching without making the whole thing slower than single threaded, but at least you have the option to try.
Adding to the excellent answer by Matt, with Qt programs you can use invokeMethod and postEvent to have background threads update the UI safely.
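For example, a minimal sketch assuming Qt 5.10 or newer (the QLabel here is just a stand-in for whatever UI object the background thread wants to touch):

#include <QApplication>
#include <QLabel>
#include <QMetaObject>
#include <QThread>

int main(int argc, char *argv[]) {
    QApplication app(argc, argv);
    QLabel label("waiting...");
    label.show();

    QThread *worker = QThread::create([&label] {
        // ... long-running work here ...
        // Queue the UI update onto the thread that owns 'label' (the GUI thread).
        QMetaObject::invokeMethod(&label, [&label] {
            label.setText("done");
        }, Qt::QueuedConnection);
    });
    worker->start();

    int rc = app.exec();
    worker->wait();
    delete worker;
    return rc;
}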
It's highly unlikely that any of these frameworks actually care about which thread is the 'main thread', i.e., the one that called the entry point to your code. The real restriction is that you have to do all your UI work on the thread that initialized the framework, i.e., the one that called SDL_Init in your case. You will usually do this in your main thread. Why not?
Multithreaded code is difficult to write and difficult to understand, and in UI work, introducing multithreading makes it difficult to reason about when things happen. A UI is a very stateful thing, and when you're writing UI code, you usually need to have a very good idea about what has happened already and what will happen next -- those things are often undefined when multithreading is involved. Also, users are slow, so multithreading the UI is not really necessary for performance in normal cases. Because of all this, making a UI framework thread-safe isn't usually considered beneficial. (multithreading compute-intensive parts of your rendering pipeline is a different thing)
Single-threaded UI frameworks have a dispatcher of some sort that you can use to enqueue activities that should happen on the main thread when it next has time. In SDL, you use SDL_PushEvent for this. You can call that from any thread.
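A minimal SDL2 sketch of that pattern (error handling omitted; WORK_DONE is a custom event type registered just for this example):

#include <SDL.h>
#include <thread>

int main(int, char **) {
    SDL_Init(SDL_INIT_VIDEO);
    SDL_Window *win = SDL_CreateWindow("demo", SDL_WINDOWPOS_CENTERED,
                                       SDL_WINDOWPOS_CENTERED, 640, 480, 0);

    const Uint32 WORK_DONE = SDL_RegisterEvents(1);   // our custom event type

    std::thread worker([WORK_DONE] {
        // ... long-running work here ...
        SDL_Event ev;
        SDL_zero(ev);
        ev.type = WORK_DONE;
        SDL_PushEvent(&ev);          // safe to call from any thread
    });

    bool running = true;
    while (running) {
        SDL_Event ev;
        if (SDL_WaitEvent(&ev)) {
            if (ev.type == SDL_QUIT) running = false;
            else if (ev.type == WORK_DONE) { /* update the UI here, on the main thread */ }
        }
    }

    worker.join();
    SDL_DestroyWindow(win);
    SDL_Quit();
    return 0;
}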

How do sandboxing environments recover from faults?

In JavaScript runtimes and other JIT-compiled environments where code needs to be sandboxed, how can the larger program recover when the dynamically loaded code faults?
For instance, the code that SpiderMonkey generates for WebAssembly contains instructions which raise SIGILL when they are executed (e.g., to do a bounds check). When such a fault is raised, I suppose the signal handler of the JavaScript engine is called. But such a signal handler is very restricted in its abilities. To jump out of the bad dynamically generated code it would typically use siglongjmp, but if I understand the notes at the bottom of man signal-safety correctly, if that happens, then no other unsafe function may be used by the program ever again. Such a restriction is of course unreasonable for a JS runtime (e.g.).
So how do sandboxing environments recover from child faults without losing abilities (in particular, becoming restricted to only async-safe functions)?
One thought I had is that perhaps the sandboxed code is run in another thread with its own signal handler, so that the main thread does not lose any abilities when the child thread receives a signal. But I don't know whether that is correct; I find it hard to find documentation on exactly how signals and threads interact.
As you will see this question is tagged posix, but I am in principle also interested in the Windows approach.
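For reference, a minimal POSIX sketch of the sigsetjmp/siglongjmp escape pattern the question describes; it only illustrates the mechanics and deliberately does not settle the async-signal-safety concern raised above:

#include <setjmp.h>
#include <signal.h>
#include <cstdio>

static thread_local sigjmp_buf recovery_point;

extern "C" void on_sigill(int) {
    siglongjmp(recovery_point, 1);   // unwind back to the guard on this thread
}

int main() {
    struct sigaction sa {};
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NODEFER;        // allow the handler to fire again later
    sigaction(SIGILL, &sa, nullptr);

    if (sigsetjmp(recovery_point, 1) == 0) {
        raise(SIGILL);               // stand-in for entering JIT-generated code that traps
        std::puts("not reached");
    } else {
        std::puts("recovered from the fault; the process keeps running");
    }
    return 0;
}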

Why can I not block the main thread in WinRT (Windows Store App)?

This question is not about "should I block my main thread", as it is generally a bad idea to block a main/STA/UI thread used for messaging and UI operations, but about why WinRT C++/CX doesn't allow any blocking of the main thread, compared to iOS, Android, and even C# (although await doesn't actually block).
Is there a fundamental difference in the way Android or iOS block the main thread? Why is WinRT the only platform that doesn't allow any form of blocking synchronization?
EDIT: I'm aware of co_await in VS2015, but due to backward compatibility my company still uses VS2013.
Big topic, at break-neck speed. This continues a tradition that started a long time ago in COM. WinRT inherits about all of the same concepts, though it did get cleaned up considerably. The fundamental design consideration is that thread-safety is one of the most difficult aspects of library design, and that any library has classes that are fundamentally thread-unsafe; if the consumer of the library is not aware of it, he'll easily create a nasty bug that is excessively difficult to diagnose.
This is an ugly problem for a company that relies on a closed-source business model and a 1-800 support phone number. Such phone calls can be very unpleasant, threading bugs invariably require telling a programmer "you can't do that, you'll have to rewrite your code". Rarely an acceptable answer, not at SO either :)
So thread-safety is not treated as an afterthought that the programmer needs to get right by himself. A WinRT class explicitly specifies whether or not it is thread-safe (the ThreadingModel attribute) and, if it is used in an unsafe way anyway, what should happen to make it thread-safe (the MarshalingBehavior attribute). Mostly a runtime detail, but do note how compiler warning C4451 can even make these attributes produce a compile-time diagnostic.
The "used in an unsafe way anyway" clause is what you are asking about. WinRT can make a class that is not thread-safe safe by itself but there is one detail that it can't figure out by itself. To make it safe, it needs to know whether the thread that creates an object of the class can support the operating system provided way to make the object safe. And if the thread doesn't then the OS has to create a thread by itself to give the object a safe home. Solves the problem but that is pretty inefficient since every method call has to be marshalled.
You have to make a promise, cross-your-heart-hope-to-die style. The operating system can avoid creating a thread if your thread solves the producer-consumer problem. Better known as "pumping the message loop" in Windows vernacular. Something the OS can't figure out by itself since you typically don't start to pump until after you created a thread-unsafe object.
And just one more promise you make: you also promise that the consumer doesn't block and stop accepting messages from the message queue. Blocking is bad; implicit is that worker threads can't continue while the consumer is blocking. And worse, much worse, blocking is pretty likely to cause deadlock. That's the threading problem that is always a significant risk when there are two synchronization objects involved: one that you block on, the other hidden inside the OS, waiting for the call to complete. Diagnosing a deadlock when you can't see the state of one of the sync objects that caused it is generally unpleasant.
Emphasis on promise, there isn't anything the OS can do if you break the promise and block anyway. It will let you, and it doesn't necessarily have to be fatal. It often isn't and doesn't cause anything more than an unresponsive UI. Different in managed code that runs on the CLR: if it blocks then the CLR will pump. Mostly works, but can cause some pretty bewildering re-entrancy bugs. That mechanism doesn't exist in native C++. Deadlock isn't actually that hard to diagnose, but you do have to find the thread that's waiting for the STA thread to get back to business. Its stack trace tells the tale.
Do beware of these attributes when you use C++/CX. Unless you explicitly provide them, you'll create a class that's always considered thread-safe (ThreadingModel = Both, MarshalingType = Standard). An aspect that is not often actually tested; it will be the client code that ruins that expectation. Well, you'll get a phone call and you have to give an unpleasant answer :) Also note that OSX and Android are hardly the only examples of runtime systems that don't provide the WinRT guarantees; the .NET Framework does not either.
In a nutshell: because the policy for WinRT apps was "thou shalt not block the UI thread" and the C++ PPL runtime enforces this policy whilst the .NET runtime does not -- look at ppltasks.h and search for "prevent Windows Runtime STA threads from blocking the UI". (Note that although .NET doesn't enforce this policy, it lets you accidentally deadlock yourself instead.)
If you have to block the thread, there are ways to do it using Win32 IPC mechanisms (like waiting on an event that will be signaled by your completion handler) but the general guidance is still "don't do that" because it has a poor UX.
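A hedged sketch of that (generally discouraged) Win32 workaround: an event object is signaled by the completion handler and waited on elsewhere. The asynchronous operation here is faked with a plain std::thread:

#include <windows.h>
#include <thread>

int main() {
    // Manual-reset event, initially unsignaled.
    HANDLE done = CreateEventW(nullptr, TRUE, FALSE, nullptr);

    // Stand-in for an async operation whose completion handler signals the event.
    std::thread async_op([done] {
        // ... do the work / receive the result ...
        SetEvent(done);
    });

    // Blocking wait -- fine on a worker thread, but doing this on the UI/STA
    // thread is exactly what the WinRT policy is trying to prevent.
    WaitForSingleObject(done, INFINITE);

    async_op.join();
    CloseHandle(done);
    return 0;
}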

What is the downside of XInitThreads()?

I know XInitThreads() will allow me to do calls to the X server from threads other than the main thread, and that concurrent thread support in Xlib is necessary if I want to make OpenGL calls from secondary threads using Qt. I have such a need for my application, but in a very rare context. Unfortunately, XInitThreads() needs to be called at the very beginning of my application's execution and will therefore affect it whether I need it or not for a particular run (I have no way of knowing before I run the app if I do need multithreaded OpenGL support or not).
I'm pretty sure the overall behavior of the application will remain unchanged if I uselessly call XInitThreads(), but programming is all about tradeoffs, and I'm pretty sure there's a reason multithread support is not the default behavior for Xlib.
The man page says that it is recommended that single-threaded programs not call this function, but it doesn't say why. What's the tradeoff when calling XInitThreads()?
It creates a global lock and also one on each Display, so each call into Xlib will do a lock/unlock. If you instead do your own locking (keep your Xlib usage in a single thread at a time with your own locks) then you could in theory have less locking overhead by taking your lock, doing a lot of Xlib stuff at once, and then dropping your lock. Or in practice most toolkits use a model where they don't lock at all but just require apps to use X only in the main UI thread.
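A sketch of that "do your own locking" alternative, with an application-level mutex and several Xlib calls batched per lock acquisition (names and window setup are illustrative only):

#include <X11/Xlib.h>
#include <mutex>

std::mutex x11_mutex;   // application-level lock; every thread that touches Xlib must honour it

void draw_batch(Display *dpy, Window win, GC gc) {
    std::lock_guard<std::mutex> lock(x11_mutex);
    // Many Xlib calls under a single lock acquisition, instead of per-call locking.
    for (int i = 0; i < 100; ++i)
        XDrawLine(dpy, win, gc, 0, i, 100, i);
    XFlush(dpy);
}

int main() {
    Display *dpy = XOpenDisplay(nullptr);
    if (!dpy) return 1;
    Window win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy), 0, 0, 200, 200, 0, 0, 0);
    GC gc = XCreateGC(dpy, win, 0, nullptr);
    XMapWindow(dpy, win);
    draw_batch(dpy, win, gc);     // could just as well be called from a worker thread
    XSync(dpy, False);
    XCloseDisplay(dpy);
    return 0;
}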
Since many Xlib calls are blocking (they wait for a reply from the server) lock contention could be an issue.
There are also some semantic pains with locking per-Xlib-call on Display; between each Xlib call on thread A, in theory any other Xlib call could have been made on thread B. So threads can stomp on each other in terms of confusing the display state, even though only one of them makes an Xlib request at a time. That is, XInitThreads() is preventing crashes/corruption due to concurrent Display access, but it isn't really taking care of any of the semantic considerations of having multiple threads sharing an X server connection.
I think the need to make your own semantic sense out of concurrent display access is one reason people don't bother with XInitThreads per-Xlib-call locking, since they end up with application-level or toolkit-level locks anyway.
One approach here is to open a separate Display connection in each thread, which could make sense depending on what you are doing.
Another approach is to use the newer xcb API rather than Xlib; xcb is designed from scratch to be multithreaded and nonblocking.
Historically there have also been some bugs in XInitThreads() in some OS/Xlib versions, though I don't remember any specifics, I do remember seeing those go by.

Why might threads be considered "evil"?

I was reading the SQLite FAQ, and came upon this passage:
Threads are evil. Avoid them.
I don't quite understand the statement "Threads are evil". If that is true, then what is the alternative?
My superficial understanding of threads is:
Threads make concurrency happen. Otherwise, the CPU horsepower will be wasted waiting for (e.g.) slow I/O.
But the bad thing is that you must synchronize your logic to avoid contention and you have to protect shared resources.
Note: As I am not familiar with threads on Windows, I hope the discussion will be limited to Linux/Unix threads.
When people say that "threads are evil", they usually do so in the context of saying "processes are good". Threads implicitly share all application state and handles (and thread-locals are opt-in). This means that there are plenty of opportunities to forget to synchronize (or not even understand that you need to synchronize!) while accessing that shared data.
Processes have separate memory space, and any communication between them is explicit. Furthermore, primitives used for interprocess communication are often such that you don't need to synchronize at all (e.g. pipes). And you can still share state directly if you need to, using shared memory, but that is also explicit in every given instance. So there are fewer opportunities to make mistakes, and the intent of the code is more explicit.
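As a small POSIX illustration of that point, a parent and a forked child communicating over a pipe need no locking of their own; the kernel serializes the transfer and all sharing is explicit:

#include <unistd.h>
#include <sys/wait.h>
#include <cstdio>

int main() {
    int fds[2];
    if (pipe(fds) != 0) return 1;

    pid_t pid = fork();
    if (pid == 0) {                      // child: the producer
        close(fds[0]);
        const char msg[] = "result from child";
        write(fds[1], msg, sizeof msg);
        close(fds[1]);
        _exit(0);
    }

    // parent: the consumer
    close(fds[1]);
    char buf[64] = {0};
    read(fds[0], buf, sizeof buf - 1);
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    std::printf("parent received: %s\n", buf);
    return 0;
}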
Simple answer the way I understand it...
Most threading models use "shared state concurrency," which means that two threads of execution can share the same memory at the same time. If one thread doesn't know what the other is doing, it can modify the data in a way that the other thread doesn't expect. This causes bugs.
Threads are "evil" because you need to wrap your mind around n threads all working on the same memory at the same time, and all of the fun things that go with it (deadlocks, race conditions, etc).
You might read up about the Clojure (immutable data structures) and Erlang (message passing) concurrency models for alternative ideas on how to achieve similar ends.
What makes threads "evil" is that once you introduce more than one stream of execution into your program, you can no longer count on your program to behave in a deterministic manner.
That is to say: Given the same set of inputs, a single-threaded program will (in most cases) always do the same thing.
A multi-threaded program, given the same set of inputs, may well do something different every time it is run, unless it is very carefully controlled. That is because the order in which the different threads run different bits of code is determined by the OS's thread scheduler combined with a system timer, and this introduces a good deal of "randomness" into what the program does when it runs.
The upshot is: debugging a multi-threaded program can be much harder than debugging a single-threaded program, because if you don't know what you are doing it can be very easy to end up with a race condition or deadlock bug that only appears (seemingly) at random once or twice a month. The program will look fine to your QA department (since they don't have a month to run it) but once it's out in the field, you'll be hearing from customers that the program crashed, and nobody can reproduce the crash.... bleah.
To sum up, threads aren't really "evil", but they are strong juju and should not be used unless (a) you really need them and (b) you know what you are getting yourself into. If you do use them, use them as sparingly as possible, and try to make their behavior as stupid-simple as you possibly can. Especially with multithreading, if anything can go wrong, it (sooner or later) will.
I would interpret it another way. It's not that threads are evil, it's that side-effects are evil in a multithreaded context (which is a lot less catchy to say).
A side effect in this context is something that affects state shared by more than one thread, be it global or just shared. I recently wrote a review of Spring Batch and one of the code snippets used is:
private static Map<Long, JobExecution> executionsById =
        TransactionAwareProxyFactory.createTransactionalMap();
private static long currentId = 0;              // shared mutable static, neither volatile nor synchronized

public void saveJobExecution(JobExecution jobExecution) {
    Assert.isTrue(jobExecution.getId() == null);
    Long newId = currentId++;                   // non-atomic read-modify-write: two threads can get the same id
    jobExecution.setId(newId);
    jobExecution.incrementVersion();
    executionsById.put(newId, copy(jobExecution));  // with duplicate ids, one execution silently overwrites another
}
Now there are at least three serious threading issues in less than 10 lines of code here. An example of a side effect in this context would be updating the currentId static variable.
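For contrast, here is a minimal C++ sketch of the same pattern with the side effects made safe: an atomic counter and a mutex-guarded map (names are illustrative, not taken from Spring Batch):

#include <atomic>
#include <map>
#include <mutex>

struct JobExecution { long id = -1; /* ... */ };

std::atomic<long> current_id{0};
std::mutex executions_mutex;
std::map<long, JobExecution> executions_by_id;

void save_job_execution(JobExecution exec) {
    exec.id = current_id.fetch_add(1);          // atomic: each caller gets a unique id
    std::lock_guard<std::mutex> lock(executions_mutex);
    executions_by_id[exec.id] = exec;           // map mutation serialized by the mutex
}

int main() {
    save_job_execution(JobExecution{});
    save_job_execution(JobExecution{});
    return 0;
}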
Functional languages (Haskell, Scheme, OCaml, Lisp, and others) tend to espouse "pure" functions. A pure function is one with no side effects. Many imperative languages (e.g. Java, C#) also encourage the use of immutable objects (an immutable object is one whose state cannot change once created).
The reason for (or at least the effect of) both of these things is largely the same: they make multithreaded code much easier. A pure function by definition is threadsafe. An immutable object by definition is threadsafe.
The advantage processes have is that there is less shared state (generally). In traditional UNIX C programming, doing a fork() to create a new process gives the child a copy of the parent's state (plus shared file descriptors), and this was used as a means of IPC (inter-process communication), but generally that state is replaced (with exec()) with something else.
But threads are much cheaper to create and destroy and they take fewer system resources (in fact, the operating system itself may have no concept of threads, yet you can still create multithreaded programs; such threads are called green threads).
The paper you linked to seems to explain itself very well. Did you read it?
Keep in mind that a thread can refer to the programming-language construct (as in most procedural or OOP languages, you create a thread manually and tell it to execute a function), or it can refer to the hardware construct (each CPU core executes one thread at a time).
The hardware-level thread is obviously unavoidable; it's just how the CPU works. But the CPU doesn't care how the concurrency is expressed in your source code. It doesn't have to be by a "beginthread" function call, for example. The OS and the CPU just have to be told which streams of instructions should be executed.
His point is that if we used better languages than C or Java with a programming model designed for concurrency, we could get concurrency basically for free. If we'd used a message-passing language, or a functional one with no side-effects, the compiler would be able to parallelize our code for us. And it would work.
Threads aren't any more "evil" than hammers or screwdrivers or any other tools; they just require skill to utilize. The solution isn't to avoid them; it's to educate yourself and up your skill set.
Creating a lot of threads without constraint is indeed evil; using a pooling mechanism (a thread pool, sketched below) will mitigate this problem.
Another way threads are 'evil' is that most framework code is not designed to deal with multiple threads, so you have to manage your own locking mechanisms for those data structures.
Threads are good, but you have to think about how and when you use them and remember to measure whether there really is a performance benefit.
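A minimal sketch of that pooling idea: a fixed set of worker threads drains a shared task queue, so callers submit work instead of spawning a thread per task. Illustrative only; a real pool would add futures, exception handling and so on:

#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto &w : workers_) w.join();
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (done_ && tasks_.empty()) return;   // queue drained and shutting down
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();    // run the task outside the lock
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};

int main() {
    ThreadPool pool(4);                       // four workers instead of one thread per task
    for (int i = 0; i < 8; ++i)
        pool.submit([i] { std::printf("task %d done\n", i); });
    return 0;                                 // ~ThreadPool drains the queue and joins the workers
}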
A thread is a bit like a lightweight process. Think of it as an independent path of execution within an application. The thread runs in the same memory space as the application and therefore has access to all the same resources, global objects and global variables.
The good thing about them: you can parallelise a program to improve performance. Some examples: 1) in an image editing program a thread may run the filter processing independently of the GUI; 2) some algorithms lend themselves to multiple threads.
What's bad about them? If a program is poorly designed they can lead to deadlock issues where both threads are waiting on each other to access the same resource. And secondly, program design can be more complex because of this. Also, some class libraries don't support threading; e.g. the C library function "strtok" is not thread safe. In other words, if two threads were to use it at the same time they would clobber each other's results. Fortunately, there are often thread-safe alternatives, e.g. in the Boost library.
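For example, assuming a POSIX system, strtok_r keeps its position in a caller-supplied pointer instead of hidden static state, so each thread can tokenize independently:

#include <cstdio>
#include <cstring>

void tokenize(char *line) {
    char *save = nullptr;                     // per-call state, not a hidden static
    for (char *tok = strtok_r(line, " ", &save); tok != nullptr;
         tok = strtok_r(nullptr, " ", &save))
        std::printf("token: %s\n", tok);
}

int main() {
    char a[] = "safe in thread A";
    char b[] = "safe in thread B";
    tokenize(a);      // these two calls could just as well run on two different threads
    tokenize(b);
    return 0;
}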
Threads are not evil, they can be very useful indeed.
Under Linux/Unix, threading hasn't been well supported in the past, although I believe Linux now has POSIX thread support and other Unices support threading now via libraries or natively, e.g. pthreads.
The most common alternative to threading under Linux/Unix platforms is fork. Fork is simply a copy of a program, including its open file handles and global variables. fork() returns 0 to the child process and the process id to the parent. It's an older way of doing things under Linux/Unix but still well used. Threads use less memory than fork and are quicker to start up. Also, inter-process communication is more work than with simple threads.
In a simple sense you can think of a thread as another instruction pointer in the current process. In other words it points the IP of another processor to some code in the same executable. So instead of having one instruction pointer moving through the code there are two or more IPs executing instructions from the same executable and address space simultaneously.
Remember the executable has its own address space with data / stack etc... So now that two or more instructions are being executed simultaneously you can imagine what happens when more than one of the instructions wants to read/write to the same memory address at the same time.
The catch is that threads are operating within the process address space and are not afforded protection mechanisms from the processor that full blown processes are. (Forking a process on UNIX is standard practice and simply creates another process.)
Out of control threads can consume CPU cycles, chew up RAM, cause exceptions etc., and the only way to stop them is to tell the OS process scheduler to forcibly terminate the thread by nullifying its instruction pointer (i.e. stop executing). If you forcibly tell a CPU to stop executing a sequence of instructions, what happens to the resources that have been allocated or are being operated on by those instructions? Are they left in a stable state? Are they properly freed? etc...
So, yes, threads require more thought and responsibility than executing a process because of the shared resources.
For any application that requires stable and secure execution for long periods of time without failure or maintenance, threads are always a tempting mistake. They invariably turn out to be more trouble than they are worth. They produce rapid results and prototypes that seem to be performing correctly but after a couple weeks or months running you discover that they have critical flaws.
As mentioned by another poster, once you use even a single thread in your program you have now opened a non-deterministic path of code execution that can produce an almost infinite number of conflicts in timing, memory sharing and race conditions. Most of the confidence in solving these problems is expressed by people who have learned the principles of multithreaded programming but have yet to experience the difficulty of solving them.
Threads are evil. Good programmers avoid them wherever humanly possible. The alternative of forking was offered here and it is often a good strategy for many applications. The notion of breaking your code down into separate execution processes which run with some form of loose coupling often turns out to be an excellent strategy on platforms that support it. Threads running together in a single program is not a solution. It is usually the creation of a fatal architectural flaw in your design that can only be truly remedied by rewriting the entire program.
The recent drift towards event oriented concurrency is an excellent development innovation. These kinds of programs usually prove to have great endurance after they are deployed.
I've never met a young engineer who didn't think threads were great. I've never met an older engineer who didn't shun them like the plague.
Being an older engineer, I heartily agree with the answer by Texas Arcane.
Threads are very evil because they cause bugs that are extremely difficult to solve. I have literally spent months solving sporadic race-conditions. One example caused trams to suddenly stop about once a month in the middle of the road and block traffic until towed away. Luckily I didn't create the bug, but I did get to spend 4 months full-time to solve it...
It's a tad late to add to this thread, but I would like to mention a very interesting alternative to threads: asynchronous programming with co-routines and event loops. This is being supported by more and more languages, and does not have the problem of race conditions like multi-threading has.
It can replace multi-threading in cases where it is used to wait on events from multiple sources, but not where calculations need to be performed in parallel on multiple CPU cores.
