Difference between forkIO/forkOS and forkProcess? - haskell

I'm not sure what the difference between forkIO/forkOS and forkProcess is in Haskell. To my understanding, forkIO/forkOS are more like threads (analogous to pthread_create in C) whereas forkProcess starts a separate process (analogous to fork).

forkIO creates a lightweight unbound green thread. Green threads have very little overhead; the GHC runtime is capable of efficiently multiplexing millions of green threads over a small pool of OS worker threads. A green thread may live on more than one OS thread over its lifetime.
forkOS creates a bound thread: a green thread for which FFI calls are guaranteed to take place on a single fixed OS thread. Bound threads are typically used when interacting with C libraries which use thread-local data and expect all API calls to come from the same thread. From the paper specifying GHC's bound threads:
The idea is that each bound Haskell thread has a dedicated associated
OS thread. It is guaranteed that any FFI calls made by a bound Haskell
thread are made by its associated OS thread, although pure-Haskell
execution can, of course, be carried out by any OS thread. A group of
foreign calls can thus be guaranteed to be carried out by the same OS
thread if they are all performed in a single bound Haskell thread.
[...]
[F]or each OS thread, there is at most a single bound Haskell thread.
Note that the above quotation does not exclude the possibility that an OS thread associated with a bound thread can act as a worker for unbound Haskell threads. Nor does it guarantee that the bound thread's non-FFI code will execute on any particular thread.
forkProcess creates a new process, just like fork on UNIX.
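As a rough illustration of the bound/unbound distinction, here is a minimal sketch using only Control.Concurrent from base. The helper name boundIn is made up for this example. forkProcess is left out because it lives in System.Posix.Process (the unix package) rather than base. Note that forkOS only works with the threaded RTS, so the sketch guards that call with rtsSupportsBoundThreads.

```haskell
import Control.Concurrent
import Control.Monad (when)

-- Hypothetical helper: run a probe in a new thread created by the given
-- fork function and report whether that thread is bound to an OS thread.
boundIn :: (IO () -> IO ThreadId) -> IO Bool
boundIn fork = do
  result <- newEmptyMVar
  _ <- fork (isCurrentThreadBound >>= putMVar result)
  takeMVar result

main :: IO ()
main = do
  unbound <- boundIn forkIO
  putStrLn ("forkIO thread bound? " ++ show unbound)
  -- forkOS needs the threaded RTS (link with -threaded); the guard lets
  -- this sketch run without it as well.
  when rtsSupportsBoundThreads $ do
    bound <- boundIn forkOS
    putStrLn ("forkOS thread bound? " ++ show bound)
```

A thread started with forkIO reports that it is not bound; when the program is linked with -threaded, the forkOS thread reports that it is.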

forkIO creates a lightweight thread managed by Haskell's runtime system. It is unbound, i.e. it can be run by any OS thread.
forkOS creates a bound thread, meaning its foreign (FFI) calls always run on one dedicated OS thread. This can be necessary when calling C functions that rely on thread-local state, for example.
forkProcess forks the current process like fork() in C.

Related

How user threads are really scheduled? How is the OS (kernel) involved in such scheduling?

I’m reading the popular Operating System Concepts book but I can’t get how the user threads are really scheduled. One statement in particular confuses me:
“User-level threads are managed by a thread library, and the kernel is unaware of them”.
Let’s say I create the process A and 3 threads with the Pthreads library. My assumption is that user threads must necessarily be scheduled by the kernel (OS). Isn’t the OS responsible for allocating CPU? Don’t threads have their own registers and stack? So there must be a context switch (a register switch) and therefore there must also be some handling by the OS. How can the kernel be unaware of them?
How are user threads exactly scheduled?
In the simplest implementation of user-level threads (a.k.a., "green threads"), there would be a function named yield(). The yield() function is the scheduler. When one thread calls it, it would:
Choose which thread should run next according to the scheduling policy. If it happens to choose the calling thread, it would then simply return. Otherwise, it would...
...Save whatever registers any called function in the high-level language is obliged to save in the saved context area for the calling thread. This would include, at a minimum, the stack pointer. It probably also would include a frame pointer, and maybe a few general purpose registers.
...Restore the registers from the saved context area for the chosen thread. This would include the stack pointer, and since we're changing the stack pointer in mid-stream, so to speak, we'll have to be very careful about where the scheduler's own variables are kept. It won't really work for it to have "local" variables in the usual sense. There's a good chance that at least part of this "context switch" code will have to be written in assembly language.
Finally, all it has to do is a normal function return, and it will be returning from a yield() call in the chosen thread instead of the original thread.
The function to create a new thread would:
Allocate a block of memory for the thread's stack,
Construct an artificial stack frame in the new stack that looks like the following function just called yield() from the line marked "// 1":
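void yield(void);                              /* the scheduler described above */
void mark_my_own_saved_context_as_dead(void);

/* args is passed separately so that it is actually in scope below. */
void root_function(void (*thread_function)(void *), void *args) {
    yield(); // 1
    thread_function(args);
    mark_my_own_saved_context_as_dead();
    yield(); // 2
}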
When thread_function() returns, the thread calls mark_my_own_saved_context_as_dead() to notify the scheduler that the thread is dead. When the thread then calls yield() for the last time ("// 2"), the scheduler frees its stack and cleans up whatever else needs to be cleaned up before selecting some other thread to run.
In a typical implementation of green threads, there will be many other places where one thread implicitly yields to another. Any blocking I/O call, for example, or a sleep(), or any call to acquire a mutex or a semaphore.
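To make the "yield() is the scheduler" idea concrete without assembly, here is a toy sketch in Haskell. It is not how any real runtime does it: instead of saving registers and a stack pointer, each thread hands back the rest of its work as a continuation, which plays the role of the saved context. The names Thread, schedule and worker are invented for this example, and the policy is plain round-robin.

```haskell
import Data.IORef

-- A user-level thread is either finished or has yielded, handing back the
-- rest of its work as a continuation (its "saved context").
data Thread = Done | Yield (IO Thread)

-- Round-robin scheduler: run the thread at the head of the queue until it
-- yields, then put its continuation at the back of the queue.
schedule :: [IO Thread] -> IO ()
schedule [] = return ()
schedule (t : rest) = do
  r <- t
  case r of
    Done    -> schedule rest
    Yield k -> schedule (rest ++ [k])

-- Hypothetical worker: appends n entries to a shared log, yielding after
-- each one.
worker :: IORef [String] -> String -> Int -> IO Thread
worker _ _ 0 = return Done
worker logRef name n = do
  modifyIORef logRef (++ [name ++ show n])
  return (Yield (worker logRef name (n - 1)))

main :: IO ()
main = do
  logRef <- newIORef []
  schedule [worker logRef "a" 2, worker logRef "b" 2]
  readIORef logRef >>= print  -- prints ["a2","b2","a1","b1"]
```

Everything here happens inside one OS thread, and the kernel never sees the two workers. That is the sense in which the kernel is "unaware" of user-level threads.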

If you have a one-to-one mapping of kernel and user threads, why isn't the process blocked if it makes a system call?

I often read that when you have a many-to-one mapping, a system call would block the whole process, but with a one-to-one mapping it wouldn't. But why? The thread that makes the system call is blocked anyway and can't switch to the other user thread.
The kernel schedules threads globally, not processes.
In the very common case where a process consists of just its main thread, we take a shortcut and say that the kernel schedules processes, but that is just a (very common) special case.
Each thread can have its own priority, its own affinity to some CPUs...
But by default these properties are inherited from the main thread of the process the thread belongs to, and if we don't change them explicitly, we can have the illusion that all the threads inside a single process form one single entity from the scheduler's point of view; but that is not the case.
When one thread is blocked on a system call for example, this does not prevent the other threads from being run.
The blocked thread does not decide anything about which thread runs next (unless we explicitly build a dedicated application-level synchronisation mechanism with condition variables...).
The switch to another thread, in the same or another process, is entirely decided by the OS scheduler.
Another confusing situation for example is when a segmentation fault occurs in a thread.
Since the address space is shared between all the threads inside a single process, it is the process as a whole that must be terminated.
Then all its threads disappear at once, which again gives the illusion that all these threads form one single entity from the scheduler's point of view; but this has more to do with address space management than with scheduling.
Note: there may exist some user-space implementations of threads in which the OS scheduler has no way to consider these threads as distinct (it sees only the main thread of the process).
Depending on the internal details of such user-space implementations, a blocking system call could lead to blocking the entire process or not.
Nowadays, it is much more common to rely on the native threads provided by the kernel.
I think there's one very specific thing that you're missing.
When you have a one-to-one mapping of user threads to kernel threads, that means that each user thread has its own kernel thread. There's no "switch-command to the other user thread" because the other user thread is another kernel thread, and it's the kernel's job to switch between kernel threads. On modern operating systems, the kernel can easily switch kernel threads when a thread is blocked in a system call.
When you have a many-to-one mapping of user threads, that means a single kernel thread is expected to run code for more than one user thread. Here, the user code has to do something to cause that same kernel thread to execute code for another user thread. It can't do that if it's blocked in a system call.

Do IO operations run in green threads?

Given the example from Control.Concurrent.Async:
do a1 <- async (getURL url1)
   a2 <- async (getURL url2)
   page1 <- wait a1
   page2 <- wait a2
Do the two getURL calls run on different OS threads, or just different green threads?
In case my question doesn't make sense... say the program is running on one OS thread only, will these calls still be made at the same time? Do blocking IO operations block the whole OS thread and all the green threads on that OS thread, or just one green thread?
From the documentation of Control.Concurrent.Async
This module provides a set of operations for running IO operations asynchronously and waiting for their results. It is a thin layer over the basic concurrency operations provided by Control.Concurrent.
and Control.Concurrent
Scheduling of Haskell threads is done internally in the Haskell runtime system, and doesn't make use of any operating system-supplied thread packages.
This last may be a bit misleading if not interpreted carefully: although the scheduling of Haskell threads -- that is, the choice of which Haskell code to run next -- is done without using any OS facilities, GHC can and does use multiple OS threads to actually execute whatever code is chosen to be run, at least when using the threaded runtime system.
It should all be green threads.
If your program is compiled (or rather, linked) with the single-threaded RTS, then all green threads run in a single OS thread. If your program is compiled (linked) with the multi-threaded RTS (-threaded), then green threads are scheduled across a pool of OS threads, one per capability. The number of capabilities is set with +RTS -N: a bare -N uses one per CPU core, while without the flag the default is a single capability.
As far as I'm aware, in either case blocking I/O calls should only block one green thread. Other green threads should be completely unaffected.
This isn't as simple as the question seems to imply. Haskell is a more capable programming language than most you would have run into. In particular, IO operations that appear to block from an internal point of view may be implemented as the sequence "start non-blocking IO operation, suspend thread, wait for that IO operation to complete in an IO manager that covers multiple Haskell threads, queue thread for resumption once the IO device is ready."
See waitRead# and waitWrite# for the API that provides that functionality with the standard global IO manager.
Using green threads or not is mostly irrelevant with this pattern. IO operations can be written to use non-blocking IO behind the scenes, with proper multiplexing, while appearing to present a blocking interface to their users.
Unfortunately, it's not that simple either. The fact is that OS limitations get in the way. Until very recently (I think the 5.1 kernel was released yesterday, maybe?), Linux provided no good interface for non-blocking disk operations. Sure, there were things that looked like they should work, but in practice they weren't very good. So disk reads/writes are actual blocking operations in GHC. (Not just on Linux, either. GHC doesn't have a lot of developers supporting it, so a lot of things are written with the same code that works on Linux, even if there are other alternatives.)
But it's not even as simple as "network operations are hidden non-blocking, disk operations are blocking". At least maybe not. I don't actually know, because it's so hard to find documentation on the non-threaded runtime. I know the threaded runtime actually maintains a separate thread pool for performing FFI calls marked as "safe", which prevents them from blocking execution of green threads. I don't know if the same is true with the non-threaded runtime.
But for your example, I can say - assuming getURL uses the standard network library (it's a hypothetical function anyway), it'll be doing non-blocking IO with proper multiplexing behind the scenes. So those operations will be truly concurrent, even without the threaded runtime.
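One way to check the "only one green thread blocks" claim yourself is a small experiment like the sketch below. It does not use the async package; spawn is a stripped-down stand-in for async/wait built from forkIO and an MVar (no exception handling, which the real library does provide), and fakeFetch is a hypothetical replacement for getURL that simply sleeps with threadDelay. Timing uses getMonotonicTime from GHC.Clock in base.

```haskell
import Control.Concurrent
import GHC.Clock (getMonotonicTime)

-- Hypothetical stand-in for a blocking call such as getURL: it blocks the
-- calling green thread for the given number of microseconds.
fakeFetch :: Int -> String -> IO String
fakeFetch micros name = threadDelay micros >> return name

-- Minimal async/wait substitute: run the action in a new green thread and
-- deliver its result through an MVar.
spawn :: IO a -> IO (MVar a)
spawn act = do
  box <- newEmptyMVar
  _ <- forkIO (act >>= putMVar box)
  return box

-- Start two "fetches", wait for both, and return the results together
-- with the elapsed wall-clock time in seconds.
fetchBoth :: IO ((String, String), Double)
fetchBoth = do
  start <- getMonotonicTime
  a1 <- spawn (fakeFetch 200000 "page1")
  a2 <- spawn (fakeFetch 200000 "page2")
  p1 <- takeMVar a1
  p2 <- takeMVar a2
  end <- getMonotonicTime
  return ((p1, p2), end - start)

main :: IO ()
main = fetchBoth >>= print
```

If the two waits overlap, the elapsed time should be close to 0.2 seconds rather than 0.4, and that should hold with or without -threaded. Keep in mind that threadDelay is only a proxy; real disk or FFI calls may behave differently, as discussed above.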

What's the relationship between forkOn and the -qm RTS flag?

Suppose that I have a program that only spawns threads using forkOn. In such a scenario, there will be no load balancing of Haskell threads among different capabilities. So is there a difference in executing this program with and without +RTS -qm?
According to the documentation, -qm disables thread migration, which I think has a similar effect to using only forkOn. Am I correct in this assumption? I'm not sure how clear the documentation is in this regard.
I'm no expert on the subject, but I'll give it a shot anyway.
GHC's runtime system can have one or more HECs (Haskell Execution Contexts, also known as caps or capabilities). The runtime flag +RTS -N <number> or the setNumCapabilities function sets how many HECs are available to the program. Each HEC is run by one operating system thread at a time. The runtime scheduler distributes lightweight Haskell threads among the HECs.
With the forkOn function, it's possible to select which HEC a thread runs on. getNumCapabilities returns the number of capabilities (HECs).
Thread migration means that Haskell threads can be migrated (moved) to another HEC. The runtime flag +RTS -qm disables this thread migration.
Documentation about forkOn states that
Like forkIO, but lets you specify on which capability the thread should run. Unlike a forkIO thread, a thread created by forkOn will stay on the same capability for its entire lifetime (forkIO threads can migrate between capabilities according to the scheduling policy).
so with forkOn it's possible to select the single HEC a thread runs on.
Compare this with the documentation of forkIO, which states that
Foreign calls made by this thread are not guaranteed to be made by any particular OS thread; if you need foreign calls to be made by a particular OS thread, then use forkOS instead.
Now, are the forkOn function and +RTS -qm (disabled thread migration) the same thing? Probably not. With forkOn, the user explicitly selects which HEC the Haskell thread runs on (for example, it's possible to put all Haskell threads on the same HEC). With +RTS -qm and forkIO, Haskell threads don't move between HECs, but there's no way of knowing which HEC a thread spawned by forkIO ends up on.
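The difference can be observed with threadCapability from Control.Concurrent, which returns the capability a thread is currently on and whether the thread is locked to it. The sketch below is only an experiment to run with different RTS flags, not a statement about what -qm does internally; the helper name whereRuns is made up.

```haskell
import Control.Concurrent

-- Hypothetical helper: fork with the given function and report
-- (capability number, locked to that capability?) for the new thread.
whereRuns :: (IO () -> IO ThreadId) -> IO (Int, Bool)
whereRuns fork = do
  result <- newEmptyMVar
  _ <- fork (myThreadId >>= threadCapability >>= putMVar result)
  takeMVar result

main :: IO ()
main = do
  n <- getNumCapabilities
  putStrLn ("capabilities: " ++ show n)
  -- A forkOn thread is locked to the capability that was asked for.
  whereRuns (forkOn 0) >>= print
  -- A forkIO thread lands on some capability and is not locked to it.
  whereRuns forkIO >>= print
```

Running this with, say, +RTS -N4 and then +RTS -N4 -qm shows that forkOn threads report being locked in both cases, whereas forkIO threads never do; -qm changes the scheduler's behaviour, not this per-thread flag.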
References:
Runtime Support for Multicore Haskell
The GHC scheduler
GHC(STG,Cmm,asm) illustrated

Is ThreadID consistent when shuffling Haskell threads around OS threads?

In Haskell, forkIO creates an unbound (Haskell) thread, and forkOS creates a bound (native) thread. The answer to a previous question of mine here mentioned that Haskell threads are not guaranteed to stay on the same OS thread, which seems to be supported by the documentation for the Control.Concurrent module. My question is: if a running Haskell thread gets moved to another OS thread, will its ThreadID remain the same?
Yes.
A ThreadId is an abstract type representing a handle to a thread.
This is how you send asynchronous signals to specific threads: with the ThreadId. It does not matter which OS thread is involved, and it is often quite likely that the targeted thread is not bound to any OS thread at all (e.g., it is sleeping).
The existence of "OS threads" is somewhat an implementation detail, although you'll need to manage them if you use the FFI with certain libraries. Otherwise, you can mostly ignore OS threads in your code.
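A quick way to convince yourself is the sketch below: a forked thread reads its own ThreadId, yields many times so the scheduler has opportunities to reschedule it, and reads it again. The function name stableId is invented for this example, and the number of yields is arbitrary.

```haskell
import Control.Concurrent
import Control.Monad (replicateM_)

-- Check that the id returned by forkIO matches what the thread itself
-- sees, both before and after it has been descheduled many times.
stableId :: IO Bool
stableId = do
  result <- newEmptyMVar
  tid <- forkIO $ do
    before <- myThreadId
    replicateM_ 1000 yield  -- give the scheduler chances to reschedule us
    after <- myThreadId
    putMVar result (before, after)
  (before, after) <- takeMVar result
  return (tid == before && before == after)

main :: IO ()
main = stableId >>= print  -- prints True
```

The ThreadId names the Haskell thread itself, so it does not depend on which OS thread happens to be running the code at any given moment.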
