In Haskell, forkIO creates an unbound (Haskell) thread, and forkOS creates a bound (native) thread. An answer to a previous question of mine here mentioned that Haskell threads are not guaranteed to stay on the same OS thread, which seems to be supported by the documentation for the Control.Concurrent module. My question is: if a running Haskell thread gets swapped to another OS thread, will its ThreadId remain the same?
Yes.
A ThreadId is an abstract type representing a handle to a thread.
This is how you send asynchronous signals to specific threads: with the ThreadId. It does not matter which OS thread is involved, and it is quite likely that the targeted thread is not being run by any OS thread at all at that moment (e.g., it is sleeping).
The existence of "OS threads" is something of an implementation detail, although you'll need to manage them if you use the FFI with certain libraries. Otherwise, you can mostly ignore OS threads in your code.
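As a minimal sketch (the counting loop is just an illustration): the ThreadId returned by forkIO stays valid for the thread's entire lifetime, no matter which OS thread happens to run it, and killThread, which sends an asynchronous exception addressed by that ThreadId, still reaches it:

import Control.Concurrent (forkIO, killThread, threadDelay)

main :: IO ()
main = do
  -- The ThreadId is the stable handle, whatever OS thread runs the work.
  tid <- forkIO (mapM_ (\n -> print n >> threadDelay 10000) [(0 :: Int) ..])
  threadDelay 100000   -- let the thread run for roughly 0.1 s
  killThread tid       -- asynchronous signal, addressed purely by ThreadId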
I'm reading the popular Operating System Concepts book, but I can't get how user threads are really scheduled. In particular, there's a statement that confuses me:
“User-level threads are managed by a thread library, and the kernel is unaware of them”.
Let's say I create process A and 3 threads with the Pthreads library. My assumption is that user threads must necessarily be scheduled by the kernel (OS). Isn't the OS responsible for allocating CPU? Don't threads have their own registers and stack? So there must be a context switch (a register switch), and therefore there must also be some handling by the OS. How can the kernel be unaware of them?
How are user threads exactly scheduled?
In the simplest implementation of user-level threads (a.k.a. "green threads"), there would be a function named yield(). The yield() function is the scheduler. When one thread calls it, it would:
Choose which thread should run next according to the scheduling policy. If it happens to choose the calling thread, it would then simply return. Otherwise, it would...
...save whatever registers any called function in the high-level language is obliged to save into the saved-context area for the calling thread. This would include, at a minimum, the stack pointer. It probably also would include a frame pointer, and maybe a few general-purpose registers.
...restore the registers from the saved-context area for the chosen thread. This would include the stack pointer, and since we're changing the stack pointer in mid-stream, so to speak, we'll have to be very careful about where the scheduler's own variables are kept. It won't really work for it to have "local" variables in the usual sense. There's a good chance that at least part of this "context switch" code will have to be written in assembly language.
Finally, all it has to do is a normal function return, and it will return from a yield() call in the chosen thread instead of the original thread.
The function to create a new thread would:
Allocate a block of memory for the thread's stack,
Construct an artificial stack frame in the new stack that makes it look as if the following function had just called yield() from the line marked "// 1":
void root_function(void (*thread_function)(void*), void* args) {
    yield(); // 1
    thread_function(args);
    mark_my_own_saved_context_as_dead();
    yield(); // 2
}
When thread_function() returns, the thread calls the mark_my_own_saved_context_as_dead() function to notify the scheduler algorithm that the thread is dead. When the thread calls yield() for the last time ("// 2"), the scheduler algorithm frees its stack and cleans up whatever else needs to be cleaned up before selecting some other thread to run.
In a typical implementation of green threads, there will be many other places where one thread implicitly yields to another. Any blocking I/O call, for example, or a sleep(), or any call to acquire a mutex or a semaphore.
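GHC's own green threads expose the same idea directly: Control.Concurrent.yield voluntarily hands control back to the runtime scheduler. A minimal Haskell sketch (the exact interleaving of the output is up to the scheduler, and main may exit before the forked thread finishes):

import Control.Concurrent (forkIO, yield)
import Control.Monad (forM_)

main :: IO ()
main = do
  -- Two green threads taking explicit turns: each yield hands control
  -- back to the runtime scheduler, much like the yield() sketched above.
  _ <- forkIO (forM_ [1 .. 3 :: Int] (\n -> print n >> yield))
  forM_ "abc" (\c -> putStrLn [c] >> yield)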
Given the example from Control.Concurrent.Async:
do a1 <- async (getURL url1)
   a2 <- async (getURL url2)
   page1 <- wait a1
   page2 <- wait a2
Do the two getURL calls run on different OS threads, or just different green threads?
In case my question doesn't make sense... say the program is running on only one OS thread; will these calls still be made at the same time? Do blocking IO operations block the whole OS thread, and all the green threads on it, or just one green thread?
From the documentation of Control.Concurrent.Async:
This module provides a set of operations for running IO operations asynchronously and waiting for their results. It is a thin layer over the basic concurrency operations provided by Control.Concurrent.
and from Control.Concurrent:
Scheduling of Haskell threads is done internally in the Haskell runtime system, and doesn't make use of any operating system-supplied thread packages.
This last may be a bit misleading if not interpreted carefully: although the scheduling of Haskell threads -- that is, the choice of which Haskell code to run next -- is done without using any OS facilities, GHC can and does use multiple OS threads to actually execute whatever code is chosen to be run, at least when using the threaded runtime system.
It should all be green threads.
If your program is compiled (or rather, linked) with the single-threaded RTS, then all green threads run on a single OS thread. If it is compiled (linked) with the multi-threaded RTS (GHC's -threaded flag), then some arbitrary number of green threads are scheduled across a pool of OS threads (with +RTS -N, one per CPU core by default).
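You can check which situation you're in at runtime; a small sketch using getNumCapabilities from Control.Concurrent:

import Control.Concurrent (getNumCapabilities)

main :: IO ()
main = do
  -- 1 on the non-threaded RTS (or without +RTS -N); typically one per
  -- core when linked with -threaded and run with +RTS -N.
  n <- getNumCapabilities
  putStrLn ("capabilities: " ++ show n)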
As far as I'm aware, in either case blocking I/O calls should only block one green thread. Other green threads should be completely unaffected.
This isn't as simple as the question seems to imply. Haskell is a more capable programming language than most you're likely to have run into. In particular, IO operations that appear to block from the program's point of view may be implemented as the sequence: start a non-blocking IO operation, suspend the thread, wait for that operation to complete in an IO manager that serves many Haskell threads, and queue the thread for resumption once the IO device is ready.
See waitRead# and waitWrite# for the API that provides that functionality with the standard global IO manager.
Using green threads or not is mostly irrelevant with this pattern. IO operations can be written to use non-blocking IO behind the scenes, with proper multiplexing, while appearing to present a blocking interface to their users.
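As a hedged sketch of that pattern, using threadWaitRead from Control.Concurrent (readWhenReady and its reader argument are hypothetical names, for illustration only):

import Control.Concurrent (threadWaitRead)
import System.Posix.Types (Fd)

-- Hypothetical helper: park only this green thread until the file
-- descriptor is readable. The IO manager multiplexes many such waits
-- onto a single OS-level polling loop, so the interface looks
-- blocking while the implementation is non-blocking.
readWhenReady :: Fd -> (Fd -> IO a) -> IO a
readWhenReady fd reader = do
  threadWaitRead fd   -- suspends this green thread only
  reader fd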
Unfortunately, it's not that simple either. The fact is that OS limitations get in the way. Until very recently (I think the 5.1 kernel was released yesterday, maybe?), Linux provided no good interface for non-blocking disk operations. Sure, there were things that looked like they should work, but in practice they weren't very good. So disk reads/writes are actual blocking operations in GHC. (Not just on Linux, either. GHC doesn't have a lot of developers supporting it, so a lot of things are written with the same code that works on Linux, even if there are other alternatives.)
But it's not even as simple as "network operations are non-blocking behind the scenes, disk operations are blocking". At least maybe not. I don't actually know, because it's so hard to find documentation on the non-threaded runtime. I know the threaded runtime maintains a separate thread pool for performing FFI calls marked as "safe", which prevents them from blocking the execution of green threads. I don't know if the same is true of the non-threaded runtime.
But for your example, I can say - assuming getURL uses the standard network library (it's a hypothetical function anyway), it'll be doing non-blocking IO with proper multiplexing behind the scenes. So those operations will be truly concurrent, even without the threaded runtime.
I'm not sure what the difference between forkIO/forkOS and forkProcess is in Haskell. To my understanding, forkIO/forkOS are more like threads (analogous to pthread_create in C), whereas forkProcess starts a separate process (analogous to fork).
forkIO creates a lightweight unbound green thread. Green threads have very little overhead; the GHC runtime is capable of efficiently multiplexing millions of green threads over a small pool of OS worker threads. A green thread may live on more than one OS thread over its lifetime.
forkOS creates a bound thread: a green thread for which FFI calls are guaranteed to take place on a single fixed OS thread. Bound threads are typically used when interacting with C libraries which use thread-local data and expect all API calls to come from the same thread. From the paper specifying GHC's bound threads:
The idea is that each bound Haskell thread has a dedicated associated OS thread. It is guaranteed that any FFI calls made by a bound Haskell thread are made by its associated OS thread, although pure-Haskell execution can, of course, be carried out by any OS thread. A group of foreign calls can thus be guaranteed to be carried out by the same OS thread if they are all performed in a single bound Haskell thread.

[...]

[F]or each OS thread, there is at most a single bound Haskell thread.
Note that the above quotation does not exclude the possibility that an OS thread associated with a bound thread can act as a worker for unbound Haskell threads. Nor does it guarantee that the bound thread's non-FFI code will execute on any particular thread.
forkProcess creates a new process, just like fork on UNIX.
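A minimal sketch of the difference at the thread level (isCurrentThreadBound is from Control.Concurrent; the program must be linked with -threaded for forkOS to work, and the final delay is a crude stand-in for proper synchronization):

import Control.Concurrent (forkIO, forkOS, isCurrentThreadBound, threadDelay)

main :: IO ()
main = do
    _ <- forkIO (report "forkIO")   -- expected: False (unbound green thread)
    _ <- forkOS (report "forkOS")   -- expected: True (bound to one OS thread)
    threadDelay 100000              -- crude wait so both get to print
  where
    report name = do
      bound <- isCurrentThreadBound
      putStrLn (name ++ " bound? " ++ show bound)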
forkIO creates a lightweight thread managed by Haskell's runtime system. It is unbound, i.e. it can be run by any OS thread.
forkOS creates a bound thread, meaning it is bound to an actual OS thread. This can be necessary when using C functions, for example.
forkProcess forks the current process like fork() in C.
Can anyone please tell me: do the terms "kernel thread", "native thread", and "OS thread" all refer to kernel threads? Or are they different? If they are different, what is the relationship among them?
There's no real standard for this; terminology varies depending on context. However, I'll try to explain the different kinds of threads that I know of (and add fibers just for completeness, as I've seen people call them threads).
-- Threading within the kernel
These are most likely what your term "kernel thread" refers to. They exist only at the kernel level and allow (somewhat limited) parallel execution of the kernel code itself.
-- Application threading
These are what the term thread generally means: separate threads of parallel execution that may be scheduled on different processors, share the same address space, and are handled as a single process by the operating system.
The POSIX standard defines the properties threads should have in POSIX-compliant systems (in fact, it defines the libraries and how each library entry is supposed to behave). The Windows threading model is extremely similar to the POSIX one and, AFAIK, it's safe to talk about threading in general the way I did: parallel execution that happens within the same process and can be scheduled on different processors.
-- Ancient linux threading
In the early days, the Linux kernel did not support threading. However, it did support creating two different processes that shared the same address space. There was a project (LinuxThreads) that tried to use this to implement some sort of threading ability.
The problem was, of course, that the kernel would still treat them as separate processes. The result was therefore not POSIX compliant. For example, the treatment of signals was problematic (as signals are a process-level concept). It was IN THIS VERY SPECIFIC CONTEXT that the term "native" started to become common: it refers to "native" as in "kernel-level" support for threading.
With help from the kernel, actual support for POSIX-compliant threading was finally implemented. Today that's the only kind of threading that really deserves the name. The old way is, in fact, not real threading at all; it's a sharing of the address space by multiple processes, and should be referred to as such. But there was a time when that was called threading (as it was the only thing you could do with Linux).
-- User level and Green threading
This is another context where "native" is often used in contrast to another threading model. Green threads and user-level threads are threads that happen within the same process but are handled entirely at user level. Green threads are used in virtual machines (especially those that implement p-code execution, as is the case for the Java virtual machine), and they are also implemented at the library level by many languages (examples: Haskell, Racket, Smalltalk).
These threads do not need to rely on any kernel threading facilities (though they often rely on asynchronous I/O). As such, they generally cannot be scheduled on separate processors. In these contexts, "native thread" or "OS thread" refers to the actual kernel-scheduled threads, in contrast to the green/user-level threads.
Note that "cannot be scheduled on separate processors" is only true if they are used alone. In an hybrid system that has both user level/green threads and native/os threads, it may be possible to create exactly one native/os thread for each processor (and on some systems to set the affinity mask so that each only runs on a specific processor) and then effectively assign the userlevel threads to these.
-- Fibers and cooperative multitasking
I have seen some people call these threads. That's improper; the correct name is fibers. They are also a model of parallel execution, but contrary to threads (and processes), they are cooperative: whenever a fiber is running, the other fibers will not run until the running fiber voluntarily "yields" execution, accepting to be suspended and eventually resumed later.
If I create a thread using forkIO, I need to provide a function to run and get back an identifier (ThreadId). I can then communicate with this animal via e.g. workloads, MVars, etc. However, to my understanding, the created thread is very limited and can only work in sort of a SIMD fashion, where the function that was provided at thread creation is the instruction. I cannot change the function that I provided when the thread was initiated. I understand that these user threads are eventually mapped by the OS to OS threads.
I would like to know how Haskell threads and OS threads interface. Why can Haskell threads that do completely different things be mapped to one and the same OS thread? Why was there no need to initiate the OS thread with a fixed instruction (as is needed in forkIO)? How does the scheduler(?) recognize user threads in an application that could possibly be distributed? In other words, why are OS threads so flexible?
Last, is there any way to dump the heap of a selected thread from within the application?
First, let's address one quick misconception:
I understand that these user threads are eventually mapped by the OS to OS threads.
Actually, the Haskell runtime is in charge of choosing which Haskell thread a particular OS thread from its pool is executing.
Now the questions, one at a time.
Why can Haskell threads that do completely different things be mapped to one and the same OS thread?
Ignoring FFI for the moment, all OS threads are actually running the Haskell runtime, which keeps track of a list of ready Haskell threads. The runtime chooses a Haskell thread to execute, and jumps into the code, executing until the thread yields control back to the runtime. At that moment, the runtime has a chance to continue executing the same thread or pick a different one.
In short: many Haskell threads can be mapped to a single OS thread because in reality that OS thread is doing only one thing, namely, running the Haskell runtime.
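A small sketch of what this buys you (assuming the program is linked without -threaded, so there really is only one OS thread):

import Control.Concurrent (forkIO, newEmptyMVar, putMVar, takeMVar, threadDelay)
import Control.Monad (forM_, replicateM_)

main :: IO ()
main = do
  done <- newEmptyMVar
  -- 1000 green threads, all interleaved by the runtime; each parks on
  -- putMVar until the main thread drains the MVar.
  forM_ [1 .. 1000 :: Int] $ \i ->
    forkIO (threadDelay 1000 >> putMVar done i)
  replicateM_ 1000 (takeMVar done)   -- one take per forked thread
  putStrLn "all 1000 green threads finished"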
Why was there no need to initiate the OS thread with a fixed instruction (as is needed in forkIO)?
I don't understand this question (and I think it stems from a second misconception). You start OS threads with a fixed instruction in exactly the same sense that you start Haskell threads with a fixed instruction: for each thing, you just give a chunk of code to execute and that's what it does.
How does the scheduler(?) recognize user threads in an application that could possibly be distributed?
"Distributed" is a dangerous word: usually, it refers to spreading code across multiple machines (presumably not what you meant here). As for how the Haskell runtime can tell when there's multiple threads, well, that's easy: you tell it when you call forkIO.
In other words, why are OS threads so flexible?
It's not clear to me that OS threads are any more flexible than Haskell threads, so this question is a bit strange.
Last, is there any way to dump the heap of a selected thread from within the application?
I actually don't really know of any tools for dumping the Haskell heap at all, in multithreaded applications or otherwise. You can dump a representation of the part of the heap reachable from a particular object, if you like, using a package like vacuum. I've used vacuum-cairo to visualize these dumps with great success in the past.
For further information, you may enjoy the middle two sections, "Conventions" and "Foreign Imports", from my intro to multithreaded gtk2hs programming, and perhaps also bits of the section on "The Non-Threaded Runtime".
Instead of trying to directly answer your question, I will try to provide a conceptual model for how multi-threaded Haskell programs are implemented. I will ignore many details, and complexities.
Operating systems implement preemptive multithreading using hardware interrupts to allow multiple "threads" of computation to run logically on the same core at the same time.
The threads provided by operating systems tend to be heavyweight. They are well suited to certain types of "multi-threaded" applications and, on systems like Linux, are fundamentally the same tool that allows multiple programs to run at the same time (a task at which they excel).
But these threads are a bit too heavyweight for many uses in high-level languages such as Haskell. Essentially, the GHC runtime works as a mini-OS, implementing its own "threads" on top of the OS threads, in the same way an OS implements threads on top of cores.
It is conceptually easy to imagine that a language like Haskell would be implemented in this way. Evaluating Haskell consists of "forcing thunks", where a thunk is a unit of computation that might (1) depend on another value (thunk) and/or (2) create new thunks.
Thus, one can imagine multiple threads each evaluating thunks at the same time. One would construct a queue of thunks to be evaluated; each thread would pop the top of the queue and evaluate that thunk until it completed, then select a new thunk from the queue. The operation par and its ilk can "spark" new computation by adding a thunk to that queue.
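A minimal sketch of sparking with par and pseq from the parallel package (expensive is just a stand-in for real work; link with -threaded and run with +RTS -N for actual parallelism):

import Control.Parallel (par, pseq)

-- par records its first argument as a spark that an idle capability
-- may evaluate; pseq forces b first so that a has a chance to run in
-- parallel rather than being demanded immediately.
expensive :: Int -> Integer
expensive n = sum (map fromIntegral [1 .. n])

main :: IO ()
main =
  let a = expensive 10000000
      b = expensive 20000000
  in  print (a `par` (b `pseq` (a + b)))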
Extending this model to IO actions is not particularly hard to imagine either. Instead of each thread simply forcing a pure thunk, we imagine the unit of Haskell computation being somewhat more complicated. Pseudo-Haskell for such a runtime:
type Spark = (ThreadId, Action)
data Action = Compute Thunk | Perform IOAction
Note: this is for conceptual understanding only; don't think things are implemented this way.
When we run a Spark, we look for exceptions "thrown" to that thread ID. Assuming we have none, execution consists of either forcing a thunk or performing an IO action.
Obviously, my explanation here has been very hand-wavy and has ignored some complexity. For more, the GHC team has written excellent articles, such as "Runtime Support for Multicore Haskell" by Marlow et al. You might also want to look at a textbook on operating systems, as they often go into some depth on how to build a scheduler.