Python multithreading model

I have been studying multithreading in Python for a while, but I am still confused about a few issues.
Firstly, are the threads created by the Python threading library user-level or kernel-level threads?
Books say that user level threads must be mapped to kernel threads and
the operating system only creates and maintains kernel level threads.
Which threading model does the Python threading library use? Further, who makes the choice between kernel- and user-level threads? Is it the operating system, or does the programmer have a say?
If the many-to-one model is used, I think it is not real multithreading, since all the threads map to a single kernel thread.
Is there a way to direct the operating system to adhere to a certain threading model in my Python program?
Can all running threads of a process be shown, with each one marked as either kernel- or user-level? And can the mappings between the two levels (user and kernel) be shown?

Usually, you never create 'kernel level threads' directly - everything you do in user space executes in user space; otherwise even a random bit of browser JavaScript would be executing at the kernel level, guaranteeing that within seconds the whole internet would go dark.
Thus, in most languages, a threading interface (if supported) is far removed from the actual 'kernel threads': depending on the implementation, it will either link to a lower-level threading interface (pthreads, for example) or just simulate threading unbeknownst to the user. Going down that chain, pthreads may or may not map to actual 'kernel' threads (it happens to be true on Linux, but on Windows there is another level of separation), but even then the code executes in user space - the 'supporting' kernel thread exists so the kernel can schedule the code, which itself runs separately in user space.
When it comes to CPython, its threading interface links to pthreads, so technically there is a chain from a Python thread all the way down to the kernel threads. However, Python also has the dreaded GIL, which guarantees that, with some rare exceptions mostly related to I/O, no two threads ever execute Python code at the same time, effectively making its threads operate in a cooperative-multitasking mode. Since on most systems processes are also backed by kernel threads, you can still utilize those in all their glory by using the multiprocessing interface.
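A quick way to see the GIL's effect: a CPU-bound job gains nothing from threads in CPython but does scale with processes. A minimal sketch (the function names and iteration counts are mine, purely for illustration):

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n):
    # Pure-Python CPU-bound work; the GIL is held the whole time.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls, label):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as ex:
        list(ex.map(burn, [5_000_000] * 4))
    print(f"{label}: {time.perf_counter() - start:.2f}s")

if __name__ == "__main__":  # guard needed for multiprocessing on some platforms
    timed(ThreadPoolExecutor, "threads (GIL-serialized)")
    timed(ProcessPoolExecutor, "processes (parallel)")
```

On a multi-core machine the process version should finish in a fraction of the thread version's time, because each process has its own interpreter and its own GIL.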
Also, unless you have multiple cores/CPUs in your system, even kernel threads execute interleaved on the single core, so kernel threads by themselves don't guarantee actual parallel multithreading as you're describing it.
As for how to list threads, you can use top -H -p <pid> to show all the threads of a process; each Python thread shows up there as a separate kernel-backed thread.
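From inside the program, CPython (3.8+) can also show you the kernel side of each thread: Thread.native_id is the same TID that top -H displays, which makes the 1:1 mapping directly visible. A small sketch:

```python
import os
import threading
import time

def worker():
    time.sleep(5)  # keep the thread alive long enough to inspect it

threads = [threading.Thread(target=worker, name=f"worker-{i}") for i in range(3)]
for t in threads:
    t.start()

print(f"process PID: {os.getpid()}")  # use this with `top -H -p <pid>`
for t in threading.enumerate():
    # native_id is the kernel's thread ID; ident is Python's own handle.
    print(f"{t.name}: ident={t.ident}, native_id={t.native_id}")

for t in threads:
    t.join()
```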

Related

User level threads vs Kernel level threads

I'm aware that user-level threads are created in user mode (no privileges) and kernel threads are created in kernel mode (privileged).
I am also aware that processor threads are hardware threads that kernel threads run on (I hope I am correct in putting it this way).
Here is my confusion:
User-level threads are not recognized by the OS, as they are created, maintained and destroyed at user level. The OS doesn't see a process that is multithreaded in user mode as multithreaded; it treats it as a single-threaded process. Therefore, such a program cannot take advantage of multiprocessing, and I guess it cannot take advantage of hyperthreading either, since it appears single-threaded to the OS.
So what's the use of multithreading in this case? I mean, the computation time will still be the same 🤷‍♂️.
The last question is: do the POSIX thread API and OpenMP create user-level threads or kernel threads?
I know what both libraries are, please don't explain about that.
If neither creates kernel threads, then how do we create a multithreaded program that takes advantage of multiprocessing?
...what's the use of Multithreading in this case?
Multithreading is older than multiprocessing. Multithreading is one model of concurrent computing. That is to say, it's a way to write a computer program in which different activities are allowed to happen independently from each other. A classic example is a multi-user network server that creates a new thread for each connected client. Each thread can talk to its own client in a simple, synchronous way even though there may be no synchrony between what the different clients want to do. You don't need to have multiple CPUs for that.
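A minimal sketch of that classic pattern in Python (host and port are arbitrary): each connection gets its own thread, and the per-client code stays plainly synchronous.

```python
import socket
import threading

def handle_client(conn, addr):
    # One thread per client: simple, blocking, synchronous code,
    # even though many clients are being served concurrently.
    with conn:
        while data := conn.recv(1024):
            conn.sendall(data)  # echo the bytes back

def serve(host="127.0.0.1", port=9000):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((host, port))
        srv.listen()
        while True:
            conn, addr = srv.accept()
            threading.Thread(target=handle_client, args=(conn, addr),
                             daemon=True).start()

if __name__ == "__main__":
    serve()
```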
When multi-CPU computers were invented, using multiple threads to exploit them for parallel processing was a natural and obvious choice.
I mean the computation time [for a green-threaded program that cannot exploit multiple CPUs] will still be the same.
That is true, but depending on what the different activities are that the program performs concurrently, the multi-threaded version of it may be easier to read and understand* than a program that's built around a different model of concurrency.
The reason is, we all were taught to write single-threaded, synchronous code when we were beginners. We understood that we were writing instructions that "the computer" would follow. We now say "a thread" instead of saying "the computer," but otherwise, the code executed by each thread can be mostly similar to the style of code that we wrote as beginners.
Part of what makes it so simple is that the state of each of the concurrent activities can be mostly implicit in the context and the local variables (i.e., the stack) of its thread. If you choose a different model of concurrency (e.g., an event-driven model), then you may have to explicitly represent more of that state with (maybe complex) data structures.
* Easier to read but not necessarily easier to write without making subtle mistakes. But, when I started working with large teams of software developers, they taught me that I'd be reading about ten lines of code for every one line that I wrote, so "easier to read but harder to write" turns out to be a win in the long run.
Pure user level threads are (as you pointed out) not a lot of use because they don't allow you to exploit the processing capability of multiple cores within a process.
The flip side is that pure kernel-level threads will typically incur substantial overheads when switching between threads. (There are things that you can do to deal with that, but ... that's another topic.) The upshot is that the overheads make it inefficient to perform small tasks (units of work) using kernel-level threads.
Another alternative to both is a hybrid of user-level and kernel-level threads. For example, suppose (see the sketch after this list):
each process has one kernel-level thread for each physical core,
each kernel-level thread can switch between a bunch of user-level threads, and
switching between user-level threads is handled by a scheduler in user space.
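A toy Python model of that hybrid, assuming generators stand in for user-level threads and a shared run queue stands in for the user-space scheduler (all names are mine):

```python
import os
import queue
import threading

def green_thread(name, steps):
    # A user-level "thread": yielding hands control back to the
    # user-space scheduler, which is the essence of the M:N model.
    for i in range(steps):
        print(f"{name}: step {i} on kernel thread {threading.get_native_id()}")
        yield

run_queue = queue.Queue()
for n in range(8):
    run_queue.put(green_thread(f"green-{n}", 3))

def worker():
    # One kernel-level thread per core, each round-robining over
    # whatever user-level threads sit in the shared run queue.
    while True:
        try:
            g = run_queue.get_nowait()
        except queue.Empty:
            return
        try:
            next(g)
            run_queue.put(g)  # not finished: back into the queue
        except StopIteration:
            pass

workers = [threading.Thread(target=worker) for _ in range(os.cpu_count() or 1)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Under CPython's GIL these kernel threads won't actually run Python code in parallel; the sketch only shows the structure of the M:N arrangement, not real parallelism. A real M:N runtime also does far more (work stealing, blocking-call handoff, preemption).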
The Java Loom project is developing a new threading model (roughly) along those lines. Classic Java threads are still kernel level threads. New virtual threads are user level threads. A Java program gets to choose whether it uses classic or virtual threads ... or both.
There is a lot of material on Loom on the web; e.g.
https://blogs.oracle.com/javamagazine/post/java-loom-virtual-threads-platform-threads
https://www.infoq.com/news/2022/05/virtual-threads-for-jdk19/
https://wiki.openjdk.org/display/loom/Main
Loom is likely to be part of the next Java release: Java 19.
I'm pretty sure that (C / C++) POSIX threads are kernel level. I don't know about OpenMP threads, but I'd expect they are kernel level too. (They wouldn't be fit for purpose as pure user-level threads.)
I have heard of hybrid threading models for C / C++, though I don't know about actual implementations. Look for articles, etcetera that talk about Threads vs Fibres.

Do any operating systems utilize user threads only?

We're reading a basic/simple guide to Operating Systems in my CS class. The text gives multiple examples of OSs that use 1:1 threading, and some that formerly did hybrid/M:N. But there are no examples of user threads/N:1.
This isn't a homework question, I'm just genuinely curious if this is or was a thing. Have any OSs utilized exclusively user threads? Or is there any software or programming language that does? It seems like with the right scheduling it could be very fast? Thank you!
Spent forever on Google and can't find any explicit answer to this!
Do any operating systems utilize user threads only?
No (and not in the way you're expecting, but by definition). Whatever a program feels like doing in user-space is none of the operating system's business and can not be considered something the OS itself does.
Essentially there are 3 cases:
the OS is a single-tasking OS (and user-space programs use libraries or whatever to provide threading if/when they want it). E.g. MS-DOS.
the OS is a multi-tasking OS, where the OS only knows about processes (and user-space programs use libraries or whatever to provide threading if/when they want it). E.g. early Unix.
the OS/kernel provides threads (leading to 1:1 or M:N).
It seems like with the right scheduling it could be very fast?
User-space threading isn't "very fast", it's significantly worse for most things. The reasons are:
it can't work when there are multiple CPUs (so the nice 8-core CPU you're currently using becomes 87.5% wasted). You need "M:N threading" at a minimum to avoid this performance disaster.
it breaks thread priorities badly - e.g. CPU/s wasting time doing unimportant work while important work isn't being done, because one process doesn't know anything about threads that belong to any other process (or their priorities). The scheduler must be aware of all threads to avoid this performance disaster (and if one process knows about all threads belonging to all other processes it becomes a security disaster).
almost all thread switches are caused by devices: threads having to wait for disk, network, keyboard, "wall clock time", etc. force the scheduler to find some other thread to run, and things a thread was waiting for occurring allow the thread to run again, possibly preempting less important work that was running at the time. All devices involve the kernel (even for micro-kernels, where the kernel is needed to pass messages, etc.), so almost all thread switches involve the kernel. By doing threading in user-space you just end up with the kernel wasting time notifying user-space (so user-space can do some scheduling) instead of the kernel doing the scheduling itself (without wasting time on notifications).
User-space threading is better for the rare situations where the kernel doesn't have to be involved anyway, which is limited to:
thread creation and termination; but only if memory (for thread state, thread stack, thread-local storage) is pre-allocated and recycled, and only if the kernel-thread alternative doesn't do the same "thread recycling" itself (e.g. pre-creating kernel threads and putting them back in a "free thread pool" instead of asking the kernel to terminate them and create new ones again later).
locking (e.g. mutexes) where all threads using the lock belong to the same process; although 1 kernel thread (which needs no locks at all) is still better than multiple user-space threads (sharing 1 kernel thread) fighting over the same lock with extra pointless overhead.

the most devastating argument against user-level threads

I am reading sections about user space thread from the book "Modern Operating System". It states that:
Another, and probably the most devastating argument against user-level threads, is that programmers generally want threads precisely in applications where the threads block often, as, for example, in a multithreaded Web server. These threads are constantly making system calls. Once a trap has occurred to the kernel to carry out the system call, it is hardly any more work for the kernel to switch threads if the old one has blocked, and having the kernel do this eliminates the need for constantly making select system calls that check to see if read system calls are safe. For applications that are essentially entirely CPU bound and rarely block, what is the point of having threads at all? No one would seriously propose computing the first n prime numbers or playing chess using threads because there is nothing to be gained by doing it that way.
I am particularly confused by the part saying the kernel can "switch threads" and that "having the kernel do this" eliminates the select calls.
1. Since these are user-space threads, how can the kernel do a "thread switch"?
2. In "having the kernel do this", what does "this" refer to?
I thought the behavior would be:
1. A select call is made and finds that the following system call would block.
2. Then the user-space thread scheduler switches threads and executes another thread.
For some reason, colleges insist on using operating systems textbooks that are confusing and at times nonsensical.
First, what is being described here is ENTIRELY system specific. On SOME operating systems, a synchronous system call will block all threads. This is not true in ALL operating systems.
Second, user threads are the poor man's way of doing threading. In ye olde days, user threads came into being because there was no operating system support. There are some who promote user threads as being more "efficient" than kernel threads (in theory a library can switch threads faster than the kernel can), but this is total BS in practice. User threads are completely obsolete, and systems that force developers to use them for threading are OBSOLETE. Even older systems like VMS have kernel threads.
In a modern OS course, "user threads" should be a sidebar or historical footnote.
In essence, your book is trying to make a debate where none exists. It's like post-WWII U.S. Army assessments comparing the Sherman tank to the Panther. They talk about things like the Sherman having more comfortable seats to try to make the two sound comparable when, in reality, the Sherman was obsolete and not even in the same class as the Panther.
1. Since these are user-space threads, how can the kernel do a "thread switch"? 2. In "having the kernel do this", what does "this" refer to?
What they appear to be suggesting is that the thread will block the whole process when it makes a blocking system call. When that occurs, the operating system makes a context switch; in effect, the operating system is making a "thread switch" to another process anyway. The [correct] conclusion they are trying to lead you to is that this switch takes away the reduced overhead that user threads allegedly have.
I thought the behavior would be: 1. A select call is made and finds that the following system call would block. 2. Then the user-space thread scheduler switches threads and executes another thread.
Let me take the case of a user thread implementation that is not totally blocked by blocking system calls.
The library sets a timer for thread switching.
The thread starts or resumes executing.
The thread makes a blocking system service call (e.g., select).
The operating system switches the process out as part of the system service processing.
The timer goes off.
The process becomes current again and the OS invokes the timer handler in the library.
The library schedules another thread to execute.
The problem you face is that a blocking system service usually has, as part of its processing, code that triggers a context switch. Because the system knows nothing about threads (otherwise it would be using kernel threads), a thread calling such a blocking service is going to pass through that code.
Even though the process may have threads that are runnable, the operating system has no way to cause them to be executed, because it has no knowledge of them; they are managed by a library inside the process.
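A toy demonstration of that failure mode, with generators standing in for user-level threads and time.sleep standing in for a blocking system call (all names are mine):

```python
import time

def green(name):
    for i in range(3):
        print(f"{name}: step {i}")
        yield  # hand control back to the user-space scheduler

def blocking_green():
    print("blocker: entering a blocking 'system call'...")
    time.sleep(3)  # the whole process stalls here; A and B cannot run
    print("blocker: done")
    yield

# Round-robin user-space scheduler: it only regains control at yield
# points, so the blocking call above freezes every "thread" at once.
run_queue = [green("A"), blocking_green(), green("B")]
while run_queue:
    g = run_queue.pop(0)
    try:
        next(g)
        run_queue.append(g)
    except StopIteration:
        pass
```

While the sleep runs, nothing else in the process makes progress, exactly because the scheduler is just ordinary code inside the blocked process.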

Benefits of user-level threads

I was looking at the differences between user-level threads and kernel-level threads, which I basically understood.
What's not clear to me is the point of implementing user-level threads at all.
If the kernel is unaware of the existence of multiple threads within a single process, then which benefits could I experience?
I have read a couple of articles that stated user-level implementation of threads is advisable only if such threads do not perform blocking operations (which would cause the entire process to block).
This being said, what's the difference between a sequential execution of all the threads and a "parallel" execution of them, considering they cannot take advantage of multiple processors and independent scheduling?
An answer to a previously asked question (similar to mine) was something like:
No modern operating system actually maps n user-level threads to 1
kernel-level thread.
But for some reason, many people on the Internet state that user-level threads can never take advantage of multiple processors.
Could you help me understand this, please?
I strongly recommend Modern Operating Systems 4th Edition by Andrew S. Tanenbaum (starring in shows such as the debate about Linux; also participating: Linus Torvalds). Costs a whole lot of bucks but it's definitely worth it if you really want to know stuff. For eager students and desperate enthusiasts it's great.
Your questions answered
[...] what's not clear to me is the point of implementing User-level threads
at all.
Read my post. It is comprehensive, I daresay.
If the kernel is unaware of the existence of multiple threads within a
single process, then which benefits could I experience?
Read the section "Disadvantages" below.
I have read a couple of articles that stated that user-level
implementation of threads is advisable only if such threads do not
perform blocking operations (which would cause the entire process to
block).
Read the subsection "No coordination with system calls" in "Disadvantages."
All citations are from the book I recommended in the top of this answer, Chapter 2.2.4, "Implementing Threads in User Space."
Advantages
Enables threads on systems without threads
The first advantage is that user-level threads are a way to get threads on an operating system that has no thread support.
The first, and most obvious, advantage is that
a user-level threads package can be implemented on an operating system that does not support threads. All operating systems used to
fall into this category, and even now some still do.
No kernel interaction required
A further benefit is the low overhead of switching threads, as opposed to switching to kernel mode, doing stuff, and switching back. The lighter thread switching is described like this in the book:
When a thread does something that may cause it to become blocked locally, for example, waiting for another thread in its process to complete some work, it calls a run-time system procedure. This procedure checks to see if the thread must be put into blocked state. If so, it stores the thread's registers (i.e., its own) [...] and reloads the machine registers with the new thread's saved values. As soon as the stack pointer and program counter have been switched, the new thread comes to life again automatically. If the machine happens to have an instruction to store all the registers and another one to load them all, the entire thread switch can be done in just a handful of instructions. Doing thread switching like this is at least an order of magnitude—maybe more—faster than trapping to the kernel and is a strong argument in favor of user-level threads packages.
This efficiency is also nice because it spares us from incredibly heavy context switches and all that stuff.
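In CPython, the greenlet library implements exactly this kind of user-space stack-and-register switch; no kernel trap is involved. A minimal sketch (requires `pip install greenlet`):

```python
from greenlet import greenlet

def task_a():
    print("A: start")
    gb.switch()       # save A's stack and registers, resume B in user space
    print("A: resumed where it left off")

def task_b():
    print("B: start")
    ga.switch()       # switch straight back into the middle of task_a

ga = greenlet(task_a)
gb = greenlet(task_b)
ga.switch()           # run A until its first switch
```

Each switch() is a handful of instructions plus a stack swap, which is the "order of magnitude faster than trapping to the kernel" effect the book describes.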
Individually adjusted scheduling algorithms
Also, since there is no central scheduling algorithm, every process can have its own scheduling algorithm and is far more flexible in its choice. In addition, the "private" scheduling algorithm is far more flexible concerning the information it gets from the threads; the amount of information can be adjusted manually and per process, so scheduling can be very fine-grained. This is possible because, again, there is no central scheduling algorithm that must fit the needs of every process, be very general, and deliver adequate performance in every case. User-level threads allow an extremely specialized scheduling algorithm.
This is only restricted by the disadvantage "No automatic switching to the scheduler."
They [user-level threads] allow each process to have its own
customized scheduling algorithm. For some applications, for example,
those with a garbage-collector thread, not having to worry about a
thread being stopped at an inconvenient moment is a plus. They also
scale better, since kernel threads invariably require some table space
and stack space in the kernel, which can be a problem if there are a
very large number of threads.
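Because the scheduler is just library code in the process, the policy can be anything the application wants. A toy priority scheduler over generator "threads" (all names are mine; no kernel scheduler could be specialized per process like this):

```python
import heapq

def job(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        yield

# Application-defined policy: lower number = higher priority, so the
# GC-like thread always runs before the ordinary worker.
ready = [(0, 0, job("gc-thread", 2)), (5, 1, job("worker", 2))]
heapq.heapify(ready)
seq = len(ready)
while ready:
    prio, _, g = heapq.heappop(ready)
    try:
        next(g)
        heapq.heappush(ready, (prio, seq, g))  # seq breaks priority ties
        seq += 1
    except StopIteration:
        pass
```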
Disadvantages
No coordination with system calls
The user-level scheduling algorithm has no idea if some thread has called a blocking read system call. OTOH, a kernel-level scheduling algorithm would've known because it can be notified by the system call; both belong to the kernel code base.
Suppose that a thread reads from the keyboard before any keys have
been hit. Letting the thread actually make the system call is
unacceptable, since this will stop all the threads. One of the main
goals of having threads in the first place was to allow each one to
use blocking calls, but to prevent one blocked thread from affecting
the others. With blocking system calls, it is hard to see how this
goal can be achieved readily.
He goes on to say that system calls could be made non-blocking, but that would be very inconvenient and compatibility with existing OSes would be drastically hurt.
Mr. Tanenbaum also says that the library wrappers around the system calls (as found in glibc, for example) could be modified to predict when a system call blocks, using select, but he notes that this is inelegant.
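That wrapper idea, sketched in Python (a rough illustration only; scheduler_yield is a placeholder for the user-level scheduler's switch routine, and select on an fd is not a perfect blocking predictor):

```python
import os
import select

def wrapped_read(fd, nbytes, scheduler_yield):
    """'Jacket' around read(): never let it block the whole process."""
    while True:
        # Poll with a zero timeout: is the fd readable right now?
        readable, _, _ = select.select([fd], [], [], 0)
        if readable:
            return os.read(fd, nbytes)  # should complete without blocking
        # Not ready: run some other user-level thread and try again later.
        scheduler_yield()
```

Every read now costs at least one extra system call (the select), which is part of why Tanenbaum calls the approach inelegant.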
Building on that, he says that threads often block, frequent blocking requires many system calls, and many system calls are bad. And without blocking, threads become less useful:
For applications that are essentially entirely CPU bound and rarely
block, what is the point of having threads at all? No one would
seriously propose computing the first n prime numbers or playing chess
using threads because there is nothing to be gained by doing it that
way.
Page faults block per-process if unaware of threads
The OS has no notion of threads. Therefore, if a page fault occurs, the whole process will be blocked, effectively blocking all user-level threads.
Somewhat analogous to the problem of blocking system calls is the
problem of page faults. [...] If the program calls or jumps to an
instruction that is not in memory, a page fault occurs and the
operating system will go and get the missing instruction (and its
neighbors) from disk. [...] The process is blocked while the necessary
instruction is being located and read in. If a thread causes a page
fault, the kernel, unaware of even the existence of threads, naturally
blocks the entire process until the disk I/O is complete, even though
other threads might be runnable.
I think this can be generalized to all interrupts.
No automatic switching to the scheduler
Since there is no per-process clock interrupt, a thread keeps the CPU forever unless some OS-dependent mechanism (such as a context switch) takes it away, or it voluntarily releases the CPU.
This prevents usual scheduling algorithms from working, including the Round-Robin algorithm.
[...] if a thread starts running, no other thread in that process
will ever run unless the first thread voluntarily gives up the CPU.
Within a single process, there are no clock interrupts, making it
impossible to schedule processes round-robin fashion (taking turns).
Unless a thread enters the run-time system of its own free will, the scheduler will never get a chance.
He says that a possible solution would be
[...] to have the run-time system request a clock signal (interrupt) once a
second to give it control, but this, too, is crude and messy to
program.
I would even go further and say that such a "request" would require a system call to happen, whose drawback is already explained in "No coordination with system calls." Without a system call, the program would need free access to the timer hardware, which is a security hole and unacceptable in modern OSes.
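For the record, here is roughly what that once-a-second clock request looks like in user space, using SIGALRM on Unix (a sketch; CPython delivers signals only to the main thread):

```python
import signal
import time

switch_needed = False

def on_tick(signum, frame):
    # The kernel delivers SIGALRM once a second: the run-time system's
    # only chance to regain control and preempt the current green thread.
    global switch_needed
    switch_needed = True

signal.signal(signal.SIGALRM, on_tick)
signal.setitimer(signal.ITIMER_REAL, 1.0, 1.0)  # first tick in 1s, then every 1s

for step in range(5):
    time.sleep(1.2)  # stand-in for running a user-level thread
    if switch_needed:
        switch_needed = False
        print(f"tick {step}: run-time would switch user-level threads here")
```

Note that setting up the timer is itself a system call, which illustrates the point above.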
What's not clear to me is the point of implementing user-level threads at all.
User-level threads largely came into the mainstream due to Ada and its requirement for threads (tasks in Ada terminology). At the time, there were few multiprocessor systems and most multiprocessors were of the master/slave variety. Kernel threads simply did not exist. User threads had to be created to implement languages like Ada.
If the kernel is unaware of the existence of multiple threads within a single process, then which benefits could I experience?
If you have kernel threads, multiple threads within a single process can run simultaneously. With user threads, the threads always execute interleaved.
Using threads can simplify some types of programming.
I have read a couple of articles that stated user-level implementation of threads is advisable only if such threads do not perform blocking operations (which would cause the entire process to block).
That is true on Unix, though maybe not on all Unix implementations. User threads on many operating systems function perfectly fine with blocking I/O.
This being said, what's the difference between a sequential execution of all the threads and a "parallel" execution of them, considering they cannot take advantage of multiple processors and independent scheduling?
With user threads, there is never parallel execution. With kernel threads, there can be parallel execution IF there are multiple processors. On a single-processor system, there is not much advantage to using kernel threads over user threads (contra: note the blocking-I/O issue on Unix and user threads).
But for some reason, many people on the Internet state that user-level threads can never take advantage of multiple processors.
With user threads, the process manages its own "threads" by interleaving execution within itself. The process can only have a thread run on the processor that the process is running on.
If the operating system provides system services to schedule code to run on a different processor, user threads could run on multiple processors.
I conclude by saying that for practical purposes there are no advantages to user threads over kernel threads. There are those who will assert that there are performance advantages, but any such advantage would be system dependent.

kernel thread vs native thread vs OS thread

Can anyone please tell me: do the terms "kernel thread", "native thread" and "OS thread" all refer to kernel threads? Or are they different? If they are different, what is the relationship among them?
There's no real standard for that; terminology varies depending on context. However, I'll try to explain the different kinds of threads that I know of (and add fibers just for completeness, as I've seen people call them threads).
-- Threading within the kernel
These are most likely what your kernel thread term refers to. They only exist at the kernel level. They allow (a somewhat limited) parallel execution of the kernel code itself.
-- Application threading
These are what the term thread generally means. They are separate threads of parallel execution which may be scheduled on different processors, that share the same address space and are handled as a single process by the operating system.
The POSIX standard defines the properties threads should have in POSIX-compliant systems (in fact, it defines the library functions and how each of them is supposed to behave). The Windows threading model is extremely similar to the POSIX one and, AFAIK, it's safe to talk about threading in general the way I did: parallel execution that happens within the same process and can be scheduled on different processors.
-- Ancient linux threading
In the early days, the Linux kernel did not support threading. However, it did support creating two different processes that shared the same address space. There was a project (LinuxThreads) that tried to use this to implement some sort of threading ability.
The problem was, of course, that the kernel still treated them as separate processes. The result was therefore not POSIX compliant; for example, the treatment of signals was problematic (as signals are a process-level concept). It was IN THIS VERY SPECIFIC CONTEXT that the term "native" started to become common: it refers to "native" as in "kernel level" support for threading.
With help from the kernel, actual support for POSIX-compliant threading was finally implemented. Today that's the only kind of threading that really deserves the name; the old way is, in fact, not real threading at all. It's a sharing of the address space by multiple processes, and it should be referred to as such. But there was a time when that was referred to as threading (as it was the only thing you could do with Linux).
-- User level and Green threading
This is another context where "native" is often used in contrast to another threading model. Green threads and user-level threads are threads that do happen within the same process but are handled entirely at user level. Green threads are used in virtual machines (especially those that implement p-code execution, as is the case for the Java virtual machine), and they are also implemented at the library level by many languages (examples: Haskell, Racket, Smalltalk).
These threads do not need to rely on any threading facilities from the kernel (but often do rely on asynchronous I/O). As such, they generally cannot be scheduled on separate processors. In these contexts, "native thread" or "OS thread" may be used to refer to the actual kernel-scheduled threads, in contrast to the green/user-level threads.
Note that "cannot be scheduled on separate processors" is only true if they are used alone. In an hybrid system that has both user level/green threads and native/os threads, it may be possible to create exactly one native/os thread for each processor (and on some systems to set the affinity mask so that each only runs on a specific processor) and then effectively assign the userlevel threads to these.
-- Fibers and cooperative multitasking
I have seen some people call these threads. That's improper; the correct name is fibers. They are also a model of concurrent execution, but contrary to threads (and processes) they are cooperative: whenever a fiber is running, the other fibers will not run until the running fiber voluntarily "yields" execution, accepting to be suspended and eventually resumed later.
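Python's asyncio coroutines follow this same cooperative model and make the "voluntary yield" explicit in the syntax:

```python
import asyncio

async def fiber(name):
    for i in range(2):
        print(f"{name}: step {i}")
        # Cooperative: nothing else runs until this await voluntarily
        # hands control back to the event loop.
        await asyncio.sleep(0)

async def main():
    await asyncio.gather(fiber("fiber-1"), fiber("fiber-2"))

asyncio.run(main())
```

If one of these coroutines entered a plain busy loop with no await, the other would never run again, which is exactly the fiber contract described above.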
