what is the difference between application threads and OS threads?

what is the difference between application threads and OS threads? - multithreading

from http://coding-geek.com/how-databases-work/#Global_overview
The process manager: Many databases have a pool of processes/threads that needs to be managed. Moreover, in order to gain nanoseconds, some modern databases use their own threads instead of the Operating System threads.
what is the difference between application threads and os threads?

"Application threads" is another form of M:N thread model, or hybrid threading.
M:N maps some M number of application threads onto some N number of
kernel entities,[10] or "virtual processors." This is a compromise
between kernel-level ("1:1") and user-level ("N:1") threading. In
general, "M:N" threading systems are more complex to implement than
either kernel or user threads, because changes to both kernel and
user-space code are required. In the M:N implementation, the threading
library is responsible for scheduling user threads on the available
schedulable entities; this makes context switching of threads very
fast, as it avoids system calls. However, this increases complexity
and the likelihood of priority inversion, as well as suboptimal
scheduling without extensive (and expensive) coordination between the
userland scheduler and the kernel scheduler.
There always seems to be attempts to use "application threads" to improve performance.
Java did it - "green threads". And dropped it around Java 1.2.
Solaris used to have an M:N threading model, and dropped it starting in Solaris 8.
AFAIK, Windows and Linux never went there.
Now, Rust seems to want to try their hand at it.

Related

User thread, Kernel thread, software thread and hardware thread

I'm studying thread and multithreading concepts and I ran into different kinds of thread:
User thread: supported above the kernel and are managed without the kernel.
Kernel thread: supported and managed directly by the operating system.
Software thread: threads of execution managed by the operating system.
Hardware thread: a feature of some processors that allow better utilization of the processor under some circumstances.
Can anyone clarify the difference between these types of threads (I'm confused)?
Thanks

Hardware thread is what allows you to actually run things in parallel (which is not the same as concurrently). These corresspond to number of your CPU cores (with nuances like hyperthreading, which can double the number of cores).
On top of that are OS (kernel) threads. Its an abstraction provided by your OS. The OS will map them to hardware threads. It does this via internal scheduler, and we have little to no control over that. Note that in theory there may be arbitrarily many OS threads (if there are not enough cores to handle them they simply wait for CPU), although the price for so called context switch limits it to few thousands, maybe more.
User threads (a.k.a. green threads, coroutines, etc. they have many names) is an abstraction provided by your software (e.g. programming language and its runtime). They run on top of OS threads, and are mapped to them via internal (but in user space) scheduler. They tend to perform better than OS threads (especially with i/o bound tasks) because they have lower context switch overhead, plus they can take advantage of async apis (e.g. nonblocking sockets) without spawning OS threads (which is costly as well). Since they are lightweight, you can spawn lots of them. Some people claim to run millions of such threads at a time. I've seen tens of thousands without issues.
I've never seen the term "software thread" though. But depending on context it means either user or kernel thread. Unlikely it means anything else.
Btw no real code can run without some OS support. It can be limited, if for example you don't want things to run in parallel. But as soon as you want true parallelism there is no escape from OS threads. The internal scheduler for user threads have to spawn OS threads and map user threads to them in some way. Although typically it is an invisible implemention detail.

"Hardware thread" is a bad name. It was chosen as a term of art by CPU designers, without much regard for what software developers think "thread" means.
When an operating system interrupts a running thread so that some other thread may be allowed to use the CPU, it must save enough of the state of the CPU so that the thread can be resumed again later on. Mostly that saved state consists of the program counter, the stack pointer, and other CPU registers that are part of the programmer's model of the CPU.
A so-called "hyperthreaded CPU" has two or more complete sets of those registers. That allows it to execute instructions on behalf of two or more program threads without any need for the operating system to intervene.
Experts in the field like nice, short names for things. Instead of talking about "complete sets of context registers," they just call them "hardware threads."

Which type of threads are better for computation-intensive parallel applications and I/O intensive parallel applications?

Multi threading at the kernel level means that multiples processes can be executed on different threads at the same time.
User level threads reside in the processes which do not share same address space.
So, which type of thread is better for computation-intensive parallel applications and I/O intensive parallel applications?

In reality there is no advantage whatsoever of user level threads over kernel level threads. (There are some useless textbooks that invent such advantages.)
User threads are poor-mans threading that was invented in the days before kernel threads existed. They persist only because certain operating systems have not implemented kernel threads.

Work stealing and Kernel-level thread

Work stealing is a common strategy for User-level Thread. Each process has a work queue for taking work, and will steal from others' queue when they are out of work to do.
Is there any kernel that implements such strategy for Kernel-level thread ? If not, what is the reason ?
I believe in Linux there is a notion of thread-migration in kernel-level thread, which migrates thread from high-load processor to low-load processor but that seems like a different algorithm. But correct me if I'm wrong.
Thanks

The work stealing scheduler is a parallel computation scheduler. It is usually on user level libraries (like Intel tbb: https://www.threadingbuildingblocks.org/) or even languages like Cilk (https://software.intel.com/en-us/intel-cilk-plus)
Kernel-level threads are scheduled by the operating system, and as so the scheduling techniques are quite different. For instance, in work-stealing scheduler, one of the objectives is to limit memory usage (as proven in the original paper: http://supertech.csail.mit.edu/papers/steal.pdf) and to achieve that the threads are stored in a deque. However, in operating system's schedulers the main objective is to be fair between the users, give each process/kernel thread a fair amount of time to run (as max-min fairness states: http://en.wikipedia.org/wiki/Max-min_fairness), etc. Operating System's schedulers even use different priorities among kernel threads/processes (please see http://en.wikipedia.org/wiki/Completely_Fair_Scheduler or http://en.wikipedia.org/wiki/Multilevel_feedback_queue). For that reason, work stealing implementations are made in user-level, since their objective is to schedule user-level threads inside a process and not kernel-threads.

Difference between user-level and kernel-supported threads?

I've been looking through a few notes based on this topic, and although I have an understanding of threads in general, I'm not really to sure about the differences between user-level and kernel-level threads.
I know that processes are basically made up of multiple threads or a single thread, but are these thread of the two prior mentioned types?
From what I understand, kernel-supported threads have access to the kernel for system calls and other uses not available to user-level threads.
So, are user-level threads simply threads created by the programmer when then utilise kernel-supported threads to perform operations that couldn't be normally performed due to its state?

Edit: The question was a little confusing, so I'm answering it two different ways.
OS-level threads vs Green Threads
For clarity, I usually say "OS-level threads" or "native threads" instead of "Kernel-level threads" (which I confused with "kernel threads" in my original answer below.) OS-level threads are created and managed by the OS. Most languages have support for them. (C, recent Java, etc) They are extremely hard to use because you are 100% responsible for preventing problems. In some languages, even the native data structures (such as Hashes or Dictionaries) will break without extra locking code.
The opposite of an OS-thread is a green thread that is managed by your language. These threads are given various names depending on the language (coroutines in C, goroutines in Go, fibers in Ruby, etc). These threads only exist inside your language and not in your OS. Because the language chooses context switches (i.e. at the end of a statement), it prevents TONS of subtle race conditions (such as seeing a partially-copied structure, or needing to lock most data structures). The programmer sees "blocking" calls (i.e. data = file.read() ), but the language translates it into async calls to the OS. The language then allows other green threads to run while waiting for the result.
Green threads are much simpler for the programmer, but their performance varies: If you have a LOT of threads, green threads can be better for both CPU and RAM. On the other hand, most green thread languages can't take advantage of multiple cores. (You can't even buy a single-core computer or phone anymore!). And a bad library can halt the entire language by doing a blocking OS call.
The best of both worlds is to have one OS thread per CPU, and many green threads that are magically moved around onto OS threads. Languages like Go and Erlang can do this.
system calls and other uses not available to user-level threads
This is only half true. Yes, you can easily cause problems if you call the OS yourself (i.e. do something that's blocking.) But the language usually has replacements, so you don't even notice. These replacements do call the kernel, just slightly differently than you think.
Kernel threads vs User Threads
Edit: This is my original answer, but it is about User space threads vs Kernel-only threads, which (in hindsight) probably wasn't the question.
User threads and Kernel threads are exactly the same. (You can see by looking in /proc/ and see that the kernel threads are there too.)
A User thread is one that executes user-space code. But it can call into kernel space at any time. It's still considered a "User" thread, even though it's executing kernel code at elevated security levels.
A Kernel thread is one that only runs kernel code and isn't associated with a user-space process. These are like "UNIX daemons", except they are kernel-only daemons. So you could say that the kernel is a multi-threaded program. For example, there is a kernel thread for swap. This forces all swap issues to get "serialized" into a single stream.
If a user thread needs something, it will call into the kernel, which marks that thread as sleeping. Later, the swap thread finds the data, so it marks the user thread as runnable. Later still, the "user thread" returns from the kernel back to userland as if nothing happened.
In fact, all threads start off in kernel space, because the clone() operation happens in kernel space. (And there's lots of kernel accounting to do before you can 'return' to a new process in user space.)

Before we go into comparison, let us first understand what a thread is. Threads are lightweight processes within the domain of independent processes. They are required because processes are heavy, consume a lot of resources and more importantly,
two separate processes cannot share a memory space.
Let's say you open a text editor. It's an independent process executing in the memory with a separate addressable location. You'll need many resources within this process, such as insert graphics, spell-checks etc. It's not feasible to create separate processes for each of these functionalities and maintain them independently in memory. To avoid this,
multiple threads can be created within a single process, which can
share a common memory space, existing independently within a process.
Now, coming back to your questions, one at a time.
I'm not really to sure about the differences between user-level and kernel-level threads.
Threads are broadly classified as user level threads and kernel level threads based on their domain of execution. There are also cases when one or many user thread maps to one or many kernel threads.
- User Level Threads
User level threads are mostly at the application level where an application creates these threads to sustain its execution in the main memory. Unless required, these thread work in isolation with kernel threads.
These are easier to create since they do not have to refer many registers and context switching is much faster than a kernel level thread.
User level thread, mostly can cause changes at the application level and the kernel level thread continues to execute at its own pace.
- Kernel Level Threads
These threads are mostly independent of the ongoing processes and are executed by the operating system.
These threads are required by the Operating System for tasks like memory management, process management etc.
Since these threads maintain, execute and report the processes required by the operating system; kernel level threads are more expensive to create and manage and context switching of these threads are slow.
Most of the kernel level threads can not be preempted by the user level threads.
MS DOS written for Intel 8088 didn't have dual mode of operation. Thus, a user level process had the ability to corrupt the entire operating system.
- User Level Threads mapped over Kernel Threads
This is perhaps the most interesting part. Many user level threads map over to kernel level thread, which in-turn communicate with the kernel.
Some of the prominent mappings are:
One to One
When one user level thread maps to only one kernel thread.
advantages: each user thread maps to one kernel thread. Even if one of the user thread issues a blocking system call, the other processes remain unaffected.
disadvantages: every user thread requires one kernel thread to interact and kernel threads are expensive to create and manage.
Many to One
When many user threads map to one kernel thread.
advantages: multiple kernel threads are not required since similar user threads can be mapped to one kernel thread.
disadvantage: even if one of the user thread issues a blocking system call, all the other user threads mapped to that kernel thread are blocked.
Also, a good level of concurrency cannot be achieved since the kernel will process only one kernel thread at a time.
Many to Many
When many user threads map to equal or lesser number of kernel threads. The programmer decides how many user threads will map to how many kernel threads. Some of the user threads might map to just one kernel thread.
advantages: a great level of concurrency is achieved. Programmer can decide some potentially dangerous threads which might issue a blocking system call and place them with the one-to-one mapping.
disadvantage: the number of kernel threads, if not decided cautiously can slow down the system.
The other part of your question:
kernel-supported threads have access to the kernel for system calls
and other uses not available to user-level threads.
So, are user-level threads simply threads created by the programmer
when then utilise kernel-supported threads to perform operations that
couldn't be normally performed due to its state?
Partially correct. Almost all the kernel thread have access to system calls and other critical interrupts since kernel threads are responsible for executing the processes of the OS. User thread will not have access to some of these critical features. e.g. a text editor can never shoot a thread which has the ability to change the physical address of the process. But if needed, a user thread can map to kernel thread and issue some of the system calls which it couldn't do as an independent entity. The kernel thread would then map this system call to the kernel and would execute actions, if deemed fit.

Quote from here :
Kernel-Level Threads
To make concurrency cheaper, the execution aspect of process is separated out into threads. As such, the OS now manages threads and processes. All thread operations are implemented in the kernel and the OS schedules all threads in the system. OS managed threads are called kernel-level threads or light weight processes.
NT: Threads
Solaris: Lightweight processes(LWP).
In this method, the kernel knows about and manages the threads. No runtime system is needed in this case. Instead of thread table in each process, the kernel has a thread table that keeps track of all threads in the system. In addition, the kernel also maintains the traditional process table to keep track of processes. Operating Systems kernel provides system call to create and manage threads.
Advantages:
Because kernel has full knowledge of all threads, Scheduler may decide to give more time to a process having large number of threads than process having small number of threads.
Kernel-level threads are especially good for applications that frequently block.
Disadvantages:
The kernel-level threads are slow and inefficient. For instance, threads operations are hundreds of times slower than that of user-level threads.
Since kernel must manage and schedule threads as well as processes. It require a full thread control block (TCB) for each thread to maintain information about threads. As a result there is significant overhead and increased in kernel complexity.
User-Level Threads
Kernel-Level threads make concurrency much cheaper than process because, much less state to allocate and initialize. However, for fine-grained concurrency, kernel-level threads still suffer from too much overhead. Thread operations still require system calls. Ideally, we require thread operations to be as fast as a procedure call. Kernel-Level threads have to be general to support the needs of all programmers, languages, runtimes, etc. For such fine grained concurrency we need still "cheaper" threads.
To make threads cheap and fast, they need to be implemented at user level. User-Level threads are managed entirely by the run-time system (user-level library).The kernel knows nothing about user-level threads and manages them as if they were single-threaded processes.User-Level threads are small and fast, each thread is represented by a PC,register,stack, and small thread control block. Creating a new thread, switiching between threads, and synchronizing threads are done via procedure call. i.e no kernel involvement. User-Level threads are hundred times faster than Kernel-Level threads.
Advantages:
The most obvious advantage of this technique is that a user-level threads package can be implemented on an Operating System that does not support threads.
User-level threads does not require modification to operating systems.
Simple Representation: Each thread is represented simply by a PC, registers, stack and a small control block, all stored in the user process address space.
Simple Management: This simply means that creating a thread, switching between threads and synchronization between threads can all be done without intervention of the kernel.
Fast and Efficient: Thread switching is not much more expensive than a procedure call.
Disadvantages:
User-Level threads are not a perfect solution as with everything else, they are a trade off. Since, User-Level threads are invisible to the OS they are not well integrated with the OS. As a result, Os can make poor decisions like scheduling a process with idle threads, blocking a process whose thread initiated an I/O even though the process has other threads that can run and unscheduling a process with a thread holding a lock. Solving this requires communication between between kernel and user-level thread manager.
There is a lack of coordination between threads and operating system kernel. Therefore, process as whole gets one time slice irrespect of whether process has one thread or 1000 threads within. It is up to each thread to relinquish control to other threads.
User-level threads requires non-blocking systems call i.e., a multithreaded kernel. Otherwise, entire process will blocked in the kernel, even if there are runable threads left in the processes. For example, if one thread causes a page fault, the process blocks.

User Threads
The library provides support for thread creation, scheduling and management with no support from the kernel.
The kernel unaware of user-level threads creation and scheduling are done in user space without kernel intervention.
User-level threads are generally fast to create and manage they have drawbacks however.
If the kernel is single-threaded, then any user-level thread performing a blocking system call will cause the entire process to block, even if other threads are available to run within the application.
User-thread libraries include POSIX Pthreads, Mach C-threads,
and Solaris 2 UI-threads.
Kernel threads
The kernel performs thread creation, scheduling, and management in kernel space.
kernel threads are generally slower to create and manage than are user threads.
the kernel is managing the threads, if a thread performs a blocking system call.
A multiprocessor environment, the kernel can schedule threads on different processors.
5.including Windows NT, Windows 2000, Solaris 2, BeOS, and Tru64 UNIX (formerlyDigital UN1X)-support kernel threads.

Some development environments or languages will add there own threads like feature, that is written to take advantage of some knowledge of the environment, for example a GUI environment could implement some thread functionality which switch between user threads on each event loop.
A game library could have some thread like behaviour for characters. Sometimes the user thread like behaviour can be implemented in a different way, for example I work with cocoa a lot, and it has a timer mechanism which executes your code every x number of seconds, use fraction of a seconds and it like a thread. Ruby has a yield feature which is like cooperative threads. The advantage of user threads is they can switch at more predictable times. With kernel thread every time a thread starts up again, it needs to load any data it was working on, this can take time, with user threads you can switch when you have finished working on some data, so it doesn't need to be reloaded.
I haven't come across user threads that look the same as kernel threads, only thread like mechanisms like the timer, though I have read about them in older text books so I wonder if they were something that was more popular in the past but with the rise of true multithreaded OS's (modern Windows and Mac OS X) and more powerful hardware I wonder if they have gone out of favour.

Is there any comprehensive overview somewhere that discusses all the different types of threads?

Is there any comprehensive overview somewhere that discusses all the different types of threads and what their relationship is with the OS and the scheduler? I've heard so much contradicting information about whether you want certain types of threads, or whether thread pooling is a performance gain or a performance hit, or that threads are heavy weight so you should use these other kind of threads that don't map directly to real threads but then how is that different from thread pooling .... I'm paralyzed. How does anyone make sense of it? Assuming the use of a language that actually directly interacts with threads (I'm aware of concurrent languages, implicit parallelism, etc. as an alternative to needing to know this stuff but I'm curious about this at the moment)

Here is my brief summary, please comment and edit at will:
There are no hyperthreads, unless you're talking about Intel's hyperthreading in which case it's just virtual cores.
"Green" usually means "not OS-level" (scheduled/handled by a VM, which may or may not map those unto multiple OS-level threads or processes)
pthreads are an API (Posix Threads)
Kernel threads vs user threads is an implementation level (user threads are implemented in userland, so the kernel is not aware of them and neither is its scheduler), "threads" alone is generally an alias for "kernel threads"
Fibers are system-level coroutines. They're threads, except cooperatively multitasked rather than preemptively.

Well, like with most things, it's common to not just care unless threading is identified as a bottleneck. That is, just use the threading functionality that your platform provides in the usual manner and don't worry about the details, at least in the beginning.
But since you evidently want to know more: Usually, the operating system has a concept of a thread as a unit of execution, which is what the OS scheduler handles. Now, switching between OS-level threads requires a context switch, which can be expensive and can become a performance bottleneck. So instead of mapping programming-language threads directly to OS threads, some threading implementations do everything in user space, so that there is only one OS-level thread that is responsible for all the user-level threads in the application. This is more efficient both performance- and resource-wise, but it has the problem that if you actually have several physical processors, you cannot use more than one of them with user-level threads. So there's one more strategy of allocating threads: have multiple OS-level threads, the number of which relates to the number of physical processors you have, and have each of these be responsible for several user-level threads. These three strategies are often called 1:1 (user threads map 1-to-1 to OS threads), N:1 (all user threads map to 1 OS thread), and M:N (M user threads map to N OS threads).
Thread pooling is a slightly different thing. The idea behind thread pooling is to separate the execution resources from the actual execution, so that you have a number of threads (your resources) available in the thread pool, and when you need some task to be executed, you just pick one thread from the pool and hand the task over to it. So thread pooling is a way to design a multi-threaded application. Another way to design would be to identify the different tasks that will need to be performed (e.g., reading from a network, drawing the UI to the screen), and create a dedicated thread for these tasks. This is mostly orthogonal to whether the threads are user- or OS-level concepts.

Threads are the main building block of a Process in the Windows win32 architecture. You can ignore green threads, fibers, green fibers, pthreads (POSIX). Hyper threads don't exist. It is "hyper threading" which is a CPU architecture thing. You cannot code it. You can ignore it.
This leaves use with threads. Indeed. Only threads. A kernel thread is a thread of the kernel, which lives in the upper 2GB (sometimes upper 1GB) of the virtual memory addess space of a machine. You cannot touch it. So you can ignore it most of the time (unless you are writing kernel mode ring-0 code).
Only user threads are the ones you should be concerned about. They come in two flavors: main thread and auxiliary threads. Each process has at least one main thread, it is created for you when you create a process (CreateProcess API call). Auxiliary threads can do tasks that take long and otherwise interrupt the user experience. In C#/,NET you can use the BackgroundWorker class to easily create and manage threads.
Threads have several properties. This may have lead to all "kinds" of threads. But worker threads are probably the only ones you should be worried about when you start dealing with threads.

I learned a lot reading these slides.
I came across this after looking at Unicorn.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string