What is the difference between a thread and a fiber?

What is the difference between a thread and a fiber? - multithreading

What is the difference between a thread and a fiber? I've heard of fibers from ruby and I've read heard they're available in other languages, could somebody explain to me in simple terms what is the difference between a thread and a fiber.

In the most simple terms, threads are generally considered to be preemptive (although this may not always be true, depending on the operating system) while fibers are considered to be light-weight, cooperative threads. Both are separate execution paths for your application.
With threads: the current execution path may be interrupted or preempted at any time (note: this statement is a generalization and may not always hold true depending on OS/threading package/etc.). This means that for threads, data integrity is a big issue because one thread may be stopped in the middle of updating a chunk of data, leaving the integrity of the data in a bad or incomplete state. This also means that the operating system can take advantage of multiple CPUs and CPU cores by running more than one thread at the same time and leaving it up to the developer to guard data access.
With fibers: the current execution path is only interrupted when the fiber yields execution (same note as above). This means that fibers always start and stop in well-defined places, so data integrity is much less of an issue. Also, because fibers are often managed in the user space, expensive context switches and CPU state changes need not be made, making changing from one fiber to the next extremely efficient. On the other hand, since no two fibers can run at exactly the same time, just using fibers alone will not take advantage of multiple CPUs or multiple CPU cores.

Threads use pre-emptive scheduling, whereas fibers use cooperative scheduling.
With a thread, the control flow could get interrupted at any time, and another thread can take over. With multiple processors, you can have multiple threads all running at the same time (simultaneous multithreading, or SMT). As a result, you have to be very careful about concurrent data access, and protect your data with mutexes, semaphores, condition variables, and so on. It is often very tricky to get right.
With a fiber, control only switches when you tell it to, typically with a function call named something like yield(). This makes concurrent data access easier, since you don't have to worry about atomicity of data structures or mutexes. As long as you don't yield, there's no danger of being preempted and having another fiber trying to read or modify the data you're working with. As a result, though, if your fiber gets into an infinite loop, no other fiber can run, since you're not yielding.
You can also mix threads and fibers, which gives rise to the problems faced by both. Not recommended, but it can sometimes be the right thing to do if done carefully.

In Win32, a fiber is a sort of user-managed thread. A fiber has its own stack and its own instruction pointer etc., but fibers are not scheduled by the OS: you have to call SwitchToFiber explicitly. Threads, by contrast, are pre-emptively scheduled by the operation system. So roughly speaking a fiber is a thread that is managed at the application/runtime level rather than being a true OS thread.
The consequences are that fibers are cheaper and that the application has more control over scheduling. This can be important if the app creates a lot of concurrent tasks, and/or wants to closely optimise when they run. For example, a database server might choose to use fibers rather than threads.
(There may be other usages for the same term; as noted, this is the Win32 definition.)

First I would recommend reading this explanation of the difference between processes and threads as background material.
Once you've read that it's pretty straight forward. Threads cans be implemented either in the kernel, in user space, or the two can be mixed. Fibers are basically threads implemented in user space.
What is typically called a thread is a thread of execution implemented in the kernel: what's known as a kernel thread. The scheduling of a kernel thread is handled exclusively by the kernel, although a kernel thread can voluntarily release the CPU by sleeping if it wants. A kernel thread has the advantage that it can use blocking I/O and let the kernel worry about scheduling. It's main disadvantage is that thread switching is relatively slow since it requires trapping into the kernel.
Fibers are user space threads whose scheduling is handled in user space by one or more kernel threads under a single process. This makes fiber switching very fast. If you group all the fibers accessing a particular set of shared data under the context of a single kernel thread and have their scheduling handled by a single kernel thread, then you can eliminate synchronization issues since the fibers will effectively run in serial and you have complete control over their scheduling. Grouping related fibers under a single kernel thread is important, since the kernel thread they are running in can be pre-empted by the kernel. This point is not made clear in many of the other answers. Also, if you use blocking I/O in a fiber, the entire kernel thread it is a part of blocks including all the fibers that are part of that kernel thread.
In section 11.4 "Processes and Threads in Windows Vista" in Modern Operating Systems, Tanenbaum comments:
Although fibers are cooperatively scheduled, if there are multiple
threads scheduling the fibers, a lot of careful synchronization is
required to make sure fibers do not interfere with each other. To
simplify the interaction between threads and fibers, it is often
useful to create only as many threads as there are processors to run
them, and affinitize the threads to each run only on a distinct set of
available processors, or even just one processor. Each thread can
then run a particular subset of the fibers, establishing a one
to-many relationship between threads and fibers which simplifies
synchronization. Even so there are still many difficulties with
fibers. Most Win32 libraries are completely unaware of fibers, and
applications that attempt to use fibers as if they were threads will
encounter various failures. The kernel has no knowledge of fibers,
and when a fiber enters the kernel, the thread it is executing on may
block and the kernel will schedule an arbitrary thread on the
processor, making it unavailable to run other fibers. For these
reasons fibers are rarely used except when porting code from other
systems that explicitly need the functionality provided by fibers.

Note that in addition to Threads and Fibers, Windows 7 introduces User-Mode Scheduling:
User-mode scheduling (UMS) is a
light-weight mechanism that
applications can use to schedule their
own threads. An application can switch
between UMS threads in user mode
without involving the system scheduler
and regain control of the processor if
a UMS thread blocks in the kernel. UMS
threads differ from fibers in that
each UMS thread has its own thread
context instead of sharing the thread
context of a single thread. The
ability to switch between threads in
user mode makes UMS more efficient
than thread pools for managing large
numbers of short-duration work items
that require few system calls.
More information about threads, fibers and UMS is available by watching Dave Probert: Inside Windows 7 - User Mode Scheduler (UMS).

Threads were originally created as lightweight processes. In a similar fashion, fibers are a lightweight thread, relying (simplistically) on the fibers themselves to schedule each other, by yielding control.
I guess the next step will be strands where you have to send them a signal every time you want them to execute an instruction (not unlike my 5yo son :-). In the old days (and even now on some embedded platforms), all threads were fibers, there was no pre-emption and you had to write your threads to behave nicely.

Threads are scheduled by the OS (pre-emptive). A thread may be stopped or resumed at any time by the OS, but fibers more or less manage themselves (co-operative) and yield to each other. That is, the programmer controls when fibers do their processing and when that processing switches to another fiber.

Threads generally rely on the kernel to interrupt the thread so it or another thread can run (which is better known as Pre-emptive multitasking) whereas fibers use co-operative multitasking where it is the fiber itself that give up the its running time so that other fibres can run.
Some useful links explaining it better than I probably did are:
http://en.wikipedia.org/wiki/Fiber_(computer_science)
http://en.wikipedia.org/wiki/Computer_multitasking#Cooperative_multitasking.2Ftime-sharing
http://en.wikipedia.org/wiki/Pre-emptive_multitasking

Win32 fiber definition is in fact "Green Thread" definition established at Sun Microsystems. There is no need to waste the term fiber on the thread of some kind, i.e., a thread executing in user space under user code/thread-library control.
To clarify the argument look at the following comments:
With hyper-threading, a multi-core CPU can accept multiple threads and distribute them one on each core.
Superscalar pipelined CPU accepts one thread for execution and uses Instruction Level Parallelism (ILP) to run the thread faster. We may assume that one thread is broken into parallel fibers running in parallel pipelines.
SMT CPU can accept multiple threads and break them into instruction fibers for parallel execution on multiple pipelines, using pipelines more efficiently.
We should assume that processes are made of threads and that threads should be made of fibers. With that logic in mind, using fibers for other sorts of threads is wrong.

Related

Threads giving up CPU control - Seeming contradiction in textbook

I'm learning about threads and processes in an Operating Systems course, and I've come across an apparent contradiction in my textbook (Modern Operating Systems, 4th Ed. by Tanenbaum and Bos). I'm sure there's a something I'm misinterpreting here, it'd be great if someone could clear things up.
On page 106:
Another common thread call is thread_yield, which allows a thread to voluntarily give up the CPU to let another thread run. Such a call is important because there is no clock interrupt to actually enforce multiprogramming as there is with processes
Ok fine - so how I interpret that is that threads will never give up control unless they willingly cede it. Makes sense.
Then on page 116, in an example of threads mishandling shared information:
As an example, consider the errno variable maintained by UNIX. When a process (or a thread) makes a system call that fails, the error code is put into errno. In Fig. 2-19, thread 1 executes the system call access to find out if it has permission to access a certain file. The operating system returns the answer in the global variable errno. After control has returned to thread 1, but before it has a chance to read errno, the scheduler decides that thread 1 has had enough CPU time for the moment and decides to switch to thread 2.
But didn't thread 1 just get pulled from the CPU involuntarily? I thought there was no way to enforce thread switching as there is with process switching?

This makes sense if we're going about process-level threads instead of OS-level threads. The CPU can interrupt a process (regardless of what thread is running), but because the OS is not aware of process-level threads, it cannot interrupt them. If one thread inside the process wants to allow another thread to run, it has to specifically yield to the other thread.
However, most languages these days use OS-level threads, which the OS does know about and can pre-empt.

The confusion is that there are two different ways threads are implemented. In ye olde days there was no thread support at all. The DoD's mandate of the Ada programming language (in which tasks—aka threads—were is an integral part) forced the adoption of threads.
Run time libraries were created (largely to support Ada). That worked within a process. The process maintained a timer that would interrupt a threads and the library would switch among threads much like the operating system switches processes.
Note that this system only allows one thread of a process at a time to execute, even on a multiprocessor system.
Your first example is describing such a library but it is describing a very primitive thread library where thread scheduling is based upon cooperation among the various threads of the process.
Later, operating system started to develop support for threads. Rather than scheduling a process, the operating system schedules threads for execution. A process is then an address space with a collection of threads. Your second example is talking about this kind of thread.

Benefits of user-level threads

I was looking at the differences between user-level threads and kernel-level threads, which I basically understood.
What's not clear to me is the point of implementing user-level threads at all.
If the kernel is unaware of the existence of multiple threads within a single process, then which benefits could I experience?
I have read a couple of articles that stated user-level implementation of threads is advisable only if such threads do not perform blocking operations (which would cause the entire process to block).
This being said, what's the difference between a sequential execution of all the threads and a "parallel" execution of them, considering they cannot take advantage of multiple processors and independent scheduling?
An answer to a previously asked question (similar to mine) was something like:
No modern operating system actually maps n user-level threads to 1
kernel-level thread.
But for some reason, many people on the Internet state that user-level threads can never take advantage of multiple processors.
Could you help me understand this, please?

I strongly recommend Modern Operating Systems 4th Edition by Andrew S. Tanenbaum (starring in shows such as the debate about Linux; also participating: Linus Torvalds). Costs a whole lot of bucks but it's definitely worth it if you really want to know stuff. For eager students and desperate enthusiasts it's great.
Your questions answered
[...] what's not clear to me is the point of implementing User-level threads
at all.
Read my post. It is comprehensive, I daresay.
If the kernel is unaware of the existence of multiple threads within a
single process, then which benefits could I experience?
Read the section "Disadvantages" below.
I have read a couple of articles that stated that user-level
implementation of threads is advisable only if such threads do not
perform blocking operations (which would cause the entire process to
block).
Read the subsection "No coordination with system calls" in "Disadvantages."
All citations are from the book I recommended in the top of this answer, Chapter 2.2.4, "Implementing Threads in User Space."
Advantages
Enables threads on systems without threads
The first advantage is that user-level threads are a way to work with threads on a system without threads.
The first, and most obvious, advantage is that
a user-level threads package can be implemented on an operating system that does not support threads. All operating systems used to
fall into this category, and even now some still do.
No kernel interaction required
A further benefit is the light overhead when switching threads, as opposed to switching to the kernel mode, doing stuff, switching back, etc. The lighter thread switching is described like this in the book:
When a thread does something that may cause it to become blocked
locally, for example, waiting for another thread in its process to
complete some work, it calls a run-time system procedure. This
procedure checks to see if the thread must be put into blocked state.
If, so it stores the thread’s registers (i.e., its own) [...] and
reloads the machine registers with the new thread’s saved values. As soon as the stack
pointer and program counter have been switched, the new thread comes
to life again automatically. If the machine happens to have an
instruction to store all the registers and another one to load them
all, the entire thread switch can be done in just a handful of in-
structions. Doing thread switching like this is at least an order of
magnitude—maybe more—faster than trapping to the kernel and is a
strong argument in favor of user-level threads packages.
This efficiency is also nice because it spares us from incredibly heavy context switches and all that stuff.
Individually adjusted scheduling algorithms
Also, hence there is no central scheduling algorithm, every process can have its own scheduling algorithm and is way more flexible in its variety of choices. In addition, the "private" scheduling algorithm is way more flexible concerning the information it gets from the threads. The number of information can be adjusted manually and per-process, so it's very finely-grained. This is because, again, there is no central scheduling algorithm needing to fit the needs of every process; it has to be very general and all and must deliver adequate performance in every case. User-level threads allow an extremely specialized scheduling algorithm.
This is only restricted by the disadvantage "No automatic switching to the scheduler."
They [user-level threads] allow each process to have its own
customized scheduling algorithm. For some applications, for example,
those with a garbage-collector thread, not having to worry about a
thread being stopped at an inconvenient moment is a plus. They also
scale better, since kernel threads invariably require some table space
and stack space in the kernel, which can be a problem if there are a
very large number of threads.
Disadvantages
No coordination with system calls
The user-level scheduling algorithm has no idea if some thread has called a blocking read system call. OTOH, a kernel-level scheduling algorithm would've known because it can be notified by the system call; both belong to the kernel code base.
Suppose that a thread reads from the keyboard before any keys have
been hit. Letting the thread actually make the system call is
unacceptable, since this will stop all the threads. One of the main
goals of having threads in the first place was to allow each one to
use blocking calls, but to prevent one blocked thread from affecting
the others. With blocking system calls, it is hard to see how this
goal can be achieved readily.
He goes on that system calls could be made non-blocking but that would be very inconvenient and compatibility to existing OSes would be drastically hurt.
Mr Tanenbaum also says that the library wrappers around the system calls (as found in glibc, for example) could be modified to predict when a system cal blocks using select but he utters that this is inelegant.
Building upon that, he says that threads do block often. Often blocking requires many system calls. And many system calls are bad. And without blocking, threads become less useful:
For applications that are essentially entirely CPU bound and rarely
block, what is the point of having threads at all? No one would
seriously propose computing the first n prime numbers or playing chess
using threads because there is nothing to be gained by doing it that
way.
Page faults block per-process if unaware of threads
The OS has no notion of threads. Therefore, if a page fault occurs, the whole process will be blocked, effectively blocking all user-level threads.
Somewhat analogous to the problem of blocking system calls is the
problem of page faults. [...] If the program calls or jumps to an
instruction that is not in memory, a page fault occurs and the
operating system will go and get the missing instruction (and its
neighbors) from disk. [...] The process is blocked while the necessary
instruction is being located and read in. If a thread causes a page
fault, the kernel, unaware of even the existence of threads, naturally
blocks the entire process until the disk I/O is complete, even though
other threads might be runnable.
I think this can be generalized to all interrupts.
No automatic switching to the scheduler
Since there is no per-process clock interrupt, a thread acquires the CPU forever unless some OS-dependent mechanism (such as a context switch) occurs or it voluntarily releases the CPU.
This prevents usual scheduling algorithms from working, including the Round-Robin algorithm.
[...] if a thread starts running, no other thread in that process
will ever run unless the first thread voluntarily gives up the CPU.
Within a single process, there are no clock interrupts, making it
impossible to schedule processes round-robin fashion (taking turns).
Unless a thread enters the run-time system of its own free will, the scheduler will never get a chance.
He says that a possible solution would be
[...] to have the run-time system request a clock signal (interrupt) once a
second to give it control, but this, too, is crude and messy to
program.
I would even go on further and say that such a "request" would require some system call to happen, whose drawback is already explained in "No coordination with system calls." If no system call then the program would need free access to the timer, which is a security hole and unacceptable in modern OSes.

What's not clear to me is the point of implementing user-level threads at all.
User-level threads largely came into the mainstream due to Ada and its requirement for threads (tasks in Ada terminology). At the time, there were few multiprocessor systems and most multiprocessors were of the master/slave variety. Kernel threads simply did not exist. User threads had to be created to implement languages like Ada.
If the kernel is unaware of the existence of multiple threads within a single process, then which benefits could I experience?
If you have kernel threads, threads multiple threads within a single process can run simultaneously. In user threads, the threads always execute interleaved.
Using threads can simplify some types of programming.
I have read a couple of articles that stated user-level implementation of threads is advisable only if such threads do not perform blocking operations (which would cause the entire process to block).
That is true on Unix and maybe not all unix implementations. User threads on many operating systems function perfectly fine with blocking I/O.
This being said, what's the difference between a sequential execution of all the threads and a "parallel" execution of them, considering they cannot take advantage of multiple processors and independent scheduling?
In user threads. there is never parallel execution. In kernel threads, the can be parallel execution IF there are multiple processors. On a single processor system, there is not much advantage to using kernel threads over single threads (contra: note the blocking I/O issue on Unix and user threads).
But for some reason, many people on the Internet state that user-level threads can never take advantage of multiple processors.
In user threads, the process manages its own "threads" by interleaving execution within itself. The process can only have a thread run in the processor that the process is running in.
If the operating system provides system services to schedule code to run on a different processor, user threads could run on multiple processors.
I conclude by saying that for practicable purposes there are no advantages to user threads over kernel threads. There are those that will assert that there are performance advantages, but for there to be such an advantage it would be system dependent.

Difference between user-level and kernel-supported threads?

I've been looking through a few notes based on this topic, and although I have an understanding of threads in general, I'm not really to sure about the differences between user-level and kernel-level threads.
I know that processes are basically made up of multiple threads or a single thread, but are these thread of the two prior mentioned types?
From what I understand, kernel-supported threads have access to the kernel for system calls and other uses not available to user-level threads.
So, are user-level threads simply threads created by the programmer when then utilise kernel-supported threads to perform operations that couldn't be normally performed due to its state?

Edit: The question was a little confusing, so I'm answering it two different ways.
OS-level threads vs Green Threads
For clarity, I usually say "OS-level threads" or "native threads" instead of "Kernel-level threads" (which I confused with "kernel threads" in my original answer below.) OS-level threads are created and managed by the OS. Most languages have support for them. (C, recent Java, etc) They are extremely hard to use because you are 100% responsible for preventing problems. In some languages, even the native data structures (such as Hashes or Dictionaries) will break without extra locking code.
The opposite of an OS-thread is a green thread that is managed by your language. These threads are given various names depending on the language (coroutines in C, goroutines in Go, fibers in Ruby, etc). These threads only exist inside your language and not in your OS. Because the language chooses context switches (i.e. at the end of a statement), it prevents TONS of subtle race conditions (such as seeing a partially-copied structure, or needing to lock most data structures). The programmer sees "blocking" calls (i.e. data = file.read() ), but the language translates it into async calls to the OS. The language then allows other green threads to run while waiting for the result.
Green threads are much simpler for the programmer, but their performance varies: If you have a LOT of threads, green threads can be better for both CPU and RAM. On the other hand, most green thread languages can't take advantage of multiple cores. (You can't even buy a single-core computer or phone anymore!). And a bad library can halt the entire language by doing a blocking OS call.
The best of both worlds is to have one OS thread per CPU, and many green threads that are magically moved around onto OS threads. Languages like Go and Erlang can do this.
system calls and other uses not available to user-level threads
This is only half true. Yes, you can easily cause problems if you call the OS yourself (i.e. do something that's blocking.) But the language usually has replacements, so you don't even notice. These replacements do call the kernel, just slightly differently than you think.
Kernel threads vs User Threads
Edit: This is my original answer, but it is about User space threads vs Kernel-only threads, which (in hindsight) probably wasn't the question.
User threads and Kernel threads are exactly the same. (You can see by looking in /proc/ and see that the kernel threads are there too.)
A User thread is one that executes user-space code. But it can call into kernel space at any time. It's still considered a "User" thread, even though it's executing kernel code at elevated security levels.
A Kernel thread is one that only runs kernel code and isn't associated with a user-space process. These are like "UNIX daemons", except they are kernel-only daemons. So you could say that the kernel is a multi-threaded program. For example, there is a kernel thread for swap. This forces all swap issues to get "serialized" into a single stream.
If a user thread needs something, it will call into the kernel, which marks that thread as sleeping. Later, the swap thread finds the data, so it marks the user thread as runnable. Later still, the "user thread" returns from the kernel back to userland as if nothing happened.
In fact, all threads start off in kernel space, because the clone() operation happens in kernel space. (And there's lots of kernel accounting to do before you can 'return' to a new process in user space.)

Before we go into comparison, let us first understand what a thread is. Threads are lightweight processes within the domain of independent processes. They are required because processes are heavy, consume a lot of resources and more importantly,
two separate processes cannot share a memory space.
Let's say you open a text editor. It's an independent process executing in the memory with a separate addressable location. You'll need many resources within this process, such as insert graphics, spell-checks etc. It's not feasible to create separate processes for each of these functionalities and maintain them independently in memory. To avoid this,
multiple threads can be created within a single process, which can
share a common memory space, existing independently within a process.
Now, coming back to your questions, one at a time.
I'm not really to sure about the differences between user-level and kernel-level threads.
Threads are broadly classified as user level threads and kernel level threads based on their domain of execution. There are also cases when one or many user thread maps to one or many kernel threads.
- User Level Threads
User level threads are mostly at the application level where an application creates these threads to sustain its execution in the main memory. Unless required, these thread work in isolation with kernel threads.
These are easier to create since they do not have to refer many registers and context switching is much faster than a kernel level thread.
User level thread, mostly can cause changes at the application level and the kernel level thread continues to execute at its own pace.
- Kernel Level Threads
These threads are mostly independent of the ongoing processes and are executed by the operating system.
These threads are required by the Operating System for tasks like memory management, process management etc.
Since these threads maintain, execute and report the processes required by the operating system; kernel level threads are more expensive to create and manage and context switching of these threads are slow.
Most of the kernel level threads can not be preempted by the user level threads.
MS DOS written for Intel 8088 didn't have dual mode of operation. Thus, a user level process had the ability to corrupt the entire operating system.
- User Level Threads mapped over Kernel Threads
This is perhaps the most interesting part. Many user level threads map over to kernel level thread, which in-turn communicate with the kernel.
Some of the prominent mappings are:
One to One
When one user level thread maps to only one kernel thread.
advantages: each user thread maps to one kernel thread. Even if one of the user thread issues a blocking system call, the other processes remain unaffected.
disadvantages: every user thread requires one kernel thread to interact and kernel threads are expensive to create and manage.
Many to One
When many user threads map to one kernel thread.
advantages: multiple kernel threads are not required since similar user threads can be mapped to one kernel thread.
disadvantage: even if one of the user thread issues a blocking system call, all the other user threads mapped to that kernel thread are blocked.
Also, a good level of concurrency cannot be achieved since the kernel will process only one kernel thread at a time.
Many to Many
When many user threads map to equal or lesser number of kernel threads. The programmer decides how many user threads will map to how many kernel threads. Some of the user threads might map to just one kernel thread.
advantages: a great level of concurrency is achieved. Programmer can decide some potentially dangerous threads which might issue a blocking system call and place them with the one-to-one mapping.
disadvantage: the number of kernel threads, if not decided cautiously can slow down the system.
The other part of your question:
kernel-supported threads have access to the kernel for system calls
and other uses not available to user-level threads.
So, are user-level threads simply threads created by the programmer
when then utilise kernel-supported threads to perform operations that
couldn't be normally performed due to its state?
Partially correct. Almost all the kernel thread have access to system calls and other critical interrupts since kernel threads are responsible for executing the processes of the OS. User thread will not have access to some of these critical features. e.g. a text editor can never shoot a thread which has the ability to change the physical address of the process. But if needed, a user thread can map to kernel thread and issue some of the system calls which it couldn't do as an independent entity. The kernel thread would then map this system call to the kernel and would execute actions, if deemed fit.

Quote from here :
Kernel-Level Threads
To make concurrency cheaper, the execution aspect of process is separated out into threads. As such, the OS now manages threads and processes. All thread operations are implemented in the kernel and the OS schedules all threads in the system. OS managed threads are called kernel-level threads or light weight processes.
NT: Threads
Solaris: Lightweight processes(LWP).
In this method, the kernel knows about and manages the threads. No runtime system is needed in this case. Instead of thread table in each process, the kernel has a thread table that keeps track of all threads in the system. In addition, the kernel also maintains the traditional process table to keep track of processes. Operating Systems kernel provides system call to create and manage threads.
Advantages:
Because kernel has full knowledge of all threads, Scheduler may decide to give more time to a process having large number of threads than process having small number of threads.
Kernel-level threads are especially good for applications that frequently block.
Disadvantages:
The kernel-level threads are slow and inefficient. For instance, threads operations are hundreds of times slower than that of user-level threads.
Since kernel must manage and schedule threads as well as processes. It require a full thread control block (TCB) for each thread to maintain information about threads. As a result there is significant overhead and increased in kernel complexity.
User-Level Threads
Kernel-Level threads make concurrency much cheaper than process because, much less state to allocate and initialize. However, for fine-grained concurrency, kernel-level threads still suffer from too much overhead. Thread operations still require system calls. Ideally, we require thread operations to be as fast as a procedure call. Kernel-Level threads have to be general to support the needs of all programmers, languages, runtimes, etc. For such fine grained concurrency we need still "cheaper" threads.
To make threads cheap and fast, they need to be implemented at user level. User-Level threads are managed entirely by the run-time system (user-level library).The kernel knows nothing about user-level threads and manages them as if they were single-threaded processes.User-Level threads are small and fast, each thread is represented by a PC,register,stack, and small thread control block. Creating a new thread, switiching between threads, and synchronizing threads are done via procedure call. i.e no kernel involvement. User-Level threads are hundred times faster than Kernel-Level threads.
Advantages:
The most obvious advantage of this technique is that a user-level threads package can be implemented on an Operating System that does not support threads.
User-level threads does not require modification to operating systems.
Simple Representation: Each thread is represented simply by a PC, registers, stack and a small control block, all stored in the user process address space.
Simple Management: This simply means that creating a thread, switching between threads and synchronization between threads can all be done without intervention of the kernel.
Fast and Efficient: Thread switching is not much more expensive than a procedure call.
Disadvantages:
User-Level threads are not a perfect solution as with everything else, they are a trade off. Since, User-Level threads are invisible to the OS they are not well integrated with the OS. As a result, Os can make poor decisions like scheduling a process with idle threads, blocking a process whose thread initiated an I/O even though the process has other threads that can run and unscheduling a process with a thread holding a lock. Solving this requires communication between between kernel and user-level thread manager.
There is a lack of coordination between threads and operating system kernel. Therefore, process as whole gets one time slice irrespect of whether process has one thread or 1000 threads within. It is up to each thread to relinquish control to other threads.
User-level threads requires non-blocking systems call i.e., a multithreaded kernel. Otherwise, entire process will blocked in the kernel, even if there are runable threads left in the processes. For example, if one thread causes a page fault, the process blocks.

User Threads
The library provides support for thread creation, scheduling and management with no support from the kernel.
The kernel unaware of user-level threads creation and scheduling are done in user space without kernel intervention.
User-level threads are generally fast to create and manage they have drawbacks however.
If the kernel is single-threaded, then any user-level thread performing a blocking system call will cause the entire process to block, even if other threads are available to run within the application.
User-thread libraries include POSIX Pthreads, Mach C-threads,
and Solaris 2 UI-threads.
Kernel threads
The kernel performs thread creation, scheduling, and management in kernel space.
kernel threads are generally slower to create and manage than are user threads.
the kernel is managing the threads, if a thread performs a blocking system call.
A multiprocessor environment, the kernel can schedule threads on different processors.
5.including Windows NT, Windows 2000, Solaris 2, BeOS, and Tru64 UNIX (formerlyDigital UN1X)-support kernel threads.

Some development environments or languages will add there own threads like feature, that is written to take advantage of some knowledge of the environment, for example a GUI environment could implement some thread functionality which switch between user threads on each event loop.
A game library could have some thread like behaviour for characters. Sometimes the user thread like behaviour can be implemented in a different way, for example I work with cocoa a lot, and it has a timer mechanism which executes your code every x number of seconds, use fraction of a seconds and it like a thread. Ruby has a yield feature which is like cooperative threads. The advantage of user threads is they can switch at more predictable times. With kernel thread every time a thread starts up again, it needs to load any data it was working on, this can take time, with user threads you can switch when you have finished working on some data, so it doesn't need to be reloaded.
I haven't come across user threads that look the same as kernel threads, only thread like mechanisms like the timer, though I have read about them in older text books so I wonder if they were something that was more popular in the past but with the rise of true multithreaded OS's (modern Windows and Mac OS X) and more powerful hardware I wonder if they have gone out of favour.

How do user level threads (ULTs) and kernel level threads (KLTs) differ with regards to concurrent execution?

Here's what I understand; please correct/add to it:
In pure ULTs, the multithreaded process itself does the thread scheduling. So, the kernel essentially does not notice the difference and considers it a single-thread process. If one thread makes a blocking system call, the entire process is blocked. Even on a multicore processor, only one thread of the process would running at a time, unless the process is blocked. I'm not sure how ULTs are much help though.
In pure KLTs, even if a thread is blocked, the kernel schedules another (ready) thread of the same process. (In case of pure KLTs, I'm assuming the kernel creates all the threads of the process.)
Also, using a combination of ULTs and KLTs, how are ULTs mapped into KLTs?

Your analysis is correct. The OS kernel has no knowledge of user-level threads. From its perspective, a process is an opaque black box that occasionally makes system calls. Consequently, if that program has 100,000 user-level threads but only one kernel thread, then the process can only one run user-level thread at a time because there is only one kernel-level thread associated with it. On the other hand, if a process has multiple kernel-level threads, then it can execute multiple commands in parallel if there is a multicore machine.
A common compromise between these is to have a program request some fixed number of kernel-level threads, then have its own thread scheduler divvy up the user-level threads onto these kernel-level threads as appropriate. That way, multiple ULTs can execute in parallel, and the program can have fine-grained control over how threads execute.
As for how this mapping works - there are a bunch of different schemes. You could imagine that the user program uses any one of multiple different scheduling systems. In fact, if you do this substitution:
Kernel thread <---> Processor core
User thread <---> Kernel thread
Then any scheme the OS could use to map kernel threads onto cores could also be used to map user-level threads onto kernel-level threads.
Hope this helps!

Before anything else, templatetypedef's answer is beautiful; I simply wanted to extend his response a little.
There is one area which I felt the need for expanding a little: combinations of ULT's and KLT's. To understand the importance (what Wikipedia labels hybrid threading), consider the following examples:
Consider a multi-threaded program (multiple KLT's) where there are more KLT's than available logical cores. In order to efficiently use every core, as you mentioned, you want the scheduler to switch out KLT's that are blocking with ones that in a ready state and not blocking. This ensures the core is reducing its amount of idle time. Unfortunately, switching KLT's is expensive for the scheduler and it consumes a relatively large amount of CPU time.
This is one area where hybrid threading can be helpful. Consider a multi-threaded program with multiple KLT's and ULT's. Just as templatetypedef noted, only one ULT can be running at one time for each KLT. If a ULT is blocking, we still want to switch it out for one which is not blocking. Fortunately, ULT's are much more lightweight than KLT's, in the sense that there less resources assigned to a ULT and they require no interaction with the kernel scheduler. Essentially, it is almost always quicker to switch out ULT's than it is to switch out KLT's. As a result, we are able to significantly reduce a cores idle time relative to the first example.
Now, of course, all of this depends on the threading library being used for implementing ULT's. There are two ways (which I can come up with) for "mapping" ULT's to KLT's.
A collection of ULT's for all KLT's
This situation is ideal on a shared memory system. There is essentially a "pool" of ULT's to which each KLT has access. Ideally, the threading library scheduler would assign ULT's to each KLT upon request as opposed to the KLT's accessing the pool individually. The later could cause race conditions or deadlocks if not implemented with locks or something similar.
A collection of ULT's for each KLT (Qthreads)
This situation is ideal on a distributed memory system. Each KLT would have a collection of ULT's to run. The draw back is that the user (or the threading library) would have to divide the ULT's between the KLT's. This could result in load imbalance since it is not guaranteed that all ULT's will have the same amount of work to complete and complete roughly the same amount of time. The solution to this is allowing for ULT migration; that is, migrating ULT's between KLT's.

Preemptive threads Vs Non Preemptive threads

Can someone please explain the difference between preemptive Threading model and Non Preemptive threading model?
As per my understanding:
Non Preemptive threading model: Once a thread is started it cannot be stopped or the control cannot be transferred to other threads until the thread has completed its task.
Preemptive Threading Model: The runtime is allowed to step in and hand control from one thread to another at any time. Higher priority threads are given precedence over Lower priority threads.
Can someone please:
Explain if the understanding is correct.
Explain the advantages and disadvantages of both models.
An example of when to use what will be really helpful.
If i create a thread in Linux (system v or Pthread) without mentioning any options(are there any??) by default the threading model used is preemptive threading model?

No, your understanding isn't entirely correct. Non-preemptive (aka cooperative) threads typically manually yield control to let other threads run before they finish (though it is up to that thread to call yield() (or whatever) to make that happen.
Preempting threading is simpler. Cooperative threads have less overhead.
Normally use preemptive. If you find your design has a lot of thread-switching overhead, cooperative threads would be a possible optimization. In many (most?) situations, this will be a fairly large investment with minimal payoff though.
Yes, by default you'd get preemptive threading, though if you look around for the CThreads package, it supports cooperative threading. Few enough people (now) want cooperative threads that I'm not sure it's been updated within the last decade though...

Non-preemptive threads are also called cooperative threads. An example of these is POE (Perl). Another example is classic Mac OS (before OS X). Cooperative threads have exclusive use of the CPU until they give it up. The scheduler then picks another thread to run.
Preemptive threads can voluntarily give up the CPU just like cooperative ones, but when they don't, it will be taken from them, and the scheduler will start another thread. POSIX & SysV threads fall in this category.
Big advantages of cooperative threads are greater efficiency (on single-core machines, at least) and easier handling of concurrency: it only exists when you yield control, so locking isn't required.
Big advantages of preemptive threads are better fault tolerance: a single thread failing to yield doesn't stop all other threads from executing. Also normally works better on multi-core machines, since multiple threads execute at once. Finally, you don't have to worry about making sure you're constantly yielding. That can be really annoying inside, e.g., a heavy number crunching loop.
You can mix them, of course. A single preemptive thread can have many cooperative threads running inside it.

If you use non-preemptive it does not mean that process doesn't perform context switching when the process is waiting for I/O. The dispatcher will choose another process according to the scheduling model. We have to trust the process.
non-preemptive:
less context switching, less overhead that can be sensible in non-preemptive model
Easier to handle since it can be handled using a single-core processor
preemptive:
Advantage:
In this model, we have a priority that helps us to have more control over the running process
Better concurrency is a bonus
Handling system calls without blocking the entire system
Disadvantage:
Requires more complex algorithms for synchronization and critical section handling is inevitable.
The overhead that comes with it

In cooperative (non-preemptive) models, once a thread is given control it continues to run until it explicitly yields control or it blocks.
In a preemptive model, the virtual machine is allowed to step in and hand control from one thread to another at any time. Both models have their advantages and disadvantages.
Java threads are generally preemptive between priorities. A higher priority thread takes precedence over a lower priority thread. If a higher priority thread goes to sleep or blocks, then a lower priority thread can run (assuming one is available and ready to run).
However, as soon as the higher priority thread wakes up or unblocks, it will interrupt the lower priority thread and run until it finishes, blocks again, or is preempted by an even higher priority thread.
The Java Language Specification, occasionally allows the VMs to run lower priority threads instead of a runnable higher priority thread, but in practice this is unusual.
However, nothing in the Java Language Specification specifies what is supposed to happen with equal priority threads. On some systems these threads will be time-sliced and the runtime will allot a certain amount of time to a thread. When that time is up, the runtime preempts the running thread and switches to the next thread with the same priority.
On other systems, a running thread will not be preempted in favor of a thread with the same priority. It will continue to run until it blocks, explicitly yields control, or is preempted by a higher priority thread.
As for the advantages both derobert and pooria have highlighted them quite clearly.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string