Mapping User-level threads and Kernel-level threads - multithreading

How are User-level threads mapped to Kernel-level threads?

It varies by implementation. The three most common threading models are:
1-to-1: Each user-level thread has a corresponding entity that is scheduled by the kernel.
n-to-1: Each process is scheduled by the kernel. Thread scheduling takes place entirely in user space.
n-to-m: Each process has a pool of entities that are scheduled by the kernel. These are assigned to run particular user-level threads by a user-space scheduler that is part of the process.
Modern implementations are almost all 1-to-1.
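A minimal sketch of what 1-to-1 looks like from inside a program, assuming CPython 3.8 or later on a platform where threading.get_native_id() is available (Linux, Windows, macOS). The helper name collect_native_ids is made up for this illustration. Every library-level thread reports a different ID assigned by the kernel, which is what you would expect if each one is backed by its own kernel-scheduled entity.

```python
import threading

def collect_native_ids(n_threads=4):
    """Start n_threads threads and return the kernel-assigned ID each one saw."""
    ids = []
    lock = threading.Lock()
    # The barrier keeps every thread alive until all have recorded their ID,
    # so the OS cannot hand a finished thread's ID to a later one.
    barrier = threading.Barrier(n_threads)

    def worker():
        tid = threading.get_native_id()  # ID assigned by the kernel
        with lock:
            ids.append(tid)
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids

native_ids = collect_native_ids()
print(len(set(native_ids)), "distinct kernel IDs for", len(native_ids), "threads")
```

Under an n-to-1 library the kernel would only ever see one such ID for the whole process, no matter how many threads the library created.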

There's a bit of confusion about the terminology used for referring to ULTs and KLTs.
Following are the two different interpretations. Please correct me if I got this wrong:
KLTs are needed to achieve concurrency in the kernel (note the interpretation of the kernel as a process or a live entity). This is true of microkernels like Symbian, where a kernel thread is responsible for every hardware resource of the system (e.g. File Server, Location Server, Calendar Server, etc.). However, in a kernel like Linux, which is mostly a library (and not a process or a living entity on its own), there's really no meaning for kernel threads. In Linux, every thread you create is treated by the kernel as a process, and the kernel always runs either in process context or in interrupt context.
The second interpretation is based on whether threading (or concurrency) is visible to the kernel or not. For instance, using setjmp and longjmp one can achieve concurrency in user space. As already discussed, the kernel is totally unaware of this. This kind of concurrency may be termed ULT, and a thread whose creation the kernel is aware of (one created using the clone() system call) may be called a KLT.
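To make the second interpretation concrete, here is a small sketch of concurrency that lives entirely in user space. It uses Python generators as a stand-in for the setjmp/longjmp trick (that substitution is mine, not the answer's), and the names user_thread and run_round_robin are invented for the example. The kernel sees a single thread; the switching happens at each yield.

```python
from collections import deque

def user_thread(name, steps, log):
    """A user-level thread: each `yield` is a voluntary switch point."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the user-space scheduler

def run_round_robin(threads):
    """Round-robin scheduler that runs entirely in user space."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the thread until its next yield
            ready.append(current)  # still runnable: back of the queue
        except StopIteration:
            pass                   # thread finished, drop it

log = []
run_round_robin([user_thread("A", 2, log), user_thread("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Nothing here involves a system call, which is why the kernel has no idea that two "threads" exist.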

Related

Is user code in the end executed in kernel mode?

From what I learned from Operating System Concepts and from searching online:
all user threads are ultimately mapped to kernel threads in order to be scheduled onto physical CPUs
kernel threads can only be executed in kernel mode
The above two statements lead to the conclusion:
all user code is executed in kernel mode
Is this right?
I have read the whole book and searched through many articles, and the question still stands.
Wikipedia says this about LWPs:
Kernel threads
Kernel threads are handled entirely by the kernel. They need not be associated with a process; a kernel can create them whenever it needs to perform a particular task. Kernel threads cannot execute in user mode. LWPs (in systems where they are a separate layer) bind to kernel threads and provide a user-level context. This includes a link to the shared resources of the process to which the LWP belongs. When a LWP is suspended, it needs to store its user-level registers until it resumes, and the underlying kernel thread must also store its own kernel-level registers.
Also, what does it mean when it talks about user-level registers and kernel-level registers?
After digging and digging, I have reached the following tentative conclusion, but I am not sure of it. I hope the question can be further answered and clarified:
"Kernel thread" has two meanings, depending on the context of the discussion:
when talking about user/kernel threading, a kernel thread means a kernel task that executes entirely in kernel mode and only executes kernel code, like ksoftirqd, which handles the bottom halves of interrupts
when talking about the threading model, namely how user code is mapped onto schedulable entities in the kernel, a kernel thread means a task that is schedulable by the kernel
More on the threading model and lightweight processes in Linux:
In the old days the operating system did not know about threads; it only knew about processes (tasks), and threads were implemented by thread libraries entirely on the user side. There is an inherent problem with this: if one user thread blocks, for example on I/O, all of the user threads are blocked, because there is only one schedulable task in the kernel for this process. From the perspective of the kernel, the whole process is blocked. To solve this problem, the lightweight process (LWP), also called a virtual processor (VP), was invented.
An LWP is an intermediate data structure between a user thread and a kernel thread (in the second meaning above). An LWP binds a user thread to a kernel thread (task), which previously was bound to a whole user process. Simply put: previously a user process occupied one kernel thread (task); now, with LWPs, a user thread can occupy an individual kernel thread (task) without sharing it with other user threads. (I think) this is why it is called a lightweight process. The advantage of this model is obvious: if one of the user threads is blocked, the other user threads can continue to be executed by other kernel threads (tasks).
A kernel thread (task) actually knows nothing about the user process. It is just a task, a schedulable entity created, managed, and destroyed entirely by the kernel itself. But an LWP belongs to a specific process and knows about the other LWPs that belong to the same one. An LWP is like a bridge between a user process and a kernel thread (task).
When a kernel thread (task) that is bound to an LWP is scheduled by the kernel, the user-level registers (pointed to by the LWP) are loaded into the CPU; the kernel thread (task) also has registers, and they are loaded into the CPU as well. From the standpoint of the CPU, an LWP is a kernel thread (task). It does not care whether it executes kernel code or user code.
User/kernel mode and user/kernel thread are independent notions. In Linux, a user thread created with pthreads is essentially a kernel thread, and this thread can execute in either user mode or kernel mode, depending on whether it is executing user code or kernel code.
All user threads are finally mapped to kernel threads.
That is not a useful way to think about threads. In most operating systems, a program can ask the OS to create a new thread and the program can provide a pointer to a function for the new thread to call. There's no "mapping" that happens there.* The new thread runs in exactly the same way as the program's original (a.k.a., "main") thread. It runs application code in user mode except, occasionally, when it makes a system call, and then for the duration of the system call it runs kernel code in kernel mode.
Many programming languages come with an OS-independent library that provides some kind of a Thread object. The thread object is not the same thing as the actual thread. It's more of a handle that the application uses to control the OS thread. If you like, you can say that those thread objects are "mapped" to OS threads, but that's still somewhat abusing the notion of what a "mapping" is.
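Python's threading module is one example of such a library, and a short sketch shows the handle/thread distinction. This assumes CPython 3.8 or later, where Thread.native_id is None until the thread has been started and is the OS-assigned thread ID afterwards. The variable names are just for illustration.

```python
import threading

def noop():
    pass

handle = threading.Thread(target=noop)  # only a Python object so far
id_before_start = handle.native_id      # None: no OS thread exists yet
handle.start()                          # now the OS is asked for a thread
id_after_start = handle.native_id       # integer ID assigned by the kernel
handle.join()
print(id_before_start, id_after_start)
```

The object exists before the OS thread does and outlives it after join(), which is why it is better thought of as a handle than as the thread itself.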
kernel threads can only be executed in kernel mode
If you aren't writing OS code, it's best to avoid saying "kernel thread" altogether. In the Linux OS in particular, "kernel thread" means something, and it has nothing whatever to do with application code. Linux kernel threads are threads that are created by the OS for the OS, and they never run "user" (i.e., application) code.
It's possible for an application program to create and schedule its own threads, completely unknown to the OS. Some people call those "user threads." Some used to call them "green threads." Back in the old days, before any OS had thread support, we just called them "threads." Doing threads that way is a lot of work, for little reward. (Can't schedule them preemptively.) Outside of the realm of tiny, embedded, real-time systems, almost nobody bothers to do it anymore.
* But wait! Things will get more complicated in the near future when Java's Project Loom hits the mainstream. Threads traditionally are expensive. In particular, each thread must have its own contiguous call stack (usually a chunk of at least a few megabytes) allocated to it. The goal of Project Loom is to make threads as cheap as any other object.
The way they intend to make threads "cheap" is to "virtualize" them, and to break up their call stacks into linked lists of reclaimable heap objects. Under Project Loom, a limited number of real OS threads that are scheduled by the OS scheduler will, in turn, schedule and execute the code of a multitude of "virtual" application threads, and so there really will be something going on that feels a bit like "mapping."
I won't be at all surprised if the same idea spreads to other languages.
There are two different meanings of kernel threads. When threading people talk about "kernel threads" they mean "threads the kernel knows about" i.e. "threads that are controlled by the kernel". When kernel people talk about "kernel threads" they mean "threads that run in kernel mode".
"Threads the kernel knows about" are contrasted to "user threads" which are hidden from the kernel and controlled by the program itself.
No, not all threads controlled by the kernel run in kernel mode. The kernel controls the scheduling of threads that run in kernel mode, and also threads that run in user mode.
The quote about LWPs is talking about systems where the scheduler thinks that all threads are kernel-mode threads. To run a user-mode thread (which they call an LWP because it's not really a thread because all threads are kernel-mode threads) the thread has to call a function like RunLWP(pointer_to_lwp);.
I don't know which system is like this. Linux is not like this; Windows is not like this. This is a weird, overly complicated design which is why it's not normally used.
The "registers" are where the CPU remembers what it is currently doing. The most important one is the "instruction pointer" register (some CPUs call it something different) which remembers which instruction is next. If you remember all of the register values, and then come back later and set them to the same values, the CPU will carry on like nothing happened. That's why threading works - the thread can't tell that it's been interrupted, because all of the registers have the same values as if it wasn't interrupted. Here's a list of registers on x86-class CPUs. You don't need to know them for this question - it just might be interesting.
When an interrupt happens, depending on the CPU type, the CPU will save the instruction pointer and maybe one or two other registers. The interrupt handler has to save the rest (or be careful not to change them). Here about halfway down you can see how an x86-class CPU switches from user-space to an interrupt handler when an interrupt occurs.
So this RunLWP function would save the current registers (from the kernel) and set them according to the last time the LWP stopped running. Then the LWP runs. Then when some interrupt happens, the interrupt handler would save the current registers (from user-space) and set them according to the saved kernel registers, so the kernel code after RunLWP runs. Probably. Again, I don't know any actual system like this, but it seems like the logical way to do things. The reason it should return back to the kernel code instead of the user code is so that the kernel code can decide whether it wants to keep running the LWP or not.
I don't know why they would say the interrupt handler would save both the kernel-space and user-space registers. Current CPUs generally only have one set of registers which software has to swap out when it wants to make the CPU change what it is doing. RunLWP would have to save the kernel registers and load the user ones, then the interrupt handler would have to save the user registers and load the interrupt handler ones. It could be that the CPUs which these systems were designed for did have two sets of registers.
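The save-and-restore idea can be shown with a toy model. This is not how any real kernel is written; ToyCPU, its two registers ("ip" and "acc"), and run_interleaved are all invented for the illustration, and a "program" is simply a list of numbers to add to the accumulator. The point is only that swapping the register set in and out is enough to interleave two instruction streams without either noticing.

```python
class ToyCPU:
    """A made-up CPU with two registers: instruction pointer and accumulator."""
    def __init__(self):
        self.regs = {"ip": 0, "acc": 0}

    def step(self, program):
        """Execute one instruction; return False once the program has ended."""
        if self.regs["ip"] >= len(program):
            return False
        self.regs["acc"] += program[self.regs["ip"]]
        self.regs["ip"] += 1
        return True

def run_interleaved(programs):
    cpu = ToyCPU()  # one physical register set shared by everyone
    saved = [{"ip": 0, "acc": 0} for _ in programs]  # one saved context per thread
    finished = [False] * len(programs)
    while not all(finished):
        for i, program in enumerate(programs):
            if finished[i]:
                continue
            cpu.regs = dict(saved[i])   # restore this thread's registers
            if not cpu.step(program):   # run a single instruction
                finished[i] = True
            saved[i] = dict(cpu.regs)   # save them again before switching
    return [ctx["acc"] for ctx in saved]

totals = run_interleaved([[1, 2, 3], [10, 20]])
print(totals)  # [6, 30]
```

Each program ends up with the same total it would have produced running alone, even though the single register set was taken away from it after every instruction.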

one-to-one multi-threading model

In Silberschatz's "Operating System Concepts" book, section 4.3.2 says that
one-to-one model provides more concurrency than the many-to-one model
by allowing another thread to run when a thread makes a blocking
system call. It also allows multiple threads to run in parallel on
multiprocessors.
I have two questions here:
How can one thread be blocked while another is mapped onto a kernel thread?
Don't we know that if one thread is blocked, the entire process of that
user-level thread is blocked?
The OS considers user-level threads
as one thread only. It can't be assigned to multiple
processors/cores. Isn't the line given below contradicting that
idea?
It also allows multiple threads to run in parallel on
multiprocessors
Your understanding of user-level threads and kernel-level threads is not correct; in particular, you need to understand how user-level threads are mapped to kernel-level threads. So first let's define some terms.
Kernel thread
A thread (schedulable task) that is created and managed by the kernel. Every kernel level thread is represented by some data structure which contains information related to the thread. In the case of Linux it is task_struct. Kernel threads are the only threads that are considered by the CPU scheduler for scheduling.
Note: "kernel thread" is a bit of a misnomer, as the Linux kernel doesn't distinguish between a thread and a process; "schedulable task" would better describe this entity.
User thread
A thread that is created and managed above the kernel level by some library or runtime such as the JVM. The library that creates these threads is responsible for managing them, that is, for deciding which thread runs and when.
User level to kernel level mapping
Now you can create as many user-level threads as you want, but to execute them you need to create some kernel-level thread (task_struct). These kernel-level threads can be created in several ways.
One to one model
In this case, whenever you create a user-level thread, your library asks the kernel to create a new kernel-level thread. In the case of Linux, your library will use the clone system call to create a kernel-level thread.
Many to one model
In this case your library creates only one kernel-level thread (task_struct). No matter how many user-level threads you create, they all share the same kernel-level thread, much like processes running on a single-core CPU. The point to understand is that your library here acts much like the CPU scheduler: it schedules many user-level threads on a single kernel-level thread.
Now to your question
The OS considers user-level threads as one thread only. It can’t be
assigned to multiple processors/cores. Isn't the below given line
contradicting that idea?
If you were using the many-to-one model, you would have only one kernel-level thread for all of your user-level threads, and hence they could not run on different CPUs.
But if you are using the one-to-one model, then each of your user-level threads has a corresponding kernel-level thread that can be scheduled individually, and hence user-level threads can run on different CPUs, given that you have more than one CPU.
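The blocking half of the question can be demonstrated with a rough timing sketch. Assumptions: time.sleep() stands in for a blocking system call, running the tasks back to back on the calling thread stands in for a many-to-one library (where one blocked task stalls the only kernel-level thread), and Python's threading.Thread stands in for a one-to-one library. The function names are made up for the example.

```python
import threading
import time

BLOCK_SECONDS = 0.2
N_TASKS = 3

def blocking_call():
    time.sleep(BLOCK_SECONDS)  # stands in for a blocking system call

def many_to_one():
    """All tasks share the calling thread, so each block stalls everyone."""
    start = time.monotonic()
    for _ in range(N_TASKS):
        blocking_call()
    return time.monotonic() - start

def one_to_one():
    """Each task gets its own kernel-scheduled thread, so the waits overlap."""
    start = time.monotonic()
    threads = [threading.Thread(target=blocking_call) for _ in range(N_TASKS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start

shared_elapsed = many_to_one()
dedicated_elapsed = one_to_one()
print(f"many-to-one: {shared_elapsed:.2f}s, one-to-one: {dedicated_elapsed:.2f}s")
```

With one kernel-level thread the total is roughly the sum of the waits; with one kernel-level thread per task it is roughly the longest single wait, because the kernel can run another thread while one is blocked.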
You are suffering from a confusing book.
There are real threads (aka kernel threads, 1 to 1 model) and there are simulated threads (aka user threads, many to 1 model).
Some books make this more confusing by throwing a hypothetical many to many model.
User threads are obsolete. Any operating system book worth reading these days would treat them that way and describe them in historical terms.
How can one thread be blocked and other mapped on kernel thread? Don't we know that if one thread is blocked, entire process of that user-level thread is blocked?
You either have user threads or kernel threads. An application that did both would be royally screwed up.
The OS considers user-level threads as one thread only. It can't be assigned to multiple processors/cores. Isn't the below given line contradicting that idea?
In ye olde days a process was considered to be an execution stream and an address space. There were no threads. When threads became necessary (largely due to the need for Ada support), they were simulated using timers. The behavior of threads varied by operating system.
In Eunuchs variants, blocking calls block the process entirely. Thus in simulated (user) threads a blocking call in one thread would block all threads. This is not true on all operating systems.
Now, a process is one or more execution streams and an address space. That is what you ought be learning; not a bunch of technobabble.
A book that talks about threads in terms of 1-to-1 or many-to-1 models is only fit to line cat boxes.

Mapping of user level and kernel level thread

While going through Operating System Principles, 7th ed.
(by Abraham Silberschatz, Peter Baer Galvin, Greg Gagne), I encountered a
statement in the Thread Scheduling section. It reads:
To run on a CPU, user-level threads must ultimately be mapped
to an associated kernel-level thread, although this mapping may
be indirect and may use a lightweight process (LWP).
The first half of the statement, i.e.
To run on a CPU, user-level threads must ultimately be mapped to an associated kernel-level
seems to say that when a user-level thread is executed, it will need support from a kernel thread, for things like system calls.
But I am completely stuck on the other half, i.e.
although this mapping may
be indirect and may use a lightweight process (LWP)
What does it really mean?
Please help me out!
You're reading a book that is notoriously crapola. Threads are implemented in two ways.
In the olde days (and this still persists on some operating systems) there were just processes. A process consisted of an execution stream and an address space.
When languages that needed thread support came along (e.g., Ada and its "tasks"), there was a need to create libraries to implement threads. The libraries used timers to switch among the various threads within the process. This is poor man's threading. The major drawback here is that, even when you have multiple processors, all the threads of a process run on the same processor. The threads are just interleaved execution within a single process that executes on one processor.
These are sometimes called "user level threads." Some books call this the "many-to-one model."
To say
To run on a CPU, user-level threads must ultimately be mapped to an associated kernel-level thread
is highly misleading. There [usually] ARE no kernel threads in this model; just processes. Multiple threads run interleaved in a process. To call this a mapping "to an associated kernel-level thread" is misleading and overly theoretical.
This is mumbo jumbo.
although this mapping may be indirect and may use a lightweight process (LWP)
The next stage in operating system evolution here was for the operating system to support threads directly. Instead of a process being an execution stream + address-space, a process became one-or-more-threads + address-space. Instead of scheduling processes for execution, the OS schedules threads for execution.
Those are kernel threads.
Your book is making the simple complex.
These days the terms "lightweight process" and "thread" are used interchangeably.
although this mapping may be indirect and may use a lightweight
process (LWP)
I know the above statement is confusing (notice the two "may"s). The only thing I can think of that the statement could signify is this:
Earlier, when Linux supported only user-level threads, the kernel was unaware that there were multiple user-level threads, and it handled them by associating all of them with one lightweight process (which the kernel sees as a single scheduling and execution unit) at kernel level.
So associating a kernel-level thread with each user-level thread is a kind of direct mapping, and associating a single lightweight process with all of the user-level threads is an indirect mapping.

execution of user-level-threads on Kernel threads - many to one [duplicate]

So two questions here really. First, (and yes, I have searched this already, but wanted clarification), what is the difference between a user thread and a kernel thread? Is it simply that one is generated by a user program and the other by an OS, with the latter having access to privileged instructions? Are they conceptually the same or are there actual differences in the threads themselves?
Second, and the real problem of my question is: the book I am using says that "a relationship must exist between user threads and kernel threads," going on to list the different models of such a relationship. But the book fails to clearly explain why a user thread must always be mapped to a specific kernel thread. Why is this?
A kernel thread is a thread object maintained by the operating system. It is an actual thread that is capable of being scheduled and executed by the processor. Typically, kernel threads are heavyweight objects with permissions settings, priorities, etc. The kernel thread scheduler is in charge of scheduling kernel threads.
User programs can make their own thread schedulers too. They can make their own "threads" and simulate context-switches to switch between them. However, these threads aren't kernel threads. Each user thread can't actually run on its own, and the only way for a user thread to run is if a kernel thread is actually told to execute the code contained in a user thread. That said, user threads have major advantages over kernel threads. They can be a lot more lightweight, since they don't necessarily need to have their own priorities, can be managed by a single process (which might have better info about what threads need to run when), and don't create lots of kernel objects for purposes of security and locking.
The reason that user threads have to be associated with kernel threads is that by itself a user thread is just a bunch of data in a user program. Kernel threads are the real threads in the system, so for a user thread to make progress the user program has to have its scheduler take a user thread and then run it on a kernel thread. The mapping between user threads and kernel threads doesn't have to be one-to-one (1 : 1); you can have multiple user threads share the same kernel thread (only one of those user threads runs at a time), and you can have a single user thread which is rotated across different kernel threads in a 1 : n mapping.
I think a real world example will clear the confusion, so let’s see how things are done in Linux.
First of all, Linux doesn’t differentiate between a process and a thread; an entity that can be scheduled is called a task in Linux and is represented by a task_struct. So whenever you execute a fork() system call, a new task_struct is created which holds the data (or pointers to it) associated with the new task.
So in the Linux world a kernel thread means a task_struct object.
That is because the scheduler only knows about these entities, which can be assigned to different CPUs (logical or physical). In other words, if you want the Linux scheduler to schedule your process you must create a task_struct.
A user thread is something that is supported and managed outside of the kernel by some execution environment (EE from now on) such as the JVM. These EEs will provide you with some functions to create new threads.
But why a user thread must always be mapped to a specific kernel thread.
Let’s say you created some threads using your EE. Eventually they must be executed by the CPU, and from the explanation above we know that a thread must have a task_struct in order to be assigned to some CPU. That is why the mapping must exist. It’s the duty of your EE to create the task_structs.
If your EE uses the many-to-one model then it will create only one task_struct for all the threads and it will schedule all these threads onto that task_struct. Think of it this way: there is one CPU (the task_struct) and many processes (the threads created in the EE), and your operating system (the EE) multiplexes these processes on that single CPU.
If it uses the one-to-one model then there will be one task_struct for every thread created in the EE. So when you create a new thread in your EE, a corresponding task_struct gets created in the kernel.
Windows does things differently (a process and a thread are different things) but the general idea stays the same: a kernel thread is the entity that the CPU scheduler considers for assignment, hence user threads must be mapped to corresponding kernel threads (if you want the CPU to execute them).

Difference between user-level and kernel-supported threads?

I've been looking through a few notes on this topic, and although I have an understanding of threads in general, I'm not really too sure about the differences between user-level and kernel-level threads.
I know that processes are basically made up of multiple threads or a single thread, but are these threads of the two previously mentioned types?
From what I understand, kernel-supported threads have access to the kernel for system calls and other uses not available to user-level threads.
So, are user-level threads simply threads created by the programmer which then utilise kernel-supported threads to perform operations that couldn't normally be performed due to their state?
Edit: The question was a little confusing, so I'm answering it two different ways.
OS-level threads vs Green Threads
For clarity, I usually say "OS-level threads" or "native threads" instead of "Kernel-level threads" (which I confused with "kernel threads" in my original answer below.) OS-level threads are created and managed by the OS. Most languages have support for them. (C, recent Java, etc) They are extremely hard to use because you are 100% responsible for preventing problems. In some languages, even the native data structures (such as Hashes or Dictionaries) will break without extra locking code.
The opposite of an OS-thread is a green thread that is managed by your language. These threads are given various names depending on the language (coroutines in C, goroutines in Go, fibers in Ruby, etc). These threads only exist inside your language and not in your OS. Because the language chooses context switches (i.e. at the end of a statement), it prevents TONS of subtle race conditions (such as seeing a partially-copied structure, or needing to lock most data structures). The programmer sees "blocking" calls (i.e. data = file.read() ), but the language translates it into async calls to the OS. The language then allows other green threads to run while waiting for the result.
Green threads are much simpler for the programmer, but their performance varies: If you have a LOT of threads, green threads can be better for both CPU and RAM. On the other hand, most green thread languages can't take advantage of multiple cores. (You can't even buy a single-core computer or phone anymore!). And a bad library can halt the entire language by doing a blocking OS call.
The best of both worlds is to have one OS thread per CPU, and many green threads that are magically moved around onto OS threads. Languages like Go and Erlang can do this.
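A very reduced sketch of that arrangement, with many green threads multiplexed over a small pool of OS threads. This is not how Go or Erlang actually implement it; it only shows the shape of the idea. Generators play the role of green threads, and green_thread, run_m_on_n and the "carrier" bookkeeping are names invented for the example. It assumes CPython 3.8 or later for threading.get_native_id().

```python
import queue
import threading

def green_thread(name, steps):
    """A green thread: yields one result per step, then finishes."""
    for i in range(steps):
        yield f"{name}:{i}"

def run_m_on_n(tasks, n_workers):
    """Run many generator-based tasks on a pool of n_workers OS threads."""
    ready = queue.Queue()          # run queue shared by all OS threads
    for task in tasks:
        ready.put(task)
    results, carriers = [], set()
    lock = threading.Lock()

    def worker():
        while True:
            task = ready.get()
            if task is None:       # sentinel: no more work
                break
            try:
                item = next(task)  # run the green thread for one step
            except StopIteration:
                ready.task_done()  # green thread finished
                continue
            with lock:
                results.append(item)
                carriers.add(threading.get_native_id())
            ready.put(task)        # re-queue first, so the queue never looks empty early
            ready.task_done()

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    ready.join()                   # returns once every green thread has finished
    for _ in workers:
        ready.put(None)
    for w in workers:
        w.join()
    return results, carriers

tasks = [green_thread(f"g{i}", 3) for i in range(5)]
results, carriers = run_m_on_n(tasks, n_workers=2)
print(len(results), "steps run on", len(carriers), "OS thread(s)")
```

A green thread is only ever held by one worker at a time (it is either in the queue or being stepped), so the same green thread may be resumed on different OS threads over its lifetime without any two of them touching it at once.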
system calls and other uses not available to user-level threads
This is only half true. Yes, you can easily cause problems if you call the OS yourself (i.e. do something that's blocking.) But the language usually has replacements, so you don't even notice. These replacements do call the kernel, just slightly differently than you think.
Kernel threads vs User Threads
Edit: This is my original answer, but it is about User space threads vs Kernel-only threads, which (in hindsight) probably wasn't the question.
User threads and Kernel threads are exactly the same. (You can see by looking in /proc/ and see that the kernel threads are there too.)
A User thread is one that executes user-space code. But it can call into kernel space at any time. It's still considered a "User" thread, even though it's executing kernel code at elevated security levels.
A Kernel thread is one that only runs kernel code and isn't associated with a user-space process. These are like "UNIX daemons", except they are kernel-only daemons. So you could say that the kernel is a multi-threaded program. For example, there is a kernel thread for swap. This forces all swap issues to get "serialized" into a single stream.
If a user thread needs something, it will call into the kernel, which marks that thread as sleeping. Later, the swap thread finds the data, so it marks the user thread as runnable. Later still, the "user thread" returns from the kernel back to userland as if nothing happened.
In fact, all threads start off in kernel space, because the clone() operation happens in kernel space. (And there's lots of kernel accounting to do before you can 'return' to a new process in user space.)
Before we go into comparison, let us first understand what a thread is. Threads are lightweight processes within the domain of independent processes. They are required because processes are heavy, consume a lot of resources and more importantly,
two separate processes cannot share a memory space.
Let's say you open a text editor. It's an independent process executing in the memory with a separate addressable location. You'll need many resources within this process, such as insert graphics, spell-checks etc. It's not feasible to create separate processes for each of these functionalities and maintain them independently in memory. To avoid this,
multiple threads can be created within a single process, which can
share a common memory space, existing independently within a process.
Now, coming back to your questions, one at a time.
I'm not really too sure about the differences between user-level and kernel-level threads.
Threads are broadly classified as user level threads and kernel level threads based on their domain of execution. There are also cases when one or many user thread maps to one or many kernel threads.
- User Level Threads
User level threads are mostly at the application level, where an application creates these threads to sustain its execution in the main memory. Unless required, these threads work in isolation from kernel threads.
These are easier to create since they do not have to refer to many registers, and context switching is much faster than for a kernel level thread.
A user level thread can mostly cause changes only at the application level, while the kernel level thread continues to execute at its own pace.
- Kernel Level Threads
These threads are mostly independent of the ongoing processes and are executed by the operating system.
These threads are required by the Operating System for tasks like memory management, process management etc.
Since these threads maintain, execute and report the processes required by the operating system, kernel level threads are more expensive to create and manage, and context switching of these threads is slow.
Most of the kernel level threads can not be preempted by the user level threads.
MS DOS written for Intel 8088 didn't have dual mode of operation. Thus, a user level process had the ability to corrupt the entire operating system.
- User Level Threads mapped over Kernel Threads
This is perhaps the most interesting part. Many user level threads map over to kernel level thread, which in-turn communicate with the kernel.
Some of the prominent mappings are:
One to One
When one user level thread maps to only one kernel thread.
advantages: each user thread maps to one kernel thread. Even if one of the user threads issues a blocking system call, the other threads remain unaffected.
disadvantages: every user thread requires one kernel thread to interact and kernel threads are expensive to create and manage.
Many to One
When many user threads map to one kernel thread.
advantages: multiple kernel threads are not required since similar user threads can be mapped to one kernel thread.
disadvantage: if even one of the user threads issues a blocking system call, all the other user threads mapped to that kernel thread are blocked.
Also, a good level of concurrency cannot be achieved since the kernel will process only one kernel thread at a time.
Many to Many
When many user threads map to an equal or smaller number of kernel threads. The programmer decides how many user threads will map to how many kernel threads. Some of the user threads might map to just one kernel thread.
advantages: a great level of concurrency is achieved. Programmer can decide some potentially dangerous threads which might issue a blocking system call and place them with the one-to-one mapping.
disadvantage: the number of kernel threads, if not decided cautiously can slow down the system.
The other part of your question:
kernel-supported threads have access to the kernel for system calls
and other uses not available to user-level threads.
So, are user-level threads simply threads created by the programmer
which then utilise kernel-supported threads to perform operations that
couldn't be normally performed due to its state?
Partially correct. Almost all kernel threads have access to system calls and other critical interrupts, since kernel threads are responsible for executing the processes of the OS. A user thread does not have access to some of these critical features; e.g. a text editor can never spawn a thread that has the ability to change the physical address of the process. But if needed, a user thread can map to a kernel thread and issue some of the system calls which it couldn't issue as an independent entity. The kernel thread then passes this system call on to the kernel, which executes the action if it deems fit.
Quote from here :
Kernel-Level Threads
To make concurrency cheaper, the execution aspect of a process is separated out into threads. As such, the OS now manages threads and processes. All thread operations are implemented in the kernel and the OS schedules all threads in the system. OS-managed threads are called kernel-level threads or lightweight processes.
NT: Threads
Solaris: Lightweight processes (LWP).
In this method, the kernel knows about and manages the threads. No runtime system is needed in this case. Instead of a thread table in each process, the kernel has a thread table that keeps track of all threads in the system. In addition, the kernel also maintains the traditional process table to keep track of processes. The operating system kernel provides system calls to create and manage threads.
Advantages:
Because the kernel has full knowledge of all threads, the scheduler may decide to give more time to a process with a large number of threads than to a process with a small number of threads.
Kernel-level threads are especially good for applications that frequently block.
Disadvantages:
Kernel-level threads are slow and inefficient. For instance, thread operations are hundreds of times slower than those of user-level threads.
Since the kernel must manage and schedule threads as well as processes, it requires a full thread control block (TCB) for each thread to maintain information about it. As a result, there is significant overhead and increased kernel complexity.
User-Level Threads
Kernel-level threads make concurrency much cheaper than processes do, because there is much less state to allocate and initialize. However, for fine-grained concurrency, kernel-level threads still suffer from too much overhead. Thread operations still require system calls. Ideally, we require thread operations to be as fast as a procedure call. Kernel-level threads have to be general to support the needs of all programmers, languages, runtimes, etc. For such fine-grained concurrency we need still "cheaper" threads.
To make threads cheap and fast, they need to be implemented at user level. User-level threads are managed entirely by the run-time system (a user-level library). The kernel knows nothing about user-level threads and manages the processes that contain them as if they were single-threaded. User-level threads are small and fast: each thread is represented by a PC, registers, a stack, and a small thread control block. Creating a new thread, switching between threads, and synchronizing threads are done via procedure calls, i.e. with no kernel involvement. User-level threads are a hundred times faster than kernel-level threads.
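To make the "switching is just a procedure call" point concrete, here is a minimal sketch that uses Python generators as stand-ins for user-level threads. This is only an illustration under that assumption, not how any particular thread library is implemented; the names worker and run_round_robin are invented for the example. Each yield is a voluntary switch point, and the scheduler is an ordinary loop that never enters the kernel to switch.

```python
from collections import deque


def worker(name, steps, log):
    """A user-level 'thread': records its progress and yields after each step."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the user-space scheduler


def run_round_robin(tasks):
    """Resume each task in turn until every task has finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # a plain function call resumes the task
        except StopIteration:
            continue            # task finished, drop it from the ready queue
        ready.append(task)      # still runnable, go to the back of the queue


log = []
run_round_robin([worker("A", 2, log), worker("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

The per-thread state here is just the generator object (its frame and position), which plays the role of the small thread control block described above.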
Advantages:
The most obvious advantage of this technique is that a user-level threads package can be implemented on an Operating System that does not support threads.
User-level threads do not require modification of the operating system.
Simple Representation: Each thread is represented simply by a PC, registers, stack and a small control block, all stored in the user process address space.
Simple Management: This simply means that creating a thread, switching between threads and synchronization between threads can all be done without intervention of the kernel.
Fast and Efficient: Thread switching is not much more expensive than a procedure call.
Disadvantages:
User-level threads are not a perfect solution; as with everything else, they are a trade-off. Since user-level threads are invisible to the OS, they are not well integrated with it. As a result, the OS can make poor decisions, such as scheduling a process with idle threads, blocking a process whose thread initiated an I/O even though the process has other threads that can run, and unscheduling a process with a thread holding a lock. Solving this requires communication between the kernel and the user-level thread manager.
There is a lack of coordination between threads and the operating system kernel. Therefore, the process as a whole gets one time slice irrespective of whether it has one thread or 1000 threads within it. It is up to each thread to relinquish control to other threads.
User-level threads require non-blocking system calls, i.e., a multithreaded kernel. Otherwise, the entire process will block in the kernel, even if there are runnable threads left in the process. For example, if one thread causes a page fault, the whole process blocks.
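The blocking problem can be sketched with Python generators standing in for user-level threads that all share one kernel thread. This is an illustration only: time.sleep stands in for a blocking system call, and the names blocking_task, quick_task and run are hypothetical.

```python
import time


def blocking_task(log, delay):
    """A user-level 'thread' that makes a blocking call before yielding."""
    log.append("blocker: start")
    time.sleep(delay)  # blocks the only kernel thread, so nothing else can run
    log.append("blocker: done")
    yield


def quick_task(log):
    """A user-level 'thread' that is runnable the whole time."""
    log.append("quick: ran")
    yield


def run(tasks):
    """Tiny user-space scheduler: resume each task in turn until all finish."""
    ready = list(tasks)
    while ready:
        task = ready.pop(0)
        try:
            next(task)
            ready.append(task)
        except StopIteration:
            pass  # task finished


log = []
run([blocking_task(log, 0.05), quick_task(log)])
print(log)  # ['blocker: start', 'blocker: done', 'quick: ran']
```

Although quick_task was ready to run from the beginning, it only gets the processor after the blocking call returns, because the kernel sees a single thread and puts all of it to sleep.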
User Threads
The library provides support for thread creation, scheduling and management with no support from the kernel.
The kernel is unaware of user-level threads; all thread creation and scheduling are done in user space without kernel intervention.
User-level threads are generally fast to create and manage; they have drawbacks, however.
If the kernel is single-threaded, then any user-level thread performing a blocking system call will cause the entire process to block, even if other threads are available to run within the application.
User-thread libraries include POSIX Pthreads, Mach C-threads, and Solaris 2 UI-threads.
Kernel threads
The kernel performs thread creation, scheduling, and management in kernel space.
Kernel threads are generally slower to create and manage than user threads.
Because the kernel is managing the threads, if a thread performs a blocking system call, the kernel can schedule another thread in the application for execution.
In a multiprocessor environment, the kernel can schedule threads on different processors.
Operating systems including Windows NT, Windows 2000, Solaris 2, BeOS, and Tru64 UNIX (formerly Digital UNIX) support kernel threads.
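The point about blocking calls can be sketched in Python, assuming CPython, where each threading.Thread is a kernel-scheduled thread. One thread blocks in a real system call (a read on an empty pipe) while a second thread keeps running and eventually wakes it. The function name run_demo and the log messages are made up for this example.

```python
import os
import threading


def run_demo():
    """Show that one kernel thread can run while another is blocked in a read."""
    log = []
    read_fd, write_fd = os.pipe()
    started = threading.Event()

    def blocker():
        log.append("blocker: waiting")
        started.set()
        os.read(read_fd, 1)  # blocking system call: only this thread sleeps
        log.append("blocker: done")

    def quick():
        log.append("quick: ran")
        os.write(write_fd, b"x")  # makes the pending read return

    t1 = threading.Thread(target=blocker)
    t1.start()
    started.wait()  # make sure the blocker has logged before quick starts
    t2 = threading.Thread(target=quick)
    t2.start()
    t1.join()
    t2.join()
    os.close(read_fd)
    os.close(write_fd)
    return log


if __name__ == "__main__":
    print(run_demo())  # ['blocker: waiting', 'quick: ran', 'blocker: done']
```

Here "quick: ran" appears between the two blocker messages: the kernel scheduled the second thread while the first was blocked, which a purely user-level scheduler on a single kernel thread could not do.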
Some development environments or languages add their own thread-like features, written to take advantage of some knowledge of the environment. For example, a GUI environment could implement thread functionality that switches between user threads on each pass of the event loop.
A game library could have some thread-like behaviour for characters. Sometimes the user-thread-like behaviour is implemented in a different way. For example, I work with Cocoa a lot, and it has a timer mechanism which executes your code every x seconds; use a fraction of a second and it behaves like a thread. Ruby has a yield feature which is like cooperative threads. The advantage of user threads is that they can switch at more predictable times. With kernel threads, every time a thread starts up again it needs to load any data it was working on, which can take time; with user threads you can switch when you have finished working on some data, so it doesn't need to be reloaded.
I haven't come across user threads that look the same as kernel threads, only thread-like mechanisms such as the timer. I have read about them in older textbooks, though, so I wonder whether they were more popular in the past and have gone out of favour with the rise of true multithreaded OSs (modern Windows and Mac OS X) and more powerful hardware.

Resources