Threads in Linux

I have gone through almost all the questions here on Pthreads in Linux, but one basic question remains unresolved for me:
Various answers mention that when we create a POSIX thread on Linux, there is a 1:1 mapping between user threads and kernel threads.
My question is: when we call pthread_create() in Linux, is one user thread created along with a corresponding, unique kernel thread that is created implicitly (i.e. a total of 2 threads, one of which is invisible to the user)?
OR
Is only one kernel thread created, and is there no longer anything called a user thread in newer Linux kernels?

The NPTL (Native POSIX Thread Library) and the older LinuxThreads both use a 1:1 model, where each thread (or process) created by the user corresponds to exactly one schedulable entity in the kernel. So pthread_create() creates a single thread that the kernel schedules directly; there is no separate, hidden kernel thread behind it.
However, you may be confusing this with user-level threads, or fibers: threads of execution created via calls like makecontext() and swapcontext() that use an N:1 model. The kernel doesn't know about user-level threads, and their scheduling is done in user space.
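A minimal sketch of the N:1 idea, assuming Linux with glibc's <ucontext.h> (makecontext() and swapcontext() are obsolescent in POSIX but still provided by glibc). The names run_fiber_demo, fiber_body and trace are hypothetical. Two contexts take turns on a single kernel thread, and the fiber observes the same kernel thread ID (read via syscall(SYS_gettid)) as its caller: the kernel only ever sees one task.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <ucontext.h>
#include <unistd.h>

static ucontext_t main_ctx, fiber_ctx;   /* two contexts the kernel never sees */
static pid_t fiber_tid;                  /* kernel TID observed inside the fiber */
static char trace[16];                   /* order in which the two sides ran */
static int pos;

static void fiber_body(void) {
    fiber_tid = (pid_t)syscall(SYS_gettid);
    trace[pos++] = 'F';
    swapcontext(&fiber_ctx, &main_ctx);  /* yield back to the caller */
    trace[pos++] = 'F';
}                                        /* returning continues at uc_link */

/* Runs one fiber interleaved with the caller; returns 1 if the fiber ran on
   the same kernel thread (same TID) as the caller. */
int run_fiber_demo(void) {
    static _Alignas(16) char stack[64 * 1024];
    memset(trace, 0, sizeof trace);
    pos = 0;

    getcontext(&fiber_ctx);
    fiber_ctx.uc_stack.ss_sp = stack;
    fiber_ctx.uc_stack.ss_size = sizeof stack;
    fiber_ctx.uc_link = &main_ctx;       /* resume the caller when fiber_body returns */
    makecontext(&fiber_ctx, fiber_body, 0);

    trace[pos++] = 'M';
    swapcontext(&main_ctx, &fiber_ctx);  /* run the fiber until it yields */
    trace[pos++] = 'M';
    swapcontext(&main_ctx, &fiber_ctx);  /* resume the fiber until it returns */
    trace[pos++] = 'M';
    return fiber_tid == (pid_t)syscall(SYS_gettid);
}
```

Calling run_fiber_demo() returns 1 and leaves trace as "MFMFM": the interleaving was done entirely in user space, on one kernel task.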

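pthread_create() does not call fork(). In glibc's NPTL it invokes the clone() system call directly (clone3() in recent glibc versions), with flags such as CLONE_VM, CLONE_FS, CLONE_FILES, CLONE_SIGHAND and CLONE_THREAD, so that the new task shares the caller's address space, file descriptors and signal handlers. fork() and vfork() are also built on clone(), just with different flags. So each pthread is exactly one kernel task: a 1:1 mapping.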
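A small check of the 1:1 mapping, assuming Linux with glibc/NPTL: each pthread records its kernel thread ID (via syscall(SYS_gettid), which also works on older glibc without a gettid() wrapper) and its getpid(). The helper names are hypothetical. A barrier keeps all threads alive at the same time, so no TID can be recycled while the check runs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define NTHREADS 4

static pid_t tids[NTHREADS];          /* kernel thread ID seen by each pthread */
static pid_t pids[NTHREADS];          /* process ID seen by each pthread */
static pthread_barrier_t all_alive;   /* keeps every thread alive during the check */

static void *record_ids(void *arg) {
    int i = *(int *)arg;
    tids[i] = (pid_t)syscall(SYS_gettid);
    pids[i] = getpid();
    pthread_barrier_wait(&all_alive);  /* no thread exits until all have recorded */
    return NULL;
}

/* Returns 1 if every pthread is a distinct kernel task (own TID) while all of
   them belong to the caller's process (same PID). */
int check_one_to_one(void) {
    pthread_t th[NTHREADS];
    int idx[NTHREADS];
    pthread_barrier_init(&all_alive, NULL, NTHREADS);
    for (int i = 0; i < NTHREADS; i++) {
        idx[i] = i;
        if (pthread_create(&th[i], NULL, record_ids, &idx[i]) != 0)
            return 0;
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(th[i], NULL);
    pthread_barrier_destroy(&all_alive);

    pid_t main_tid = (pid_t)syscall(SYS_gettid);
    for (int i = 0; i < NTHREADS; i++) {
        if (pids[i] != getpid() || tids[i] == main_tid)
            return 0;
        for (int j = i + 1; j < NTHREADS; j++)
            if (tids[i] == tids[j])
                return 0;
    }
    return 1;
}
```

On NPTL, check_one_to_one() should return 1: every thread has its own TID (its own task in the kernel), yet all report the process's PID, because CLONE_THREAD places them in the same thread group.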

Related

Do processes and threads in programming libraries or modules mean processes, kernel-level threads, or user-level threads?

I have started to wonder about the difference between processes, kernel-level threads, and user-level threads.
Do processes and threads in the Linux API mean processes, kernel-level threads, or user-level threads?
Same question for the standard modules in programming languages such as Python, Java and C#?
Thanks.
Linux processes and Linux threads are "kernel level" by definition, because Linux is the kernel. But you should be aware that the distinction between a process and a thread is not as sharp in Linux as in some other operating systems. Linux processes and threads are both created by the clone system call (http://man7.org/linux/man-pages/man2/clone.2.html), and whether you call the result of clone a "process" or a "thread" depends on which options you give it.
As for language X or library Y, the question of whether threads are "user threads" (a.k.a., "green threads") or "kernel threads" (a.k.a., "native threads") will depend on what language/library you are talking about, and it may depend on what specific version and what specific implementation of the library or language you are talking about.
First, let’s define the terms.
User-level thread: a thread which is created and managed by some library outside of the kernel; that is, the kernel is not directly aware of these threads.
Kernel-level thread: a thread created and managed by the kernel. For every kernel-level thread, the kernel maintains some data structure to store the associated information.
Although these definitions are not universal, most of the literature agrees on them (Operating System Concepts, Modern Operating Systems, etc.).
When we talk about threads created by some library, they are always user-level threads. The point to understand is the mapping from user-level threads to kernel-level threads.
It’s up to the library (JVM, .NET) what kind of mapping it uses. It can use a one-to-one model, where every user-level thread is mapped to its own kernel-level thread, or a many-to-one model, where more than one user thread is mapped to the same kernel thread.
As far as Linux is concerned, it does not differentiate between processes and threads, but it lets you control the level of resource sharing between parent and child through the clone system call. If you create a child with maximum sharing, it’s effectively a thread; if you pass flags to clone such that nothing is shared, it’s a process.
In summary, threads in programming libraries are always user-level threads, because they are neither created nor managed (directly) by the kernel.
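A sketch of the "level of sharing" point using glibc's clone() wrapper, assuming x86-64 or another Linux architecture where the stack grows downward. The helper counter_after_clone() is hypothetical. The same child function runs once with CLONE_VM (shared address space, thread-like) and once without it (a private copy of the address space, fork-like).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

static volatile int counter;   /* the child increments "its" counter */

static int child_fn(void *arg) {
    (void)arg;
    counter += 1;
    return 0;   /* glibc's clone() wrapper then terminates the child */
}

/* Runs child_fn in a new task created with clone(extra_flags | SIGCHLD) and
   returns the value of counter the parent sees after the child exits. */
int counter_after_clone(int extra_flags) {
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;
    counter = 0;
    /* The stack grows downward on common Linux architectures: pass its top. */
    pid_t pid = clone(child_fn, stack + stack_size, extra_flags | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);     /* SIGCHLD as exit signal lets waitpid() reap it */
    free(stack);
    return counter;
}
```

counter_after_clone(CLONE_VM) should return 1 while counter_after_clone(0) returns 0: only when the memory is shared does the parent see the child's write.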

How are threads/processes parked and woken in Linux, prior to futex?

Before the futex system calls existed in Linux, what underlying system calls were used by threading libraries like pthreads to block/sleep a thread and to subsequently wake those threads from userland?
For example, if a thread tries to acquire a mutex, the userland implementation will block the thread (perhaps after a short spinning interval), but I can't find the syscalls that are used for this (other than futex, which is a relatively recent creation).
Before futex and the current implementation of pthreads for Linux, NPTL (which requires kernel 2.6 or newer), there were two other threading libraries with a POSIX Threads API for Linux: LinuxThreads and NGPT (which was based on GNU Pth). LinuxThreads was the only widely used libpthread for years (and it can still be used in some strange and unmaintained micro-libc to work on 2.4; other micro-libc variants may have their own built-in implementation of a pthread-like API on top of futex+clone). GNU Pth is not a kernel-thread library: it runs as a single process and switches user-level "threads" itself.
You should know that there are several threading models, which differ in whether the kernel knows about some or all of the user threads (this determines how many CPU cores a program can use by adding threads, what each thread costs, and how many threads can be started). Models are named M:N, where M is the number of userspace threads and N is the number of threads schedulable by the OS kernel:
"1:1" "kernel-level threading": every userspace thread is schedulable by the OS kernel. This is implemented in LinuxThreads, NPTL and many modern OSes.
"N:1" "user-level threading": userspace threads are scheduled in user space and are all invisible to the kernel, which only schedules one process (so only 1 CPU core can be used). GNU Pth (GNU Portable Threads) is an example, and there are many other implementations for various computer architectures.
"M:N" "hybrid threading": some entities are visible to and schedulable by the OS kernel, but there may be more user-space threads running inside them, and user-space threads sometimes migrate between kernel-visible threads.
With the 1:1 model there are many classic sleep mechanisms/APIs in Unix, such as select/poll, signals and other IPC APIs. As I remember, LinuxThreads used a separate process for every thread (with fully shared memory), plus a special manager "thread" (process) to emulate some POSIX thread features. Wikipedia says that SIGUSR1/SIGUSR2 were used in LinuxThreads for internal communication between threads, and IBM says the same: "The synchronization of primitives is achieved by means of signals. For example, threads block until awoken by signals." See also the project FAQ, http://pauillac.inria.fr/~xleroy/linuxthreads/faq.html#H.4: "With LinuxThreads, I can no longer use the signals SIGUSR1 and SIGUSR2 in my programs! Why?"
LinuxThreads needs two signals for its internal operation. One is used to suspend and restart threads blocked on mutex, condition or semaphore operations. The other is used for thread cancellation.
On "old" kernels (2.0 and early 2.1 kernels), there are only 32 signals available and the kernel reserves all of them but two: SIGUSR1 and SIGUSR2. So, LinuxThreads has no choice but use those two signals.
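The following sketch illustrates the signal-based suspend/restart idea with present-day APIs; it is not LinuxThreads' actual code, and suspend_restart_demo() and waiter() are hypothetical names. The waiter blocks in the kernel inside sigwait() until another thread directs SIGUSR1 at it with pthread_kill(). SIGUSR1 must be blocked first (here in the creating thread, whose signal mask the new thread inherits), so that a signal sent early stays pending instead of triggering the default action, which terminates the process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int may_run;   /* condition the waiter is waiting for */
static atomic_int resumed;   /* set by the waiter after it is restarted */

/* "Suspend": block inside the kernel in sigwait() until SIGUSR1 arrives. */
static void *waiter(void *arg) {
    const sigset_t *set = arg;
    int sig;
    do {
        sigwait(set, &sig);
    } while (!atomic_load(&may_run));   /* ignore a stray SIGUSR1 */
    atomic_store(&resumed, 1);
    return NULL;
}

/* Returns 1 if the waiter was suspended and then restarted by a signal. */
int suspend_restart_demo(void) {
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    /* Block SIGUSR1 before pthread_create() so the new thread inherits the mask. */
    pthread_sigmask(SIG_BLOCK, &set, &old);

    atomic_store(&may_run, 0);
    atomic_store(&resumed, 0);
    pthread_t th;
    if (pthread_create(&th, NULL, waiter, &set) != 0) {
        pthread_sigmask(SIG_SETMASK, &old, NULL);
        return 0;
    }
    atomic_store(&may_run, 1);   /* publish the condition first... */
    pthread_kill(th, SIGUSR1);   /* ...then "restart" the waiter */
    pthread_join(th, NULL);

    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return atomic_load(&resumed);
}
```

If pthread_kill() runs before the waiter reaches sigwait(), the blocked signal simply stays pending on that thread and sigwait() returns immediately, so the wake-up is not lost. Standard signals do not queue, though, which is one reason this scheme was fragile.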
With the "N:1" model, a thread may call some blocking syscall and block everything (some libraries convert certain blocking syscalls into asynchronous ones, or use some SIGALRM or SIGVTALRM magic). Or it may call a (very) special internal threading function that does user-space thread switching by rewriting the machine state registers (like switch_to in the Linux kernel: save IP/SP and the other registers, then restore IP/SP and the registers of the other thread). So the kernel does not wake any user thread directly; it just schedules the whole process, and the user-space scheduler implements the thread synchronization logic (or just calls sched_yield or select when there are no threads ready to run).
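To make the N:1 description concrete, here is a toy round-robin user-space scheduler built on glibc's <ucontext.h>. All names are hypothetical, and it is cooperative only (no SIGALRM-based preemption and no handling of blocking syscalls). swapcontext() performs the register save/restore described above, and the kernel sees a single task throughout.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define NGREEN 3
#define STACK_SIZE (64 * 1024)

static ucontext_t sched_ctx;                   /* the user-space scheduler */
static ucontext_t green_ctx[NGREEN];           /* saved state of each green thread */
static _Alignas(16) char stacks[NGREEN][STACK_SIZE];
static int finished[NGREEN];
static int current;                            /* index of the running green thread */
static char order[32];                         /* which thread ran at each step */
static int order_len;

/* Save this green thread's registers and switch back to the scheduler. */
static void green_yield(void) {
    swapcontext(&green_ctx[current], &sched_ctx);
}

static void green_body(void) {
    for (int step = 0; step < 2; step++) {
        order[order_len++] = (char)('A' + current);
        green_yield();
    }
    finished[current] = 1;                     /* returning resumes uc_link */
}

/* Cooperative round-robin scheduler; everything runs on one kernel thread. */
void run_green_threads(void) {
    memset(finished, 0, sizeof finished);
    memset(order, 0, sizeof order);
    order_len = 0;
    for (int i = 0; i < NGREEN; i++) {
        getcontext(&green_ctx[i]);
        green_ctx[i].uc_stack.ss_sp = stacks[i];
        green_ctx[i].uc_stack.ss_size = STACK_SIZE;
        green_ctx[i].uc_link = &sched_ctx;
        makecontext(&green_ctx[i], green_body, 0);
    }
    int live = NGREEN;
    while (live > 0) {
        live = 0;
        for (current = 0; current < NGREEN; current++) {
            if (finished[current])
                continue;
            swapcontext(&sched_ctx, &green_ctx[current]);  /* run until it yields */
            if (!finished[current])
                live++;
        }
    }
}
```

After run_green_threads() returns, order holds "ABCABC": each green thread ran once per round until all of them had returned.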
The M:N model is very complicated, and I don't know much about NGPT. There is one paragraph about NGPT in "POSIX Threads and the Linux Kernel" (Dave McCracken, OLS2002, 330), page 5:
There is a new pthread library under development called NGPT. This library is based on the GNU Pth library, which is an M:1 library. NGPT extends Pth by using multiple Linux tasks, thus creating an M:N library. It attempts to preserve Pth’s pthread compatibility while also using multiple Linux tasks for concurrency, but this effort is hampered by the underlying differences in the Linux threading model. The NGPT library at present uses non-blocking wrappers around blocking system calls to avoid blocking in the kernel.
Some papers and posts: "POSIX Threads and the Linux Kernel" (Dave McCracken, OLS2002, 330) and the LWN post about NPTL 0.1:
The futex system call is used extensively in all synchronization primitives and other places which need some kind of synchronization. The futex mechanism is generic enough to support the standard POSIX synchronization mechanisms with very little effort. ... Futexes also allow the implementation of inter-process synchronization primitives, a sorely missed feature in the old LinuxThreads implementation (Hi jbj!).
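As an illustration of how a mutex can be layered on futex, here is a sketch along the lines of the three-state mutex from Ulrich Drepper's paper "Futexes Are Tricky". It is not glibc's actual pthread_mutex code, and all names are hypothetical. glibc provides no futex() wrapper, so the call goes through syscall(SYS_futex, ...). The uncontended lock and unlock paths are a single atomic operation each; the kernel is entered only under contention.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Lock word: 0 = unlocked, 1 = locked, 2 = locked and maybe contended. */
typedef struct { atomic_int state; } futex_mutex;

/* glibc has no futex() wrapper, so invoke the system call directly. */
static long futex_call(atomic_int *addr, int op, int val) {
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void fm_lock(futex_mutex *m) {
    int c = 0;
    /* Fast path: 0 -> 1 in userspace, no system call. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the lock contended (2), then sleep until we take it. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex_call(&m->state, FUTEX_WAIT_PRIVATE, 2);  /* sleeps only if still 2 */
        c = atomic_exchange(&m->state, 2);
    }
}

static void fm_unlock(futex_mutex *m) {
    /* Old value 1: nobody waiting. Old value 2: wake one sleeper. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        futex_call(&m->state, FUTEX_WAKE_PRIVATE, 1);
    }
}

/* Hypothetical harness: threads increment a shared counter under the lock. */
static futex_mutex counter_lock;   /* zero-initialized: unlocked */
static long shared_counter;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        fm_lock(&counter_lock);
        shared_counter++;
        fm_unlock(&counter_lock);
    }
    return NULL;
}

long run_counter_test(int nthreads) {
    pthread_t th[16];
    if (nthreads > 16)
        nthreads = 16;
    shared_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&th[i], NULL, worker, NULL) != 0) {
            nthreads = i;   /* only join the threads that were started */
            break;
        }
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(th[i], NULL);
    return shared_counter;
}
```

run_counter_test(4) should return 400000, since every increment happens while holding the lock.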
NPTL design pdf:
5.5 Synchronization Primitives
The implementation of the synchronization primitives such as mutexes, read-write locks, conditional variables, semaphores, and barriers requires some form of kernel support. Busy waiting is not an option since threads can have different priorities (beside wasting CPU cycles). The same argument rules out the exclusive use of sched_yield. Signals were the only viable solution for the old implementation. Threads would block in the kernel until woken by a signal. This method has severe drawbacks in terms of speed and reliability caused by spurious wakeups and derogation of the quality of the signal handling in the application.
Fortunately some new functionality was added to the kernel to implement all kinds of synchronization primitives: futexes [Futex]. The underlying principle is simple but powerful enough to be adaptable to all kinds of uses. Callers can block in the kernel and be woken either explicitly, as a result of an interrupt, or after a timeout.
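A sketch of two of the ways a futex waiter comes back, explicitly woken or timed out, using the raw system call (glibc has no futex() wrapper; the helper names are hypothetical). Note the pattern: the waker publishes the condition before calling FUTEX_WAKE, and FUTEX_WAIT only sleeps if the word still holds the expected value, so a wake-up that happens before the waiter goes to sleep cannot be lost. For FUTEX_WAIT the timeout is relative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Sleep while *addr == expected; timeout is relative, NULL means forever. */
static long futex_wait(atomic_int *addr, int expected, const struct timespec *timeout) {
    return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, timeout, NULL, 0);
}

/* Wake at most n threads sleeping on addr; returns how many were woken. */
static long futex_wake(atomic_int *addr, int n) {
    return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, n, NULL, NULL, 0);
}

static atomic_int ready_flag;   /* 0 = not yet signalled, 1 = signalled */

static void *sleeper(void *arg) {
    (void)arg;
    /* FUTEX_WAIT can also return early (EAGAIN, EINTR), so re-check the flag. */
    while (atomic_load(&ready_flag) == 0)
        futex_wait(&ready_flag, 0, NULL);
    return NULL;
}

/* Explicit wake-up: returns 1 once the sleeper has seen the flag and exited. */
int explicit_wake_demo(void) {
    pthread_t th;
    atomic_store(&ready_flag, 0);
    if (pthread_create(&th, NULL, sleeper, NULL) != 0)
        return 0;
    atomic_store(&ready_flag, 1);   /* publish the condition first... */
    futex_wake(&ready_flag, 1);     /* ...then wake the sleeper, if it is asleep */
    pthread_join(th, NULL);
    return 1;
}

/* Timeout: nobody wakes this waiter, so FUTEX_WAIT fails with ETIMEDOUT. */
int timeout_demo(void) {
    atomic_int word = 0;
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };  /* 50 ms */
    long r = futex_wait(&word, 0, &ts);
    return r == -1 && errno == ETIMEDOUT;
}
```

The third case from the quote, being woken "as a result of an interrupt", shows up as FUTEX_WAIT returning -1 with errno set to EINTR when a signal handler runs; the re-check loop in sleeper() covers it.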
Futex stands for "fast userspace mutex." A futex is not a mutex itself but a kernel primitive for building one: the uncontended case is handled entirely in userspace with atomic operations, and the kernel is entered only when a thread actually has to wait or be woken, so it implements the wait system for you. Before and after futex(), threads were put to sleep and woken via a change in their process state. The process states are:
Running state
Sleeping state
Uninterruptible sleeping state (e.g. blocked waiting for disk I/O inside a syscall like read() or write())
Defunct/zombie state
When a thread is suspended, it is put into the (interruptible) sleep state. Later, it can be woken via the kernel's wake_up() function, which operates on its task structure. wake_up() is a kernel function, not a syscall: the kernel doesn't need a syscall to wake or sleep a task; it simply changes the task structure to reflect the new state. When the Linux scheduler next deals with that task, it treats it according to its state (again, the states are listed above).
Short story: futex() implements a wait system for you. Without it, you need a data structure that's accessible from both the waking thread and the sleeping thread in order to wake up a sleeping thread, and all of that bookkeeping is done with userland code. What you need from the kernel is only a generic way to block and to be woken (before futexes, LinuxThreads used signals for this); there are no dedicated "park this thread" / "unpark that thread" syscalls. Essentially, most of what you're talking about is achieved in userspace by keeping track of the data conditions that determine whether and when a thread should sleep or be woken.

execution of user-level-threads on Kernel threads - many to one [duplicate]

So two questions here really. First, (and yes, I have searched this already, but wanted clarification), what is the difference between a user thread and a kernel thread? Is it simply that one is generated by a user program and the other by an OS, with the latter having access to privileged instructions? Are they conceptually the same or are there actual differences in the threads themselves?
Second, and the real problem of my question is: the book I am using says that "a relationship must exist between user threads and kernel threads," going on to list the different models of such a relationship. But the book fails to clearly explain why a user thread must always be mapped to a specific kernel thread. Why is this?
A kernel thread is a thread object maintained by the operating system. It is an actual thread that is capable of being scheduled and executed by the processor. Typically, kernel threads are heavyweight objects with permission settings, priorities, etc. The kernel thread scheduler is in charge of scheduling kernel threads.
User programs can make their own thread schedulers too. They can make their own "threads" and simulate context-switches to switch between them. However, these threads aren't kernel threads. Each user thread can't actually run on its own, and the only way for a user thread to run is if a kernel thread is actually told to execute the code contained in a user thread. That said, user threads have major advantages over kernel threads. They can be a lot more lightweight, since they don't necessarily need to have their own priorities, can be managed by a single process (which might have better info about what threads need to run when), and don't create lots of kernel objects for purposes of security and locking.
The reason that user threads have to be associated with kernel threads is that by itself a user thread is just a bunch of data in a user program. Kernel threads are the real threads in the system, so for a user thread to make progress the user program has to have its scheduler take a user thread and then run it on a kernel thread. The mapping between user threads and kernel threads doesn't have to be one-to-one (1 : 1); you can have multiple user threads share the same kernel thread (only one of those user threads runs at a time), and you can have a single user thread which is rotated across different kernel threads in a 1 : n mapping.
I think a real-world example will clear up the confusion, so let’s see how things are done in Linux.
First of all, Linux doesn’t differentiate between processes and threads: an entity that can be scheduled is called a task in Linux and is represented by a task_struct. So whenever you execute a fork() system call, a new task_struct is created which holds the data (or pointers to it) associated with the new task.
So in the Linux world, a kernel thread means a task_struct object, because the scheduler only knows about these entities, which can be assigned to different CPUs (logical or physical). In other words, if you want the Linux scheduler to schedule your process, you must create a task_struct.
A user thread is something supported and managed outside of the kernel by some execution environment (EE from now on), such as the JVM. These EEs provide you with functions to create new threads.
But why must a user thread always be mapped to a specific kernel thread?
Let’s say you created some threads using your EE. Eventually they must be executed by the CPU, and from the explanation above we know that a thread must have a task_struct in order to be assigned to a CPU. That is why the mapping must exist. It’s the duty of your EE to create task_structs.
If your EE uses the many-to-one model, it will create only one task_struct for all the threads and schedule all these threads onto that task_struct. Think of it as one CPU (the task_struct) and many processes (the threads created in the EE), with your operating system (the EE) multiplexing these processes on that single CPU.
If it uses the one-to-one model, there will be one task_struct for every thread created in the EE. So when you create a new thread in your EE, a corresponding task_struct gets created in the kernel.
Windows does things differently (processes and threads are distinct objects), but the general idea stays the same: a kernel thread is the entity that the CPU scheduler considers for assignment, hence user threads must be mapped to corresponding kernel threads if you want the CPU to execute them.

Mapping User-level threads and Kernel-level threads

How are User-level threads mapped to Kernel-level threads?
It varies by implementation. The three most common threading models are:
1-to-1: Each user-level thread has a corresponding entity that is scheduled by the kernel.
n-to-1: Each process is scheduled by the kernel. Thread scheduling takes place entirely in user space.
n-to-m: Each process has a pool of entities that are scheduled by the kernel. These are assigned to run particular user-level threads by a user-space scheduler that is part of the process.
Modern implementations are almost all 1-to-1.
There's a bit of confusion about the terminology used for referring to ULTs and KLTs.
Following are the two different interpretations. Please correct me if I got this wrong:
KLTs are needed to achieve concurrency within the kernel itself (note: this interprets the kernel as a process, a live entity). This holds for microkernels like Symbian, where a kernel thread is responsible for each hardware resource of the system (e.g. File Server, Location Server, Calendar Server). However, in a kernel like Linux, which is mostly a library (and not a process or a living entity on its own), kernel threads in this sense have little meaning. In Linux, every thread you create is treated by the kernel as a process, and the kernel always runs either in process context or in interrupt context. (Linux does have its own internal kernel threads, such as kthreadd and the kworker threads, but they run in process context and are unrelated to the threads a user program creates.)
The second interpretation is based on whether threading (or concurrency) is visible to the kernel or not. For instance, using setjmp() and longjmp() one can achieve concurrency in user space; as already discussed, the kernel is totally unaware of this. This concurrency may be termed ULT. A thread whose creation the kernel is aware of (one created with the clone() system call) may be called a KLT.
