Is Pthread library actually a user thread solution? - multithreading

The title might not be clear enough because I don't know how to define my questions actually.
I understand Pthread is a thread library meeting POSIX standard (about POSIX, see wikipedia: http://en.wikipedia.org/wiki/Posix). It is available in Unix-like OS.
About thread, I read that there are three different models:
User level thread: the kernel does not know it. User himself creates/implements/destroy threads.
Kernel level thread: kernel directly supports multiple threads of control in a process.
Light weight process(LWP): scheduled by kernel but can be bounded with user threads.
Did you see my confusion? When I call pthread_create() to create a thread, did I create a user level thread? I guess so. So can I say, Pthread offers a user level solution for threads? It can not manipulate kernel/LWP?

#paulsm4 I am doubtful about your comment that kernel knows every thing. In this particular context of user level threads, the kernel is unaware of the fact that such a thing is happening. A user level thread's scheduling is maintained by the user himself (via the interface provided by a library) and the kernel ends up allotting just a single kernel thread to the whole process. Kernel would treat the process as a single threaded and any blocking call by one of the threads would end up blocking all the threads of that process.
Refer to http://www.personal.kent.edu/~rmuhamma/OpSystems/Myos/threads.htm

In Linux, pthread is implemented as a lightweight process. Kernel (v2.6+) is actually implemented with NPTL. Let me quote the wiki content:
NPTL is a so-called 1×1 threads library, in that threads created by the user (via the pthread_create() library function) are in 1-1 correspondence with schedulable entities in the kernel (tasks, in the Linux case). This is the simplest possible threading implementation.
So pthread in linux kernel is actually implemented as kernel thread.

pthreads, per se, isn't really a threading library. pthreads is the interface which a specific threading library implements, using the concurrency resources available on that platform. So there's a pthreads implementation on linux, on bsd, on solaris, etc., and while the interface (the header files and the meaning of the calls) is the same, the implementation of each is different.
So what pthread_create actually does, in terms of kernel thread objects, varies between OSes and pthread library implementations. At a first approximation, you don't need to know (that's stuff that the pthread abstraction allows you to not need to know about). Eventually you might need to see "behind the curtain", but for most pthread users that's not necessary.
If you want to know what a /specific/ pthread implementation does, on a specific OS, you'll need to clarify your question. What Solaris and Linux do, for example, is very different.

Q: I understand Pthread is a thread library meeting POSIX standard
A: Yes. Actually, "Pthreads" stands for "Posix threads":
http://en.wikipedia.org/wiki/Pthreads
Q: It is available in Unix-like OS.
A: Actually, it's available for many different OSs ... including Windows, MacOS ... and, of course, Linux, BSD and Solaris.
Q: About thread, I read that there are three different models
Now you're getting fuzzy. "Threads" is a very generic term. There are many, many different models. And many, many different ways you can characterize and/or implement "threads". Including stuff like the Java threading model, or the Ada threading model.
Q: When I call pthread_create() to create a thread, did I create a
user level thread?
A: Yes: Just about everything you do in user space is "protected" in your own, private "user space".
Q: User level thread: the kernel does not know it.
A: No. The kernel knows everything :)
Q: Kernel level thread: kernel directly supports multiple threads of
control in a process.
A: Yes, there is such a thing as "kernel threads".
And, as it happens, Linux makes EXTENSIVE use of kernel threads. For example, every single process in a Linux system is a "kernel thread". And every user-created pthread is ALSO implemented as a new "kernel thread". As are "worker threads" (which are completely invisible to any user-level process).
But this is an advanced topic you do NOT need to understand in order to effectively use pthreads. Here's a great book that discussed this - and many other topics - in detail:
Linux Kernel Development, Robert Love
Remember: "Pthreads" is an interface. How it's implemented depends on the platform. Linux uses kernel threads; Windows uses Win32 threads, etc.
===========================================================================
ADDENDUM:
Since people still seem to be hitting this old thread, I thought it would be useful to reference this post:
https://stackoverflow.com/a/11255174/421195
Linux typically uses two implementations of pthreads:
LinuxThreads and Native
POSIX Thread Library(NPTL),
although the former is largely obsolete. Kernel from 2.6 provides
NPTL, which provides much closer conformance to SUSv3, and perform
better especially when there are many threads.
You can query the
specific implementation of pthreads under shell using command:
getconf GNU_LIBPTHREAD_VERSION
You can also get a more detailed implementation difference in The
Linux Programming Interface.
"Pthreads" is a library, based on the Posix standard. How a pthreads library is implemented will differ from platform to platform and library to library.

I find previous answers not as satisfying or clear as I would have liked so here goes:
When you call
pthread_create(...)
you always create a new user-level thread. And assuming that there is OS, there is always one or more kernel thread...but let's dive deeper:
According to "Operating system concepts" 10th edition,the actual classification we should be looking at (when it comes to thread libraries) is how the user level threads are mapped onto kernel threads (and that's what the question really meant).
The models are one to one (each user-level thread within a single process is mapped to a different kernel thread),many to one (the thread library is "user level" so all of the different threads within a single process are mapped to a single kernel thread,and the threads data structures, context switch etc are dealt with at user level and not by the OS [meaning that if a thread blocks on some I/O call, the entire process might potentially block]), and many to many (something in between,obviously the number of user-level threads is greater or equal to the number of kernel threads it is being mapped onto).
Now,pthreads is a specification and not an implementation, and the implementation does depend on the OS to which it is written. It could be any one of those models (notice that "many to many" is very flexible).
So,as an example,on Linux and Windows (the most popular OSs for years now,where the model is "one to one") the implementation is "one to one".

Pthreads is just a standardized interface for threading libraries. Whether an OS thread or a lightweight thread is created depends on the library you use. Nevertheless, my first guest would be that your threads are “real” OS-level threads.

Related

Do process and threads in the programming libraries or modules mean processes, kernel-level threads, or user-level threads?

I start to wonder about the difference between processes, kernel-level threads, and user-level threads.
Do process and threads in the Linux API mean processes, kernel-level threads, or user-level threads?
Same question for the standard modules in programming languages such as Python, Java and C#?
Thanks.
Linux processes and linux thread obviously will be "kernel level" because Linux is the kernel. But, you should be aware that the distinction between process and thread is not as sharp in Linux as in some other operating systems. Linux processes and threads are created by the clone system call (http://man7.org/linux/man-pages/man2/clone.2.html), and whether you call the result of clone a "process" or a "thread" depends on what options you give it.
As for language X or library Y, the question of whether threads are "user threads" (a.k.a., "green threads") or "kernel threads" (a.k.a., "native threads") will depend on what language/library you are talking about, and it may depend on what specific version and what specific implementation of the library or language you are talking about.
First let’s define the terms
User level thread:- a thread which is created and managed by some library outside of kernel. That is kernel is not directly aware of these threads.
Kernel level threads:- created and managed by kernel. For every kernel level thread kernel maintains some data structure to store associated information.
although these definitions are not universal most of the literature agrees upon them. ( Operating system concepts, modern operating systems etc)
When we talk about threads created by some library they are always user level threads. The point to understand is the mapping from user level threads to kernel level threads.
It’s up to the library (JVM, .NET) that what kind of mapping it uses. It can use one to one model where every user level thread will be mapped to its own kernel level thread. Or it can use many to one model where more than one user threads are mapped to same kernel thread.
As far as Linux is concerned it does not differentiate between process and thread but it provides you the ability to control the level of resource sharing between parent and child. You can do it by using clone system call. if you create a child process with maximum sharing than it’s effectively a thread. On the other hand if you pass flags to clone such that nothing is shared than its a process.
In summary threads in programming libraries are always user level threads. Because they are neither created nor managed (directly) by the kernel.

Is there a difference in how threads work in Linux vs Unix vs Solaris

I'm just curious to if there is any difference to how threads work in the different flavors of Unix.
In particular Linux, Solaris and the original Unix
Any thoughts?
The original Unix does not use threads for multiprogramming. An article from 1987 comparing the Mach and Unix kernel states that "neither Unix System V nor 4.3 BSD provide a way to manage more than one thread of control within an address space".
Modern Unix clones adhere to the POSIX Threads specification. The POSIX Threads standard only appeared in 1995. The underlying implementation in Linux is based on the clone() and futex() system calls, which are implemented by the kernel. The clone() system call creates a new lightweight process that can share a memory space with its child. So, for example, the pthread_create() library call internally calls the clone() system call. The futex() system call implements a synchronization primitive that can be used for creating larger synchronization operations, like mutexes, semaphores, etc. So, for example, pthread_mutex_lock() will call futex() internally.
Using POSIX terminology, Linux implementation model is a "kernel-thread model", also known as 1:1 model, where 1 kernel lightweight process is used for 1 user visible thread.
Linux also had different threading implementations over time. More description about threading support in Linux in pthreads(7), clone(2) and futex(2) manual pages.
Other implementations may be completely different and still implement the POSIX threads API. FreeBSD used a M:N implementation, or "hybrid model", where an user visible thread could be managed either by the kernel, or by a user space library. See the kse(2) manual page for a complete description. I have once traced the current 1:1 FreeBSD threading implementation to an article in kerneltrap:
http://web.archive.org/web/20110512021159/http://kerneltrap.org/node/624
OpenSolaris doesn't seem to describe its own threading implementation on man pages. I could find instead a document comparing the OpenSolaris kernel with Linux 2.6 and Windows Vista. There, a main difference between the Linux and OpenSolaris implementation is that in Linux, each process is a lightweight process, unifying threads and processes, while in OpenSolaris one process contains multiple lightweight processes, in practice separating in the kernel the process from the kernel thread.
References:
http://en.wikipedia.org/wiki/Thread_(computing)
http://en.wikipedia.org/wiki/POSIX_Threads
http://repository.cmu.edu/cgi/viewcontent.cgi?article=2728&context=compsci
http://man7.org/linux/man-pages/man7/pthreads.7.html
http://man7.org/linux/man-pages/man2/clone.2.html
http://man7.org/linux/man-pages/man2/futex.2.html
http://pubs.opengroup.org/onlinepubs/7908799/xsh/threads.html
http://www.freebsd.org/cgi/man.cgi?query=kse&apropos=0&sektion=0&manpath=FreeBSD+7.2-RELEASE&format=html
http://web.archive.org/web/20110512021159/http://kerneltrap.org/node/624
http://www.unix.com/man-page/opensolaris/5/threads/
http://www.infoq.com/articles/kernel-comparison-unix-zhu

How safe is pthread robust mutex?

I m thinking to use Posix robust mutexes to protect shared resource among different processes (on Linux). However there are some doubts about safety in difference scenarios. I have the following questions:
Are robust mutexes implemented in the kernel or in user code?
If latter, what would happen if a process happens to crash while in a call to pthread_mutex_lock or pthread_mutex_unlock and while a shared pthread_mutex datastructure is getting updated?
I understand that if a process locked the mutex and dies, a thread in another process will be awaken and return EOWNERDEAD. However, what would happen if the process dies (in unlikely case) exactly when the pthread_mutex datastructure (in shared memory) is being updated? Will the mutex get corrupted in that case? What would happen to another process that is mapped to the same shared memory if it were to call a pthread_mutex function?
Can the mutex still be recovered in this case?
This question applies to any pthread object with PTHREAD_PROCESS_SHARED attribute. Is it safe to call functions like pthread_mutex_lock, pthread_mutex_unlock, pthread_cond_signal, etc. concurrently on the same object from different processes? Are they thread-safe across different processes?
From the man-page for pthreads:
Over time, two threading implementations have been provided by the
GNU C library on Linux:
LinuxThreads
This is the original Pthreads implementation. Since glibc
2.4, this implementation is no longer supported.
NPTL (Native POSIX Threads Library)
This is the modern Pthreads implementation. By comparison
with LinuxThreads, NPTL provides closer conformance to the
requirements of the POSIX.1 specification and better
performance when creating large numbers of threads. NPTL is
available since glibc 2.3.2, and requires features that are
present in the Linux 2.6 kernel.
Both of these are so-called 1:1 implementations, meaning that each
thread maps to a kernel scheduling entity. Both threading
implementations employ the Linux clone(2) system call. In NPTL,
thread synchronization primitives (mutexes, thread joining, and so
on) are implemented using the Linux futex(2) system call.
And from man futex(7):
In its bare form, a futex is an aligned integer which is touched only
by atomic assembler instructions. Processes can share this integer
using mmap(2), via shared memory segments or because they share
memory space, in which case the application is commonly called
multithreaded.
An additional remark found here:
(In case you’re wondering how they work in shared memory: Futexes are keyed upon their physical address)
Summarizing, Linux decided to implement pthreads on top of their "native" futex primitive, which indeed lives in the user process address space. For shared synchronization primitives, this would be shared memory and the other processes will still be able to see it, after one process dies.
What happens in case of process termination? Ingo Molnar wrote an article called Robust Futexes about just that. The relevant quote:
Robust Futexes
There is one race possible though: since adding to and removing from the
list is done after the futex is acquired by glibc, there is a few
instructions window for the thread (or process) to die there, leaving
the futex hung. To protect against this possibility, userspace (glibc)
also maintains a simple per-thread 'list_op_pending' field, to allow the
kernel to clean up if the thread dies after acquiring the lock, but just
before it could have added itself to the list. Glibc sets this
list_op_pending field before it tries to acquire the futex, and clears
it after the list-add (or list-remove) has finished
Summary
Where this leaves you for other platforms, is open-ended. Suffice it to say that the Linux implementation, at least, has taken great care to meet our common-sense expectation of robustness.
Seeing that other operating systems usually resort to Kernel-based synchronization primitives in the first place, it makes sense to me to assume their implementations would be even more naturally robust.
Following the documentation from here: http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_getrobust.html, it does read that in a fully POSIX compliant OS, shared mutex with the robust flag will behave in the way you'd expect.
The problem obviously is that not all OS are fully POSIX compliant. Not even those claiming to be. Process shared mutexes and in particular robust ones are among those finer points that are often not part of an OS's implementation of POSIX.

Kernel Level Thread Library

I have to implement kernel level thread but while searching on the net I found that there are three ways to create kernel level thread in linux:
NPTL
kthread
linuxThreads
It was written somewhere that linuxThreads are now abandoned. But I am unable to find current support of NPTL & kthread. Also I am unable to find any source that can simply explain me how to use their functionality.
Which is the currently supported and good library to use kernel level thread?
Also pls share any resource for installing these library and also using them?
You are confusing two very different definitions of "kernel thread".
LinuxThreads and NPTL are implementations of POSIX pthreads for user-space processes. They use a 1-to-1 mapping of kernel scheduling entities to user-space threads. They are sometimes described as kernel threads implementations only because they create threads that are scheduled by the kernel.
LinuxThreads is unsupported and entirely obsolete. NPTL is now part of glibc, so you already have it. There's nothing special to install. You use these the same way you use any POSIX threading library, with calls to functions like pthread_create.
Actual kernel threads run kernel code. None of those libraries are relevant since they're all user-space libraries. Have a look at functions like kthread_run. There's no magic, no secret. Write kernel code the way similar kernel code is written. (Knowledge and experience in writing kernel code is needed. It's, unfortunately, not simple.)
I assume that; if you really wanted to create a kernel thread, you would already know about these things.
I think, you want to create multi-threaded applications and trying to find info about user-level multi-threading functions.
And yes, these threads you created will be managed by the kernel itself. This is what you are looking for :: POSIX Threads

kernel thread native thread os thread

can any one please tell me. Are all term "kernel thread", "native thread" and "Os thread" represent kernel thread? Or they are different? If they are different what is relationship among all?
There's no real standard for that. Terminology varies depending on context. However I'll try to explain the different kind of threads that I know of (and add fibers just for completeness as I've seen people call them threads).
-- Threading within the kernel
These are most likely what your kernel thread term refers to. They only exist at the kernel level. They allow (a somewhat limited) parallel execution of the kernel code itself.
-- Application threading
These are what the term thread generally means. They are separate threads of parallel execution which may be scheduled on different processors, that share the same address space and are handled as a single process by the operating system.
The POSIX standard defines the properties threads should have in POSIX compliant systems (in fact the libraries and how each library entry is supposed to behave). Windows threading model is extremely similar to the POSIX one and, AFAIK, it's safe to talk of threading in general the way I did: parallel execution that happens within the same process and can be scheduled on different processors.
-- Ancient linux threading
In the early days the linux kernel did not support threading. However it did support creating two different processes that shared the same address space. There was a project (LinuxThreads) that tried to use this to implement some sort of threading abilities.
The problem was, of course, that the kernel would still treat them as separate processes. The result was therefore not POSIX compliant. For example the treatment of signals was problematic (as signals are a process level concept). It was IN THIS VERY SPECIFIC CONTEXT that the term "native" started to become common. It refers to "native" as in "kernel level" support for threading.
With help from the kernel actual support for POSIX compliant threading was finally implemented. Today that's the only kind of threading that really deserves the name. The old way is, in fact, not real threading at all. It's a sharing of the address space by multiple processes, and as such should be referred to. But there was a time when that was referred to as threading (as it was the only thing you could do with Linux).
-- User level and Green threading
This is another context where "native" is often used to contrast to another threading model. Green threads and userl level threads are threads that do happen within the same process, but they are totally handled at userlevel. Green threads are used in virtual machines (especially those that implement pcode execution, as is the case for the java virtual machine), and they are also implemented at library level by many languages (examples: Haskell, Racket, Smalltalk).
These threads do not need to rely on any threading facilities by the kernel (but often do rely on asynchronous I/O). As such they generally cannot schedule on separate processors. In these contexts "native thread" or "OS thread" could be used to refer to the actual kernel scheduled threads in contrast to the green/user level threads.
Note that "cannot be scheduled on separate processors" is only true if they are used alone. In an hybrid system that has both user level/green threads and native/os threads, it may be possible to create exactly one native/os thread for each processor (and on some systems to set the affinity mask so that each only runs on a specific processor) and then effectively assign the userlevel threads to these.
-- Fibers and cooperative multitasking
I have seen some people call these threads. It's improper, the correct name is fibers. They are also a model of parallel execution, but contrary to threads (and processes) they are cooperative. Which means that whenever a fiber is running, the other fibers will not run until the running fiber voluntarily "yields" execution accepting to be suspended and eventually resumed later.

Resources