Is the operating system aware of application threads? - multithreading

My CS professor told the class that the OS has no idea that an application has launched threads. Is this true?

It depends on the type of thread. Threads implemented purely at user level are unknown to the operating system. This can be done with signals, setjmp, and longjmp (see www.gnu.org/s/pth/rse-pmt.ps for details). Alternatively, if you are talking about something such as Linux pthreads, which implements only a subset of the pthreads specification, specifically the part that creates new threads of execution that the kernel is aware of and schedules, then the kernel is aware of them.
If you want to see in more detail how the kernel is aware, look at the clone system call. This system call can be used to create a new thread of execution that shares the address space of the calling process.
Also, with threading implemented in user space you will not get true parallelism, in the sense of two threads executing at the exact same time on different cores/hardware threads, because the operating system, which does the scheduling, does not know about the multiple threads.
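To see the kernel-aware case for yourself, here is a minimal sketch. It assumes CPython 3.8 or later, where threading.get_native_id() returns the thread ID assigned by the kernel; the function name collect_native_ids is just made up for the illustration.

```python
import os
import threading


def collect_native_ids(num_threads=4):
    """Start num_threads threads and return the kernel-assigned ID of each."""
    ids = []
    lock = threading.Lock()
    # The barrier keeps every thread alive until all have recorded their ID,
    # so the kernel cannot recycle an ID while we are still collecting.
    barrier = threading.Barrier(num_threads)

    def worker():
        with lock:
            ids.append(threading.get_native_id())
        barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


if __name__ == "__main__":
    print("process id:", os.getpid())
    print("main thread native id:", threading.get_native_id())
    print("worker native ids:", collect_native_ids())
```

Each worker reports a different ID, which would not be possible if the kernel saw only one thread of execution. On Linux the same IDs should also show up as entries under /proc/<pid>/task while the threads are alive.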

It depends upon the operating system. Older operating systems had no threads. Programming libraries would implement threads (e.g., Ada tasks) with timers, and the library included its own thread scheduler.
It is increasingly common now for operating systems to schedule threads for execution. There, the OS is aware of threads.

Related

Do processes and threads in programming libraries or modules mean processes, kernel-level threads, or user-level threads?

I have started to wonder about the difference between processes, kernel-level threads, and user-level threads.
Do processes and threads in the Linux API mean processes, kernel-level threads, or user-level threads?
Does the same answer apply to the standard modules in programming languages such as Python, Java and C#?
Thanks.
Linux processes and Linux threads are obviously "kernel level" because Linux is the kernel. But you should be aware that the distinction between process and thread is not as sharp in Linux as in some other operating systems. Linux processes and threads are created by the clone system call (http://man7.org/linux/man-pages/man2/clone.2.html), and whether you call the result of clone a "process" or a "thread" depends on what options you give it.
As for language X or library Y, the question of whether threads are "user threads" (a.k.a., "green threads") or "kernel threads" (a.k.a., "native threads") will depend on what language/library you are talking about, and it may depend on what specific version and what specific implementation of the library or language you are talking about.
First, let's define the terms.
User-level thread: a thread that is created and managed by some library outside the kernel. That is, the kernel is not directly aware of these threads.
Kernel-level thread: a thread created and managed by the kernel. For every kernel-level thread, the kernel maintains a data structure to store the associated information.
Although these definitions are not universal, most of the literature agrees on them (Operating System Concepts, Modern Operating Systems, etc.).
When we talk about threads created by some library, they are always user-level threads. The point to understand is the mapping from user-level threads to kernel-level threads.
It's up to the library or runtime (JVM, .NET) which kind of mapping it uses. It can use a one-to-one model, where every user-level thread is mapped to its own kernel-level thread, or a many-to-one model, where more than one user-level thread is mapped to the same kernel thread.
As far as Linux is concerned, it does not differentiate between a process and a thread, but it lets you control the level of resource sharing between parent and child through the clone system call. If you create a child with maximum sharing, then it is effectively a thread. On the other hand, if you pass flags to clone such that nothing is shared, then it is a process.
In summary, threads in programming libraries are always user-level threads, because they are neither created nor managed (directly) by the kernel.
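The two ends of that sharing spectrum can be observed from Python. This is only a rough sketch: Python does not expose clone directly, so os.fork (Unix only) stands in for "nothing shared" and threading.Thread for "address space shared". The helper names are hypothetical.

```python
import os
import threading


def append_in_thread(shared):
    """Mutate `shared` from a second thread in the same address space."""
    t = threading.Thread(target=shared.append, args=("from thread",))
    t.start()
    t.join()
    return shared


def append_in_forked_child(shared):
    """Mutate `shared` in a forked child; return the length the child saw."""
    pid = os.fork()
    if pid == 0:
        # Child: works on its own copy of the parent's memory.
        shared.append("from child")
        os._exit(len(shared))  # report the child's length via the exit status
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    data = []
    print("length seen by child:", append_in_forked_child(data))
    print("parent after fork:", data)
    print("parent after thread:", append_in_thread(data))
```

The child's append never reaches the parent's list, while the thread's append does, which is the practical difference between the two kinds of child described above.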

System threads vs not-system threads

I have noticed that the expression "system thread" comes up very often. What does it actually mean? In particular, I cannot imagine non-system threads. Surely the system must be aware of a thread: the operating system (its scheduler) switches contexts, so it must know about them!
For example, system threads are mentioned on the fourth page of:
http://www.dabeaz.com/python/GIL.pdf
A system thread is something provided by the OS. The OS kernel is in charge of scheduling system threads. If your runtime provides something like threads and a scheduler, then you have non-system threads. These are often called green threads. Sometimes non-system threads are more efficient, or the system doesn't provide threads. For Python, examples of non-system threads would be provided by greenlet or eventlet.
Threads are a construct of the operating system, which is itself just a program, so one could implement a thread scheduler in another program on top of the OS if they so desire (usually they don't reinvent the wheel, though). The pertinent components would likely include some interrupt mechanism, a memory manager (to virtualize memory allocation), and a priority queue of instruction pointers for each thread.
Green threads, event loops, cooperative multitasking and coroutines are generally what is meant by non-system threads.
It essentially refers to ways of structuring programs so that instead of blocking a thread to do things like IO, we allow the thread to be used by another task.
When we park a native thread, the OS can schedule another thread to use that CPU. With cooperative multitasking approaches it is also possible to have the application choose which task to execute next.
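As a toy illustration of the application choosing which task runs next, here is a minimal sketch of cooperative multitasking built on Python generators. The names task and run_round_robin are invented for the example; real libraries such as greenlet or eventlet work differently under the hood.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task: does one step of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # voluntarily hand control back to the scheduler


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them finish."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the task until its next yield
            ready.append(current)  # still has work: back of the queue
        except StopIteration:
            pass                   # task finished: drop it


if __name__ == "__main__":
    log = []
    run_round_robin([task("A", 2, log), task("B", 3, log)])
    print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Everything here runs on one OS thread. The kernel sees a single thread of execution, and the interleaving of A and B is decided entirely by the loop in run_round_robin.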

Work stealing and Kernel-level thread

Work stealing is a common strategy for user-level threads. Each process has a work queue to take work from, and steals from the others' queues when it runs out of work.
Is there any kernel that implements such a strategy for kernel-level threads? If not, what is the reason?
I believe Linux has a notion of thread migration for kernel-level threads, which moves a thread from a heavily loaded processor to a lightly loaded one, but that seems like a different algorithm. Correct me if I'm wrong.
Thanks
The work-stealing scheduler is a parallel computation scheduler. It is usually implemented in user-level libraries (like Intel TBB: https://www.threadingbuildingblocks.org/) or even in languages like Cilk (https://software.intel.com/en-us/intel-cilk-plus).
Kernel-level threads are scheduled by the operating system, so the scheduling techniques are quite different. For instance, one of the objectives of a work-stealing scheduler is to limit memory usage (as proven in the original paper: http://supertech.csail.mit.edu/papers/steal.pdf), and to achieve that the threads are stored in a deque. In operating system schedulers, however, the main objective is to be fair between users and to give each process/kernel thread a fair amount of time to run (as max-min fairness states: http://en.wikipedia.org/wiki/Max-min_fairness), etc. Operating system schedulers also use different priorities among kernel threads/processes (please see http://en.wikipedia.org/wiki/Completely_Fair_Scheduler or http://en.wikipedia.org/wiki/Multilevel_feedback_queue). For that reason, work-stealing implementations live at user level, since their objective is to schedule user-level threads inside a process and not kernel threads.
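To make the deque idea concrete, here is a small single-threaded simulation of work stealing. It is only a sketch under simplifying assumptions: there is no real concurrency, so the synchronization a real scheduler needs on each deque is left out, and the names Worker and simulate are made up. Each worker takes its own newest task from one end of its deque and steals a victim's oldest task from the other end.

```python
from collections import deque


class Worker:
    def __init__(self, name):
        self.name = name
        self.tasks = deque()

    def next_task(self, others):
        """Return (task, source_name), or None when no work is left anywhere."""
        if self.tasks:
            return self.tasks.pop(), self.name  # own work: newest first
        for victim in others:
            if victim.tasks:
                return victim.tasks.popleft(), victim.name  # steal: oldest first
        return None


def simulate(workers):
    """Give each worker one turn per round; return the execution log."""
    log = []
    progress = True
    while progress:
        progress = False
        for w in workers:
            got = w.next_task([o for o in workers if o is not w])
            if got is not None:
                task, source = got
                log.append((w.name, task, source))
                progress = True
    return log


if __name__ == "__main__":
    w0, w1 = Worker("w0"), Worker("w1")
    w0.tasks.extend(["t0", "t1", "t2", "t3"])
    for entry in simulate([w0, w1]):
        print(entry)
```

With all four tasks starting on w0, the idle worker w1 ends up running t0 and t1, both stolen from w0's queue, while w0 works through t3 and t2 from its own end.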

kernel thread native thread os thread

Can anyone please tell me: do the terms "kernel thread", "native thread" and "OS thread" all refer to a kernel thread, or are they different? If they are different, what is the relationship among them?
There's no real standard for that. Terminology varies depending on context. However, I'll try to explain the different kinds of threads that I know of (and add fibers just for completeness, as I've seen people call them threads).
-- Threading within the kernel
These are most likely what your kernel thread term refers to. They only exist at the kernel level. They allow (a somewhat limited) parallel execution of the kernel code itself.
-- Application threading
These are what the term thread generally means. They are separate threads of parallel execution which may be scheduled on different processors, that share the same address space and are handled as a single process by the operating system.
The POSIX standard defines the properties threads should have in POSIX-compliant systems (in fact, the libraries and how each library entry is supposed to behave). The Windows threading model is extremely similar to the POSIX one and, AFAIK, it's safe to talk of threading in general the way I did: parallel execution that happens within the same process and can be scheduled on different processors.
-- Ancient Linux threading
In the early days the Linux kernel did not support threading. However, it did support creating two different processes that shared the same address space. There was a project (LinuxThreads) that tried to use this to implement some sort of threading abilities.
The problem was, of course, that the kernel would still treat them as separate processes. The result was therefore not POSIX compliant. For example the treatment of signals was problematic (as signals are a process level concept). It was IN THIS VERY SPECIFIC CONTEXT that the term "native" started to become common. It refers to "native" as in "kernel level" support for threading.
With help from the kernel actual support for POSIX compliant threading was finally implemented. Today that's the only kind of threading that really deserves the name. The old way is, in fact, not real threading at all. It's a sharing of the address space by multiple processes, and as such should be referred to. But there was a time when that was referred to as threading (as it was the only thing you could do with Linux).
-- User level and Green threading
This is another context where "native" is often used in contrast to another threading model. Green threads and user-level threads are threads that do happen within the same process, but they are handled entirely at user level. Green threads are used in virtual machines (especially those that implement pcode execution, as is the case for the Java virtual machine), and they are also implemented at library level by many languages (examples: Haskell, Racket, Smalltalk).
These threads do not need to rely on any threading facilities by the kernel (but often do rely on asynchronous I/O). As such they generally cannot schedule on separate processors. In these contexts "native thread" or "OS thread" could be used to refer to the actual kernel scheduled threads in contrast to the green/user level threads.
Note that "cannot be scheduled on separate processors" is only true if they are used alone. In a hybrid system that has both user-level/green threads and native/OS threads, it may be possible to create exactly one native/OS thread for each processor (and on some systems to set the affinity mask so that each only runs on a specific processor) and then effectively assign the user-level threads to these.
-- Fibers and cooperative multitasking
I have seen some people call these threads. That's improper; the correct name is fibers. They are also a model of parallel execution, but contrary to threads (and processes) they are cooperative. This means that whenever a fiber is running, the other fibers will not run until the running fiber voluntarily "yields" execution, accepting to be suspended and eventually resumed later.

Is Pthread library actually a user thread solution?

The title might not be clear enough because I don't know how to define my questions actually.
I understand Pthread is a thread library meeting POSIX standard (about POSIX, see wikipedia: http://en.wikipedia.org/wiki/Posix). It is available in Unix-like OS.
About thread, I read that there are three different models:
User-level thread: the kernel does not know about it. The user creates, implements, and destroys the threads.
Kernel-level thread: the kernel directly supports multiple threads of control in a process.
Lightweight process (LWP): scheduled by the kernel but can be bound to user threads.
Did you see my confusion? When I call pthread_create() to create a thread, did I create a user level thread? I guess so. So can I say, Pthread offers a user level solution for threads? It can not manipulate kernel/LWP?
@paulsm4 I am doubtful about your comment that the kernel knows everything. In this particular context of user-level threads, the kernel is unaware that such a thing is happening. A user-level thread's scheduling is maintained by the user (via the interface provided by a library), and the kernel ends up allotting just a single kernel thread to the whole process. The kernel treats the process as single-threaded, and any blocking call by one of the threads ends up blocking all the threads of that process.
Refer to http://www.personal.kent.edu/~rmuhamma/OpSystems/Myos/threads.htm
In Linux, a pthread is implemented as a lightweight process. On kernels 2.6 and later, pthreads is implemented by NPTL. Let me quote the wiki content:
NPTL is a so-called 1×1 threads library, in that threads created by the user (via the pthread_create() library function) are in 1-1 correspondence with schedulable entities in the kernel (tasks, in the Linux case). This is the simplest possible threading implementation.
So on Linux a pthread is actually implemented as a kernel thread.
pthreads, per se, isn't really a threading library. pthreads is the interface which a specific threading library implements, using the concurrency resources available on that platform. So there's a pthreads implementation on linux, on bsd, on solaris, etc., and while the interface (the header files and the meaning of the calls) is the same, the implementation of each is different.
So what pthread_create actually does, in terms of kernel thread objects, varies between OSes and pthread library implementations. At a first approximation, you don't need to know (that's stuff that the pthread abstraction allows you to not need to know about). Eventually you might need to see "behind the curtain", but for most pthread users that's not necessary.
If you want to know what a /specific/ pthread implementation does, on a specific OS, you'll need to clarify your question. What Solaris and Linux do, for example, is very different.
Q: I understand Pthread is a thread library meeting POSIX standard
A: Yes. Actually, "Pthreads" stands for "Posix threads":
http://en.wikipedia.org/wiki/Pthreads
Q: It is available in Unix-like OS.
A: Actually, it's available for many different OSs ... including Windows, MacOS ... and, of course, Linux, BSD and Solaris.
Q: About thread, I read that there are three different models
Now you're getting fuzzy. "Threads" is a very generic term. There are many, many different models. And many, many different ways you can characterize and/or implement "threads". Including stuff like the Java threading model, or the Ada threading model.
Q: When I call pthread_create() to create a thread, did I create a user level thread?
A: Yes: Just about everything you do in user space is "protected" in your own, private "user space".
Q: User level thread: the kernel does not know it.
A: No. The kernel knows everything :)
Q: Kernel level thread: kernel directly supports multiple threads of control in a process.
A: Yes, there is such a thing as "kernel threads".
And, as it happens, Linux makes EXTENSIVE use of kernel threads. For example, every single process in a Linux system is a "kernel thread". And every user-created pthread is ALSO implemented as a new "kernel thread". As are "worker threads" (which are completely invisible to any user-level process).
But this is an advanced topic you do NOT need to understand in order to effectively use pthreads. Here's a great book that discusses this, and many other topics, in detail:
Linux Kernel Development, Robert Love
Remember: "Pthreads" is an interface. How it's implemented depends on the platform. Linux uses kernel threads; Windows uses Win32 threads, etc.
===========================================================================
ADDENDUM:
Since people still seem to be hitting this old thread, I thought it would be useful to reference this post:
https://stackoverflow.com/a/11255174/421195
Linux typically uses two implementations of pthreads: LinuxThreads and Native POSIX Thread Library (NPTL), although the former is largely obsolete. Kernels from 2.6 onward provide NPTL, which provides much closer conformance to SUSv3 and performs better, especially when there are many threads.
You can query the specific implementation of pthreads from the shell with the command:
getconf GNU_LIBPTHREAD_VERSION
You can also find a more detailed discussion of the implementation differences in The Linux Programming Interface.
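If you would rather make the same query from a program, something like the following should work. It is a sketch that assumes a glibc-based Linux system, where Python's os.confstr accepts the name CS_GNU_LIBPTHREAD_VERSION; elsewhere the helper (whose name is made up) just returns None.

```python
import os


def pthread_implementation():
    """Return glibc's pthread implementation string, or None if unavailable."""
    try:
        return os.confstr("CS_GNU_LIBPTHREAD_VERSION")
    except (AttributeError, ValueError, OSError):
        # os.confstr is missing (e.g. Windows) or the name is not known
        # to this platform's C library (e.g. non-glibc systems).
        return None


if __name__ == "__main__":
    print(pthread_implementation())
```

On a modern distribution I would expect the result to name NPTL along with a glibc version, but the exact string depends on the system.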
"Pthreads" is a library, based on the Posix standard. How a pthreads library is implemented will differ from platform to platform and library to library.
I find the previous answers not as satisfying or clear as I would have liked, so here goes:
When you call pthread_create(...) you always create a new user-level thread. And assuming that there is an OS, there is always one or more kernel threads... but let's dive deeper:
According to "Operating System Concepts", 10th edition, the actual classification we should be looking at (when it comes to thread libraries) is how the user-level threads are mapped onto kernel threads (and that's what the question really meant).
The models are one-to-one (each user-level thread within a single process is mapped to a different kernel thread), many-to-one (the thread library is "user level", so all of the different threads within a single process are mapped to a single kernel thread, and the thread data structures, context switches etc. are dealt with at user level and not by the OS [meaning that if a thread blocks on some I/O call, the entire process might potentially block]), and many-to-many (something in between; obviously the number of user-level threads is greater than or equal to the number of kernel threads they are mapped onto).
Now, pthreads is a specification and not an implementation, and the implementation does depend on the OS for which it is written. It could be any one of those models (notice that "many-to-many" is very flexible).
So, as an example, on Linux and Windows (the most popular OSs for years now) the implementation is "one-to-one".
Pthreads is just a standardized interface for threading libraries. Whether an OS thread or a lightweight thread is created depends on the library you use. Nevertheless, my first guess would be that your threads are "real" OS-level threads.
