Is there a way to use pthreads without a scheduler, so context switch occurs only if a thread explicitly yields, or is blocked on a mutex/cond? If not, is there a way to minimize the scheduling overhead, so that forced context switches will occur as rarely as possible?
The question refers to the Linux gcc/g++ implementation of POSIX threads.
You can use Pth (a.k.a. GNU Portable Threads), a non-preemptive thread library. Configuring it with --enable-pthread will create a plug-in replacement for pthreads. I just built and tested this on my Mac and it works fine for a simple pthreads program.
From the README:
Pth is a very portable POSIX/ANSI-C based library for Unix platforms
which provides non-preemptive priority-based scheduling for multiple
threads of execution (aka `multithreading') inside event-driven
applications. All threads run in the same address space of the server
application, but each thread has its own individual program-counter,
run-time stack, signal mask and errno variable.
The thread scheduling itself is done in a cooperative way, i.e., the
threads are managed by a priority- and event-based non-preemptive
scheduler. The intention is, that this way one can achieve better
portability and run-time performance than with preemptive scheduling.
The event facility allows threads to wait until various types of
events occur, including pending I/O on filedescriptors, asynchronous
signals, elapsed timers, pending I/O on message ports, thread and
process termination, and even customized callback functions.
Additionally Pth provides an optional emulation API for POSIX.1c
threads (`Pthreads') which can be used for backward compatibility to
existing multithreaded applications.
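To illustrate, here is a minimal sketch (my own, not from the Pth docs) of cooperative scheduling with Pth's native API: context switches happen only at explicit pth_yield() calls or when a thread blocks inside a Pth primitive. Build with something like gcc demo.c -lpth.

```c
#include <stdio.h>
#include <pth.h>

/* Each "thread" runs until it calls pth_yield(); there is no
   preemption, so switches occur only at explicit yield points
   (or when a thread blocks inside a Pth call). */
static void *worker(void *arg)
{
    const char *name = arg;
    for (int i = 0; i < 3; i++) {
        printf("%s: step %d\n", name, i);
        pth_yield(NULL);          /* cooperatively hand over the CPU */
    }
    return NULL;
}

int main(void)
{
    pth_init();                   /* initialize the Pth runtime */
    pth_t a = pth_spawn(PTH_ATTR_DEFAULT, worker, "A");
    pth_t b = pth_spawn(PTH_ATTR_DEFAULT, worker, "B");
    pth_join(a, NULL);
    pth_join(b, NULL);
    pth_kill();                   /* shut down the Pth runtime */
    return 0;
}
```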
If you have a process running in normal user land, context switches will naturally happen as part of the system operation - there is always another process that needs the CPU time. Preemptive context switches between your threads are quite well optimized by the OS already and are bound to be necessary sometimes.
If you really do have problems with excessive context switching, you are best off tweaking the Linux scheduler first, which is off-topic here. pthread_setschedprio and pthread_setschedparam can set some hints, but they are limited to setting priorities, and the interpretation of those priorities is implementation-defined, i.e. up to the Linux scheduler.
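For completeness, here is a hedged sketch of those hints in practice: moving a thread into the SCHED_FIFO real-time class, so it is preempted only when a higher-priority real-time thread becomes runnable. This usually requires root or CAP_SYS_NICE, and the exact behavior remains up to the kernel.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Ask for SCHED_FIFO: the thread then runs until it blocks, yields,
   or a higher-priority real-time thread becomes runnable. */
static void make_fifo(pthread_t t, int prio)
{
    struct sched_param sp = { .sched_priority = prio };
    int err = pthread_setschedparam(t, SCHED_FIFO, &sp);
    if (err != 0)   /* commonly EPERM without CAP_SYS_NICE */
        fprintf(stderr, "setschedparam: %s\n", strerror(err));
}
```

Calling make_fifo(pthread_self(), 10) from inside the thread would be a typical use.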
Before the futex system calls existed in Linux, what underlying system calls were used by threading libraries like pthreads to block/sleep a thread and to subsequently wake those threads from userland?
For example, if a thread tries to acquire a mutex, the userland implementation will block the thread (perhaps after a short spinning interval), but I can't find the syscalls that are used for this (other than futex which are a relatively recent creation).
Before futex and the current pthreads implementation for Linux, NPTL (which requires kernel 2.6 or newer), there were two other threading libraries with a POSIX Thread API for Linux: LinuxThreads and NGPT (which was based on GNU Pth). LinuxThreads was the only widely used libpthread for years (it can still be found in some odd, unmaintained micro-libc variants targeting 2.4 kernels; other micro-libc variants have their own built-in implementation of a pthread-like API on top of futex+clone). GNU Pth itself is not a kernel thread library; it is a single-process library that does user-level "thread" switching.
You should know that there are several threading models, classified by whether the kernel knows about some, all, or none of the user threads (this determines how many CPU cores a program can use by adding threads, as well as the cost of creating a thread and how many threads can be started). Models are named M:N, where M is the number of userspace threads and N is the number of threads schedulable by the OS kernel:
"1:1" ''kernel-level threading'' - every userspace thread is schedulable by OS kernel. This is implemented in Linuxthreads, NPTL and many modern OS.
"N:1" ''user-level threading'' - userspace threads are planned by the userspace, they all are invisible to the kernel, it only schedules one process (and it may use only 1 CPU core). Gnu Pth (GNU Portable Threads) is example of it, and there are many other implementations for some computer architectures.
"M:N" ''hybrid threading'' - there are some entities visible and schedulable by OS kernel, but there may be more user-space threads in them. And sometimes user-space threads will migrate between kernel-visible threads.
With the 1:1 model, the classic Unix sleep mechanisms/APIs are available: select/poll, signals, and the other IPC APIs. As I remember, LinuxThreads used a separate process for every thread (with fully shared memory), plus a special manager "thread" (actually a process) to emulate some POSIX thread features. Wikipedia says that SIGUSR1/SIGUSR2 were used in LinuxThreads for internal communication between threads; IBM says the same: "The synchronization of primitives is achieved by means of signals. For example, threads block until awoken by signals." Check also the project FAQ, http://pauillac.inria.fr/~xleroy/linuxthreads/faq.html#H.4, "With LinuxThreads, I can no longer use the signals SIGUSR1 and SIGUSR2 in my programs! Why?":
LinuxThreads needs two signals for its internal operation. One is used to suspend and restart threads blocked on mutex, condition or semaphore operations. The other is used for thread cancellation.
On "old" kernels (2.0 and early 2.1 kernels), there are only 32 signals available and the kernel reserves all of them but two: SIGUSR1 and SIGUSR2. So, LinuxThreads has no choice but to use those two signals.
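To make the pre-futex mechanism concrete, here is a hedged sketch (illustrative only, not LinuxThreads' actual code) of the suspend/restart idea: the sleeper blocks in sigsuspend() until it receives the wakeup signal. LinuxThreads did something similar with SIGUSR1, sent between its per-thread processes.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t woken;

static void on_wake(int sig) { (void)sig; woken = 1; }

int main(void)
{
    signal(SIGUSR1, on_wake);      /* installed before fork, so inherited */

    pid_t child = fork();          /* stand-in for a LinuxThreads thread-process */
    if (child == 0) {
        sigset_t block, wait_mask;
        sigemptyset(&block);
        sigaddset(&block, SIGUSR1);
        sigprocmask(SIG_BLOCK, &block, NULL);  /* close the race window */
        sigfillset(&wait_mask);
        sigdelset(&wait_mask, SIGUSR1);
        while (!woken)
            sigsuspend(&wait_mask); /* atomically unblock SIGUSR1 and sleep */
        puts("child: woken up");
        _exit(0);
    }

    sleep(1);                      /* let the child go to sleep */
    kill(child, SIGUSR1);          /* the "wake this thread" operation */
    waitpid(child, NULL, 0);
    return 0;
}
```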
With "N:1" model thread may call some blocking syscall and block everything (some libraries may convert some blocking syscalls into async, or use some SIGALRM or SIGVTALRM magic); or it may call some (very) special internal threading function which will do user-space thread switching by rewriting machine state register (like switch_to in linux kernel, save IP/SP and other regs, restore IP/SP and regs of other thread). So, kernel does not wake any user thread directly from userland, it just schedules whole process; and user space scheduler implement thread synchronization logic (or just calls sched_yield or select when there is no threads to work).
With the M:N model things are more complicated... I don't know much about NGPT, but there is one paragraph about it in "POSIX Threads and the Linux Kernel" (Dave McCracken, OLS 2002), page 5:
There is a new pthread library under development called NGPT. This library is based on the GNU Pth library, which is an M:1 library. NGPT extends Pth by using multiple Linux tasks, thus creating an M:N library. It attempts to preserve Pth’s pthread compatibility while also using multiple Linux tasks for concurrency, but this effort is hampered by the underlying differences in the Linux threading model. The NGPT library at present uses non-blocking wrappers around blocking system calls to avoid blocking in the kernel.
Some papers and posts: "POSIX Threads and the Linux Kernel" (Dave McCracken, OLS 2002); the LWN post about NPTL 0.1:
The futex system call is used extensively in all synchronization
primitives and other places which need some kind of
synchronization. The futex mechanism is generic enough to support
the standard POSIX synchronization mechanisms with very little
effort. ... Futexes also allow the implementation of inter-process
synchronization primitives, a sorely missed feature in the old
LinuxThreads implementation (Hi jbj!).
NPTL design pdf:
5.5 Synchronization Primitives
The implementation of the synchronization primitives such as mutexes, read-write locks, conditional variables, semaphores, and barriers requires some form of kernel support. Busy waiting is not an option since threads can have different priorities (beside wasting CPU cycles). The same argument rules out the exclusive use of sched_yield. Signals were the only viable solution for the old implementation. Threads would block in the kernel until woken by a signal. This method has severe drawbacks in terms of speed and reliability caused by spurious wakeups and derogation of the quality of the signal handling in the application.
Fortunately some new functionality was added to the kernel to implement all kinds
of synchronization primitives: futexes [Futex]. The underlying principle is simple but
powerful enough to be adaptable to all kinds of uses. Callers can block in the kernel
and be woken either explicitly, as a result of an interrupt, or after a timeout.
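To make that concrete, here is a hedged sketch of a futex-based mutex in the style NPTL uses, following the three-state scheme from Ulrich Drepper's paper "Futexes Are Tricky"; it is a simplified illustration, not glibc's actual code. Note that glibc exposes no futex() wrapper, so the raw syscall is used:

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper over the raw futex syscall. */
static long sys_futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Returns the value observed in *m before the attempted exchange. */
static int cmpxchg(atomic_int *m, int expected, int desired)
{
    atomic_compare_exchange_strong(m, &expected, desired);
    return expected;
}

/* Mutex states: 0 = unlocked, 1 = locked, 2 = locked with waiters. */
static void mutex_lock(atomic_int *m)
{
    int c = cmpxchg(m, 0, 1);               /* fast path: no syscall */
    while (c != 0) {
        if (c == 2 || cmpxchg(m, 1, 2) != 0)
            sys_futex(m, FUTEX_WAIT, 2);    /* sleep while *m == 2 */
        c = cmpxchg(m, 0, 2);
    }
}

static void mutex_unlock(atomic_int *m)
{
    if (atomic_exchange(m, 0) == 2)         /* someone may be asleep */
        sys_futex(m, FUTEX_WAKE, 1);        /* wake one waiter */
}
```

The point to notice: in the uncontended case, lock and unlock are pure userspace atomics; the kernel is entered only to sleep or to wake a sleeper.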
Futex stands for "fast userspace mutex." It's an abstraction over mutexes that is considered faster and more convenient than traditional mutex mechanisms because it implements the waiting machinery for you. Both before and after futex(), threads were put to sleep and awoken via a change in their process state. The process states are:
Running state
Sleeping state
Uninterruptible sleep state (i.e. blocked in a syscall like read() or write())
Defunct/zombie state
When a thread is suspended, it is put into the (interruptible) sleep state. Later, it can be woken via the wake_up() function, which operates on its task structure within the kernel. As far as I can tell, wake_up() is a kernel function, not a syscall: the kernel doesn't need a syscall to wake or sleep a task; it simply changes the task structure to reflect the state of the process. When the Linux scheduler next deals with that process, it treats it according to its state (again, the states are listed above).
Short story: futex() implements a wait system for you. Without it, you need a data structure that's accessible from the waking thread and from the sleeping thread in order to wake the sleeper up. All of this is done with userland code. The only thing you might need from the kernel is a mutex, whose specifics do include locking mechanisms and mutex data structures, but which doesn't inherently wake or sleep the thread. The syscalls you're looking for don't exist. Essentially, most of what you're talking about can be achieved from userspace, without a syscall, by manually keeping track of the data conditions that determine whether and when to sleep or wake a thread.
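As a complement, here is a hedged sketch of futex used as a bare wait system, with no mutex semantics at all: one thread sleeps until a shared flag changes, the other flips the flag and wakes it. The kernel atomically re-checks the value before sleeping, which is exactly the lost-wakeup bookkeeping you would otherwise implement yourself:

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static atomic_int flag;   /* 0 = not ready, 1 = ready */

static long sys_futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void *waiter(void *arg)
{
    (void)arg;
    /* Sleep only while the flag is still 0; FUTEX_WAIT re-checks the
       value in the kernel, so a wakeup racing the load is not lost. */
    while (atomic_load(&flag) == 0)
        sys_futex(&flag, FUTEX_WAIT, 0);
    puts("waiter: woken");
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, waiter, NULL);
    sleep(1);                          /* let the waiter block in the kernel */
    atomic_store(&flag, 1);            /* publish the condition... */
    sys_futex(&flag, FUTEX_WAKE, 1);   /* ...and wake one sleeper */
    pthread_join(t, NULL);
    return 0;
}
```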
I was looking at the differences between user-level threads and kernel-level threads, which I basically understood.
What's not clear to me is the point of implementing user-level threads at all.
If the kernel is unaware of the existence of multiple threads within a single process, then which benefits could I experience?
I have read a couple of articles that stated user-level implementation of threads is advisable only if such threads do not perform blocking operations (which would cause the entire process to block).
This being said, what's the difference between a sequential execution of all the threads and a "parallel" execution of them, considering they cannot take advantage of multiple processors and independent scheduling?
An answer to a previously asked question (similar to mine) was something like:
No modern operating system actually maps n user-level threads to 1
kernel-level thread.
But for some reason, many people on the Internet state that user-level threads can never take advantage of multiple processors.
Could you help me understand this, please?
I strongly recommend Modern Operating Systems, 4th edition, by Andrew S. Tanenbaum (famous, among other things, for his debate with Linus Torvalds about Linux). It costs a whole lot of bucks, but it's definitely worth it if you really want to know this stuff; for eager students and desperate enthusiasts it's great.
Your questions answered
[...] what's not clear to me is the point of implementing User-level threads
at all.
Read my post. It is comprehensive, I daresay.
If the kernel is unaware of the existence of multiple threads within a
single process, then which benefits could I experience?
Read the section "Disadvantages" below.
I have read a couple of articles that stated that user-level
implementation of threads is advisable only if such threads do not
perform blocking operations (which would cause the entire process to
block).
Read the subsection "No coordination with system calls" in "Disadvantages."
All citations are from the book I recommended in the top of this answer, Chapter 2.2.4, "Implementing Threads in User Space."
Advantages
Enables threads on systems without threads
The first advantage is that user-level threads are a way to get threads on a system whose OS does not provide them.
The first, and most obvious, advantage is that
a user-level threads package can be implemented on an operating system that does not support threads. All operating systems used to
fall into this category, and even now some still do.
No kernel interaction required
A further benefit is the light overhead when switching threads, as opposed to switching into kernel mode, doing the work there, and switching back. The lighter thread switching is described like this in the book:
When a thread does something that may cause it to become blocked
locally, for example, waiting for another thread in its process to
complete some work, it calls a run-time system procedure. This
procedure checks to see if the thread must be put into blocked state.
If so, it stores the thread’s registers (i.e., its own) [...] and
reloads the machine registers with the new thread’s saved values. As soon as the stack
pointer and program counter have been switched, the new thread comes
to life again automatically. If the machine happens to have an
instruction to store all the registers and another one to load them
all, the entire thread switch can be done in just a handful of
instructions. Doing thread switching like this is at least an order of
magnitude—maybe more—faster than trapping to the kernel and is a
strong argument in favor of user-level threads packages.
This efficiency is also nice because it spares us the full cost of a trap into the kernel on every thread switch.
Individually adjusted scheduling algorithms
Also, since there is no central scheduling algorithm, every process can have its own scheduling algorithm and is far more flexible in its choice. In addition, a "private" scheduling algorithm is far more flexible concerning the information it gets from the threads: the amount of information can be adjusted manually and per process, so it's very fine-grained. This is possible because, again, there is no central scheduling algorithm that must fit the needs of every process, be very general, and deliver adequate performance in every case. User-level threads allow an extremely specialized scheduling algorithm.
This is only restricted by the disadvantage "No automatic switching to the scheduler."
They [user-level threads] allow each process to have its own
customized scheduling algorithm. For some applications, for example,
those with a garbage-collector thread, not having to worry about a
thread being stopped at an inconvenient moment is a plus. They also
scale better, since kernel threads invariably require some table space
and stack space in the kernel, which can be a problem if there are a
very large number of threads.
Disadvantages
No coordination with system calls
The user-level scheduling algorithm has no idea whether some thread has called a blocking read system call. A kernel-level scheduler, on the other hand, would know, because it can be notified by the system call; both belong to the kernel code base.
Suppose that a thread reads from the keyboard before any keys have
been hit. Letting the thread actually make the system call is
unacceptable, since this will stop all the threads. One of the main
goals of having threads in the first place was to allow each one to
use blocking calls, but to prevent one blocked thread from affecting
the others. With blocking system calls, it is hard to see how this
goal can be achieved readily.
He goes on to say that system calls could be made non-blocking, but that this would be very inconvenient and would drastically hurt compatibility with existing OSes.
Tanenbaum also says that the library wrappers around the system calls (as found in glibc, for example) could be modified to predict, using select, when a system call would block, but he notes that this is inelegant; a sketch of the idea follows.
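For concreteness, here is a hedged sketch of that wrapper ("jacket") idea: probe the descriptor with a zero-timeout select() and issue the real read() only when it cannot block; thread_yield() stands in for a hypothetical entry point into the user-level run-time system.

```c
#include <sys/select.h>
#include <unistd.h>

extern void thread_yield(void);   /* hypothetical: switch to another user thread */

ssize_t threaded_read(int fd, void *buf, size_t n)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv = { 0, 0 };      /* zero timeout: poll, don't block */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        if (select(fd + 1, &rfds, NULL, NULL, &tv) > 0)
            return read(fd, buf, n);       /* data ready: will not block */
        thread_yield();                    /* let another user thread run */
    }
}
```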
Building on that, he argues that threads block often, blocking requires system calls, many system calls are costly, and without blocking, threads become less useful:
For applications that are essentially entirely CPU bound and rarely
block, what is the point of having threads at all? No one would
seriously propose computing the first n prime numbers or playing chess
using threads because there is nothing to be gained by doing it that
way.
Page faults block per-process if unaware of threads
The OS has no notion of threads. Therefore, if a page fault occurs, the whole process will be blocked, effectively blocking all user-level threads.
Somewhat analogous to the problem of blocking system calls is the
problem of page faults. [...] If the program calls or jumps to an
instruction that is not in memory, a page fault occurs and the
operating system will go and get the missing instruction (and its
neighbors) from disk. [...] The process is blocked while the necessary
instruction is being located and read in. If a thread causes a page
fault, the kernel, unaware of even the existence of threads, naturally
blocks the entire process until the disk I/O is complete, even though
other threads might be runnable.
I think this can be generalized to all interrupts.
No automatic switching to the scheduler
Since there are no clock interrupts within a single process, a thread keeps the CPU until it voluntarily releases it; the OS may still preempt the whole process, but that does not let another thread of the same process run.
This prevents the usual scheduling algorithms, including round-robin, from working.
[...] if a thread starts running, no other thread in that process
will ever run unless the first thread voluntarily gives up the CPU.
Within a single process, there are no clock interrupts, making it
impossible to schedule processes round-robin fashion (taking turns).
Unless a thread enters the run-time system of its own free will, the scheduler will never get a chance.
He says that a possible solution would be
[...] to have the run-time system request a clock signal (interrupt) once a
second to give it control, but this, too, is crude and messy to
program.
I would even go further and say that such a "request" would itself require a system call, whose drawback was already explained in "No coordination with system calls." Without a system call, the program would need free access to the timer hardware, which is a security hole and unacceptable in modern OSes.
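For reference, a hedged sketch of what the book's "clock signal" workaround could look like on Unix, using the standard setitimer() call; schedule_next() is a hypothetical hook into the user-level scheduler, and note that switching thread contexts from inside a signal handler is itself notoriously delicate:

```c
#include <signal.h>
#include <sys/time.h>

extern void schedule_next(void);   /* hypothetical: pick and switch to the next user thread */

static void on_tick(int sig)
{
    (void)sig;
    schedule_next();   /* preemption point for the user-level scheduler */
}

/* Ask the kernel for a periodic SIGALRM; this one setup syscall is how
   the run-time regains control from a CPU-hogging thread. */
static void start_preemption_timer(void)
{
    struct sigaction sa;
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGALRM, &sa, NULL);

    struct itimerval it;
    it.it_interval.tv_sec = 0;
    it.it_interval.tv_usec = 10000;   /* tick every 10 ms */
    it.it_value = it.it_interval;
    setitimer(ITIMER_REAL, &it, NULL);
}
```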
What's not clear to me is the point of implementing user-level threads at all.
User-level threads largely came into the mainstream due to Ada and its requirement for threads (tasks in Ada terminology). At the time, there were few multiprocessor systems and most multiprocessors were of the master/slave variety. Kernel threads simply did not exist. User threads had to be created to implement languages like Ada.
If the kernel is unaware of the existence of multiple threads within a single process, then which benefits could I experience?
If you have kernel threads, multiple threads within a single process can run simultaneously. With user threads, the threads always execute interleaved, one at a time.
Using threads can simplify some types of programming.
I have read a couple of articles that stated user-level implementation of threads is advisable only if such threads do not perform blocking operations (which would cause the entire process to block).
That is true on Unix, though perhaps not on every Unix implementation. User threads on many operating systems work perfectly fine with blocking I/O.
This being said, what's the difference between a sequential execution of all the threads and a "parallel" execution of them, considering they cannot take advantage of multiple processors and independent scheduling?
With user threads, there is never parallel execution. With kernel threads, there can be parallel execution if there are multiple processors. On a single-processor system, there is not much advantage to kernel threads over user threads (but note, contra, the blocking-I/O issue with user threads on Unix).
But for some reason, many people on the Internet state that user-level threads can never take advantage of multiple processors.
With user threads, the process manages its own "threads" by interleaving execution within itself. A thread can only run on the processor that its process is currently running on.
If the operating system provides system services to schedule code to run on a different processor, user threads could run on multiple processors.
I conclude by saying that for practical purposes there are no advantages to user threads over kernel threads. Some will assert that there are performance advantages, but any such advantage is system-dependent.
My CS professor told to the class that the OS has no idea that an application has launched threads. Is this true?
It depends on the type of thread. Threads implemented purely at user level are unknown to the operating system; this can be done with signals plus setjmp and longjmp (see www.gnu.org/s/pth/rse-pmt.ps for details). Alternatively, if you are talking about something such as Linux pthreads, which implements only a subset of the pthreads specification, the library creates new threads of execution that the kernel is aware of and schedules, so in that case the kernel does know.
If you want to see more detail about how the kernel becomes aware, look at the clone system call. It can be used to create a new thread of execution that shares the address space of the calling process.
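Here is a hedged sketch (my own illustration) of that: using the glibc clone() wrapper with CLONE_VM to start a new flow of execution that writes directly into the caller's memory, which is roughly what pthread_create() does underneath, minus all the bookkeeping:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

static int shared_counter = 0;   /* visible to both tasks: shared VM */

static int worker(void *arg)
{
    (void)arg;
    shared_counter = 42;         /* writes the parent's memory directly */
    return 0;
}

int main(void)
{
    const size_t stack_size = 1024 * 1024;
    char *stack = malloc(stack_size);
    if (!stack) return 1;

    /* CLONE_VM shares the address space; SIGCHLD lets us waitpid() on it.
       The stack grows downward on common architectures, hence stack + size. */
    int pid = clone(worker, stack + stack_size,
                    CLONE_VM | CLONE_FS | CLONE_FILES | SIGCHLD, NULL);
    if (pid < 0) return 1;

    waitpid(pid, NULL, 0);
    printf("shared_counter = %d\n", shared_counter);   /* prints 42 */
    free(stack);
    return 0;
}
```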
Also, in the case of user-space-implemented threading you will not get true parallelism, in the sense of two threads executing at the exact same time on different cores/hardware threads, because the operating system, which does the scheduling, does not know about the multiple threads.
It depends upon the operating system. Older operating systems had no threads; programming libraries would implement threads (e.g., Ada tasks) with timers, and the library included a thread scheduler.
It is increasingly common now for operating systems to schedule threads for execution. There, the OS is aware of threads.
Can anyone please tell me: do the terms "kernel thread", "native thread", and "OS thread" all refer to kernel threads, or are they different? If they are different, what is the relationship among them?
There's no real standard for this; terminology varies depending on context. However, I'll try to explain the different kinds of threads that I know of (and add fibers just for completeness, as I've seen people call them threads too).
-- Threading within the kernel
These are most likely what your kernel thread term refers to. They only exist at the kernel level. They allow (a somewhat limited) parallel execution of the kernel code itself.
-- Application threading
These are what the term thread generally means: separate threads of parallel execution that share the same address space, may be scheduled on different processors, and are handled as a single process by the operating system.
The POSIX standard defines the properties threads should have in POSIX-compliant systems (in fact, it defines the library functions and how each is supposed to behave). The Windows threading model is extremely similar to the POSIX one and, AFAIK, it's safe to talk about threading in general the way I did: parallel execution that happens within the same process and can be scheduled onto different processors.
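For reference, a minimal sketch of this kind of threading through the POSIX API; the two workers share the process's address space and, on a multiprocessor, may genuinely run in parallel (build with gcc demo.c -pthread):

```c
#include <pthread.h>
#include <stdio.h>

static int shared[2];   /* one slot per worker, in the shared address space */

static void *worker(void *arg)
{
    int idx = *(int *)arg;
    shared[idx] = idx * 100;   /* both threads write the same global array */
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    int ids[2] = { 0, 1 };

    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, &ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    printf("%d %d\n", shared[0], shared[1]);   /* prints: 0 100 */
    return 0;
}
```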
-- Ancient linux threading
In the early days the Linux kernel did not support threading. However, it did support creating two different processes that shared the same address space. The LinuxThreads project tried to use this to implement some sort of threading ability.
The problem was, of course, that the kernel still treated them as separate processes. The result was therefore not POSIX compliant; for example, the treatment of signals was problematic (as signals are a process-level concept). It was in this very specific context that the term "native" started to become common: it refers to "native" as in "kernel-level" support for threading.
With help from the kernel, actual support for POSIX-compliant threading was finally implemented. Today that's the only kind of threading that really deserves the name. The old way is, in fact, not real threading at all; it's a sharing of the address space by multiple processes, and should be referred to as such. But there was a time when that was called threading (as it was the only thing you could do with Linux).
-- User level and Green threading
This is another context where "native" is often used, to contrast with another threading model. Green threads and user-level threads are threads that do run within the same process but are handled entirely at user level. Green threads are used in virtual machines (especially those that execute p-code, as the Java virtual machine does), and they are also implemented at the library level by many languages (examples: Haskell, Racket, Smalltalk).
These threads do not need to rely on any threading facilities from the kernel (though they often rely on asynchronous I/O). As such they generally cannot be scheduled on separate processors. In these contexts "native thread" or "OS thread" may be used to refer to the actual kernel-scheduled threads, in contrast to the green/user-level threads.
Note that "cannot be scheduled on separate processors" is only true if they are used alone. In a hybrid system that has both user-level/green threads and native/OS threads, it may be possible to create exactly one native/OS thread for each processor (on some systems also setting the affinity mask so that each runs only on a specific processor) and then effectively assign the user-level threads to these.
-- Fibers and cooperative multitasking
I have seen some people call these threads. That's improper; the correct name is fibers. They are also a model of parallel execution, but contrary to threads (and processes) they are cooperative: whenever a fiber is running, the other fibers will not run until the running fiber voluntarily "yields" execution, accepting to be suspended and eventually resumed later.
Is there an advantage of the operating system understanding the characteristics of how a thread may be used? For example, what if there were a way in Java when creating a new thread to indicate that it would be used for intensive CPU calculations vs will block for I/O. Wouldn't thread scheduling improve if this were a capability?
I'm not sure what you're actually expecting the OS to do with the information that a thread is I/O-bound or compute-bound. The things which actually make the most difference to how threads get scheduled (i.e. thread priority and thread CPU affinity) are already exposed by APIs (and support for NUMA aspects is starting to appear in mainstream OS APIs too).
If by a "compute thread" you mean it's something doing background processing and less important than a GUI thread (from the point of view of maintaining app responsiveness) probably the most useful thing you can do is lower the priority of the compute threads a little.
That's what OS processes do. The OS has sophisticated scheduling for the processes. The OS tracks I/O use and CPU use and dynamically adjusts priorities so that CPU-intensive processing doesn't interfere with I/O.
If you want those features, use a proper OS process.
Is that even necessary? Threads blocking on I/O will cause CPU-intensive threads to run. The operating system decides how to schedule threads. AFAIK there's no way to give any hints with Java.
Yes, it is very important to understand them, especially if you are one of those architects who like opening lots of threads, especially on Windows.
Jeff Richter over at Wintellect has a library called PowerThreading. It is very useful if you are developing applications on .NET, but since you are talking about Java, it is still better to understand OS threads, kernel models, and how interrupts work.