How does a process schedule its own threads - multithreading

After the kernel schedules a process that has threads, how does that process schedule its own threads during its time slice?

For most modern kernels, the kernel only schedules threads, and processes are mostly just a container for the threads to execute inside (e.g. a container that contains a virtual address space, however many threads, and a few other scraps like file handles).
For some kernels (mostly very old Unix kernels that existed before threads were invented) the kernel schedules processes and a user-space library emulates threads. For this to work properly, all of the "blocking" system calls (e.g. write()) have to be replaced by asynchronous system calls (e.g. aio_write()) so that other threads in the process can be given CPU time; however, I wouldn't want to assume it works properly (e.g. if any thread blocks, then maybe all threads in the process block).
Also, it may not work when there are multiple CPUs (the kernel gives a process one CPU, but then from the kernel's perspective that process is running and can't use a second CPU). There are sophisticated work-arounds for this (to support "M:N threading"), but it's just easier and better to fix the scheduler so it works with threads. Fortunately/unfortunately this didn't matter much in the early days because very few computers had more than one CPU anyway.
Lastly, it doesn't work for thread priorities: one process might keep a CPU busy executing an unimportant/low priority thread while another process doesn't get that CPU time when it desperately needs it for an important/high priority thread. This occurs because no process knows about threads belonging to other processes, and the kernel only knows about processes and not threads.
Of course these are also the reasons why every kernel adopted "kernel schedules threads and not processes" (and those that didn't died).

It's down to jargon definitions, but threads are simply a bunch of processes sharing an address space. Older Unixes even called them Light Weight Processes.
With that classical understanding of threads, the answer is that, these days, it's the OS that does the scheduling and each thread gets its own timeslices.
Some OSes do things to "the whole process", e.g. Windows will give the process that has mouse focus a priority boost (all its threads get dynamically notched up a few priority places) to make that application appear more sprightly (this goes back to Windows 3).
Other operating systems will increase the priority of a thread dynamically to solve priority inversion situations. This is where a low priority thread that has control of a resource (I/O, or perhaps a semaphore) is blocking a higher priority thread from running (because the resource is not available). That is the priority inversion, and it's solved by the OS boosting the priority of the blocking thread until it gives up the required resource.
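To make the priority boost concrete, here is a small simulation in Python. It is only a sketch under assumptions of my own: three hypothetical threads ("low", "medium", "high"), one core, a scheduler that always runs the highest-priority ready thread, and all of "low"'s remaining work sitting inside the critical section. It does not model any particular OS.

```python
def simulate(inherit):
    """Return the order in which threads get the single core, tick by tick."""
    prio = {"low": 1, "medium": 2, "high": 3}
    work = {"low": 2, "medium": 3, "high": 1}   # remaining ticks of CPU work
    lock_holder = "low"                          # "high" needs this lock to run
    order = []

    def effective(name):
        # With inheritance, the lock holder borrows the waiter's priority.
        if inherit and name == lock_holder and work["high"] > 0:
            return max(prio[name], prio["high"])
        return prio[name]

    while any(work.values()):
        # "high" is not ready while somebody else holds the lock.
        ready = [t for t in work
                 if work[t] > 0 and not (t == "high" and lock_holder)]
        running = max(ready, key=effective)
        work[running] -= 1
        order.append(running)
        if running == lock_holder and work[running] == 0:
            lock_holder = None                   # critical section finished

    return order

print(simulate(inherit=False))  # "medium" starves "low", so "high" runs last
print(simulate(inherit=True))   # "low" is boosted, releases, "high" runs early
```

Without the boost, the medium-priority thread keeps the low one off the CPU, so the high-priority thread waits behind both. With the boost, it runs as soon as the lock is released.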

Either the kernel schedules the threads, or the kernel schedules processes and each process simulates threads by scheduling its own.
In the second case, the process usually schedules its own threads using a library that sets timers. When a timer fires, the handler saves the current "thread's" registers and then loads a new set of registers from another "thread."
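A rough Python sketch of a process scheduling its own "threads" is below. Note what is swapped in: instead of a timer handler saving and loading registers, each "thread" is a generator that hands control back voluntarily with yield, and the generator object keeps the saved state. The names worker and run_user_threads are made up for the illustration.

```python
from collections import deque

def worker(name, steps, log):
    """A user-level 'thread': runs one step, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # switch point; the generator keeps its own locals

def run_user_threads(threads):
    """Round-robin over the generators until all of them have finished."""
    ready = deque(threads)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the thread until its next yield
            ready.append(current)  # still alive: back of the queue
        except StopIteration:
            pass                   # thread finished; drop it

log = []
run_user_threads([worker("A", 2, log), worker("B", 3, log)])
print(log)
```

The kernel sees none of this: from its point of view there is one process running one flow of control.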


At what points in a program does the system switch threads

I know that threads cannot actually run in parallel on the same core, but on a regular desktop system there are normally hundreds or even thousands of threads, which is of course far more than the 4 cores of today's average CPU. So the system actually runs some thread for X amount of time, then switches to run another thread for Y amount of time, and so on.
My question is, how does the system decide how much time to execute each thread?
I know that when a program calls sleep() on a thread for an amount of time, the operating system can use this time to execute other threads, but what happens when a program does not call sleep at all?
int main(int argc, char const *argv[])
{
    return 0;
}
When does the operating system decide to suspend this thread and execute another?
The OS keeps a container of all those threads that can use CPU execution (usually such threads are described as being 'ready'). On most desktop systems, this is a very small fraction of the total number of threads. Most threads in such systems are waiting on either I/O (this includes sleeping, i.e. waiting on timer I/O) or inter-thread signaling; such threads cannot use CPU execution and so the OS does not dispatch them onto cores.
A software syscall (e.g. a request to open a file, a request to sleep or wait for a signal from another thread) or a hardware interrupt from a peripheral device (e.g. a disk controller, NIC, keyboard, mouse) may cause the set of ready threads to change and so initiate a scheduling run.
When run, the scheduler decides which set of ready threads to assign to the available cores. The algorithm it uses is a compromise that tries to optimize overall performance by balancing the cost of expensive context switches against the need for responsive I/O. The kernel CAN stop any thread on any core and preempt it, but it would surely prefer not to :)
My question is, how does the system decide how much time to execute
each thread?
Essentially, it does not. If the set of ready threads is not greater than the number of cores, there is no need to stop/control/influence a CPU-intensive loop - it can be allowed to run on forever, taking up a whole core.
Note that your example is very poor - the printf() call will request output from the OS and, if not immediately available, the OS will block your seemingly 'CPU only' thread until it is.
but what happens when a program does not call sleep at all?
It's just one more thread. If it is purely CPU-intensive, then whether it runs continually depends upon the loading on the box and the number of cores available, as already described. It can, of course, get blocked by requesting I/O or electing to wait for a signal from another thread, so removing itself from the set of ready threads.
Note that one I/O device is a hardware timer. This is very useful for timing out system calls and providing Sleep() functionality. It usually does have a side-effect on those boxes where the number of ready threads is larger than the number of cores available to run them, (ie. the box is overloaded or the task/s it runs have no limits on CPU use). It can result in sharing out the available cores around the ready threads, so giving the illusion of running more threads than it's actually physically capable of, (try not to get hung up on Sleep() and the timer interrupt - it's one of many interrupts that can change thread state).
It is this behaviour of the timer hardware, interrupt and driver that gives rise to the appalling 'quantum', 'time-sharing', 'round-robin' etc. confusion and FUD that surrounds the operation of modern preemptive kernels.
A preemptive kernel, and its drivers etc., is a state machine. Syscalls from running threads and hardware interrupts from peripheral devices go in, a set of running threads comes out.
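As a toy version of "events go in, a set of running threads comes out", consider the Python sketch below. The thread names, the priority numbers and the rule "highest priority first" are all assumptions for the example; real schedulers weigh far more than this.

```python
def running_set(threads, cores):
    """Return the names of the ready threads that get a core, best priority first."""
    ready = [name for name, t in threads.items() if t["state"] == "ready"]
    ready.sort(key=lambda name: threads[name]["priority"], reverse=True)
    return ready[:cores]

def handle_event(threads, name, new_state, cores):
    """Apply one event (a syscall or an interrupt) and rerun the scheduler."""
    threads[name]["state"] = new_state
    return running_set(threads, cores)

threads = {
    "ui":     {"state": "blocked", "priority": 8},   # waiting for a mouse click
    "worker": {"state": "ready",   "priority": 4},
    "backup": {"state": "ready",   "priority": 1},
}
print(running_set(threads, cores=1))                    # only worker and backup are ready
print(handle_event(threads, "ui", "ready", cores=1))    # mouse interrupt: ui becomes ready
print(handle_event(threads, "ui", "blocked", cores=1))  # ui waits for input again
```

Nothing in the sketch involves a clock: the running set only changes when an event changes which threads are ready.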
It depends on which type of scheduling your OS is using. For example, let's take
Round Robin:
In order to schedule processes fairly, a round-robin scheduler generally employs time-sharing, giving each job a time slot or quantum (its allowance of CPU time) and interrupting the job if it is not completed by then. The job is resumed the next time a time slot is assigned to that process. If the process terminates or changes its state to waiting during its attributed time quantum, the scheduler selects the first process in the ready queue to execute.
There are other scheduling algorithms as well; you will find this link useful:
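The round-robin rule described above can be sketched in a few lines of Python. The job names, burst times and quantum are made-up numbers, and the sketch ignores jobs that block before using up their quantum.

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Return the list of (job, ticks_run) slices in the order they execute."""
    queue = deque(burst_times.items())
    timeline = []
    while queue:
        job, remaining = queue.popleft()
        ran = min(quantum, remaining)
        timeline.append((job, ran))
        if remaining > ran:
            # Quantum expired before the job finished: back of the queue.
            queue.append((job, remaining - ran))
    return timeline

print(round_robin({"A": 5, "B": 2, "C": 4}, quantum=2))
```

Each job runs for at most one quantum before the next job in the queue gets its turn, so a long job cannot hold the CPU until it finishes.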
The operating system has a component called the scheduler that decides which thread should run and for how long. There are essentially two basic kinds of schedulers: cooperative and preemptive. Cooperative scheduling requires that the threads cooperate and regularly hand control back to the operating system, for example by doing some kind of IO. Most modern operating systems use preemptive scheduling.
In preemptive scheduling the operating system gives each thread a time slice in which to run. The OS does this by setting a handler for a CPU timer: the CPU regularly runs a piece of code (the scheduler) that checks whether the current thread's time slice is over, and possibly decides to give the next time slice to a thread that is waiting to run. The size of the time slice and how the next thread is chosen depend on the operating system and the scheduling algorithm in use. When the OS switches to a new thread it saves the state of the CPU (register contents, program counter etc.) for the current thread into main memory, and restores the state of the new thread; this is called a context switch.
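You can observe preemption from user code. In the Python sketch below, two threads spin in a pure CPU loop and never call sleep, yet both make progress. One caveat I should state plainly: in CPython the interpreter's own lock also forces switches between threads, so this demonstrates "switching without cooperation" in general rather than the kernel scheduler in isolation. The function names are made up.

```python
import threading
import time

def spin(counts, name, stop):
    # Pure CPU loop: no sleep and no I/O, so the thread never yields voluntarily.
    while not stop.is_set():
        counts[name] += 1

def run_busy_threads(duration=0.2):
    """Run two busy threads for `duration` seconds and return their loop counts."""
    counts = {"a": 0, "b": 0}
    stop = threading.Event()
    workers = [threading.Thread(target=spin, args=(counts, n, stop))
               for n in counts]
    for w in workers:
        w.start()
    time.sleep(duration)   # only the main thread sleeps
    stop.set()
    for w in workers:
        w.join()
    return counts

print(run_busy_threads())
```

The exact counts vary from run to run, which is itself a reminder that the program does not control when the switches happen.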
If you want to know more, the Wikipedia article on Scheduling has lots of information and pointers to related topics.

Process with multiple threads on multiprocessor system. How do they work?

So I was reading about Processes and Threads and I had a question. Following is the scenario.
Uniprocessor Environment
I understand that the OS rotates the processes over the processor, each for a particular time period (quantum). I get it when the process is single threaded, i.e. has just one path of execution. In that case, whenever it is assigned the processor, it continues with its execution. Let's say the process forks or just creates a new thread. How does the entire process work now? Does the OS say to process P "Go on, continue with execution", with the process itself picking the new thread or the parent thread in rotation, so that if there are more than two threads the rotation seems fair to each thread? Or does the OS actually interact with the threads? (In that case I am not sure what happens.)
Multiprocessor Environment
Now say I have a multiprocessor environment. In this case, if there is just a single-threaded process, the OS will assign one of the processors to it and on it will go with its execution. Now say there are multiple threads in the process. If I assign one of the processors to the process and ask it to continue its execution, and the process has to pick one of its threads to execute, then there will never be parallel processing going on in that specific process, since the process will have to put just one of its threads on the processor.
So how does it happen in both the cases?
Process Scheduling
Operating systems ultimately control these types of thread scheduling.
Windows systems are priority-based and so will allow a process to consume more resources than others. This is why your machine can 'hang' if a process has been escalated to a high priority. Priorities range from 1 to 31, as far as I know.
Mac OS / Linux / Unix are time-based, allowing all processes to have equal amounts of CPU time. Therefore loading more processes will slow your system down, as they all share a smaller slice of execution time.
Uniprocessor Environment
The OS is ultimately responsible for this, but switching processes involves (I cannot guarantee accuracy here, it's just an indication):
Halting a process / thread
Storing the current stack (code location)
Storing the current registers of the CPU
Asking the kernel for the next process/thread to run
Kernel indicates which one has to be run
OS reloads the registers from the cache
OS reloads the current stack for the next application.
Resumes the process
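The save and reload steps in the list above can be pictured with a toy model in Python. Everything here is hypothetical (a Cpu class with three named registers, thread names T1 and T2); a real kernel does this in architecture-specific code and keeps the saved state in memory it owns.

```python
class Cpu:
    """A pretend CPU with a handful of registers."""
    def __init__(self):
        self.registers = {"pc": 0, "sp": 0, "acc": 0}

def context_switch(cpu, saved, current, nxt):
    """Save `current`'s registers, load `nxt`'s, and return the new current thread."""
    saved[current] = dict(cpu.registers)   # store the outgoing thread's state
    cpu.registers = dict(saved[nxt])       # reload the incoming thread's state
    return nxt

cpu = Cpu()
saved = {
    "T1": {"pc": 0, "sp": 0, "acc": 0},
    "T2": {"pc": 100, "sp": 900, "acc": 7},
}
cpu.registers["pc"] = 42                   # T1 has been running for a while
current = context_switch(cpu, saved, "T1", "T2")
print(current, cpu.registers)              # T2 resumes where it left off
current = context_switch(cpu, saved, "T2", "T1")
print(current, cpu.registers)              # T1 continues from pc 42
```

Neither thread can tell it was ever stopped, because it gets back exactly the register values it had when it was switched out.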
Obviously the more threads and processes you have running, the slower it will become. The problem is that the time taken to switch processes can actually take longer than the time allowed to execute the process.
Threads are just child processes of a single process. For a single processor, it just looks like additional work.
Multi-processor Environment
Multi-processor environments work differently as the cache is shared amongst processors. I believe these are called L1 (Level 1) and L2 caches. So the difference is that processor A can reload the state stored by processor B without conflicts. 'Hyper-threading' also has the same approach, although this is processor specific. The difference here is that a processor could solely control a specific process; this is called 'CPU affinity'. It's not encouraged for every process, but it does allow an application to have a dedicated processor to work off.
This is OS-specific, of course, but most operating systems schedule at the thread level. A process is just a grouping of threads. For example, on Linux, threads are called "tasks" and each is scheduled independently. They are created with the clone call. What is typically called a thread is a task which shares its address space (and other resources such as file descriptors, mount points, etc.) with the creating task. Note that the clone call can also create what is typically called a process if the flags to enable sharing are not passed.
Considering the above, any thread may be scheduled at any time on any processor, no matter how many processors there are available. That said, most OSs also attempt to maintain some measure of processor affinity to avoid excessive cache misses, but usually if a thread is runnable and a different CPU is available, it will change CPUs. Often there is also a way to specify which CPUs a particular thread may execute upon.
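As an example of specifying which CPUs may be used, Python exposes the Linux affinity calls as os.sched_getaffinity and os.sched_setaffinity. They are not available on every platform, so the sketch below checks for them first. The helper name pin_to_one_cpu is made up, and it restores the original mask before returning.

```python
import os

def pin_to_one_cpu():
    """Briefly restrict the caller to one CPU; return the mask seen while pinned.

    Returns None where os.sched_setaffinity does not exist.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)      # 0 means "the caller"
    target = {min(allowed)}                # pick one CPU we may already use
    os.sched_setaffinity(0, target)
    pinned = os.sched_getaffinity(0)
    os.sched_setaffinity(0, allowed)       # put the original mask back
    return pinned

print(pin_to_one_cpu())
```

While pinned, the scheduler may only run the caller on that one CPU, which trades load-balancing freedom for a warmer cache.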
It doesn't matter whether there are 1 or 128 processors. The OS manages access to resources to try to efficiently match up requests with availability, and that includes CPU execution. If a thread is running, it has already managed to get some CPU but, if it requests a resource that is not immediately available, it no longer needs any CPU until that other resource does become free, and so the OS will remove CPU execution from it and, if there is another thread that is waiting for CPU, hand it over. When the requested resource does become available, the thread will be made ready again. If there is a core free, it will be made running 'immediately'; if not, the CPU scheduling algorithm makes a decision on whether to stop a currently-running thread to free up a core or to leave the newly-ready thread waiting.
It's better to try and ignore things like 'time-slice, quantum, priority' - it causes much confusion and FUD. If a running thread wants something it cannot have yet, it doesn't need any more CPU cycles, and the OS will take them away and, if another thread needs it, apply them there. That is why preemptive multitaskers exist - to match up threads with resources in an attempt to maximize forward progress.

user threads v.s. kernel threads

Could someone help clarify my understanding of kernel threads? I heard that, on Linux/Unix, kernel threads (such as those of system calls) get executed faster than user threads. But aren't those user threads scheduled by the kernel and executed using kernel threads? Could someone please tell me what the difference is between a kernel thread and a user thread, other than the fact that they have access to different address spaces? What other differences are there between them? Is it true that on a single processor box, when a user thread is running, the kernel will be suspended?
Thanks in advance,
I heard that, on Linux/Unix, kernel threads (such as those of system calls) get executed faster than user threads.
This is a largely inaccurate statement.
Kernel threads are used for "background" tasks internal to the kernel, such as handling interrupts and flushing data to disk. The bulk of system calls are processed by the kernel within the context of the process that called them.
Kernel threads are scheduled more or less the same way as user processes. Some kernel threads have higher than default priority (up to realtime priority in some cases), but saying that they are "executed faster" is misleading.
Is it true that on a single processor box, when a user thread is running, the kernel will be suspended?
Of course. Only one process can be running at a time on a single CPU core.
That being said, there are a number of situations where the kernel can interrupt a running task and switch to another one (which may be a kernel thread):
When the timer interrupt fires. The rate is a kernel configuration option; on Linux it is typically somewhere between 100 and 1000 times every second.
When the task makes a blocking system call (such as select() or read()).
When a CPU exception occurs in the task (e.g, a memory access fault).

Difference between user-level and kernel-supported threads?

I've been looking through a few notes on this topic, and although I have an understanding of threads in general, I'm not really too sure about the differences between user-level and kernel-level threads.
I know that processes are basically made up of multiple threads or a single thread, but are these threads of the two previously mentioned types?
From what I understand, kernel-supported threads have access to the kernel for system calls and other uses not available to user-level threads.
So, are user-level threads simply threads created by the programmer which then utilise kernel-supported threads to perform operations that couldn't normally be performed due to their state?
Edit: The question was a little confusing, so I'm answering it two different ways.
OS-level threads vs Green Threads
For clarity, I usually say "OS-level threads" or "native threads" instead of "Kernel-level threads" (which I confused with "kernel threads" in my original answer below.) OS-level threads are created and managed by the OS. Most languages have support for them. (C, recent Java, etc) They are extremely hard to use because you are 100% responsible for preventing problems. In some languages, even the native data structures (such as Hashes or Dictionaries) will break without extra locking code.
The opposite of an OS-thread is a green thread that is managed by your language. These threads are given various names depending on the language (coroutines in C, goroutines in Go, fibers in Ruby, etc). These threads only exist inside your language and not in your OS. Because the language chooses context switches (i.e. at the end of a statement), it prevents TONS of subtle race conditions (such as seeing a partially-copied structure, or needing to lock most data structures). The programmer sees "blocking" calls (i.e. data = ), but the language translates it into async calls to the OS. The language then allows other green threads to run while waiting for the result.
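Python's asyncio is one concrete example of this style, used here purely as an illustration. In the sketch below the await reads like a blocking call inside fetch, but only that task waits; the event loop runs the other task in the meantime. The task names and delays are made up.

```python
import asyncio

async def fetch(name, delay, log):
    log.append(f"{name} start")
    await asyncio.sleep(delay)   # reads like a blocking call; only this task waits
    log.append(f"{name} done")

async def main():
    log = []
    # Both tasks run on one OS thread; the event loop switches at each await.
    await asyncio.gather(fetch("slow", 0.05, log), fetch("fast", 0.01, log))
    return log

log = asyncio.run(main())
print(log)
```

The switch points are exactly the await expressions, which is why a task never sees another task's half-finished update in between them.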
Green threads are much simpler for the programmer, but their performance varies: If you have a LOT of threads, green threads can be better for both CPU and RAM. On the other hand, most green thread languages can't take advantage of multiple cores. (You can't even buy a single-core computer or phone anymore!). And a bad library can halt the entire language by doing a blocking OS call.
The best of both worlds is to have one OS thread per CPU, and many green threads that are magically moved around onto OS threads. Languages like Go and Erlang can do this.
system calls and other uses not available to user-level threads
This is only half true. Yes, you can easily cause problems if you call the OS yourself (i.e. do something that's blocking.) But the language usually has replacements, so you don't even notice. These replacements do call the kernel, just slightly differently than you think.
Kernel threads vs User Threads
Edit: This is my original answer, but it is about User space threads vs Kernel-only threads, which (in hindsight) probably wasn't the question.
User threads and Kernel threads are exactly the same. (You can see by looking in /proc/ and see that the kernel threads are there too.)
A User thread is one that executes user-space code. But it can call into kernel space at any time. It's still considered a "User" thread, even though it's executing kernel code at elevated security levels.
A Kernel thread is one that only runs kernel code and isn't associated with a user-space process. These are like "UNIX daemons", except they are kernel-only daemons. So you could say that the kernel is a multi-threaded program. For example, there is a kernel thread for swap. This forces all swap issues to get "serialized" into a single stream.
If a user thread needs something, it will call into the kernel, which marks that thread as sleeping. Later, the swap thread finds the data, so it marks the user thread as runnable. Later still, the "user thread" returns from the kernel back to userland as if nothing happened.
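The sleep and wake-up sequence can be imitated in user space with two Python threads and threading.Event. This is only an analogy: here the main thread plays the part of the kernel's swap thread, and the names are made up.

```python
import threading

def demo_block_and_wake():
    """Return the order of events when one thread blocks and another wakes it."""
    events = []
    waiting = threading.Event()      # set once the requester is about to block
    data_ready = threading.Event()   # set by the "service" side when data arrives

    def requester():
        events.append("requester: waiting")
        waiting.set()
        data_ready.wait()            # blocked here; uses no CPU until signalled
        events.append("requester: resumed")

    t = threading.Thread(target=requester)
    t.start()
    waiting.wait()                   # make sure the requester got that far
    events.append("service: data found")
    data_ready.set()                 # mark the requester runnable again
    t.join()
    return events

print(demo_block_and_wake())
```

From the requester's point of view, wait() simply returned; the time spent blocked is invisible to it.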
In fact, all threads start off in kernel space, because the clone() operation happens in kernel space. (And there's lots of kernel accounting to do before you can 'return' to a new process in user space.)
Before we go into comparison, let us first understand what a thread is. Threads are lightweight processes within the domain of independent processes. They are required because processes are heavy, consume a lot of resources and, more importantly, two separate processes cannot share a memory space.
Let's say you open a text editor. It's an independent process executing in the memory with a separate addressable location. You'll need many resources within this process, such as insert graphics, spell-checks etc. It's not feasible to create separate processes for each of these functionalities and maintain them independently in memory. To avoid this, multiple threads can be created within a single process, which can share a common memory space, existing independently within a process.
Now, coming back to your questions, one at a time.
I'm not really too sure about the differences between user-level and kernel-level threads.
Threads are broadly classified as user level threads and kernel level threads based on their domain of execution. There are also cases where one or many user threads map to one or many kernel threads.
- User Level Threads
User level threads are mostly at the application level, where an application creates these threads to sustain its execution in the main memory. Unless required, these threads work in isolation from kernel threads.
These are easier to create since they do not have to save and restore as much state, and context switching is much faster than for a kernel level thread.
A user level thread can mostly cause changes at the application level, while the kernel level thread continues to execute at its own pace.
- Kernel Level Threads
These threads are mostly independent of the ongoing processes and are executed by the operating system.
These threads are required by the Operating System for tasks like memory management, process management etc.
Since these threads maintain, execute and report the processes required by the operating system, kernel level threads are more expensive to create and manage, and context switching between them is slow.
Most of the kernel level threads cannot be preempted by the user level threads.
MS DOS written for Intel 8088 didn't have dual mode of operation. Thus, a user level process had the ability to corrupt the entire operating system.
- User Level Threads mapped over Kernel Threads
This is perhaps the most interesting part. Many user level threads map over to kernel level thread, which in-turn communicate with the kernel.
Some of the prominent mappings are:
One to One
When one user level thread maps to only one kernel thread.
advantages: each user thread maps to one kernel thread. Even if one of the user threads issues a blocking system call, the other threads remain unaffected.
disadvantages: every user thread requires one kernel thread to interact and kernel threads are expensive to create and manage.
Many to One
When many user threads map to one kernel thread.
advantages: multiple kernel threads are not required since similar user threads can be mapped to one kernel thread.
disadvantage: even if only one of the user threads issues a blocking system call, all the other user threads mapped to that kernel thread are blocked.
Also, a good level of concurrency cannot be achieved since the kernel will process only one kernel thread at a time.
Many to Many
When many user threads map to equal or lesser number of kernel threads. The programmer decides how many user threads will map to how many kernel threads. Some of the user threads might map to just one kernel thread.
advantages: a great level of concurrency is achieved. Programmer can decide some potentially dangerous threads which might issue a blocking system call and place them with the one-to-one mapping.
disadvantage: the number of kernel threads, if not decided cautiously can slow down the system.
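A crude way to compare the three mappings is to count how long a set of user threads takes to finish when they share a given number of kernel threads. The Python model below is a sketch with loud assumptions: a blocking call occupies the kernel thread just as computation does, user threads are dealt onto kernel threads in rotation, and there are enough cores for all kernel threads to run at once. The function name and the numbers are made up.

```python
def finish_time(user_threads, kernel_threads):
    """Ticks until all user threads are done when sharing `kernel_threads` carriers.

    Each user thread is a list of tick counts (computing or blocked alike);
    every tick occupies the kernel thread that carries it.
    """
    load = [0] * kernel_threads
    for i, ticks in enumerate(user_threads):
        load[i % kernel_threads] += sum(ticks)   # deal threads out in rotation
    return max(load)                             # carriers run in parallel

# Thread 0 computes for 1 tick, then blocks for 10; the others need 2 ticks each.
workload = [[1, 10], [2], [2]]
print(finish_time(workload, kernel_threads=1))   # many to one
print(finish_time(workload, kernel_threads=3))   # one to one
print(finish_time(workload, kernel_threads=2))   # many to many
```

In the many-to-one case the 10 blocked ticks are paid by every user thread; with one kernel thread each, only the blocking thread pays them.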
The other part of your question:
kernel-supported threads have access to the kernel for system calls
and other uses not available to user-level threads.
So, are user-level threads simply threads created by the programmer
which then utilise kernel-supported threads to perform operations that
couldn't be normally performed due to its state?
Partially correct. Almost all kernel threads have access to system calls and other critical interrupts, since kernel threads are responsible for executing the processes of the OS. A user thread will not have access to some of these critical features; e.g. a text editor can never spawn a thread which has the ability to change the physical address of the process. But if needed, a user thread can map to a kernel thread and issue some of the system calls which it couldn't issue as an independent entity. The kernel thread would then map this system call to the kernel, which would execute the action if deemed fit.
Quote from here :
Kernel-Level Threads
To make concurrency cheaper, the execution aspect of process is separated out into threads. As such, the OS now manages threads and processes. All thread operations are implemented in the kernel and the OS schedules all threads in the system. OS managed threads are called kernel-level threads or light weight processes.
NT: Threads
Solaris: Lightweight processes(LWP).
In this method, the kernel knows about and manages the threads. No runtime system is needed in this case. Instead of a thread table in each process, the kernel has a thread table that keeps track of all threads in the system. In addition, the kernel also maintains the traditional process table to keep track of processes. The operating system kernel provides system calls to create and manage threads.
Because the kernel has full knowledge of all threads, the scheduler may decide to give more time to a process having a large number of threads than to a process having a small number of threads.
Kernel-level threads are especially good for applications that frequently block.
The kernel-level threads are slow and inefficient. For instance, thread operations are hundreds of times slower than those of user-level threads.
Since the kernel must manage and schedule threads as well as processes, it requires a full thread control block (TCB) for each thread to maintain information about threads. As a result there is significant overhead and increased kernel complexity.
User-Level Threads
Kernel-level threads make concurrency much cheaper than processes because there is much less state to allocate and initialize. However, for fine-grained concurrency, kernel-level threads still suffer from too much overhead. Thread operations still require system calls. Ideally, we require thread operations to be as fast as a procedure call. Kernel-level threads have to be general to support the needs of all programmers, languages, runtimes, etc. For such fine-grained concurrency we need still "cheaper" threads.
To make threads cheap and fast, they need to be implemented at user level. User-level threads are managed entirely by the run-time system (user-level library). The kernel knows nothing about user-level threads and manages them as if they were single-threaded processes. User-level threads are small and fast; each thread is represented by a PC, registers, stack, and small thread control block. Creating a new thread, switching between threads, and synchronizing threads are done via procedure call, i.e. with no kernel involvement. User-level threads are a hundred times faster than kernel-level threads.
The most obvious advantage of this technique is that a user-level threads package can be implemented on an Operating System that does not support threads.
User-level threads do not require modification to operating systems.
Simple Representation: Each thread is represented simply by a PC, registers, stack and a small control block, all stored in the user process address space.
Simple Management: This simply means that creating a thread, switching between threads and synchronization between threads can all be done without intervention of the kernel.
Fast and Efficient: Thread switching is not much more expensive than a procedure call.
User-level threads are not a perfect solution; as with everything else, they are a trade-off. Since user-level threads are invisible to the OS, they are not well integrated with the OS. As a result, the OS can make poor decisions like scheduling a process with idle threads, blocking a process whose thread initiated an I/O even though the process has other threads that can run, and unscheduling a process with a thread holding a lock. Solving this requires communication between the kernel and the user-level thread manager.
There is a lack of coordination between threads and the operating system kernel. Therefore, the process as a whole gets one time slice irrespective of whether the process has one thread or 1000 threads within it. It is up to each thread to relinquish control to other threads.
User-level threads require non-blocking system calls, i.e. a multithreaded kernel. Otherwise, the entire process will be blocked in the kernel, even if there are runnable threads left in the process. For example, if one thread causes a page fault, the process blocks.
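The "one time slice per process" point is easy to put into numbers. The Python sketch below assumes one core, a scheduler that shares CPU equally among whatever it can see, and every thread being CPU-bound; the process names and thread counts are made up.

```python
from fractions import Fraction

def per_thread_share(thread_counts, kernel_sees_threads):
    """CPU share of a single thread in each process on one core."""
    total_threads = sum(thread_counts.values())
    shares = {}
    for process, n in thread_counts.items():
        if kernel_sees_threads:
            # Kernel-level threads: every thread is scheduled on its own.
            shares[process] = Fraction(1, total_threads)
        else:
            # User-level threads: the process gets one equal slice,
            # which its library then splits among its own threads.
            shares[process] = Fraction(1, len(thread_counts)) / n
    return shares

counts = {"P": 1, "Q": 4}
print(per_thread_share(counts, kernel_sees_threads=False))
print(per_thread_share(counts, kernel_sees_threads=True))
```

With user-level threads, each of Q's four threads gets an eighth of the CPU while P's single thread gets half; with kernel-level threads all five get a fifth.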
User Threads
The library provides support for thread creation, scheduling and management with no support from the kernel.
The kernel is unaware of user-level threads; creation and scheduling are done in user space without kernel intervention.
User-level threads are generally fast to create and manage; they have drawbacks, however.
If the kernel is single-threaded, then any user-level thread performing a blocking system call will cause the entire process to block, even if other threads are available to run within the application.
User-thread libraries include POSIX Pthreads, Mach C-threads, and Solaris 2 UI-threads.
Kernel threads
The kernel performs thread creation, scheduling, and management in kernel space.
Kernel threads are generally slower to create and manage than are user threads.
Because the kernel is managing the threads, if a thread performs a blocking system call, the kernel can schedule another thread in the application for execution.
Also, in a multiprocessor environment, the kernel can schedule threads on different processors.
Most contemporary operating systems, including Windows NT, Windows 2000, Solaris 2, BeOS, and Tru64 UNIX (formerly Digital UNIX), support kernel threads.
Some development environments or languages will add their own thread-like feature, written to take advantage of some knowledge of the environment. For example, a GUI environment could implement some thread functionality which switches between user threads on each pass through the event loop.
A game library could have some thread-like behaviour for characters. Sometimes the thread-like behaviour can be implemented in a different way. For example, I work with Cocoa a lot, and it has a timer mechanism which executes your code every x seconds; use a fraction of a second and it behaves like a thread. Ruby has a yield feature which is like cooperative threads. The advantage of user threads is that they can switch at more predictable times. With a kernel thread, every time the thread starts up again it needs to load any data it was working on, which can take time; with user threads you can switch when you have finished working on some data, so it doesn't need to be reloaded.
I haven't come across user threads that look the same as kernel threads, only thread-like mechanisms like the timer, though I have read about them in older text books. I wonder if they were more popular in the past and have gone out of favour with the rise of truly multithreaded OSes (modern Windows and Mac OS X) and more powerful hardware.

what is kernel thread dispatching?

Can someone give me an easy to understand definition of kernel thread dispatching or just thread dispatching if there's no difference between the two?
From what I understand it's just doing a context switch while the currently active thread waits on a lock from another thread, so the CPU goes and does something else while this thread is in blocking mode.
I might however have misunderstood.
It's basically the process by which the operating system determines which of the many active threads is sent (dispatched) to the CPU for processing at any given point.
Each operating system has its own implementation, but the basic concept is to keep a sorted list of threads by priority, and dispatch them as needed to the CPU. Time slicing is added to allow multiple programs to run concurrently, etc.
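The "sorted list of threads by priority" idea can be sketched with a heap in Python. This is a minimal illustration, not how any particular OS does it: the Dispatcher class and the thread names are made up, a higher number means higher priority, and threads of equal priority are served in arrival order.

```python
import heapq
import itertools

class Dispatcher:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO among equal priorities

    def make_ready(self, name, priority):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._order), name))

    def dispatch(self):
        """Return the next thread to run, or None if nothing is ready."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

d = Dispatcher()
d.make_ready("logger", 1)
d.make_ready("ui", 5)
d.make_ready("net", 5)
print(d.dispatch(), d.dispatch(), d.dispatch(), d.dispatch())
```

Dispatching is then just "pop the best ready thread"; time slicing would put a thread back with make_ready when its slice runs out.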
