I have some doubts regarding concurrency of posix threads in multiprocessor machine. I have found similar questions in SO regarding it but didnt find conclusive answer.
Below is my understanding. I want to know if i am correct.
Posix threads are user level threads and kernel is not aware of it.
Kernel scheduler will treat Process( with all its threads) as one entity for scheduling. It is the thread library that in turn chooses which thread to run. It can slice the cpu time given by the kernel among the run-able threads.
User threads can run on different cpu cores. ie Let threads T1 & T2 be created by a Process(T), then T1 can run in Cpu1 and T2 can run in Cpu2 BUT they cant run concurrently.
Please let me know if my understanding in correct.
Thanks...
Since you marked your question with "Linux" tag I'm going to answer it according to standard pthreads implementation under linux. If you are talking about "green" threads, which are scheduled at the VM/language level instead of the OS, then your answers are mostly correct. But my comments below are on Linux pthreads.
1) Posix threads are user level threads and kernel is not aware of it.
No this is certainly not correct. The Linux kernel and the pthreads libraries work together to administer the threads. The kernel does the context switching, scheduling, memory management, cache memory management, etc.. There is other administration done at the user level of course but without he kernel, much of the power of pthreads would be lost.
2) Kernel scheduler will treat Process( with all its threads) as one entity for scheduling. It is the thread library that in turn chooses which thread to run. It can slice the cpu time given by the kernel among the run-able threads.
No, the kernel treats each process-thread as one entity. It has it's own rules about time slicing that take processes (and process priorities) into consideration but each sub-process thread is a schedulable entity.
3) User threads can run on different cpu cores. ie Let threads T1 & T2 be created by a Process(T), then T1 can run in Cpu1 and T2 can run in Cpu2 BUT they cant run concurrently.
No. Concurrent executing is expected for multi-threaded programs. That's why synchronization and mutexes are so important and why programmers put up with the complexity of multithreaded programming.
One way to prove this to you is to look at the output of ps with -L option to show the associated threads. ps usually wraps multiple threaded processes into one line but with -L you can see that the kernel has a separate virtual process-id for each thread:
ps -ef | grep 20587
foo 20587 1 1 Apr09 ? 00:16:39 java -server -Xmx1536m ...
versus
ps -eLf | grep 20587
foo 20587 1 20587 0 641 Apr09 ? 00:00:00 java -server -Xmx1536m ...
foo 20587 1 20588 0 641 Apr09 ? 00:00:30 java -server -Xmx1536m ...
foo 20587 1 20589 0 641 Apr09 ? 00:00:03 java -server -Xmx1536m ...
...
I'm not sure if Linux threads still do this but historically pthreads used the clone(2) system call to create another thread copy of itself:
Unlike fork(2), these calls allow the child process to share parts of its execution context with the calling process, such as the memory space, the table of file descriptors, and the table of signal handlers.
This is different from fork(2) which is used when another full process is created.
POSIX does not specify how the threads created with pthread_create are scheduled on to processor cores. This is up to the implementation.
However, I would expect the following in a quality implementation, and this is the case for current versions of linux:
Threads are full kernel threads, and scheduled by the kernel
Threads from the same process can run concurrently on separate processors
i.e. all 3 of your numbered statements are wrong with current implementations of linux, but could in theory be true for another implementation that also conformed to POSIX.
Most POSIX implementations use OS support to provide thread functionality - they wrap the system calls needed to manage threads. As such, the behaviour of the threads re. scheduling etc depends upon the underlying OS.
So, on most modern OS:
A process with POSIX threads is no different to the kernel than any other process with multiple threads.
The kernel scheduler dispatches threads, not processes. A 'process' is often regarded as a higher-level construct that has code, memory-management, quotas, auditing, and security permissions, but not execution. A process can do nothing unless a thread runs its code, which is why, when a process is created, a 'main thread' is created at the same time, else nothing would run. The OS scheduling algorithm may use the process that the thread runs as one of the parameters to decide on which set of ready threads to run next - it's cheaper to swap one thread out with one from the same process - but it does not have to.
Slicing the cpu time given by the kernel among the run-able threads is a side-effect of the OS timer interrupt when there are more ready threads than there are cores to run them. Any machine that regularly has to resort to (ugh! - I hate the term), 'time slicing' should be regarded as overloaded and should have more CPU or less work. Threads should ideally only become ready when signaled by another thread or an IO driver, not because the OS timer interrupt has decided to run it in place of another thread that could still be doing useful work.
They just can run concurrently. If two threads are ready and there are two cores, the threads are dispatched onto the two cores. It does not matter if they are from the same process or not.
Related
Can the threads spawned by a process run on different cores of a multicore system?
Let's say I have a process P from which I have spawned two threads t1 and t2 and it's a multicore system with two cores C1 and C2. My questions are:
Can the treads t1 and t2 will they run on the same memory space as process P?
Can a thread t1 will execute in a different core than which process P is running on? eg: Process P is running on the core C1 and thread t1 is running in core C2?
Can the threads spawned from a process run on different cores of a multicore system?
Yes. Assuming that the hardware has multiple cores, and provided that the operating system supports / permits this. (Modern operating systems do support it. Whether it is permitted typically depends on the admin policy.)
Will the threads t1 and t2 run in the same memory space as process P?
Yes. They will use the same memory / virtual address space.
Can a thread t1 execute in a different core than which process P is running on? For example, process P is running on the core C1 and thread t1 is running in core C2?
This question does not make sense.
A POSIX process doesn't have the ability to execute code. It is the processes threads that execute code. So the idea of a "process running on core C1" is nonsensical.
Remember: every (live) POSIX process has at least one thread. The process starts with one thread, and that thread may spawn others if it needs to. The actual allocation of threads to cores is done by the operating system and will vary over the life time of the process.
This is how threads work in modern operating systems. For Linux, the current (POSIX compliant) way of implementing threads was introduced with Linux 2.6 in 2003. Prior to the Linux 2.6 kernel, Linux did not have true native threads. Instead it had a facility called LinuxThreads:
"LinuxThreads had a number of problems, mainly owing to the implementation, which used the clone system call to create a new process sharing the parent's address space. For example, threads had distinct process identifiers, causing problems for signal handling; LinuxThreads used the signals SIGUSR1 and SIGUSR2 for inter-thread coordination, meaning these signals could not be used by programs."
(From Wikipedia.)
In the (pre 2003!) LinuxThreads model, a "thread" was actually a process, and a process could share its address space with other processes.
Generally this depends on how threads are implemented in the OS scheduler.
That said, all modern OSes I know of will explicitly try to dispense threads in a way, that a good balance between cost of context switches and good parallelity is achieved. This implies, that if there is at least one idle core and no power capping / power napping / power preserving mode is active the waiting thread will be scheduled to the idle core.
If harsh power management is in place, the scheduler might chose to wait a tick or two before waking up that idle core - if the already running core frees up it can save a lot of juice.
I have read that a process and a thread are the same thing in Linux, for example in this question it says:
There is absolutely no difference between a thread and a process on
Linux.
But I don't understand how can a process and a thread means the same thing. I mean a thread is what gets executed by the CPU, and a process is simply an "enclosure" for the threads which allows the threads to have shared memory. This image shows the relationship between a process and its threads:
So clearly a process and a thread does not mean the same thing!
Linux didn't use to have special support for (POSIX) threads, and it simply treated them as processes that shared their address space as well as a few other resources (filedescriptors, signal actions, ...) with other "processes".
That implementation, while elegant, made certain things required for threads by POSIX difficult, so Linux did end up gaining that special support for threads and your premise is now no longer true.
Nevertheless, processes and threads still both remain represented as tasks within the kernel (but now the kernel has support for grouping those tasks into thread groups as well and APIs for working with those ((tgkill, tkill, exit_group, ...)).
You can google LinuxThreads and NPTL threads to learn more about the topic.
After the Kernel schedules a process that has threads, How does said process schedule its own threads during its time splice?
For most modern kernels, the kernel only schedules threads, and processes are mostly just a container for the threads to execute inside (e.g. a container that contains a virtual address space, however many threads, and a few other scraps like file handles).
For some kernels (mostly very old unix kernels that existed before threads were invented) the kernel schedules processes and then a user-space library emulates threads. For this to work properly all of the "blocking" system calls (e.g. write()) have to be replaced by asynchronous system calls (e.g. aio_write()) so that other threads in the process can be given CPU time; however I wouldn't want to assume it works properly (e.g. if any thread blocks, then maybe all threads in the process block).
Also it may not work when there's multiple CPUs (kernel gives a process one CPU, but then from the kernel's perspective that process is running and can't use a second CPU). There are sophisticated work-arounds for this (to support "M:N threading") but it's just easier and better to fix the scheduler so it works with threads. Fortunately/unfortunately this didn't matter much in the early days because very few computers had more than one CPU anyway.
Lastly; it doesn't work for thread priorities - e.g. one process might keep CPU busy executing an unimportant/low priority thread while another process doesn't get that CPU time when it desperately needs it for an important/high priority thread. This occurs because no process knows about threads belonging to other processes and the kernel only knows about processes and not threads.
Of course these are also the reasons why every kernel adopted "kernel schedules threads and not processes" (and those that didn't died).
It's down to jargon definitions, but threads are simply a bunch of processes sharing an address space. Older Unixes even called them Light Weight Processes.
With that classical understanding of threads, the answer is that, these days, it's the OS that does the scheduling and each thread gets its own timeslices.
Extras
Some OSes do things to "the whole process" - e.g. Windows will give the process that has mouse focus a priority boost (all it's threads get dynamically notched up a few priority places), to make that application appear to be more sprightly (this goes back to Windows 3).
Other operating systems will increase the priority of a thread dynamically, to solve priority inversion situations. This is where a low priority thread that has control of a resource (I/O, or perhaps a semaphore) is blocking a higher priority thread from running (because the resource is not available. This is the priority inversion, and it's solved by the OS boosting the priority of the blocking thread until it gives up the required resource.
Either the kernel schedules the threads or the kernel schedules processes simulates thread by scheduling it own threads.
Usually, the process schedules its own threads using a library that sets timers. When the timer handler saves the current "thread's" registers then loads a new set of registers from another "thread."
After reading this SO question I got a few doubts. Please help in understanding.
Scheduling involves deciding when to run a process and for what quantum of time.
Does linux kernel schedule a thread or a process? As process and thread are not differentiated inside kernel how a scheduler treats them?
How quantum for each thread is decided?
a. If a quantum of time (say 100us) is decided for a process is that getting shared between all the threads of the process? or
b. A quantum for each thread is decided by the scheduler?
Note: Questions 1 and 2 are related and may look the same but just wanted to be clear on how things are working posted them both here.
The Linux scheduler (on recent Linux kernels, e.g. 3.0 at least) is scheduling schedulable tasks or simply tasks.
A task may be :
a single-threaded process (e.g. created by fork without any thread library)
any thread inside a multi-threaded process (including its main thread), in particular Posix threads (pthreads)
kernel tasks, which are started internally in the kernel and stay in kernel land (e.g. kworker, nfsiod, kjournald , kauditd, kswapd etc etc...)
In other words, threads inside multi-threaded processes are scheduled like non-threaded -i.e. single threaded- processes.
The low-level clone(2) syscall creates user-land schedulable tasks (and can be used both for creating fork-ed process or for implementation of thread libraries, like pthread). Unless you are a low-level thread library implementor, you don't want to use clone directly.
AFAIK, for multi-threaded processes, the kernel is (almost) not scheduling the process, but each individual thread inside (including the main thread).
Actually, there is some notion of thread groups and affinity in the scheduling, but I don't know them well
These days, processors have generally more than one core, and each core is running a task (at some given instant) so you do have several tasks running in parallel.
CPU quantum times are given to tasks, not to processes
The NPTL implementation of POSIX thread specifications sees thread as a different process inside kernel, having unique task_struct (and therefore pid too) so each thread is schedulable in itself as mentioned. Therefore each thread gets its own timeslice and is scheduled just like processes as mentioned above.
Just to add, Currently Linux scheduler is also capable of scheduling not only single tasks ( a simple process), but groups of processes or even users ( all processes, belonging to a user) as a whole. This allows implementing of group scheduling, where CPU time is first divided between process groups and then distributed within those groups to single threads.
Linux threads does not directly operate on processes or threads, but works with schedulable entities. Represented by struct sched_entity.
It's fair to say that every process/thread is a sched_entity but the converse might not be true.
To know detailed process scheduling, refer here
I found an answer to the question here. But I don't understand some ideas in the answer. For instance, lightweight process is said to share its logical address space with other processes. What does it mean? I can understand the same situation with 2 threads: both of them share one address space, so both of them can read any variables from bss segment (for example). But we've got a lot of different processes with different bss sections, and I don't know, how to share all of them.
I am not sure that answers are correct here, so let me post my version.
There is a difference between process - LWP (lightweight process) and user thread. I will leave process definition aside since that's more or less known and focus on LWP vs user threads.
LWP is what essentially are called today threads. Originally, user thread meant a thread that is managed by the application itself and the kernel does not know anything about it.
LWP, on the other hand, is a unit of scheduling and execution by the kernel.
Example:
Let's assume that system has 3 other processes running and scheduling is round-robin without priorities. And you have 1 processor/core.
Option 1. You have 2 user threads using one LWP. That means that from OS perspective you have ONE scheduling unit. Totally there are 4 LWP running (3 others + 1 yours). Your LWP gets 1/4 of total CPU time and since you have 2 user threads, each of them gets 1/8 of total CPU time (depends on your implementation)
Option2. You have 2 LWP. From OS perspective, you have TWO scheduling units. Totally there are 5 LWP running. Your LWP gets 1/5 of total CPU time EACH and your application get's 2/5 of CPU.
Another rough difference - LWP has pid (process id), user threads do not.
For some reason, naming got little messed and we refer to LWP as threads.
There are definitely more differences, but please, refer to slides.
http://www.cosc.brocku.ca/Offerings/4P13/slides/threads.ppt
EDIT:
After posting, I found a good article that explains everything in more details and is in better English than I write.
http://www.thegeekstuff.com/2013/11/linux-process-and-threads/
From MSDN, Threads and Processes:
Processes exist in the operating system and correspond to what users
see as programs or applications. A thread, on the other hand, exists
within a process. For this reason, threads are sometimes referred to
as light-weight processes. Each process consists of one or more
threads.
Based on Tanenbaum's book "Distributes Systems", light weight processes is generally referred to a hybrid form of user-level thread and kernel-level thread. An LWP runs in the context of a single process, and there can be several LWPs per process. In addition each LWP can be running its own (user-level) thread. Multi-threaded applications are constructed by creating threads (with thread library package), and subsequently assigning each thread to an LWP.
The biggest advantage of using this hybrid approach is that creating, destroying, and synchronizing threads is relatively cheap and do not need any kernel intervention. Beside that, provided that a process has enough LWPs, a blocking system call will not suspend the entire process.
Threads run within processes.
Each process may contain one or more threads.
If kernel doesn't know anything about the threads running in the process, we have threads running on user space and thus no multiprocessing capabilities are available.
On the other hand, we can have threads running on the kernel space; this means that each process runs on a different CPU. This enables us multiprocessing, but as you may assume it is more expensive in terms of operating system resources.
Finally, there is a solution that lies somewhere in the middle; we group threads together into LWP. Each group runs on different CPU, but the threads in the group cannot be multi processed. That's because kernel in this version knows only about the groups (which are multiprocessed) but nothing about the threads that they contain.
Hope it is clear enough.
IMO, LWP is a kernel thread binding which can be created and executed in the user context.
If I'm not mistaken, you can attach user threads to a single LWP to potentially increase the level of concurrency without involving a system call.
Thread is basically task assigned with one goal and enough information to perform a specific task .
A process can create multiple threads for doing its work as fast as possible.
e.g A portion of program may need input output, a portion may need permissions.
User level thread are those that can be handled by thread library.
On the other hand kernel level thread (which needs to deal with hadrware)are also called LWP(light weight process) to maximize the use of system and so the system does not halt upon just one system call.
From here.
Each LWP is a kernel resource in a kernel pool, and is attached and detached to a thread on a per thread basis. This happens as threads are scheduled or created and destroyed.
A process contains one or more threads in it and a thread can do anything a process can do. Also threads within a process share the same address space because of which cost of communication between threads is low as it is using the same code section, data section and OS resources, so these all features of thread makes it a "lightweight process".