Multithreading for compute-bound jobs

How does the process/thread scheduler work on a typical system, with respect to fairness and granularity? Does the scheduler dispatch work to the processor by switching between processes or between threads? If the latter, can I improve the performance of my compute-bound jobs by spawning more threads in my process? The literature on this topic seems to use process and thread interchangeably. For clarity, I define a process as a collection of one or more threads of execution.
I presume that multiple threads do NOT improve compute-bound jobs on a single processor, which implies that the scheduler's scheduling granularity is at the process level. For example, if there are N processes, each process gets 1/N-th of the processor, no matter how many threads it spawns (assuming the scheduler is fair, of course).
I found a related conversation here: How Linux handles threads and process scheduling
Linux doesn't differentiate between threads and processes, so threads are actually treated like processes with shared memory. If this is the case, it would seem that I could improve compute-bound run time by spawning more threads. Am I interpreting this correctly?
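Whether extra threads can help at all depends first on how many cores are actually online, which is easy to check. A minimal sketch, assuming a Linux/glibc system (the _SC_NPROCESSORS_ONLN name is a widely supported extension rather than core POSIX):

    #include <stdio.h>
    #include <unistd.h>

    /* Assumption: Linux/glibc. Extra threads can only speed up a
     * compute-bound job if more than one core is online. */
    int main(void)
    {
        long cores = sysconf(_SC_NPROCESSORS_ONLN);
        printf("online cores: %ld\n", cores);
        return 0;
    }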

Related

Threads of one process in parallel execution

I know that threads exist within the boundaries of a process: each process has at least one thread, a thread can't exist without a process, threads share memory while processes do not (without special arrangements), and so on. Also, we can load CPU cores by giving them multiple processes to execute at the same time.
But can we execute multiple threads of the SAME process at one time (I mean real parallel execution, not pseudo-parallel), and if we can, is it better than using multiple processes, and why?
Thank you for your answer!
Threads are basically lightweight processes. OS threads can be executed in parallel; real parallel execution just requires having multiple CPU cores.
Threads have weaker isolation than processes: they share an address space and can clobber each other's memory. The upside is that they generally have less metadata associated with them and are easier/faster to create, so you can have more of them running at the same time than you could processes.
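As a rough illustration of this answer (a minimal sketch, not code from the question), the following POSIX-threads program runs a purely compute-bound loop in several threads of one process. On a machine with more than one core its wall-clock time stays close to the single-thread time, which is exactly the "real parallel execution" asked about:

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    /* Each thread of the same process does purely compute-bound work. */
    static void *burn(void *arg)
    {
        volatile unsigned long sum = 0;
        for (unsigned long i = 0; i < 200000000UL; i++)
            sum += i;
        (void)arg;
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];

        /* With more than one core these threads really run in parallel;
         * compare the run time against NTHREADS = 1. */
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, burn, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }

Built with something like cc -pthread and run under time, the difference between NTHREADS = 1 and NTHREADS = 4 shows whether the machine is actually running the threads in parallel.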

Why is process scheduling not called thread scheduling?

I found out that Linux and Windows both schedule threads and not processes.
Source
So I don't understand why we call it "process scheduling" any more. Shouldn't we be calling it thread scheduling? The idea of shared memory for threads of the same process just seems to be a technicality that has to be taken care of while actually running the threads (we could treat 2 threads of the same process as 2 single-threaded processes sharing memory).
Are there any operating systems that schedule processes and when it is time for a process to run, specially decide how to run its threads?
OS-scheduled threads are a relatively new feature. It was not that long ago when a separate path of execution on Unix meant creating an entirely new process. So there is historical resistance.
Some systems (Unix variants, VMS) schedule processes, not threads. Process scheduling is likely to remain the way to go in real time operating systems.
In process scheduling, resources are allocated to each process separately: if you create two processes, each gets its own resources (file buffers, open files, CPU state, etc.). Time is lost during scheduling because, as each process is dispatched in turn, its own resources have to be set up, so resources are allocated separately to each process and context-switching time increases.
A thread is basically a small unit of a process, so one process can have many threads. Resources are shared between the different threads because they are all part of one process, so multitasking is still available and context-switching time is lower.
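The "easier/faster to create" point can be sketched with a small, unscientific comparison of creating short-lived processes versus short-lived threads. This is only a sketch assuming a POSIX system; absolute numbers vary a lot between machines, and it measures creation/teardown cost only, not context-switch cost:

    #include <pthread.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <time.h>
    #include <unistd.h>

    #define N 1000

    static void *noop(void *arg) { return arg; }

    static double now(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        double t0;

        /* Create and reap N short-lived processes. */
        t0 = now();
        for (int i = 0; i < N; i++) {
            pid_t pid = fork();
            if (pid == 0)
                _exit(0);
            waitpid(pid, NULL, 0);
        }
        printf("%d processes: %.3f s\n", N, now() - t0);

        /* Create and join N short-lived threads in the same process. */
        t0 = now();
        for (int i = 0; i < N; i++) {
            pthread_t t;
            pthread_create(&t, NULL, noop, NULL);
            pthread_join(t, NULL);
        }
        printf("%d threads:   %.3f s\n", N, now() - t0);
        return 0;
    }

Build with cc -pthread; the thread loop is typically the faster of the two.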

How Linux scheduler schedules processes on multi-core processors?

Multi-core processors exploit thread-level parallelism, which means that multiple threads run in parallel. Suppose a process has only one thread: do the other cores remain idle during the execution of this process? On a Linux system, the scheduler considers both processes and threads to be tasks and does not differentiate between them while scheduling. So does this mean that different cores execute different threads of different processes in parallel?
When a context switch happens, does it happen only on one core or on all the cores of the CPU?
You are right: processes and threads are the same from the Linux scheduler's point of view. These tasks are queued according to the scheduler's rules and wait for their turn.
There are scheduling rules such as priority or CPU affinity (to prevent a thread from migrating to another core and to preserve cached data).
A context switch may happen on a core every fixed amount of time (a time slice) because the CPU automatically runs some kernel code periodically to permit preemption. Depending on the scheduler's rules, a task may run for many time slices. A context switch can also occur when a thread calls a function that makes it unrunnable (e.g. waiting for I/O).
In some cases, if not all, there is one scheduler instance per core which does all of that.
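The CPU affinity mentioned above can also be set explicitly from user space. A minimal sketch using the Linux-specific sched_setaffinity call, pinning the calling task to core 0 (assumes a Linux system):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(0, &set);   /* allow this task to run on core 0 only */

        /* pid 0 means "the calling thread"; once pinned, the scheduler
         * will not migrate it to another core, which preserves cache data. */
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        printf("now running on CPU %d\n", sched_getcpu());
        return 0;
    }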
There is also a similar question on superuser

Does linux schedule a process or a thread?

After reading this SO question I have a few doubts. Please help me understand.
Scheduling involves deciding when to run a process and for what quantum of time.
Does the Linux kernel schedule a thread or a process? Since process and thread are not differentiated inside the kernel, how does the scheduler treat them?
How is the quantum for each thread decided?
a. If a quantum of time (say 100 us) is decided for a process, does it get shared between all the threads of that process? or
b. Is a quantum decided for each thread by the scheduler?
Note: Questions 1 and 2 are related and may look the same, but I just wanted to be clear on how things work, so I posted both of them here.
The Linux scheduler (on recent Linux kernels, e.g. 3.0 at least) schedules schedulable tasks, or simply tasks.
A task may be:
a single-threaded process (e.g. created by fork without any thread library)
any thread inside a multi-threaded process (including its main thread), in particular Posix threads (pthreads)
kernel tasks, which are started internally in the kernel and stay in kernel land (e.g. kworker, nfsiod, kjournald, kauditd, kswapd, etc.)
In other words, threads inside multi-threaded processes are scheduled like non-threaded (i.e. single-threaded) processes.
The low-level clone(2) syscall creates user-land schedulable tasks (and can be used both for creating fork-ed processes and for implementing thread libraries such as pthreads). Unless you are a low-level thread library implementor, you don't want to use clone directly.
AFAIK, for multi-threaded processes, the kernel is (almost) not scheduling the process, but each individual thread inside (including the main thread).
Actually, there is some notion of thread groups and affinity in the scheduling, but I don't know them well
These days, processors generally have more than one core, and each core is running a task (at some given instant), so you do have several tasks running in parallel.
CPU quantum times are given to tasks, not to processes
The NPTL implementation of the POSIX threads specification sees each thread as a separate process inside the kernel, with its own unique task_struct (and therefore its own pid too), so each thread is schedulable in itself, as mentioned. Each thread therefore gets its own timeslice and is scheduled just like the processes described above.
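The "each thread has its own task_struct (and pid)" point can be observed from user space: getpid() returns the shared thread-group id, while the raw gettid syscall returns the per-thread kernel task id. A minimal sketch, assuming Linux (SYS_gettid is Linux-specific):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Each thread is a separate kernel task: same getpid(), different tid. */
    static void *report(void *arg)
    {
        (void)arg;
        printf("worker: pid=%d tid=%ld\n", getpid(), (long)syscall(SYS_gettid));
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        pthread_create(&t1, NULL, report, NULL);
        pthread_create(&t2, NULL, report, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("main:   pid=%d tid=%ld\n", getpid(), (long)syscall(SYS_gettid));
        return 0;
    }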
Just to add: the current Linux scheduler is also capable of scheduling not only single tasks (a simple process), but groups of processes, or even all processes belonging to a user, as a whole. This allows the implementation of group scheduling, where CPU time is first divided between process groups and then distributed within those groups to single threads.
The Linux scheduler does not directly operate on processes or threads, but works with schedulable entities, represented by struct sched_entity.
It's fair to say that every process/thread is a sched_entity, but the converse might not be true.
For a detailed look at process scheduling, refer here.

Threads inside a Process

Processes get CPU time as managed by the OS process scheduler.
Since threads run in parallel within a single process, does this mean that a process's CPU time is further distributed (sliced) among its threads?
Or can the scheduler directly distribute CPU time among threads, bypassing the parent process?
I suspect the answer varies with the OS. On Windows, the process is not merely bypassed, but completely ignored -- all the scheduler deals with is threads. Processes are relevant only to the degree that all non-kernel threads do have to belong to some process, and every process has to contain at least one thread.
The threads are run/scheduled by the operating system and therefore they get their own CPU time. The process CPU time is just the sum of the CPU times of all the threads in the process.
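That relationship between per-thread and per-process CPU time can be observed with the POSIX CPU-time clocks. A minimal sketch, assuming CLOCK_THREAD_CPUTIME_ID and CLOCK_PROCESS_CPUTIME_ID are available (they are on Linux):

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    /* Burn CPU in a worker thread, then compare the main thread's own
     * CPU clock with the whole-process CPU clock. */
    static void *burn(void *arg)
    {
        volatile unsigned long x = 0;
        for (unsigned long i = 0; i < 100000000UL; i++)
            x += i;
        (void)arg;
        return NULL;
    }

    static double cpu_seconds(clockid_t id)
    {
        struct timespec ts;
        clock_gettime(id, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void)
    {
        pthread_t t;

        pthread_create(&t, NULL, burn, NULL);
        pthread_join(t, NULL);

        /* The process clock includes the worker's time; the main thread's
         * own clock stays small because it mostly waited in pthread_join. */
        printf("main thread CPU:   %.3f s\n", cpu_seconds(CLOCK_THREAD_CPUTIME_ID));
        printf("whole process CPU: %.3f s\n", cpu_seconds(CLOCK_PROCESS_CPUTIME_ID));
        return 0;
    }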
If you want your process to schedule the tasks itself, you should use fibers (Windows). These are a kind of thread, but they are not scheduled by the OS; the process has to handle the scheduling of fibers itself.
For Windows see http://msdn.microsoft.com/en-us/library/ms681917%28VS.85%29.aspx
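A minimal sketch of the fiber idea (Windows-only, and only an illustration of what the linked documentation describes, not code taken from it): the process itself decides when each fiber runs by calling SwitchToFiber, and the OS scheduler never preempts one fiber in favour of another.

    #include <windows.h>
    #include <stdio.h>

    /* Fibers are cooperatively scheduled by the process, not by the OS. */
    static LPVOID main_fiber;

    static VOID CALLBACK fiber_proc(LPVOID param)
    {
        printf("fiber %s running\n", (const char *)param);
        SwitchToFiber(main_fiber);   /* yield back; nothing preempts us */
    }

    int main(void)
    {
        main_fiber = ConvertThreadToFiber(NULL);             /* current thread becomes a fiber */
        LPVOID f = CreateFiber(0, fiber_proc, (LPVOID)"A");  /* 0 = default stack size */
        SwitchToFiber(f);                                    /* explicit, process-controlled switch */
        DeleteFiber(f);
        return 0;
    }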
