How do "threads" get CPU and time slices? - linux

Kindly, help me understand the following 'thread' concepts:
If concurrently running threads are part of a running process, how is the time slice divided between multiple threads of the same process?
Also, since no new Process Control Block is created, how do they get their share of CPU time? Does the dispatcher simply let the Thread Control Block (TCB) onto the CPU?

That is the job of the operating system's scheduler. The OS keeps track of all runnable threads and applies a scheduling algorithm to make sure each thread is given a share of CPU time. For example, Linux uses the Completely Fair Scheduler (CFS).

Related

Context Switch: Thread vs Process

From what I understand, scheduling is based on processes, not threads. Then let's say I'm running two programs with the same logic, but one with multi-processing(10 processes) and the other one with multi-threading(10 threads). Then, since scheduling is based on processes, wouldn't the program with multi-processing dominate 10/11 of cpu time? The multi-threaded program would only have 1/11 of cpu time and 10 threads share that tiny time slice.
What am I missing?

Process vs thread with example

I read articles on processes vs threads, but I am still not clear on the difference.
Suppose a process is using the CPU/Processor, doing some big calculation that takes 10 minutes. How will another process run at the same time in parallel? In a single core vs a dual core processor?
Same thing for threads, how will another thread run in parallel when the CPU/Processor is engaged with another thread?
How is context switching different for threads and for processes? I mean both process and threads use the same RAM memory, so what's the difference?
From my vague memory of Operating Systems I can offer you a little bit of help. First you have to know the difference between concurrent and simultaneous (often called parallel). They are not the same thing: simultaneous means both things really occur at the same time, while concurrent means they appear to run at the same time but in reality the CPU is switching between them so fast you can't tell.
Processes and threads can be considered similar, but a big difference is that a process carries much more state than a thread: its own address space, open files and other resources. For that reason, switching between processes is relatively expensive. There is a lot of information in a process that has to be saved and reloaded each time the CPU switches processes.
A thread, on the other hand, is lighter and so cheaper to switch. A process may have multiple threads that run concurrently, meaning not necessarily at the exact same time, but interleaved with the CPU switching between them. The context switch here is cheaper because threads of the same process share its address space, so there is less information to store and reload.
If you only have a single core then you can only do concurrent execution, for the most part. Once you have multiple cores you can have threads run on both cores and thus have simultaneous execution. It is up to the Operating System to schedule when threads run, when processes get to run, when to switch, how to switch them, etc. The Operating System gives you the illusion that work is being done simultaneously when this is not always the case.
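To make the single-core versus dual-core point concrete, here is a small sketch of a round-robin timeline. It is a toy model only: the function name `run_timeline`, the task names and the one-unit time slice are all assumptions for illustration, and a real OS scheduler is far more involved.

```python
from collections import deque

def run_timeline(tasks, cores):
    """Toy round-robin scheduler (illustrative only).

    tasks: dict mapping a task name to the units of work it needs.
    Returns a list of ticks; each tick lists the tasks that ran
    during it, one per busy core.
    """
    remaining = dict(tasks)
    ready = deque(tasks)  # ready queue, in arrival order
    timeline = []
    while ready:
        # Take up to `cores` tasks from the front of the queue.
        running = [ready.popleft() for _ in range(min(cores, len(ready)))]
        for name in running:
            remaining[name] -= 1  # each task does one unit of work
        for name in running:
            if remaining[name] > 0:
                ready.append(name)  # unfinished: back of the queue
        timeline.append(running)
    return timeline

single = run_timeline({"A": 3, "B": 3}, cores=1)
dual = run_timeline({"A": 3, "B": 3}, cores=2)
print(single)  # [['A'], ['B'], ['A'], ['B'], ['A'], ['B']]
print(dual)    # [['A', 'B'], ['A', 'B'], ['A', 'B']]
```

On one core the two tasks only interleave (concurrent), and on two cores they genuinely run in the same tick (simultaneous).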
If you have more confusion feel free to comment.
A process is a concept closely tied to the Operating System (OS). A thread is, in the simplest terms, a single flow of execution through a program. One or more threads run in the context of a process. The Java Virtual Machine (JVM) is a process in your OS.
And inside the JVM you can have multiple threads running concurrently.
The processor is a resource of your machine, like the memory. Your OS lets your processes share the available resources, in our simple case processors and memory.
When you develop in Java, all processors in your machine are available resources.
When you design your solution, you can even have multiple Java processes (i.e. multiple JVMs), each running one or more threads. This mostly depends on your problem.
The real difference between a process and a thread is that both execute program code, but threads of the same process share the same memory. This lets your threads work on the same data, at least in theory, but you pay for it with the complexity of concurrency and synchronisation.
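A minimal sketch of that trade-off, written in Python rather than Java to keep it short: four threads update one shared variable, and a lock provides the synchronisation. The names `counter`, `counter_lock` and `add_many` are made up for this example.

```python
import threading

counter = 0                      # shared by every thread in this process
counter_lock = threading.Lock()  # guards updates to `counter`

def add_many(n):
    """Increment the shared counter n times."""
    global counter
    for _ in range(n):
        # Without the lock, concurrent read-modify-write steps could
        # overwrite each other and lose updates.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Separate processes would each get their own copy of `counter`, so they would need an explicit mechanism (pipes, shared memory, and so on) to combine their results.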
Each CPU core runs only one thread at a time. However, the OS can stop and save a thread and load and run another very quickly (in as little as 0.0001 seconds). This gives the illusion that many threads are running at once, even though only one is running on each core.

Why is process scheduling not called thread scheduling?

I found out that Linux and Windows both schedule threads and not processes.
Source
So I don't understand why we call it "process scheduling" any more. Shouldn't we be calling it thread scheduling? The idea of shared memory for threads of the same process just seems to be a technicality that has to be taken care of while actually running the threads (we could treat 2 threads of the same process as 2 single-threaded processes sharing memory).
Are there any operating systems that schedule processes and when it is time for a process to run, specially decide how to run its threads?
OS-scheduled threads are a relatively new feature. It was not that long ago when a separate path of execution on Unix meant creating an entirely new process. So there is historical resistance.
Some systems (Unix variants, VMS) schedule processes, not threads. Process scheduling is likely to remain the way to go in real time operating systems.
In process scheduling, resources are allocated to each process separately: if you create 2 processes, each process gets its own resources (file buffers, I/O files, CPU state, etc.). Time is lost whenever scheduling happens, because switching from the first process to the second means swapping one set of resources for the other, so context-switching time increases.
A thread is basically a smaller unit within a process, so one process can have many threads. Here the resources are shared between the threads, since they are all part of the same process, so multitasking is still possible and context-switching time is lower.

In Linux scheduler, how do different processes containing multiple threads get fair time quota?

I know the Linux scheduler schedules task_structs, each of which is a thread. Then if we have two processes, e.g., A contains 100 threads while B is single-threaded, how can the two processes be scheduled fairly, given that each thread is scheduled fairly?
In addition, so in Linux, context switch between threads from the same process would be faster than that between threads from different processes, right? Since the latter will have something to do with process control block while the former wouldn't.
The point you are missing is how the scheduler looks at threads or tasks. The Linux kernel scheduler treats each of them as an individual scheduling entity, so each one is counted and scheduled separately.
Now let's see what the CFS documentation says: its basic approach is to give an even slice of CPU time to each runnable task, so if there are 4 runnable processes/threads they get 25% of CPU time each. On real hardware a CPU can run only one task at a time, so this ideal cannot be achieved exactly; to approximate it, the concept of vruntime (virtual runtime) was introduced.
Now back to your example: if process A creates 100 threads and B creates 1 thread, the number of tasks becomes 103 (the two main threads plus the 101 created threads, assuming all are in the runnable state). CFS then shares the CPU evenly, giving each task 1/103 of it (CPU / number of runnable tasks). The context-switch path is the same for all scheduling entities; threads of one process share the task's mm_struct, so no address-space switch is needed between them, but each still has its own set of registers and task state to load when it runs. Hope this helps you understand it better.
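The per-task arithmetic can be checked with a toy model. This is a sketch of the "always run the task with the smallest vruntime" idea only, not the kernel's implementation: the function `simulate_fair_share` is invented here, all tasks have equal weight, there is a single CPU, and nice values, sleeping tasks and group scheduling are ignored.

```python
import heapq

def simulate_fair_share(thread_counts, ticks):
    """Toy fair scheduler (illustrative only).

    thread_counts: dict mapping a process name to its number of
    runnable threads. Returns the CPU ticks received per process.
    """
    # Heap entries are (vruntime, task_id, process_name); the task
    # with the smallest vruntime is always at the top.
    heap = []
    task_id = 0
    for proc, count in thread_counts.items():
        for _ in range(count):
            heap.append((0, task_id, proc))
            task_id += 1
    heapq.heapify(heap)

    cpu = {proc: 0 for proc in thread_counts}
    for _ in range(ticks):
        vruntime, tid, proc = heapq.heappop(heap)
        cpu[proc] += 1  # this task runs for one tick
        heapq.heappush(heap, (vruntime + 1, tid, proc))
    return cpu

# A: main thread + 100 created threads; B: main thread + 1 created thread.
shares = simulate_fair_share({"A": 101, "B": 2}, ticks=103 * 10)
print(shares)  # {'A': 1010, 'B': 20}
```

Every one of the 103 tasks receives the same 10 ticks, so fairness holds per thread, and process A as a whole ends up with 101/103 of the CPU.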

Threads inside a Process

Processes get CPU time as managed by the OS process scheduler.
Since threads run in parallel within a single process, does this mean that a process's CPU time is further distributed(sliced) among threads?
Or can the scheduler directly distribute CPU time among threads bypassing the parent process?
I suspect the answer varies with the OS. On Windows, the process is not merely bypassed, but completely ignored -- all the scheduler deals with is threads. Processes are relevant only to the degree that all non-kernel threads do have to belong to some process, and every process has to contain at least one thread.
The threads are run/scheduled by the operating system and therefore they get their own CPU time. The process CPU time is just the sum of the CPU times of all the threads in the process.
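A rough way to observe this from inside a program, sketched in Python: each worker measures its own CPU time with `time.thread_time()`, and the main thread measures the whole process with `time.process_time()`. The helper `burn` and the iteration count are arbitrary choices for the example, the exact numbers vary from run to run, and `time.thread_time()` is not available on every platform.

```python
import threading
import time

def burn(results, index, iterations):
    """Do some CPU-bound work and record this thread's own CPU time."""
    start = time.thread_time()
    total = 0
    for i in range(iterations):
        total += i * i
    # Only the difference between two calls in the same thread is meaningful.
    results[index] = time.thread_time() - start

process_start = time.process_time()
results = [0.0] * 3
workers = [threading.Thread(target=burn, args=(results, i, 200_000))
           for i in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()
process_used = time.process_time() - process_start

print("per-thread CPU seconds:", results)
print("sum over worker threads:", sum(results))
print("whole process:", process_used)
```

The process figure also includes the main thread's own CPU time, so it should come out at or slightly above the sum over the workers.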
If you want your process to schedule the tasks itself, you should use fibers (Windows). Fibers are a kind of thread that is not scheduled by the OS; the process has to handle the scheduling of its fibers itself.
For Windows see http://msdn.microsoft.com/en-us/library/ms681917%28VS.85%29.aspx
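The idea of a process scheduling its own units of work can be sketched with Python generators. This is an analogy only and does not use the Windows fiber API: each generator gives up control voluntarily with `yield`, and a loop inside the process decides what runs next. The names `task` and `run_cooperatively` are invented for the example.

```python
from collections import deque

def task(name, steps):
    """A cooperative task: it hands control back at every yield."""
    for i in range(steps):
        yield f"{name} step {i}"

def run_cooperatively(tasks):
    """User-level round-robin scheduler; the OS is not involved."""
    ready = deque(tasks)
    log = []
    while ready:
        current = ready.popleft()
        try:
            log.append(next(current))  # run until the next yield
            ready.append(current)      # still has work: requeue it
        except StopIteration:
            pass                       # task finished: drop it
    return log

log = run_cooperatively([task("A", 2), task("B", 3)])
print(log)
# ['A step 0', 'B step 0', 'A step 1', 'B step 1', 'B step 2']
```

From the OS's point of view all of this is a single thread; if one task never yields, nothing else in the loop gets to run.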

Resources