Do I need to change the process priority or scheduler? - linux

I'm running on Linux (Ubuntu). I have a process containing 20 threads. All the threads have the same default priority and scheduler (the default, non-real-time scheduler; top shows the priority as 20). In my process all threads need to get the same priority
(there is no thread which is more important than any other thread).
The other processes on the system are the OS processes.
I'm running on an Intel i7 (quad core).
When using top I'm getting 140-150% CPU.
It seems that sometimes the process chokes for no apparent reason (some threads take longer to calculate some data).
I read some papers on priorities and schedulers, but they didn't say whether the change can harm the OS. Do I need to change the scheduler or priority of my process (threads)?
I haven't changed anything yet, because I don't know whether the change will harm the OS.
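For reference, here is a minimal sketch (assuming Linux with glibc; compile with -pthread) of how a thread can inspect its current scheduling policy and nice value, i.e. the state described above, before deciding whether to change anything:

    #include <errno.h>
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        int policy;
        struct sched_param param;

        /* Query this thread's scheduling policy and real-time priority. */
        if (pthread_getschedparam(pthread_self(), &policy, &param) == 0) {
            printf("policy: %s, rt priority: %d\n",
                   policy == SCHED_OTHER ? "SCHED_OTHER (default)" :
                   policy == SCHED_FIFO  ? "SCHED_FIFO" :
                   policy == SCHED_RR    ? "SCHED_RR"   : "other",
                   param.sched_priority);
        }

        /* For SCHED_OTHER threads the tunable knob is the nice value. */
        errno = 0;
        int nice_val = getpriority(PRIO_PROCESS, 0);
        if (errno == 0)
            printf("nice value: %d\n", nice_val);
        return 0;
    }

Note that for CPU-bound threads of equal importance, the default SCHED_OTHER policy already time-shares them fairly, so it is usually safe to leave it alone.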

Related

Can the threads of the same process run on different cores?

Can the threads spawned by a process run on different cores of a multicore system?
Let's say I have a process P from which I have spawned two threads, t1 and t2, and it's a multicore system with two cores, C1 and C2. My questions are:
Will the threads t1 and t2 run in the same memory space as process P?
Can thread t1 execute on a different core than the one process P is running on? E.g., process P is running on core C1 and thread t1 is running on core C2?
Can the threads spawned from a process run on different cores of a multicore system?
Yes. Assuming that the hardware has multiple cores, and provided that the operating system supports / permits this. (Modern operating systems do support it. Whether it is permitted typically depends on the admin policy.)
Will the threads t1 and t2 run in the same memory space as process P?
Yes. They will use the same memory / virtual address space.
Can a thread t1 execute in a different core than which process P is running on? For example, process P is running on the core C1 and thread t1 is running in core C2?
This question does not make sense.
A POSIX process doesn't have the ability to execute code. It is the process's threads that execute code. So the idea of a "process running on core C1" is nonsensical.
Remember: every (live) POSIX process has at least one thread. The process starts with one thread, and that thread may spawn others if it needs to. The actual allocation of threads to cores is done by the operating system and will vary over the life time of the process.
This is how threads work in modern operating systems. For Linux, the current (POSIX-compliant) threading implementation, NPTL, was introduced with Linux 2.6 in 2003. Prior to the Linux 2.6 kernel, Linux did not have true native threads. Instead it had a facility called LinuxThreads:
"LinuxThreads had a number of problems, mainly owing to the implementation, which used the clone system call to create a new process sharing the parent's address space. For example, threads had distinct process identifiers, causing problems for signal handling; LinuxThreads used the signals SIGUSR1 and SIGUSR2 for inter-thread coordination, meaning these signals could not be used by programs."
(From Wikipedia.)
In the (pre 2003!) LinuxThreads model, a "thread" was actually a process, and a process could share its address space with other processes.
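To see the allocation of threads to cores in practice, here is a minimal sketch (assuming Linux; sched_getcpu() is glibc-specific; compile with -pthread). Because the threads are so short-lived, several of them may well report the same core on a given run:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    /* Each thread reports which core it is currently executing on. */
    static void *report_cpu(void *arg)
    {
        printf("thread %ld running on core %d\n", (long)arg, sched_getcpu());
        return NULL;
    }

    int main(void)
    {
        pthread_t t[4];
        for (long i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, report_cpu, (void *)i);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);
        return 0;
    }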
Generally this depends on how threads are implemented in the OS scheduler.
That said, all modern OSes I know of will explicitly try to distribute threads in a way that achieves a good balance between the cost of context switches and the degree of parallelism. This implies that if there is at least one idle core and no power-capping / power-napping / power-saving mode is active, a waiting thread will be scheduled onto the idle core.
If aggressive power management is in place, the scheduler might choose to wait a tick or two before waking up that idle core; if the already-running core frees up in the meantime, that can save a lot of power.

How does a process schedule its own threads

After the kernel schedules a process that has threads, how does said process schedule its own threads during its time slice?
For most modern kernels, the kernel only schedules threads, and processes are mostly just a container for the threads to execute inside (e.g. a container that contains a virtual address space, however many threads, and a few other scraps like file handles).
For some kernels (mostly very old unix kernels that existed before threads were invented) the kernel schedules processes and then a user-space library emulates threads. For this to work properly all of the "blocking" system calls (e.g. write()) have to be replaced by asynchronous system calls (e.g. aio_write()) so that other threads in the process can be given CPU time; however I wouldn't want to assume it works properly (e.g. if any thread blocks, then maybe all threads in the process block).
Also, it may not work well when there are multiple CPUs (the kernel gives a process one CPU, but then from the kernel's perspective that process is running and can't use a second CPU). There are sophisticated workarounds for this (to support "M:N threading"), but it's just easier and better to fix the scheduler so it works with threads. Fortunately/unfortunately, this didn't matter much in the early days because very few computers had more than one CPU anyway.
Lastly; it doesn't work for thread priorities - e.g. one process might keep CPU busy executing an unimportant/low priority thread while another process doesn't get that CPU time when it desperately needs it for an important/high priority thread. This occurs because no process knows about threads belonging to other processes and the kernel only knows about processes and not threads.
Of course these are also the reasons why every kernel adopted "kernel schedules threads and not processes" (and those that didn't died).
It's down to jargon definitions, but threads are simply a bunch of processes sharing an address space. Older Unixes even called them Light Weight Processes.
With that classical understanding of threads, the answer is that, these days, it's the OS that does the scheduling and each thread gets its own timeslices.
Extras
Some OSes do things to "the whole process"; e.g. Windows will give the process that has mouse focus a priority boost (all its threads get dynamically notched up a few priority places) to make that application appear more sprightly (this goes back to Windows 3).
Other operating systems will increase the priority of a thread dynamically, to solve priority inversion situations. This is where a low-priority thread that has control of a resource (I/O, or perhaps a semaphore) is blocking a higher-priority thread from running (because the resource is not available). This is the priority inversion, and it's solved by the OS boosting the priority of the blocking thread until it gives up the required resource.
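On POSIX systems you can opt into this boosting behaviour explicitly for mutexes; a minimal sketch (assuming the platform supports the priority-inheritance protocol, as Linux with glibc does):

    #include <pthread.h>

    static pthread_mutex_t lock;

    /* Create a mutex using the priority-inheritance protocol: while a
     * low-priority thread holds it, that thread temporarily inherits the
     * priority of the highest-priority thread blocked on the mutex. */
    static int init_pi_mutex(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        if (pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT) != 0)
            return -1;
        int rc = pthread_mutex_init(&lock, &attr);
        pthread_mutexattr_destroy(&attr);
        return rc;
    }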
Either the kernel schedules the threads directly, or the kernel schedules processes and each process simulates threads by scheduling its own user-space threads.
Usually, a process that schedules its own threads does so using a library that sets timers. When a timer fires, the handler saves the current "thread's" registers and then loads a new set of registers from another "thread".
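Here is a heavily simplified sketch of that register save/restore idea, using the POSIX ucontext API (deprecated but still available in glibc) with cooperative switching instead of a timer; a real library would call swapcontext() from a timer signal handler:

    #include <stdio.h>
    #include <ucontext.h>

    /* Two user-space "threads" that hand the CPU to each other. */
    static ucontext_t main_ctx, a_ctx, b_ctx;
    static char a_stack[64 * 1024], b_stack[64 * 1024];

    static void task_a(void) {
        for (int i = 0; i < 3; i++) {
            printf("A runs\n");
            swapcontext(&a_ctx, &b_ctx);   /* save A's registers, load B's */
        }
    }

    static void task_b(void) {
        for (int i = 0; i < 3; i++) {
            printf("B runs\n");
            swapcontext(&b_ctx, &a_ctx);   /* save B's registers, load A's */
        }
    }

    int main(void) {
        getcontext(&a_ctx);
        a_ctx.uc_stack.ss_sp = a_stack;
        a_ctx.uc_stack.ss_size = sizeof a_stack;
        a_ctx.uc_link = &main_ctx;         /* resume main when A returns */
        makecontext(&a_ctx, task_a, 0);

        getcontext(&b_ctx);
        b_ctx.uc_stack.ss_sp = b_stack;
        b_ctx.uc_stack.ss_size = sizeof b_stack;
        b_ctx.uc_link = &main_ctx;
        makecontext(&b_ctx, task_b, 0);

        swapcontext(&main_ctx, &a_ctx);    /* start "thread" A */
        printf("done\n");
        return 0;
    }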

Process with multiple threads on multiprocessor system. How do they work?

So I was reading about Processes and Threads and I had a question. Following is the scenario.
Uniprocessor Environment
I understand that the OS rotates processes over the processor for a particular time period (quantum). Now I get it when the process is single-threaded, i.e. just one path of execution. In that case, whenever it is assigned the processor, it continues with its execution. Let's say the process forks, or just creates a new thread. Now how does the entire process work? Is it that the OS will say to process P "Go on, continue with execution" and the process within itself will pick the new thread or the parent thread in rotation, so that if there are more than two threads, the rotation seems fair to each thread? Or does the OS actually interact with the threads? (In that case I am not sure what happens.)
Multiprocessor Environment
Now say I have a multiprocessor environment. In this case, if there were just a single-threaded process, the OS would assign one of the processors to it and on it would go with its execution. Now say there are multiple threads in the process. If I assign one of the processors to the process, and ask it to continue its execution, and the process has to pick one of its threads for execution, then there will never be parallel processing going on within that specific process, since the process can only put one of its threads on the processor.
So how does it happen in both the cases?
Cheers.
Process Scheduling
Operating Systems ultimately control these types of thread scheduling.
Windows systems are priority-based and so will allow one process to consume more resources than others. This is why your machine can 'hang' if a process has been escalated to a high priority. Priorities range between 1 and 31, as far as I know.
Mac OS / Linux / Unix are time-based, allowing all processes to have equal amounts of CPU time. Therefore loading more processes will slow your system down, as they all share a smaller slice of execution time.
Uniprocessor Environment
The OS is ultimately responsible for this, but switching processes involves something like the following (I cannot guarantee accuracy here; it's just an indication):
Halting a process / thread
Storing the current stack (code location)
Storing the current registers of the CPU
Asking the kernel for the next process/thread to run
Kernel indicates which one has to be run
OS reloads the registers from the cache
OS reloads the current stack for the next application.
Resumes the process
Obviously, the more threads and processes you have running, the slower things become. The problem is that the time taken to switch processes can actually take longer than the time allotted for the process to execute.
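You can get a rough feel for that switching cost yourself; here is a crude sketch (assuming Linux/POSIX) that bounces one byte between a parent and child process through pipes, forcing at least two switches per round trip. On a multi-core machine the two processes may simply run on different cores, so treat the number as a loose estimate:

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        int ping[2], pong[2];
        char byte = 'x';
        const int rounds = 100000;

        if (pipe(ping) != 0 || pipe(pong) != 0)
            return 1;

        if (fork() == 0) {                 /* child: echo every byte back */
            for (int i = 0; i < rounds; i++) {
                read(ping[0], &byte, 1);
                write(pong[1], &byte, 1);
            }
            _exit(0);
        }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < rounds; i++) { /* parent: one round trip each */
            write(ping[1], &byte, 1);
            read(pong[0], &byte, 1);
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("~%.0f ns per round trip\n", ns / rounds);
        return 0;
    }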
Threads are just child processes of a single process. For a single processor, it just looks like additional work.
Multi-processor Environment
Multi-processor environments work differently, as the cache is shared amongst processors. I believe these are called L1 (Level 1) and L2 caches. So the difference is that processor A can reload the state stored by processor B without conflicts. 'Hyper-threading' also has the same approach, although this is processor-specific. The difference here is that a processor can solely control a specific process; this is called 'CPU affinity'. It's not encouraged for every process, but it does allow an application to have a dedicated processor to work off.
This is OS-specific, of course, but most operating systems schedule at the thread level. A process is just a grouping of threads. For example, on Linux, threads are called "tasks" and each is scheduled independently. They are created with the clone call. What is typically called a thread is a task which shares its address space (and other resources such as file descriptors, mount points, etc.) with the creating task. Note that the clone call can also create what is typically called a process if the flags to enable sharing are not passed.
Considering the above, any thread may be scheduled at any time on any processor, no matter how many processors there are available. That said, most OSs also attempt to maintain some measure of processor affinity to avoid excessive cache misses, but usually if a thread is runnable and a different CPU is available, it will change CPUs. Often there is also a way to specify which CPUs a particular thread may execute upon.
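On Linux, that "way to specify which CPUs a particular thread may execute upon" is the affinity API. A minimal sketch using the GNU-specific pthread_setaffinity_np() (assumes the machine actually has a CPU 2; compile with -pthread):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(2, &set);   /* restrict the calling thread to CPU 2 only */

        if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0)
            printf("now running on CPU %d\n", sched_getcpu());
        return 0;
    }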
It doesn't matter whether there is 1 or 128 processors. The OS manages access to resources to try and efficiently match up requests with availability, and that includes CPU execution. If a thread is running, it has already managed to get some CPU, but if it requests a resource that is not immediately available, it no longer needs any CPU until that resource becomes free, and so the OS will remove CPU execution from it and, if there is another thread waiting for CPU, hand it over. When the requested resource does become available, the thread will be made ready again. If there is a core free, it will be made running 'immediately'; if not, the CPU scheduling algorithm makes a decision on whether to stop a currently-running thread to free up a core or to leave the newly-ready thread waiting.
It's better to try and ignore things like 'time-slice, quantum, priority'; they cause much confusion and FUD. If a running thread wants something it cannot have yet, it doesn't need any more CPU cycles, and the OS will take them away and, if another thread needs them, apply them there. That is why preemptive multitaskers exist: to match up threads with resources in an attempt to maximize forward progress.

MultiCore CPUs, Multithreading and context switching?

Let's say we have a CPU with 20 cores and a process with 20 CPU-intensive threads that are independent of each other: one thread per CPU core.
I'm trying to figure out whether context switching happens in this case. I believe it happens because there are system processes in the operating system that need CPU-time too.
I understand that there are different CPU architectures and some answers may vary but can you please explain:
How context switching happens e.g. on Linux or Windows and some known CPU architectures? And what happens under the hood on modern hardware?
What if we have 10 cores and 20 threads or the other way around?
How to calculate how many threads we need if we have n CPUs?
Does the CPU cache (L1/L2) get emptied after a context switch?
Thank you
How context switching happens e.g. on Linux or Windows and some known CPU architectures? And what happens under the hood on modern hardware?
A context switch happens when an interrupt occurs and that interrupt, together with the kernel's thread and process state data, specifies a set of running threads that is different from the set running before the interrupt. Note that, in OS terms, an interrupt may be either a 'real' hardware interrupt that causes a driver to run, where that driver then requests a scheduling run, or a syscall from a thread that is already running. In either case, the OS scheduling state machine decides whether to change the set of threads running on the available cores.
The kernel can change the set of running threads by stopping some threads and running others. It can stop any thread running on any core by queueing up a preemption request and generating a hardware interrupt on that core, forcing the core to run its inter-processor interrupt handler to act on the request.
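From user space you can at least observe how often this has happened to your own process; a small sketch using the POSIX getrusage() counters, which distinguish switches where the thread blocked voluntarily from those where it was preempted:

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rusage ru;
        if (getrusage(RUSAGE_SELF, &ru) == 0) {
            /* Voluntary: we blocked (I/O, locks). Involuntary: preempted. */
            printf("voluntary switches:   %ld\n", ru.ru_nvcsw);
            printf("involuntary switches: %ld\n", ru.ru_nivcsw);
        }
        return 0;
    }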
What if we have 10 cores and 20 threads?
Depends on what the threads are doing. If they are in any other state than ready/running, (eg blocked on I/O or inter-thread comms), there will be no context-switching between them because nothing is running. If they are all ready/running, 10 of them will run forever on the 10 cores until there is an interrupt. Most systems have a periodic timer interrupt that can have the effect of sharing the available cores around the threads.
or the other way around
10 threads run on 10 cores. The other 10 cores are halted. The OS may move the threads around the cores, eg. to prevent uneven heat dissipation across the die.
How to calculate how many threads we need if we have n CPUs?
App-dependent. It would be nice if all cores were always 100% busy with exactly as many ready threads as cores but, since most threads are blocked for much more time than they are running, it's difficult, except in some edge cases (e.g. your '20 CPU-intensive threads on 20 cores'), to come up with any optimal number.
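A common starting point, for what it's worth, is to size a CPU-bound thread pool to the number of online cores and tune from there; a sketch (POSIX):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Number of cores currently online: a reasonable default pool
         * size for CPU-bound work. I/O-bound workloads often want more
         * threads, since most of them are blocked at any given moment. */
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        printf("suggested CPU-bound pool size: %ld\n", n);
        return 0;
    }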
Does the CPU cache (L1/L2) get emptied after a context switch?
Maybe; it depends entirely on the data usage of the threads. The caches will get reloaded on demand, as usual. There is no 'context-switch total cache reload', but if the threads access different, large arrays of data while running, then the cache (L1 at least) will indeed get fully reloaded during the thread's run.

Threads inside a Process

Processes get CPU time as managed by the OS process scheduler.
Since threads run in parallel within a single process, does this mean that a process's CPU time is further distributed (sliced) among its threads?
Or can the scheduler directly distribute CPU time among threads bypassing the parent process?
I suspect the answer varies with the OS. On Windows, the process is not merely bypassed, but completely ignored -- all the scheduler deals with is threads. Processes are relevant only to the degree that all non-kernel threads do have to belong to some process, and every process has to contain at least one thread.
The threads are run/scheduled by the operating system and therefore they get their own CPU time. The process CPU time is just the sum of the CPU times of all the threads in the process.
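You can check that sum yourself on Linux with the per-thread and per-process CPU-time clocks; a sketch (with a single thread the two values should roughly match):

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec th, pr;

        /* Burn a little CPU so the clocks are non-zero. */
        volatile unsigned long x = 0;
        for (unsigned long i = 0; i < 100000000UL; i++) x += i;

        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &th);
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &pr);

        /* With more threads, the process clock is (roughly) the sum of
         * all the thread clocks. */
        printf("thread:  %ld.%09ld s\n", (long)th.tv_sec, th.tv_nsec);
        printf("process: %ld.%09ld s\n", (long)pr.tv_sec, pr.tv_nsec);
        return 0;
    }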
If you want your process to schedule the tasks itself, you should use fibers (Windows). These are a kind of thread, but they are not scheduled by the OS; the process handles the scheduling of fibers itself.
For Windows see http://msdn.microsoft.com/en-us/library/ms681917%28VS.85%29.aspx
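A minimal sketch of that fiber model (Windows-only; note that a fiber never runs unless something in the process explicitly switches to it):

    #include <stdio.h>
    #include <windows.h>

    static LPVOID main_fiber;

    /* The OS scheduler never picks a fiber on its own; it runs only
     * when SwitchToFiber() is called on it. */
    static VOID CALLBACK worker(LPVOID param)
    {
        printf("fiber running: %s\n", (const char *)param);
        SwitchToFiber(main_fiber);      /* hand control back explicitly */
    }

    int main(void)
    {
        main_fiber = ConvertThreadToFiber(NULL);
        LPVOID f = CreateFiber(0, worker, "hello");
        SwitchToFiber(f);               /* the process does the scheduling */
        printf("back in the main fiber\n");
        DeleteFiber(f);
        return 0;
    }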
