Is a multi-user and multi-processor environment useful with threading? - multithreading

Taking CPU affinity into account, will such an environment be useful with threading? Or will performance degrade in such a system if multiple users log in and spawn multiple kernel and user threads?

When you say "taking CPU affinity into account", are you saying that all processes have CPU affinity in this hypothetical system? Or is that just one extra bit of information that may be available?
Using multiple threads will slow things down a bit if the system is already loaded (so that there are more runnable threads than cores), but if there are often times when only (say) 2 users are active and 4 cores are available, threading may help.
Another typical use for threads is to do something "in the background" whether that's explicitly using threads or using async calls. At that point multi-threading can definitely give a benefit (e.g. a non-hanging UI) without actually using more than one core simultaneously for much of the time.
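To illustrate the "in the background" case, here is a minimal Python sketch. The names slow_task and run_with_background_worker are made up for this example, and time.sleep stands in for blocking work such as a download. The worker spends its time waiting rather than computing, so the benefit comes from the main loop staying responsive, not from using a second core.

```python
import threading
import time


def slow_task(results):
    """Hypothetical stand-in for blocking work (download, disk read, ...)."""
    time.sleep(0.2)  # the worker is blocked here and uses almost no CPU
    results.append("done")


def run_with_background_worker():
    results = []
    worker = threading.Thread(target=slow_task, args=(results,))
    worker.start()

    # The main thread stays free to do other things while the worker waits.
    # Here it just counts loop iterations, standing in for UI event handling.
    ticks = 0
    while worker.is_alive():
        ticks += 1
        time.sleep(0.01)

    worker.join()
    return results, ticks


if __name__ == "__main__":
    results, ticks = run_with_background_worker()
    print(results, "main loop iterations while waiting:", ticks)
```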

Related

Why does Linux distribute threads among NUMA nodes almost equally?

I'm running an application with multiple threads and it seems Linux is distributing threads among NUMA nodes almost equally. Say my application spawns 4 threads and my machine has 4 sockets. I observe that each thread is assigned to a NUMA node distributing threads among all nodes almost equally.
Is there any reason for this? Why not assign all of them to one socket and then fill the next one?
The best binding for an application depends on what the application does. It is often a good idea to spread threads across different NUMA nodes to maximize memory throughput, since all NUMA nodes can then theoretically be used (assuming the application is well written and NUMA-aware). If all threads are bound to the same NUMA node, only that node's memory can be accessed efficiently (accessing the memory of other NUMA nodes is possible but slower, and pages will not automatically be mapped efficiently because of the first-touch policy, which is generally the default on most machines). When some threads communicate a lot, it is often better to put them on the same NUMA node to avoid paying latency overheads. In some cases, it can even be better to put them on the same core (but on different hardware threads) to speed up synchronization operations such as locks and atomics.
If you want the scheduling and the binding to be efficient, you need to provide more information to the OS or do it yourself. I strongly advise you to bind threads to specific cores. This is easy with HPC runtimes/tools like OpenMP (but a pain if your application uses low-level threads, unless you do not care about platform portability). As for NUMA, you can specify the policy using numactl. More information is provided in this answer.
In practice, HPC applications generally use manual binding to improve performance. OS schedulers are generally not very good at binding threads efficiently on their own. A few years ago, there were even bugs in the scheduler causing inefficient behaviour: see The Linux Scheduler: a Decade of Wasted Cores. To my knowledge, such problems are not uncommon in this field and are not restricted to Linux. Efficient NUMA-aware OS scheduling is far from easy.
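As a rough illustration of binding low-level threads by hand, here is a Linux-only Python sketch. The helper names (pin_current_thread, worker, run_pinned_workers) are invented for this example. It relies on os.sched_setaffinity accepting a kernel thread id, which is how the underlying Linux syscall behaves; it will not work on other platforms, which is exactly the portability pain mentioned above. It only pins threads to CPUs and does not set any NUMA memory policy.

```python
import os
import threading


def pin_current_thread(cpu):
    """Pin the calling thread to a single CPU (Linux only)."""
    # Passing the kernel thread id restricts only this thread, so the
    # other threads of the process keep their own affinity masks.
    os.sched_setaffinity(threading.get_native_id(), {cpu})


def worker(cpu, seen):
    pin_current_thread(cpu)
    # Record the mask the kernel now reports for this thread.
    seen[cpu] = os.sched_getaffinity(threading.get_native_id())


def run_pinned_workers(max_threads=4):
    # CPUs the current process is allowed to use at all.
    allowed = sorted(os.sched_getaffinity(0))
    seen = {}
    threads = [
        threading.Thread(target=worker, args=(cpu, seen))
        for cpu in allowed[:max_threads]
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    for cpu, mask in run_pinned_workers().items():
        print("thread pinned to CPU", cpu, "reports mask", mask)
```

Which CPUs belong to which NUMA node is not visible from this code; that mapping would have to come from a tool such as numactl or from the system topology.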

Multiprocessing vs multithreading misconception?

From my understanding, multithreading means that, under one process, multiple threads (each with its own instructions, registers, stack, etc.)
1. run concurrently on a single-thread/single-core CPU device
2. run in parallel on a multi-core CPU device (for example, 10 threads on a 10-core CPU)
And multiprocessing, I thought, means that different processes run in parallel on a multi-core CPU device.
And today after reading an article, it got me thinking if I am wrong or the article is wrong.
https://medium.com/better-programming/is-node-js-really-single-threaded-7ea59bcc8d64
Multiprocessing is the use of two or more CPUs
(processors) within a single computer system. Now, as there are
multiple processors available, multiple processes can be executed at a
time.
Isn't it the same as a multithreading process that runs on a multi core cpu device??
What did I miss? or maybe it's me not understanding multiprocessing fully.
Multiprocessing means running multiple processes in accordance with the operating system's scheduling algorithm. Modern operating systems use some variation of time sharing to run user processes in a pseudo-parallel mode. In the presence of multiple CPUs, the OS can take advantage of them and run some processes in truly parallel mode.
Processes, in contrast to threads, are independent of each other with respect to memory and other process context. They can talk to each other using inter-process communication (IPC) mechanisms. Shared resources can be allocated for the processes, and access to them requires process-level synchronization.
Threads, on the other hand, share the same address space and other process context. They can access the same memory locations and need to be synchronized using thread synchronization techniques, such as mutexes and condition variables.
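A small Python sketch of the mutex case, with an invented function name (count_with_lock): several threads update one shared counter, and a lock makes each read-modify-write step exclusive. It shows the pattern only and says nothing about performance.

```python
import threading


def count_with_lock(n_threads=4, n_increments=10_000):
    total = 0
    lock = threading.Lock()

    def add():
        nonlocal total
        for _ in range(n_increments):
            # Only one thread at a time may update the shared counter.
            with lock:
                total += 1

    threads = [threading.Thread(target=add) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(count_with_lock())
```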
Both threads and processes are scheduled by the operating system in a similar manner. So the quote you provided is not completely correct. You do not need multiple CPUs for multiprocessing; however, you do need them for several processes to really run at the same time. As many processes as there are cores can run simultaneously, while the remaining processes share the CPUs in a time-sharing manner.
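The memory difference between threads and processes can be seen in a short POSIX-only Python sketch. The function names are made up for this example, and os.fork is used only because it makes the copy visible in a few lines; it is not available on Windows. A thread changes the caller's list directly, while a forked child changes only its own copy.

```python
import os
import threading


def bump(state):
    state[0] += 1


def bump_in_thread(state):
    """A thread works on the very same list object as its creator."""
    t = threading.Thread(target=bump, args=(state,))
    t.start()
    t.join()
    return state[0]


def bump_in_child_process(state):
    """A forked child gets its own copy; the parent's list is untouched."""
    before = state[0]
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: modify the copy and report its value over the pipe.
        try:
            os.close(read_fd)
            bump(state)
            os.write(write_fd, str(state[0]).encode())
        finally:
            os._exit(0)  # never fall through into the parent's code
    os.close(write_fd)
    child_value = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return before, child_value, state[0]


if __name__ == "__main__":
    state = [0]
    print("after thread:", bump_in_thread(state))
    print("before fork, in child, in parent afterwards:",
          bump_in_child_process(state))
```

The pipe here is a tiny instance of IPC: the child has no other way to hand its value back to the parent.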

Task scheduler and CPU isolation in Linux

I'm a kernel noob, and that includes schedulers. I understand that there is an I/O scheduler and a task scheduler, and according to this post the I/O scheduler uses normal tasks that are handled by the task scheduler in the end.
So if I run a user-space thread that is assigned to an isolated core (using isolcpus) and it does some I/O operation, will the
task created by the I/O scheduler get executed on the isolated core?
Since CFS seems to favor user interaction, does this mean that CPU-intensive threads might get less CPU time in the long run?
Can isolating cores help mitigate this issue?
Can isolating cores decrease the scheduling latency (the time it takes for a thread that was marked as runnable to get executed) for
the threads that are pinned to the isolated cores?
So if I run a user-space thread that is assigned to an isolated core
(using isolcpus) and it does some I/O operation, will the task
created by the I/O scheduler get executed on the isolated core?
What isolcpus does is take that particular core out of the kernel's list of CPUs on which it can schedule tasks. Once a CPU is isolated, the scheduler will not place tasks on that core on its own, whether the core is idle or is being used by some other process/thread; tasks end up there only when they are explicitly pinned to it.
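To check which cores are isolated on a given machine, a sketch along these lines may help. It assumes the kernel exposes the list at /sys/devices/system/cpu/isolated, which recent Linux kernels do; on systems without that file the function simply returns an empty set. The helper names are invented for this example, and the parser handles only plain lists such as "2-3,6".

```python
import os


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "2-3,6" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs in the list
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolated_cpus(path="/sys/devices/system/cpu/isolated"):
    # Assumption: this sysfs file exists; fall back to "none" otherwise.
    try:
        with open(path) as f:
            return parse_cpu_list(f.read())
    except OSError:
        return set()


if __name__ == "__main__":
    print("isolated CPUs:", sorted(isolated_cpus()))
    # Linux only: CPUs the current process is allowed to run on.
    print("CPUs allowed for this process:", sorted(os.sched_getaffinity(0)))
```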
Since CFS seems to favor user interaction, does this mean that CPU-intensive
threads might get less CPU time in the long run?
Can isolating cores help mitigate this issue?
Isolating CPUs has a different use altogether, in my opinion. Basically, if your application has both fast threads (threads with no system calls that are latency sensitive) and slow threads (threads with system calls), you would want dedicated CPU cores for your fast threads so that they are not interrupted by the kernel's scheduling and can run to completion without any noise. Fast threads are usually latency sensitive. Slow threads, or threads that are not really latency sensitive and only do supporting logic for your application, do not need dedicated CPU cores. As mentioned earlier, isolating CPUs serves a different purpose. We do this all the time in our organization.
Can isolating cores decrease the scheduling latency (the time it takes
for a thread that was marked as runnable to get executed) for the
threads that are pinned to the isolated cores?
Since you are taking CPUs away from the kernel's list of CPUs, this will surely impact other threads and processes. Then again, you would want to think carefully about which code is really latency sensitive and separate it from your non-latency-sensitive code.
Hope it helps.

Is scheduling threads of the same process on different cores beneficial?

I'm currently working on the Linux scheduler, and I wonder whether there are situations in which running threads on different cores speeds up a process in Linux. I have heard that pinning threads of the same process to different cores keeps the cache 'hot' and is therefore beneficial. However, if threads are not pinned but are instead bound dynamically to different cores, are there any benefits, and what are the pitfalls? Thanks!

What is the safe number of Threads to be initialised in Windows Phone

I am aware that Windows Phone devices are diverse in hardware, especially CPU resources.
But, from practical experience, does anybody know how many threads can be run simultaneously without causing device performance issues and excessive battery consumption?
By threads, I mean the ones initialized with the Thread class, using the .Start() method.
The maximum number of threads that can safely be used on WP depends on the resources consumed and the work done by the threads. The more resources they use and the more work they do, the greater the load on the CPU.
Try using Thread Pool for better performance.
Hope it satisfies your query :)
