What is the difference in scheduling threads? - linux

I am currently learning about simultaneous multithreading and the scheduling of threads on multi-core and multi-processor systems. From what I have read, my understanding is:
If a processor supports simultaneous multithreading, it turns one physical core into two logical cores. Suppose there are two processes, P1 and P2.
My understanding: in Linux, each process is composed of at least one thread, so scheduling is really thread scheduling?
P1 and P2 are scheduled onto the two logical cores, where they run independently. That is the first situation. Now suppose there is a process P3 consisting of two threads, t1 and t2, and t1 and t2 are scheduled onto different logical cores. What is the difference between scheduling two different processes onto separate logical cores and scheduling two threads of the same process onto separate logical cores?
My understanding: the process is the smallest unit of resource allocation, and threads share the resources of their process. Threads in a process share the virtual memory and the PCB, and can access the same data. Therefore, when scheduling different threads of one process versus threads of different processes, there is no difference from the processor's point of view; the difference lies in page-table address translation and in whether the cache can be shared. A multi-core processor does not care whether two threads belong to the same process. Data consistency is guaranteed by MESI, and the physical location of the data is determined by the page tables.
Is my understanding correct?

Right, there's no difference. The kernel just schedules tasks; each user task refers to a page table (whether that's shared with any other task or not).
Each logical CPU core has its own page-table pointer (e.g. x86 CR3).
And yes, cache coherency is maintained by hardware. The Linux kernel's hand-rolled atomics (using volatile, and inline asm for RMWs and barriers) depend on that.
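To make the "the kernel just schedules tasks" point concrete, here is a minimal sketch (assuming a glibc/Linux system; the CPU numbers 0 and 1 are arbitrary placeholders) that pins two threads of one process onto two logical CPUs, exactly the way two separate single-threaded processes could be pinned:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    /* Pin the calling thread to one logical CPU. */
    static void pin_to_cpu(int cpu) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    static void *worker(void *arg) {
        pin_to_cpu((int)(long)arg);   /* each thread is its own schedulable task */
        printf("thread on logical CPU %d\n", sched_getcpu());
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        /* Two threads of one process; the kernel schedules them as two tasks. */
        pthread_create(&t1, NULL, worker, (void *)0L);  /* placeholder CPU 0 */
        pthread_create(&t2, NULL, worker, (void *)1L);  /* placeholder CPU 1 */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }

Compile with -pthread. Whether the two tasks share a page table (threads of one process) or not (separate processes) changes nothing about how they are placed on logical cores.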


What does it mean when we say "4 cores 8 threads"?

When I run lscpu on my host, it shows
CPU(s): 8
Thread(s) per core: 2
Core(s) per socket: 4
My host has one socket with 4 physical cores, but 8 logical CPUs because there are 2 threads per core. Does "2 threads per core" mean one core can execute 2 threads simultaneously, as if we had doubled the CPU capacity? Is this a parallelism concept?
Then there is the separate concept that "one process can have multiple threads". I believe this means one process can handle multiple threads concurrently by context switching, but not necessarily in parallel. In most cases one CPU can execute one thread at a time, right?
I'd like to confirm that my understanding above is correct. Thanks.
Ref for concurrent and parallel difference: What is the difference between concurrency and parallelism?
This concept is called simultaneous multithreading (SMT). It is implemented in many processors, from x86-64 (both AMD and Intel) to POWER. The idea is to execute 2 threads concurrently; some operations can run in parallel, depending on the specific target architecture.
one core can execute 2 threads simultaneously so as if we have doubled the CPU capacity?
No. Hardware threads (also called logical cores) are not equivalent to cores (i.e., physical cores). Some processor resources are statically partitioned between the hardware threads, while others are dynamically allocated, meaning the two threads share the available resources.
The initial idea was to execute something useful while a core was stalled on some operation, such as a memory read. With 2 hardware threads, a core can execute the instructions of another thread while the current one is waiting on memory, for example due to a cache miss. Memory-bound parallel codes that are limited by RAM latency, like naive transpositions or linked-list traversals, can benefit from this mechanism.
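For illustration, here is a rough sketch of the kind of latency-bound pointer chase described above (the list size and iteration count are arbitrary; a randomly permuted chain would defeat the hardware prefetcher better than this sequential one):

    #include <stdio.h>
    #include <stdlib.h>

    /* A node whose "next" pointer forces a dependent load every iteration. */
    struct node { struct node *next; };

    int main(void) {
        enum { N = 1 << 20 };                 /* ~1M nodes, larger than the caches */
        struct node *nodes = malloc(N * sizeof *nodes);
        if (!nodes) return 1;
        for (int i = 0; i < N - 1; i++) nodes[i].next = &nodes[i + 1];
        nodes[N - 1].next = &nodes[0];        /* close the chain into a cycle */

        /* Each iteration stalls on the previous load: the core has idle cycles
           that a sibling hardware thread could put to use. */
        struct node *p = &nodes[0];
        for (long i = 0; i < 100 * 1000 * 1000L; i++) p = p->next;

        printf("%p\n", (void *)p);            /* keep the loop from being optimized out */
        free(nodes);
        return 0;
    }

Two software threads each running a loop like this, scheduled on the two hardware threads of one core, can overlap their memory stalls.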
SMT implementations have improved significantly over time, especially in recent x86-64 processors. Nowadays, the hardware threads of a modern processor can execute computing instructions truly in parallel. For example, an Intel Skylake core can execute up to 4 arithmetic instructions per cycle, thanks to 4 ALUs. A single thread can only reach 4 instructions per cycle if the instructions are independent (during the cycles in question). This is not always possible, as some loops are inherently sequential and do not contain enough independent instructions per iteration (e.g. a cumulative sum). With 2-way SMT enabled, 2 software threads can be scheduled on the same core, and the core can execute instructions of both threads fully in parallel within a given cycle. It can even balance the number of instructions issued per thread against each thread's needs in real time (e.g. 1 vs 3 instructions per cycle). In the end, latency-bound codes can be up to 2 times faster on a 2-way SMT processor like Skylake.

That being said, SMT does not speed up code that can already fully use all the available processor computing units. For example, a parallel matrix multiplication using an optimized BLAS library will nearly always be slower with 2 software threads running per core than with only 1 software thread per core. The execution can even be slower, because the hardware threads share some resources, like caches, and can conflict with each other when 2 threads per core run simultaneously. Put shortly, efficient code should not benefit from SMT, but people tend to write inefficient code, and it is not rare for compilers to fail to generate efficient code that saturates the computing units of a core (they often need some help).
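To see which logical CPUs are SMT siblings of the same physical core on a Linux machine, you can read the CPU topology from sysfs. A minimal sketch (the output format, e.g. "0,4" or "0-1", depends on how the kernel enumerates the CPUs):

    #include <stdio.h>

    int main(void) {
        /* Which logical CPUs share physical core 0 (its SMT siblings). */
        FILE *f = fopen("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list", "r");
        if (!f) { perror("fopen"); return 1; }
        char buf[64];
        if (fgets(buf, sizeof buf, f))
            printf("logical CPUs sharing core 0: %s", buf);
        fclose(f);
        return 0;
    }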
While we have another concept that "one process can have multiple threads", I believe this means one process can handle multiple threads concurrently by switching context, but not necessarily in parallel.
I would like to set the record straight: software threads and hardware threads are two very different things despite the name.
A software thread is a logical OS unit that can be scheduled onto a hardware thread. A hardware thread can be seen as a physical part of a processor core (this is in fact a naive, simplified view). A software thread is part of an OS process. The OS is responsible for scheduling the ready software threads; processes are not scheduled, software threads are (at least on a modern OS). 2 software threads of 2 different processes can run in parallel on a processor with multiple cores (or even on one 2-way SMT core).
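One way to see that Linux schedules software threads rather than processes: each thread is a separate kernel task with its own thread ID, even though both share one process ID. A minimal sketch (using the raw gettid system call, since the glibc wrapper for it is relatively recent):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Each software thread is a distinct kernel task with its own TID,
       even though both threads share a single PID (the process). */
    static void *show_ids(void *arg) {
        (void)arg;
        printf("pid=%d tid=%ld\n", getpid(), syscall(SYS_gettid));
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, show_ids, NULL);
        pthread_create(&t2, NULL, show_ids, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }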
In most cases one CPU can execute one thread at a time, right?
The term "CPU" is not clear here: it can mean different things regarding the context.
If "one CPU" means a modern microprocessor chip that is typically a multicore one nowadays, then definitively no. Software threads can truly run in parallel on different cores for examples.
If "one CPU" means a core (like often in high-performance computing), then it depends: a 1-way SMT core can execute only 1 thread at a time while a 2-way SMT core can execute 2 thread at a time.
On old microprocessor chip with 1 core and no SMT, it was true that one thread was running at a time and context switches was used to execute thread concurrently from the user point-of-view but not in parallel. This time is long gone (since nearly 2 decades) except maybe on some embedded microprocessor chips.
Is this...parallel?
Maybe.
Hyperthreading is Intel's trademark* for processor cores that have two complete sets of context registers. A hyperthreaded CPU can concurrently execute code on behalf of two threads without any intervention by the operating system (i.e., with no need for context switching).
The extent to which those two concurrent executions actually are parallel executions varies from CPU model to model, and it depends on what the two threads actually are doing. For example (I'm just making this part up because it's been a few decades since I've needed to worry about any particular CPU architecture) if some "hyperthreaded" CPU has two integer ALUs per core, then the two threads might both be able to perform integer operations in parallel, but if it has only one FPU per core, then they would have to take turns using it.
Some Hyperthreaded CPU models have more duplicate execution units than others have, and so can parallelize more parts of the execution.
* AMD calls their similar capability "2-way simultaneous multithreading."

Is synchronization faster on the same physical CPU core?

I have a question. If a thread modifies a variable, will a thread on the same physical core (on the sibling hyperthread) see the modification earlier than threads on other cores? Or does it have to wait until all the other cores see it?
I've been trying to pin two threads to the same physical core, but I get performance degradation. I know it's because the two logical cores share a lot of resources. But in terms of synchronization, will it help to put the threads on the same physical core?
Thanks!
The answer depends on the platform (especially the underlying architecture). That being said, on the (mainstream) x86-64 architecture, threads sharing the same core communicate faster than threads on different cores, or even different sockets. One main reason is that the two threads will often share the same L1 cache (and if not, the L2 cache); thus, one thread can directly read what the other just wrote. Moreover, the threads can often run in parallel thanks to simultaneous multithreading (called Hyper-Threading on Intel CPUs), reducing the communication latency (no scheduling quantum to wait for).
Meanwhile, threads on different cores have to communicate through a (slow) interconnect or share data through the L3 cache, which is significantly slower than the L1/L2.
When your workload is bound by communication (latency or throughput), it is often better to put threads close to each other (i.e., on the same core). When the number of threads per core exceeds the number of hardware threads, performance decreases due to preemptive multitasking. When the workload is compute-bound, it is better to put the threads on separate cores. Note that on modern x86 processors, threads running on the same core can even share the computing resources (ALUs) at the instruction level.
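A rough sketch of how one might measure this (the CPU numbers 0 and 1 are placeholders; whether they are SMT siblings or separate cores depends on your machine's topology, see lscpu -e): two pinned threads bounce a flag back and forth, and the round-trip time reflects the communication latency between wherever they are pinned.

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <time.h>

    enum { ROUNDS = 1000000 };
    static atomic_int flag;                  /* the cache line the threads bounce */

    static void pin(int cpu) {               /* pin calling thread to one logical CPU */
        cpu_set_t s;
        CPU_ZERO(&s);
        CPU_SET(cpu, &s);
        pthread_setaffinity_np(pthread_self(), sizeof(s), &s);
    }

    static void *ponger(void *arg) {
        pin((int)(long)arg);
        for (int i = 0; i < ROUNDS; i++) {
            while (atomic_load(&flag) != 1) ;    /* wait for ping */
            atomic_store(&flag, 0);              /* pong */
        }
        return NULL;
    }

    int main(void) {
        pin(0);                       /* placeholder CPU numbers: pick 0 and 1 as */
        pthread_t t;                  /* SMT siblings or separate cores to compare */
        pthread_create(&t, NULL, ponger, (void *)1L);

        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        for (int i = 0; i < ROUNDS; i++) {
            atomic_store(&flag, 1);              /* ping */
            while (atomic_load(&flag) != 0) ;    /* wait for pong */
        }
        clock_gettime(CLOCK_MONOTONIC, &b);
        pthread_join(t, NULL);

        double ns = (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
        printf("%.1f ns per round trip\n", ns / ROUNDS);
        return 0;
    }

Running this with both threads on one physical core, then on two different cores, shows the latency difference described above.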

Do different threads of a process running on different physical cores of a multi-core processor need separate contexts?

A process is the smallest unit of resource allocation; the thread is the smallest unit of scheduling.
Does this mean that a process contains at least one thread? Is the thread equal to the process when there is only one thread in the process?
Many processors today are multi-core. Say I have a process P with two threads, A and B, and I want A and B to run on core 0 and core 1 of a CPU, respectively.
But we know that a process must be allocated resources, and a process has a context. Is the context generally stored in registers? If so, different physical cores use different physical registers when thread A runs on core 0 and thread B runs on core 1.
So do both cores need to be allocated resources? In that case, how do the two threads maintain consistency? If each thread has its own resources, hasn't this effectively become two processes? Does this mean that different threads of one process running on different cores are the same as different processes running on different cores?
The vast majority of resources in an SMP system, with the exception of registers and processing capacity, are shared across all cores. This includes memory. So the operating system can schedule multiple threads on different cores, all pointing to a shared set of process resources in memory.
CPU caches are handled by the cores using a cache coherency protocol. As long as a thread follows the memory model, with correct use of memory barriers/atomic instructions, the memory visible through the caches will appear the same to all cores.
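For example, here is a minimal sketch of the "correct use of memory barriers/atomic instructions" mentioned above, using C11 atomics: a release store paired with an acquire load guarantees that the payload written before the flag is visible on whichever core the consumer thread runs.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static int payload;                /* plain data, published via the flag */
    static atomic_int ready;

    static void *producer(void *arg) {
        (void)arg;
        payload = 42;
        /* Release: everything written before this store is visible to an
           acquire load that observes ready == 1, on any core. */
        atomic_store_explicit(&ready, 1, memory_order_release);
        return NULL;
    }

    static void *consumer(void *arg) {
        (void)arg;
        while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
            ;                                  /* spin until the flag is set */
        printf("payload = %d\n", payload);     /* guaranteed to print 42 */
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }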

Operating Systems - Organization Questions

I am studying for a 3-topic comprehensive exam that decides if I graduate or not, and I have some questions on operating system organization.
A) How does a multicore computer with shared memory differ from a distributed or a clustered system with respect to OS? Make specific reference to the OS Kernel.
B) Briefly explain the difference between processes and threads
C) Threads on a single core system are often handled in User mode. Explain why this is not acceptable on a multicore computer
D) Explain at least 2 ways that the OS can handle threads on a multicore computer
Here are my attempted answers.
A) A multicore computer is a single processor chip that has multiple cores working together to speed up processing; since the cores share memory, the kernel already knows the state of each of them. Distributed and clustered systems use message passing, and each kernel must always alert the others to what it is doing.
B) Processes are the high-level, heavyweight tasks, which can usually be broken down into smaller individual tasks (threads). Threading a single process allows for the abstraction of multiprocessing, letting concurrent actions take place.
C) I do not know, but my guess is that the OS must properly distribute tasks in kernel mode.
D) Assign processes per core, or assign threads per core. If you assign processes per core, a core will iterate through all the threads of its process while another core works on another process. If you assign threads per core, each core will work on a group of threads that relate to the same process.
Please let me know if anyone has anything that can help my understanding, especially on OS organization topics.
Thanks in advance
A. How does a multi-core computer differ from a distributed or a clustered system with respect to the OS?
a. Clustered systems are typically constructed by combining multiple computers into a single system to perform a computational task distributed across the cluster. Multiprocessor systems, on the other hand, can be a single physical entity comprising multiple CPUs. Clustered systems communicate via messages, while multiprocessors communicate via shared memory.
B. Briefly explain the difference between process and thread?
a. Both process and threads are independent sequences of execution. The typical difference is that threads (of the same process) run in shared memory space, while processes run in separate memory spaces.
C. Threads on a single core system are often handled in user mode. Explain why this is not acceptable on a multicore computer.
a. A multithreaded application running on a traditional single-core chip would have to interleave its threads. On a multicore chip, however, the threads can be spread across the available cores.
D. Explain at least 2 ways the OS can handle threads on a multicore computer.
a. Data parallelism divides the data across multiple cores and performs the same task on each subset of the data.
b. Task parallelism divides the different tasks to be performed among different cores and performs them simultaneously.
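As an illustration of the data-parallelism option, here is a minimal pthreads sketch (the array size and thread count are arbitrary, and N is assumed to be divisible by the thread count): every thread performs the same task, a partial sum, on its own slice of the data.

    #include <pthread.h>
    #include <stdio.h>

    enum { N = 1000000, THREADS = 4 };
    static double data[N];

    struct chunk { int begin, end; double sum; };

    /* Data parallelism: every thread runs the same code on its own slice. */
    static void *partial_sum(void *arg) {
        struct chunk *c = arg;
        c->sum = 0.0;
        for (int i = c->begin; i < c->end; i++) c->sum += data[i];
        return NULL;
    }

    int main(void) {
        for (int i = 0; i < N; i++) data[i] = 1.0;

        pthread_t tid[THREADS];
        struct chunk parts[THREADS];
        for (int t = 0; t < THREADS; t++) {
            parts[t].begin = t * (N / THREADS);
            parts[t].end   = (t + 1) * (N / THREADS);
            pthread_create(&tid[t], NULL, partial_sum, &parts[t]);
        }

        double total = 0.0;
        for (int t = 0; t < THREADS; t++) {
            pthread_join(tid[t], NULL);
            total += parts[t].sum;    /* combine the per-thread results */
        }
        printf("total = %f\n", total);
        return 0;
    }

Task parallelism would instead give each thread a different function to run (e.g., one thread reading input while another computes).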

Threads vs Cores

Say I have a processor like this, which says # cores = 4, # threads = 4, without Hyper-threading support.
Does that mean I can run 4 simultaneous program/process (since a core is capable of running only one thread)?
Or does that mean I can run 4 x 4 = 16 program/process simultaneously?
From my digging: if there is no Hyper-threading, there will be only 1 thread (process) per core. Correct me if I am wrong.
A thread differs from a process. A process can have many threads. A thread is a sequence of instructions in a certain order, and a logical core can execute one such sequence at a time. The operating system distributes all the threads across all the available logical cores, and if there are more threads than cores, the threads are kept in a queue and each core switches from one to another very quickly.
It will look like all the threads run simultaneously, when actually the OS distributes CPU time among them.
Having multiple cores gives the advantage that fewer concurrent threads are placed on any single core; less switching between threads = greater speed.
Hyper-threading creates 2 logical cores on 1 physical core and makes switching between the two threads much faster.
That's basically correct, with the obvious qualifier that most operating systems let you execute far more tasks simultaneously than there are cores or hardware threads, which they accomplish by interleaving the execution of instructions.
A system with hyperthreading generally has twice as many hardware threads as physical cores.
The term thread is generally used to describe an operating system concept that has the potential to execute independently of other threads. Whether it actually does so depends on whether it is stuck waiting for some event (disk or screen I/O, a message queue), or whether there are enough physical CPUs (hyperthreaded or not) to let it run alongside the other non-waiting threads.
Hyperthreading is a CPU vendor term that means a single core can multiplex its attention between two computations. The easy way to think about a hyperthreaded core is as if you had two real CPUs, each slightly slower than what the manufacturer says the core can actually do.
Basically this is up to the OS. A thread is a high-level construct holding an instruction pointer, and the OS places a thread's execution on a suitable logical processor. So with 4 cores you can basically execute 4 instruction streams in parallel, whereas a thread simply contains information about which instructions to execute and where those instructions are placed in memory.
An application normally uses a single process during execution, and the OS switches between processes to give all processes "equal" processor time. When an application deploys multiple threads, the process allocates more than one slot for execution but shares memory between the threads.
Normally you distinguish between concurrent and parallel execution. Parallel execution is when you physically execute instructions on more than one logical processor; concurrent execution is the frequent switching of a single logical processor, giving the appearance of parallel execution.
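As a practical footnote, the number of logical processors the OS can schedule onto can be queried from a program. A minimal sketch (note that _SC_NPROCESSORS_ONLN is a widely supported extension rather than strict POSIX):

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* Logical CPUs (hardware threads) currently online and schedulable. */
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        printf("%ld logical CPUs online\n", n);
        return 0;
    }

On the 4-core, 4-thread processor from the question, this would report 4; with 2-way SMT it would report 8.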
