SLURM nodes, tasks, cores, and CPUs - multithreading

Would someone be able to clarify what each of these things actually is? From what I gathered, nodes are computing points within the cluster, essentially a single computer. Tasks are processes that can be executed either on a single node or on multiple nodes. And cores are basically how much of a CPU on a single node you want allocated to executing the task assigned to it. Is this correct? Am I confusing something?

The terms can have different meanings in different contexts, but if we stick to a Slurm context:
A (compute) node is a computer that is part of a larger set of nodes (a cluster). Besides compute nodes, a cluster comprises one or more login nodes, file server nodes, management nodes, etc. A compute node offers resources such as processors, volatile memory (RAM), permanent disk space (e.g. SSD), accelerators (e.g. GPU), etc.
A core is the part of a processor that does the computations. A processor comprises multiple cores, as well as a memory controller, a bus controller, and possibly many other components. A processor in the Slurm context is referred to as a socket, which is actually the name of the slot on the motherboard that hosts the processor. A single core can have one or two hardware threads. This is a technology that allows virtually doubling the number of cores the operating system perceives while only duplicating part of each core's components -- typically the architectural state such as registers, not the computation components. Hardware multi-threading is very often disabled in HPC.
A CPU, in a general context, refers to a processor, but in the Slurm context a CPU is a consumable resource offered by a node. It can refer to a socket, a core, or a hardware thread, depending on the Slurm configuration.
The role of Slurm is to match those resources to jobs. A job comprises one or more (sequential) steps, and each step has one or more (parallel) tasks. A task is an instance of a running program, i.e. a process, possibly along with subprocesses or software threads.
Multiple tasks are dispatched on possibly multiple nodes depending on how many cores each task needs. The number of cores a task needs depends on the number of subprocesses or software threads in the instance of the running program. The idea is to map each software thread to one core, and to make sure that all the cores assigned to a task are on the same node.
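To make that concrete, here is a minimal sketch of a batch script (the job name and ./my_program are placeholders) requesting 2 nodes with 4 tasks per node, each task getting 4 CPUs for its software threads:

    #!/bin/bash
    #SBATCH --job-name=example        # placeholder job name
    #SBATCH --nodes=2                 # 2 compute nodes
    #SBATCH --ntasks-per-node=4       # 4 tasks (processes) per node, 8 in total
    #SBATCH --cpus-per-task=4         # 4 CPUs (cores) per task, all on one node
    #SBATCH --time=00:10:00           # wall-time limit for the job

    # One software thread per allocated core, e.g. for an OpenMP program
    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

    # srun creates a job step and launches the 8 tasks across the 2 nodes
    srun ./my_program

Slurm guarantees the 4 CPUs of each task land on the same node, while the 8 tasks may be spread across both nodes.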

Related

Why is executing two threads on two logical cores better than executing two threads on one physical core?

Modern CPU specifications seem to always mention twice the number of threads for each core. If it is a 4-core processor, the number of threads mentioned is 8. If it is a 6-core processor, the number of threads is 12.
At first, this felt confusing since, as far as I am aware, a single physical core can execute only a single thread at a time. Furthermore, any number of threads can be executed "simultaneously" with context switching. So, why even mention the number of threads in specifications?
Intel claims that this is possible in their processors due to Hyper-threading, which is their implementation of Simultaneous multithreading. I believe AMD has their own version.
Intel's explanation is that they expose each physical core as two "logical" cores. I am wondering how that improves performance. Why is it more efficient to execute two threads on two logical cores, rather than execute two threads on a single physical core with the help of context switching? The two logical cores are backed by a single physical core anyway. So, how is Simultaneous multithreading making a difference?
There seem to be some hardware implementations that make it more efficient. My vague understanding, after going through the wiki on simultaneous multithreading, is that instructions from multiple threads are actually being executed at the same time. Apparently, that is the key difference. But I do not understand why exposing one physical core as two logical cores is necessary. Is it to make the operating system serve more threads at the same time to the CPU? The operating system would think there are twice the number of cores, and therefore serve twice the number of threads to the CPU. The OS would control the context switching, while the physical cores use their hardware capability to simultaneously execute the two threads served to each of them via their respective logical cores. Is that what happens?
Follow-up
A follow-up question would be, why not just specify them as logical cores in specifications? Why call them threads? For example:
Intel® Core™ i7-11600H Processor
# of Physical Cores 6
# of Logical Cores 12

Do different threads of a process running on different physical cores of a multi-core processor need separate contexts?

A process is the smallest unit for allocating resources. The thread is the smallest scheduling unit.
Does this mean that a process contains at least one thread? Is a thread equal to the process when there is only one thread in the process?
Many processors today are multi-core. Say I have a process P containing two threads, A and B. I want A and B to run on core 0 and core 1 of a CPU respectively.
But we know that a process needs to be allocated resources, and that a process has a context. Is the context generally stored in registers? If so, different physical cores use different physical registers when thread A and thread B run on core 0 and core 1 respectively.
So do these two cores each need to be allocated resources? In that case, how do the two threads maintain consistency? If each thread has its own resources, hasn't this become two processes? Does this mean that different threads of a process running on different cores are the same as different processes running on different cores?
The vast majority of resources in an SMP system, with the exception of registers and processing capacity, are shared across all cores. This includes memory. So the operating system can schedule multiple threads on different cores, all pointing to a shared set of process resources in memory.
CPU caches are handled by the cores using a cache coherency protocol. As long as a thread follows the memory model, with correct use of memory barriers/atomic instructions, the memory visible through the caches appears the same to all cores.
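As a minimal illustration of this (not from the original answer; plain C11 with pthreads), two threads of one process increment a single shared counter. Cache coherency plus the atomic read-modify-write guarantees both cores observe the same variable:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int counter;             // shared: one copy in process memory

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            atomic_fetch_add(&counter, 1); // atomic RMW, coherent across cores
        return NULL;
    }

    int main(void) {
        pthread_t a, b;                    // may be scheduled on different cores
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("%d\n", atomic_load(&counter));  // always 2000000
        return 0;
    }

Compile with cc -pthread. Replacing the atomic with a plain int++ could lose updates, because the increment would no longer be a single coherent operation.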

What is the difference in scheduling threads?

I am currently learning about simultaneous multi-threading, multi-core, and multi-processor scheduling of threads. I checked some information; my understanding is:
If a processor supports simultaneous multi-threading, it turns one physical core into two logical cores. Suppose there are two processes, P1 and P2.
My understanding: in Linux, each process is composed of at least one thread, so scheduling is based on threads?
P1 and P2 are scheduled onto the two logical cores and run independently. That is the first situation. Now suppose a process P3 consists of two threads, t1 and t2, and t1 and t2 are scheduled onto different logical cores. What is the difference between scheduling two different processes onto separate logical cores and scheduling different threads of the same process onto separate logical cores?
My understanding: The process is the smallest unit of the system to allocate resources, and threads share the resources of the process. Threads in a process share virtual memory and the PCB, and can access the same data. Therefore, when scheduling different threads in a process and scheduling threads in different processes, there is no difference for the processor. The difference lies in the address translation of the page table and whether the cache can be shared. For a multi-core processor, the processor does not care whether the threads belong to the same process. The consistency of the data is guaranteed by MESI. The physical location of the data is guaranteed by the page table.
Is my understanding correct?
Right, there's no difference. The kernel just schedules tasks; each user task refers to a page table (whether that's shared with any other task or not).
Each logical CPU core has its own page-table pointer (e.g. x86 CR3).
And yes, cache coherency is maintained by hardware. The Linux kernel's hand-rolled atomics (using volatile, and inline asm for RMWs and barriers) depend on that.
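To see the "kernel just schedules tasks" view concretely, here is a minimal Linux sketch (assumes glibc 2.30+ for gettid(); on older systems use syscall(SYS_gettid)). Both threads print the same PID because they share one address space and page table, but different TIDs because each is a separate schedulable task:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>   // getpid(), gettid()

    static void *show(void *arg) {
        (void)arg;
        // Same PID (one process, one page table), distinct TID per kernel task
        printf("pid=%d tid=%d\n", (int)getpid(), (int)gettid());
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, show, NULL);
        pthread_create(&t2, NULL, show, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }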

Threads vs processes: are the visualizations correct?

I have no background in Computer Science, but I have read some articles about multiprocessing and multi-threading, and would like to know if this is correct.
SCENARIO 1: HYPER-THREADING DISABLED
Let's say I have 2 cores, with 3 threads 'running' (competing?) per core, as shown in the picture (hyper-threading disabled). Then I take a snapshot at some moment and observe, for example, that:
Core 1 is running Thread 3.
Core 2 is running Thread 5.
Are these declarations (and the picture) correct?
A) There are 6 threads running concurrently.
B) There are 2 threads (3 and 5) (and processes) running in parallel.
SCENARIO 2: HYPER-THREADING ENABLED
Let's say I have hyper-threading enabled this time.
Are these declarations (and the picture) correct?
C) There are 12 threads running concurrently.
D) There are 4 threads (3, 5, 7, 12) (and processes) running in 'almost' parallel on the vCPUs?
E) There are 2 threads (5, 7) running 'strictly' in parallel?
A process is an instance of a program running on a computer. The OS uses processes to maximize utilization, support multi-tasking, protection, etc.
Processes are scheduled by the OS - time sharing the CPU. All processes have resources like memory pages, open files, and information that defines the state of a process - program counter, registers, stacks.
In CS, concurrency is the ability of different parts or units of a program, algorithm or problem to be executed out-of-order or in a partial order, without affecting the final outcome.
A "traditional process" is when a process is an OS abstraction to present what is needed to run a single program. There is NO concurrency within a "traditional process" with a single thread of execution.
However, a "modern process" is one with multiple threads of execution. A thread is simply a sequential execution stream within a process. There is no protection between threads since they share the process resources.
Multithreading is when a single program is made up of a number of different concurrent activities (threads of execution).
There are a few concepts that need to be distinguished:
Multiprocessing is when we have multiple CPUs.
Multiprogramming is when the CPU executes multiple jobs or processes.
Multithreading is when the CPU executes multiple threads per process.
So what does it mean to run two threads concurrently?
The scheduler is free to run threads in any order and with any interleaving, e.g. FIFO or random. It can choose to run each thread to completion, or to time-slice in big chunks or in small chunks.
A concurrent system supports more than one task by allowing all tasks to make progress. A parallel system can perform more than one task simultaneously. It is possible though, to have concurrency without parallelism.
Uniprocessor systems provide the illusion of parallelism by rapidly switching between processes (well, actually, the CPU scheduler provides the illusion). Such processes are running concurrently, but not in parallel.
Hyperthreading is Intel’s name for simultaneous multithreading. It basically means that one CPU core can work on two problems at the same time. It doesn’t mean that the CPU can do twice as much work. Just that it can ensure all its capacity is used by dealing with multiple simpler problems at once.
To your OS, each real silicon CPU core looks like two, so it feeds each one work as if they were separate. Because so much of what a CPU does is not enough to work it to the maximum, hyperthreading makes sure you’re getting your money’s worth from that chip.
There are a couple of things that are wrong (or unrealistic) about your diagrams:
A typical desktop or laptop has one processor chipset on its motherboard. With Intel and similar, the chipset consists of a CPU chip together with a "northbridge" chip and a "southbridge" chip.
On a server class machine, the motherboard may actually have multiple CPU chips.
A typical modern CPU chip will have more than one core; e.g. 2 or 4 on low-end chips, and up to 28 (for Intel) or 64 (for AMD) on high-end chips.
Hyperthreading and VCPUs are different things.
Hyperthreading is Intel proprietary technology [1] which allows one physical core to act as two logical cores running two independent instruction streams in parallel. Essentially, the physical core has two sets of registers; i.e. 2 program counters, 2 stack pointers, and so on. The instructions for the two instruction streams share the instruction execution pipelines, on-chip memory caches, and so on. The net result is that for some instruction mixes (non-memory-intensive) you get significantly better performance than if the pipelines were dedicated to a single instruction stream. The operating system sees each hyperthread as if it were a dedicated core, albeit a bit slower.
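You can observe this "each hyperthread looks like a dedicated core" effect from software. A tiny POSIX sketch; on, say, a 6-core/12-thread chip it prints 12:

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        // Logical CPUs the OS sees = physical cores x hardware threads per core
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        printf("online logical CPUs: %ld\n", n);
        return 0;
    }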
VCPU or virtual CPU is terminology used in a cloud computing context. On a typical cloud computing server, the customer gets a virtual server that behaves like a regular single- or multi-core computer. In reality, there will typically be many of these virtual servers on a compute node. Special software called a hypervisor mediates access to the hardware devices (network interfaces, disks, etc.) and allocates CPU resources according to demand. A VCPU is a virtual server's view of a core, and is mapped to a physical core by the hypervisor. (The accounting trick is that VCPUs are typically overcommitted; i.e. the sum of VCPUs is greater than the number of physical cores. This is fine ... unless the virtual servers all get busy at the same time.)
In your diagram, you are using the term VCPU where the correct term would be hyperthread.
Your diagram shows each core (or hyperthread) associated with a distinct group of threads. In reality, the mapping from cores to threads is more fluid. If a core is idle, the operating system is free to schedule any (runnable) thread to run on it. (Some operating systems allow you to tie a given thread to a specific core for performance reasons, but it is rarely necessary to do this.)
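For the core-pinning case just mentioned, here is a sketch using the GNU-specific pthread_attr_setaffinity_np() on Linux (the choice of logical CPU 2 is arbitrary, and as noted above, pinning is rarely necessary):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static void *work(void *arg) {
        (void)arg;
        // sched_getcpu() reports the logical CPU this thread landed on
        printf("running on CPU %d\n", sched_getcpu());
        return NULL;
    }

    int main(void) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(2, &set);                 // allow only logical CPU 2

        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setaffinity_np(&attr, sizeof(set), &set);

        pthread_t t;
        pthread_create(&t, &attr, work, NULL);  // thread starts already pinned
        pthread_join(t, NULL);
        return 0;
    }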
Your observations about the first diagram are correct.
Your observations about the second diagram are slightly incorrect. As stated above, the hyperthreads on a core share the execution pipelines, which means that they are effectively executing at the same time. There is no "almost parallel". As I said above, it is simplest to think of a hyperthread as a core "that runs a bit slower".
[1] Intel was not the first to come up with this idea. For example, CDC mainframes used it in the 1960s to get 10 PPUs from a single core and 10 sets of registers. This was before the days of pipelined architectures.

Operating Systems - Organization Questions

I am studying for a 3-topic comprehensive exam that decides if I graduate or not, and I have some questions on Operating System Organization.
A) How does a multicore computer with shared memory differ from a distributed or a clustered system with respect to OS? Make specific reference to the OS Kernel.
B) Briefly explain the difference between processes and threads
C) Threads on a single core system are often handled in User mode. Explain why this is not acceptable on a multicore computer
D) Explain at least 2 ways that the OS can handle threads on a multicore computer
Here are my attempted answers.
A) Multicore is a single processor chip which has multiple cores that work together to speed up processing; however, since the cores share memory, the kernel already knows the state of all of them. Distributed and clustered systems use message passing, and each kernel must always alert the others to what it is doing.
B) Processes refer to the high-level, heavyweight task, which can usually be broken down into smaller individual tasks (threads). Threading a single process allows for the abstraction of multiprocessing, allowing concurrent actions to take place.
C) I do not know, but my guess is that the OS must properly distribute tasks in kernel mode.
D) Assign processes per core, or assign threads per core. If you assign processes per core, the core will iterate through all the threads of the process while the other core works on another process. If you assign threads per core, each core will work on a group of threads that relate to the same process.
Please let me know if anyone has anything that can help my understanding, especially on OS organization topics.
Thanks in advance
A. How does a multi-core computer differ from a distributed or a clustered system with respect to the OS?
a. Clustered systems are typically constructed by combining multiple computers into a single system to perform a computational task distributed across the cluster. Multiprocessor systems, on the other hand, can be a single physical entity comprising multiple CPUs. Clustered systems communicate via messages, while multiprocessors communicate via shared memory.
B. Briefly explain the difference between process and thread?
a. Both processes and threads are independent sequences of execution. The typical difference is that threads (of the same process) run in a shared memory space, while processes run in separate memory spaces.
C. Threads on a single core system are often handled in user mode. Explain why this is not acceptable on a multicore computer.
a. A multithreaded application running on a traditional single-core chip would have to interleave its threads. On a multicore chip, however, the threads can be spread across the available cores.
D. Explain at least 2 ways the OS can handle threads on a multicore computer
a. Data parallelism - divide the data among the cores and perform the same task on each subset of the data.
b. Task parallelism - divide the different tasks to be performed among the cores and perform them simultaneously. (A sketch contrasting the two follows below.)
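As an illustration of the data-parallel case (a hedged sketch, not from the original answer; the array contents and the choice of two threads are arbitrary), each thread performs the same summing task on its own half of one shared array; task parallelism would instead hand each thread a different function:

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    static int data[N];          // shared input array
    static long partial[2];     // one partial sum per thread

    /* Data parallelism: the same task (summing) applied to different
       subsets of the data, one subset per core. */
    static void *sum_range(void *arg) {
        long id = (long)arg;
        long lo = id * (N / 2), hi = lo + N / 2, s = 0;
        for (long i = lo; i < hi; i++)
            s += data[i];
        partial[id] = s;
        return NULL;
    }

    int main(void) {
        for (long i = 0; i < N; i++)
            data[i] = 1;

        pthread_t t[2];
        for (long id = 0; id < 2; id++)
            pthread_create(&t[id], NULL, sum_range, (void *)id);
        for (int id = 0; id < 2; id++)
            pthread_join(t[id], NULL);

        /* Task parallelism would instead start two *different* functions,
           e.g. one thread summing while another sorts. */
        printf("sum = %ld\n", partial[0] + partial[1]);
        return 0;
    }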
