INTRO
multiprocessing = using multiple CPU cores to complete a task (each core has separate memory, thus requires pipes and data structures for each core to "talk" to each other")
multithreading = using multiple threads (that are on a single CPU core) with a task scheduler to complete a task (all threads share same memory on CPU core)
static (temporal) multithreading - take advantage of idle I/O time by scheduling tasks to occur sequentially without pause during cache misses (i.e. waiting to read/write to an I/O device); used for I/O-bound tasks
dynamic (simultaneous) multithreading - take advantage of instructions that can happen at the same time (on Intel chips, this is called "Hyperthreading"); used for CPU-bound tasks
e.g.
a = b*c //Task 1
d = e*f //Task 2
g = a*d //Task 3
// Task 1 and 2 don't depend on each other, and hence can be run in parallel
QUESTION
Given the above, how can I control in LabVIEW which cores I use to multiprocess a task (not multithread)?
LabVIEW inherently parses the dataflow out to multiple processors and multiple threads to as much parallelism as the system is analyzed to stand. THERE ARE ALMOST ZERO CASES WHERE YOU SHOULD SPECIFY THE THREADING MODEL OF THE CODE. The Timed Loop and Timed Structure capabilities should be considered strictly for real-time systems, not for execution on the desktop systems (Windows, Mac, or Linux). If you attempt to specify the threading model, you will almost certainly get less performance than the sophisticated model already computed by the compiler and run-time engine.
As of NI LabVIEW version 8.5 the Timed Loop and Timed Sequence structures include a Processor input that allows you to manually assign available processors to handle the execution of the structures. You can configure the processor assignment by wiring an input to the Processor input of the Input Node for the structure or for frames of the structure.
http://www.ni.com/product-documentation/6400/en/
Related
When I run lscpu on my host, it shows
CPU(s): 8
Thread(s) per core: 2
Core(s) per socket: 4
My host has 4 physical CPUs, but 8 logical CPUs due to 2 threads per core. ok, "2 threads per core" means one core can execute 2 threads simultaneously so as if we have doubled the CPU capacity? So this is parallel concept?
While we have another concept that "one process can have multiple threads", I believe this means one process can handle multiple threads concurrently by switching context, but not necessarily in parallel. In most cases one CPU can execute one thread at a time, right?
I'd like to confirm my understanding above is correct. Thanks
Ref for concurrent and parallel difference: What is the difference between concurrency and parallelism?
This concept is called Simultaneous multithreading (SMT). It is implemented in many processor, from x86-64 (both AMD and Intel) to POWER. The idea is to execute 2 threads concurrently. Some operation can be parallel regarding the specific target architecture.
one core can execute 2 threads simultaneously so as if we have doubled the CPU capacity?
No. Hardware threads (also called logical cores) are not equivalent to cores (ie. in opposition to physical cores). Some processor units are statically allocated for the hardware threads while some units are dynamically allocated for the hardware thread meaning the threads share the available resources.
The initial idea was to execute something useful when a core was stalling on some operations like memory reads. With 2 hardware threads, a core can execute the instructions of another thread if the current one is waiting on memory, for example due to a cache miss. Memory-bound parallel codes that are limited by the latency of the RAM like naive transpositions or linked-list traversals can benefit from this mechanism.
The SMT implementation has significantly improved over time. Especially in x86-64 processor recently. Nowadays, hardware threads of modern processor can execute computing instructions truly in parallel. For example, an Intel Skylake processor can execute up to 4 arithmetic instructions at a time per cycle, thanks to 4 ALUs. 1 thread can execute 4 instructions per cycle only if the instructions are independent (during the target cycles). This is not always possible as some loops are inherently sequential and do not contain enough independent instruction for each loop (eg. cumulative sum). With a 2-way SMT enabled, 2 software threads can be scheduled on the same core and the core can execute 2 instructions of each thread completely in parallel in a given cycle. It can even load balance the number of instruction regarding the needs of each thread in real time (eg. 1 vs 3 instructions per cycle). In the end, latency-bound codes can be up to 2 times faster on a 2-way SMT processor like Skylake. That being said, it does not speed up codes that can already fully use all the available processor computing units. For example, a parallel matrix multiplication using an optimized BLAS library will nearly always be slower with 2 software threads running per core than with only 1 software thread per core. The execution can be slower because hardware thread share some resources like caches and they can conflict each other with 2 threads per core running simultaneously. Put it shortly, efficient codes should not benefit from it, but people tends to write inefficient code and it is not rare for compilers to fail to generate efficient codes saturating computing units of a core (they often need some help).
While we have another concept that "one process can have multiple threads", I believe this means one process can handle multiple threads concurrently by switching context, but not necessarily in parallel.
I would like to set the record straight: software threads and hardware threads are two very different things despite the name.
A software thread is a logical OS unit that can be scheduled on a hardware thread. A hardware thread can be seen as a physical part of a processor core (it is actually a naive simplistic view). A software thread is a part of an OS process. The OS is responsible for the scheduling of the ready software threads. Processes are not scheduled, software threads are (at least on a modern OS). 2 software threads of 2 different processes can run in parallel on a processor with multiple cores (or even on some 2-way SMT cores).
In most cases one CPU can execute one thread at a time, right?
The term "CPU" is not clear here: it can mean different things regarding the context.
If "one CPU" means a modern microprocessor chip that is typically a multicore one nowadays, then definitively no. Software threads can truly run in parallel on different cores for examples.
If "one CPU" means a core (like often in high-performance computing), then it depends: a 1-way SMT core can execute only 1 thread at a time while a 2-way SMT core can execute 2 thread at a time.
On old microprocessor chip with 1 core and no SMT, it was true that one thread was running at a time and context switches was used to execute thread concurrently from the user point-of-view but not in parallel. This time is long gone (since nearly 2 decades) except maybe on some embedded microprocessor chips.
Is this...parallel?
Maybe.
Hyperthreading is Intel's trademark* for processor cores that have two complete sets of context registers. A hyperthreaded CPU can concurrently execute code on behalf of two threads without any intervention by the operating system (i.e., with no need for context switching.)
The extent to which those two concurrent executions actually are parallel executions varies from CPU model to model, and it depends on what the two threads actually are doing. For example (I'm just making this part up because it's been a few decades since I've needed to worry about any particular CPU architecture) if some "hyperthreaded" CPU has two integer ALUs per core, then the two threads might both be able to perform integer operations in parallel, but if it has only one FPU per core, then they would have to take turns using it.
Some Hyperthreaded CPU models have more duplicate execution units than others have, and so can parallelize more parts of the execution.
* AMD calls their similar capability, "2-way simultaneous multithreading."
So what I am aware of is that Simultaneous Multithreading (Intel's Hyperthreading for example) enables a single CPU core to efficiently manage several threads at once. And most explainations I find is that it's like you have more than one core at your disposal. But what I'm wondering is if this is what is actually going on at a low level (machine level)? Or is it more like to the OS it just looks ike it is being operated on 2 cores, but in the end Simultaneous Multithreading just makes it much more efficient at going back and forth between two (or more) different threads, giving the illusion of having more than one core?
Simultancous multithreading is defined in "Simultaneous Multithreading: Maximizing On-Chip Parallelism" (Dean M. Tullsen et al., 1995, PDF) as "a technique permitting several independent threads to issue instructions to a superscalar’s multiple functional units in a single cycle" ("issue" means initiation of execution — an alternative use of the term means entering into an instruction scheduler). "Simultaneous" refers to the issue of instructions from different threads at the same time, distinguishing SMT from fine-grained multithreading that rapidly switches between threads in execution (e.g., choosing each cycle which thread's instructions to execute) and switch-on-event multithreading (which is more similar to OS-level context switches).
SMT implementations often interleave instruction fetch and decode and commit, making these pipeline stages look more like those of a fine-grain multithreaded or non-multithreaded core. SMT exploits an out-of-order superscalar already choosing dynamically between arbitrary (within a window) instructions recognizing that typically execution resources are not fully used. (In-order SMT provides relatively greater benefits since in-order execution lacks the latency hiding of out-of-order execution, but the pipeline control complexity is increased.)
A barrel processor (pure round-robin, fine-grained thread scheduling with nops issued for non-ready threads) with shared caches would look more like separate cores at 1/thread_count the clock frequency (and shared caches) since such lacks dynamic contention for execution resources. It is also arguable that having instructions from multiple threads in the processor pipeline at the same time represents parallel instruction processing; distinct threads can have instructions being processed (in different pipeline stages) at the same time. Even with switch-on-event multithreading, a cache miss can be processed in parallel with the execution of another thread, i.e., multiple instructions from another thread can be processed during the "processing" of a load instruction.
The distinction from OS-level context switching can be even more fuzzy if the ISA provides instructions that are not interrupt-atomic. For example, on x86 a timer interrupt can lead an OS to perform a context switch while a string instruction is in progress. In some sense, during the entire time slice of the other thread, the string instruction might be considered still to be "executing" since its operation was not completed. With hardware prefetching, some degree of forward progress of the earlier thread might, in theory, continue past the time when another thread starts running, so even a requirement of simultaneous activity in multiple threads might be satisfied. (If processing of long x86 string instructions was handed off to an accelerator, the instruction might run fully in parallel with another thread running on the core that initiated the instruction.)
I have no background in Computer Science, but I have read some articles about multiprocessing and multi-threading, and would like to know if this is correct.
SCENARIO 1:HYPERTHREADING DISABLED
Lets say I have 2 cores, 3 threads 'running' (competing?) per core, as shown in the picture (HYPER-THREADING DISABLED). Then I take a snapshot at some moment, and I observe, for example, that:
Core 1 is running Thread 3.
Core 2 is running Thread 5.
Are these declarations (and the picture) correct?
A) There are 6 threads running in concurrency.
B) There are 2 threads (3 and 5) (and processes) running in parallel.
SCENARIO 2:HYPERTHREADING ENABLED
Lets say I have MULTI-THREADING ENABLED this time.
Are these declarations (and the picture) correct?
C) There are 12 threads running in concurrency.
D) There are 4 threads (3,5,7,12) (and processes) running in 'almost' parallel, in the vcpu?.
E) There are 2 threads (5,7) running 'strictlÿ́' in parallel?
A process is an instance of a program running on a computer. The OS uses processes to maximize utilization, support multi-tasking, protection, etc.
Processes are scheduled by the OS - time sharing the CPU. All processes have resources like memory pages, open files, and information that defines the state of a process - program counter, registers, stacks.
In CS, concurrency is the ability of different parts or units of a program, algorithm or problem to be executed out-of-order or in a partial order, without affecting the final outcome.
A "traditional process" is when a process is an OS abstraction to present what is needed to run a single program. There is NO concurrency within a "traditional process" with a single thread of execution.
However, a "modern process" is one with multiple threads of execution. A thread is simply a sequential execution stream within a process. There is no protection between threads since they share the process resources.
Multithreading is when a single program is made up of a number of different concurrent activities (threads of execution).
There are a few concepts that need to be distinguished:
Multiprocessing is whenwe have Multiple CPUs.
Multiprogramming when the CPU executes multiple jobs or processes
Multithreading is when the CPU executes multiple mhreads per Process
So what does it mean to run two threads concurrently?
The scheduler is free to run threads in any order and interleaving a FIFO or Random. It can choose to run each thread to completion or time-slice in big chunks or small chunks.
A concurrent system supports more than one task by allowing all tasks to make progress. A parallel system can perform more than one task simultaneously. It is possible though, to have concurrency without parallelism.
Uniprocessor systems provide the illusion of parallelism by rapidly switching between processes (well, actually, the CPU schedulers provide the illusion). Such processes were running concurrently, but not in parallel.
Hyperthreading is Intel’s name for simultaneous multithreading. It basically means that one CPU core can work on two problems at the same time. It doesn’t mean that the CPU can do twice as much work. Just that it can ensure all its capacity is used by dealing with multiple simpler problems at once.
To your OS, each real silicon CPU core looks like two, so it feeds each one work as if they were separate. Because so much of what a CPU does is not enough to work it to the maximum, hyperthreading makes sure you’re getting your money’s worth from that chip.
There are a couple of things that are wrong (or unrealistic) about your diagrams:
A typical desktop or laptop has one processor chipset on its motherboard. With Intel and similar, the chipset consists of a CPU chip together with a "northbridge" chip and a "southbridge" chip.
On a server class machine, the motherboard may actually have multiple CPU chips.
A typical modern CPU chip will have more than one core; e.g. 2 or 4 on low-end chips, and up to 28 (for Intel) or 64 (for AMD) on high-end chips.
Hyperthreading and VCPUs are different things.
Hyperthreading is Intel proprietary technology1 which allows one physical to at as two logical cores running two independent instructions streams in parallel. Essentially, the physical core has two sets of registers; i.e. 2 program counters, 2 stack pointers and so on. The instructions for both instruction streams share instruction execution pipelines, on-chip memory caches and so on. The net result is that for some instruction mixes (non-memory intensive) you get significantly better performance than if the instruction pipelines are dedicated to a single instruction stream. The operating system sees each hyperthread as if it was a dedicated core, albeit a bit slower.
VCPU or virtual CPU terminology used in cloud computing context. On a typical cloud computing server, the customer gets a virtual server that behaves like a regular single or multi-core computer. In reality, there will typically be many of these virtual servers on a compute node. Some special software called a hypervisor mediates access to the hardware devices (network interfaces, disks, etc) and allocates CPU resources according to demand. A VCPU is a virtual server's view of a core, and is mapped to a physical core by the hypervisor. (The accounting trick is that VCPUs are typically over committed; i.e. the sum of VCPUs is greater than the number of physical cores. This is fine ... unless the virtual servers all get busy at the same time.)
In your diagram, you are using the term VCPU where the correct term would be hyperthread.
Your diagram shows each core (or hyperthread) associated with a distinct group of threads. In reality, the mapping from cores to threads is more fluid. If a core is idle, the operating system is free to schedule any (runnable) thread to run on it. (Some operating systems allow you to tie a given thread to a specific core for performance reasons. It is rarely necessary to do this.)
Your observations about the first diagram are correct.
Your observations about the second diagram are slightly incorrect. As stated above the hyperthreads on a core share the execution pipelines. This means that they are effectively executing at the same time. There is no "almost parallel". As I said, above, it is simplest to think of a hyperthread as a core "that runs a bit slower".
1 - Intel was not the first computer to com up with this idea. For example, CDC mainframes used this idea in the 1960's to get 10 PPUs from a single core and 10 sets of registers. This was before the days of pipelined architectures.
Can we take full benefits of Multi core architecture without Multi threading.?
Can we take full benefits of Multi core architecture without Multi threading.?
For conventional environments; you can take some of the benefits of multi-CPU without multi-threading (e.g. if you've got 8 CPUs and you're running 8 separate single-threaded processes, then...).
For non-conventional environments, who knows? For an example, maybe the entire system uses the actor model (software divided into separate/independent objects where each object is an event handler), where the OS has a queue of pending events, and each CPU does "get event from queue, execute the corresponding object's event handler for that event" in a loop. In this case you can say that there's no threads at all (just CPUs and events) and therefore there is no multi-threading.
Can we take full benefit of multicores without multithreading ? Definitely no. But we can still have some parallelism.
As already answered, we can have several independent processes running on different processors to improve global computer performances.
And it is still possible to do parallel processing by means of interprocess communication (IPC) as pipes or shared memory. For instance, if doing
taskset 0x01 sort | taskset 0x02 uniq
you will run two processes, sort on core 0 and uniq on core 1, and these process will communicate by a pipe (implemented in the shared memory). Note that this just an example and that OSes do run new processes on different cores without the taskset directive.
With posix shared memory IPC, you can do parallel processes running on different cores and exchanging data in a dedicated memory zone.
And you can use openMPI to run multiprocess parallel programs on a multicore. The shared memory will be used to implement MPI message passing.
But in either case, compared to multithreading, the programming burden will be higher and performances much lower.
I have been reading lately about system architecture and the topic of multi-threading has not been covered in detail with latest improvements in technology. I did my part of search, but could not find answers for the following:
The questions have are
1) Is multi-threading dependent on the system architecuture (CPU). do all CPU (single core) support multi-threading? If it does not, what happens to multi-threaded applications when run on those machines
It is cited here that
Intel CPUs support multithreading, but only two threads per CPU.
AMD CPUs do not support multithreading and AMD often sites Microsoft's
recommendations to turn off Hyperthreading on Intel CPUs when running applications
like peoplesoft and Exchange.
2) so what does it mean it say only two threads per CPU here. At any given time, CPU (single core) can process only thread. and the other thread is waiting to be processed correct?
3) how is it different from an application that spawns, say, 10 threads and waiting for them to be executed. If the CPU at the most can tackle only two threads, shouldn't programmer keep that fact in consideration when writing multi-threaded applications.
Even with multi-core processors (say quad-core) at the most 8 threads can be queued, but only 4 threads can be processed at the same time.
P.S: I have a read a little about hyper-threading but I am not sure if that is relevant here and if
all processors support hyper-threading
1) It depends on the operating system more than anything. Even for single core architectures, multi-threading can be supported, but the threads are not executing in parallel - The OS will context-switch between them.
2) Intel usually supports two-way hardware threading ( also called simultaneous multi-threading), where each thread is allocated a pipeline. So if you have a process with two threads they can both execute on the same core simultaneously.
3) See 1. Basically the operating system is going to allocate as many threads as it can to hardware before it plans to context-switch between the threads it couldn't allocate. This process is dependent on the OS's scheduler, and you can read about the Linux one to get a good idea of what's going on.
Edit: Hypethreading is basically the hardware threading feature I mentioned.
In your question CPU means core.
1) It does. I believe memory access on ARMs is in words, so write to char is not atomic
Also memory ordering differs Modern OSes (anything but DOS) support context switching: while one thread executes, others wait. Total number of threads in all Windows processes is about 1000. Common time quant (time to load CPU) is 1-10 ms. One core multithreading don't improve computational power but allows asynchronous tasks. For example GUI doesn't freeze during network activity. One threads waits net, another one responds to user activity.
2) Yes
3) It is common practice to spawn number of threads equal to number of (virtual) cores, ie number of cores in system for AMD and twice for Intel. It is true only for computational threads. Web server threads usually wait net and don't load CPU a lot, so it is better to spawn thousands of threads.
Hyperthreading is cool for tasks that wait RAM. While one thread waits data another one executes. For math it usually not increase performance. It is good for work with data that is not cache-friendly: lists, trees, hash tables that don't fit into cache.