Purpose of multiprocessors and multi-core processors - multithreading

I want to clarify things in my head and form concrete knowledge. In a dual-core, one-processor system, only two threads within the one process can be executed concurrently, one by each core. In a uni-core, two-processor system, two different processes can be executed, one by each CPU.
So can we say that each processor can execute processes concurrently, while a multi-core processor executes threads within the process concurrently?

I think you have a fundamental misunderstanding of what a process and thread are and how they relate to the hardware itself.
A CPU core can only execute 1 machine-level instruction per clock cycle (so essentially, just 1 assembly instruction). CPUs are typically measured by the number of clock cycles they go through in a second. So, by this simple model, a 2.5 GHz core can execute 2.5 billion instructions per second.
The OS (the operating system, like Windows, Linux, macOS, Android, iOS, etc.) is responsible for launching programs and giving them access to the hardware resources. Each program can be considered a "process".
Each process can launch multiple threads.
To ensure that multiple processes can share the same hardware resources, the idea of pre-emptive multitasking came about over 40 years ago.
In a nutshell, pre-emptive multitasking, or time-slicing, is a function of the OS. It gives a few milliseconds to each thread that is ready to run, regardless of which process that thread is a part of, and saves the "context" of each thread so that each thread's state can be restored when it is that thread's turn to run again; that's also known as a context switch.
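As an aside, on Linux you can actually watch this happening: the kernel keeps per-process context-switch counters in /proc/self/status. A minimal sketch that prints them (Linux-specific; the field names come from proc(5)):

#include <stdio.h>
#include <string.h>

/* Print this process's context-switch counters (Linux-specific). */
int main(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    char line[256];
    while (fgets(line, sizeof line, f)) {
        /* voluntary = the thread yielded (e.g. blocked on I/O);
           nonvoluntary = the scheduler pre-empted it at the end
           of its time slice. */
        if (strstr(line, "ctxt_switches"))
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}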
A dual-, quad-, or even 128-core CPU does not change that, nor does the number of CPUs in the system (e.g. 4 CPUs, each with 128 cores). Each core can only execute 1 instruction per clock cycle.
What changes is how many instructions can run truly in parallel. If my CPU has 16 cores, then it can execute 16 instructions per clock cycle, and thus run 16 separate threads of execution without any context switching being necessary (context switches still happen, but that's a different issue).
This doesn't cover hyper-threading, in which 1 core can execute 2 instructions per cycle, essentially doubling your logical CPU count, and it doesn't cover cache misses or other low-level details where extra cycles get spent on a thread, but it covers the general idea of CPU scheduling.
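For reference, you can ask the OS how many of these hardware execution slots it sees. A minimal C sketch using sysconf (a POSIX/glibc facility; Windows would use GetSystemInfo instead):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* _SC_NPROCESSORS_ONLN counts logical processors (hardware threads),
       not physical cores, so a 16-core chip with 2-way SMT reports 32. */
    printf("logical processors online: %ld\n",
           sysconf(_SC_NPROCESSORS_ONLN));
    return 0;
}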

What does it mean when we say "4 cores 8 threads"?

When I run lscpu on my host, it shows
CPU(s): 8
Thread(s) per core: 2
Core(s) per socket: 4
My host has 4 physical cores, but 8 logical CPUs due to 2 threads per core. OK, so "2 threads per core" means one core can execute 2 threads simultaneously, as if we have doubled the CPU capacity? So this is a parallel concept?
While we have another concept that "one process can have multiple threads", I believe this means one process can handle multiple threads concurrently by switching context, but not necessarily in parallel. In most cases one CPU can execute one thread at a time, right?
I'd like to confirm my understanding above is correct. Thanks
Ref for concurrent and parallel difference: What is the difference between concurrency and parallelism?
This concept is called simultaneous multithreading (SMT). It is implemented in many processors, from x86-64 (both AMD and Intel) to POWER. The idea is to execute 2 threads concurrently on the same core. Some operations can be truly parallel, depending on the specific target architecture.
one core can execute 2 threads simultaneously, as if we have doubled the CPU capacity?
No. Hardware threads (also called logical cores) are not equivalent to cores (i.e., as opposed to physical cores). Some processor units are statically partitioned between the hardware threads, while other units are dynamically allocated, meaning the threads share the available resources.
The initial idea was to execute something useful while a core was stalling on operations like memory reads. With 2 hardware threads, a core can execute the instructions of another thread if the current one is waiting on memory, for example due to a cache miss. Memory-bound parallel codes that are limited by RAM latency, like naive matrix transpositions or linked-list traversals, can benefit from this mechanism.
The SMT implementation has significantly improved over time, especially in recent x86-64 processors. Nowadays, hardware threads of a modern processor can execute computing instructions truly in parallel. For example, an Intel Skylake core can execute up to 4 arithmetic instructions at a time per cycle, thanks to 4 ALUs. 1 thread can execute 4 instructions per cycle only if the instructions are independent (during the target cycles). This is not always possible, as some loops are inherently sequential and do not contain enough independent instructions per iteration (e.g. a cumulative sum). With 2-way SMT enabled, 2 software threads can be scheduled on the same core, and the core can execute instructions of both threads completely in parallel in a given cycle. It can even load-balance the number of instructions according to the needs of each thread in real time (e.g. 1 vs 3 instructions per cycle). In the end, latency-bound codes can be up to 2 times faster on a 2-way SMT processor like Skylake.

That being said, SMT does not speed up code that can already fully use all the available computing units of the processor. For example, a parallel matrix multiplication using an optimized BLAS library will nearly always be slower with 2 software threads running per core than with only 1 software thread per core. The execution can be slower because hardware threads share some resources, like caches, and they can conflict with each other when 2 threads per core run simultaneously. Put shortly, efficient code should not benefit from SMT, but people tend to write inefficient code, and it is not rare for compilers to fail to generate efficient code that saturates the computing units of a core (they often need some help).
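To make the "enough independent instructions" point concrete, here is a minimal sketch of my own (not from the answer above; plain C, no SMT involved): the first loop is one long dependent chain, so each addition must wait for the previous result, while the second spreads the work across four independent accumulators so a core's multiple ALUs have independent instructions to execute in the same cycle.

#include <stddef.h>

/* Dependent chain: each add needs the previous sum, so at most one of
   these adds can complete per cycle no matter how many ALUs exist. */
double sum_sequential(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent accumulators: the adds in one iteration do not depend
   on each other, so several can execute in the same cycle. (Note that
   reassociating floating-point sums can change rounding slightly.) */
double sum_unrolled(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i;
    for (i = 0; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)  /* leftover elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}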
While we have another concept that "one process can have multiple threads", I believe this means one process can handle multiple threads concurrently by switching context, but not necessarily in parallel.
I would like to set the record straight: software threads and hardware threads are two very different things despite the name.
A software thread is a logical OS unit that can be scheduled onto a hardware thread. A hardware thread can be seen as a physical part of a processor core (this is admittedly a naive, simplistic view). A software thread is part of an OS process. The OS is responsible for scheduling the ready software threads. Processes are not scheduled; software threads are (at least on a modern OS). 2 software threads of 2 different processes can run in parallel on a processor with multiple cores (or even on a single 2-way SMT core).
In most cases one CPU can execute one thread at a time, right?
The term "CPU" is not clear here: it can mean different things regarding the context.
If "one CPU" means a modern microprocessor chip that is typically a multicore one nowadays, then definitively no. Software threads can truly run in parallel on different cores for examples.
If "one CPU" means a core (like often in high-performance computing), then it depends: a 1-way SMT core can execute only 1 thread at a time while a 2-way SMT core can execute 2 thread at a time.
On old microprocessor chip with 1 core and no SMT, it was true that one thread was running at a time and context switches was used to execute thread concurrently from the user point-of-view but not in parallel. This time is long gone (since nearly 2 decades) except maybe on some embedded microprocessor chips.
Is this...parallel?
Maybe.
Hyperthreading is Intel's trademark* for processor cores that have two complete sets of context registers. A hyperthreaded CPU can concurrently execute code on behalf of two threads without any intervention by the operating system (i.e., with no need for context switching).
The extent to which those two concurrent executions actually are parallel executions varies from CPU model to CPU model, and it depends on what the two threads actually are doing. For example (I'm just making this part up, because it's been a few decades since I've needed to worry about any particular CPU architecture), if some "hyperthreaded" CPU has two integer ALUs per core, then the two threads might both be able to perform integer operations in parallel, but if it has only one FPU per core, then they would have to take turns using it.
Some hyperthreaded CPU models have more duplicated execution units than others, and so can parallelize more parts of the execution.
* AMD calls their similar capability "2-way simultaneous multithreading".

Maximum number of threads and multithreading

I'm tripping up on the multithreading concept.
For example, my processor has 2 cores and, with hyper-threading, 2 threads per core, totaling 4 threads. So does this mean my CPU can execute four separate instructions simultaneously? Is each thread capable of being multi-threaded?
So does this mean my CPU can execute four separate instructions simultaneously? Is each thread capable of being multi-threaded?
In short to both, yes.
A CPU can only execute a single instruction per phase in a clock cycle. Due to certain factors like pipelining, a CPU might be able to pass multiple instructions through the different phases in a single clock cycle, and the clock frequency might be extremely fast, but it's still only 1 instruction at a time.
As an example, NOP is an x86 assembly instruction which the CPU interprets as "no operation this cycle"; that's 1 instruction out of the hundreds or thousands (or more) that get executed by something even as simple as:
int main(void)
{
    while (1) { /* eat CPU */ }
    return 0;
}
A CPU thread of execution is one in which a series of instructions (a thread of instructions) is being executed. It does not matter which "application" the instructions come from; a CPU does not know about high-level concepts (like applications), as those are a function of the OS.
So if you have a computer with 2 (or 4/8/128/etc.) CPUs that share the same memory (cache/RAM), then you can have 2 (or more) CPUs running 2 (or more) instructions at (literally) the exact same time. Keep in mind that these are machine instructions that are running at the same time (i.e. the physical side of the software).
An OS-level thread is something a bit different. While the CPU handles the physical side of execution, the OS handles the logical side. The above code breaks down into more than 1 instruction and, when executed, may actually run on more than 1 CPU (in a multi-CPU-aware environment). Even though it's a single "thread" (at the OS level), the OS schedules when the next instructions run and on which CPU (based on the OS's thread-scheduling policy, which differs among the various OSes). So the above code will eat up 100% CPU usage for each given "time slice" on whichever CPU it's running on.
This "slicing" of "time" (also known as preemptive multitasking) is why an OS can run multiple applications "at the same time". It's not literally1 at the same time, since a CPU can only handle 1 instruction at a time, but to a human (who can barely comprehend the length of 1 second), it appears to be "at the same time".
1) Except in the case of a multi-CPU setup, where it might be literally the same time.
When an application is run, the kernel (the OS) actually spawns a separate thread (a kernel thread) to run the application on. Additionally, the application can request to create another external thread (i.e. spawning another process or forking), or create an internal thread by calling the OS's (or programming language's) API, which in turn calls lower-level kernel routines that spawn and maintain the context switching of the spawned thread. Furthermore, any created thread is also capable of calling the same APIs to spawn other separate threads (thus a thread is capable of being "multi-threaded").
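To illustrate that last point, here is a minimal sketch using POSIX threads, one of the OS-level APIs mentioned below (compile with -pthread): the main thread spawns a child thread, and the child itself calls the very same pthread_create to spawn a grandchild.

#include <pthread.h>
#include <stdio.h>

void *grandchild(void *arg)
{
    (void)arg;
    puts("grandchild thread running");
    return NULL;
}

/* A thread is just a path of execution; it can call the same API that
   created it, so any thread can itself be "multi-threaded". */
void *child(void *arg)
{
    pthread_t t;
    (void)arg;
    pthread_create(&t, NULL, grandchild, NULL);
    pthread_join(t, NULL);
    puts("child thread done");
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, child, NULL);
    pthread_join(t, NULL);
    return 0;
}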
Multi-threading (in the sense of applications and operating systems) is not necessarily portable. So while you might learn Java or C# and use their APIs (i.e. Thread.Start or Runnable), utilizing the actual OS APIs as provided (i.e. CreateThread or pthread_create and the slew of other concurrency functions) opens a different door for design decisions (i.e. "does platform X support thread library Y?"); just something to keep in mind as you explore the different APIs.
I hope that can help add some clarity.
I actually researched this very topic in my Operating Systems class.
When using threads, a good rule of thumb for increased performance of CPU-bound processes is to use a number of threads equal to the number of cores, except on a hyper-threaded system, where one should use twice as many threads. The other rule of thumb is for I/O-bound processes: use roughly four times as many threads as cores.
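Expressed as code, those heuristics might look like this (a sketch only; the multipliers are the rules of thumb above, not universal constants, and because sysconf counts logical processors, the hyper-threading doubling comes for free):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long logical = sysconf(_SC_NPROCESSORS_ONLN);
    long cpu_bound = logical;      /* ~1 thread per logical CPU */
    long io_bound  = 4 * logical;  /* ~4x, since most threads sit blocked */
    printf("CPU-bound pool: %ld threads\n", cpu_bound);
    printf("I/O-bound pool: %ld threads\n", io_bound);
    return 0;
}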

What is the relationship between threads (in a Java or a C++ program) and number of cores in the CPU?

Can someone shed some light on this?
An i7 processor can run 8 threads, but I am pretty sure we can create more than 8 threads in a Java or C++ program (not sure though). I have an i5 processor, and while studying concurrency I created 10 threads for assignments. I am just trying to understand how the core count of a CPU relates to threads.
The thread you are referring to is called a software thread, and you can create as many software threads as you need, as long as your operating system allows it. Each software thread, or code snippet, can run concurrently with the others.
For each core, there is at least one hardware thread to which the operating system can assign a software thread. If you have 8 cores, for example, then you have a hardware thread pool of capacity 8. You can map tens or hundreds of software threads to this 8-slot pool, where only 8 threads are actually running on hardware at the same time, i.e. in parallel.
Software threads are like people sharing the same computer. Each one can use the computer for some amount of time, not necessarily completing their task, then hand it over to another.
Hardware threads are like people having a computer for each of them. All of them can proceed with their tasks at the same time.
Note: For the i7, there are two hardware threads (so-called hyper-threading) in each core. So you can have up to 16 threads running in parallel.
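Here is a minimal pthreads sketch of that mapping (32 is an arbitrary count chosen to exceed most hardware pools; compile with -pthread): all 32 software threads run to completion even though only a handful can occupy hardware threads at any instant; the OS time-slices the rest.

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 32  /* deliberately more than most machines have cores */

void *work(void *arg)
{
    /* Each software thread runs when the scheduler hands it a hardware
       thread; the others wait in the run queue. */
    printf("software thread %ld ran\n", (long)arg);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, work, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}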
There are already a couple of good answers talking about the hardware side of things, but there isn't much talk about the software side of things.
The essential fact I believe you're missing is that not all threads have to be executing all the time. When you have thousands of threads on an 8-core machine, only a few of them are actually running at any given time. The others are sitting around doing nothing until some processor time becomes free. This has huge advantages because threads might be waiting on other resources, too. For example, if I have one thread trying to read a file from disk, then there's no reason for it to take up CPU time while it waits for the hard disk data to load into RAM. Another example is when the thread is waiting for a response from some other machine (such as a web request over the internet).
When you have more threads than your processor can handle at once, the operating system and/or runtime (it depends on the OS and the runtime implementation) is responsible for deciding which threads should get the available processor time. This kind of setup lets you maximize your machine's productivity because CPU cycles are doing something useful almost all the time.
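A tiny sketch of that idea (POSIX threads again, with sleep standing in for a blocking disk read; the stand-in is my own choice for illustration): the waiting thread consumes essentially no CPU while it is blocked, leaving the core free for the compute-bound thread.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Stand-in for a thread blocked on I/O: while asleep it is off the CPU
   entirely and costs no processor time. */
void *waiter(void *arg)
{
    (void)arg;
    sleep(2);  /* pretend we are waiting for a disk read */
    puts("waiter: \"I/O\" finished");
    return NULL;
}

void *cruncher(void *arg)
{
    volatile unsigned long x = 0;
    (void)arg;
    for (unsigned long i = 0; i < 500000000UL; i++)
        x += i;  /* CPU-bound work proceeds meanwhile */
    puts("cruncher: done");
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, waiter, NULL);
    pthread_create(&b, NULL, cruncher, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}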
A "thread" is a software abstraction which defines a single, self-consistent path of execution through the program: in most modern systems, the number of threads is basically only limited by memory. However, only a comparatively small number of threads can be run simultaneously by the CPU. Broadly speaking, the "core count" is how many threads the CPU can run truly in parallel: if there are more threads that want to run than there are cores available, the operating system will use time-slicing of some sort to let all the threads get some time to execute.
There are a whole bunch of terms which get thrown around when it comes to "cores":
Processor count: the number of physical CPU chips on a system's motherboard. This used to be the only number that mattered, until CPUs with multiple cores became available.
Logical core count: the number of threads that the system hardware can run in parallel
Physical core count: the number of copies of the CPU execution hardware that the system has -- this is not always equal to the logical core count, due to features like SMT ("simultaneous multithreading") which use a single piece of hardware to run multiple threads in parallel
Module count: Recent (Bulldozer-derived) AMD processors have used an architecture which is a hybrid between SMT and the standard one-logical-core-per-physical-core model. On these CPUs, there is a separate copy of the integer execution hardware for each logical core, but two logical cores share a floating-point unit and the fetch-and-decode frontend; AMD calls the unit containing two logical cores a module.
It should also be briefly mentioned that GPUs (graphics cards) have enormous core counts and run huge numbers (thousands) of threads in parallel. The trade-off is that GPU cores have very little memory and often a substantially restricted programming model.
Threads are handled by the OS's scheduler. The number of cores in a CPU determines how many threads it can run at the same time.
Note that threads are constantly switched in and out by the scheduler to give the "illusion" that everything is running at the same time.
No, no, no... Your i7 has eight execution threads and can run 8 threads at once.
1000 threads or more can be waiting for processor time.
Calling Thread.sleep moves a thread off the execution core and into a waiting state until it is woken.

Running multiple threads on a CPU

We all know that the JVM schedules user threads on a single-CPU machine. Why can't a single CPU run multiple processes/threads in parallel? What is the constraint that stops that capability?
Also, the JVM is just another piece of software running on the machine, and there may be thousands of other programs waiting for CPU cycles at any given time. Amid all this, how do JVM threads get scheduled on the CPU? What parameter determines the speed/likelihood of cycle allocation for any process on any machine?
This is not really a Java question, but a CPU architecture question.
And some CPUs DO run multiple threads in parallel per core. Look at Intel and Hyper-Threading: a 4-core machine with 8 hardware threads does the opposite of what you suggest.
Traditional single-core processors can only process one instruction at a time, meaning that they can only work on a single thread at any one point in time.
Multithreading support is achieved synthetically by giving threads "turns" on the CPU, so that they appear to be running concurrently.
Multi-core processors can process one instruction per core at any one point in time.
This question relates more to CPU hardware design than to programming, and especially not to a single language like Java, as the restriction applies across the board.

Threads vs Cores

Say I have a processor like this, which says # cores = 4, # threads = 4, and no Hyper-Threading support.
Does that mean I can run 4 simultaneous programs/processes (since a core is capable of running only one thread)?
Or does that mean I can run 4 x 4 = 16 programs/processes simultaneously?
From my digging, if there is no Hyper-Threading, there will be only 1 thread (process) per core. Correct me if I am wrong.
A thread differs from a process. A process can have many threads. A thread is a sequence of commands that have a certain order. A logical core can execute one sequence of commands. The operating system distributes all the threads across all the available logical cores, and if there are more threads than cores, threads are processed in a fast queue, with the core switching from one to another very fast.
It will look like all the threads run simultaneously, when actually the OS distributes CPU time among them.
Having multiple cores gives the advantage that fewer concurrent threads will be placed on any single core, and less switching between threads = greater speed.
Hyper-threading creates 2 logical cores on 1 physical core, and makes switching between threads much faster.
That's basically correct, with the obvious qualifier that most operating systems let you execute far more tasks simultaneously than there are cores or threads, which they accomplish by interleaving the execution of instructions.
A system with hyperthreading generally has twice as many hardware threads as physical cores.
The term thread is generally used to describe an operating system concept that has the potential to execute independently of other threads. Whether it does so depends on whether it is stuck waiting for some event (disk or screen I/O, a message queue), or if there are enough physical CPUs (hyperthreaded or not) to allow it to run in the face of other non-waiting threads.
Hyperthreading is a CPU vendor term that means a single core that can multiplex its attention between two computations. The easy way to think about a hyperthreaded core is as two real CPUs, both slightly slower than what the manufacturer says the core can actually do.
Basically this is up to the OS. A thread is a high-level construct holding an instruction pointer, and the OS places a thread's execution on a suitable logical processor. So with 4 cores you can basically execute 4 instructions in parallel, whereas a thread simply contains information about which instructions to execute and where those instructions are placed in memory.
An application normally uses a single process during execution, and the OS switches between processes to give all processes "equal" processor time. When an application deploys multiple threads, the process allocates more than one slot for execution but shares memory between threads.
Normally you distinguish between concurrent and parallel execution, where parallel execution is when you actually physically execute instructions on more than one logical processor, and concurrent execution is the frequent switching of a single logical processor, giving the appearance of parallel execution.
