In what sense does each thread appears to the operating system as a separate CPU? - multithreading

In the book Modern Operating Systems,
Multithreading has implications for the operating system because each thread
appears to the operating system as a separate CPU. E.g., Consider a system with two
actual CPUs, each with two threads. The operating system will see this as four
CPUs.
I don't understand that. A thread is a light weighted process which in turn is a running program. A cpu is a hardware.
A thread runs on a cpu.
An OS manages the hardware directly including cpu, while processes (including threads) see the hardware indirectly via the abstraction provided by OS. How can an OS not know how many cpus are there?
In what sense does each thread appears to the operating system as a separate CPU?
Thanks.

The fact that a cpu have multiple physical cores, is one thing (cpu cores), but the fact that the cpu can have virtual cores, are usually called "threads" in the hardware context(but they are not the same as the programming term "threds"). The simple way to think about "threads" in a hardware context is the amount of cpu cores (notice however this is partially correct, and incorrect as well, but in order to understand the difference I would recommend looking at wikipedia, for example: https://en.wikipedia.org/wiki/Hyper-threading).

The statement is not "silly" and is a decent first approximation to understanding tasking and threading.
A task/thread has a set of registers, a memory address space (populated with RAM and sometimes ROM), and the ability to execute instructions. This is the same as a basic CPU. So one can conceive of a multi-tasking or multi-threading system as being a collection of CPUs.
(And, to carry this analogy further, there are situations where multiple threads/tasks are simulated on a single thread/task.)
Yes, modern CPUs have a lot of extra registers and controls for, eg, controlling tasks and threads, but that's going beyond the basics.
And, from the standpoint of multi-threading within an OS, it is true that there are essentially the same concerns about synchronization and atomicity whether you have multiple streams of execution running in multiple threads on a single CPU, or the multiple streams each running on its own CPU. The only major difference is with regard to cache coherence, and even for that case there exist systems where each thread on a single CPU has its own cache.

The various threads of a SMT CPU appear as separate cores to the OS. For example on x86 with hyperthreading, interprocessor interrupts apply to virtual cores. For example a SIPI (start-up ipi) addressed to "all" will fire up all threads, not just all actual physical cores.
The OS can know how many APs (application processors, contracted with the BSP, bootstrap processor, which is the one that starts when you turn the machine on) there are by observing how many times the code that you instructed them to start at (with the SIPI) runs (but this is better used a check), or by parsing the MP tables (definitely do that, you have to anyway in order to detect memory layout and devices and so on).

Related

cpu core threads vs process threads [duplicate]

What is the difference between software threads, hardware threads and java threads?
Are software threads, java threads and hardware threads independent or interdependent?
I am asking this because, I know Java threads are created inside a process with in jvm (java.exe).
Also is it true that these different process are executed on different hardware threads?
A "hardware thread" is a physical CPU or core. So, a 4 core CPU can genuinely support 4 hardware threads at once - the CPU really is doing 4 things at the same time.
One hardware thread can run many software threads. In modern operating systems, this is often done by time-slicing - each thread gets a few milliseconds to execute before the OS schedules another thread to run on that CPU. Since the OS switches back and forth between the threads quickly, it appears as if one CPU is doing more than one thing at once, but in reality, a core is still running only one hardware thread, which switches between many software threads.
Modern JVMs map java threads directly to the native threads provided by the OS, so there is no inherent overhead introduced by java threads vs native threads. As to hardware threads, the OS tries to map threads to cores, if there are sufficient cores. So, if you have a java program that starts 4 threads, and have 4 or more cores, there's a good chance your 4 threads will run truly in parallel on 4 separate cores, if the cores are idle.
Software threads are threads of execution managed by the operating system.
Hardware threads are a feature of some processors that allow better utilisation of the processor under some circumstances. They may be exposed to/by the operating system as appearing to be additional cores ("hyperthreading").
In Java, the threads you create maintain the software thread abstraction, where the JVM is the "operating system". Whether the JVM then maps Java threads to OS threads is the JVM's business (but it almost certainly does). And then the OS will be using hardware threads if they are available.
Hardware threads (e.g. Intel Hyperthreading) are a cheaper and slower alternative to having multiple-cores
Software threads are a software abstraction implemented by the (Linux) kernel:
either the kernel runs one software thread per CPU (or hyperthread)
or it fakes it with the scheduler by running a process for a bit, then a timer interrupt comes, then it switches to another process, and so on
Key to their implementation is the hardware provided and kernel configured separation between userland and kerneland: What are Ring 0 and Ring 3 in the context of operating systems?
I will now focus on hardware threads, which is the more obscure hardware question, with a focus on Intel's implementation which it calls Hyperthreading.
The Intel Manual Volume 3 System Programming Guide - 325384-056US September 2015 8.7 "INTEL HYPER-THREADING TECHNOLOGY ARCHITECTURE" describes HT briefly. It contains the following diagram:
TODO it is slower by how much percent in average in real applications?
Hyperthreading is possible because modern single CPUs cores already execute multiple instructions at once with the instruction pipeline https://en.wikipedia.org/wiki/Instruction_pipelining
The instruction pipeline is a separation of functions inside of a single core to ensure that each part of the circuit is used at any given time: reading memory, decoding instructions, executing instructions, etc.
Hyperthreading separates functions further by using:
a single backend, which actually runs the instructions with its pipeline.
Dual core has two backends, which explains the greater cost and performance.
two front-ends, which take two streams of instructions and order them in a way to maximize pipelining usage of the single backend by avoiding hazards.
Dual core would also have 2 front-ends, one for each backend.
There are edge cases where instruction reordering produces no benefit, making hyperthreading useless. But it produces a significant improvement in average.
Two hyperthreads in a single core share further cache levels (TODO how many? L1?) than two different cores, which share only L3, see:
Multiple threads and CPU cache
How are cache memories shared in multicore Intel CPUs?
The interface that each hyperthread exposes to the operating system is similar to that of an actual core, and both can be controlled separately. Thus cat /proc/cpuinfo shows me 4 processors, even though I only have 2 cores with 2 hyperthreads each.
Operating systems can however take advantage of knowing which hyperthreads are on the same core to run multiple threads of a given program on a single core, which might improve cache usage.
This LinusTechTips video contains a light-hearted non-technical explanation: https://www.youtube.com/watch?v=wnS50lJicXc
Hardware threads can be thought of as the CPU cores, although each core can run multiple threads. Most of the CPUs mention how many threads can be run on each core (on linux, lscpu command gives this detail). These are the number of cores that can be used in parallel.
Software threads are abstraction to the hardware to make multi-processing possible. If you have multiple software threads but there are not multiple resources then these software threads are a way to run all tasks in parallel by allocating resources for limited time(or using some other strategy) so that it appears that all threads are running in parallel. These are managed by the operating system. Java thread is an abstraction at the JVM level.
I think you are mistaken. I never heard about hardware threads (unless you mean hyper threading on certain intel machines). Every process is a running representation of a program. Threads are simultaneous execution flows with in a process. Java thread definitions are mapped to system threads by JVM. Java used to have a concept of GreenThreads, which is no longer the case.

Threads vs processess: are the visualizations correct?

I have no background in Computer Science, but I have read some articles about multiprocessing and multi-threading, and would like to know if this is correct.
SCENARIO 1:HYPERTHREADING DISABLED
Lets say I have 2 cores, 3 threads 'running' (competing?) per core, as shown in the picture (HYPER-THREADING DISABLED). Then I take a snapshot at some moment, and I observe, for example, that:
Core 1 is running Thread 3.
Core 2 is running Thread 5.
Are these declarations (and the picture) correct?
A) There are 6 threads running in concurrency.
B) There are 2 threads (3 and 5) (and processes) running in parallel.
SCENARIO 2:HYPERTHREADING ENABLED
Lets say I have MULTI-THREADING ENABLED this time.
Are these declarations (and the picture) correct?
C) There are 12 threads running in concurrency.
D) There are 4 threads (3,5,7,12) (and processes) running in 'almost' parallel, in the vcpu?.
E) There are 2 threads (5,7) running 'strictlÿ́' in parallel?
A process is an instance of a program running on a computer. The OS uses processes to maximize utilization, support multi-tasking, protection, etc.
Processes are scheduled by the OS - time sharing the CPU. All processes have resources like memory pages, open files, and information that defines the state of a process - program counter, registers, stacks.
In CS, concurrency is the ability of different parts or units of a program, algorithm or problem to be executed out-of-order or in a partial order, without affecting the final outcome.
A "traditional process" is when a process is an OS abstraction to present what is needed to run a single program. There is NO concurrency within a "traditional process" with a single thread of execution.
However, a "modern process" is one with multiple threads of execution. A thread is simply a sequential execution stream within a process. There is no protection between threads since they share the process resources.
Multithreading is when a single program is made up of a number of different concurrent activities (threads of execution).
There are a few concepts that need to be distinguished:
Multiprocessing is whenwe have Multiple CPUs.
Multiprogramming when the CPU executes multiple jobs or processes
Multithreading is when the CPU executes multiple mhreads per Process
So what does it mean to run two threads concurrently?
The scheduler is free to run threads in any order and interleaving a FIFO or Random. It can choose to run each thread to completion or time-slice in big chunks or small chunks.
A concurrent system supports more than one task by allowing all tasks to make progress. A parallel system can perform more than one task simultaneously. It is possible though, to have concurrency without parallelism.
Uniprocessor systems provide the illusion of parallelism by rapidly switching between processes (well, actually, the CPU schedulers provide the illusion). Such processes were running concurrently, but not in parallel.
Hyperthreading is Intel’s name for simultaneous multithreading. It basically means that one CPU core can work on two problems at the same time. It doesn’t mean that the CPU can do twice as much work. Just that it can ensure all its capacity is used by dealing with multiple simpler problems at once.
To your OS, each real silicon CPU core looks like two, so it feeds each one work as if they were separate. Because so much of what a CPU does is not enough to work it to the maximum, hyperthreading makes sure you’re getting your money’s worth from that chip.
There are a couple of things that are wrong (or unrealistic) about your diagrams:
A typical desktop or laptop has one processor chipset on its motherboard. With Intel and similar, the chipset consists of a CPU chip together with a "northbridge" chip and a "southbridge" chip.
On a server class machine, the motherboard may actually have multiple CPU chips.
A typical modern CPU chip will have more than one core; e.g. 2 or 4 on low-end chips, and up to 28 (for Intel) or 64 (for AMD) on high-end chips.
Hyperthreading and VCPUs are different things.
Hyperthreading is Intel proprietary technology1 which allows one physical to at as two logical cores running two independent instructions streams in parallel. Essentially, the physical core has two sets of registers; i.e. 2 program counters, 2 stack pointers and so on. The instructions for both instruction streams share instruction execution pipelines, on-chip memory caches and so on. The net result is that for some instruction mixes (non-memory intensive) you get significantly better performance than if the instruction pipelines are dedicated to a single instruction stream. The operating system sees each hyperthread as if it was a dedicated core, albeit a bit slower.
VCPU or virtual CPU terminology used in cloud computing context. On a typical cloud computing server, the customer gets a virtual server that behaves like a regular single or multi-core computer. In reality, there will typically be many of these virtual servers on a compute node. Some special software called a hypervisor mediates access to the hardware devices (network interfaces, disks, etc) and allocates CPU resources according to demand. A VCPU is a virtual server's view of a core, and is mapped to a physical core by the hypervisor. (The accounting trick is that VCPUs are typically over committed; i.e. the sum of VCPUs is greater than the number of physical cores. This is fine ... unless the virtual servers all get busy at the same time.)
In your diagram, you are using the term VCPU where the correct term would be hyperthread.
Your diagram shows each core (or hyperthread) associated with a distinct group of threads. In reality, the mapping from cores to threads is more fluid. If a core is idle, the operating system is free to schedule any (runnable) thread to run on it. (Some operating systems allow you to tie a given thread to a specific core for performance reasons. It is rarely necessary to do this.)
Your observations about the first diagram are correct.
Your observations about the second diagram are slightly incorrect. As stated above the hyperthreads on a core share the execution pipelines. This means that they are effectively executing at the same time. There is no "almost parallel". As I said, above, it is simplest to think of a hyperthread as a core "that runs a bit slower".
1 - Intel was not the first computer to com up with this idea. For example, CDC mainframes used this idea in the 1960's to get 10 PPUs from a single core and 10 sets of registers. This was before the days of pipelined architectures.

Can multiple OS processes run in parallel on multicore CPU?

So I got into a debate whether multicore CPU allows parallel execution of separate processes.
As far as I understand, each core allows executing different threads but they all have to belong to one process. Or am I wrong?
My reasoning is that, while each core has separate set of registers and L1/L2 cache (depending on hardware), they all have to share other stuff like L3 cache or TLB (I don't have a lot of knowledge about cpu architecture, so feel free to correct me).
I tried searching for an answer, but couldn't find any results (maybe the question is too dumb lol).
Thanks a lot in adance.
Multiple threads of multiple processes can be scheduled to run on a single core. Of course, at a given time only one thread runs on the core. The queue of processes to run on the core is managed by the scheduler. A good scheduler will provide to the core a good mix of CPU-bound and I/O-bound processes so that all of the components in the machine have well-balanced load.
So a multi-core CPU allows not only parallel but also concurrent execution of processes. On the other hand, a single core CPU can allow only parallel execution. No concurrency is there in single core machines.
All the resources of a core are given to the thread/process currently running on it (not in Hyper Threading though). The first resource that is in possession of multiple processes at the same time, if I'm not wrong, is Main Memory or RAM. All processes use some part of the RAM even when they are not running on the core. To load the process to the core a Process Control Block (PCB) is loaded from RAM by setting the registers, address spaces and stack to the same state which the process was in, when it was unloaded from the core to give time to another process.
The time quantum for each process varies from a few ms to a few hundred ms. Compared to that a L1/L2 cache access is a few ns and a main memory access is a few hundred ns. The image below should be interesting:
Two processes or threads can be run truly concurrently on separate cores provided they don't contend on a shared resource at the electronic level.
The most obvious thing to contend on in an Intel chip is the L3 cache and RAM. If you have two or more Intel chips they're talking to each other over QPI. Whilst this allows a cluster of CPUs each with their own memory controllers to operate in an SMP configuration, it becomes another thing to contend on if threads want data from another chip's memory.
In AMD chips each core has a memory controller, and Hypertransport does the job of synthesizing an SMP configuration. Pleasingly this makes all cores everywhere the pretty much the same, even in multi-chip systems (it's Hypertransport inside and outside the chips).
Both Intel and AMD have done excellent jobs of creating architectures that minimise the memory contention that occurs in a multi-core, symmetric multi-processing system without us having to think too hard about how we write software. If you want the absolute most out of your hardware you can program taking into account the underlying NUMA hardware architecture, and you may (it's really hard) reduce some of the contention that's going on.
Other things that might prevent true concurrent execution is if there's a specialised subsystem serving several cores. For example the UltraSPARC T1 shared a floating point unit between 8 cores. Obviously they can't all use it at once!
FPGAs are often seen as great things for parallelisable computations such as FFTs. However they have limited internal memory, and if the computation starts needing to store more data you have to use external RAM. That immediately limits the degree of parallelism that can be achieved, as different parts of the FPGA start contending for access to the external RAM. In such cases it is doubtful whether an FPGA is the right way to go; an FPGA clocked at 500MHz accessing RAM (which is also very slow, still) with no advanced onboard caching is not going to be as fast as a well design CPU with an advanced cache and multi memory controller subsystems.

What is the relationship between threads (in a Java or a C++ program) and number of cores in the CPU?

Can someone shed some light on it?
An i7 processor can run 8 threads but I am pretty sure we can create more than 8 threads in a JAVA or C++ program(not sure though). I have an i5 processor and while studying concurrency I have created 10 threads for assignments. I am just trying to understand how Core rating of CPU is related to threads.
The thread you are refering to is called a software thread; and you can create as many software threads as you need, as long as your operating system allows it. Each software thread, or code snippet, can run concurrently from the others.
For each core, there is at least one hardware thread to which the operating system can assign a software thread. If you have 8 cores, for example, then you have a hardware thread pool of capacity 8. You can map tens or hundreds of software threads to this 8-slot pool, where only 8 threads are actually running on hardware at the same time, i.e. in parallel.
Software threads are like people sharing the same computer. Each one can use this computer up to some time, not necessarily have his task completed, then give it up to another.
Hardware threads are like people having a computer for each of them. All of them can proceed with their tasks at the same time.
Note: For i7, there are two hardware threads (so called hyper-threading) in each core. So you can have up to 16 threads running in parallel.
There are already a couple of good answers talking about the hardware side of things, but there isn't much talk about the software side of things.
The essential fact that I believe you're missing is that not all threads have to be executing all the time. When you have thousands of threads on an 8 core machine, only a few of them are actually running at any given time. The others are sitting around doing nothing until some processor time becomes free. This has huge advantages because threads might be waiting on other resources, too. For example, if I have one thread trying to read a file from disk, then there's no reason for it to be taking up CPU time while it's waiting for the hard disk data to load into RAM. Another example is when the thread is waiting for a response from some other machine (such as a web request over the internet). When you have more threads than your processor can handle at once, the operating system and/or runtime (It depends on the OS and runtime implementation.) are responsible for deciding which threads should get available processor time. This kind of set up lets you maximize your machine's productivity because CPU cycles are doing something useful almost all the time.
A "thread" is a software abstraction which defines a single, self-consistent path of execution through the program: in most modern systems, the number of threads is basically only limited by memory. However, only a comparatively small number of threads can be run simultaneously by the CPU. Broadly speaking, the "core count" is how many threads the CPU can run truly in parallel: if there are more threads that want to run than there are cores available, the operating system will use time-slicing of some sort to let all the threads get some time to execute.
There are a whole bunch of terms which are thrown around when it comes to "cores:"
Processor count: the number of physical CPU chips on a system's motherboard. This used to be the only number that mattered, until CPUs with multiple cores became available.
Logical core count: the number of threads that the system hardware can run in parallel
Physical core count: the number of copies of the CPU execution hardware that the system has -- this is not always equal to the logical core count, due to features like SMT ("simultaneous multithreading") which use a single piece of hardware to run multiple threads in parallel
Module count: Recent (Bulldozer-derived) AMD processors have used an architecture which is a hybrid between SMT and the standard one-logical-core-per-physical-core model. On these CPUs, there is a separate copy of the integer execution hardware for each logical core, but two logical cores share a floating-point unit and the fetch-and-decode frontend; AMD calls the unit containing two logical cores a module.
It should also be briefly mentioned that GPUs, graphics cards, have enormous core counts and run huge numbers (thousands) of threads in parallel. The trade-off is that GPU cores have very little memory and often a substantially restricted programming model.
Threads are handled by the OS's scheduler. The number of cores in a CPU determine how many threads it can run at the same time.
Note that threads are constantly switched in and out by the scheduler to give the "illusion" that everything is running at the same time.
More here, if you're interested.
no, no, no... Your I7 has eight execution threads and can run 8 threads at once.
1000 threads or more can be waiting for processor time.
calling thread.sleep moves a thread off the execution core and back into memory where it waits until woken.

software threads vs hardware threads

What is the difference between software threads, hardware threads and java threads?
Are software threads, java threads and hardware threads independent or interdependent?
I am asking this because, I know Java threads are created inside a process with in jvm (java.exe).
Also is it true that these different process are executed on different hardware threads?
A "hardware thread" is a physical CPU or core. So, a 4 core CPU can genuinely support 4 hardware threads at once - the CPU really is doing 4 things at the same time.
One hardware thread can run many software threads. In modern operating systems, this is often done by time-slicing - each thread gets a few milliseconds to execute before the OS schedules another thread to run on that CPU. Since the OS switches back and forth between the threads quickly, it appears as if one CPU is doing more than one thing at once, but in reality, a core is still running only one hardware thread, which switches between many software threads.
Modern JVMs map java threads directly to the native threads provided by the OS, so there is no inherent overhead introduced by java threads vs native threads. As to hardware threads, the OS tries to map threads to cores, if there are sufficient cores. So, if you have a java program that starts 4 threads, and have 4 or more cores, there's a good chance your 4 threads will run truly in parallel on 4 separate cores, if the cores are idle.
Software threads are threads of execution managed by the operating system.
Hardware threads are a feature of some processors that allow better utilisation of the processor under some circumstances. They may be exposed to/by the operating system as appearing to be additional cores ("hyperthreading").
In Java, the threads you create maintain the software thread abstraction, where the JVM is the "operating system". Whether the JVM then maps Java threads to OS threads is the JVM's business (but it almost certainly does). And then the OS will be using hardware threads if they are available.
Hardware threads (e.g. Intel Hyperthreading) are a cheaper and slower alternative to having multiple-cores
Software threads are a software abstraction implemented by the (Linux) kernel:
either the kernel runs one software thread per CPU (or hyperthread)
or it fakes it with the scheduler by running a process for a bit, then a timer interrupt comes, then it switches to another process, and so on
Key to their implementation is the hardware provided and kernel configured separation between userland and kerneland: What are Ring 0 and Ring 3 in the context of operating systems?
I will now focus on hardware threads, which is the more obscure hardware question, with a focus on Intel's implementation which it calls Hyperthreading.
The Intel Manual Volume 3 System Programming Guide - 325384-056US September 2015 8.7 "INTEL HYPER-THREADING TECHNOLOGY ARCHITECTURE" describes HT briefly. It contains the following diagram:
TODO it is slower by how much percent in average in real applications?
Hyperthreading is possible because modern single CPUs cores already execute multiple instructions at once with the instruction pipeline https://en.wikipedia.org/wiki/Instruction_pipelining
The instruction pipeline is a separation of functions inside of a single core to ensure that each part of the circuit is used at any given time: reading memory, decoding instructions, executing instructions, etc.
Hyperthreading separates functions further by using:
a single backend, which actually runs the instructions with its pipeline.
Dual core has two backends, which explains the greater cost and performance.
two front-ends, which take two streams of instructions and order them in a way to maximize pipelining usage of the single backend by avoiding hazards.
Dual core would also have 2 front-ends, one for each backend.
There are edge cases where instruction reordering produces no benefit, making hyperthreading useless. But it produces a significant improvement in average.
Two hyperthreads in a single core share further cache levels (TODO how many? L1?) than two different cores, which share only L3, see:
Multiple threads and CPU cache
How are cache memories shared in multicore Intel CPUs?
The interface that each hyperthread exposes to the operating system is similar to that of an actual core, and both can be controlled separately. Thus cat /proc/cpuinfo shows me 4 processors, even though I only have 2 cores with 2 hyperthreads each.
Operating systems can however take advantage of knowing which hyperthreads are on the same core to run multiple threads of a given program on a single core, which might improve cache usage.
This LinusTechTips video contains a light-hearted non-technical explanation: https://www.youtube.com/watch?v=wnS50lJicXc
Hardware threads can be thought of as the CPU cores, although each core can run multiple threads. Most of the CPUs mention how many threads can be run on each core (on linux, lscpu command gives this detail). These are the number of cores that can be used in parallel.
Software threads are abstraction to the hardware to make multi-processing possible. If you have multiple software threads but there are not multiple resources then these software threads are a way to run all tasks in parallel by allocating resources for limited time(or using some other strategy) so that it appears that all threads are running in parallel. These are managed by the operating system. Java thread is an abstraction at the JVM level.
I think you are mistaken. I never heard about hardware threads (unless you mean hyper threading on certain intel machines). Every process is a running representation of a program. Threads are simultaneous execution flows with in a process. Java thread definitions are mapped to system threads by JVM. Java used to have a concept of GreenThreads, which is no longer the case.

Resources