I have a process that spawns multiple threads (say 6 threads).
What will the impact on its performance be if I run it on a server machine with
6 CPUs or 4 CPUs?
What is the relation between threads, CPUs, and the cores inside each CPU?
I have read that threads only run on different cores inside one CPU. Is that true?
It depends.
If your tasks are CPU-bound with no pipeline stalls, then you'll get the best performance from spawning one thread per physical CPU core.
If your CPU-bound tasks have pipeline stalls from cache misses, branch mispredictions, dependencies, etc., then you can take advantage of Hyperthreading and spawn one thread per virtual core. On a CPU without Hyperthreading the number of virtual cores is equal to the number of physical cores.
If your tasks block for IO, then you can benefit from spawning many more threads than CPU cores. The Apache web server is an example of this approach.
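As a concrete illustration of the one-thread-per-core approach, here is a minimal sketch assuming POSIX threads on Linux, where sysconf(_SC_NPROCESSORS_ONLN) reports the number of online logical (virtual) cores:

    /* Minimal sketch: spawn one worker thread per online logical core. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static void *worker(void *arg) {
        long id = (long)arg;
        printf("worker %ld running\n", id);   /* CPU-bound work goes here */
        return NULL;
    }

    int main(void) {
        long ncores = sysconf(_SC_NPROCESSORS_ONLN);  /* logical cores */
        pthread_t *threads = malloc(ncores * sizeof *threads);

        for (long i = 0; i < ncores; i++)
            pthread_create(&threads[i], NULL, worker, (void *)i);
        for (long i = 0; i < ncores; i++)
            pthread_join(threads[i], NULL);

        free(threads);
        return 0;
    }

For the purely CPU-bound, stall-free case you would size the pool to the physical core count instead; sysconf only reports logical cores, so you would need to inspect the topology (e.g. /proc/cpuinfo) for that.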
A process is the smallest unit for allocating resources. The thread is the smallest scheduling unit.
Does this mean that a process contains at least one thread? Is the
thread equal to the process when there is only one thread in the
process?
Many processors today are multi-core. Suppose I have a process P. There are two threads in this process P, A and B. I want A and B to run on core 0 and core 1 of a CPU, respectively.
But we know that a process needs to be allocated resources, and that it has a context. Is the context generally stored in registers? If so, different physical cores use different physical registers, so what happens when thread A and thread B run on core 0 and core 1, respectively?
Do these two cores each need to be allocated resources? In that case, how do the two threads stay consistent? If each thread has its own resources, hasn't this effectively become two processes? Does this mean that different threads of one process running on different cores are the same as different processes running on different cores?
The vast majority of resources in an SMP system with the exception of registers and processing capacity are shared across all cores. This includes memory. So the operating system can schedule multiple threads on different cores, but all pointing to a shared set of process resources in memory.
CPU Caches are handled by the cores, using a cache coherency protocol. So long as the thread follows the memory model with correct use of memory barriers/atomic instructions, the memory visible through the cache should appear to be the same to all cores.
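A minimal sketch of that shared-memory model, assuming C11 atomics and POSIX threads: two threads, possibly scheduled on different cores, update one counter in the process's shared address space, and cache coherency plus the atomic read-modify-write keep the result consistent for both cores:

    /* Two threads share one process resource (a counter in memory).  */
    /* The atomic increment makes the updates visible to both cores.  */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int counter = 0;   /* lives in the shared address space */

    static void *bump(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++)
            atomic_fetch_add(&counter, 1);   /* atomic read-modify-write */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("counter = %d\n", counter);   /* always 2000000 */
        return 0;
    }

With a plain non-atomic int the threads would race and the total would come out short, even though both cores see the same memory; the coherency protocol keeps the caches consistent, but it is the atomic instruction that makes the increment indivisible.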
I'm trying to understand the usage of CPU cores with regard to concurrent threads and processes. Please see the below questions:
Assume I have 2 CPU cores. When there are 2 processes running, each process has only 1 thread. Are the two processes using the 2 cores?
Assume I have 2 CPU cores. When there is 1 process running, which has 2 threads. Are the two threads using the 2 cores?
Assume I have 2 CPU cores. When there are 2 processes running, each process has 2 threads. How are the two cores used by those processes and their threads?
How do I calculate the maximum real concurrent execution given the CPU cores? What other factors should I take into account?
1, 2: Quite likely, but not definitely. A portion of the system software determines what runs where. While it would be unlikely to keep a process or thread waiting for CPU attention when another core sits otherwise idle, it isn't absolute.
Most processing involves some sort of transfer to and from a device, a network, etc. Typically this necessitates a period of inactivity waiting for the transfer to complete, and during this inactivity another process/thread can run on that CPU. So, if a given process spends 30% of its time on the CPU and 70% on I/O, then I can run about 3 of them (1 / 0.3 ≈ 3.3) concurrently on a single CPU without degrading performance.
3, 4: As the paragraph above implies, depending upon the workload, there could be any distribution of the threads among the CPUs. If the threads were all compute-bound (100% CPU), most operating systems switch between them at a granularity small enough that all remain lively, and large enough that the switching has a minimal impact on them.
This scheduling may take other notions into consideration, such as data affinity. Recently touched bits of data are likely to remain in the CPU cache after a thread relinquishes that CPU. The next time the thread is scheduled, it would be best to put it back onto the same CPU, to preserve the effort already spent warming the cache for it. The scheduler might also assume that two threads of one process (address space) are more likely to share data, so it should prefer the same CPU for them.
4: Depending upon your system, there are likely to be many performance-analysis tools available. top, on UNIX-inspired systems, is a simple tool that gives system-wide utilization information, and the simple tool time will show how much time a process spent on a CPU versus real-world (wall-clock) time. If you run each of your tasks sequentially, noting the CPU time each takes, then time them running concurrently, the ratio between those CPU times indicates the scaling factor of your concurrent app. Note that real-world time can be misleading because of I/O overlap.
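You can also take the same two measurements from inside a program. Here is a rough sketch using POSIX clock_gettime (the two-thread busy loop is an arbitrary stand-in for your workload): CLOCK_PROCESS_CPUTIME_ID accumulates CPU time across all of the process's threads, so with 2 compute-bound threads on 2 free cores the CPU time should come out near twice the wall time:

    /* Compare process CPU time with wall-clock time, as `time` does. */
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    static void *spin(void *arg) {
        (void)arg;
        volatile unsigned long x = 0;
        for (unsigned long i = 0; i < 500000000UL; i++) x += i;  /* busy work */
        return NULL;
    }

    static double secs(clockid_t id) {
        struct timespec ts;
        clock_gettime(id, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
    }

    int main(void) {
        double w0 = secs(CLOCK_MONOTONIC);
        double c0 = secs(CLOCK_PROCESS_CPUTIME_ID);

        pthread_t t1, t2;
        pthread_create(&t1, NULL, spin, NULL);
        pthread_create(&t2, NULL, spin, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        printf("wall: %.2fs  cpu: %.2fs\n",
               secs(CLOCK_MONOTONIC) - w0,
               secs(CLOCK_PROCESS_CPUTIME_ID) - c0);
        return 0;
    }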
Let's say there's a machine with an 8-core CPU.
I'm creating 2 POSIX threads using the standard pthread_create(...) function.
As far as I know there's no guarantee that these threads will always be executed by 2 different physical cores, but in practice, 90% of the time they will run simultaneously (i.e. in parallel). At least in my case, I have seen the top command show 2 CPUs running, thus around 160-180% CPU usage.
The question is:
What could be the scenario when 2 threads within a single process run on only 1 physical core?
Two cases:
1) The other physical cores are busy doing other stuff, so only one core gets used by this process. The two threads run in alternation on that core.
2) The physical core supports executing more than one thread concurrently using hyperthreading or something similar. The other physical cores are busy doing other stuff, so the best the scheduler can do is run both threads in a single physical core.
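If you want to reproduce the single-core situation deliberately for testing, one option (a Linux-specific sketch using the GNU extension pthread_setaffinity_np; the busy-loop workload is arbitrary) is to restrict both threads to one logical CPU, so the scheduler has no choice but to alternate them there:

    #define _GNU_SOURCE          /* for pthread_setaffinity_np, sched_getcpu */
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static void *work(void *arg) {
        (void)arg;
        volatile unsigned long x = 0;
        for (unsigned long i = 0; i < 100000000UL; i++) x += i;
        printf("finished on cpu %d\n", sched_getcpu());
        return NULL;
    }

    int main(void) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(0, &set);        /* allow only logical CPU 0 */

        pthread_t a, b;
        pthread_create(&a, NULL, work, NULL);
        pthread_create(&b, NULL, work, NULL);
        pthread_setaffinity_np(a, sizeof set, &set);
        pthread_setaffinity_np(b, sizeof set, &set);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }

With both threads pinned this way, top should show the process capped near 100% instead of 200%. Note that CPU numbers here are logical CPUs; on a hyperthreaded machine two logical CPUs may share one physical core, which is case 2 above.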
With multicore computing, one thing that has confused me from the beginning is that the model of multicore hardware is too abstracted from the real machine.
I worked on a laptop with a single Intel processor containing 4 cores with hyperthreading support, which makes the number of logical cores 8.
Suppose I have a Java program implementing a concurrent algorithm (it is said that Java uses the OS's thread-scheduling rules, so the JVM won't affect the scheduling), and the program is purely CPU-bound.
My observations:
if the program (process) runs fewer than 8 threads, the parallelism of the work increases as the number of threads increases;
when the total number of threads is larger than 8, performance becomes complicated, but usually there is no more improvement than with 8 threads; and for certain algorithms it is much worse, i.e. time consumption increases hugely compared to running 8 threads.
My knowledge of this:
As far as I know, the program I run is treated as a user process by the OS, and if the program creates threads to try to gain parallelism, the OS will try to schedule these threads among the available cores.
Any threads of the process that land on the same core share that core's portion of the process's total execution time.
My questions:
Suppose the CPU is only running my program, i.e. there are no other user processes.
if there is only 1 core in the CPU, the process will get no benefit from parallelism via multi-threading, since the total execution time of the process will not change. Is that true?
if more than one core is available, the OS will try to schedule the threads of the process evenly and fairly on different cores, and the process's threads on different cores get their own (extra) execution time, therefore speeding up. Is that true?
if there are n threads and m cores, where n > m, then some core may run more than 1 thread of the process, which may even harm the speed-up from parallelism because of "context switching" among threads on the same core, and the potential side-effect of the process's threads running at different speeds. Is this true?
Thanks very much!
if there is only 1 core in the CPU, the process will get no benefit from parallelism via multi-threading, since the total execution time of the process will not change. Is that true?
Only if you are 100% CPU-bound. If you have I/O waits, multiple threads can help a lot even on a single core.
if more than one core is available, the OS will schedule the threads of the process evenly and fairly on different cores
That seems to be up to the discretion of the OS. There could be all kinds of quota and priorities involved.
may even harm the speed-up from parallelism because of "context switching" among threads on the same core
It is true that there is overhead in managing extra threads (not just at the scheduling level, but also in synchronizing and communicating within your application), and if these threads cannot make productive use of otherwise idle CPU cores, then having fewer threads can actually improve performance.
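To see the plateau from the original observations yourself, here is a rough benchmark sketch (in C with POSIX threads rather than Java, with an arbitrary fixed total workload) that divides the same amount of CPU-bound work across N threads; on the 4-core/8-logical-core machine described above, the wall time should drop as N grows toward 8 and then flatten or degrade:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define TOTAL_ITERS 800000000UL   /* fixed total work, arbitrary size */

    static void *spin(void *arg) {
        unsigned long iters = *(unsigned long *)arg;
        volatile unsigned long x = 0;
        while (iters--) x++;
        return NULL;
    }

    int main(int argc, char **argv) {
        int n = argc > 1 ? atoi(argv[1]) : 1;   /* thread count from argv */
        if (n < 1) n = 1;
        unsigned long share = TOTAL_ITERS / n;  /* same total work overall */
        pthread_t *t = malloc(n * sizeof *t);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < n; i++) pthread_create(&t[i], NULL, spin, &share);
        for (int i = 0; i < n; i++) pthread_join(t[i], NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        printf("%d threads: %.2fs\n", n,
               (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
        free(t);
        return 0;
    }

Run it with 1, 2, 4, 8, and 16 as the argument and compare the times.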
Say I have a processor whose spec sheet says # cores = 4 and # threads = 4, without Hyper-Threading support.
Does that mean I can run 4 simultaneous programs/processes (since a core is capable of running only one thread)?
Or does that mean I can run 4 x 4 = 16 programs/processes simultaneously?
From my digging, if there is no Hyper-Threading, there will be only 1 thread (process) per core. Correct me if I am wrong.
A thread differs from a process. A process can have many threads. A thread is a sequence of commands that have a certain order. A logical core can execute one sequence of commands. The operating system distributes all the threads among all the available logical cores, and if there are more threads than cores, the threads are processed in a fast queue, with the core switching from one to another very fast.
It will look like all the threads run simultaneously, when actually the OS distributes CPU time among them.
Having multiple cores gives the advantage that fewer concurrent threads will be placed on any single core; less switching between threads = greater speed.
Hyper-threading creates 2 logical cores on 1 physical core, and makes switching between threads much faster.
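A small sketch of that time-slicing behavior, assuming Linux (sched_getcpu is a GNU extension, and the thread count of 16 is an arbitrary number chosen to exceed most machines' logical core count): every thread makes steady progress even though only as many as there are logical cores can physically execute at any instant:

    #define _GNU_SOURCE       /* for sched_getcpu */
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    #define NTHREADS 16       /* deliberately more threads than cores */

    static void *tick(void *arg) {
        long id = (long)arg;
        for (int i = 0; i < 3; i++) {
            printf("thread %2ld on cpu %d\n", id, sched_getcpu());
            usleep(100000);   /* sleep; the scheduler reshuffles threads */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, tick, (void *)i);
        for (long i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        return 0;
    }

The output interleaves all 16 thread IDs across the available logical CPU numbers, which is the OS distributing CPU time among them.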
That's basically correct, with the obvious qualifier that most operating systems let you execute far more tasks simultaneously than there are cores or threads, which they accomplish by interleaving the execution of instructions.
A system with hyperthreading generally has twice as many hardware threads as physical cores.
The term thread is generally used as a description of an operating system concept that has the potential to execute independently of other threads. Whether it does so depends on whether it is stuck waiting for some event (disk or screen I/O, a message queue), or whether there are enough physical CPUs (hyperthreaded or not) to allow it to run in the face of other non-waiting threads.
Hyperthreading is a CPU vendor term for a single core that can multiplex its attention between two computations. The easy way to think about a hyperthreaded core is as if you had two real CPUs, both slightly slower than what the manufacturer says the core can actually do.
Basically this is up to the OS. A thread is a high-level construct holding an instruction pointer, and the OS places a thread's execution on a suitable logical processor. So with 4 cores you can basically execute 4 instruction streams in parallel, whereas a thread simply contains information about which instructions to execute and where those instructions are placed in memory.
An application normally uses a single process during execution, and the OS switches between processes to give all processes "equal" processor time. When an application deploys multiple threads, the process allocates more than one slot for execution but shares memory between the threads.
Normally you make a distinction between concurrent and parallel execution, where parallel execution is when you actually physically execute instructions on more than one logical processor, and concurrent execution is the frequent switching of a single logical processor, giving the appearance of parallel execution.