What is the difference between CPU threads and program threads - multithreading

For example, the i5 7600K has 4 threads, but a game can have more than 4 threads. What is the difference, and why do they have the same name?

A CPU that has 4 threads (really a CPU with 4 cores, or possibly a 2 core CPU with Hyperthreading) can execute 4 separate threads simultaneously. A program can have more threads than that, but only 4 of them can be executing at any given time - the others would be in a sleep/wait state while they wait for the CPU to become available.
As for how the CPU "becomes available" for other threads when there are more threads than it can execute at a given time, that's a function of the operating system scheduler. The operating system scheduler rotates threads on and off the CPU periodically (typically every few milliseconds) so that every thread that wants to execute eventually gets its turn on the CPU.
There's more to it than that, but hopefully that covers the gist of your question.
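As a rough illustration of the answer above, the following sketch starts more program threads than the machine has hardware threads and lets the OS scheduler sort out who runs when. The names `worker`, `finished`, and `program_threads` are made up for this example. Note also that in CPython the global interpreter lock limits how much of this Python-level work actually runs in parallel, so this only shows that oversubscription works, not how fast it is.

```python
import os
import threading

# Number of hardware threads (logical CPUs) reported by the OS.
# os.cpu_count() can return None, so fall back to 1 in that case.
hardware_threads = os.cpu_count() or 1

# Deliberately create more program threads than hardware threads.
program_threads = hardware_threads * 4

finished = []
lock = threading.Lock()

def worker(index):
    # A tiny piece of work; the OS scheduler decides when this thread gets CPU time.
    total = sum(range(10_000))
    with lock:
        finished.append((index, total))

threads = [threading.Thread(target=worker, args=(i,)) for i in range(program_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{hardware_threads} hardware threads, {len(finished)} program threads completed")
```

Every program thread completes even though only a few of them can be on a CPU at any instant.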

Related

Purpose of multiprocessors and multi-core processor

I want to clarify things in my head and build concrete knowledge. On a system with one dual-core processor, only two threads within one process can be executed concurrently, one by each core. On a system with two single-core processors, two different processes can be executed, one by each CPU.
So can we say that separate processors execute processes concurrently, while a multi-core processor executes threads within a process concurrently?
I think you have a fundamental misunderstanding of what a process and thread are and how they relate to the hardware itself.
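As a simplified model, a CPU core executes 1 machine-level instruction per clock cycle (so essentially, just 1 assembly instruction). Real cores are pipelined and superscalar, so the actual number per cycle varies, but the model is good enough here. CPUs are typically measured by the number of clock cycles they go through in a second. So a 2.5 GHz core goes through 2.5 billion cycles per second, which in this model means 2.5 billion instructions per second.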
The OS (the operating system, like Windows, Linux, macOS, Android, iOS, etc.) is responsible for launching programs and giving them access to the hardware resources. Each program can be considered a "process".
Each process can launch multiple threads.
To ensure that multiple processes can share the same hardware resources, the idea of pre-emptive computing came about over 40 years ago.
In a nutshell, pre-emptive computing, or time-slicing, is a function of the OS. It basically gives a few milliseconds to each thread that is running, regardless of which process that thread is a part of, and keeps the "context" of each thread so that the state of each thread can be restored when it's time for that thread to run again; swapping one thread's context for another's is known as a context switch.
A dual, quad, or even 128 core CPU does not change that, nor does the number of CPUs in the system (e.g. 4 CPUs each with 128 cores). Each core can still only execute 1 instruction per clock cycle.
What changes is how many instructions can be run in true parallel. If my CPU has 16 cores, then that means it can execute 16 instructions per clock cycle, and thus run 16 separate threads of execution without any context switching being necessary (though it does still happen, but that's a different issue).
This doesn't cover hyper-threading, in which 1 core presents itself to the OS as 2 logical processors and interleaves instructions from 2 threads, essentially doubling your apparent CPU count (though not your throughput). It also doesn't cover cache misses or other low-level effects in which extra cycles could be spent on a thread, but it covers the general idea of CPU scheduling.
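The arithmetic in this answer can be written down as a small sketch. It assumes the answer's simplified model of one instruction per core per clock cycle. The function name and the `instructions_per_cycle` parameter are invented for illustration, and real CPUs will not match these numbers.

```python
def peak_instructions_per_second(cores, clock_hz, instructions_per_cycle=1):
    """Upper bound under the simplified model: every core retires a fixed
    number of instructions on every clock cycle, with no stalls."""
    return cores * clock_hz * instructions_per_cycle

# One 2.5 GHz core versus sixteen of them.
single = peak_instructions_per_second(cores=1, clock_hz=2.5e9)
sixteen = peak_instructions_per_second(cores=16, clock_hz=2.5e9)

print(f"1 core:   {single:.2e} instructions/s")
print(f"16 cores: {sixteen:.2e} instructions/s")
```

The per-core rate is the same in both cases. Only the number of instruction streams that can run in true parallel changes.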

Threads vs cores when threads are asleep

I am looking to confirm my assumptions about threads and CPU cores.
All the threads are the same. No disk I/O is used, threads do not share memory, and each thread does CPU bound work only.
If I have CPU with 10 cores, and I spawn 10 threads, each thread will have its own core and run simultaneously.
If I launch 20 threads with a CPU that has 10 cores, then the 20 threads will "task switch" between the 10 cores, giving each thread approximately 50% of the CPU time per core.
If I have 20 threads but 10 of the threads are asleep, and 10 are active, then the 10 active threads will run at 100% of the CPU time on the 10 cores.
A thread that is asleep only costs memory, and not CPU time, while the thread is still asleep. For example, 10,000 threads that are all asleep use the same amount of CPU as 1 thread asleep.
In general, if you have a series of threads that sleep frequently while working on a parallel process, you can add more threads than there are cores until you get to a state where all the cores are busy 100% of the time.
Are any of my assumptions incorrect? If so, why?
Edit
When I say the thread is asleep, I mean that the thread is blocked for a specific amount of time. In C++ I would use sleep_for, which "blocks the execution of the current thread for at least the specified sleep_duration".
If we assume that you are talking about threads that are implemented using native thread support in a modern OS, then your statements are more or less correct.
There are a few factors that could cause the behavior to deviate from the "ideal".
If there are other user-space processes, they may compete for resources (CPU, memory, etcetera) with your application. That will reduce (for example) the CPU available to your application. Note that this will include things like the user-space processes responsible for running your desktop environment etc.
There are various overheads that will be incurred by the operating system kernel. There are many places where this happens including:
Managing the file system.
Managing physical / virtual memory system.
Dealing with network traffic.
Scheduling processes and threads.
That will reduce the CPU available to your application.
The thread scheduler typically doesn't do entirely fair scheduling. So one thread may get a larger percentage of the CPU than another.
There are some complicated interactions with the hardware when the application has a large memory footprint, and threads don't have good memory locality. For various reasons, memory intensive threads compete with each other and can slow each other down. These interactions are all accounted as "user process" time, but they result in threads being able to do less actual work.
So:
1) If I have CPU with 10 cores, and I spawn 10 threads, each thread will have its own core and run simultaneously.
Probably not all of the time, due to other user processes and OS overheads.
2) If I launch 20 threads with a CPU that has 10 cores, then the 20 threads will "task switch" between the 10 cores, giving each thread approximately 50% of the CPU time per core.
Approximately. There are the overheads (see above). There is also the issue that time slicing between different threads of the same priority is fairly coarse grained, and not necessarily fair.
3) If I have 20 threads but 10 of the threads are asleep, and 10 are active, then the 10 active threads will run at 100% of the CPU time on the 10 cores.
Approximately: see above.
4) A thread that is asleep only costs memory, and not CPU time, while the thread is still asleep. For example, 10,000 threads that are all asleep use the same amount of CPU as 1 thread asleep.
There is also the issue that the OS consumes CPU to manage the sleeping threads; e.g. putting them to sleep, deciding when to wake them, rescheduling.
Another one is that the memory used by the threads may also come at a cost. For instance, if the sum of the memory used for all processes (including all of the 10,000 threads' stacks) is larger than the available physical RAM, then there is likely to be paging. And that also uses CPU resources.
5) In general, if you have a series of threads that sleep frequently while working on a parallel process, you can add more threads than there are cores until you get to a state where all the cores are busy 100% of the time.
Not necessarily. If the virtual memory usage is out of whack (i.e. you are paging heavily), the system may have to idle some of the CPU while waiting for memory pages to be read from and written to the paging device. In short, you need to take account of memory utilization, or it will impact on the CPU utilization.
This also doesn't take account of thread scheduling and context switching between threads. Each time the OS switches a core from one thread to another it has to:
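Save the old thread's registers.
Lose much of the benefit of the processor's memory cache, since the new thread usually works on different data. (The cache is not necessarily flushed outright, but the effect is similar.)
Invalidate the VM mapping registers, etcetera, when the new thread belongs to a different process. This includes the TLBs that @bazza mentioned.
Load the new thread's registers.
Take performance hits due to having to do more main memory reads and VM page translations, because the caches no longer hold the new thread's data.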
These overheads can be significant. According to https://unix.stackexchange.com/questions/506564/ this is typically around 1.2 microseconds per context switch. That may not sound much, but if your application is switching threads rapidly, that could amount to many milliseconds in each second.
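To put a number on "many milliseconds in each second", here is a back-of-the-envelope sketch. It takes the 1.2 microsecond figure quoted above at face value. The function name and the switch rates are made-up inputs for illustration, not measurements.

```python
def context_switch_overhead_ms(switches_per_second, cost_us=1.2):
    """CPU time lost to context switching per second of wall time, in milliseconds.

    cost_us is the assumed direct cost of one context switch in microseconds.
    """
    return switches_per_second * cost_us / 1000.0

# Hypothetical switch rates, from a calm application to a very busy one.
for rate in (1_000, 10_000, 100_000):
    lost = context_switch_overhead_ms(rate)
    print(f"{rate:>7} switches/s -> {lost:.1f} ms lost per second")
```

This only counts the direct cost of the switch itself. The indirect cost of refilling caches afterwards comes on top of it and is harder to estimate.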
As already mentioned in the comments, it depends on a number of factors. But in a general sense your assumptions are correct.
Sleep
In the bad old days a sleep() might have been implemented by the C library as a loop doing pointless work (e.g. multiplying 1 by 1 until the required time had elapsed). In that case, the CPU would still be 100% busy. Platforms such as MS-DOS worked this way. Nowadays a sleep() will actually result in the thread being descheduled for the requisite time, and any multitasking OS has had a proper implementation for decades.
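10,000 sleeping threads can take up somewhat more CPU time than 1, because the OS has to make scheduling judgements at every timeslice tick (the interval depends on the OS and its configuration, but is typically somewhere between a millisecond and a few tens of milliseconds). The more threads it has to check for being ready to run, the more CPU time that checking takes, although modern schedulers keep sleeping threads off the run queue, which keeps this cost small.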
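The difference between a busy-loop "sleep" and a real, descheduled sleep can be observed with a small sketch like the one below. `busy_wait` and `cpu_seconds_used` are helper names invented for this example. The exact figures depend on the machine and on what else is running, so no specific output is claimed.

```python
import time

def busy_wait(seconds):
    """Old-style 'sleep': spin in a loop until the deadline passes."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass  # pointless work that keeps the CPU busy

def cpu_seconds_used(wait, seconds):
    """Return the CPU time this process consumed while waiting."""
    start = time.process_time()
    wait(seconds)
    return time.process_time() - start

busy_cpu = cpu_seconds_used(busy_wait, 0.2)
sleep_cpu = cpu_seconds_used(time.sleep, 0.2)

print(f"busy loop:    {busy_cpu:.4f} s of CPU time")
print(f"time.sleep(): {sleep_cpu:.4f} s of CPU time")
```

Both calls wait for about the same wall-clock time, but only the busy loop should show CPU time close to that duration.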
Translation Lookaside Buffers
Adding more threads than cores is generally seen as OK. But you can run into a problem with Translation Lookaside Buffers (or their equivalents on other CPUs). These are part of the virtual memory management side of the CPU, and they themselves are effectively content-addressable memory. This is really hard to implement, so there's never that much of it. Thus the more memory allocations there are (which there will be if you add more and more threads), the more this resource is eaten up, to the point where the OS may have to start swapping in and out different loadings of the TLB in order for all the virtual memory allocations to be accessible. If this starts happening, everything in the process becomes really, really slow. This is likely less of a problem these days than it was, say, 20 years ago.
Also, modern memory allocators in C libraries (and thence everything else built on top, e.g. Java, C#, the lot) will actually be quite careful in how requests for virtual memory are managed, minimising the times they actually have to ask the OS for more virtual memory. Basically they seek to provide requested allocations out of pools they've already got, rather than each malloc() resulting in a call to the OS. This takes the pressure off the TLBs.

How many cores does a process occupy?

Let's say I have 4 cores on my machine and I have a process that spawns 4 threads. While this is the current process scheduled, are all 4 of those cores reserved for the process' 4 threads?
That is a very complex question. However, I can help. As a general rule, a single-threaded process only uses 1 core at a time. More precisely, 1 thread can only be executed by 1 core at any given moment. If you have a dual core processor, it is effectively 2 CPUs stuck together in the same package. These are called physical cores, and each physical core executes 1 thread at a time. However, some CPUs have 2 physical cores but are capable of running 4 threads simultaneously. These extra 2 threads are run on logical cores. They do not physically exist as separate cores, but they appear to the OS as if they did.
If by process you mean thread, then yes: 1 thread, 1 core. And you can run 4 threads at once on a CPU with 4 compute cores (a name which covers both physical and logical cores, since even a single-core CPU has at least 1 compute core).
If by process you mean program or process in the processes tab in the task manager, then it depends on how the program is written.
Judging by your question, if a process spawns 4 threads, it depends on where those threads are in the scheduler's pool. There are thousands of threads waiting to be executed. The threads from each program or executable file do not have to be executed at the same time.
The 4 threads of your process are scheduled independently - the process itself isn't scheduled.
If all 4 threads are runnable at the same time, and there are no other higher-priority runnable threads in the system, then all 4 threads may be scheduled simultaneously on your 4 cores.
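The point that threads, not processes, are what get scheduled can be sketched with a toy model. This is not how any real OS scheduler is implemented. `ThreadInfo`, `pick_running`, and the priority numbers are all invented for illustration, with the assumption that a higher number means higher priority.

```python
from dataclasses import dataclass

@dataclass
class ThreadInfo:
    name: str
    priority: int   # assumption: higher number means higher priority
    runnable: bool  # False while the thread is sleeping or blocked

def pick_running(threads, cores):
    """Toy scheduler: choose which runnable threads occupy the cores right now."""
    runnable = [t for t in threads if t.runnable]
    # sort() is stable, so threads of equal priority keep their original order.
    runnable.sort(key=lambda t: t.priority, reverse=True)
    return [t.name for t in runnable[:cores]]

# Four threads from one process, plus two unrelated threads elsewhere in the system.
threads = [ThreadInfo(f"app-{i}", priority=5, runnable=True) for i in range(4)]
threads.append(ThreadInfo("background", priority=1, runnable=True))
threads.append(ThreadInfo("sleeper", priority=9, runnable=False))

print(pick_running(threads, cores=4))

# A higher-priority thread becoming runnable displaces one of the app threads.
threads.append(ThreadInfo("urgent", priority=8, runnable=True))
print(pick_running(threads, cores=4))
```

In the first call all 4 app threads get a core. In the second, one of them has to wait because a higher-priority thread is runnable.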

How does more than one thread execute on a processor core

I wanted to know how a multi-threaded program with a large number of threads executes on a processor core. For example, my program has 12 threads and I am running it on an Intel Core i5 machine. It has four cores. Will each core run 3 threads? I am confused because I have seen programs with 30 threads running on a 4 core machine.
Thanks
Each core is able to execute one thread at a time. So if there are 30 threads and 4 cores, 26 threads will be waiting to be context switched in before they can execute. Something like: threads 1-4 run for 200 ms, then threads 5-8 run for 200 ms, and so on.
A processor core is capable of executing one thread at a time. In a quad core, 4 threads are executed simultaneously. Not all the user space threads are executed simultaneously, and kernel threads also need to run in order to schedule the next thread or do other kernel tasks.
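The "threads 1-4, then 5-8, and so on" picture can be sketched as a simple round-robin simulation. Real schedulers are far more sophisticated (priorities, blocking, per-core queues), so treat `round_robin` as a hypothetical helper that only illustrates the rotation.

```python
from collections import deque

def round_robin(thread_ids, cores, slices):
    """Simulate which threads occupy the cores during each time slice."""
    queue = deque(thread_ids)
    schedule = []
    for _ in range(slices):
        # Take as many threads off the front of the queue as there are cores.
        running = [queue.popleft() for _ in range(min(cores, len(queue)))]
        schedule.append(running)
        # After their slice, those threads go to the back of the queue.
        queue.extend(running)
    return schedule

# 30 threads on 4 cores, as in the example above.
plan = round_robin(range(1, 31), cores=4, slices=3)
for index, running in enumerate(plan):
    print(f"slice {index}: running {running}, waiting {30 - len(running)}")
```

During every slice 4 threads are running and the other 26 are waiting for their turn.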

Does a process run threads in a sequential order?

The question is about multithreading. Say I have 3 threads: the main one, a child1, and a child2. Does the process executing these threads work on one thread for a short amount of time, then work on the next, and keep switching like that, or do the threads run without ever being stopped by the process? Somewhere I read that a thread gets stopped before it finishes, then another thread is worked on and stopped, then it goes back to thread1, and so on. But it wouldn't make any sense for threads to be stopped, as the point of multithreading is that they are all concurrent and all run at the same time. So how does the processor do that?
This is in .Net/C#.
The scenario you describe is the way the OS ran threads in the old days, before multi-core CPUs.
The OS scheduled threads sequentially based on their priorities. But now I suppose you have at least 2 cores, so 2 threads can run concurrently, and the 3rd thread will be scheduled by interrupting one of the others.
The scenario you're describing is correct, except that one thread will normally be running at any one time per processor core.
Simplified; if 3 threads are active on 4 cores, they will all always be allowed to run since there's always an available core to run them, while if 3 threads are active on 2 cores, only two can run at any time so they will have to take turns.
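The simplified rule above can be expressed as a one-line model. It assumes an idealised, perfectly fair scheduler with no other load on the machine. `ideal_cpu_share` is a name made up for this sketch.

```python
def ideal_cpu_share(cores, active_threads):
    """Fraction of a core each active thread gets under perfectly fair sharing."""
    if active_threads == 0:
        return 0.0
    # With enough cores every thread runs all the time; otherwise they take turns.
    return min(1.0, cores / active_threads)

print(ideal_cpu_share(cores=4, active_threads=3))  # every thread always has a core
print(ideal_cpu_share(cores=2, active_threads=3))  # the threads have to take turns
```

With 3 threads on 4 cores the share is 1.0. With 3 threads on 2 cores each thread gets about two thirds of a core on average.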
Operating systems schedule threads to execute on the available CPU cores (either real or virtual). In the past, most computers had single core CPUs, and thus only one thread could be executed at a time. Modern CPUs are typically 2, 4, or 8 core systems. Some of these cores are virtual, like Intel's hyperthreading CPUs which have twice as many virtual cores as physical cores.
However, there are almost always more threads than CPU cores available, so the OS will prioritize all of the threads on the system in order to run them as efficiently as possible. The threads created by your process may or may not truly run in parallel over any given time span, but you should assume that they will.

Resources