Multi-threading: how do concurrent threads work? [closed]

If I have a dual-core CPU, does that mean it can run a maximum of 2 threads?
If so, how can one run 4 concurrent threads when the CPU seemingly limits them to two? (Since a dual-core PC can only run a maximum of 2.)

This is a big question.
Basically you are correct that with a dual-core CPU only two threads can be executing at any given instant. However, more than two threads can be scheduled to run. Furthermore, a running thread can be interrupted at (almost) any time by the operating system, halting execution of that thread so that another thread can run.
There are many factors that determine how threads are interrupted and run. Each thread is given a "time slice" in which to execute; once that slice has elapsed, the thread may be suspended to let other waiting threads execute. Threads can also be assigned priorities, allowing higher-priority tasks to take precedence over lower-priority ones.
Some work that can be offloaded from the main CPU (to the GPU or to a disk controller) can also be run in parallel with other threads.
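To make this concrete, here is a minimal Rust sketch (the thread count, loop length, and sleep duration are arbitrary values for illustration). It spawns 4 threads on a machine that may have only 2 cores; all 4 still make interleaved progress because the OS time-slices them across the available cores:

```rust
use std::thread;
use std::time::Duration;

fn main() {
    // Spawn more threads than a dual-core CPU can execute at once.
    let handles: Vec<_> = (0..4)
        .map(|id| {
            thread::spawn(move || {
                for step in 0..3 {
                    println!("thread {id}, step {step}");
                    // Sleeping yields the core, giving the scheduler a
                    // chance to run one of the other threads meanwhile.
                    thread::sleep(Duration::from_millis(10));
                }
            })
        })
        .collect();

    // Wait for all 4 threads to finish.
    for h in handles {
        h.join().unwrap();
    }
}
```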
I suggest that you read up on the basics of thread scheduling.

Related

How many threads should I spawn for maximum performance? [closed]

I am writing a Rust program that needs to brute-force the solution to some calculation and is likely to run 2^80 iterations. That is a lot! I am trying to make it run as fast as possible, so I want to divide the work among multiple threads. However, if I understand correctly, this only speeds things up if the threads actually run on different cores; otherwise they will not truly run simultaneously, but merely switch between one another.
How can I make sure they use different cores, and how can I know when no more cores are available?
TL;DR: Use std::thread::available_parallelism (or alternatively the num-cpus crate) to know how many threads to run and let your OS handle the rest.
Typically when you create a thread, the OS thread scheduler is given free rein to decide where and when it executes, and it will do so in the way that best takes advantage of CPU resources. So if you use fewer threads than the system has available, you are potentially missing out on performance. Using more than the number of available hardware threads is not a huge problem, since the scheduler will do its best to balance whichever threads have work to do, but the extra threads are a small waste of memory, OS resources, and context switches. Creating as many threads as there are logical CPU cores on your system is the sweet spot, and the function above will give you that number.
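For example, here is a minimal sketch of that advice. The `check` function, the range size, and the even split are hypothetical placeholders for the real brute-force search (which would also need to handle the remainder when the range doesn't divide evenly):

```rust
use std::thread;

// Hypothetical stand-in for the real test applied to each candidate.
fn check(candidate: u64) -> bool {
    candidate == 123_456
}

fn main() {
    // How many threads the hardware can usefully run at once.
    let n_threads = thread::available_parallelism()
        .map(|n| n.get() as u64)
        .unwrap_or(1);

    let total: u64 = 1_000_000; // the real search space is far larger
    let chunk = total / n_threads;

    // One thread per logical core, each scanning a disjoint sub-range.
    let handles: Vec<_> = (0..n_threads)
        .map(|i| thread::spawn(move || (i * chunk..(i + 1) * chunk).find(|&c| check(c))))
        .collect();

    for h in handles {
        if let Some(hit) = h.join().unwrap() {
            println!("found: {hit}");
        }
    }
}
```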
You could tell the OS exactly which cores to run which threads on by setting their affinity; however, that isn't really advisable, since it won't make anything faster unless you are seriously tuning your kernel or deliberately exploiting your NUMA nodes.
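If you really do need pinning, it isn't in the standard library; a minimal sketch, assuming the third-party `core_affinity` crate, looks like this:

```rust
use std::thread;

fn main() {
    // Ask the OS which cores are available for pinning.
    let core_ids = core_affinity::get_core_ids().expect("no cores found");

    // Spawn one worker per core and pin each to its core.
    let handles: Vec<_> = core_ids
        .into_iter()
        .map(|id| {
            thread::spawn(move || {
                if core_affinity::set_for_current(id) {
                    println!("worker pinned to core {}", id.id);
                }
                // ... this core's share of the work goes here ...
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}
```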

What does it mean by "user threads cannot take advantage of multithreading or multiprocessing"? [closed]

user threads cannot take advantage of multithreading or multiprocessing
Source: Wikipedia
Does this mean a CPU cannot efficiently execute multiple user threads simultaneously?
Does this mean a CPU cannot switch between two or more user threads?
For example: there are two user threads, t0 and t1, and t0 is the first to execute. Will t1 only begin executing once t0 has finished, or can switching take place?
PS : This question might look like more than one question but I guess it is just one.
Here's what the page currently says:
Threads are sometimes implemented in userspace libraries, thus called user threads. The kernel is unaware of them, so they are managed and scheduled in userspace. Some implementations base their user threads on top of several kernel threads, to benefit from multi-processor machines (M:N model). In this article the term "thread" (without kernel or user qualifier) defaults to referring to kernel threads. User threads as implemented by virtual machines are also called green threads. User threads are generally fast to create and manage, but cannot take advantage of multithreading or multiprocessing and get blocked if all of their associated kernel threads get blocked even if there are some user threads that are ready to run.
As you can see, the same paragraph states BOTH that user threads can take advantage of multiple processors (via associated kernel threads, in the M:N model) AND that they cannot.
I suggest that you ask your question on the Wikipedia page's Talk page, and see if the authors can enlighten you as to what they mean ... and why they are saying it.
What I think they are saying is that user(-space) threads that aren't backed by multiple kernel threads typically cannot execute simultaneously on multiple cores.
However, I would hesitate to say that this is inherent to user threads per se; i.e. that it would be impossible to implement an OS in which an application could exploit multiple cores without any kernel assistance.
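To illustrate the "get blocked if all of their associated kernel threads get blocked" part, here is a minimal sketch assuming the `tokio` crate, whose tasks play the role of user threads when run on a single-threaded runtime: one task making a blocking call stalls every other task, because they all share one kernel thread.

```rust
use std::time::{Duration, Instant};

fn main() {
    // Single-threaded runtime: many tasks, but only ONE kernel thread.
    let rt = tokio::runtime::Builder::new_current_thread()
        .build()
        .unwrap();

    rt.block_on(async {
        let start = Instant::now();

        // Task A makes a BLOCKING call, freezing the shared kernel thread.
        let a = tokio::spawn(async {
            std::thread::sleep(Duration::from_secs(2));
        });

        // Task B is ready to run, but cannot be scheduled until task A
        // releases the kernel thread; it prints ~2s, not ~0s.
        let b = tokio::spawn(async move {
            println!("task B ran after {:?}", start.elapsed());
        });

        let _ = tokio::join!(a, b);
    });
}
```

On a multi-threaded (M:N) runtime, task B would instead run almost immediately on another kernel thread.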

What is the difference between CPU threads and cores? [closed]

My Intel CPU has 6 cores and 12 threads. I know that each core can compute in parallel with the other 5 cores, so if I run a program on all 6 cores I get a 6x speedup. But I cannot understand how that relates to threads. If I run my program on the 12 threads of my 6 cores, will I get a 12x speedup?
A thread is a "logical core", it has a full set of registers, uses its own virtual address space, and can perform anything a core can do, so in that sense - you have 12 cores.
However, a thread shares most of it's execution resources with its counterpart thread on the same core. Since modern cores can handle multiple instructions at the same time, having two (or more) threads allows you to essentially "throw" instructions from the 2 software threads into a large "pool", and have them executed whenever they're ready. If you have a single thread taking up full 100% of your core utilization, then you won't gain much from that, but if one of the threads leaves some empty slots, because it has branch mispredictions, data dependencies, long memory delays, or any other cause for inefficiency - the other thread sharing the core can use these slots instead, giving you a nice boost (since the alternative was to wait until the first thread finished its time slot and doing an expensive context switch).
In general, you can think of that in the following way - running 2 software threads on 2 cores would give you the best performance, running them on a single core with simultaneous multithreading would be slightly slower, especially in case you're bounded on execution (but less so if you're bounded for e.g. on memory latency). However if you don't have this feature, running the same 2 workloads on a single core would require you to run them one after the other (in timeslots), which would probably be much slower.
Edit: note that there are different ways of implementing this concept; see e.g. Difference between Intel and AMD multithreading.
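For instance, this tiny sketch, assuming the `num_cpus` crate, reports both counts; on the CPU in the question it would print 6 and 12:

```rust
fn main() {
    // Physical cores vs hardware threads ("logical cores").
    println!("physical cores: {}", num_cpus::get_physical());
    println!("logical cores:  {}", num_cpus::get());
}
```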
A thread is a "simultaneous" computation on the same core. So one core can manage two threads and effectively acts as two cores. This is a very basic answer I'm afraid.

Best strategy to execute tasks with high branch divergency [closed]

I have a project, written a few years ago, that computes N similar tasks one after another on a single CPU core.
These N tasks are completely independent, so they could be computed in parallel.
However, the problem with these tasks is that the control flow inside each task differs greatly from one task to another, so the SIMT approach implemented in CUDA is more likely to impede than to help.
I came up with the idea of launching N blocks with 1 thread each, to break the warp dependency between threads.
Can anyone suggest a better way to optimise the computations in this situation, or point out possible pitfalls with my solution?
You are right in your comment about what causes, and what is caused by, divergence of threads in a warp. However, the launch configuration you mention (1 thread in each block) throws away most of the GPU's potential. A warp (or half-warp) is the largest unit of threads that is ultimately executed in parallel on a single multiprocessor. So having one thread per block across 32 blocks is effectively the same as having 32 threads in a warp, each with a different path. In fact the first case is even worse, because the number of resident blocks per multiprocessor is quite limited (8 or 16, depending on compute capability).
Therefore, if you want to fully exploit the GPU's potential, keep Jack's comment in mind and try to reorganize the threads so that the threads of a single warp follow the same execution path.

Two threads (x milliseconds time for one action) in One Core == 2x time? [closed]

To explain the question in the title: I have two threads, each performing the same action, which takes x milliseconds. If my computer has one core, will it take about 2x milliseconds to perform the two actions?
If the action is CPU-bound, basically meaning it consists only of computation, then yes, the total wall time will be a bit more than twice the time taken by one thread, due to context-switching overhead.
If the action involves non-negligible IO-related operations (reads from memory, disk, or network), then two threads on a single core might take a bit more than the time needed by one thread, but not necessarily twice that time. If the OS is able to have one thread do IO while the other computes, and alternate between them, both threads may finish in roughly the same wall time as a single thread.
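Here is a minimal sketch of how you could observe the CPU-bound case yourself. `busy_work` is a hypothetical stand-in for the x-millisecond action, and the "about 2x" outcome assumes the process is confined to a single core (on a multi-core machine the two threads run in parallel and take roughly 1x):

```rust
use std::time::Instant;

// Purely CPU-bound work: no IO, so nothing for the OS to overlap.
fn busy_work() -> u64 {
    (0..50_000_000u64).fold(0, |acc, i| acc.wrapping_add(i * i))
}

fn main() {
    // Baseline: the action once, on one thread.
    let t = Instant::now();
    let s = busy_work();
    println!("one thread:  {:?} (sum {s})", t.elapsed());

    // The same action twice, on two threads at once.
    let t = Instant::now();
    let a = std::thread::spawn(busy_work);
    let b = std::thread::spawn(busy_work);
    let s1 = a.join().unwrap();
    let s2 = b.join().unwrap();
    println!("two threads: {:?} (sums {s1}, {s2})", t.elapsed());
}
```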
Yes. They will be executed one after the other, or somehow interleaved, but in total it will take double the time.
Yes, of course. If you have two threads and one CPU core, the threads will run one after the other, or in time slices; it is not possible for one core to run more than one thread of execution at a time.
Unless hyper-threading is being used; but that makes one core look like two (or more) cores, so it does not apply here.
