Will a waiting thread still eat up cpu time? - multithreading

I'm trying to make a thread pool for a game engine and I've been considering how my system should react to third party libraries spawning their own threads.
From what I've read, it is ideal to only have one thread for each CPU you have access to. So if my third party physics update spawns four threads, it would be ideal to turn off four threads from my thread pool while it is running, then turn them back on afterwards, that way multiple threads are never contending over one CPU.
My question is about the underlying mechanics behind functionality like conditional variables. Since spawning threads is expensive, having four threads wait on a conditional variable and then notifying them when the physics is done seems like a much better option than joining four threads and re-spawning them afterwards. But if they are waiting on a variable, are the threads truly "asleep" or are they still contending for CPU resources in the background?

Although you did not write what platform you are programming on, in most implementations threads that are waiting consume little to no CPU resources.
They do however use some memory (to save the stack, etc.), so you should avoid spawning an excessive number of threads and trying to reuse them as much as possible, since as you noted, spawning a new thread is an expensive operation on most platforms.
Even though you did not provide a lot of information, I'm guessing that in your scenario letting the threads wait is a much better option, as a small number of threads will not use a lot of resources and possibly having to spawn new threads frequently will affect performance badly on almost all platforms.

Any good third party library should give you the option of running it's work through your thread pool, to avoid that problem in the first place.
For example here's the documentation on how you can do that with PhysX - https://developer.nvidia.com/sites/default/files/akamai/physx/Docs/TaskManager.html

Related

Does multithreading actually work in uniprocessor environment

Suppose we have a process with multiple threads in a uniprocessor.
Now I know that if we have several processes, only one of them will be processed at a time in a uniprocessor and hence the processes are not concurrent.
If my understanding is correct, similarly each thread will be processed at a time and not concurrent in a uniprocessor. Is this statement true? If so then does multithreading mean having more than one thread in a process and does not mean running multiple threads at a time? And does that mean there's no benefit of creating user threads in a uniprocessor environment?
TL;DR: threads are switching more often than processes and in real time we have an effect of concurrency because it is happens really fast.
when you wrote:
each thread will be processed at a time and not concurrent in a uni processor
Notice the word "concurrent", there is no real concurrency in uni processor, there is only effect of that thanks to the multiple number of context switches between processes.
Let's clarify something here, the single core of the CPU can handle one thread at a given time, each process has a main thread and (if needed) more threads running together. If a process A is now running and it has 3 threads: A1(main thread), A2, A3 all three will be running as long as process A is being processed by the CPU core. When a context switch occur process A is no longer running and now process B will run with his threads.
About this statement:
there's no benefit of creating user threads in a uni processor environment
That is not true. there is a benefit in creating threads, they are easier to create ("spawn" as in the books) and shearing the process heap memory. Creating a sub process ("child" as in the books) is a overhead comparing to a thread because a process need to have his own memory. For example each google chrome tab is a process not a thread, but this tab has multiple threads running concurrency with little responsibility.
If you are still somehow running a computer with just one, single-core, CPU, then you would be correct to observe that only one thread can be physically executing at one time. But that does not negate the value of breaking up the application into multiple threads and/or processes.
The essential benefit is concurrency. When one thread is waiting (e.g. for an input/output operation to complete), there is something else for the CPU to be doing in the meantime: it can be running a different thread that isn't waiting. With a carefully designed application, you can get much better utilization of every part of the hardware, more parallelism, and thus, more throughput.
My favorite go-to example is a fast food restaurant. About a dozen workers, each one doing different things, cooperate to bring your order to you. Even if one of them (say, "the fry guy") is standing around, someone else always has something to do. Several orders are in-process at once. This overlap, this "concurrency," is what you are shooting for – regardless of how many CPUs you have.
Multithreading is also commonly used with GUI applications that also need to do some kind of "heavy lifting." One thread handles the GUI interaction (and has no other real responsibilities) while other threads, with a slightly inferior priority (or "niceness") do the lifting. When a GUI event comes in, the GUI thread pre-empts the others and responds to it immediately, then of course goes right back to sleep again. But in this way the GUI always remains very responsive – even though the other threads are doing "heavy lifting" things, GUI messages are still handled very promptly. (I scooped-up about a 25% performance improvement by re-tooling an older application to use this approach, because the application was no longer "polling" for GUI events.)
The first question I ask about any thread is, "what does it wait for?" To me, a thread is defined by what event it waits for and what it does when that event happens.
Threads were in wide-spread use for at least a decade before multi-processor computers became commercially available. They are useful when you want to write a program that has to respond to un-synchronized events that come from multiple different sources. There's a few different ways to model a program like that. One way is to have a different thread to wait on each different event source. The next most popular is an event driven architecture in which there's a main loop that waits for all events and calls different event handler functions for each of the different kinds of event.
The multi-threaded style of program often is easier to read* because there's usually different activities going on inside the program, and the state of each activity can be implicit in the context (i.e., registers and call stack) of the thread that's driving it, while in the event-driven model, each activity's state must be explicitly encoded in some object.
The implicit-in-the-context way of keeping the state is much closer to the procedural style of coding a single activity that we learn as beginners.
*Easier to read does not mean that the code is easy to write without making bad and non-obvious mistakes!!
The main impetus for developing threads was Ada compliance. Prior to that, different operating systems had their own ways of handing multiple things at once. In eunuchs, the way to do more than one thing was to spin off a new process. In VMS, software interrupts (aka Asynchronous System Traps or Asynchronous Procedure Calls in Windoze). In those days (1970's) multiprocessor systems were rare.
One of the goals of Ada was to have a system independent way of doing things. It adopted the "task" which is effectively a thread. In order to support Ada, compiler developers had to include task (thread) libraries.
With the rise of multiprocessors, operating systems started to make threads (rather than processes) the basic schedulable unit in a system.
Threads then give a way for programs to handle multiple things simultaneously, even if there is only one processor. Sadly, support for threads in programming languages has been woefully lacking. Ada is the only major language I can think of that has real support for threads (tasks). Thread support in Java, for example, is a complete, sick joke. The result is threads are not as effective in practice as they could be.

Why does Dropbox use so many threads?

My understanding of threads is that you can only have one thread per core, two with hyper threading, before you start losing efficiency.
This computer has eight cores and so should work best with 8/16 threads then, yet many applications use several times that, especially Dropbox.
It also uses 95 threads while idling on my laptop, which only has 4 cores.
Why is this the case? Does it have so many threads for programming convenience, have I misunderstood threading efficiency or is it something else entirely?
I took a peek at the Mac version of the client, and it seems to be written in Python and it uses several frameworks.
A bunch of threads seem to be used in some in house actor system
They use nucleus for app analytics
There seems to be a p2p network
some networking threads (one per hype core)
a global pool (one per physical core)
many threads for file monitoring and thumbnail generation
task schedulers
logging
metrics
db checkpointing
something called infinite configuration
etc.
Most are idle.
It looks like a hodgepodge of subsystems, each starting their own threads, but they don't seem too expensive in terms of memory or CPU.
My understanding of threads is that you can only have one thread per core, two with hyper threading, before you start losing efficiency.
Nope, this is not true. I'm not sure why you think that, but it's not true.
As just the most obvious way to show that it's false, suppose you had that number of threads and one of them accessed a page of memory that wasn't in RAM and had to be loaded to disk. If you don't have any other threads that can run, then one core is wasted for the entire time it takes to read that page of memory from disk.
It's hard to address the misconception directly without knowing what flawed chain of reasoning led to it. But the most common one is that if you have more threads ready-to-run than you can execute at once, then you have lots of context switches and context switches are expensive.
But that is obviously wrong. If all the threads are ready-to-run, then no context switches are necessary. A context switch is only necessary if a running thread stops being ready-to-run.
If all context switches are voluntary, then the implementation can select the optimum number of context switches. And that's precisely what it does.
Having large numbers of threads causes you to lose efficiency if, and only if, lots of threads do a small amount of work and then become no longer ready-to-run while other waiting threads are ready-to-run. That forces the implementation to do a context even where it is not optimal.
Some applications that use lots of threads do in fact do this. And that does result in poor performance. But Dropbox doesn't.

Cost of a thread

I understand how to create a thread in my chosen language and I understand about mutexs, and the dangers of shared data e.t.c but I'm sure about how the O/S manages threads and the cost of each thread. I have a series of questions that all relate and the clearest way to show the limit of my understanding is probably via these questions.
What is the cost of spawning a thread? Is it worth even worrying about when designing software? One of the costs to creating a thread must be its own stack pointer and process counter, then space to copy all of the working registers to as it is moved on and off of a core by the scheduler, but what else?
Is the amount of stack available for one program split equally between threads of a process or on a first come first served?
Can I somehow check the hardware on start up (of the program) for number of cores. If I am running on a machine with N cores, should I keep the number of threads to N-1?
then space to copy all of the working registeres to as it is moved on
and off of a core by the scheduler, but what else?
One less evident cost is the strain imposed on the scheduler which may start to choke if it needs to juggle thousands of threads. The memory isn't really the issue. With the right tweaking you can get a "thread" to occupy very little memory, little more than its stack. This tweaking could be difficult (i.e. using clone(2) directly under linux etc) but it can be done.
Is the amount of stack available for one program split equally between
threads of a process or on a first come first served
Each thread gets its own stack, and typically you can control its size.
If I am running on a machine with N cores, should I keep the number of
threads to N-1
Checking the number of cores is easy, but environment-specific. However, limiting the number of threads to the number of cores only makes sense if your workload consists of CPU-intensive operations, with little I/O. If I/O is involved you may want to have many more threads than cores.
You should be as thoughtful as possible in everything you design and implement.
I know that a Java thread stack takes up about 1MB each time you create a thread. , so they add up.
Threads make sense for asynchronous tasks that allow long-running activities to happen without preventing all other users/processes from making progress.
Threads are managed by the operating system. There are lots of schemes, all under the control of the operating system (e.g. round robin, first come first served, etc.)
It makes perfect sense to me to assign one thread per core for some activities (e.g. computationally intensive calculations, graphics, math, etc.), but that need not be the deciding factor. One app I develop uses roughly 100 active threads in production; it's not a 100 core machine.
To add to the other excellent posts:
'What is the cost of spawning a thread? Is it worth even worrying about when designing software?'
It is if one of your design choices is doing such a thing often. A good way of avoiding this issue is to create threads once, at app startup, by using pools and/or app-lifetime threads dedicated to operations. Inter-thread signaling is much quicker than continual thread creation/termination/destruction and also much safer/easier.
The number of posts concerning problems with thread stopping, terminating, destroying, thread count runaway, OOM failure etc. is ledgendary. If you can avoid doing it at all, great.

TPL Tasks, Threads, etc

Could someone clear up to me how these things correlate:
Task
Thread
ThreadPool's thread
Paraller.For/ForEach/Invoke
I.e. when I create a Task and run it, where does it get a thread to execute on? And when I call Parallel.* what is really going on under the covers?
Any links to articles, blogposts, etc are also very welcomed!
The ideal state of a system is to have 1 actively running thread per CPU core. By defining work in more general terms of "tasks", the TPL can dynamically decide how many threads to use and which tasks to do on each one in order to come closest to achieving that ideal state. These are decisions that are almost always best made dynamically at runtime because when writing the code you can't know for sure how many CPU cores will be available to your application, how busy they are with other work, etc.
Thread: is a real OS thread, has handle and ID.
ThreadPool: is a collection of already-created OS Threads. These threads are owned/maintained by the runtime, and your code is only allowed to "borrow" them for a while, you can only do work short-termed work in these threads, and you can't modify any thread state, nor delete these threads.
Best guesses on these two:
Task: might get run on a pre-created thread in the thread pool, or might get run as part of user-mode scheduling, this is all depending on what the runtime thinks is best. Another guess: with TPL, the user-mode scheduling is NOT based on OS Fibers, but is its own complete (and working) implementation).
Parallel.For: actually, no clue how this is implemented. The runtime might create new threads to do the parallel bits, or much more likely use the thread pool's threads for the parallelism.

Why should I use a thread vs. using a process?

Separating different parts of a program into different processes seems (to me) to make a more elegant program than just threading everything. In what scenario would it make sense to make things run on a thread vs. separating the program into different processes? When should I use a thread?
Edit
Anything on how (or if) they act differently with single-core and multi-core would also be helpful.
You'd prefer multiple threads over multiple processes for two reasons:
Inter-thread communication (sharing data etc.) is significantly simpler to program than inter-process communication.
Context switches between threads are faster than between processes. That is, it's quicker for the OS to stop one thread and start running another than do the same with two processes.
Example:
Applications with GUIs typically use one thread for the GUI and others for background computation. The spellchecker in MS Office, for example, is a separate thread from the one running the Office user interface. In such applications, using multiple processes instead would result in slower performance and code that's tough to write and maintain.
Well apart from advantages of using thread over process, like:
Advantages:
Much quicker to create a thread than
a process.
Much quicker to switch
between threads than to switch
between processes.
Threads share data
easily
Consider few disadvantages too:
No security between threads.
One thread can stomp on another thread's
data.
If one thread blocks, all
threads in task block.
As to the important part of your question "When should I use a thread?"
Well you should consider few facts that a threads should not alter the semantics of a program. They simply change the timing of operations. As a result, they are almost always used as an elegant solution to performance related problems. Here are some examples of situations where you might use threads:
Doing lengthy processing: When a windows application is calculating it cannot process any more messages. As a result, the display cannot be updated.
Doing background processing: Some
tasks may not be time critical, but
need to execute continuously.
Doing I/O work: I/O to disk or to
network can have unpredictable
delays. Threads allow you to ensure
that I/O latency does not delay
unrelated parts of your application.
I assume you already know you need a thread or a process, so I'd say the main reason to pick one over the other would be data sharing.
Use of a process means you also need Inter Process Communication (IPC) to get data in and out of the process. This is a good thing if the process is to be isolated though.
You sure don't sound like a newbie. It's an excellent observation that processes are, in many ways, more elegant. Threads are basically an optimization to avoid too many transitions or too much communication between memory spaces.
Superficially using threads may also seem like it makes your program easier to read and write, because you can share variables and memory between the threads freely. In practice, doing that requires very careful attention to avoid race conditions or deadlocks.
There are operating-system kernels (most notably L4) that try very hard to improve the efficiency of inter-process communication. For such systems one could probably make a convincing argument that threads are pointless.
I would like to answer this in a different way. "It depends on your application's working scenario and performance SLA" would be my answer.
For instance threads may be sharing the same address space and communication between threads may be faster and easier but it is also possible that under certain conditions threads deadlock and then what do you think would happen to your process.
Even if you are a programming whiz and have used all the fancy thread synchronization mechanisms to prevent deadlocks it certainly is not rocket science to see that unless a deterministic model is followed which may be the case with hard real time systems running on Real Time OSes where you have a certain degree of control over thread priorities and can expect the OS to respect these priorities it may not be the case with General Purpose OSes like Windows.
From a Design perspective too you might want to isolate your functionality into independent self contained modules where they may not really need to share the same address space or memory or even talk to each other. This is a case where processes will make sense.
Take the case of Google Chrome where multiple processes are spawned as opposed to most browsers which use a multi-threaded model.
Each tab in Chrome can be talking to a different server and rendering a different website. Imagine what would happen if one website stopped responding and if you had a thread stalled due to this, the entire browser would either slow down or come to a stop.
So Google decided to spawn multiple processes and that is why even if one tab freezes you can still continue using other tabs of your Chrome browser.
Read more about it here
and also look here
I agree to most of the answers above. But speaking from design perspective i would rather go for a thread when i want set of logically co-related operations to be carried out parallel. For example if you run a word processor there will be one thread running in foreground as an editor and other thread running in background auto saving the document at regular intervals so no one would design a process to do that auto saving task separately.
In addition to the other answers, maintaining and deploying a single process is a lot simpler than having a few executables.
One would use multiple processes/executables to provide a well-defined interface/decoupling so that one part or the other can be reused or reimplemented more easily than keeping all the functionality in one process.
Came across this post. Interesting discussion. but I felt one point is missing or indirectly pointed.
Creating a new process is costly because of all of the
data structures that must be allocated and initialized. The process is subdivided into different threads of control to achieve multithreading inside the process.
Using a thread or a process to achieve the target is based on your program usage requirements and resource utilization.

Resources