How does process blocking apply to a multi-threaded process?

How does process blocking apply to a multi-threaded process? - multithreading

I've learned that a process has running, ready, blocked, and suspended states. Threads also have these states except for suspended because it lives in the process's address space.
A process blocks most of the time when it is doing a blocking i/o or waiting for an event.
I can easily picture out a process getting blocked if its single-threaded or if it follows a one-to-many model, but how does it work if the process is multi-threaded?
For example:
I have a process with two threads in a system that follows a one-to-one model. One handles the gui and the other handles the blocking i/o. I know the process remains responsive because the other thread handles the i/o.
So is there by any chance the process gets blocked or should I just rule it out in this case?
I'm just getting into these stuff so forgive me If I haven't understand some of the important details yet.

Let's say you have a work queue where the UI thread schedules work to be done and the I\O thread looks there for work to do. The work queue itself is data that is read and modified from both threads, therefor you must synchronize access somehow or race conditions result.
The naive approach is to synchronize access to the queue using a lock (aka critical section). If the I\O thread acquires the lock and then blocks, the UI thread will only remain responsive until it decides it needs to schedule work and tries to acquire the lock. A better approach is to use a lock-free queue about which much has been written and you can easily search for more info.
But to answer your question, yes, it is still much easier than you might think to cause UI to stutter / hang even when using multiple threads. There are various libraries that make it easier or harder to solve this problem, so depending on your OS and language of choice, there may be something better than just OS primitives. Win32 (from what I remember) doesn't it make it very easy at all despite having all sorts of synchronization primitives. Pthreads and Boost never seemed very straightforward to me either. Apple's GCD makes it semantically much easier to express what you want (in my opinion), though there are still pitfalls one must be aware of (such as scheduling too many blocking operations on a single work queue to be done in parallel and causing the processor to thrash when they all wake up at the same time).
My advice is to just dive in and write lots of multithreaded code. It can be tough to debug but you will learn a lot and eventually it becomes second nature.

Related

Sockets use threads instead of select()

I have a question about multi sockets.
I know that I have to use select() for multi sockets. select() waits for a fd ...
But why we need to use select() when we can create a thread for each socket and perform accept() on each one seperatly ? Is it even a bad idea ? Is it just about "too many sockets, too many threads so" or what ??

It's true, you can avoid multiplexing sockets by instead spawning one thread for each socket, and then using blocking I/O on each thread.
That saves you from having to deal with select() (or poll() or etc); but now you have to deal with multiple threads instead, which is often worse.
Whether threads will be more of a hassle to manage than socket-multiplexing in your particular program depends a lot on what your program is trying to do. For example, if the threads in your program don't need to communicate/co-operate with each other or share any resources, then a multithreaded design can work well (as would a multiprocess design). On the other hand, if your threads all need to access a shared data structure or other resource, or if they need to interact with each other, then you've got a bit of a programming challenge on your hands, which you'll need to solve 100% perfectly or you'll end up with a program that "seems to work most of the time" but then occasionally deadlocks, crashes, or gives incorrect results due to incorrect/insufficient synchronization. This phenomenon of "meta-stability" is much more common/severe amongst buggy multithreaded programs than in buggy single-threaded programs, since a multithreaded program's exact flow of execution will be different every time you run it (due to the asynchronous nature of the threads with respect to each other).
Stability and code-correctness issues aside, there are a couple of other problems particular to multithreading that you avoid by using a single-threaded design:
Most OS's don't scale well above a few dozen threads. So if you're thinking one-thread-per-client, and you want to support hundreds or thousands of simultaneous clients, you're in for some performance problems.
It's hard to control a thread that is blocked in a blocking-socket call. Say the user has pressed Command-Q (or whatever the appropriate equivalent is) so it's now time for your program to quit. If you have one or more threads blocked inside a blocking-socket call, there's no straightforward way to do that:
You can't just call unilaterally call exit(), because while the main thread is tearing down process-global resources, one or more threads might still be using them, leading to an occasional crash
You can't ask the threads to exit (via atomic-boolean or whatever) and then call join() to wait for them, because they are blocking inside I/O calls and thus might take minutes/hours/days before they respond
You can't send a signal to the threads and have them react in a signal-handler, because signals are per-process, and you can't control which thread will receive the signal.
You can't just unilaterally kill the threads, because they might be holding resources (like mutexes or file handles) that would then remain unreleased forever, potentially causing deadlocks or other problems
You can't close down the threads' sockets for them, and hope that this will cause the threads to error out and terminate, as this leads to race condition if the threads also try to close down those same resources.
So even in a multithreaded design, if you want a clean shutdown (or any other sort of local control of a network-thread) you usually end up having to use non-blocking I/O and/or socket multiplexing inside each thread anyway, so now you've got the worst of both worlds, complexity-wise.

QProcess, QEventLoop - of any use for parallel-processing

I wonder whether I could use QEventLoop (QProcess?) to parallelize multiple calls to same function with Qt. What is precisely the difference with QtConcurrent or QThread? What is a process and an event loop more precisely? I read that QCoreApplication must exec() as early as possible in main() method, so that I wonder why it is different from main Thread.
could you point as some efficient reference to processes and thread with Qt? I came through the official doc and those things remain unclear.
Thanks and regards.

Process and thread are not Qt-specific concepts. You can search for "process vs. thread" anywhere for that distinction to be explained. For instance: What resources are shared between threads?
Though related concepts, spawning a new process is a more "heavyweight" form of parallelism than spawning a new thread within your existing process. Processes are protected from each other by default, while threads of execution within a process can read and write each other's memory directly. The protection you get from spawning processes comes at a greater run-time cost...and since independent processes can't read each other's memory, you have to share data between them using methods of inter-process communication.
Odds are that you want threads, because they're simpler to use in a case where one is writing all the code in a program. Given all the complexities in multithreaded programming, I'd suggest looking at a good book or reading some websites to start with. See: What are some good resources for learning threaded programming?
But if you want to dive in and just get a feel for how threading in Qt looks, you can spend time looking at the examples:
http://qt-project.org/doc/qt-4.8/examples-threadandconcurrent.html
QtConcurrent is an abstraction library that makes it easier to implement some kinds of parallel programming patterns. It's built on top of the QThread abstractions, and there's nothing it can do that you couldn't code yourself by writing to QThread directly. But it might make your code easier to write and less prone to errors.
As for an event loop...that is merely a generic term for how any given thread of execution in your program waits for work items to process, processes them, and can decide when it is no longer needed. If a thread's job were merely to start up, do some math, and exit...then it wouldn't need an event loop. But starting and stopping a thread takes time and churns resources. So typically threads live for longer periods of time, and have an event loop that knows how to wait for events it needs to respond to.
If you build on top of QtConcurrent, you won't have to worry about an event loop in your worker threads because they are managed automatically in a thread pool. The word count example is pretty simple to see:
http://qt-project.org/doc/qt-4.8/qtconcurrent-wordcount-main-cpp.html

Can code running in a background thread be faster than in the main VCL thread in Delphi?

If anybody has had a lot of experience timing code running on the main VCL thread vs a background thread, I'd like to get an opinion. I have some code that does some heavy string processing running in my Delphi 6 application on the main thread. Each time I run an operation, the time for each operation hovers around 50 ms on a single thread on my i5 Quad core. What makes me really suspicious is that the same code running on an old Pentium 4 that I have, shows the same time for the operation when usually I see code running about 4 times slower on the Pentium 4 than the Quad Core. I am beginning to wonder if the code might be consuming significantly less time than 50 ms but that there's something about the main VCL thread, perhaps Windows message handling or executing Windows API calls, that is creating an artificial "floor" for the operation. Note, an operation is triggered by an incoming request on a socket if that matters, but the time measurement does not take place until the data is fully received.
Before I undertake the work of moving all the code on to a background thread for testing, I am wondering if anyone has any general knowledge in this area? What have your experiences been with code running on and off the main VCL thread? Note, the timing measurements are being done when there is absolutely no user triggered activity going on during the tests.
I'm also wondering if raising the priority of the thread to just below real-time would do any good. I've never seen much improvement in my run times when experimenting with those flags.
-- roschler

Given all threads have the same priority, as they normally do, there can't be a difference, for the following reasons. If you're seeing a difference, re-evaluate the code (make sure you run the same thing in both VCL and background threads) and make sure you time it properly:
The compiler generates the exact same code, it doesn't care if the code is going to run in the main thread or in a background thread. In fact you can put the whole code in a procedure and call that from both your worker thread's Execute() and from the main VCL thread.
For the CPU all cores, and all threads, are equal. Unless it's actually a Hyper Threading CPU, where not all cores are real, but then see the next bullet.
Even if not all CPU cores are equal, your thread will very unlikely run on the same core, the operating system is free to move it around at will (and does actually schedule your thread to run on different cores at different times).
Messaging overhead doesn't matter for the main VCL thread, because unless you're calling Application.ProcessMessages() manually, the message pump is simply stopped while your procedure does it's work. The message pump is passive, your thread needs to request messages from the queue, but since the thread is busy doing your work, it's not requesting any messages so no overhead there.
There's just one place where threads are not equal, and this can change the perceived speed of execution: It's the operating system that schedules threads to execution units (cores), and for the operating system threads have different priorities. You can tell the OS a certain thread needs to be treated differently using the SetThreadPriority() API (which is used by the TThread.Priority property).

Without simple source code to reproduce the issue, and how you are timing your threads, it will be difficult to understand what occurs in your software.
Sounds definitively like either:
An Architecture issue - how are your threads defined?
A measurement issue - how are you timing your threads?
A typical scaling issue of both the memory manager and the RTL string-related implementation.
About the latest point, consider this:
The current memory manager (FastMM4) is not scaling well on multi-core CPU; try with a per-thread memory manager, like our experimental SynScaleMM - note e.g. that the Free Pascal Compiler team has written a new scaling MM from scratch recently, to avoid such issue;
Try changing the string process implementation to avoid memory allocation (use static buffers), and string reference-counting (every string reference counting access produces a LOCK DEC/INC which do not scale so well on multi-code CPU - use per-thread char-level process, using e.g. PChar on static buffers instead of string).
I'm sure that without string operations, you'll find that all threads are equivalent.
In short: neither the current Delphi MM, neither the current string implementation scales well on multi-core CPU. You just found out a known issue of the current RTL. Read this SO question.

When your code has control of the VCL thread, for instance if it is in one method and doesn't call out to any VCL controls or call Application.ProcessMessages, then the run time will not be affected just because it's in the main VCL thread.
There is no overhead, since you "own" the whole processing power of the thread when you are in your own code.
I would suggest that you use a profiling tool to find where the actual bottleneck is.

Performance can't be assessed statically. For that you need to get AQTime, or some other performance profiler for Delphi. I use AQtime, and I love it, but I'm aware it's considered expensive.
Your code will not magically get faster just because you moved it to a background thread. If anything, your all-inclusive-time until you see results in your UI might get a little slower, if you have to send a lot of data from the background thread to the foreground thread via some synchronization mechanisms.
If however you could execute parts of your algorithm in parallel, that is, split your work so that you have 2 or more worker threads processing your data, and you have a quad core processor, then your total time to do a fixed load of work, could decrease. That doesn't mean the code would run any faster, but depending on a lot of factors, you might achieve a slight benefit from multithreading, up to the number of cores in your computer. It's never ever going to be a 2x performance boost, to use two threads instead of one, but you might get 20%-40% better performance, in your more-than-one-threaded parallel solutions, depending on how scalable your heap is under multithreaded loads, and how IO/memory/cache bound your workload is.
As for raising thread priorities, generally all you will do there is upset the delicate balance of your Windows system's performance. By raising the priorities you will achieve (sometimes) a nominal, but unrepeatable and non-guaranteeable increase in performance. Depending on the other things you do in your code, and your data sources, playing with priorities of threads can introduce subtle problems. See Dining Philosophers problem for more.
Your best bet for optimizing the speed of string operations is to first test it and find out exactly where it is using most of its time. Is it heap operations? Memory Copy and move operations? Without a profiler, even with advice from other people, you will still be comitting a cardinal sin of programming; premature optimization. Be results oriented. Be science based. Measure. Understand. Then decide.
Having said that, I've seen a lot of horrible code in my time, and there is one killer thing that people do that totally kills their threaded app performance; Using TThread.Synchronize too much.
Here's a pathological (Extreme) case, that sadly, occurs in the wild fairly frequently:
procedure TMyThread.Execute;
begin
while not Terminated do
Synchronize(DoWork);
end;
The problem here is that 100% of the work is really done in the foreground, other than the "if terminated" check, which executes in the thread context. To make the above code even worse, add a non-interruptible sleep.
For fast background thread code, use Synchronize sparingly or not at all, and make sure the code it calls is simple and executes quickly, or better yet, use TThread.Queue or PostMessage if you could really live with queueing main thread activity.

Are you asking if a background thread would be faster? If your background thread would run the same code as the main thread and there's nothing else going on in the main thread, you don't stand to gain anything with a background thread. Threads should be used to split and distribute processing loads that would otherwise contend with one another and/or block one another when running in the main thread. Since you seem to be dealing with a case where your main thread is otherwise idle, simply spawning a thread to run slow code will not help.
Threads aren't magic, they can't speed up slow code or eliminate processing bottlenecks in a particular segment not related to contention on the main thread. Make sure your code isn't doing something you don't know about and that your timing methodology is correct.
My first hunch would be that your interaction with the socket is affecting your timing in a way you haven't detected... (I know you said you're sure that's not involved - but maybe check again...)

fork in multi-threaded program

I've heard that mixing forking and threading in a program could be very problematic, often resulting with mysterious behavior, especially when dealing with shared resources, such as locks, pipes, file descriptors. But I never fully understand what exactly the dangers are and when those could happen. It would be great if someone with expertise in this area could explain a bit more in detail what pitfalls are and what needs to be care when programming in a such environment.
For example, if I want to write a server that collects data from various different resources, one solution I've thought is to have the server spawns a set of threads, each popen to call out another program to do the actual work, open pipes to get the data back from the child. Each of these threads responses for its own work, no data interexchange in b/w them, and when the data is collected, the main thread has a queue and these worker threads will just put the result in the queue. What could go wrong with this solution?
Please do not narrow your answer by just "answering" my example scenario. Any suggestions, alternative solutions, or experiences that are not related to the example but helpful to provide a clean design would be great! Thanks!

The problem with forking when you do have some threads running is that the fork only copies the CPU state of the one thread that called it. It's as if all of the other threads just died, instantly, wherever they may be.
The result of this is locks aren't released, and shared data (such as the malloc heap) may be corrupted.
pthread does offer a pthread_atfork function - in theory, you could take every lock in the program before forking, release them after, and maybe make it out alive - but it's risky, because you could always miss one. And, of course, the stacks of the other threads won't be freed.

It is really quite simple. The problems with multiple threads and processes always arise from shared data. If there is not shared data then there can be no possible issues arising.
In your example the shared data is the queue owned by the main thread - any potential contention or race conditions will arise here. Typical methods for "solving" these issues involve locking schemes - a worker thread will lock the queue before inserting any data, and the main thread will lock the queue before removing it.

Why should I use a thread vs. using a process?

Separating different parts of a program into different processes seems (to me) to make a more elegant program than just threading everything. In what scenario would it make sense to make things run on a thread vs. separating the program into different processes? When should I use a thread?
Edit
Anything on how (or if) they act differently with single-core and multi-core would also be helpful.

You'd prefer multiple threads over multiple processes for two reasons:
Inter-thread communication (sharing data etc.) is significantly simpler to program than inter-process communication.
Context switches between threads are faster than between processes. That is, it's quicker for the OS to stop one thread and start running another than do the same with two processes.
Example:
Applications with GUIs typically use one thread for the GUI and others for background computation. The spellchecker in MS Office, for example, is a separate thread from the one running the Office user interface. In such applications, using multiple processes instead would result in slower performance and code that's tough to write and maintain.

Well apart from advantages of using thread over process, like:
Advantages:
Much quicker to create a thread than
a process.
Much quicker to switch
between threads than to switch
between processes.
Threads share data
easily
Consider few disadvantages too:
No security between threads.
One thread can stomp on another thread's
data.
If one thread blocks, all
threads in task block.
As to the important part of your question "When should I use a thread?"
Well you should consider few facts that a threads should not alter the semantics of a program. They simply change the timing of operations. As a result, they are almost always used as an elegant solution to performance related problems. Here are some examples of situations where you might use threads:
Doing lengthy processing: When a windows application is calculating it cannot process any more messages. As a result, the display cannot be updated.
Doing background processing: Some
tasks may not be time critical, but
need to execute continuously.
Doing I/O work: I/O to disk or to
network can have unpredictable
delays. Threads allow you to ensure
that I/O latency does not delay
unrelated parts of your application.

I assume you already know you need a thread or a process, so I'd say the main reason to pick one over the other would be data sharing.
Use of a process means you also need Inter Process Communication (IPC) to get data in and out of the process. This is a good thing if the process is to be isolated though.

You sure don't sound like a newbie. It's an excellent observation that processes are, in many ways, more elegant. Threads are basically an optimization to avoid too many transitions or too much communication between memory spaces.
Superficially using threads may also seem like it makes your program easier to read and write, because you can share variables and memory between the threads freely. In practice, doing that requires very careful attention to avoid race conditions or deadlocks.
There are operating-system kernels (most notably L4) that try very hard to improve the efficiency of inter-process communication. For such systems one could probably make a convincing argument that threads are pointless.

I would like to answer this in a different way. "It depends on your application's working scenario and performance SLA" would be my answer.
For instance threads may be sharing the same address space and communication between threads may be faster and easier but it is also possible that under certain conditions threads deadlock and then what do you think would happen to your process.
Even if you are a programming whiz and have used all the fancy thread synchronization mechanisms to prevent deadlocks it certainly is not rocket science to see that unless a deterministic model is followed which may be the case with hard real time systems running on Real Time OSes where you have a certain degree of control over thread priorities and can expect the OS to respect these priorities it may not be the case with General Purpose OSes like Windows.
From a Design perspective too you might want to isolate your functionality into independent self contained modules where they may not really need to share the same address space or memory or even talk to each other. This is a case where processes will make sense.
Take the case of Google Chrome where multiple processes are spawned as opposed to most browsers which use a multi-threaded model.
Each tab in Chrome can be talking to a different server and rendering a different website. Imagine what would happen if one website stopped responding and if you had a thread stalled due to this, the entire browser would either slow down or come to a stop.
So Google decided to spawn multiple processes and that is why even if one tab freezes you can still continue using other tabs of your Chrome browser.
Read more about it here
and also look here

I agree to most of the answers above. But speaking from design perspective i would rather go for a thread when i want set of logically co-related operations to be carried out parallel. For example if you run a word processor there will be one thread running in foreground as an editor and other thread running in background auto saving the document at regular intervals so no one would design a process to do that auto saving task separately.

In addition to the other answers, maintaining and deploying a single process is a lot simpler than having a few executables.
One would use multiple processes/executables to provide a well-defined interface/decoupling so that one part or the other can be reused or reimplemented more easily than keeping all the functionality in one process.

Came across this post. Interesting discussion. but I felt one point is missing or indirectly pointed.
Creating a new process is costly because of all of the
data structures that must be allocated and initialized. The process is subdivided into different threads of control to achieve multithreading inside the process.
Using a thread or a process to achieve the target is based on your program usage requirements and resource utilization.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string