multithreading for io-bound application : good or bad - multithreading

Ive been trying to figure out if multithreading in an io-bound application will actually improve performance or reduce it. Many sources I have read are conflicting.
Take this one for example.
Why multithreading with io-bound is bad
The accepted answer is that if your application is io-bound multithreading will cause contention and slow down your application.
Where as is this example the answer with the highest votes states that it can improve throughput.
Why multithreading with io-bound is good
Am I misunderstanding something here?
In my situation I need to read from n disk locations n times a second. I'm finding it difficult to decide if I should be using threads at all.
For example, if I had 20 files on disk and 20 seperate threads reading from disk in a state of waiting and waking, is this going to completely slow down my system?
If a pthread is executing code that reads from disk, would all the other 19 threads doing the same thing on different file be blocked?

Why do you need multithreading for this? If you work with single disk, multithreading will provide the same performance in the best case or a bit slower otherwise.

In this question:
Does it make sense to spawn more than one thread per processor?
That you labeled "why multithreading with io-bound is good", the top answer has 16 upvotes and he states that "If your software makes frequent use of disk or network IO". Take particular note of the last part, "network IO". This is a distinguishing factor from your first linked question, which is only about threads and disk IO.

The only way that additional threads will help in an io-bound situation is if your storage system can handle multiple requests in parallel. This can be the case on high end storage arrays, but is unlikely to be the case on a consumer device like the ipad.

Related

Does multi-threading improve performance? How?

I hear everyone talking about how multi-threading can improve performance. I don't believe this, unless there is something I'm missing. If I have an array of 100 elements and traversing it takes 6 seconds. When I divide the work between two threads, the processor would have to go through the same amount of work and therefore time, except that they are working simultaneously but at half the speed. Shouldn't multi threading make it even slower? Since you need additional instructions for dividing the work?
For a simple task of iterating 100 elements multi-threading the task will not provide a performance benefit.
Iterating over 100 billion elements and do processing on each element, then the use of additional CPU's may well help reduce processing time. And more complicated tasks will likely incur interrupts due to I/O for example. When one thread is sleeping waiting for a peripheral to complete I/O (e.g. a disk write, or a key press from the keyboard), other threads can continue their work.
For CPU bound tasks where you have more than one core in your processor you can divide your work on each of your processor core. If you have two cores, split the work on two threads. This way you have to threads working at full speed.
However threads are really expensive to create, so you need a pretty big workload to overcome the initial cost of creating the threads.
You can also use threads to improve appeared performance (or responsiveness) in an interactive application. You run heavy computations on a background thread to avoid blocking UI interactions. Your computations does not complete faster, but your application does not have those "hangs" that make it appear slow and unresponsive.

How to optimize number of threads per number of cores

I'm trying to get a better idea of how many threads should run on n number of cores. I know this a complicated question in which the answer depends on a number of factors, such as how much shared state there is, and how much sleeping and waiting on resources each thread does.
To simplify things, let's say we have 2 cores and just one process that can divide its work into threads with no shared state. Let's say each thread just performs computation after computation with no sleeping and no waiting on resources. Would the ideal number of threads in this case be 2?
Let's complicate things a bit and say that the threads have to do some sort of disk I/O. How does this change our answer? I would think that we could have more than 2 cores in this case.
Or let's say they don't do any sleeping or waiting on resources, but instead there's some memory that they both have access to that requires synchronization. How does this change our answer? I would think that in this case we may actually prefer 1 thread over 2, depending on how much synchronization is required.
This is a hard question to answer in the general matter. It really depends on more specifics in the case. What to remember is that it costs to do context-switches - if you're only doing computations it would be wasteful to have 2 threads running on one core (as you wouldn't really gain anything - only lose in the context switches). On the other hand if you are waiting for resources and at the same time can continue with other calculations, it is a good idea to have a thread wait for those resources to not make the entire execution lag behind.
When it comes to IO you don't think in terms of threads per core. You think in terms of threads per physical device. Each individual device has different optimal DOP's (magnetic disk = 1, SSD at least 4, network much higher).
For CPU bound work the optimal number is 1 (per core).
In mixed cases or cases where it is more complicated than that no general answer can be given. The system can behave in surprising ways (like collapsing under load!). The approach here is to test different DOPs and use the best. Generally, there will be exactly one optimum while both 1 and infinity perform much worse. So you only need to find the single maximum which is quite easy.

Thread pooling and multi core systems

Do you think threadpooling design pattern is the way to go for the multi-core future?
A threadpooling library for instance, if used extensively, makes/force the application writer
(1) to break the problem into separate parallel jobs hence promoting (enforcing :) ) parallelism
(2) With abstraction from all the low level OS calls, synchronization etc etc makes programmer's life easier. (Especially for C programmers :) )
I have strong belief its the best way (Or One of the "best" ways :) ) for multi-core future...
So, my question is, am I write in thinking so or I am in some delusion :)
Regards,
Microkernel
Thread pooling is a technique that involves a queue and a number of threads taking jobs from the queue and process them. This is in contrast to the technique of just starting new threads whenever a new task arrives.
Benefits are that the maximum number of threads is limited to avoid too much threading and that there is less overhead involved with any new task (Thread is already running and takes task. No threat starting needed).
Whether this is a good design highly depends on your problem. If you have many short jobs that come to your program at a very fast rate, then this is a good idea because the lower overhead is really a benefit. If you have extremely large numbers of concurrent tasks this is a good idea to keep your scheduler from having to do too much work.
There are many areas where thread pooling is just not helpful. So you cannot generalize. Sometimes multi threading at all is not possible. Or not even desired, as multi threading adds an unpredictable element (race conditions) to your code which is extremely hard to debug.
A thread pooling library can hardly "force" you to use it. You still need to think stuff through and if you just start one thread... Won't help.
As almost every informatics topic the answer is: It Depends.
the pooling system is fine with Embarrassingly parallel http://en.wikipedia.org/wiki/Embarrassingly_parallel
For other task where you need more syncornization between threads it's not that good
For the Windows NT engine thread pools are usually much less efficient than I/O Completion Ports. These are covered extensively in numerous questions and answers here. IOCPs enable event-driven processing in that multiple threads can wait on the IOCP until an event occurs due to an IOC (read or write) on a socket or handle which is then queued to the IOCP. The IOCP in turn pairs a waiting thread with the id of the event and releases the thread for processing. After the thread has processed the event and initiated a new I/O it returns to the IOCP to wait for the next event (which may or may not be the completion of the I/O it just initiated).
IOCs may also be artificially signalled by explicit posting from a non-event.
Using IOCPs is not polling. The optimal IOCP implementation will have as many threads waiting on the IOCP as there are cores in the system. The threads may all execute the same physical code if that is deemed efficient. Since a thread processes from an IOC up until it issues an I/O it does nothing which forces it to wait for other resources except perhaps to compete for access to thread-safe areas. It is a natural choice to move away from the "one handle per thread" paradigm. IOCP-controlled threads are therefore as efficient as the programmer is able to construct them.
I like the answer by #yaankee a lot except I would argue that thread pool is almost always the right way to go. The reason: a thread pool can degenerate itself into a simple static work partitioning model for problems like matrix-matrix multiply. OpenMP guided is kind of along those lines.

Is there a point to multithreading?

I don’t want to make this subjective...
If I/O and other input/output-related bottlenecks are not of concern, then do we need to write multithreaded code? Theoretically the single threaded code will fare better since it will get all the CPU cycles. Right?
Would JavaScript or ActionScript have fared any better, had they been multithreaded?
I am just trying to understand the real need for multithreading.
I don't know if you have payed any attention to trends in hardware lately (last 5 years) but we are heading to a multicore world.
A general wake-up call was this "The free lunch is over" article.
On a dual core PC, a single-threaded app will only get half the CPU cycles. And CPUs are not getting faster anymore, that part of Moores law has died.
In the words of Herb Sutter The free lunch is over, i.e. the future performance path for computing will be in terms of more cores not higher clockspeeds. The thing is that adding more cores typically does not scale the performance of software that is not multithreaded, and even then it depends entirely on the correct use of multithreaded programming techniques, hence multithreading is a big deal.
Another obvious reason is maintaining a responsive GUI, when e.g. a click of a button initiates substantial computations, or I/O operations that may take a while, as you point out yourself.
The primary reason I use multithreading these days is to keep the UI responsive while the program does something time-consuming. Sure, it's not high-tech, but it keeps the users happy :-)
Most CPUs these days are multi-core. Put simply, that means they have several processors on the same chip.
If you only have a single thread, you can only use one of the cores - the other cores will either idle or be used for other tasks that are running. If you have multiple threads, each can run on its own core. You can divide your problem into X parts, and, assuming each part can run indepedently, you can finish the calculations in close to 1/Xth of the time it would normally take.
By definition, the fastest algorithm running in parallel will spend at least as much CPU time as the fastest sequential algorithm - that is, parallelizing does not decrease the amount of work required - but the work is distributed across several independent units, leading to a decrease in the real-time spent solving the problem. That means the user doesn't have to wait as long for the answer, and they can move on quicker.
10 years ago, when multi-core was unheard of, then it's true: you'd gain nothing if we disregard I/O delays, because there was only one unit to do the execution. However, the race to increase clock speeds has stopped; and we're instead looking at multi-core to increase the amount of computing power available. With companies like Intel looking at 80-core CPUs, it becomes more and more important that you look at parallelization to reduce the time solving a problem - if you only have a single thread, you can only use that one core, and the other 79 cores will be doing something else instead of helping you finish sooner.
Much of the multithreading is done just to make the programming model easier when doing blocking operations while maintaining concurrency in the program - sometimes languages/libraries/apis give you little other choice, or alternatives makes the programming model too hard and error prone.
Other than that the main benefit of multi threading is to take advantage of multiple CPUs/cores - one thread can only run at one processor/core at a time.
No. You can't continue to gain the new CPU cycles, because they exist on a different core and the core that your single-threaded app exists on is not going to get any faster. A multi-threaded app, on the other hand, will benefit from another core. Well-written parallel code can go up to about 95% faster- on a dual core, which is all the new CPUs in the last five years. That's double that again for a quad core. So while your single-threaded app isn't getting any more cycles than it did five years ago, my quad-threaded app has four times as many and is vastly outstripping yours in terms of response time and performance.
Your question would be valid had we only had single cores. The things is though, we mostly have multicore CPU's these days. If you have a quadcore and write a single threaded program, you will have three cores which is not used by your program.
So actually you will have at most 25% of the CPU cycles and not 100%. Since the technology today is to add more cores and less clockspeed, threading will be more and more crucial for performance.
That's kind of like asking whether a screwdriver is necessary if I only need to drive this nail. Multithreading is another tool in your toolbox to be used in situations that can benefit from it. It isn't necessarily appropriate in every programming situation.
Here are some answers:
You write "If input/output related problems are not bottlenecks...". That's a big "if". Many programs do have issues like that, remembering that networking issues are included in "IO", and in those cases multithreading is clearly worthwhile. If you are writing one of those rare apps that does no IO and no communication then multithreading might not be an issue
"The single threaded code will get all the CPU cycles". Not necessarily. A multi-threaded code might well get more cycles than a single threaded app. These days an app is hardly ever the only app running on a system.
Multithreading allows you to take advantage of multicore systems, which are becoming almost universal these days.
Multithreading allows you to keep a GUI responsive while some action is taking place. Even if you don't want two user-initiated actions to be taking place simultaneously you might want the GUI to be able to repaint and respond to other events while a calculation is taking place.
So in short, yes there are applications that don't need multithreading, but they are fairly rare and becoming rarer.
First, modern processors have multiple cores, so a single thraed will never get all the CPU cycles.
On a dualcore system, a single thread will utilize only half the CPU. On a 8-core CPU, it'll use only 1/8th.
So from a plain performance point of view, you need multiple threads to utilize the CPU.
Beyond that, some tasks are also easier to express using multithreading.
Some tasks are conceptually independent, and so it is more natural to code them as separate threads running in parallel, than to write a singlethreaded application which interleaves the two tasks and switches between them as necessary.
For example, you typically want the GUI of your application to stay responsive, even if pressing a button starts some CPU-heavy work process that might go for several minutes. In that time, you still want the GUI to work. The natural way to express this is to put the two tasks in separate threads.
Most of the answers here make the conclusion multicore => multithreading look inevitable. However, there is another way of utilizing multiple processors - multi-processing. On Linux especially, where, AFAIK, threads are implemented as just processes perhaps with some restrictions, and processes are cheap as opposed to Windows, there are good reasons to avoid multithreading. So, there are software architecture issues here that should not be neglected.
Of course, if the concurrent lines of execution (either threads or processes) need to operate on the common data, threads have an advantage. But this is also the main reason for headache with threads. Can such program be designed such that the pieces are as much autonomous and independent as possible, so we can use processes? Again, a software architecture issue.
I'd speculate that multi-threading today is what memory management was in the days of C:
it's quite hard to do it right, and quite easy to mess up.
thread-safety bugs, same as memory leaks, are nasty and hard to find
Finally, you may find this article interesting (follow this first link on the page). I admit that I've read only the abstract, though.

Does multithreading make sense for IO-bound operations?

When performing many disk operations, does multithreading help, hinder, or make no difference?
For example, when copying many files from one folder to another.
Clarification: I understand that when other operations are performed, concurrency will obviously make a difference. If the task was to open an image file, convert to another format, and then save, disk operations can be performed concurrently with the image manipulation. My question is when the only operations performed are disk operations, whether concurrently queuing and responding to disk operations is better.
Most of the answers so far have had to do with the OS scheduler. However, there is a more important factor that I think would lead to your answer. Are you writing to a single physical disk, or multiple physical disks?
Even if you parallelize with multiple threads...IO to a single physical disk is intrinsically a serialized operation. Each thread would have to block, waiting for its chance to get access to the disk. In this case, multiple threads are probably useless...and may even lead to contention problems.
However, if you are writing multiple streams to multiple physical disks, processing them concurrently should give you a boost in performance. This is particularly true with managed disks, like RAID arrays, SAN devices, etc.
I don't think the issue has much to do with the OS scheduler as it has more to do with the physical aspects of the disk(s) your writing to.
That depends on your definition of "I/O bound" but generally multithreading has two effects:
Use multiple CPUs concurrently (which won't necessarily help if the bottleneck is the disk rather than the CPU[s])
Use a CPU (with a another thread) even while one thread is blocked (e.g. waiting for I/O completion)
I'm not sure that Konrad's answer is always right, however: as a counter-example, if "I/O bound" just means "one thread spends most of its time waiting for I/O completion instead of using the CPU", but does not mean that "we've hit the system I/O bandwidth limit", then IMO having multiple threads (or asynchronous I/O) might improve performance (by enabling more than one concurrent I/O operation).
I would think it depends on a number of factors, like the kind of application you are running, the number of concurrent users, etc.
I am currently working on a project that has a high degree of linear (reading files from start to finish) operations. We use a NAS for storage, and were concerned about what happens if we run multiple threads. Our initial thought was that it would slow us down because it would increase head seeks. So we ran some tests and found out that the ideal number of threads is the same as the number of cores in the computer.
But your mileage may vary.
It can do, simply because whenever there is more work for a thread to do (identifying the next file to copy) the OS wakes it up, so threads are a simple way to hook into the OS scheduler and yet still write code in a traditional sequential way, instead of having to break it up into a state machine with callbacks.
This is mainly an assistance with clear programming rather than performance.
In most cases, using multi-thread for disk IO will not benefit efficiency. Let's imagine 2 circumstances:
Lock-Free File: We can split the file for each thread by giving them different IO offset. For instance, a 1024B bytes file is split into n pieces and each thread writes the 1024/n respectively. This will cause a lot of verbose disk head movement because of the different offset.
Lock File: Actually lock the IO operation for each critical section. This will cause a lot of verbose thread switches and it turns out that only one thread can write the file simultaneously.
Correct me if I' wrong.
No, it makes no sense. At some point, the operations have to be serialized (by the OS). On the other hand, since modern OS's have to cope with multiple processes anyway I doubt that there's an added overhead.
I'd think it would hinder the operations... You only have one controller and one drive.
You could use a second thread to do the operation, and a main thread that shows an updated UI.
I think it could worsen the performance, because the multiple threads will compete for the same resources.
You can test the impact of doing concurrent IO operations on the same device by copying a set of files from one place to another and measuring the time, then split the set in two parts and make the copies in parallel... the second option will be sensibly slower.

Resources