Multiple Backup Jobs Simultaneously: Theory vs Practice - linux

While designing a fairly simple backup system for Linux in Python, I find myself asking: could there be any time advantage to backing up several datasets/archives simultaneously?
My intuition tells me that writing to several archives simultaneously would not buy me much time as I/O would already be the greatest bottleneck.
On the other hand, if using something like bz2, would there be an advantage to multi-threading, since the higher CPU demand would decrease the I/O demand? Or is it a wash, since all threads would be doing essentially the same thing and therefore sharing the same bottlenecks?

It depends on your system. If you have multiple disks, it could be very worthwhile to parallelize your backup job. If you have multiple processors, compressing multiple jobs in parallel may be worth your while.
If the processor is slow enough (and the disks are fast enough) that zipping makes your CPU a bottleneck, you'll make some gains on multicore or hyperthreaded processors. The reduced I/O demand from zipped data being written is almost certainly a win if your CPU can keep up with the read speed of your drive(s).
Anyway, this is all very system-dependent. Try it and see: run two jobs at once, then run the same two in serial, and see which took longer. The cheap (coding-wise) way is to just run your backup script twice with different input and output parameters, as in the sketch below. Once you've established a winner, you can go farther down the path.
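A cheap way to run that experiment from Python itself, a minimal sketch assuming your datasets are plain directories (the paths and job list below are invented):

    import shutil
    import time
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical jobs: (source directory, archive name without extension).
    JOBS = [("/srv/data1", "/backups/data1"), ("/srv/data2", "/backups/data2")]

    def backup(src, dest):
        # "bztar" compresses with bz2, as in the question.
        shutil.make_archive(dest, "bztar", src)

    start = time.perf_counter()
    for src, dest in JOBS:                          # serial
        backup(src, dest)
    print("serial:  ", time.perf_counter() - start)

    start = time.perf_counter()
    with ThreadPoolExecutor() as pool:              # simultaneous
        list(pool.map(lambda job: backup(*job), JOBS))
    print("parallel:", time.perf_counter() - start)

Note that CPython's bz2 module releases the GIL while compressing, so threads can genuinely overlap CPU work here; for pure-Python work you would want processes instead.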

Related

About multithreading download disadvantages

I have a question about multithreaded downloading. As you know, downloading with several threads can improve an application's performance, but there are some factors to respect, like the number of threads, the available bandwidth, and more. What I don't really understand is why the application's performance might be degraded by using many threads, and how the bandwidth or the quality of the server can affect the performance of a multithreaded application. What are the cases in which a single-threaded download is faster than a multithreaded one?
Thanks for your replies.
I assume you're referring to download managers.
First, I'm sceptical of how much "performance" benefit a download manager really provides. But more importantly, any benefit they do provide is not due to multi-threading. The performance constraint of a download is the bandwidth of the connection. And this is why I'm sceptical of the benefits:
A 1 Mbps connection will download at 1 Mbps.
Splitting the file into 4 segments means you download each segment at 256 Kbps and 4 * 256 Kbps = 1 Mbps.
You may get some improvement if a server throttles each download segment.
You may get a small benefit if one of the segments gets timed out: the others downloading mean your connection doesn't sit idle during the time-out wait.
You might also speed up a download by 'drowning out' anything else trying to use the connection. (Not that I'd really call this a benefit though.)
The real benefit of a download manager is in automatically restarting downloads efficiently (i.e. not re-starting from scratch if possible).
So what is the point of multi-threading?
Let's first dispel a myth: multi-threading does not speed anything up. If a routine requires X clock cycles to run, it will take X clock cycles whether it runs on 1 thread or many.
What multi-threading does do: it allows tasks to run concurrently (at the same time).
The ability to do different things at the same time means:
A slow task (combining various segments of a large download) can be done on a different thread without interfering with other threads that need to react quickly (such as the user interface).
Concurrent tasks can also use available resources (multiple CPUs) more efficiently. Note (in answer to the last part of your question): if you only have one CPU, then your threads are "time-sliced" by the operating system, so they're not truly concurrent. But the time slices are very small, so the previous benefit still applies.
When is single-threaded faster than multi-threaded?
Well, pretty much always in cases where the CPU is not the bottleneck. In the case of a download, as mentioned before, the bottleneck is the bandwidth between the two end-points of the connection. Using many threads actually means you have to do more work (managing and coordinating the different threads).
The most efficient approach for a download is 2 threads: one for the UI, and one for the download, so that any pauses/delays don't stall the user interface.
However, more generally, even when you have CPU-intensive work that could theoretically benefit from multiple threads doing different work concurrently, it's very easy to make implementation mistakes that actually slow down your application.
Ideally your multiple tasks should not share data, because if they do, you risk race conditions or other concurrency bugs.
When they do have to share data, you need to synchronise the work in some way to avoid the bugs mentioned above. (There are many techniques to choose from depending on your needs, and I won't go into detail here.)
However, if your synchronisation is poorly planned, you risk introducing a number of problems that can significantly slow down your application (see the sketch after this list). These include:
Bottle-necking on a shared resource, so that your threads cannot actually run concurrently anyway.
High lock contention, where tasks spend more time waiting than working.
Even deadlocking, which can totally block some tasks.
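To make the race-condition and lock-contention points concrete, here is a minimal Python sketch (thread and iteration counts are arbitrary):

    import threading

    counter = 0
    lock = threading.Lock()

    def unsafe_increment(n):
        global counter
        for _ in range(n):
            counter += 1        # read-modify-write: a data race without a lock

    def safe_increment(n):
        global counter
        for _ in range(n):
            with lock:          # correct, but threads now contend for the lock
                counter += 1

    threads = [threading.Thread(target=safe_increment, args=(100_000,))
               for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # reliably 400000; with unsafe_increment it can come up short

With the lock, the four threads are effectively serialised through the shared counter, which is exactly the bottle-necking problem described above.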
First of all, what a multi-threaded download does is create multiple threads, each downloading the file from a different starting position, to try to utilize the maximum capacity of your internet connection. (This downloads faster in the multi-core case, as explained below.)
We might feel that threads run in parallel, but actually they run turn by turn. For example, thread t1 runs for 0.25 sec, then thread t2 runs for 0.241 sec, then t1 again, then t2, and so on. In other words, they share CPU bursts.
So why do multiple threads improve performance?
Answer: If you have a multi-core processor, the threads can be scheduled to run in parallel on the different cores, which improves performance. (This is how download accelerators like IDMan do the magic!)
Why do multiple threads degrade performance?
Answer: If you have a single-core processor, all threads run turn by turn within a single process, sharing CPU bursts. In this case a single-threaded download is faster than a multithreaded one, because with multiple threads some time is wasted switching from one thread to another. (Download accelerators like IDMan will not really download faster in this case, despite downloading with multiple threads.)
I hope this helps! :-)

Is there a point to multithreading?

I don’t want to make this subjective...
If I/O and other such bottlenecks are not a concern, do we even need to write multithreaded code? Theoretically, single-threaded code will fare better, since it gets all the CPU cycles. Right?
Would JavaScript or ActionScript have fared any better, had they been multithreaded?
I am just trying to understand the real need for multithreading.
I don't know if you have paid any attention to hardware trends lately (the last 5 years), but we are heading into a multicore world.
A general wake-up call was this "The free lunch is over" article.
On a dual-core PC, a single-threaded app will only get half the CPU cycles. And CPUs are not getting faster anymore; that part of Moore's law has died.
In the words of Herb Sutter, the free lunch is over: the future performance path for computing lies in more cores, not higher clock speeds. The thing is that adding more cores typically does not scale the performance of software that is not multithreaded, and even then it depends entirely on the correct use of multithreaded programming techniques. Hence multithreading is a big deal.
Another obvious reason is maintaining a responsive GUI, when e.g. a click of a button initiates substantial computations, or I/O operations that may take a while, as you point out yourself.
The primary reason I use multithreading these days is to keep the UI responsive while the program does something time-consuming. Sure, it's not high-tech, but it keeps the users happy :-)
Most CPUs these days are multi-core. Put simply, that means they have several processors on the same chip.
If you only have a single thread, you can only use one of the cores; the other cores will either idle or be used for other tasks that are running. If you have multiple threads, each can run on its own core. You can divide your problem into X parts and, assuming each part can run independently, finish the calculations in close to 1/Xth of the time it would normally take.
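A minimal Python illustration of that divide-into-X-parts idea, summing squares as a stand-in workload (the numbers are arbitrary), using processes so each part can occupy its own core:

    from multiprocessing import Pool

    def part_sum(bounds):
        lo, hi = bounds
        return sum(i * i for i in range(lo, hi))

    if __name__ == "__main__":
        N, X = 10_000_000, 4                  # problem size, number of parts
        step = N // X
        parts = [(i * step, (i + 1) * step) for i in range(X)]
        with Pool(X) as pool:                 # one worker per part
            total = sum(pool.map(part_sum, parts))
        print(total)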
By definition, the fastest algorithm running in parallel will spend at least as much CPU time as the fastest sequential algorithm - that is, parallelizing does not decrease the amount of work required - but the work is distributed across several independent units, leading to a decrease in the real-time spent solving the problem. That means the user doesn't have to wait as long for the answer, and they can move on quicker.
10 years ago, when multi-core was unheard of, it was true: you'd gain nothing if you disregard I/O delays, because there was only one unit to do the execution. However, the race to increase clock speeds has stopped, and we're instead looking at multi-core to increase the amount of computing power available. With companies like Intel looking at 80-core CPUs, it becomes more and more important to look at parallelization to reduce the time spent solving a problem; if you only have a single thread, you can only use that one core, and the other 79 cores will be doing something else instead of helping you finish sooner.
Much of the multithreading out there is done just to make the programming model easier when doing blocking operations while maintaining concurrency in the program; sometimes languages/libraries/APIs give you little other choice, or the alternatives make the programming model too hard and error-prone.
Other than that, the main benefit of multithreading is to take advantage of multiple CPUs/cores; one thread can only run on one processor/core at a time.
No. You can't continue to gain new CPU cycles, because they exist on a different core, and the core your single-threaded app runs on is not going to get any faster. A multi-threaded app, on the other hand, will benefit from another core. Well-written parallel code can run up to about 95% faster on a dual core, which covers all the new CPUs of the last five years. Double that again for a quad core. So while your single-threaded app isn't getting any more cycles than it did five years ago, my quad-threaded app has four times as many and is vastly outstripping yours in terms of response time and performance.
Your question would be valid had we only had single cores. The thing is, though, we mostly have multi-core CPUs these days. If you have a quad-core and write a single-threaded program, three of the cores will go unused by your program.
So actually you will get at most 25% of the CPU cycles, not 100%. Since the trend today is to add more cores rather than more clock speed, threading will become more and more crucial for performance.
That's kind of like asking whether a screwdriver is necessary if I only need to drive this nail. Multithreading is another tool in your toolbox to be used in situations that can benefit from it. It isn't necessarily appropriate in every programming situation.
Here are some answers:
You write "If input/output related problems are not bottlenecks...". That's a big "if". Many programs do have issues like that (remember that networking counts as "IO"), and in those cases multithreading is clearly worthwhile. If you are writing one of those rare apps that does no IO and no communication, then multithreading might not be an issue.
"The single threaded code will get all the CPU cycles". Not necessarily. A multi-threaded app might well get more cycles than a single-threaded one. These days an app is hardly ever the only one running on a system.
Multithreading allows you to take advantage of multicore systems, which are becoming almost universal these days.
Multithreading allows you to keep a GUI responsive while some action is taking place. Even if you don't want two user-initiated actions to be taking place simultaneously you might want the GUI to be able to repaint and respond to other events while a calculation is taking place.
So in short, yes there are applications that don't need multithreading, but they are fairly rare and becoming rarer.
First, modern processors have multiple cores, so a single thread will never get all the CPU cycles.
On a dual-core system, a single thread will utilize only half the CPU. On an 8-core CPU, it'll use only 1/8th.
So from a plain performance point of view, you need multiple threads to utilize the CPU.
Beyond that, some tasks are also easier to express using multithreading.
Some tasks are conceptually independent, and so it is more natural to code them as separate threads running in parallel, than to write a singlethreaded application which interleaves the two tasks and switches between them as necessary.
For example, you typically want the GUI of your application to stay responsive, even if pressing a button starts some CPU-heavy work process that might go for several minutes. In that time, you still want the GUI to work. The natural way to express this is to put the two tasks in separate threads.
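The same idea in a minimal Python sketch; there is no real GUI toolkit here, just a loop standing in for an event loop that must stay responsive:

    import threading
    import time

    def long_computation():
        time.sleep(5)                 # stand-in for minutes of CPU-heavy work
        print("computation finished")

    worker = threading.Thread(target=long_computation, daemon=True)
    worker.start()
    while worker.is_alive():
        # In a real app this is where events are processed and widgets repaint.
        print("UI still responsive...")
        time.sleep(1)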
Most of the answers here make the conclusion multicore => multithreading look inevitable. However, there is another way of utilizing multiple processors: multi-processing. On Linux especially, where (AFAIK) threads are implemented as just processes, perhaps with some restrictions, and where processes are cheap compared to Windows, there are good reasons to avoid multithreading. So there are software architecture issues here that should not be neglected.
Of course, if the concurrent lines of execution (either threads or processes) need to operate on common data, threads have an advantage. But this is also the main reason for headaches with threads. Can such a program be designed so that the pieces are as autonomous and independent as possible, letting us use processes? Again, a software architecture issue.
I'd speculate that multi-threading today is what memory management was in the days of C:
it's quite hard to do right, and quite easy to mess up.
thread-safety bugs, like memory leaks, are nasty and hard to find
Finally, you may find this article interesting (follow this first link on the page). I admit that I've read only the abstract, though.

Programming for Multi-core Processors

As far as I know, the multi-core architecture of a processor does not affect the program; the actual instruction execution is handled in a lower layer.
My question is:
Given a multicore environment, can I use any programming practices to utilize the available resources more effectively? How should I change my code to gain more performance on multicore systems?
That is correct. Your program will not run any faster (except for the fact that the core is handling fewer other processes, because some of the processes are being run on the other core) unless you employ concurrency. If you do use concurrency, though, more cores improves the actual parallelism (with fewer cores, the concurrency is interleaved, whereas with more cores, you can get true parallelism between threads).
Making programs efficiently concurrent is no simple task. If done poorly, making your program concurrent can actually make it slower! For example, if you spend lots of time spawning threads (thread construction is really slow), and do work on a very small chunk size (so that the overhead of thread construction dominates the actual work), or if you frequently synchronize your data (which not only forces operations to run serially, but also has a very high overhead on top of it), or if you frequently write to data in the same cache line between multiple threads (which can lead to the entire cache line being invalidated on one of the cores), then you can seriously harm the performance with concurrent programming.
It is also important to note that if you have N cores, that DOES NOT mean that you will get a speedup of N. That is the theoretical limit to the speedup. In fact, maybe with two cores it is twice as fast, but with four cores it might be about three times as fast, and then with eight cores it is about three and a half times as fast, etc. How well your program is actually able to take advantage of these cores is called the parallel scalability. Often communication and synchronization overhead prevent a linear speedup, although, in the ideal, if you can avoid communication and synchronization as much as possible, you can hopefully get close to linear.
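That scalability ceiling is usually described by Amdahl's law: if a fraction p of the program parallelizes perfectly, the best possible speedup on N cores is 1 / ((1 - p) + p / N). A quick Python illustration:

    def amdahl(p, n):
        return 1 / ((1 - p) + p / n)

    for n in (2, 4, 8):
        print(n, "cores:", round(amdahl(0.9, n), 2))
    # 2 cores: 1.82, 4 cores: 3.08, 8 cores: 4.71 -- even code that is
    # 90% parallel falls well short of a linear speedup.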
It would not be possible to give a complete answer on how to write efficient parallel programs on StackOverflow. This is really the subject of at least one (probably several) computer science courses. I suggest that you sign up for such a course or buy a book; I'd recommend one if I knew of a good one, but the parallel algorithms course I took did not have a textbook. You might also be interested in writing a handful of programs using a serial implementation, a parallel implementation with multithreading (regular threads, thread pools, etc.), and a parallel implementation with message passing (such as with Hadoop, Apache Spark, Cloud Dataflow, asynchronous RPCs, etc.), and then measuring their performance, varying the number of cores in the case of the parallel implementations. This was the bulk of the course work for my parallel algorithms course and can be quite insightful. Some computations you might try parallelizing include computing Pi using the Monte Carlo method (trivially parallelizable, assuming you can create a random number generator whose streams are independent across threads), performing matrix multiplication, computing the row echelon form of a matrix, summing the squares of the numbers 1...N for some very large N, and I'm sure you can think of others.
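For instance, the Monte Carlo Pi exercise mentioned above might look like this in Python (the worker and sample counts are arbitrary):

    import random
    from multiprocessing import Pool

    def count_hits(samples):
        # Each worker builds its own Random instance, seeded independently
        # from OS entropy, so the streams are independent in practice.
        rng = random.Random()
        hits = 0
        for _ in range(samples):
            x, y = rng.random(), rng.random()
            if x * x + y * y <= 1.0:
                hits += 1
        return hits

    if __name__ == "__main__":
        workers, per_worker = 4, 1_000_000
        with Pool(workers) as pool:
            hits = sum(pool.map(count_hits, [per_worker] * workers))
        print(4 * hits / (workers * per_worker))   # approximates pi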
I don't know if it's the best possible place to start, but I subscribed to the article feed from Intel Software Network some time ago and have found a lot of interesting things there, presented in a pretty simple way. You can find some very basic articles on fundamental concepts of parallel computing, like this. Here you have a quick dive into OpenMP, one possible approach to start parallelizing the slowest parts of your application without changing the rest. (If those parts present parallelism, of course.) Also check the Intel Guide for Developing Multithreaded Applications. Or just go and browse the article section; the articles are not too many, so you can quickly figure out what suits you best. They also have a forum and a weekly webcast called Parallel Programming Talk.
Yes, simply adding more cores to a system without altering the software would yield no results (except that the operating system would be able to schedule multiple concurrent processes on separate cores).
To have your operating system utilise your multiple cores, you need to do one of two things: increase the thread count per process, or increase the number of processes running at the same time (or both!).
Utilising the cores effectively, however, is a beast of a different colour. If you spend too much time synchronising shared data access between threads/processes, your level of concurrency will take a hit as threads wait on each other. This also assumes that you have a problem/computation that can relatively easily be parallelised, since the parallel version of an algorithm is often much more complex than the sequential version thereof.
That said, especially for CPU-bound computations with work units that are independent of each other, you'll most likely see a linear speed-up as you throw more threads at the problem. As you add serial segments and synchronisation blocks, this speed-up will tend to decrease.
I/O heavy computations would typically fare the worst in a multi-threaded environment, since access to the physical storage (especially if it's on the same controller, or the same media) is also serial, in which case threading becomes more useful in the sense that it frees up your other threads to continue with user interaction or CPU-based operations.
You might consider using programming languages designed for concurrent programming. Erlang and Go come to mind.

Multithreading in .NET 4.0 and performance

I've been toying around with the Parallel library in .NET 4.0. Recently, I developed a custom ORM for some unusual read/write operations one of our large systems has to use. This allows me to decorate an object with attributes and have reflection figure out what columns it has to pull from the database, as well as what XML it has to output on writes.
Since I envision this wrapper to be reused in many projects, I'd like to squeeze as much speed out of it as possible. This library will mostly be used in .NET web applications. I'm testing the framework using a throwaway console application to poke at the classes I've created.
I've now learned a lesson about the overhead that comes with multithreading: it causes my code to run slower. From reading around, this seems intuitive to people who've been doing it for a long time, but it's actually counter-intuitive to me: how can running a method 30 times at the same time be slower than running it 30 times sequentially?
I don't think I'm causing problems by having multiple threads fight over the same shared object (though I'm not yet good enough at this to tell for sure), so I assume the slowdown comes from the overhead of spawning all those threads and the runtime keeping them all straight. So:
Though I'm doing it mainly as a learning exercise, is this pessimization? For trivial, non-IO tasks, is multithreading overkill? My main goal is speed, not responsiveness of the UI or anything.
Would running the same multithreading code in IIS cause it to speed up because of already-created threads in the thread pool, whereas right now I'm using a console app, which I assume would be single-threaded until I told it otherwise? I'm about to run some tests, but I figure there's some base knowledge I'm missing to know why it would be one way or the other. My console app is also running on my desktop with two cores, whereas a server for a web app would have more, so I might have to use that as a variable as well.
Threads don't actually all run concurrently.
On a desktop machine, I presume you have a dual-core CPU (maybe a quad at most). This means only 2 (or 4) threads can be running at the same time.
If you have spawned 30 threads, the OS is going to have to context switch between those 30 threads to keep them all running. Context switches are quite costly, so hence the slowdown.
As a basic suggestion, I'd aim for 1 thread per CPU if you are trying to optimise calculations. Any more than this and you're not really doing any extra work; you're just swapping threads in and out on the same CPU. Try to think of your computer as having a limited number of workers inside: you can't do more work concurrently than the number of workers you have available.
Some of the new features in the .NET 4.0 Task Parallel Library allow you to account for scalability in the number of threads. For example, you can create a bunch of tasks, and the library will internally figure out how many CPUs you have available and optimise the number of threads it creates/uses so as not to overload them. You could create 30 tasks, but on a dual-core machine the library would still only create 2 threads and queue the remaining tasks. Obviously, this will scale very nicely when you get to run it on a bigger machine. Or you can use something like ThreadPool.QueueUserWorkItem(...) to queue up a bunch of tasks, and the pool will automatically manage how many threads it uses to perform those tasks.
Yes, there is a lot of overhead to thread creation, but if you are using the .NET thread pool (or the Task Parallel Library in 4.0), .NET will be managing your thread creation, and you may actually find it creates fewer threads than the number of tasks you have created. It will internally swap your tasks around on the available threads. If you want to control the explicit creation of actual threads, you would need to use the Thread class.
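The same pool idea, sketched here in Python with concurrent.futures (the task body is a placeholder): the pool is sized to the machine rather than to the number of tasks, so 30 tasks on a dual-core box still only get a couple of worker threads, with the rest queued.

    import os
    from concurrent.futures import ThreadPoolExecutor

    def task(i):
        return i * i                    # stand-in for real work

    with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
        results = list(pool.map(task, range(30)))
    print(results[:5])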
[Some CPUs can do clever stuff with threads and can run multiple threads per core - see hyperthreading - but check your task manager; I'd be very surprised if you have more than 4-8 virtual CPUs on today's desktops]
There are so many issues with this that it pays to understand what is happening under the covers. I would highly recommend the "Concurrent Programming on Windows" book by Joe Duffy and the "Java Concurrency in Practice" book. The latter talks about processor architecture at the level you need to understand it when writing multithreaded code. One issue you are going to hit that's going to hurt your code is caching, or more likely the lack of it.
As has been stated there is an overhead to scheduling and running threads, but you may find that there is a larger overhead when you share data across threads. That data may be flushed from the processor cache into main memory, and that will cause serious slow downs to your code.
This is the sort of low-level stuff that managed environments are supposed to protect us from, however, when writing highly parallel code, this is exactly the sort of issue you have to deal with.
A colleague of mine recorded a screencast about the performance issues with Parallel.For and Parallel.ForEach, which may help:
http://rocksolidknowledge.com/ScreenCasts.mvc/Watch?video=ParallelLoops.wmv
You're speaking of an ORM, so I presume some amount of I/O is going on. If this is the case, the overhead of thread creation and context switching is going to be comparatively non-existent.
Most likely, you're experiencing I/O contention: it can be slower (particularly on rotational hard drives, but also on other storage devices) to read the same set of data if you read it out of order than if you read it in-order. So, if you're executing 30 database queries, it's possible they'll run faster sequentially than in parallel if they're all backed by the same I/O device and the queries aren't in cache. Running them in parallel may cause the system to have a bunch of I/O read requests almost simultaneously, which may cause the OS to read little bits of each in turn - causing your drive head to jump back and forth, wasting precious milliseconds.
But that's just a guess; it's not possible to really determine what's causing your slowdown without knowing more.
Although thread creation is "extremely expensive" when compared to say adding two numbers, it's not usually something you'll easily overdo. If your operations are extremely short (say, a millisecond or less), using a thread-pool rather than new threads will noticeably save time. Generally though, if your operations are that short, you should reconsider the granularity of parallelism anyhow; perhaps you're better off splitting the computation into bigger chunks: for instance, by having a fairly low number of worker tasks which handle entire batches of smaller work-items at a time rather than each item separately.
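A Python sketch of that batching advice (the sizes are invented): chunksize hands each worker a batch of items at a time, so the scheduling overhead is paid per batch rather than per item.

    from concurrent.futures import ProcessPoolExecutor

    def tiny_op(x):
        return x + 1                    # far too small to justify one task each

    if __name__ == "__main__":
        items = range(1_000_000)
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(tiny_op, items, chunksize=10_000))
        print(results[-1])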

Simultaneous or Sequential Writes: Does It Matter in Terms of Speed?

Simultaneous or sequential write operations: does it matter in terms of speed?
With a multicore processor, does it make sense to parallelize all the file write operations using multiple threads, just to get a boost in speed? Of course, all those write operations are independent.
Generally, no.
As of now, the physical write to disk IS the bottleneck, by some orders of magnitude, and in most scenarios it is rather sequential. By parallelizing writes, you have a good chance of worsening performance by incurring seeks. Sequential reads and writes will largely outperform interleaving in most cases.
Per-disk parallelization (TCQ and NCQ) mainly work by reducing the seeks that are naturally required when different clients concurrently request data from different sections of the disk. If you can avoid these seeks in the first place, you are better off.
In some scenarios - RAID 1, JBOD, or when different streams of data arrive rather slowly - the right scheduling can improve your throughput, but that requires intimate knowledge of the hardware at hand, and other processes not spoiling your fun.
At best, you can leave that as a decision to the end user (e.g. give them an option to turn it off), and provide performance measures to guide them. (You might even prove me wrong ;))
That depends on the disks and their controller. Do they have TCQ/NCQ? Is it RAID?
If so, that might make some sense. With one regular SATA disk without NCQ, it won't.
Write the simplest code first, and see whether that performs well enough with the target environment. (Different disks, operating system versions, CPUs, drivers etc may well affect the result significantly.)
If the simplest correct code isn't fast enough, then it makes sense to try to work out faster ways of performing IO. At a guess, it might make sense to parallelize the write operations if you're writing to different disks, but possibly not otherwise. That's only a complete guess though.
Purely by coincidence, I'm planning to benchmark a related situation soon. I have a blog post describing the tests I intend to perform, and will update the entry with a link to results when I've got some. It's not quite the same as what you're describing, but close enough to perhaps be of interest.
Technically, you can mmap a file and have multiple threads write to it, but the disk will probably still create a bottleneck.
If you need to maximize I/O throughput, a starting point would be to investigate the asynchronous I/O support in your environment.
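In Python, for example, one possible starting point is to push blocking writes onto asyncio's thread pool and let the event loop overlap them (the file names and sizes here are made up):

    import asyncio

    def write_file(path, data):
        with open(path, "wb") as f:
            f.write(data)

    async def main():
        # asyncio.to_thread (Python 3.9+) runs each blocking write in a
        # worker thread; gather() overlaps them instead of waiting in turn.
        jobs = [asyncio.to_thread(write_file, f"out_{i}.bin", b"x" * 1_000_000)
                for i in range(4)]
        await asyncio.gather(*jobs)

    asyncio.run(main())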
This is a simple question, but the answer can be really, really complicated. Let's try to narrow down the scenario with some assumptions: the OS is Windows, and you have a relatively large number of writes that are truly independent.
You can skip the multi-threading by simply issuing the writes asynchronously.
Issue them all at once and let the OS schedule the writes.
It doesn't matter whether the writes are to the same file or to different files. Note: this is only true if the above assumption about the writes being independent holds.
Worst case, this won't be any slower than a single plain old everyday disk on a parallel ATA controller: it will be slow.
Best case, the OS can schedule the writes very efficiently. This would be true for a storage system with lots of spindles, or for a disk that supports NCQ.
The key thing to remember here is that disk I/O (in general) isn't CPU bound, so going out of your way to use multi-core won't help you; it will just make life complex.
Note, you can help things if you order the writes so they are sequential in a file (overall) or sequential on the disk by sorting them by their extent.
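A small Unix-flavoured Python sketch of that ordering trick (the pending writes are invented): sorting by offset turns scattered writes into one sequential sweep over the file.

    import os

    # Hypothetical pending writes as (offset, data) pairs, in arrival order.
    pending = [(8192, b"c" * 4096), (0, b"a" * 4096), (4096, b"b" * 4096)]

    fd = os.open("target.bin", os.O_WRONLY | os.O_CREAT)
    try:
        for offset, data in sorted(pending):
            os.pwrite(fd, data, offset)     # positioned write, no seek calls
    finally:
        os.close(fd)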
If you are talking about writing to one file, the answer is no. You can't parallelize writing to one file, since every process or thread has to acquire a lock for the file from the OS in order to write.
Otherwise it depends on the hardware controllers, the type of storage, the OS kernel, and the filesystem implementation.
