I have a function that creates some results for a list of tasks. I would like to save the results on the fly, to 1) release memory compared to appending to a results_list, and 2) keep the results of the first part in case of errors.
Here is a very short sample code:
for task in task_list:
    result = do_awesome_stuff_to_task(task)
    save_nice_results_to_db(result)  # Send this job to another process and let the main process continue
Is there a way for the main process to create results for each task in task_list and, each time a result is created, send it to another process/thread to save it, so the main loop can continue without waiting for the slow saving?
I have looked at multiprocessing, but that seems mostly suited to speeding up the loop over task_list rather than allowing a secondary subprocess to do other parts of the work. I have also looked into asyncio, but that seems mostly used for I/O.
All in all, I am looking for a way to have a main process looping over the task_list. For each task finished, I would like to send the results to another subprocess to save them. Note that do_awesome_stuff_to_task is much faster than the saving process, so the main loop will have worked through multiple tasks before the first result is saved. I have thought of two ways of tackling this:
Use multiple subprocesses to save
Save every xx iterations - the saving scales okay, so perhaps the save process can save xx iterations at a time while the main loop continues?
Is this possible to do with Python? Where to look and what key considerations to take?
All help is appreciated.
It's hard to know what will be faster in your case without testing, but here are some thoughts on how to choose what to do.
If save_nice_results_to_db is slow because it's writing data to disk or network, make sure you aren't already at the maximum write speed of your hardware. Depending on the server at the other end, network traffic can sometimes benefit greatly from opening multiple ports at once to read/write, so long as you stay within your total network transfer speed (of the network interface as well as your ISP). SSDs can see some limited benefit from initiating multiple reads/writes at once, but too many will hurt performance. HDDs are almost universally slower when trying to do more than one thing at once. Everything is more efficient reading/writing larger chunks at a time.
multiprocessing must typically transfer data between the parent and child processes using pickle because they don't share memory. This has a high overhead, so if result is a large object, you may waste more time on the added overhead of sending the data to a child process than you could save by any sort of concurrency (emphasis on the may; always test for yourself). As of Python 3.8 the shared_memory module was added, which may be somewhat more efficient, but it is much less flexible and less easy to use.
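If you do go the multiprocessing route, a minimal sketch (using only the names from your question, untested against your actual workload) is to push each result onto a multiprocessing.Queue consumed by a single saver process - keeping in mind that every result pushed onto the queue pays the pickling cost described above:

import multiprocessing as mp

def saver(result_queue):
    # Runs in a child process: pull results until the sentinel arrives.
    while True:
        result = result_queue.get()
        if result is None:
            break
        save_nice_results_to_db(result)

if __name__ == "__main__":
    result_queue = mp.Queue()
    proc = mp.Process(target=saver, args=(result_queue,))
    proc.start()
    for task in task_list:
        result = do_awesome_stuff_to_task(task)
        result_queue.put(result)   # result is pickled and sent to the child
    result_queue.put(None)         # sentinel: tell the saver to stop
    proc.join()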
threading benefits from all threads sharing memory, so there is zero transfer overhead to "send" data between threads. Python threads however cannot execute bytecode concurrently due to the GIL (global interpreter lock), so multiple CPU cores cannot be leveraged to increase computation speed. This is because Python itself has many parts which are not thread-safe. Specific functions written in C may release this lock to get around the issue and leverage multiple CPU cores using threads, but once execution returns to the Python interpreter, that lock is held again. Typically, functions involving network access or file IO can release the GIL, as the interpreter is waiting on an operating system call which is usually thread-safe. Other popular libraries like NumPy also make an effort to release the GIL while doing complex math operations on large arrays. You can only release the GIL from C/C++ code, however, not from Python itself.
asyncio should get a special mention here, as it's designed specifically with concurrent network/file operations in mind. It uses coroutines instead of threads (even lower overhead than threads, which themselves are much lower overhead than processes) to queue up a bunch of operations, then uses an operating system call to wait on any of them to finish (event loop). Using this would also require your do_awesome_stuff_to_task to happen in a coroutine for it to happen at the same time as save_nice_results_to_db.
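As a rough asyncio-flavoured sketch (again only using the names from your question): one option is to keep the computation synchronous in the main coroutine and push each save into the default thread-pool executor, which lets the saves overlap with later computations without rewriting save_nice_results_to_db as a coroutine:

import asyncio

async def main(task_list):
    loop = asyncio.get_running_loop()
    pending = []
    for task in task_list:
        result = do_awesome_stuff_to_task(task)   # computation stays in the main thread
        # run_in_executor submits the save to a thread pool immediately,
        # so it runs while the loop moves on to the next task
        pending.append(loop.run_in_executor(None, save_nice_results_to_db, result))
    await asyncio.gather(*pending)                # wait for any saves still in flight

asyncio.run(main(task_list))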
A trivial example of firing each result off to a thread to be processed:
import threading

for task in task_list:
    result = do_awesome_stuff_to_task(task)
    threading.Thread(target=save_nice_results_to_db, args=(result,)).start()  # hand the save off to another thread and let the main loop continue
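If spawning one thread per result turns out to be too heavy (your saves are slow, so threads could pile up), a common variant is a single long-lived saver thread fed through a queue.Queue; this sketch sticks to the names from the question and also fits the "save xx results at a time" idea, since the saver could drain the queue in batches:

import queue
import threading

result_queue = queue.Queue()

def saver():
    # Dedicated saver thread: keeps writing until it sees the sentinel.
    while True:
        result = result_queue.get()
        if result is None:
            break
        save_nice_results_to_db(result)

saver_thread = threading.Thread(target=saver)
saver_thread.start()

for task in task_list:
    result = do_awesome_stuff_to_task(task)
    result_queue.put(result)   # hand off and keep looping immediately

result_queue.put(None)         # sentinel: no more results
saver_thread.join()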
Related
I am modelling and solving a nonlinear program (NLP) using single-threaded CPLEX with AMPL (I am explicitly constraining CPLEX to use only one thread) on CentOS 7. I am using a processor with 6 independent cores (Intel i7-8700) to solve 6 independent test instances.
When I run these tests sequentially, it is much faster (about 63%) in terms of elapsed time than when I run the 6 instances concurrently. They are executed in independent processes, reading distinct data files and writing results to distinct output files. I have also tried solving these tests sequentially with multiple threads, and I got times similar to the single-threaded sequential runs.
I have checked the behaviour of these processes using top/htop; they run on different cores. So my question is: how can running these tests concurrently have such an impact on elapsed time if they are individual single-threaded processes solving on different cores?
Any thoughts would be appreciated.
It's very easy to make many threads perform worse than a single thread. The key to successful multi-threading and speedup is to understand not just the fact that the program is multi-threaded, but to know exactly how your threads interact. Here are a few questions you should ask yourself as you review your code:
1) Do the individual threads share resources? If so, what are those resources, and does accessing them block other threads?
2) What's the slowest resource your multi-threaded code relies on? A common bottleneck (and oft neglected) is disk IO. Multiple threads can process data much faster but they won't make a disk read faster and in many cases multithreading can make it much worse (e.g. thrashing).
3) Is access to common resources properly synchronized?
To this end, and without knowing more about your problem, I'd recommend:
a) Not reading different files from different threads. You want to keep your disk IO as sequential as possible, and this is easier from a single thread. Maybe batch-read files from a single thread and then farm them out for processing (see the sketch after this list).
b) Keep your threads as autonomous as possible - any communication back and forth will cause thread contention and slow things down.
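A minimal sketch of idea a), in Python for concreteness (read_one, process_one, and the file names are hypothetical stand-ins for your own I/O and compute steps): all disk reads happen sequentially in the main thread, and only already-loaded data is farmed out to the pool.

from concurrent.futures import ThreadPoolExecutor

def read_one(path):
    # Hypothetical: sequential disk read, done only from the main thread.
    with open(path, "rb") as f:
        return f.read()

def process_one(data):
    # Hypothetical: work on data that is already in memory.
    return len(data)

paths = ["a.dat", "b.dat", "c.dat"]   # placeholder file names

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = []
    for path in paths:
        data = read_one(path)                          # disk IO stays sequential
        futures.append(pool.submit(process_one, data)) # processing is farmed out
    results = [f.result() for f in futures]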
I am using Python 2.7 with multi-threading. Currently, if a thread dies, I create a new one to compensate for it. Should I instead create a lot of threads beforehand, store them, and draw on them when one or more existing threads die, or should I create a new one only when a thread dies?
Which is more efficient in terms of time?
When you say a thread "dies", do you mean you intentionally terminate it or it fails due to error?
If you're intentionally terminating it and you're worried about the time required to spawn a new thread, why not keep the thread persistent and simply have it do the job that the new thread would have done? This is a pretty standard approach - maintain a pool of "worker" threads and have a work queue with pending items to execute. They all run an identical loop which is to pull an item off the queue and execute it. These items can be objects with methods which contain the code to execute if it's convenient to work that way - if the tasks are all very similar then it might be easier to put the code into the thread's own function instead.
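A rough sketch of that worker-pool pattern (example_job and the thread count are placeholders; the Queue module is named queue in Python 3 and Queue in the 2.7 mentioned in the question):

import threading
try:
    import queue              # Python 3
except ImportError:
    import Queue as queue     # Python 2.7, as in the question

def example_job(i):
    # Hypothetical stand-in for whatever work the original threads were doing.
    return i * i

work_queue = queue.Queue()

def worker():
    # Every worker thread runs the same loop: pull an item, execute it, repeat.
    while True:
        job = work_queue.get()
        if job is None:        # sentinel value used to shut the worker down
            break
        try:
            job()
        except Exception:
            pass               # a failing job should not kill the worker thread

workers = [threading.Thread(target=worker) for _ in range(4)]
for w in workers:
    w.start()

for i in range(20):
    work_queue.put(lambda i=i: example_job(i))   # queue up pending work items

for _ in workers:
    work_queue.put(None)       # one sentinel per worker
for w in workers:
    w.join()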
If you're talking about threads failing due to error, I wouldn't have imagined this was common enough to worry about it. If it is, you probably need to look at making your code more robust.
In either case, spawning a thread on most systems should be a lightweight activity - a lot more lightweight than spawning a whole new process, for example. As a result, I really wouldn't worry about keeping a pool of threads in reserve to use - that really sounds like early optimisation to me.
Even if spawning threads were slow, consider what you would be doing by spawning threads in advance - you would be taking up more memory (some memory in the OS to keep track of the thread, some in Python for the objects it uses to track the thread), although not a great deal; you'd also be spending more time at the start of your program creating all these threads. So you might save a little time while running, but instead your program takes significantly longer to start. That doesn't sound like a sensible trade-off to me unless the speed and latency of your code is absolutely critical while it's running, and if speed is that critical then I'm not sure a pure Python solution is the right approach anyway. Something like C/C++ is going to give you better control of scheduling, at the expense of much more complexity.
In summary: seriously, don't worry about it, just spawn threads as you need them. Trust me, there will be much bigger speed problems elsewhere in your code which are much more deserving of your time.
I have read many answers given here for questions related to thread safety and re-entrancy, but when I think about them, more questions come to mind, hence these questions.
1.) I have one executable program, say some *.exe. If I run this program from a command prompt, and while it is executing I run the same program from another command prompt, under what conditions could the results be corrupted? That is, does the code of this program need to be re-entrant, or is being thread-safe alone enough?
2.) When defining re-entrancy, we say that a routine can be re-entered while it is already running. In what situations can the function be re-entered (apart from a recursive routine; I am not talking about recursive execution here)? There has to be some thread executing the same code again, or how else can the function be entered again?
3.) In a practical case, will two threads execute the same code, i.e. perform the same functionality? I thought the idea of multi-threading was to execute different functionality concurrently (on different cores/processors).
Sorry if these queries seem disparate, but they all occurred to me at the same time, when I read the thread-safe vs. re-entrant post on SO, hence I put them together.
Any pointers, reading material will be appreciated.
thanks,
-AD.
I'll try to explain these, in order:
Each program runs in its own process, and gets its own isolated memory space. You don't have to worry about thread safety in this situation. (However, if the processes are both accessing some other shared resource, such as a file, you may have different issues. For example, process 1 may "lock" the data file, preventing process 2 from being able to open it).
The idea here is that two threads may try to run the same routine at the same time. This is not always valid - it takes special care to define a class or a process in a way that multiple threads can use the same instance of the same class, or the same static function, without errors occurring. This typically requires synchronization in the class.
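As a small, hypothetical Python illustration of what that synchronization looks like: without the lock below, two threads calling increment at the same time can interleave the read and the write of the counter and lose updates.

import threading

class SafeCounter:
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock makes the read-modify-write step atomic with respect
        # to other threads calling increment on the same instance.
        with self._lock:
            self._value += 1

counter = SafeCounter()

def bump_many():
    for _ in range(10000):
        counter.increment()

threads = [threading.Thread(target=bump_many) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter._value)   # 40000, regardless of thread interleaving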
Two threads often execute the same code. There are two different conceptual ways to partition your work when threading. You can either think in terms of tasks - i.e., one thread does task A while another does task B. Alternatively, you can think in terms of decomposing the problem based on data. In this case, you work with a large collection, and each element is processed using the same routine, but the processing happens in parallel. For more info, you can read this blog post I wrote on Decomposition for Parallelism.
Two processes cannot share memory. So thread-safety is moot here.
Re-entrancy means that a method can be safely executed by two threads at the same time. This doesn't require recursion - threads are separate units of execution, and there is nothing keeping them both from attempting to run the same method simultaneously.
The benefits of threading can come in two ways. One is when you perform different types of operations concurrently (like running CPU-intensive code and I/O-intensive code at the same time). The other is when you can divide up a long-running operation among multiple processors. In this latter case, two threads may be executing the same function at the same time on different input data sets.
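A tiny illustration of that latter case (square is just a placeholder for a longer-running function; a process pool is used here because of Python's GIL, but the data-decomposition idea is the same with threads in other languages): every worker runs the same function, each on different elements of the input.

from concurrent.futures import ProcessPoolExecutor

def square(n):
    # Placeholder for a longer-running computation.
    return n * n

if __name__ == "__main__":
    data = range(10)
    with ProcessPoolExecutor() as pool:
        # Every worker executes the same function, each on different elements.
        results = list(pool.map(square, data))
    print(results)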
First of all, I strongly suggest you look at the basics of computer systems, especially how a process/thread executes on the CPU and is scheduled by the operating system. For example: virtual addresses, context switching, process/thread concepts (e.g., each thread has its own stack and register set while the heap is shared by threads; a thread is an execution and scheduling unit, so it maintains its own control flow), and so on. All of these questions come down to understanding how your program actually runs on the CPU.
1) and 2) are already answered.
3) Multithreading is just concurrent execution of arbitrary threads. The same code can be executed by multiple threads. These threads can share some data, and can even create data races, which are very hard to find. Of course, threads often execute separate code (we call that thread-level parallelism).
In this context, I have used "concurrent" with two meanings: (a) on a single processor, multiple threads share one physical processor, but the operating system gives a sort of illusion that the threads are running concurrently; (b) on a multicore machine, two or more threads can physically execute concurrently.
Gaining a concrete understanding of concurrent/parallel execution takes quite a long time, but you already have a solid understanding!
JDK's concurrency package, Boost's thread library, and Perl's thread library (not in Python though) all implement a barrier. I haven't come across a need for using a barrier, so I'm wondering what a typical use case would be in multi-threaded applications.
Barriers can be used all over the place in contrived examples, but you'll often see them in a scatter/reduce pattern where the results of the different threads are all required before proceeding.
If you wanted to parallelize a sort, for instance, you could split the list into n chunks and start n threads to sort their sections and pause; when they're all finished they would die, letting the parent know that it's finally okay to combine the sorted chunks. (I know there are better ways, but it's one implementation.)
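For what it's worth, Python does ship a threading.Barrier (since 3.2), so a bare-bones sketch of that sort example might look like this (the data and thread count are made up): each thread sorts its own chunk, waits at the barrier, and one thread does the merge once everyone has arrived.

import heapq
import threading

data = [5, 3, 8, 1, 9, 2, 7, 4]
n_threads = 2
chunk = len(data) // n_threads
chunks = [data[i * chunk:(i + 1) * chunk] for i in range(n_threads)]

barrier = threading.Barrier(n_threads)

def sort_chunk(i):
    chunks[i].sort()     # each thread sorts only its own slice...
    barrier.wait()       # ...then blocks until every other thread is done too
    if i == 0:           # one thread performs the merge after the barrier
        print(list(heapq.merge(*chunks)))

threads = [threading.Thread(target=sort_chunk, args=(i,)) for i in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()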
The other place I've seen it is in parallelized networking where you have to send a certain amount of data per payload. So the interface will start up n buckets and wait for them all to fill before sending out the transmission. When you think about a partitioned T1 line it sorta makes sense, sending one burst of data over the 64 multiplexed partitions would be better than sending 1 partition's data (which essentially costs the same since the packet has to be padded with 0's.)
Hope those are some things to get you thinking about the problem!
Example: a set of threads work concurrently to compute a result-set and the said result-set (in part/total) is required as input for the next stage of processing to some/all threads at the "barrier".
A barrier makes it easier to synchronize multiple threads without having to craft a solution around multiple conditions & mutexes.
I can't say I have seen barriers often, though. At some point, as the number of threads grows, it might be worthwhile considering a more "decoupled" system so as to manage possible deadlocks.
MSDN: A Barrier is an object that prevents individual tasks in a parallel operation from continuing until all tasks reach the barrier. It is useful when a parallel operation occurs in phases, and each phase requires synchronization between tasks.
Found here
When performing many disk operations, does multithreading help, hinder, or make no difference?
For example, when copying many files from one folder to another.
Clarification: I understand that when other operations are performed, concurrency will obviously make a difference. If the task was to open an image file, convert to another format, and then save, disk operations can be performed concurrently with the image manipulation. My question is when the only operations performed are disk operations, whether concurrently queuing and responding to disk operations is better.
Most of the answers so far have had to do with the OS scheduler. However, there is a more important factor that I think would lead to your answer. Are you writing to a single physical disk, or multiple physical disks?
Even if you parallelize with multiple threads...IO to a single physical disk is intrinsically a serialized operation. Each thread would have to block, waiting for its chance to get access to the disk. In this case, multiple threads are probably useless...and may even lead to contention problems.
However, if you are writing multiple streams to multiple physical disks, processing them concurrently should give you a boost in performance. This is particularly true with managed disks, like RAID arrays, SAN devices, etc.
I don't think the issue has as much to do with the OS scheduler as it does with the physical characteristics of the disk(s) you're writing to.
That depends on your definition of "I/O bound" but generally multithreading has two effects:
Use multiple CPUs concurrently (which won't necessarily help if the bottleneck is the disk rather than the CPU[s])
Use a CPU (with another thread) even while one thread is blocked (e.g. waiting for I/O completion)
I'm not sure that Konrad's answer is always right, however: as a counter-example, if "I/O bound" just means "one thread spends most of its time waiting for I/O completion instead of using the CPU", but does not mean that "we've hit the system I/O bandwidth limit", then IMO having multiple threads (or asynchronous I/O) might improve performance (by enabling more than one concurrent I/O operation).
I would think it depends on a number of factors, like the kind of application you are running, the number of concurrent users, etc.
I am currently working on a project that has a high degree of linear (reading files from start to finish) operations. We use a NAS for storage, and were concerned about what happens if we run multiple threads. Our initial thought was that it would slow us down because it would increase head seeks. So we ran some tests and found out that the ideal number of threads is the same as the number of cores in the computer.
But your mileage may vary.
It can do, simply because whenever there is more work for a thread to do (identifying the next file to copy) the OS wakes it up, so threads are a simple way to hook into the OS scheduler and yet still write code in a traditional sequential way, instead of having to break it up into a state machine with callbacks.
This is mainly an assistance with clear programming rather than performance.
In most cases, using multiple threads for disk IO will not improve efficiency. Let's imagine 2 circumstances (a rough sketch of the first follows this list):
Lock-free file: We can split the file across threads by giving each a different IO offset. For instance, a 1024-byte file is split into n pieces and each thread writes its 1024/n bytes at its own offset. This causes a lot of extra disk head movement because of the different offsets.
Locked file: Each IO operation is wrapped in a critical section. This causes a lot of extra thread switching, and in the end only one thread can write to the file at a time anyway.
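For illustration, a rough sketch of the lock-free, offset-split variant in Python (POSIX-only, since it relies on os.pwrite; the file name and sizes are made up) - each thread writes its own region, yet the drive still has to service those scattered writes one at a time:

import os
import threading

data = b"x" * 1024                 # the 1024-byte file from the example
n_threads = 4
chunk = len(data) // n_threads

fd = os.open("out.bin", os.O_CREAT | os.O_WRONLY)
os.ftruncate(fd, len(data))

def write_piece(i):
    offset = i * chunk
    # pwrite writes at an explicit offset without moving a shared file position,
    # so no lock is needed, but the writes still land in scattered regions.
    os.pwrite(fd, data[offset:offset + chunk], offset)

threads = [threading.Thread(target=write_piece, args=(i,)) for i in range(n_threads)]
for t in threads:
    t.start()
for t in threads:
    t.join()
os.close(fd)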
Correct me if I'm wrong.
No, it makes no sense. At some point, the operations have to be serialized (by the OS). On the other hand, since modern OS's have to cope with multiple processes anyway I doubt that there's an added overhead.
I'd think it would hinder the operations... You only have one controller and one drive.
You could use a second thread to do the operation, and a main thread that shows an updated UI.
I think it could worsen the performance, because the multiple threads will compete for the same resources.
You can test the impact of doing concurrent IO operations on the same device by copying a set of files from one place to another and measuring the time, then splitting the set in two parts and making the copies in parallel... the second option will be noticeably slower.
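A quick way to run that experiment yourself (a rough sketch; the source/destination paths are placeholders for your own file set):

import shutil
import time
from concurrent.futures import ThreadPoolExecutor

def copy_files(pairs):
    for src, dst in pairs:
        shutil.copy(src, dst)

pairs = [("src/file%d.dat" % i, "dst/file%d.dat" % i) for i in range(100)]  # placeholders

start = time.time()
copy_files(pairs)                    # copy the whole set sequentially
print("sequential:", time.time() - start)

half = len(pairs) // 2
start = time.time()
with ThreadPoolExecutor(max_workers=2) as pool:
    # split the set in two and copy both halves at the same time
    list(pool.map(copy_files, [pairs[:half], pairs[half:]]))
print("parallel:  ", time.time() - start)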