Share Python objects between processes in Python 3 - multithreading

Here I create a producer-consumer program: the parent process (producer) creates many child processes (consumers), then the parent process reads a file and passes the data to the child processes.
But here comes a performance problem: passing messages between processes costs too much time (I think).
For example, with 200 MB of original data, reading and preprocessing in the parent process takes less than 8 seconds, then just passing the data to the child processes via multiprocessing.Pipe costs another 8 seconds, and the child processes finish the remaining work in another 3~4 seconds.
So a complete run takes less than 18 seconds, and more than 40% of that time is spent on communication between processes, which is far more than I expected. I tried multiprocessing.Queue and Manager, and they are worse.
I work on Windows 7 / Python 3.4.
I have googled for several days, and POSH looked like a good solution, but it can't be built with Python 3.4.
So I have three questions:
1. Is there any way to share Python objects directly between processes in Python 3.4, like POSH does?
or
2. Is it possible to pass the "pointer" of an object to a child process so that the child process can recover the "pointer" back into a Python object?
or
3. multiprocessing.Array may be a valid solution, but if I want to share a complex data structure such as a list, how does that work? Should I build a new class based on it and provide list-like interfaces?
Edit 1:
I tried the 3rd way, but it works even worse.
I defined these values:
p_pos = multiprocessing.Value('i')               # producer write position
c_pos = multiprocessing.Value('i')               # consumer read position
databuff = multiprocessing.Array('c', buff_len)  # shared buffer
and two functions:
send_data(msg)
get_data()
In the send_data function (parent process), it copies msg into databuff and sends the start and end positions (two integers) to the child process via a pipe.
Then, in the get_data function (child process), it receives the two positions and copies msg back out of databuff.
In the end, it costs twice as much as just using a pipe #_#
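Roughly, the sketch below is what I mean (simplified: buff_len is assumed, there is no wrap-around handling, and on Windows the shared objects would really have to be passed to the child process as arguments rather than used as globals):

import multiprocessing

buff_len = 200 * 1024 * 1024                      # assumed buffer size
p_pos = multiprocessing.Value('i', 0)             # producer write position
c_pos = multiprocessing.Value('i', 0)             # consumer read position
databuff = multiprocessing.Array('c', buff_len)   # shared byte buffer
parent_conn, child_conn = multiprocessing.Pipe()

def send_data(msg):
    # parent process: copy msg (bytes) into the shared buffer, send only two integers
    start = p_pos.value
    end = start + len(msg)
    databuff[start:end] = msg
    p_pos.value = end
    parent_conn.send((start, end))

def get_data():
    # child process: receive the two positions and copy msg back out of the shared buffer
    start, end = child_conn.recv()
    msg = databuff[start:end]
    c_pos.value = end
    return msg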
Edit 2:
Yes, I tried Cython, and the result looks good.
I just changed my Python script's suffix to .pyx and compiled it, and the program sped up by 15%.
Of course, I ran into the "Unable to find vcvarsall.bat" and "The system cannot find the file specified" errors; I spent a whole day solving the first one and then got blocked by the second.
Finally, I found Cyther, and all the troubles were gone ^_^
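For anyone else who tries this route, the build step is essentially the standard Cython one; a minimal sketch, assuming the module has been renamed to worker.pyx:

# setup.py - minimal Cython build sketch
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("worker.pyx"))
# build with: python setup.py build_ext --inplace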

I was in your place five months ago. I looked around a few times, but my conclusion is that multiprocessing in Python has exactly the problem you describe:
Pipes and Queues are good, but not for big objects, in my experience.
Manager() proxy objects are slow, except for arrays, and those are limited. If you want to share a complex data structure, use a Namespace as it is done here: multiprocessing in python - sharing large object (e.g. pandas dataframe) between multiple processes
Manager() has the shared list you are looking for: https://docs.python.org/3.6/library/multiprocessing.html (see the sketch below)
There are no pointers or real memory management in Python, so you can't share selected memory cells.
I solved this kind of problem by learning C++, but it's probably not what you want to read...
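Here is a minimal sketch of that Manager route, combining the shared list and a Namespace (the worker function and the stand-in object are hypothetical); keep in mind that every access goes through the manager process, which is exactly why it is slow:

import multiprocessing

def worker(shared_list, ns):
    shared_list.append(42)         # forwarded to the manager process
    print(len(ns.big_object))      # attribute access also goes through the proxy

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    shared_list = manager.list(range(10))    # ListProxy living in the manager process
    ns = manager.Namespace()
    ns.big_object = list(range(1000))        # stand-in for a large object / dataframe
    p = multiprocessing.Process(target=worker, args=(shared_list, ns))
    p.start()
    p.join()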

To pass data (especially big numpy arrays) to a child process, I think mpi4py can be very efficient, since it works directly on buffer-like objects.
An example of using mpi4py to spawn processes and communicate (also using trio, but that is another story) can be found here.
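For illustration, a minimal point-to-point sketch (not the spawn example from the link): run it under an MPI launcher, e.g. mpiexec -n 2 python script.py. The uppercase Send/Recv variants operate directly on the numpy buffer, without pickling:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

data = np.empty(1000, dtype='d')
if rank == 0:
    data[:] = np.random.random(1000)
    comm.Send([data, MPI.DOUBLE], dest=1, tag=0)    # sends the raw buffer, no pickling
elif rank == 1:
    comm.Recv([data, MPI.DOUBLE], source=0, tag=0)  # receives directly into the buffer
    print(data[:5])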

Related

USB serial data, network, database and GUI, should I go for multiprocessing in Python?

I am designing a program in Python which
reads data via USB from an Arduino into an SQLite table every two seconds (128 kB per readout),
processes the incoming data and stores the results in another table,
and finally queries the data in the table, shows it in a GUI created with tkinter, and sends the same data over the network to a server.
The question is: for which parts should I use multiprocessing or threading? Do I even need them? If I run the first part from a separate Python file in the background, does it necessarily use a different CPU core?
EDIT:
I found out about pickling, and now the question is:
Is it a good idea to pickle a 1 kB string every 3 seconds (in the ramdrive, of course) and unpickle it in another script?
I have already tested two scripts and it works, but I am not sure whether this solution is suitable for long-term running.
It looks promising! Especially since I don't see myself getting stuck in the multithreading or multiprocessing modules, and it seems like the OS will assign the necessary cores and threads.
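For reference, the pickling part of the two scripts is roughly the sketch below (the ramdrive path is just an example; the write-to-a-temp-file-then-rename step is there so the reader never sees a half-written file):

import os
import pickle

SNAPSHOT = "/mnt/ramdisk/latest.pkl"   # example ramdrive path, adjust for your system

def write_snapshot(payload):
    # writer script: called every ~3 seconds with the ~1 kB string
    tmp = SNAPSHOT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(payload, f)
    os.replace(tmp, SNAPSHOT)          # atomic rename on the same filesystem

def read_snapshot():
    # reader script: unpickles the latest value
    with open(SNAPSHOT, "rb") as f:
        return pickle.load(f)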

Managing different Python scripts with different priorities

I need to run different Python processes, in a certain order of priority.
Specifically, I have 3 processes, and I need them to work this way:
An object detection script, used to locate a person and their position. I need this one to run continuously at a high FPS;
another process that, once some conditions are met (when the person is present in the picture in the required position), starts taking screenshots of the image for a certain amount of time;
another script that analyzes the screenshots taken by the second one.
I wrote the 3 scripts already and they work fine, but the problem is that process 3 is particularly computationally demanding, and I don't want it to prevent processes 1 and 2 from running smoothly.
My idea is that I could give the highest priority to process 1, and send the screenshots taken by process 2 to a queue, or something like this.
When the person is not detected in the picture, I could run process 3, and empty the queue as the screenshots are analyzed. However, script 3 should still run with limited resources, so that FPS of script 1 isn't affected too much, and it can still detect if the person enters the picture again.
I'm afraid this might all be a little vague, but could you please suggest a way or tool I could use to manage the processes this way?
So far, I have tried simply saving the screenshots to a folder, but I don't know how to limit the resource usage of process 3.
I'm familiar with the basic usage of Docker, so I was thinking that maybe I could:
run the processes in different containers, limiting resources allocated to the 3rd one (?);
use a message broker (Kafka, RabbitMQ?) to store screenshots;
but again, I'm a newbie when it comes to this stuff (speaking of which, I hope I tagged this question correctly), so I don't know if it's an efficient way to do this (or if it can be done this way, for that matter).
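Outside Docker, the only concrete lead I have so far is simply lowering the OS priority of the heavy script, something like the sketch below (this assumes the third-party psutil package, and the script names are made up):

import subprocess
import psutil   # third-party package: pip install psutil

# hypothetical script names
detector = subprocess.Popen(["python", "detect.py"])    # process 1: keep default priority
analyzer = subprocess.Popen(["python", "analyze.py"])   # process 3: the heavy one

heavy = psutil.Process(analyzer.pid)
heavy.nice(10)                        # Unix: a positive nice value means lower priority
# heavy.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)   # Windows equivalent
# heavy.cpu_affinity([2, 3])          # optionally pin it to a couple of cores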

I/O or CPU bound? How to check if running concurrently?

I’m new to Python and I'm struggling to understand some things in multiprocessing/threading. I want to speed up a function and have been trying different approaches from the multiprocessing module, but I can’t get it to run any faster. It’s possible it won’t run any faster, but I wanted to be sure this is the case before giving up. This isn’t a full description, but the most time-consuming activities are:
-repeatedly generating random data (10,000 rows and 10 columns)
-using a pre-fit model to predict an outcome for each row and
-comparing each predicted value to an initial value.
It performs this multiple times depending on how many of the predicted values equal the initial value, updating the parameters of the distribution each time. The output of the function is a single numeric value.
I want to loop over several of these initial values and end up with a list of the output values. I was hoping to get multiple iterations to run concurrently (but I’m open to anything that could make it faster). I’ve been ignorantly attempting pool.apply, starmap and Process but haven’t seen a change in time.
My questions are:
Based on the description of what I’m doing, is my program I/O or CPU bound? (Is it possible to tell from that? Is this even the right question to be asking?)
Should I be using multithreading or multiprocessing?
How can I determine if the iterations are running concurrently or not?
Given that you didn't mention anything about drives, I'm going to assume it's not very I/O bound (although that's still possible). Are you using multiple threads/processes yet? If not, that's definitely your issue.
I'd probably look at Python's Thread library and, because of the loop that creates the data, maybe a thread pool. You just need all of your threads running that random-generation function at the same time.
EDIT: I forgot to mention: if you open Task Manager/System Monitor, you should be able to see the load per CPU/thread. If only one is maxed out at any given time, you aren't running concurrently.
Example: I wrote a quick example to help with the thread pool. Your 10,000-item list with 10 columns was not even noticeable on my i7. I increased the columns to 10,000 and it used 4 GB of RAM and roughly 30 seconds of 100% CPU @ 3.4 GHz.
from multiprocessing import Pool, Array
import random

def thread_function(_):
    """Build and return a list of 10,000 random integers (the argument is ignored)."""
    l = []
    for _ in range(10000):
        l.append(random.randint(0, 10000))
    return l

if __name__ == '__main__':
    rand_list = Array('i', range(10000))               # one dummy input per task
    with Pool() as pool:                               # Pool() starts one worker process per core
        rand_list = pool.map(thread_function, rand_list)
    print(len(rand_list))

Processing data in parallel in Python

I have a script, parts of which are able to run in parallel at certain points. Python 3.6.6.
The goal is to decrease the execution time as much as possible.
One of the parts connects to Redis, gets the data for two keys, calls pickle.loads on each, and returns the processed objects.
What's the best solution for such a task?
I've tried Queue() already, but Queue.get_nowait() locks the script, and after {process}.join() it also stops execution even though the task is done. Using pool.map raises TypeError: can't pickle _thread.lock objects.
All I could achieve is running all the parts in parallel, but I still cannot combine the results.
cPickle.load() will release the GIL so you can use it in multiple threads easily. But cPickle.loads() will not, so don't use that.
Basically, put your data from Redis into a StringIO then cPickle.load() from there. Do this in multiple threads using concurrent.futures.ThreadPoolExecutor.
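In Python 3.6 the corresponding names are pickle and io.BytesIO; a minimal sketch of that idea, assuming the redis-py client and made-up key names:

import io
import pickle
from concurrent.futures import ThreadPoolExecutor

import redis   # assumption: redis-py client

client = redis.Redis()

def load_key(key):
    raw = client.get(key)                 # bytes stored under the key
    return pickle.load(io.BytesIO(raw))   # file-object based load, as suggested above

with ThreadPoolExecutor(max_workers=2) as ex:
    obj_a, obj_b = ex.map(load_key, ["key_a", "key_b"])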

Designing concurrency in a Python program

I'm designing a large-scale project, and I think I see a way I could drastically improve performance by taking advantage of multiple cores. However, I have zero experience with multiprocessing, and I'm a little concerned that my ideas might not be good ones.
Idea
The program is a video game that procedurally generates massive amounts of content. Since there's far too much to generate all at once, the program instead tries to generate what it needs as or slightly before it needs it, and expends a large amount of effort trying to predict what it will need in the near future and how near that future is. The entire program, therefore, is built around a task scheduler, which gets passed function objects with bits of metadata attached to help determine what order they should be processed in and calls them in that order.
Motivation
It seems to me like it ought to be easy to make these functions execute concurrently in their own processes. But looking at the documentation for the multiprocessing module makes me reconsider: there doesn't seem to be any simple way to share large data structures between processes. I can't help but imagine this is intentional.
Questions
So I suppose the fundamental questions I need to know the answers to are thus:
Is there any practical way to allow multiple threads to access the same list/dict/etc... for both reading and writing at the same time? Can I just launch multiple instances of my star generator, give it access to the dict that holds all the stars, and have new objects appear to just pop into existence in the dict from the perspective of other threads (that is, I wouldn't have to explicitly grab the star from the process that made it; I'd just pull it out of the dict as if the main thread had put it there itself).
If not, is there any practical way to allow multiple threads to read the same data structure at the same time, but feed their resultant data back to a main thread to be rolled into that same data structure safely?
Would this design work even if I ensured that no two concurrent functions tried to access the same data structure at the same time, either for reading or for writing?
Can data structures be inherently shared between processes at all, or do I always explicitly have to send data from one process to another as I would with processes communicating over a TCP stream? I know there are objects that abstract away that sort of thing, but I'm asking if it can be done away with entirely; have the object each thread is looking at actually be the same block of memory.
How flexible are the objects that the modules provide to abstract away the communication between processes? Can I use them as a drop-in replacement for data structures used in existing code and not notice any differences? If I do such a thing, would it cause an unmanageable amount of overhead?
Sorry for my naivete, but I don't have a formal computer science education (at least, not yet) and I've never worked with concurrent systems before. Is the idea I'm trying to implement here even remotely practical, or would any solution that allows me to transparently execute arbitrary functions concurrently cause so much overhead that I'd be better off doing everything in one thread?
Example
For maximum clarity, here's an example of how I imagine the system would work:
The UI module has been instructed by the player to move the view over to a certain area of space. It informs the content management module of this, and asks it to make sure that all of the stars the player can currently click on are fully generated and ready to be clicked on.
The content management module checks and sees that a couple of the stars the UI is saying the player could potentially try to interact with have not, in fact, had the details that would show upon click generated yet. It produces a number of Task objects containing the methods of those stars that, when called, will generate the necessary data. It also adds some metadata to these task objects, assuming (possibly based on further information collected from the UI module) that it will be 0.1 seconds before the player tries to click anything, and that stars whose icons are closest to the cursor have the greatest chance of being clicked on and should therefore be requested for a time slightly sooner than the stars further from the cursor. It then adds these objects to the scheduler queue.
The scheduler quickly sorts its queue by how soon each task needs to be done, then pops the first task object off the queue, makes a new process from the function it contains, and then thinks no more about that process, instead just popping another task off the queue and stuffing it into a process too, then the next one, then the next one...
Meanwhile, the new process executes, stores the data it generates on the star object it is a method of, and terminates when it gets to the return statement.
The UI then registers that the player has indeed clicked on a star now, and looks up the data it needs to display on the star object whose representative sprite has been clicked. If the data is there, it displays it; if it isn't, the UI displays a message asking the player to wait and continues repeatedly trying to access the necessary attributes of the star object until it succeeds.
Even though your problem seems very complicated, there is a very easy solution. You can hide away all the complicated details of sharing your objects across processes by using a proxy.
The basic idea is that you create a manager that manages all the objects that should be shared across processes. This manager then creates its own process, where it waits for some other process to instruct it to change an object. But enough said. It looks like this:
import multiprocessing as m

manager = m.Manager()
starsdict = manager.dict()
process = m.Process(target=yourfunction, args=(starsdict,))  # yourfunction is your own worker
process.start()    # start() launches the new process; run() would just call the target in this one
The object stored in starsdict is not the real dict. Instead, it forwards all the changes and requests you make on it to its manager. This is called a "proxy"; it has almost exactly the same API as the object it mimics. These proxies are picklable, so you can pass them as arguments to functions in new processes (as shown above) or send them through queues.
You can read more about this in the documentation.
I don't know how proxies react if two processes access them simultaneously. Since they're made for parallelism, I guess they should be safe, even though I've heard they're not. It would be best if you test this yourself or look it up in the documentation.
