python apply_async does not call method - python-3.x

I have a method which needs to process a large database, which would take hours/days to dig through.
The arguments are stored in a (long) list, of which at most X should be processed in one batch. The method does not need to return anything, yet I return "True" for "fun"...
The function works perfectly when I iterate through it linearly (generating/appending the results in other tables not seen here), yet I am unable to get apply_async or map_async to work (it worked before in other projects).
Any hint about what I might be doing wrong would be appreciated, thanks in advance!
See code below:
import multiprocessing as mp

class mainClass:
    #loads of stuff

def main():
    multiprocess = True
    batchSize = 35
    mC = mainClass()
    while True:
        toCheck = [key for key, value in mC.lCheckSet.items()] #the tasks are stored in a dictionary, I'm referring to them with their keys, which I turn to a list here for iteration.
        if multiprocess == False:
            #this version works perfectly fine
            for i in toCheck[:batchSize]:
                mC.check(i)
        else:
            #the async version does not, either with apply_async...
            with mp.Pool(processes = 8) as pool:
                temp = [pool.apply_async(mC.check, args=(toCheck[n],)) for n in range(len(toCheck[:batchSize]))]
                results = [t.get() for t in temp]
            #...or as map_async
            pool = mp.Pool(processes = 8)
            temp = pool.map_async(mC.check, toCheck[:batchSize])
            pool.close()
            pool.join()

if __name__=="__main__":
    main()

The "smell" here is that you are instantiating your maincClass on the main Process, just once, and then trying to call a method on it on the different processes - but note that when you pass mC.check to your process pool, it is a method already bound to the class instantiated in this process.
I'd guess there is where your problem lies. Although that could possibly work - and it does - I made this simplified version and it works as intended :
import multiprocessing as mp
import random, time

class MainClass:
    def __init__(self):
        self.value = 1

    def check(self, arg):
        time.sleep(random.uniform(0.01, 0.3))
        print(id(self), self.value, arg)

def main():
    mc = MainClass()
    with mp.Pool(processes = 4) as pool:
        temp = [pool.apply_async(mc.check, (i,)) for i in range(8)]
        results = [t.get() for t in temp]

main()
(Have you tried just adding some prints to make sure the method is not running at all?)
So, the problem likely lies in some complex state in your MainClass that does not make it to the parallel processes in a good way. A possible work-around is to instantiate your mainClass inside each process - that can be done easily, since multiprocessing lets you get the current_process() object, which you can use as a namespace to keep data in each worker process across different calls to apply_async.
So, create a new check function like the one below - and instead of instantiating your mainClass in the main process, instantiate it inside each process in the pool:
import multiprocessing as mp
import random, time

def check(arg):
    process = mp.current_process()  # note: current_process() must be called to get the Process object
    if not hasattr(process, "main_class"):
        process.main_class = MainClass()
    process.main_class.check(arg)

class MainClass:
    def __init__(self):
        self.value = random.randrange(100)

    def check(self, arg):
        time.sleep(random.uniform(0.01, 0.3))
        print(id(self), self.value, arg)

def main():
    mc = MainClass()
    with mp.Pool(processes = 2) as pool:
        temp = [pool.apply_async(check, (i,)) for i in range(8)]
        results = [t.get() for t in temp]

main()

I got to this question with the same problem - my apply_async calls were not being called at all - but the reason in my case was that the number of parameters in the apply_async call was different from the number in the function declaration.
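A hedged aside (my own sketch, not from the original answers): Pool.apply_async stores any exception raised in the worker and only re-raises it when you call .get() on the AsyncResult, so a signature mismatch like that can look as if the method was simply never called. Calling .get(), or passing an error_callback, makes the failure visible:

import multiprocessing as mp

def check(key):  # takes exactly one argument
    return key * 2

def report_error(exc):
    print("worker failed:", exc)

if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        # Wrong number of arguments: nothing visible happens until .get() ...
        bad = pool.apply_async(check, args=(1, 2))
        try:
            bad.get(timeout=5)
        except TypeError as exc:
            print("signature mismatch:", exc)
        # ... or until an error_callback fires.
        pool.apply_async(check, args=(1, 2, 3), error_callback=report_error)
        pool.close()
        pool.join()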

Related

Why serial code is faster than concurrent.futures in this case?

I am using the following code to process some pictures for my ML project and I would like to parallelize it.
import multiprocessing as mp
import concurrent.futures

def track_ids(seq):
    '''The func is so big I can not put it here'''
    ood = {}
    for i in seq:
        # I load around 500 images and process them
        ood[i] = some Value
    return ood

seqs = []
for seq in range(1, 10):  # len(seqs)+1):
    seq = txt + str(seq)
    seqs.append(seq)
    # serial call of the function
    track_ids(seq)

# parallel call of the function
with concurrent.futures.ProcessPoolExecutor(max_workers=mp.cpu_count()) as ex:
    ood_id = ex.map(track_ids, seqs)
If I run the code serially it takes 3.0 minutes, but in parallel with concurrent.futures it takes 3.5 minutes.
Can someone please explain why that is, and present a way to solve the problem?
By the way, I have 12 cores.
Thanks
Here's a brief example of how one might go about profiling multiprocessing code vs serial execution:
import multiprocessing as mp
from cProfile import Profile
from pstats import Stats
import concurrent.futures

def track_ids(seq):
    '''The func is so big I can not put it here'''
    ood = {}
    for i in seq:
        # I load around 500 images and process them
        ood[i] = some Value
    return ood

def profile_seq():
    p = Profile()  # one and only profiler instance
    p.enable()
    seqs = []
    for seq in range(1, 10):  # len(seqs)+1):
        seq = txt + str(seq)
        seqs.append(seq)
        # serial call of the function
        track_ids(seq)
    p.disable()
    return Stats(p), seqs

def track_ids_pr(seq):
    p = Profile()  # profile the child tasks
    p.enable()
    retval = track_ids(seq)
    p.disable()
    return (Stats(p, stream="dummy"), retval)

def profile_parallel():
    p = Profile()  # profile stuff in the main process
    p.enable()
    # note: `seqs` is assumed to exist at module level, built as in profile_seq()
    with concurrent.futures.ProcessPoolExecutor(max_workers=mp.cpu_count()) as ex:
        retvals = ex.map(track_ids_pr, seqs)
    p.disable()
    s = Stats(p)
    out = []
    for ret in retvals:
        s.add(ret[0])
        out.append(ret[1])
    return s, out

if __name__ == "__main__":
    stat, retval = profile_parallel()
    stat.print_stats()
EDIT: Unfortunately I found out that pstats.Stats objects cannot be sent through a multiprocessing.Queue normally because they are not pickleable (and pickling is needed for the operation of concurrent.futures as well). Evidently a Stats object normally stores a reference to a file for the purpose of writing statistics to that file, and if none is given, it will by default grab a reference to sys.stdout. We don't actually need that reference until we want to print out the statistics, so we can give it a temporary value to prevent the pickle error, and then restore an appropriate value later. The following example should be copy-paste-able and run just fine, unlike the pseudocode-ish example above.
from multiprocessing import Queue, Process
from cProfile import Profile
from pstats import Stats
import sys

def isprime(x):
    for d in range(2, int(x**.5) + 1):
        if x % d == 0:
            return False
    return True

def foo(retq):
    p = Profile()
    p.enable()
    primes = []
    max_n = 2**20
    for n in range(3, max_n):
        if isprime(n):
            primes.append(n)
    p.disable()
    retq.put(Stats(p, stream="dummy"))  # Dirty hack: set `stream` to something picklable, then override later

if __name__ == "__main__":
    q = Queue()
    p1 = Process(target=foo, args=(q,))
    p1.start()
    p2 = Process(target=foo, args=(q,))
    p2.start()
    s1 = q.get()
    s1.stream = sys.stdout  # restore original file
    s2 = q.get()
    # s2.stream  # if we are just adding this `Stats` object to another, the `stream` just gets thrown away anyway
    s1.add(s2)  # add up the stats from both child processes
    s1.print_stats()  # s1.stream gets used here, but not before. If you provide a file to write to instead of sys.stdout, it will write to that file.
    p1.join()
    p2.join()

Slow multiprocessing when parent object contains large data

Consider the following snippet:
import numpy as np
import multiprocessing as mp
import time

def work_standalone(args):
    return 2

class Worker:
    def __init__(self):
        self.data = np.random.random(size=(10000, 10000))
        # leave a trace whenever init is called
        with open('rnd-%d' % np.random.randint(100), 'a') as f:
            f.write('init called\n')

    def work_internal(self, args):
        return 2

    def _run(self, target):
        with mp.Pool() as pool:
            tasks = [[idx] for idx in range(16)]
            result = pool.imap(target, tasks)
            for res in result:
                pass

    def run_internal(self):
        self._run(self.work_internal)

    def run_standalone(self):
        self._run(work_standalone)

if __name__ == '__main__':
    t1 = time.time()
    Worker().run_standalone()
    t2 = time.time()
    print(f'Standalone took {t2 - t1:.3f} seconds')

    t3 = time.time()
    Worker().run_internal()
    t4 = time.time()
    print(f'Internal took {t3 - t4:.3f} seconds')
That is, we have an object containing a large attribute and using multiprocessing to parallelize some work that has nothing to do with that attribute, i.e. never reads from or writes to it. The location of the worker function has a huge impact on the runtime:
Standalone took 0.616 seconds
Internal took 19.917 seconds
Why is this happening? I am completely lost. Note that __init__ is only called twice, so the random data is not created for every new process in the pool. The only reason I can think of why this would be slow is that data is copied around, but that would not make sense since it is never used anywhere, and python is supposed to use copy-on-write semantics. Also note that the difference disappears if you make run_internal a static method.
The issue you have is due to the target you are calling from the pool: that target is a method bound to the Worker instance, so it carries a reference to that instance.
Now, you're right that __init__() is only called twice. But remember, when you send anything to and from the processes, Python needs to pickle the data first.
So, because your target is self.work_internal, Python has to pickle the Worker() instance every time imap sends it to a worker. This leads to one issue: self.data being copied over again and again.
The following is the proof. I just added one input() statement and fixed the time calculation for the last print.
import numpy as np
import multiprocessing as mp
import time

def work_standalone(args):
    return 2

class Worker:
    def __init__(self):
        self.data = np.random.random(size=(10000, 10000))
        # leave a trace whenever init is called
        with open('rnd-%d' % np.random.randint(100), 'a') as f:
            f.write('init called\n')

    def work_internal(self, args):
        return 2

    def _run(self, target):
        with mp.Pool() as pool:
            tasks = [[idx] for idx in range(16)]
            result = pool.imap(target, tasks)
            input("Wait for analysis")
            for res in result:
                pass

    def run_internal(self):
        self._run(self.work_internal)
        # self._run(work_standalone)

    def run_standalone(self):
        self._run(work_standalone)

def work_internal(target):
    with mp.Pool() as pool:
        tasks = [[idx] for idx in range(16)]
        result = pool.imap(target, tasks)
        for res in result:
            pass

if __name__ == '__main__':
    t1 = time.time()
    Worker().run_standalone()
    t2 = time.time()
    print(f'Standalone took {t2 - t1:.3f} seconds')

    t3 = time.time()
    Worker().run_internal()
    t4 = time.time()
    print(f'Internal took {t4 - t3:.3f} seconds')
You can run the code, and when "Wait for analysis" shows up, go and check the memory usage.
Then, the second time you see the message, press Enter and observe the memory usage increasing and then decreasing again.
On the other hand, if you change self._run(self.work_internal) to self._run(work_standalone), you will notice that it runs very fast, the memory does not increase, and the time taken is a lot shorter than with self.work_internal.
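If you prefer numbers over watching a process monitor, a small helper (my own sketch, relying on the third-party psutil package) can print the resident memory of every Python process while the script is paused at the input() prompt:

import psutil  # third-party: pip install psutil

def show_python_memory():
    # Print the resident set size of every running Python process.
    for proc in psutil.process_iter(['pid', 'name', 'memory_info']):
        name = (proc.info['name'] or '').lower()
        if 'python' in name:
            rss_mb = proc.info['memory_info'].rss / 1024 ** 2
            print(proc.info['pid'], name, f'{rss_mb:.1f} MiB')

show_python_memory()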
Solution
One way to solve your issue is to set data as a static class variable. In normal cases, this prevents each instance from having to copy/re-init the variable, and it also prevents this issue from occurring.
class Worker:
    data = np.random.random(size=(10000, 10000))

    def __init__(self):
        pass
    ...
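A quick way to confirm this diagnosis (my own sketch, not part of the original answer) is to compare the pickled size of the bound method with that of a module-level function; the bound method drags the whole instance, including data, along with it:

import pickle
import numpy as np

class Worker:
    def __init__(self):
        self.data = np.random.random(size=(1000, 1000))  # smaller array, roughly 8 MB

    def work_internal(self, args):
        return 2

def work_standalone(args):
    return 2

w = Worker()
print(len(pickle.dumps(w.work_internal)))  # millions of bytes: the instance (and data) is included
print(len(pickle.dumps(work_standalone)))  # a few dozen bytes: just a reference to the function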

call method on running process from parent process

I'm trying to write a program that interfaces with hardware via pyserial according to this diagram https://github.com/kiyoshi7/Intrument/blob/master/Idea.gif . My problem is that I don't know how to tell the child process to run a method.
I tried reducing my problem down to its essence: what I am trying to do is call the method request() from the main script. I just don't know how to handle two-way communication like this; in examples using a queue I just see data being shared, or I can't understand the examples.
import multiprocessing
from time import sleep

class spawn:
    def __init__(self, _number, _max):
        self._number = _number
        self._max = _max
        self.Update()

    def request(self, x):
        print("{} was requested.".format(x))

    def Update(self):
        while True:
            print("Spawned {} of {}".format(self._number, self._max))
            sleep(2)

if __name__ == '__main__':
    p = multiprocessing.Process(target=spawn, args=(1,1))
    p.start()
    sleep(5)
    p.request(2) #here I'm trying to run the method I want
Update, thanks to Carcigenicate:
import multiprocessing
from time import sleep
from operator import methodcaller

class Spawn:
    def __init__(self, _number, _max):
        self._number = _number
        self._max = _max
        # Don't call update here

    def request(self, x):
        print("{} was requested.".format(x))

    def update(self):
        while True:
            print("Spawned {} of {}".format(self._number, self._max))
            sleep(2)

if __name__ == '__main__':
    spawn = Spawn(1, 1)  # Create the object as normal
    p = multiprocessing.Process(target=methodcaller("update"), args=(spawn,))  # Run the loop in the process
    p.start()
    while True:
        sleep(1.5)
        spawn.request(2)  # Now you can reference the "spawn"
You're going to need to rearrange things a bit. I would not do the long running (infinite) work from the constructor. That's generally poor practice, and is complicating things here. I would instead initialize the object, then run the loop in the separate process:
import multiprocessing
from time import sleep
from operator import methodcaller

class Spawn:
    def __init__(self, _number, _max):
        self._number = _number
        self._max = _max
        # Don't call update here

    def request(self, x):
        print("{} was requested.".format(x))

    def update(self):
        while True:
            print("Spawned {} of {}".format(self._number, self._max))
            sleep(2)

if __name__ == '__main__':
    spawn = Spawn(1, 1)  # Create the object as normal
    p = multiprocessing.Process(target=methodcaller("update"), args=(spawn,))  # Run the loop in the process
    p.start()
    spawn.request(2)  # Now you can reference the "spawn" object to do whatever you like
Unfortunately, since Process requires that its target argument be pickleable, you can't just use a lambda wrapper like I originally had (whoops). I'm using operator.methodcaller to create a pickleable wrapper. methodcaller("update") returns a function that calls update on whatever is given to it, and we then give it spawn to call it on.
You could also create a wrapper function using def:
def wrapper():
    spawn.update()

. . .

p = multiprocessing.Process(target=wrapper)  # Run the loop in the process
But that only works if it's feasible to have wrapper as a global function. You may need to play around to find out what works best, or use a multiprocessing library that doesn't require pickleable tasks.
Note, please use proper Python naming conventions. Class names start with capitals, and method names are lowercase. I fixed that up in the code I posted.
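For the two-way control the question is really asking about (telling the already-running child to execute request), a common pattern is to send commands to the child through a multiprocessing.Queue and let the child's loop poll for them. The following is my own minimal sketch of that idea, not part of the original answer:

import multiprocessing
from queue import Empty
from time import sleep

class Spawn:
    def __init__(self, _number, _max):
        self._number = _number
        self._max = _max

    def request(self, x):
        print("{} was requested.".format(x))

    def update(self, commands):
        while True:
            print("Spawned {} of {}".format(self._number, self._max))
            try:
                # Non-blocking check for a command sent by the parent.
                name, arg = commands.get_nowait()
                getattr(self, name)(arg)  # e.g. runs self.request(2) inside the child
            except Empty:
                pass
            sleep(2)

def run_child(commands):
    Spawn(1, 1).update(commands)  # the instance lives entirely in the child process

if __name__ == '__main__':
    commands = multiprocessing.Queue()
    p = multiprocessing.Process(target=run_child, args=(commands,))
    p.start()
    sleep(5)
    commands.put(("request", 2))  # parent asks the child to run request(2)
    sleep(5)
    p.terminate()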

Python3: Multiprocessing consumes excessive amounts of RAM and slows down

I start multiple processes in order to create a list of new objects. htop shows me between 1 and 4 processes (I always create 3 new objects).
def foo(self):
    with multiprocessing.Pool(processes=3, maxtasksperchild=10) as pool:
        result = pool.map_async(self.new_obj, self.information)
        self.new_objs = result.get()
        pool.terminate()
    gc.collect()
I call foo() multiple times; each time it is called, the whole process runs slower, and in the end the program does not even finish, as it slows down too much. The program starts to eat up all my RAM, while the sequential approach does not have any significant RAM usage.
When I kill the program, most of the time this is what it was last executing:
->File "threading.py", line 293, in wait
waiter.acquire()
Edit
To give some information about my circumstances: I create a tree made of nodes. foo() is called by a parent node in order to create its child nodes. The results returned by the processes are these child nodes, which are saved in a list at the parent node. I want to parallelize the creation of those child nodes instead of creating them sequentially.
I think your issue has mainly to do with the fact that your parallelised function is a method of the object. It's hard to be certain without more information, but consider this little toy program:
import multiprocessing as mp
import numpy as np
import gc

class Object(object):
    def __init__(self, _):
        self.data = np.empty((100, 100, 100), dtype=np.float64)

class Container(object):
    def __new__(cls):
        self = object.__new__(cls)
        print("Born")
        return self

    def __init__(self):
        self.objects = []

    def foo(self):
        with mp.Pool(processes=3, maxtasksperchild=10) as pool:
            result = pool.map_async(self.new_obj, range(50))
            self.objects.extend(result.get())
            pool.terminate()
        gc.collect()

    def new_obj(self, i):
        return Object(i)

    def __del__(self):
        print("Dead")

if __name__ == '__main__':
    c = Container()
    for j in range(5):
        c.foo()
Now, Container is instantiated only once, so you'd expect to see one "Born" followed by one "Dead" being printed out; but since the code being executed by the processes is a method of the container, the whole container has to be serialised and rebuilt elsewhere! Running this, you will see a stream of intermingled "Born" and "Dead" as your container is being rebuilt on every execution of map:
Born
Born
Born
Born
Born
Dead
Born
Dead
Dead
Born
Dead
Born
...
<MANY MORE LINES HERE>
...
Born
Dead
To convince yourself that the entire container is being copied and sent around every time, try to set some non-serialisable value:
def foo(self):
    with mp.Pool(processes=3, maxtasksperchild=10) as pool:
        result = pool.map_async(self.new_obj, range(50))
        self.fn = lambda x: x**2
        self.objects.extend(result.get())
        pool.terminate()
    gc.collect()
This will immediately raise an AttributeError, as the container can no longer be serialised.
Let's sum up: when sending 1000 requests to the pool, Container will be serialised, sent to the processes and deserialised there 1000 times. Sure, those copies will eventually be dropped (assuming there's not too much weird cross-referencing going on), but that will definitely put a lot of pressure on the RAM, as the object is serialised, called, updated, reserialised... for every element in your mapped inputs.
How can you solve that? Well, ideally, do not share state:
def new_obj(_):
    return Object(_)

class Container(object):
    def __new__(cls):
        self = object.__new__(cls)
        print("Born")
        return self

    def __init__(self):
        self.objects = []

    def foo(self):
        with mp.Pool(processes=3, maxtasksperchild=10) as pool:
            result = pool.map_async(new_obj, range(50))
            self.objects.extend(result.get())
            pool.terminate()
        gc.collect()

    def __del__(self):
        print("Dead")
This completes in a fraction of the time, and only produces the tiniest blip on the RAM (as only a single Container is ever built). If you need some of the internal state to be passed there, extract it and send just that:
def new_obj(tup):
    very_important_state, parameters = tup
    return Object(very_important_state=very_important_state,
                  parameters=parameters)

class Container(object):
    def __new__(cls):
        self = object.__new__(cls)
        print("Born")
        return self

    def __init__(self):
        self.objects = []

    def foo(self):
        important_state = len(self.objects)
        with mp.Pool(processes=3, maxtasksperchild=10) as pool:
            result = pool.map_async(new_obj,
                                    ((important_state, i) for i in range(50)))
            self.objects.extend(result.get())
            pool.terminate()
        gc.collect()

    def __del__(self):
        print("Dead")
This has the same behaviour as before. If you absolutely cannot avoid sharing some mutable state between the processes, check out the multiprocessing tools for doing that without having to copy everything everywhere every time.
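As one illustration of those tools (a minimal sketch of my own, not from the original answer): a multiprocessing.Value (or Array, or a Manager proxy) can hold the shared mutable state and be handed to each worker once through the pool initializer, instead of being pickled along with every task:

import multiprocessing as mp

_counter = None

def init_worker(counter):
    # Runs once per worker process; stores a handle to the shared value.
    global _counter
    _counter = counter

def work(i):
    # Update state that genuinely lives in shared memory instead of copying it around.
    with _counter.get_lock():
        _counter.value += 1
    return i * i

if __name__ == '__main__':
    counter = mp.Value('i', 0)  # a C int in shared memory
    with mp.Pool(processes=3, initializer=init_worker, initargs=(counter,)) as pool:
        results = pool.map(work, range(50))
    print(counter.value)  # 50: every worker updated the same shared object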

How to decorate an asyncio.coroutine to retain its __name__?

I've tried to write a decorator function which wraps an asyncio.coroutine and returns the time it took to get done. The recipe below contains the code which is working as I expected. My only problem with it is that somehow I lose the name of the decorated function despite the use of @functools.wraps. How can I retain the name of the original coroutine? I checked the source of asyncio.
import asyncio
import functools
import random
import time

MULTIPLIER = 5

def time_resulted(coro):
    @functools.wraps(coro)
    @asyncio.coroutine
    def wrapper(*args, **kargs):
        time_before = time.time()
        result = yield from coro(*args, **kargs)
        if result is not None:
            raise TypeError('time resulted coroutine can '
                            'only return None')
        return time_before, time.time()
    print('= wrapper.__name__: {!r} ='.format(wrapper.__name__))
    return wrapper

@time_resulted
@asyncio.coroutine
def random_sleep():
    sleep_time = random.random() * MULTIPLIER
    print('{} -> {}'.format(time.time(), sleep_time))
    yield from asyncio.sleep(sleep_time)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    tasks = [asyncio.Task(random_sleep()) for i in range(5)]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()

    for task in tasks:
        print(task, task.result()[1] - task.result()[0])

    print('= random_sleep.__name__: {!r} ='.format(
        random_sleep.__name__))
    print('= random_sleep().__name__: {!r} ='.format(
        random_sleep().__name__))
The result:
= wrapper.__name__: 'random_sleep' =
1397226479.00875 -> 4.261069174838891
1397226479.00875 -> 0.6596335046471768
1397226479.00875 -> 3.83421163259601
1397226479.00875 -> 2.5514027672929713
1397226479.00875 -> 4.497471439365472
Task(<wrapper>)<result=(1397226479.00875, 1397226483.274884)> 4.266134023666382
Task(<wrapper>)<result=(1397226479.00875, 1397226479.6697)> 0.6609499454498291
Task(<wrapper>)<result=(1397226479.00875, 1397226482.844265)> 3.835515022277832
Task(<wrapper>)<result=(1397226479.00875, 1397226481.562422)> 2.5536720752716064
Task(<wrapper>)<result=(1397226479.00875, 1397226483.51523)> 4.506479978561401
= random_sleep.__name__: 'random_sleep' =
= random_sleep().__name__: 'wrapper' =
As you can see, random_sleep() returns a generator object with a different name. I would like to retain the name of the decorated coroutine. I am not aware whether this problem is specific to asyncio coroutines or not. I also tried the code with different decorator orders, but all have the same result. If I comment out @functools.wraps(coro), then even random_sleep.__name__ becomes 'wrapper', as I expected.
EDIT: I've posted this issue to Python Issue Tracker and received the following answer by R. David Murray: "I think this is a specific case of a more general need to improve 'wraps' that was discussed on python-dev not too long ago."
The issue is that functools.wraps changes only wrapper.__name__, while wrapper().__name__ stays 'wrapper'. __name__ is a read-only generator attribute. You could use exec to set the appropriate name:
import asyncio
import functools
import uuid
from textwrap import dedent

def wrap_coroutine(coro, name_prefix='__' + uuid.uuid4().hex):
    """Like functools.wraps but preserves coroutine names."""
    # attribute __name__ is not writable for a generator, set it dynamically
    namespace = {
        # use name_prefix to avoid an accidental name conflict
        name_prefix + 'coro': coro,
        name_prefix + 'functools': functools,
        name_prefix + 'asyncio': asyncio,
    }
    exec(dedent('''
        def {0}decorator({0}wrapper_coro):
            @{0}functools.wraps({0}coro)
            @{0}asyncio.coroutine
            def {wrapper_name}(*{0}args, **{0}kwargs):
                {0}result = yield from {0}wrapper_coro(*{0}args, **{0}kwargs)
                return {0}result
            return {wrapper_name}
        ''').format(name_prefix, wrapper_name=coro.__name__), namespace)
    return namespace[name_prefix + 'decorator']
Usage:
def time_resulted(coro):
    @wrap_coroutine(coro)
    def wrapper(*args, **kargs):
        # ...
    return wrapper
It works but there is probably a better way than using exec().
In the time since this question was asked, it became possible to change the name of a coroutine. It is done by setting __qualname__ (not __name__):
async def my_coro(): pass
c = my_coro()
print(repr(c))
# <coroutine object my_coro at 0x7ff8a7d52bc0>
c.__qualname__ = 'flimflam'
print(repr(c))
# <coroutine object flimflam at 0x7ff8a7d52bc0>
import asyncio
print(repr(asyncio.ensure_future(c)))
# <Task pending name='Task-737' coro=<flimflam() running at <ipython-input>:1>>
The usage of __qualname__ in a coroutine object's __repr__ is defined in the CPython source.
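For completeness, here is my own sketch (not part of the original answers) of the same decorator written for modern Python with async def. Since Python 3.5 a coroutine object takes its __name__ and __qualname__ from the function object rather than from the code object, so functools.wraps alone is enough to preserve the name:

import asyncio
import functools
import time

def time_resulted(coro_func):
    @functools.wraps(coro_func)  # copies __name__ and __qualname__ onto the wrapper
    async def wrapper(*args, **kwargs):
        time_before = time.time()
        result = await coro_func(*args, **kwargs)
        if result is not None:
            raise TypeError('time resulted coroutine can only return None')
        return time_before, time.time()
    return wrapper

@time_resulted
async def random_sleep():
    await asyncio.sleep(0.1)

c = random_sleep()
print(c.__name__, c.__qualname__)  # random_sleep random_sleep
print(asyncio.run(c))              # (start, end) timestamps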
