How to create an async multiprocessing JobQueue in Python? - python-3.x

I'm trying to make a Python 'JobQueue' that performs computationally intensive tasks asynchronously, on a local machine, with a mechanism that returns the results of each task to the main process. Python's multiprocessing.Pool has an apply_async() function that meets those requirements by accepting an arbitrary function, its multiple arguments, and callback functions that return the results. For example...
import multiprocessing

pool = multiprocessing.Pool(poolsize)
pool.apply_async(func, args=args,
                 callback=mycallback,
                 error_callback=myerror_callback)
The only problem is that the function given to apply_async() must be serializable with pickle, and the functions I need to run concurrently are not. The reason is that the target function is a method of an object that contains an IDL object, for example:
from idlpy import IDL
self.idl_obj = IDL.obj_new('ImageProcessingEngine')
This is the error message received at the pool.apply_async() line:
'Can't pickle local object 'IDL.__init__.<locals>.run''
What I tried
I made a simple implementation of a JobQueue that works perfectly fine in Python 3.6+, provided the Job object and its run() method are picklable. I like how the main process can receive an arbitrarily complex amount of data returned from the asynchronously executed function via a callback function.
I tried to use pathos.pools.ProcessPool since it uses dill instead of pickle. However, it doesn't have a method similar to apply_async().
Are there any other options, or 3rd party libraries that provide this functionality using dill, or by some other means?

How about creating a stub function that instantiates the IDL endpoint as a function-static variable?
Please note that this is only a sketch of the code, as it is hard to tell from the question whether you are passing IDL objects as parameters to the function you run in parallel or whether they serve another purpose.
from idlpy import IDL

def stub_fun(paramset):
    if 'idl_obj' not in dir(stub_fun):  # instantiate once per worker process
        stub_fun.idl_obj = IDL.obj_new('ImageProcessingEngine')
    return stub_fun.idl_obj(paramset)
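If that approach works, the stub can be handed straight to Pool.apply_async, because stub_fun is a plain module-level function and therefore picklable; each worker process then builds its own IDL object on first use. A rough usage sketch (the callback names and paramset contents here are made up for illustration):

import multiprocessing

def handle_result(result):
    print('got result:', result)

def handle_error(exc):
    print('task failed:', exc)

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    for paramset in ({'image': 'a.fits'}, {'image': 'b.fits'}):
        # Only paramset and the return value cross process boundaries,
        # so only they need to be picklable.
        pool.apply_async(stub_fun, args=(paramset,),
                         callback=handle_result,
                         error_callback=handle_error)
    pool.close()
    pool.join()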

Related

How to test writing to the queue without actually writing to the queue

I am sure this is probably a simple question, but I am new to programming and having a tough time understanding mocks. I am wondering if it's possible in Python to test a function that writes to the queue without actually writing to the queue.
def write_to_queue():
    # call queue client
    # do some status checking
    return status

def post():
    # do something
    return write_to_queue()
Is there a way to test write_to_queue without actually writing to the queue?
Mocking usually works like this:
Make sure the object to mock is injected into the tested code and not just created there.
Inject the mock instead of the real thing in a test scenario.
Assert on the expected behavior by verifying that the expected method calls were executed, as in the sketch below.
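A minimal sketch with unittest.mock, assuming write_to_queue lives in a module named myapp and talks to a client object called queue_client (both names are invented here, since the question does not show them):

import unittest
from unittest import mock

import myapp  # hypothetical module containing write_to_queue and queue_client

class WriteToQueueTest(unittest.TestCase):
    @mock.patch('myapp.queue_client')          # inject a mock instead of the real client
    def test_write_to_queue(self, fake_client):
        fake_client.send.return_value = 'ok'   # pretend the queue accepted the message

        status = myapp.write_to_queue()

        fake_client.send.assert_called_once()  # verify the expected call was made
        # assuming write_to_queue passes the client's status back to the caller
        self.assertEqual(status, 'ok')

if __name__ == '__main__':
    unittest.main()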

How is it possible to execute python code during deserialization?

I was reading about pickling in the context of persisting instances, and ran across this snippet:
Pickle files can be hacked. If you receive a raw pickle file over the network, don't trust it! It could have malicious code in it, that would run arbitrary python when you try to de-pickle it. [1]
My understanding is that pickling turns a data-structure into an array of bytes, and the pickle library also contains methods to take a pickled byte array and rebuild a python instance from it.
I tested some code to see if simply putting code into the class or init method would run it:
import pickle
class A:
print('class')
def __init__(self):
print('instance')
a = A()
print('pickling...')
with open('/home/usrname/Desktop/pfile', 'wb') as pfile:
pickle.dump(a, pfile, pickle.HIGHEST_PROTOCOL)
print('de-pickling...')
with open('/home/usrname/Desktop/pfile', 'rb') as pfile:
a2 = pickle.load(pfile)
However this only yields
class
instance
pickling...
de-pickling...
suggesting that the __init__ method doesn't actually get run when the instance is unpickled. So I'm still confused about how you would make code run during that process.
Really thorough writeup here: https://intoli.com/blog/dangerous-pickles/
From what I understand, it has to do with how pickles are interpreted and run by the Pickle Machine (PM). You can craft a pickle file that causes the PM to evaluate, via eval(), whatever statements you provide.
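A minimal, harmless sketch of the mechanism: the code that runs is whatever callable the object's __reduce__ hook hands back to the Pickle Machine, and it runs during pickle.load()/loads() rather than through __init__:

import pickle

class Innocuous:
    def __reduce__(self):
        # Tells the PM: "to rebuild me, call print(...)". A hostile pickle could
        # just as easily return (os.system, ('some shell command',)).
        return (print, ('this ran during de-pickling',))

data = pickle.dumps(Innocuous())
pickle.loads(data)  # prints the message instead of rebuilding an Innocuous instance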

Unit testing a private callback factory

I am trying to create unit tests for a private callback factory with an alarm manager class. I don't want to test the private method directly as I think this contravenes TDD good practice but I cannot figure out a good way to test this.
Please don't just tell me my code is bad (I already know!). I just want to know if it is possible to use the unittest framework to test this or if I need to re-architect my code.
My class is an alarm manager that is responsible for managing the creation, scheduling and sound of an alarm. When an alarm is created, it is passed to a scheduler with a callback function. To create this callback function, there is a private callback factory that takes in the alarm object and generates the custom callback (while capturing some variables from the manager class).
The problem I have is that unless I use the full functionality of the scheduler, the callback will never get executed, and obviously there would be a lot of pain in patching time to make sure it executes within a reasonable test run. Furthermore, the callback is never returned to the caller, so there is no easy way of inspecting it (or even calling it directly).
The manager class looks largely like the code below:
class Manager:

    def __init__(self):
        self._scheduler = alarm.scheduler.Scheduler()
        self._player = sound.player.Player()

    def create_alarm(self, name, new_alarm):
        self._scheduler.add_job(name, new_alarm.find_next_alarm(),
                                self._create_callback(name, new_alarm))

    def _create_callback(self, name, callback_alarm):
        def callback():
            self._player.play(callback_alarm.get_playback())
        return callback
Overall, is there a way to somehow extract the callback object if I make the scheduler a mock object? Or is there some other clever way to test that the callback is doing what it should?
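One possible sketch of that idea, assuming Scheduler and Player can be patched where Manager looks them up: mock out add_job, let create_alarm run, then pull the callback out of the mock's recorded call arguments and invoke it directly.

from unittest import mock

def test_create_alarm_callback_plays_the_alarm():
    with mock.patch('alarm.scheduler.Scheduler') as FakeScheduler, \
         mock.patch('sound.player.Player') as FakePlayer:
        manager = Manager()
        fake_alarm = mock.Mock()

        manager.create_alarm('wake-up', fake_alarm)

        # add_job(name, next_time, callback): grab the third positional argument.
        _, _, callback = FakeScheduler.return_value.add_job.call_args[0]
        callback()  # run the callback directly; no real scheduler or clock needed

        FakePlayer.return_value.play.assert_called_once_with(fake_alarm.get_playback())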

Py3.6 :: ThreadPoolExecutor future.add_done_callback vs. concurrent.futures.as_completed

I'm learning concurrent.futures.ThreadPoolExecutor in Py3.6 and am a bit confused about the difference, and the pros and cons, of using
1. future.add_done_callback(callback)
2. concurrent.futures.as_completed(futures)
When would you choose one over the other? If I understand correctly, the purpose is more or less the same for both: #1 calls the callback(future) function as soon as the task has finished and the corresponding future has settled, and #2 yields the future objects in the order in which the tasks finish and the futures settle.
In both cases we can retrieve the returned value using future.result() (or the exception using future.exception() if one was raised).
Thanks for any clarification around that.
The definitions of the functions are at https://github.com/python/cpython/blob/f85af035c5cb9a981f5e3164425f27cf73231b5f/Lib/concurrent/futures/_base.py#L200
def as_completed(fs, timeout=None):
    """An iterator over the given futures that yields each as it completes.
add_done_callback is a method on the Future class and a lower-level facility than as_completed. Essentially, as_completed uses add_done_callback internally. as_completed also accepts a timeout argument.
In general, we may use as_completed when working with multiple futures, while add_done_callback is used with a single future.
Overall, both add_done_callback and as_completed can achieve similar objectives for simpler programs.
Just a thought: with add_done_callback we can attach a different callback function to each future in a list of futures, whereas with as_completed there is a single loop body that handles every completed future.
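A small sketch contrasting the two styles (the worker function and inputs are made up):

import concurrent.futures

def square(x):
    return x * x

def report(future):
    # Style 1: called as soon as the corresponding future settles.
    print('callback got', future.result())

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    for n in range(3):
        executor.submit(square, n).add_done_callback(report)

    # Style 2: keep the futures and iterate over them as they complete.
    futures = [executor.submit(square, n) for n in range(3, 6)]
    for future in concurrent.futures.as_completed(futures, timeout=10):
        print('as_completed got', future.result())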

Groovy Thread Safety

I am executing Groovy scripts in multi-threaded mode.
The scripts themselves are thread safe.
It looks like this:
//Startup Code. Single threaded
Class<?> scriptClass = getScriptClass(fileName); //utility to get script class from file name
Method method = getMethods(scriptClass); //Utility to get a specific Method
storeMethod(method); //Store method globally.
Object scriptInstance = scriptClass.newInstance();
storeScriptInstance(scriptInstance); //Store script Instance
Multiple threads then execute the following code (without any synchronization):
ScriptInstance scriptInstance = getScriptInstance(); //Utility to get scriptInstance stored in init
Method method = getMethod(); //Utility for getting method stored in init step
Object obj[] = new Object[] { context }; //context variable available per thread.
method.invoke(scriptInstance,obj);
The script consists of just one function, which is completely thread safe (the function is reentrant and only modifies the context variable).
This works in my unit tests with multiple threads, but I couldn't find any material to support this claim.
Question: Is this safe under multi-threaded execution? More generally, is it safe to share the same script instance across threads to execute script functions that are themselves thread safe? The script instances don't use global variables during execution.
The context is an argument to the script, not a global variable.
Please help.
Without the actual function I cannot tell whether this is supposed to be thread safe or not. Since you say that the function modifies the context variable, I conclude that you mutate global state. In that case it is not thread safe without synchronization of some kind. If my assumption is wrong and no global state is mutated, then executing a method by reflection is surely not the problem.
