I have three test modules: testmodule_one, testmodule_two, and testmodule_three.
testmodule_two and testmodule_three depend on testmodule_one.
I have a global fixture defined in conftest.py:
import pytest

@pytest.fixture(scope="module")
def integration_test_cleanup():
    yield
    # Do my tear down routines
I was hoping to centralize all my tear down routines in this one fixture.
But what happens if testmodule_one, testmodule_two, and testmodule_three all use the module-scoped integration_test_cleanup?
Will pytest see that testmodule_two and testmodule_three depend on testmodule_one and not yield until they are finished? Or will it yield immediately after testmodule_one finishes?
I ask because I'm wondering if I could avoid breaking out integration_test_cleanup into pieces that are scoped at the session and module level.
Does pytest follow the dependency chain and use it to determine when to yield?
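For reference, the split I'm hoping to avoid would look roughly like this in conftest.py (a minimal sketch; the teardown bodies are just placeholders):
import pytest

@pytest.fixture(scope="session")
def session_cleanup():
    yield
    # teardown that should run once, after the entire test session

@pytest.fixture(scope="module")
def integration_test_cleanup():
    yield
    # teardown that should run after each test module finishes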
I have a multiprocessing Lock that I define as:
import multiprocessing as mp

lock1 = mp.Lock()
To share this lock among the different child processes I do:
import os

def setup_process(lock1):
    global lock_1
    lock_1 = lock1

pool = mp.Pool(os.cpu_count() - 1,
               initializer=setup_process,
               initargs=[lock1])
Now I've noticed that if the processes call the following function, and the function is defined in the same Python module (i.e., the same file):
def test_func():
    print("lock_1:", lock_1)
    with lock_1:
        print(str(mp.current_process()) + " has the lock in test function.")
I get an output like:
lock_1: <Lock(owner=None)>
<ForkProcess name='ForkPoolWorker-1' parent=82414 started daemon> has the lock in test function.
lock_1: <Lock(owner=None)>
<ForkProcess name='ForkPoolWorker-2' parent=82414 started daemon> has the lock in test function.
lock_1: <Lock(owner=None)>
<ForkProcess name='ForkPoolWorker-3' parent=82414 started daemon> has the lock in test function.
However, if test_func is defined in a different file, the Lock is not recognized, and I get:
NameError: name 'lock_1' is not defined
This seems to happen for every function, where the important distinction is whether the function is defined in this module or in another one. I'm sure I'm missing something very obvious about global variables, but I'm new to this and haven't been able to figure it out. How can I make the Lock recognized everywhere?
Well, I learned something new about Python today: global isn't actually truly global. It is only accessible at module scope.
There are a multitude of ways of sharing your lock with the other module, and the docs even suggest a "canonical" way of sharing globals between modules (though I don't feel it's the most appropriate for this situation). To me this situation illustrates one of the shortcomings of using globals in the first place, though I have to admit that in the specific case of multiprocessing.Pool initializers it seems to be the accepted, or even intended, way to pass data to worker functions.
It actually makes sense that globals can't cross module boundaries, because that would make the separate module 100% dependent on being executed by a specific script; it couldn't really be considered a separate, independent library and might as well be included in the same file. I recognize that may be at odds with splitting things up not to create reusable libraries but simply to organize code into shorter, easier-to-read segments, but that's apparently a stylistic choice by the designers of Python.
To solve your problem, at the end of the day you are going to have to pass the lock to the other module as an argument, so you might as well make test_func receive lock_1 as an argument. You may have found, however, that this causes a RuntimeError: Lock objects should only be shared between processes through inheritance, so what to do? Basically, I would keep your initializer and wrap test_func in another function defined at the top level of the __main__ module (which therefore has access to your global lock_1); the wrapper grabs the lock and passes it to the real function. Unfortunately we can't use a nicer-looking decorator or a closure-based wrapper, because those return a function that only exists in a local scope and can't be imported when using "spawn" as the start method.
from multiprocessing import Pool, Lock, current_process

def init(l):
    global lock_1
    lock_1 = l

def local_test_func(shared_lock):
    with shared_lock:
        print(f"{current_process()} has the lock in local_test_func")

def local_wrapper():
    global lock_1
    local_test_func(lock_1)

from mymodule import module_test_func  # same as local_test_func basically...

def module_wrapper():
    global lock_1
    module_test_func(lock_1)

if __name__ == "__main__":
    l = Lock()
    with Pool(initializer=init, initargs=(l,)) as p:
        p.apply(local_wrapper)
        p.apply(module_wrapper)
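For completeness, mymodule.py could look something like this (purely hypothetical contents, since all we assume is that module_test_func mirrors local_test_func):
# mymodule.py (hypothetical, mirroring local_test_func)
from multiprocessing import current_process

def module_test_func(shared_lock):
    with shared_lock:
        print(f"{current_process()} has the lock in module_test_func")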
Ill-posed question; I did not understand the true cause of the issue (it seems to have been related to my usage of Flask in one of the subprocesses).
PLEASE IGNORE THIS (can't delete due to bounty)
Essentially, I have to start some Processes and/or a pool when running a Python library as a module.
However, since __name__ == '__main__' is always true in __main__.py, this proves to be an issue (see the multiprocessing docs: https://docs.python.org/3/library/multiprocessing.html).
I've attempted multiple solutions, ranging from pytgquabr.com:8182/58288945/using-multiprocessing-with-runpy to a file-based mutex that only allows the contents of main to run once, but multiprocessing still behaves strangely (e.g. processes die almost as soon as they start, with no error logs).
Any idea what the "proper" way of going about this is?
Guarding the __main__ module is only needed if an object defined inside __main__ is used in another process. Looking up this definition is what causes the execution of __main__ in the subprocess.
When using a __main__.py, restrict all definitions used with multiprocessing to other modules. __main__.py should only import and use these.
# my_package/some_module.py
def module_print(*args, **kwargs):
    """Function defined in some module - fine for use inside multiprocess"""
    print(*args, **kwargs)

# my_package/__main__.py
import multiprocessing  # imports are allowed
from .some_module import module_print

def do_multiprocess():
    """Function defined in __main__ module - fine for use wrapping multiprocess"""
    with multiprocessing.Pool(processes=12) as pool:
        pool.map(module_print, range(20))  # multiprocessing external function is allowed

do_multiprocess()  # directly calling __main__ function is allowed
I am trying to create unit tests for a private callback factory within an alarm manager class. I don't want to test the private method directly, as I think this contravenes TDD good practice, but I cannot figure out a good way to test this.
Please don't just tell me my code is bad (I already know!). I just want to know if it is possible to use the unittest framework to test this or if I need to re-architect my code.
My class is an alarm manager that is responsible for managing the creation, scheduling and sound of an alarm. When an alarm is created, it is passed to a scheduler with a callback function. To create this callback function, there is a private callback factory that takes in the alarm object to generate the custom callback (while capturing some variables from the manager class).
The problem I have is that unless I use the full functionality of the scheduler, the callback will never get executed, but obviously there will be a lot of pain with patching time to make sure it executes in a reasonable time. Furthermore, the callback is never returned to the user, so there is no easy way of checking it (or even calling it directly).
The manager class looks largely like the code below:
class Manager:

    def __init__(self):
        self._scheduler = alarm.scheduler.Scheduler()
        self._player = sound.player.Player()

    def create_alarm(self, name, new_alarm):
        self._scheduler.add_job(name, new_alarm.find_next_alarm(),
                                self._create_callback(name, new_alarm))

    def _create_callback(self, name, callback_alarm):
        def callback():
            self._player.play(callback_alarm.get_playback())
        return callback
Overall, is there a way to somehow extract the callback object if I make the scheduler a mock object? Or is there some other clever way to test that the callback is doing what it should be?
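To make the question concrete, this is the kind of test I have in mind (a rough sketch using unittest.mock, with mocks standing in for the real scheduler and player; the import path and test names are made up):
import unittest
from unittest import mock

# from my_alarm_module import Manager  # hypothetical import path

class TestCreateAlarm(unittest.TestCase):

    def test_callback_plays_alarm(self):
        manager = Manager()
        # Swap the private collaborators for mocks (this assumes constructing
        # the real Scheduler/Player in __init__ is cheap enough for a unit test).
        manager._scheduler = mock.Mock()
        manager._player = mock.Mock()
        fake_alarm = mock.Mock()

        manager.create_alarm("wake_up", fake_alarm)

        # add_job was called as add_job(name, next_alarm, callback);
        # pull the callback out of the mock's positional arguments and call it.
        _, _, callback = manager._scheduler.add_job.call_args[0]
        callback()

        manager._player.play.assert_called_once_with(
            fake_alarm.get_playback.return_value)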
I'm trying to make a Python 'JobQueue' that performs computationally intensive tasks asynchronously on a local machine, with a mechanism that returns the results of each task to the main process. Python's multiprocessing.Pool has an apply_async() function that meets those requirements by accepting an arbitrary function, its multiple arguments, and callback functions that receive the results. For example...
import multiprocessing

pool = multiprocessing.Pool(poolsize)
pool.apply_async(func, args=args,
                 callback=mycallback,
                 error_callback=myerror_callback)
The only problem is that the function given to apply_async() must be serializable with pickle, and the functions I need to run concurrently are not. FYI, the reason is that the target function is a member of an object that contains an IDL object, for example:
from idlpy import IDL
self.idl_obj = IDL.obj_new('ImageProcessingEngine')
This is the error message received at the pool.apply_async() line:
'Can't pickle local object 'IDL.__init__.<locals>.run''
What I tried
I made a simple implementation of a JobQueue that works perfectly fine in Python 3.6+ provided the Job object and its run() method are pickleable. I like how the main process can receive an arbitrarily complex amount of data returned from the asynchronously executed function via a callback function.
I tried to use pathos.pools.ProcessPool since it uses dill instead of pickle. However, it doesn't have a method similar to apply_async().
Are there any other options, or 3rd party libraries that provide this functionality using dill, or by some other means?
How about creating a stub function that would instantiate the IDL endpoint as a function static variable?
Please note that this is only a sketch of the code, as it is hard to say from the question whether you are passing IDL objects as parameters to the function you run in parallel or whether they serve another purpose.
from idlpy import IDL

def stub_fun(paramset):
    if 'idl_obj' not in dir(stub_fun):  # instantiate once
        stub_fun.idl_obj = IDL.obj_new('ImageProcessingEngine')
    return stub_fun.idl_obj(paramset)
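A hypothetical usage sketch (assuming stub_fun lives in an importable module so the workers can pickle a reference to it, and that idlpy is available); the parameter list and callback names here are only placeholders:
import multiprocessing

# from worker_module import stub_fun  # hypothetical import

def handle_result(result):
    # Runs in the main process with whatever stub_fun returned.
    print("got result:", result)

def handle_error(exc):
    print("task failed:", exc)

if __name__ == "__main__":
    paramsets = []  # placeholder: whatever inputs the engine needs
    with multiprocessing.Pool() as pool:
        for paramset in paramsets:
            pool.apply_async(stub_fun, args=(paramset,),
                             callback=handle_result,
                             error_callback=handle_error)
        pool.close()
        pool.join()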
I'm learning concurrent.futures.ThreadPoolExecutor in Python 3.6 and am a bit confused about the difference, and the pros and cons, between using
1. future.add_done_callback(callback)
2. concurrent.futures.as_completed(futures)
When would you choose one over the other? If I understand correctly, the purpose is more or less the same for both: #1 calls the callback(future) fn as soon as the task has finished and the corresponding future has settled, and #2 yields the future objects in the order in which the tasks finish and the futures settle.
In both cases we can retrieve the returned value using future.result() (or the exception via future.exception() if one was raised).
Thanks for any clarification around that.
The definitions of the functions are at https://github.com/python/cpython/blob/f85af035c5cb9a981f5e3164425f27cf73231b5f/Lib/concurrent/futures/_base.py#L200
def as_completed(fs, timeout=None):
    """An iterator over the given futures that yields each as it completes.
add_done_callback is a method of the Future class and a lower-level facility than as_completed; essentially, as_completed uses add_done_callback internally. as_completed also takes a timeout argument for waiting on the remaining futures.
In general, we might use as_completed when working with multiple futures, while add_done_callback is used with a single future.
Overall, both add_done_callback and as_completed can achieve similar objectives for simpler programs.
Just a thought: with add_done_callback we can attach a different callback function to each future in the list, whereas with as_completed all results are handled in the single loop that consumes the iterator.
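A minimal sketch contrasting the two approaches (the work function and the numbers are just placeholders):
import concurrent.futures

def work(n):
    return n * n

def on_done(future):
    # Runs as soon as this particular future settles.
    print("callback:", future.result())

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # 1. Per-future callback attached at submission time.
    for n in range(4):
        executor.submit(work, n).add_done_callback(on_done)

    # 2. A single loop that receives futures in completion order.
    futures = [executor.submit(work, n) for n in range(4)]
    for future in concurrent.futures.as_completed(futures, timeout=10):
        print("as_completed:", future.result())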