Multiprocessing on a Dictionary of Class Instances - python-3.x

In the generic example below I use Foobar_Collection to manage a dictionary of Foo instances. Additionally, Foobar_Collection carries a method which sequentially calls myMethod(), shared by all instances of Foo. It works fine so far. However, I wonder whether I could take advantage of multiprocessing, so that run_myMethodForAllfoobars() could divide the work into several chunks of instances? The instance methods are "independent" of each other (I think this case is called embarrassingly parallel). Any help would be great!
class Foobar_Collection(dict):
    def __init__(self, *arg, **kw):
        super(Foobar_Collection, self).__init__(*arg, **kw)

    def foobar(self, *arg, **kw):
        foo = Foo(*arg, **kw)
        self[foo.name] = foo
        return foo

    def run_myMethodForAllfoobars(self):
        for name in self:
            self[name].myMethod(10)
        return None

class Foo(object):
    def __init__(self, name):
        self.name = name
        self.result = 0

    # just some toy example method
    def myMethod(self, x):
        self.result += x
        return None

Foobar = Foobar_Collection()
Foobar.foobar('A')
Foobar.foobar('B')
Foobar.foobar('C')
Foobar.run_myMethodForAllfoobars()

You can use multiprocessing for this situation, but it's not great because the method that you're trying to parallelize is useful for its side effects rather than its return value. This means you'll need to serialize the Foo object in both directions (sending it to the child process, then sending the modified version back). If your real objects are more complex than the Foo objects in your example, the overhead of copying all of each object's data may make this slower than just doing everything in one process.
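import multiprocessing

def worker(foo):
    # Runs in the child process on a copy of foo; return the modified copy.
    foo.myMethod(10)
    return foo

class Foobar_Collection(dict):
    #...
    def run_myMethodForAllfoobars(self):
        with multiprocessing.Pool() as pool:
            results = pool.map(worker, self.values())
        # Replace the stale originals with the copies sent back by the workers.
        self.update((foo.name, foo) for foo in results)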
A better design might let you only serialize the information you need to do the calculation. In your example, the only thing you need from the Foo object is its result (which you'll add 10 to), which you could extract and process without passing around the rest of the object:
import multiprocessing

def worker(num):
    return num + 10

class Foobar_Collection(dict):
    #...
    def run_myMethodForAllfoobars(self):
        with multiprocessing.Pool() as pool:
            # Only the numbers cross the process boundary, not the Foo objects.
            results = pool.map(worker, (foo.result for foo in self.values()))
        for foo, new_result in zip(self.values(), results):
            foo.result = new_result
Now obviously this doesn't actually run myMethod on the foo objects any more (though it's equivalent to doing so). If you can't decouple the method from the object like this, it may be hard to get good performance.
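To see why the modified object has to be sent back at all, it may help to imitate what a pool does with its arguments. multiprocessing pickles each argument before handing it to a worker, so the worker only ever sees a copy. The sketch below simulates that round trip with the pickle module directly instead of starting real processes; simulated_worker is a hypothetical stand-in for the child-process side and is not part of multiprocessing.

```python
import pickle


class Foo:
    def __init__(self, name):
        self.name = name
        self.result = 0

    def myMethod(self, x):
        self.result += x


def simulated_worker(payload):
    # Hypothetical stand-in for the child process: it receives pickled
    # bytes, rebuilds its own copy of the object, mutates that copy,
    # and pickles the copy again to send it back.
    foo_copy = pickle.loads(payload)
    foo_copy.myMethod(10)
    return pickle.dumps(foo_copy)


original = Foo("A")
returned = pickle.loads(simulated_worker(pickle.dumps(original)))

# The parent's object is untouched; only the returned copy was modified.
print(original.result, returned.result)
```

Every attribute of the object is serialized twice in this round trip, which is the overhead the answer warns about for larger objects.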

Related

Creation of Python unit test

function_one.py
class FunctionOne(Base):
    def __init__(self, amount, tax):
        super().__init__(amount, tax)
function_two.py
class FunctionTwo:
    def __init__(self, a, b, c):
        self.__a = a
        self.__b = b
        self.__c = c

    def _get_info(self):
        x = FunctionOne(0, 1)
        return x
test_function_two.py
class TestPostProcessingStrategyFactory(unittest.TestCase):
    def test__get_info(self):
        a = "a"
        b = "b"
        c = "c"
        amount = 0
        tax = 1
        function_two = FunctionTwo(a, b, c)
        assert function_two._get_info() == FunctionOne(0, 1)
I am trying to create a unit test for the function_two.py source code. I get an assertion error saying that the object at ******** != the object at *********.
So the two objects' addresses are different. How can I make this test pass by correcting the assert statement?
assert function_two._get_info() == FunctionOne(0, 1)
You need to understand that equality comparisons depend on the __eq__ method of a class. From the code you provided it appears that simply initializing two objects of FunctionOne with the same arguments does not result in two objects that compare as equal. Whatever implementation of __eq__ underlies that class, only you know that.
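As an illustration of that point, here is a minimal sketch of a class that defines __eq__ so that two instances built from the same arguments compare equal, next to one that does not. Price and PlainPrice are hypothetical stand-ins; the real FunctionOne inherits from a Base class that is not shown, so whether this applies depends on that code.

```python
class Price:
    """Hypothetical stand-in for FunctionOne; the real Base class is not shown."""

    def __init__(self, amount, tax):
        self.amount = amount
        self.tax = tax

    def __eq__(self, other):
        # Let Python try the other operand for unrelated types.
        if not isinstance(other, Price):
            return NotImplemented
        return (self.amount, self.tax) == (other.amount, other.tax)

    def __hash__(self):
        # Keep hashing consistent with the equality defined above.
        return hash((self.amount, self.tax))


class PlainPrice:
    """Same data but no __eq__, so == falls back to identity."""

    def __init__(self, amount, tax):
        self.amount = amount
        self.tax = tax


print(Price(0, 1) == Price(0, 1))
print(PlainPrice(0, 1) == PlainPrice(0, 1))
```

The second comparison is the situation from the question: two distinct objects with identical data that still compare as unequal.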
However, I would argue the approach is faulty to begin with because unit tests, as the name implies, are supposed to isolate your units (i.e. functions typically) as much as possible, which is not what you are doing here.
When you are testing a function f that calls another of your functions g, strictly speaking, the correct approach is mocking g during the test. You need to ensure that you are testing f and only f. This extends to instances of other classes that you wrote, since their methods are also just functions that you wrote.
Have a look at the following example code.py:
class Foo:
    def __init__(self, x, y):
        ...

class Bar:
    def __init__(self, a, b):
        self.__a = a
        self.__b = b

    def get_foo(self):
        foo = Foo(self.__a, self.__b)
        return foo
Say we want to test Bar.get_foo. That method uses our Foo class inside it, instantiating it and returning that instance. We want to ensure that this is what the method does. We don't want to concern ourselves with anything that relates to the implementation of Foo because that is for another test case.
What we need to do is mock that class entirely. Then we substitute some unique object to be returned by calling our mocked Foo and check that we get that object from calling get_foo.
In addition, we want to check that get_foo called the (mocked) Foo constructor with the arguments we expected, i.e. with its __a and __b attributes.
Here is an example test.py:
from unittest import TestCase
from unittest.mock import MagicMock, patch

from . import code

class BarTestCase(TestCase):
    @patch.object(code, "Foo")
    def test_get_foo(self, mock_foo_cls: MagicMock) -> None:
        # Create some random but unique object that should be returned,
        # when the mocked class is called;
        # this object should be the output of `get_foo`:
        mock_foo_cls.return_value = expected_output = object()
        # We remember the arguments to initialize `bar` for later:
        a, b = "spam", "eggs"
        bar = code.Bar(a=a, b=b)
        # Run the method under testing:
        output = bar.get_foo()
        # Check that we get that EXACT object returned:
        self.assertIs(expected_output, output)
        # Ensure that our mocked class was instantiated as expected:
        mock_foo_cls.assert_called_once_with(a, b)
That way we ensure proper isolation from our Foo class during the Bar.get_foo test.
Side note: If we wanted to be super pedantic, we should even isolate our test method from the initialization of Bar, but in this simple example that would be overkill. If your __init__ method does many things aside from just setting some instance attributes, you should definitely mock that during your test as well.
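For the stricter variant mentioned in the side note, a rough single-file sketch could look like the following. It is an assumption-laden illustration rather than the answer's own test: check_get_foo_without_init is a hypothetical helper, Foo and Bar are redefined here so the snippet runs by itself, and patch.dict on the function's globals replaces the patch.object(code, "Foo") call that a real package layout would use.

```python
from unittest.mock import MagicMock, patch


class Foo:
    def __init__(self, x, y):
        self.x, self.y = x, y


class Bar:
    def __init__(self, a, b):
        self.__a = a
        self.__b = b

    def get_foo(self):
        return Foo(self.__a, self.__b)


def check_get_foo_without_init():
    mock_foo_cls = MagicMock()
    mock_foo_cls.return_value = expected_output = object()
    # Replace the name `Foo` in the namespace where get_foo looks it up,
    # and replace Bar.__init__ with a mock that does nothing.
    with patch.dict(Bar.get_foo.__globals__, {"Foo": mock_foo_cls}), \
            patch.object(Bar, "__init__", return_value=None):
        bar = Bar()  # no arguments needed while __init__ is mocked
        # Set the name-mangled private attributes by hand.
        bar._Bar__a, bar._Bar__b = "spam", "eggs"
        output = bar.get_foo()
    mock_foo_cls.assert_called_once_with("spam", "eggs")
    return output is expected_output


print(check_get_foo_without_init())
```

Setting the mangled attributes by hand couples the test to Bar's internals, which is one reason this level of isolation is usually reserved for classes whose __init__ does real work.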
Hope this helps.
References:
The Mock class
The patch decorator
TestCase.assertIs
Mock.assert_called_once_with

Call constructor of type parameter in generic class

I'm writing a generic class over AnyStr, so allowing bytes or str.
class MyObject(Generic[AnyStr]):
    ...
Inside (multiple) methods of this class, I would like to construct the empty bytes or empty string object, b'' or '', depending on the type parameter. How can I do this?
You should have a base class with the shared methods applying to both str and bytes that take advantage of common behavior (for example, both str and bytes having length, or both str and bytes being indexable), and two subclasses providing implementations for the specific behaviors. To force the subclasses to provide those specific behaviors (such that mypy can assume a call to their specific methods would succeed in the base class), you make an equivalent @abstractmethod in the base class.
Here's what it all looks like:
from abc import abstractmethod, ABC
from typing import AnyStr, Generic, final

class MyObject(ABC, Generic[AnyStr]):
    @classmethod
    @abstractmethod
    def empty(cls) -> AnyStr:
        pass

    def __init__(self, data: AnyStr):
        self.data: AnyStr = data

    # Example shared method.
    def is_empty(self) -> bool:
        # Assume that for the sake of the example we can't do `len(self.data) == 0`, and that we need
        # to check against `empty()` instead.
        return self.data == self.__class__.empty()

class MyStr(MyObject[str]):
    @classmethod
    @final
    def empty(cls) -> str:
        return ""

class MyBytes(MyObject[bytes]):
    @classmethod
    @final
    def empty(cls) -> bytes:
        return b""
We make empty() a class method instead of an instance method because it doesn't depend on an instance with particular data to know what an empty str / bytes looks like.
Additionally, we make empty() a final method so subclasses of either MyStr or MyBytes that want to further provide specific behavior don't get to change what is considered "empty" (as there is only one thing that can be considered empty).
All of the above will typecheck under mypy --strict.
On the caller side, they would never instantiate MyObject[str] or MyObject[bytes] (in fact, mypy will prevent that, as we would want, because MyObject doesn't have an implementation for empty()). Instead, because you said in comments that the caller will know ahead of time whether they want bytes or str, they instantiate MyStr or MyBytes directly.
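A short caller-side sketch of the above, with the classes repeated so the snippet runs by itself. can_instantiate_base is a hypothetical helper added only for the demonstration; it shows that, besides the mypy error, the ABC machinery also refuses to build the base class at runtime.

```python
from abc import ABC, abstractmethod
from typing import AnyStr, Generic, final


class MyObject(ABC, Generic[AnyStr]):
    @classmethod
    @abstractmethod
    def empty(cls) -> AnyStr:
        ...

    def __init__(self, data: AnyStr):
        self.data: AnyStr = data

    def is_empty(self) -> bool:
        return self.data == self.__class__.empty()


class MyStr(MyObject[str]):
    @classmethod
    @final
    def empty(cls) -> str:
        return ""


class MyBytes(MyObject[bytes]):
    @classmethod
    @final
    def empty(cls) -> bytes:
        return b""


def can_instantiate_base() -> bool:
    # Hypothetical helper: the abstract base has no empty(), so ABC
    # raises TypeError when asked to create an instance of it.
    try:
        MyObject("text")
    except TypeError:
        return False
    return True


print(MyStr("").is_empty(), MyBytes(b"abc").is_empty(), can_instantiate_base())
```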

Python enum.Enum: Create variables which i can assign enum.Enum members

Creating enumerations in Python 3.4+ is pretty easy:
from enum import Enum

class MyEnum(Enum):
    A = 10
    B = 20
This gives me a type, MyEnum.
With this I can assign a variable:
x = MyEnum.A
So far so good.
However, things start to get complicated if I want to use enum.Enum members as arguments to functions or class methods and want to ensure that class attributes only hold enum.Enum members and no other values.
How can I do this? My idea is something like this, which I consider more of a workaround than a solution:
class EnContainer:
    def __init__(self, val: type(MyEnum.A) = MyEnum.A):
        assert isinstance(val, type(MyEnum.A))
        self._value = val
Do you have any suggestions or do you see any problems with my approach? I have to consider about 10 different enumerations and would like to come to a consistent approach for initialization, setters and getters.
Instead of type(MyEnum.A), just use MyEnum:
def __init__(self, val: MyEnum = MyEnum.A):
    assert isinstance(val, MyEnum)
Don't use assert for error checking; assertions are for program validation. In other words, who is calling EnContainer? If only your own code is calling it with already validated data, then assert is fine; but if code outside your control is calling it, then you should be using proper error checking:
def __init__(self, val: MyEnum = MyEnum.A):
    if not isinstance(val, MyEnum):
        raise ValueError(
            "EnContainer called with %s.%r (should be a 'MyEnum')"
            % (type(val), val)
        )
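Since the question also asks for a consistent approach to setters and getters, one possible arrangement is to put the check in a single place and route both __init__ and the setter through it. This is only a sketch: require_member is a hypothetical helper, not part of the enum module, and the property layout is an assumption about how the container might be used.

```python
from enum import Enum


class MyEnum(Enum):
    A = 10
    B = 20


def require_member(enum_cls, val):
    """Hypothetical helper: return val if it is a member of enum_cls, else raise."""
    if not isinstance(val, enum_cls):
        raise ValueError(
            "expected a %s member, got %r" % (enum_cls.__name__, val)
        )
    return val


class EnContainer:
    def __init__(self, val: MyEnum = MyEnum.A):
        self.value = val  # goes through the setter below

    @property
    def value(self) -> MyEnum:
        return self._value

    @value.setter
    def value(self, val: MyEnum) -> None:
        self._value = require_member(MyEnum, val)


container = EnContainer()
container.value = MyEnum.B
print(container.value)
```

The same helper can be reused for each of the other enumerations by passing a different enum class.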

How to multithread with getattr, one thread per property?

Suppose I have the following object with multiple expensive properties, as so:
class Object:
    def __init__(self, num):
        self.num = num

    @property
    def expensive_property(self):
        return expensive_calculation

    @property
    def expensive_property1(self):
        return expensive_calculation

    @property
    def expensive_property2(self):
        return expensive_calculation
Note: The number of expensive properties may increase over time.
Given a list of Objects, how could I compute each expensive property in its own thread, for all objects in the list? I am having a hard time figuring out how I should arrange my pool.
This is kinda what I am trying to achieve:
from multiprocessing.dummy import Pool
from multiprocessing.dummy import Queue

object_list = [Object(i) for i in range(20)]
# Property names are passed as strings so they can be looked up with getattr.
properties = ["expensive_property2", "expensive_property5", "expensive_property9", "expensive_property3"]

def get(objects, expensive_property):
    return [getattr(o, expensive_property) for o in objects]

tasks = Queue()
for p in properties:
    tasks.put((get, object_list, p))
results = []
with Pool(len(properties)) as pool:
    while True:
        task = tasks.get()
        if task is None:
            break
        func, *args = task
        result = pool.apply_async(func, args)
        results.append(result)
This is a little crazy because apply_async already has an internal queue to distribute tasks over the pool. I can imagine reasons to have another queue, such as observability or backpressure. Is your example your full program, or are you enqueuing work from a different process/thread?
If your computation is CPU bound, one option could be to remove the queue to make things a little simpler:
def wait_all(async_results, timeout_seconds_per_task=1):
    for r in async_results:
        r.get(timeout_seconds_per_task)

wait_all(
    [pool.apply_async(get, (object_list, p)) for p in properties],
    timeout_seconds_per_task=1,
)
Like your example above, this allows you to distribute computation across your available CPUs (Pool even defaults to the number of CPUs on your machine). If your work is IO bound (suggested by your sleep), processes may have diminishing returns.
You'd have to benchmark, but for IO bound work you could create a thread pool using the same pattern: https://stackoverflow.com/a/3034000/594589
Other options could be to use nonblocking IO with an event loop, such as gevent or asyncio. Both would allow you to model the same pool-based pattern!
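As a rough sketch of the thread-pool variant for IO bound work, the standard library's concurrent.futures.ThreadPoolExecutor can submit getattr directly, so no extra queue is needed. Item, its two properties, and compute_all are hypothetical names invented for this example, and the short sleep merely imitates slow I/O. Note that it submits one task per (property, object) pair rather than one thread per property.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class Item:
    """Hypothetical stand-in for Object; the sleep imitates slow I/O."""

    def __init__(self, num):
        self.num = num

    @property
    def doubled(self):
        time.sleep(0.01)
        return self.num * 2

    @property
    def squared(self):
        time.sleep(0.01)
        return self.num ** 2


def compute_all(objects, property_names, max_workers=8):
    """Return {property_name: [value for each object, in input order]}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # One task per (property, object) pair; getattr triggers the property.
        futures = {
            name: [pool.submit(getattr, obj, name) for obj in objects]
            for name in property_names
        }
        # result() blocks until the corresponding task has finished.
        return {name: [f.result() for f in fs] for name, fs in futures.items()}


values = compute_all([Item(i) for i in range(5)], ["doubled", "squared"])
print(values)
```

New properties only need their name added to the list passed in, which fits the note that the number of expensive properties may grow.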

negative effects of infinite recursion in python3?

What negative effects should I be concerned about with an indefinitely long-running script that is infinitely recursive?
My script looks something like this:
class Foo:
    def __init__(self):
        self.some_instance_vars = 42
        self.a()

    def a(self):
        manipulate(self.some_instance_vars)
        do_io()
        self.b()

    def b(self):
        manipulate_again(self.some_instance_vars)
        do_io()
        self.c()

    def c(self):
        manipulate_more(self.some_instance_vars)
        do_io()
        self.a()

foo = Foo()
Some things I can imagine:
This would break the garbage collection because the previous calls are always in scope.
Python puts something in memory for each loop and it infinitely builds up and eventually creates a memory problem.
Are these things problems? Are there other problems I haven't thought of?
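One concrete effect can be checked directly: CPython keeps a frame for every call that has not yet returned, and raises RecursionError once the call depth passes sys.getrecursionlimit(). The sketch below is a minimal illustration under stated assumptions: Cycle and Loop are hypothetical classes standing in for the script above, with the manipulation and I/O steps left out.

```python
import sys


class Cycle:
    """Hypothetical recursive version: a -> b -> c -> a, never returning."""

    def __init__(self):
        self.rounds = 0

    def a(self):
        self.rounds += 1
        self.b()

    def b(self):
        self.c()

    def c(self):
        self.a()


class Loop:
    """Hypothetical iterative version: each step returns before the next starts."""

    def __init__(self):
        self.rounds = 0

    def a(self):
        self.rounds += 1

    def b(self):
        pass

    def c(self):
        pass

    def run(self, rounds):
        for _ in range(rounds):
            self.a()
            self.b()
            self.c()


def rounds_before_recursion_error():
    cycle = Cycle()
    try:
        cycle.a()
    except RecursionError:
        return cycle.rounds
    return None


print(sys.getrecursionlimit(), rounds_before_recursion_error())
```

The loop-based variant can run for any number of rounds because its call depth never grows, whereas the recursive chain stops with an exception after a bounded number of rounds.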
