Test-Driven Development in Python - python-3.x

How do I write a test for the default behavior of a method that prints a range we give it? Below is my attempt; I've pasted code from my implementation file and the test case file.
class FizzBuzzService:
    def print_number(self, num):
        for i in range(num):
            print(i, end=' ')

import unittest
from app.logic import FizzBuzzService

class FizzBuzzServiceTestCases(unittest.TestCase):
    def setUp(self):
        """
        Create an instance of fizz_buzz_service
        """
        self.fizzbuzz = FizzBuzzService()

    def test_it_prints_a_number(self):
        """
        Test for the default behavior of printing the range that we give
        fizz_buzz_service
        """
        number_range = range(10)
        self.assertEqual(self.fizzbuzz.print_number(10), print(*number_range))

For me, at least, TDD is about finding a good design as much as it is about testing. As you've seen, testing for things like output is hard.
Printing like this is known as a side effect: put simply, the method is doing something not based solely on its input parameters. My solution would be to make print_number free of side effects, then test it like that. If you need to print, you can write another function higher up that prints the output of print_number but contains no meaningful logic beyond that, so it doesn't really need testing. Here's an example with your code changed to not have a side effect (it's one of several possible alternatives):
class FizzBuzzService:
    def print_number(self, num):
        for i in range(num):
            yield i

import unittest

class FizzBuzzServiceTestCases(unittest.TestCase):
    def setUp(self):
        """
        Create an instance of fizz_buzz_service
        """
        self.fizzbuzz = FizzBuzzService()

    def test_it_prints_a_number(self):
        """
        Test for the default behavior of printing the range that we give
        fizz_buzz_service
        """
        number_range = range(10)
        output = []
        for x in self.fizzbuzz.print_number(10):
            output.append(x)
        self.assertEqual(list(number_range), output)
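For completeness, the "higher up" printing function mentioned above could be as thin as the sketch below (illustrative only, not part of the original answer; the name show_numbers is made up, and it assumes the generator version of print_number shown above):

def show_numbers(service, num):
    # Glue code only: print whatever the service yields.
    # There is no meaningful logic here, so it doesn't really need a unit test.
    for value in service.print_number(num):
        print(value, end=' ')

# usage: show_numbers(FizzBuzzService(), 10) prints "0 1 2 3 4 5 6 7 8 9"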

You need to capture standard output in your tests to do that:
import io
import sys

def test_it_prints_a_number(self):
    initial_stdout = sys.stdout
    sys.stdout = io.StringIO()
    self.fizzbuzz.print_number(10)
    value = sys.stdout.getvalue()
    sys.stdout = initial_stdout
    self.assertEqual(value, ' '.join(str(i) for i in range(10)) + ' ')
As you can see it's really messy, so I'd highly recommend against it. Tests written against string contents, especially standard output, are utterly fragile. Besides, the whole point of TDD is to write well-designed, isolated code that is easily testable. If your code is difficult to test, then that is a sure indication that there's a problem in your design.
How about dividing your code into two parts: one that produces the numbers and needs to be tested, and another that just prints them?
class FizzBuzzService:
    def get_numbers(self, num):
        return range(num)

    def print_number(self, num):
        print(*self.get_numbers(num))

# Now you can easily test the get_numbers method.
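With that split, the test becomes a one-liner. A minimal sketch, assuming get_numbers lives on FizzBuzzService and the same setUp as in the earlier test class (in Python 3, two range objects compare equal when they describe the same sequence):

def test_get_numbers_returns_the_expected_range(self):
    # no output capturing needed: just compare the returned value
    self.assertEqual(self.fizzbuzz.get_numbers(10), range(10))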
Now, if you really want to test the printing functionality, then the better way would be to use mocking.
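For example, here is a hedged sketch of that approach with unittest.mock, assuming the original print_number(num) from the question, which calls print(i, end=' ') for each number ("builtins.print" is the standard patch target for print in Python 3):

import unittest
from unittest import mock
from app.logic import FizzBuzzService

class FizzBuzzPrintTestCase(unittest.TestCase):
    def setUp(self):
        self.fizzbuzz = FizzBuzzService()

    def test_it_prints_each_number(self):
        with mock.patch("builtins.print") as mock_print:
            self.fizzbuzz.print_number(3)
        # the mock records every call, so we can assert on them without touching stdout
        mock_print.assert_has_calls(
            [mock.call(0, end=' '), mock.call(1, end=' '), mock.call(2, end=' ')]
        )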


Python multiprocess: run several instances of a class, keep all child processes in memory

First, I'd like to thank the StackOverflow community for the tremendous help it provided me over the years, without me having to ask a single question.
I could not find anything that I could relate to my problem, though that is probably due to my lack of understanding of the subject rather than the absence of an answer on the website. My apologies in advance if this is a duplicate.
I am relatively new to multiprocessing; some time ago I succeeded in using multiprocessing.Pool in a very simple way, where I didn't need any feedback between the child processes.
Now I am facing a much more complicated problem, and I am just lost in the documentation about multiprocessing. I hence ask for your help, your kindness and your patience.
I am trying to build a parallel tempering Monte Carlo algorithm from a class.
The basic class very roughly goes as follows:
import numpy as np

class monte_carlo:

    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)
        return

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = (x[k] + np.random.uniform(-1, 1, 3))
        temp_E = np.mean(self.x)
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x
        return
Obviously, I simplified a great deal (the actual class is 500 lines long!) and built fake functions for simplicity: __init__ takes a bunch of parameters as arguments, there are many more lists of measurements besides self.Elist, and also many arrays derived from self.x that I use to compute them. The key point is that each instance of the class contains a lot of information that I want to keep in memory, and that I don't want to copy over and over again, to avoid dramatic slowdowns. Otherwise I would just use the multiprocessing.pool module.
Now, the parallelization I want to do, in pseudo-code:
def proba(dE, pT):
    return np.exp(-dE/pT)

Tlist = [1.1, 1.2, 1.3]
N = len(Tlist)
G = []
for _ in range(N):
    G.append(monte_carlo())

for _ in range(5):
    for i in range(N):  # this loop should be run in multiprocess
        G[i].simulation(Tlist[i])

    for i in range(N//2):
        dE = G[i].E - G[i+1].E
        pT = G[i].T + G[i+1].T
        p = proba(dE, pT)  # (proba is a function giving a probability depending on dE)
        if np.random.random() < p:
            T_temp = G[i].T
            G[i].T = G[i+1].T
            G[i+1].T = T_temp
Synthesis: I want to run several instances of my monte_carlo class in parallel child processes, with different values for a parameter T, then periodically pause everything to change the different T's, and run the child processes/class instances again from where they paused.
In doing this, I want each class instance/child process to stay independent from the others, to save its current state with all internal variables while it is paused, and to make as few copies as possible. This last point is critical, as the arrays inside the class are quite big (some are 1000x1000), so copying would very quickly become time-costly.
Thanks in advance, and sorry if I am not clear...
Edit:
I am using a remote machine with many (64) CPUs, running Debian GNU/Linux 10 (buster).
Edit2:
I made a mistake in my original post: in the end, the temperatures must be exchanged between the class-instances, and not inside the global Tlist.
Edit3: Charchit's answer works perfectly for the test code, on both my personal machine and the remote machine I usually use for running my code, so I am accepting it as the answer.
However, I want to report that when I insert my actual, more complicated code instead of the oversimplified monte_carlo class, the remote machine gives me some strange errors:
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gtk-WARNING **: ##:##:##:###: Locale not supported by C library.
Using the fallback 'C' locale.
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###:
gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
The "##:##:##:###" are (or seems like) IP adresses.
Without the call to set_start_method('spawn') this error shows only once, in the very beginning, while when I use this method, it seems to show at every occurrence of result.get()...
The strangest thing is that the code seems otherwise to work fine, does not crash, produces the datafiles I then ask it to, etc...
I think this would deserve to publish a new question, but I put it here nonetheless in case someone has a quick answer.
If not, I will resort to add one by one the variables, methods, etc... that are present in my actual code but not in the test example, to try and find the origin of the bug. My best guess for now is that the memory space required by each child-process with the actual code, is too large for the distant machine to accept it, due to some restrictions implemented by the admin.
What you are looking for is sharing state between processes. As per the documentation, you can either create shared memory, which is restrictive about the data it can store and is not thread-safe, but offers better speed and performance; or you can use server processes through managers. The latter is what we are going to use, since you want to share whole objects of user-defined types. Keep in mind that using managers will affect the speed of your code, depending on the complexity of the arguments that you pass to and receive from the managed objects.
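For orientation, here is a minimal sketch (not part of the original answer) contrasting the two built-in options: a shared-memory Value, which is fast but limited to ctypes-compatible data, versus a manager list, which lives in a server process and can hold richer objects at the cost of proxying every access:

from multiprocessing import Manager, Process, Value

def bump(counter):
    counter.value += 1          # shared memory: the child writes directly into it

def record(shared_list):
    shared_list.append("done")  # manager: the append is forwarded to the server process

if __name__ == "__main__":
    counter = Value("i", 0)                  # restricted to simple ctypes values
    with Manager() as manager:
        shared_list = manager.list()         # can hold arbitrary picklable objects
        p1 = Process(target=bump, args=(counter,))
        p2 = Process(target=record, args=(shared_list,))
        p1.start()
        p2.start()
        p1.join()
        p2.join()
        print(counter.value, list(shared_list))   # -> 1 ['done']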
Managers, proxies and pickling
As mentioned, managers create server processes to store objects and allow access to them through proxies. I have answered a question with better details on how they work, and how to create a suitable proxy, here. We are going to use the same proxy defined in the linked answer, with some variations: namely, I have replaced the factory functions inside __getattr__ with something that can be pickled using pickle. This means that you can run instance methods of managed objects created with this proxy without resorting to using multiprocess. The result is this modified proxy:
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np


class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)


class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data type. The proxy instance will have the namespace and
    functions of the data type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes."""

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result
Solution
Now we only need to make sure that when we are creating objects of monte_carlo, we do so using managers and the above proxy. For that, we create a class constructor called create. All objects for monte_carlo should be created with this function. With that, the final code looks like this:
from multiprocessing import Pool
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np


class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)


class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data type. The proxy instance will have the namespace and
    functions of the data type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes."""

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result


class monte_carlo:

    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []
        self.T = None

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)
        return

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = (x[k] + np.random.uniform(-1, 1, 3))
        temp_E = np.mean(self.x)
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x
        return

    @classmethod
    def create(cls, *args, **kwargs):
        # Register class
        class_str = cls.__name__
        BaseManager.register(class_str, cls, ObjProxy, exposed=tuple(dir(cls)))
        # Start a manager process
        manager = BaseManager()
        manager.start()
        # Create and return this proxy instance. Using this proxy allows sharing of state between processes.
        inst = eval("manager.{}(*args, **kwargs)".format(class_str))
        return inst


def proba(dE, pT):
    return np.exp(-dE/pT)


if __name__ == "__main__":
    Tlist = [1.1, 1.2, 1.3]
    N = len(Tlist)
    G = []

    # Create our managed instances
    for _ in range(N):
        G.append(monte_carlo.create())

    for _ in range(5):
        # Run simulations in the manager server
        results = []
        with Pool(8) as pool:
            for i in range(N):  # this loop should be run in multiprocess
                results.append(pool.apply_async(G[i].simulation, (Tlist[i], )))

            # Wait for the simulations to complete
            for result in results:
                result.get()

        for i in range(N // 2):
            dE = G[i].E - G[i + 1].E
            pT = G[i].T + G[i + 1].T
            p = proba(dE, pT)  # (proba is a function giving a probability depending on dE)
            if np.random.random() < p:
                T_temp = Tlist[i]
                Tlist[i] = Tlist[i + 1]
                Tlist[i + 1] = T_temp

    print(Tlist)
This meets the criteria you wanted. It does not create any copies at all; rather, all arguments to the simulation method call are serialized inside the pool and sent to the manager server, where the object is actually stored. The method gets executed there, and the results (if any) are serialized and returned to the main process. All of this using only the built-ins!
Output
[1.2, 1.1, 1.3]
Edit
Since you are using Linux, I encourage you to use multiprocessing.set_start_method inside the if __name__ ... clause to set the start method to "spawn". Doing this will ensure that the child processes do not have access to variables defined inside the clause.
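A minimal sketch of where that call would go, based on the solution above (everything else stays the same; set_start_method must run before any pools or managed objects are created):

from multiprocessing import set_start_method

if __name__ == "__main__":
    set_start_method("spawn")   # forces fresh child processes that do not inherit module state
    Tlist = [1.1, 1.2, 1.3]
    # ... rest of the driver code from the solution above ...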

Parameterized fixture with pytest-datafiles

I have a Python function that processes different types of files, for which I want to set up a testing scheme. For each of the different file types it can handle I have a test file. I'd like to use pytest-datafiles so the tests automatically get performed on copies in a tmpdir. I'm trying to set up a parameterized fixture, similar to @pytest.fixture(params=[...]), so that the test function automatically gets invoked for each test file. How do I achieve this?
I tried the code below, but my datafiles are not copied to the tmpdir, and the test collection fails because the test_files() fixture does not yield any output. I'm quite new to pytest, so possibly I don't fully understand how it works.
@pytest.fixture(params=[1, 2])
@pytest.mark.datafiles('file1.txt', 'file2.txt')
def test_files(request, datafiles):
    for testfile in datafiles.listdir():
        yield testfile

@pytest.fixture(params=['expected_output1', 'expected_output2'])
def expected_output(request):
    return request.param

def my_test_function(test_files, expected_output):
    assert myFcn(test_files) == expected_output
After reading up on fixtures and marks I conclude that the way I tried to use pytest.mark.datafiles is probably not possible. Instead I used the built-in tmpdir functionality in pytest, as demonstrated below. (Also, the fact that I named my fixture function test_files() may have messed things up since pytest would recognize it as a test function.)
import os
import shutil

import pytest

testFileNames = {1: 'file1.txt', 2: 'file2.txt'}
expectedOutputs = {1: 'expected_output1', 2: 'expected_output2'}

@pytest.fixture(params=[1, 2])
def testfiles(request, tmpdir):
    shutil.copy(testFileNames[request.param], str(tmpdir))
    return os.path.join(str(tmpdir), testFileNames[request.param])

@pytest.fixture(params=[1, 2])
def expected_output(request):
    return expectedOutputs[request.param]

def my_test_function(testfiles, expected_output):
    assert myFcn(testfiles) == expected_output

how to get pytest fixture return value in autouse mode?

I am new to pytest. In the sample code below, how can I get the A() object in the test_one function when the fixture is in autouse mode?
import pytest
import time

class A:
    def __init__(self):
        self.abc = 12

@pytest.fixture(autouse=True)
def test_foo():
    print('connecting')
    yield A()
    print('disconnect')

def test_one():
    # how can i get A() object?
    print([locals()])
    assert 1 == 1
You can always add the fixture as a parameter despite the autouse:
def test_one(test_foo):
    print(test_foo)
    assert 1 == 1
If you don't want to use the fixture parameter for some reason, you have to save the object elsewhere to be able to access it from your test:
a = None

@pytest.fixture(autouse=True)
def test_foo():
    global a
    a = A()
    yield
    a = None

def test_one():
    print(a)
    assert 1 == 1
This could be made a little better by using a test class and putting a in a class variable to avoid the global, but the first variant is still the preferred one, as it localizes the definition of the object.
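For illustration, the class-based variant mentioned there could look like this (a sketch only, assuming the A class from the question; storing the object on the class keeps it off the module level but is otherwise equivalent):

import pytest

class TestWithConnection:
    a = None  # class attribute instead of a module-level global

    @pytest.fixture(autouse=True)
    def connection(self):
        print('connecting')
        type(self).a = A()
        yield
        type(self).a = None
        print('disconnect')

    def test_one(self):
        print(self.a)
        assert 1 == 1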
Apart from that, there is no real point in yielding an object you don't have access to. You may want to consider whether autouse is the right option for your use case; autouse is often used for stateless setup/teardown.
If your use case is to do some setup/teardown regardless (as suggested by the connect/disconnect comments), and to give optional access to an object, this is fine, of course.

How to capture the iteration number dynamically inside a test while using pytest-repeat

I'm executing my Selenium script multiple times using pytest-repeat. I need to capture the iteration number during execution and make use of it.
I explored pytest.mark, pytest.collect & pytest.Collector
class Testone():
    @pytest.fixture()
    def setup(self):
        pass

    @pytest.mark.repeat(RowCount)
    def test_create_eq(self, setup):
        # Need to capture the iteration number here.
        pass
I think there should be an easier and more straightforward way than what I describe below. pytest-repeat has a fixture __pytest_repeat_step_number which I hoped could provide the current step number for the test, but it did not.
request.node.name provides the name of the test function generated by pytest-repeat, and it contains the step number, which can be extracted for your purpose.
import pytest

class Testone():
    @pytest.fixture()
    def setup(self):
        pass

    @pytest.mark.repeat(4)
    def test_create_eq(self, setup, request):
        current_step = request.node.name.split('[')[1].split('-')[0]  # string form; parse to int, if required

OpenMDAO 1.x: recording in parallel

When running an analysis under MPI with distributed components in a ParallelGroup, I get an error when adding a DumpRecorder to the analysis. Below is a small example that demonstrates this (this was run with the latest master branch commit aaa67a4d51f4081e9e41b250b0a76b077f6f0c21 from 28/10/2015):
import numpy as np

from openmdao.core.mpi_wrap import MPI
from openmdao.api import Component, Group, DumpRecorder, Problem, ParallelGroup


class Sliced(Component):

    def __init__(self):
        super(Sliced, self).__init__()

        self.add_param('x', 0.)
        self.add_output('y', 0.)

    def solve_nonlinear(self, params, unknowns, resids):
        unknowns['y'] = params['x'] * 2.


class VectorComp(Component):

    def __init__(self, size):
        super(VectorComp, self).__init__()

        self.add_param('xin', np.zeros(size))
        self.add_output('x', np.zeros(size))

    def solve_nonlinear(self, params, unknowns, resids):
        unknowns['x'] = params['xin'] * 2.


class Analysis(Group):

    def __init__(self, size):
        super(Analysis, self).__init__()

        self.add('v', VectorComp(size), promotes=['*'])

        par = self.add('par', ParallelGroup())
        for i in range(size):
            par.add('sec%02d' % i, Sliced())
            self.connect('x', 'par.sec%02d.x' % i, src_indices=[i])


if __name__ == '__main__':

    if MPI:
        from openmdao.core.petsc_impl import PetscImpl as impl
    else:
        from openmdao.core.basic_impl import BasicImpl as impl

    p = Problem(impl=impl, root=Analysis(4))

    recorder = DumpRecorder('optimization.log')
    # adding specific includes works, but leaving it out results in a crash
    # recorder.options['includes'] = ['x']
    p.driver.add_recorder(recorder)

    p.setup()
    p.run()
The error which is raised is:
RuntimeError: Cannot access remote Variable 'par.sec00.x' in this process.
I see that the recorder dumps a file per processor, so shouldn't the BaseRecorder._filter_vectors method filter out params not present on a specific processor? I'm not yet familiar enough with the code to propose a fix, so I hope the OpenMDAO devs can easily figure out what goes wrong.
Manually specifying the includes works, since the Sliced parameters are then excluded, but it would be nice if this were not necessary and were dealt with under the hood.
I also want to let you guys know how excited we are about the new framework. It is so much faster than the 0.x version, and the parallel FD feature is much appreciated and works like a charm!
There were some recent changes that broke the dump recorder in parallel. We put a story up for someone to fix it, but in the meantime you might want to try the SqliteRecorder. It's what I have been using for performance testing on CADRE. You set it up the same way, but then read the values back using an sqlitedict. There is a small example in the docs, but a more practical example is in the CADRE code here:
https://github.com/OpenMDAO/CADRE/blob/master/plot_progress.py
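For reference, a minimal sketch of that setup (not from the original answer): the recorder hookup mirrors the DumpRecorder lines in the question, and the read-back follows the sqlitedict pattern used in the CADRE example; the 'openmdao' table name and the file name are assumptions taken from that example, not verified against your installation.

from openmdao.api import SqliteRecorder
from sqlitedict import SqliteDict

recorder = SqliteRecorder('optimization.sqlite')
p.driver.add_recorder(recorder)      # same hookup as the DumpRecorder above
p.setup()
p.run()

# after the run, open the database and inspect what was recorded
db = SqliteDict('optimization.sqlite', 'openmdao')
for iteration_name, data in db.items():
    print(iteration_name, data)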
