Python multiprocess: run several instances of a class, keep all child processes in memory - python-3.x

First, I'd like to thank the StackOverflow community for the tremendous help it provided me over the years, without me having to ask a single question.
I could not find anything that I can relate to my problem, though it is probably due to my lack of understanding of the subject, rather than the absence of a response on the website. My apologies in advance if this is a duplicate.
I am relatively new to multiprocess; some time ago I succeeded in using multiprocessing.pools in a very simple way, where I didn't need any feedback between the child processes.
Now I am facing a much more complicated problem, and I am just lost in the documentation about multiprocessing. I hence ask for your help, your kindness and your patience.
I am trying to build a parallel tempering Monte Carlo algorithm from a class.
The basic class very roughly goes as follows:
import numpy as np

class monte_carlo:

    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)
        return

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = x[k] + np.random.uniform(-1, 1, 3)
        temp_E = np.mean(x)  # energy of the trial configuration
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x
        return
Obviously, I simplified a great deal (the actual class is 500 lines long!) and built fake functions for simplicity: __init__ takes a bunch of parameters as arguments, there are many more lists of measurements besides self.Elist, and also many arrays derived from self.x that I use to compute them. The key point is that each instance of the class contains a lot of information that I want to keep in memory, and that I don't want to copy over and over again, to avoid dramatic slowdowns. Otherwise I would just use the multiprocessing.pool module.
Now, the parallelization I want to do, in pseudo-code:
def proba(dE, pT):
    return np.exp(-dE / pT)

Tlist = [1.1, 1.2, 1.3]
N = len(Tlist)
G = []
for _ in range(N):
    G.append(monte_carlo())

for _ in range(5):
    for i in range(N):  # this loop should be run in multiprocess
        G[i].simulation(Tlist[i])

    for i in range(N // 2):
        dE = G[i].E - G[i + 1].E
        pT = G[i].T + G[i + 1].T
        p = proba(dE, pT)  # proba is a function giving a probability depending on dE
        if np.random.random() < p:
            T_temp = G[i].T
            G[i].T = G[i + 1].T
            G[i + 1].T = T_temp
Synthesis: I want to run several instances of my monte-carlo class in parallel child processes, with different values of a parameter T, then periodically pause everything to exchange the different T's, and run the child processes/class instances again from where they paused.
Doing this, I want each class instance/child process to stay independent of the others, to save its current state with all internal variables while it is paused, and to make as few copies as possible. This last point is critical, as the arrays inside the class are quite big (some are 1000x1000), so copying them would very quickly become quite time-costly.
Thanks in advance, and sorry if I am not clear...
Edit:
I am using a distant machine with many (64) CPUs, running on Debian GNU/Linux 10 (buster).
Edit2:
I made a mistake in my original post: in the end, the temperatures must be exchanged between the class-instances, and not inside the global Tlist.
Edit3: Charchit's answer works perfectly for the test code, on both my personal machine and the distant machine I usually use for running my codes. I hence mark it as the accepted answer.
However, I want to report here that, when inserting the actual, more complicated code instead of the oversimplified monte_carlo class, the distant machine gives me some strange errors:
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gtk-WARNING **: ##:##:##:###: Locale not supported by C library.
Using the fallback 'C' locale.
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###:
gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
The "##:##:##:###" are (or seems like) IP adresses.
Without the call to set_start_method('spawn') this error shows only once, in the very beginning, while when I use this method, it seems to show at every occurrence of result.get()...
The strangest thing is that the code seems otherwise to work fine, does not crash, produces the datafiles I then ask it to, etc...
I think this would deserve to publish a new question, but I put it here nonetheless in case someone has a quick answer.
If not, I will resort to adding, one by one, the variables, methods, etc. that are present in my actual code but not in the test example, to try and find the origin of the bug. My best guess for now is that the memory space required by each child process with the actual code is too large for the distant machine to accept, due to some restrictions implemented by the admin.

What you are looking for is sharing state between processes. As per the documentation, you can either create shared memory, which is restrictive about the data it can store and is not thread-safe, but offers better speed and performance; or you can use server processes through managers. The latter is what we are going to use, since you want to share whole objects of a user-defined datatype. Keep in mind that using managers will impact the speed of your code, depending on the complexity of the arguments that you pass to and receive from the managed objects.
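For contrast, here is a minimal sketch of the shared-memory route that this answer decides against (assuming Python 3.8+; it works for raw NumPy arrays, but it cannot hold whole class instances, which is why we use managers below):

import numpy as np
from multiprocessing import shared_memory

arr = np.ones((1000, 3))
# Allocate a shared block and view it as a NumPy array
shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
shared = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
shared[:] = arr[:]
# A child process attaches with shared_memory.SharedMemory(name=shm.name)
# and builds the same np.ndarray view over shm.buf.
shm.close()
shm.unlink()  # free the block when done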
Managers, proxies and pickling
As mentioned, managers create server processes to store objects, and allow access to them through proxies. I have answered a question with better details on how they work, and how to create a suitable proxy, here. We are going to use the same proxy defined in the linked answer, with some variations. Namely, I have replaced the factory functions inside __getattr__ with something that can be pickled using pickle. This means that you can run instance methods of managed objects created with this proxy without resorting to using multiprocess. The result is this modified proxy:
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np


class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)


class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data-type. The proxy instance will have the namespace and
    functions of the data-type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes."""

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result
Solution
Now we only need to make sure that, when we create objects of monte_carlo, we do so using managers and the above proxy. For that, we add a class method called create; all monte_carlo objects should be created through this function. With that, the final code looks like this:
from multiprocessing import Pool
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np


class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)


class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data-type. The proxy instance will have the namespace and
    functions of the data-type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes."""

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result


class monte_carlo:
    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []
        self.T = None

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)
        return

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = x[k] + np.random.uniform(-1, 1, 3)
        temp_E = np.mean(x)  # energy of the trial configuration
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x
        return

    @classmethod
    def create(cls, *args, **kwargs):
        # Register class
        class_str = cls.__name__
        BaseManager.register(class_str, cls, ObjProxy, exposed=tuple(dir(cls)))
        # Start a manager process
        manager = BaseManager()
        manager.start()
        # Create and return this proxy instance. Using this proxy allows sharing of state between processes.
        inst = eval("manager.{}(*args, **kwargs)".format(class_str))
        return inst


def proba(dE, pT):
    return np.exp(-dE / pT)


if __name__ == "__main__":
    Tlist = [1.1, 1.2, 1.3]
    N = len(Tlist)
    G = []

    # Create our managed instances
    for _ in range(N):
        G.append(monte_carlo.create())

    for _ in range(5):
        # Run simulations in the manager server
        results = []
        with Pool(8) as pool:
            for i in range(N):  # this loop should be run in multiprocess
                results.append(pool.apply_async(G[i].simulation, (Tlist[i],)))

            # Wait for the simulations to complete (inside the with block,
            # since exiting it terminates the pool)
            for result in results:
                result.get()

        for i in range(N // 2):
            dE = G[i].E - G[i + 1].E
            pT = G[i].T + G[i + 1].T
            p = proba(dE, pT)  # proba is a function giving a probability depending on dE
            if np.random.random() < p:
                T_temp = Tlist[i]
                Tlist[i] = Tlist[i + 1]
                Tlist[i + 1] = T_temp

    print(Tlist)
This meets the criteria you wanted. It does not create any copies at all; rather, all arguments to the simulation method call are serialized inside the pool and sent to the manager server, where the object is actually stored. The method is executed there, and the results (if any) are serialized and returned to the main process. All of this using only the standard library!
Output
[1.2, 1.1, 1.3]
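Per the question's Edit2, the temperatures can equally be swapped on the managed instances themselves, since NamespaceProxy forwards attribute reads and assignments to the manager server. A sketch replacing the Tlist swap above (same names as in the solution):

    for i in range(N // 2):
        dE = G[i].E - G[i + 1].E
        pT = G[i].T + G[i + 1].T
        if np.random.random() < proba(dE, pT):
            # attribute assignment goes through the proxy to the server
            G[i].T, G[i + 1].T = G[i + 1].T, G[i].T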
Edit
Since you are using Linux, I encourage you to use multiprocessing.set_start_method inside the if __name__ ... clause to set the start method to "spawn". Doing this will ensure that the child processes do not have access to variables defined inside the clause.
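A minimal sketch of where that call goes (set_start_method is from the standard library; it must run once, before any pool or manager is created):

from multiprocessing import set_start_method

if __name__ == "__main__":
    set_start_method("spawn")
    # ... then create the managed instances and pools as above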

Related

Show the running executors identifications in ProcessPoolExecutor

This is the problem I am having. I won't share the code because of confidentiality, but instead I will provide a dummy example.
Assume that we have a class as follows:
class SayHello:
    def __init__(self, name, id):
        self.name = name
        self.id = id

    # public func
    def doSomething(self, arg1, arg2):
        # do a huge task with the arguments
        ...
Let's say now that in another module we have this:
from concurrent.futures import ProcessPoolExecutor


class CallOperations:
    def __init__(self):
        self.dummydict = {1: {"james": 20, "peter": 30, "victor": 40, "john": 45, "ali": 21, "tom": 41, "hector": 37},
                          2: {"james": 23, "peter": 31, "victor": 44, "john": 46, "ali": 23, "tom": 44, "hector": 35}}

    def runProcessors(self):
        # run process
        for _, v in self.dummydict.items():
            Instances = [SayHello(g, b) for g, b in v.items()]
            with ProcessPoolExecutor(max_workers=2) as executor:
                future = [executor.submit(ins.doSomething, 2, 1235) for ins in Instances]
So the problem starts here. I want to know which instances are running the doSomething() function in their respective processes. I want to set a variable to 1 when the function of that instance is running in a process, and set it to 0 when it has completed.
Each instance has its own name and id. Is there a way to find out the name of the instance running in a given process?
This problem is making me very confused and I cannot find a proper solution.
Thank you a lot.
If I understand your question correctly, you want to know when an instance of SayHello is executing and when it is not. You can set a variable (1 or 0) by using a Manager, but the usefulness of this can be debated. You might want to use a lock instead.
I had to tweak your code a bit, but this is a running example. It picks one of your tasks as the one to monitor in the while loop. It is a dummy loop that never exits, but you'll get the idea. It keeps polling the status variable of one of your instances, and you can see it change when that task is running and then revert back to zero.
from time import sleep
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager


class SayHello:
    def __init__(self, name, id):
        self.name = name
        self.id = id
        self.status = Manager().Value("i", 0)

    # public func
    def doSomething(self, arg1, arg2):
        self.status.value = 1
        sleep(5)
        self.status.value = 0


class CallOperations:
    def __init__(self):
        self.dummydict = {1: {"james": 20, "peter": 30, "victor": 40, "john": 45, "ali": 21, "tom": 41, "hector": 37},
                          2: {"james": 23, "peter": 31, "victor": 44, "john": 46, "ali": 23, "tom": 44, "hector": 35}}

    def runProcessors(self):
        # run process
        for _, v in self.dummydict.items():
            Instances = [SayHello(g, b) for g, b in v.items()]
            f = Instances[3]
            executor = ProcessPoolExecutor(max_workers=2)
            future = [executor.submit(ins.doSomething, 2, 1235) for ins in Instances]
            while True:
                print(f.status.value)
                # Insert break condition here
                sleep(0.5)
            executor.shutdown()


foo = CallOperations()
foo.runProcessors()
The problem with this is that it can lead to a race condition, depending on what you do in your main program. If you want to perform operations on an instance while it is in the passive state, it might turn active just after you check the variable but before you have completed your actions in the main program.
Locks come to the rescue here, as you can also create a shared lock with Manager().Lock(). If your doSomething() acquires the lock, and your main process does the same when operating on a passive instance, you avoid this problem. Of course, your main program could then block the executor from processing if it holds locks during lengthy operations: your two workers would be stuck waiting if execution reached instances whose locks are held by the main program. Such a case would not be suitable for parallel processing implemented with executors.
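A minimal sketch of that shared-lock idea, reusing the SayHello class from above (the lock attribute name is illustrative):

from time import sleep
from multiprocessing import Manager


class SayHello:
    def __init__(self, name, id):
        self.name = name
        self.id = id
        self.lock = Manager().Lock()

    def doSomething(self, arg1, arg2):
        with self.lock:  # held for the whole active phase
            sleep(5)


# In the main process, before touching a passive instance:
# with instance.lock:
#     ...inspect or modify the instance...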
EDIT: If you are only interested in the running status, you can check the Future.running() status of your future objects, in this case the items in your future list.
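For example, inside runProcessors (running() and done() are standard concurrent.futures calls):

    for ins, fut in zip(Instances, future):
        print(ins.name, fut.running(), fut.done())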

Python 3: how to represent a simple dependency graph for a boot sequence

My problem is to create a list of servers that have to reboot in a sequence, like:
if server01 has booted, then server02a and server02b may boot; after server02a comes server03, etc. So I created the class Server and tried to append some servers:
#!/usr/bin/env python3

class Server:
    def __init__(self, name, nextsrv=[]):
        self.name = name
        self.nextsrv = nextsrv
        print(self.name)

servers = []
servers.append(Server('server01'))
servers.append(Server('serverXX'))
servers[0].nextsrv.append(Server('server02a'))
How to add more instances? The next one, server02b, is not accepted.
How to add servers in nextsrv of server02[ab]?
How to loop over this lists in lists in instances?
I would maintain two different structures: a tree of servers and their dependencies, and the list of currently booting servers. This would involve expanding the server class a bit to allow for more complex graph structures:
class Server:
    def __init__(self, name, nextsrv=None):
        self.name = name
        self.booted = False
        self.nextsrv = set()
        self.depends = set()  # servers that must boot before this one
        if nextsrv is not None:
            self.add_srvs(nextsrv)

    def __eq__(self, other):
        return isinstance(other, Server) and self.name == other.name

    def __hash__(self):
        return hash(self.name)

    def boot(self):
        # Do some magic here
        self.booted = True

    def add_srv(self, srv):
        self.nextsrv.add(srv)
        srv.depends.add(self)

    def add_srvs(self, srvs):
        for srv in srvs:
            self.add_srv(srv)

    def has_depends(self):
        for srv in self.depends:
            if not srv.booted:
                return True
        return False
I have turned nextsrv into a set, which means that Server needs a __hash__ method. Servers are compared by name only. I also added a backreference to the dependent server so that it becomes easy to check when a server is bootable, i.e., not in the nextsrv list of another unbooted server.
Now you can set up the server tree as you described. I would just make a dict that allows you to do something like:
servers = {}
servers['server03'] = Server('server03')
servers['server02a'] = Server('server02a', [servers['server03']])
servers['server02b'] = Server('server02b')
servers['server01'] = Server('server01', [x for x in servers.values() if x.name in ('server02a', 'server02b')])
You could assign each server to a different variable, but I think it is easier to manage a lot of different servers via a dict. It also allows you to do stuff like computing the boot sequence automatically:
from collections import deque

# Startup: find all servers that no-one depends on
boot_candidates = deque(x for x in servers.values() if not x.depends)

# Iterating with a for loop would break if we extend the list during iteration
while boot_candidates:
    srv = boot_candidates.popleft()
    srv.boot()
    boot_candidates.extend(x for x in srv.nextsrv if not x.has_depends())
This solution does not check for cyclic dependencies and other complexities. However, it has the advantage of being highly parallelizable, which is probably something you should look into, especially since booting a server should consume very few resources on the local machine (unless you have VMs).
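To illustrate, here is a minimal sketch of a parallel boot driver on top of the same Server class (boot_all and the max_workers value are illustrative; a thread pool is used on the assumption that boot() blocks on I/O while the remote machine comes up):

from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def boot_all(servers, max_workers=4):
    submitted = set()  # servers already handed to the pool
    pending = set()    # futures still booting

    def submit(pool, srv):
        fut = pool.submit(srv.boot)
        fut.srv = srv  # remember which server this future boots
        pending.add(fut)
        submitted.add(srv)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Seed with servers that no-one depends on
        for srv in servers.values():
            if not srv.depends:
                submit(pool, srv)
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                fut.result()  # re-raise any boot error
                for nxt in fut.srv.nextsrv:
                    if nxt not in submitted and not nxt.has_depends():
                        submit(pool, nxt)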

QObject and QThread relations

I have a PyQt4 GUI which allows me to import multiple .csv files. I've created a loop that goes through a list of tuples with the following parameters: (filename + location of file, filename, bool, bool, set of dates in file) = tup.
I've created several classes that my GUI frequently refers to in order to pull parameters off a project's profile. Let's call this class profile(). I also have another class with a lot of functions based on formatting, such as datetime, text edits, etc. Let's call this class MyFormatting(). Then I created a QThread class that is instantiated for each file in the list; this one is called Import_File(QThread). Let's say this class takes a few parameters for its __init__(self, tup).
My ideal goal is to make an independent instance of MyFormatting() and profile() for each Import_File(QThread). I am trying to get my head around how to use QObject capabilities to solve this, but I keep getting the error that the thread is being destroyed while still running.
for tup in importedlist:
    importfile = Import_File(tup)
    self.connect(importfile, QtCore.SIGNAL('savedfile(PyQt_PyObject)'), self.printstuffpassed)
    importfile.start()
I was thinking of having the two classes declared as:
class MyFormatting(QObject):
    def __init__(self):
        QObject.__init__(self)

    def func1(self, stuff):
        # do stuff
        ...

    def func2(self):
        # more stuff
        ...


class profile(QObject):
    def __init__(self):
        QObject.__init__(self)

    def func11(self, stuff):
        # do stuff
        ...

    def func22(self):
        # more stuff
        ...
and for the QThread:
class Import_File(QThread):
    def __init__(self, tup):
        QThread.__init__(self)
        common_calc_stuff = self.calc(tup[4])
        f = open(tup[0] + '.csv', 'w')
        self.tup = tup
        # this is where I thought of pulling an instance just for this thread
        self.MF = MyFormatting()
        self.MF_thread = QtCore.QThread()
        self.MF.moveToThread(self.MF_thread)
        self.MF_thread.start()
        self.prof = profile()
        self.prof_thread = QtCore.QThread()
        self.prof.moveToThread(self.prof_thread)
        self.prof_thread.start()

    def func33(self):
        # do stuff
        self.prof.func11(self.tup[4])

    def func44(self):
        # more stuff
        ...

    def run(self):
        if self.tup[3] == True:
            self.func33()
            self.MF.func2()
        elif self.tup[3] == False:
            self.func44()
        if self.tup[2] == True:
            self.prof.func22()
        self.emit(QtCore.SIGNAL('savedfile()'))
Am I totally thinking about this the wrong way? How can I keep roughly the same code structure and still implement multithreading, without having the same resource tapped at the same time, which I think is the reason my GUI keeps crashing? And how can I make sure that each instance of those objects gets shut down so that it doesn't interfere with the other instances?

OpenMDAO 1.x: recording in parallel

When running an analysis under MPI with distributed components in a ParallelGroup, I get an error when adding a DumpRecorder to the analysis. Below is a small example that demonstrates this (this was run with the latest master branch commit aaa67a4d51f4081e9e41b250b0a76b077f6f0c21 from 28/10/2015):
import numpy as np

from openmdao.core.mpi_wrap import MPI
from openmdao.api import Component, Group, DumpRecorder, Problem, ParallelGroup


class Sliced(Component):
    def __init__(self):
        super(Sliced, self).__init__()
        self.add_param('x', 0.)
        self.add_output('y', 0.)

    def solve_nonlinear(self, params, unknowns, resids):
        unknowns['y'] = params['x'] * 2.


class VectorComp(Component):
    def __init__(self, size):
        super(VectorComp, self).__init__()
        self.add_param('xin', np.zeros(size))
        self.add_output('x', np.zeros(size))

    def solve_nonlinear(self, params, unknowns, resids):
        unknowns['x'] = params['xin'] * 2.


class Analysis(Group):
    def __init__(self, size):
        super(Analysis, self).__init__()
        self.add('v', VectorComp(size), promotes=['*'])
        par = self.add('par', ParallelGroup())
        for i in range(size):
            par.add('sec%02d' % i, Sliced())
            self.connect('x', 'par.sec%02d.x' % i, src_indices=[i])


if __name__ == '__main__':
    if MPI:
        from openmdao.core.petsc_impl import PetscImpl as impl
    else:
        from openmdao.core.basic_impl import BasicImpl as impl

    p = Problem(impl=impl, root=Analysis(4))

    recorder = DumpRecorder('optimization.log')
    # adding specific includes works, but leaving it out results in a crash
    # recorder.options['includes'] = ['x']
    p.driver.add_recorder(recorder)

    p.setup()
    p.run()
The error which is raised is:
RuntimeError: Cannot access remote Variable 'par.sec00.x' in this process.
I see that the recorder dumps a file per processor, so shouldn't the BaseRecorder._filter_vectors method filter out params not present on a specific processor? I'm not yet familiar enough with the code to propose a fix, so I hope the OpenMDAO devs can easily figure out what goes wrong.
Manually specifying the includes works since the Sliced parameters are then excluded, but it would be nice that this was not necessary, and dealt with under the hood.
I also want to let you guys know how excited we are about the new framework. It is so much faster than the 0.x version, and the parallel FD feature is much appreciated and works like a charm!
There were some recent changes that broke the dump recorder in parallel. We put a story up for someone to fix it, but in the meantime you might want to try the SqliteRecorder. It's what I have been using for performance testing on CADRE. You set it up the same way, but you then read the values back using an sqlitedict. There is a small example in the docs, but a more practical example is in the CADRE code:
https://github.com/OpenMDAO/CADRE/blob/master/plot_progress.py
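A minimal sketch of that setup, replacing the DumpRecorder in the script above (the 'openmdao' table name and the 'Unknowns' key follow the pattern in the linked CADRE example; treat them as assumptions for your exact version):

from openmdao.api import SqliteRecorder

recorder = SqliteRecorder('optimization.db')
p.driver.add_recorder(recorder)
p.setup()
p.run()

# Afterwards, read the recorded iterations back:
from sqlitedict import SqliteDict

db = SqliteDict('optimization.db', 'openmdao')
for iteration_coordinate in db:
    print(iteration_coordinate, db[iteration_coordinate]['Unknowns'])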

Do setattr() and getattr() slow down the speed dramatically?

Today when I checked some code at the office, I found the following code. It shocked me.
class XXXX():
    def __init__(self, k, v):
        for i in range(len(k)):
            setattr(self, k[i], v[i])
Then I found that most of the classes are written the same way. That means all the classes are effectively the same class; the only difference is their name.
In this project, setattr() is used to set attributes and getattr() is used to get attributes. In the profile log, setattr was called 2700 times and getattr 3800 times; they consumed 0.003 sec and 0.005 sec respectively (whole process: 0.069 sec).
Though I do think setattr and getattr drag down the speed, I'm not sure whether a rewrite of all the code would make it better.
Does obj.attribute = value run faster than setattr(obj,'attribute',value)?
Yes, getattr and setattr are much slower, at least at the CPU level.
Since __init__ is only called once per object I wouldn't worry about that unless you are creating many, many objects.
If the objects' attributes are accessed many times it could be worth it to rewrite those sections. You should do some careful profiling first, though.
I did a small test. It was roughly 2x slower.
Increment using member took 2.8221039772
Increment using attr took 5.94811701775
Here is my code
import timeit

class Dummy(object):
    __slots__ = ['testVal']

    def __init__(self):
        self.testVal = 0

    def increment_using_attr(self):
        i = getattr(self, 'testVal')
        setattr(self, 'testVal', i + 1)

    def increment(self):
        self.testVal += 1

if __name__ == '__main__':
    d = Dummy()
    print "Increment using member took {0}".format(timeit.Timer(d.increment).timeit(10000000))
    print "Increment using attr took {0}".format(timeit.Timer(d.increment_using_attr).timeit(10000000))
Run on a machine with an Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz.
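For reference, a minimal Python 3 port of the same benchmark (a sketch; the iteration count is reduced here, and absolute timings will differ by machine):

import timeit

class Dummy:
    __slots__ = ['testVal']

    def __init__(self):
        self.testVal = 0

    def increment_using_attr(self):
        setattr(self, 'testVal', getattr(self, 'testVal') + 1)

    def increment(self):
        self.testVal += 1

d = Dummy()
print("member:", timeit.timeit(d.increment, number=1_000_000))
print("attr:  ", timeit.timeit(d.increment_using_attr, number=1_000_000))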
