Recursivity and multithread collisions in Python 3 - python-3.x

Good afternoon
I am trying to modify my recursive function into a multithreaded recursive function to reduce processing time.
I have implemented an RLock object where it was necessary.
My recursive function must print a nested array with 98 lines, but with multithreading there are sometimes collisions and I do not retrieve all nodes.
(It works with the synchronous method.)
I have tried to reproduce the behavior of my code:
import os
import threading
from multiprocessing.pool import ThreadPool

class Foo(object):
    def __init__(self, name):
        self.name = name
        self.nodes = []
        self.lock = threading.RLock()

class Git(object):
    def __init__(self):
        self.lock = threading.RLock()

git = Git()

def getChildrenFiles(node):
    with node.lock:
        with git.lock:
            # sending some commands to the git repository and getting some files
            pass
    # 'child' stands for a node discovered by the git commands (elided in this repro)
    myRecursiveFunction(child, False)

# Note: "async" is a reserved keyword since Python 3.7, so the flag is
# named use_async here.
def myRecursiveFunction(foo, use_async=True):
    with foo.lock:
        # (Simulation) Retrieving Foo children names from local files
        foo.nodes.append(Foo(....))
    if not foo.nodes:
        return
    pool = ThreadPool(os.cpu_count())
    if use_async:
        results = []
        for fooChildren in foo.nodes:
            results.append(pool.apply_async(getChildrenFiles, (fooChildren,)))
        all(res.get() for res in results)
    else:
        for fooChildren in foo.nodes:
            pool.apply(getChildrenFiles, (fooChildren,))
    pool.close()
    pool.join()

if __name__ == "__main__":
    foo = Foo("MyProject")
    myRecursiveFunction(foo)
    '''print foo as a tree, like:
    Foo-A
        Foo-B
            Foo-C
        Foo-E
    Foo-K
    '''
Thanks

I have finally found the problem; it is very specific to my program, but I will still try to explain it.
Explanation
In the function getChildrenFiles I am using a lock on the git object, and the git object is declared as a global variable.
The problem was that I was sending commands like:
def getChildrenFiles(node):
    with node.lock:
        with git.lock:
            sendCommand(....)
        .....
        with git.lock:
            sendOtherCommand(....)
    myRecursiveFunction(child, False)
Between the first command and the second one, another thread locked the git object, and some files used by the git object in the first thread were changed by that other thread. A custom exception was then raised.
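In other words, the fix is to hold git.lock across both commands. A minimal sketch of that idea, reusing the placeholder names from the snippet above (sendCommand and sendOtherCommand are stand-ins, not real APIs):

def getChildrenFiles(node):
    with node.lock:
        # one lock acquisition covering both commands: no other thread can
        # change the repository state between them
        with git.lock:
            sendCommand(....)
            sendOtherCommand(....)
    myRecursiveFunction(child, False)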

Related

Python Asyncio and Multithreading

I have created a greatly simplified version of an application below that intends to use Python's asyncio and threading modules. The general structure is as follows:
import asyncio
import threading

class Node:
    def __init__(self, loop):
        self.loop = loop
        self.tasks = set()

    async def computation(self, x):
        print("Node: computation called with input ", x)
        await asyncio.sleep(1)

    def schedule_computation(self, x):
        print("Node: schedule_computation called with input ", x)
        task = self.loop.create_task(self.computation(x))
        self.tasks.add(task)

class Router:
    def __init__(self, loop):
        self.loop = loop
        self.nodes = {}

    def register_node(self, id):
        self.nodes[id] = Node(self.loop)

    def schedule_computation(self, node_id, x):
        print("Router: schedule_computation called with input ", x)
        self.nodes[node_id].schedule_computation(x)

class Client:
    def __init__(self, router):
        self.router = router
        self.counter = 0

    def run(self):
        while True:
            if self.counter == 1000000:
                self.router.schedule_computation(1, 5)
            self.counter += 1

def main():
    loop = asyncio.get_event_loop()
    # construct Router instance and register a node
    router = Router(loop)
    router.register_node(1)
    # construct Client instance
    client = Client(router)
    client_thread = threading.Thread(target=client.run)
    client_thread.start()
    loop.run_forever()

main()
In practice the Node.computation method does some network I/O, so I'd like to perform that work asynchronously. The Client.run method is synchronous and blocking, and I'd like to give this function its own thread to execute in (in fact I'd like the ability to run this method in a separate process, if possible).
Upon executing this application we get the following output:
Router: schedule_computation called with input 5
Node: schedule_computation called with input 5
However, I expect that "Node: computation called with input 5" should print as well, because the Node.schedule_computation method creates a task to run on the loop. In summary, why does it seem that Node.computation is never scheduled?
Use loop.call_soon_threadsafe
In general, asyncio isn't thread-safe:

Almost all asyncio objects are not thread safe, which is typically not a problem unless there is code that works with them from outside of a Task or a callback. If there's a need for such code to call a low-level asyncio API, the loop.call_soon_threadsafe() method should be used.

https://docs.python.org/3/library/asyncio-dev.html#concurrency-and-multithreading

Schedule the computation like this:

loop.call_soon_threadsafe(self.nodes[node_id].schedule_computation, x)
Node.computation runs on the main thread
Not sure if you are aware, but even though you can use call_soon_threadsafe to initiate a coroutine from another thread, the coroutine always runs in the thread the loop was created in. If you want to run coroutines on another thread, your background thread will need its own event loop as well.
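Putting the two points together, here is a minimal sketch (assuming the loop is created in the main thread, as in the question's code) of how Client could hand the call over to the event loop's thread:

class Client:
    def __init__(self, router, loop):
        self.router = router
        self.loop = loop  # the loop created in main()
        self.counter = 0

    def run(self):
        while True:
            if self.counter == 1000000:
                # thread-safe hand-off: the call executes in the loop's
                # own thread, where creating tasks is allowed
                self.loop.call_soon_threadsafe(
                    self.router.schedule_computation, 1, 5)
            self.counter += 1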

Python multiprocess: run several instances of a class, keep all child processes in memory

First, I'd like to thank the StackOverflow community for the tremendous help it provided me over the years, without me having to ask a single question.
I could not find anything that I can relate to my problem, though it is probably due to my lack of understanding of the subject, rather than the absence of a response on the website. My apologies in advance if this is a duplicate.
I am relatively new to multiprocessing; some time ago I succeeded in using multiprocessing.Pool in a very simple way, where I didn't need any feedback between the child processes.
Now I am facing a much more complicated problem, and I am just lost in the multiprocessing documentation. I hence ask for your help, your kindness and your patience.
I am trying to build a parallel tempering Monte Carlo algorithm around a class.
The basic class very roughly goes as follows:
import numpy as np

class monte_carlo:
    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)
        return

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = (x[k] + np.random.uniform(-1, 1, 3))
        temp_E = np.mean(x)  # energy of the proposed configuration
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x
        return
Obviously, I simplified a great deal (the actual class is 500 lines long!) and built fake functions for simplicity: __init__ takes a bunch of parameters as arguments, there are many more lists of measurements than just self.Elist, and also many arrays derived from self.x that I use to compute them. The key point is that each instance of the class contains a lot of information that I want to keep in memory, and that I don't want to copy over and over again, to avoid dramatic slowdowns. Otherwise I would just use the multiprocessing.pool module.
Now, the parallelization I want to do, in pseudo-code:
def proba(dE, pT):
    return np.exp(-dE / pT)

Tlist = [1.1, 1.2, 1.3]
N = len(Tlist)
G = []
for _ in range(N):
    G.append(monte_carlo())

for _ in range(5):
    for i in range(N):  # this loop should be run in multiprocess
        G[i].simulation(Tlist[i])
    for i in range(N//2):
        dE = G[i].E - G[i+1].E
        pT = G[i].T + G[i+1].T
        p = proba(dE, pT)  # (proba is a function, giving a probability depending on dE)
        if np.random.random() < p:
            T_temp = G[i].T
            G[i].T = G[i+1].T
            G[i+1].T = T_temp
Synthesis: I want to run several instances of my monte_carlo class in parallel child processes, with different values of a parameter T, then periodically pause everything to swap the different T's, and resume the child processes/class instances from where they paused.
Doing this, I want each class instance/child process to stay independent of the others, to save its current state with all internal variables while it is paused, and to make as few copies as possible. This last point is critical, as the arrays inside the class are quite big (some are 1000x1000), so copies would very quickly become time-costly.
Thanks in advance, and sorry if I am not clear...
Edit:
I am using a remote machine with many (64) CPUs, running Debian GNU/Linux 10 (buster).
Edit2:
I made a mistake in my original post: in the end, the temperatures must be exchanged between the class instances, not inside the global Tlist.
Edit3: Charchit's answer works perfectly for the test code, on both my personal machine and the remote machine I usually use to run my code. I hence mark it as the accepted answer.
However, I want to report that when I insert my actual, more complicated code instead of the oversimplified monte_carlo class, the remote machine gives me some strange errors:
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gtk-WARNING **: ##:##:##:###: Locale not supported by C library.
Using the fallback 'C' locale.
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
The "##:##:##:###" are (or seems like) IP adresses.
Without the call to set_start_method('spawn') this error shows only once, in the very beginning, while when I use this method, it seems to show at every occurrence of result.get()...
The strangest thing is that the code seems otherwise to work fine, does not crash, produces the datafiles I then ask it to, etc...
I think this would deserve to publish a new question, but I put it here nonetheless in case someone has a quick answer.
If not, I will resort to add one by one the variables, methods, etc... that are present in my actual code but not in the test example, to try and find the origin of the bug. My best guess for now is that the memory space required by each child-process with the actual code, is too large for the distant machine to accept it, due to some restrictions implemented by the admin.
What you are looking for is sharing state between processes. As per the documentation, you can either create shared memory, which is restrictive about the data it can store and is not thread-safe, but offers better speed and performance; or you can use server processes through managers. The latter is what we are going to use, since you want to share whole objects of user-defined data types. Keep in mind that using managers will impact the speed of your code, depending on the complexity of the arguments you pass to and receive from the managed objects.
Managers, proxies and pickling
As mentioned, managers create server processes to store objects and allow access to them through proxies. I have answered a question with more detail on how they work, and how to create a suitable proxy, here. We are going to use the same proxy defined in the linked answer, with some variations. Namely, I have replaced the factory functions inside __getattr__ with something that can be pickled using pickle. This means that you can run instance methods of managed objects created with this proxy without resorting to using multiprocess. The result is this modified proxy:
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np

class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)

class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data type. The proxy instance will have the namespace and
    functions of the data type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes."""

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result
Solution
Now we only need to make sure that when we create objects of monte_carlo, we do so using managers and the above proxy. For that, we create a class constructor called create. All monte_carlo objects should be created with this function. With that, the final code looks like this:
from multiprocessing import Pool
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np

class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)

class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data type. The proxy instance will have the namespace and
    functions of the data type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes."""

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result

class monte_carlo:
    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []
        self.T = None

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)
        return

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = (x[k] + np.random.uniform(-1, 1, 3))
        temp_E = np.mean(x)  # energy of the proposed configuration
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x
        return

    @classmethod
    def create(cls, *args, **kwargs):
        # Register class
        class_str = cls.__name__
        BaseManager.register(class_str, cls, ObjProxy, exposed=tuple(dir(cls)))
        # Start a manager process
        manager = BaseManager()
        manager.start()
        # Create and return this proxy instance. Using this proxy allows sharing of state between processes.
        inst = eval("manager.{}(*args, **kwargs)".format(class_str))
        return inst

def proba(dE, pT):
    return np.exp(-dE / pT)

if __name__ == "__main__":
    Tlist = [1.1, 1.2, 1.3]
    N = len(Tlist)
    G = []

    # Create our managed instances
    for _ in range(N):
        G.append(monte_carlo.create())

    for _ in range(5):
        # Run simulations in the manager server
        results = []
        with Pool(8) as pool:
            for i in range(N):  # this loop should be run in multiprocess
                results.append(pool.apply_async(G[i].simulation, (Tlist[i],)))

            # Wait for the simulations to complete
            for result in results:
                result.get()

        for i in range(N // 2):
            dE = G[i].E - G[i + 1].E
            pT = G[i].T + G[i + 1].T
            p = proba(dE, pT)  # (proba is a function, giving a probability depending on dE)
            if np.random.random() < p:
                T_temp = Tlist[i]
                Tlist[i] = Tlist[i + 1]
                Tlist[i + 1] = T_temp

    print(Tlist)
This meets the criteria you wanted. It does not create any copies at all; rather, all arguments to the simulation method call are serialized inside the pool and sent to the manager server, where the object is actually stored. The method executes there, and the results (if any) are serialized and returned to the main process. All of this using only the builtins!
Output
[1.2, 1.1, 1.3]
Edit
Since you are using Linux, I encourage you to use multiprocessing.set_start_method inside the if __name__ ... clause to set the start method to "spawn". Doing this will ensure that the child processes do not have access to variables defined inside the clause.
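For example, a sketch of where that call would go (it is not part of the code above):

import multiprocessing

if __name__ == "__main__":
    # must run once, before any Pool or BaseManager is started
    multiprocessing.set_start_method("spawn")
    # ... create the managed instances and run the simulation loop as above ...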

Dart: Store heavy object in an isolate and access its method from main isolate without reinstatiating it

Is it possible in Dart to instantiate a class in an isolate, and then send messages to this isolate to receive return values from its methods (instead of spawning a new isolate and re-instantiating the same class every time)? I have a class with a long initialization and heavy methods. I want to initialize it once and then access its methods without compromising the performance of the main isolate.
Edit: I mistakenly answered this question thinking Python rather than Dart. Snakes on the brain / snakes on a plane.
I am not familiar with Dart programming, but it would seem the concurrency model has a lot of similarities (isolated memory, message passing, etc.). I was able to find an example of two-way message passing with a Dart isolate. There's a little difference in how it gets set up, and the streams are a bit simpler than Python Queues, but in general the idea is the same.
Basically:
Create a port to receive data from the isolate
Create the isolate passing it the port it will send data back on
Within the isolate, create the port it will listen on, and send the other end of it back to main (so main can send messages)
Determine and implement a simple messaging protocol for remote method call on an object contained within the isolate.
This basically duplicates what a multiprocessing.Manager class does, but it can be helpful to have a simplified example of how it can work:
from multiprocessing import Process, Lock, Queue
from time import sleep

class HeavyObject:
    def __init__(self, x):
        self._x = x
        sleep(5)  # heavy init

    def heavy_method(self, y):
        sleep(.2)  # medium-weight method
        return self._x + y

def HO_server(in_q, out_q):
    ho = HeavyObject(5)
    # msg format for remote method call: ("method_name", (arg1, arg2, ...), {"kwarg1": 1, "kwarg2": 2, ...})
    # pass None to exit worker cleanly
    for msg in iter(in_q.get, None):  # get a remote call message from the queue
        out_q.put(getattr(ho, msg[0])(*msg[1], **msg[2]))  # call the method with the args, and put the result back on the queue

class RMC_helper:  # remote method caller for convenience
    def __init__(self, in_queue, out_queue, lock):
        self.in_q = in_queue
        self.out_q = out_queue
        self.l = lock
        self.method = None

    def __call__(self, *args, **kwargs):
        if self.method is None:
            raise Exception("no method to call")
        with self.l:  # isolate access to queues so results don't pile up and get popped off in possibly wrong order
            print("put to queue: ", (self.method, args, kwargs))
            self.in_q.put((self.method, args, kwargs))
            return self.out_q.get()

    def __getattr__(self, name):
        if not name.startswith("__"):
            self.method = name
            return self
        else:
            super().__getattr__(name)

def child_worker(remote):
    print("child", remote.heavy_method(5))  # prints 10
    sleep(3)  # child works on something else
    print("child", remote.heavy_method(2))  # prints 7

if __name__ == "__main__":
    in_queue = Queue()
    out_queue = Queue()
    lock = Lock()  # lock is used so as not to confuse which reply goes to which request
    remote = RMC_helper(in_queue, out_queue, lock)
    Server = Process(target=HO_server, args=(in_queue, out_queue))
    Server.start()
    Worker = Process(target=child_worker, args=(remote,))
    Worker.start()
    print("main", remote.heavy_method(3))  # this will *probably* start first due to startup time of child
    Worker.join()
    with lock:
        in_queue.put(None)
    Server.join()
    print("done")

Django initialising AppConfig multiple times

I wanted to use the ready() hook in my AppConfig to start a django-rq scheduler job. However, it does so multiple times, every time I start the server. I imagine that's due to threading; however, I can't seem to find a suitable workaround. This is my AppConfig:
from datetime import datetime

import django_rq
from django.apps import AppConfig

class AnalyticsConfig(AppConfig):
    name = 'analytics'

    def ready(self):
        print("Init scheduler")
        from analytics.services import save_hits
        scheduler = django_rq.get_scheduler('analytics')
        scheduler.schedule(datetime.utcnow(), save_hits, interval=5)
Now when I do runserver, Init scheduler is displayed 3 times. I've done some digging, and following this question I started the server with --noreload, which didn't help (I still got Init scheduler x3). I also tried putting
import os

if os.environ.get('RUN_MAIN', None) != 'true':
    default_app_config = 'analytics.apps.AnalyticsConfig'
in my __init__.py; however, RUN_MAIN appears to be None every time.
Afterwards I created a FileLock class to skip configuration after the first initialization, which looks like this:
class FileLock:
    def __get__(self, instance, owner):
        return os.access(f"{instance.__class__.__name__}.lock", os.F_OK)

    def __set__(self, instance, value):
        if not isinstance(value, bool):
            raise AttributeError
        if value:
            f = open(f"{instance.__class__.__name__}.lock", 'w+')
            f.close()
        else:
            os.remove(f"{instance.__class__.__name__}.lock")

    def __delete__(self, obj):
        raise AttributeError

class AnalyticsConfig(AppConfig):
    name = 'analytics'
    locked = FileLock()

    def ready(self):
        from analytics.services import save_hits
        if not self.locked:
            print("Init scheduler")
            scheduler = django_rq.get_scheduler('analytics')
            scheduler.schedule(datetime.utcnow(), save_hits, interval=5)
            self.locked = True
This does work; however, the lock is not destroyed after the app quits. I tried removing the .lock files in settings.py, but it also runs multiple times, making this pointless.
My question is: how can I prevent Django from calling ready() multiple times, or how otherwise can I tear down the .lock files after Django exits or right after it boots?
I'm using Python 3.8 and Django 3.1.5.
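A minimal sketch of the second option the question mentions (tearing the lock file down at interpreter exit), using Python's atexit hook. The lock-file name is an assumption matching the FileLock naming above, and note that atexit does not run if the process is killed hard (e.g. SIGKILL):

import atexit
import os

LOCK_FILE = "AnalyticsConfig.lock"  # assumed name, matching FileLock above

def _remove_lock():
    # delete the lock file on normal interpreter exit so the
    # next boot starts unlocked
    if os.access(LOCK_FILE, os.F_OK):
        os.remove(LOCK_FILE)

atexit.register(_remove_lock)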

QObject and QThread relations

I have a PyQt4 GUI which allows me to import multiple .csv files. I've created a loop that goes through a list of tuples, each with the following fields: (filename + location of file, filename, bool, bool, set of dates in file) = tup.
I've created several classes that my GUI frequently refers to in order to pull parameters off a project's profile. Let's call this class profile(). I also have another class with a lot of formatting functions, for things like datetime handling, text edits, etc. Let's call this class MyFormatting(). Then I created a QThread class that is instantiated for each file in the list; this one is called Import_File(QThread), and it takes a few parameters for its __init__(self, tup).
My goal is to make an independent instance of MyFormatting() and profile() for each Import_File(QThread). I am trying to get my head around how to utilize QObject capabilities to solve this, but I keep getting the error that the thread is being destroyed while still running.
for tup in importedlist:
    importfile = Import_File(tup)
    self.connect(importfile, QtCore.SIGNAL('savedfile(PyQt_PyObject)'), self.printstuffpassed)
    importfile.start()
I was thinking of having the two classes be declared as
class MyFormatting(QObject):
    def __init__(self):
        QObject.__init__(self)

    def func1(self, stuff):
        dostuff

    def func2(self):
        morestuff

class profile(QObject):
    def __init__(self):
        QObject.__init__(self)

    def func11(self, stuff):
        dostuff

    def func22(self):
        morestuff
AND for the QThread:
class Import_File(QThread):
    def __init__(self, tup):
        QThread.__init__(self)
        common_calc_stuff = self.calc(tup[4])
        f = open(tup[0] + '.csv', 'w')
        self.tup = tup
        # this is where I thought of pulling an instance just for this thread
        self.MF = MyFormatting()
        self.MF_thread = QtCore.QThread()
        self.MF.moveToThread(self.MF_thread)
        self.MF_thread.start()
        self.prof = profile()
        self.prof_thread = QtCore.QThread()
        self.prof.moveToThread(self.prof_thread)
        self.prof_thread.start()

    def func33(self, stuff):
        dostuff
        self.prof.func11(tup[4])

    def func44(self):
        morestuff

    def run(self):
        if self.tup[3] == True:
            self.func33
            self.MF.func2
        elif self.tup[3] == False:
            self.func44
        if self.tup[2] == True:
            self.prof.func22
        self.emit(QtCore.SIGNAL('savedfile()'))
Am I thinking about this totally the wrong way? How can I keep roughly the same code structure and still implement the multithreading, without having the same resource tapped at the same time, which I think is the reason my GUI keeps crashing? And how can I make sure that each instance of those objects gets shut down so that it doesn't interfere with the other instances?
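One common cause of "QThread: Destroyed while thread is still running" in a loop like the one above is that importfile is rebound on every iteration, so the previous Import_File object can be garbage-collected while its thread is still running. A minimal sketch of the usual remedy, keeping a reference to each instance alive (importedlist and printstuffpassed are the question's names):

# keep every Import_File instance alive for the lifetime of its thread
self.import_threads = []
for tup in importedlist:
    importfile = Import_File(tup)
    self.connect(importfile, QtCore.SIGNAL('savedfile(PyQt_PyObject)'), self.printstuffpassed)
    self.import_threads.append(importfile)  # prevent garbage collection
    importfile.start()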
