Python - How to use multiprocessing Lock in class instance? - python-3.x

I am using Python 3.7 on Windows.
What I am trying to do:
- block a method of a class instance while another process holds that same lock.
Attempts:
I have already done this successfully, but I don't want a global variable for the lock; I want one that is completely internal to the class:
from multiprocessing import Lock, freeze_support, Pool
from time import sleep

def do_work(name):
    print(name + ' waiting for lock to work...', end='')
    sleep(2)
    with lock:
        print('done!')
        print(name + ' doing work...', end='')
        sleep(5)
        print('done!')

def init(olock):
    global lock
    lock = olock

if __name__ == '__main__':
    freeze_support()
    args_list = [('a'), ('b'), ('c')]
    lock = Lock()
    p = Pool(8, initializer=init, initargs=(lock,))
    p.map_async(do_work, args_list)
    p.close()
    p.join()
When this last chunk of code runs, it takes ~17.3 seconds because of the lock: the three 2-second sleeps overlap, but the three 5-second critical sections run one after another (roughly 2 + 3 × 5 = 17 s). Without the lock it takes ~7 seconds.
I have tried to implement this inside a class, but the lock does nothing, and it always runs in ~7 seconds.
class O():
    def __init__(self):
        self.lock = Lock()

    def __getstate__(self):  # remove the multiprocessing object(s) so the instance can be pickled
        self_dict = self.__dict__.copy()
        del self_dict['lock']
        return self_dict

    def __setstate__(self, state):  # restore the pickled attributes (the lock is not restored)
        self.__dict__.update(state)

    def _do_work(self, name):
        print(name + ' waiting for lock to work...', end='')
        sleep(2)
        with self.lock:
            print('done!')
            print(name + ' doing work...', end='')
            sleep(5)
            print('done!')

if __name__ == '__main__':
    freeze_support()
    c = O()
    pool = Pool(8)
    pool.apply_async(c._do_work, ('a',))
    pool.apply_async(c._do_work, ('b',))
    pool.apply_async(c._do_work, ('c',))
    pool.close()
    pool.join()
Question:
So, what can I do to lock this class instance while one of its methods interacts with a resource asynchronously through multiprocessing?

apply_async will pickle the function object and send it to a pool worker process through a queue, but since c._do_work is a bound method, the instance will be pickled along with it, which results in an error. You could wrap it in a plain function:
c = O()

def w(*args):
    return c._do_work(*args)

if __name__ == '__main__':
    pool = Pool(1)
    pool.apply_async(w, ('a',))
    ...
and you should remove __setstate__/__getstate__.
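If you specifically want the lock to stay "inside" the class rather than in a module-level global, one possible arrangement is the following minimal, untested sketch, which combines the Pool initializer pattern from your first snippet with a class attribute; the init(), work() and O.lock names are illustrative choices, not anything required by multiprocessing:

from multiprocessing import Lock, Pool, freeze_support
from time import sleep

class O():
    lock = None  # filled in by the Pool initializer inside every worker process

    def _do_work(self, name):
        print(name + ' waiting for lock to work...', end='')
        sleep(2)
        with self.lock:  # resolves to the class attribute set by init()
            print('done!')
            print(name + ' doing work...', end='')
            sleep(5)
            print('done!')

def init(olock):
    # runs once per worker; the lock passed through initargs is the shared one
    O.lock = olock

def work(name):
    # plain top-level function, so only a string has to be pickled;
    # the O() instance is created inside the worker, so no __getstate__ tricks are needed
    return O()._do_work(name)

if __name__ == '__main__':
    freeze_support()
    lock = Lock()
    p = Pool(8, initializer=init, initargs=(lock,))
    p.map_async(work, ['a', 'b', 'c'])
    p.close()
    p.join()

With this arrangement the three 5-second critical sections should serialize again, giving roughly the ~17 seconds you saw with the global lock.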

Related

NameError: name 'lock' is not defined

I have a Python program as follows:
import multiprocessing

def send_request(data):
    global lock
    lock.acquire()
    print(data)
    lock.release()

if __name__ == '__main__':
    data_list = ['data1', 'data2', 'data3']
    lock = multiprocessing.Lock()
    pool = multiprocessing.Pool(3)
    pool.map(send_request, data_list)
    pool.close()
    pool.join()
Why did this error occur? NameError: name 'lock' is not defined.
Update:
In the answer below, @Jean-François Fabre said that the reason is that "when running your subprocesses, Python is 'forking' and doesn't see the lock declaration, because it doesn't execute the __main__ part in subprocesses."
But in the following example the subprocesses should not see the lock definition either, so why does this program work fine?
import multiprocessing
import os

def send_request(data):
    lock.acquire()
    print(data, ' ', os.getpid())
    lock.release()

def init(l):
    global lock
    lock = l

if __name__ == '__main__':
    data_list = ['data1', 'data2', 'data3']
    lock = multiprocessing.Lock()
    pool = multiprocessing.Pool(8, initializer=init, initargs=(lock,))
    pool.map(send_request, data_list)
    pool.close()
    pool.join()
In the context of multiprocessing, you have to do more than that.
When running your subprocesses, Python is "forking" and doesn't see the lock declaration, because it doesn't execute the __main__ part in the subprocesses.
Plus, Windows doesn't have fork: the forking is emulated there, which leads to different behaviour compared to Unix-like platforms. In a nutshell, fork can resume the new process from the point where the old process was, whereas on Windows Python has to run a new process from the beginning and take control afterwards, and this leads to side effects.
You have to create your lock as a global variable, outside the __main__ test (and you can drop the global keyword, it will work without it):
import multiprocessing

lock = multiprocessing.Lock()

def send_request(data):
    lock.acquire()
    print(data)
    lock.release()
with those modifications your program prints
data1
data2
data3
as expected.
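If you want to check which of those two behaviours applies on your machine, multiprocessing can tell you the start method it is using; a quick, purely illustrative check:

import multiprocessing

if __name__ == '__main__':
    # 'fork' on most Unix-like systems, 'spawn' on Windows (and on macOS since Python 3.8)
    print(multiprocessing.get_start_method())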

call method on running process from parent process

I'm trying to write a program that interfaces with hardware via pyserial according to this diagram: https://github.com/kiyoshi7/Intrument/blob/master/Idea.gif . My problem is that I don't know how to tell the child process to run a method.
I tried reducing my problem down to its essence: being able to call the method request() from the main script. I just don't know how to handle two-way communication like this; in examples using a queue I only see data being shared, or I can't understand the examples.
import multiprocessing
from time import sleep

class spawn:
    def __init__(self, _number, _max):
        self._number = _number
        self._max = _max
        self.Update()

    def request(self, x):
        print("{} was requested.".format(x))

    def Update(self):
        while True:
            print("Spawned {} of {}".format(self._number, self._max))
            sleep(2)

if __name__ == '__main__':
    p = multiprocessing.Process(target=spawn, args=(1, 1))
    p.start()
    sleep(5)
    p.request(2)  # here I'm trying to run the method I want
Update, thanks to Carcigenicate:
import multiprocessing
from time import sleep
from operator import methodcaller

class Spawn:
    def __init__(self, _number, _max):
        self._number = _number
        self._max = _max
        # Don't call update here

    def request(self, x):
        print("{} was requested.".format(x))

    def update(self):
        while True:
            print("Spawned {} of {}".format(self._number, self._max))
            sleep(2)

if __name__ == '__main__':
    spawn = Spawn(1, 1)  # Create the object as normal
    p = multiprocessing.Process(target=methodcaller("update"), args=(spawn,))  # Run the loop in the process
    p.start()
    while True:
        sleep(1.5)
        spawn.request(2)  # Now you can reference the "spawn"
You're going to need to rearrange things a bit. I would not do the long running (infinite) work from the constructor. That's generally poor practice, and is complicating things here. I would instead initialize the object, then run the loop in the separate process:
import multiprocessing
from operator import methodcaller
from time import sleep

class Spawn:
    def __init__(self, _number, _max):
        self._number = _number
        self._max = _max
        # Don't call update here

    def request(self, x):
        print("{} was requested.".format(x))

    def update(self):
        while True:
            print("Spawned {} of {}".format(self._number, self._max))
            sleep(2)

if __name__ == '__main__':
    spawn = Spawn(1, 1)  # Create the object as normal
    p = multiprocessing.Process(target=methodcaller("update"), args=(spawn,))  # Run the loop in the process
    p.start()
    spawn.request(2)  # Now you can reference the "spawn" object to do whatever you like
Unfortunately, since Process requires that its target argument be pickleable, you can't just use a lambda wrapper like I originally had (whoops). I'm using operator.methodcaller to create a pickleable wrapper: methodcaller("update") returns a callable that calls update on whatever is passed to it, and we pass it spawn to call it on.
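If the methodcaller indirection is unclear, here is a tiny self-contained illustration (the Demo class is just a stand-in, not part of the code above):

from operator import methodcaller

class Demo:
    def update(self):
        print("update() was called")

call_update = methodcaller("update")  # "call .update() on whatever I'm given"
call_update(Demo())                   # equivalent to Demo().update()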
You could also create a wrapper function using def:
def wrapper():
    spawn.update()

. . .

p = multiprocessing.Process(target=wrapper)  # Run the loop in the process
But that only works if it's feasible to have wrapper as a global function. You may need to play around to find out what works best, or use a multiprocessing library that doesn't require pickleable tasks.
Note, please use proper Python naming conventions. Class names start with capitals, and method names are lowercase. I fixed that up in the code I posted.
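If you eventually need genuine two-way communication with the child (so that request() runs in the process that owns the serial port), one common pattern is to have the child's loop poll a multiprocessing.Queue for commands. The following is only a sketch of that idea; the run() wrapper and the commands queue are my additions, not part of the question's code:

import multiprocessing
from queue import Empty
from time import sleep

class Spawn:
    def __init__(self, _number, _max):
        self._number = _number
        self._max = _max

    def request(self, x):
        print("{} was requested.".format(x))

    def update(self, commands):
        # the loop now also drains a command queue filled by the parent
        while True:
            print("Spawned {} of {}".format(self._number, self._max))
            try:
                x = commands.get_nowait()
                self.request(x)  # the method runs inside the child process
            except Empty:
                pass
            sleep(2)

def run(spawn, commands):
    spawn.update(commands)

if __name__ == '__main__':
    commands = multiprocessing.Queue()
    spawn = Spawn(1, 1)
    p = multiprocessing.Process(target=run, args=(spawn, commands))
    p.start()
    sleep(5)
    commands.put(2)  # ask the child to call request(2)

Note that calling spawn.request(2) in the parent, as in the answer above, only runs the method on the parent's copy of the object; putting a message on the queue is what actually reaches the child.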

Python multiprocessing script partial output

I am following the principles laid down in this post to safely output the results, which will eventually be written to a file. Unfortunately, the code only prints 1 and 2, and not 3 to 6.
import os
import argparse
import pandas as pd
import multiprocessing
from multiprocessing import Process, Queue
from time import sleep

def feed(queue, parlist):
    for par in parlist:
        queue.put(par)
    print("Queue size", queue.qsize())

def calc(queueIn, queueOut):
    while True:
        try:
            par = queueIn.get(block=False)
            res = doCalculation(par)
            queueOut.put((res))
            queueIn.task_done()
        except:
            break

def doCalculation(par):
    return par

def write(queue):
    while True:
        try:
            par = queue.get(block=False)
            print("response:", par)
        except:
            break

if __name__ == "__main__":
    nthreads = 2
    workerQueue = Queue()
    writerQueue = Queue()
    considerperiod = [1, 2, 3, 4, 5, 6]
    feedProc = Process(target=feed, args=(workerQueue, considerperiod))
    calcProc = [Process(target=calc, args=(workerQueue, writerQueue)) for i in range(nthreads)]
    writProc = Process(target=write, args=(writerQueue,))
    feedProc.start()
    feedProc.join()
    for p in calcProc:
        p.start()
    for p in calcProc:
        p.join()
    writProc.start()
    writProc.join()
On running the code it prints,
$ python3 tst.py
Queue size 6
response: 1
response: 2
Also, is it possible to ensure that the write function always outputs 1,2,3,4,5,6 i.e. in the same order in which the data is fed into the feed queue?
The error is with the task_done() call: multiprocessing.Queue has no task_done() method (that belongs to queue.Queue and multiprocessing.JoinableQueue), so the call raises AttributeError, which your bare except silently turns into a break after each worker has handled its first item. If you remove that call it works, but then the loop only ends because queueIn.get(block=False) throws an exception once the queue is empty. That might be just enough for your use case; a better way, though, is to use sentinels (as suggested in the multiprocessing docs, see the last example). Here's a little rewrite so your program uses sentinels:
import os
import argparse
import multiprocessing
from multiprocessing import Process, Queue
from time import sleep

def feed(queue, parlist, nthreads):
    for par in parlist:
        queue.put(par)
    for i in range(nthreads):
        queue.put(None)
    print("Queue size", queue.qsize())

def calc(queueIn, queueOut):
    while True:
        par = queueIn.get()
        if par is None:
            break
        res = doCalculation(par)
        queueOut.put((res))

def doCalculation(par):
    return par

def write(queue):
    while not queue.empty():
        par = queue.get()
        print("response:", par)

if __name__ == "__main__":
    nthreads = 2
    workerQueue = Queue()
    writerQueue = Queue()
    considerperiod = [1, 2, 3, 4, 5, 6]
    feedProc = Process(target=feed, args=(workerQueue, considerperiod, nthreads))
    calcProc = [Process(target=calc, args=(workerQueue, writerQueue)) for i in range(nthreads)]
    writProc = Process(target=write, args=(writerQueue,))
    feedProc.start()
    feedProc.join()
    for p in calcProc:
        p.start()
    for p in calcProc:
        p.join()
    writProc.start()
    writProc.join()
A few things to note:
- The sentinel is a None put into the queue. Note that you need one sentinel for every worker process.
- For the write function you don't need sentinel handling, because there is only one consumer and no concurrency to worry about. (If you used the empty()-then-get() pattern in your calc function, you could run into a problem: with one item left in the queue, both workers may see empty() == False at the same time, both call get(), and one of them blocks forever.)
- You don't need to put feed and write into processes; just put them into your main function, since you don't want to run them in parallel anyway (see the sketch below).
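A minimal sketch of that last point, keeping the same sentinel-based calc workers but doing the feeding and writing inline in __main__ (doCalculation is still just a placeholder):

from multiprocessing import Process, Queue

def doCalculation(par):
    return par

def calc(queueIn, queueOut):
    while True:
        par = queueIn.get()
        if par is None:  # sentinel: no more work
            break
        queueOut.put(doCalculation(par))

if __name__ == "__main__":
    nthreads = 2
    workerQueue = Queue()
    writerQueue = Queue()
    considerperiod = [1, 2, 3, 4, 5, 6]

    # "feed" inline: fill the queue, then one sentinel per worker
    for par in considerperiod:
        workerQueue.put(par)
    for _ in range(nthreads):
        workerQueue.put(None)

    calcProc = [Process(target=calc, args=(workerQueue, writerQueue)) for _ in range(nthreads)]
    for p in calcProc:
        p.start()
    for p in calcProc:
        p.join()

    # "write" inline: the workers have finished, so just drain the results
    while not writerQueue.empty():
        print("response:", writerQueue.get())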
how can I have the same order in output as in input? [...] I guess multiprocessing.map can do this
Yes, map keeps the order. Here is your program rewritten into something simpler (you don't need the workerQueue and writerQueue), with random sleeps added to show that the output is still in order:
from multiprocessing import Pool
import time
import random

def calc(val):
    time.sleep(random.random())
    return val

if __name__ == "__main__":
    considerperiod = [1, 2, 3, 4, 5, 6]
    with Pool(processes=2) as pool:
        print(pool.map(calc, considerperiod))

How to know if a Python Queue's worker threads are all active?

I am using code like the below for multithreading in Python 3. I have tried cpu_count() times 2, 3 and 4 threads, but I am not sure whether all of those threads are being used. How can I check if some of them are never used?
queue = Queue()
for x in range(cpu_count() * 2):
    worker = DownloadWorker(queue)
    worker.daemon = True
    worker.start()
queue.join()

class DownloadWorker(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            link, download_path = self.queue.get()
            download_link(link, download_path)
            self.queue.task_done()

def downloadImage(imageServer, imageLocal, queue):
    queue.put((imageServer, imageLocal))
If you want to know whether all your threads are working, you can just print the thread name every time a thread starts a task:
from threading import Thread
from queue import Queue
import random
import time

class DownloadWorker(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            self.queue.get()
            print('Thread: {}'.format(self.name))
            time.sleep(random.random())

queue = Queue()
for i in range(100):
    queue.put('data')
    queue.task_done()

for x in range(4):
    worker = DownloadWorker(queue)
    worker.daemon = True
    worker.start()

time.sleep(10)
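If you would rather see a summary than read the interleaved prints, a small variant of the same idea (my own sketch, not part of the original answer) counts how many items each thread handled, using collections.Counter and queue.join():

from threading import Thread, current_thread
from queue import Queue
from collections import Counter
import random
import time

counts = Counter()

class DownloadWorker(Thread):
    def __init__(self, queue):
        Thread.__init__(self)
        self.queue = queue
        self.daemon = True

    def run(self):
        while True:
            self.queue.get()
            counts[current_thread().name] += 1  # each thread only touches its own key
            time.sleep(random.random())
            self.queue.task_done()

queue = Queue()
for i in range(100):
    queue.put('data')
for x in range(4):
    DownloadWorker(queue).start()

queue.join()   # wait until every item has been processed
print(counts)  # e.g. Counter({'Thread-2': 27, 'Thread-1': 26, ...})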
Queue uses threading.Condition internally to block/release threads that called get() and threading.Condition uses a threading.Lock. From the documentation of threading.Lock:
"When more than one thread is blocked in acquire() waiting for the state to turn to unlocked, only one thread proceeds when a release() call resets the state to unlocked; which one of the waiting threads proceeds is not defined, and may vary across implementations."
I hope this answers the question.

Threaded result not giving same result as un-threaded result (python)

I have created a program to generate data points of functions that I later plot. The program takes a class which defines the function and creates a data-outputting object which, when called, writes the data to a text file. To make the whole process faster I put the jobs in threads; however, when I do, the generated data is not always correct. I have attached a picture to show what I mean:
Here are some of the relevant bits of code:
from queue import Queue
import threading
import time

queueLock = threading.Lock()
workQueue = Queue(10)

def process_data(threadName, q, queue_window, done):
    while not done.get():
        queueLock.acquire()  # check whether or not the queue is locked
        if not workQueue.empty():
            data = q.get()
            # data is the Plot object to be run
            queueLock.release()
            data.parent_window = queue_window
            data.process()
        else:
            queueLock.release()
            time.sleep(1)

class WorkThread(threading.Thread):
    def __init__(self, threadID, q, done):
        threading.Thread.__init__(self)
        self.ID = threadID
        self.q = q
        self.done = done

    def get_qw(self, queue_window):
        # gets the queue_window object
        self.queue_window = queue_window

    def run(self):
        # this is called when thread.start() is called
        print("Thread {0} started.".format(self.ID))
        process_data(self.ID, self.q, self.queue_window, self.done)
        print("Thread {0} finished.".format(self.ID))

class Application(Frame):
    def __init__(self, etc):
        self.threads = []
        # does some things

    def makeThreads(self):
        for i in range(1, int(self.threadNum.get()) + 1):
            thread = WorkThread(i, workQueue, self.calcsDone)
            self.threads.append(thread)

    # more code which just processes the function etc, sorts out the gui stuff.
And in a separate class (I'm using tkinter, so the actual code that starts the threads is called from a different window; self.parent is the Application class):
def run_jobs(self):
    if self.running == False:
        # threads are only initiated when jobs are to be run
        self.running = True
        self.parent.calcsDone.set(False)
        self.parent.threads = []  # just to make sure that it is initially empty, we want new threads each time
        self.parent.makeThreads()
        self.threads = self.parent.threads
        for thread in self.threads:
            thread.get_qw(self)
            thread.start()
        # put the jobs in the workQueue
        queueLock.acquire()
        for job in self.job_queue:
            workQueue.put(job)
        queueLock.release()
    else:
        messagebox.showerror("Error", "Jobs already running")
This is all the code which relates to the threads.
I don't know why some data points are incorrect when I run the program with multiple threads, while with just a single thread the data is all perfect. I tried looking up "thread-safe" processes but couldn't find anything.
Thanks in advance!
