NameError: name 'lock' is not defined - python-3.x

I have a Python program as follows:
import multiprocessing

def send_request(data):
    global lock
    lock.acquire()
    print(data)
    lock.release()

if __name__ == '__main__':
    data_list = ['data1', 'data2', 'data3']
    lock = multiprocessing.Lock()
    pool = multiprocessing.Pool(3)
    pool.map(send_request, data_list)
    pool.close()
    pool.join()
Why did this error occur: NameError: name 'lock' is not defined?

Update:

In the answer below, Jean-François Fabre says the reason is that “When running your subprocesses, Python is "forking" and doesn't see the lock declaration, because it doesn't execute the __main__ part in subprocesses.”

But in the following example the subprocesses shouldn't see the lock definition either, so why does this program work fine?
import multiprocessing
import os

def send_request(data):
    lock.acquire()
    print(data, ' ', os.getpid())
    lock.release()

def init(l):
    global lock
    lock = l

if __name__ == '__main__':
    data_list = ['data1', 'data2', 'data3']
    lock = multiprocessing.Lock()
    pool = multiprocessing.Pool(8, initializer=init, initargs=(lock,))
    pool.map(send_request, data_list)
    pool.close()
    pool.join()

In the context of multiprocessing, you have to do more than that.
When running your subprocesses, Python is "forking" and doesn't see the lock declaration, because it doesn't execute the __main__ part in subprocesses.
Plus, Windows doesn't have fork: the forking is emulated there, leading to different behaviour compared to Unix-like platforms. In a nutshell, fork resumes the new process from the point where the parent left off, whereas on Windows Python has to run the new process from the beginning and take control afterwards, which leads to side effects.
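To see the difference directly, here is a minimal sketch (not from the original answer) that forces the start method via multiprocessing.set_start_method: with 'fork' (Unix) the child inherits lock and runs fine, while with 'spawn' it dies with the same NameError, because the child re-imports the module and never executes the __main__ block:

import multiprocessing

def child():
    # 'lock' exists here only if the child inherited the parent's globals
    lock.acquire()
    print("got the lock")
    lock.release()

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')  # try 'fork' on Unix: it works
    lock = multiprocessing.Lock()
    p = multiprocessing.Process(target=child)
    p.start()
    p.join()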
You have to create your lock as a global variable, outside the __main__ test (and you can drop the global keyword; it will work without it). The initializer/initargs version from your update works for a related reason: the parent hands the lock to each worker as it starts, so every worker ends up with a handle to the same underlying lock.
import multiprocessing

lock = multiprocessing.Lock()

def send_request(data):
    lock.acquire()
    print(data)
    lock.release()
With those modifications, your program prints
data1
data2
data3
as expected.

Related

Runtime error using concurrent.futures.ProcessPoolExecutor

I have seen many basic YouTube tutorials for concurrent.futures.ProcessPoolExecutor. I have also seen posts on SO here and here, on GitHub and GitHubMemory, yet no luck.
Problem:
I'm getting the following runtime error:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
I admit I do not fully understand this error, since this is my very first attempt at multiprocessing in my Python code.
Here's my pseudocode:
module.py

import xyz
from multiprocessing import freeze_support

def abc():
    return x

def main():
    xyz
    qwerty

if __name__ == "__main__":
    freeze_support()
    obj = Object()
    main()

classObject.py

import abcd

class Object(object):
    def __init__(self):
        asdf
        cvbn
        with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
            executor.map(self.function_for_multiprocess, var1, var2)
            # The error points at the code above. 👆👆👆

    def function_for_multiprocess(var1, var2):
        doSomething1
        doSomething2
        self.variable = something
My class file (classObject.py) does not have the "main" guard.
Things I have tried:
- Adding if __name__ == "__main__": and freeze_support() in classObject.py, along with renaming __init__() to main()
- While doing the above, removing freeze_support() from module.py
I haven't found a different solution from the link provided above. Any insights would be greatly appreciated!
I'm using a MacBook Pro (16-inch, 2019), 2.3 GHz 8-Core Intel Core i9, OS: Big Sur. I don't think that matters, but I'm declaring it in case it does.
You need to pass the arguments as a single picklable iterable, such as a list or a tuple (map treats each extra positional argument as a separate iterable to zip over). And you don't need freeze_support().
Just change executor.map(self.function_for_multiprocess, var1, var2)
to executor.map(self.function_for_multiprocess, (var1, var2)):
from multiprocessing import freeze_support
import concurrent.futures

class Object(object):
    def __init__(self, var1=1, var2=2):
        with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
            # each worker call receives one element of the (var1, var2) tuple
            executor.map(self.function_for_multiprocess, (var1, var2))

    def function_for_multiprocess(self, value):
        print('value:', value)

def abc(x):
    return x

def main():
    print('abc:', abc(200))

if __name__ == "__main__":
    # freeze_support()
    obj = Object()
    main()
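For reference, map treats every positional argument after the function as a separate iterable and zips them, calling the function once per zipped tuple; two bare ints aren't iterable, which is why the original call fails. A small illustration of that zipping behaviour (not part of the original answer, using a hypothetical add function):

from concurrent.futures import ProcessPoolExecutor

def add(a, b):
    return a + b

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as ex:
        # the two iterables are zipped pairwise: add(1, 10), add(2, 20), add(3, 30)
        print(list(ex.map(add, [1, 2, 3], [10, 20, 30])))  # [11, 22, 33]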

How to transfer data between two separate scripts in Multiprocessing?

I am using multiprocessing to run two Python scripts in parallel. p1.py continually updates a certain variable, and the latest value of that variable is to be displayed by p2.py every 2 seconds. The code for the multiprocessing of the two scripts is given below:
import os
from multiprocessing import Process

def script1():
    os.system("p1.py")

def script2():
    os.system("p2.py")

if __name__ == '__main__':
    p = Process(target=script1)
    q = Process(target=script2)
    p.start()
    q.start()
    p.join()
    q.join()
I am unable to transfer the value of the variable being updated by p1.py to p2.py. How should I approach the problem in a very simple way?
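One simple approach (a sketch, not from the original thread, assuming the logic of p1.py and p2.py can be moved into functions instead of being launched through os.system, which keeps the two interpreters completely separate) is to pass a multiprocessing.Value to both processes:

import time
from multiprocessing import Process, Value

def script1(shared):
    # stands in for p1.py: continually updates the variable
    for i in range(10):
        with shared.get_lock():
            shared.value = i
        time.sleep(0.5)

def script2(shared):
    # stands in for p2.py: displays the latest value every 2 seconds
    for _ in range(3):
        time.sleep(2)
        print("latest value:", shared.value)

if __name__ == '__main__':
    latest = Value('i', 0)  # shared integer, initialized to 0
    p = Process(target=script1, args=(latest,))
    q = Process(target=script2, args=(latest,))
    p.start()
    q.start()
    p.join()
    q.join()

A Queue or a Manager works too; Value is just the smallest tool that fits "one variable, latest value wins".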

When I keep this statement `time.sleep(0.5)`, why is the output of the program not 1, 2, 3

I have a Python program as follows:
import multiprocessing
import time

icout = 1

def send_request(data):
    lock.acquire()
    time.sleep(0.5)
    global icout
    icout = icout + 1
    print(icout)
    lock.release()

def init(l):
    global lock
    lock = l

if __name__ == '__main__':
    data_list = ['data1', 'data2', 'data3']
    lock = multiprocessing.Lock()
    pool = multiprocessing.Pool(3, initializer=init, initargs=(lock,))
    pool.map(send_request, data_list)
    pool.close()
    pool.join()
When I comment out the statement time.sleep(0.5), it prints 1, 2, 3.
When I keep the statement time.sleep(0.5), it prints 2, 2, 2.
My question is: when I keep time.sleep(0.5), why is the output of the program not 1, 2, 3?
I have locked it, so it seems that the lock does not work.
So, 2, 3, 4 notwithstanding (icout starts at 1 and is incremented before printing, so the expected sequence would be 2, 3, 4, not 1, 2, 3), here's a way based on a shared-memory manager:
import multiprocessing
import time

def send_request(data):
    lock.acquire()
    try:
        time.sleep(0.5)
        data['c'] = data['c'] + 1
        print(data['c'])
    finally:
        lock.release()

def init(l):
    global lock
    lock = l

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        d = manager.dict()
        d['c'] = 1
        lock = manager.Lock()
        pool = multiprocessing.Pool(3, initializer=init, initargs=(lock,))
        pool.map(send_request, [d, d, d])
        pool.close()
        pool.join()
The idea is that globals won't work because we're spawning processes: each pool worker gets its own copy of icout, and with the sleep in place the three tasks land in three different workers, so each one increments its private copy from 1 to 2. Hence we need some shared memory, and that's what the manager gives us. See this answer too.
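A lighter-weight alternative (a sketch, not from the original answer) is multiprocessing.Value, a shared counter that carries its own lock:

import multiprocessing
import time

def send_request(data):
    with counter.get_lock():  # every synchronized Value has a built-in lock
        time.sleep(0.5)
        counter.value += 1
        print(counter.value)

def init(c):
    global counter
    counter = c

if __name__ == '__main__':
    counter = multiprocessing.Value('i', 1)  # shared int, initialized to 1
    pool = multiprocessing.Pool(3, initializer=init, initargs=(counter,))
    pool.map(send_request, ['data1', 'data2', 'data3'])
    pool.close()
    pool.join()

This prints 2, 3, 4 like the manager version, without the extra manager process.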

Python - How to use multiprocessing Lock in class instance?

I am using Python 3.7 on Windows.
What I am trying to do:
- lock a method of a class instance while another process holds that same lock.
Attempts:
I have already done this successfully, but I don't want a global variable for the lock here; I want one completely internal to the class.
from multiprocessing import Lock, freeze_support, Pool
from time import sleep

def do_work(name):
    print(name + ' waiting for lock to work...', end='')
    sleep(2)
    with lock:
        print('done!')
        print(name + ' doing work...', end='')
        sleep(5)
        print('done!')

def init(olock):
    global lock
    lock = olock

if __name__ == '__main__':
    freeze_support()
    args_list = ['a', 'b', 'c']
    lock = Lock()
    p = Pool(8, initializer=init, initargs=(lock,))
    p.map_async(do_work, args_list)
    p.close()
    p.join()
When this last chunk of code runs, it takes ~17.3 seconds, because of the lock. Without the lock it takes ~7 seconds.
I have tried to implement this inside a class, but the lock does nothing, and it always runs in ~7 seconds.
class O():
    def __init__(self):
        self.lock = Lock()

    def __getstate__(self):
        # remove the multiprocessing object(s) from the instance so it can be pickled
        self_dict = self.__dict__.copy()
        del self_dict['lock']
        return self_dict

    def __setstate__(self, state):
        # restore the instance attributes on unpickling
        self.__dict__.update(state)

    def _do_work(self, name):
        print(name + ' waiting for lock to work...', end='')
        sleep(2)
        with self.lock:
            print('done!')
            print(name + ' doing work...', end='')
            sleep(5)
            print('done!')

if __name__ == '__main__':
    freeze_support()
    c = O()
    pool = Pool(8)
    pool.apply_async(c._do_work, ('a',))
    pool.apply_async(c._do_work, ('b',))
    pool.apply_async(c._do_work, ('c',))
    pool.close()
    pool.join()
Question:
So, what can I do to lock up this class instance while I call a method which interacts with a resource asynchronously through multiprocessing?
apply_async pickles the function object and sends it to a pool worker process through a queue; since c._do_work is a bound method, the instance is pickled too, which results in an error. You could wrap it in a plain function:
c = O()

def w(*args):
    return c._do_work(*args)

if __name__ == '__main__':
    pool = Pool(1)
    pool.apply_async(w, ('a',))
    ...
You should also remove __setstate__/__getstate__.
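If the goal is a lock that lives on the class rather than in a module-level global, one working pattern (a sketch, not from the original answer) is to create the Lock in the parent, install it as a class attribute in every worker via the pool initializer, and keep the pickled payload a plain function:

from multiprocessing import Lock, Pool
from time import sleep

class O:
    lock = None  # installed per worker by init()

    def _do_work(self, name):
        print(name + ' waiting for lock to work...')
        sleep(2)
        with O.lock:
            print(name + ' doing work...')
            sleep(5)

def init(olock):
    O.lock = olock  # attach the shared lock to the class inside this worker

def w(name):
    # plain module-level function, so no instance (and no lock) gets pickled
    return O()._do_work(name)

if __name__ == '__main__':
    lock = Lock()
    pool = Pool(3, initializer=init, initargs=(lock,))
    pool.map(w, ['a', 'b', 'c'])
    pool.close()
    pool.join()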

Python multiprocessing script partial output

I am following the principles laid down in this post to safely output results which will eventually be written to a file. Unfortunately, the code only prints 1 and 2, not 3 to 6.
import os
import argparse
import pandas as pd
import multiprocessing
from multiprocessing import Process, Queue
from time import sleep

def feed(queue, parlist):
    for par in parlist:
        queue.put(par)
    print("Queue size", queue.qsize())

def calc(queueIn, queueOut):
    while True:
        try:
            par = queueIn.get(block=False)
            res = doCalculation(par)
            queueOut.put((res))
            queueIn.task_done()
        except:
            break

def doCalculation(par):
    return par

def write(queue):
    while True:
        try:
            par = queue.get(block=False)
            print("response:", par)
        except:
            break

if __name__ == "__main__":
    nthreads = 2
    workerQueue = Queue()
    writerQueue = Queue()
    considerperiod = [1, 2, 3, 4, 5, 6]
    feedProc = Process(target=feed, args=(workerQueue, considerperiod))
    calcProc = [Process(target=calc, args=(workerQueue, writerQueue)) for i in range(nthreads)]
    writProc = Process(target=write, args=(writerQueue,))
    feedProc.start()
    feedProc.join()
    for p in calcProc:
        p.start()
    for p in calcProc:
        p.join()
    writProc.start()
    writProc.join()
On running the code it prints,
$ python3 tst.py
Queue size 6
response: 1
response: 2
Also, is it possible to ensure that the write function always outputs 1, 2, 3, 4, 5, 6, i.e. in the same order in which the data is fed into the feed queue?
The error is with the task_done() call: multiprocessing.Queue has no task_done() method (that belongs to queue.Queue and multiprocessing.JoinableQueue), so the call raises an AttributeError, which the bare except catches, making each worker break out of its loop after handling a single item. If you remove that call it works, but only in the sense that queueIn.get(block=False) throws an exception once the queue is empty. That might be just enough for your use case; a better way, though, is to use sentinels (as suggested in the multiprocessing docs, see the last example there). Here's a little rewrite so your program uses sentinels:
from multiprocessing import Process, Queue

def feed(queue, parlist, nthreads):
    for par in parlist:
        queue.put(par)
    for i in range(nthreads):
        queue.put(None)
    print("Queue size", queue.qsize())

def calc(queueIn, queueOut):
    while True:
        par = queueIn.get()
        if par is None:
            break
        res = doCalculation(par)
        queueOut.put((res))

def doCalculation(par):
    return par

def write(queue):
    while not queue.empty():
        par = queue.get()
        print("response:", par)

if __name__ == "__main__":
    nthreads = 2
    workerQueue = Queue()
    writerQueue = Queue()
    considerperiod = [1, 2, 3, 4, 5, 6]
    feedProc = Process(target=feed, args=(workerQueue, considerperiod, nthreads))
    calcProc = [Process(target=calc, args=(workerQueue, writerQueue)) for i in range(nthreads)]
    writProc = Process(target=write, args=(writerQueue,))
    feedProc.start()
    feedProc.join()
    for p in calcProc:
        p.start()
    for p in calcProc:
        p.join()
    writProc.start()
    writProc.join()
A few things to note:
- The sentinel is the None put into the queue. Note that you need one sentinel for every worker process.
- For the write function you don't need the sentinel handling, as there's only one process and no concurrency to manage. (If you used the empty()-then-get() pattern in your calc function, you would run into a problem: with one item left in the queue, both workers could see empty() return False at the same time, both would call get(), and one of them would block forever.)
- You don't need to put feed and write into processes; just call them from your main block, as you don't want to run them in parallel anyway.
“How can I have the same order in output as in input? [...] I guess multiprocessing.map can do this.”
Yes, map keeps the order. Here's your program rewritten into something simpler (you don't need the workerQueue and writerQueue), with random sleeps added to prove that the output is still in order:
from multiprocessing import Pool
import time
import random

def calc(val):
    time.sleep(random.random())
    return val

if __name__ == "__main__":
    considerperiod = [1, 2, 3, 4, 5, 6]
    with Pool(processes=2) as pool:
        print(pool.map(calc, considerperiod))
