How to deal with python3 multiprocessing in __main__.py - python-3.x

Il posed question, I did not understand the ture cause of the issue (it seems to have been related to my usage of flask in one of the subprocesses).
PLEASE IGNORE THIS (can't delete due to bounty)
Essentially, I have to start some Processes and or a pool when running a python library as a module.
However, since __name__ == '__main__' is always true in __main__.py this proves to be an issue (see multiprocessing docs: https://docs.python.org/3/library/multiprocessing.html)
I've attempted multiple solutions ranging from: pytgquabr.com:8182/58288945/using-multiprocessing-with-runpy to a file-based mutext to only allow the contents of main to run once but multiprocessing still behaves strangely (e.g. Processes die almost as soon as they start with no error logs).
Any idea of what the "proper" way of going about this is ?

Guarding the __main__ module is only needed if an object defined inside __main__ is used in another process. Looking up this definition is what causes the execution of __main__ in the subprocess.
When using a __main__.py, restrict all definitions used with multiprocessing to other modules. __main__.py should only import and use these.
# my_package/some_module.py
def module_print(*args, **kwargs):
"""Function defined in some module - fine for use inside multiprocess"""
print(*args, **kwargs)
# my_package/__main__.py
import multiprocessing # imports are allowed
from .some_module import module_print
def do_multiprocess():
"""Function defined in __main__ module - fine for use wrapping multiprocess"""
with multiprocessing.Pool(processes=12) as pool:
pool.map(module_print, range(20)) # multiprocessing external function is allowed
do_multiprocess() # directly calling __main__ function is allowed

Related

Lock is not recognized in functions of different python modules

I have a multiprocessing Lock, that I define as
import multiprocessing
lock1 = mp.Lock()
To share this lock among the different child processes I do:
def setup_process(lock1):
global lock_1
lock_1 = lock1
pool = mp.Pool(os.cpu_count() - 1,
initializer=setup_process,
initargs=[lock1])
Now I've noticed that if the processes call the following function, and the function is defined in the same python module (i.e., same file):
def test_func():
print("lock_1:", lock_1)
with lock_1:
print(str(mp.current_process()) + " has the lock in test function.")
I get an output like:
lock_1 <Lock(owner=None)>
<ForkProcess name='ForkPoolWorker-1' parent=82414 started daemon> has the lock in test function.
lock_1 <Lock(owner=None)>
<ForkProcess name='ForkPoolWorker-2' parent=82414 started daemon> has the lock in test function.
lock_1 <Lock(owner=None)>
<ForkProcess name='ForkPoolWorker-3' parent=82414 started daemon> has the lock in test function.
However, if test_function is defined in a different file, the Lock is not recognized, and I get:
NameError:
name 'lock_1' is not defined
This seems to happen for every function, where the important distinction is whether the function is defined in this module or in another one. I'm sure I'm missing something very obvious with the global variables, but I'm new to this and I haven't been able to figure it out. How can I make the Locks be recognized everywhere?
Well, I learned something new about python today: global isn't actually truly global. It only is accessible at the module scope.
There are a multitude of ways of sharing your lock with the module in order to allow it to be used, and the docs even suggest a "canonical" way of sharing globals between modules (though I don't feel it's the most appropriate for this situation). To me this situation illustrates one of the short fallings of using globals in the first place, though I have to admit in the specific case of multiprocessing.Pool initializers it seems to be the accepted or even intended use case to use globals to pass data to worker functions. It actually makes sense that globals can't cross module boundaries because that would make the separate module 100% dependent on being executed by a specific script, so it can't really be considered a separate independent library. Instead it could just be included in the same file. I recognize that may be at odds with splitting things up not to create re-usable libraries but simply just to organize code logically in shorter to read segments, but that's apparently a stylistic choice by the designers of python.
To solve your problem, at the end of the day, you are going to have to pass the lock to the other module as an argument, so you might as well make test_func recieve lock_1 as an argument. You may have found however that this will cause a RuntimeError: Lock objects should only be shared between processes through inheritance message, so what to do? Basically, I would keep your initializer, and put test_func in another function which is in the __main__ scope (and therefore has access to your global lock_1) which grabs the lock, and then passes it to the function. Unfortunately we can't use a nicer looking decorator or a wrapper function, because those return a function which only exists in a local scope, and can't be imported when using "spawn" as the start method.
from multiprocessing import Pool, Lock, current_process
def init(l):
global lock_1
lock_1 = l
def local_test_func(shared_lock):
with shared_lock:
print(f"{current_process()} has the lock in local_test_func")
def local_wrapper():
global lock_1
local_test_func(lock_1)
from mymodule import module_test_func #same as local_test_func basically...
def module_wrapper():
global lock_1
module_test_func(lock_1)
if __name__ == "__main__":
l = Lock()
with Pool(initializer=init, initargs=(l,)) as p:
p.apply(local_wrapper)
p.apply(module_wrapper)

How is it possible to execute python code during deserialization?

I was reading about pickling in the context of persisting instances, and ran across this snippet:
Pickle files can be hacked. If you receive a raw pickle file over the network, don't trust it! It could have malicious code in it, that would run arbitrary python when you try to de-pickle it. [1]
My understanding is that pickling turns a data-structure into an array of bytes, and the pickle library also contains methods to take a pickled byte array and rebuild a python instance from it.
I tested some code to see if simply putting code into the class or init method would run it:
import pickle
class A:
print('class')
def __init__(self):
print('instance')
a = A()
print('pickling...')
with open('/home/usrname/Desktop/pfile', 'wb') as pfile:
pickle.dump(a, pfile, pickle.HIGHEST_PROTOCOL)
print('de-pickling...')
with open('/home/usrname/Desktop/pfile', 'rb') as pfile:
a2 = pickle.load(pfile)
However this only yields
class
instance
pickling...
de-pickling...
suggesting that the __ init__ method doesn't actually get run when the instance is unpickled. So I'm still confused how you would make code run during that process.
Really thorough writeup here: https://intoli.com/blog/dangerous-pickles/
From what I understand, it has to do with how pickles are interpreted by the Pickle Machine (PM) and run. You can craft a pickle file that will cause it to evaluate using eval() the statements provided.

Using 'with' with 'next()' in python 3 [duplicate]

I came across the Python with statement for the first time today. I've been using Python lightly for several months and didn't even know of its existence! Given its somewhat obscure status, I thought it would be worth asking:
What is the Python with statement
designed to be used for?
What do
you use it for?
Are there any
gotchas I need to be aware of, or
common anti-patterns associated with
its use? Any cases where it is better use try..finally than with?
Why isn't it used more widely?
Which standard library classes are compatible with it?
I believe this has already been answered by other users before me, so I only add it for the sake of completeness: the with statement simplifies exception handling by encapsulating common preparation and cleanup tasks in so-called context managers. More details can be found in PEP 343. For instance, the open statement is a context manager in itself, which lets you open a file, keep it open as long as the execution is in the context of the with statement where you used it, and close it as soon as you leave the context, no matter whether you have left it because of an exception or during regular control flow. The with statement can thus be used in ways similar to the RAII pattern in C++: some resource is acquired by the with statement and released when you leave the with context.
Some examples are: opening files using with open(filename) as fp:, acquiring locks using with lock: (where lock is an instance of threading.Lock). You can also construct your own context managers using the contextmanager decorator from contextlib. For instance, I often use this when I have to change the current directory temporarily and then return to where I was:
from contextlib import contextmanager
import os
#contextmanager
def working_directory(path):
current_dir = os.getcwd()
os.chdir(path)
try:
yield
finally:
os.chdir(current_dir)
with working_directory("data/stuff"):
# do something within data/stuff
# here I am back again in the original working directory
Here's another example that temporarily redirects sys.stdin, sys.stdout and sys.stderr to some other file handle and restores them later:
from contextlib import contextmanager
import sys
#contextmanager
def redirected(**kwds):
stream_names = ["stdin", "stdout", "stderr"]
old_streams = {}
try:
for sname in stream_names:
stream = kwds.get(sname, None)
if stream is not None and stream != getattr(sys, sname):
old_streams[sname] = getattr(sys, sname)
setattr(sys, sname, stream)
yield
finally:
for sname, stream in old_streams.iteritems():
setattr(sys, sname, stream)
with redirected(stdout=open("/tmp/log.txt", "w")):
# these print statements will go to /tmp/log.txt
print "Test entry 1"
print "Test entry 2"
# back to the normal stdout
print "Back to normal stdout again"
And finally, another example that creates a temporary folder and cleans it up when leaving the context:
from tempfile import mkdtemp
from shutil import rmtree
#contextmanager
def temporary_dir(*args, **kwds):
name = mkdtemp(*args, **kwds)
try:
yield name
finally:
shutil.rmtree(name)
with temporary_dir() as dirname:
# do whatever you want
I would suggest two interesting lectures:
PEP 343 The "with" Statement
Effbot Understanding Python's
"with" statement
1.
The with statement is used to wrap the execution of a block with methods defined by a context manager. This allows common try...except...finally usage patterns to be encapsulated for convenient reuse.
2.
You could do something like:
with open("foo.txt") as foo_file:
data = foo_file.read()
OR
from contextlib import nested
with nested(A(), B(), C()) as (X, Y, Z):
do_something()
OR (Python 3.1)
with open('data') as input_file, open('result', 'w') as output_file:
for line in input_file:
output_file.write(parse(line))
OR
lock = threading.Lock()
with lock:
# Critical section of code
3.
I don't see any Antipattern here.
Quoting Dive into Python:
try..finally is good. with is better.
4.
I guess it's related to programmers's habit to use try..catch..finally statement from other languages.
The Python with statement is built-in language support of the Resource Acquisition Is Initialization idiom commonly used in C++. It is intended to allow safe acquisition and release of operating system resources.
The with statement creates resources within a scope/block. You write your code using the resources within the block. When the block exits the resources are cleanly released regardless of the outcome of the code in the block (that is whether the block exits normally or because of an exception).
Many resources in the Python library that obey the protocol required by the with statement and so can used with it out-of-the-box. However anyone can make resources that can be used in a with statement by implementing the well documented protocol: PEP 0343
Use it whenever you acquire resources in your application that must be explicitly relinquished such as files, network connections, locks and the like.
Again for completeness I'll add my most useful use-case for with statements.
I do a lot of scientific computing and for some activities I need the Decimal library for arbitrary precision calculations. Some part of my code I need high precision and for most other parts I need less precision.
I set my default precision to a low number and then use with to get a more precise answer for some sections:
from decimal import localcontext
with localcontext() as ctx:
ctx.prec = 42 # Perform a high precision calculation
s = calculate_something()
s = +s # Round the final result back to the default precision
I use this a lot with the Hypergeometric Test which requires the division of large numbers resulting form factorials. When you do genomic scale calculations you have to be careful of round-off and overflow errors.
An example of an antipattern might be to use the with inside a loop when it would be more efficient to have the with outside the loop
for example
for row in lines:
with open("outfile","a") as f:
f.write(row)
vs
with open("outfile","a") as f:
for row in lines:
f.write(row)
The first way is opening and closing the file for each row which may cause performance problems compared to the second way with opens and closes the file just once.
See PEP 343 - The 'with' statement, there is an example section at the end.
... new statement "with" to the Python
language to make
it possible to factor out standard uses of try/finally statements.
points 1, 2, and 3 being reasonably well covered:
4: it is relatively new, only available in python2.6+ (or python2.5 using from __future__ import with_statement)
The with statement works with so-called context managers:
http://docs.python.org/release/2.5.2/lib/typecontextmanager.html
The idea is to simplify exception handling by doing the necessary cleanup after leaving the 'with' block. Some of the python built-ins already work as context managers.
Another example for out-of-the-box support, and one that might be a bit baffling at first when you are used to the way built-in open() behaves, are connection objects of popular database modules such as:
sqlite3
psycopg2
cx_oracle
The connection objects are context managers and as such can be used out-of-the-box in a with-statement, however when using the above note that:
When the with-block is finished, either with an exception or without, the connection is not closed. In case the with-block finishes with an exception, the transaction is rolled back, otherwise the transaction is commited.
This means that the programmer has to take care to close the connection themselves, but allows to acquire a connection, and use it in multiple with-statements, as shown in the psycopg2 docs:
conn = psycopg2.connect(DSN)
with conn:
with conn.cursor() as curs:
curs.execute(SQL1)
with conn:
with conn.cursor() as curs:
curs.execute(SQL2)
conn.close()
In the example above, you'll note that the cursor objects of psycopg2 also are context managers. From the relevant documentation on the behavior:
When a cursor exits the with-block it is closed, releasing any resource eventually associated with it. The state of the transaction is not affected.
In python generally “with” statement is used to open a file, process the data present in the file, and also to close the file without calling a close() method. “with” statement makes the exception handling simpler by providing cleanup activities.
General form of with:
with open(“file name”, “mode”) as file_var:
processing statements
note: no need to close the file by calling close() upon file_var.close()
The answers here are great, but just to add a simple one that helped me:
with open("foo.txt") as file:
data = file.read()
open returns a file
Since 2.6 python added the methods __enter__ and __exit__ to file.
with is like a for loop that calls __enter__, runs the loop once and then calls __exit__
with works with any instance that has __enter__ and __exit__
a file is locked and not re-usable by other processes until it's closed, __exit__ closes it.
source: http://web.archive.org/web/20180310054708/http://effbot.org/zone/python-with-statement.htm

Why does some widgets don't update on Qt5?

I am trying to create a PyQt5 application, where I have used certain labels for displaying status variables. To update them, I have implemented custom pyqtSignal manually. However, on debugging I find that the value of GUI QLabel have changed but the values don't get reflected on the main window.
Some answers suggested calling QApplication().processEvents() occasionally. However, this instantaneously crashes the application and also freezes the application.
Here's a sample code (all required libraries are imported, it's just the part creating problem, the actual code is huge):
from multiprocessing import Process
def sub(signal):
i = 0
while (True):
if (i % 5 == 0):
signal.update(i)
class CustomSignal(QObject):
signal = pyqtSignal(int)
def update(value):
self.signal.emit(value)
class MainApp(QWidget):
def __init__(self):
super().__init__()
self.label = QLabel("0");
self.customSignal = CustomSignal()
self.subp = Process(target=sub, args=(customSignal,))
self.subp.start()
self.customSignal.signal.connect(self.updateValue)
def updateValue(self, value):
print("old value", self.label.text())
self.label.setText(str(value))
print("new value", self.label.text())
The output of the print statements is as expected. However, the text in label does not change.
The update function in CustomSignal is called by some thread.
I've applied the same method to update progress bar which works fine.
Is there any other fix for this, other than processEvents()?
The OS is Ubuntu 16.04.
The key problem lies in the very concept behind the code.
Processes have their own address space, and don't share data with another processes, unless some inter-process communication algorithm is used. Perhaps, multithreading module was used instead of threading module to bring concurrency to avoid Python's GIL and speedup the program. However, subprocess has cannot access the data of parent process.
I have tested two solutions to this case, and they seem to work.
threading module: No matter threading in Python is inefficient due to GIL, but it's still sufficient to some extent for basic concurrency demands. Note the difference between concurrency and speedup.
QThread: Since you are using PyQt, there's isn't any issue in using QThread, which is a better option because it takes concurrency to multiple cores taking advantage of operating system's system call, rather than Python in the middle.
Try adding
self.label.repaint()
immediately after updating the text, like this:
self.label.setText(str(value))
self.label.repaint()

What is mean when it raises a PicklingError?

Hy there!
I'm new on python 3.
I'm using the pvmomi module to get a dict of vm's from my server. When i try to run my file, with multiprocessing, i get the following Error:
_pickle.PicklingError: Can't pickle : attribute lookup vim.VirtualMachine on pyVmomi.VmomiSupport failed
What does this mean?
Here is a part of my code:
def login(vm):
#do something
if __name__=='__main__':
cpu = mp.cpu_count()
workers = mp.Pool(cpu)
workers.map(login,range(1))
for vm in vmDict:
login(vm)
My biggest problem comes from the for loop. I need this loop to do the jobs for every dictitem but only one pool worker do the job. Now i have configured my code to this below and it raises the PicklingError.
Thanks for help. It drives me crazy!
The stdlib pickle (.py) module imports the builtin C-coded _pickle module. The pickle module can serialize most Python objects and is used to transport Python objects between processes. In particular, pickle is used by multiprocessing (and perhaps by pyvmomi). User-defined classes sometimes define special methods (reduce and reducex, I believe) to help the pickle and unpickle processes.
The exception message says that an attribute lookup failed. Perhaps the pyVmomi object is not properly configured to be pickled. You might check to module doc to see if it says anything about pickle support.

Resources