I have a multiprocessing Lock, that I define as
import multiprocessing as mp
import os
lock1 = mp.Lock()
To share this lock among the different child processes I do:
def setup_process(lock1):
    global lock_1
    lock_1 = lock1
pool = mp.Pool(os.cpu_count() - 1,
               initializer=setup_process,
               initargs=[lock1])
Now I've noticed that if the processes call the following function, and the function is defined in the same python module (i.e., same file):
def test_func():
    print("lock_1:", lock_1)
    with lock_1:
        print(str(mp.current_process()) + " has the lock in test function.")
I get an output like:
lock_1 <Lock(owner=None)>
<ForkProcess name='ForkPoolWorker-1' parent=82414 started daemon> has the lock in test function.
lock_1 <Lock(owner=None)>
<ForkProcess name='ForkPoolWorker-2' parent=82414 started daemon> has the lock in test function.
lock_1 <Lock(owner=None)>
<ForkProcess name='ForkPoolWorker-3' parent=82414 started daemon> has the lock in test function.
However, if test_func is defined in a different file, the lock is not recognized, and I get:
NameError:
name 'lock_1' is not defined
This seems to happen for every function, where the important distinction is whether the function is defined in this module or in another one. I'm sure I'm missing something very obvious with the global variables, but I'm new to this and I haven't been able to figure it out. How can I make the Locks be recognized everywhere?
Well, I learned something new about Python today: global isn't truly global. A global name is only accessible at module scope.
There are many ways of sharing your lock with the other module, and the docs even suggest a "canonical" way of sharing globals between modules (though I don't feel it's the most appropriate for this situation). To me this situation illustrates one of the shortcomings of using globals in the first place, though I have to admit that in the specific case of multiprocessing.Pool initializers, using globals to pass data to worker functions seems to be the accepted or even intended use case. It actually makes sense that globals can't cross module boundaries: if they could, the separate module would be 100% dependent on being executed by a specific script, so it couldn't really be considered an independent library and might as well be included in the same file. I recognize that this may be at odds with splitting code up simply to organize it into shorter, easier-to-read segments rather than to create reusable libraries, but that's apparently a design choice by the designers of Python.
To solve your problem, at the end of the day, you are going to have to pass the lock to the other module as an argument, so you might as well make test_func receive lock_1 as an argument. You may have found, however, that this causes a "RuntimeError: Lock objects should only be shared between processes through inheritance" message, so what to do? Basically, I would keep your initializer and call test_func from another function that lives in the __main__ scope (and therefore has access to your global lock_1), grabs the lock, and passes it to the function. Unfortunately we can't use a nicer-looking decorator or a wrapper function, because those return a function which only exists in a local scope and can't be imported when using "spawn" as the start method.
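To make the module-scope point concrete, here is a minimal sketch that simulates two files with in-memory module objects. The names module_a and module_b are hypothetical stand-ins for the main script and the other file, and a plain string stands in for the lock; nothing here comes from the original code.

```python
import types

# Two empty module objects play the role of two separate .py files.
module_a = types.ModuleType("module_a")  # hypothetical: the main script
module_b = types.ModuleType("module_b")  # hypothetical: the other file

# Define setup() inside module_a; its `global` statement targets module_a's namespace.
exec(
    "def setup(value):\n"
    "    global lock_1\n"
    "    lock_1 = value\n",
    module_a.__dict__,
)
# Define test_func() inside module_b; it looks lock_1 up in module_b's namespace.
exec(
    "def test_func():\n"
    "    return lock_1\n",
    module_b.__dict__,
)

module_a.setup("the lock")

try:
    module_b.test_func()
except NameError as exc:
    error_message = str(exc)  # lock_1 exists only in module_a
else:
    error_message = None

# Putting the name into module_b's own namespace makes it visible there.
module_b.lock_1 = module_a.lock_1
shared = module_b.test_func()
print(error_message, "|", shared)
```

The last two lines show one way out: the name has to be placed in the namespace of the module that reads it, or handed over as an argument.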
from multiprocessing import Pool, Lock, current_process
def init(l):
    global lock_1
    lock_1 = l
def local_test_func(shared_lock):
    with shared_lock:
        print(f"{current_process()} has the lock in local_test_func")
def local_wrapper():
    global lock_1
    local_test_func(lock_1)
from mymodule import module_test_func #same as local_test_func basically...
def module_wrapper():
    global lock_1
    module_test_func(lock_1)
if __name__ == "__main__":
    l = Lock()
    with Pool(initializer=init, initargs=(l,)) as p:
        p.apply(local_wrapper)
        p.apply(module_wrapper)
I am measuring the metrics of an encryption algorithm that I designed. I have declared 2 functions and a brief sample is as follows:
import sys, random, timeit, psutil, os, time
from multiprocessing import Process
from subprocess import check_output
pid=0
def cpuUsage():
    global running
    while pid == 0:
        time.sleep(1)
    running=True
    p = psutil.Process(pid)
    while running:
        print(f'PID: {pid}\t|\tCPU Usage: {p.memory_info().rss/(1024*1024)} MB')
        time.sleep(1)
def Encryption():
    global pid, running
    pid = os.getpid()
    myList=[]
    for i in range(1000):
        myList.append(random.randint(-sys.maxsize,sys.maxsize)+random.random())
    print('Now running timeit function for speed metrics.')
    p1 = Process(target=metric_collector())
    p1.start()
    p1.join()
    number=1000
    unit='msec'
    setup = '''
import homomorphic,random,sys,time,os,timeit
myList={myList}
'''
    enc_code='''
for x in range(len(myList)):
    myList[x] = encryptMethod(a, b, myList[x], d)
'''
    dec_code='''
\nfor x in range(len(myList)):
    myList[x] = decryptMethod(myList[x])
'''
    time=timeit.timeit(setup=setup,
                       stmt=(enc_code+dec_code),
                       number=number)
    running=False
    print(f'''Average Time:\t\t\t {time/number*.0001} seconds
Total time for {number} Iters:\t\t\t {time} {unit}s
Total Encrypted/Decrypted Values:\t {number*len(myList)}''')
    sys.exit()
if __name__ == '__main__':
    print('Beginning Metric Evaluation\n...\n')
    p2 = Process(target=Encryption())
    p2.start()
    p2.join()
I am sure there's an implementation error in my code, I'm just having trouble grabbing the PID for the encryption method and I am trying to make the overhead from other calls as minimal as possible so I can get an accurate reading of just the functionality of the methods being called by timeit. If you know a simpler implementation, please let me know. Trying to figure out how to measure all of the metrics has been killing me softly.
I've tried acquiring the pid a few different ways, but I only want to measure performance when timeit is run. Good chance I'll have to break this out separately and run it that way (instead of multiprocessing) to evaluate the function properly, I'm guessing.
There are at least three major problems with your code. The net result is that you are not actually doing any multiprocessing.
The first problem is here, and in a couple of other similar places:
p2 = Process(target=Encryption())
What this code passes to Process is not the function Encryption but the returned value from Encryption(). It is exactly the same as if you had written:
x = Encryption()
p2 = Process(target=x)
What you want is this:
p2 = Process(target=Encryption)
This code tells Python to create a new Process and execute the function Encryption() in that Process.
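The difference between passing a function and passing the result of calling it can be checked without starting any process. In this small sketch, encryption is a hypothetical stand-in for the Encryption function from the question.

```python
def encryption():
    """Hypothetical stand-in for the question's Encryption function."""
    print("encryption body runs")
    return None

# With parentheses the function runs right here, in the current process,
# and what would be handed to Process is its return value (None).
target_called = encryption()

# Without parentheses nothing runs yet; this is a reference that a
# Process can call later in the child process.
target_passed = encryption

print(callable(target_called))
print(callable(target_passed))
```

A Process whose target is None simply does nothing when started, which is why the original code appears to work while all the real work happens in the parent.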
The second problem has to do with the way Python handles memory for Processes. Each Process lives in its own memory space. Each Process has its own local copy of global variables, so you cannot set a global variable in one Process and have another Process be aware of this change. There are mechanisms to handle this important situation, documented in the multiprocessing module. See the section titled "Sharing state between processes." The bottom line here is that you cannot simply set a global variable inside a Process and expect other Processes to see the change, as you are trying to do with pid. You have to use one of the approaches described in the documentation.
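As one illustration of the "Sharing state between processes" approach, the sketch below passes a multiprocessing.Value to a child so the parent can read the child's PID afterwards. The helper name record_pid is hypothetical, and this is only one of the mechanisms the documentation describes.

```python
import os
from multiprocessing import Process, Value

def record_pid(shared_pid):
    """Store the PID of whichever process runs this in the shared integer."""
    with shared_pid.get_lock():
        shared_pid.value = os.getpid()

if __name__ == "__main__":
    shared_pid = Value("i", 0)  # shared C int living in shared memory, initially 0
    worker = Process(target=record_pid, args=(shared_pid,))
    worker.start()
    worker.join()
    # Unlike a plain global, the child's write is visible to the parent.
    print(f"parent PID: {os.getpid()}, worker PID seen by parent: {shared_pid.value}")
```

The shared object is handed to the child as an argument at creation time, which is what lets both processes refer to the same memory.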
The third problem is this code pattern, which occurs for both p1 and p2.
p2 = Process(target=Encryption)
p2.start()
p2.join()
This tells Python to create a Process and to start it. Then you immediately wait for it to finish, which means that your current Process must stop at that point until the new Process is finished. You never allow two Processes to run at once, so there is no performance benefit. The only reason to use multiprocessing is to run two things at the same time, which you never do. You might as well not bother with multiprocessing at all since it is only making your life more difficult.
Finally I am not sure why you have decided to try to use multiprocessing in the first place. The functions that measure memory usage and execution time are almost certainly very fast, and I would expect them to be much faster than any method of synchronizing one Process to another. If you're worried about errors due to the time used by the diagnostic functions themselves, I doubt that you can make things better by multiprocessing. Why not just start with a simple program and see what results you get?
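A simple single-process starting point might look like the sketch below. It uses only the standard library: timeit for speed and tracemalloc for Python-level memory allocations (tracemalloc is swapped in here for psutil, which the question used). The functions encrypt_value and decrypt_value are hypothetical placeholders for the real encryptMethod and decryptMethod.

```python
import random
import sys
import timeit
import tracemalloc

def encrypt_value(x):
    """Hypothetical placeholder for the real encryption routine."""
    return x * 3 + 7

def decrypt_value(y):
    """Hypothetical placeholder for the real decryption routine."""
    return (y - 7) / 3

def round_trip(values):
    """Encrypt and then decrypt every value in the list."""
    return [decrypt_value(encrypt_value(v)) for v in values]

values = [random.randint(-sys.maxsize, sys.maxsize) + random.random()
          for _ in range(1000)]

# Speed: total wall time for `number` runs, all in this one process.
number = 100
total_seconds = timeit.timeit(lambda: round_trip(values), number=number)

# Memory: peak size of Python allocations made during a single run.
tracemalloc.start()
round_trip(values)
_, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"Average time per run: {total_seconds / number:.6f} s")
print(f"Peak traced allocation: {peak_bytes / 1024:.1f} KiB")
```

Measuring time and memory in separate passes keeps the tracing overhead out of the timing numbers.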
I am trying to share an integer value across a multiprocess program, where each process has a jit function that reads and modifies the value.
I came across multiprocessing.Manager().Value, which would pass a shared value to each process, but numba.jit does not accept this type.
Is there any way to work around this?
import numba
import multiprocessing
@numba.jit()
def jj (o, ii):
    print (o.value)
    o.value = ii
    print (o.value)
if __name__ == '__main__':
    o = multiprocessing.Manager().Value('i', 0 , lock=False)
    y1 = multiprocessing.Process(target=jj, args=(o,10))
    y1.daemon = True
    y2 = multiprocessing.Process(target=jj, args=(o,20))
    y2.daemon = True
    y1.start()
    y2.start()
    y1.join()
    y2.join()
You cannot modify a CPython object from an njit function, so the function will (almost) not benefit from Numba (the only optimization Numba can do is loop-lifting, but it cannot be used here anyway). What you are trying to achieve is not possible with multiprocessing + njitted Numba functions. Numba can be fast because it operates on native types rather than CPython types, but multiprocessing's managers operate only on CPython types. You can use Numba's very experimental objmode context to execute pure Python inside a Numba function, but be aware that this is slow (and it currently sometimes just crashes).
Another big issue is that shared CPython objects are protected by the global interpreter lock (GIL), which basically prevents any parallel speed-up inside a process (except for I/O-bound code and similar cases). The GIL is designed to protect the interpreter from race conditions on the internal state of objects. AFAIK, managers can transfer pure-Python objects between processes thanks to pickling (which is slow), but using lock=False is unsafe and can also cause a race condition (though not at the interpreter level, thanks to the GIL).
Note that the Numba function has to be recompiled in each process, which is slow (caching can help subsequent runs but not the first one, because of concurrent compilation in multiple processes).
I have multiple locks that lock different parts of my API.
To lock any method I do something like this :
import threading
class DoSomething:
    def __init__(self):
        self.lock = threading.Lock()
    def run(self):
        with self.lock:
            # do stuff requiring lock here
            pass
And for most use cases this works just fine.
But, I am unsure if what I am doing when requiring multiple locks works or not :
import threading
class DoSomething:
    def __init__(self):
        self.lock_database = threading.Lock()
        self.lock_logger = threading.Lock()
    def run(self):
        with self.lock_database and self.lock_logger:
            # do stuff requiring lock here
            pass
As it is, the code runs just fine but I am unsure if it runs as I want it to.
My question is: are the locks obtained simultaneously, or is the first one acquired and only then the second?
Is my previous code equivalent to the following?
with self.lock1:
    with self.lock2:
        # do stuff here
        pass
As it is, the code currently works, but since the chances of my threads requiring the same lock simultaneously are extremely low to begin with, I may end up with a massive headache to debug later.
I am asking because I am very uncertain how to test my code to ensure that it works as intended. I am equally interested in the answer and in knowing how I can test it myself (rather than ending up with the end users testing it for me).
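Yes, you can hold two locks at once, but note that "with self.lock_database and self.lock_logger:" does not do that. The expression "a and b" evaluates to b whenever a is truthy, and a Lock object is always truthy, so that statement acquires only lock_logger. Write "with self.lock_database, self.lock_logger:" instead, or nest the two with statements as in your second snippet. Either way, beware of deadlocks. A deadlock occurs when one thread is unable to make progress because it needs to acquire a lock that some other thread is holding, but the second thread is unable to make progress because it wants the lock that the first thread already is holding.
The nested form locks lock_database first, and lock_logger second. If you can guarantee that any thread that locks them both will always lock them in that same order, then you're safe. A deadlock can never happen that way. But if one thread locks lock_database before trying to lock lock_logger, and some other thread tries to grab them both in the opposite order, that's a deadlock waiting to happen.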
Looks easy. And it is, except...
...In a more sophisticated program, where locks are attached to objects that are passed around to different functions, then it may not be so easy because one thread may call some foobar(a, b) function, while another thread calls the same foobar() on the same two objects, except the objects are switched.
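Both points can be checked directly with Lock.locked(). The sketch below first shows what the "and" form and the comma form actually hold, then shows one possible way to keep the acquisition order consistent when locks arrive in arbitrary argument order. The helper acquire_in_fixed_order is hypothetical, and sorting by id() is just one convenient total order within a single process.

```python
import threading
from contextlib import ExitStack

lock_database = threading.Lock()
lock_logger = threading.Lock()

# `a and b` evaluates to `b` when `a` is truthy, so only lock_logger is managed.
with lock_database and lock_logger:
    and_form_state = (lock_database.locked(), lock_logger.locked())

# The comma form acquires left to right and releases in reverse order.
with lock_database, lock_logger:
    comma_form_state = (lock_database.locked(), lock_logger.locked())

def acquire_in_fixed_order(stack, *locks):
    """Hypothetical helper: enter the locks sorted by id() so that every
    caller uses the same order, whatever order the arguments came in."""
    for lock in sorted(locks, key=id):
        stack.enter_context(lock)

with ExitStack() as stack:
    acquire_in_fixed_order(stack, lock_logger, lock_database)
    ordered_state = (lock_database.locked(), lock_logger.locked())
after_state = (lock_database.locked(), lock_logger.locked())

print(and_form_state, comma_form_state, ordered_state, after_state)
```

ExitStack releases everything it entered when the block ends, so the helper keeps the cleanup guarantees of a plain with statement.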
I came across the Python with statement for the first time today. I've been using Python lightly for several months and didn't even know of its existence! Given its somewhat obscure status, I thought it would be worth asking:
What is the Python with statement designed to be used for?
What do you use it for?
Are there any gotchas I need to be aware of, or common anti-patterns associated with its use? Any cases where it is better to use try..finally than with?
Why isn't it used more widely?
Which standard library classes are compatible with it?
I believe this has already been answered by other users before me, so I only add it for the sake of completeness: the with statement simplifies exception handling by encapsulating common preparation and cleanup tasks in so-called context managers. More details can be found in PEP 343. For instance, the file object returned by open() is itself a context manager, which lets you open a file, keep it open as long as execution is inside the with statement where you used it, and close it as soon as you leave that context, no matter whether you leave it because of an exception or during regular control flow. The with statement can thus be used in ways similar to the RAII pattern in C++: some resource is acquired by the with statement and released when you leave the with context.
Some examples are: opening files using with open(filename) as fp:, acquiring locks using with lock: (where lock is an instance of threading.Lock). You can also construct your own context managers using the contextmanager decorator from contextlib. For instance, I often use this when I have to change the current directory temporarily and then return to where I was:
from contextlib import contextmanager
import os
@contextmanager
def working_directory(path):
    current_dir = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(current_dir)
with working_directory("data/stuff"):
    # do something within data/stuff
    pass
# here I am back again in the original working directory
Here's another example that temporarily redirects sys.stdin, sys.stdout and sys.stderr to some other file handle and restores them later:
from contextlib import contextmanager
import sys
@contextmanager
def redirected(**kwds):
    stream_names = ["stdin", "stdout", "stderr"]
    old_streams = {}
    try:
        for sname in stream_names:
            stream = kwds.get(sname, None)
            if stream is not None and stream != getattr(sys, sname):
                old_streams[sname] = getattr(sys, sname)
                setattr(sys, sname, stream)
        yield
    finally:
        for sname, stream in old_streams.items():
            setattr(sys, sname, stream)
with redirected(stdout=open("/tmp/log.txt", "w")):
    # these print calls will go to /tmp/log.txt
    print("Test entry 1")
    print("Test entry 2")
# back to the normal stdout
print("Back to normal stdout again")
And finally, another example that creates a temporary folder and cleans it up when leaving the context:
from contextlib import contextmanager
from tempfile import mkdtemp
from shutil import rmtree
@contextmanager
def temporary_dir(*args, **kwds):
    name = mkdtemp(*args, **kwds)
    try:
        yield name
    finally:
        rmtree(name)
with temporary_dir() as dirname:
    # do whatever you want
    pass
I would suggest two interesting readings:
PEP 343 The "with" Statement
Effbot Understanding Python's "with" statement
1.
The with statement is used to wrap the execution of a block with methods defined by a context manager. This allows common try...except...finally usage patterns to be encapsulated for convenient reuse.
2.
You could do something like:
with open("foo.txt") as foo_file:
    data = foo_file.read()
OR
from contextlib import nested  # Python 2 only; contextlib.nested was removed in Python 3
with nested(A(), B(), C()) as (X, Y, Z):
    do_something()
OR (Python 3.1)
with open('data') as input_file, open('result', 'w') as output_file:
    for line in input_file:
        output_file.write(parse(line))
OR
lock = threading.Lock()
with lock:
    # Critical section of code
    pass
3.
I don't see any Antipattern here.
Quoting Dive into Python:
try..finally is good. with is better.
4.
I guess it's related to programmers' habit of using the try..catch..finally statement from other languages.
The Python with statement is built-in language support of the Resource Acquisition Is Initialization idiom commonly used in C++. It is intended to allow safe acquisition and release of operating system resources.
The with statement creates resources within a scope/block. You write your code using the resources within the block. When the block exits the resources are cleanly released regardless of the outcome of the code in the block (that is whether the block exits normally or because of an exception).
Many resources in the Python library obey the protocol required by the with statement and so can be used with it out of the box. However, anyone can make resources that can be used in a with statement by implementing the well-documented protocol: PEP 0343
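As a minimal sketch of that protocol, the class below implements __enter__ and __exit__ by hand and records what happens when the block raises. ManagedResource and its log argument are hypothetical names used only for illustration.

```python
class ManagedResource:
    """Hypothetical resource that records when it is acquired and released."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("acquire")
        return self  # this is the object bound by `as`

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("release")
        return False  # a false value lets any exception propagate

events = []
try:
    with ManagedResource(events) as resource:
        events.append("use")
        raise ValueError("simulated failure")
except ValueError:
    events.append("handled")

print(events)
```

The release step runs before the exception reaches the except clause, which is the guarantee the with statement provides.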
Use it whenever you acquire resources in your application that must be explicitly relinquished such as files, network connections, locks and the like.
Again for completeness I'll add my most useful use-case for with statements.
I do a lot of scientific computing, and for some activities I need the decimal library for arbitrary-precision calculations. Some parts of my code need high precision, while most other parts need less.
I set my default precision to a low number and then use with to get a more precise answer for some sections:
from decimal import localcontext
with localcontext() as ctx:
    ctx.prec = 42 # Perform a high precision calculation
    s = calculate_something()
s = +s # Round the final result back to the default precision
I use this a lot with the hypergeometric test, which requires the division of large numbers resulting from factorials. When you do genomic-scale calculations you have to be careful of round-off and overflow errors.
An example of an antipattern might be to use the with inside a loop when it would be more efficient to have the with outside the loop, for example:
for row in lines:
    with open("outfile","a") as f:
        f.write(row)
vs
with open("outfile","a") as f:
    for row in lines:
        f.write(row)
The first way opens and closes the file for each row, which may cause performance problems compared to the second way, which opens and closes the file just once.
See PEP 343 - The 'with' statement, there is an example section at the end.
... new statement "with" to the Python language to make it possible to factor out standard uses of try/finally statements.
Points 1, 2, and 3 being reasonably well covered:
4: It is relatively new, only available in Python 2.6+ (or Python 2.5 using from __future__ import with_statement).
The with statement works with so-called context managers:
http://docs.python.org/release/2.5.2/lib/typecontextmanager.html
The idea is to simplify exception handling by doing the necessary cleanup after leaving the 'with' block. Some of the python built-ins already work as context managers.
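The cleanup that the with block takes over can be seen by writing the same operation both ways. This sketch uses io.StringIO as a stand-in for a real file so that it runs anywhere; the variable names are illustrative only.

```python
import io

# Version 1: explicit try/finally cleanup.
buffer_a = io.StringIO()
try:
    buffer_a.write("hello")
    text_a = buffer_a.getvalue()
finally:
    buffer_a.close()

# Version 2: the with statement calls __exit__, which closes the buffer,
# whether the block ends normally or with an exception.
with io.StringIO() as buffer_b:
    buffer_b.write("hello")
    text_b = buffer_b.getvalue()

print(text_a == text_b, buffer_a.closed, buffer_b.closed)
```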
Another example of out-of-the-box support, and one that might be a bit baffling at first when you are used to the way built-in open() behaves, is the connection objects of popular database modules such as:
sqlite3
psycopg2
cx_oracle
The connection objects are context managers and as such can be used out of the box in a with statement. However, when using the above, note that:
When the with-block is finished, either with an exception or without, the connection is not closed. In case the with-block finishes with an exception, the transaction is rolled back, otherwise the transaction is committed.
This means that the programmer has to take care to close the connection themselves, but it also makes it possible to acquire a connection once and use it in multiple with statements, as shown in the psycopg2 docs:
conn = psycopg2.connect(DSN)
with conn:
    with conn.cursor() as curs:
        curs.execute(SQL1)
with conn:
    with conn.cursor() as curs:
        curs.execute(SQL2)
conn.close()
In the example above, you'll note that the cursor objects of psycopg2 also are context managers. From the relevant documentation on the behavior:
When a cursor exits the with-block it is closed, releasing any resource eventually associated with it. The state of the transaction is not affected.
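The same commit-or-rollback behaviour can be tried with the standard library's sqlite3 module, with no database server needed. The sketch below uses an in-memory database; the table and values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")

# Block finishes normally: the transaction is committed.
with conn:
    conn.execute("INSERT INTO items VALUES ('kept')")

# Block finishes with an exception: the transaction is rolled back.
try:
    with conn:
        conn.execute("INSERT INTO items VALUES ('discarded')")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

# The connection is still open after both with-blocks and can be queried.
rows = [name for (name,) in conn.execute("SELECT name FROM items")]
print(rows)

# Closing remains the programmer's job.
conn.close()
```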
In Python, the with statement is commonly used to open a file, process the data in it, and close the file without calling close() explicitly. The with statement makes exception handling simpler by taking care of the cleanup.
General form of with:
with open("file name", "mode") as file_var:
    processing statements
Note: there is no need to close the file by calling file_var.close().
The answers here are great, but just to add a simple one that helped me:
with open("foo.txt") as file:
    data = file.read()
open returns a file object.
Since 2.6, Python file objects have the methods __enter__ and __exit__.
with is like a for loop that calls __enter__, runs the loop body once, and then calls __exit__.
with works with any instance that has __enter__ and __exit__.
An open file holds an operating-system handle, and on some platforms other processes cannot reuse the file until it is closed; __exit__ closes it.
source: http://web.archive.org/web/20180310054708/http://effbot.org/zone/python-with-statement.htm
The problem I have is that in my IronPython application threads are being created but never cleaned up, even when the method they run has exited. In my application I start threads in two ways: a) by using Python-style threads (sub-classes of threading.Thread that do something in their run() method), and b) by using the .NET 'ThreadStart' approach. The Python-style threads behave as expected, and after their 'run()' exits they get cleaned up. The .NET style threads never get cleaned up, even after they have exited. You can call del, Abort, whatever you want, and it has no effect on them.
The following IronPython script demonstrates the issue:
import System
import threading
import time
import logging
def do_beeps():
    logging.debug("starting do_beeps")
    t_start = time.clock()
    while time.clock() - t_start < 10:
        System.Console.Beep()
        System.Threading.Thread.CurrentThread.Join(1000)
    logging.debug("exiting do_beeps")
class PythonStyleThread(threading.Thread):
    def __init__(self, thread_name="PythonStyleThread"):
        super(PythonStyleThread, self).__init__(name=thread_name)
    def run(self):
        do_beeps()
class ThreadStarter():
    def start(self):
        t = System.Threading.Thread(System.Threading.ThreadStart(do_beeps))
        t.IsBackground = True
        t.Name = "ThreadStartStyleThread"
        t.Start()
if __name__ == '__main__':
    logging.basicConfig(format='%(asctime)s %(levelname)s: %(message)s', level=logging.DEBUG, datefmt='%H:%M:%S')
    # Start some ThreadStarter threads:
    for _ in range(5):
        ts = ThreadStarter()
        ts.start()
        System.Threading.Thread.CurrentThread.Join(200)
    # Start some Python-style threads:
    for _ in range(5):
        pt = PythonStyleThread()
        pt.start()
        System.Threading.Thread.CurrentThread.Join(200)
    # Do something on the main thread:
    for _ in range(30):
        print(".")
        System.Threading.Thread.CurrentThread.Join(1000)
When this is debugged in PyDev, what I see is that all the threads appear as expected in the 'debug' view as they are created:
but whereas the Python-style ones disappear after they've finished, the .NET / ThreadStart ones stay until the main thread exits.
As can be seen in the image, in the debugger the problematic threads appear with names 'Dummy-4', 'Dummy-5' etc, whereas the Pythonic ones appear with the name I've given them ('PythonStyleThread'). Looking in the threading.py file in my IronPython installation I see there is a class called "_DummyThread", a subclass of Thread, that sets its 'name' as 'name=_newname("Dummy-%d")', so it looks like by using ThreadStart I'm ending up with _DummyThreads. The comment for the class also says:
# Dummy thread class to represent threads not started here.
# These aren't garbage collected when they die, nor can they be waited for.
which would explain why I can't get rid of them.
But I don't want 'DummyThread's. I just want normal ones, that behave nicely and get garbage-collected when they've finished doing their thing.
Now, a slightly odd thing about all of this is that unless I set up the logger I don't see the DummyThread entries in the debugger at all (although they still run). This may be a quirk of the PyDev debugger, or it may be relevant. Is there any sane reason why logging should have any bearing on this? Can I solve my problem just by not logging in my thread?
Here, it says:
"There is the possibility that "dummy thread objects" are created. These are thread objects corresponding to "alien threads", which are threads of control started outside the threading module, such as directly from C code. Dummy thread objects have limited functionality; they are always considered alive and daemonic, and cannot be joined. They are never deleted, since it is impossible to detect the termination of alien threads."
Which makes me wonder why I've had the misfortune of ending up with them?
While I have a workaround in that I can use Python-style threading.Thread subclasses everywhere I currently use .NET 'ThreadStart' threads, I am not keen to do this as the reason I was using .NET style threads in certain places was because they give me an Abort method (whereas the Python ones don't). I know Aborting threads is a Bad Thing, but the application is a unit-testing framework, and I a) need to run unit-tests in a thread, and b) have no control over their contents (they are written by third-parties), so I have no means of periodically checking for a 'please shut me down nicely' flag on these threads, and in extremis may need to kill them rudely.
So a) why am I getting DummyThreads, b) has this got anything to do with logging and c) what can I do about it?
Thanks.