I have run into some strange behaviour with multiprocessing.
When I try to use a global variable in a function that is called via multiprocessing, the function does not see the global variable.
Example:
import multiprocessing

def func(useless_variable):
    print(variable)

useless_list = [1, 2, 3, 4, 5, 6]
p = multiprocessing.Pool(processes=multiprocessing.cpu_count())
variable = "asd"
func(useless_list)
for x in p.imap_unordered(func, useless_list):
    pass
Output:
asd
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.4/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "pywork/asd.py", line 4, in func
print(variable)
NameError: name 'variable' is not defined
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "pywork/asd.py", line 11, in <module>
for x in p.imap_unordered(func, useless_list):
File "/usr/lib/python3.4/multiprocessing/pool.py", line 689, in next
raise value
NameError: name 'variable' is not defined
As you can see, the first time I simply call func it prints asd as expected. However, when I call the very same function through multiprocessing, it says the variable variable does not exist, even though I clearly printed it just before.
Does multiprocessing ignore global variables? How can I work around this?
multiprocessing Pools fork (or, on Windows, spawn in a way intended to mimic forking) their worker processes at the moment the Pool is created. Forking maps the parent's memory as copy-on-write in the children, but it doesn't create a persistent tie between them; after the fork, changes made in the parent are not visible in the children, and vice versa. You can't use any variables defined after the Pool was created, and changes made to variables from before the Pool was created will not be reflected in the workers.
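For instance, a minimal sketch of the simplest fix, assuming the global really is read-only: define it before the Pool is created, so every worker inherits it.

import multiprocessing

def func(useless_variable):
    print(variable)  # resolved from the worker's copy of the module globals

useless_list = [1, 2, 3, 4, 5, 6]
variable = "asd"  # defined BEFORE the Pool exists, so the workers inherit it

if __name__ == '__main__':
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as p:
        for x in p.imap_unordered(func, useless_list):
            pass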
Typically, with a Pool, you want to avoid mutable global state entirely: pass all the data the function needs as arguments to the function you're imap-ing (arguments are serialized and sent to the children, so the state is correct), and have the function return any new data instead of mutating globals; the return value is serialized and sent back to the parent process to use as it sees fit.
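For example, a rough sketch of that pattern (functools.partial is just one way to bind the extra argument; the names here are made up):

import multiprocessing
from functools import partial

def func(variable, item):
    # Everything the worker needs arrives as an argument; no global state.
    return "{}-{}".format(variable, item)

if __name__ == '__main__':
    useless_list = [1, 2, 3, 4, 5, 6]
    with multiprocessing.Pool() as p:
        worker = partial(func, "asd")  # bind the shared value as the first argument
        for result in p.imap_unordered(worker, useless_list):
            print(result)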
Managers are an option, but not usually the correct one with Pools; you usually want workers that only look at read-only globals from before the Pool was created, or that work with arguments and return new values, without using global state at all.
When you spawn a process, all context is copied. You need to use managers to exchange objects between processes; check the official documentation, and for managing state check this.
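If you do go the Manager route, a rough sketch might look like this (the names mirror the question's example; the proxy is passed to the workers as part of each task):

from multiprocessing import Manager, Pool

def func(args):
    shared, item = args
    # The proxy reflects changes made in the parent even after the Pool was created.
    return shared['variable'], item

if __name__ == '__main__':
    manager = Manager()
    shared = manager.dict()
    useless_list = [1, 2, 3, 4, 5, 6]
    with Pool() as p:
        shared['variable'] = "asd"  # set after the Pool was created
        for value, item in p.imap_unordered(func, [(shared, i) for i in useless_list]):
            print(value, item)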
I am working on a space invaders type game and I have it working in a manner such that it takes one bullet hit to kill an invader. What I am trying to do now is to change it so that it takes three bullet hits to kill an invader.
My current code is located here: https://github.com/clipovich/alien_invasion.git
If you run the code as posted, the game works fine to kill an alien using a single bullet hit.
Beginning with a working single-bullet-to-kill game, here is what I have changed (note: you can reproduce my changes by simply uncommenting lines 66 and 67 in the alien_invasion.py file):
# if self.aliens.settings.alien_hit_limit >= 0:
#     self.aliens.remove()
In the settings.py file, I have created
self.alien_hit_limit = 3
In the alien_invasion.py file, I have added:
if self.aliens.settings.alien_hit_limit >= 0:
    self.aliens.remove()
to
def _check_bullet_alien_collisions(self):
and now I receive the following error message:
~/py/alien_invasion$ python3.8 alien_invasion.py
pygame 1.9.6
Hello from the pygame community. https://www.pygame.org/contribute.html
Traceback (most recent call last):
  File "alien_invasion.py", line 210, in <module>
    ai.run_game()
  File "alien_invasion.py", line 42, in run_game
    self._update_bullets()
  File "alien_invasion.py", line 57, in _update_bullets
    self._check_bullet_alien_collisions()
  File "alien_invasion.py", line 66, in _check_bullet_alien_collisions
    if self.aliens.settings.alien_hit_limit >= 0:
AttributeError: 'Group' object has no attribute 'settings'
I think my problem may be that I am trying to perform actions on individual aliens but using a group method. I am not sure though.
When fixed, the end result would be that an alien disappears after being hit three times, not once.
What do I need to do to correct the code?
You are accessing settings on self.aliens, which is a pygame sprite Group, but the attribute you want lives on the Settings object from your settings.py file. So if you want to reach the alien_hit_limit attribute of the settings object, use "if self.settings.alien_hit_limit >= 0:" instead of "if self.aliens.settings.alien_hit_limit >= 0:" on line 66; that resolves the error you posted.
First of all, it's not very useful to keep alien_hit_limit only as a global setting. Rather, you should give each alien its own hit points in the alien.py file:
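Something along these lines (just a sketch; hit_points is an assumed attribute name, and the image path and ai_game attributes follow the usual Alien Invasion layout, so adjust them to your files):

import pygame
from pygame.sprite import Sprite

class Alien(Sprite):
    """An alien that tracks how many hits it can still take."""

    def __init__(self, ai_game):
        super().__init__()
        self.screen = ai_game.screen
        self.settings = ai_game.settings

        # Load the alien image and set its rect attribute.
        self.image = pygame.image.load('images/alien.bmp')
        self.rect = self.image.get_rect()

        # Each alien starts with its own hit counter.
        self.hit_points = self.settings.alien_hit_limit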
Next you should change the _check_bullet_alien_collisions() method:
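For example (again a sketch; it assumes your groups are called self.aliens and self.bullets, as in the standard project):

def _check_bullet_alien_collisions(self):
    """Respond to bullet-alien collisions."""
    # Third argument False: do NOT remove hit aliens automatically;
    # fourth argument True: do remove any bullet that hits an alien.
    collisions = pygame.sprite.groupcollide(
        self.aliens, self.bullets, False, True)

    for alien in collisions:
        alien.hit_points -= 1
        if alien.hit_points <= 0:
            self.aliens.remove(alien)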
The pygame.sprite.groupcollide method returns a dictionary keyed by the collided sprites from the group you pass first (self.aliens), so it gives you the aliens that have been hit by a bullet (typically a single alien per bullet). You reduce an alien's hit points when it is hit and check whether its hit points have dropped to zero; in that case you remove that alien from the group. By the way, the third parameter of pygame.sprite.groupcollide specifies whether sprites from the first group (here self.aliens) should be destroyed, and the fourth parameter specifies whether sprites from the second group (here self.bullets) should be destroyed. We do not want to destroy the alien right away, so we set the third parameter to False, but we do want the bullet to be destroyed, so we set the fourth parameter to True.
So basically, I am learning Python (therefore I am new, so be gentle lol). I work in IT and wanted to make a program that has all the basic stuff that I do everyday.
The main program opens up and shows a few options for tools and such. I wanted to add a To Do List to the options.
When my To Do list is called the GUI will appear, however, whenever the buttons are clicked, I get the NameError. I assume the main program just doesn't understand the defined functions that I have assigned to the buttons on the To Do list.
I am curious as to why. Of course I would love a solution, however, I am genuinely curious and interested as to why the interpreter doesn't see or "understand" the defined functions.
I called the To Do List using
toDoBtn = tk.Button(self, text = "To Do List",
command=lambda: exec(open("ToDo.py").read()))
The error I get is
Traceback (most recent call last):
File "C:\Users\brannon.harper\AppData\Local\Programs\Python\Python37\lib\tkinter\__init__.py", line 1705, in __call__
return self.func(*args)
File "<string>", line 49, in insertTask
NameError: name 'inputError' is not defined
I will add the beginning part of ToDo.py; however, I feel as if the issue is how I am calling the script, not how the script is written.
from tkinter import *
from tkinter import messagebox
tasks_list = []
counter = 1
# Function for checking input error when
# empty input is given in task field
def inputError():
    if enterTaskField.get() == "":
        messagebox.showerror("Input Error")
        return 0
    return 1
The ToDo.py script ends with
if __name__ == "__main__":
    # create a GUI window
    gui = Tk()

    #### I just design the GUI here.
    #### After it is designed it ends with

    gui.mainloop()
Thanks so much for your time, and this is my first post, so if I did something wrong or didn't follow the "standard entry" please correct me and let me know for the future!
The function inputError isn't defined because exec, used this way, can't perform any operation that would bind local variables, such as imports, variable assignments, or function/class definitions.
This is explained more in this SO post, or in the documentation here
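One possible workaround (a sketch only, assuming ToDo.py sits next to the main script and runs fine on its own) is to launch it as a separate process instead of exec-ing its source into the current namespace:

import subprocess
import sys
import tkinter as tk

root = tk.Tk()

# Run ToDo.py in its own interpreter so it keeps its own module namespace.
toDoBtn = tk.Button(root, text="To Do List",
                    command=lambda: subprocess.Popen([sys.executable, "ToDo.py"]))
toDoBtn.pack()

root.mainloop()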
I have a python object
<GlobalParams.GlobalParams object at 0x7f8efe809080>
which contains various numpy arrays, parameter values, etc. that I use in various functions, called for example like this:
myParams = GlobalParams(input_script) #reads in various parameters from an input script and assigns these to myParams
myParams.data #calls the data array from myParams
I am trying to parallelise my code and would like to broadcast the myParams object so that it is available to the other child processes. I have done this previously for individual numpy arrays, values etc. in the form:
points = comm.bcast(points, root = 0)
However, I don't want to have to do this individually for all the contents of myParams. I would like to broadcast the object in its entirety so that it can be accessed on other cores. I have tried the obvious:
myParams = comm.bcast(myParams, root=0)
but this returns the error:
myParams = comm.bcast(myParams, root=0)
File "MPI/Comm.pyx", line 1276, in mpi4py.MPI.Comm.bcast (src/mpi4py.MPI.c:108819)
File "MPI/msgpickle.pxi", line 612, in mpi4py.MPI.PyMPI_bcast (src/mpi4py.MPI.c:47005)
File "MPI/msgpickle.pxi", line 112, in mpi4py.MPI.Pickle.dump (src/mpi4py.MPI.c:40704)
TypeError: cannot serialize '_io.TextIOWrapper' object
What is the appropriate way to share this object with the other cores? Presumably this is a common requirement in python, but I can't find any documentation on this. Most examples look at broadcasting a single variable/array.
This doesn't look like an MPI problem; it looks like a problem with serialising the object for broadcast, which internally uses the pickle module.
Specifically in this case, it can't serialise a _io.TextIOWrapper - so I suggest hunting down where in your class this is used.
Once you work out which field(s) can't be serialised, you can remove them, broadcast, then reassemble them on each individual rank, using some method that you need to design yourself (recreateUnpicklableThing() in the example below). You could do that by adding these methods to your class for Pickle to call before and after broadcast:
def __getstate__(self):
    members = self.__dict__.copy()
    # remove things that can't be pickled, using the attribute's name
    del members['someUnpicklableThing']
    return members

def __setstate__(self, members):
    self.__dict__.update(members)
    # On unpickle, manually recreate the things that you couldn't pickle
    # (this method recreates self.someUnpicklableThing using some metadata
    # carefully chosen by you that Pickle can serialise).
    self.recreateUnpicklableThing(self.dataForSettingUpSometing)
See the pickle documentation for more on how these methods work: https://docs.python.org/2/library/pickle.html
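Put together, a minimal sketch might look like this (it assumes the unpicklable member is an open file handle for the input script; adapt it to whatever _io.TextIOWrapper your GlobalParams actually holds):

from mpi4py import MPI

class GlobalParams:
    def __init__(self, input_script):
        self.input_script = input_script
        self.fh = open(input_script)       # _io.TextIOWrapper: not picklable
        # ... read numpy arrays / parameter values here ...

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['fh']                    # drop the unpicklable handle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Recreate the handle on each rank (assumes the script is readable there).
        self.fh = open(self.input_script)

comm = MPI.COMM_WORLD
myParams = GlobalParams("input_script.txt") if comm.rank == 0 else None
myParams = comm.bcast(myParams, root=0)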
I have some float numbers to be stored in a big (500K x 500K) matrix. I am storing them in chunks, using arrays of variable sizes (in accordance with some specific conditions).
I have a parallelised code (Python 3.3 and h5py) which produces the arrays and puts them in a shared queue, and one dedicated process that pops from the queue and writes them one by one into the HDF5 matrix. It works as expected approximately 90% of the time.
Occasionally, I get writing errors for specific arrays. If I run it multiple times, the faulty arrays change every time.
Here's the code:
def writer(in_q):
    # Open HDF5 archive
    hdf5_file = h5py.File("./google_matrix_test.hdf5")
    hdf5_scores = hdf5_file['scores']
    while True:
        # Get some data
        try:
            data = in_q.get(timeout=5)
        except:
            hdf5_file.flush()
            print('HDF5 archive updated.')
            break
        # Process the data
        try:
            hdf5_scores[data[0], data[1]:data[2]+1] = numpy.matrix(data[3:])
        except:
            # Print faulty chunk's info
            print('E: ' + str(data[0:3]))
            in_q.put(data)  # <- doesn't solve
        in_q.task_done()
def compute():
    jobs_queue = JoinableQueue()
    scores_queue = JoinableQueue()
    processes = []
    processes.append(Process(target=producer, args=(jobs_queue, data,)))
    processes.append(Process(target=writer, args=(scores_queue,)))
    for i in range(10):
        processes.append(Process(target=consumer, args=(jobs_queue, scores_queue,)))
    for p in processes:
        p.start()
    processes[1].join()
    scores_queue.join()
Here's the error:
Process Process-2:
Traceback (most recent call last):
File "/local/software/python3.3/lib/python3.3/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/local/software/python3.3/lib/python3.3/multiprocessing/process.py", line 95, in run
self._target(*self._args, **self._kwargs)
File "./compute_scores_multiprocess.py", line 104, in writer
hdf5_scores[data[0], data[1]:data[2]+1] = numpy.matrix(data[3:])
File "/local/software/python3.3/lib/python3.3/site-packages/h5py/_hl/dataset.py", line 551, in __setitem__
self.id.write(mspace, fspace, val, mtype)
File "h5d.pyx", line 217, in h5py.h5d.DatasetID.write (h5py/h5d.c:2925)
File "_proxy.pyx", line 120, in h5py._proxy.dset_rw (h5py/_proxy.c:1491)
File "_proxy.pyx", line 93, in h5py._proxy.H5PY_H5Dwrite (h5py/_proxy.c:1301)
OSError: can't write data (Dataset: Write failed)
If I insert a pause of two seconds (time.sleep(2)) between writing tasks, the problem seems solved (although I cannot afford 2 seconds per write, since I need to write more than 250,000 times). If I capture the writing exception and put the faulty array back in the queue, the script never stops (presumably).
I am using CentOS (2.6.32-279.11.1.el6.x86_64). Any insight?
Thanks a lot.
When using the multiprocessing module with HDF5, the only big restriction is that you can't have any files open (even read-only) when fork() is called. In other words, if you open a file in the master process to write, and then Python spins off a subprocess for computation, there may be problems. It has to do with how fork() works and the choices HDF5 itself makes about how to handle file descriptors.
My advice is to double-check your application to make sure you're creating any Pools, etc. before opening the master file for writing.
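Roughly the ordering that advice implies, as a sketch (the producer/consumer logic is omitted):

from multiprocessing import Process, JoinableQueue
import h5py

def worker(q):
    ...  # pure computation only; no HDF5 access in the children

if __name__ == '__main__':
    q = JoinableQueue()

    # 1. Start every child process while NO HDF5 file is open anywhere.
    procs = [Process(target=worker, args=(q,)) for _ in range(10)]
    for p in procs:
        p.start()

    # 2. Only now open the file for writing (or open it inside the single
    #    dedicated writer process, after it has been started).
    with h5py.File("./google_matrix_test.hdf5", "a") as hdf5_file:
        ...  # pop results from q and write them here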
I've got a class that wraps some file handling functionality I need. Another class creates an instance of the filehandler and uses it for an indeterminate amount of time. Eventually, the caller is destroyed, which destroys the only reference to the filehandler.
What is the best way to have the filehandler close the file?
I currently use __del__(self) but after seeing several different questions and articles, I'm under the impression this is considered a bad thing.
class fileHandler:
    def __init__(self, dbf):
        self.logger = logging.getLogger('fileHandler')
        self.thefile = open(dbf, 'rb')

    def __del__(self):
        self.thefile.close()
That's the relevant bit of the handler. The whole point of the class is to abstract away details of working with the underlying file object, and also to avoid reading the entire file into memory unnecessarily. However, part of handling the underlying file is closing it when the object falls out of scope.
The caller is not supposed to know or care about the details involved in the filehandler. It is the filehandler's job to release any necessary resources involved when it falls out of scope. That's one of the reasons it was abstracted in the first place. So, I seem to be faced with moving the filehandler code into the calling object, or dealing with a leaky abstraction.
Thoughts?
__del__ is not, by itself, a bad thing. You just have to be extra careful to not create reference cycles in objects that have __del__ defined. If you do find yourself needing to create cycles (parent refers to child which refers back to parent) then you will want to use the weakref module.
So, __del__ is okay, just be wary of cyclic references.
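For illustration, a minimal sketch of that weakref pattern (Parent and Child are made-up names):

import weakref

class Child:
    def __init__(self, parent):
        # Hold only a weak reference back to the parent, so the
        # parent<->child pair never forms a strong reference cycle.
        self._parent = weakref.ref(parent)

    @property
    def parent(self):
        return self._parent()  # None once the parent has been collected

class Parent:
    def __init__(self):
        self.child = Child(self)

    def __del__(self):
        pass  # nothing keeps this object alive through a cycle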
Garbage collection: the important point here is that when an object goes out of scope, it can be garbage collected, and in fact it will be garbage collected... but when? There is no guarantee on the when, and different Python implementations have different characteristics in this area. So for managing resources, you are better off being explicit and either adding .close() to your filehandler or, if your usage is compatible, adding __enter__ and __exit__ methods.
The __enter__ and __exit__ methods are described here. One really nice thing about them is that __exit__ is called even when exceptions occur, so you can count on your resources being closed gracefully.
Your code, enhanced for __enter__/__exit__:
class fileHandler:
    def __init__(self, dbf):
        self.logger = logging.getLogger('fileHandler')
        self.thefilename = dbf

    def __enter__(self):
        self.thefile = open(self.thefilename, 'rb')
        return self

    def __exit__(self, *args):
        self.thefile.close()
Note that the file is being opened in __enter__ instead of __init__ -- this allows you to create the filehandler object once, and then use it whenever you need to in a with without recreating it:
fh = fileHandler('some_dbf')

with fh:
    # file is now opened
    # do some stuff
    data = fh.thefile.read()
# file is now closed

# blah blah

# need the file again, so
with fh:
    # file is open again, do some stuff with it
    more_data = fh.thefile.read()
# etc, etc
As you've written it, the class doesn't make the file close any more reliably. If you simply drop the filehandler instance on the floor, the file won't close until the object is destroyed. That might happen immediately or not until the object is garbage collected, but dropping a plain file object on the floor would close it just as quickly. If the only reference to thefile is from inside your class object, then when filehandler is garbage collected thefile will also be garbage collected, and therefore closed, at the same time.
The correct way to use files is to use the with statement:
with open(dbf, 'rb') as thefile:
    do_something_with(thefile)
that will guarantee that thefile is always closed whenever the with clause exits. If you want to wrap your file inside another object you can do that too by defining __enter__ and __exit__ methods:
class FileHandler:
    def __init__(self, dbf):
        self.logger = logging.getLogger('fileHandler')
        self.thefile = open(dbf, 'rb')

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.thefile.close()
and then you can do:
with FileHandler(dbf) as fh:
    do_something_with(fh)
and be sure the file gets closed promptly.