How can I use multiprocessing in a class? - python-3.x

I'm writing a program to study Python.
It is a GUI web crawler.
I managed to run the GUI and main classes concurrently using QThread,
but I have a problem.
In the main class, I first collect the picture addresses using webdriver and build a list called data.
After that I use Pool() and map() to download the pictures with the download_image method of the Main class.
I searched and tried many things
(imap, lambda, etc.).
Here is my code
(I import multiprocessing as mul)
(and my Python version is 3.7)
# crawler and downloader class
class Main(QThread, QObject):
    def __init__(self, path, brand, model, grade):
        QThread.__init__(self)
        self.path = path

    # this is the download method
    def download_image(self, var):
        a = var.split("/")
        filename = a[-1]
        download_path = self.path + filename
        urllib.request.urlretrieve(var, download_path)

    # this is the method started when the button is clicked in the Gui
    def core(self):
        # sample url data list
        data = ['url.com', 'url2.com', 'url3.com', ...]
        download_p = Pool(mul.cpu_count())
        download_p.map(self.download_image, data)
        download_p.close()
        download_p.join()
        print("end")

    def run(self):
        self.core()
class Gui(QMainWindow, design.Ui_MainWindow):
    def __init__(self):
        # (gui code here)
If I call download_p.map(self.download_image, data),
I get this error -> [ TypeError: can't pickle Main objects ]
If I call download_p.map(self.download_image, self.data)
(after also setting self.data = [urls...]),
I get the same TypeError.
If I call download_p.map(self.download_image, self, data),
I get this error -> [ TypeError: 'Main' object is not iterable ]
I'm not good at English or Python,
but I want to resolve this problem, so I decided to ask here.
Thanks for looking at this newbie's question.
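As a side note (not from the original thread): a common way around the "can't pickle Main objects" error is to move the download logic to a module-level function, so the Pool only has to pickle plain strings instead of the QThread instance. A minimal sketch, where the URL list and destination path are made-up examples:

```python
import os
from multiprocessing import Pool

def download_image(args):
    """Module-level worker: picklable, unlike a bound method of a QThread."""
    path, url = args  # (destination directory, image url)
    filename = url.split("/")[-1]
    download_path = os.path.join(path, filename)
    # urllib.request.urlretrieve(url, download_path)  # the real download step
    return download_path

if __name__ == "__main__":
    data = ["http://example.com/a.jpg", "http://example.com/b.jpg"]
    with Pool(2) as pool:
        results = pool.map(download_image, [("/tmp/", url) for url in data])
    print(results)  # ['/tmp/a.jpg', '/tmp/b.jpg']
```

The GUI thread can then pass `(self.path, url)` tuples to the pool, and only picklable data ever crosses the process boundary.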

Related

How to create the attribute of a class object instance on multiprocessing in python?

I am trying to create attributes of an instance in parallel to learn more about multiprocessing.
My objective is to avoid creating the attributes sequentially, assuming that they are independent of each other. I read that multiprocessing creates its own memory space and that it is possible to establish a connection between processes.
I think that this connection can help me share the same object among the workers, but I did not find any post showing a way to implement this. If I try to create the attributes in parallel, I'm not able to access them in the main process when the workers conclude. Can someone help me with that? What do I need to do?
Below I provide an MRE of what I'm trying to get, using the MPIRE package. I hope it illustrates my question.
from mpire import WorkerPool
import os

class B:
    def __init__(self):
        pass

class A:
    def __init__(self):
        self.model = B()

    def do_something(self, var):
        if var == 'var1':
            self.model.var1 = var
        elif var == 'var2':
            self.model.var2 = var
        else:
            print('other var.')

    def do_something2(self, model, var):
        if var == 'var1':
            model.var1 = var
            print(f"Worker {os.getpid()} is processing do_something2({var})")
        elif var == 'var2':
            model.var2 = var
            print(f"Worker {os.getpid()} is processing do_something2({var})")
        else:
            print(f"Worker {os.getpid()} is processing do_something2({var})")

    def init_var(self):
        self.do_something('var1')
        self.do_something('test')
        print(self.model.var1)
        print(vars(self.model).keys())

        # Trying to create the attributes in parallel
        print('')
        self.model = B()
        self.__sets_list = ['var1', 'var2', 'var3']
        with WorkerPool(n_jobs=3, start_method='fork') as pool:
            model = self.model
            pool.set_shared_objects(model)
            pool.map(self.do_something2, self.__sets_list)
        print(self.model.var1)
        print(vars(self.model).keys())

def main():  # this main will be in another file that calls different classes
    obj = A()
    obj.init_var()

if __name__ == '__main__':
    main = main()
It generates the following output:
python src/test_change_object.py
other var.
var1
dict_keys(['var1'])
Worker 20040 is processing do_something2(var1)
Worker 20041 is processing do_something2(var2)
Worker 20042 is processing do_something2(var3)
Traceback (most recent call last):
File "/mnt/c/git/bioactives/src/test_change_object.py", line 59, in
main = main()
File "/mnt/c/git/bioactives/src/test_change_object.py", line 55, in main
obj.init_var()
File "/mnt/c/git/bioactives/src/test_change_object.py", line 49, in init_var
print(self.model.var1)
AttributeError: 'B' object has no attribute 'var1'
I appreciate any help. Thanks!
Would a solution without using mpire work? You could achieve what you are after, i.e. sharing the state of complex objects, by using multiprocessing primitives.
TL;DR
This code works:
import os
from multiprocessing import Pool
from multiprocessing.managers import BaseManager, NamespaceProxy
import types

class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data type. The proxy instance will have the namespace and
    functions of the data type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes."""

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            def wrapper(*args, **kwargs):
                return self._callmethod(name, args, kwargs)
            return wrapper
        return result

class B:
    def __init__(self):
        pass

    @classmethod
    def create(cls, *args, **kwargs):
        # Register class
        class_str = cls.__name__
        BaseManager.register(class_str, cls, ObjProxy, exposed=tuple(dir(cls)))

        # Start a manager process
        manager = BaseManager()
        manager.start()

        # Create and return this proxy instance. Using this proxy allows sharing of state between processes.
        inst = eval("manager.{}(*args, **kwargs)".format(class_str))
        return inst

class A:
    def __init__(self):
        self.model = B.create()

    def do_something(self, var):
        if var == 'var1':
            self.model.var1 = var
        elif var == 'var2':
            self.model.var2 = var
        else:
            print('other var.')

    def do_something2(self, model, var):
        if var == 'var1':
            model.var1 = var
            print(f"Worker {os.getpid()} is processing do_something2({var})")
        elif var == 'var2':
            model.var2 = var
            print(f"Worker {os.getpid()} is processing do_something2({var})")
        else:
            print(f"Worker {os.getpid()} is processing do_something2({var})")

    def init_var(self):
        self.do_something('var1')
        self.do_something('test')
        print(self.model.var1)
        print(vars(self.model).keys())

        # Trying to create the attributes in parallel
        print('')
        self.model = B.create()
        self.__sets_list = [(self.model, 'var1'), (self.model, 'var2'), (self.model, 'var3')]
        with Pool(3) as pool:
            # model = self.model
            # pool.set_shared_objects(model)
            pool.starmap(self.do_something2, self.__sets_list)
        print(self.model.var1)
        print(vars(self.model).keys())

def main():  # this main will be in another file that calls different classes
    obj = A()
    obj.init_var()

if __name__ == '__main__':
    main = main()
Longer, detailed explanation
Here is what I think is happening. Even though you are setting self.model as a shared object among your workers, the fact that you alter it within the workers forces a copy to be made (i.e., the shared objects are not writable). From the documentation for shared objects in mpire:
For the start method fork these shared objects are treated as copy-on-write, which means they are only copied once changes are made to them. Otherwise they share the same memory address
Therefore, it suggests that shared objects with the fork start method are only useful for cases where you would only be reading from the objects. The documentation also provides such a use case:
This is convenient if you want to let workers access a large dataset that wouldn’t fit in memory when copied multiple times.
Take this with a grain of salt though, since again, I have not used mpire. Hopefully someone with more experience with the library can provide further clarifications.
Anyway, moving on, you can achieve this using multiprocessing managers. Managers allow you to share complex objects (an object of class B in this context) between processes and workers. You can use them to also share nested dictionaries, lists, etc. They do this by spawning a server process, where the shared object is actually stored, and allow other processes to access the object through proxies (more on this later), and by pickling/unpickling any arguments and return values passed to and from the server process. As a sidenote, using pickling/unpickling also leads to restrictive structures. For example, in our context, it would mean that any function arguments and instance variables you make for class B should be picklable.
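Before the custom-proxy machinery below, the core manager idea can be seen in a minimal sketch (not from the original answer) using only documented multiprocessing APIs: a Manager server process holds a Namespace object, and a worker process mutates it through a proxy, so the parent sees the change afterwards.

```python
from multiprocessing import Manager, Process

def worker(ns):
    ns.var1 = "var1"  # the write goes to the manager's server process, not a copy

if __name__ == "__main__":
    with Manager() as manager:
        ns = manager.Namespace()
        p = Process(target=worker, args=(ns,))
        p.start()
        p.join()
        print(ns.var1)  # the parent sees the attribute set by the child
```

A plain Namespace only shares attributes, not methods, which is exactly the limitation the custom ObjProxy below works around.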
Coming back, I mentioned that we can access the server process through proxies. Proxies are basically just wrappers which mimic the properties and functions of the original object. Most rely on specific dunder methods like __setattr__ and __getattr__; an example is given below (from here):
class Proxy(object):
    def __init__(self, original):
        self.original = original

    def __getattr__(self, attr):
        return getattr(self.original, attr)

class MyObj(object):
    def bar(self):
        print('bar')

obj = MyObj()
proxy = Proxy(obj)
proxy.bar()  # 'bar'
obj.bar()  # 'bar'
A huge plus of using proxies is that they are picklable, which is important when dealing with shared objects. Under the hood, the manager creates a proxy for you whenever you create a shared object through it. However, this default proxy (called AutoProxy) does not share the namespace of the object. That will not work for us, since we are using class B's namespace and want it to be shared as well. Therefore, we create our own proxy by inheriting from another, undocumented proxy provided by multiprocessing: NamespaceProxy. As the name suggests, this one does share the namespace, but conversely does not share any instance methods. This is why we create our own proxy, which is the best of both worlds:
from multiprocessing.managers import NamespaceProxy
import types

class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data type. The proxy instance will have the namespace and
    functions of the data type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable and its state can be shared among different processes."""

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            def wrapper(*args, **kwargs):
                return self._callmethod(name, args, kwargs)
            return wrapper
        return result
More info on why this works. Keep in mind that these proxies do not share private or protected attributes/functions (check this question).
After we have achieved this, the rest is just some boilerplate-ish code which uses this proxy by default to create shareable complex objects for particular datatypes. In our context this means that code for class B will become this:
from multiprocessing import Manager, Queue, Pool
from multiprocessing.managers import BaseManager

class B:
    def __init__(self):
        pass

    @classmethod
    def create(cls, *args, **kwargs):
        # Register class
        class_str = cls.__name__
        BaseManager.register(class_str, cls, ObjProxy, exposed=tuple(dir(cls)))

        # Start a manager process
        manager = BaseManager()
        manager.start()

        # Create and return this proxy instance. Using this proxy allows sharing of state between processes.
        inst = eval("manager.{}(*args, **kwargs)".format(class_str))
        return inst
In the above code, the create function is a general class constructor that automatically uses our new proxy and manager to share the object. It can be used for any class, not only B. The only thing left is to change the mpire pool to a multiprocessing pool in init_var. Note how we use B.create() instead of simply B() to create objects of class B!
def init_var(self):
    self.do_something('var1')
    self.do_something('test')
    print(self.model.var1)
    print(vars(self.model).keys())

    # Trying to create the attributes in parallel
    print('')
    self.model = B.create()
    self.__sets_list = [(self.model, 'var1'), (self.model, 'var2'), (self.model, 'var3')]
    with Pool(3) as pool:
        # model = self.model
        # pool.set_shared_objects(model)
        pool.starmap(self.do_something2, self.__sets_list)
    print(self.model.var1)
    print(vars(self.model).keys())
Note: I have only tested this on Windows, where multiprocessing does not use the "fork" start method but rather the "spawn" method to start a process. More information here

PySide6 QThread still freezing main GUI

I am currently trying to implement some threading functionality in my PySide6 GUI application. I followed a tutorial to try to get started (link is here), and I cannot seem to get it to work. Although that tutorial uses PyQt, not PySide, the classes and structure are still similar, and it does seem to launch on another thread. Still, it freezes the main GUI, which is not desired when this actually faces users.
Here is a sample of my code:
class Worker(QObject):
    finished = Signal(str)
    progress = Signal(int)

    def run(self, file):
        """Long-running task that calls a separate class for computation."""
        b = SeparateClass()
        b.doComputation()
        self.finished.emit()

class DataPlotting(QMainWindow):
    def __init__(self):
        self.thread = QThread()
        self.worker = Worker()
        self.report_builder = QPushButton('Call class that threads')
        self.report_builder.setEnabled(False)
        self.report_builder.clicked.connect(self.qthread_test)

    def qthread_test(self):
        file = 'some_file.txt'
        self.worker.moveToThread(self.thread)
        self.thread.started.connect(self.worker.run(file))
        self.worker.finished.connect(self.thread.quit)
        self.worker.finished.connect(self.worker.deleteLater)
        self.thread.finished.connect(self.thread.deleteLater)
        self.thread.start()
        return
This does accomplish the work in the Worker class and spits out the desired results, but it freezes the GUI. I am not really sure what I am doing wrong, as this approach is what has been suggested to prevent freezing GUIs during heavy computation.
Is there something that I am straight up missing? Or am I going about this the wrong way? Any help or guidance is appreciated.
I am assuming that you make the appropriate calls to the superclass during __init__ for your subclasses of QMainWindow and QObject.
When your code executes self.thread.started.connect(self.worker.run(file)), it runs the function self.worker.run(file) immediately and assigns the result of that function, which is None, as the connected slot for the thread.started signal. Instead of passing the file path as a parameter, you can assign it to the worker instance and have the run method grab the path from self during execution.
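The callable-versus-call-result distinction can be seen without Qt at all. In this hypothetical sketch, FakeSignal is a stand-in for a Qt signal that stores connected slots and invokes them on emit:

```python
class FakeSignal:
    """Hypothetical stand-in for a Qt signal: stores slots, calls them on emit."""
    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)  # expects a callable, not a call result

    def emit(self):
        for slot in self._slots:
            slot()

calls = []

def run():
    calls.append("ran")

sig = FakeSignal()
sig.connect(run)      # correct: pass the callable itself
# sig.connect(run())  # wrong: runs run() immediately and connects None
sig.emit()
print(calls)  # ['ran']
```

This is why the fix below connects thread.started to worker.run (no parentheses) and moves the file path onto the worker instance.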
For example you can try something like this:
class Worker(QObject):
    finished = Signal(str)
    progress = Signal(int)

    def run(self):
        """Long-running task that calls a separate class for computation."""
        file = self.some_file
        b = SeparateClass()
        b.doComputation()
        self.finished.emit()

class DataPlotting(QMainWindow):
    def __init__(self):
        self.report_builder = QPushButton('Call class that threads')
        self.report_builder.setEnabled(False)
        self.report_builder.clicked.connect(self.qthread_test)
        self.threads = []

    def qthread_test(self):
        worker = Worker()
        thread = QThread()
        worker.some_file = 'some_file.txt'
        worker.moveToThread(thread)
        thread.started.connect(worker.run)
        worker.finished.connect(thread.quit)
        worker.finished.connect(worker.deleteLater)
        thread.finished.connect(thread.deleteLater)
        thread.start()
        self.threads.append(thread)
        return

Unable to access class attribute in another function

import rospy
from sensor_msgs.msg import Imu

class ImuData:
    def __init__(self):
        #self.data = None
        pass

    def get_observation(self):
        rospy.Subscriber('/imu', Imu, self.imu_callback)
        imuData = self.data
        print(imuData)

    def imu_callback(self, msg):
        self.data = msg.orientation
        print(self.data)

if __name__ == '__main__':
    rospy.init_node('gett_imu', anonymous=True)
    idd = ImuData()
    idd.get_observation()
In the above code, I would like to access self.data, defined in imu_callback, from the get_observation function. The problem is that I get an error saying ImuData has no attribute data.
How do I solve this issue?
Note: I feel that the question has to do with Python classes and not with ROS and rospy.
A couple of things are going on here. One, as mentioned in the comment, is that you should be initializing your attributes inside __init__. The error you're seeing is partially because of Python and the fact that self.data has not actually been initialized yet.
The second issue is where you set up the subscriber. This should also be done in __init__, and only once. Sensors publish at a fairly constant rate, so it takes time to actually receive any data on the topic. Also, if you plan to call get_observation more than once, you would create a new subscription each time, which you do not want.
Take the following code as a fixed example:
def __init__(self):
    rospy.Subscriber('/imu', Imu, self.imu_callback)
    self.data = None

def get_observation(self):
    imuData = self.data
    print(imuData)
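To isolate the Python-classes part of this in a hypothetical, ROS-free sketch: an instance attribute only exists once some code path has assigned it, so reading self.data before the callback has ever fired raises AttributeError unless __init__ gives it a default:

```python
class WithDefault:
    def __init__(self):
        self.data = None  # attribute exists from construction onward

class WithoutDefault:
    def __init__(self):
        pass  # self.data is only created if a callback ever runs

print(WithDefault().data)  # None, no error
try:
    WithoutDefault().data
except AttributeError as e:
    print(e)  # object has no attribute 'data'
```

This is why initializing self.data = None in __init__ makes get_observation safe to call at any time (it may just print None until the first message arrives).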

How to initialize python watchdog pattern matching event handler

I'm using the Python Watchdog to monitor a directory for new files being created. Several different types of files are created in said directory but I only need to monitor a single file type, hence I use the Watchdog PatternMatchingEventHandler, where I specify the pattern to monitor using the patterns keyword.
To correctly execute the code under the hood (not displayed here) I need to initialize an empty dataframe in my event-handler, and I am having trouble getting this to work. If I remove the __init__ in the code below, everything works just fine btw.
I used the code in this answer as inspiration for my own.
The code I have set up looks as follows:
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler
import time
import pandas as pd
import numpy as np
from multiprocessing import Pool

class HandlerEQ54(PatternMatchingEventHandler):
    def __init__(self):
        # Initializing an empty dataframe for storage purposes.
        data_54 = pd.DataFrame(columns=['Barcode', 'DUT', 'Step12', 'Step11', 'Np1', 'Np2', 'TimestampEQ54'])
        # Converting to INT for later purposes
        data_54[['Barcode', 'DUT']] = data_54[['Barcode', 'DUT']].astype(np.int64)
        self.data = data_54

    def on_created(self, event):
        if event.is_directory:
            return True
        elif event.event_type == 'created':
            # Take action here when a file is created.
            print('Found new files:')
            print(event.src_path)
            time.sleep(0.1)
            # Creating a process pool to return data
            pool1 = Pool(processes=4)
            # Pass the file to the parsing function and return the parsed result.
            result_54 = pool1.starmap(parse_eq54, [(event.src_path, self.data)])
            # returns the dataframe rather than the list of dataframes returned by starmap
            self.data = result_54[0]
            print('Data read: ')
            print(self.data)

def monitorEquipment(equipment):
    '''Uses the Watchdog package to monitor the data directory for new files.
    See the HandlerEQ54 and HandlerEQ51 classes in multiprocessing_handlers for actual monitoring code. Monitors each equipment.'''
    print('equipment')
    if equipment.upper() == 'EQ54':
        event_handler = HandlerEQ54(patterns=["*.log"])
        filepath = '/path/to/first/file/source/'

    # set up observer
    observer = Observer()
    observer.schedule(event_handler, path=filepath, recursive=True)
    observer.daemon = True
    observer.start()
    print('Observer started')

    # monitor
    try:
        while True:
            time.sleep(5)
    except KeyboardInterrupt:
        observer.unschedule_all()
        observer.stop()
    observer.join()
However, when I execute monitorEquipment I receive the following error message:
TypeError: __init__() got an unexpected keyword argument 'patterns'
Evidently I'm doing something wrong when initializing my handler class, but I'm drawing a blank as to what that is (which probably reflects my less-than-optimal understanding of classes). Can someone advise me on how to correctly initialize the empty dataframe in my HandlerEQ54 class so as to not get this error?
Looks like you are missing the patterns argument from your __init__ method; you'll also need a super() call to the __init__ method of the parent class (PatternMatchingEventHandler) so you can pass the patterns argument upwards.
It should look something like this:
class HandlerEQ54(PatternMatchingEventHandler):
    def __init__(self, patterns=None):
        super(HandlerEQ54, self).__init__(patterns=patterns)
        ...

event_handler = HandlerEQ54(patterns=["*.log"])
or, for a more generic case and to support all of PatternMatchingEventHandler's arguments:
class HandlerEQ54(PatternMatchingEventHandler):
    def __init__(self, *args, **kwargs):
        super(HandlerEQ54, self).__init__(*args, **kwargs)
        ...

event_handler = HandlerEQ54(patterns=["*.log"])
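The forwarding pattern can be seen in isolation with a hypothetical stand-in base class (no watchdog dependency): the subclass accepts any constructor arguments, passes them up via super(), and only then does its own initialization (the dataframe, in the question's case):

```python
class PatternHandlerBase:
    """Hypothetical stand-in for PatternMatchingEventHandler."""
    def __init__(self, patterns=None):
        self.patterns = patterns

class HandlerEQ54(PatternHandlerBase):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # forward 'patterns' to the parent
        self.data = []  # subclass-specific state, set up after the super() call

handler = HandlerEQ54(patterns=["*.log"])
print(handler.patterns)  # ['*.log']
print(handler.data)      # []
```

Without the *args/**kwargs signature and the super() call, the keyword argument has nowhere to go, which is exactly what the TypeError reports.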

Python - Mock class init that instantiates another class inside

I have the following python file board.py:
class Board:
    def __init__(self, language):
        self.foo = Foo(language)
        self.words = Aux(self.foo)
And I'm creating this test file:
@classmethod
def setUpClass(cls):
    super().setUpClass()
    cls.board = Board('pt')

def test_total_time(self):
    self.board.total_time(True)
    # some assert
But I'm getting a FileNotFoundError because Aux.__init__() calls a self.foo.method() that opens a file and reads from it.
Is there a way to mock self.foo.method(), or the class Aux?
You will want to patch the module. If you give me the name of the test file and the class you are testing, I can finish this answer for you.
In the test file:
import unittest
import unittest.mock

class BoardTestCase(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        cls.aux_mock = unittest.mock.patch('file_undertest.Aux')
        cls.aux_mock.start()  # activate the patch before Board is constructed
        cls.board = Board('pt')

    def test_total_time(self):
        self.board.total_time(True)
        # some assert
I would suggest using pytest instead of the standard library unittest. Your tests will be written as functions, meaning you can reuse the Board class only when needed. You can set up more robust fixtures (Board class test cases), and the mocker extension is more intuitive if you spend the 15 minutes to wrap your head around it.
