QObject and QThread relations - pyqt4

I have a PyQt4 GUI that lets me import multiple .csv files. I loop over a list of tuples, each of the form tup = (filename + location of file, filename, bool, bool, set of dates in file).
I've created several classes that my GUI frequently refers to in order to pull parameters from a project's profile. Let's call this class profile(). I also have another class with a lot of formatting functions (datetime, text edits, etc.); let's call this class MyFormatting(). Then I created a QThread subclass, Import_File(QThread), which is instantiated for each file in the list. Let's say its __init__(self, tup) takes a few parameters.
Ideally, I would like each Import_File(QThread) to have its own independent instance of MyFormatting() and profile(). I am trying to understand how to use QObject's capabilities to do this, but I keep getting the error that the thread is being destroyed while still running.
```python
for tup in importedlist:
    importfile = Import_File(tup)
    self.connect(importfile, QtCore.SIGNAL('savedfile(PyQt_PyObject)'), self.printstuffpassed)
    importfile.start()
```
I was thinking of declaring the two classes as:
```python
from PyQt4.QtCore import QObject


class MyFormatting(QObject):
    def __init__(self):
        QObject.__init__(self)

    def func1(self, stuff):
        pass  # do stuff

    def func2(self):
        pass  # more stuff


class profile(QObject):
    def __init__(self):
        QObject.__init__(self)

    def func11(self, stuff):
        pass  # do stuff

    def func22(self):
        pass  # more stuff
```
And for the QThread:
```python
from PyQt4 import QtCore
from PyQt4.QtCore import QThread


class Import_File(QThread):
    def __init__(self, tup):
        QThread.__init__(self)
        common_calc_stuff = self.calc(tup[4])  # calc() is defined elsewhere in the real class
        f = open(tup[0] + '.csv', 'w')
        self.tup = tup
        # this is where I thought of pulling an instance just for this thread
        self.MF = MyFormatting()
        self.MF_thread = QtCore.QThread()
        self.MF.moveToThread(self.MF_thread)
        self.MF_thread.start()
        self.prof = profile()
        self.prof_thread = QtCore.QThread()
        self.prof.moveToThread(self.prof_thread)
        self.prof_thread.start()

    def func33(self, stuff):
        # do stuff
        self.prof.func11(self.tup[4])

    def func44(self):
        pass  # more stuff

    def run(self):
        if self.tup[3]:
            self.func33(self.tup)
            self.MF.func2()
        else:
            self.func44()
        if self.tup[2]:
            self.prof.func22()
        # same signature as the one connected in the loop above
        self.emit(QtCore.SIGNAL('savedfile(PyQt_PyObject)'), self.tup)
```
Am I thinking about this the wrong way entirely? How can I keep roughly the same code structure and still implement multithreading without the same resource being accessed at the same time, which I think is why my GUI keeps crashing? Or how can I make sure that each instance of those objects is shut down so that it doesn't interfere with the other instances?
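A likely cause of the "destroyed while still running" error, offered as a diagnosis rather than a confirmed answer: in the loop above, importfile is the only reference to each Import_File object, so rebinding it on the next iteration lets CPython's reference counting delete the previous QThread while its run() is still executing (and with it the MF_thread and prof_thread attributes). The usual remedy is to keep every thread in a container that outlives the loop. The sketch below shows only the reference-counting part, with a hypothetical plain Worker class in place of Import_File so it runs without Qt:

```python
import gc
import weakref


class Worker:
    """Hypothetical stand-in for Import_File (a QThread in the real code)."""

    def __init__(self, tup):
        self.tup = tup


def start_all(importedlist, keep_references):
    kept = []  # plays the role of a list attribute on the window
    refs = []
    for tup in importedlist:
        # rebinding importfile frees the previous Worker if nothing else holds it
        importfile = Worker(tup)
        refs.append(weakref.ref(importfile))
        if keep_references:
            kept.append(importfile)
    gc.collect()
    alive = [r() is not None for r in refs]
    return alive, kept


print(start_all(["a", "b", "c"], keep_references=False)[0])  # [False, False, True]
print(start_all(["a", "b", "c"], keep_references=True)[0])   # [True, True, True]
```

In the PyQt loop, the equivalent would be appending each importfile to a list attribute (for example a hypothetical self.import_threads) before calling importfile.start().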


Python multiprocess: run several instances of a class, keep all child processes in memory

First, I'd like to thank the StackOverflow community for the tremendous help it provided me over the years, without me having to ask a single question.
I could not find anything that I can relate to my problem, though it is probably due to my lack of understanding of the subject, rather than the absence of a response on the website. My apologies in advance if this is a duplicate.
I am relatively new to multiprocessing; some time ago I succeeded in using multiprocessing.Pool in a very simple way, where I didn't need any feedback between the child processes.
Now I am facing a much more complicated problem, and I am just lost in the multiprocessing documentation. So I am asking for your help, your kindness and your patience.
I am trying to build a parallel tempering Monte Carlo algorithm based on a class.
The basic class very roughly goes as follows:
```python
import numpy as np


class monte_carlo:
    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)
        return

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = (x[k] + np.random.uniform(-1, 1, 3))
        temp_E = np.mean(x)  # energy of the proposed configuration
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x
        return
```
Obviously, I simplified a great deal (the actual class is 500 lines long!) and wrote fake functions for simplicity: __init__ takes a bunch of parameters as arguments, there are many more measurement lists besides self.Elist, and also many arrays derived from self.x that I use to compute them. The key point is that each instance of the class contains a lot of information that I want to keep in memory and that I don't want to copy over and over again, to avoid a dramatic slowdown. Otherwise I would just use multiprocessing.Pool.
Now, the parallelization I want to do, in pseudo-code:
```python
def proba(dE, pT):
    return np.exp(-dE / pT)


Tlist = [1.1, 1.2, 1.3]
N = len(Tlist)
G = []
for _ in range(N):
    G.append(monte_carlo())
for _ in range(5):
    for i in range(N):  # this loop should be run in parallel processes
        G[i].simulation(Tlist[i])
    for i in range(N // 2):
        dE = G[i].E - G[i + 1].E
        pT = G[i].T + G[i + 1].T
        p = proba(dE, pT)  # (proba is a function, giving a probability depending on dE)
        if np.random.random() < p:
            T_temp = G[i].T
            G[i].T = G[i + 1].T
            G[i + 1].T = T_temp
```
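The question leaves proba as a placeholder. For reference, the acceptance rule usually used for parallel tempering (replica exchange) compares inverse temperatures rather than summing them: a swap between replicas i and j is accepted with probability min(1, exp[(1/T_i - 1/T_j)(E_i - E_j)]), taking k_B = 1. A minimal sketch of that textbook rule, not something taken from the asker's code:

```python
import math


def swap_probability(E_i, E_j, T_i, T_j):
    """Replica-exchange acceptance min(1, exp((1/T_i - 1/T_j) * (E_i - E_j))), with k_B = 1."""
    delta = (1.0 / T_i - 1.0 / T_j) * (E_i - E_j)
    # Return 1 directly for delta >= 0 so math.exp never overflows
    return 1.0 if delta >= 0.0 else math.exp(delta)
```

In the exchange loop it could replace proba, e.g. p = swap_probability(G[i].E, G[i + 1].E, G[i].T, G[i + 1].T). A colder replica that holds the higher energy is always swapped; otherwise the swap probability decays exponentially.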
Synthesis: I want to run several instances of my monte_carlo class in parallel child processes, with different values for the parameter T, then periodically pause everything to exchange the T's, and then resume the child processes/class instances from where they paused.
Doing this, I want each class-instance/child-process to stay independent from one another, save its current state with all internal variables while it is paused, and do as few copies as possible. This last point is critical, as the arrays inside the class are quite big (some are 1000x1000), and a copy will therefore very quickly become quite time-costly.
Thanks in advance, and sorry if I am not clear...
Edit:
I am using a remote machine with many (64) CPUs, running Debian GNU/Linux 10 (buster).
Edit2:
I made a mistake in my original post: in the end, the temperatures must be exchanged between the class-instances, and not inside the global Tlist.
Edit3: Charchit's answer works perfectly for the test code, both on my personal machine and on the remote machine I usually use to run my code, so I have marked it as the accepted answer.
However, I want to report that when I insert the actual, more complicated code instead of the oversimplified monte_carlo class, the remote machine gives me some strange errors:
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gtk-WARNING **: ##:##:##:###: Locale not supported by C library.
Using the fallback 'C' locale.
Unable to init server: Could not connect: Connection refused
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
(CMC_temper_all.py:55509): Gdk-CRITICAL **: ##:##:##:###: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
The "##:##:##:###" parts are (or look like) IP addresses.
Without the call to set_start_method('spawn'), this error shows up only once, at the very beginning; when I use this method, it seems to show up at every call to result.get()...
The strangest thing is that the code otherwise seems to work fine: it does not crash, it produces the data files I ask it to, etc.
I think this deserves a new question, but I am putting it here nonetheless in case someone has a quick answer.
If not, I will add, one by one, the variables, methods, etc. that are present in my actual code but not in the test example, to try to find the origin of the bug. My best guess for now is that the memory required by each child process with the actual code is too large for the remote machine to accept, due to some restrictions set by the admin.
What you are looking for is sharing state between processes. As per the documentation, you can either create shared memory, which is restrictive about the data it can store and is not thread-safe, but offers better speed and performance; or you can use server processes through managers. The latter is what we are going to use since you want to share whole objects of user-defined datatypes. Keep in mind that using managers will impact speed of your code depending on the complexity of the arguments that you pass and receive, to and from the managed objects.
Managers, proxies and pickling
As mentioned, managers create server processes to store objects and allow access to them through proxies. I have answered a question with more details on how they work, and how to create a suitable proxy, here. We are going to use the same proxy defined in the linked answer, with some variations. Namely, I have replaced the factory functions inside __getattr__ with something that can be pickled using pickle. This means that you can run instance methods of managed objects created with this proxy without resorting to multiprocess (the third-party, dill-based fork of multiprocessing). The result is this modified proxy:
```python
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np


class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)


class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data type. The proxy instance will have the namespace and
    functions of the data type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable, and its state can be shared among different processes. """

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            # a picklable callable that forwards the call to the manager process
            return A(name, self._callmethod).get
        return result
```
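Why the factory function had to go can be shown without a manager. pickle serializes functions and classes by reference to their qualified name, so a nested function returned by a factory cannot be pickled, while a bound method of an instance of a module-level class like A can (the instance is pickled along with it). In this sketch, fake_callmethod is a hypothetical stand-in for the proxy's _callmethod, and closure_factory illustrates the factory-function approach; neither is the code from the linked answer:

```python
import pickle


class A:
    """Module-level wrapper: pickle can locate the class by name."""

    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)


def fake_callmethod(name, args, kwargs):
    """Hypothetical stand-in for BaseProxy._callmethod(methodname, args, kwds)."""
    return (name, args, kwargs)


def closure_factory(name):
    # Roughly what a factory returns: a nested function, which pickle cannot locate
    def caller(*args, **kwargs):
        return fake_callmethod(name, args, kwargs)
    return caller


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(can_pickle(closure_factory("simulation")))         # False
print(can_pickle(A("simulation", fake_callmethod).get))  # True
```

This is what lets Pool.apply_async ship G[i].simulation to a worker: the wrapper travels, and the proxy it holds reconnects to the manager on the other side.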
Solution
Now we only need to make sure that when we create monte_carlo objects, we do so using managers and the above proxy. For that, we add a classmethod constructor called create. All monte_carlo objects should be created with this function. With that, the final code looks like this:
```python
from multiprocessing import Pool
from multiprocessing.managers import NamespaceProxy, BaseManager
import types
import numpy as np


class A:
    def __init__(self, name, method):
        self.name = name
        self.method = method

    def get(self, *args, **kwargs):
        return self.method(self.name, args, kwargs)


class ObjProxy(NamespaceProxy):
    """Returns a proxy instance for any user-defined data type. The proxy instance will have the namespace and
    functions of the data type (except private/protected callables/attributes). Furthermore, the proxy will be
    picklable, and its state can be shared among different processes. """

    def __getattr__(self, name):
        result = super().__getattr__(name)
        if isinstance(result, types.MethodType):
            return A(name, self._callmethod).get
        return result


class monte_carlo:
    def __init__(self):
        self.x = np.ones((1000, 3))
        self.E = np.mean(self.x)
        self.Elist = []
        self.T = None

    def simulation(self, temperature):
        self.T = temperature
        for i in range(3000):
            self.MC_step()
            if i % 10 == 0:
                self.Elist.append(self.E)
        return

    def MC_step(self):
        x = self.x.copy()
        k = np.random.randint(1000)
        x[k] = (x[k] + np.random.uniform(-1, 1, 3))
        temp_E = np.mean(x)  # energy of the proposed configuration
        if np.random.random() < np.exp((self.E - temp_E) / self.T):
            self.E = temp_E
            self.x = x
        return

    @classmethod
    def create(cls, *args, **kwargs):
        # Register class
        class_str = cls.__name__
        BaseManager.register(class_str, cls, ObjProxy, exposed=tuple(dir(cls)))
        # Start a manager process
        manager = BaseManager()
        manager.start()
        # Create and return this proxy instance. Using this proxy allows sharing of state between processes.
        inst = getattr(manager, class_str)(*args, **kwargs)
        return inst


def proba(dE, pT):
    return np.exp(-dE / pT)


if __name__ == "__main__":
    Tlist = [1.1, 1.2, 1.3]
    N = len(Tlist)
    G = []

    # Create our managed instances
    for _ in range(N):
        G.append(monte_carlo.create())

    for _ in range(5):
        # Run simulations in the manager server
        results = []
        with Pool(8) as pool:
            for i in range(N):  # this loop runs in parallel processes
                results.append(pool.apply_async(G[i].simulation, (Tlist[i], )))

            # Wait for the simulations to complete
            for result in results:
                result.get()

        for i in range(N // 2):
            dE = G[i].E - G[i + 1].E
            pT = G[i].T + G[i + 1].T
            p = proba(dE, pT)  # (proba is a function, giving a probability depending on dE)
            if np.random.random() < p:
                T_temp = Tlist[i]
                Tlist[i] = Tlist[i + 1]
                Tlist[i + 1] = T_temp

    print(Tlist)
```
This meets the criteria you wanted. The monte_carlo objects stay in the manager processes: rather than copying them into the pool workers, only the arguments to each simulation call are serialized inside the pool and sent to the manager server where the object is actually stored. The call is executed there, and the results (if any) are serialized and returned to the main process. All of this using only the standard library!
Example output (the temperature swaps are random, so the result varies from run to run):
[1.2, 1.1, 1.3]
Edit
Since you are using Linux, I encourage you to use multiprocessing.set_start_method inside the if __name__ ... clause to set the start method to "spawn". Doing this will ensure that the child processes do not have access to variables defined inside the clause.
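If the manager-based approach turns out to be too slow, a different technique (not the one used in this answer) is to keep one long-lived worker process per replica and talk to it over a multiprocessing.Pipe: each child builds its object once and keeps it in memory between rounds, and only the temperature and the resulting energy cross the pipe. The sketch assumes a hypothetical Replica class with a toy update in place of monte_carlo, and sweeps all neighbouring pairs with the question's proba rule. It requests the "fork" context explicitly (available on Linux, as on the asker's Debian machine) to stay self-contained; because the worker function and class are defined at module level, "spawn" should also work when the file is run as a script.

```python
import math
import multiprocessing as mp
import random


class Replica:
    """Hypothetical stand-in for monte_carlo: it owns its (possibly large) state."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.E = 0.0
        self.T = None
        self.Elist = []

    def simulation(self, temperature, n_steps=100):
        self.T = temperature
        for _ in range(n_steps):
            self.E += self.rng.uniform(-1.0, 1.0) * self.T  # toy update, not real MC
        self.Elist.append(self.E)


def replica_worker(conn, seed):
    """Runs in a child process; the Replica is created once and kept in memory."""
    replica = Replica(seed)
    while True:
        temperature = conn.recv()
        if temperature is None:               # shutdown sentinel
            conn.send(len(replica.Elist))     # report how many rounds were kept
            conn.close()
            return
        replica.simulation(temperature)
        conn.send(replica.E)                  # only a float goes back


def run_tempering(temperatures, n_rounds=5, seed=0):
    ctx = mp.get_context("fork")
    rng = random.Random(seed)
    pipes, procs = [], []
    for i in range(len(temperatures)):
        parent_end, child_end = ctx.Pipe()
        proc = ctx.Process(target=replica_worker, args=(child_end, i))
        proc.start()
        child_end.close()                     # the child keeps its own copy
        pipes.append(parent_end)
        procs.append(proc)

    T = list(temperatures)                    # T[i] is replica i's temperature
    for _ in range(n_rounds):
        for conn, t in zip(pipes, T):
            conn.send(t)                      # all replicas now run in parallel
        E = [conn.recv() for conn in pipes]   # wait until every replica pauses
        for i in range(len(T) - 1):           # exchange step with the question's proba
            dE, pT = E[i] - E[i + 1], T[i] + T[i + 1]
            # exp(min(0, x)) == min(1, exp(x)): same acceptance, no overflow
            if rng.random() < math.exp(min(0.0, -dE / pT)):
                T[i], T[i + 1] = T[i + 1], T[i]

    rounds_kept = []
    for conn, proc in zip(pipes, procs):
        conn.send(None)
        rounds_kept.append(conn.recv())
        proc.join()
    return T, rounds_kept


if __name__ == "__main__":
    final_T, rounds_kept = run_tempering([1.1, 1.2, 1.3], n_rounds=5)
    print(final_T, rounds_kept)
```

Because T[i] belongs to replica i, the swaps exchange temperatures between instances, as requested in Edit2, and rounds_kept shows that each child kept its accumulated Elist across all rounds.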

python: multiple functions or abstract classes when dealing with data flow requirement

I have more of a design question, and I am not sure how to handle it. I have a script, preprocessing.py, where I read a .csv file with a text column that I would like to preprocess by removing punctuation, characters, etc.
What I have done now is that I have written a class with several functions as follows:
```python
import pandas as pd


class Preprocessing(object):
    def __init__(self, file):
        self.my_data = pd.read_csv(file)

    def remove_punctuation(self):
        self.my_data['text'] = self.my_data['text'].str.replace('#', '')

    def remove_hyphen(self):
        self.my_data['text'] = self.my_data['text'].str.replace('-', '')

    def remove_words(self):
        self.my_data['text'] = self.my_data['text'].str.replace('reference', '')

    def save_data(self):
        self.my_data.to_csv('my_data.csv')


def preprocessing(file_my):
    f = Preprocessing(file_my)
    f.remove_punctuation()
    f.remove_hyphen()
    f.remove_words()
    f.save_data()
    return f


if __name__ == '__main__':
    preprocessing('/path/to/file.csv')
```
Although it works fine, I would like to be able to extend the code easily and have smaller classes instead of one large class. So I decided to use an abstract class:
```python
import pandas as pd
from abc import ABC, abstractmethod

my_data = pd.read_csv('/Users/kgz/Desktop/german_web_scraping/file.csv')


class Preprocessing(ABC):
    @abstractmethod
    def processor(self):
        pass


class RemovePunctuation(Preprocessing):
    def processor(self):
        return my_data['text'].str.replace('#', '')


class RemoveHyphen(Preprocessing):
    def processor(self):
        return my_data['text'].str.replace('-', '')


class Removewords(Preprocessing):
    def processor(self):
        return my_data['text'].str.replace('reference', '')


final_result = [cls().processor() for cls in Preprocessing.__subclasses__()]
print(final_result)
```
So now each class is responsible for one task, but there are a few issues I do not know how to handle, since I am new to abstract classes. First, I am reading the file outside the classes, and I am not sure that is good practice. If not, should I pass it as an argument to the processor function, or have another class that is responsible for reading the data?
Second, having one class with several functions gave me a flow in which every transformation happened in order (i.e., first punctuation is removed, then hyphens are removed, etc.), but I do not know how to handle this order and dependency with abstract classes.
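One common way to address both points, offered as a sketch rather than the definitive design: pass the data into each step instead of reading a module-level global, and make the order explicit by listing the steps in a small pipeline object, so each step receives the previous step's output instead of the original data. Reading the file then stays in the calling code, and nothing depends on Preprocessing.__subclasses__(). To keep the sketch free of pandas it works on a plain list of strings; PreprocessingStep, Pipeline and process are hypothetical names. With pandas, process would take and return the Series, e.g. texts.str.replace('#', '', regex=False).

```python
from abc import ABC, abstractmethod


class PreprocessingStep(ABC):
    """One transformation: receives the texts and returns the transformed texts."""

    @abstractmethod
    def process(self, texts):
        ...


class RemovePunctuation(PreprocessingStep):
    def process(self, texts):
        return [t.replace('#', '') for t in texts]


class RemoveHyphen(PreprocessingStep):
    def process(self, texts):
        return [t.replace('-', '') for t in texts]


class RemoveWords(PreprocessingStep):
    def process(self, texts):
        return [t.replace('reference', '') for t in texts]


class Pipeline:
    """Runs the steps in the order given; each step sees the previous step's output."""

    def __init__(self, steps):
        self.steps = list(steps)

    def run(self, texts):
        for step in self.steps:
            texts = step.process(texts)
        return texts


if __name__ == '__main__':
    # The caller reads the data, e.g. texts = list(pd.read_csv(path)['text'])
    pipeline = Pipeline([RemovePunctuation(), RemoveHyphen(), RemoveWords()])
    print(pipeline.run(['#a-b reference']))  # ['ab ']
```

Adding a transformation means writing one small subclass and inserting it at the right position in the list.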

pyqt threading, appropriate pattern?

I'm working on a GUI application that plots real-time data as it comes in. There is a main class, which is a plot window. The window shows the plot of the data itself as well as a number of buttons, checkboxes, text boxes, etc. that a user can interact with to modify the plot output. All of these widgets configure the "settings" for the plot. Plotting new data is resource intensive: loading and plotting may take around a second. I want the user to be able to modify the settings in the GUI without the lag of loading and plotting, so I want to run the loading and plotting in a separate thread.
I've come up with what seems to me a weird way to accomplish this, but it works. It is something like this:
```python
from PyQt5 import QtCore, QtWidgets
from PyQt5.QtCore import QThread, pyqtSignal

# Ui_PlotWindow is defined elsewhere (not shown)


class PlotWorker(QtCore.QObject):
    def __init__(self, plotwindow):
        super().__init__()
        self.plotwindow = plotwindow
        self.plot_thread = QThread()
        self.plot_thread.start()
        self.moveToThread(self.plot_thread)

    def worker_update(self):
        self.plotwindow.updating = True
        self.plotwindow.plot()
        self.plotwindow.updating = False


class PlotWindow(Ui_PlotWindow, QtWidgets.QMainWindow):
    update_signal = pyqtSignal()

    def __init__(self):
        super().__init__()
        self.setupUi(self)
        self.settings = self.settings_lineEdit.text()
        self.worker = PlotWorker(self)
        self.update_signal.connect(self.worker.worker_update)
        self.update_settings_pushButton.clicked.connect(self.update_settings)
        self.updating = False
        self.refresh_timer = QtCore.QTimer(self)
        self.refresh_timer.timeout.connect(self.update)
        self.update_settings()
        self.plot()
        self.refresh_timer.start(1)

    def update_settings(self):
        self.settings = self.settings_lineEdit.text()

    def update(self):
        if self.updating:
            print('Already updating, cannot update until previous update complete')
        else:
            self.update_signal.emit()

    def plot(self):
        # Plot the data, slow. Uses self.settings
        self.load()
        ...

    def load(self):
        # Load the data from the appropriate directory/file, slow
        ...
```
This gives the rough gist of what I am going for. Ui_PlotWindow implements setupUi. In practice, self.settings stands for a long list of settings variables that the user can manipulate. The check on self.updating ensures that update requests arriving while the plot is already updating are dropped rather than queued in the worker thread's event loop.
Though my code works, I feel like the pattern I am using with the worker thread is a bit strange. Basically ALL of the information needed is in the PlotWindow class, but since I want to call one of PlotWindow's methods in a separate thread, I feel I need a different QObject, living in a different QThread, to house the slot that runs the expensive method.
It just feels a bit roundabout to have a whole separate class and object just to call a function that already exists in the first class. However, I am new to threaded applications, so perhaps this is a normal pattern and nothing to worry about?
I would appreciate any advice for how I might be able to make the flow of this function more clear.
The answer is to use a different threading pattern. Instead of creating a separate QThread and calling moveToThread on the worker, the worker should simply BE a QThread. The thread can be "started", in which case its run method is executed in a new thread. If the main thread asks the worker to "start" again while it is already running, nothing happens. Below is a revision of the above code that implements this pattern.
WARNING: This code is still a bad idea, because plotting involves redrawing GUI objects (the plots), and that should not happen in any thread other than the main thread. For the record, the code in the format of the original question did run, but I think it was luck that the plotting worked in the worker thread. The best pattern is to do the data loading and processing in the worker thread and then emit the data to a plotting method in the main thread, which updates the GUI with the new data.
```python
class PlotWorker(QtCore.QThread):
    def __init__(self, plotwindow):
        super().__init__()
        self.plotwindow = plotwindow

    def run(self):
        # Executed in the new thread each time start() is called
        self.plotwindow.plot()


class PlotWindow(Ui_PlotWindow, QtWidgets.QMainWindow):
    def __init__(self):
        super().__init__()
        self.setupUi(self)
        self.settings = self.settings_lineEdit.text()
        self.worker = PlotWorker(self)
        self.update_settings_pushButton.clicked.connect(self.update_settings)
        self.refresh_timer = QtCore.QTimer(self)
        self.refresh_timer.timeout.connect(self.update)
        self.update_settings()
        self.plot()
        self.refresh_timer.start(1)

    def update_settings(self):
        self.settings = self.settings_lineEdit.text()

    def update(self):
        if self.worker.isRunning():
            print('Already updating, cannot update until previous update complete')
        else:
            self.worker.start()

    def plot(self):
        # Plot the data, slow. Uses self.settings
        self.load()
        ...

    def load(self):
        # Load the data from the appropriate directory/file, slow
        ...
```
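The WARNING above recommends loading in the worker and plotting only in the main thread. Since the shape of that pattern does not depend on Qt, here is a minimal standard-library sketch of it: threading.Thread stands in for QThread, Thread.is_alive() for QThread.isRunning(), and a queue.Queue for a signal connected to a main-thread slot. Loader, request and plot are hypothetical names. In PyQt5 the hand-off would instead be a pyqtSignal emitted by the worker and connected to a PlotWindow slot, which Qt runs in the main thread because the window lives there.

```python
import queue
import threading
import time


class Loader:
    """Hypothetical worker: does the slow loading off the main thread, never touches the GUI."""

    def __init__(self, results):
        self.results = results  # queue.Queue read by the main thread
        self._thread = None

    def request(self, settings):
        """Start a load unless one is already running (like checking QThread.isRunning())."""
        if self._thread is not None and self._thread.is_alive():
            return False  # drop the request instead of queueing it
        self._thread = threading.Thread(target=self._load, args=(settings,), daemon=True)
        self._thread.start()
        return True

    def _load(self, settings):
        time.sleep(0.2)  # stands in for slow file I/O and processing
        data = [settings * i for i in range(5)]
        self.results.put(data)  # hand the finished data to the main thread


def plot(data):
    """Main-thread-only step; in PyQt this would be the slot that redraws the plot."""
    return "plotted {} points".format(len(data))


if __name__ == "__main__":
    results = queue.Queue()
    loader = Loader(results)
    print(loader.request(2))             # True: a load starts
    print(loader.request(3))             # False: still busy, request dropped
    print(plot(results.get(timeout=5)))  # plotted 5 points
```

In a GUI the blocking results.get() would become a non-blocking check (for example from the existing QTimer), so the event loop never waits on the loader.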

How to better structure my classes and properties?

I am trying to program a board game using Python and object-oriented programming. However, the structure of my classes and properties seems a bit tortuous, even though I have made it do what I want. It doesn't seem very elegant, and I'm sure it will make things too complicated and the code hard to follow as I progress. Here is a simplified example of how I've structured things:
```python
class GameState:
    def __init__(self, n_players):
        self.n_players = n_players
        self.players = []
        for player in range(self.n_players):
            self.players.append(Player(player))
        # other properties

    # other functions


class Player:
    def __init__(self, name):
        self.name = name
        self.building = Building()
        self.workers = 3
        # other properties


class Building:
    def __init__(self):
        self.worker = False

    def action(self, player):
        if self.worker:
            player.workers += 1
```
But now if I want a player to use the building action I have to do something like the below. It feels like I should be able to structure things better to avoid having to pass an instance of a Player class to the Building action function.
```python
game = GameState(4)
game.players[0].building.action(game.players[0])
```
The idea is that each player will have an instance of the Building class.
It's hard to suggest alternatives without knowing the exact format of your game. If each building is only ever associated with a single player at a time, I'd add that to the initialisation and reference it directly.
```python
class Building:
    def __init__(self, player):
        self.worker = False
        self.player = player

    def action(self):
        if self.worker:
            self.player.workers += 1
```
Alternatively, if each player will only ever have one building associated with them, then the action function should probably be in the player class rather than the building.
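Following that last suggestion, one way to keep Building.action(player) unchanged while sparing callers from passing the player twice is a small Player method that passes itself. A minimal sketch; use_building is a hypothetical name:

```python
class Building:
    def __init__(self):
        self.worker = False

    def action(self, player):
        if self.worker:
            player.workers += 1


class Player:
    def __init__(self, name):
        self.name = name
        self.building = Building()
        self.workers = 3

    def use_building(self):
        # The player hands itself to its own building, so callers never have to
        self.building.action(self)


player = Player(0)
player.building.worker = True
player.use_building()
print(player.workers)  # 4
```

With GameState unchanged, the call from the question becomes game.players[0].use_building().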

Need to call class method from different class without initialization of the first class or some other way around it

I have a small problem with my code.
There are two classes. The first one creates a window with an Options button. Clicking the button calls the second class, which creates another window with an OK button. Let's say there is also a checkbox that changes the background color to black or something like that. After clicking OK, whatever changes were made in the options are stored in a file and the second window is closed.
All of this works fine. My problem is that I now need to call the update_init method of the first class to apply those changes to the MainWindow. The code below shows my first solution, but from what I understand, by using a second mainloop I create a second thread, which should be avoided.
```python
import tkinter as tk

# meta_load() and meta_save() are defined elsewhere (not shown)


class MainWindow:
    def __init__(self, master):
        self.master = master
        self.options_btn = tk.Button(self.master, text="Options", command=self.open_options)
        self.options_btn.pack()
        self.options_window = None

    def open_options(self):
        options_master = tk.Toplevel()
        self.options_window = OptionsWindow(options_master)
        options_master.mainloop()
        lst = meta_load()  # loads changes from a file
        self.update_init(lst)

    def update_init(self, lst):
        pass  # code


class OptionsWindow:
    def __init__(self, master):
        self.master = master
        self.ok_btn = tk.Button(self.master, text="OK", command=self.update_meta)
        self.ok_btn.pack()

    def update_meta(self):
        meta_save(12)  # saves changes into a file
        self.master.destroy()


main_master = tk.Tk()
main_master.minsize(width=1280, height=720)
b = MainWindow(main_master)
main_master.mainloop()
```
My second solution was to just put both classes into one, but the code is quite messy if I do so.
Can I somehow call the update_init method (which is in the MainWindow class) from the OptionsWindow class without creating a new MainWindow window? Or is there another way to deal with this? I would appreciate any help.
I am sorry if this is too specific. I've tried to make it as general as possible, but it's a very specific problem and I couldn't find much information about it anywhere on the internet.
In general, you can call a method defined in a class from anywhere and pass anything to it without creating an instance of that class, because in Python 3 a method looked up on the class itself is just a plain function. But beware of dependencies on self! Also, I don't think this is good practice.
```python
class A:
    def __init__(self):
        self.foo = 'foo'

    def return_foo(self):
        return self.foo


class B:
    def __init__(self):
        self.bar = 'bar'
        print("Ha-ha I'm inited!")

    def return_bar(self):
        try:
            return self.bar
        except AttributeError:
            # self is not a real B instance here, so fall back to a default
            return 'bar'


def test():
    a = A()
    # b = B()
    return_bar = getattr(B, 'return_bar', None)
    if callable(return_bar):
        print('%s%s' % (a.return_foo(), return_bar(None)))  # prints "foobar"


test()
```
Links:
getattr
callable
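A common alternative to calling the method without an instance is to hand OptionsWindow a callback: MainWindow passes its own bound method self.update_init when it creates the options window, and OptionsWindow calls it when OK is pressed. The existing main_master.mainloop() then serves both windows, so the second mainloop() is not needed. The sketch below is a GUI-free outline of that wiring so it runs without a display; the on_ok parameter is a hypothetical name, and the hard-coded [12] stands in for meta_save/meta_load.

```python
class OptionsWindow:
    """GUI-free outline: only the callback wiring from the tkinter version is kept."""

    def __init__(self, on_ok):
        self.on_ok = on_ok  # any callable; MainWindow passes its own bound method

    def update_meta(self):
        lst = [12]  # stands in for meta_save(12) followed by meta_load()
        self.on_ok(lst)  # notify the existing main window directly
        # in tkinter, self.master.destroy() would follow here


class MainWindow:
    def __init__(self):
        self.applied = None

    def open_options(self):
        # A bound method carries its instance, so no new MainWindow is created
        return OptionsWindow(on_ok=self.update_init)

    def update_init(self, lst):
        self.applied = lst


main = MainWindow()
options = main.open_options()
options.update_meta()  # what pressing OK would trigger
print(main.applied)  # [12]
```

In the tkinter version, open_options would create OptionsWindow(options_master, on_ok=self.update_init) and drop the options_master.mainloop() line.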
