Python - Multiprocessing vs Multithreading for file-based work

I have created a GUI (using wxPython) that allows the user to search for files and their content. The program consists of three parts:
The main GUI window
The searching mechanism (separate class)
The output window that displays the results of the file search (separate class)
Currently I'm using Python's threading module to run the searching mechanism (2) in a separate thread, so that the main GUI stays responsive. I pass the results to the output window (3) during runtime using a Queue. This works fine for lighter file-reading work, but as soon as the searching mechanism demands more performance, the main GUI window (1) starts lagging.
This is roughly the schematic:
import threading
import os
import wx

class MainWindow(wx.Frame):  # this is point (1)
    def __init__(self, parent, ...):
        # Initialize frame and panels etc.
        self.button_search = wx.Button(self, label="Search")
        self.button_search.Bind(wx.EVT_BUTTON, self.onSearch)

    def onSearch(self, event):
        """Run file search in a separate thread"""  # Here is the kernel of my question
        # Pass the bound method itself as the thread target; calling it
        # here would run the whole search on the GUI thread.
        searcher = threading.Thread(target=FileSearch().Run, args=(args,))
        searcher.start()

class FileSearch():  # this is point (2)
    def __init__(self):
        ...

    def Run(self, args):
        """Performs file search"""
        for root, dirs, files in os.walk(...):
            for file in files:
                ...

    def DetectEncoding(self):
        """Detects encoding of file for reading"""
        ...

class OutputWindow(wx.Frame):  # this is point (3)
    def __init__(self, parent, ...):
        # Initialize output window
        ...

    def AppendItem(self, data):
        """Appends a file item to the list"""
        ...
My questions:
Is Python's multiprocessing module better suited for this specific, performance-hungry job?
If yes, which way of interprocess communication (IPC) should I choose to send the results from the searching mechanism class (2) to the output window (3), and how should I implement it schematically?

Related

How to force os.stat re-read file stats by same path

I have code that is architecturally close to what is posted below (unfortunately I can't post the full version because it's proprietary). I have a self-updating executable and I'm trying to test this feature. We assume that the full path to this file will be in A.some_path after executing input. My problem is that the assertion fails, because on the second call os.stat still returns the previous file stats (I suppose it assumes nothing could have changed, so re-reading is unnecessary). I have tried launching this manually: the self-update works completely fine, the file really is removed and recreated, and its stats change. Is there any guaranteed way to force os.stat to re-read the file stats for the same path, or an alternative option to make it work (other than recreating the A object)?
from pathlib import Path
import unittest
import os

class A:
    some_path = Path()

    def __init__(self, _some_path):
        self.some_path = Path(_some_path)

    def get_path(self):
        return self.some_path

class TestKit(unittest.TestCase):
    def setUp(self):
        pass

    def check_body(self, a):
        some_path = a.get_path()
        modification_time = os.stat(some_path).st_mtime
        # Launching self-updating executable
        self.assertTrue(modification_time < os.stat(some_path).st_mtime)

    def check(self):
        a = A(input('Enter the file path\n'))
        self.check_body(a)

def Tests():
    suite = unittest.TestSuite()
    suite.addTest(TestKit('check'))
    return suite

def main():
    tests_suite = Tests()
    unittest.TextTestRunner().run(tests_suite)

if __name__ == "__main__":
    main()
I have found the origin of the problem: I launched the self-update via os.system, which waits until that process is done. But first, during the self-update we launch several detached processes and should actually wait until all of them have ended; and second, even the signal that a process has ended doesn't mean the OS has completely released the file, so at the point of assertTrue we are apparently not yet done with all the routines. For my task I simply used sleep, but a proper solution should inspect the processes running on the system and wait for them to finish, or at least retry the check several times with a delay.

Multithreading with Watchdog

I need to run a function on all new files in a folder.
I've chosen watchdog to detect and handle the events, as it is rather straightforward to use. However, as the operation on each file takes roughly 30-40 seconds, the whole process takes rather long whenever a large number of files (e.g. 1000) is added to the folder.
I have heard of multithreading, and I believe it is the answer to my issue: instead of running the function (do_smth) one by one on each item that is added, run it on as many files at once as my RAM allows. How should I go about it?
Please find a minimal reproducible example of my code below:
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

folder_to_watch = "/trigger_folder"

class EventHandler(FileSystemEventHandler):
    def do_smth(self):
        print("do_something")
        time.sleep(2)

    def on_created(self, event):  # when file is created
        print("Got event for file %s" % event.src_path)
        time.sleep(1)
        self.do_smth()

observer = Observer()
event_handler = EventHandler()  # create event handler
# set observer to use created handler in directory
observer.schedule(event_handler, path=folder_to_watch)
observer.start()

# sleep until keyboard interrupt, then stop + rejoin the observer
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()
Edit 1:
The do_smth function in reality checks whether the new file is an image, and if so opens it via cv2, takes its height and width, and saves them to a .csv file, among some other operations (that unfortunately take longer).
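Since the per-file work described above is largely I/O bound (reading the image, appending to a CSV), one hedged approach is to hand each event to a concurrent.futures.ThreadPoolExecutor instead of processing it inline. In this sketch, process_file is a stand-in for the real do_smth, on_created shows only the submit call that would go inside the watchdog handler, and max_workers=8 is an arbitrary starting point to tune against RAM and CPU:

```python
from concurrent.futures import ThreadPoolExecutor

# Pool shared by all events; tune max_workers to your RAM/CPU budget.
pool = ThreadPoolExecutor(max_workers=8)

def process_file(path):
    # Stand-in for the real do_smth: check the extension, open with cv2,
    # record width/height, append to the CSV, etc.
    return path.lower().endswith((".png", ".jpg", ".jpeg"))

def on_created(event_path):
    # Called from watchdog's on_created handler: submitting instead of
    # running inline lets slow files overlap rather than queue up.
    return pool.submit(process_file, event_path)
```

Inside the EventHandler class this would mean replacing self.do_smth() with pool.submit(self.do_smth), so the observer thread returns to dispatching events immediately.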

Python unittest framework: unfortunately no built-in timeout possibility

I'm using the Python unittest framework to perform unit tests. The Python version I'm using is 3.6, on Windows.
My production code is currently a little unstable (as I added some new functionality) and tends to hang in some internal asynchronous loops. I'm working on fixing those hangups.
However, tracking down those bugs is hindered by the corresponding test cases hanging too.
I would simply like the corresponding test cases to stop after e.g. 500 ms if they don't run through, and be marked as FAILED, so that all the other test cases can continue.
Unfortunately, the unittest framework does not support timeouts (if only I had known in advance ...), so I'm searching for a workaround. If some package adds that missing functionality to the unittest framework, I would be willing to try it. I don't want my production code to rely on too many non-standard packages; for the unit tests this would be OK.
I'm a little lost on how to add such functionality to the unit tests.
Therefore I just tried out some code from here: How to limit execution time of a function call?. As it was said somewhere that threading should not be used to implement timeouts, I tried using multiprocessing too.
Please note that the solutions proposed in How to specify test timeout for python unittest? do not work either; they are designed for Linux (using SIGALRM).
import multiprocessing
# import threading
import time
import unittest

class InterruptableProcess(multiprocessing.Process):
# class InterruptableThread(threading.Thread):
    def __init__(self, func, *args, **kwargs):
        super().__init__()
        self._func = func
        self._args = args
        self._kwargs = kwargs
        self._result = None

    def run(self):
        self._result = self._func(*self._args, **self._kwargs)

    @property
    def result(self):
        return self._result

class timeout:
    def __init__(self, sec):
        self._sec = sec

    def __call__(self, f):
        def wrapped_f(*args, **kwargs):
            it = InterruptableProcess(f, *args, **kwargs)
            # it = InterruptableThread(f, *args, **kwargs)
            it.start()
            it.join(self._sec)
            if not it.is_alive():
                return it.result
            it.terminate()  # no equivalent exists for the thread variant
            raise TimeoutError('execution expired')
        return wrapped_f

class MyTests(unittest.TestCase):
    def setUp(self):
        # some initialization
        pass

    def tearDown(self):
        # some cleanup
        pass

    @timeout(0.5)
    def test_XYZ(self):
        # looong running
        self.assertEqual(...)
The code behaves very differently using threads vs. processes.
In the first case it runs through, but the function keeps executing despite the timeout (which is unwanted). In the second case it complains about unpicklable objects.
In both cases I would like to know how to do proper cleanup, e.g. how to call the unittest.TestCase.tearDown method on timeout from within the decorator class.

PyQt5 GUI freeze caused by Windows focus-follows-mouse

When Windows focus-follows-mouse-without-raising-the-window is enabled by either of the two methods linked to below, I consistently get PyQt5 GUI 'freezes' where you have to type any character in the terminal that you ran python from in order to unfreeze the GUI; complete description and test case (Windows 10, Python 3.6.1, PyQt5) is here: pyqt5 click in terminal causes GUI freeze
To enable the focus-follows-mouse-without-raise behavior, try either of these - they both work in Windows 10:
downloadable program ('X-Mouse' though that name is used by other programs):
https://joelpurra.com/projects/X-Mouse_Controls/
registry hack description:
https://sinewalker.wordpress.com/2010/03/10/ms-windows-focus-follows-mouse-registry-hacks/
So - a few questions:
can anyone reproduce the issue? It seems 100% reproducible for me, but it would be great to hear the same from someone else.
is there a way to change the python code to detect-and-circumvent focus-follows-mouse, or just to be immune to it, i.e. maybe by ensuring the GUI application always takes focus back again when you - for example - click in a dialog or qmessagebox owned by the main GUI window, or by some other means? (Is the object hierarchy set up optimally, and if not, maybe this could all be resolved by correcting the ownership structure?)
The brute-force solution seems to work, though I'd like to leave this question open to see if someone knows of a more optimal one. It took a fair amount of searching to figure out the right way, mainly by taking a look at the open-source code for X-Mouse. Basically, this method takes effect immediately, whereas the registry hack doesn't take effect until reboot.
New version of pyqt_freeze_testcase.py (the file from the referenced stackoverflow question); the changes are only additions, noted between lines of hash marks:
from PyQt5.QtCore import *
from PyQt5.QtGui import *
from PyQt5.QtWidgets import *
import sys
####################### added begin:
import win32gui
import win32con
####################### added end
# import the UI file created with pyuic5
from minimal_ui import Ui_Dialog

class MyWindow(QDialog, Ui_Dialog):
    def __init__(self, parent):
        QDialog.__init__(self)
        self.parent = parent
        self.ui = Ui_Dialog()
        self.ui.setupUi(self)
        ################################# added begin:
        self.initialWindowTracking = False
        try:
            self.initialWindowTracking = win32gui.SystemParametersInfo(win32con.SPI_GETACTIVEWINDOWTRACKING)
        except:
            pass
        if self.initialWindowTracking:
            print("Window Tracking was initially enabled. Disabling it for now; will re-enable on exit.")
            win32gui.SystemParametersInfo(win32con.SPI_SETACTIVEWINDOWTRACKING, False)
        ################################# added end

    def showMsg(self):
        self.really1 = QMessageBox(QMessageBox.Warning, "Really?", "Really do stuff?",
                QMessageBox.Yes | QMessageBox.No, self,
                Qt.WindowTitleHint | Qt.WindowCloseButtonHint | Qt.Dialog | Qt.MSWindowsFixedSizeDialogHint | Qt.WindowStaysOnTopHint)
        self.really1.show()
        self.really1.raise_()
        if self.really1.exec_() == QMessageBox.No:
            print("nope")
            return
        print("yep")

    ################################## added begin:
    def closeEvent(self, event):
        if self.initialWindowTracking:
            print("restoring initial window tracking behavior (" + str(self.initialWindowTracking) + ")")
            win32gui.SystemParametersInfo(win32con.SPI_SETACTIVEWINDOWTRACKING, self.initialWindowTracking)
    ################################## added end

def main():
    app = QApplication(sys.argv)
    w = MyWindow(app)
    w.show()
    sys.exit(app.exec_())

if __name__ == "__main__":
    main()

Why do I get NSAutoreleasePool double release when using Python/Pyglet on OS X

I'm using Python 3.5 and Pyglet 1.2.4 on OS X 10.11.5. I am very new to this setup.
I am trying to see if I can use event handling to capture keystrokes (without echoing them to the screen) and return them to the main program one at a time via separate invocations of the pyglet.app.run method. In other words, I am trying to use Pyglet event handling as if it were a callable function for this purpose.
Below is my test program. It sets up the Pyglet event mechanism and then calls it four times. It works as desired but causes the system messages shown below.
import pyglet
from pyglet.window import key

event_loop = pyglet.app.EventLoop()
window = pyglet.window.Window(width=400, height=300, caption="TestWindow")

@window.event
def on_draw():
    window.clear()

@window.event
def on_key_press(symbol, modifiers):
    global key_pressed
    if symbol == key.A:
        key_pressed = "a"
    else:
        key_pressed = 'unknown'
    pyglet.app.exit()

# Main Program
pyglet.app.run()
print(key_pressed)
pyglet.app.run()
print(key_pressed)
pyglet.app.run()
print(key_pressed)
pyglet.app.run()
print(key_pressed)
print("Quitting NOW!")
Here is the output, with blank lines inserted for readability. The first message is different and appears even if I comment out the four calls to pyglet.app.run. The double release messages do not occur after every call to event handling and do not appear in a consistent manner from one test run to the next.
/Library/Frameworks/Python.framework/Versions/3.5/bin/python3.5 "/Users/home/PycharmProjects/Test Event Handling/.idea/Test Event Handling 03B.py"
2016-07-28 16:49:59.401 Python[11419:4185158] ApplePersistenceIgnoreState: Existing state will not be touched. New state will be written to /var/folders/8q/bhzsqtz900s742c17gkj_y740000gr/T/org.python.python.savedState
a
2016-07-28 16:50:02.841 Python[11419:4185158] *** -[NSAutoreleasePool drain]: This pool has already been drained, do not release it (double release).
2016-07-28 16:50:03.848 Python[11419:4185158] *** -[NSAutoreleasePool drain]: This pool has already been drained, do not release it (double release).
a
a
2016-07-28 16:50:04.632 Python[11419:4185158] *** -[NSAutoreleasePool drain]: This pool has already been drained, do not release it (double release).
a
Quitting NOW!
Process finished with exit code 0
Basic question: Why is this happening and what can I do about it?
Alternate question: Is there a better way to detect and get a user's keystrokes without echoing them to the screen? I will be using Python and Pyglet for graphics, so I was trying this using Pyglet's event handling.
Try playing with this simple example. It uses the built-in Pyglet event handler to send the pressed key to a function that can then handle it. It shows that pyglet.app itself is the loop; you don't need to create any other.
#!/usr/bin/env python
import pyglet

class Win(pyglet.window.Window):
    def __init__(self):
        super(Win, self).__init__()

    def on_draw(self):
        self.clear()
        # display your output here....

    def on_key_press(self, symbol, modifiers):
        if symbol == pyglet.window.key.ESCAPE:
            exit(0)
        else:
            self.do_something(symbol)
        # etc....

    def do_something(self, symbol):  # needs self, as it is called as a method
        print(symbol)
        # here you can test the input and then redraw

window = Win()
pyglet.app.run()
