IronPython: ThreadStart creates 'DummyThreads' that don't get cleaned up

The problem I have is that in my IronPython application threads are being created but never cleaned up, even when the method they run has exited. In my application I start threads in two ways: a) by using Python-style threads (sub-classes of threading.Thread that do something in their run() method), and b) by using the .NET 'ThreadStart' approach. The Python-style threads behave as expected, and after their 'run()' exits they get cleaned up. The .NET style threads never get cleaned up, even after they have exited. You can call del, Abort, whatever you want, and it has no effect on them.
The following IronPython script demonstrates the issue:
import System
import threading
import time
import logging

def do_beeps():
    logging.debug("starting do_beeps")
    t_start = time.clock()
    while time.clock() - t_start < 10:
        System.Console.Beep()
        System.Threading.Thread.CurrentThread.Join(1000)
    logging.debug("exiting do_beeps")

class PythonStyleThread(threading.Thread):
    def __init__(self, thread_name="PythonStyleThread"):
        super(PythonStyleThread, self).__init__(name=thread_name)

    def run(self):
        do_beeps()

class ThreadStarter():
    def start(self):
        t = System.Threading.Thread(System.Threading.ThreadStart(do_beeps))
        t.IsBackground = True
        t.Name = "ThreadStartStyleThread"
        t.Start()

if __name__ == '__main__':
    logging.basicConfig(format='%(asctime)s %(levelname)s: %(message)s', level=logging.DEBUG, datefmt='%H:%M:%S')

    # Start some ThreadStarter threads:
    for _ in range(5):
        ts = ThreadStarter()
        ts.start()
        System.Threading.Thread.CurrentThread.Join(200)

    # Start some Python-style threads:
    for _ in range(5):
        pt = PythonStyleThread()
        pt.start()
        System.Threading.Thread.CurrentThread.Join(200)

    # Do something on the main thread:
    for _ in range(30):
        print(".")
        System.Threading.Thread.CurrentThread.Join(1000)
When this is debugged in PyDev, all the threads appear as expected in the 'debug' view as they are created, but whereas the Python-style ones disappear after they've finished, the .NET / ThreadStart ones stay until the main thread exits.
In the debugger, the problematic threads appear with names 'Dummy-4', 'Dummy-5' etc., whereas the Pythonic ones appear with the name I've given them ('PythonStyleThread'). Looking in the threading.py file in my IronPython installation, I see there is a class called "_DummyThread", a subclass of Thread, that sets its name as 'name=_newname("Dummy-%d")', so it looks like by using ThreadStart I'm ending up with _DummyThreads. The comment for the class also says:
# Dummy thread class to represent threads not started here.
# These aren't garbage collected when they die, nor can they be waited for.
which would explain why I can't get rid of them.
But I don't want 'DummyThread's. I just want normal ones, that behave nicely and get garbage-collected when they've finished doing their thing.
Now, a slightly odd thing about all of this is that unless I set up the logger I don't see the DummyThread entries in the debugger at all (although they still run). This may be a quirk of the PyDev debugger, or it may be relevant. Is there any sane reason why logging should have any bearing on this? Can I solve my problem just by not logging in my thread?
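As an aside, the following minimal experiment (a sketch only, assuming the stock threading.py shipped with IronPython behaves like CPython's here) suggests the connection: merely asking the threading module about the current thread from an 'alien' thread is enough to register a Dummy entry, and logging does exactly that when it builds a LogRecord:

import System
import threading

def alien_target():
    # threading did not start this thread, so current_thread() falls back to
    # creating and registering a _DummyThread ("Dummy-N") for it.
    print(threading.current_thread().name)

t = System.Threading.Thread(System.Threading.ThreadStart(alien_target))
t.Start()
t.Join()

# The Dummy entry is still registered even though the .NET thread has exited:
print([th.name for th in threading.enumerate()])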
The Python threading documentation says:
"There is the possibility that "dummy thread objects" are created. These are thread objects corresponding to "alien threads", which are threads of control started outside the threading module, such as directly from C code. Dummy thread objects have limited functionality; they are always considered alive and daemonic, and cannot be joined. They are never deleted, since it is impossible to detect the termination of alien threads."
This makes me wonder why I've had the misfortune of ending up with them.
While I have a workaround in that I can use Python-style threading.Thread subclasses everywhere I currently use .NET 'ThreadStart' threads, I am not keen to do this, because the reason I was using .NET-style threads in certain places was that they give me an Abort method (whereas the Python ones don't). I know aborting threads is a Bad Thing, but the application is a unit-testing framework, and I a) need to run unit-tests in a thread, and b) have no control over their contents (they are written by third-parties), so I have no means of periodically checking a 'please shut me down nicely' flag on these threads, and in extremis may need to kill them rudely.
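One compromise I could imagine (a sketch only, not tested in my framework; it assumes System.Threading.Thread.CurrentThread is accessible from inside run(), which do_beeps() above already relies on) is a threading.Thread subclass that captures the underlying .NET thread so it can still be aborted:

import System
import threading

class AbortableThread(threading.Thread):
    """Python-style thread, tracked and cleaned up by the threading module,
    that remembers the .NET thread executing it so it can be aborted rudely."""
    def __init__(self, work, name="AbortableThread"):
        super(AbortableThread, self).__init__(name=name)
        self._work = work
        self._dotnet_thread = None

    def run(self):
        # Capture the .NET thread object that is actually running this code.
        self._dotnet_thread = System.Threading.Thread.CurrentThread
        self._work()

    def abort(self):
        if self._dotnet_thread is not None:
            self._dotnet_thread.Abort()  # last resort, as discussed above

The thread shows up (and disappears) under its given name like any other Python-style thread, but abort() keeps roughly the same escape hatch as the ThreadStart version.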
So a) why am I getting DummyThreads, b) has this got anything to do with logging and c) what can I do about it?
Thanks.

Related

Python3 : can I lock multiple locks simultaneously?

I have multiple locks that lock different parts of my API.
To lock any method I do something like this :
import threading

class DoSomething:
    def __init__(self):
        self.lock = threading.Lock()

    def run(self):
        with self.lock:
            pass  # do stuff requiring lock here
And for most use cases this works just fine.
But, I am unsure if what I am doing when requiring multiple locks works or not :
import threading

class DoSomething:
    def __init__(self):
        self.lock_database = threading.Lock()
        self.lock_logger = threading.Lock()

    def run(self):
        with self.lock_database and self.lock_logger:
            pass  # do stuff requiring lock here
As it is, the code runs just fine but I am unsure if it runs as I want it to.
My question is : are the locks being obtained simultaneously or is the first one acquired and only then the second is also acquired.
Is my previous code equivalent to the following?
with self.lock1:
    with self.lock2:
        pass  # do stuff here
As it is, the code currently works, but since the chances of my threads requiring the same lock simultaneously are extremely low to begin with, I may end up with a massive headache to debug later.
I am asking because I am very uncertain how to test my code to ensure it works as intended; I am equally interested in the answer itself and in knowing how I can test it (rather than ending up with the end users testing it for me).
Yes, you can acquire multiple locks in one with statement, but there are two things to watch. First, with self.lock_database and self.lock_logger: does not do what you want: the expression self.lock_database and self.lock_logger simply evaluates to self.lock_logger (both locks are truthy objects), so only the logger lock is actually acquired. Write with self.lock_database, self.lock_logger: instead; that form is exactly equivalent to your nested version, acquiring lock_database first and only then lock_logger.
Second, beware of deadlocks. A deadlock occurs when one thread is unable to make progress because it needs to acquire a lock that some other thread is holding, while that second thread is unable to make progress because it wants the lock that the first thread is already holding.
With the corrected syntax, your code locks lock_database first and lock_logger second. If you can guarantee that any thread that locks them both always locks them in that same order, then you're safe; a deadlock can never happen that way. But if one thread locks lock_database before trying to lock lock_logger, and some other thread tries to grab them both in the opposite order, that's a deadlock waiting to happen.
Looks easy. And it is, except... in a more sophisticated program, where locks are attached to objects that are passed around to different functions, it may not be so easy: one thread may call some foobar(a, b) function while another thread calls the same foobar() on the same two objects with the arguments swapped.
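A common defence in that situation is to always take the locks in one globally consistent order, whatever order the caller mentions them in. A sketch (the acquire_in_order helper is made up for illustration; it sorts by id(), but any stable key works):

import threading
from contextlib import ExitStack

def acquire_in_order(*locks):
    """Acquire all the given locks in a single, consistent global order."""
    stack = ExitStack()
    for lock in sorted(locks, key=id):
        stack.enter_context(lock)
    return stack

lock_database = threading.Lock()
lock_logger = threading.Lock()

# Both of these acquire the two locks in the same order, so two threads
# using the helper can never grab them in opposite orders:
with acquire_in_order(lock_database, lock_logger):
    pass  # do stuff requiring both locks
with acquire_in_order(lock_logger, lock_database):
    pass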

Process finishes but cannot be joined?

To accelerate a certain task, I'm subclassing Process to create a worker that will process data coming in samples. Some managing class will feed it data and read the outputs (using two Queue instances). For asynchronous operation I'm using put_nowait and get_nowait. At the end I'm sending a special exit code to my process, upon which it breaks its internal loop. However... it never happens. Here's a minimal reproducible example:
import multiprocessing as mp

class Worker(mp.Process):
    def __init__(self, in_queue, out_queue):
        super(Worker, self).__init__()
        self.input_queue = in_queue
        self.output_queue = out_queue

    def run(self):
        while True:
            received = self.input_queue.get(block=True)
            if received is None:
                break
            self.output_queue.put_nowait(received)
        print("\tWORKER DEAD")

class Processor():
    def __init__(self):
        # prepare
        in_queue = mp.Queue()
        out_queue = mp.Queue()
        worker = Worker(in_queue, out_queue)
        # get to work
        worker.start()
        in_queue.put_nowait(list(range(10**5)))  # XXX
        # clean up
        print("NOTIFYING")
        in_queue.put_nowait(None)
        #out_queue.get()  # XXX
        print("JOINING")
        worker.join()

Processor()
This code never completes, hanging permanently like this:
NOTIFYING
JOINING
WORKER DEAD
Why?
I've marked two lines with XXX. For the first one, if I send less data (say, 10**4), everything finishes normally (the process joins as expected). Similarly for the second: if I get() after notifying the worker to finish, it also completes. I know I'm missing something, but nothing in the documentation seems relevant.
Documentation mentions that
When an object is put on a queue, the object is pickled and a background thread later flushes the pickled data to an underlying pipe. This has some consequences [...] After putting an object on an empty queue there may be an infinitesimal delay before the queue’s empty() method returns False and get_nowait() can return without raising queue.Empty.
https://docs.python.org/3.7/library/multiprocessing.html#pipes-and-queues
and additionally that
whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate.
https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing-programming
This means that the behaviour you describe is probably caused by a race condition between self.output_queue.put_nowait(received) in the worker and joining the worker with worker.join() in Processor's __init__. If joining is faster than feeding the item into the queue's pipe, everything finishes fine; if it is too slow, there is still an item sitting in the queue, and the worker will not join.
Uncommenting the out_queue.get() in the main process empties the queue, which allows joining. But since it is important for that get() to return even if the queue happens to be empty already, using a time-out is an option to wait out the race condition, e.g. out_queue.get(timeout=10).
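One way to do that, sketched below (the drain helper is made up for illustration), is to empty the output queue with a time-out before joining, so the worker's feeder thread can flush everything it has put:

import queue

def drain(q, timeout=1):
    """Pull everything currently available off a multiprocessing queue."""
    items = []
    while True:
        try:
            items.append(q.get(timeout=timeout))
        except queue.Empty:
            return items

# in Processor.__init__, before joining:
# in_queue.put_nowait(None)
# results = drain(out_queue)
# worker.join()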
It might also be important to protect the main routine, especially on Windows (see "python multiprocessing on windows, if __name__ == "__main__"").
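For completeness, the guarded entry point would simply be:

if __name__ == "__main__":
    Processor()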

Terminating a QThread that wraps a function (i.e. can't poll a "wants_to_end" flag)

I'm trying to create a thread for a GUI that wraps a long-running function. My problem is thus phrased in terms of PyQt and QThreads, but I imagine the same concept could apply to standard python threads too, and would appreciate any suggestions generally.
Typically, to allow a thread to be exited while running, I understand that including a "wants_to_end" flag that is periodically checked within the thread is a good practice - e.g.:
Pseudocode (in my thread):
def run(self):
    i = 0
    while (not self.wants_to_end) and (i < 100):
        function_step(i)  # some long-running function that includes many steps
        i += 1
However, as my GUI is to wrap a pre-written long-running function, I cannot simply insert such a "wants_to_end" flag poll into the long running code.
Is there another way to forcibly terminate my worker thread from my main GUI (i.e. enabling me to include a button in the GUI to stop the processing)?
My simple example case is:
class Worker(QObject):
    finished = pyqtSignal()

    def __init__(self, parent=None, **kwargs):
        super().__init__(parent)
        self.kwargs = kwargs

    @pyqtSlot()
    def run(self):
        result = SomeLongComplicatedProcess(**self.kwargs)
        self.finished.emit(result)
with usage within my MainWindow GUI:
self.thread = QThread()
self.worker = Worker(arg_a=1, arg_b=2)
self.worker.finished.connect(self.doSomethingInGUI)
self.worker.moveToThread(self.thread)
self.thread.started.connect(self.worker.run)
self.thread.start()
If the long-running function blocks, the only way to forcibly stop the thread is via its terminate() method (it may also be necessary to call wait() as well). However, there is no guarantee that this will always work, and the docs also state the following:
Warning: This function is dangerous and its use is discouraged. The thread can be terminated at any point in its code path. Threads can be terminated while modifying data. There is no chance for the thread to clean up after itself, unlock any held mutexes, etc. In short, use this function only if absolutely necessary.
A much cleaner solution is to use a separate process, rather than a separate thread. In python, this could mean using the multiprocessing module. But if you aren't familiar with that, it might be simpler to run the function as a script via QProcess (which provides signals that should allow easier integration with your GUI). You can then simply kill() the worker process whenever necessary. However, if that solution is somehow unsatisfactory, there are many other IPC approaches that might better suit your requirements.
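As a rough illustration of the multiprocessing route (not tied to Qt; the placeholder function below stands in for the pre-written long-running code, and getting its result back to the GUI would still need a Queue, Pipe or similar):

import multiprocessing as mp
import time

def some_long_complicated_process(**kwargs):
    # placeholder for the real, pre-written long-running function
    time.sleep(600)

def start_worker(**kwargs):
    # Run the work in its own process so the GUI stays responsive and the
    # work can be stopped at any moment without corrupting the GUI process.
    proc = mp.Process(target=some_long_complicated_process, kwargs=kwargs)
    proc.start()
    return proc

# from the GUI's "stop" button handler:
# worker.terminate()  # forcibly ends the worker process
# worker.join()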

Why do some widgets not update in Qt5?

I am trying to create a PyQt5 application, where I have used certain labels for displaying status variables. To update them, I have implemented custom pyqtSignal manually. However, on debugging I find that the value of GUI QLabel have changed but the values don't get reflected on the main window.
Some answers suggested calling QApplication.processEvents() occasionally. However, this instantly crashes the application, or freezes it.
Here's a sample code (all required libraries are imported, it's just the part creating problem, the actual code is huge):
from multiprocessing import Process

def sub(signal):
    i = 0
    while True:
        if i % 5 == 0:
            signal.update(i)
        i += 1

class CustomSignal(QObject):
    signal = pyqtSignal(int)

    def update(self, value):
        self.signal.emit(value)

class MainApp(QWidget):
    def __init__(self):
        super().__init__()
        self.label = QLabel("0")
        self.customSignal = CustomSignal()
        self.subp = Process(target=sub, args=(self.customSignal,))
        self.subp.start()
        self.customSignal.signal.connect(self.updateValue)

    def updateValue(self, value):
        print("old value", self.label.text())
        self.label.setText(str(value))
        print("new value", self.label.text())
The output of the print statements is as expected. However, the text in label does not change.
The update function in CustomSignal is called by some thread.
I've applied the same method to update progress bar which works fine.
Is there any other fix for this, other than processEvents()?
The OS is Ubuntu 16.04.
The key problem lies in the very concept behind the code.
Processes have their own address space and don't share data with other processes unless some inter-process communication mechanism is used. It looks like the multiprocessing module was used here instead of the threading module, presumably to sidestep Python's GIL and speed things up; however, the subprocess cannot access the CustomSignal instance living in the parent process, so the signal it emits never reaches the GUI.
I have tested two solutions to this case, and they seem to work:
threading module: although threading in Python is limited by the GIL, it is still sufficient for basic concurrency demands like this one. Note the difference between concurrency and speedup.
QThread: since you are already using PyQt, there isn't any issue in using QThread, and its signals integrate naturally with the GUI event loop; a sketch follows below.
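For instance, a sketch of the QThread option (the class and attribute names are made up; it keeps your updateValue slot unchanged):

import time
from PyQt5.QtCore import QThread, pyqtSignal

class CounterThread(QThread):
    # delivers the counter value to the GUI thread via the signal/slot queue
    value_changed = pyqtSignal(int)

    def run(self):
        i = 0
        while not self.isInterruptionRequested():
            if i % 5 == 0:
                self.value_changed.emit(i)
            i += 1
            time.sleep(0.1)

# in MainApp.__init__, instead of the Process:
# self.counter = CounterThread(self)
# self.counter.value_changed.connect(self.updateValue)
# self.counter.start()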
Try adding
self.label.repaint()
immediately after updating the text, like this:
self.label.setText(str(value))
self.label.repaint()

PySide timers/threading crash

I've written a PySide Windows application that uses libvlc to show a video, log keystrokes, and write aggregated information about those keystrokes to a file. I'm experiencing two bugs that are causing the application to crash (other question here -> https://stackoverflow.com/questions/18326943/pyside-qlistwidget-crash).
The application writes the keystroke file at every five minute interval on the video. Users can change the playback speed, so that five minute interval may take more or less than five minutes; it's not controlled by a timer.
The video continues playing while the file is written, so I've created an object inheriting from threading.Thread for the file creation - IntervalFile. Some information about the file to be written is passed in the constructor; IntervalFile doesn't access its parent (the main QWidget) at all. This is the only threading object I use in the app. There are no timers declared anywhere.
Intermittently, the application will crash and I'll get the following message: "QObject::killTimers: timers cannot be stopped from another thread".
The code that creates IntervalFile is (part of CustomWidget, inherited from QWidget):
def doIntervalChange(self):
    ...
    ifile = IntervalFile(5, filepath, dbpath)  # db is sqlite, with new connection created within IntervalFile
    ifile.start()
    # end of def
doIntervalChange is called from within QWidget using a signal.
IntervalFile is:
class IntervalFile(threading.Thread):
    def __init__(self, interval, filepath, dbpath):
        # declaration of variables
        threading.Thread.__init__(self)

    def run(self):
        shutil.copy('db.local', self.dbPath)  # because db is still being used in main QWidget
        self.localDB = local(self.dbPath)  # creates connection to sqlite db, with sql within the object to make db calls easier
        # query db for keystroke data
        # write file
        self.localDB.close()
        self.localDB = None
        os.remove(self.dbPath)  # don't need this copy anymore
When ifile.start() is commented out, I don't see the killTimers crash. Any suggestions? Note that the crash seems random; sometimes I can use the app (just continually pressing the same keystroke over and over) for an hour without it crashing, sometimes it crashes within the first couple of intervals. Because of this difficulty reproducing the crashes, I think these lines of code are the issue, but I'm not 100% sure.
I'm pretty sure you need to hold a reference to your thread object. When your doIntervalChange() method finishes, nothing is holding a reference to the thread object (ifile) any more, so it can be garbage collected. Presumably this is why the crash happens randomly (if the thread finishes its task before the object is garbage collected, then you don't have a problem).
Not exactly sure what is creating the QTimers, but I'm fairly certain that won't affect my proposed solution!
So in doIntervalChange() save a reference to ifile in a list, and periodically clean up the list when threads have finished execution. Have a look at this for an idea (and if a better way to clean up threads shows up in that post, implement that!): Is there a more elegant way to clean up thread references in python? Do I even have to worry about them?
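A sketch of what that could look like (the _interval_threads attribute name is made up), keeping the thread objects on the widget and pruning finished ones whenever a new interval file is started:

def doIntervalChange(self):
    ...
    ifile = IntervalFile(5, filepath, dbpath)
    ifile.start()
    # Drop references to threads that have already finished, then keep a
    # reference to the new one so it is not garbage collected while running.
    self._interval_threads = [t for t in getattr(self, "_interval_threads", [])
                              if t.is_alive()]
    self._interval_threads.append(ifile)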
