Pickle load on off thread blocks main thread - multithreading

Calling ObClass's prep() blocks the main thread until the pickle finishes. Why? How can I unpickle data in the background?
Try this at home:
import cPickle
import thread
import threading
import numpy as np

def PrepFn(ob):
    ob.lock.acquire(1)
    try:
        print "begin load"
        f = open(ob.filename, "rb")
        ob.data = cPickle.load(f)
        print "end load"
    except Exception as msg:
        print(str(msg))
    ob.lock.release()
    f.close()

class ObClass:
    def __init__(self, filename):
        self.lock = threading.Lock()
        self.filename = filename
        self.data = None
    def prep(self):
        thread.start_new_thread(PrepFn, (self,))
    def get(self):
        self.lock.acquire(1)
        self.lock.release()
        return self.data

def make_data(filename):
    print "generating data"
    data = np.asarray(np.random.normal(size=(10000, 1000)))
    print "writing data to disk"
    f = open(filename, "wb")
    cPickle.dump(data, f)
    f.close()

def test(filename):
    x = ObClass(filename)
    x.prep()
    for i in xrange(1000):
        print i
    print "get data"
    data = x.get()
    print "got data"
To see it in action, do
filename = "test.pkl"
test.make_data(filename)
test.test(filename)
For me, this goes:
0
1
2
begin load
3
4
[...]
83
followed by a long pause, followed by
end load
84
85
86
[...]
996
997
998
999
get data
got data

Python has a Global Interpreter Lock (GIL): within one process, only one thread can execute Python bytecode at any given moment, so threads never run Python code in parallel on several CPU cores.
When you start the I/O thread, it is scheduled but does not run immediately, which is why a few numbers are printed before "begin load".
Once your thread runs, it issues the file read. The read is done by a C routine that releases the GIL while it waits, so your main thread can run and keep printing until 83.
Then the read returns the data stream to your loader thread. While that thread parses the stream into a Python object it holds the GIL, so your main thread has to wait, which causes the pause. (cPickle usually takes about double the RAM to unfold the object, so if you monitor top you can watch the unpickling happen in real time.)
When your loader thread finishes parsing the data, your main thread resumes, prints to the end, and calls get.
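The two phases described above (a file read that releases the GIL, then a parse that holds it) can be timed separately. This is only a rough sketch in Python 3, where the module is `pickle` rather than `cPickle`; the function name `load_in_two_phases` is made up for illustration and is not part of the original code.

```python
import pickle
import time


def load_in_two_phases(path):
    """Load a pickle file, timing the read and the parse separately.

    Hypothetical helper: it only illustrates where the time goes.
    """
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        raw = f.read()          # blocking file I/O: the GIL is released while waiting
    t1 = time.perf_counter()
    obj = pickle.loads(raw)     # parsing into Python objects: the GIL is held
    t2 = time.perf_counter()
    return obj, t1 - t0, t2 - t1
```

If the second number dominates for a given file, running the load in a thread will still stall other Python threads for roughly that long.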

Related

How to stop child threads if keyboard exception occurs in python?

I'm facing a problem with threads: I have a function that creates 10 threads to do a task. If a keyboard interrupt occurs, the created threads keep executing. I would like to stop those threads and revert the changes.
The following code snippet shows the approach:
def store_to_db(self,keys_size,master_key,action_flag,key_status):
    for iteration in range(10):
        t = threading.Thread(target=self.store_worker, args=())
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
def store_worker():
    print "DOING"
The idea to make this work is:
You need a "thread pool" whose threads regularly check whether their do_run attribute is falsy.
You need a "sentinel thread" outside that pool which monitors the threads in the pool and sets their do_run attribute to False on demand.
Example code:
import threading
import random
import time
import msvcrt as ms  # Windows-only module, used for kbhit()

def main_logic():
    # take 10 worker threads
    threads = []
    for i in range(10):
        t = threading.Thread(target=lengthy_process_with_brake, args=(i,))
        # start and append
        t.start()
        threads.append(t)
    # start the thread which allows you to stop all threads defined above
    s = threading.Thread(target=sentinel, args=(threads,))
    s.start()
    # join worker threads
    for t in threads:
        t.join()

def sentinel(threads):
    # this one runs as long as threads defined in "threads" are running
    while True:
        # threads that are still running
        running = [x for x in threads if x.is_alive()]
        # if kb is pressed
        if ms.kbhit():
            # tell threads to stop
            for t in running:
                t.do_run = False
        # if all threads stopped, exit the loop
        if not running:
            break
        # you don't want a high cpu load for nothing
        time.sleep(0.05)

def lengthy_process_with_brake(worker_id):
    # grab current thread
    t = threading.current_thread()
    # start msg
    print(f"{worker_id} STARTED")
    # exit condition
    zzz = random.random() * 20
    stop_time = time.time() + zzz
    # imagine an iteration here like "for item in items:"
    while time.time() < stop_time:
        # the brake
        if not getattr(t, "do_run", True):
            print(f"{worker_id} IS ESCAPING")
            return
        # the task
        time.sleep(0.03)
    # exit msg
    print(f"{worker_id} DONE")

main_logic()
This solution does not 'kill' threads; it just tells them to stop iterating or whatever they are doing.
EDIT:
I just noticed that "Keyboard exception" was in the title and not "any key". Keyboard Exception handling is a bit different, here is a good solution for that. The point is almost the same: you tell the thread to return if a condition is met.
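For the Ctrl-C case specifically, the same "tell the thread to return" idea can be sketched with a shared threading.Event instead of a per-thread attribute. This is a minimal illustration, not the linked solution: the names `worker` and `run_workers` and the 200-step fake task are assumptions, and reverting changes is left out because it depends on the task.

```python
import threading
import time


def worker(worker_id, stop_event, results):
    # Hypothetical task: work in small steps so the stop flag is checked often.
    for _ in range(200):
        if stop_event.is_set():
            results[worker_id] = "stopped"
            return
        time.sleep(0.01)
    results[worker_id] = "done"


def run_workers(count, stop_event):
    results = {}
    threads = [
        threading.Thread(target=worker, args=(i, stop_event, results))
        for i in range(count)
    ]
    for t in threads:
        t.start()
    try:
        # Join with a short timeout so the main thread wakes up regularly
        # and a KeyboardInterrupt can be handled promptly.
        while any(t.is_alive() for t in threads):
            for t in threads:
                t.join(timeout=0.1)
    except KeyboardInterrupt:
        # Ask every worker to stop, then wait for them to return.
        stop_event.set()
        for t in threads:
            t.join()
    return results


if __name__ == "__main__":
    print(run_workers(10, threading.Event()))
```

Pressing Ctrl-C while this runs should leave every worker marked "stopped"; any rollback would go where the worker notices the flag.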

Why is this queue.join call blocking indefinitely?

I'm playing about with a personal project in python3.6 and I've run into the following issue which results in the my_queue.join() call blocking indefinitely. Note this isn't my actual code but a minimal example demonstrating the issue.
import threading
import queue

def foo(stop_event, my_queue):
    while not stop_event.is_set():
        try:
            item = my_queue.get(timeout=0.1)
            print(item) #Actual logic goes here
        except queue.Empty:
            pass
    print('DONE')

stop_event = threading.Event()
my_queue = queue.Queue()
thread = threading.Thread(target=foo, args=(stop_event, my_queue))
thread.start()
my_queue.put(1)
my_queue.put(2)
my_queue.put(3)
print('ALL PUT')
my_queue.join()
print('ALL PROCESSED')
stop_event.set()
print('ALL COMPLETE')
I get the following output (it's actually been consistent, but I understand that the output order may differ due to threading):
ALL PUT
1
2
3
No matter how long I wait I never see ALL PROCESSED output to the console, so why is my_queue.join() blocking indefinitely when all the items have been processed?
From the docs:
The count of unfinished tasks goes up whenever an item is added to the
queue. The count goes down whenever a consumer thread calls
task_done() to indicate that the item was retrieved and all work on it
is complete. When the count of unfinished tasks drops to zero, join()
unblocks.
You're never calling my_queue.task_done() inside your foo function, so the count of unfinished tasks never reaches zero. The foo function should follow the pattern of the example in the docs:
def worker():
    while True:
        item = q.get()
        if item is None:
            break
        do_work(item)
        q.task_done()
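Applied to the code from the question, the fix might look like the sketch below. It keeps the stop_event loop and adds task_done() after each item; the `seen` list and the `run_demo` wrapper are additions for illustration only.

```python
import queue
import threading


def foo(stop_event, my_queue, seen):
    while not stop_event.is_set():
        try:
            item = my_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            seen.append(item)      # stand-in for the actual logic
        finally:
            my_queue.task_done()   # one task_done() per successful get()
    print('DONE')


def run_demo():
    stop_event = threading.Event()
    my_queue = queue.Queue()
    seen = []
    thread = threading.Thread(target=foo, args=(stop_event, my_queue, seen))
    thread.start()
    for item in (1, 2, 3):
        my_queue.put(item)
    my_queue.join()                # returns once every item is marked done
    stop_event.set()
    thread.join()
    return seen


if __name__ == "__main__":
    print(run_demo())
```

Calling task_done() in a finally block means the count still drops if the processing step raises.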

PyQt signal not emitted in a pyqtSlot after moveToThread

So I want to learn using moveToThread and see the effect of calling onTimeout() of class GenericWorker from a different thread (main thread in this case). The weird thing is that the finish_sig in GenericWorker never gets emitted (should happen at the last line of onTimeout() ). Since it connects to terminate_thread() in Sender class, it should at least print out a terminate_thread in the console, but nothing happens at all.
My original purpose for using it is to emit a signal to quit the thread after onTimeout() is done. But now I can only do t.quit() from main to quit the thread.
Thank you all for spending time taking care of my question!
from PyQt4.QtCore import *
from PyQt4.QtGui import *
import threading
from time import sleep
import sys

class GenericWorker(QObject):
    finish_sig = pyqtSignal() # this one never gets emitted!
    @pyqtSlot(str, str)
    def onTimeout(self, cmd1, cmd2):
        print 'onTimeout get called from thread ID: '
        print QThread.currentThreadId()
        print 'received cmd 1: ' + cmd1
        print 'received cmd 2: ' + cmd2
        self.finish_sig.emit() # supposed to emit here!

class Sender(QObject):
    send_sig = pyqtSignal(str, str)
    terminate_sig = pyqtSignal()
    def emit_sig(self, cmd):
        print 'emit_sig thread ID: '
        print QThread.currentThreadId()
        sleep(1)
        self.send_sig.emit(cmd, '2nd_cmd')
    def terminate_thread(self):
        print 'terminate_thread'
        self.terminate_sig.emit()

if __name__ == "__main__":
    app = QApplication(sys.argv)
    print 'Main thread ID: '
    print QThread.currentThreadId()
    t = QThread()
    my_worker = GenericWorker()
    my_worker.moveToThread(t)
    t.start()
    my_sender = Sender()
    my_sender.send_sig.connect(my_worker.onTimeout)
    my_sender.terminate_sig.connect(t.quit)
    my_worker.finish_sig.connect(my_sender.terminate_thread)
    # my_worker.finish_sig.connect(t.quit)
    my_sender.emit_sig('hello')
    sleep(1)
    # my_sender.terminate_thread()
    # t.quit() # this one works
    # t.wait()
    exit(1)
    sys.exit(app.exec_())
The output:
Main thread ID:
46965006517856
emit_sig thread ID:
46965006517856
onTimeout get called from thread ID:
1111861568
received cmd 1: hello
received cmd 2: 2nd_cmd
QThread: Destroyed while thread is still running
UPDATE:
After referring to @tmoreau's and @ekhumoro's answers, I see two key problems with this code:
exit(1) is not a proper way to exit; I need to remove this line.
I don't have a way to exit the QApplication. What I need to do is add t.finished.connect(app.quit) to exit the application. (By the way, the last line sys.exit(app.exec_()) does not seem to take care of exiting the QApplication.)
In sum, there are basically three things that I need to exit: QThread, QApplication and sys. What I missed was exiting the QApplication. Let me know if my understanding is right or not...
Your issue is that you exit the program before it's complete.
my_sender.emit_sig('hello')
sleep(1)
exit(1)
sys.exit(app.exec_())
exit() ends your program, even if the thread has not finished running, hence the error:
QThread: Destroyed while thread is still running
If you remove sleep(1), you'll see the program stops even earlier:
Main thread ID:
46965006517856
emit_sig thread ID:
46965006517856
QThread: Destroyed while thread is still running
Here's more or less what's happening in parallel:
# main thread                        # worker thread
my_sender.emit_sig('hello')          # slot onTimeout is called
sleep(1)                             # print "onTimeout get called..."
exit(1)                              # emit finish_sig
sys.exit(app.exec_())
# slot terminate_thread is called    # thread ends (t.quit)
If you remove exit(1), your program will work, because you create an event loop with app.exec_(). The event loop means your program is always waiting to catch signals, and will not stop even if there's nothing left to do. So the thread has plenty of time to end :)
In Qt, you usually stop the event loop by closing your main window. Therefore, a cleaner way to implement your thread is:
class window(QWidget):
    def __init__(self,parent=None):
        super(window,self).__init__(parent)
        t=QThread(self)
        self.my_worker = GenericWorker()
        self.my_worker.moveToThread(t)
        t.start()
        self.my_sender = Sender()
        self.my_sender.send_sig.connect(self.my_worker.onTimeout)
        self.my_sender.terminate_sig.connect(t.quit)
        self.my_worker.finish_sig.connect(self.my_sender.terminate_thread)
        self.my_sender.emit_sig('hello')

if __name__ == "__main__":
    app = QApplication(sys.argv)
    win=window()
    win.show()
    sys.exit(app.exec_())
You need self to keep a reference to the thread and classes. Otherwise they are destroyed when __init__ ends.

Jython threading with thread -> queue -> thread

I'm running Jython 2.5.3 on Ubuntu 12.04 with the OpenJDK 64-bit 1.7.0_55 JVM.
I'm trying to create a simple threaded application to optimize data processing and loading. I have populator threads that read records from a database and mangle them a bit before putting them onto a queue. The queue is read by consumer threads that store the data in a different database. Here is the outline of my code:
import sys
import time
import threading
import Queue

class PopulatorThread(threading.Thread):
    def __init__(self, mod, mods, queue):
        super(PopulatorThread, self).__init__()
        self.mod = mod
        self.mods = mods
        self.queue = queue
    def run(self):
        # Create db connection
        # ...
        try:
            # Select one segment of records using 'id % mods = mod'
            # Process these records & slap them onto the queue
            # ...
        except:
            con.rollback()
            raise
        finally:
            print "Made it to 'finally' in populator %d" % self.mod
            con.close()

class ConsumerThread(threading.Thread):
    def __init__(self, mod, queue):
        super(ConsumerThread, self).__init__()
        self.mod = mod
        self.queue = queue
    def run(self):
        # Create db connection
        # ...
        try:
            while True:
                item = queue.get()
                if not item: break
                # Put records from the queue into
                # a different database
                # ...
                queue.task_done()
        except:
            con.rollback()
            raise
        finally:
            print "Made it to 'finally' in consumer %d" % self.mod
            con.close()

def main(argv):
    tread1Count = 3
    tread2Count = 4
    # This is the notefactsselector data queue
    nfsQueue = Queue.Queue()
    # Start consumer/writer threads
    j = 0
    treads2 = []
    while j < tread2Count:
        treads2.append(ConsumerThread(j, nfsQueue))
        treads2[-1].start()
        j += 1
    # Start reader/populator threads
    i = 0
    treads1 = []
    while i < tread1Count:
        treads1.append(PopulatorThread(i, tread1Count, nfsQueue))
        treads1[-1].start()
        i += 1
    # Wait for reader/populator threads
    print "Waiting to join %d populator threads" % len(treads1)
    i = 0
    for tread in treads1:
        print "Waiting to join a populator thread %d" % i
        tread.join()
        i += 1
    #Add one sentinel value to queue for each write thread
    print "Adding sentinel values to end of queue"
    for tread in treads2:
        nfsQueue.put(None)
    # Wait for consumer/writer threads
    print "Waiting to join consumer/writer threads"
    for tread in treads2:
        print "Waiting on a consumer/writer"
        tread.join()
    # Wait for Queue
    print "Waiting to join queue with %d items" % nfsQueue.qsize()
    nfsQueue.join()
    print "Queue has been joined"

if __name__ == '__main__':
    main(sys.argv)
I have simplified the database implementation somewhat to save space.
When I run the code, the populator and consumer threads seem to
reach the end, since I get the "Made it to finally in ..." messages.
I get the "Waiting to join n populator threads" message, and eventually the
"Waiting to join a populator thread n" messages.
I get the "Waiting to join consumer/writer threads" message as well as each of the "Waiting on a consumer/writer" messages I expect.
I get the "Waiting to join queue with 0 items" message I expect, but not the "Queue has been joined" message; apparently the program is blocking while waiting for the queue, and it never terminates.
I suspect I have my thread initializations or thread joinings in the wrong order somehow, but I have little experience with concurrent programming, so my intuitions about how to do things aren't well developed. I find plenty of Python/Jython examples of queues populated by while loops and read by threads, but none so far about queues populated by one set of threads and read by a different set.
The populator and consumer threads appear to finish.
The program seems to be blocking finally waiting for the Queue object to terminate.
Thanks to any who have suggestions and lessons for me!
Are you calling task_done() on each item in the queue when you are done processing it? If you don't tell the queue explicitly that each task is done, it'll never return from join().
PS: You don't see "Waiting to join a populator thread %d" because you forgot the print in front of it :)
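One detail worth checking in the consumer loop from the question: the break on the sentinel happens before task_done(), so the sentinel items themselves are never marked as done, and join() keeps waiting even though qsize() is 0. A minimal Python 3 sketch of a loop that accounts for every get(), sentinel included (the names `consumer` and `run_pipeline` are made up for illustration, and the database work is replaced by a list append):

```python
import queue
import threading


def consumer(q, out):
    while True:
        item = q.get()
        try:
            if item is None:       # sentinel: stop this consumer
                break
            out.append(item)       # stand-in for writing to the database
        finally:
            q.task_done()          # runs for real items and for the sentinel


def run_pipeline(items, consumer_count=2):
    q = queue.Queue()
    out = []
    threads = [
        threading.Thread(target=consumer, args=(q, out))
        for _ in range(consumer_count)
    ]
    for t in threads:
        t.start()
    for item in items:
        q.put(item)
    for _ in threads:
        q.put(None)                # one sentinel per consumer
    q.join()                       # unblocks: every put() has a matching task_done()
    for t in threads:
        t.join()
    return sorted(out)


if __name__ == "__main__":
    print(run_pipeline([3, 1, 2]))
```

Since the consumer threads are joined anyway, the final queue join is optional in this layout; it only returns if the sentinels are counted as done too.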

How to terminate a Python3 thread correctly while it's reading a stream

I'm using a thread to read Strings from a stream (/dev/tty1) while processing other things in the main loop. I would like the Thread to terminate together with the main program when pressing CTRL-C.
from threading import Thread

class Reader(Thread):
    def run(self):
        with open('/dev/tty1', encoding='ascii') as myStream:
            for myString in myStream:
                print(myString)
    def quit(self):
        pass # stop reading, close stream, terminate the thread

myReader = Reader()
myReader.start()
while(True):
    try:
        pass # do lots of stuff
    except KeyboardInterrupt:
        myReader.quit()
        raise
The usual solution - a boolean variable inside the run() loop - doesn't work here. What's the recommended way to deal with this?
I can just set the Daemon flag, but then I won't be able to use a quit() method which might prove valuable later (to do some clean-up). Any ideas?
AFAIK, there is no built-in mechanism for that in Python 3 (just as in Python 2). Have you tried the proven Python 2 approach with PyThreadState_SetAsyncExc, documented here and here, or the alternative tracing approach here?
Here's a slightly modified version of the PyThreadState_SetAsyncExc approach from above:
import threading
import inspect
import ctypes

def _async_raise(tid, exctype):
    """raises the exception, performs cleanup if needed"""
    if not inspect.isclass(exctype):
        exctype = type(exctype)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(tid), ctypes.py_object(exctype))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res != 1:
        # """if it returns a number greater than one, you're in trouble,
        # and you should call it again with exc=NULL to revert the effect"""
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(tid), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

def stop_thread(thread):
    _async_raise(thread.ident, SystemExit)
Make your thread a daemon thread. When all non-daemon threads have exited, the program exits. So when Ctrl-C is passed to your program and the main thread exits, there's no need to explicitly kill the reader.
myReader = Reader()
myReader.daemon = True
myReader.start()
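If a quit() method with clean-up is still wanted, one possible alternative is to avoid blocking indefinitely in the read: wait on the file descriptor with a timeout and check a stop flag between waits. The sketch below assumes a Unix-like system, where select works on pipes and terminals. The class name `StoppableReader`, the `on_data` callback and the 0.1 s timeout are illustrative choices, not something from the answers above.

```python
import os
import select
import threading


class StoppableReader(threading.Thread):
    def __init__(self, fd, on_data):
        super().__init__()
        self.fd = fd                      # an already opened file descriptor
        self.on_data = on_data            # called with each chunk of bytes
        self._stop_event = threading.Event()

    def run(self):
        while not self._stop_event.is_set():
            # Wait at most 0.1 s for data so the stop flag is re-checked regularly.
            ready, _, _ = select.select([self.fd], [], [], 0.1)
            if ready:
                chunk = os.read(self.fd, 4096)
                if not chunk:             # end of stream
                    break
                self.on_data(chunk)

    def quit(self):
        # Ask the loop to stop, then wait for the thread to finish.
        self._stop_event.set()
        self.join()
```

With something like fd = os.open('/dev/tty1', os.O_RDONLY), the main loop could call quit() from its KeyboardInterrupt handler and close the descriptor afterwards. The reader delivers raw bytes, so decoding and line splitting would be up to the callback.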
