Two queues: the script doesn't exit - multithreading

I wrote a script that uses 2 queues and 3 types of worker: producer, consumer (CPU-bound task), writer (I need to write the results sequentially).
This is the simplified version of my code:
from queue import Queue
from threading import Thread

def compute_single_score(data):
    # do lots of calculations
    return 0.0

def producer(out_q, data_to_compute):
    while stuff:
        data = data_to_compute.popitem()
        out_q.put(data)
    out_q.put(_sentinel)

def consumer(in_q, out_q):
    while True:
        data = in_q.get()
        if data is _sentinel:
            in_q.put(_sentinel)
            break
        out_q.put([data[0], compute_single_score(*data)])
        in_q.task_done()

def writer(in_q):
    while True:
        data = in_q.get()
        if data is _sentinel:
            in_q.put(_sentinel)
            break
        in_q.task_done()

if __name__ == '__main__':
    _sentinel = object()
    jobs_queue = Queue()
    scores_queue = Queue()
    t1 = Thread(target=producer, args=(jobs_queue, data_to_compute,))
    t2 = Thread(target=consumer, args=(jobs_queue,scores_queue,))
    t3 = Thread(target=consumer, args=(jobs_queue,scores_queue,))
    t4 = Thread(target=consumer, args=(jobs_queue,scores_queue,))
    t5 = Thread(target=consumer, args=(jobs_queue,scores_queue,))
    t6 = Thread(target=consumer, args=(jobs_queue,scores_queue,))
    t7 = Thread(target=consumer, args=(jobs_queue,scores_queue,))
    t8 = Thread(target=consumer, args=(jobs_queue,scores_queue,))
    t9 = Thread(target=writer, args=(scores_queue,))
    t1.start(); t2.start(); t3.start(); t4.start(); t5.start(); t6.start(); t7.start(); t8.start(); t9.start()
    jobs_queue.join()
    scores_queue.join()
    print('File written')
It immediately prints out 'File written' instead of waiting for the queues to be empty. Moreover, the script doesn't exit even though all the calculations are performed. Two threads seem to remain active.
Thanks a lot for your support.

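It does wait for the queues. But since putting things into the queues happens in other threads, the main thread reaches the .join() lines before any .put() has happened. Queue.join() only waits for items that have already been put and not yet marked with task_done(), so when it reaches .join() the queues are empty and it returns immediately.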
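A small, self-contained illustration of the point above, using only the standard library: Queue.join() waits only for items that have already been put() and not yet marked with task_done(), so calling it before the producer thread has put anything returns at once. The may_put event is a hypothetical gate used only to make the ordering deterministic.

```python
import threading
from queue import Queue

q = Queue()
may_put = threading.Event()  # hypothetical gate that holds the producer back


def late_producer():
    may_put.wait()  # don't put anything until the main thread allows it
    q.put("item")


t = threading.Thread(target=late_producer)
t.start()

q.join()  # returns immediately: nothing has been put, so no task is unfinished
joined_while_empty = q.empty()

may_put.set()  # let the producer run now
t.join()
item = q.get()
q.task_done()  # balances the single put(), so the next join() returns as well
q.join()
```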
Now, I'm not sure what you are trying to achieve, simply because the producer has a while stuff loop. I assume that you want to keep processing as long as this condition holds. In particular, you have to wait until the t1 thread quits, i.e.
t1.start(); t2.start(); t3.start(); t4.start(); t5.start(); t6.start(); t7.start(); t8.start(); t9.start()
t1.join() # <-- this is important
jobs_queue.join()
scores_queue.join()
print('File written')
Otherwise you won't be able to synchronize it.
Side note 1: due to the GIL, CPU-bound threads in CPython cannot run Python code in parallel. If your threads are not doing any I/O (and they don't), the program will likely perform no better, or even worse, than a single-threaded version. At the very least, multiple consumer threads are pointless.
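If the writer only needs the scores in input order, one option (a sketch, not the original poster's design) is concurrent.futures, whose Executor.map() yields results in the order of its inputs regardless of which worker finishes first. compute_single_score(key, value) below is a hypothetical placeholder, and the "file" is a list. The run shown uses ThreadPoolExecutor; for CPU-bound scoring you would pass executor_cls=ProcessPoolExecutor instead, keeping the scoring function at module level (so it can be pickled) and calling it under if __name__ == '__main__':.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_single_score(key, value):
    # hypothetical placeholder for the real CPU-bound calculation
    return value * 2.0


def score_in_order(data, executor_cls=ThreadPoolExecutor, workers=4):
    keys = list(data)
    values = [data[k] for k in keys]
    with executor_cls(max_workers=workers) as pool:
        # map() yields results in input order, even if workers finish out of order
        scores = pool.map(compute_single_score, keys, values)
        return list(zip(keys, scores))


def write_scores(data, out_lines):
    # stand-in for writing to a file: one line per key, in input order
    for key, score in score_in_order(data):
        out_lines.append(f"{key}\t{score}")


lines = []
write_scores({"a": 1, "b": 2, "c": 3}, lines)
```

Because ordering comes from map() itself, no separate writer thread or second queue is needed in this variant.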
Side note 2: Do not chain statements with semicolons; it's not Pythonic. Instead do this:
threads = []
threads.append(Thread(target=producer, args=(jobs_queue, data_to_compute,)))
threads.append(Thread(target=writer, args=(scores_queue,)))
for i in range(10):
    threads.append(Thread(target=consumer, args=(jobs_queue, scores_queue,)))
for t in threads:
    t.start()
threads[0].join()
Side note 3: You should handle the case when a queue is empty. data = in_q.get() will block forever, meaning that your script won't quit (unless the threads are marked as daemon). You could do, for example:
try:
    data = in_q.get(timeout=1)
except queue.Empty:
    # handle empty queue here, perhaps quit if t1 is not alive
    # otherwise just continue the loop
    if not t1.is_alive():  # <-- you have to pass t1 to the thread
        break
    else:
        continue
and then join all threads at the end of the main thread (see side note 2):
for t in threads:
    t.start()
for t in threads:
    t.join()
print('File written')
And now you don't even have to join queues.
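A runnable sketch of the timeout-based shutdown from side note 3, under some assumptions: the input is a plain dict, compute_single_score is a placeholder, and the main thread collects the scores after joining the workers instead of running a separate writer thread. The exit check calls is_alive() first and empty() second: once the producer is dead nothing new can arrive, so an empty queue really means the work is finished (checking only is_alive() could drop an item that was put just after get() timed out).

```python
import queue
import threading


def compute_single_score(value):
    return value * 2.0  # hypothetical placeholder calculation


def producer(out_q, data):
    for key, value in data.items():
        out_q.put((key, value))


def consumer(in_q, out_q, producer_thread):
    while True:
        try:
            key, value = in_q.get(timeout=0.1)
        except queue.Empty:
            # stop only when the producer is gone AND nothing is left to do
            if not producer_thread.is_alive() and in_q.empty():
                break
            continue  # producer still running: keep polling
        out_q.put((key, compute_single_score(value)))


def run(data, n_consumers=4):
    jobs, scores = queue.Queue(), queue.Queue()
    prod = threading.Thread(target=producer, args=(jobs, data))
    workers = [threading.Thread(target=consumer, args=(jobs, scores, prod))
               for _ in range(n_consumers)]
    prod.start()
    for w in workers:
        w.start()
    prod.join()
    for w in workers:
        w.join()  # every consumer has exited, so all scores are queued
    results = {}
    while not scores.empty():
        key, score = scores.get()
        results[key] = score
    return results


scores = run({"a": 1, "b": 2, "c": 3})
```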

This is the code I used in the end (according to the requirements illustrated before):
import queue
from multiprocessing import JoinableQueue
from multiprocessing import Process

def compute_single_score(data):
    # do lots of calculations
    return 0.0

def producer(out_q, data_to_compute):
    while stuff:
        data = data_to_compute.popitem()
        out_q.put(data)

def consumer(in_q, out_q):
    while True:
        try:
            data = in_q.get(timeout=5)
        except queue.Empty:  # no job for 5 seconds: assume the producer is done
            break
        out_q.put([data[0], compute_single_score(*data)])
        in_q.task_done()

def writer(in_q):
    while True:
        try:
            data = in_q.get(timeout=5)
        except queue.Empty:  # no score for 5 seconds: assume the consumers are done
            break
        # write
        in_q.task_done()

if __name__ == '__main__':
    jobs_queue = JoinableQueue()
    scores_queue = JoinableQueue()
    processes = []
    processes.append(Process(target=producer, args=(jobs_queue, data_to_compute,)))
    processes.append(Process(target=writer, args=(scores_queue,)))
    for i in range(10):
        processes.append(Process(target=consumer, args=(jobs_queue, scores_queue,)))
    for p in processes:
        p.start()
    processes[1].join()  # wait for the writer to time out and exit
    scores_queue.join()
    print('File written')
I hope it will be of help for somebody else.
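For comparison, the original sentinel idea also works with threads if the writer gets its own stop marker. In this sketch (placeholder scoring, lines collected in a list instead of a file) the producer puts one sentinel per consumer, and the main thread puts the writer's sentinel only after every consumer has been joined; the code in the question never puts a sentinel into scores_queue, so its writer can never stop. With multiprocessing, an object() sentinel does not survive pickling as the same object, so an is check would fail there; a value compared with == (for example None) would be needed instead.

```python
import threading
from queue import Queue

_sentinel = object()  # identity check is safe because threads share memory


def compute_single_score(value):
    return value * 2.0  # hypothetical placeholder calculation


def producer(out_q, data, n_consumers):
    for item in data.items():
        out_q.put(item)
    for _ in range(n_consumers):
        out_q.put(_sentinel)  # one stop marker per consumer


def consumer(in_q, out_q):
    while True:
        item = in_q.get()
        if item is _sentinel:
            break
        key, value = item
        out_q.put((key, compute_single_score(value)))


def writer(in_q, lines):
    while True:
        item = in_q.get()
        if item is _sentinel:
            break
        key, score = item
        lines.append(f"{key}\t{score}")  # stand-in for writing to a file


def run(data, n_consumers=4):
    jobs, scores, lines = Queue(), Queue(), []
    consumers = [threading.Thread(target=consumer, args=(jobs, scores))
                 for _ in range(n_consumers)]
    prod = threading.Thread(target=producer, args=(jobs, data, n_consumers))
    wr = threading.Thread(target=writer, args=(scores, lines))
    for t in (prod, wr, *consumers):
        t.start()
    prod.join()
    for c in consumers:
        c.join()
    scores.put(_sentinel)  # every consumer is done, so the writer may stop
    wr.join()
    return lines


written = run({"a": 1, "b": 2, "c": 3})
```

The order of the written lines depends on which consumer finishes first; if the output must follow input order, the consumers would need to carry an index that the writer sorts on.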

Related

How to pass data between 3 threads that contain while True loops in Python?

Im trying to generate data in two threads and get that data in a separate thread that prints the data.
3 threads, 2 threads generate data , 1 thread consumes the data generated.
The Problem: not getting both generated data into the consumer thread
How can I pass data generated in 2 threads and deliver it in the consumer thread?
#from threading import Thread
import concurrent.futures
import time

# A thread that produces data
def producer(out_q):
    while True:
        # Produce some data
        global data
        data = data + 2
        out_q.put(data)

# Another thread that produces data
def ac(out2_q):
    while True:
        global x
        x = x + 898934567
        out2_q.put(data)

# A thread that consumes data
def consumer(in_q):
    while True:
        # Get BOTH produced data from 2 threads
        data = in_q.get()
        # Process the data
        time.sleep(.4)
        print(data, end=' ', flush=True)

x = 0
data = 0
q = Queue()
with concurrent.futures.ThreadPoolExecutor() as executor:
    t1 = executor.submit(consumer, q)
    t2 = executor.submit(producer, q)
    t3 = executor.submit(ac, q)
I recommend going with threading.Thread in this case. Please see the code below and follow the comments. Feel free to ask questions.
from threading import Thread, Event
from queue import Queue
import time

def producer_one(q: Queue, e: Event):
    while not e.is_set():
        q.put("one")
        time.sleep(1)
    print("Producer # one stopped")

def producer_two(q: Queue, e: Event):
    while not e.is_set():
        q.put("two")
        time.sleep(2)
    print("Producer # two stopped")

def consumer(q: Queue):
    while True:
        item = q.get()
        print(item)
        q.task_done()  # marks the item as processed so that q.join() can return
        time.sleep(2)
    # will never be printed, since this infinite loop runs in a daemon thread
    print("All work is done by consumer!")

if __name__ == '__main__':
    _q = Queue()  # "connects" the threads
    _e = Event()  # used to stop the producers from the main thread
    # create threads
    producer_th1 = Thread(target=producer_one, args=(_q, _e, ))
    producer_th2 = Thread(target=producer_two, args=(_q, _e, ))
    # daemon means the thread does not keep the program alive: it is stopped when the program exits
    consumer_th = Thread(target=consumer, args=(_q, ), daemon=True)
    try:
        # start threads
        producer_th1.start()
        producer_th2.start()
        consumer_th.start()
        time.sleep(20)
        _e.set()  # ask producers to stop
    except KeyboardInterrupt:
        _e.set()  # ask producer threads to stop
        print("Asked Producer Threads to stop")
    finally:
        producer_th1.join()  # main thread blocks until producer_th1 has stopped
        producer_th2.join()  # main thread blocks until producer_th2 has stopped
        _q.join()  # now wait for the consumer to finish all tasks in the queue
        print("Queue is empty and program will be finished soon")
        time.sleep(2)  # wait 2 seconds to show that the consumer stops with the main thread
        print("All done!")
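A sketch aimed at the question's actual goal of seeing both producers' output in one consumer: it keeps a single shared queue but tags each value with its source, and uses finite loops plus a per-producer end marker so the program can terminate. The names and step values mirror the question; the counts are arbitrary. Note also that in the question's code ac() puts data rather than x, so only one series can ever reach the consumer, and Queue is never imported.

```python
import threading
from queue import Queue

_DONE = object()  # end-of-stream marker (threads share memory, so identity works)


def producer(name, step, count, q):
    value = 0
    for _ in range(count):
        value += step
        q.put((name, value))  # tag every value with its source
    q.put((name, _DONE))


def consumer(q, n_producers, received):
    finished = 0
    while finished < n_producers:
        name, value = q.get()
        if value is _DONE:
            finished += 1
            continue
        received.setdefault(name, []).append(value)


q = Queue()
received = {}
threads = [
    threading.Thread(target=producer, args=("data", 2, 3, q)),
    threading.Thread(target=producer, args=("x", 898934567, 2, q)),
    threading.Thread(target=consumer, args=(q, 2, received)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Values from the two producers may interleave in any order, but each producer's own values arrive in the order it put them, because the queue is FIFO.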

Getting returning value from multithreading in python 3

I'm trying to get one or several return values from a thread in a multithreaded program. The code below gets stuck in a loop, with no way to interrupt it with Ctrl-C or Ctrl-D.
import queue as Queue
import threading

class myThread (threading.Thread):
    def __init__(self, threadID, name, region):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.region = region

    def run(self):
        GetSales(self.region)

def GetSales(strReg):
    print("Thread-" + strReg)
    return "Returning-" + strReg

def Main():
    RegionList = []
    RegionList.append("EMEA")
    RegionList.append("AP")
    RegionList.append("AM")
    # Create threads
    threads = []
    x = 0
    for region in RegionList:
        x += 1
        rthread = myThread(x, "Thread-" + region, region)  # Create new thread
        rthread.start()  # Start new thread
        threads.append(rthread)  # Add new thread to threads list
    que = Queue.Queue()
    # Wait for all threads to complete
    for t in threads:
        t.join()
        result = que.get()
        print(t.name + " -> Done")

Main()
If I comment line "result = que.get()" the program runs with no issues.
What you are looking for is futures and asynchronous task management.
Firstly, your program loops indefinitely because of the line que.get(): nothing is ever put into the queue, so get() waits for something to arrive, which never happens. You don't actually use the queue anywhere else.
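If you want to stay with threads, a minimal fix is to give each thread the queue and have it put its return value there, so the main thread has something to get(). This is a sketch: collect_sales and worker are hypothetical names, and get_sales mirrors GetSales() from the question.

```python
import queue
import threading


def get_sales(region):
    # same return value as GetSales() in the question
    return "Returning-" + region


def worker(region, results):
    # hand the return value to the main thread through the shared queue
    results.put((region, get_sales(region)))


def collect_sales(regions):
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(region, results), name="Thread-" + region)
        for region in regions
    ]
    for t in threads:
        t.start()
    collected = {}
    for _ in threads:
        # exactly one item per thread, so this get() cannot block forever
        region, value = results.get()
        collected[region] = value
    for t in threads:
        t.join()
    return collected


collected = collect_sales(["EMEA", "AP", "AM"])
```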
What you want is to run an async task and get its result:
import asyncio

async def yourExpensiveTask():
    # some long calculation
    return 42

async def main():
    tasks = []
    tasks += [asyncio.create_task(yourExpensiveTask())]
    tasks += [asyncio.create_task(yourExpensiveTask())]
    for task in tasks:
        result = await task
        print(result)

asyncio.run(main())
See also https://docs.python.org/3/library/asyncio-task.html
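Another option for plain (non-async) functions running in threads is concurrent.futures: submit() returns a Future, and its result() method returns the function's return value or re-raises its exception. A minimal sketch reusing the question's regions, with get_sales standing in for GetSales():

```python
from concurrent.futures import ThreadPoolExecutor


def get_sales(region):
    # same return value as GetSales() in the question
    return "Returning-" + region


regions = ["EMEA", "AP", "AM"]
with ThreadPoolExecutor(max_workers=len(regions)) as pool:
    futures = {region: pool.submit(get_sales, region) for region in regions}
    # result() blocks until that particular call has finished
    results = {region: future.result() for region, future in futures.items()}
```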

How to stop a specific Thread among others?

I'm using threads for a project which look like this :
thread1 = Thread(target=function, args=('x','y',1,2))
thread2 = Thread(target=function, args=('a','b',1,2))
thread1.start()
thread2.start()
Everything is working, but I wanted to add an option to my code. To kill my threads I'm currently using while X == True in my target function, so when I want to kill a thread I have to set X to False.
The issue is that doing so kills all the threads that use this function.
So how can I kill only thread1 without doing the same for thread2, if both are running together and using the same target function?
Thank you!
Below is a simplified example of what I'm actually doing:
import time
from time import sleep
from threading import Thread

def test_thread(freq):
    starttime = time.time()
    while RUN == True:
        try:
            if 1 == 1:
                print('1')
            sleep(freq - ((time.time() - starttime) % freq))
        except Exception as Ex:
            print(Ex)
            pass

RUN = True
run_test = Thread(target=test_thread, args=(20,))  # args must be a tuple: (20,) not (20)
run_test.start()
You could pass a different, mutable object as an argument to each of the two threads:
class Stopper:
    def __init__(self):
        self.flag = True

    def ok_to_keep_going(self):
        return self.flag

    def stop_now(self):
        self.flag = False

def test_thread(freq, stopper):
    ...
    while stopper.ok_to_keep_going():
        ...

if __name__ == '__main__':
    t1_stopper = Stopper()
    t2_stopper = Stopper()
    t1 = Thread(target=test_thread, args=(T1_FREQ, t1_stopper))
    t2 = Thread(target=test_thread, args=(T2_FREQ, t2_stopper))
    t1.start()
    t2.start()
Now you can stop thread 1 by calling t1_stopper.stop_now(), or stop thread 2 by calling t2_stopper.stop_now().
Or, for fewer lines of code:
def test_thread(freq, stopper):
    ...
    while stopper[0]:
        ...

if __name__ == '__main__':
    t1_stopper = [True]
    t2_stopper = [True]
    t1 = Thread(target=test_thread, args=(T1_FREQ, t1_stopper))
    t2 = Thread(target=test_thread, args=(T2_FREQ, t2_stopper))
    t1.start()
    t2.start()
Now you stop thread t1 by setting t1_stopper[0]=False.

using time.sleep() in Thread python3

I'm trying to make a simple threaded program in Python 3 where test1 runs until it reaches a certain number and then sleeps, while test2 keeps running and also goes to sleep when it reaches a certain number.
My code goes like this:
import threading
import time

def test2(count):
    if count == 8:
        print("sleep for 4 sec")
        time.sleep(3.0)
    print("test2 thread = {}".format(count))

def test1(count):
    if count == 5:
        print("sleep for 5 sec")
        time.sleep(3.0)
    print("test1 thread = {}".format(count))

for num in range(0,10):
    t1 = threading.Thread(target=test1, args=(num,))
    t2 = threading.Thread(target=test2, args=(num,))
    t1.start()
    t2.start()
Also, I've been coding in Python before, but without threads, and now I want to have a go at it. Hope this ends well :)
Oh, and it doesn't matter if the output overlaps.
threading.Thread() creates a new thread object and t1.start() just starts it.
This code:
for num in range(0,10):
    t1 = threading.Thread(target=test1, args=(num,))
    t2 = threading.Thread(target=test2, args=(num,))
    t1.start()
    t2.start()
actually creates and starts 2 new threads per iteration. In the end you have 20 threads plus the main thread.
Also, when you start a thread you should either wait until it ends (join it) or run it as a daemon thread. With a daemon thread you are saying: I don't care what you do or when you end.
Basic thread usage can look like this:
import threading
def do_stuff():
    print("Stuff on thread {}".format(threading.get_ident()))
print("Main thread {}".format(threading.get_ident()))
t = threading.Thread(target=do_stuff) # Specify what should be running in new thread
t.start() # Dispatch thread
t.join() # Wait until the thread is done
Note: threading.get_ident() returns the unique identifier of the thread in which it is called.
Now, based on your example, if you want to start 2 independent threads you can do this:
import threading
import time

def test2():
    for count in range(0, 10):
        if count == 8:
            print("test2: sleep for 4 sec")
            time.sleep(3.0)
        print("test2: thread = {}".format(count))

def test1():
    for count in range(0, 10):
        if count == 5:
            print("test 1: sleep for 5 sec")
            time.sleep(3.0)
        print("test1: thread = {}".format(count))

t1 = threading.Thread(target=test1)
t2 = threading.Thread(target=test2)
t1.start()
t2.start()
t1.join()
t2.join()
But you might want to synchronize those threads and send them some item at the "same" time.
import threading
# Create threads
t1 = threading.Thread(target=test1)
t2 = threading.Thread(target=test2)
# Run threads
t1.start()
t2.start()
# Go through some list or whatever
for num in range(0,10):
    # send num to t1
    # send num to t2
    # wait for t1 and t2
    pass
# Wait until threads are finished with their jobs
t1.join()
t2.join()
To send a value to another thread we can use queue.Queue. One thread can safely put a value into it, and another thread can read it or wait until something arrives (multiple threads can also write to it and multiple threads can read from it).
import threading
import time
import queue

def test2(q):
    while True:
        count = q.get()  # Get data from the q2 queue
        if count == 8:
            print("test2: sleep for 4 sec")
            time.sleep(3.0)
        print("test2: thread = {}".format(count))

def test1(q):
    while True:
        count = q.get()  # Get data from the q1 queue
        if count == 5:
            print("test 1: sleep for 5 sec")
            time.sleep(3.0)
        print("test1: thread = {}".format(count))

# Create queues
q1 = queue.Queue()
q2 = queue.Queue()
# Create threads
t1 = threading.Thread(target=test1, args=(q1, ))
t2 = threading.Thread(target=test2, args=(q2, ))
# Run threads
t1.start()
t2.start()
# Go through some list or whatever
for num in range(0, 10):
    # send num to t1
    q1.put(num)
    # send num to t2
    q2.put(num)
    # wait for t1 and t2
    # ???
# Wait until threads are finished with their jobs
t1.join()
t2.join()
Oh wait... how can we know that the threads are done with their work, so that we can send another value? We could use a Queue again: create a new pair of queues, send e.g. True at the end of each test? function, and read from those queues in the main loop. But for sending state information, threading.Event is a better fit.
import threading
import time
import queue

def test2(q, e):
    while True:
        count = q.get()  # Get data from the q2 queue
        if count == 8:
            print("test2: sleep for 4 sec")
            time.sleep(3.0)
        print("test2: thread = {}".format(count))
        e.set()  # Inform master the processing of given value is done

def test1(q, e):
    while True:
        count = q.get()  # Get data from the q1 queue
        if count == 5:
            print("test 1: sleep for 5 sec")
            time.sleep(3.0)
        print("test1: thread = {}".format(count))
        e.set()  # Inform master the processing of given value is done

# Create queues
q1 = queue.Queue()
q2 = queue.Queue()
# Create events
e1 = threading.Event()
e2 = threading.Event()
# Create threads
t1 = threading.Thread(target=test1, args=(q1, e1))
t2 = threading.Thread(target=test2, args=(q2, e2))
# Run threads
t1.start()
t2.start()
# Go through some list or whatever
for num in range(0, 10):
    # send num to t1
    q1.put(num)
    # send num to t2
    q2.put(num)
    # wait for t1, then reset the event so the next wait() blocks again
    e1.wait()
    e1.clear()
    # wait for t2, then reset the event
    e2.wait()
    e2.clear()
# Wait until threads are finished with their jobs
t1.join()
t2.join()
Now we are almost there, but the script never ends. That's because the test? functions (threads) wait in an infinite loop for data from the queues q1/q2. We need some way to tell them "OK, that's all, folks". For that, we can agree that a None value in a queue means the end. The result:
import threading
import time
import queue

def test2(q, e):
    while True:
        count = q.get()  # Get data from the q2 queue
        if count is None:  # Exit on None value
            return
        if count == 8:
            print("test2: sleep for 4 sec")
            time.sleep(3.0)
        print("test2: thread = {}".format(count))
        e.set()  # Inform master the processing of given value is done

def test1(q, e):
    while True:
        count = q.get()  # Get data from the q1 queue
        if count is None:  # Exit on None value
            return
        if count == 5:
            print("test 1: sleep for 5 sec")
            time.sleep(3.0)
        print("test1: thread = {}".format(count))
        e.set()  # Inform master the processing of given value is done

# Create queues
q1 = queue.Queue()
q2 = queue.Queue()
# Create events
e1 = threading.Event()
e2 = threading.Event()
# Create threads
t1 = threading.Thread(target=test1, args=(q1, e1))
t2 = threading.Thread(target=test2, args=(q2, e2))
# Run threads
t1.start()
t2.start()
# Go through some list or whatever
for num in range(0, 10):
    # send num to t1
    q1.put(num)
    # send num to t2
    q2.put(num)
    # wait for t1, then reset the event so the next wait() blocks again
    e1.wait()
    e1.clear()
    # wait for t2, then reset the event
    e2.wait()
    e2.clear()
# Inform threads to exit
q1.put(None)
q2.put(None)
# Wait until threads are finished with their jobs
t1.join()
t2.join()
Note: instead of passing parameters to the threads' "main" functions you could use global variables, because global variables and class attributes are shared across all threads. But that is usually bad practice.
Be aware of the gotchas that come with threading; for example, exception handling is not so easy. Imagine that test1 raises an exception before calling e.set(): the main thread then waits forever on e1.wait().
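One way to avoid that hang, sketched with a simulated failure (the ValueError raised for count 3 is hypothetical): catch the exception inside the worker, store it for the main thread, and call set() in a finally block so the event is signalled whether or not processing succeeded.

```python
import queue
import threading


def worker(q, done, errors):
    while True:
        count = q.get()
        if count is None:  # exit on None, as in the example above
            return
        try:
            if count == 3:
                raise ValueError("boom on %d" % count)  # simulated failure
        except Exception as exc:
            errors.append(exc)  # record the failure for the main thread
        finally:
            done.set()  # always signal, so the main thread never waits forever


q = queue.Queue()
done = threading.Event()
errors = []
t = threading.Thread(target=worker, args=(q, done, errors))
t.start()
for num in range(5):
    q.put(num)
    done.wait()
    done.clear()
q.put(None)
t.join()
```

After the loop, the main thread can inspect errors and decide whether to re-raise or report them.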
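Also, CPython (the most common implementation of Python) has something called the GIL (Global Interpreter Lock), which allows only one thread at a time to execute Python bytecode while the others wait. Blocking I/O and time.sleep() release the GIL, which is why threads still help for I/O-bound work.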
Threading documentation
Queue documentation

Mutex with queue ordered threads

I have a list of threads. The following code releases the mutex lock at the end of the block using the 'with' statement. This is very useful, as it allows the user to cycle through the threads and choose to stop each one or keep it running.
import threading

#subclass with state
class Mythread(threading.Thread):
    def __init__(self,myId, astr, mutex):
        self.myId = myId
        self.astr = astr
        self.mutex = mutex
        threading.Thread.__init__(self)

    def run(self):
        while True:
            with self.mutex:
                print('[%s] => %s' % (self.myId, self.astr))
                ans=raw_input("Enter s to stop thread...")
                if ans == 's':
                    break

stdoutmutex = threading.Lock()
threads = []
for i,j in zip(range(7),['A', 'B', 'C','D','E','F','G']):
    thread = Mythread(i,j,stdoutmutex)
    thread.start()
    threads.append(thread)
for thread in threads:
    thread.join()
To ensure the threads are cycled through in the order in which they appear in the 'threads' list, I've used the Queue module to control the order of the thread locks:
thread = q.get()
with thread.mutex:
The modified script:
import threading, Queue

#subclass with state
class Mythread(threading.Thread):
    def __init__(self,myId, astr, mutex):
        self.myId = myId
        self.astr = astr
        self.mutex = mutex
        threading.Thread.__init__(self)

    def run(self):
        while True:
            thread = q.get()
            with thread.mutex:
                print('[%s] => %s' % (self.myId, self.astr))
                ans=raw_input("Enter s to stop thread...")
                if ans == 's':
                    q.task_done()
                    break
                else:
                    q.put(thread)

stdoutmutex = threading.Lock()
threads = []
q = Queue.Queue()
for i,j in zip(range(7),['A', 'B', 'C','D','E','F','G']):
    thread = Mythread(i,j,stdoutmutex)
    threads.append(thread)
for thread in threads:
    q.put(thread)
    thread.start()
for thread in threads:
    thread.join()
This appears to work, as the correct thread order A, B, C... is sent to standard output. However, can it be verified that the Queue is doing this and that it isn't just a coincidence?
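One way to make the order explicit and checkable, sketched in Python 3 with a different technique than the question's queue: a threading.Condition with a shared turn counter, where each thread waits until the counter equals its own index. The interactive raw_input() prompt is replaced by a fixed number of rounds, an assumption made so the example can run unattended; the recorded log is what lets you verify the order instead of trusting the console output.

```python
import threading


class TurnTaker:
    def __init__(self, n_threads):
        self.n = n_threads
        self.turn = 0  # index of the thread allowed to run next
        self.cond = threading.Condition()
        self.log = []

    def run(self, my_id, label, rounds):
        for _ in range(rounds):
            with self.cond:
                # block until it is explicitly this thread's turn
                self.cond.wait_for(lambda: self.turn == my_id)
                self.log.append(label)
                self.turn = (self.turn + 1) % self.n
                self.cond.notify_all()  # wake the others so the next one can proceed


labels = ["A", "B", "C", "D", "E", "F", "G"]
tt = TurnTaker(len(labels))
threads = [threading.Thread(target=tt.run, args=(i, s, 2)) for i, s in enumerate(labels)]
for t in reversed(threads):  # start in reverse: the order must not depend on start order
    t.start()
for t in threads:
    t.join()
```

Starting the threads in reverse is deliberate: if the log still reads A through G twice, the ordering comes from the turn counter rather than from the order in which the threads happened to start.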
