join() threads without holding the main thread in Python

I have code that spawns a new thread on each iteration of a loop, something like this:
def SubmitData(data):
    # creating the relevant command to execute
    command = CreateCommand(data)
    subprocess.call(command)

def Main():
    while True:
        # generating some data
        data = GetData()
        MyThread = threading.Thread(target=SubmitData, args=(data,))
        MyThread.start()
Obviously, I don't call join() on the threads.
My question is: how can I join() those threads without making the main thread wait for them?
Do I even need to join() them? What will happen if I don't join() them?
Some important points:
The while loop is supposed to run for a very long time (a couple of days).
The command itself does not run for very long (a few seconds).
I'm using threading for performance, so if someone has a better idea instead, I would like to try it out.

Popen() doesn't block. Unless CreateCommand() blocks, you could call SubmitData() in the main thread:
from subprocess import Popen

processes = []
while True:
    processes = [p for p in processes if p.poll() is None]  # leave only running
    processes.append(Popen(CreateCommand(GetData())))       # start a new one
Do I even need to join() them? What will happen if I don't join() them?
No. You don't need to join them. All non-daemonic threads are joined automatically when the main thread exits.
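Since the question also asks for alternatives for performance, one option (only a sketch, not the original code; the pool size of 8 is an arbitrary illustration) is concurrent.futures.ThreadPoolExecutor, which bounds the number of worker threads and joins them for you when the with-block exits:

import subprocess
from concurrent.futures import ThreadPoolExecutor

def SubmitData(data):
    # build and run the command, as in the question
    command = CreateCommand(data)
    subprocess.call(command)

def Main():
    # the executor joins its worker threads when the with-block exits
    with ThreadPoolExecutor(max_workers=8) as pool:
        while True:
            pool.submit(SubmitData, GetData())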

Related

How to use Queue for multiprocessing with Python?

This program works fine; it should output: 0 1 2 3.
from multiprocessing import Process, Queue

NTHREADS = 4

def foo(queue, id):
    queue.put(id)

if __name__ == '__main__':
    queue = Queue()
    procs = []
    for id in range(NTHREADS):
        procs.append(Process(target=foo, args=(queue, id)))
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    while not queue.empty():
        print(queue.get())
But not with this one.
I think it stalls after join().
from multiprocessing import Process, Queue
from PIL import Image

NTHREADS = 4

def foo(queue):
    img = Image.new('RGB', (200, 200), color=(255, 0, 0))
    queue.put(img)

if __name__ == '__main__':
    queue = Queue()
    procs = []
    for i in range(NTHREADS):
        procs.append(Process(target=foo, args=(queue,)))
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    while not queue.empty():
        print(queue.get().size)
Why? How can I reach the end? How can I get my image?
I'd like to work on 4 images in parallel and then merge them into one final image.
Queues are complicated beasts under the covers. When (a pickle of) an object is put on a queue, part of it is fed into the underlying OS interprocess communication mechanism, but the rest is left in an in-memory Python buffer, to avoid overwhelming the OS facilities. The stuff in the memory buffer is fed into the OS mechanism as the receiving end makes room for more by taking stuff off the queue.
A consequence is that a worker process cannot end before its memory buffers (feeding into queues) are empty.
In your first program, pickles of integers are so tiny that memory buffers don't come into play. A worker feeds the entire pickle to the OS in one gulp, and the worker can exit then.
But in your second program, the pickles are much larger. A worker sends part of the pickle to the OS, then waits for the main program to take it off the OS mechanism, so it can feed the next part of the pickle. Since your program never takes anything off the queue before calling .join(), the workers wait forever.
So, in general, this is the rule: never attempt to .join() until all queues have been drained.
Note this from the docs:
Warning: As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe. This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed.
Also, queue.empty() is a poor way to test for this. That can only tell you if data is on the queue at the instant it happens to execute. In parallel processing, that's at best a probabilistic approximation to the truth. In your second example, you know exactly how many items you expect to get from the queue, so this way would be reliable:
for proc in procs:
    proc.start()

for i in range(NTHREADS):
    print(queue.get().size)

for proc in procs:  # join AFTER queue is drained
    proc.join()
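Putting it together, a corrected version of the second program (the only change is draining the queue before joining; nothing else is new) would look like this:

from multiprocessing import Process, Queue
from PIL import Image

NTHREADS = 4

def foo(queue):
    img = Image.new('RGB', (200, 200), color=(255, 0, 0))
    queue.put(img)

if __name__ == '__main__':
    queue = Queue()
    procs = [Process(target=foo, args=(queue,)) for _ in range(NTHREADS)]
    for proc in procs:
        proc.start()
    # drain first: exactly NTHREADS items are expected
    for _ in range(NTHREADS):
        print(queue.get().size)
    # join only after the queue is empty, so each worker's feeder thread can flush
    for proc in procs:
        proc.join()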

Process finishes but cannot be joined?

To accelerate a certain task, I'm subclassing Process to create a worker that will process data coming in as samples. A managing class feeds it data and reads the outputs (using two Queue instances). For asynchronous operation I'm using put_nowait and get_nowait. At the end I send a special exit code to my process, upon which it breaks its internal loop. However... it never happens. Here's a minimal reproducible example:
import multiprocessing as mp

class Worker(mp.Process):
    def __init__(self, in_queue, out_queue):
        super(Worker, self).__init__()
        self.input_queue = in_queue
        self.output_queue = out_queue

    def run(self):
        while True:
            received = self.input_queue.get(block=True)
            if received is None:
                break
            self.output_queue.put_nowait(received)
        print("\tWORKER DEAD")

class Processor():
    def __init__(self):
        # prepare
        in_queue = mp.Queue()
        out_queue = mp.Queue()
        worker = Worker(in_queue, out_queue)
        # get to work
        worker.start()
        in_queue.put_nowait(list(range(10**5)))  # XXX
        # clean up
        print("NOTIFYING")
        in_queue.put_nowait(None)
        #out_queue.get()  # XXX
        print("JOINING")
        worker.join()

Processor()
This code never completes, hanging permanently like this:
NOTIFYING
JOINING
WORKER DEAD
Why?
I've marked two lines with XXX. For the first one: if I send less data (say, 10**4), everything finishes normally (the process joins as expected). Similarly for the second: everything finishes if I call get() after notifying the worker to stop. I know I'm missing something, but nothing in the documentation seems relevant.
The documentation mentions that
When an object is put on a queue, the object is pickled and a background thread later flushes the pickled data to an underlying pipe. This has some consequences [...] After putting an object on an empty queue there may be an infinitesimal delay before the queue’s empty() method returns False and get_nowait() can return without raising queue.Empty.
https://docs.python.org/3.7/library/multiprocessing.html#pipes-and-queues
and additionally that
whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate.
https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing-programming
This means that the behaviour you describe is probably caused by a race condition between self.output_queue.put_nowait(received) in the worker and joining the worker with worker.join() in Processor's __init__. If joining is faster than feeding the item into the queue, everything finishes fine. If it is too slow, there is still an item buffered for the queue, and the worker will not join.
Uncommenting the out_queue.get() in the main process empties the queue and allows joining. But since it is important for the call to return even if the queue happens to be empty already, using a timeout is an option for waiting out the race condition, e.g. out_queue.get(timeout=10).
It may also be important to protect the main routine with if __name__ == "__main__", especially on Windows (see python multiprocessing on windows, if __name__ == "__main__").
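For illustration, a sketch of Processor along those lines, reusing the Worker class and the mp alias from the question (the drain loop and the 10-second timeout are assumptions, not the original code), might look like this:

import queue  # only needed for the Empty exception

class Processor():
    def __init__(self):
        in_queue = mp.Queue()
        out_queue = mp.Queue()
        worker = Worker(in_queue, out_queue)
        worker.start()
        in_queue.put_nowait(list(range(10**5)))
        print("NOTIFYING")
        in_queue.put_nowait(None)
        # drain everything the worker produced so its feeder thread can flush;
        # the timeout keeps this from blocking forever once the queue is empty
        try:
            while True:
                out_queue.get(timeout=10)
        except queue.Empty:
            pass
        print("JOINING")
        worker.join()

if __name__ == "__main__":
    Processor()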

Thread synchronized time read

I have multiple threads running an infinite while True loop, without them knowing of each other's existence.
Inside their respective loops, I need them to check the time and do something based on it before the next iteration, something like this:
Thread:
while True:
    now = datetime.now()
    # do something
    time.sleep(0.2)
These threads are started in my main program like this:
Main:
t1.start()
t2.start()
t3.start()
...
...
while True:
    # main program does something
Onto the problem: I need all the running threads to receive the same time when they check for it.
I was thinking about creating a class with a lock on it and a variable to store the time; the first thread that acquires the lock saves the time in it so that the following threads can read it. But this seems like kind of a hacky way of doing things (plus I wouldn't know how to check when all the threads have read the time so that it can be updated).
What would be the best way, if possible, to implement this?
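For what it's worth, a minimal sketch of the approach described above (a shared timestamp behind a lock, refreshed by a single updater so every reader sees the same instant) might look like the following; the SharedClock name, the updater thread, and the 0.2-second interval are illustrative assumptions, not part of the question:

import threading
import time
from datetime import datetime

class SharedClock:
    def __init__(self):
        self._lock = threading.Lock()
        self._now = datetime.now()

    def refresh(self):
        # called by one updater thread, so all readers see the same instant
        with self._lock:
            self._now = datetime.now()

    def read(self):
        with self._lock:
            return self._now

clock = SharedClock()

def updater():
    while True:
        clock.refresh()
        time.sleep(0.2)

def worker():
    while True:
        now = clock.read()
        # do something based on `now`
        time.sleep(0.2)

threading.Thread(target=updater, daemon=True).start()
for _ in range(3):
    threading.Thread(target=worker, daemon=True).start()
time.sleep(2)  # let the daemon threads run briefly before the sketch exits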

Why is my thread the MainThread?

I'm creating 5 threads for handling various tasks such as reading from sensors (Raspberry Pi), TCP connections and recently recording audio (pyAudio).
I am instantiating all threads in main() identically, e.g.:
if __name__ == '__main__':
    main()

def main():
    global network_thread
    network_thread = threading.Thread(name="NET-CONN", target=network_thread_run, args=())
    network_thread.start()
I keep a global reference so I can kill the threads at shutdown with join().
Now, I have added thread #5:
global audio_thread
audio_thread = threading.Thread(name="AUDIO", target=audio_thread_run(), args=())
audio_thread.start()
...but my logging indicates it's running on the MainThread. I also double-checked inside the audio_thread_run() function and it is indeed running on MainThread:
if threading.current_thread() is threading.main_thread():
    logger.warning("Audio thread is the same as MainThread!")
Why is this thread running on the MainThread? Have I hit a limit on the Pi?
Let's have a look at the two places where you create threads, modified slightly so they'll fit on one line, and with white-space inserted so they line up:
net_thread = threading.Thread(name="NET", target=net_run , args=())
aud_thread = threading.Thread(name="AUD", target=aud_run(), args=())
# Hmmm, what's this suspicious-looking thing here? ---->^^
Enough fun :-) The problem is that you are actually calling the audio_thread_run() function directly from your main thread and presumably, if it ever returned, you would then try to use the result as a callable to start a thread.
If you actually got rid of the thread start stuff altogether, it would boil down to the much simpler:
audio_thread_run()
which will very much run that function from the context of the main thread.
What you need to do is remove the parentheses so that it matches what you've done with the network thread:
audio_thread = threading.Thread(name="AUDIO", target=audio_thread_run, args=())

Queue/thread not affecting main process

I'm trying to utilize threading and queueing (based on a recommendation) to pause the main process.
My program basically iterates through images, opening and closing them utilizing a 3-second time-loop for each iteration.
I'm trying to use threading to interject a time.sleep(20) if a certain condition is met (x == True). The condition is being met (as evidenced by the output of the print statement), but time.sleep(20) is not affecting the main process.
I plan to substitute time.sleep(20) with a more complex process, but for simplicity I've used it here.
import time
import subprocess
import pickle
import keyboard
import threading
from threading import Thread
import multiprocessing
import queue
import time

with open('C:\\Users\Moondra\\Bioteck.pickle', 'rb') as file:
    bio = pickle.load(file)

q = queue.LifoQueue(0)

def keyboard_press():  # This is just receiving boolean values based on key presses
    while True:
        q.put(keyboard.is_pressed('down'))
        x = q.get()
        print(x)
        if x == True:
            time.sleep(20)

t = Thread(target=keyboard_press, args=())
t.start()

if __name__ == "__main__":
    for i in bio[:5]:
        p = subprocess.Popen(["C:\Program Files\IrfanView\i_view64.exe", 'C:\\Users\Moondra\\Bioteck_charts\{}.png'.format(i)])
        time.sleep(3)
        p.kill()
So why isn't my thread affecting my main process?
Thank you.
Update:
So it seems I have to use a flag and make it a global variable within my function. I would like to avoid using global, but it's not working without declaring flag as global inside my function.
Second, I don't know how to restart the thread.
Once the thread returns the flag as False, the thread sort of just stalls.
I tried starting the thread again with t.start(), but I received the error:
RuntimeError: threads can only be started once
Here is the updated code:
def keyboard_press():
    while True:
        global flag
        q.put(keyboard.is_pressed('down'))
        x = q.get()
        print(x)
        if x == True:
            flag = False
            #print('keyboard_flag is', flag)
            return flag

if __name__ == "__main__":
    flag = True
    q = queue.LifoQueue(0)
    t = Thread(target=keyboard_press, args=())
    t.start()
    for i in bio[:5]:
        p = subprocess.Popen(["C:\Program Files\IrfanView\i_view64.exe", 'C:\\Users\Moondra\\Bioteck_charts\{}.png'.format(i)])
        time.sleep(3)
        print('flag is', flag)
        if flag == True:
            p.kill()
        else:
            time.sleep(20)
            p.kill()
            flag = True
            #t.start()  # doesn't seem to work.
why isn't my thread affecting my main process?
Because you have not written any code to be executed by the keyboard_press() thread that would affect the main process.
It looks like you're trying to create a slide show that shows one image every three seconds, and you want it to pause for an extra twenty seconds when somebody presses a key. Is that right?
So, you've got one thread (the main thread) that runs the slide show, and you've got another that polls the keyboard, but your two threads don't communicate with one another.
You put a time.sleep(20) call in your keyboard thread. But that only pauses the keyboard thread. It doesn't do anything at all to the main thread.
What you need, is for the keyboard thread to set a variable that the main thread looks at after it wakes up from its three second sleep. The main thread can look at the variable, and see if a longer sleep has been requested, and if so, sleep for twenty more seconds.
Of course, after the longer sleep, you will want the main thread to re-set the variable so that it won't always sleep for twenty seconds after the first time the keyboard is touched.
P.S.: I am not a Python expert. I know that in other programming environments (e.g., Java), you also have to worry about "memory visibility." That is, when a variable is changed by one thread, there is no guarantee of when (if ever) some other thread will see the change...
...Unless, the threads use some kind of synchronization when they access the variable.
Based on what I have read (It's on the Internet! It must be true!), Python either does not have that problem now, or it did not have that problem in the recent past. I'm not sure which.
If memory consistency actually is an issue, then you will either have to use a mutex when you access the shared variable, or else you will have to make the threads communicate through some kind of a synchronized object such as a queue.
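As a sketch of that last suggestion (this is my assumption about one workable shape, not the original poster's code): a threading.Event is one such synchronized object the two threads could share, with the keyboard thread setting it and the main loop checking and clearing it after its three-second sleep. It reuses bio, keyboard, subprocess, and the image paths from the question:

import threading
import time

pause_requested = threading.Event()

def keyboard_press():
    # the keyboard thread only signals; all sleeping happens in the main thread
    while True:
        if keyboard.is_pressed('down'):
            pause_requested.set()
        time.sleep(0.05)

threading.Thread(target=keyboard_press, daemon=True).start()

for i in bio[:5]:
    p = subprocess.Popen(["C:\\Program Files\\IrfanView\\i_view64.exe",
                          'C:\\Users\\Moondra\\Bioteck_charts\\{}.png'.format(i)])
    time.sleep(3)
    if pause_requested.is_set():
        time.sleep(20)            # the extra pause now happens in the main thread
        pause_requested.clear()   # reset so later images go back to 3 seconds
    p.kill()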
