python multiprocessing.Queue not putting through all the values - python-3.x

I have some lists of multiprocessing.Queues to communicate between two processes. I want to send None as the last value on each of the Queues to signal the end of the data stream to the second process, but this does not always work (I get the None on some of the Queues but not on all of them) unless I add at least one print() after one of the put() calls.
Clarification: It sometimes works without the print, but not always. With the print statements in place, it has worked 100% of the time so far.
I have also tried setting block=True for the put() method, but this does not seem to make any difference.
I found this workaround while trying to debug the problem, to find out whether the issue is in putting the values on the Queue or in getting them, but whenever I put a print() on the put() side, the code always works.
EDIT:
A simplified but complete version that partly reproduces the problem: I have identified two potentially problematic parts, marked in the code as CODEBLOCK1 and CODEBLOCK2. If I comment out either one of these, the code works as expected.
minimal_example.py:
import multiprocessing, processes
def MainProcess():
    multiprocessing.set_start_method("spawn")
    metricsQueue = multiprocessing.Queue() # Virtually infinite size
    # Define and start the parallel processes
    process1 = multiprocessing.Process(target=processes.Process1,
                                       args=(metricsQueue,))
    process2 = multiprocessing.Process(target=processes.Process2,
                                       args=(metricsQueue,))
    process1.start()
    process2.start()
    process1.join()
    process2.join()
# Script entry point
if __name__ == '__main__':
    MainProcess()
processes.py:
import random, queue
def Process1(metricsQueue):
    print("Start of process 1")
    # Cancel join for the queues, so that upon killing this process, the main process does not block on join if there
    # are still elements on the queues -> We don't mind losing data if the process is killed.
    # Start of CODEBLOCK1
    metricsQueue.cancel_join_thread()
    # End of CODEBLOCK1
    longData = random.sample(range(10205, 26512), 992)
    # Start of CODEBLOCK2
    # Put a big number of data in the queue
    for data in longData:
        try:
            metricsQueue.put(data, block=False)
        except queue.Full:
            print("Error")
    # End of CODEBLOCK2
    # Once finished, push a None through all queues to mark the end of the process
    try:
        metricsQueue.put(None, block=False)
        print("put None in metricsQueue")
    except queue.Full:
        print("Error")
    print("End of process 1")
def Process2(metricsQueue):
    print("Start of process 2")
    newMetricsPoint = 0
    recoveredMetrics = []
    while (newMetricsPoint is not None):
        # Metrics point
        try:
            newMetricsPoint = metricsQueue.get(block=False)
        except queue.Empty:
            pass
        else:
            if (newMetricsPoint is not None):
                recoveredMetrics.append(newMetricsPoint)
                print(f"got {len(recoveredMetrics)} points so far")
            else:
                print("get None from metricsQueue")
    print("End of process 2")
This code gives a result like the following, and the second process never ends because it is stuck in the while loop:
Start of process 1
Start of process 2
put None in metricsQueue 0
End of process 1
If I comment out either CODEBLOCK1 or CODEBLOCK2, the code works as expected:
Start of process 1
Start of process 2
put None in metricsQueue 0
End of process 1
get None from metricsQueue 0
End of process 2

We don't mind losing data if the process is killed.
This assumption is not correct. The closing signal None is part of the data; losing it prevents the sibling process from shutting down.
If the processes rely on a shutdown signal, do not .cancel_join_thread() for the queues used to send this signal.

Never mind, I found the problem.
It turns out I misinterpreted what queue.cancel_join_thread() does.
It lets process 1 exit as soon as it is done sending, without waiting for the data still buffered in the queue to be flushed to the underlying pipe. Anything not yet flushed at that point is lost and never reaches my second process.
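For illustration, here is a minimal sketch of the same producer/consumer pair without cancel_join_thread(), so the None sentinel is flushed before the producer exits. The names producer, consumer, run_pipeline and result_queue are hypothetical and not part of the original code, and the consumer uses a blocking get() instead of polling with block=False.

```python
import multiprocessing


def producer(metrics_queue, values):
    # Hypothetical stand-in for Process1. cancel_join_thread() is NOT called,
    # so the queue's feeder thread flushes every item, including the sentinel,
    # before this process exits.
    for value in values:
        metrics_queue.put(value)
    metrics_queue.put(None)  # sentinel marking the end of the stream


def consumer(metrics_queue, result_queue):
    # Hypothetical stand-in for Process2. A blocking get() waits for data
    # instead of spinning on queue.Empty.
    received = []
    while True:
        item = metrics_queue.get()
        if item is None:
            break
        received.append(item)
    result_queue.put(len(received))


def run_pipeline(values):
    metrics_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=producer, args=(metrics_queue, values))
    p2 = multiprocessing.Process(target=consumer, args=(metrics_queue, result_queue))
    p1.start()
    p2.start()
    count = result_queue.get()  # read the result before joining
    p1.join()
    p2.join()
    return count


if __name__ == "__main__":
    print(run_pipeline(list(range(992))))
```

If losing data on a forced kill really is acceptable, one option is to keep cancel_join_thread() only on queues that never carry the shutdown signal.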

Related

Python beginner having some issues with thread here

I am making progress on my word puzzle project. It's almost finished, but I'm having trouble with the thread that counts down and shuts the program down when time is up.
Here is a part of my code
def lose():
    print("\n\nBtzzzzz!!! Times up!")
    print(f.renderText('Game Over'))
    quit()
# A thread that executes lose when time's up
t = Timer(3.0, lose)
t.start()
print("\nGuess a word that contains all of the given letters.")
print("The letters are: ", end="")
count = 1
for question in Question_list:
    print(question, end="")
    if count < level:
        print(" - ", end="")
    count += 1
print()
print(correct_ans)
while True:
    try:
        answer = input("Your answer: ")
    except ValueError:
        quit()
    if len(answer) > level + 2 and level != 5:
        print(f"The range of letters for the word in this level is between {str(level)} and {str(level + 2)}.")
    if answer == correct_ans:
        answer_list.clear()
        t.cancel()
        if level != 5:
            print("Good job! Here comes the next one.")
            time.sleep(1)
        break
    else:
        print("Wrong answer.Please try again.")
This is the output.
It seems that it still executes the while loop once more before it finally ends.
So, first off, use sys.exit(), not quit() in actual scripts (quit() is added by the site module and isn't guaranteed to exist; even when it exists, it's overridden by some interpreter wrappers to do different things; quit() is intended solely for interactive use).
Secondly, quit() (and sys.exit()) ends the thread it is in (assuming nothing catches the SystemExit exception it's implemented in terms of), it doesn't end the program; the thread terminating has no effect on the main thread, which continues running, so the program doesn't actually die.
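A small sketch of that behaviour, using a shortened 0.1 second timer as an assumption for the demo: sys.exit() inside the Timer callback ends only the timer thread, and the main thread carries on.

```python
import sys
import threading


def lose():
    print("Time's up!")
    sys.exit()  # raises SystemExit in the timer thread only


t = threading.Timer(0.1, lose)
t.start()
t.join()  # wait until the timer thread has run lose() and ended

# Execution reaches this point: the main thread was not terminated.
main_thread_survived = True
print("Main thread is still running:", main_thread_survived)
```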
Possible solutions involve:
1. Polling t.is_alive() in your loop, so your loop stops running when the Timer stops
2. Moving your loop to a daemon thread, and having the main thread sleep for X seconds then exit (with the daemon thread terminating on its own when all non-daemon threads have exited)
3. (I recommend against it) Replace quit() with os._exit(), which forcibly terminates the program (this can do bad things if other threads were relying on with or try/finally blocks, or atexit or whatever to do proper cleanup)
4. Avoid threads entirely and just record time.monotonic() when you enter the loop, and check if you've exceeded the time limit as needed.
5. Have the thread explicitly call sys.stdin.close() before dying (not 100% sure this will work, but it should interrupt input calls by taking away the source of input; you'll probably need to handle the exception this causes in the main thread)
There are other options, but they all boil down to either sharing information between threads, implicitly (like #1 or #5) or explicitly (with an Event variable or the like), forcibly dying when the time is up (#2 or #3), or avoiding threads entirely (#4).
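As a rough sketch of the thread-free option (#4), the loop can record a deadline with time.monotonic() and check it after every answer. The function name ask_until_deadline and its read_answer parameter are hypothetical, introduced only so the sketch can run without a real keyboard.

```python
import time


def ask_until_deadline(correct_answer, time_limit, read_answer=input):
    """Return True if the correct answer arrives before the deadline."""
    deadline = time.monotonic() + time_limit
    while True:
        answer = read_answer("Your answer: ")
        # The deadline is checked only after the player submits something,
        # because reading the answer blocks until a line is entered.
        if time.monotonic() > deadline:
            print("Time's up!")
            return False
        if answer == correct_answer:
            return True
        print("Wrong answer. Please try again.")
```

The trade-off is that a player who never presses Enter is never interrupted; the timeout is noticed only on the next submission.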

What is the logic behind this function and its output? - Queue

import queue
import threading
import time
q = queue.Queue()
for i in [3, 2, 1]:
    def f():
        time.sleep(i)
        print(i)
        q.put(i)
    threading.Thread(target=f).start()
print(q.get())
This piece of code prints 1. I assume that is because the queue is FIFO and 1 is put first, since its thread slept for the shortest time.
Extended question:
If I then call q.get() twice more, it still outputs the same value 1 rather than 2 and 3. Can anyone tell me why that is? Does it have anything to do with threading?
Another extended question:
When the code finishes running completely, but there are still threads that haven't finished, will they get shut down immediately as the whole program finishes?
q.get()
#this gives me 1, but I suppose it should give me 2
q.get()
#this gives me 1, but I suppose it should give me 3
Update:
It is Python 3 code.
Assuming that the language is Python3.
The second and third calls to q.get() return 1 because each of the three threads puts a 1 into the queue. There is never a 2 or a 3 in the queue.
I'm not a Python expert, so I don't fully understand what to expect in this case, but the function f does not appear to capture the value of the loop variable i. The i in the function f appears to be the same variable as the i in the loop, and the loop leaves i == 1 before any of the three threads wakes up from sleeping. So, in all three threads, i == 1 by the time print(i) and q.put(i) are called.
When the code finishes running completely, but there are still threads that haven't finished, will they get shut down immediately?
No. The process won't exit until all of its threads (including the main thread) have terminated. If you want to create a thread that will be automatically, forcibly, abruptly terminated when all of the "normal" threads are finished, then you can make that thread a daemon thread.
See https://docs.python.org/3/library/threading.html, and search for "daemon".
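One common way to give each thread its own value is to bind the loop variable as a default argument, which is evaluated when the function is defined. The sketch below assumes shortened sleeps (tenths of a second) and an added join step so it finishes quickly; neither is in the original code.

```python
import queue
import threading
import time

q = queue.Queue()
threads = []
for i in [3, 2, 1]:
    def f(n=i):  # the default argument captures the current value of i
        time.sleep(n * 0.1)
        q.put(n)
    t = threading.Thread(target=f)
    t.start()
    threads.append(t)

for t in threads:
    t.join()

# Each thread put its own value, so the queue now holds 1, 2 and 3.
results = [q.get() for _ in range(3)]
print(results)
```

Passing the value through the Thread constructor, as in threading.Thread(target=f, args=(i,)) with f taking one parameter, has the same effect.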

How to check if all opened threading processes are finished?

I wanted to implement some threading in my code, and at first it seemed to be working fine. After checking my results, I noticed that the code does not seem to wait for the threads to finish; instead, as soon as they start, it continues with the rest of the code.
def start_local_process(pair):
    try:
        name = 'name'
        some_other_function(name)
    except:
        print("Failed")
print("Starting a total of %d threading processes." %len(some_list))
for element in some_list:
    t= Thread(target=start_local_process, args=(pair,))
    t.start()
print("Closed all threading processes for " + element + "!")
I can see that it does start a thread for each element in some_list, which is exactly what I want: parallel execution for each element. But I get the last output message immediately after starting them. I would prefer that it wait for them to finish and only then print a message saying they are finished. Is there a way to do that?
UPDATE:
So, here is a link where part of the solution was given. The method that tells whether a thread is still active is .isAlive() (spelled .is_alive() in current Python; the isAlive() alias was removed in Python 3.9).
With this method I can tell whether a thread is still active, but what would be a neat way of rechecking until all of the threads have finished?
Supposing you're saving your threads to a list, you can do the following to check whether all your threads have finished their work:
finished = all(not thread.is_alive() for thread in thread_list)
while not finished:
    finished = all(not thread.is_alive() for thread in thread_list)
print('All task finished...')
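An alternative to polling is_alive() in a loop is to call join() on each thread, which blocks until that thread has finished without spinning the CPU. In this sketch the worker body, the done list and the sample some_list values are placeholders invented for the demo.

```python
import threading
import time


def start_local_process(element, done):
    # Placeholder for the real work done per element
    time.sleep(0.05)
    done.append(element)


some_list = ["a", "b", "c"]
done = []

threads = [
    threading.Thread(target=start_local_process, args=(element, done))
    for element in some_list
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # blocks until this particular thread has finished

print("All threads finished:", sorted(done))
```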

Process finishes but cannot be joined?

To accelerate a certain task, I'm subclassing Process to create a worker that will process data coming in samples. Some managing class will feed it data and read the outputs (using two Queue instances). For asynchronous operation I'm using put_nowait and get_nowait. At the end I'm sending a special exit code to my process, upon which it breaks its internal loop. However... it never happens. Here's a minimal reproducible example:
import multiprocessing as mp
class Worker(mp.Process):
    def __init__(self, in_queue, out_queue):
        super(Worker, self).__init__()
        self.input_queue = in_queue
        self.output_queue = out_queue
    def run(self):
        while True:
            received = self.input_queue.get(block=True)
            if received is None:
                break
            self.output_queue.put_nowait(received)
        print("\tWORKER DEAD")
class Processor():
    def __init__(self):
        # prepare
        in_queue = mp.Queue()
        out_queue = mp.Queue()
        worker = Worker(in_queue, out_queue)
        # get to work
        worker.start()
        in_queue.put_nowait(list(range(10**5))) # XXX
        # clean up
        print("NOTIFYING")
        in_queue.put_nowait(None)
        #out_queue.get() # XXX
        print("JOINING")
        worker.join()
Processor()
This code never completes, hanging permanently like this:
NOTIFYING
JOINING
WORKER DEAD
Why?
I've marked two lines with XXX. In the first one, if I send less data (say, 10**4), everything will finish normally (processes join as expected). Similarly in the second, if I get() after notifying the workers to finish. I know I'm missing something but nothing in the documentation seems relevant.
The documentation mentions that
When an object is put on a queue, the object is pickled and a background thread later flushes the pickled data to an underlying pipe. This has some consequences [...] After putting an object on an empty queue there may be an infinitesimal delay before the queue’s empty() method returns False and get_nowait() can return without raising queue.Empty.
https://docs.python.org/3.7/library/multiprocessing.html#pipes-and-queues
and additionally that
whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate.
https://docs.python.org/3.7/library/multiprocessing.html#multiprocessing-programming
This means that the behaviour you describe is probably caused by the interaction between self.output_queue.put_nowait(received) in the worker and the call to worker.join() in the Processor's __init__. The worker cannot terminate until its feeder thread has flushed the item to the underlying pipe, and with nobody reading from out_queue a large item may never be flushed completely, so the worker cannot be joined. A smaller payload (such as 10**4) presumably fits in the pipe's buffer, which would explain why that case finishes normally.
Uncommenting the out_queue.get() in the main process empties the queue, which allows joining. But because get() would block forever if the queue were already empty, using a timeout might be an option, e.g. out_queue.get(timeout=10), which raises queue.Empty when the timeout expires.
It may also be important to protect the main routine, especially on Windows (see: python multiprocessing on windows, if __name__ == "__main__").
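A minimal sketch of the "drain before join" rule from the documentation quoted above. It is not the original Worker class: echo_worker and run are hypothetical names, and having the worker forward the None sentinel so the parent knows when to stop reading is a design assumption of this sketch.

```python
import multiprocessing as mp


def echo_worker(in_queue, out_queue):
    # Forward every item; forward the None sentinel too, then stop.
    while True:
        item = in_queue.get()
        out_queue.put(item)
        if item is None:
            break


def run():
    in_queue = mp.Queue()
    out_queue = mp.Queue()
    worker = mp.Process(target=echo_worker, args=(in_queue, out_queue))
    worker.start()

    in_queue.put(list(range(10**5)))
    in_queue.put(None)

    # Drain the output queue BEFORE joining, so the worker's feeder thread
    # can flush its data and the worker process is able to exit.
    results = []
    while True:
        item = out_queue.get()
        if item is None:
            break
        results.append(item)

    worker.join()
    return results


if __name__ == "__main__":
    print(len(run()))
```

Reading until a sentinel avoids guessing a timeout value, at the cost of requiring the worker to cooperate.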

Queue/thread not affecting main process

I'm trying to utilize threading and queueing (based on a recommendation) to pause the main process.
My program basically iterates through images, opening and closing them utilizing a 3-second time-loop for each iteration.
I'm trying to use threading to interject a time.sleep(20) if a certain condition is met (x == True). The condition is being met (evident by the output of the print statement), but time.sleep(20) is not affecting the main process.
I plan to substitute time.sleep(20) with a more complex process, but for simplicity I've used it here.
import time
import subprocess
import pickle
import keyboard
import threading
from threading import Thread
import multiprocessing
import queue
with open(r'C:\Users\Moondra\Bioteck.pickle', 'rb') as file:
    bio = pickle.load(file)
q = queue.LifoQueue(0)
def keyboard_press(): # This is just receiving boolean values based on key presses
    while True:
        q.put(keyboard.is_pressed('down'))
        x = q.get()
        print(x)
        if x == True:
            time.sleep(20)
t = Thread(target = keyboard_press, args= ())
t.start()
if __name__ == "__main__":
    for i in bio[:5]:
        p = subprocess.Popen([r"C:\Program Files\IrfanView\i_view64.exe", r'C:\Users\Moondra\Bioteck_charts\{}.png'.format(i)])
        time.sleep(3)
        p.kill()
So why isn't my thread affecting my main process?
Thank you.
Update:
So it seems I have to use a flag and declare it as a global variable within my function. I would like to avoid using global, but it doesn't work unless I declare flag as global within my function.
Second, I don't know how to restart the thread.
Once the thread returns the flag as False, the thread sort of just stalls.
I tried starting the thread again with t.start(), but I received the error:
RuntimeError: threads can only be started once
Here is updated code:
def keyboard_press():
    while True:
        global flag
        q.put(keyboard.is_pressed('down'))
        x = q.get()
        print(x)
        if x == True:
            flag = False
            #print('keyboard_flag is',flag)
            return flag
if __name__ == "__main__":
    flag = True
    q = queue.LifoQueue(0)
    t = Thread(target = keyboard_press, args= ())
    t.start()
    for i in bio[:5]:
        p = subprocess.Popen([r"C:\Program Files\IrfanView\i_view64.exe", r'C:\Users\Moondra\Bioteck_charts\{}.png'.format(i)])
        time.sleep(3)
        print ('flag is',flag)
        if flag == True:
            p.kill()
        else:
            time.sleep(20)
            p.kill()
            flag = True
            #t.start() #doesn't seem to work.
why isn't my thread affecting my main process?
Because you have not written any code to be executed by the keyboard_press() thread that would affect the main process.
It looks like you're trying to create a slide show that shows one image every three seconds, and you want it to pause for an extra twenty seconds when somebody presses a key. Is that right?
So, you've got one thread (the main thread) that runs the slide show, and you've got another that polls the keyboard, but your two threads don't communicate with one another.
You put a time.sleep(20) call in your keyboard thread. But that only pauses the keyboard thread. It doesn't do anything at all to the main thread.
What you need, is for the keyboard thread to set a variable that the main thread looks at after it wakes up from its three second sleep. The main thread can look at the variable, and see if a longer sleep has been requested, and if so, sleep for twenty more seconds.
Of course, after the longer sleep, you will want the main thread to re-set the variable so that it won't always sleep for twenty seconds after the first time the keyboard is touched.
P.S.: I am not a Python expert. I know that in other programming environments (e.g., Java), you also have to worry about "memory visibility." That is, when a variable is changed by one thread, there is no guarantee of when (if ever) some other thread will see the change...
...Unless, the threads use some kind of synchronization when they access the variable.
Based on what I have read (It's on the Internet! It must be true!), Python either does not have that problem now, or it did not have that problem in the recent past. I'm not sure which.
If memory consistency actually is an issue, then you will either have to use a mutex when you access the shared variable, or else you will have to make the threads communicate through some kind of a synchronized object such as a queue.
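One way to sketch the "set a variable that the main thread looks at" idea without a global flag is a threading.Event, which is a synchronized object and can be cleared and reused, so the watcher thread never needs to be restarted. Everything here is illustrative: watcher stands in for the keyboard-polling thread, and slide_show, the image names and the short timings are assumptions made for the demo.

```python
import threading
import time

pause_requested = threading.Event()


def watcher(trigger_after):
    # Hypothetical stand-in for the keyboard-polling thread: instead of
    # reading a key, it requests a pause after a fixed delay.
    time.sleep(trigger_after)
    pause_requested.set()


def slide_show(images, show_time, pause_time):
    log = []
    for image in images:
        log.append(("show", image))
        time.sleep(show_time)
        # The main thread checks the shared Event after its normal sleep.
        if pause_requested.is_set():
            log.append(("pause", image))
            time.sleep(pause_time)
            pause_requested.clear()  # reset so the next image is not paused
    return log


if __name__ == "__main__":
    threading.Thread(target=watcher, args=(0.05,), daemon=True).start()
    print(slide_show(["a.png", "b.png", "c.png"], 0.1, 0.2))
```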
