import queue
import threading
import time

q = queue.Queue()
for i in [3, 2, 1]:
    def f():
        time.sleep(i)
        print(i)
        q.put(i)
    threading.Thread(target=f).start()
print(q.get())
This code prints 1. I assume that is because the queue is FIFO and 1 is put first, since its thread slept for the shortest time.
Extended question:
If I then call q.get() two more times, it still returns 1 both times rather than 2 and 3. Can anyone tell me why that is? Does it have anything to do with threading?
Another extended question:
When the main code has finished running but some threads have not finished yet, are they shut down immediately when the program finishes?
q.get()
# this gives me 1, but I expected 2
q.get()
# this gives me 1, but I expected 3
Update:
This is Python 3 code.
Assuming that the language is Python 3.
The second and third calls to q.get() return 1 because each of the three threads puts a 1 into the queue. There is never a 2 or a 3 in the queue.
I'm not a Python expert, but the function f does not capture the value of the loop variable i. The i inside f is the same variable as the i in the loop (a Python closure looks the name up when the code runs, not when the function is defined), and the loop leaves i == 1 before any of the three threads wakes up from sleeping. So, in all three threads, i == 1 by the time print(i) and q.put(i) are called.
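A minimal sketch of one common fix, assuming the goal is for each thread to see its own value: bind the current value of i as a default argument (passing it with args=(i,) would work as well). The helper name run_workers and the scale parameter, which only shortens the sleeps, are made up for this illustration.

```python
import queue
import threading
import time


def run_workers(delays, scale=0.05):
    """Start one thread per delay; each thread puts its own delay on the queue."""
    q = queue.Queue()
    threads = []
    for i in delays:
        # i=i evaluates i now and stores it as the default, so later
        # changes to the loop variable no longer affect this function.
        def f(i=i):
            time.sleep(i * scale)
            q.put(i)

        t = threading.Thread(target=f)
        t.start()
        threads.append(t)

    for t in threads:
        t.join()

    # One item per thread is guaranteed to be in the queue after the joins.
    return [q.get() for _ in delays]


if __name__ == "__main__":
    print(run_workers([3, 2, 1]))
```

With the value bound per thread, the queue receives 1, 2 and 3 rather than three 1s.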
When the code finishes running completely, but there are still threads that haven't finished, will they get shut down immediately?
No. The process won't exit until all of its threads (including the main thread) have terminated. If you want to create a thread that will be automatically, forcibly, abruptly terminated when all of the "normal" threads are finished, then you can make that thread a daemon thread.
See https://docs.python.org/3/library/threading.html, and search for "daemon".
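As a rough illustration of the daemon behaviour, the sketch below runs a small child script in a separate interpreter. The child starts a daemon thread that would sleep for 30 seconds, yet the child process exits as soon as its main thread is done. The names CHILD_SCRIPT and run_child are invented for this example.

```python
import subprocess
import sys
import time

# Script executed in a separate Python process. The thread is a daemon,
# so it does not keep the process alive once the main thread finishes.
CHILD_SCRIPT = """
import threading, time
t = threading.Thread(target=time.sleep, args=(30,), daemon=True)
t.start()
print("main thread done")
"""


def run_child():
    """Run the child script and return (its stdout, elapsed seconds)."""
    start = time.monotonic()
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        capture_output=True,
        text=True,
        timeout=25,
    )
    return proc.stdout.strip(), time.monotonic() - start


if __name__ == "__main__":
    output, elapsed = run_child()
    print(output, f"(exited after {elapsed:.1f}s)")
```

Removing daemon=True from the child script should make the same process wait for the full 30 seconds before exiting.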
When using the multi-threaded approach to solve I/O-bound problems in Python, this works by releasing the GIL. Suppose Thread1 takes 10 seconds to read a file. During those 10 seconds it does not need the GIL, so Thread2 can execute code. Thread1 and Thread2 are effectively running in parallel, because Thread1 is blocked in system calls and can proceed independently of Thread2; however, Thread1 is still executing code.
Now, suppose we have a setup using asyncio or any other asynchronous programming code. When we do something such as:
file_content = await ten_second_long_file_read()
While we await, system calls are made to read the content of the file; when the read is done, an event is sent back and code execution can continue later. While we are awaiting, other code can be run.
My confusion comes from the fact that asynchronous programming is primarily single-threaded. With the multi-threaded approach, when T1 is reading from a file it is still executing code; it has simply released the GIL so that the work happens in parallel with another thread. With asynchronous programming, however, how does it perform other tasks while we are awaiting, as well as read data, all in a single thread? I understand the multi-threaded idea, but not the asynchronous one, because it still performs the system calls in a single thread. With asynchronous programming there is nowhere to release the GIL to, considering there is only one thread. Is asyncio secretly using threads?
The number of open file handles is independent of the GIL and of threads. The POSIX select documentation gives an idea of the separate mechanism the operating system provides around file handles.
To illustrate, I created three files: 1.txt, 2.txt and 3.txt. Each one contains just its number as a digit and as a word, for example:
1
one
Opening the files for reading like this is fine, although it would not be for writing. To make a ten-second read, I just held the file handle open for ten seconds: read the first line, wait 10 seconds, then read the second line.
asyncio version
import asyncio
from threading import active_count

do = ['1.txt', '2.txt', '3.txt']

async def ten_second_long_file_read():
    while do:
        doing = do.pop()
        with open(doing, 'r') as f:
            print(f.readline().strip())
            await asyncio.sleep(10)
            print(f"threads {active_count()}")
            print(f.readline().strip())

async def main():
    await asyncio.gather(asyncio.create_task(ten_second_long_file_read()),
                         asyncio.create_task(ten_second_long_file_read()))

asyncio.run(main())
This produces very predictable output and, as expected, only one thread.
3
2
threads 1
three
1
threads 1
two
threads 1
one
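The select-style mechanism mentioned earlier can be sketched with the standard selectors module: one thread registers several handles and asks the operating system which of them are ready, instead of blocking on each in turn. This is only an illustration of the idea, not a description of asyncio's internals. It uses socket pairs, since select reports regular disk files as always ready, and the function name wait_for_all is made up.

```python
import selectors
import socket


def wait_for_all(n=2):
    """Wait on n sockets in a single thread and collect what each one receives."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n)]

    # Register the reading end of every pair; data=idx tells us which one fired.
    for idx, (reader, _writer) in enumerate(pairs):
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=idx)

    # Simulate I/O completing: something arrives on every socket.
    for idx, (_reader, writer) in enumerate(pairs):
        writer.send(f"payload {idx}".encode())

    received = {}
    while len(received) < n:
        # select() returns only the handles that are ready to be read.
        for key, _events in sel.select(timeout=1):
            received[key.data] = key.fileobj.recv(1024).decode()

    for reader, writer in pairs:
        sel.unregister(reader)
        reader.close()
        writer.close()
    sel.close()
    return received


if __name__ == "__main__":
    print(wait_for_all())
```

No extra thread is involved: the waiting is delegated to the operating system, and the single thread only runs when something is ready.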
threading - changes
Remove async and await, of course. Swap asyncio.sleep(10) for time.sleep(10), and import time and concurrent.futures. The main change is in the calling code:
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as e:
    e.submit(ten_second_long_file_read)
    e.submit(ten_second_long_file_read)
The output is also fairly predictable; however, you cannot rely on this.
3
2
threads 3
three
threads 3
two
1
threads 2
one
When the same threaded version is run in a debugger, the output is a bit random. On one run on my computer it was:
23
threads 3threads 3
twothree
1
threads 2
one
This highlights a difference with threads: the running thread is switched pre-emptively, which creates a whole bundle of complexity under the heading of thread safety. This issue does not arise in the same way in asyncio, because there is a single thread and tasks only switch at an await.
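A small sketch of what thread safety means in practice, assuming several threads update one shared value: a threading.Lock ensures that only one thread at a time performs the read-modify-write step. The function name count_with_lock and its parameters are invented for this example.

```python
import threading


def count_with_lock(n_threads=4, increments=10_000):
    """Increment a shared counter from several threads, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            # The lock makes "read total, add 1, write total" one unit:
            # a thread switch cannot happen in the middle of it.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(count_with_lock())
```

The same idea applies to the interleaved print output: wrapping each group of print calls in a shared lock would keep the lines of one thread together.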
multi-processing
This is similar to the threaded code; however, the if __name__ == '__main__' guard is required, and each process in the pool works with its own snapshot of the context.
def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as e:
        e.submit(ten_second_long_file_read)
        e.submit(ten_second_long_file_read)

if __name__ == '__main__':  # required for executor
    main()
There are two big differences. First, there is no shared view of the do list, so everything is done twice: each process does not know what the other process has done. Second, more CPU power is available, but more work is required to manage the load.
Three processes are required for this, so the overhead is large; however, each process only has one thread.
3
3
threads 1
threads 1
three
three
2
2
threads 1
threads 1
two
two
1
1
threads 1
threads 1
one
one
I have some lists of multiprocessing.Queue objects that I use to communicate between two processes. I want to send None as the last value on each of the queues to signal the end of the data stream to the second process, but this does not always work (the None arrives on some of the queues but not on all of them) unless I add at least one print() after one of the put() calls.
Clarification: It sometimes works without the print, but not always. With the print calls in place, it has worked 100% of the time so far.
I have also tried setting block=True for the put() method, but this does not seem to make any difference.
I found this workaround while trying to debug the problem, to find out whether the problem was in putting the values on the queue or in getting them; but whenever I put a print() on the put() side, the code always works.
EDIT:
Here is a simplified but complete version that partly reproduces the problem. I have identified two potentially problematic parts, marked in the code as CODEBLOCK1 and CODEBLOCK2. If I comment out either one of these, the code works as expected.
minimal_example.py:
import multiprocessing, processes

def MainProcess():
    multiprocessing.set_start_method("spawn")
    metricsQueue = multiprocessing.Queue()  # Virtually infinite size

    # Define and start the parallel processes
    process1 = multiprocessing.Process(target=processes.Process1,
                                       args=(metricsQueue,))
    process2 = multiprocessing.Process(target=processes.Process2,
                                       args=(metricsQueue,))
    process1.start()
    process2.start()
    process1.join()
    process2.join()

# Script entry point
if __name__ == '__main__':
    MainProcess()
processes.py:
import random, queue

def Process1(metricsQueue):
    print("Start of process 1")
    # Cancel join for the queues, so that upon killing this process, the main process does not block on join if there
    # are still elements on the queues -> We don't mind losing data if the process is killed.
    # Start of CODEBLOCK1
    metricsQueue.cancel_join_thread()
    # End of CODEBLOCK1
    longData = random.sample(range(10205, 26512), 992)
    # Start of CODEBLOCK2
    # Put a big number of data in the queue
    for data in longData:
        try:
            metricsQueue.put(data, block=False)
        except queue.Full:
            print("Error")
    # End of CODEBLOCK2
    # Once finished, push a None through all queues to mark the end of the process
    try:
        metricsQueue.put(None, block=False)
        print("put None in metricsQueue")
    except queue.Full:
        print("Error")
    print("End of process 1")

def Process2(metricsQueue):
    print("Start of process 2")
    newMetricsPoint = 0
    recoveredMetrics = []
    while (newMetricsPoint is not None):
        # Metrics point
        try:
            newMetricsPoint = metricsQueue.get(block=False)
        except queue.Empty:
            pass
        else:
            if (newMetricsPoint is not None):
                recoveredMetrics.append(newMetricsPoint)
                print(f"got {len(recoveredMetrics)} points so far")
            else:
                print("get None from metricsQueue")
    print("End of process 2")
This code produces something like the following, and the second process never ends because it is stuck in the while loop:
Start of process 1
Start of process 2
put None in metricsQueue 0
End of process 1
If I comment out either CODEBLOCK1 or CODEBLOCK2, the code works as expected:
Start of process 1
Start of process 2
put None in metricsQueue 0
End of process 1
get None from metricsQueue 0
End of process 2
We don't mind losing data if the process is killed.
This assumption is not correct. The closing signal None is part of the data; losing it prevents the sibling process from shutting down.
If the processes rely on a shutdown signal, do not call .cancel_join_thread() on the queues used to send this signal.
Never mind, I found the problem.
It turns out I misinterpreted what queue.cancel_join_thread() does.
It lets process 1 exit as soon as it has finished putting all the data, without waiting for the queue's background feeder thread to flush the buffered data to the underlying pipe. Any data that has not been flushed by then is lost and never arrives at my second process.
Let's say I have a function that will run in its own thread, since it is getting serial data through a port.
def serialDataIncoming():
    device = Radar()
    device.connect(port=1, baudrate=256000)
    serialdata = device.startscan
    for count, scan in enumerate(serialdata):
        distance = device.distance
        sector = device.angle
Now I want to run this in its own thread:
try:
    # pass the function itself; calling it here would run it in the current thread
    thread.start_new_thread(serialDataIncoming, ())
except:
    pass  # error handling here
Now I want to add a line to serialDataIncoming() that sends the distance and sector to another function, which processes them and then sends them somewhere else. Here is the issue: the data coming from "device" is sent continuously, so I may see delays or even lose data if I spend time on another loop inside this loop. So I want to create a new thread and, from that thread, run a function that receives data from the first thread, processes it, and does whatever else is needed.
def dataProcessing():
    # random code here where I process the data
    pass
However, my issue is: how do I send both variables from one thread to the second thread? As I picture it, the second thread would have to wait until it receives the variables and then start working. A lot of data will be sent at the same time, so I might have to introduce a third thread that holds the data and then sends it to the thread that processes it.
So the question is basically this: how would I write, in Python, sending two variables to another thread, and how would that be written in the function running on the second thread?
To pass arguments to the thread function you can do:
def thread_fn(a, b, c):
    print(a, b, c)

thread.start_new_thread(thread_fn, ("asdsd", 123, False))
The arguments must be passed as a tuple. (In Python 3 the thread module is named _thread, and threading.Thread(target=thread_fn, args=(...)) is the usual higher-level way to do the same thing.) However, in CPython only one thread executes Python bytecode at a time, so it may actually be more reliable (and simpler) to work out a way to do this with one thread. From the sounds of it you are polling the data, so this is not like file access, where the OS notifies the thread that it can wake up once the file operation has completed; hence you won't get the kind of gains you would from multithreaded file access.
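If two threads are used after all, a queue.Queue is the usual way to hand values from one to the other. The sketch below is an illustration under assumptions: the readings are a plain list standing in for the radar device, the doubling in data_processing is placeholder work, and None is used as an end-of-data marker. All function names are hypothetical.

```python
import queue
import threading


def serial_data_incoming(readings, out_q):
    """Producer: push each (distance, sector) pair onto the queue."""
    for distance, sector in readings:
        out_q.put((distance, sector))
    out_q.put(None)  # sentinel: tells the consumer there is no more data


def data_processing(in_q, results):
    """Consumer: block until a pair arrives, process it, stop at the sentinel."""
    while True:
        item = in_q.get()  # waits here without using CPU until data is available
        if item is None:
            break
        distance, sector = item
        results.append((sector, distance * 2))  # placeholder processing


def run_pipeline(readings):
    q = queue.Queue()
    results = []
    consumer = threading.Thread(target=data_processing, args=(q, results))
    producer = threading.Thread(target=serial_data_incoming, args=(readings, q))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([(1.0, 10), (2.5, 20)]))
```

The queue also acts as the buffer the question considers adding a third thread for: the producer never waits for the consumer, and items are delivered in the order they were put.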
I wanted to implement some threading in my code, and at first it seemed to be working fine. After checking my results, I noticed that the code does not wait for the threads to finish; as soon as they have started, it continues with the rest of the code.
def start_local_process(pair):
    try:
        name = 'name'
        some_other_function(name)
    except:
        print("Failed")

print("Starting a total of %d threading processes." % len(some_list))
for element in some_list:
    t = Thread(target=start_local_process, args=(pair,))
    t.start()
    print("Closed all threading processes for " + element + "!")
I can see that it starts a thread for each element in some_list, which is exactly what I want: parallel execution for each element. But I get the last output message immediately after starting them. I would prefer it to wait for them to finish and only then print a message saying they are finished. Is there a way to do this?
UPDATE:
So, here is a link where part of the solution was given. The method that tells whether a thread is still active is .isAlive() (spelled .is_alive() in current Python; the isAlive alias was removed in Python 3.9).
With this method I can find out whether a thread is still active, but what would be a neat way of rechecking the same thing until all of the threads have finished?
Supposing you are saving your threads in a list, you can do the following to check whether all of your threads have finished their work:
finished = all(not thread.is_alive() for thread in thread_list)
while not finished:
    finished = all(not thread.is_alive() for thread in thread_list)
print('All task finished...')
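An alternative sketch that avoids polling in a loop: Thread.join() blocks until the thread has finished, so joining every thread in the list waits for all of them without spending CPU time on rechecking. The helper name run_and_wait and its parameters are made up for this example.

```python
import threading
import time


def run_and_wait(items, work):
    """Start one thread per item, then block until every thread has finished."""
    threads = [threading.Thread(target=work, args=(item,)) for item in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # returns only once this particular thread is done
    # After the joins, no thread in the list is alive any more.
    return all(not t.is_alive() for t in threads)


if __name__ == "__main__":
    done = []

    def work(item):
        time.sleep(0.1)
        done.append(item)

    print(run_and_wait(["a", "b", "c"], work), sorted(done))
```

A "finished" message printed after the join loop is then guaranteed to appear only once all the work is complete.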
I got the source code from http://www.saltycrane.com/blog/2008/09/simplistic-python-thread-example/ however when I tried to modify the code to my needs the results are not what I wanted.
import time
from threading import Thread

def myfunc():
    time.sleep(2)
    print("thread working on something")

while 1:
    thread = Thread(target=myfunc())
    thread.start()
    print("looping")
and got the results of
thread working on something
looping
// wait 2 seconds
thread working on something
looping
// wait 2 seconds
thread working on something
looping
// wait 2 seconds and so on
thread working on something
looping
// wait 2 seconds
but then I have to wait 2 seconds before I do anything.
I want to be able to do anything while the thread does something else like checking things in an array and compare them.
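In the main loop, you are creating and starting a new thread an endless number of times. If each of those threads did long-running work, you would soon have a huge number of threads running. This of course is not practical and would soon crash the program.
The reason your program does not crash is that nothing long-lived runs in those threads. Note that Thread(target=myfunc()) calls myfunc() immediately in the main thread and passes its return value (None) as the target, so each thread ends at once and it is the main loop that waits 2 seconds; pass the function itself with target=myfunc. You also do not have a loop in the thread function to keep the thread alive and working.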
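A small check of where the function actually runs, depending on whether it is passed with or without parentheses. The names record_thread and compare_targets are invented for this illustration.

```python
import threading


def record_thread(log):
    """Append the name of the thread that is executing this function."""
    log.append(threading.current_thread().name)


def compare_targets():
    log = []

    # With parentheses: record_thread(log) runs right here, in the calling
    # thread, and its return value (None) becomes the target, so the new
    # thread has nothing to do.
    wrong = threading.Thread(target=record_thread(log), name="worker-wrong")
    wrong.start()
    wrong.join()

    # Without parentheses: the new thread itself calls record_thread(log).
    right = threading.Thread(target=record_thread, args=(log,), name="worker-right")
    right.start()
    right.join()

    return threading.current_thread().name, log


if __name__ == "__main__":
    caller, log = compare_targets()
    print("called from:", caller, "| function ran in:", log)
```

The first entry in the log is the calling thread's own name, and only the second call is recorded as running in a worker thread.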
Suggestion.
Add a loop to your threading function (myfunc) that will continue to run indefinitely in the background.
Initialise and call the thread function outside of the loop in your main section. In this way you will create only 1 thread that will run its own loop in the background. You could of course run a number of these same threads in the background if you called it more than once.
Now create a loop in your main body, and continue with your array checking or any other task that you want to run whilst the threading function continues to run in the background.
Something like this may help
import time
from threading import Thread

def myfunc():
    counter = 0
    while True:
        print("The thread counter is at", counter)
        counter += 1
        time.sleep(2)

# Pass the function itself (no parentheses) so that it runs in the new thread
thread = Thread(target=myfunc)
thread.start()
# The thread has now initialised and is running in the background

mCounter = 0
while True:
    print("Main loop counter =", mCounter)
    mCounter += 1
    time.sleep(5)
In this example, the thread will print a line every 2 seconds, and the main loop will print a line every 5 seconds.
Be careful to close your thread down. In some cases, a keyboard interrupt will stop the main loop, but the thread will keep on running.
I hope this helps.
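One way to make sure the background thread shuts down together with the main code, sketched under the assumption that the thread can check a flag between iterations: a threading.Event is set in a finally block, so it also fires on a keyboard interrupt, and the thread leaves its loop the next time it checks. The names counter_loop and run_for are made up for this example.

```python
import threading
import time


def counter_loop(stop_event, ticks, interval=0.01):
    """Background loop that keeps counting until it is asked to stop."""
    while not stop_event.is_set():
        ticks.append(len(ticks))
        # wait() sleeps like time.sleep(), but returns early once the event is set
        stop_event.wait(interval)


def run_for(duration=0.1):
    stop_event = threading.Event()
    ticks = []
    thread = threading.Thread(target=counter_loop, args=(stop_event, ticks))
    thread.start()
    try:
        time.sleep(duration)  # stands in for the main loop's own work
    finally:
        # Runs on normal exit and on KeyboardInterrupt alike.
        stop_event.set()
        thread.join()
    return ticks, thread.is_alive()


if __name__ == "__main__":
    ticks, alive = run_for(0.1)
    print(f"{len(ticks)} ticks, thread alive: {alive}")
```

Creating the thread with daemon=True is a simpler alternative when the thread holds nothing that needs cleaning up, at the cost of it being stopped abruptly.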