Why is this queue.join call blocking indefinitely? - python-3.x

I'm playing about with a personal project in python3.6 and I've run into the following issue which results in the my_queue.join() call blocking indefinitely. Note this isn't my actual code but a minimal example demonstrating the issue.
import threading
import queue
def foo(stop_event, my_queue):
while not stop_event.is_set():
try:
item = my_queue.get(timeout=0.1)
print(item) #Actual logic goes here
except queue.Empty:
pass
print('DONE')
stop_event = threading.Event()
my_queue = queue.Queue()
thread = threading.Thread(target=foo, args=(stop_event, my_queue))
thread.start()
my_queue.put(1)
my_queue.put(2)
my_queue.put(3)
print('ALL PUT')
my_queue.join()
print('ALL PROCESSED')
stop_event.set()
print('ALL COMPLETE')
I get the following output (it's actually been consistent, but I understand that the output order may differ due to threading):
ALL PUT
1
2
3
No matter how long I wait I never see ALL PROCESSED output to the console, so why is my_queue.join() blocking indefinitely when all the items have been processed?

From the docs:
The count of unfinished tasks goes up whenever an item is added to the
queue. The count goes down whenever a consumer thread calls
task_done() to indicate that the item was retrieved and all work on it
is complete. When the count of unfinished tasks drops to zero, join()
unblocks.
You're never calling q.task_done() inside your foo function. The foo function should be something like the example:
def worker():
while True:
item = q.get()
if item is None:
break
do_work(item)
q.task_done()

Related

Best way to keep creating threads on variable list argument

I have an event that I am listening to every minute that returns a list ; it could be empty, 1 element, or more. And with those elements in that list, I'd like to run a function that would monitor an event on that element every minute for 10 minute.
For that I wrote that script
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio
import Client
client = Client()
def handle_event(event):
for i in range(10):
client.get_info(event)
sleep(60)
async def main():
while True:
entires = client.get_new_entry()
if len(entires) > 0:
with ThreadPoolExecutor(max_workers=len(entires)) as executor:
executor.map(handle_event, entires)
await asyncio.sleep(60)
if __name__ == "__main__":
loop = asyncio.new_event_loop()
loop.run_until_complete(main())
However, instead of keep monitoring the entries, it blocks while the previous entries are still being monitors.
Any idea how I could do that please?
First let me explain why your program doesn't work the way you want it to: It's because you use the ThreadPoolExecutor as a context manager, which will not close until all the threads started by the call to map are finished. So main() waits there, and the next iteration of the loop can't happen until all the work is finished.
There are ways around this. Since you are using asyncio already, one approach is to move the creation of the Executor to a separate task. Each iteration of the main loop starts one copy of this task, which runs as long as it takes to finish. It's a async def function so many copies of this task can run concurrently.
I changed a few things in your code. Instead of Client I just used some simple print statements. I pass a list of integers, of random length, to handle_event. I increment a counter each time through the while True: loop, and add 10 times the counter to every integer in the list. This makes it easy to see how old calls continue for a time, mixing with new calls. I also shortened your time delays. All of these changes were for convenience and are not important.
The important change is to move ThreadPoolExecutor creation into a task. To make it cooperate with other tasks, it must contain an await expression, and for that reason I use executor.submit rather than executor.map. submit returns a concurrent.futures.Future, which provides a convenient way to await the completion of all the calls. executor.map, on the other hand, returns an iterator; I couldn't think of any good way to convert it to an awaitable object.
To convert a concurrent.futures.Future to an asyncio.Future, an awaitable, there is a function asyncio.wrap_future. When all the futures are complete, I exit from the ThreadPoolExecutor context manager. That will be very fast since all of the Executor's work is finished, so it does not block other tasks.
import random
from concurrent.futures import ThreadPoolExecutor
from time import sleep
import asyncio
def handle_event(event):
for i in range(10):
print("Still here", event)
sleep(2)
async def process_entires(counter, entires):
print("Counter", counter, "Entires", entires)
x = [counter * 10 + a for a in entires]
with ThreadPoolExecutor(max_workers=len(entires)) as executor:
futs = []
for z in x:
futs.append(executor.submit(handle_event, z))
await asyncio.gather(*(asyncio.wrap_future(f) for f in futs))
async def main():
counter = 0
while True:
entires = [0, 1, 2, 3, 4][:random.randrange(5)]
if len(entires) > 0:
counter += 1
asyncio.create_task(process_entires(counter, entires))
await asyncio.sleep(3)
if __name__ == "__main__":
asyncio.run(main())

Is it possible for two coroutines running in different threads can communicate with each other by asyncio.Queue?

Two coroutintes in code below, running in different threads, cannot communicate with each other by asyncio.Queue. After the producer inserts a new item in asyncio.Queue, the consumer cannot get this item from that asyncio.Queue, it gets blocked in method await self.n_queue.get().
I try to print the ids of asyncio.Queue in both consumer and producer, and I find that they are same.
import asyncio
import threading
import time
class Consumer:
def __init__(self):
self.n_queue = None
self._event = None
def run(self, loop):
loop.run_until_complete(asyncio.run(self.main()))
async def consume(self):
while True:
print("id of n_queue in consumer:", id(self.n_queue))
data = await self.n_queue.get()
print("get data ", data)
self.n_queue.task_done()
async def main(self):
loop = asyncio.get_running_loop()
self.n_queue = asyncio.Queue(loop=loop)
task = asyncio.create_task(self.consume())
await asyncio.gather(task)
async def produce(self):
print("id of queue in producer ", id(self.n_queue))
await self.n_queue.put("This is a notification from server")
class Producer:
def __init__(self, consumer, loop):
self._consumer = consumer
self._loop = loop
def start(self):
while True:
time.sleep(2)
self._loop.run_until_complete(self._consumer.produce())
if __name__ == '__main__':
loop = asyncio.get_event_loop()
print(id(loop))
consumer = Consumer()
threading.Thread(target=consumer.run, args=(loop,)).start()
producer = Producer(consumer, loop)
producer.start()
id of n_queue in consumer: 2255377743176
id of queue in producer 2255377743176
id of queue in producer 2255377743176
id of queue in producer 2255377743176
I try to debug step by step in asyncio.Queue, and I find after the method self._getters.append(getter) is invoked in asyncio.Queue, the item is inserted in queue self._getters. The following snippets are all from asyncio.Queue.
async def get(self):
"""Remove and return an item from the queue.
If queue is empty, wait until an item is available.
"""
while self.empty():
getter = self._loop.create_future()
self._getters.append(getter)
try:
await getter
except:
# ...
raise
return self.get_nowait()
When a new item is inserted into asycio.Queue in producer, the methods below would be invoked. The variable self._getters has no items although it has same id in methods put() and set().
def put_nowait(self, item):
"""Put an item into the queue without blocking.
If no free slot is immediately available, raise QueueFull.
"""
if self.full():
raise QueueFull
self._put(item)
self._unfinished_tasks += 1
self._finished.clear()
self._wakeup_next(self._getters)
def _wakeup_next(self, waiters):
# Wake up the next waiter (if any) that isn't cancelled.
while waiters:
waiter = waiters.popleft()
if not waiter.done():
waiter.set_result(None)
break
Does anyone know what's wrong with the demo code above? If the two coroutines are running in different threads, how could they communicate with each other by asyncio.Queue?
Short answer: no!
Because the asyncio.Queue needs to share the same event loop, but
An event loop runs in a thread (typically the main thread) and executes all callbacks and Tasks in its thread. While a Task is running in the event loop, no other Tasks can run in the same thread. When a Task executes an await expression, the running Task gets suspended, and the event loop executes the next Task.
see
https://docs.python.org/3/library/asyncio-dev.html#asyncio-multithreading
Even though you can pass the event loop to threads, it might be dangerous to mix the different concurrency concepts. Still note, that passing the loop just means that you can add tasks to the loop from different threads, but they will still be executed in the main thread. However, adding tasks from threads can lead to race conditions in the event loop, because
Almost all asyncio objects are not thread safe, which is typically not a problem unless there is code that works with them from outside of a Task or a callback. If there’s a need for such code to call a low-level asyncio API, the loop.call_soon_threadsafe() method should be used
see
https://docs.python.org/3/library/asyncio-dev.html#asyncio-multithreading
Typically, you should not need to run async functions in different threads, because they should be IO bound and therefore a single thread should be sufficient to handle the work load. If you still have some CPU bound tasks, you are able to dispatch them to different threads and make the result awaitable using asyncio.to_thread, see https://docs.python.org/3/library/asyncio-task.html#running-in-threads.
There are many questions already about this topic, see e.g. Send asyncio tasks to loop running in other thread or How to combine python asyncio with threads?
If you want to learn more about the concurrency concepts, I recommend to read https://medium.com/analytics-vidhya/asyncio-threading-and-multiprocessing-in-python-4f5ff6ca75e8

running asyncio task concurrently and in background

So apologies because i've seen this question asked a bunch, but having looked through all of the questions none seem to fix my problem. My code looks like this
TDSession = TDClient()
TDSession.grab_refresh_token()
q = queue.Queue(10)
asyncio.run(listener.startStreaming(TDSession, q))
while True:
message = q.get()
print('oh shoot!')
print(message)
orderEntry.placeOrder(TDSession=TDSession)
I have tried doing asyncio.create_task(listener.startStreaming(TDSession,q)), the problem is I get
RuntimeError: no running event loop
sys:1: RuntimeWarning: coroutine 'startStreaming' was never awaited
which confused me because this seemed to work here Can an asyncio event loop run in the background without suspending the Python interpreter? which is what i'm trying to do.
with the listener.startStreaming func looking like this
async def startStreaming(TDSession, q):
streamingClient = TDSession.create_streaming_session()
streamingClient.account_activity()
await streamingClient.build_pipeline()
while True:
message = await streamingClient.start_pipeline()
message = parseMessage(message)
if message != None:
print('putting message into q')
print( dict(message) )
q.put(message)
Is there a way to make this work where I can run the listener in the background?
EDIT: I've tried this as well, but it only runs the consumer function, instead of running both at the same time
TDSession.grab_refresh_token()
q = queue.Queue(10)
loop = asyncio.get_event_loop()
loop.create_task(listener.startStreaming(TDSession, q))
loop.create_task(consumer(TDSession, q))
loop.run_forever()
As you found out, the asyncio.run function runs the given coroutine until it is complete. In other words, it waits for the coroutine returned by listener.startStreaming to finish before proceeding to the next line.
Using asyncio.create_task, on the other hand, requires the caller to be already running inside an asyncio loop already. From the docs:
The task is executed in the loop returned by get_running_loop(), RuntimeError is raised if there is no running loop in current thread.
What you need is to combine the two, by creating a function that's async, and then call create_task inside that async function.
For example:
async def main():
TDSession = TDClient()
TDSession.grab_refresh_token()
q = asyncio.Queue(10)
streaming_task = asyncio.create_task(listener.startStreaming(TDSession, q))
while True:
message = await q.get()
print('oh shoot!')
print(message)
orderEntry.placeOrder(TDSession=TDSession)
await streaming_task # If you want to wait for `startStreaming` to complete after the while loop
if __name__ == '__main__':
asyncio.run(main())
Edit: From your comment I realized you want to use the producer-consumer pattern, so I also updated the example above to use asyncio.Queue instead of a queue.Queue, in order for the thread to be able to jump between the producer (startStreaming) and the consumer (the while loop)

Python: asyncio loops with threads

Could you tell me if this is a correct approach to build several independent async loops inside own threads?
def init():
print("Initializing Async...")
global loop_heavy
loop_heavy = asyncio.new_event_loop()
start_loop(loop_heavy)
def start_loop(loop):
thread = threading.Thread(target=loop.run_forever)
thread.start()
def submit_heavy(task):
future = asyncio.run_coroutine_threadsafe(task, loop_heavy)
try:
future.result()
except Exception as e:
print(e)
def stop():
loop_heavy.call_soon_threadsafe(loop_heavy.stop)
async def heavy():
print("3. heavy start %s" % threading.current_thread().name)
await asyncio.sleep(3) # or await asyncio.sleep(3, loop=loop_heavy)
print("4. heavy done")
Then I am testing it with:
if __name__ == "__main__":
init()
print("1. submit heavy: %s" % threading.current_thread().name)
submit_heavy(heavy())
print("2. submit is done")
stop()
I am expecting to see 1->3->2->4 but in fact it is 1->3->4->2:
Initializing Async...
1. submit heavy: MainThread
3. heavy start Thread-1
4. heavy done
2. submit is done
I think that I miss something in understanding async and threads.
Threads are different. Why am I waiting inside MainThread until the job inside Thread-1 is finished?
Why am I waiting inside MainThread until the job inside Thread-1 is finished?
Good question, why are you?
One possible answer is, because you actually want to block the current thread until the job is finished. This is one of the reasons to put the event loop in another thread and use run_coroutine_threadsafe.
The other possible answer is that you don't have to if you don't want. You can simply return from submit_heavy() the concurrent.futures.Future object returned by run_coroutine_threadsafe, and leave it to the caller to wait for the result (or check if one is ready) at their own leisure.
Finally, if your goal is just to run a regular function "in the background" (without blocking the current thread), perhaps you don't need asyncio at all. Take a look at the concurrent.futures module, whose ThreadPoolExecutor allows you to easily submit a function to a thread pool and leave it to execute unassisted.
I will add one of the possible solutions that I found from the asyncio documentation.
I'm not sure that it is the correct way, but it works as expected (MainThread is not blocked by the execution of the child thread)
Running Blocking Code
Blocking (CPU-bound) code should not be called directly. For example, if a function performs a CPU-intensive calculation for 1 second, all concurrent asyncio Tasks and IO operations would be delayed by 1 second.
An executor can be used to run a task in a different thread or even in a different process to avoid blocking block the OS thread with the event loop. See the loop.run_in_executor() method for more details.
Applying to my code:
import asyncio
import threading
import concurrent.futures
import multiprocessing
import time
def init():
print("Initializing Async...")
global loop, thread_executor_pool
thread_executor_pool = concurrent.futures.ThreadPoolExecutor(max_workers=multiprocessing.cpu_count())
loop = asyncio.get_event_loop()
thread = threading.Thread(target=loop.run_forever)
thread.start()
def submit_task(task, *args):
loop.run_in_executor(thread_executor_pool, task, *args)
def stop():
loop.call_soon_threadsafe(loop.stop)
thread_executor_pool.shutdown()
def blocked_task(msg1, msg2):
print("3. task start msg: %s, %s, thread: %s" % (msg1, msg2, threading.current_thread().name))
time.sleep(3)
print("4. task is done -->")
if __name__ == "__main__":
init()
print("1. --> submit task: %s" % threading.current_thread().name)
submit_task(blocked_task, "a", "b")
print("2. --> submit is done")
stop()
Output:
Initializing Async...
1. --> submit task: MainThread
3. task start msg: a, b, thread: ThreadPoolExecutor-0_0
2. --> submit is done
4. task is done -->
Correct me if there are still any mistakes or it can be done in the other way.

Monitoring the asyncio event loop

I am writing an application using python3 and am trying out asyncio for the first time. One issue I have encountered is that some of my coroutines block the event loop for longer than I like. I am trying to find something along the lines of top for the event loop that will show how much wall/cpu time is being spent running each of my coroutines. If there isn't anything already existing does anyone know of a way to add hooks to the event loop so that I can take measurements?
I have tried using cProfile which gives some helpful output, but I am more interested in time spent blocking the event loop, rather than total execution time.
Event loop can already track if coroutines take much CPU time to execute. To see it you should enable debug mode with set_debug method:
import asyncio
import time
async def main():
time.sleep(1) # Block event loop
if __name__ == "__main__":
loop = asyncio.get_event_loop()
loop.set_debug(True) # Enable debug
loop.run_until_complete(main())
In output you'll see:
Executing <Task finished coro=<main() [...]> took 1.016 seconds
By default it shows warnings for coroutines that blocks for more than 0.1 sec. It's not documented, but based on asyncio source code, looks like you can change slow_callback_duration attribute to modify this value.
You can use call_later. Periodically run callback that will log/notify the difference of loop's time and period interval time.
class EventLoopDelayMonitor:
def __init__(self, loop=None, start=True, interval=1, logger=None):
self._interval = interval
self._log = logger or logging.getLogger(__name__)
self._loop = loop or asyncio.get_event_loop()
if start:
self.start()
def run(self):
self._loop.call_later(self._interval, self._handler, self._loop.time())
def _handler(self, start_time):
latency = (self._loop.time() - start_time) - self._interval
self._log.error('EventLoop delay %.4f', latency)
if not self.is_stopped():
self.run()
def is_stopped(self):
return self._stopped
def start(self):
self._stopped = False
self.run()
def stop(self):
self._stopped = True
example
import time
async def main():
EventLoopDelayMonitor(interval=1)
await asyncio.sleep(1)
time.sleep(2)
await asyncio.sleep(1)
await asyncio.sleep(1)
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
output
EventLoop delay 0.0013
EventLoop delay 1.0026
EventLoop delay 0.0014
EventLoop delay 0.0015
For anyone reading this in 2019, this might be a better answer: yappi. With Yappi version 1.2.1>=, you can natively profile coroutines and see exactly how much wall or cpu time is spent inside a coroutine.
See here for details on this coroutine profiling.
To expand a bit on one of the answers, if you want to monitor your loop and detect hangs, here's a snippet to do just that. It launches a separate thread that checks whether the loop's tasks yielded execution recently enough.
def monitor_loop(loop, delay_handler):
loop = loop
last_call = loop.time()
INTERVAL = .5 # How often to poll the loop and check the current delay.
def run_last_call_updater():
loop.call_later(INTERVAL, last_call_updater)
def last_call_updater():
nonlocal last_call
last_call = loop.time()
run_last_call_updater()
run_last_call_updater()
def last_call_checker():
threading.Timer(INTERVAL / 2, last_call_checker).start()
if loop.time() - last_call > INTERVAL:
delay_handler(loop.time() - last_call)
threading.Thread(target=last_call_checker).start()

Resources