Python asyncio queue doesn't finish - python-3.x

I have one producer and 3 consumers. Each consumer waits to acquire a global lock before it can proceed. The program runs but doesn't finish and never comes out of the while loop. Could you tell me where it is going wrong?
import asyncio
import random

async def produce(queue, n):
    for x in range(1, n + 1):
        # produce an item
        print('producing {}/{}'.format(x, n))
        # simulate i/o operation using sleep
        await asyncio.sleep(random.random())
        item = str(x)
        # put the item in the queue
        await queue.put(item)
    # indicate the producer is done
    await queue.put(None)

async def consume(queue, lock):
    while True:
        # wait for an item from the producer
        item = await queue.get()
        if item is None:
            # the producer emits None to indicate that it is done
            break
        async with lock:
            # process the item
            print('consuming item {}...'.format(item))
            # simulate i/o operation using sleep
            await asyncio.sleep(0.3)

loop = asyncio.get_event_loop()
lock = asyncio.Lock()
queue = asyncio.Queue(loop=loop)
producer_coro = produce(queue, 10)
consumers = []
for _ in range(3):
    consumers.append(consume(queue, lock))
all_coroutines = []
all_coroutines.append(producer_coro)
all_coroutines.extend(consumers)
loop.run_until_complete(asyncio.wait(all_coroutines))
loop.close()

The problem is in the consumer:
if item is None:
    # the producer emits None to indicate that it is done
    break
The None sentinel is only picked up by a single consumer, and the rest are left waiting for the next value to arrive through the queue. A simple fix is to put the sentinel value back into the queue:
if item is None:
    # The producer emits None to indicate that it is done.
    # Propagate it to other consumers and quit.
    await queue.put(None)
    break
Alternatively, produce could enqueue as many None sentinels as there are consumers - but that would require the producer to know how many consumers there are, which is not always desirable.
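For illustration, a minimal sketch of that alternative might look like this (the num_consumers parameter is introduced here purely for the example):

async def produce(queue, n, num_consumers):
    for x in range(1, n + 1):
        await asyncio.sleep(random.random())
        await queue.put(str(x))
    # one sentinel per consumer, so every consumer sees exactly one None
    for _ in range(num_consumers):
        await queue.put(None)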

Adding to the answer @user4815162342 provided, try:
if item is None and queue.qsize() == 0:
    await queue.put(None)
    break
I had an issue where the consumer also had to queue.put() to the same queue to rerun the function, and it hung at the end without both conditions.

Related

Is it possible for two coroutines running in different threads to communicate with each other via asyncio.Queue?

The two coroutines in the code below, running in different threads, cannot communicate with each other via asyncio.Queue. After the producer inserts a new item into the asyncio.Queue, the consumer cannot get this item from that queue; it gets blocked in await self.n_queue.get().
I tried printing the ids of the asyncio.Queue in both the consumer and the producer, and I found that they are the same.
import asyncio
import threading
import time

class Consumer:
    def __init__(self):
        self.n_queue = None
        self._event = None

    def run(self, loop):
        loop.run_until_complete(asyncio.run(self.main()))

    async def consume(self):
        while True:
            print("id of n_queue in consumer:", id(self.n_queue))
            data = await self.n_queue.get()
            print("get data ", data)
            self.n_queue.task_done()

    async def main(self):
        loop = asyncio.get_running_loop()
        self.n_queue = asyncio.Queue(loop=loop)
        task = asyncio.create_task(self.consume())
        await asyncio.gather(task)

    async def produce(self):
        print("id of queue in producer ", id(self.n_queue))
        await self.n_queue.put("This is a notification from server")

class Producer:
    def __init__(self, consumer, loop):
        self._consumer = consumer
        self._loop = loop

    def start(self):
        while True:
            time.sleep(2)
            self._loop.run_until_complete(self._consumer.produce())

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    print(id(loop))
    consumer = Consumer()
    threading.Thread(target=consumer.run, args=(loop,)).start()
    producer = Producer(consumer, loop)
    producer.start()
id of n_queue in consumer: 2255377743176
id of queue in producer 2255377743176
id of queue in producer 2255377743176
id of queue in producer 2255377743176
I tried to debug step by step into asyncio.Queue, and I found that after self._getters.append(getter) is invoked, the waiter is appended to self._getters. The following snippets are all from asyncio.Queue.
async def get(self):
    """Remove and return an item from the queue.

    If queue is empty, wait until an item is available.
    """
    while self.empty():
        getter = self._loop.create_future()
        self._getters.append(getter)
        try:
            await getter
        except:
            # ...
            raise
    return self.get_nowait()
When a new item is inserted into the asyncio.Queue in the producer, the methods below are invoked. The variable self._getters has no items, although it has the same id in both put() and get().
def put_nowait(self, item):
    """Put an item into the queue without blocking.

    If no free slot is immediately available, raise QueueFull.
    """
    if self.full():
        raise QueueFull
    self._put(item)
    self._unfinished_tasks += 1
    self._finished.clear()
    self._wakeup_next(self._getters)

def _wakeup_next(self, waiters):
    # Wake up the next waiter (if any) that isn't cancelled.
    while waiters:
        waiter = waiters.popleft()
        if not waiter.done():
            waiter.set_result(None)
            break
Does anyone know what's wrong with the demo code above? If the two coroutines are running in different threads, how could they communicate with each other by asyncio.Queue?
Short answer: no!
Because both ends of an asyncio.Queue must run on the same event loop, but:
An event loop runs in a thread (typically the main thread) and executes all callbacks and Tasks in its thread. While a Task is running in the event loop, no other Tasks can run in the same thread. When a Task executes an await expression, the running Task gets suspended, and the event loop executes the next Task.
see
https://docs.python.org/3/library/asyncio-dev.html#asyncio-multithreading
Even though you can pass the event loop to other threads, it is dangerous to mix the different concurrency concepts. Note that passing the loop just means that you can add tasks to it from different threads, but they will still be executed in the loop's thread. However, adding tasks from other threads can lead to race conditions in the event loop, because:
Almost all asyncio objects are not thread safe, which is typically not a problem unless there is code that works with them from outside of a Task or a callback. If there’s a need for such code to call a low-level asyncio API, the loop.call_soon_threadsafe() method should be used
see
https://docs.python.org/3/library/asyncio-dev.html#asyncio-multithreading
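If a plain thread really does need to hand items to a coroutine, a safe pattern is to schedule the queue operation onto the loop's own thread with loop.call_soon_threadsafe, as the quote suggests. A minimal sketch (not the original poster's code):

import asyncio
import threading

async def consumer(queue):
    while True:
        item = await queue.get()
        if item is None:
            break
        print("got", item)

def feeder(loop, queue):
    # runs in a plain thread; schedules each put on the loop's thread
    for i in range(3):
        loop.call_soon_threadsafe(queue.put_nowait, i)
    loop.call_soon_threadsafe(queue.put_nowait, None)

async def main():
    queue = asyncio.Queue()
    loop = asyncio.get_running_loop()
    threading.Thread(target=feeder, args=(loop, queue)).start()
    await consumer(queue)

asyncio.run(main())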
Typically, you should not need to run async functions in different threads, because they should be I/O bound, and therefore a single thread should be sufficient to handle the workload. If you still have some CPU-bound or blocking tasks, you can dispatch them to different threads and make the result awaitable using asyncio.to_thread, see https://docs.python.org/3/library/asyncio-task.html#running-in-threads.
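For example, a hedged sketch of asyncio.to_thread (available since Python 3.9); the blocking function here is invented for illustration:

import asyncio
import time

def blocking_work(n):
    # stand-in for a blocking call; for pure CPU work note that
    # the GIL still applies, so a process pool may be preferable
    time.sleep(1)
    return n * n

async def main():
    # runs blocking_work in a worker thread; the event loop stays responsive
    result = await asyncio.to_thread(blocking_work, 7)
    print(result)

asyncio.run(main())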
There are many questions already about this topic, see e.g. Send asyncio tasks to loop running in other thread or How to combine python asyncio with threads?
If you want to learn more about the concurrency concepts, I recommend reading https://medium.com/analytics-vidhya/asyncio-threading-and-multiprocessing-in-python-4f5ff6ca75e8

Asyncio with two loops, best practice

I have two infinite loops. Their processing is lightweight. I don't want them to block each other. Is using await asyncio.sleep(0) a good practice?
This is my code
import asyncio

async def loop1():
    while True:
        print("loop1")
        # pull data from kafka
        await asyncio.sleep(0)

async def loop2():
    while True:
        print("loop2")
        # send data to all clients using asyncio stream api
        await asyncio.sleep(0)

async def main():
    await asyncio.gather(loop1(), loop2())

asyncio.run(main())
Two (or many more) asyncio tasks will not block each other unless one of the tasks has a long synchronous operation inside.
Both of your tasks contain only network operations (Kafka and API requests), so neither of them will block the other.
When should you use asyncio.sleep(0)?
Imagine you have some long synchronous operation: calculations. Calculations are not an I/O operation.
This example is more of a good-to-know: if you have such operations in a real app, you should move them into loop.run_in_executor and use concurrent.futures.ProcessPoolExecutor as the executor. The example:
import asyncio

async def long_calc():
    """
    Some heavy CPU-bound task.
    Better to make it a sync function and move it to a ProcessPoolExecutor.
    """
    s = 0
    for _ in range(100):
        for i in range(1_000_000):
            s += i**2
        # comment the next line and watch the result:
        # you'll get no "Working" messages,
        # that's why I use sleep(0.0) here
        await asyncio.sleep(0.0)
    return s

async def pinger():
    """Task which shows that the app is alive"""
    n = 0
    while True:
        await asyncio.sleep(1)
        print(f"Working {n}")
        n += 1

async def amain():
    """Main async function in this app"""
    # run pinger with asyncio.create_task since we want it
    # to run in parallel with long_calc and
    # we do not want to wait till it is finished.
    # If it were a thread, it would be a daemon thread.
    asyncio.create_task(pinger())
    # await the result of the long task
    s = await long_calc()
    print(f"Done: {s}")

if __name__ == '__main__':
    asyncio.run(amain())
If you need me to provide a run_in_executor example, let me know.
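For reference, a sketch of what that might look like with loop.run_in_executor and a ProcessPoolExecutor (an illustration of the approach described above, not code from the answer):

import asyncio
from concurrent.futures import ProcessPoolExecutor

def long_calc_sync():
    # the same heavy loop as above, now a plain sync function
    s = 0
    for _ in range(100):
        for i in range(1_000_000):
            s += i ** 2
    return s

async def amain():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # runs in a separate process, so the event loop is never blocked
        s = await loop.run_in_executor(pool, long_calc_sync)
    print(f"Done: {s}")

if __name__ == '__main__':
    asyncio.run(amain())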

running asyncio task concurrently and in background

So apologies, because I've seen this question asked a bunch, but having looked through all of those questions, none seem to fix my problem. My code looks like this:
TDSession = TDClient()
TDSession.grab_refresh_token()
q = queue.Queue(10)
asyncio.run(listener.startStreaming(TDSession, q))
while True:
    message = q.get()
    print('oh shoot!')
    print(message)
    orderEntry.placeOrder(TDSession=TDSession)
I have tried doing asyncio.create_task(listener.startStreaming(TDSession, q)); the problem is I get
RuntimeError: no running event loop
sys:1: RuntimeWarning: coroutine 'startStreaming' was never awaited
which confused me, because this seemed to work here: Can an asyncio event loop run in the background without suspending the Python interpreter?, which is what I'm trying to do.
with the listener.startStreaming func looking like this
async def startStreaming(TDSession, q):
    streamingClient = TDSession.create_streaming_session()
    streamingClient.account_activity()
    await streamingClient.build_pipeline()
    while True:
        message = await streamingClient.start_pipeline()
        message = parseMessage(message)
        if message != None:
            print('putting message into q')
            print(dict(message))
            q.put(message)
Is there a way to make this work where I can run the listener in the background?
EDIT: I've tried this as well, but it only runs the consumer function, instead of running both at the same time
TDSession.grab_refresh_token()
q = queue.Queue(10)
loop = asyncio.get_event_loop()
loop.create_task(listener.startStreaming(TDSession, q))
loop.create_task(consumer(TDSession, q))
loop.run_forever()
As you found out, the asyncio.run function runs the given coroutine until it is complete. In other words, it waits for the coroutine returned by listener.startStreaming to finish before proceeding to the next line.
Using asyncio.create_task, on the other hand, requires the caller to already be running inside an asyncio event loop. From the docs:
The task is executed in the loop returned by get_running_loop(), RuntimeError is raised if there is no running loop in current thread.
What you need is to combine the two, by creating a function that's async, and then call create_task inside that async function.
For example:
async def main():
    TDSession = TDClient()
    TDSession.grab_refresh_token()
    q = asyncio.Queue(10)
    streaming_task = asyncio.create_task(listener.startStreaming(TDSession, q))
    while True:
        message = await q.get()
        print('oh shoot!')
        print(message)
        orderEntry.placeOrder(TDSession=TDSession)
    await streaming_task  # If you want to wait for `startStreaming` to complete after the while loop

if __name__ == '__main__':
    asyncio.run(main())
Edit: From your comment I realized you want to use the producer-consumer pattern, so I also updated the example above to use asyncio.Queue instead of queue.Queue, in order for the thread to be able to jump between the producer (startStreaming) and the consumer (the while loop).

Why is this queue.join call blocking indefinitely?

I'm playing about with a personal project in Python 3.6 and I've run into the following issue, which results in the my_queue.join() call blocking indefinitely. Note this isn't my actual code but a minimal example demonstrating the issue.
import threading
import queue

def foo(stop_event, my_queue):
    while not stop_event.is_set():
        try:
            item = my_queue.get(timeout=0.1)
            print(item)  # Actual logic goes here
        except queue.Empty:
            pass
    print('DONE')

stop_event = threading.Event()
my_queue = queue.Queue()
thread = threading.Thread(target=foo, args=(stop_event, my_queue))
thread.start()
my_queue.put(1)
my_queue.put(2)
my_queue.put(3)
print('ALL PUT')
my_queue.join()
print('ALL PROCESSED')
stop_event.set()
print('ALL COMPLETE')
I get the following output (it's actually been consistent, but I understand that the output order may differ due to threading):
ALL PUT
1
2
3
No matter how long I wait I never see ALL PROCESSED output to the console, so why is my_queue.join() blocking indefinitely when all the items have been processed?
From the docs:
The count of unfinished tasks goes up whenever an item is added to the queue. The count goes down whenever a consumer thread calls task_done() to indicate that the item was retrieved and all work on it is complete. When the count of unfinished tasks drops to zero, join() unblocks.
You're never calling task_done() on the queue inside your foo function. The foo function should be something like the example from the docs:
def worker():
    while True:
        item = q.get()
        if item is None:
            break
        do_work(item)
        q.task_done()
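Applied to the foo from the question, that might look like this (same structure, with task_done() added after each successful get()):

def foo(stop_event, my_queue):
    while not stop_event.is_set():
        try:
            item = my_queue.get(timeout=0.1)
            print(item)  # Actual logic goes here
            my_queue.task_done()  # balances the successful get()
        except queue.Empty:
            pass
    print('DONE')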

what happens when consumers have no tasks in the queue when using asyncio.Queue()

I have this script
import asyncio
import random

q = asyncio.Queue()

async def producer(num):
    while True:
        await q.put(num + random.random())
        await asyncio.sleep(random.random())

async def consumer(num):
    while True:
        value = await q.get()
        print('Consumed', num, value)

loop = asyncio.get_event_loop()
for i in range(6):
    loop.create_task(producer(i))
for i in range(3):
    loop.create_task(consumer(i))
loop.run_forever()
that uses asyncio.Queue()
I am running my script forever; it produces tasks randomly and adds them to the queue. In the event that there are no tasks in the queue to consume, will this use CPU for nothing, or is it harmless and produces no error?
In the event that there are no tasks to consume in the queue, will this produce an error?
Since the consumer calls get() to consume the next queued item, if the queue is empty, it will simply wait for the next item to arrive. No CPU will be wasted on the wait, and no error reported - the consumer coroutine will just be suspended until an item is produced, after which it will be immediately woken up. During the wait the event loop is free to run other coroutines, if any.
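A tiny self-contained sketch that illustrates this behavior:

import asyncio

async def consumer(q):
    print('waiting on an empty queue...')
    value = await q.get()  # suspends here without burning CPU
    print('woken up with', value)

async def main():
    q = asyncio.Queue()
    task = asyncio.create_task(consumer(q))
    await asyncio.sleep(1)  # the consumer stays parked the whole time
    await q.put('hello')
    await task

asyncio.run(main())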
