ValueError when asyncio.run() is called in separate thread - python-3.x

I have a network application which is listening on multiple sockets.
To handle each socket individually, I use Python's threading.Thread module.
These sockets must be able to run tasks on packet reception without delaying any further packet reception from the socket handling thread.
To do so, I've declared the method(s) that run the previously mentioned tasks with the async keyword so I can run them asynchronously with asyncio.run(my_async_task(my_parameters)).
I have tested this approach on a single socket (running on the main thread) with great success.
But when I use multiple sockets (each one with its own independent handler thread), the following exception is raised:
ValueError: set_wakeup_fd only works in main thread
My question is the following: is asyncio the appropriate tool for what I need? If it is, how do I run an async method from a thread that is not the main thread?
Most of my search results mention "event loops" and "awaiting" async results, which (if I understand them correctly) is not what I am looking for.
I am talking about sockets in this question to provide context but my problem is mostly about the behaviour of asyncio in child threads.
I can, if needed, write a short code sample to reproduce the error.
Thank you for the help!
Edit1, here is a minimal reproducible code example:
import asyncio
import threading
import time
# Handle a specific packet from any socket without interrupting the listening thread
async def handle_it(val):
    print("handled: {}".format(val))

# A class to simulate a threaded socket listener
class MyFakeSocket(threading.Thread):
    def __init__(self, val):
        threading.Thread.__init__(self)
        self.val = val  # Value for a fake received packet

    def run(self):
        for i in range(10):
            # The (fake) socket will sequentially receive [val, val+1, ... val+9]
            asyncio.run(handle_it(self.val + i))
            time.sleep(0.5)

# Entry point
sockets = MyFakeSocket(0), MyFakeSocket(10)
for socket in sockets:
    socket.start()

This is possibly related to the bug discussed here: https://bugs.python.org/issue34679
If so, this would be a problem with Python 3.8 on Windows. To work around this, you could try downgrading to Python 3.7 and, rather than relying on asyncio.run(), getting and running the event loop manually like:
loop = asyncio.get_event_loop()
loop.run_until_complete(<your tasks>)
loop.close()
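A minimal sketch of that per-thread approach (my own example, not from the question) might look like the following; note that calling asyncio.get_event_loop() in a non-main thread can raise an error, so each thread creates and sets its own loop:
import asyncio
import threading

async def handle_it(val):
    print("handled: {}".format(val))

def thread_main(val):
    # Each thread owns its own event loop instead of relying on the main-thread default.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        loop.run_until_complete(handle_it(val))
    finally:
        loop.close()

threads = [threading.Thread(target=thread_main, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()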
Otherwise, would you be able to run the code in a docker container? This might work for you and would then be detached from the OS behaviour, but is a lot more work!

Related

asyncio and threading: why is the thread id always the same?

With the simplest example of a pure TCP asyncio server I could write, I want to get the thread id of the current thread. Because I'm in an async coroutine, I thought this would be in a different thread (especially with the asyncio library). But the result always prints the same id value. What am I missing? Is it the wrong function call? Does asyncio not create a new thread?
import asyncio
import threading
from asyncio import StreamWriter, StreamReader

HOST = '127.0.0.1'
PORT = 7070

async def handle(reader: StreamReader, writer: StreamWriter):
    print(f"{threading.get_native_id()=} / {threading.get_ident()=}")
    writer.close()

async def main():
    server = await asyncio.start_server(handle, HOST, PORT)
    async with server:
        await server.serve_forever()

asyncio.run(main())
The asyncio library works in a single OS thread. Basically it's all about the event loop and coroutines being run by that event loop. asyncio applies the concept of cooperative multitasking: a coroutine itself decides when to give control back to the event loop.
As for multithreading, I suggest you read this article about the GIL. Because of the GIL, running multiple threads will not give you any performance gain for CPU-bound Python code. That's why the key to a performance gain (mostly with I/O-bound tasks) is to use things like gevent/asyncio. Those libraries manage the "switching between tasks" themselves (i.e. the OS scheduler is not involved).
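As a small illustration (my own sketch, not part of the original answer), two coroutines scheduled together interleave inside one OS thread; every await asyncio.sleep() hands control back to the event loop:
import asyncio
import threading

async def worker(name):
    for i in range(3):
        # The thread ident is the same every time: one OS thread runs the whole loop.
        print(name, i, threading.get_ident())
        await asyncio.sleep(0.1)  # yield control back to the event loop

async def main():
    await asyncio.gather(worker("a"), worker("b"))

asyncio.run(main())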

Creating non blocking restful service using aiohttp [duplicate]

I have tried the following code in Python 3.6 for asyncio:
Example 1:
import asyncio
import time

async def hello():
    print('hello')
    await asyncio.sleep(1)
    print('hello again')

tasks = [hello(), hello()]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
Output is as expected:
hello
hello
hello again
hello again
Then I want to change the asyncio.sleep into another def:
async def sleep():
    time.sleep(1)

async def hello():
    print('hello')
    await sleep()
    print('hello again')

tasks = [hello(), hello()]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
Output:
hello
hello again
hello
hello again
It seems it is not running asynchronously, but in normal synchronous order.
The question is: why is it not running asynchronously, and how can I change the old synchronous module into an 'async' one?
Asyncio uses an event loop, which selects what task (an independent call chain of coroutines) in the queue to activate next. The event loop can make intelligent decisions as to what task is ready to do actual work. This is why the event loop also is responsible for creating connections and watching file descriptors and other I/O primitives; it gives the event loop insight into when there are I/O operations in progress or when results are available to process.
Whenever you use await, there is an opportunity to return control to the loop which can then pass control to another task. Which task then is picked for execution depends on the exact implementation; the asyncio reference implementation offers multiple choices, but there are other implementations, such as the very, very efficient uvloop implementation.
Your sample is still asynchronous. It just so happens that by replacing await asyncio.sleep() with a synchronous time.sleep() call, inside a new coroutine function, you introduced two coroutines into the task call chain that don't yield, and thus influenced the order in which they are executed. That they are executed in what appears to be synchronous order is a coincidence. If you switched event loops, or introduced more coroutines (especially some that use I/O), the order can easily be different again.
Moreover, your new coroutines use time.sleep(); this makes your coroutines uncooperative. The event loop is not notified that your code is waiting (time.sleep() will not yield!), so no other coroutine can be executed while time.sleep() is running. time.sleep() simply doesn't return or lets any other code run until the requested amount of time has passed. Contrast this with the asyncio.sleep() implementation, which simply yields to the event loop with a call_later() hook; the event loop now knows that that task won't need any attention until a later time.
Also see asyncio: why isn't it non-blocking by default for a more in-depth discussion of how tasks and the event loop interact. And if you must run blocking, synchronous code that can't be made to cooperate, then use an executor pool to have the blocking code executed in a separate thread or child process, freeing up the event loop for other, better-behaved tasks.
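A minimal sketch of that executor approach (my own example, reusing the hello() shape from the question) might look like:
import asyncio
import time

def blocking_sleep():
    # Synchronous, uncooperative code that would otherwise stall the event loop.
    time.sleep(1)

async def hello():
    print('hello')
    loop = asyncio.get_event_loop()
    # Run the blocking call in the default thread pool so other tasks keep running.
    await loop.run_in_executor(None, blocking_sleep)
    print('hello again')

tasks = [hello(), hello()]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))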

Python 3 - multiple AsyncIO connections

I am trying to learn how to use AsyncIO in Python 3.7 and I am still a little confused by its principles.
My goal is to write a simple chat program, however I need to use a ring network topology -- one node only knows about its two neighbours. When the message is sent, it is passed by the nodes until it reaches the sender again. This means that each node is basically a client and a server at the same time.
I also need to be able to detect dead nodes, so that my ring does not break.
I thought it might be a good solution for each node to have a separate connection for every neighbour -- successor and predecessor.
class Node:
    ...
    def run():
        ...
        s = loop.create_connection(lambda: Client(...), addr1, port1)
        p = loop.create_server(lambda: Server(...), addr2, port2)
        successor = loop.run_until_complete(s)
        predecessor = loop.run_until_complete(p)
        loop.run_forever()
        ...
    ...
Server and Client are classes that implement asyncio.Protocol.
The reason I wanted to do it this way is that if there is a message being sent through the circle, it is always sent from the predecessor to the successor. In the predecessor's connection_lost method I can detect that it has disconnected and send its predecessor a message (through the whole ring) to connect to me.
I would like to be able to send a message that I received from my predecessor further on to my successor. I would also like to be able to send a message with my address to my successor in case my predecessor dies (this message would be sent from predecessor's Server.connection_lost() and would be passed all the way to my dead predecessor's predecessor).
My question is: Can I pass the received data from predecessor to successor? If not, what would be a better implementation of this program that uses AsyncIO and the ring topology?
For anyone new to AsyncIO having the same problem, I found the solution myself.
First of all, it is better to use the high-level aspects of AsyncIO -- streams. Calling loop.create_connection and loop.create_server is considered low-level (which I had misunderstood at first).
The high-level alternative to create_connection is asyncio.open_connection, which will supply you with a tuple consisting of an asyncio.StreamReader and an asyncio.StreamWriter, which you can use to read from and write to the open connection. You can also detect the loss of the connection when the data read from the StreamReader equals b'' or when you catch an exception (ConnectionError) while trying to write to the StreamWriter.
The high-level alternative to create_server is asyncio.start_server, which needs to be supplied a callback function that will be called every time a connection to the server is made (connection opened, data received...). The callback takes a StreamReader and a StreamWriter as arguments. The loss of the connection can also be detected by receiving b'' or by a ConnectionError on writing to the writer.
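For illustration (a sketch of my own, not part of the original answer), a server callback that detects a peer disconnect with those two checks could look like:
import asyncio

async def server_callback(reader, writer):
    while True:
        data = await reader.read(1024)
        if data == b'':
            # An empty read means the peer closed the connection.
            break
        try:
            writer.write(data)  # pass the message along
            await writer.drain()
        except ConnectionError:
            # Writing to a closed connection also reveals the disconnect.
            break
    writer.close()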
Multiple connections can be handled by coroutines. There can be a coroutine for the server part (which accepts the connection from one of the neighbors in the ring topology) and a coroutine for the client part (which opens a connection to the other neighbor in the ring). The Node class can look like this:
import asyncio

class Node:
    ...
    async def run(self):
        ...
        self.next_reader, self.next_writer = await asyncio.open_connection(self.next_IP, self.next_port)
        server_coro = asyncio.create_task(self.server_init())
        client_coro = asyncio.create_task(self.client_method())
        await client_coro
        await server_coro
        ...

    async def server_init(self):
        server = await asyncio.start_server(self.server_callback, self.IP, self.port)
        async with server:
            await server.serve_forever()

    async def client_method(self):
        ...
        try:
            data = await self.next_reader.read()
        except ConnectionError:
            ...
        ...
Note that I am using asyncio.create_task for the coroutines and (not shown in the code listing) asyncio.run(node.run()), which are considered the high-level alternatives to asyncio.ensure_future() and loop.run_forever(). Both of these were added in Python 3.7, and asyncio.run() is said to be provisional, so by the time you read this it might already have been replaced by something else.
I'm not an AsyncIO expert, so there might be a better, cleaner way to do this (if you know it, please share it).

Python thread never starts if run() contains yield from

Python 3.4: I'm trying to make a server using the websockets module (I was previously using regular sockets but wanted to add a JavaScript client) when I ran into an issue: it expects async code (at least if the examples are to be trusted), which I hadn't used before. Threading simply does not work. If I run the following code, bar will never be printed, whereas if I comment out the line with yield from, it works as expected. So yield is probably doing something I don't quite understand, but why is it never even executed? Should I install Python 3.5?
import threading

class SampleThread(threading.Thread):
    def __init__(self):
        super(SampleThread, self).__init__()
        print("foo")

    def run(self):
        print("bar")
        yield from var2

thread = SampleThread()
thread.start()
This is not the correct way to handle multithreading. run should be neither a generator nor a coroutine: because it contains yield from, calling it merely creates a generator object, so its body never executes and "bar" is never printed. It should also be noted that the asyncio event loop is only defined for the main thread. Any call to asyncio.get_event_loop() in a new thread (without first setting it with asyncio.set_event_loop()) will throw an exception.
Before looking at running the event loop in a new thread, you should first consider whether you really need the event loop running in its own thread. The event loop has a built-in thread pool executor at loop.run_in_executor(). This will take an executor from concurrent.futures (either a ThreadPoolExecutor or a ProcessPoolExecutor) and provides a non-blocking way of running threads and processes directly from the loop object. As such, these can be await-ed (with Python 3.5 syntax).
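A minimal sketch of that (my own example, using a placeholder blocking function) might look like:
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_work(n):
    # Placeholder for synchronous code that cannot easily be rewritten as a coroutine.
    time.sleep(n)
    return n

async def main(loop):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # The blocking calls run in worker threads; the event loop stays responsive.
        results = await asyncio.gather(
            loop.run_in_executor(pool, blocking_work, 1),
            loop.run_in_executor(pool, blocking_work, 1),
        )
    print(results)

loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))
loop.close()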
That being said, if you want to run your event loop from another thread, you can do it like this:
import asyncio
import threading

class LoopThread(threading.Thread):
    def __init__(self):
        super().__init__()
        self.loop = asyncio.new_event_loop()

    def run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def stop(self):
        self.loop.call_soon_threadsafe(self.loop.stop)
From here, you still need to devise a thread-safe way of creating tasks, etc. Some of the code in this thread is usable, although I did not have a lot of success with it: python asyncio, how to create and cancel tasks from another thread
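For example (my own sketch, building on the LoopThread class above), asyncio.run_coroutine_threadsafe() can submit a coroutine to that loop from another thread:
import asyncio

async def say(msg):
    print(msg)

# Assumes the LoopThread class sketched above is defined in the same module.
loop_thread = LoopThread()
loop_thread.start()

# Schedule a coroutine on the loop owned by the other thread and wait for its result.
future = asyncio.run_coroutine_threadsafe(say("hello from another thread"), loop_thread.loop)
future.result()

loop_thread.stop()
loop_thread.join()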

Python Notify when all files have been transferred

I am using the "watchdog" API to keep checking for changes in a folder on my filesystem. Whenever files change in that folder, I pass them to a particular function which starts a thread for each file.
But watchdog, or any other filesystem-watcher API (to my knowledge), notifies the user file by file, i.e. as the files come in. I would like it to notify me about a whole batch of files at a time so that I can pass that list to my function and make use of multi-threading. Currently, when I use "watchdog", it notifies me one file at a time and I am only able to pass that one file to my function. I want to pass it many files at a time to be able to use multithreading.
One solution that comes to mind: when you copy a bunch of files into a folder, the OS shows you a progress bar. If I could be notified when that progress bar finishes, it would be a perfect solution for my question. But I don't know if that is possible.
Also, I know that watchdog is a polling API, and that an ideal API for watching the filesystem would be an interrupt-driven one like pyinotify. But I didn't find any API which is interrupt-driven and also cross-platform. iWatch is good, but only for Linux, and I want something for all OSes. So, if you have suggestions for any other API, please let me know.
Thanks.
Instead of accumulating filesystem events, you could spawn a pool of worker threads which get tasks from a common queue. The watchdog thread could then put tasks in the queue as filesystem events occur. Done this way, a worker thread can start working as soon as an event occurs.
For example,
import logging
import Queue
import threading
import time
import watchdog.observers as observers
import watchdog.events as events

logger = logging.getLogger(__name__)

SENTINEL = None

class MyEventHandler(events.FileSystemEventHandler):
    def __init__(self, queue):
        self.queue = queue

    def on_any_event(self, event):
        super(MyEventHandler, self).on_any_event(event)
        self.queue.put(event)

def process(queue):
    while True:
        event = queue.get()
        logger.info(event)

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG,
                        format='[%(asctime)s %(threadName)s] %(message)s',
                        datefmt='%H:%M:%S')
    queue = Queue.Queue()
    num_workers = 4
    pool = [threading.Thread(target=process, args=(queue,))
            for i in range(num_workers)]
    for t in pool:
        t.daemon = True
        t.start()

    event_handler = MyEventHandler(queue)
    observer = observers.Observer()
    observer.schedule(
        event_handler,
        path='/tmp/testdir',
        recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
Running
% mkdir /tmp/testdir
% script.py
yields output like
[14:48:31 Thread-1] <FileCreatedEvent: src_path=/tmp/testdir/.#foo>
[14:48:32 Thread-2] <FileModifiedEvent: src_path=/tmp/testdir/foo>
[14:48:32 Thread-3] <FileModifiedEvent: src_path=/tmp/testdir/foo>
[14:48:32 Thread-4] <FileDeletedEvent: src_path=/tmp/testdir/.#foo>
[14:48:42 Thread-1] <FileDeletedEvent: src_path=/tmp/testdir/foo>
[14:48:47 Thread-2] <FileCreatedEvent: src_path=/tmp/testdir/.#bar>
[14:48:49 Thread-4] <FileCreatedEvent: src_path=/tmp/testdir/bar>
[14:48:49 Thread-4] <FileModifiedEvent: src_path=/tmp/testdir/bar>
[14:48:49 Thread-1] <FileDeletedEvent: src_path=/tmp/testdir/.#bar>
[14:48:54 Thread-2] <FileDeletedEvent: src_path=/tmp/testdir/bar>
Doug Hellman has written an excellent set of tutorials (which has now been edited into a book) which should help you get started:
on using Queue
the threading module
how to setup and use a pool of worker processes
how to setup a pool of worker threads
I didn't actually end up using a multiprocessing Pool or ThreadPool as discussed in the last two links, but you may find them useful anyway.
