Python 3 - multiple AsyncIO connections

I am trying to learn how to use AsyncIO in Python 3.7 and I am still a little confused by its principles.
My goal is to write a simple chat program, however I need to use a ring network topology -- one node only knows about its two neighbours. When the message is sent, it is passed by the nodes until it reaches the sender again. This means that each node is basically a client and a server at the same time.
I also need to be able to detect dead nodes, so that my ring does not break.
I thought it might be a good solution for each node to have a separate connection for every neighbour -- successor and predecessor.
class Node:
    ...
    def run(self):
        ...
        s = loop.create_connection(lambda: Client(...), addr1, port1)
        p = loop.create_server(lambda: Server(...), addr2, port2)
        successor = loop.run_until_complete(s)
        predecessor = loop.run_until_complete(p)
        loop.run_forever()
        ...
    ...
Server and Client are classes that implement asyncio.Protocol.
The reason I wanted to do it this way is that when a message is sent around the ring, it always travels from the predecessor to the successor. In the connection_lost method for the predecessor's connection I can detect that it has disconnected and send its predecessor a message (through the whole ring) to connect to me.
I would like to be able to send a message that I received from my predecessor further on to my successor. I would also like to be able to send a message with my address to my successor in case my predecessor dies (this message would be sent from predecessor's Server.connection_lost() and would be passed all the way to my dead predecessor's predecessor).
My question is: Can I pass the received data from predecessor to successor? If not, what would be a better implementation of this program that uses AsyncIO and the ring topology?

For anyone new to AsyncIO who has the same problem: I found the solution myself.
First of all, it is better to use the high-level parts of AsyncIO -- streams. Calling loop.create_connection and loop.create_server is considered low-level (which I had misunderstood at first).
The high-level alternative to create_connection is asyncio.open_connection, which supplies you with a tuple of asyncio.StreamReader and asyncio.StreamWriter that you can use to read from and write to the open connection. You can also detect the loss of the connection when the data read from the StreamReader equals b'' or when you catch an exception (ConnectionError) while trying to write to the StreamWriter.
The high-level alternative to create_server is asyncio.start_server, which needs to be supplied a callback function that will be called every time a connection to the server is made (connection opened, data received...). The callback takes a StreamReader and a StreamWriter as arguments. The loss of the connection can likewise be detected by reading b'' or catching ConnectionError on writing to the writer.
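For illustration, a minimal sketch of such a callback might look like this (the buffer size and the forwarding step are just placeholders, not part of the original program):

async def server_callback(reader, writer):
    # Called by asyncio.start_server for every new incoming connection.
    while True:
        data = await reader.read(1024)
        if data == b'':
            # An empty read means the peer closed the connection.
            break
        # ... forward the received data to the successor here ...
    writer.close()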
Multiple connections can be handled by coroutines. There can be a coroutine for the server part (which accepts the connection from one of the neighbors in the ring topology) and a coroutine for the client part (which opens a connection to the other neighbor in the ring). The Node class can look like this:
import asyncio

class Node:
    ...
    async def run(self):
        ...
        # Client part: open a connection to the next node in the ring.
        self.next_reader, self.next_writer = await asyncio.open_connection(self.next_IP, self.next_port)
        server_coro = asyncio.create_task(self.server_init())
        client_coro = asyncio.create_task(self.client_method())
        await client_coro
        await server_coro
        ...

    async def server_init(self):
        # Server part: accept the connection from the other neighbour.
        server = await asyncio.start_server(self.server_callback, self.IP, self.port)
        async with server:
            await server.serve_forever()

    async def client_method(self):
        ...
        try:
            data = await self.next_reader.read()
        except ConnectionError:
            ...
        ...
Note that I am using asyncio.create_task for the coroutines and (not shown in the code listing) asyncio.run(node.run()), which are considered the high-level alternatives to asyncio.ensure_future() and loop.run_forever(). Both were added in Python 3.7 and asyncio.run() is said to be provisional, so by the time you read this it might already have been replaced by something else.
I'm not an AsyncIO expert, so there might be a better, cleaner way to do this (if you know it, please share it).

Related

Is `asyncio.open_connection(host, port)` blocking?

I am new to the asyncio library and am struggling with the behaviour of asyncio.open_connection. I have created a task that has await asyncio.open_connection(host, port) within it. I want the call to open_connection to be blocking, that is, not to yield to the event loop until the connection is established. However, my experience suggests that it is not blocking and yields to the loop. So here I have two questions:
Does await asyncio.open_connection really yield to the event loop?
And if yes, what is the best way to avoid this?
Yes, it yields to the event loop.
In asyncio's source code:
async def open_connection(host=None, port=None, *,
                          limit=_DEFAULT_LIMIT, **kwds):
    """A wrapper for create_connection() returning a (reader, writer) pair.

    The reader returned is a StreamReader instance; the writer is a
    StreamWriter instance.

    The arguments are all the usual arguments to create_connection()
    except protocol_factory; most common are positional host and port,
    with various optional keyword arguments following.

    Additional optional keyword arguments are loop (to set the event loop
    instance to use) and limit (to set the buffer limit passed to the
    StreamReader).

    (If you want to customize the StreamReader and/or
    StreamReaderProtocol classes, just copy the code -- there's
    really nothing special here except some convenience.)
    """
    loop = events.get_running_loop()
    reader = StreamReader(limit=limit, loop=loop)
    protocol = StreamReaderProtocol(reader, loop=loop)
    transport, _ = await loop.create_connection(
        lambda: protocol, host, port, **kwds)
    writer = StreamWriter(transport, protocol, reader, loop)
    return reader, writer
It calls loop.create_connection, which is itself an await call. About loop.create_connection, the docs say:
This method will try to establish the connection in the background. When successful, it returns a (transport, protocol) pair.
So it yields control to the event loop and lets other coroutines run, while the awaiting coroutine waits for the connection to be established.
If you are absolutely sure you want to block the thread that is running the event loop, you can just use a low-level socket directly. Honestly, I am against this idea because there are very few reasons to do so.
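To observe the yielding behaviour yourself, a small sketch like the following can help (the host, port and timings are just placeholders): while one task awaits open_connection, another task keeps running.

import asyncio

async def connect():
    # While this await is pending, control goes back to the event loop.
    reader, writer = await asyncio.open_connection('example.com', 80)
    print('connected')
    writer.close()
    await writer.wait_closed()

async def heartbeat():
    # Runs interleaved with connect(), showing that the loop is not blocked.
    for _ in range(3):
        print('event loop is still running')
        await asyncio.sleep(0.1)

async def main():
    await asyncio.gather(connect(), heartbeat())

asyncio.run(main())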

ValueError when asyncio.run() is called in separate thread

I have a network application which is listening on multiple sockets.
To handle each socket individually, I use Python's threading.Thread class.
These sockets must be able to run tasks on packet reception without delaying any further packet reception from the socket handling thread.
To do so, I've declared the method(s) that are running the previously mentioned tasks with the keyword async so I can run them asynchronously with asyncio.run(my_async_task(my_parameters)).
I have tested this approach on a single socket (running on the main thread) with great success.
But when I use multiple sockets (each one with its own independent handler thread), the following exception is raised:
ValueError: set_wakeup_fd only works in main thread
My question is the following: Is asyncio the appropriate tool for what I need? If it is, how do I run an async method from a thread that is not the main thread?
Most of my search results mention "event loops" and "awaiting" async results, which (if I understand them correctly) is not what I am looking for.
I am talking about sockets in this question to provide context but my problem is mostly about the behaviour of asyncio in child threads.
I can, if needed, write a short code sample to reproduce the error.
Thank you for the help!
Edit 1: here is a minimal reproducible code example:
import asyncio
import threading
import time

# Handle a specific packet from any socket without interrupting the listening thread
async def handle_it(val):
    print("handled: {}".format(val))

# A class to simulate a threaded socket listener
class MyFakeSocket(threading.Thread):
    def __init__(self, val):
        threading.Thread.__init__(self)
        self.val = val  # Value for a fake received packet

    def run(self):
        for i in range(10):
            # The (fake) socket will sequentially receive [val, val+1, ... val+9]
            asyncio.run(handle_it(self.val + i))
            time.sleep(0.5)

# Entry point
sockets = MyFakeSocket(0), MyFakeSocket(10)
for socket in sockets:
    socket.start()
This is possibly related to the bug discussed here: https://bugs.python.org/issue34679
If so, this would be a problem with Python 3.8 on Windows. To work around it, you could try downgrading to Python 3.7, or get and run the event loop manually instead of relying on asyncio.run(), like:
loop = asyncio.get_event_loop()
loop.run_until_complete(<your tasks>)
loop.close()
Otherwise, would you be able to run the code in a Docker container? This might work for you and would then be detached from the OS behaviour, but it is a lot more work!
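Another pattern that is often used for this (a hedged sketch, reusing the names from the question) is to give each thread its own event loop instead of calling asyncio.run() from a non-main thread:

import asyncio
import threading
import time

async def handle_it(val):
    print("handled: {}".format(val))

class MyFakeSocket(threading.Thread):
    def __init__(self, val):
        threading.Thread.__init__(self)
        self.val = val

    def run(self):
        # Create and register an event loop that belongs to this thread only.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        try:
            for i in range(10):
                loop.run_until_complete(handle_it(self.val + i))
                time.sleep(0.5)
        finally:
            loop.close()

for fake_socket in (MyFakeSocket(0), MyFakeSocket(10)):
    fake_socket.start()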

Do AsyncIO stream writers/readers require manually ensuring that all data is sent/received?

When dealing with sockets, you need to make sure that all data is sent/received, since you may receive incomplete chunks of data when reading. From the docs:
In general, they return when the associated network buffers have been filled (send) or emptied (recv). They then tell you how many bytes they handled. It is your responsibility to call them again until your message has been completely dealt with.
Emphasis mine. It then shows sample implementations that ensure all data has been handled in each direction.
Is the same true though when dealing with AsyncIO wrappers over sockets?
For read, it seems to be required as the docs mention that it "[reads] up to n bytes.".
For write though, it seems like as long as you call drain afterwards, you know that it's all sent. The docs don't explicitly say that it must be called repeatedly, and write doesn't return anything.
Is this correct? Do I need to check how much was read using read, but can just drain the StreamWriter and know that everything was sent?
I thought that my above assumptions were correct, then I had a look at the example TCP Client immediately below the method docs:
import asyncio

async def tcp_echo_client(message):
    reader, writer = await asyncio.open_connection(
        '127.0.0.1', 8888)

    print(f'Send: {message!r}')
    writer.write(message.encode())

    data = await reader.read(100)
    print(f'Received: {data.decode()!r}')

    print('Close the connection')
    writer.close()

asyncio.run(tcp_echo_client('Hello World!'))
And it doesn't do any kind of checking. It assumes everything is both read and written the first time.
For read, [checking for incomplete read] seems to be required as the docs mention that it "[reads] up to n bytes.".
Correct, and this is a useful feature for many kinds of processing, as it allows you to read new data as it arrives from the peer and process it incrementally, without having to know how much to expect at any point. If you do know exactly how much you expect and need to read that amount of bytes, you can use readexactly.
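For example, a small sketch of reading a length-prefixed message with readexactly (the 4-byte big-endian length prefix is an assumed framing, not something asyncio mandates):

import struct

async def read_message(reader):
    # Assumed framing: a 4-byte big-endian length prefix, then the payload.
    header = await reader.readexactly(4)
    (length,) = struct.unpack('>I', header)
    return await reader.readexactly(length)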
For write though, it seems like as long as you call drain afterwards, you know that it's all sent. The docs don't explicitly say that it must be called repeatedly, and write doesn't return anything.
This is partially correct. Yes, asyncio will automatically keep writing the data you give it in the background until all is written, so you don't need to (nor can you) ensure it by checking the return value of write.
However, a sequence of stream.write(data); await stream.drain() will not pause the coroutine until all data has been transmitted to the OS. This is because drain doesn't wait for all data to be written, it only waits until it hits a "low watermark", trying to ensure (misguidedly according to some) that the buffer never becomes empty as long as there are new writes. As far as I know, in current asyncio there is no way to wait until all data has been sent - except for manually tweaking the watermarks, which is inconvenient and which the documentation warns against. The same applies to awaiting the return value of write() introduced in Python 3.8.
This is not as bad as it sounds simply because a successful write itself doesn't guarantee that the data was actually transmitted to, let alone received by the peer - it could be languishing in the socket buffer, or in network equipment along the way. But as long as you can rely on the system to send out the data you gave it as fast as possible, you don't really care whether some of it is in an asyncio buffer or in a kernel buffer. (But you still need to await drain() to ensure backpressure.)
The one time you do care is when you are about to exit the program or the event loop; in that case, a portion of the data being stuck in an asyncio buffer means that the peer will never see it. This is why, starting with 3.7, asyncio provides a wait_closed() method which you can await after calling close() to ensure that all the data has been sent. One could imagine a flush() method that does the same, but without having to actually close the socket (analogous to the method of the same name on file objects, and with equivalent semantics), but currently there are no plans to add it.
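Put together, a minimal sketch of the shutdown sequence described above could look like this:

async def send_and_close(writer, data):
    writer.write(data)
    await writer.drain()        # apply backpressure; does not guarantee full transmission
    writer.close()              # begin closing the transport
    await writer.wait_closed()  # 3.7+: returns once the stream is fully closed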

Unable to run ZMQStream with Tornado Event loop using python 3.7

I've been trying to set up a server / client using the zmq event loop for REQ / REP messaging. Since Python 3 doesn't support the event loop provided by zmq, I'm trying to run it with tornado's event loop.
I'm facing issues running ZMQStream with tornado's event loop using Python 3.
I created the server / client code using zmq's ZMQStream and tornado's event loop. The client is sending the correct messages, but the server doesn't seem to be responding to the requests.
The server side code:
from tornado import ioloop
import zmq

def echo(stream, msg):
    stream.send_pyobj(msg)

ctx = zmq.Context()
socket = ctx.socket(zmq.REP)
socket.bind('tcp://127.0.0.1:5678')
stream = ZMQStream(socket)
stream.on_recv(echo)
ioloop.IOLoop.current().start()
The client side code:
import zmq

context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://127.0.0.1:5678")

for request in range(1, 10):
    print("Sending request ", request, "...")
    socket.send_string("Hello")

    # Get the reply.
    message = socket.recv_pyobj()
    print("Received reply ", request, "[", message, "]")
I was expecting the server to echo back the request messages sent by the client, but it is just not responding to them.
Q : server doesn't seem to be responding
Step 0:
One server-side SLOC, stream = ZMQStream( socket ), calls a name that is never imported in the MCVE, so it must (and does) fail to execute and yield any result: "ZMQStream" in dir() confirms this with False.
Remedy:
Repair the MCVE (import ZMQStream) and also confirm it with print( zmq.zmq_version() ) and "ZMQStream" in dir().
Step 1:
Always prevent infinite deadlocks, unless a due reason exists not to, by setting <aSocket>.setsockopt( zmq.LINGER, 0 ) prior to the respective .bind() or .connect(). Applications that hang forever and resources that are never released (yes, you read it correctly, infinitely blocked) are not welcome in distributed computing systems.
Step 2:
Avoid the blind distributed mutual deadlock that REQ/REP is always prone to run into. It will happen, one just never knows when. You may read heaps of details about this on Stack Overflow.
And the remedy? Where possible, avoid the blocking forms of .recv() (fair .poll()-s are smarter design-wise and resources-wise), and consider additional sender-side signalling before "throwing" either side into an infinitely blocking .recv(). Even then, a network delivery failure or another cause of a silent message drop can make the soft signalling flag a send that never resulted in a receive, and the hard-wired REQ/REP behaviour then moves both sides into waiting for one another: each waits to .recv() a message that the counterparty will never send, because it is itself still waiting to receive one from the (still listening) opposite side.
Last, but not least:
The ZeroMQ Zen-of-Zero also comes with a Zero-Warranty: messages are either delivered in full (error-free) or not delivered at all. The REQ/REP mutual deadlocks are best resolved by never falling into them in the first place (ref. LINGER and poll() above).
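For completeness, a minimal sketch of a repaired server side might look like this (it assumes pyzmq's zmq.eventloop.zmqstream module and keeps the question's names; note that the on_recv callback receives a list of frames):

from tornado import ioloop
import zmq
from zmq.eventloop.zmqstream import ZMQStream  # the import missing from the MCVE

ctx = zmq.Context()
socket = ctx.socket(zmq.REP)
socket.setsockopt(zmq.LINGER, 0)  # avoid hanging forever on shutdown
socket.bind('tcp://127.0.0.1:5678')

stream = ZMQStream(socket)

def echo(msg):
    # on_recv delivers a list of frames; reply with a pickled object so the
    # client's recv_pyobj() can decode it.
    stream.send_pyobj(msg[0].decode())

stream.on_recv(echo)
ioloop.IOLoop.current().start()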

pyzmq / ZeroMQ: Why could manual reading of a multipart message lead to problems in combination with AsyncIO?

I am reading data from a ZeroMQ socket using pyzmq. The code also uses websockets, so I chose to use asyncio for ZeroMQ on the subscriber side, too.
The subscriber has to read a multipart message. Instead of using socket.recv_multipart(), I chose to make use of pyzmq's convenience functions for receiving and sending:
import zmq
import zmq.asyncio
context = zmq.asyncio.Context()
socket = context.socket(zmq.SUB)
address = await socket.recv_string()
print(socket.RCVMORE)
clock = await socket.recv_pyobj()
print(socket.RCVMORE)
data = await socket.recv_pyobj()
The publisher also uses the convenience functions, but does not use asyncio:
socket.send_string('raw', zmq.SNDMORE)
socket.send_pyobj(time.time(), zmq.SNDMORE)
socket.send_pyobj(d.data)
Usually the code works fine, but sometimes it fails on the subscriber side with exceptions like
EOFError: Ran out of input
or
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
I am quite sure this has to do with the piecewise reading in combination with asyncio and could be solved by using await socket.recv_multipart(). But I am trying to get a deeper understanding of the ZeroMQ protocol and asyncio, and would like to know why exactly it could fail.
Any ideas?
The only thread that might be related to this does not provide a good answer as to why this could happen.
Edit: I came across this in the docs, but I am not sure it is related:
Multi-part messages
A ØMQ message is composed of 1 or more message parts. Each message
part is an independent zmq_msg_t in its own right. ØMQ ensures atomic
delivery of messages: peers shall receive either all message parts of
a message or none at all.
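For comparison, a hedged sketch of the recv_multipart() alternative mentioned above (send_pyobj pickles its payload, so each frame is deserialized by hand here):

import pickle

async def read_parts(socket):
    # Receive all three parts of the multipart message atomically.
    address_frame, clock_frame, data_frame = await socket.recv_multipart()
    address = address_frame.decode()
    clock = pickle.loads(clock_frame)
    data = pickle.loads(data_frame)
    return address, clock, data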
