Is `asyncio.open_connection(host, port)` blocking? - python-3.x

I am new to the asyncio library and am struggling with the behavior of asyncio.open_connection. I have created a task that contains await asyncio.open_connection(host, port). I want the call to open_connection to be blocking, that is, not to yield to the event loop until the connection is established. However, my experience suggests that it is not blocking and does yield to the loop. So I have two questions:
I want to confirm whether await asyncio.open_connection really yields to the event loop.
And if it does, what is the best way to avoid this?

Yes, it yields to the event loop.
In asyncio's source code:
async def open_connection(host=None, port=None, *,
                          limit=_DEFAULT_LIMIT, **kwds):
    """A wrapper for create_connection() returning a (reader, writer) pair.

    The reader returned is a StreamReader instance; the writer is a
    StreamWriter instance.

    The arguments are all the usual arguments to create_connection()
    except protocol_factory; most common are positional host and port,
    with various optional keyword arguments following.

    Additional optional keyword arguments are loop (to set the event loop
    instance to use) and limit (to set the buffer limit passed to the
    StreamReader).

    (If you want to customize the StreamReader and/or
    StreamReaderProtocol classes, just copy the code -- there's
    really nothing special here except some convenience.)
    """
    loop = events.get_running_loop()
    reader = StreamReader(limit=limit, loop=loop)
    protocol = StreamReaderProtocol(reader, loop=loop)
    transport, _ = await loop.create_connection(
        lambda: protocol, host, port, **kwds)
    writer = StreamWriter(transport, protocol, reader, loop)
    return reader, writer
It awaits loop.create_connection, and the docs say about loop.create_connection:
This method will try to establish the connection in the background. When successful, it returns a (transport, protocol) pair.
So it does yield control to the event loop and lets other coroutines run while the awaiting coroutine waits for the connection to be established.
If you are absolutely sure you want to block the thread running the event loop, you can use a low-level socket directly. Honestly, I am against this idea, because there are very few reasons to do so.
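To see the yielding for yourself, here is a minimal sketch (the host example.com and the delays are arbitrary choices of mine): if open_connection were blocking, the ticker task could not run before the connection was established.

import asyncio

async def ticker():
    # Runs while the connection is being established, which makes the
    # yielding visible: if open_connection blocked the thread, these
    # lines could not print until after the connect finished.
    for i in range(3):
        print(f'tick {i}')
        await asyncio.sleep(0.1)

async def main():
    tick_task = asyncio.create_task(ticker())
    # While this await is pending, the event loop is free to run ticker().
    reader, writer = await asyncio.open_connection('example.com', 80)
    print('connected')
    writer.close()
    await writer.wait_closed()
    await tick_task

asyncio.run(main())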

Related

Does a function run with loop.run_in_executor need asyncio.Lock() or threading.Lock()?

I copied the following code for my project and it's worked quite well for me but I don't really understand how the following code runs my blocking_function:
@client.event
async def on_message(message):
    loop = asyncio.get_event_loop()
    block_response = await loop.run_in_executor(ThreadPoolExecutor(), blocking_function)
where on_message is called every time I receive a message. If I receive multiple messages, they are processed asynchronously.
blocking_function is a synchronous function that I don't want to run while another blocking_function is running. Then, within blocking_function, should I use threading.Lock() or asyncio.Lock()?
As pointed out by dirn in the comment, in blocking_function you cannot use an asyncio.Lock because it's just not async. (The opposite also applies: you cannot lock a threading.Lock from an async function because attempting to do so would block the event loop.) If you need to guard data accessed by other instances of blocking_function, you should use a threading.Lock.
but I don't really understand how the following code runs my blocking_function
It hands blocking_function off to the thread pool you created. The pool queues and runs the function (which happens "in the background" from your perspective), and run_in_executor arranges for the event loop to be notified when the function is done, handing its return value back as the result of the await expression.
Note that you should use None as the first argument of run_in_executor. If you use ThreadPoolExecutor(), you create a whole new thread pool for each message, and you never dispose of it. A thread pool is normally meant to be created once, and reuse a fixed number ("pool") of threads for subsequent work. None tells asyncio to use the thread pool it creates for this purpose.
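Putting both points together, a minimal sketch might look like this (blocking_function and on_message stand in for the question's own functions; decorate on_message with @client.event as in the question):

import asyncio
import threading

lock = threading.Lock()  # guards state shared between blocking_function runs

def blocking_function():
    # This runs in a worker thread, so threading.Lock (not asyncio.Lock)
    # is the right primitive.
    with lock:
        ...  # the slow, synchronous work goes here

async def on_message(message):
    loop = asyncio.get_running_loop()
    # None selects asyncio's default thread pool, created once and reused.
    block_response = await loop.run_in_executor(None, blocking_function)
    return block_response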
It seems you can easily achieve your desired objective by ensuring a single thread is used.
A simple solution is to ensure that all calls to blocking_function run on a single thread. This is easily achieved by creating a ThreadPoolExecutor with one worker outside of the async function. Every subsequent call to the blocking function will then run on that single thread:
from concurrent.futures import ThreadPoolExecutor

thread_pool = ThreadPoolExecutor(max_workers=1)

@client.event
async def on_message(message):
    loop = asyncio.get_event_loop()
    block_response = await loop.run_in_executor(thread_pool, blocking_function)
Don't forget to shut down the executor afterwards.
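For example, at application shutdown:

thread_pool.shutdown(wait=True)  # waits for any pending blocking_function calls to finish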

Do AsyncIO stream writers/readers require manually ensuring that all data is sent/received?

When dealing with sockets, you need to make sure that all data is sent/received, since you may receive incomplete chunks of data when reading. From the docs:
In general, they return when the associated network buffers have been filled (send) or emptied (recv). They then tell you how many bytes they handled. It is your responsibility to call them again until your message has been completely dealt with.
Emphasis mine. It then shows sample implementations that ensure all data has been handled in each direction.
Is the same true though when dealing with AsyncIO wrappers over sockets?
For read, it seems to be required as the docs mention that it "[reads] up to n bytes.".
For write though, it seems like as long as you call drain afterwards, you know that it's all sent. The docs don't explicitly say that it must be called repeatedly, and write doesn't return anything.
Is this correct? Do I need to check how much was read using read, but can just drain the StreamWriter and know that everything was sent?
I thought that my above assumptions were correct, but then I had a look at the example TCP client immediately below the method docs:
import asyncio

async def tcp_echo_client(message):
    reader, writer = await asyncio.open_connection(
        '127.0.0.1', 8888)

    print(f'Send: {message!r}')
    writer.write(message.encode())

    data = await reader.read(100)
    print(f'Received: {data.decode()!r}')

    print('Close the connection')
    writer.close()

asyncio.run(tcp_echo_client('Hello World!'))
And it doesn't do any kind of checking. It assumes everything is both read and written the first time.
For read, [checking for incomplete read] seems to be required as the docs mention that it "[reads] up to n bytes.".
Correct, and this is a useful feature for many kinds of processing, as it allows you to read new data as it arrives from the peer and process it incrementally, without having to know how much to expect at any point. If you do know exactly how much you expect and need to read exactly that many bytes, you can use readexactly, as sketched below.
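To make the contrast concrete, here is a hypothetical helper (read_n_bytes is my name for it) showing the manual loop that readexactly saves you from writing:

import asyncio

async def read_n_bytes(reader: asyncio.StreamReader, n: int) -> bytes:
    # Roughly what readexactly() does for you: keep calling read()
    # until n bytes have accumulated or the peer closes.
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = await reader.read(remaining)
        if not chunk:  # b'' signals EOF: the peer closed the connection
            raise asyncio.IncompleteReadError(b''.join(chunks), n)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

# With the built-in helper the loop disappears:
#     data = await reader.readexactly(n)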
For write though, it seems like as long as you call drain afterwards, you know that it's all sent. The docs don't explicitly say that it must be called repeatedly, and write doesn't return anything.
This is partially correct. Yes, asyncio will automatically keep writing the data you give it in the background until all is written, so you don't need to (nor can you) ensure it by checking the return value of write.
However, a sequence of stream.write(data); await stream.drain() will not pause the coroutine until all data has been transmitted to the OS. This is because drain doesn't wait for all data to be written, it only waits until it hits a "low watermark", trying to ensure (misguidedly according to some) that the buffer never becomes empty as long as there are new writes. As far as I know, in current asyncio there is no way to wait until all data has been sent - except for manually tweaking the watermarks, which is inconvenient and which the documentation warns against. The same applies to awaiting the return value of write() introduced in Python 3.8.
This is not as bad as it sounds simply because a successful write itself doesn't guarantee that the data was actually transmitted to, let alone received by the peer - it could be languishing in the socket buffer, or in network equipment along the way. But as long as you can rely on the system to send out the data you gave it as fast as possible, you don't really care whether some of it is in an asyncio buffer or in a kernel buffer. (But you still need to await drain() to ensure backpressure.)
The one time you do care is when you are about to exit the program or the event loop; in that case, a portion of the data being stuck in an asyncio buffer means that the peer will never see it. This is why, starting with 3.7, asyncio provides a wait_closed() method which you can await after calling close() to ensure that all the data has been sent. One could imagine a flush() method that does the same, but without having to actually close the socket (analogous to the method of the same name on file objects, and with equivalent semantics), but currently there are no plans to add it.
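A short sketch of that shutdown sequence (send_all_and_close is a hypothetical helper name):

import asyncio

async def send_all_and_close(writer: asyncio.StreamWriter, payload: bytes):
    writer.write(payload)
    await writer.drain()        # backpressure; may return before the
                                # asyncio buffer is fully flushed
    writer.close()
    await writer.wait_closed()  # Python 3.7+: returns once buffered data
                                # has been handed off and the socket closed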

Python 3 - multiple AsyncIO connections

I am trying to learn how to use AsyncIO in Python 3.7 and I am still a little confused by its principles.
My goal is to write a simple chat program, however I need to use a ring network topology -- one node only knows about its two neighbours. When the message is sent, it is passed by the nodes until it reaches the sender again. This means that each node is basically a client and a server at the same time.
I also need to be able to detect dead nodes, so that my ring does not break.
I thought it might be a good solution for each node to have a separate connection for every neighbour -- successor and predecessor.
class Node:
    ...
    def run(self):
        ...
        s = loop.create_connection(lambda: Client(...), addr1, port1)
        p = loop.create_server(lambda: Server(...), addr2, port2)
        successor = loop.run_until_complete(s)
        predecessor = loop.run_until_complete(p)
        loop.run_forever()
        ...
    ...
Server and Client are classes that implement asyncio.Protocol.
The reason I wanted to do it this way is that, if there is a message being sent through the circle, it is always sent from the predecessor to the successor. In the connection_lost method of the predecessor connection, I can detect that it has disconnected and send its predecessor a message (through the whole ring) telling it to connect to me.
I would like to be able to send a message that I received from my predecessor further on to my successor. I would also like to be able to send a message with my address to my successor in case my predecessor dies (this message would be sent from predecessor's Server.connection_lost() and would be passed all the way to my dead predecessor's predecessor).
My question is: Can I pass the received data from predecessor to successor? If not, what would be a better implementation of this program that uses AsyncIO and the ring topology?
For anyone new to AsyncIO having the same problem, I found the solution myself.
First of all, it is better to use the high-level parts of AsyncIO -- streams. Calling loop.create_connection and loop.create_server is considered low-level (which I had misunderstood at first).
The high-level alternative to create_connection is asyncio.open_connection, which supplies you with a tuple of an asyncio.StreamReader and an asyncio.StreamWriter that you can use to read from and write to the open connection. You can also detect the loss of the connection, either when the data read from the StreamReader equals b'' or when you catch a ConnectionError while writing to the StreamWriter.
The high-level alternative to create_server is asyncio.start_server, which must be supplied a callback function that is called every time a connection to the server is made (a connection opened, data received...). The callback takes a StreamReader and a StreamWriter as arguments. The loss of the connection can also be detected by reading b'' or catching a ConnectionError when writing to the writer.
Multiple connections can be handled by coroutines. There can be a coroutine for the server part (which accepts the connection from one of the neighbors in the ring topology) and a coroutine for the client part (which opens a connection to the other neighbor in the ring). The Node class can look like this:
import asyncio

class Node:
    ...
    async def run(self):
        ...
        self.next_reader, self.next_writer = await asyncio.open_connection(self.next_IP, self.next_port)
        server_coro = asyncio.create_task(self.server_init())
        client_coro = asyncio.create_task(self.client_method())
        await client_coro
        await server_coro
        ...

    async def server_init(self):
        server = await asyncio.start_server(self.server_callback, self.IP, self.port)
        async with server:
            await server.serve_forever()

    async def client_method(self):
        ...
        try:
            data = await self.next_reader.read()
        except ConnectionError:
            ...
    ...
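The server_callback itself is not shown above. As a sketch, it could forward everything received from the predecessor on to the successor; the attribute names follow the listing, but the forwarding logic and the 4096-byte buffer size are my assumptions:

import asyncio

class Node:
    async def server_callback(self, reader, writer):
        # reader/writer talk to the predecessor; forward each message
        # on to the successor.
        while True:
            data = await reader.read(4096)
            if not data:  # b'' means the predecessor disconnected
                break
            try:
                self.next_writer.write(data)
                await self.next_writer.drain()
            except ConnectionError:
                break  # the successor is gone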
Note that I am using asyncio.create_task for the coroutines and (not shown in the code listing) asyncio.run(node.run()), which are considered the high-level alternatives to asyncio.ensure_future() and loop.run_forever(). Both were added in Python 3.7, and asyncio.run() is said to be provisional, so by the time you read this it might already have been replaced by something else.
I'm not an AsyncIO expert, so there might be a better, cleaner way to do this (if you know it, please share it).

Infinite loop or "recursive" in Asyncio

I'm new to Python3 asyncio.
I have a function that constantly retrieves messages from a websocket connection.
I'm wondering whether I should use a while True loop or asyncio.ensure_future in a recursive manner.
Which is preferred or does it not matter?
Example:
async def foo(websocket):
    while True:
        msg = await websocket.recv()
        print(msg)
        await asyncio.sleep(0.0001)
or
async def foo(websocket):
    msg = await websocket.recv()
    print(msg)
    await asyncio.sleep(0.0001)
    asyncio.ensure_future(foo(websocket))
I would recommend the iterative variant, for two reasons:
It is easier to understand and extend. One of the benefits of coroutines compared to callback-based futures is that they permit the use of familiar control structures like if and while to model the code's execution. If you wanted to change your code to e.g. add an outer loop around the existing one (sketched after this list), or to add some code (e.g. another loop) after the loop, that would be considerably easier in the non-recursive version.
It is more efficient. Calling asyncio.ensure_future(foo(websocket)) instantiates both a new coroutine object and a brand new task for each new iteration. While neither are particularly heavy-weight, all else being equal, it is better to avoid unnecessary allocation.
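As an illustration of the first point, here is a sketch of the iterative version extended with reconnect logic; it assumes the third-party websockets library (any client with the same recv-style API would work the same way):

import asyncio
import websockets  # assumed library; the question's websocket object
                   # presumably comes from something like it

async def foo(url):
    # Adding reconnect behavior to the iterative version is just an
    # outer while; the recursive version would need restructuring.
    while True:
        try:
            async with websockets.connect(url) as websocket:
                while True:
                    msg = await websocket.recv()
                    print(msg)
        except (OSError, websockets.ConnectionClosed):
            await asyncio.sleep(1)  # back off, then try again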

python asyncio Transport asynchronous methods vs coroutines

I'm new to asyncio and I started working with Transports to create a simple server-client program.
on the asyncio page I see the following:
Transport.close() can be called immediately after WriteTransport.write() even if data are not sent yet on the socket: both methods are asynchronous. yield from is not needed because these transport methods are not coroutines.
I searched the web (including stackoverflow) but couldn't find a good answer to the following question: what are the major differences between an asynchronous method and a coroutine?
The only two differences I can identify are:
in coroutines you have more fine-grained control over the order in which the event loop executes methods, via the yield from expression.
coroutines are generators, and hence more memory efficient.
anything else I am missing?
Thank you.
In this context, asynchronous means that both .write() and .close() are regular methods, not coroutines.
If .write() cannot write the data immediately, it stores it in an internal buffer.
.close() never closes the connection immediately; it schedules the socket to be closed after the internal buffer has been flushed.
So
transp.write(b'data')
transp.write(b'another data')
transp.close()
is safe and perfectly correct code.
Also, .write() and .close() are obviously not coroutines.
A coroutine must be called via a yield from expression, e.g. yield from coro().
But these methods are conventional functions, so call them without yield from, as shown in the example above.
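A minimal sketch of the distinction inside a protocol (EchoClient is a hypothetical name): the transport methods are plain calls, while a coroutine only runs when driven with yield from (or await on modern Python):

import asyncio

class EchoClient(asyncio.Protocol):
    def connection_made(self, transport):
        # Regular method calls: nothing to await. write() buffers the
        # data and close() schedules the close once the buffer drains.
        transport.write(b'data')
        transport.write(b'another data')
        transport.close()

# A coroutine, by contrast, does nothing until it is driven:
#     result = yield from coro()   # the pre-3.5 syntax quoted above
#     result = await coro()        # the modern equivalent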
