Creating a non-blocking RESTful service using aiohttp

I have tried the following code in Python 3.6 for asyncio:
Example 1:
import asyncio
import time

async def hello():
    print('hello')
    await asyncio.sleep(1)
    print('hello again')

tasks = [hello(), hello()]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
Output is as expected:
hello
hello
hello again
hello again
Then I want to change the asyncio.sleep into another def:
async def sleep():
    time.sleep(1)

async def hello():
    print('hello')
    await sleep()
    print('hello again')

tasks = [hello(), hello()]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
Output:
hello
hello again
hello
hello again
It seems it is not running asynchronously, but in normal synchronous mode.
The question is: why is it not running asynchronously, and how can I change the old sync module into an 'async' one?

Asyncio uses an event loop, which selects what task (an independent call chain of coroutines) in the queue to activate next. The event loop can make intelligent decisions as to what task is ready to do actual work. This is why the event loop also is responsible for creating connections and watching file descriptors and other I/O primitives; it gives the event loop insight into when there are I/O operations in progress or when results are available to process.
Whenever you use await, there is an opportunity to return control to the loop which can then pass control to another task. Which task then is picked for execution depends on the exact implementation; the asyncio reference implementation offers multiple choices, but there are other implementations, such as the very, very efficient uvloop implementation.
Your sample is still asynchronous. It just so happens that by replacing the await asyncio.sleep() with a synchronous time.sleep() call, inside a new coroutine function, you introduced 2 coroutines into the task call chain that don't yield, and thus influenced in what order they are executed. That they are executed in what appears to be synchronous order is a coincidence. If you switched event loops, or introduced more coroutines (especially some that use I/O), the order can easily be different again.
Moreover, your new coroutines use time.sleep(); this makes your coroutines uncooperative. The event loop is not notified that your code is waiting (time.sleep() will not yield!), so no other coroutine can be executed while time.sleep() is running. time.sleep() simply doesn't return or let any other code run until the requested amount of time has passed. Contrast this with the asyncio.sleep() implementation, which simply yields to the event loop with a call_later() hook; the event loop now knows that that task won't need any attention until a later time.
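The cooperative version of your sleep() wrapper follows directly from this: await the asyncio primitive instead of calling the blocking one.

async def sleep():
    # await hands control back to the event loop, which can then run
    # the other hello() task while this one waits
    await asyncio.sleep(1)

With this one-line change, the two hello() tasks interleave again, because each sleep() yields to the loop.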
Also see asyncio: why isn't it non-blocking by default for a more in-depth discussion of how tasks and the event loop interact. And if you must run blocking, synchronous code that can't be made to cooperate, then use an executor pool to have the blocking code executed in a separate thread or child process to free up the event loop for other, better behaved tasks.
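For illustration, a minimal sketch of that executor approach applied to the example above (the wrapping of time.sleep is an assumption for demonstration, not part of the original code):

import asyncio
import time

async def hello():
    print('hello')
    loop = asyncio.get_event_loop()
    # run the blocking sleep in a worker thread; None means "use asyncio's
    # default thread pool". The await frees the event loop meanwhile.
    await loop.run_in_executor(None, time.sleep, 1)
    print('hello again')

This restores the interleaved output, because the event loop stays free while the worker thread blocks.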

Related

How to access data objects from outside of infinitely running asyncio event loop?

Currently working with a server simulator that is initialized via a call to asyncio.run().
After calling await on a number of coroutines, it is designed to run as an infinite loop that serves continuously.
One of the coroutines that inits the server creates a ModbusTcpServer object that I need to access from the top level (the file that initially makes the call to asyncio.run()). How can I access this data if the coroutines are designed to run infinitely and never return? I'd ultimately like to get the data so I can change values in a GUI such as PyQt5. Here is a sample of my code:
def start_server():
    cmd_args = get_commandline()
    run_args = setup_simulator(cmd_args)
    asyncio.run(run_server_simulator(run_args), debug=True)

async def run_server_simulator(args):
    await StartAsyncTcpServer(
        context=args.context,
        address=("", args.port),
        framer=args.framer,
        allow_reuse_address=True,
    )
The StartAsyncTcpServer() function creates the object that I want to modify and then awaits a coroutine that creates a task that runs forever.
I've tried manually creating an event loop using
asyncio.new_event_loop()
yet this still doesn't provide me access to the data from within the coroutines, only the loop itself. Not only that, but doing this gives me an error that my first coroutine was never awaited.
I've also attempted initializing the first of my coroutines via asyncio.create_task(), and calling await on the task object but awaiting them never provides return values because the coroutines never finish.
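One pattern that fits this situation is to publish the object into a shared container before the serve-forever await, then read it from a sibling task. This is only a sketch under assumptions: ServerHandle, make_server, and serve_forever are hypothetical stand-ins, not pymodbus API.

import asyncio

class ServerHandle:
    # hypothetical holder that lets the caller see the server object
    def __init__(self):
        self.server = None
        self.ready = asyncio.Event()

async def run_server_simulator(args, handle):
    handle.server = make_server(args)    # hypothetical setup step
    handle.ready.set()                   # tell the caller the object exists
    await handle.server.serve_forever()  # only now block indefinitely

async def main(args):
    handle = ServerHandle()
    server_task = asyncio.create_task(run_server_simulator(args, handle))
    await handle.ready.wait()  # resumes as soon as the object is published
    print(handle.server)       # the data is accessible while serving continues
    await server_task

For a GUI like PyQt5 you would typically run the asyncio loop in a background thread (or use an integration library) and read the handle from the GUI thread.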

How can I sleep() in parallel inside an asyncio task if the parent function isn't async?

CODE:
class App:
    def __init__(self):
        # some of the code
        ...
        ...
        xxx.add_handler(self.event_handler, event_xyz)
        asyncio.create_task(self.keep_alive())
        xxx.run_until_disconnected()

    def keep_alive(self):
        # stuff to keep connection alive
        ...
        ...
        time.sleep(5)      # this will block whole script
        asyncio.sleep(5)   # this won't work because of lack of async on __init__ and keep_alive

    async def event_handler(self):
        await stuff

    # other functions

if __name__ == '__main__':
    App()
The part of the code that keeps the connection alive has api limits. So, I need to have the sleep statement inside keep_alive() function.
I understand that the design of the code could be completely changed to make it work, but it is a big script and everything else is working perfectly, so it's preferable if this could be made to work.
I'm open to using anything else like threads as long as rest of the code isn't getting blocked during the sleep.
This is a straightforward situation. time.sleep will block the current thread, including the asyncio event loop for that thread (if there is one). Period. Case closed.
If your API requires you to have time.sleep calls, and your program must do something while the current thread is sleeping, then asyncio is not the solution. That doesn't mean that asyncio cannot be used for other threads or other purposes within your program design, but it absolutely can't run other tasks in the current thread during a time.sleep interval.
Regarding the function keep_alive in your code snippet: this function cannot be made into a task because it's not declared as "async def." Calling asyncio.sleep() from inside this type of regular function is an error; it must always be "awaited," and the "await" keyword must be inside an async def function. On the other hand, calling time.sleep inside an async def function is not an error and the function will work as expected. But it's probably not something you want to do.
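If the surrounding framework does run an asyncio event loop (as the asyncio.create_task call in __init__ suggests), the smallest fix is to make keep_alive a coroutine. A sketch under that assumption:

import asyncio

class App:
    def __init__(self):
        ...
        asyncio.create_task(self.keep_alive())  # valid once keep_alive is async
        ...

    async def keep_alive(self):
        while True:
            # ... keep-alive work, respecting the API limits ...
            await asyncio.sleep(5)  # suspends only this task, not the loop

If keep_alive really cannot be made async, running it via threading.Thread(target=self.keep_alive, daemon=True).start() lets time.sleep(5) block only that worker thread while the rest of the script continues.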

Do functions run with loop.run_in_executor need asyncio.Lock() or threading.Lock()?

I copied the following code for my project and it's worked quite well for me but I don't really understand how the following code runs my blocking_function:
@client.event
async def on_message(message):
    loop = asyncio.get_event_loop()
    block_response = await loop.run_in_executor(ThreadPoolExecutor(), blocking_function)
where on_message is called every time I receive a message. If I receive multiple messages, they are processed asynchronously.
blocking_function is a synchronous function that I don't want to run while another blocking_function is running. Then, within blocking_function, should I use threading.Lock() or asyncio.Lock()?
As pointed out by dirn in the comment, in blocking_function you cannot use an asyncio.Lock because it's just not async. (The opposite also applies: you cannot lock a threading.Lock from an async function because attempting to do so would block the event loop.) If you need to guard data accessed by other instances of blocking_function, you should use a threading.Lock.
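A sketch of that advice (blocking_function and its body are stand-ins from the question):

import threading

blocking_lock = threading.Lock()  # module-level, shared by all worker threads

def blocking_function():
    with blocking_lock:
        # only one thread runs this section at a time; others wait here
        ...  # the actual synchronous work

Because the lock is only ever taken by code running on executor threads, waiting on it never stalls the event loop itself.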
but I don't really understand how the following code runs my blocking_function
It hands off blocking_function to the thread pool you created to run it. The thread pool queues and runs the function (which happens "in the background" from your perspective), and run_in_executor arranges for the event loop to be notified when the function is done, handing off its return value as the result of the await expression.
Note that you should use None as the first argument of run_in_executor. If you use ThreadPoolExecutor(), you create a whole new thread pool for each message, and you never dispose of it. A thread pool is normally meant to be created once, and reuse a fixed number ("pool") of threads for subsequent work. None tells asyncio to use the thread pool it creates for this purpose.
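That is, assuming the rest of the handler is unchanged:

@client.event
async def on_message(message):
    loop = asyncio.get_event_loop()
    # None = reuse asyncio's default thread pool instead of creating a new one
    block_response = await loop.run_in_executor(None, blocking_function)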
It seems you can easily achieve your desired objective by ensuring a single thread is used.
A simple solution is to ensure that all calls to blocking_function run on a single thread. This is easily achieved by creating a ThreadPoolExecutor object with 1 worker outside of the async function. Every subsequent call to the blocking function will then run on that single thread:
thread_pool = ThreadPoolExecutor(max_workers=1)

@client.event
async def on_message(message):
    loop = asyncio.get_event_loop()
    block_response = await loop.run_in_executor(thread_pool, blocking_function)
Don't forget to shut down the executor afterwards.
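For example, at application shutdown:

# waits for any queued blocking_function calls, then releases the thread
thread_pool.shutdown(wait=True)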

Is there any linter that detects blocking calls in an async function?

https://www.aeracode.org/2018/02/19/python-async-simplified/
It's not going to ruin your day if you call a non-blocking synchronous function, like this:

def get_chat_id(name):
    return "chat-%s" % name

async def main():
    result = get_chat_id("django")
However, if you call a blocking function, like the Django ORM, the code inside the async function will look identical, but now it's dangerous code that might block the entire event loop as it's not awaiting:

def get_chat_id(name):
    return Chat.objects.get(name=name).id

async def main():
    result = get_chat_id("django")
You can see how it's easy to have a non-blocking function that "accidentally" becomes blocking if a programmer is not super-aware of everything that calls it. This is why I recommend you never call anything synchronous from an async function without doing it safely, or without knowing beforehand it's a non-blocking standard library function, like os.path.join.
So I am looking for a way to automatically catch instances of this mistake. Are there any linters for Python which will report sync function calls from within an async function as a violation?
Can I configure Pylint or Flake8 to do this?
I don't necessarily mind if it catches the first case above too (which is harmless).
Update:
On one level I realise this is a stupid question, as pointed out in Mikhail's answer. What we need is a definition of a "dangerous synchronous function" that the linter should detect.
So for purpose of this question I give the following definition:
A "dangerous synchronous function" is one that performs IO operations. These are the same operations which have to be monkey-patched by gevent, for example, or which have to be wrapped in async functions so that the event loop can context switch.
(I would welcome any refinement of this definition)
So I am looking for a way to automatically catch instances of this mistake.
Let's make a few things clear: the mistake discussed in the article is calling a long-running sync function inside an asyncio coroutine (it can be an I/O blocking call or just a pure-CPU function with a lot of calculations). It's a mistake because it blocks the whole event loop, which leads to a significant performance downgrade (more about it here, including the comments below the answer).
Is there any way to catch this situation automatically? Before run time, no: no one except you can predict whether a particular function will take 10 seconds or 0.01 seconds to execute. At run time, detection is already built into asyncio; all you have to do is enable debug mode.
If you're afraid some sync function can vary between being long-running (detectable at run time in debug mode) and short-running (not detectable), just execute the function in a background thread using run_in_executor; that guarantees the event loop will not be blocked.
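Enabling debug mode is one line; asyncio then logs any callback or task step that runs longer than loop.slow_callback_duration (0.1 seconds by default). A minimal demonstration:

import asyncio
import time

async def main():
    time.sleep(0.5)  # a blocking call hiding inside a coroutine

# with debug=True, asyncio warns that this step took ~0.5 s to execute
asyncio.run(main(), debug=True)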

Visualizing asyncio coroutines execution

I am trying to understand how async coroutines are executed start to finish. Let's say I have this function:
async def statemachine(state):
that does the following:

1. Read value on remote server
2. Write to remote mysql server
3. Write to local redis server
4. Delete a record from a remote mysql server
5. Create event and notify coroutine execution has finished
Since await suspends execution to give other coroutines time to execute, will the execution always proceed from step 1 to step 5 in order?
A coroutine is always executed sequentially. Many (co)routines however can (co)operate together while being supervised by an event-loop or a scheduler of sorts.
So if you stack all your tasks in one coroutine e.g.:
async def statemachine(state):
    await read_value_on_remote_server()
    await write_to_remote_mysql_server()
    await write_to_local_redis_server()
    await delete_a_record_from_a_remote_mysql_server()
    await create_event_and_notify_coroutine_execution_has_finished()
your statemachine will await each task one by one until they're done. This scenario isn't really useful, and doesn't provide any benefit over sync code.
A scenario where async execution shines is, let's say, a web app that schedules one statemachine coroutine per user request. Whenever a user hits your server with a request, a new coroutine is scheduled in the event loop. Because the event loop can only run one thing at a time (pseudo-concurrency), it will let each coroutine execute (let's assume using a round-robin algorithm) until it suspends, because it's awaiting an object or another coroutine that is awaiting another object.
The way a coroutine suspends is by having an await statement. This lets the event loop know that the coroutine is awaiting an operation that isn't necessarily CPU bound. e.g. network call or user input.
Thankfully, we're shielded from the details of the implementation of the event loop and how it manages to know when a coroutine should be resumed. This is typically done using a library like Python's stdlib select https://docs.python.org/2/library/select.html.
For most use cases, you should know that a coroutine always executes sequentially and that the event-loop is what manages the execution of coroutines by using co-operative methods (unlike a typical OS scheduler for example).
If you want to run several coroutines pseudo-concurrently, you can look at asyncio.gather or the lower-level asyncio.create_task. Hope this helps.
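A minimal sketch of the pseudo-concurrent case with asyncio.gather (the sleep stands in for the real I/O steps 1-5):

import asyncio

async def statemachine(state):
    await asyncio.sleep(0.1)  # placeholder for the five awaited steps
    print(f'statemachine {state} finished')

async def main():
    # all three coroutines are in flight at once; total time is ~0.1 s,
    # not ~0.3 s, because each await yields to the event loop
    await asyncio.gather(*(statemachine(s) for s in range(3)))

asyncio.run(main())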
