What does async actually do in FastAPI? - python-3.x

I have two scripts:
from fastapi import FastAPI
import asyncio

app = FastAPI()

@app.get("/")
async def root():
    a = await asyncio.sleep(10)
    return {'Hello': 'World'}
And the second one:
from fastapi import FastAPI
import time

app = FastAPI()

@app.get("/")
def root():
    a = time.sleep(10)
    return {'Hello': 'World'}
Note that the second script doesn't use async. Both scripts do the same thing. At first I thought the benefit of an async endpoint was that it allows multiple connections at once, but when I tested the second script I was able to handle multiple connections as well. The results are the same, the performance is the same, and I don't understand why we would use the async approach. I would appreciate an explanation.

FastAPI Docs:
You can mix def and async def in your path operation functions as much as you need and define each one using the best option for you. FastAPI will do the right thing with them.
Anyway, in any of the cases above, FastAPI will still work asynchronously and be extremely fast.
Both endpoints will be executed asynchronously, but if you define your endpoint function with async def, you can use the await keyword and work with asynchronous third-party libraries.
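For illustration, a minimal sketch of what awaiting an async third-party library inside an endpoint looks like; the httpx client is an assumption here, it is not mentioned in the question:
from fastapi import FastAPI
import httpx  # assumed async-capable HTTP client, not part of the original question

app = FastAPI()

@app.get("/proxy")
async def proxy():
    # While the outbound request is in flight, the event loop is free to
    # serve other requests instead of blocking on I/O.
    async with httpx.AsyncClient() as client:
        response = await client.get("https://example.com")
    return {"status": response.status_code}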

What's missing from the two other answers is why in your second example, the API can still handle multiple simultaneous requests. The reason can be found in this section of the FastAPI documentation:
When you declare a path operation function with normal def instead of async def, it is run in an external threadpool that is then awaited, instead of being called directly (as it would block the server).
This means that all your def endpoints run in separate threads, which is why the API can still handle them in parallel.
So why would you want to use async at all, if this feature exists? You can read e.g. this article comparing threading and async, but in summary, threads have more overhead, and each thread is still blocked on IO (e.g. external API or database calls).
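To make the trade-off concrete, here is a small sketch (my own illustration, not part of the original answer): the first endpoint blocks the whole event loop for every client, while the second yields control during the pause:
from fastapi import FastAPI
import asyncio
import time

app = FastAPI()

@app.get("/blocking")
async def blocking():
    # time.sleep() runs on the event loop thread, so every other request
    # waits until these 10 seconds are over.
    time.sleep(10)
    return {"Hello": "World"}

@app.get("/non-blocking")
async def non_blocking():
    # asyncio.sleep() hands control back to the event loop, so other
    # requests keep being served during the pause.
    await asyncio.sleep(10)
    return {"Hello": "World"}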

Related

What can go wrong if I do not use an async context manager inside an async function?

It is possible for me to use context managers inside asynchronous functions, for example,
import asyncio

async def main():
    with open("Hello.txt", "w") as f:
        f.write("Test123")

asyncio.run(main())
However, I have noticed that some people use asynchronous context managers (__aenter__ and __aexit__) when dealing with context managers inside asynchronous functions. So far I have not been able to come up with an example where not using an asynchronous context manager would result in problems. So my question is simply: how dangerous is it to not use an asynchronous context manager inside an async function?
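For comparison, here is a minimal sketch of what an asynchronous context manager looks like; the AsyncFile helper is hypothetical (not from the question) and simply pushes the blocking file operations onto the default thread pool so the event loop is not held up:
import asyncio

class AsyncFile:
    # Hypothetical helper: opens and closes a file in the default executor so
    # that disk I/O does not block the event loop.
    def __init__(self, path, mode):
        self.path, self.mode, self._f = path, mode, None

    async def __aenter__(self):
        loop = asyncio.get_event_loop()
        self._f = await loop.run_in_executor(None, open, self.path, self.mode)
        return self._f

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.get_event_loop().run_in_executor(None, self._f.close)

async def main():
    async with AsyncFile("Hello.txt", "w") as f:
        await asyncio.get_event_loop().run_in_executor(None, f.write, "Test123")

asyncio.run(main())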

How can I sleep() in parallel inside an asyncio task if the parent function isn't async?

CODE:
class App:
    def __init__(self):
        # some of the code
        ...
        ...
        xxx.add_handler(self.event_handler, event_xyz)
        asyncio.create_task(self.keep_alive())
        xxx.run_until_disconnected()

    def keep_alive(self):
        # stuff to keep connection alive
        ...
        ...
        time.sleep(5)  # this will block the whole script
        asyncio.sleep(5)  # this won't work because __init__ and keep_alive aren't async

    async def event_handler(self):
        await stuff

    # other functions

if __name__ == '__main__':
    App()
The part of the code that keeps the connection alive is subject to API limits, so I need to have the sleep statement inside the keep_alive() function.
I understand that the design of the code could be changed completely to make this work, but it is a big script and everything else is working perfectly, so I would prefer a solution that keeps the current structure.
I'm open to using anything else, like threads, as long as the rest of the code isn't blocked during the sleep.
This is a straightforward situation. time.sleep will block the current thread, including the asyncio event loop for that thread (if there is one). Period. Case closed.
If your API requires you to have time.sleep calls, and your program must do something while the current thread is sleeping, then asyncio is not the solution. That doesn't mean that asyncio cannot be used for other threads or other purposes within your program design, but it absolutely can't run other tasks in the current thread during a time.sleep interval.
Regarding the function keep_alive in your code snippet: this function cannot be made into a task because it's not declared as "async def." Calling asyncio.sleep() from inside this type of regular function is an error; it must always be "awaited," and the "await" keyword must be inside an async def function. On the other hand, calling time.sleep inside an async def function is not an error and the function will work as expected. But it's probably not something you want to do.
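If the rest of the program has to stay responsive while the sleep happens, one option the asker already hinted at is a background thread. A minimal sketch (my own illustration, not the asker's code):
import threading
import time

def keep_alive():
    # The keep-alive work runs in its own thread, so time.sleep() blocks only
    # this thread while the asyncio event loop keeps running elsewhere.
    while True:
        # ... stuff to keep the connection alive ...
        time.sleep(5)

# daemon=True so this thread doesn't keep the process alive on shutdown
threading.Thread(target=keep_alive, daemon=True).start()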

How to execute parallel queries with PyGreSQL?

I am trying to run multiple queries in parallel with PyGreSQL and multiprocessing, but the code below hangs without returning:
from pg import DB
from multiprocessing import Pool
from functools import partial

def create_query(table_name):
    return f"""create table {table_name} (id integer);
    CREATE INDEX ON {table_name} USING BTREE (id);"""

my_queries = [create_query('foo'), create_query('bar'), create_query('baz')]

def execute_query(conn_string, query):
    con = DB(conn_string)
    con.query(query)
    con.close()

rs_conn_string = "host=localhost port=5432 dbname=postgres user=postgres password="

pool = Pool(processes=len(my_queries))
pool.map(partial(execute_query, rs_conn_string), my_queries)
Is there any way to make this work? Also, is it possible to run the 3 queries in the same "transaction", so that if one query fails the others get rolled back?
One obvious problem is that you run pool.map unconditionally: not only in the main process, but also when the interpreters used by the parallel sub-processes import the script. You should do something like this instead:
def run_all():
    with Pool(processes=len(my_queries)) as pool:
        pool.map(partial(execute_query, rs_conn_string), my_queries)

if __name__ == '__main__':
    run_all()
Regarding your second question, that's not possible, since transactions are per connection, and the connections live in separate processes if you do it like that.
Asynchronous command processing might be what you want, but it is not yet supported by PyGreSQL. Psycopg + aiopg is probably better suited for doing things like that.
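For reference, a minimal sketch of what the aiopg route could look like, reusing my_queries from the question (a sketch only, not tested against the asker's setup):
import asyncio
import aiopg

dsn = "host=localhost port=5432 dbname=postgres user=postgres password="

async def execute(pool, query):
    # Each coroutine borrows a connection from the shared pool.
    async with pool.acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute(query)

async def run_all(queries):
    async with aiopg.create_pool(dsn) as pool:
        await asyncio.gather(*(execute(pool, q) for q in queries))

if __name__ == '__main__':
    # my_queries as defined in the question above
    asyncio.get_event_loop().run_until_complete(run_all(my_queries))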
PyGreSQL added async support with the connection.poll() method. As far as pooling goes, I like to override mysql.connector's pooling wrappers to handle pgdb connection objects. There are a few 'optional' connection method calls that will fail and have to be commented out (e.g. checking connection status; these can be implemented at the pgdb connection object level if you want them, but the calls don't match mysql.connector's API). There are probably some low-level bugs lurking, since the libraries are only loosely similar in how they abstract things, but this solution has been running in prod for a few months now without any problems.

Is Celery really async?

I am coding a realtime game using Celery and django-channels.
I have a task that works like a timer: once the task is activated and the timer reaches zero, group_send() is called. From what I see, Celery tasks are async, but we can't await functions inside tasks, which confuses me a bit. Here is the code:
@app.task(ignore_result=True)
def count_down(channel_name):
    set_random_game_result(channel_name)
    room = process_game_result(channel_name, revoke=False)
    channel_layer = get_channel_layer()
    async_to_sync(channel_layer.group_send)(
        channel_name,
        {
            "type": "game_room_info",
            "room": room
        }
    )
from the docs:
By default the send(), group_send(), group_add() and other functions are async functions, meaning you have to await them. If you need to call them from synchronous code, you’ll need to use the handy asgiref.sync.async_to_sync wrapper
So if celery is async, why can't I use the group_send without using the async_to_sync util?
Another thing is about querying.. From the docs:
If you are writing asynchronous code, however, you will need to call database methods in a safe, synchronous context, using database_sync_to_async.
database_sync_to_async actually doesn't work inside the task function. Am I missing something?
The problem you are talking about is something that is done by design, for a good reason. It is also well documented, and can be solved easily with an appropriate Canvas workflow.
Also, do not get confused by the terminology: Celery is asynchronous, but it is not "Python async", as it predates Python's asyncio. Maybe Celery 5 will get its async parts replaced/refactored to use Python 3+ asyncio and related features.
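To sketch what a Canvas-based approach could look like (the task names and broker URL below are hypothetical, not from the question): each step becomes its own task, and the chain runs them in order instead of one task awaiting another.
from celery import Celery, chain

app = Celery("game", broker="redis://localhost:6379/0")  # assumed broker URL

@app.task
def set_result(channel_name):
    # hypothetical first step; its return value is passed to the next task
    return channel_name

@app.task
def notify_room(channel_name):
    # hypothetical final step; this is where async_to_sync(group_send)(...) would go
    return channel_name

if __name__ == "__main__":
    chain(set_result.s("room-1"), notify_room.s()).apply_async()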

Wrapping synchronous requests into asyncio (async/await)?

I am writing a tool in Python 3.6 that sends requests to several APIs (with various endpoints) and collects their responses to parse and save them in a database.
The API clients that I use have a synchronous version of requesting a URL, for instance they use
urllib.request.Request('...
Or they use Kenneth Reitz' Requests library.
Since my API calls rely on synchronous versions of requesting a URL, the whole process takes several minutes to complete.
Now I'd like to wrap my API calls in async/await (asyncio). I'm using python 3.6.
All the examples / tutorials that I found want me to change the synchronous URL calls / requests to an async version of it (for instance aiohttp). Since my code relies on API clients that I haven't written (and I can't change) I need to leave that code untouched.
So is there a way to wrap my synchronous requests (blocking code) in async/await to make them run in an event loop?
I'm new to asyncio in Python. This would be a no-brainer in NodeJS. But I can't wrap my head around this in Python.
The solution is to wrap your synchronous code in a thread and run it that way. I used exactly this approach to make my asyncio code run boto3 (note: remove the inline type hints if running on Python < 3.6):
async def get(self, key: str) -> bytes:
    s3 = boto3.client("s3")
    loop = asyncio.get_event_loop()
    try:
        response: typing.Mapping = \
            await loop.run_in_executor(  # type: ignore
                None, functools.partial(
                    s3.get_object,
                    Bucket=self.bucket_name,
                    Key=key))
    except botocore.exceptions.ClientError as e:
        if e.response["Error"]["Code"] == "NoSuchKey":
            raise base.KeyNotFoundException(self, key) from e
        elif e.response["Error"]["Code"] == "AccessDenied":
            raise base.AccessDeniedException(self, key) from e
        else:
            raise
    return response["Body"].read()
Note that this works because the vast majority of the time in s3.get_object() is spent waiting for I/O, and while waiting for I/O Python (generally) releases the GIL (the GIL is the reason threads in Python are generally not a great idea).
The first argument None in run_in_executor means that we run in the default executor, which is a thread pool executor, but it may make things clearer to explicitly pass a ThreadPoolExecutor there.
Note that, whereas with pure async I/O you could easily have thousands of connections open concurrently, using a thread pool executor means that each concurrent call to the API needs a separate thread. Once you run out of threads in your pool, the executor will not schedule your new call until a thread becomes available. You can obviously raise the number of threads, but this will eat up memory; don't expect to be able to go over a couple of thousand.
Also see the python ThreadPoolExecutor docs for an explanation and some slightly different code on how to wrap your sync call in async code.
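Applied to the requests library from the question, the same pattern might look like this (a sketch with placeholder URLs, written for Python 3.6, so it avoids asyncio.run):
import asyncio
import functools
import requests  # the synchronous client the API wrappers rely on

async def fetch(url):
    loop = asyncio.get_event_loop()
    # Run the blocking HTTP call in the default ThreadPoolExecutor.
    response = await loop.run_in_executor(
        None, functools.partial(requests.get, url, timeout=10))
    return response.text

async def main():
    urls = ["https://example.com/a", "https://example.com/b"]  # placeholders
    pages = await asyncio.gather(*(fetch(u) for u in urls))
    print([len(page) for page in pages])

loop = asyncio.get_event_loop()
loop.run_until_complete(main())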
