Infinite loop or "recursive" calls in asyncio

I'm new to Python3 asyncio.
I have a function that constantly retrieves messages from a websocket connection.
I'm wondering whether I should use a while True loop or asyncio.ensure_future in a recursive manner.
Which is preferred or does it not matter?
Example:
async def foo(websocket):
    while True:
        msg = await websocket.recv()
        print(msg)
        await asyncio.sleep(0.0001)
or
async def foo(websocket):
    msg = await websocket.recv()
    print(msg)
    await asyncio.sleep(0.0001)
    asyncio.ensure_future(foo(websocket))

I would recommend the iterative variant, for two reasons:
It is easier to understand and extend. One of the benefits of coroutines compared to callback-based futures is that they permit the use of familiar control structures like if and while to model the code's execution. If you wanted to change your code to e.g. add an outer loop around the existing one, or to add some code (e.g. another loop) after the loop, that would be considerably easier in the non-recursive version.
It is more efficient. Calling asyncio.ensure_future(foo(websocket)) instantiates both a new coroutine object and a brand new task for each new iteration. While neither are particularly heavy-weight, all else being equal, it is better to avoid unnecessary allocation.
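For instance, the while-loop version extends naturally if you later need a reconnect loop around it. (A rough sketch, not part of the original question; connect() is a hypothetical helper that reestablishes the websocket.)

async def foo(url):
    while True:                              # outer loop: reconnect after failures
        websocket = await connect(url)       # hypothetical reconnect helper
        try:
            while True:                      # inner loop: read messages
                msg = await websocket.recv()
                print(msg)
        except ConnectionError:
            await asyncio.sleep(1)           # back off briefly, then reconnect

In the ensure_future variant, the same change would require threading the reconnect state through every recursive call.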

Does a function run with loop.run_in_executor need asyncio.Lock() or threading.Lock()?

I copied the following code for my project and it has worked quite well for me, but I don't really understand how it runs my blocking_function:
@client.event
async def on_message(message):
    loop = asyncio.get_event_loop()
    block_response = await loop.run_in_executor(ThreadPoolExecutor(), blocking_function)
where on_message is called every time I receive a message. If I receive multiple messages, they are processed asynchronously.
blocking_function is a synchronous function that I don't want to run while another blocking_function is running. Then, within blocking_function, should I use threading.Lock() or asyncio.Lock()?
As pointed out by dirn in the comment, in blocking_function you cannot use an asyncio.Lock because it's just not async. (The opposite also applies: you cannot lock a threading.Lock from an async function because attempting to do so would block the event loop.) If you need to guard data accessed by other instances of blocking_function, you should use a threading.Lock.
but I don't really understand how the following code runs my blocking_function
It hands off blocking_function to the thread pool you created to run it. The thread pool queues and runs the function (which happens "in the background" from your perspective), and run_in_executor arranges for the event loop to be notified when the function is done, handing off its return value as the result of the await expression.
Note that you should use None as the first argument of run_in_executor. If you use ThreadPoolExecutor(), you create a whole new thread pool for each message, and you never dispose of it. A thread pool is normally meant to be created once, and reuse a fixed number ("pool") of threads for subsequent work. None tells asyncio to use the thread pool it creates for this purpose.
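A minimal sketch combining both points - a threading.Lock guarding blocking_function and None as the executor argument. (The @client.event decorator comes from the question's discord.py setup; the lock and the elided work inside it are illustrative, not from the original post.)

import asyncio
import threading

lock = threading.Lock()  # guards whatever blocking_function shares with other calls

def blocking_function():
    with lock:           # threading.Lock, because this code runs in a worker thread
        ...              # the slow, blocking work goes here

@client.event
async def on_message(message):
    loop = asyncio.get_event_loop()
    # None -> reuse asyncio's default thread pool instead of creating a new one per message
    block_response = await loop.run_in_executor(None, blocking_function)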
It seems you can achieve your desired objective by ensuring a single thread is used.
A simple solution is to ensure that all calls to blocking_function run on a single thread. This can be achieved by creating a ThreadPoolExecutor with one worker outside of the async function; every subsequent call to the blocking function will then run on that single thread:
thread_pool = ThreadPoolExecutor(max_workers=1)

@client.event
async def on_message(message):
    loop = asyncio.get_event_loop()
    block_response = await loop.run_in_executor(thread_pool, blocking_function)
Don't forget to shut down the thread pool afterwards.
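For example, once the bot is done with blocking work (shutdown is the standard ThreadPoolExecutor method):

thread_pool.shutdown(wait=True)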

Is there any linter that detects blocking calls in an async function?

https://www.aeracode.org/2018/02/19/python-async-simplified/
It's not going to ruin your day if you call a non-blocking synchronous
function, like this:
def get_chat_id(name):
    return "chat-%s" % name

async def main():
    result = get_chat_id("django")
However, if you call a blocking function, like the Django ORM, the
code inside the async function will look identical, but now it's
dangerous code that might block the entire event loop as it's not
awaiting:
def get_chat_id(name):
    return Chat.objects.get(name=name).id

async def main():
    result = get_chat_id("django")
You can see how it's easy to have a non-blocking function that
"accidentally" becomes blocking if a programmer is not super-aware of
everything that calls it. This is why I recommend you never call
anything synchronous from an async function without doing it safely,
or without knowing beforehand it's a non-blocking standard library
function, like os.path.join.
So I am looking for a way to automatically catch instances of this mistake. Are there any linters for Python which will report sync function calls from within an async function as a violation?
Can I configure Pylint or Flake8 to do this?
I don't necessarily mind if it catches the first case above too (which is harmless).
Update:
On one level I realise this is a stupid question, as pointed out in Mikhail's answer. What we need is a definition of a "dangerous synchronous function" that the linter should detect.
So for purpose of this question I give the following definition:
A "dangerous synchronous function" is one that performs IO operations. These are the same operations which have to be monkey-patched by gevent, for example, or which have to be wrapped in async functions so that the event loop can context switch.
(I would welcome any refinement of this definition)
So I am looking for a way to automatically catch instances of this
mistake.
Let's make a few things clear: the mistake discussed in the article is calling a long-running sync function inside an asyncio coroutine (it can be an I/O-blocking call or a pure-CPU function with a lot of computation). It's a mistake because it blocks the whole event loop, which leads to a significant performance downgrade (more about it here, including the comments below the answer).
Is there any way to catch this situation automatically? Before run time - no: no one but you can predict whether a particular function will take 10 seconds or 0.01 seconds to execute. At run time it's already built into asyncio; all you have to do is enable debug mode.
If you're afraid some sync function can vary between being long-running (detectable at run time in debug mode) and short-running (not detectable), just execute the function in a background thread using run_in_executor - it guarantees the event loop will not be blocked.
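A rough sketch of the debug-mode approach (the 0.05 s threshold is an arbitrary choice for illustration; slow_callback_duration defaults to 0.1 s):

import asyncio
import time

async def main():
    # In debug mode, asyncio logs a warning for any task/callback step that
    # blocks the loop longer than loop.slow_callback_duration.
    asyncio.get_running_loop().slow_callback_duration = 0.05
    time.sleep(1)  # a blocking call -- debug mode will warn about it

asyncio.run(main(), debug=True)  # or PYTHONASYNCIODEBUG=1 / python -X dev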

Is it more efficient to use create_task(), or gather()?

I'm still at the basics of asynchronous python, and some things confuse me.
import asyncio

loop = asyncio.get_event_loop()
for variation in args:
    loop.create_task(coroutine(variation))
loop.run_forever()
Seems very similar to this
import asyncio

loop = asyncio.get_event_loop()
loop.run_forever(
    asyncio.gather(
        coroutine(variation_1),
        coroutine(variation_2),
        ...))
They might do the same thing, but that doesn't seem useful, so what's the difference?
As mentioned in the comments, your second example should use run_until_complete, not run_forever.
They might do the same thing, but that doesn't seem useful, so what's the difference?
asyncio.gather is a higher-level construct.
create_task submits the coroutine to the event loop, effectively allowing it to run "in the background" (provided the event loop itself is active). As the name implies, it returns a task, a handle over the execution of the coroutine, most importantly providing the ability to cancel it. You can create any number of such tasks in an event loop, and they will all run until their respective completions.
asyncio.gather is for when you are actually interested in the results of the coroutines you have spawned. It spawns them as if with create_task, allowing them to run in parallel, but also waits for all of them to complete, and then returns their respective results (or raises an exception if any of them raised one).
For example, if you have a download coroutine that downloads a URL and returns its contents, and you are downloading a list of URLs, gather allows you to match URLs to their data:
url_list = [...]
data_list = await asyncio.gather(*[download(url) for url in url_list])
# url_list and data_list now have matching elements, so this works:
for url, data in zip(url_list, data_list):
    ...
Doing this with just create_task would be much more complicated.
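For comparison, a rough approximation using create_task alone (download is the same hypothetical coroutine as above, and this must still run inside an async function). Even in this minimal form you manage the tasks and result ordering yourself, and you don't get gather's combined error and cancellation semantics:

url_list = [...]
tasks = [asyncio.create_task(download(url)) for url in url_list]
data_list = [await task for task in tasks]  # await each task, preserving input order
for url, data in zip(url_list, data_list):
    ...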

transforming a thread blocking to non thread blocking in f#

I retrieve data from the Bloomberg API, and am quite surprised by the slowness.
My computation is IO bounded by this.
Therefore I decided to use some async monad builder to unthrottle it.
Upon running it, the results are not so much better, which was obvious as I make a call to a function, NextEvent, which is thread blocking.
let outerloop args dic =
    ...
    let rec innerloop continuetoloop =
        let eventObj = session.NextEvent(); //This blocks
        ...

let seqtable = reader.ReadFile( @"C:\homeware\sector.csv", ";".[0], true)
let dic = ConcurrentDictionary<_,_> ()
let wf = seqtable |> Seq.mapi (fun i item -> async { outerloop item dic } )

wf |> Async.Parallel
   |> Async.RunSynchronously
   |> ignore

printfn "%A" ret
Is there a good way to wrap that blocking call to a nonblocking call ?
Also, why is the async framework not creating as many threads as I have requests (like 200)? When I inspect the threads from which I receive values, I see only 4-5 being used.
UPDATE
I found a compelling reason why this will never be possible.
An async operation takes whatever comes after the async instruction and schedules it somewhere in the thread pool.
For all practical purposes, as long as async functions are used correctly - that is, always returning to the thread pool they originated from - we can consider that we are executing on a single thread.
Being on a single thread means all that scheduling will always be executed somewhere later, and a blocking instruction cannot avoid the fact that, once it eventually runs, it will block the workflow at some point.
Is there a good way to wrap that blocking call to a nonblocking call ?
No. You can never wrap blocking calls to make them non-blocking. If you could, async would just be a design pattern rather than a fundamental paradigm shift.
Also, why is the async framework not creating as many threads as I have requests (like 200)?
Async is built upon the thread pool which is designed not to create threads aggressively because they are so expensive. The whole point of (real) async is that it does not require as many threads as there are connections. You can handle 10,000 simultaneous connections with ~30 threads.
You seem to have a complete misunderstanding of what async is and what it is for. I'd suggest buying any of the F# books that cover this topic and reading up on it. In particular, your solution is not asynchronous because you just call your blocking StartGetFieldsValue member from inside your async workflow, defeating the purpose of async. You might as well just do Array.Parallel.map getFieldsValue instead.
Also, you want to use a purely functional API when doing things in parallel rather than mutating a ConcurrentDictionary in-place. So replace req.StartGetFieldsValue ret with
let! result = req.StartGetFieldsValue()
...
return result
and replace ignore with dict.
Here is a solution I made that seems to be working.
Of course it does not use only async (the lowercase keyword), but the Async module as well.
I define a simple type that has one event, finished, and one method, asyncstart, which takes the method to run as an argument. It launches the method, then fires the finished event at the appropriate place (in my case I had to capture the synchronization context, etc.).
Then on the consumer side, I simply use
let! completion = Async.AwaitEvent wrapper.finished |> Async.StartChild
let! completed = completion
While running this code, on the consumer side I use only async calls, making my code non-blocking. Of course, there has to be some thread somewhere that is blocked, but this happens outside of my main serving loop, which remains reactive and fit.

Should I use coroutines or another scheduling object here?

I currently have code in the form of a generator which calls an IO-bound task. The generator actually calls sub-generators as well, so a more general solution would be appreciated.
Something like the following:
def processed_values(list_of_io_tasks):
    for task in list_of_io_tasks:
        value = slow_io_call(task)
        yield postprocess(value)  # in real version, would iterate over
                                  # processed_values2(value) here
I have complete control over slow_io_call, and I don't care in which order I get the items from processed_values. Is there something like coroutines I can use to get the yielded results in the fastest order by turning slow_io_call into an asynchronous function and using whichever call returns fastest? I expect list_of_io_tasks to be at least thousands of entries long. I've never done any parallel work other than with explicit threading, and in particular I've never used the various forms of lightweight threading which are available.
I need to use the standard CPython implementation, and I'm running on Linux.
Sounds like you are in search of multiprocessing.Pool(), specifically the Pool.imap_unordered() method.
Here is a port of your function to use imap_unordered() to parallelize calls to slow_io_call().
import multiprocessing

def processed_values(list_of_io_tasks):
    pool = multiprocessing.Pool(4)  # num workers
    results = pool.imap_unordered(slow_io_call, list_of_io_tasks)
    while True:
        yield results.next(9999999)  # large time-out
Note that you could also iterate over results directly (i.e. for item in results: yield item) without a while True loop; however, calling results.next() with a time-out value works around this multiprocessing keyboard-interrupt bug and allows you to kill the main process and all subprocesses with Ctrl-C. Also note that StopIteration exceptions are not caught in this function, but one will be raised when results.next() has no more items to return. This is legal for generator functions such as this one, which are expected to either raise StopIteration when there are no more values to yield, or simply stop yielding, in which case StopIteration is raised on their behalf.
To use threads in place of processes, replace
import multiprocessing
with
import multiprocessing.dummy as multiprocessing
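A minimal usage sketch of the threaded variant, assuming hypothetical slow_io_call and postprocess implementations (the names come from the question; the bodies here are invented). Note that on Python 3.7+ (PEP 479) a StopIteration escaping a generator becomes a RuntimeError, so this sketch returns explicitly when the pool is exhausted:

import multiprocessing.dummy as multiprocessing  # thread-backed Pool, same API
import time

def slow_io_call(task):
    time.sleep(0.1)        # stand-in for the real IO-bound work
    return task * 2

def postprocess(value):
    return value + 1

def processed_values(list_of_io_tasks):
    pool = multiprocessing.Pool(4)
    results = pool.imap_unordered(slow_io_call, list_of_io_tasks)
    while True:
        try:
            yield postprocess(results.next(9999999))
        except StopIteration:
            return         # stop cleanly instead of letting StopIteration escape

for item in processed_values(range(10)):
    print(item)            # items arrive in completion order, not input order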
