I'm noticing that when I spawn an asyncio task using create_task, it's first completing the rest of the logic rather than starting that task. I'm forced to add an await asyncio.sleep(0) to get the task started, which seems a bit hacky and unclean to me.
Here is some example code:
async def make_rpc_calls(...some args...):
    val_1, val_2 = await asyncio.gather(rpc_call_1(...), rpc_call_2(...))
    return process(val_1, val_2)

def some_very_cpu_intensive_function(...some args...):
    # Does a lot of computation, can take 20 seconds to run

task_1 = asyncio.get_running_loop().create_task(make_rpc_calls(...))
intensive_result = some_very_cpu_intensive_function(...)
await task_1
process(intensive_result, task_1.result())
Anytime I run the above, it runs some_very_cpu_intensive_function before kicking off the expensive RPCs. The only way I've gotten this to work is to do:
async def make_rpc_calls(...some args...):
    val_1, val_2 = await asyncio.gather(rpc_call_1(...), rpc_call_2(...))
    return process(val_1, val_2)

def some_very_cpu_intensive_function(...some args...):
    # Does a lot of computation, can take 20 seconds to run

task_1 = asyncio.get_running_loop().create_task(make_rpc_calls(...))
await asyncio.sleep(0)
intensive_result = some_very_cpu_intensive_function(...)
await task_1
process(intensive_result, task_1.result())
This feels like a hack to me: I'm forcing the event loop to context switch, and it doesn't feel like I'm using the asyncio framework correctly. Is there another way I should be approaching this?
sleep() always suspends the current task, allowing other tasks to run.
Setting the delay to 0 provides an optimized path to allow other tasks to run. This can be used by long-running functions to avoid blocking the event loop for the full duration of the function call.
Source: https://docs.python.org/3/library/asyncio-task.html
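Below is a minimal, self-contained sketch of what the quoted passage describes (the function names are illustrative, not from the question's code): create_task() only schedules the coroutine, and the spawned task cannot start until the current coroutine reaches an await, which is exactly what await asyncio.sleep(0) provides.

import asyncio
import time

async def background():
    print("background task started")
    await asyncio.sleep(1)
    return "rpc results"

async def main():
    task = asyncio.get_running_loop().create_task(background())
    # Without this, the print in background() would not appear until
    # the first await in this coroutine.
    await asyncio.sleep(0)
    # Blocking, CPU-bound stand-in: the background task has started,
    # but it makes no further progress until this call returns.
    time.sleep(2)
    print(await task)

asyncio.run(main())

If the synchronous work blocks for many seconds, the background task still cannot progress past its own first await during that time; moving the blocking call into a thread with loop.run_in_executor is a common way around that, though it goes beyond what the quoted documentation covers.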
Related
My code has 2 functions:
async def blabla():
    await asyncio.sleep(5)

And

async def blublu():
    await asyncio.sleep(2)
asyncio.wait_for, as far as I know, can wait for one function like this:
asyncio.wait_for(blabla(), timeout=6) or asyncio.wait_for(blublu(), timeout=6)
What I want to do is make asyncio wait for both of them, and if one of them finishes faster, proceed without waiting for the second one.
Is it possible to do so?
Edit: timeout is needed
Use asyncio.wait with the return_when kwarg:
# directly passing coroutine objects to `asyncio.wait`
# is deprecated since py 3.8+, so wrap them into tasks
blabla_task = asyncio.create_task(blabla())
blublu_task = asyncio.create_task(blublu())

done, pending = await asyncio.wait(
    {blabla_task, blublu_task},
    return_when=asyncio.FIRST_COMPLETED
)
# do something with the `done` set
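Since the edit notes that a timeout is also needed, here is a self-contained sketch (the 6-second value is assumed from the question); asyncio.wait accepts both timeout and return_when, so the call returns as soon as the first task finishes or the timeout expires, whichever comes first:

import asyncio

async def blabla():
    await asyncio.sleep(5)
    return "blabla"

async def blublu():
    await asyncio.sleep(2)
    return "blublu"

async def main():
    blabla_task = asyncio.create_task(blabla())
    blublu_task = asyncio.create_task(blublu())

    # Returns as soon as the first task finishes, or after 6 seconds,
    # whichever comes first; the slower task stays in `pending`.
    done, pending = await asyncio.wait(
        {blabla_task, blublu_task},
        timeout=6,
        return_when=asyncio.FIRST_COMPLETED,
    )
    for task in done:
        print(task.result())

asyncio.run(main())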
I have some code that looks like this:
sem = asyncio.Semaphore(max_concurrency)

async def run_concurrent(invocation):
    async with sem:
        return await _run_invocation_with_timeout(invocation, timeout_seconds)

return await asyncio.gather(
    *(run_concurrent(invocation) for invocation in invocations)
)
Behind the scenes, this gives me max_concurrency workers running in parallel. How can I get some unique identifier which distinguishes which "thread" the invocation is actually running on? The reason I want this is so that I can emit some timing information as JSON that can be loaded into chrome://tracing, so that I can visualize the parallelism of my application.
Is it sufficient to start a counter at 0, increment it every time a task starts, and decrement it when a task finishes? Will this accurately model the way the work is scheduled by the runtime?
As user481516342 pointed out in an earlier comment, this probably isn't possible to do 100% accurately. Through a roundabout method and sacrificing 100% correctness, I came up with something that is close enough for my purposes though, and it's surprisingly simple.
My original code now looks like this:
sem = asyncio.Semaphore(max_concurrency)
tasks = list(range(1, max_concurrency+1))

async def run_concurrent(invocation):
    async with sem:
        invocation.task_id = tasks.pop()
        result = await _run_invocation_with_timeout(invocation, timeout_seconds)
        tasks.insert(0, invocation.task_id)
        return result

return await asyncio.gather(
    *(run_concurrent(invocation) for invocation in invocations)
)
This isn't 100% accurate because, if max_concurrency is say 4, my algorithm might decide that tasks A, B, and C get IDs 1, 2, and 3 whereas internally they run on 2, 3, and 4. For the purposes of visualizing the parallelism, though, I think this is sufficiently close to the real thing.
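As a rough illustration of how those slot IDs could feed chrome://tracing (this goes beyond the original answer: the event fields follow the standard Trace Event "complete" format, and the workload is a stand-in for the question's _run_invocation_with_timeout):

import asyncio
import json
import random
import time

trace_events = []

async def fake_invocation(name):
    # Stand-in for the question's _run_invocation_with_timeout.
    await asyncio.sleep(random.uniform(0.1, 0.5))
    return name

async def run_concurrent(name, sem, slots):
    async with sem:
        slot = slots.pop()
        start = time.monotonic()
        try:
            return await fake_invocation(name)
        finally:
            # One "complete" ("X") event per invocation; chrome://tracing
            # draws each tid as its own row, so rows correspond to slots.
            trace_events.append({
                "name": name, "ph": "X", "pid": 1, "tid": slot,
                "ts": int(start * 1e6),
                "dur": int((time.monotonic() - start) * 1e6),
            })
            slots.insert(0, slot)

async def main():
    max_concurrency = 4
    sem = asyncio.Semaphore(max_concurrency)
    slots = list(range(1, max_concurrency + 1))
    await asyncio.gather(*(run_concurrent(f"inv-{i}", sem, slots)
                           for i in range(10)))
    with open("trace.json", "w") as f:
        json.dump({"traceEvents": trace_events}, f)

asyncio.run(main())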
I have approximately the following code
import asyncio
.
.
.
async def query_loop():
    while connected:
        result = await asyncio.gather(get_value1, get_value2, get_value3)
        if True in result:
            connected = False

async def main():
    await query_loop()

asyncio.run(main())
The get_value functions query a device, receive values, and publish them to a server. If no problems occur they return False, otherwise True.
Now I need to implement that get_value2 checks whether it received the value 7. In that case the program should wait 3 minutes before sending a special command to the device. But in the meantime, and also afterwards, the query_loop should continue.
Does anybody have an idea how to do that?
Thanks in advance!
If I understand you correctly, you want to modify get_value2 so that it reacts to a value received from device by spawning additional work in the background, i.e. do something without the loop in query_loop having to wait for that new work to finish.
You can use asyncio.create_task() to spawn a background task. In fact, you can always combine create_task() and await to run things in the background; asyncio.gather is just a utility function that does it for you. In this case query_loop remains unchanged, and get_value2 gets modified like this:
async def get_value2():
    ...
    value = await receive_value_from_device()
    if value == 7:
        # schedule special_command() to run, but don't wait for it
        asyncio.create_task(special_command())
    ...
    return False

async def special_command():
    await asyncio.sleep(180)
    await send_command_to_device(...)
Note that if get_value1 and others are async functions, the correct invocation of gather must call them, so it should be await asyncio.gather(get_value1(), get_value2(), get_value3()) (note the extra parentheses).
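A minimal, self-contained sketch of that pattern (the values and delays are made up for the demo, and the device functions from the question are replaced by stand-ins): the loop keeps iterating while the background task counts down on its own and then fires.

import asyncio

async def special_command():
    # Delay shortened from 180 s so the demo finishes quickly.
    await asyncio.sleep(3)
    print("special command sent")

async def get_value2(value):
    if value == 7:
        # Schedule special_command() in the background; don't wait for it.
        asyncio.create_task(special_command())
    return False

async def query_loop():
    for i in range(5):
        # The loop keeps running once per second while the background
        # task waits independently.
        await asyncio.gather(get_value2(7 if i == 0 else 0))
        print("query iteration", i, "done")
        await asyncio.sleep(1)

asyncio.run(query_loop())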
I'm trying to fetch some data from OpenSubtitles using asyncio and then download a file whose information is contained in that data. I want to fetch that data and download the file at the same time using asyncio.
The problem is that I want to wait for 1 task from the list tasks to finish before commencing with the rest of the tasks in the list or the download_tasks. The reason for this is that in self._perform_query() I am writing information to a file and in self._download_and_save_file() I am reading that same information from that file. So in other words, the download_tasks need to wait for at least one task in tasks to finish before starting.
I found out I can do that with asyncio.wait(return_when=FIRST_COMPLETED) but for some reason it is not working properly:
payloads = [create_payloads(entry) for entry in retreive(table_in_database)]
tasks = [asyncio.create_task(self._perform_query(payload, proxy)) for payload in payloads]
download_tasks = [asyncio.create_task(self._download_and_save_file(url, proxy)) for url in url_list]
done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
print(done)
print(len(done))
print(pending)
print(len(pending))
await asyncio.wait(download_tasks)
The output is completely different than expected: out of the 3 tasks in the list tasks, all 3 of them are completed despite me passing asyncio.FIRST_COMPLETED. Why is this happening?
{<Task finished coro=<SubtitleDownloader._perform_query() done, defined at C:\Users\...\subtitles.py:71> result=None>, <Task finished coro=<SubtitleDownloader._perform_query() done, defined at C:\Users\...\subtitles.py:71> result=None>, <Task finished coro=<SubtitleDownloader._perform_query() done, defined at C:\Users\...\subtitles.py:71> result=None>}
3
set()
0
Exiting
As far as I can tell, the code in self._perform_query() shouldn't affect this problem. Here it is anyway just to make sure:
async def _perform_query(self, payload, proxy):
    try:
        query_result = proxy.SearchSubtitles(self.opensubs_token, [payload], {"limit": 25})
    except Fault as e:
        raise "A fault has occurred:\n{}".format(e)
    except ProtocolError as e:
        raise "A ProtocolError has occurred:\n{}".format(e)
    else:
        if query_result["status"] == "200 OK":
            with open("dl_links.json", "w") as dl_links_json:
                result = query_result["data"][0]
                subtitle_name = result["SubFileName"]
                download_link = result["SubDownloadLink"]
                download_data = {"download link": download_link,
                                 "file name": subtitle_name}
                json.dump(download_data, dl_links_json)
        else:
            print("Wrong status code: {}".format(query_result["status"]))
For now, I've been testing this without running download_tasks but I have included it here for context. Maybe I am going about this problem in a completely wrong manner. If so, I would much appreciate your input!
Edit:
The problem was very simple, as answered below: _perform_query wasn't an awaitable function; instead it ran synchronously. I changed that by editing the file-writing part of _perform_query to be asynchronous with aiofiles:
async def _perform_query(self, payload, proxy):
    query_result = proxy.SearchSubtitles(self.opensubs_token, [payload], {"limit": 25})
    if query_result["status"] == "200 OK":
        async with aiofiles.open("dl_links.json", mode="w") as dl_links_json:
            result = query_result["data"][0]
            download_link = result["SubDownloadLink"]
            await dl_links_json.write(download_link)
return_when=FIRST_COMPLETED doesn't guarantee that only a single task will complete. It guarantees that the wait will complete as soon as a task completes, but it is perfectly possible that other tasks complete "at the same time", which for asyncio means in the same iteration of the event loop. Consider, for example, the following code:
async def noop():
    pass

async def main():
    done, pending = await asyncio.wait(
        [noop(), noop(), noop()], return_when=asyncio.FIRST_COMPLETED)
    print(len(done), len(pending))

asyncio.run(main())
This prints 3 0, just like your code. Why?
asyncio.wait does two things: it submits the coroutines to the event loop, and it sets up callbacks to notify it when any of them is complete. However, the noop coroutine doesn't contain an await, so none of the calls to noop() suspends, each just does its thing and immediately returns. As a result, all three coroutine instances finish within the same pass of the event loop. wait is then informed that all three coroutines have finished, a fact it dutifully reports.
If you change noop to await a random sleep, e.g. change pass to await asyncio.sleep(0.1 * random.random()), you get the expected behavior. With an await the coroutines no longer complete at the same time, and wait will report the first one as soon as it detects it.
This reveals the true underlying issue with your code: _perform_query doesn't await. This indicates that you are not using an async underlying library, or that you are using it incorrectly. The call to SearchSubtitles likely simply blocks the event loop, which appears to work in trivial tests, but breaks essential asyncio features such as concurrent execution of tasks.
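For illustration, one way to make _perform_query genuinely awaitable, assuming SearchSubtitles is a blocking XML-RPC call, is to push it onto a thread with run_in_executor. This is a sketch of a suggestion, not something the quoted answer prescribes; the names are taken from the question's code, and the function is still a method of the asker's class:

import asyncio
import json

async def _perform_query(self, payload, proxy):
    loop = asyncio.get_running_loop()
    # Run the blocking XML-RPC call in the default thread pool so the
    # event loop stays free to drive other tasks while it waits.
    query_result = await loop.run_in_executor(
        None,
        lambda: proxy.SearchSubtitles(self.opensubs_token, [payload], {"limit": 25}),
    )
    if query_result["status"] == "200 OK":
        result = query_result["data"][0]
        download_data = {"download link": result["SubDownloadLink"],
                         "file name": result["SubFileName"]}
        # Writing a small JSON file is quick; the blocking part was the RPC.
        with open("dl_links.json", "w") as dl_links_json:
            json.dump(download_data, dl_links_json)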
If I have a coroutine which runs a task which should not be cancelled, I will wrap that task in asyncio.shield().
It seems the behavior of cancel and shield is not what I would expect. If I have a task wrapped in shield and I cancel it, the awaiting coroutine returns from that await statement immediately rather than waiting for the task to finish, as shield would suggest. Additionally, the task that was run with shield continues to run, but its future is now cancelled and no longer awaitable.
From the docs:
except that if the coroutine containing it is cancelled, the Task running in something() is not cancelled. From the point of view of something(), the cancellation did not happen. Although its caller is still cancelled, so the “await” expression still raises a CancelledError.
These docs do not make it clear that the caller may be cancelled before the callee finishes, which is the heart of my issue.
What is the proper method to shield a task from cancellation and then wait for it to complete before returning?
It would make more sense if asyncio.shield() raised the asyncio.CancelledError after the await-ed task has completed, but obviously there is some other idea going on here that I don't understand.
Here is a simple example:
import asyncio

async def count(n):
    for i in range(n):
        print(i)
        await asyncio.sleep(1)

async def t():
    try:
        await asyncio.shield(count(5))
    except asyncio.CancelledError:
        print('This gets called at 3, not 5')
    return 42

async def c(ft):
    await asyncio.sleep(3)
    ft.cancel()

async def m():
    ft = asyncio.ensure_future(t())
    ct = asyncio.ensure_future(c(ft))
    r = await ft
    print(r)

loop = asyncio.get_event_loop()
loop.run_until_complete(m())
# Running loop forever continues to run shielded task
# but I'd rather not do that
#loop.run_forever()
It seems the behavior of cancel and shield is not what I would expect. If I have a task wrapped in shield and I cancel it, the awaiting coroutine returns from that await statement immediately rather than waiting for the task to finish, as shield would suggest. Additionally, the task that was run with shield continues to run, but its future is now cancelled and no longer awaitable.
Conceptually shield is like a bullet-proof vest that absorbs the bullet and protects the wearer, but is itself destroyed by the impact. shield absorbs the cancellation, reports itself as cancelled, raising a CancelledError when asked for its result, but allows the protected task to continue running. (Artemiy's answer explains the implementation.)
Cancellation of the future returned by shield could have been implemented differently, e.g. by completely ignoring the cancel request. The current approach ensures that the cancellation "succeeds", i.e. that the canceller can't tell that the cancellation was in fact circumvented. This is by design, and it makes the cancellation mechanism more consistent on the whole.
What is the proper method to shield a task from cancellation and then wait for it to complete before returning?
By keeping two objects: the original task, and the shielded task. You pass the shielded task to whatever function it is that might end up canceling it, and you await the original one. For example:
import asyncio

async def coro():
    print('starting')
    await asyncio.sleep(2)
    print('done sleep')

async def cancel_it(some_task):
    await asyncio.sleep(0.5)
    some_task.cancel()
    print('cancellation effected')

async def main():
    loop = asyncio.get_event_loop()
    real_task = loop.create_task(coro())
    shield = asyncio.shield(real_task)
    # cancel the shield in the background while we're waiting
    loop.create_task(cancel_it(shield))
    await real_task
    assert not real_task.cancelled()
    assert shield.cancelled()

asyncio.get_event_loop().run_until_complete(main())
The code waits for the task to fully complete, despite its shield getting cancelled.
It would make more sense if asyncio.shield() raised the asyncio.CancelledError after the await-ed task has completed, but obviously there is some other idea going on here that I don't understand.
asyncio.shield:
creates a dummy (outer) future, which may be cancelled
runs the wrapped coroutine as a task and attaches a done callback to it that sets the result of the dummy future from the completed wrapped coroutine
returns the dummy future
You can see the implementation here.
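A simplified sketch of that mechanism (not the actual CPython source, just an illustration of the three steps above):

import asyncio

def simple_shield(coro):
    # Run the wrapped coroutine as a real task (must be called with a
    # running event loop, e.g. from inside a coroutine).
    inner = asyncio.ensure_future(coro)
    loop = asyncio.get_running_loop()
    # The dummy "outer" future that callers await and may cancel.
    outer = loop.create_future()

    def _on_done(task):
        # Copy the inner result into the outer future, unless the outer
        # future was already cancelled by the caller.
        if outer.cancelled():
            return
        if task.cancelled():
            outer.cancel()
        elif task.exception() is not None:
            outer.set_exception(task.exception())
        else:
            outer.set_result(task.result())

    inner.add_done_callback(_on_done)
    # Hand back the dummy future; cancelling it does not propagate to
    # `inner`, so the wrapped task keeps running.
    return outer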
What is the proper method to shield a task from cancellation and then wait for it to complete before returning?
You should shield the count(5) future:
async def t():
    c_ft = asyncio.ensure_future(count(5))
    try:
        await asyncio.shield(c_ft)
    except asyncio.CancelledError:
        print('This gets called at 3, not 5')
        await c_ft
    return 42
or shield the t() future:
async def t():
    await count(5)
    return 42

async def m():
    ft = asyncio.ensure_future(t())
    shielded_ft = asyncio.shield(ft)
    ct = asyncio.ensure_future(c(shielded_ft))
    try:
        r = await shielded_ft
    except asyncio.CancelledError:
        print('Shield cancelled')
        r = await ft
Shielding a coroutine
We can also shield a task, but this code is about shielding a coroutine (a sketch of the task variant follows the output below):
import asyncio

async def task1():
    print("Starting task1")
    await asyncio.sleep(1)
    print("Ending task1")
    print("SUCCESS !!")

async def task2(some_task):
    print("Starting task2")
    await asyncio.sleep(2)
    print("Cancelling task1")
    some_task.cancel()
    print("Ending task2")

async def main():
    # coroutines
    co_task1 = task1()
    # creating task from coroutines
    task1_shielded = asyncio.shield(co_task1)  # Create a shielded task1
    task2_obj = asyncio.create_task(coro=task2(task1_shielded))
    await task2_obj
    await task1_shielded

asyncio.run(main())
Output:
Starting task1
Starting task2
Ending task1
SUCCESS !!
Cancelling task1
Ending task2
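For completeness, here is a sketch of the task variant mentioned above. It reuses task1 and task2 from the previous example and only changes main(); shielding an already-created task behaves the same way, except that you also keep a handle to the real task and can await it directly even if the shield was cancelled.

async def main():
    # Create the real task first, then shield it.
    real_task1 = asyncio.create_task(task1())
    task1_shielded = asyncio.shield(real_task1)

    task2_obj = asyncio.create_task(task2(task1_shielded))
    await task2_obj
    # Awaiting the real task works even if the shield was cancelled.
    await real_task1

asyncio.run(main())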