Retrieve all chained task results by ID separately in Celery - python-3.x

I'm trying to retrieve the results of all chained tasks in Celery that are stored in the MySQL result backend.
For example, I have the following two Celery tasks:
@celery.task(name='celery_fl.add')
def add(x, y, value=None):
    if value is None:
        try:
            return x + y
        except TypeError:
            return None
    return value

@celery.task(name='celery_fl.mul')
def mul(x, y, value=None):
    if value is None:
        try:
            return x * y
        except TypeError:
            return None
    return value
and here is how I chain them,
parent = (add.s(2, 2) | mul.s(8)).apply_async()
Here, parent.get() returns the result of the final task in the chain, while parent.parent.get() gives me the output of the first task.
What I'm trying to achieve is getting the same output using the task ID at a later stage.
task_id = 'bc5fc4b1-613e-4ef0-b5c8-900999d9a6f1'
parent = AsyncResult(task_id, app=celery)
Say the task_id I have belongs to the second task in the chain (the parent). Then parent.parent.get() should give me the result of the first chained task, but somehow I get None as the value. Is there another way I should retrieve the task by task_id instead of AsyncResult()?

When using a MySQL backend to store results, the result of each chained task is stored separately. But the task instance is no longer available, and without it you cannot retrieve the results of the subtasks through the main task (Ref - Celery tasks).
So, to retrieve the results of all tasks later, the task ID of each task has to be stored somewhere, e.g. in the database.
An example using Flask (Python):
chain = (s3_init.s(order.name, order.id)
         | create_order_sheet.s(order.id, order.name)
         | create_order_info.s(order.id, order.name))
res = chain()
process = {
    's3_init': res.parent.parent.parent.parent.parent.parent.id,
    'order_sheet': res.parent.parent.parent.parent.id,
    'order_info': res.parent.parent.parent.id
}
order.update(process_id=json.dumps(process))
Then you can simply get the task IDs from the database and use celery.result.AsyncResult(task_id) to retrieve each task by ID (ref - Async results).
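For illustration, here is a rough sketch of that retrieval step. The process dict and order model are the ones from the example above; the get_chain_results name and the celery_app argument are assumptions for this sketch:
import json
from celery.result import AsyncResult

def get_chain_results(order, celery_app):
    """Look up each stored task ID and fetch its result from the backend."""
    process = json.loads(order.process_id)  # e.g. {'s3_init': '<uuid>', ...}
    results = {}
    for step_name, task_id in process.items():
        res = AsyncResult(task_id, app=celery_app)
        # .result is only meaningful once the task has finished, so check first
        results[step_name] = res.result if res.successful() else res.status
    return results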

Here is a solution that gets the topmost parent:
from celery import chain, Celery

app = Celery("my-tasks")

@app.task
def run_task(item_id):
    res = chain(
        long_task_1.s(item_id),
        long_task_2.s(),
        long_task_3.s(),
    ).delay()
    # Walk up the result chain to find the root (first) task's AsyncResult
    while getattr(res, "parent", None):
        res = res.parent
    item = MyItem.objects.get(id=item_id)
    item.celery_root_task_id = res.id
    item.save()
    return res
Then, you can retrieve all the children later with:
from celery.result import AsyncResult

# obj is the MyItem instance saved above
root_result = AsyncResult(obj.celery_root_task_id, app=app)
task_results = ", ".join(
    [
        f"{t._cache.get('task_name')}: {t.status}"
        for t, _ in root_result.collect()
    ]
)

Related

How to run two async methods simultaneously?

I'm ashamed to admit I've been using Python's asyncio for a long time without really understanding how it works, and now I'm in a pickle. In pseudocode, my current program is like this:
async def api_function1(parameters):
    result = await asyncio.gather(*[some_other_thing(p) for p in parameters])

async def api_function2(parameters):
    result = await asyncio.gather(*[some_other_thing2(p) for p in parameters])

def a(initial_parameters):
    output = []
    data = asyncio.run(api_function1(initial_parameters))
    output.append(data)
    while True:
        data = asyncio.run(api_function1(get_parameters_from_data(data)))
        output.append(data)
        if some_condition is True:
            break
    return output

def b(initial_parameters):
    output = []
    data = asyncio.run(api_function2(initial_parameters))
    output.append(data)
    while True:
        data = asyncio.run(api_function2(get_parameters_from_data(data)))
        output.append(data)
        if some_condition is True:
            break
    return output
a() and b() get data from two different REST API endpoints, each with its own rate limits and nuances. I want to run a() and b() simultaneously.
What's the best/easiest way of structuring the program so a() and b() can run simultaneously?
I tried making a() and b() both async methods and tried to await them simultaneously, i.e. something like
async def a(initial_parameters):
    ...

async def b(initial_parameters):
    ...

A = await a(initial_parameters)
B = await b(initial_parameters)
but it didn't work. Based on the docs, I'm guessing I maybe need to manually get the event loop and pass it as an argument to a() and b(), which would pass it on to api_function1() and api_function2(), and then close it manually when both tasks are done, but I'm not really sure if I'm on the right track or how to do it.
I'm also open to a better design pattern for this if you have one in mind.
There is no reason why you can't nest calls to asyncio.gather. If you want to run a() and b() simultaneously, you must make both of them coroutines. And you can't use asyncio.run() inside either one of them, since that is a blocking call - it doesn't return until its argument has completed. You need to replace all the calls to asyncio.run() in a() and b() with await expressions. You will end up with something that looks like this:
async def api_function1(parameters):
    return await asyncio.gather(*[some_other_thing(p) for p in parameters])

async def api_function2(parameters):
    return await asyncio.gather(*[some_other_thing2(p) for p in parameters])

async def a(initial_parameters):
    output = []
    data = await api_function1(initial_parameters)
    output.append(data)
    while True:
        data = await api_function1(get_parameters_from_data(data))
        output.append(data)
        if some_condition is True:
            break
    return output

async def b(initial_parameters):
    output = []
    data = await api_function2(initial_parameters)
    output.append(data)
    while True:
        data = await api_function2(get_parameters_from_data(data))
        output.append(data)
        if some_condition is True:
            break
    return output

# Version 1: run a() and b() together with asyncio.gather
async def main():
    a_data, b_data = await asyncio.gather(a(initial_parameters), b(initial_parameters))

# Version 2: create the Tasks explicitly and await them
async def main():
    task_a = asyncio.create_task(a(initial_parameters))
    task_b = asyncio.create_task(b(initial_parameters))
    a_data = await task_a
    b_data = await task_b

asyncio.run(main())
This is still pseudocode.
I have given two possible ways of writing main(), one using asyncio.gather and the other using two calls to asyncio.create_task. Both versions create two tasks that run simultaneously, but the latter version doesn't require you to collect all the tasks in one place and start them all at the same time, as gather does. If gather works for your requirements, as it does here, it is more convenient.
Finally, a call to asyncio.run starts the program. The docs recommend having only one call to asyncio.run per program.
The two api functions should return something instead of setting a local variable.
In asyncio the crucial concept is the Task. It is Tasks that cooperate to provide simultaneous execution. asyncio.gather actually creates Tasks under the hood, even though you typically pass it a list of coroutines. That's how it runs things concurrently.
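To make the "gather creates Tasks under the hood" point concrete, here is a minimal sketch; the fetch coroutine is just a placeholder:
import asyncio

async def fetch(n):
    await asyncio.sleep(0.1)
    return n * 2

async def main():
    # Passing coroutines: gather wraps each one in a Task for you
    implicit = await asyncio.gather(fetch(1), fetch(2))
    # Roughly equivalent: create the Tasks explicitly, then await them together
    tasks = [asyncio.create_task(fetch(1)), asyncio.create_task(fetch(2))]
    explicit = await asyncio.gather(*tasks)
    print(implicit, explicit)  # [2, 4] [2, 4]

asyncio.run(main())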

asyncio wait on multiple tasks with timeout and cancellation

I have some code that runs multiple tasks in a loop like this:
done, running = await asyncio.wait(running, timeout=timeout_seconds,
                                   return_when=asyncio.FIRST_COMPLETED)
I need to be able to determine which of these timed out. According to the documentation:
Note that this function does not raise asyncio.TimeoutError. Futures or Tasks that aren’t done when the timeout occurs are simply returned in the second set.
I could use wait_for() instead, but that function only accepts a single awaitable, whereas I need to specify multiple. Is there any way to determine which one from the set of awaitables I passed to wait() was responsible for the timeout?
Alternatively, is there a way to use wait_for() with multiple awaitables?
You can try this trick; it's probably not the best solution:
import asyncio

async def foo():
    return 42

async def need_some_sleep():
    await asyncio.sleep(1000)
    return 42

async def coro_wrapper(coro):
    result = await asyncio.wait_for(coro(), timeout=10)
    return result

loop = asyncio.get_event_loop()
done, running = loop.run_until_complete(asyncio.wait(
    [coro_wrapper(foo), coro_wrapper(need_some_sleep)],
    return_when=asyncio.FIRST_COMPLETED
))
for item in done:
    print(item.result())
print(done, running)
Here is how I do it:
done, pending = await asyncio.wait(
    {
        asyncio.create_task(task, name=index)
        for index, task in enumerate([
            my_coroutine(),
            my_coroutine(),
            my_coroutine(),
        ])
    },
    return_when=asyncio.FIRST_COMPLETED,
)
# Task names are stored as strings, so convert back to int before comparing
num = int(next(t.get_name() for t in done))
if num == 2:
    pass
Use enumerate to name the tasks as they are created.
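Combining the two ideas above, here is a rough sketch of how you could tell which awaitables hit the timeout by inspecting the pending set; the fast/slow coroutines are placeholders:
import asyncio

async def fast():
    await asyncio.sleep(0.1)
    return "fast done"

async def slow():
    await asyncio.sleep(10)
    return "slow done"

async def main():
    tasks = {
        asyncio.create_task(coro, name=name)
        for name, coro in [("fast", fast()), ("slow", slow())]
    }
    done, pending = await asyncio.wait(tasks, timeout=1)
    # Anything still in `pending` did not finish within the timeout
    print("timed out:", [t.get_name() for t in pending])  # ['slow']
    for t in pending:
        t.cancel()

asyncio.run(main())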

Is there a workaround for the blocking that happens with Firebase Python SDK? Like adding a completion callback?

Recently, I moved my REST server code from Express.js to FastAPI. The transition was going well until recently. I've noticed, based on the Firebase Python Admin SDK documentation, that unlike Node.js, the Python SDK is blocking. The documentation says here:
In Python and Go Admin SDKs, all write methods are blocking. That is, the write methods do not return until the writes are committed to the database.
I think this behavior is affecting my code, though it could also be how I've structured it. Some code from one of my files is below:
from app.services.new_service import nService
from firebase_admin import db
import json
import redis

class TryNewService:
    async def tryNew_func(self, request):
        # I've already initialized everything in another file for firebase
        ref = db.reference()
        r = redis.Redis()
        holdingData = await nService().dialogflow_session(request)
        fulfillmentText = json.dumps(holdingData[-1])
        body = await request.json()
        if ("user_prelimInfo_address" in holdingData):
            holdingData.append("session")
            holdingData.append(body["session"])
            print(holdingData)
            return(holdingData)
        else:
            if (("Default Welcome Intent" in holdingData)):
                pass
            else:
                UserVal = r.hget(name='{}'.format(body["session"]), key="userId").decode("utf-8")
                ref.child("users/{}".format(UserVal)).child("c_data").set({holdingData[0]: holdingData[1]})
                print(holdingData)
            return(fulfillmentText)
Is there any workaround for the blocking effect of the ref.set() line in my code? Kind of like adding a callback in Node.js? I'm new to the asyncio world of Python 3.
Update as of 06/13/2020: I added the following code and am now getting RuntimeError: Task attached to a different loop. In my second else statement I do the following:
loop = asyncio.new_event_loop()
UserVal = r.hget(name='{}'.format(body["session"]), key="userId").decode("utf-8")
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
    result = await loop.run_in_executor(pool, ref.child("users/{}".format(UserVal)).child("c_data").set({holdingData[0]: holdingData[1]}))
    print("custom thread pool:{}".format(result))
I would appreciate some help figuring out this new RuntimeError.
If you want to run synchronous code inside an async coroutine, then the steps are:
loop = asyncio.get_event_loop()
Note: get and not new. get_event_loop() provides the current event loop, while new_event_loop() returns a new one.
await loop.run_in_executor(None, sync_method)
The first parameter, None, means "use the default executor instance".
The second parameter (sync_method) is the synchronous code to be called.
Remember that resources used by sync_method need to be properly synchronized:
a) either using asyncio.Lock
b) or using the asyncio.run_coroutine_threadsafe function (see an example below)
For this case, forget about ThreadPoolExecutor (it provides I/O parallelism, versus the concurrency provided by asyncio).
You can try the following code:
loop = asyncio.get_event_loop()
UserVal = r.hget(name='{}'.format(body["session"]), key="userId").decode("utf-8")
result = await loop.run_in_executor(None, sync_method, ref, UserVal, holdingData)
print("custom thread pool:{}".format(result))
With a new function:
def sync_method(ref, UserVal, holdingData):
    result = ref.child("users/{}".format(UserVal)).child("c_data").set({holdingData[0]: holdingData[1]})
    return result
Please let me know your feedback
Note: the previous code is untested. I have only tested the following minimal example (using pytest & pytest-asyncio):
import asyncio
import time
import pytest

@pytest.mark.asyncio
async def test_1():
    loop = asyncio.get_event_loop()
    delay = 3.0
    result = await loop.run_in_executor(None, sync_method, delay)
    print(f"Result = {result}")

def sync_method(delay):
    time.sleep(delay)
    print(f"dddd {delay}")
    return "OK"
Answer to @jeff-ridgeway's comment:
Let's change the previous answer to show how to use run_coroutine_threadsafe to execute, from the sync worker thread, a coroutine that gathers these shared resources:
Add loop as an additional parameter in run_in_executor
Move all shared resources from sync_method to a new async_method, which is executed with run_coroutine_threadsafe
loop = asyncio.get_event_loop()
UserVal = r.hget(name='{}'.format(body["session"]), key="userId").decode("utf-8")
result = await loop.run_in_executor(None, sync_method, ref, UserVal, holdingData, loop)
print("custom thread pool:{}".format(result))

def sync_method(ref, UserVal, holdingData, loop):
    coro = async_method(ref, UserVal, holdingData)
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    return future.result()

async def async_method(ref, UserVal, holdingData):
    result = ref.child("users/{}".format(UserVal)).child("c_data").set({holdingData[0]: holdingData[1]})
    return result
Note: the previous code is untested. And now my tested minimal example, updated:
@pytest.mark.asyncio
async def test_1():
    loop = asyncio.get_event_loop()
    delay = 3.0
    result = await loop.run_in_executor(None, sync_method, delay, loop)
    print(f"Result = {result}")

def sync_method(delay, loop):
    coro = async_method(delay)
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    return future.result()

async def async_method(delay):
    time.sleep(delay)
    print(f"dddd {delay}")
    return "OK"
I hope this can be helpful
Run blocking database calls in a ThreadPoolExecutor via the event loop's run_in_executor(), so they don't block the event loop. See https://medium.com/@hiranya911/firebase-python-admin-sdk-with-asyncio-d65f39463916
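A minimal sketch of that approach, assuming firebase_admin has already been initialized elsewhere; the write_user_data/save_user_data names and the user_id/data arguments are illustrative:
import asyncio
from concurrent.futures import ThreadPoolExecutor

from firebase_admin import db

executor = ThreadPoolExecutor(max_workers=4)

def write_user_data(user_id, data):
    # Blocking Admin SDK call; this runs inside a worker thread
    db.reference().child("users/{}".format(user_id)).child("c_data").set(data)

async def save_user_data(user_id, data):
    loop = asyncio.get_running_loop()
    # Hand the blocking call to the thread pool so the event loop stays free
    await loop.run_in_executor(executor, write_user_data, user_id, data)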

Multiprocessing pool not taking list of strings to create individual process

I am trying to write a multiprocessing application where I have a list of companies for which individual processes need to be triggered from a process pool.
I have a function which takes 3 args: the first being self, the second a list, and the third a company code.
I am trying to process the function as a process for each company code.
I initially had a problem with the self variable, which gives a 'pickle' error, which for now I am overcoming by passing None.
I have used 'partial' to work around the multiple-arguments problem in multiprocessing, after which I am getting the error "TypeError: can only concatenate str (not "list") to str" when passing company_list as the iterable to my map.
def processing_saved_search_per_company(self, saved_search_list, each_cpy):
    print("Company Key : " + each_cpy)
    print("Saved Search List : " + saved_search_list)

def process(self):
    saved_search_list = []
    company_list = APICall.fetch_onboarded_companies_from_customer_csv(self)
    saved_search_list_file = os.path.join(code_dir_path, "resources\\saved_search_template.txt")
    try:
        with open(saved_search_list_file, "r") as ss_file_pointer:
            saved_search_list = ss_file_pointer.readlines()
    except IOError as ie:
        print(f"Error Occurred while accessing the Saved Search file reason being :-: {ie}")
    final_ss_list = []
    p = Pool(processes=4)
    #for each_cpy in company_list:
    print("Company List : " + str(company_list))
    func = partial(APICall.processing_saved_search_per_company, None, saved_search_list)
    p.map(func, company_list)
    p.close()
I need to create a pool of processes which run like:
p1= processing_saved_search_per_company(self,saved_search_list,"company 1")
p2 = processing_saved_search_per_company(self,saved_search_list,"company 2")
p3 = processing_saved_search_per_company(self,saved_search_list,"company 3")
but I am getting an error:
TypeError: can only concatenate str (not "list") to str
Requesting help on this issue.
Thanks,
Shahid
Found a workaround to this problem: pass the company list as part of the partial function as well, and then map over a range of indexes up to the length of the company list. The function below now takes index as a new parameter at the end.
def processing_saved_search_per_company(self, saved_search_list, company_list, index):
With the index now coming from the map below, I am able to run the above function for a specific company without any hassle.
func = partial(APICall.processing_saved_search_per_company, None, saved_search_list, company_list)
index_values = [x for x in range(0, len(company_list))]
p.map(func, index_values)
p.close()
p.join()
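For reference, a rough self-contained sketch of this index-based pattern; the company codes and placeholder data are made up for illustration:
from functools import partial
from multiprocessing import Pool

def processing_saved_search_per_company(self, saved_search_list, company_list, index):
    # Pick the company code for this process by its index
    each_cpy = company_list[index]
    print("Company Key : " + each_cpy)
    print("Saved Search List : " + str(saved_search_list))

if __name__ == "__main__":
    saved_search_list = ["search_1\n", "search_2\n"]  # placeholder data
    company_list = ["company 1", "company 2", "company 3"]
    func = partial(processing_saved_search_per_company, None, saved_search_list, company_list)
    with Pool(processes=4) as p:
        p.map(func, range(len(company_list)))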
Thanks

Python 3 - Example of multithreaded requests using requests, asyncio and concurrent

So, I wrote the following code based on an example of making multiple requests using the asyncio and concurrent libs that I found here.
Basically, for a given list of names, I make some parallel requests to get each user's ID. (That's just an example of usage.)
The values returned from the getUserId function are stored in the 'futures' list, and you can get these values using the result() method of each element in the list.
Is this currently a good approach?
import asyncio
import concurrent.futures

import requests

async def makeMultiRequests(names):
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        loop = asyncio.get_event_loop()
        futures = [
            loop.run_in_executor(
                executor,
                getUserId,
                name
            )
            for name in names
        ]
        return futures

def getUserId(username):
    # requests.get() user Id
    return userId

names = ['Name 1', ..., 'Name N']

loop = asyncio.get_event_loop()
resultList = loop.run_until_complete(makeMultiRequests(names))
loop.close()

for result in resultList:
    print(result.result())  # Users Id printed
Best regards.