Wait till all tasks are run in Celery python

I am using celery in python for asynchronous tasks. I want to capture the result of it after all the tasks assigned to all workers are done.
For that, I am using the .get() method, but the problem with get() is that all the tasks are being assigned to a single worker which is synchronous but I want the tasks to be distributed to all the workers available.
Below is my snippet.
for url in urls:
    res = good_bad_urls.delay(url[1])
    res.get()
return JsonResponse(some_data)
Is there any other method in celery to wait until all the tasks run asynchronously?

but the problem with get() is that all the tasks are being assigned to a single worker which is synchronous
Well, not exactly. The task distribution works just the same (even if it can seem to do otherwise), and the tasks themselves are still async. The difference is that result.get() IS a blocking call, so in your case it waits for the current task to finish before it launches the next one.
But anyway: the solution here is to use a Group. In your case, it should look something like
from celery import group

jobs = group([good_bad_urls.s(url[1]) for url in urls])
async_res = jobs.apply_async()
result = async_res.get()
The get() call will now wait for all tasks to be finished, but they will be launched in parallel.
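The difference between blocking after every submission and blocking once at the end can be seen without a Celery setup. The sketch below is only an analogy: it swaps Celery for the standard library's ThreadPoolExecutor, and check_url and the example URLs are hypothetical stand-ins for the good_bad_urls task and its input.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def check_url(url):
    # Hypothetical stand-in for the good_bad_urls task body.
    time.sleep(0.05)
    return url.startswith("https://")


urls = ["https://a.example", "http://b.example", "https://c.example"]

with ThreadPoolExecutor(max_workers=3) as pool:
    # Blocking after every submission: the next task is not even queued
    # until the previous result has arrived.
    one_by_one = [pool.submit(check_url, u).result() for u in urls]

    # Submitting everything first, then collecting: the tasks overlap.
    futures = [pool.submit(check_url, u) for u in urls]
    all_at_once = [f.result() for f in futures]
```

Both lists hold the same results; only the second pattern lets the work run in parallel, which is what the group does for Celery tasks.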

Related

Celery worker with multithreading - how to update results concurrently

I created a Flask API with a Celery worker. The user clicks a "start tests" button, which makes a POST request that returns a URL the user can poll every 5 seconds to get the test results (needed to update a frontend progress bar). The Celery task uses threading. My goal is to update the Celery task state concurrently, based on the results of the threads. I don't want to wait until all my threads finish before returning their results. My Celery task looks like this:
@celery.task(bind=True)  # bind=True makes Celery pass the task instance as "self", used to record status updates
def run_tests(self, dialog_cases):
    """
    Testing running as a background task
    """
    results = []
    test_case_no = 1
    test_controller = TestController(dialog_cases)
    bot_config = [test_controller.url, test_controller.headers, test_controller.db_name]
    threads = []
    queue = Queue()
    start = time.perf_counter()
    threads_list = list()
    for test_case in test_controller.test_cases:
        # Pass a callable as target; calling queue.put(...) inline would run the test in this thread
        t = Thread(target=lambda tc=test_case: queue.put({randint(0, 1000): TestCase(tc, bot_config)}))
        t.start()
        threads_list.append(t)
    for t in threads_list:
        t.join()
    results_dict_list = [queue.get() for _ in range(len(test_controller.test_cases))]
    for result in results_dict_list:
        for key, value in result.items():
            cprint.info(f"{key}, {value.test_failed}")
Now: TestCase is an object that, on creation, runs a function that makes a few iterations and afterwards reports whether the test failed or passed. I have another Flask endpoint which returns the status of the tasks. The question is how to get the values returned by the threads as they finish, without having to wait until they are all done. I tried Queue, but this can only return results when everything is over.

You can simply use update_state to modify the state of the task from each of those threads, if that is what you want. Furthermore, you can create your own custom states. As you want to know the result of each test the moment it is finished, it seems like a good idea to have a custom state for each test that you update from each thread during runtime.
An alternative is to refactor your code so each test is actually a Celery task. Then you use Chord or Group primitives to build your workflow. As you want to know the state during runtime, perhaps Group is better, because then you can monitor the state of the GroupResult object...
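A minimal sketch of the first suggestion, under stated assumptions: FakeTask is a hypothetical stand-in that only records calls, so the example runs without a broker. In a real bound task the same call would be self.update_state(state=..., meta=...). The state name "PROGRESS", the meta keys, and run_case are also made up for illustration, and the update is issued from the task's own thread as each worker thread finishes rather than from inside the threads.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class FakeTask:
    """Hypothetical stand-in for a bound Celery task; it only records updates."""

    def __init__(self):
        self.updates = []

    def update_state(self, state=None, meta=None):
        self.updates.append((state, meta))


def run_case(case):
    # Placeholder for the real test logic: returns (case, failed).
    return case, len(case) % 2 == 1


def run_tests(task, cases):
    finished = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_case, case) for case in cases]
        # as_completed yields each future as soon as its thread is done,
        # so progress is reported without waiting for the slowest test.
        for future in as_completed(futures):
            case, failed = future.result()
            finished[case] = failed
            task.update_state(
                state="PROGRESS",  # custom state name, chosen for this sketch
                meta={"done": len(finished), "total": len(cases)},
            )
    return finished


task = FakeTask()
results = run_tests(task, ["a", "bb", "ccc"])
```

The status endpoint can then read the latest state and meta of the task and feed the progress bar from it.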

Python (3.7) asyncio, worker task with while loop and a custom signal handler

I'm trying to understand the pattern for indefinitely running asyncio Tasks
and the difference that a custom loop signal handler makes.
I create workers using loop.create_task() so that they run concurrently.
In my regular workers' code I am polling for data and act accordingly when data is there.
I'm trying to handle the shutdown process gracefully on a signal.
When a signal is delivered - I again create_task() with the shutdown function, so that currently running tasks continue, and shutdown gets executed in next iteration of the event loop.
Now - when a single worker's while loop doesn't actually do any IO or work then it prevents the signal handler from being executed. It never ends and does not give back execution so that other tasks could be run.
When I don't attach a custom signal handler to a loop and run this program, then a signal is delivered and the program stops. I assume it's a main thread that stops the loop itself.
This is obviously different from trying to schedule a (new) shutdown task on a running loop, because that running loop is stuck in a single coroutine which is blocked in a while loop and doesn't give back any control or time for other tasks.
Is there any standard pattern for such cases?
Do I need to asyncio.sleep() if there's no work to do, do I replace the while loop with something else (e.g. rescheduling the work function itself)?
If the range(5) is replaced with range(1, 5) then all workers do await asyncio.sleep,
but if one of them does not, then everything gets blocked. How to handle this case, is there any standard approach?
The code below illustrates the problem.
import asyncio
import signal


async def shutdown(loop, sig=None):
    print("SIGNAL", sig)
    tasks = [t for t in asyncio.all_tasks()
             if t is not asyncio.current_task()]
    for t in tasks:
        t.cancel()
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # handle_task_results(results)
    loop.stop()


async def worker(intval):
    print("start", intval)
    while True:
        if intval:
            print("#", intval)
            await asyncio.sleep(intval)
        # when intval == 0 the loop never awaits, so the event loop is starved


loop = asyncio.get_event_loop()
for sig in {signal.SIGINT, signal.SIGTERM}:
    loop.add_signal_handler(
        sig,
        lambda s=sig: asyncio.create_task(shutdown(loop, sig=s)))
workers = [loop.create_task(worker(i)) for i in range(5)]  # this range
loop.run_forever()
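One commonly used way to keep such a loop cooperative is to yield with await asyncio.sleep(0) whenever an iteration has nothing to wait on. The sketch below is an illustration under assumptions, not a complete shutdown pattern: the signal handler is replaced by a short timer in main, and the counter dictionary exists only to show that the workers made progress.

```python
import asyncio


async def worker(intval, counter):
    while True:
        if intval:
            await asyncio.sleep(intval)
        else:
            # No work and no I/O: yield once so other tasks
            # (for example a shutdown task) get a chance to run.
            await asyncio.sleep(0)
        counter[intval] = counter.get(intval, 0) + 1


async def main():
    counter = {}
    tasks = [asyncio.create_task(worker(i, counter)) for i in (0, 0.01)]
    await asyncio.sleep(0.05)  # stands in for waiting on a signal
    for t in tasks:
        t.cancel()
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return counter, results


counter, results = asyncio.run(main())
```

With the sleep(0) removed, the worker with intval 0 would never hand control back and main would never get to cancel anything. Note that this version still spins the CPU while idle; awaiting a real event or queue is usually preferable when one is available.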

multiple concurrent threads with futures.threadpool

I am trying to run multiple threads alongside my main thread. Running one thread individually works fine with both threading.Thread and concurrent.futures.ThreadPoolExecutor.
Running two separate threads does not work at all, though. One of the threads just runs the entire time, locking up both other threads. There are no "shared" resources that get locked as far as I know; the threads have nothing to do with each other (except calling the next thread), so I don't understand why this won't work.
My code looks like this:
with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(function())
    result = future.result()
And the function running inside the thread also calls:
def function():
    with concurrent.futures.ThreadPoolExecutor() as executor:
        inner_result = (executor.submit(inner_function, "value")).result()
I've also tried running this function with:
t = Thread(target=function..., getting the same result.
Is there something I am missing about running multiple concurrent threads in Python?

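The issue was passing the function's result, instead of the function itself, to the executor. Writing function() calls it immediately in the calling thread, and only its return value reaches submit.
This: executor.submit(function())
should be: executor.submit(function)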
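A small check makes the difference visible by recording which thread actually runs the function. The names which_thread and the "pool" prefix are chosen for this sketch only.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def which_thread():
    # Report the name of the thread executing this call.
    return threading.current_thread().name


with ThreadPoolExecutor(thread_name_prefix="pool") as executor:
    # Correct: the callable itself is handed over and runs in a pool thread.
    ran_in = executor.submit(which_thread).result()

# What which_thread() evaluates to when written inside submit(...):
# the call happens right here, in the calling thread.
called_in = which_thread()
caller = threading.current_thread().name
```

Here ran_in names a pool thread, while called_in equals caller, which is why submit(function()) never moves the work off the calling thread.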

Visualizing asyncio coroutines execution

I am trying to understand how async coroutines are executed from start to finish. Let's say I have this function
async def statemachine(state):
that does the following:
1. Read value on remote server
2. Write to remote mysql server
3. Write to local redis server
4. Delete a record from a remote mysql server
5. Create event and notify coroutine execution has finished
Since async suspends execution to give other coroutines time to execute, will the execution always run in order from step 1 to step 5?

A coroutine is always executed sequentially. Many (co)routines however can (co)operate together while being supervised by an event-loop or a scheduler of sorts.
So if you stack all your tasks in one coroutine e.g.:
async def statemachine(state):
    await read_value_on_remote_server()
    await write_to_remote_mysql_server()
    await write_to_local_redis_server()
    await delete_a_record_from_a_remote_mysql_server()
    await create_event_and_notify_coroutine_execution_has_finished()
your statemachine will await each task one by one until they're done. This scenario isn't really useful, and doesn't provide any benefit over sync code.
A scenario where async execution shines is, let's say you have a web app that schedules one statemachine coroutine per user request. Now whenever a user hits your server with a request, a new coroutine is scheduled in the eventloop. And because the event loop can only run one thing at a time (pseudo concurrency), it will let each coroutine execute (let's assume using a round-robin algorithm) until they suspend, because they're awaiting an object or another coroutine that is awaiting another object.
The way a coroutine suspends is by having an await statement. This lets the event loop know that the coroutine is awaiting an operation that isn't necessarily CPU bound. e.g. network call or user input.
Thankfully, we're shielded from the details of the implementation of the eventloop and how it manages to know when a coroutine should be resumed. This is typically done using a library like Python's stdlib select https://docs.python.org/2/library/select.html.
For most use cases, you should know that a coroutine always executes sequentially and that the event-loop is what manages the execution of coroutines by using co-operative methods (unlike a typical OS scheduler for example).
If you want to run several coroutines pseudo-concurrently, you can look at asyncio.gather or the more correct asyncio.create_task. Hope this helps.
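Both points, sequential execution inside one coroutine and interleaving between coroutines, can be observed with a short log. This is a sketch under assumptions: the three numbered steps and await asyncio.sleep(0) stand in for the real network calls, and the names "a" and "b" are arbitrary.

```python
import asyncio


async def statemachine(name, log):
    for step in range(1, 4):
        log.append((name, step))
        await asyncio.sleep(0)  # suspension point; stands in for a network call


async def main():
    log = []
    # Run two state machines on the same event loop.
    await asyncio.gather(statemachine("a", log), statemachine("b", log))
    return log


log = asyncio.run(main())
```

Within each name the steps appear in order 1, 2, 3, while entries for "a" and "b" alternate, because each await hands control back to the event loop.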

Task.Wait and ContinueWith

I have a task like the one below.
var task = Task<string>.Factory.StartNew(() => longrunningmethod())
    .ContinueWith(a =>
        longrunningmethodcompleted((Task<string>)a),
        TaskScheduler.FromCurrentSynchronizationContext());
task.Wait();
My task calls longrunningmethod and, after it completes, calls the completed method.
Inside longrunningmethod I am delaying with Thread.Sleep(30000). When I use Task.Wait, the system hangs and never calls longrunningmethodcompleted. If I don't use Task.Wait, everything works fine.

I strongly suspect your context is a UI context.
In that case, you're causing the deadlock because you're telling longrunningmethodcompleted to execute on the current SynchronizationContext.
Then you're blocking that context by calling Wait. The continuation will never complete because it needs to execute on the thread that is blocked in Wait.
To fix this, you can't use Wait on a continuation running in that context. I'm assuming that longrunningmethodcompleted must run on the UI thread, so the solution is to replace the call to Wait with a call to ContinueWith:
var ui = TaskScheduler.FromCurrentSynchronizationContext();
var task = Task<string>.Factory.StartNew(() => longrunningmethod())
    .ContinueWith(a =>
        longrunningmethodcompleted((Task<string>)a),
        ui);
task.ContinueWith(..., ui);
Or, you can upgrade to VS2012 and use async/await, which lets you write much cleaner code like this:
var task = Task.Run(() => longrunningmethod());
await task;
longrunningmethodcompleted(task);
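The deadlock described in this answer can be reproduced in miniature in Python. This is only an analogy, not the .NET mechanism: a single-worker ThreadPoolExecutor stands in for the UI thread, handler and completed are hypothetical names, and a timeout is used so the sketch reports the stall instead of hanging.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# One worker only: plays the role of the single UI thread.
ui = ThreadPoolExecutor(max_workers=1)


def completed():
    return "completed"


def handler():
    # Runs on the "UI" thread: schedule the continuation on that same
    # thread, then block waiting for it. It cannot start while we block.
    continuation = ui.submit(completed)
    try:
        return continuation.result(timeout=0.2)
    except TimeoutError:
        return "deadlock"


outcome = ui.submit(handler).result()
ui.shutdown()
```

Without the timeout, handler would wait forever, because the only thread able to run the continuation is the one blocked waiting for it.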

Well, it is hard to tell what is wrong with your code without seeing what the actual async actions are. All I know is that, according to MSDN, Wait waits for the task to complete. Is it possible that your actions block because you are trying to use the current SynchronizationContext?
The reason I am asking is because you:
1. Start the task
2. Wait for the task to complete (which is the ContinueWith task)
3. Task tries to continue with the current SynchronizationContext
4. Task tries to acquire the main thread
5. Task is scheduled to take the thread after the Wait is completed
6. But Wait is waiting on the current Task to complete (deadlock)
What I mean is that your program works with Thread.Sleep(seconds) because once the time limit is up, the thread continues.

Thread.Sleep(nnn) is blocking. Use Task.Delay(nnn) and await:
await Task.Delay(30000);
Edited to add: Just noticed the tag says C# 4. This requires C# 5 and the new async/await support. Seriously, if you're doing async and tasks, you need to upgrade.
