Task with bind=True
I have a celery task that runs a computation lasting a few seconds.
from celery import states

@celery.task(name="crunch.task", bind=True)
def crunch(self, data):
    try:
        pass
        # ... run computation with data here
    except Exception as exc:
        self.update_state(
            state=states.FAILURE,
            meta={"details": "error details here"}
        )
        raise exc
The important feature here is I'm using bind=True which passes the task into the function as the self parameter. This allows access to the task's update_state method which is great for error handling.
Group job
I now want to run this task in a batch job using celery.group.
from celery import group

@celery.task(name="batch.task", bind=True)
def batch(self, data_list):
    try:
        # HERE! IT SEEMS WRONG TO PASS `self` INTO THE CHILD TASKS
        job = group([crunch(self, data) for data in data_list])  # <-- here `self` should be created by celery!
        job.apply_async()
    except Exception as exc:
        self.update_state(
            state=states.FAILURE,
            meta={"details": "error details here"}
        )
        raise exc
How can I make a celery.group composed of tasks with bind=True?
You need to work with signatures:
crunch.si(data)
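For example, a minimal sketch of the batch task built from signatures, assuming the crunch task defined above: crunch.si(data) creates an immutable signature, and Celery creates and binds self for each child task when the group is applied.

from celery import group

@celery.task(name="batch.task", bind=True)
def batch(self, data_list):
    # Build signatures instead of calling crunch() directly;
    # Celery supplies `self` to each bound child task.
    job = group(crunch.si(data) for data in data_list)
    return job.apply_async()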
Related
I've recently started working with Python and its concurrency features, and I'm banging my head against asyncio.
Structure of the data: a list of companies, each with a users-list in it.
Goal: I want to execute the gRPC calls in parallel, with a task always running for a particular company. Also, the API call takes the users-list and is a batch call (not a single call per company).
Reference I followed: https://docs.python.org/3/library/asyncio-queue.html (modified a bit according to my use case)
What I've done: the 3 small functions below, with process_cname_vs_users taking a mapping of company_id to users-list as input.
async def update_data(req_id, user_ids, company_id):  # <-- THIS IS THE ASYNC CALL ON A CHUNK OF SOME USERS
    # A gRPC call to the server here.
    ...

async def worker(worker_id, queue, cname_vs_user_ids):
    while True:
        company_id = await queue.get()
        user_ids = cname_vs_user_ids.get(company_id)
        user_ids_chunks = get_data_in_chunks(user_ids, 20)
        for user_id_chunk in user_ids_chunks:
            try:
                await update_data(user_id_chunk, company_id)
            except Exception as e:
                print("error: {}".format(e))
        # Notify the queue that the "work item" has been processed.
        queue.task_done()
async def process_cname_vs_users(cname_vs_user_ids):
    queue = asyncio.Queue()
    for company_id in cname_vs_user_ids:
        queue.put_nowait(company_id)

    tasks = []
    for i in range(5):  # <- number of workers
        task = asyncio.create_task(
            worker(i, queue, cname_vs_user_ids))
        tasks.append(task)

    # Wait until the queue is fully processed.
    await queue.join()

    # Cancel our worker tasks.
    for task in tasks:
        task.cancel()
    try:
        # Wait until all worker tasks are cancelled.
        responses = await asyncio.gather(*tasks, return_exceptions=True)
        print('Results:', responses)
    except Exception as e:
        print('Got an exception:', e)
Expectation: the tasks should be executed concurrently for 5 companies (the number of workers) at a time.
Reality: only the first task is doing the work, for all companies, sequentially.
Any help/suggestions would be appreciated. Thanks in advance :)
So, finally, I figured it out after reading more about concurrency in Python.
For now I have used concurrent.futures.ThreadPoolExecutor to achieve the desired output.
Solution:
async def update_data(req_id, user_ids, company_id):  # <-- THIS IS THE ASYNC CALL ON A CHUNK OF SOME USERS
    # A gRPC call to the server here.
    ...

async def worker(cname_vs_user_ids):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        future_summary = {executor.submit(update_data, req_id, user_items, company_id)
                          for company_id, user_items in cname_vs_user_ids.items()}
        for future in concurrent.futures.as_completed(future_summary):
            try:
                response = future.result()
            except Exception as error:
                print("ReqId: {}, Error occurred: {}".format(req_id, str(error)))
                executor.shutdown()
                raise error

async def process_cname_vs_users(cname_vs_user_ids):
    loop = asyncio.get_event_loop()
    loop.run_until_complete(worker(cname_vs_user_ids))
The above solution worked wonders for me.
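As an aside, if update_data is ultimately a blocking (synchronous) gRPC call, a pure-asyncio alternative is to push each blocking call onto an executor with loop.run_in_executor and gather the results. A minimal sketch under that assumption (the name update_data_blocking is hypothetical, and the chunking from the question is omitted for brevity):

import asyncio
import concurrent.futures

def update_data_blocking(user_ids, company_id):
    # Plain (synchronous) gRPC call to the server here.
    ...

async def process_cname_vs_users(cname_vs_user_ids, max_workers=5):
    loop = asyncio.get_running_loop()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            loop.run_in_executor(executor, update_data_blocking, user_ids, company_id)
            for company_id, user_ids in cname_vs_user_ids.items()
        ]
        # run_in_executor returns awaitables, so the event loop stays responsive
        # while up to max_workers calls run in parallel threads.
        results = await asyncio.gather(*futures, return_exceptions=True)
    return results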
I am trying to set up a delayed task whose timing will depend on several parameters (either passed to it or obtained from a Redis database).
The pseudocode would look like this:
def main():
    scheduler = BackgroundScheduler()
    scheduler.add_job(delayed_task,
                      id=task_id,
                      next_run_time=somedate,
                      args=(task_id, some_data))
    scheduler.start()
    do_something_else()

def delayed_task(id, passed_data):
    rd = connect_to_redis()
    redis_data = rd.fetch_data(id)
    publish_data(passed_data, redis_data)
    updated_run_time = parse(redis_data)
    # obtain a scheduler object here
    scheduler.modify_job(id, next_run_time=updated_run_time)
The question is the following: is there a way to access the scheduler from a task?
The scheduler cannot be passed as parameter to the task, as this will raise
TypeError: can't pickle _thread.lock objects
For the same reason, I can't put all this in a class and make the task a method, since the method's arguments would include self, the instance containing the scheduler, which results in the same issue.
Is it possible to regain an instance of the scheduler from outside, the way I can create a new connection to Redis?
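One possible way out (a sketch, not from the original post; the module name scheduling is hypothetical) is to create the scheduler at module level and import it inside the task, so it never has to be passed, or pickled, as a job argument:

# scheduling.py (hypothetical module holding the single scheduler instance)
from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()

# tasks.py
from scheduling import scheduler

def delayed_task(id, passed_data):
    rd = connect_to_redis()
    redis_data = rd.fetch_data(id)
    publish_data(passed_data, redis_data)
    updated_run_time = parse(redis_data)
    # The scheduler is imported, not passed as a job argument, so it is never pickled.
    scheduler.modify_job(id, next_run_time=updated_run_time)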
I am building a worker class that connects to an external event stream using asyncio. It is a single stream, but several consumers may enable it. The goal is to maintain the connection only while one or more consumers require it.
My requirements are as follows:
The worker instance is created dynamically first time a consumer requires it.
When other consumers then require it, they re-use the same worker instance.
When the last consumer closes the stream, it cleans up its resources.
This sounds easy enough. However, the startup sequence is causing me issues, because it is itself asynchronous. Thus, assuming this interface:
class Stream:
    async def start(self, *, timeout=DEFAULT_TIMEOUT):
        pass

    async def stop(self):
        pass
I have the following scenarios:
Scenario 1 - exception at startup
Consumer 1 requests worker to start.
Worker startup sequence begins
Consumer 2 requests worker to start.
Worker startup sequence raises an exception.
Both consumers should see the exception as the result of their call to start().
Scenario 2 - partial asynchronous cancellation
Consumer 1 requests worker to start.
Worker startup sequence begins
Consumer 2 requests worker to start.
Consumer 1 gets cancelled.
Worker startup sequence completes.
Consumer 2 should see a successful start.
Scenario 3 - complete asynchronous cancellation
Consumer 1 requests worker to start.
Worker startup sequence begins
Consumer 2 requests worker to start.
Consumer 1 gets cancelled.
Consumer 2 gets cancelled.
Worker startup sequence must be cancelled as a result.
I struggle to cover all scenarios without race conditions or a spaghetti mess of bare Future and Event objects.
Here is an attempt at writing start(). It relies on _worker() setting an asyncio.Event named self._worker_ready when it completes the startup sequence:
async def start(self, timeout=None):
    assert not self.closing
    if not self._task:
        self._task = asyncio.ensure_future(self._worker())
    # Wait until worker is ready, has failed, or timeout triggers
    try:
        self._waiting_start += 1
        wait_ready = asyncio.ensure_future(self._worker_ready.wait())
        done, pending = await asyncio.wait(
            [self._task, wait_ready],
            return_when=asyncio.FIRST_COMPLETED, timeout=timeout
        )
    except asyncio.CancelledError:
        wait_ready.cancel()
        if self._waiting_start == 1:
            self.closing = True
            self._task.cancel()
            with suppress(asyncio.CancelledError):
                await self._task  # let worker shutdown
        raise
    finally:
        self._waiting_start -= 1

    # worker failed to start - either throwing or timeout triggering
    if not self._worker_ready.is_set():
        self.closing = True
        self._task.cancel()
        wait_ready.cancel()
        try:
            await self._task  # let worker shutdown
        except asyncio.CancelledError:
            raise FeedTimeoutError('stream failed to start within %ss' % timeout)
        else:
            assert False, 'worker must propagate the exception'
That seems to work, but it feels too complex and is really hard to test: the worker has many await points, leading to a combinatorial explosion if I try all possible cancellation points and execution orders.
I need a better way. I am thus wondering:
Are my requirements reasonable?
Is there a common pattern to do this?
Does my question raise some code smell?
Your requirements sound reasonable. I would try to simplify start by replacing Event with a future (in this case a task), using it to both wait for the startup to finish and to propagate exceptions that occur during its course, if any. Something like:
class Stream:
    async def start(self, *, timeout=DEFAULT_TIMEOUT):
        loop = asyncio.get_event_loop()
        if self._worker_startup_task is None:
            self._worker_startup_task = \
                loop.create_task(self._worker_startup())
        self._add_user()
        try:
            await asyncio.shield(asyncio.wait_for(
                self._worker_startup_task, timeout))
        except:
            self._rm_user()
            raise

    async def _worker_startup(self):
        loop = asyncio.get_event_loop()
        await asyncio.sleep(1)  # ...
        self._worker_task = loop.create_task(self._worker())
In this code the worker startup is separated from the worker coroutine, and is also moved to a separate task. This separate task can be awaited and removes the need for a dedicated Event, but more importantly, it allows scenarios 1 and 2 to be handled by the same code. Even if someone cancels the first consumer, the worker startup task will not be canceled - the cancellation just means that there is one less consumer waiting for it.
Thus in case of consumer cancellation, await self._worker_startup_task will work just fine for other consumers, whereas in case of an actual exception in worker startup, all other waiters will see the same exception because the task will have completed.
Scenario 3 should work automatically because we always cancel the startup that can no longer be observed by a consumer, regardless of the reason. If the consumers are gone because the startup itself has failed, then self._worker_startup_task will have completed (with an exception) and its cancellation will be a no-op. If it is because all consumers have been themselves canceled while awaiting the startup, then self._worker_startup_task.cancel() will cancel the startup sequence, as required by scenario 3.
The rest of the code would look like this (untested):
def __init__(self):
    self._users = 0
    self._worker_startup_task = None
    self._worker_task = None

def _add_user(self):
    self._users += 1

def _rm_user(self):
    self._users -= 1
    if self._users:
        return
    self._worker_startup_task.cancel()
    self._worker_startup_task = None
    if self._worker_task is not None:
        self._worker_task.cancel()
        self._worker_task = None

async def stop(self):
    self._rm_user()

async def _worker(self):
    # actual worker...
    while True:
        await asyncio.sleep(1)
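A minimal usage sketch (not part of the original answer) showing two consumers sharing the same startup, assuming the Stream class above:

async def consumer(stream, name):
    await stream.start(timeout=10)  # both consumers await the same _worker_startup_task
    print(name, 'sees the stream as started')
    await asyncio.sleep(5)          # ... consume the stream ...
    await stream.stop()             # the last consumer to leave cancels the worker

async def main():
    stream = Stream()
    await asyncio.gather(consumer(stream, 'c1'), consumer(stream, 'c2'))

asyncio.run(main())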
With my previous tests and by integrating suggestions from @user4815162342, I came up with a re-usable solution:
st = SharedTask(test())
task1 = asyncio.ensure_future(st.wait())
task2 = asyncio.ensure_future(st.wait(timeout=15))
task3 = asyncio.ensure_future(st.wait())
This does the right thing: task2 cancels itself after 15s. Cancelling tasks has no effect on test() unless they all get cancelled. In that case, the last task to get cancelled will manually cancel test() and wait for cancellation handling to complete.
If passed a coroutine, it is only scheduled when first task starts waiting.
Lastly, awaiting the shared task after it has completed simply yields its result immediately (seems obvious, but initial version did not).
import asyncio
from contextlib import suppress

class SharedTask:
    __slots__ = ('_clients', '_task')

    def __init__(self, task):
        if not (asyncio.isfuture(task) or asyncio.iscoroutine(task)):
            raise TypeError('task must be either a Future or a coroutine object')
        self._clients = 0
        self._task = task

    @property
    def started(self):
        return asyncio.isfuture(self._task)

    async def wait(self, *, timeout=None):
        self._task = asyncio.ensure_future(self._task)
        self._clients += 1
        try:
            return await asyncio.wait_for(asyncio.shield(self._task), timeout=timeout)
        except:
            self._clients -= 1
            if self._clients == 0 and not self._task.done():
                self._task.cancel()
                with suppress(asyncio.CancelledError):
                    await self._task
            raise

    def cancel(self):
        if asyncio.iscoroutine(self._task):
            self._task.close()
        elif asyncio.isfuture(self._task):
            self._task.cancel()
Re-raising the exception when the shared task is cancelled (mentioned in the comments) is intentional. It allows this pattern:
async def my_task():
    try:
        await do_stuff()
    except asyncio.CancelledError as exc:
        await flush_some_stuff()  # might raise an exception
        raise exc
The clients can cancel the shared task and handle any exception that arises as a result; it will work the same whether my_task is wrapped in a SharedTask or not.
I have a loading widget that consists of two labels: one is the status label and the other is the label the animated gif is shown in. If I call the show() method before the heavy stuff gets processed, the gif in the loading widget doesn't update itself at all. There's nothing wrong with the gif itself (no looping problems etc.). The main code (the caller) looks like this:
self.loadingwidget = LoadingWidgetForm()
self.setCentralWidget(self.loadingwidget)
self.loadingwidget.show()
...
...
heavy stuff
...
...
self.loadingwidget.hide()
The widget class:
class LoadingWidgetForm(QWidget, LoadingWidget):
    def __init__(self, parent=None):
        super().__init__(parent=parent)
        self.setupUi(self)
        self.setWindowFlags(self.windowFlags() | Qt.FramelessWindowHint)
        self.setAttribute(Qt.WA_TranslucentBackground)
        pince_directory = SysUtils.get_current_script_directory()  # returns current working directory
        self.movie = QMovie(pince_directory + "/media/loading_widget_gondola.gif", QByteArray())
        self.label_Animated.setMovie(self.movie)
        self.movie.setScaledSize(QSize(50, 50))
        self.movie.setCacheMode(QMovie.CacheAll)
        self.movie.setSpeed(100)
        self.movie.start()
        self.not_finished = True
        self.update_thread = Thread(target=self.update_widget)
        self.update_thread.daemon = True

    def showEvent(self, QShowEvent):
        QApplication.processEvents()
        self.update_thread.start()

    def hideEvent(self, QHideEvent):
        self.not_finished = False

    def update_widget(self):
        while self.not_finished:
            QApplication.processEvents()
As you can see, I tried to create a separate thread to offload the work, but it didn't make any difference. Then I tried my luck with the QThread class by overriding the run() method, but that didn't work either. However, executing QApplication.processEvents() inside the heavy stuff works well. I also think I shouldn't be using separate threads; I feel like there should be a more elegant way to do this. The widget looks like this, btw:
(Screenshot: the loading widget, showing the animated gif next to a "Processing..." label. The full version of the gif was attached as an image.)
Thanks in advance! Have a good day.
Edit: I can't move the heavy stuff to a different thread due to bugs in pexpect. Pexpect's spawn() method requires the spawned object and any operations on that object to be in the same thread. I don't want to change the working flow of the whole program.
In order to update GUI animations, the main Qt loop (located in the main GUI thread) has to be running and processing events. The Qt event loop can only process a single event at a time, however because handling these events typically takes a very short time control is returned rapidly to the loop. This allows the GUI updates (repaints, including animation etc.) to appear smooth.
A common example is having a button to initiate loading of a file. The button press creates an event which is handled, and passed off to your code (either via events directly, or via signals). Now the main thread is in your long-running code, and the event loop is stalled — and will stay stalled until the long-running job (e.g. file load) is complete.
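For illustration, a tiny sketch of the pattern that stalls the loop (the names load_huge_file and show_data are hypothetical): the slot runs on the GUI thread, so nothing repaints until it returns.

def on_load_clicked(self):            # slot connected to the button's clicked signal
    data = load_huge_file(self.path)  # hypothetical long-running work on the GUI thread
    # While load_huge_file() runs, the event loop cannot process paint or timer
    # events, so animations such as a QMovie appear frozen.
    self.show_data(data)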
You're correct that you can solve this with threads, but you've gone about it backwards. You want to put your long-running code in a thread (not your call to processEvents). In fact, calling (or interacting with) the GUI from another thread is a recipe for a crash.
The simplest way to work with threads is to use QRunnable and QThreadPool. This allows for multiple execution threads. The following wall of code gives you a custom Worker class that makes it simple to handle this. I normally put this in a file threads.py to keep it out of the way:
import sys
import traceback

from PyQt5.QtCore import QObject, QRunnable, pyqtSignal, pyqtSlot

class WorkerSignals(QObject):
    '''
    Defines the signals available from a running worker thread.

    error
        `tuple` (exctype, value, traceback.format_exc() )
    result
        `dict` data returned from processing
    '''
    finished = pyqtSignal()
    error = pyqtSignal(tuple)
    result = pyqtSignal(dict)
class Worker(QRunnable):
    '''
    Worker thread

    Inherits from QRunnable to handle worker thread setup, signals and wrap-up.

    :param callback: The function callback to run on this worker thread. Supplied args and
                     kwargs will be passed through to the runner.
    :type callback: function
    :param args: Arguments to pass to the callback function
    :param kwargs: Keywords to pass to the callback function
    '''

    def __init__(self, fn, *args, **kwargs):
        super(Worker, self).__init__()
        # Store constructor arguments (re-used for processing)
        self.fn = fn
        self.args = args
        self.kwargs = kwargs
        self.signals = WorkerSignals()

    @pyqtSlot()
    def run(self):
        '''
        Initialise the runner function with passed args, kwargs.
        '''
        # Retrieve args/kwargs here; and fire processing using them
        try:
            result = self.fn(*self.args, **self.kwargs)
        except:
            traceback.print_exc()
            exctype, value = sys.exc_info()[:2]
            self.signals.error.emit((exctype, value, traceback.format_exc()))
        else:
            self.signals.result.emit(result)  # Return the result of the processing
        finally:
            self.signals.finished.emit()  # Done
To use the above, you need a QThreadPool to handle the threads. You only need to create this once, for example during application initialisation.
threadpool = QThreadPool()
Now, create a worker by passing in the Python function to execute:
from .threads import Worker # our custom worker Class
worker = Worker(fn=<Python function>) # create a Worker object
Now attach signals to get back the result, or be notified of an error:
worker.signals.error.connect(<Python function error handler>)
worker.signals.result.connect(<Python function result handler>)
Then, to execute this Worker, you can just pass it to the QThreadPool.
threadpool.start(worker)
Everything will take care of itself, with the result of the work returned to the connected signal... and the main GUI loop will be free to do its thing!
I have an implementation of a BackgroundTask object that looks like the following:
class BackgroundTask(QObject):
    '''
    A utility class that makes running long-running tasks in a separate thread easier

    :type task: callable
    :param task: The task to run
    :param args: positional arguments to pass to task
    :param kwargs: keyword arguments to pass to task

    .. warning :: There is one **MAJOR** restriction in task: It **cannot** interact with any Qt GUI objects.
                  Doing so will cause the GUI to crash. This is a limitation in Qt's threading model, not with
                  this class's design
    '''

    finished = pyqtSignal()  #: Signal that is emitted when the task has finished running

    def __init__(self, task, *args, **kwargs):
        super(BackgroundTask, self).__init__()
        self.task = task      #: The callable that does the actual task work
        self.args = args      #: positional arguments passed to task when it is called
        self.kwargs = kwargs  #: keyword arguments passed to task when it is called
        self.results = None   #: After :attr:`finished` is emitted, this will contain whatever value :attr:`task` returned

    def runTask(self):
        '''
        Does the actual calling of :attr:`task`, in the form ``task(*args, **kwargs)``, and stores the returned value
        in :attr:`results`
        '''
        flushed_print('Running Task')
        self.results = self.task(*self.args, **self.kwargs)
        flushed_print('Got My Results!')
        flushed_print('Emitting Finished!')
        self.finished.emit()

    def __repr__(self):
        return '<BackgroundTask(task = {}, {}, {})>'.format(self.task, *self.args, **self.kwargs)

    @staticmethod
    def build_and_run_background_task(thread, finished_notifier, task, *args, **kwargs):
        '''
        Factory method that builds a :class:`BackgroundTask` and runs it on a thread in one call

        :type finished_notifier: callable
        :param finished_notifier: Callback that will be called when the task has completed its execution. Signature: ``func()``
        :rtype: :class:`BackgroundTask`
        :return: The created :class:`BackgroundTask` object, which will be running in its thread.

        Once finished_notifier has been called, the :attr:`results` attribute of the returned :class:`BackgroundTask` should contain
        the return value of the input task callable.
        '''
        flushed_print('Setting Up Background Task To Run In Thread')
        bg_task = BackgroundTask(task, *args, **kwargs)
        bg_task.moveToThread(thread)
        bg_task.finished.connect(thread.quit)
        thread.started.connect(bg_task.runTask)
        thread.finished.connect(finished_notifier)
        thread.start()
        flushed_print('Thread Started!')
        return bg_task
As my docstrings indicate, this should allow me to pass an arbitrary callable and its arguments to build_and_run_background_task, and upon completion of the task it should call the callable passed as finished_notifier and kill the thread. However, when I run it with the following as finished_notifier
def done():
    flushed_print('Done!')
I get the following output:
Setting Up Background Task To Run In Thread
Thread Started!
Running Task
Got My Results!
Emitting Finished!
And that's it. The finished_notifier callback is never executed, and the thread's quit method is never called, suggesting the finished signal in BackgroundTask isn't actually being emitted. If, however, I connect to finished directly and call runTask directly (not in a thread), everything works as expected. I'm sure I just missed something stupid, any suggestions?
Figured out the problem myself: I needed to call qApp.processEvents() where another point in the application was waiting for this operation to finish. I had been testing on the command line as well, and that had masked the problem.
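For completeness, a minimal sketch of what that waiting site might look like (hypothetical, not from the original post; my_long_task and thread stand in for whatever the application passes): without processEvents() in the loop, Qt never gets a chance to deliver the queued signals, so finished_notifier is never invoked.

import time

bg_task = BackgroundTask.build_and_run_background_task(thread, done, my_long_task)

# Elsewhere, the application blocks the main thread waiting on the result.
while bg_task.results is None:  # assumes the task does not legitimately return None
    qApp.processEvents()        # gives Qt a chance to deliver queued signals and repaint
    time.sleep(0.01)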