Multithread celery worker for task division - python-3.x

Im currently building an application that, based on some input, runs some scans.
The issue i'm encountering is that a bottleneck is being creating on some of the scans, I was wondering if there was a way to implement a different thread/worker for these tasks.
I'll ellaborate a little bit more.
I start my worker with the command
pipenv run celery -A proj worker -B -l info
### Tasks.py ###
#shared_task
def short_task_1():
return
#shared_task
def short_task_2():
return
#shared_task
def long_task_1():
return
### handler.py ###
def handle_scan():
short_task_1.delay()
short_task_2.delay()
long_task_1.delay()
A possible solution I find is assigning the short tasks to one worker and the longer ones to the other. But I cant find in the docs how to define which worker the task is assigned to with the delay() command.
Will having another worker for handling this tasks help? If another thread is the solution, whats the best way of doing it?

I ended up doing the following
delay() does not work if you are trying to use multiple task queues. Mainly because delay() is only used if the "default" queue is used. For using multiple queues, apply_async() must be used.
For example, if a task was called with .delay(arg1, arg2)
Now (With multiple queues in mind) it needs to be called with .apply_async(args=[arg1,arg2], queue='queue_name')
So, here is how I did it finally, thanks to #DejanLekic
tasks.py
#shared_task
def short_task_1():
return
#shared_task
def short_task_2():
return
#shared_task
def long_task_1():
return
Same as before. But here is the new handler
def handle_scan():
# Fast queue with args if required
short_task_1.apply_async(args=[arg1, arg2], queue='fast_queue')
short_task_2.apply_async(args=[arg1, arg2], queue='fast_queue')
# slow queue
long_task_1.apply_async(args=[arg1, arg2], queue='slow_queue')
I start the workers by doing the following (mind the pipenv):
pipenv run celery -A proj worker -B --loglevel=info -Q slow_queue,fast_queue

Related

How do I retrieve data from a Django DB before sending off Celery task to remote worker?

I have a celery shared_task that is scheduled to run at certain intervals. Every time this task is run, it needs to first retrieve data from the Django DB in order to complete the calculation. This task may or may not be sent to a celery worker that is on a separate machine, so in the celery task I can't make any queries to a local celery database.
So far I have tried using signals to accomplish it, since I know that functions with the wrapper #before_task_publish are executed before the task is even published in the message queue. However, I don't know how I can actually get the data to the task.
#shared_task
def run_computation(data):
perform_computation(data)
#before_task_publish.connect
def receiver_before_task_publish(sender=None, headers=None, body=None, **kwargs):
data = create_data()
# How do I get the data to the task from here?
Is this the right way to approach this in the first place? Or would I be better off making an API route that the celery task can get to retrieve the data?
I'm posting the solution that worked for me, thanks for the help #solarissmoke.
What works best for me is utilizing Celery "chain" callback functions and separate RabbitMQ queues for designating what would be computed locally and what would be computed on the remote worker.
My solution looks something like this:
#app.task
def create_data_task():
# this creates the data to be passed to the analysis function
return create_data()
#app.task
def perform_computation_task(data):
# This performs the computation with given data
return perform_computation(data)
#app.task
def store_results(result):
# This would store the result in the DB here, but for now we just print it
print(result)
#app.task
def run_all_computation():
task = signature("path.to.tasks.create_data_task", queue="default") | signature("path.to.tasks.perform_computation_task", queue="remote_computation") | signature("path.to.tasks.store_results", queue="default")
task()
Its important to note here that these tasks were not run serially; they were in fact separate tasks that are run by the workers and therefore do not block a single thread. The other tasks are only activated by a callback function from the others. I declared two celery queues in RabbitMQ, a default one called default, and one specifically for remote computation called "remote_computation". This is described explicitly here including how to point celery workers at created queues.
It is possible to modify the task data in place, from the before_task_publish handler, so that it gets passed to the task. I will say upfront that there are many reasons why this is not a good idea:
#before_task_publish.connect
def receiver_before_task_publish(sender=None, headers=None, body=None, **kwargs):
data = create_data()
# Modify the body of the task data.
# Body is a tuple, the first entry of which is a tuple of arguments to the task.
# So we replace the first argument (data) with our own.
body[0][0] = data
# Alternatively modify the kwargs, which is a bit more explicit
body[1]['data'] = data
This works, but it should be obvious why it's risky and prone to breakage. Assuming you have control over the task call sites I think it would be better to drop the signal altogether and just have a simple function that does the work for you, i.e.,:
def create_task(data):
data = create_data()
run_computation.delay(data)
And then in your calling code, just call create_task(data) instead of calling the task directly (which is what you presumably do right now).

Celery worker with multithreading - how to update results concurently

I created a Flask API with a Celery worker. User fires "start tests" button which makes a POST request that returns url which user can use to get results of tests every 5 seconds (needed to update fontend progress bar). The Celery task includes threading. My goal is to update Celery task state based on the results of threads concurently. I don't want to wait until all my threads finish to return their result. My Celery task looks like this:
#celery.task(bind=True) # bind argument instructs Celery to send a "self" argument and use it to record status updates
def run_tests(self, dialog_cases):
"""
Testing running as a background task
"""
results = []
test_case_no = 1
test_controller = TestController(dialog_cases)
bot_config = [test_controller.url, test_controller.headers, test_controller.db_name]
threads = []
queue = Queue()
start = time.perf_counter()
threads_list = list()
for test_case in test_controller.test_cases:
t = Thread(target=queue.put({randint(0,1000): TestCase(test_case, bot_config)}))
t.start()
threads_list.append(t)
for t in threads_list:
t.join()
results_dict_list = [queue.get() for _ in range(len(test_controller.test_cases))]
for result in results_dict_list:
for key, value in result.items():
cprint.info(f"{key}, {value.test_failed}")
Now: the TestCase is an object that on creation runs a function that makes a few iterations and afterwards returns whether the test failed or passed. I have another Flask endpoint which returns the status of the tasks. Question is how to get the value returned by threads simultanously without having to wait until they are all finished? I tried Queue but this can only return results when everything is over.
You can simply use update_state to modify state of the task, from each of those threads if that is what you want. Furthermore, you can create your own, custom states. As you want to know result of each test the moment it is finished, it seems like a good idea to have a custom state for teach test that you update from each thread durint runtime.
An alterantive is to refactor your code so each test is actually a Celery task. Then you use Chord or Group primitives to build your workflow. As you want to know the state during runtime, then perhaps Group is better because then you can monitor the state of the GroupResult object...

python3 multiprocessing.Pool with maxtasksperchild=1 does not terminate

When using multiprocessing.Pool in python 3.6 or 3.7 with maxtasksperchild=1, I noticed that some processes spawned by the pool are hanging and do not quit, even though the callback to their tasks was already executed. As a result, Pool.join() will block forever, even though all tasks are finished. In the process tree, running but idle child processes can be seen. The problem does not occur if maxtasksperchild=None.
The problem seems to be related to what the callback precisely does. The docs point out that the callback "should return immediately", as it will block other threads managing the pool.
A minimal example to reproduce this behavior on my machine is as follows: (Give it a few tries or increase the number of tasks when it does not block forever.)
from multiprocessing import Pool
from os import getpid
from random import random
from time import sleep
def do_stuff():
pass
def cb(arg):
sleep(random()) # can be replaced with print('foo')
p = Pool(maxtasksperchild=1)
number_of_tasks = 100 # a value may depend on your machine -- for mine 20 is sufficient to trigger the behavior
for i in range(number_of_tasks):
p.apply_async(do_stuff, callback=cb)
p.close()
print("joining ... (this should take just seconds)")
print("use the following command to watch the process tree:")
print(" watch -n .2 pstree -at -p %i" % getpid())
p.join()
Contrary to what I expected, p.join() in the last line will block forever even though do_stuff and cb were both called 100 times.
I am aware that sleep(random()) is in violation of the docs, but is print() also taking 'too long'? The way the docs are written suggest that a non-blocking callback function is required for performance and efficiency and make not clear that a 'slow' callback function will break the pool entirely.
Is print() forbidden in any multiprocessing.Pool callback function? (How to replace that functionality? What is "returning immediately", what is not?)
If yes, should the python documentation be updated to make this clear?
If yes, is it good python practice to rely on "fast" execution of python threads? Does this violate the rule that one should not make assumptions on execution order of threads?
Should I report this to the python bug tracker?

ProcessPoolExecutor gets stuck, ThreadPool Executor does not

I have an application that lets me select whether to use threads or processes:
def _get_future(self, workers):
if self.config == "threadpool":
self.logger.debug("using thread pools")
executor = ThreadPoolExecutor(max_workers=workers)
else:
self.logger.debug("using process pools")
executor = ProcessPoolExecutor(max_workers=workers)
return executor
Later I execute the code:
self.executor = self._get_future()
for component in components:
self.logger.debug("submitting {} to future ".format(component))
self.future_components.append(self.executor.submit
(self._send_component, component))
# Wait for all tasks to finish
while self.future_components:
self.future_components.pop().result()
When I use processes, my Applications gets stuck. The _send_component method is never called. When I use threads all works fine.
The problem is the imperative approach, this is a use case for a functional approach.
self._send_component is a member function of a class. Separate processes mean no joint memory to share variables.
The solution was to rewrite the code so that _send_component is a static method.

Terminating a QThread that wraps a function (i.e. can't poll a "wants_to_end" flag)

I'm trying to create a thread for a GUI that wraps a long-running function. My problem is thus phrased in terms of PyQt and QThreads, but I imagine the same concept could apply to standard python threads too, and would appreciate any suggestions generally.
Typically, to allow a thread to be exited while running, I understand that including a "wants_to_end" flag that is periodically checked within the thread is a good practice - e.g.:
Pseudocode (in my thread):
def run(self):
i = 0
while (not self.wants_to_end) and (i < 100):
function_step(i) # where this is some long-running function that includes many streps
i += 1
However, as my GUI is to wrap a pre-written long-running function, I cannot simply insert such a "wants_to_end" flag poll into the long running code.
Is there another way to forcibly terminate my worker thread from my main GUI (i.e. enabling me to include a button in the GUI to stop the processing)?
My simple example case is:
class Worker(QObject):
finished = pyqtSignal()
def __init__(self, parent=None, **kwargs):
super().__init__(parent)
self.kwargs = kwargs
#pyqtSlot()
def run(self):
result = SomeLongComplicatedProcess(**self.kwargs)
self.finished.emit(result)
with usage within my MainWindow GUI:
self.thread = QThread()
self.worker = Worker(arg_a=1, arg_b=2)
self.worker.finished.connect(self.doSomethingInGUI)
self.worker.moveToThread(self.thread)
self.thread.started.connect(self.worker.run)
self.thread.start()
If the long-running function blocks, the only way to forcibly stop the thread is via its terminate() method (it may also be necessary to call wait() as well). However, there is no guarantee that this will always work, and the docs also state the following:
Warning: This function is dangerous and its use is discouraged. The
thread can be terminated at any point in its code path. Threads can be
terminated while modifying data. There is no chance for the thread to
clean up after itself, unlock any held mutexes, etc. In short, use
this function only if absolutely necessary.
A much cleaner solution is to use a separate process, rather than a separate thread. In python, this could mean using the multiprocessing module. But if you aren't familiar with that, it might be simpler to run the function as a script via QProcess (which provides signals that should allow easier integration with your GUI). You can then simply kill() the worker process whenever necessary. However, if that solution is somehow unsatisfactory, there are many other IPC approaches that might better suit your requirements.

Resources