Determine which Celery workers are consuming jobs and tell them to stop - multithreading

Scenario: How do I gracefully tell a worker to stop accepting new jobs, and how do I tell when it has finished processing its current jobs, so I can shut it down as new workers come online?
Details (Feel free to correct any of my assumptions):
Here is a snippet of my current queue.
As you can see, I have 2 exchange queues for the workers (I believe these are the *.pidbox queues), 2 queues representing celeryev on each host (yes, I know I only need one), and one default celery queue. Clearly I have 90+ jobs in this queue.
(Side Question) Where do you go to find the worker consuming the job from the Management console? I know I can look at djcelery and figure that out.
So... I know there are jobs running on each host - I can't shut Celery down on those machines, as that would kill the running jobs (and any pending ones?).
How do I stop any further processing of new jobs while allowing the jobs that are still running to complete? I know that on each host I can stop Celery, but that will kill any currently running jobs as well. I want to tell the 22 jobs in the hopper to halt.
Thanks!!
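For what it's worth, Celery's remote control commands can do exactly this: tell a worker to stop consuming from a queue while its in-flight tasks run to completion. A minimal sketch, assuming a reachable broker, the default "celery" queue, and a made-up worker node name:

from celery import Celery

# Hypothetical project name and broker URL, for illustration only.
app = Celery("myproject", broker="amqp://guest:guest@localhost:5672//")

# Ask one specific worker to stop consuming from the default queue.
# Tasks it is already executing keep running; messages it has not reserved
# stay on the broker for the remaining (or new) workers to pick up.
app.control.cancel_consumer("celery", destination=["celery@host1"])

A warm shutdown (sending TERM to that worker's main process) then lets it finish its current jobs and exit.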

Related

Change the celery workers' code in production without task loss

I have a system that has important long-running tasks which are executed by Celery workers. Assume that we have deployed our application using k8s or docker-compose.
How can I change the celery workers' code in production without losing the tasks that they are currently executing?
In other words, I want an elegant, automated way to execute all unfinished tasks with the new workers.
I’m using Redis 4.3.3 as the broker and my Celery version is 5.2.7.
I have added result_backend and tried the following settings, but Celery didn't reschedule the running tasks after I ran "docker-compose restart worker_service_name":
CELERY_ACKS_LATE = True
CELERY_TASK_REJECT_ON_WORKER_LOST = True
This answer should provide some information about running on Kubernetes.
In addition, I would recommend adding (doc):
CELERYD_PREFETCH_MULTIPLIER = 1
How many messages to prefetch at a time multiplied by the number of
concurrent processes. The default is 4 (four messages for each
process). The default setting is usually a good choice, however – if
you have very long running tasks waiting in the queue and you have to
start the workers, note that the first worker to start will receive
four times the number of messages initially. Thus the tasks may not be
fairly distributed to the workers.
To disable prefetching, set worker_prefetch_multiplier to 1. Changing
that setting to 0 will allow the worker to keep consuming as many
messages as it wants.
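Putting the answer's suggestions together, a minimal configuration sketch using the current lowercase setting names (the uppercase CELERY_* names above map onto these); the project name and broker/backend URLs are placeholders:

from celery import Celery

app = Celery("proj",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

app.conf.update(
    task_acks_late=True,              # acknowledge a message only after the task finishes
    task_reject_on_worker_lost=True,  # requeue the task if the worker process dies mid-run
    worker_prefetch_multiplier=1,     # prefetch one message per process instead of four
)

With this in place, a warm shutdown of the old workers (TERM) lets running tasks finish, and anything left unacknowledged is redelivered for the new workers to pick up.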

How to make a celery worker stop receiving new tasks (Kubernetes)

So we have a Kubernetes cluster running some pods with Celery workers. We are using Python 3.6 to run those workers and the Celery version is 3.1.2 (I know, really old; we are working on upgrading it). We have also set up an autoscaling mechanism to add more Celery workers on the fly.
The problem is the following. Let's say we have 5 workers at any given time. Then a lot of tasks come in, increasing the CPU/RAM usage of the pods. That triggers an autoscaling event, adding, let's say, two more Celery worker pods. Those two new workers pick up some long-running tasks. Before they finish running those tasks, Kubernetes triggers a downscaling event, killing those two workers and killing those long-running tasks too.
Also, for legacy reasons, we do not have a retry mechanism if a task is not completed (and we cannot implement one right now).
So my question is: is there a way to tell Kubernetes to wait for the Celery worker to finish all of its pending tasks? I suppose the solution must also include some way to notify the Celery worker to make it stop receiving new tasks. Right now I know that Kubernetes has hooks (scripts) to handle this kind of situation, but I do not know what to write in those scripts, because I do not know how to make the Celery worker stop receiving tasks.
Any idea?
I wrote a blog post exactly on that topic - check it out.
When Kubernetes decides to kill a pod, it first sends a SIGTERM signal so your application has time to shut down gracefully; if your application hasn't exited after that, Kubernetes kills it by sending a SIGKILL signal.
This period between SIGTERM and SIGKILL can be tuned with terminationGracePeriodSeconds (more about it here).
In other words, if your longest task takes 5 minutes, make sure to set this value to something higher than 300 seconds.
Celery handles those signals for you, as you can see here (I guess it is relevant for your version as well):
Shutdown should be accomplished using the TERM signal.
When shutdown is initiated the worker will finish all currently
executing tasks before it actually terminates. If these tasks are
important, you should wait for it to finish before doing anything
drastic, like sending the KILL signal.
As explained in the docs, you can set the acks_late=True configuration so the task will run again if it is stopped unexpectedly.
Another thing that I didn't find documentation for (though I'm almost sure I saw it somewhere): a Celery worker won't receive new tasks after getting a SIGTERM, so you should be safe to terminate the worker (it might require setting worker_prefetch_multiplier = 1 as well).
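To make that concrete, a per-task version of the same idea; the app, broker URL, and task name are made up for the example:

from celery import Celery

app = Celery("proj", broker="redis://localhost:6379/0")
app.conf.worker_prefetch_multiplier = 1  # don't reserve extra messages the worker may never get to run

@app.task(bind=True, acks_late=True)
def long_running(self, item_id):
    # Acknowledged only on completion: a warm shutdown (SIGTERM) lets this finish,
    # while a hard kill after the grace period leaves the message unacknowledged,
    # so the broker redelivers it to another worker.
    ...

Pair this with a terminationGracePeriodSeconds comfortably longer than your longest task, as described above.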

Long-running tasks are cancelled - Celery 5.2.6

I have a project hosted on Digital Ocean in a Basic Droplet with 2 GB RAM. On my local machine, the long-running task runs for 8-10 minutes and still succeeds. However, on the Digital Ocean droplet, Celery often fails to complete the long-running task.
Current Celery version: 5.2.6
I have two programs configured in supervisor:
Running the celery worker: celery -A myproject worker -l info
Running the celery beat: celery -A myproject beat -l info
This is the message from celeryd.log:
CPendingDeprecationWarning:
In Celery 5.1 we introduced an optional breaking change which
on connection loss cancels all currently executed tasks with late acknowledgment enabled.
These tasks cannot be acknowledged as the connection is gone, and the tasks are automatically redelivered back to the queue.
You can enable this behavior using the worker_cancel_long_running_tasks_on_connection_loss setting.
In Celery 5.1 it is set to False by default. The setting will be set to True by default in Celery 6.0.
warnings.warn(CANCEL_TASKS_BY_DEFAULT, CPendingDeprecationWarning)
[2022-07-07 04:25:36,998: ERROR/MainProcess] consumer: Cannot connect to redis://localhost:6379//: Error 111 connecting to localhost:6379. Connection refused..
Trying again in 2.00 seconds... (1/100)
[2022-07-07 04:25:39,066: ERROR/MainProcess] consumer: Cannot connect to redis://localhost:6379//: Error 111 connecting to localhost:6379. Connection refused..
Trying again in 4.00 seconds... (2/100)
As a temporary solution, I restart the server and re-run new tasks, but this does not guarantee that the long-running task will succeed, and the problem with this approach is that the previously failed task will not restart.
My goals are:
Prevent long-running tasks from being canceled
If the long-running task is already canceled and cancellation can't be avoided, I need it to rerun and continue instead of starting a new task.
Is this possible? Any ideas on how?
As stated in the warning message, you can control this behavior with worker_cancel_long_running_tasks_on_connection_loss to prevent tasks from being cancelled on connection loss. On your Celery version it is off by default, so your tasks should not be cancelled. However, even if a late-acknowledging task completes successfully in this scenario, the task is still redelivered to the queue and will be run again; this happens irrespective of the setting and is unavoidable for tasks with late acknowledgment.
This is why it is vital that you design your tasks to be idempotent.
If your job is not idempotent, an alternative solution is to have your tasks ack early (the default), but this risks dropping a task without it actually being completed.
If you must avoid dropping tasks, you must set acks_late=True to your task and it must be designed to be idempotent. This is necessary irrespective of the specific connection loss issue, as many other things can happen that interrupt your tasks and produce this same scenario.
I need it to rerun and continue instead of starting a new task.
This comes down to how you design your task for idempotency. For example, you might want to have your job keep track of its progress in persistent storage, so when the task fails and is run again, it can determine how best to recover.
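One way to make that recovery concrete; the Redis checkpoint key and per-item loop here are purely illustrative, not something prescribed by the answer:

import redis
from celery import Celery

app = Celery("proj", broker="redis://localhost:6379/0")
checkpoints = redis.Redis(host="localhost", port=6379, db=2)  # any persistent store works

@app.task(bind=True, acks_late=True)
def process_report(self, report_id, items):
    # Resume from wherever a previous (cancelled and redelivered) run stopped.
    key = f"report:{report_id}:done"
    start = int(checkpoints.get(key) or 0)
    for i in range(start, len(items)):
        handle_item(items[i])         # hypothetical unit of work
        checkpoints.set(key, i + 1)   # checkpoint after each item

def handle_item(item):
    ...                               # placeholder for the real work

Because progress is recorded outside the task, re-running it after a cancellation continues where it left off instead of starting over, which is what makes the redelivery behaviour above safe.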

Apache Airflow scheduler not running workers on its own

I am currently trying Apache Airflow on my system (Ubuntu 18), and I set it up with PostgreSQL and RabbitMQ to use the CeleryExecutor.
I run airflow webserver and airflow scheduler in separate consoles, but the scheduler only marks tasks as queued; no worker actually runs them.
I tried opening a different terminal and running airflow worker on its own, and that seemed to do the trick.
Now the scheduler puts tasks on a queue and the worker I ran manually actually executes them.
As I have read, that should not be the case. The scheduler should run the workers on its own, right? What could I do to make this work?
I have checked the logs from the consoles and I don't see any errors.
This is expected. If you look at the docs for airflow worker, it is specifically to bring up a Celery worker when you're using the CeleryExecutor, while the other executors do not require a separate process for tasks to run.
LocalExecutor: uses multiprocessing to run tasks within the scheduler.
SequentialExecutor: just runs one task at a time so that happens within the scheduler as well.
CeleryExecutor: scales out by having N workers, so having it as a separate command lets you run a worker on as many machines as you'd like.
KubernetesExecutor: I imagine talks to your Kubernetes cluster to tell it to run tasks.
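If you're not sure which executor a given deployment is using, you can read it straight from the Airflow configuration; a small sketch, assuming Airflow is importable in the same environment:

from airflow.configuration import conf

executor = conf.get("core", "executor")
print(executor)
# "CeleryExecutor" means the scheduler only queues work and you must run
# `airflow worker` processes yourself; LocalExecutor and SequentialExecutor
# run tasks inside the scheduler process.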

Scheduler delay time in spark and YARN

I'm doing some instrumentation in Spark, and I've realised that some of my tasks take a really long time to complete because of the Scheduler Delay Time that can be extracted from the TaskMetrics.
I know there are already some questions about this topic, like this one: What is scheduler delay in spark UI's event timeline. But the answers have not been accepted, and they say that a task waiting for an open slot is counted as scheduler delay, which I think is not true (as far as I know, if a task doesn't have a slot on an executor, it doesn't start generating metrics).
I'm a bit confused about where this delay really starts. I was wondering whether this delay time also takes into account the period between the app being accepted by the YARN client and the first job of the app being submitted - in other words, between the moment the app is in the ACCEPTED state and the moment it is RUNNING.
I checked directly by launching one app while few resources were available in the cluster. It stayed in the queue until enough executors could be launched for the stage; then the yarn.Client launched the stage in the cluster. The metrics in Spark don't count this time in the queue as any delay. It also doesn't matter if you have more tasks than cores, as in the Stack Overflow answer linked above: the tasks are allocated to the executors as they become available.
In short, scheduler delay time only covers sending the task to the executor. If there is a delay here, YARN is not the bottleneck; rather, it is the load on the nodes involved (normally the driver and the worker nodes running the executors for the app).
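For reference, the Spark UI derives scheduler delay from the task's other metrics rather than measuring it directly. Roughly, as a sketch of that calculation (metric names paraphrased, not taken verbatim from a particular Spark version):

def scheduler_delay_ms(duration, executor_run_time, executor_deserialize_time,
                       result_serialization_time, getting_result_time):
    # Whatever part of the task's wall-clock duration is not accounted for by
    # executor-side work or result handling gets attributed to scheduler delay,
    # i.e. shipping the task to the executor and fetching the result back.
    delay = (duration - executor_run_time - executor_deserialize_time
             - result_serialization_time - getting_result_time)
    return max(0, delay)

Time spent waiting in the YARN queue before the application starts never enters this calculation, which matches the observation above.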
