Celery - Losing celery node/host momentarily when we send SIGTERM signal - celery-task

Context:
I am trying to write a graceful shutdown for my Celery application.
The logic is that when I receive a SIGTERM signal, I stop (revoke) the tasks currently being run by Celery and then exit the main worker process.
I am trying to achieve this by registering a SIGTERM handler inside a "worker_ready" Celery signal handler.
(For dev testing, I do not exit or raise at the end of the SIGTERM handler (sigterm_handler), so the worker process is not killed at the end.)
Problem:
To obtain the list of tasks currently being run by the Celery worker, I use the celery.control.inspect().active() method.
This method works as expected before I send the SIGTERM signal.
But as soon as I send SIGTERM, I lose the worker stats
and am unable to get output from inspect commands for that node.
Debugging:
(there are multiple workers; focus on 'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr'):
Before sending TERM signal
>>> celery.control.inspect(timeout=2).active()
{
    'fast@taskworker-fast-5f9d8b9849-wx9z4': [],
    'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr': [{
        'id': '63eb332cf88bdc58f865d48e',
        'name': 'execute_task',
        'args': [],
        'kwargs': {},
        'type': 'execute_task',
        'hostname': 'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr',
        'time_start': 1676358444.9854753,
        'acknowledged': True,
        'delivery_info': {
            'exchange': '',
            'routing_key': 'sync',
            'priority': 10,
            'redelivered': False
        },
        'worker_pid': 191
    }],
    'realtime@taskworker-realtime-8658b56d5b-dwxw8': [],
    'fast@taskworker-fast-5f9d8b9849-x8lmz': []
}
JUST after sending TERM signal
>>> celery.control.inspect(timeout=2).active()
{
    'fast@taskworker-fast-5f9d8b9849-wx9z4': [],
    'realtime@taskworker-realtime-8658b56d5b-dwxw8': [],
    'fast@taskworker-fast-5f9d8b9849-x8lmz': []
}
After sigterm_handler finishes execution
>>> celery.control.inspect(timeout=2).active()
{
    'fast@taskworker-fast-5f9d8b9849-wx9z4': [],
    'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr': [],
    'realtime@taskworker-realtime-8658b56d5b-dwxw8': [],
    'fast@taskworker-fast-5f9d8b9849-x8lmz': []
}
Sample Code (stripped):
from celery.platforms import signals
from celery.signals import worker_ready

def sigterm_handler(*args, **kwargs):
    # 'celery' here is the Celery app instance, defined elsewhere in the application
    active_tasks = celery.control.inspect(timeout=2).active()
    print(active_tasks)

@worker_ready.connect
def on_worker_ready(**kwargs):
    signals['TERM'] = sigterm_handler
    # signal.signal(signal.SIGTERM, sigterm_handler)
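For completeness, a sketch of what sigterm_handler is meant to do eventually, per the logic described above (revoking the tasks and exiting are illustrative here and not part of the stripped sample; 'celery' is again the app instance):
def sigterm_handler(*args, **kwargs):
    # inspect the cluster, then revoke whatever is still running on it
    active_tasks = celery.control.inspect(timeout=2).active() or {}
    for node, tasks in active_tasks.items():
        for task in tasks:
            celery.control.revoke(task['id'], terminate=True)
    # in production the worker process would exit here; skipped for dev testing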
I tried debugging this by exec-ing into the K8s pod.
I ran celery inspect commands in the pod's celery shell and was able to verify that when we send the TERM signal, we lose the details of the celery node/host.
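(For reference, the CLI form of the same check from the pod shell looks roughly like the following; the app module name is a placeholder:)
celery -A <app_module> inspect active --destination=asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr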

Related

Dockerized Logic App doesn't work when the container is running, but works in debug mode in VS Code

I am trying to put an Azure Logic App inside a Docker image.
I was following some Microsoft tutorials:
This one for creating the Logic App (it is a bit outdated, but mostly still valid): https://microsoft.github.io/AzureTipsAndTricks/blog/tip304.html
And this one for building the Docker image: https://techcommunity.microsoft.com/t5/azure-developer-community-blog/azure-tips-and-tricks-how-to-run-logic-apps-in-a-docker/ba-p/3545220
The only difference between these tutorials and my POC is that I am using the Node mode of the Logic App instead of .NET Core, plus the Dockerfile that I am using:
FROM mcr.microsoft.com/azure-functions/node:3.0
ENV AzureWebJobsStorage DefaultEndpointsProtocol=https;AccountName=logicappsexamples;AccountKey=AHaGR5SQZYdB2LgS2+pPbsFQO3eZDZ25T5EV3mcc1ZWJXOk7QTCKEpjDcyD6lp2J9MYo+c1OcpLu+ASt8aoEWg==;EndpointSuffix=core.windows.net
ENV AzureWebJobsScriptRoot=/home/site/wwwroot \
AzureFunctionsJobHost__Logging__Console__IsEnabled=true \
FUNCTIONS_V2_COMPATIBILITY_MODE=true
ENV WEBSITE_HOSTNAME localhost
ENV WEBSITE_SITE_NAME test
ENV AZURE_FUNCTIONS_ENVIRONMENT Development
COPY . /home/site/wwwroot
RUN cd /home/site/wwwroot
The Logic App is simple: it just puts a message on a queue when you call a URL. In debug mode in VS Code everything works fine, but the problem comes when I run the dockerized Logic App.
The Logic App is supposed to use a queue called "test", but when the container finishes setting up, it creates a new queue:
[screenshot: the storage account shows a newly created queue in addition to "test"]
And in the last step of the last tutorial (https://techcommunity.microsoft.com/t5/azure-developer-community-blog/azure-tips-and-tricks-how-to-run-logic-apps-in-a-docker/ba-p/3545220), when I call the trigger URL, I don't receive anything in either of the queues.
I got the following logs from the running container:
info: Host.Triggers.Workflows[206]
Workflow action ends.
flowName='Stateless1',
actionName='Put_a_message_on_a_queue_(V2)',
flowId='2731d82fc1324e4fb0df69fd5c549d72',
flowSequenceId='08585288407219836302',
flowRunSequenceId='08585288406768053370693349962CU00',
correlationId='ebf4e18e-405f-41c8-bb6a-bdf84d4a7a16',
status='Failed',
statusCode='BadRequest',
error='', durationInMilliseconds='910',
inputsContentSize='-1',
outputsContentSize='-1',
extensionVersion='1.2.18.1',
siteName='test',
slotName='',
actionTrackingId='1439f372-4ce0-4709-b9ab-ee18db8839ae',
clientTrackingId='08585288406768053370693349962CU00',
properties='{
"$schema":"2016-06-01",
"startTime":"2023-01-03T17:16:48.9639703Z",
"endTime":"2023-01-03T17:16:49.8743232Z",
"status":"Failed",
"code":"BadRequest",
"executionClusterType":"Classic",
"resource":{
"workflowId":"2731d82fc1324e4fb0df69fd5c549d72",
"workflowName":"Stateless1",
"runId":"08585288406768053370693349962CU00",
"actionName":"Put_a_message_on_a_queue_(V2)"
},
"correlation":{
"actionTrackingId":"1439f372-4ce0-4709-b9ab-ee18db8839ae",
"clientTrackingId":"08585288406768053370693349962CU00"
},
"api":{}
}',
actionType='ApiConnection',
sequencerType='Linear',
flowScaleUnit='cu00',
platformOptions='RunDistributionAcrossPartitions, RepetitionsDistributionAcrossSequencers, RunIdTruncationForJobSequencerIdDisabled, RepetitionPreaggregationEnabled',
retryHistory='',
failureCause='',
overrideUsageConfigurationName='',
hostName='',
activityId='46860f56-96bc-462a-b9bc-3aed4ad6464c'.
info: Host.Triggers.Workflows[202]
Workflow run ends.
flowName='Stateless1',
flowId='2731d82fc1324e4fb0df69fd5c549d72',
flowSequenceId='08585288407219836302',
flowRunSequenceId='08585288406768053370693349962CU00',
correlationId='ebf4e18e-405f-41c8-bb6a-bdf84d4a7a16',
extensionVersion='1.2.18.1',
siteName='test',
slotName='',
status='Failed',
statusCode='ActionFailed',
error='{
"code":"ActionFailed",
"message":"An action failed. No dependent actions succeeded."
}',
durationInMilliseconds='1202',
clientTrackingId='08585288406768053370693349962CU00',
properties='{
"$schema":"2016-06-01",
"startTime":"2023-01-03T17:16:48.7228752Z",
"endTime":"2023-01-03T17:16:50.0174324Z",
"status":"Failed",
"code":"ActionFailed",
"executionClusterType":"Classic",
"resource":{
"workflowId":"2731d82fc1324e4fb0df69fd5c549d72",
"workflowName":"Stateless1",
"runId":"08585288406768053370693349962CU00",
"originRunId":"08585288406768053370693349962CU00"
},
"correlation":{
"clientTrackingId":"08585288406768053370693349962CU00"
},
"error":{
"code":"ActionFailed",
"message":"An action failed. No dependent actions succeeded."
}
}',
sequencerType='Linear',
flowScaleUnit='cu00',
platformOptions='RunDistributionAcrossPartitions, RepetitionsDistributionAcrossSequencers, RunIdTruncationForJobSequencerIdDisabled, RepetitionPreaggregationEnabled',
kind='Stateless',
runtimeOperationOptions='None',
usageConfigurationName='',
hostName='',
activityId='c6ca440e-fef5-457a-a4cb-9d5a3d806518'.
info: Host.Triggers.Workflows[203]
Workflow trigger starts.
flowName='Stateless1',
triggerName='manual',
flowId='2731d82fc1324e4fb0df69fd5c549d72',
flowSequenceId='08585288407219836302',
extensionVersion='1.2.18.1',
siteName='test',
slotName='',
status='',
statusCode='',
error='',
durationInMilliseconds='-1',
flowRunSequenceId='08585288406768053370693349962CU00',
inputsContentSize='-1',
outputsContentSize='-1',
clientTrackingId='08585288406768053370693349962CU00',
properties='{
"$schema":"2016-06-01",
"startTime":"2023-01-03T17:16:48.6768387Z",
"status":"Succeeded",
"fired":true,
"resource":{
"workflowId":"2731d82fc1324e4fb0df69fd5c549d72",
"workflowName":"Stateless1",
"runId":"08585288406768053370693349962CU00",
"triggerName":"manual"
},
"correlation":{
"clientTrackingId":"08585288406768053370693349962CU00"
},
"api":{}
}',
triggerType='Request',
flowScaleUnit='cu00',
triggerKind='Http',
sourceTriggerHistoryName='',
failureCause='',
hostName='',
activityId='ebf4e18e-405f-41c8-bb6a-bdf84d4a7a16'.
info: Host.Triggers.Workflows[204]
Workflow trigger ends.
flowName='Stateless1',
triggerName='manual',
flowId='2731d82fc1324e4fb0df69fd5c549d72',
flowSequenceId='08585288407219836302',
status='Succeeded',
statusCode='',
error='',
extensionVersion='1.2.18.1',
siteName='test',
slotName='',
durationInMilliseconds='1348',
flowRunSequenceId='08585288406768053370693349962CU00',
inputsContentSize='-1',
outputsContentSize='-1',
clientTrackingId='08585288406768053370693349962CU00',
properties='{
"$schema":"2016-06-01",
"startTime":"2023-01-03T17:16:48.6768387Z",
"endTime":"2023-01-03T17:16:50.0319177Z",
"status":"Succeeded",
"fired":true,
"resource":{
"workflowId":"2731d82fc1324e4fb0df69fd5c549d72",
"workflowName":"Stateless1",
"runId":"08585288406768053370693349962CU00",
"triggerName":"manual"
},
"correlation":{
"clientTrackingId":"08585288406768053370693349962CU00"
},
"api":{}
}',
triggerType='Request',
flowScaleUnit='cu00',
triggerKind='Http',
sourceTriggerHistoryName='',
failureCause='',
overrideUsageConfigurationName='',
hostName='',
activityId='ebf4e18e-405f-41c8-bb6a-bdf84d4a7a16'.
info: Function.Stateless1[2]
Executed 'Functions.Stateless1' (Succeeded, Id=10af0f54-765a-4954-a42f-373ceb58c94b, Duration=1545ms)
So, what am I doing wrong? It looks like the queue name ("test") is not properly passed to the Docker image, and for this reason the container creates a new one... but how can I fix it?
I would greatly appreciate any help... I can't find anything clear on the internet.
Thanks!

pm2 how to avoid schedule cron lock in nestjs

I use pm2 to run the service, with a total of 2 cluster instances.
I created a scheduled job in NestJS and executed the schedule,
but both cluster instances ran the schedule and the database got locked.
How can I avoid this?
Below is my ecosystem.js:
module.exports = {
  apps: [
    {
      name: 'my_app',
      script: 'dist/main.js',
      instances: 0,
      exec_mode: 'cluster',
      listen_timeout: 10000,
      kill_timeout: 1000,
    },
  ],
};
I can use process.env.pm_id (each instance's id is shown by pm2 list) to guard the job so that only one instance runs it:
@Cron('* * * * ... ')
myCron() {
  if (process.env.pm_id === '0') {
    ...
  }
}

How to get the passed parameters inside a Python container in an AWS Batch job?

I have 2 job definitions (job-1, job-2) and I'm executing Job1 first. Job1 then submits Job2 and starts its execution. I need to pass some parameters to Job2 when submitting the job. Below is my Python 3 code:
# job1
import boto3
import os

env = os.environ.get('environment')
batch = boto3.client('batch')

def submit_job():
    return batch.submit_job(
        jobName='Job2',
        jobQueue='job2-queue-dev',
        jobDefinition='job-2',
        containerOverrides={
            'environment': [
                {
                    'name': 'environment',
                    'value': env
                },
            ]
        },
        parameters={
            'opco': '123',
            'app': 'app1'
        },
    )

submit_job()
In Job2 I can easily get the environment variable with the code below.
# job2
import os

env = os.environ.get('environment')

def get_index_name(env):
    return 'liberty-' + env
....
So my question is: how can we get those parameters (opco, app) inside Job2?
FYI, I could pass them as environment variables, but I want to know how parameter retrieval is done here.
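From my reading of the Batch docs, my understanding so far (which I'd like to confirm) is that submitted parameters are substituted into Ref::name placeholders in the job definition's command, so Job2 would receive them as command-line arguments. A rough sketch, assuming a hypothetical job-2 command of ["python3", "job2.py", "Ref::opco", "Ref::app"]:
# job2 (sketch): Batch replaces Ref::opco / Ref::app in the command with the
# submitted parameter values, so they arrive as ordinary argv entries
import sys

opco = sys.argv[1]  # would be '123' for the submit_job() call above
app = sys.argv[2]   # would be 'app1'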
Thanks in advance

Commander can't handle multiple command arguments

I have the following commander command with multiple arguments:
var program = require('commander');
program
    .command('rename <id> [name]')
    .action(function() {
        console.log(arguments);
    });
program.parse(process.argv);
Using the app yields the following result:
$ node app.js 1 "Hello"
{ '0': '1',
'1':
{ commands: [],
options: [],
_execs: [],
_args: [ [Object] ],
_name: 'rename',
parent:
{ commands: [Object],
options: [],
_execs: [],
_args: [],
_name: 'app',
Command: [Function: Command],
Option: [Function: Option],
_events: [Object],
rawArgs: [Object],
args: [Object] } } }
As you can see, the action receives the first argument (<id>) and program, but doesn't receive the second argument: [name].
I've tried:
Making [name] a required argument.
Passing the name unquoted to the tool from the command line.
Simplifying my real app into the tiny reproducible program above.
Using a variadic argument for name (rename <id> [name...]), but this results in both 1 and Hello being assigned to the same array as the first parameter of action, defeating the purpose of having id.
What am I missing? Does commander only accept one argument per command (it doesn't look that way in the documentation)?
I think this was a bug in an old version of commander. This works now with commander@2.9.0.
I ran into the same problem and decided to use Caporal instead.
Here's an example from their docs on Creating a command:
When writing complex programs, you'll likely want to manage multiple commands. Use the .command() method to specify them:
program
    // a first command
    .command("my-command", "Optional command description used in help")
    .argument(/* ... */)
    .action(/* ... */)
    // a second command
    .command("sec-command", "...")
    .option(/* ... */)
    .action(/* ... */)

Node.js script (node-celery) call to celery task handles 'self' argument improperly

I created a celery task script as follows:
from celery import Task
from celery.contrib.methods import task
from celery.contrib.methods import task_method
from pipelines.addsub import settings
from pipelines.addsub.log import register_task_log

@register_task_log(__name__)
class AddTask(Task):
    @task(filter=task_method, name='AddTask.get')
    def get(self, x, y):
        self.log.info("Calling task add(%d, %d)" % (x, y))
        return x + y
I defined the following queues and routes:
CELERY_QUEUES = {
    'celery': {
        'exchange': 'celery',
        'binding_key': 'celery',
    },
    'addsub': {
        'exchange': 'addsub',
        'binding_key': 'addsub.operations',
    },
}
CELERY_ROUTES = {
    'AddTask.get': {
        'queue': 'addsub',
        'routing_key': 'addsub.operations',
    },
}
I start the celery worker as follows:
celery -c 2 -A pipelines.celery.celery worker -Q addsub -E -l DEBUG --logfile=~/celery_workflows/addsubtasks/addsub.log
I can successfully run AddTask.get(1,3) from celery shell.
I then used the node-celery module to run the following Node.js script:
"use strict";
var celery = require('node-celery'),
client = celery.createClient({
CELERY_BROKER_URL: 'amqp://[user]:[password]#[hostname]:5672//prote.broker',
CELERY_RESULT_BACKEND: 'amqp',
CELERY_ROUTES: {'AddTask.get': {queue: 'addsub'}}
}),
get_addition = client.createTask('AddTask.get');
client.on('error', function (err) {
console.log(err);
});
client.on('connect', function () {
console.log('Connected ...');
get_addition.call([], {
x: 1,
y: 3
}); // sends a task to the addsub queue
});
The script returns the following error:
[2014-09-13 14:18:59,422: INFO/MainProcess] Received task: AddTask.get[261fb059-b88e-444b-b218-c3c24c94fc1d]
[2014-09-13 14:18:59,422: DEBUG/MainProcess] TaskPool: Apply <function _fast_trace_task at 0x7fc407d5fde8> (args:(u'AddTask.get', u'261fb059-b88e-444b-b218-c3c24c94fc1d', [], {u'y': 3, u'x': 1}, {u'task': u'AddTask.get', u'group': None, u'is_eager': False, u'delivery_info': {u'priority': None, u'redelivered': False, u'routing_key': 'addsub', u'exchange': ''}, u'args': [], u'headers': {}, u'correlation_id': None, u'hostname': 'celery@pcs01', u'kwargs': {u'y': 3, u'x': 1}, u'reply_to': None, u'id': u'261fb059-b88e-444b-b218-c3c24c94fc1d'}) kwargs:{})
[2014-09-13 14:18:59,425: DEBUG/MainProcess] Task accepted: AddTask.get[261fb059-b88e-444b-b218-c3c24c94fc1d] pid:6536
[2014-09-13 14:18:59,425: ERROR/MainProcess] Task AddTask.get[261fb059-b88e-444b-b218-c3c24c94fc1d] raised unexpected: TypeError('get() takes exactly 3 arguments (2 given)',)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 437, in __protected_call__
return self.run(*args, **kwargs)
TypeError: get() takes exactly 3 arguments (2 given)
The script does pass the correct x: & y: parameters to the celery worker but the self argument is not handled properly. Does anyone understand why this might be happening?
I have successfully tested the above specified node.js script with a task script that defines a set of functions instead of a class with member functions:
from pipelines.celery.celery import app
from pipelines.addsub import settings
from celery.utils.log import get_task_logger

log = get_task_logger(__name__)

@app.task(name='add')
def add(x, y):
    log.info("Calling task add(%d, %d)" % (x, y))
    return x + y

@app.task(name='subtract')
def subtract(x, y):
    log.info("Calling task subtract(%d, %d)" % (x, y))
    return x - y
I'm guessing that the celery.contrib.methods module is failing in the case that I described above. Does anyone have any insight into this problem?
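A workaround I'm considering (sketched below, not yet verified) is to skip celery.contrib.methods entirely and use a plain bound task, so Celery passes the task instance as self explicitly:
from celery.utils.log import get_task_logger
from pipelines.celery.celery import app  # same app instance as in the function-based script above

log = get_task_logger(__name__)

@app.task(name='AddTask.get', bind=True)
def add_task_get(self, x, y):
    # with bind=True, Celery supplies the task instance as 'self',
    # and x / y arrive from the caller's kwargs as before
    log.info("Calling task add(%d, %d)", x, y)
    return x + y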
