Celery - Losing celery node/host momentarily when we send SIGTERM signal - celery-task

Context:
I am trying to write a graceful shutdown for my Celery application.
The logic is that when I receive a SIGTERM signal, I stop (revoke) the tasks currently being run by Celery and then exit the main worker process.
I am trying to achieve this by registering a SIGTERM handler inside a "worker_ready" Celery signal handler.
(For dev testing, I do not exit or raise at the end of the SIGTERM handler (sigterm_handler), so the worker process is not killed at the end.)
Problem:
To obtain the list of tasks currently being run by the Celery worker, I use the celery.control.inspect().active() method.
This method works as expected before I send the SIGTERM signal.
But as soon as I send SIGTERM, I lose the worker stats
and am unable to get output from inspect commands for that node.
Debugging:
(there are multiple workers; focus on 'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr'):
Before sending TERM signal
>>> celery.control.inspect(timeout=2).active()
{
    'fast@taskworker-fast-5f9d8b9849-wx9z4': [],
    'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr': [{
        'id': '63eb332cf88bdc58f865d48e',
        'name': 'execute_task',
        'args': [],
        'kwargs': {},
        'type': 'execute_task',
        'hostname': 'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr',
        'time_start': 1676358444.9854753,
        'acknowledged': True,
        'delivery_info': {
            'exchange': '',
            'routing_key': 'sync',
            'priority': 10,
            'redelivered': False
        },
        'worker_pid': 191
    }],
    'realtime@taskworker-realtime-8658b56d5b-dwxw8': [],
    'fast@taskworker-fast-5f9d8b9849-x8lmz': []
}
JUST after sending TERM signal
>>> celery.control.inspect(timeout=2).active()
{
    'fast@taskworker-fast-5f9d8b9849-wx9z4': [],
    'realtime@taskworker-realtime-8658b56d5b-dwxw8': [],
    'fast@taskworker-fast-5f9d8b9849-x8lmz': []
}
After sigterm_handler finishes execution
>>> celery.control.inspect(timeout=2).active()
{
    'fast@taskworker-fast-5f9d8b9849-wx9z4': [],
    'asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr': [],
    'realtime@taskworker-realtime-8658b56d5b-dwxw8': [],
    'fast@taskworker-fast-5f9d8b9849-x8lmz': []
}
Sample Code (stripped):
from celery.platforms import signals
from celery.signals import worker_ready

def sigterm_handler(*args, **kwargs):
    # 'celery' here is the Celery app instance, defined elsewhere in the application
    active_tasks = celery.control.inspect(timeout=2).active()
    print(active_tasks)

@worker_ready.connect
def on_worker_ready(**kwargs):
    signals['TERM'] = sigterm_handler
    # signal.signal(signal.SIGTERM, sigterm_handler)
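For completeness, a sketch of what sigterm_handler is meant to do eventually, per the logic described above (revoking the tasks and exiting are illustrative here and not part of the stripped sample; 'celery' is again the app instance):
def sigterm_handler(*args, **kwargs):
    # inspect the cluster, then revoke whatever is still running on it
    active_tasks = celery.control.inspect(timeout=2).active() or {}
    for node, tasks in active_tasks.items():
        for task in tasks:
            celery.control.revoke(task['id'], terminate=True)
    # in production the worker process would exit here; skipped for dev testing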
I tried debugging this by exec-ing into the K8s pod.
I ran celery inspect commands in the pod's celery shell and was able to verify that when we send the TERM signal, we lose the details of the celery node/host.
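(For reference, the CLI form of the same check from the pod shell looks roughly like the following; the app module name is a placeholder:)
celery -A <app_module> inspect active --destination=asyncsyncsystem@taskworker-async-sync-system-6f48fd489-szhxr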

Related

Dockerized Logic App doesn't work when the container is running, but works in debug mode in VS Code

I am trying to put an Azure Logic App inside a Docker image.
I was following some Microsoft tutorials:
This one for creating the Logic App (it is a bit outdated, but mostly still valid): https://microsoft.github.io/AzureTipsAndTricks/blog/tip304.html
And this one for building the Docker image: https://techcommunity.microsoft.com/t5/azure-developer-community-blog/azure-tips-and-tricks-how-to-run-logic-apps-in-a-docker/ba-p/3545220
The only difference between these tutorials and my POC is that I am using the Node mode of the Logic App instead of .NET Core, plus the Dockerfile that I am using:
FROM mcr.microsoft.com/azure-functions/node:3.0
ENV AzureWebJobsStorage DefaultEndpointsProtocol=https;AccountName=logicappsexamples;AccountKey=AHaGR5SQZYdB2LgS2+pPbsFQO3eZDZ25T5EV3mcc1ZWJXOk7QTCKEpjDcyD6lp2J9MYo+c1OcpLu+ASt8aoEWg==;EndpointSuffix=core.windows.net
ENV AzureWebJobsScriptRoot=/home/site/wwwroot \
AzureFunctionsJobHost__Logging__Console__IsEnabled=true \
FUNCTIONS_V2_COMPATIBILITY_MODE=true
ENV WEBSITE_HOSTNAME localhost
ENV WEBSITE_SITE_NAME test
ENV AZURE_FUNCTIONS_ENVIRONMENT Development
COPY . /home/site/wwwroot
RUN cd /home/site/wwwroot
The Logic App is simple: it just puts a message on a queue when you call a URL. In debug mode in VS Code everything works fine, but the problem comes when I run the dockerized Logic App.
The Logic App is supposed to use a queue called "test", but when the container finishes setting up, it creates a new queue:
[screenshot: the storage account shows a newly created queue in addition to "test"]
And in the last step of the last tutorial (https://techcommunity.microsoft.com/t5/azure-developer-community-blog/azure-tips-and-tricks-how-to-run-logic-apps-in-a-docker/ba-p/3545220), when I call the trigger URL, I don't receive anything in either of the queues.
I got the following logs from the running container:
info: Host.Triggers.Workflows[206]
Workflow action ends.
flowName='Stateless1',
actionName='Put_a_message_on_a_queue_(V2)',
flowId='2731d82fc1324e4fb0df69fd5c549d72',
flowSequenceId='08585288407219836302',
flowRunSequenceId='08585288406768053370693349962CU00',
correlationId='ebf4e18e-405f-41c8-bb6a-bdf84d4a7a16',
status='Failed',
statusCode='BadRequest',
error='', durationInMilliseconds='910',
inputsContentSize='-1',
outputsContentSize='-1',
extensionVersion='1.2.18.1',
siteName='test',
slotName='',
actionTrackingId='1439f372-4ce0-4709-b9ab-ee18db8839ae',
clientTrackingId='08585288406768053370693349962CU00',
properties='{
"$schema":"2016-06-01",
"startTime":"2023-01-03T17:16:48.9639703Z",
"endTime":"2023-01-03T17:16:49.8743232Z",
"status":"Failed",
"code":"BadRequest",
"executionClusterType":"Classic",
"resource":{
"workflowId":"2731d82fc1324e4fb0df69fd5c549d72",
"workflowName":"Stateless1",
"runId":"08585288406768053370693349962CU00",
"actionName":"Put_a_message_on_a_queue_(V2)"
},
"correlation":{
"actionTrackingId":"1439f372-4ce0-4709-b9ab-ee18db8839ae",
"clientTrackingId":"08585288406768053370693349962CU00"
},
"api":{}
}',
actionType='ApiConnection',
sequencerType='Linear',
flowScaleUnit='cu00',
platformOptions='RunDistributionAcrossPartitions, RepetitionsDistributionAcrossSequencers, RunIdTruncationForJobSequencerIdDisabled, RepetitionPreaggregationEnabled',
retryHistory='',
failureCause='',
overrideUsageConfigurationName='',
hostName='',
activityId='46860f56-96bc-462a-b9bc-3aed4ad6464c'.
info: Host.Triggers.Workflows[202]
Workflow run ends.
flowName='Stateless1',
flowId='2731d82fc1324e4fb0df69fd5c549d72',
flowSequenceId='08585288407219836302',
flowRunSequenceId='08585288406768053370693349962CU00',
correlationId='ebf4e18e-405f-41c8-bb6a-bdf84d4a7a16',
extensionVersion='1.2.18.1',
siteName='test',
slotName='',
status='Failed',
statusCode='ActionFailed',
error='{
"code":"ActionFailed",
"message":"An action failed. No dependent actions succeeded."
}',
durationInMilliseconds='1202',
clientTrackingId='08585288406768053370693349962CU00',
properties='{
"$schema":"2016-06-01",
"startTime":"2023-01-03T17:16:48.7228752Z",
"endTime":"2023-01-03T17:16:50.0174324Z",
"status":"Failed",
"code":"ActionFailed",
"executionClusterType":"Classic",
"resource":{
"workflowId":"2731d82fc1324e4fb0df69fd5c549d72",
"workflowName":"Stateless1",
"runId":"08585288406768053370693349962CU00",
"originRunId":"08585288406768053370693349962CU00"
},
"correlation":{
"clientTrackingId":"08585288406768053370693349962CU00"
},
"error":{
"code":"ActionFailed",
"message":"An action failed. No dependent actions succeeded."
}
}',
sequencerType='Linear',
flowScaleUnit='cu00',
platformOptions='RunDistributionAcrossPartitions, RepetitionsDistributionAcrossSequencers, RunIdTruncationForJobSequencerIdDisabled, RepetitionPreaggregationEnabled',
kind='Stateless',
runtimeOperationOptions='None',
usageConfigurationName='',
hostName='',
activityId='c6ca440e-fef5-457a-a4cb-9d5a3d806518'.
info: Host.Triggers.Workflows[203]
Workflow trigger starts.
flowName='Stateless1',
triggerName='manual',
flowId='2731d82fc1324e4fb0df69fd5c549d72',
flowSequenceId='08585288407219836302',
extensionVersion='1.2.18.1',
siteName='test',
slotName='',
status='',
statusCode='',
error='',
durationInMilliseconds='-1',
flowRunSequenceId='08585288406768053370693349962CU00',
inputsContentSize='-1',
outputsContentSize='-1',
clientTrackingId='08585288406768053370693349962CU00',
properties='{
"$schema":"2016-06-01",
"startTime":"2023-01-03T17:16:48.6768387Z",
"status":"Succeeded",
"fired":true,
"resource":{
"workflowId":"2731d82fc1324e4fb0df69fd5c549d72",
"workflowName":"Stateless1",
"runId":"08585288406768053370693349962CU00",
"triggerName":"manual"
},
"correlation":{
"clientTrackingId":"08585288406768053370693349962CU00"
},
"api":{}
}',
triggerType='Request',
flowScaleUnit='cu00',
triggerKind='Http',
sourceTriggerHistoryName='',
failureCause='',
hostName='',
activityId='ebf4e18e-405f-41c8-bb6a-bdf84d4a7a16'.
info: Host.Triggers.Workflows[204]
Workflow trigger ends.
flowName='Stateless1',
triggerName='manual',
flowId='2731d82fc1324e4fb0df69fd5c549d72',
flowSequenceId='08585288407219836302',
status='Succeeded',
statusCode='',
error='',
extensionVersion='1.2.18.1',
siteName='test',
slotName='',
durationInMilliseconds='1348',
flowRunSequenceId='08585288406768053370693349962CU00',
inputsContentSize='-1',
outputsContentSize='-1',
clientTrackingId='08585288406768053370693349962CU00',
properties='{
"$schema":"2016-06-01",
"startTime":"2023-01-03T17:16:48.6768387Z",
"endTime":"2023-01-03T17:16:50.0319177Z",
"status":"Succeeded",
"fired":true,
"resource":{
"workflowId":"2731d82fc1324e4fb0df69fd5c549d72",
"workflowName":"Stateless1",
"runId":"08585288406768053370693349962CU00",
"triggerName":"manual"
},
"correlation":{
"clientTrackingId":"08585288406768053370693349962CU00"
},
"api":{}
}',
triggerType='Request',
flowScaleUnit='cu00',
triggerKind='Http',
sourceTriggerHistoryName='',
failureCause='',
overrideUsageConfigurationName='',
hostName='',
activityId='ebf4e18e-405f-41c8-bb6a-bdf84d4a7a16'.
info: Function.Stateless1[2]
Executed 'Functions.Stateless1' (Succeeded, Id=10af0f54-765a-4954-a42f-373ceb58c94b, Duration=1545ms)
So, what am I doing wrong? It looks like the queue name ("test") is not properly passed to the Docker image, and for this reason the container creates a new one... but how can I fix it?
I would greatly appreciate any help... I can't find anything clear on the internet.
Thanks!

pm2 how to avoid schedule cron lock in nestjs

I use pm2 to run the service, with a total of 2 cluster instances.
I created a scheduled job in NestJS and executed the schedule,
but both cluster instances ran the schedule and the database got locked.
How can I avoid this?
Below is my ecosystem.js:
module.exports = {
  apps: [
    {
      name: 'my_app',
      script: 'dist/main.js',
      instances: 0,
      exec_mode: 'cluster',
      listen_timeout: 10000,
      kill_timeout: 1000,
    },
  ],
};
I can use process.env.pm_id (each instance's id is shown by pm2 list) to guard the job so that only one instance runs it:
@Cron('* * * * ... ')
myCron() {
  if (process.env.pm_id === '0') {
    ...
  }
}

How to get the passed parameters inside a Python container in an AWS Batch job?

I have 2 job definitions (job-1, job-2) and I'm executing Job1 first. Job1 then submits Job2 and starts its execution. I need to pass some parameters to Job2 when submitting the job. Below is my Python 3 code:
# job1
import boto3
import os

env = os.environ.get('environment')
batch = boto3.client('batch')

def submit_job():
    return batch.submit_job(
        jobName='Job2',
        jobQueue='job2-queue-dev',
        jobDefinition='job-2',
        containerOverrides={
            'environment': [
                {
                    'name': 'environment',
                    'value': env
                },
            ]
        },
        parameters={
            'opco': '123',
            'app': 'app1'
        },
    )

submit_job()
In Job2 I can easily get the environment variable with the code below.
# job2
import os

env = os.environ.get('environment')

def get_index_name(env):
    return 'liberty-' + env
....
So my question is: how can we get those parameters (opco, app) inside Job2?
FYI, I could pass them as environment variables, but I want to know how parameter retrieval is done here.
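From my reading of the Batch docs, my understanding so far (which I'd like to confirm) is that submitted parameters are substituted into Ref::name placeholders in the job definition's command, so Job2 would receive them as command-line arguments. A rough sketch, assuming a hypothetical job-2 command of ["python3", "job2.py", "Ref::opco", "Ref::app"]:
# job2 (sketch): Batch replaces Ref::opco / Ref::app in the command with the
# submitted parameter values, so they arrive as ordinary argv entries
import sys

opco = sys.argv[1]  # would be '123' for the submit_job() call above
app = sys.argv[2]   # would be 'app1'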
Thanks in advance

Commander can't handle multiple command arguments

I have the following commander command with multiple arguments:
var program = require('commander');
program
    .command('rename <id> [name]')
    .action(function() {
        console.log(arguments);
    });
program.parse(process.argv);
Using the app yields the following result:
$ node app.js 1 "Hello"
{ '0': '1',
'1':
{ commands: [],
options: [],
_execs: [],
_args: [ [Object] ],
_name: 'rename',
parent:
{ commands: [Object],
options: [],
_execs: [],
_args: [],
_name: 'app',
Command: [Function: Command],
Option: [Function: Option],
_events: [Object],
rawArgs: [Object],
args: [Object] } } }
As you can see, the action receives the first argument (<id>) and program, but doesn't receive the second argument: [name].
I've tried:
Making [name] a required argument.
Passing the name unquoted to the tool from the command line.
Simplifying my real app into the tiny reproducible program above.
Using a variadic argument for name (rename <id> [name...]), but this results in both 1 and Hello being assigned to the same array as the first parameter of action, defeating the purpose of having id.
What am I missing? Does commander only accept one argument per command (it doesn't look that way in the documentation)?
I think this was a bug in an old version of commander. This works now with commander@2.9.0.
I ran into the same problem and decided to use Caporal instead.
Here's an example from their docs on Creating a command:
When writing complex programs, you'll likely want to manage multiple commands. Use the .command() method to specify them:
program
    // a first command
    .command("my-command", "Optional command description used in help")
    .argument(/* ... */)
    .action(/* ... */)
    // a second command
    .command("sec-command", "...")
    .option(/* ... */)
    .action(/* ... */)

Node.js script (node-celery) call to celery task handles 'self' argument improperly

I created a celery task script as follows:
from celery import Task
from celery.contrib.methods import task
from celery.contrib.methods import task_method
from pipelines.addsub import settings
from pipelines.addsub.log import register_task_log

@register_task_log(__name__)
class AddTask(Task):
    @task(filter=task_method, name='AddTask.get')
    def get(self, x, y):
        self.log.info("Calling task add(%d, %d)" % (x, y))
        return x + y
I defined the following queues and routes:
CELERY_QUEUES = {
    'celery': {
        'exchange': 'celery',
        'binding_key': 'celery',
    },
    'addsub': {
        'exchange': 'addsub',
        'binding_key': 'addsub.operations',
    },
}
CELERY_ROUTES = {
    'AddTask.get': {
        'queue': 'addsub',
        'routing_key': 'addsub.operations',
    },
}
I start the celery worker as follows:
celery -c 2 -A pipelines.celery.celery worker -Q addsub -E -l DEBUG --logfile=~/celery_workflows/addsubtasks/addsub.log
I can successfully run AddTask.get(1,3) from celery shell.
I then used the node-celery module to run the following Node.js script:
"use strict";
var celery = require('node-celery'),
client = celery.createClient({
CELERY_BROKER_URL: 'amqp://[user]:[password]#[hostname]:5672//prote.broker',
CELERY_RESULT_BACKEND: 'amqp',
CELERY_ROUTES: {'AddTask.get': {queue: 'addsub'}}
}),
get_addition = client.createTask('AddTask.get');
client.on('error', function (err) {
console.log(err);
});
client.on('connect', function () {
console.log('Connected ...');
get_addition.call([], {
x: 1,
y: 3
}); // sends a task to the addsub queue
});
The script returns the following error:
[2014-09-13 14:18:59,422: INFO/MainProcess] Received task: AddTask.get[261fb059-b88e-444b-b218-c3c24c94fc1d]
[2014-09-13 14:18:59,422: DEBUG/MainProcess] TaskPool: Apply <function _fast_trace_task at 0x7fc407d5fde8> (args:(u'AddTask.get', u'261fb059-b88e-444b-b218-c3c24c94fc1d', [], {u'y': 3, u'x': 1}, {u'task': u'AddTask.get', u'group': None, u'is_eager': False, u'delivery_info': {u'priority': None, u'redelivered': False, u'routing_key': 'addsub', u'exchange': ''}, u'args': [], u'headers': {}, u'correlation_id': None, u'hostname': 'celery@pcs01', u'kwargs': {u'y': 3, u'x': 1}, u'reply_to': None, u'id': u'261fb059-b88e-444b-b218-c3c24c94fc1d'}) kwargs:{})
[2014-09-13 14:18:59,425: DEBUG/MainProcess] Task accepted: AddTask.get[261fb059-b88e-444b-b218-c3c24c94fc1d] pid:6536
[2014-09-13 14:18:59,425: ERROR/MainProcess] Task AddTask.get[261fb059-b88e-444b-b218-c3c24c94fc1d] raised unexpected: TypeError('get() takes exactly 3 arguments (2 given)',)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 240, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 437, in __protected_call__
return self.run(*args, **kwargs)
TypeError: get() takes exactly 3 arguments (2 given)
The script does pass the correct x: & y: parameters to the celery worker but the self argument is not handled properly. Does anyone understand why this might be happening?
I have successfully tested the above specified node.js script with a task script that defines a set of functions instead of a class with member functions:
from pipelines.celery.celery import app
from pipelines.addsub import settings
from celery.utils.log import get_task_logger

log = get_task_logger(__name__)

@app.task(name='add')
def add(x, y):
    log.info("Calling task add(%d, %d)" % (x, y))
    return x + y

@app.task(name='subtract')
def subtract(x, y):
    log.info("Calling task subtract(%d, %d)" % (x, y))
    return x - y
I'm guessing that the celery.contrib.methods module is failing in the case that I described above. Does anyone have any insight into this problem?
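A workaround I'm considering (sketched below, not yet verified) is to skip celery.contrib.methods entirely and use a plain bound task, so Celery passes the task instance as self explicitly:
from celery.utils.log import get_task_logger
from pipelines.celery.celery import app  # same app instance as in the function-based script above

log = get_task_logger(__name__)

@app.task(name='AddTask.get', bind=True)
def add_task_get(self, x, y):
    # with bind=True, Celery supplies the task instance as 'self',
    # and x / y arrive from the caller's kwargs as before
    log.info("Calling task add(%d, %d)", x, y)
    return x + y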
