Celery beat is not showing or executing scheduled tasks - python-3.x

I am using celery and celery beat to handle task execution and scheduled tasks in a Python project. I am not using django.
Execution of celery tasks is working as expected. However I have run into a wall trying to get scheduled tasks (celery beat) to run.
I have followed the celery documentation to add my task to app.conf.beat_schedule successfully. If I print out the beat schedule after adding my task, I can see that the task has been added to app.conf.beat_schedule successfully.
from celery import Celery
from celery.task import task
# Celery init
app = Celery('tasks', broker='pyamqp://guest#localhost//')
# get the latest device reading from the appropriate provider
#app.task(bind=True, retry_backoff=True)
def get_reading(self, provider, path, device, config, location, callback):
logger.info("get_reading() called")
module = importlib.import_module('modules.%s' % provider)
try:
module.get_reading(path, device, config, location, callback)
except Exception as e:
self.retry(exc=e)
# add the periodic task
def add_get_reading_periodic_task(provider, path, device, config, location, callback, interval = 600.0):
app.conf.beat_schedule = {
"poll-provider": {
"task": "get_reading",
"schedule": interval,
"args": (provider, path, device, config, location, callback)
}
}
logger.info(app.conf.beat_schedule)
logger.info("Added task 'poll-provider' for %s to beat schedule" % provider)
Looking at my application log, I can see that app.conf.beat_schedule has been updated with the data passed to add_get_reading_periodic_task():
2017-08-17 11:07:13,216 - gateway - INFO - {'poll-provider': {'task': 'get_reading', 'schedule': 10, 'args': ('provider1', '/opt/provider1', None, {'location': {'lan.local': {'uri': 'http://192.168.1.10'}}}, 'lan.local', {'url': 'http://localhost:8080', 'token': '*******'})}}
2017-08-17 11:07:13,216 - gateway - INFO - Added task 'poll-provider' for provider1 to beat schedule
I'm manually running celery worker and celery beat simultaneously (in different terminal windows) on the same application file:
$ celery worker -A gateway --loglevel=INFO
$ celery beat -A gateway --loglevel=DEBUG
If I call get_reading.delay(...) within my application, it is executed by the celery worker as expected.
However, the celery beat process never shows any indication that the scheduled task is registered:
celery beat v4.0.2 (latentcall) is starting.
__ - ... __ - _
LocalTime -> 2017-08-17 11:05:15
Configuration ->
. broker -> amqp://guest:**#localhost:5672//
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> celerybeat-schedule
. logfile -> [stderr]#%DEBUG
. maxinterval -> 5.00 minutes (300s)
[2017-08-17 11:05:15,228: DEBUG/MainProcess] Setting default socket timeout to 30
[2017-08-17 11:05:15,228: INFO/MainProcess] beat: Starting...
[2017-08-17 11:05:15,248: DEBUG/MainProcess] Current schedule:
<ScheduleEntry: celery.backend_cleanup celery.backend_cleanup() <crontab: 0 4 * * * (m/h/d/dM/MY)>
[2017-08-17 11:05:15,248: DEBUG/MainProcess] beat: Ticking with max interval->5.00 minutes
[2017-08-17 11:05:15,250: DEBUG/MainProcess] beat: Waking up in 5.00 minutes.
[2017-08-17 11:10:15,351: DEBUG/MainProcess] beat: Synchronizing schedule...
[2017-08-17 11:10:15,355: DEBUG/MainProcess] beat: Waking up in 5.00 minutes.
[2017-08-17 11:15:15,400: DEBUG/MainProcess] beat: Synchronizing schedule...
[2017-08-17 11:15:15,402: DEBUG/MainProcess] beat: Waking up in 5.00 minutes.
[2017-08-17 11:20:15,502: DEBUG/MainProcess] beat: Synchronizing schedule...
[2017-08-17 11:20:15,504: DEBUG/MainProcess] beat: Waking up in 5.00 minutes.
This is seemingly confirmed by running celery inspect scheduled:
-> celery#localhost.lan: OK
- empty -
I have tried starting celery beat both before and after adding the scheduled task to app.conf.beat_schedule, and in both cases the scheduled task never appears in celery beat.
I read that celery beat did not support dynamic reloading of the configuration until version 4, but I am running celery beat 4.0.2
What am I doing wrong here? Why isn't celery beat showing my scheduled task?

Have you tried using the code as described in the Documentation:
#app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
# Calls test('hello') every 10 seconds.
sender.add_periodic_task(10.0, test.s('hello'), name='add every 10')
?

create a model for creating periodic tasks:
class TaskScheduler(models.Model):
periodic_task = models.OneToOneField(
PeriodicTask, on_delete=models.CASCADE, blank=True, null=True)
node = models.OneToOneField(
Node, on_delete=models.CASCADE, blank=True, null=True)
#staticmethod
def schedule_every(task_name, period, every, sheduler_name, args, queue=None):
# schedules a task by name every "every" "period". So an example call would be:
# TaskScheduler('mycustomtask', 'seconds', 30, [1,2,3])
# that would schedule your custom task to run every 30 seconds with the arguments 1,2 and 3 passed to the actual task.
permissible_periods = ['days', 'hours', 'minutes', 'seconds']
if period not in permissible_periods:
raise Exception('Invalid period specified')
# create the periodic task and the interval
# create some name for the period task
ptask_name = sheduler_name
interval_schedules = IntervalSchedule.objects.filter(
period=period, every=every)
if interval_schedules: # just check if interval schedules exist like that already and reuse em
interval_schedule = interval_schedules[0]
else: # create a brand new interval schedule
interval_schedule = IntervalSchedule()
interval_schedule.every = every # should check to make sure this is a positive int
interval_schedule.period = period
interval_schedule.save()
if(queue):
ptask = PeriodicTask(name=ptask_name, task=task_name,
interval=interval_schedule, queue=queue)
else:
ptask = PeriodicTask(name=ptask_name, task=task_name,
interval=interval_schedule)
if(args):
ptask.args = args
ptask.save()
return TaskScheduler.objects.create(periodic_task=ptask)
#staticmethod
def schedule_cron(task_name, at, sheduler_name, args):
# schedules a task by name every day at the #at time.
# create some name for the period task
ptask_name = sheduler_name
crons = CrontabSchedule.objects.filter(
hour=at.hour, minute=at.minute)
if crons: # just check if CrontabSchedule exist like that already and reuse em
cron = crons[0]
else: # create a brand new CrontabSchedule
cron = CrontabSchedule()
cron.hour = at.hour
cron.minute = at.minute
cron.save()
ptask = PeriodicTask(name=ptask_name,
crontab=cron, task=task_name)
if(args):
ptask.args = args
ptask.save()
return TaskScheduler.objects.create(periodic_task=ptask)
def stop(self):
""" pauses the task """
ptask = self.periodic_task
ptask.enabled = False
ptask.save()
def start(self):
ptask = self.periodic_task
ptask.enabled = True
ptask.save()
def terminate(self):
self.stop()
ptask = self.periodic_task
PeriodicTask.objects.get(name=ptask.name).delete()
self.delete()
ptask.delete()
and then from you code
# give the periodic task a unique name
scheduler_name = "%s_%s" % ('node_heartbeat', str(node.pk))
# organize the task arguments
args = json.dumps([node.extern_id, node.pk])
# create the periodic task in heartbeat queue
task_scheduler = TaskScheduler().schedule_every(
'core.tasks.pdb_heartbeat', 'seconds', 15, scheduler_name , args, 'heartbeat')
task_scheduler.node = node
task_scheduler.save()
the original class I came across here a long time ago I've added schedule_cron

Related

Data acquisition and parallel analysis

With this example, I am able to start 10 processes and then continue to do "stuff".
import random
import time
import multiprocessing
if __name__ == '__main__':
"""Demonstration of GIL-friendly asynchronous development with Python's multiprocessing module"""
def process(instance):
total_time = random.uniform(0, 2)
time.sleep(total_time)
print('Process %s : completed in %s sec' % (instance, total_time))
return instance
for i in range(10):
multiprocessing.Process(target=process, args=(i,)).start()
for i in range(2):
print("im doing stuff")
output:
>>
im doing stuff
im doing stuff
Process 8 : completed in 0.5390905372395016 sec
Process 6 : completed in 1.2313793332779521 sec
Process 2 : completed in 1.3439237625459899 sec
Process 0 : completed in 2.171809500083049 sec
Process 5 : completed in 2.6980031493633887 sec
Process 1 : completed in 3.3807358192422416 sec
Process 3 : completed in 4.597366303348297 sec
Process 7 : completed in 4.702447947943171 sec
Process 4 : completed in 4.8355495004170965 sec
Process 9 : completed in 4.9917788543156245 sec
I'd like to have a main while True loop which do data acquisition and just start a new process at each iteration (with the new data) and check if any process has finished and look at the output.
How could I verify that a process has ended and what is its return value? Edit: while processes in a list are still executing.
If I had to summarize my problem: how can I know which process is finished in a list of processes - with some still executing or new added?

python threading event.wait() using same object in multiple threads

where is the documentation for the python3 threading library's event.wait() method that explains how 1 event can be used multiple times in different threads?
the example below shows the same event can be used in multiple threads, each with a different wait() duration, probably because each has its own lock under the hood.
But this functionality is not documented in an obvious way, on the threading page.
this works great but it's not clear why it works or if this will continue to work in future python versions.
are there ways this could bonk unexpectedly?
can an inherited event work properly in multiple classes as long as it's used in separate threads?
import logging
import threading
import time
logging.basicConfig(level=logging.DEBUG,
format='[%(levelname)s] (%(threadName)-10s) %(message)s',)
def worker(i,dt,e):
tStart=time.time()
e.wait(dt)
logging.debug('{0} tried to wait {1} seconds but really waited {2}'.format(i,dt, time.time()-tStart ))
e = threading.Event()
maxThreads=10
for i in range(maxThreads):
dt=1+i # (s)
w = threading.Thread(target=worker, args=(i,dt,e))
w.start()
output:
[DEBUG] (Thread-1 ) 0 tried to wait 1 seconds but really waited 1.0003676414489746
[DEBUG] (Thread-2 ) 1 tried to wait 2 seconds but really waited 2.00034761428833
[DEBUG] (Thread-3 ) 2 tried to wait 3 seconds but really waited 3.0001776218414307
[DEBUG] (Thread-4 ) 3 tried to wait 4 seconds but really waited 4.000180244445801
[DEBUG] (Thread-5 ) 4 tried to wait 5 seconds but really waited 5.000337362289429
[DEBUG] (Thread-6 ) 5 tried to wait 6 seconds but really waited 6.000308990478516
[DEBUG] (Thread-7 ) 6 tried to wait 7 seconds but really waited 7.000143051147461
[DEBUG] (Thread-8 ) 7 tried to wait 8 seconds but really waited 8.000152826309204
[DEBUG] (Thread-9 ) 8 tried to wait 9 seconds but really waited 9.00012469291687
[DEBUG] (Thread-10 ) 9 tried to wait 10 seconds but really waited 10.000144481658936
Since e is threading Event,
You are declaring it locally for each thread (all the 10 threads are excecated almost parallelly).
You can check it here:
import logging
import threading
import time
logging.basicConfig(level=logging.DEBUG,
format='[%(levelname)s] (%(threadName)s) %(message)s',)
def worker(i,dt,e):
tStart=time.time()
logging.info('Program will wait for {} time while trying to print the change from {} to {}'.format(dt,i,dt))
e.wait(dt)
logging.debug('{0} tried to wait {1} seconds but really waited {2}'.format(i,dt, time.time()-tStart ))
e = threading.Event()
maxThreads=10
for i in range(maxThreads):
dt=1+i # (s)
w = threading.Thread(target=worker, args=(i,dt,e))
w.start()
It's not the locking, its just about the value passed into the thread target

How to use ApScheduler correctly in FastAPI?

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
import uvicorn
import time
from loguru import logger
from apscheduler.schedulers.background import BackgroundScheduler
app = FastAPI()
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
test_list = ["1"]*10
def check_list_len():
global test_list
while True:
time.sleep(5)
logger.info(f"check_list_len:{len(test_list)}")
#app.on_event('startup')
def init_data():
scheduler = BackgroundScheduler()
scheduler.add_job(check_list_len, 'cron', second='*/5')
scheduler.start()
#app.get("/pop")
async def list_pop():
global test_list
test_list.pop(1)
logger.info(f"current_list_len:{len(test_list)}")
if __name__ == '__main__':
uvicorn.run(app="main3:app", host="0.0.0.0", port=80, reload=False, debug=False)
Above is my code, I want to take out a list of elements through get request, and set a periodic task constantly check the number of elements in the list, but when I run, always appear the following error:
Execution of job "check_list_len (trigger: cron[second='*/5'], next run at: 2021-11-25 09:48:50 CST)" skipped: maximum number of running instances reached (1)
2021-11-25 09:48:50.016 | INFO | main3:check_list_len:23 - check_list_len:10
Execution of job "check_list_len (trigger: cron[second='*/5'], next run at: 2021-11-25 09:48:55 CST)" skipped: maximum number of running instances reached (1)
2021-11-25 09:48:55.018 | INFO | main3:check_list_len:23 - check_list_len:10
INFO: 127.0.0.1:55961 - "GET /pop HTTP/1.1" 200 OK
2021-11-25 09:48:57.098 | INFO | main3:list_pop:35 - current_list_len:9
Execution of job "check_list_len (trigger: cron[second='*/5'], next run at: 2021-11-25 09:49:00 CST)" skipped: maximum number of running instances reached (1)
2021-11-25 09:49:00.022 | INFO | main3:check_list_len:23 - check_list_len:9
It looks like I started two scheduled tasks and only one succeeded, but I started only one task. How do I avoid this
You're getting the behavior you're asking for. You've configured apscheduler to run check_list_len every five seconds, but you've also made it so that function runs without terminating - just sleeping for five seconds in an endless loop. That function never terminates, so apscheduler doesn't run it again - since it still hasn't finished.
Remove the infinite loop inside your utility function when using apscheduler - it'll call the function every five seconds for you:
def check_list_len():
global test_list # you really don't need this either, since you're not reassigning the variable
logger.info(f"check_list_len:{len(test_list)}")

Python Aps Scheduler run in interval between two weekdays with time and schedule two job

I am trying to run a job between monday hour 0 to friday 19 and add second job scheduled in friday 20. It's not working, Not sure when combining multiple trigger for cron & interval works this way.
from apscheduler.triggers.combining import AndTrigger,OrTrigger
from apscheduler.triggers.interval import IntervalTrigger
from apscheduler.triggers.cron import CronTrigger
from apscheduler.schedulers.blocking import BlockingScheduler
scheduler = BlockingScheduler()
def job():
print('job started in mon 0, will be executed each minute until fri 19')
def job_report():
print('end job & report mail to team in fri 20')
trigger1 = OrTrigger([IntervalTrigger(minutes=1),
CronTrigger(day_of_week='mon', hour=0),
CronTrigger(day_of_week='fri',hour=19)])
trigger2 = OrTrigger([CronTrigger(day_of_week='fri',hour=20)])
scheduler.add_job(job, trigger1)
scheduler.add_job(job_report, trigger2)
scheduler.start()
Or should I use standard cron from aps? For example-
sched.add_job(job_function, CronTrigger.from_crontab('0 0 1-15 may-aug *'))
Any help will be appreciated.
trigger1 is executed monday to fri from 0:00 to 19:00 every day (not executed from 19:00 to 23:59)
trigger1 = AndTrigger([IntervalTrigger(minutes=1), CronTrigger(day_of_week='mon-fri', hour=0-19)])

NodeJs scheduling jobs on multiple nodes

I have two nodeJs servers running behind a Load Balancer. I have some scheduled jobs that i want to run only once on any of the two instances in a distributed manner.
Which module should i use ? Will node-quartz(https://www.npmjs.com/package/node-quartz) be useful for this ?
Adding redis and using node-redlock seemed like overkill for the little caching job I needed to schedule for once a day on a single server with three Node.js processes behind a load balancer.
I discovered http://kvz.io/blog/2012/12/31/lock-your-cronjobs/ - and that led me to the concept behind Tim Kay's solo.
The concept goes like this - instead of locking on an object (only works in a single process) or using a distributed lock (needed for multiple servers), "lock" by listening on a port. All the processes on the server share the same ports. And if the process fails, it will (of course) release the port.
Note that hard-failing (no catch anywhere surrounding) or releasing the lock in catch are both OK, but neglecting to release the lock when catching exceptions around the critical section will mean that the scheduled job never executes until the locking process gets recycled for some other reason.
I'll update when I've tried to implement this.
Edit
Here's my working example of locking on a port:
multiProc.js
var net = require('net');
var server = net.createServer();
server.on('error', function () { console.log('I am process number two!'); });
server.listen({ port: 3000 },
function () { console.log('I am process number one!');
setTimeout(function () {server.close()}, 3000); });
If I run this twice within 3 seconds, here's the output from the first and second instances
first
I am process number one!
second
I am process number two!
If, on the other hand, more than 3 seconds pass between executing the two instances, both claim to be process number one.
I haven't done this before but I can see myself doing it this way.
Using any scheduler library for Node.js.
In order to achieve your goal, i would use redis for distributed lock. Before running any scheduled jobs, a worker / node will have to get the lock; do the job; and release / ack() when finishing the job (or on error).
A single server can be selected a leader by conducting a election among available instances using Zoologist package
https://www.npmjs.com/package/zoologist
Requires Zookeeper server to conduct the election
I don't know if this might help you, but still posting it here.
Usually node-schedule is used for time based schedules where you have to execute arbitrary code only once. For eg: a database read/write on next month 6:00 PM.
The following post will explain writing the scheduled Jobs which will perform certain action based on our requirement for a particular time / day instance.
For performing the above task we are going to use CRON package of node.
To add a job we need to :
1) Install Cron
npm install cron
2) Require cron 's CronJob to our project.
var CronJob = require('cron').CronJob
3) Create an instance of CronJob
var jobs = new CronJob({
cronTime: ' * * * * * *',
onTick: function () {
//perform Your action
},
start: false,
timeZone: 'Asia/Kolkata'
});
Arguments
cronTime: it takes 6 arguments namely :
1) Second - > 0 - 59
2) Minute - > 0 - 59
3) Hour - > 0 - 23
4) Day of Month - > 1 - 31
5) Months - > 0 - 11
6) Day of Week - > 0 - 6
Note: We Can define cronTime in ranges alse like * for always.
0 - 59 / 5 at every 5 minute.
onTick: The operation to perform.
Start: It takes a boolean and if true then starts the job now.
timeZone: job's timeZone
4) To start Job
jobs.start()
For Example :
var jobs = new CronJob({
cronTime: ' 00 00 0-23 * * *',
onTick: function () {
printMyName();
},
start: false,
timeZone: 'Asia/Kolkata'
});
jobs.start();
var printMyName = function () {
var date = new Date();
console.log("Hi Vipul it is ", today);
};
Hope it helps.

Resources