Celery - Start and stop periodic tasks from a Tornado interface - cron

I am trying to start and stop Celery periodic tasks from a Tornado application interface.
As an example, let's say there are two tasks: A and B.
I would like the user to be able to select the periodicity from an HTML form (every minute, every month, every 5 minutes, etc.) and click start on task A.
The user can do the same with task B, and then come back to a page with a button to stop task A and/or task B whenever he wants.
I have browsed a lot of Stack Overflow questions on the topic, and none of them answers this simple question.
As of now, my Tornado app handles simple Celery workers without any problem; the problem I face is with periodic tasks (https://docs.celeryproject.org/en/latest/reference/celery.html#celery.Celery.on_after_configure).
continuous_monitoring_worker.py:
```python
from celery import Celery

# Assumed app definition: the post does not show how the app or its broker are configured.
continuous_tracking_worker_app = Celery('continuous_monitoring_worker')
continuous_tracking_worker_app.conf.timezone = 'Europe/London'


@continuous_tracking_worker_app.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    print('OK inside setup periodic tasks ...')
    sender.add_periodic_task(10.0, test.s('Hello World'), name='add every 10')
    # In case I want to stop the task later
    #time.sleep(35)
    #print('beat_schedule = {}'.format(continuous_tracking_worker_app.conf.beat_schedule))
    #del continuous_tracking_worker_app.conf.beat_schedule['add every 10']


@continuous_tracking_worker_app.task
def test(name):
    print('Periodic task called, name = {}'.format(name))
```
With this code, I run into the following problem: when I start the app, it launches the task every 10 seconds (I guess because of the @continuous_tracking_worker_app.on_after_configure.connect decorator). But I want the task to be launched on demand from the interface (on the backend, in my Tornado view file, by calling setup_periodic_tasks(continuous_tracking_worker_app)), not when the Tornado app starts!

Celery Beat can't do that.
You may consider using Celery RedBeat, a Redis-backed beat scheduler whose schedule entries can be created and deleted at runtime, which makes this possible.

Celery Beat can do that.
Just set the expires option as described here (see the Expiration section).
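If the Tornado app runs as a single process, another option that sidesteps Celery Beat entirely is to keep the start/stop state in the web process and have each tick simply enqueue the Celery task. Below is a minimal stdlib sketch of that idea, not a Celery or Tornado feature: `PeriodicJobs`, `start()`, `stop()` and `running()` are hypothetical names, and each running job uses one background thread.

```python
import threading


class PeriodicJobs:
    """Start and stop named periodic jobs at runtime (hypothetical helper, in-process only)."""

    def __init__(self):
        self._stops = {}                # job name -> Event that ends its loop
        self._lock = threading.Lock()

    def start(self, name, interval_seconds, fn, *args):
        with self._lock:
            if name in self._stops:
                raise ValueError(f"job {name!r} is already running")
            stop = threading.Event()
            self._stops[name] = stop

        def loop():
            # Event.wait() returns False on timeout (time for the next tick)
            # and True as soon as stop() sets the event, which ends the loop.
            # The delay is measured from the end of one call to the next.
            while not stop.wait(interval_seconds):
                fn(*args)

        threading.Thread(target=loop, name=f"periodic-{name}", daemon=True).start()

    def stop(self, name):
        """Stop a running job; return False if no job with that name exists."""
        with self._lock:
            stop = self._stops.pop(name, None)
        if stop is None:
            return False
        stop.set()
        return True

    def running(self):
        with self._lock:
            return sorted(self._stops)
```

In a Tornado handler this might look like `jobs.start('A', 60, test.delay, 'Hello World')` behind the start button and `jobs.stop('A')` behind the stop button, where `test.delay` is Celery's standard way to enqueue the task from the question. If the app runs several processes, each one has its own registry, so only one of them should own the schedule.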

Related

Infinite loop vs cron job

I have an uploader service which needs to run every 5 minutes, and it definitely finishes within 5 minutes, so there are never two parallel sessions.
I'm wondering what a good strategy to run it would be: schedule it as a cron job on the host, or start a Go program with an infinite loop that executes the upload and then sleeps (Golang: Implementing a cron / executing tasks at a specific time).
If your task is:
- on Unix,
- stand-alone,
- periodic,
- and has an acceptable startup time,
cron will be better than rolling your own scheduler just for the one service. It will guarantee the process always runs at the correct time, and it has rudimentary error reporting. There's no need to add a watchdog in case your infinite loop has an error; cron will simply run the process again in 5 minutes.
If cron is insufficient, look into other job schedulers before rolling your own.
"I have an uploader service which needs to run every 5 minutes and it definitely finishes within 5 minutes so there are never two parallel sessions."
These are famous last words. I would suggest adding in some form of locking. For example, write your PID to a file in /var/run and check if that process is running. There's even a little pidfile library for Go.
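A minimal sketch of that PID-file check, in Python rather than Go (a swapped-in stdlib illustration, not the pidfile library mentioned above). The function names are hypothetical, and the lock path would be something like /var/run/uploader.pid (also hypothetical). It is not hardened: a file that was just created but not yet written is treated as stale.

```python
import os


def pid_is_running(pid):
    """POSIX check: signal 0 tests whether a process exists without sending anything."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True          # it exists but belongs to another user
    return True


def acquire_pidfile(path):
    """Return True if we now own the PID file, False if a live process already does."""
    for _ in range(2):       # second pass only after removing a stale file
        try:
            # O_EXCL makes creation atomic: only one process can create the file.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            try:
                with open(path) as f:
                    old_pid = int(f.read().strip() or 0)
            except (FileNotFoundError, ValueError):
                old_pid = 0
            if pid_is_running(old_pid):
                return False
            try:
                os.remove(path)  # stale file left behind by a crashed run
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
    return False


def release_pidfile(path):
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

The uploader would call `acquire_pidfile()` first, exit quietly if it returns False, and call `release_pidfile()` in a `finally` block when done.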
Take a look at systemd timers: you can run a script from a timer and set a maximum execution time for it.
https://wiki.archlinux.org/index.php/Systemd/Timers

Scheduling Tasks in Node.js Using Timers instead of Scheduled Cron Jobs

This is my objective:
5 minutes after the last document update, I want to execute a one-time task
So this is what the flow of actions will look like:
1. User updates a document: a timer is started and counts down from 5 minutes.
2. If the user updates the same document again while the previous timer (identified by document._id) is still running, reset that timer back to 5 minutes and repeat the countdown.
3. When the timer has elapsed, the one-time task is executed.
I have not done something like this before and I am at a loss as to where to begin.
I can hook into document changes easily using methods available in Mongoose (i.e. on save, call a function to set up or reset the timer).
However, I cannot figure out how to:
- create a timer that waits 5 minutes and then executes a task
- make the timer accessible so that I can reset it
- make the timer accessible so I can add extra variables which will be used in the function when the timer has elapsed
I've investigated cron jobs, but they seem to be for tasks that are scheduled at the same time every day.
I need a timer that delays a task, but also the ability to reset that timer and add extra data to it.
Any hints appreciated.
This is the solution I managed to come up with.
First of all, it's worth noting that my original assumptions were correct:
Cron jobs are great for repetitive tasks that are scheduled at the same time every day. However, for tasks that are created on the fly and have a countdown element, cron jobs aren't the right tool.
Enter node-schedule (https://www.npmjs.com/package/node-schedule):
Node Schedule is a flexible cron-like and not-cron-like job scheduler for Node.js. It allows you to schedule jobs (arbitrary functions) for execution at specific dates, with optional recurrence rules. It only uses a single timer at any given time (rather than reevaluating upcoming jobs every second/minute).
The use of timers is really what these on-the-fly tasks need.
So my solution looks like this:
```javascript
var schedule = require('node-schedule'),
    email_scheduler = {};

// Schedule a one-off job 10 seconds from now, keyed by the document id
email_scheduler["H1e5nE2GW"] = schedule.scheduleJob(new Date(Date.now() + 10000), function(){
    console.log("it's been a total of 10 seconds since initialization / reset.");
});

// 5 seconds in, simulate another update: cancel the pending job and start a fresh 10-second one
var second_attempt = schedule.scheduleJob(new Date(Date.now() + 5000), function(){
    console.log("5 seconds elapsed from start, time to reset original scheduled job");
    email_scheduler["H1e5nE2GW"].cancel();
    email_scheduler["H1e5nE2GW"] = schedule.scheduleJob(new Date(Date.now() + 10000), function(){
        console.log("it's been a total of 10 seconds since the last reset.");
    });
    schedule.scheduleJob(new Date(Date.now() + 5000), function(){
        console.log("it's been another 5 seconds since reset");
    });
});
```
My thinking (though not yet tested) is that I can create a singleton-like instance of the email_scheduler object by creating a node module, like so:
```javascript
// node_modules/email-scheduler/index.js
// Node caches modules, so every require('email-scheduler') returns this same object
module.exports = {};
```
This way, I can access the scheduled jobs and reset the timers from every file of the Node application.
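The same per-document, resettable delay (a "debounce") can also be sketched without a scheduler library. This is a Python stdlib illustration using threading.Timer rather than node-schedule; `DebouncedJobs`, `touch()` and `cancel()` are hypothetical names. Each `touch()` cancels the pending timer for that key and starts a fresh one carrying the latest extra data, which covers the three requirements in the question: delay, reset, and extra variables.

```python
import threading


class DebouncedJobs:
    """One resettable, delayed one-shot job per key (hypothetical helper)."""

    def __init__(self, delay_seconds, callback):
        self.delay = delay_seconds
        self.callback = callback            # called as callback(key, extra_dict)
        self._timers = {}                   # key (e.g. document._id) -> pending Timer
        self._lock = threading.Lock()

    def touch(self, key, **extra):
        """Start the countdown for key, or restart it if one is already pending."""
        with self._lock:
            old = self._timers.pop(key, None)
            if old is not None:
                old.cancel()                # no-op if it has already fired
            timer = threading.Timer(self.delay, self._fire, args=(key, extra))
            timer.daemon = True
            self._timers[key] = timer
            timer.start()

    def cancel(self, key):
        with self._lock:
            timer = self._timers.pop(key, None)
        if timer is not None:
            timer.cancel()

    def _fire(self, key, extra):
        with self._lock:
            # A timer replaced just before firing is no longer current; skip it.
            if self._timers.get(key) is not threading.current_thread():
                return
            del self._timers[key]
        self.callback(key, extra)
```

The Mongoose save hook from the question would correspond to calling `touch(doc_id, ...)` with whatever data the final task needs.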
You can use events for that: emit an event with some parameters, and in its handler add a delay before executing your task. But adding a sleep or setTimeout like this is not a good idea in Node.js.
I had the same puzzle, but I ended up settling on the native JavaScript functions.
```javascript
function intervalFunc() {
    // Inside this function you can get the current date and time, then decide what to do and how to do it
    autogens.serverAutoGenerate(); // my function to autogenerate bills
}
setInterval(intervalFunc, 1000); // repeat the task every second (1000 ms)
```
Adding this code to the index or entry file of your Node.js application will make the function execute repeatedly. You can change the frequency depending on your needs. Note that setInterval takes its delay in milliseconds: to run the job three times a day, take the number of milliseconds in 24 hours (86,400,000) and divide by 3 (28,800,000). To run the job at a specific hour, you can use new Date() inside the function to check whether it is the right hour to execute your task.
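One pitfall with checking the hour inside a one-second interval is that the check stays true for every tick during that hour, so the job would run about 3,600 times. A minimal sketch of a guard that remembers the last run date (Python for illustration; `DailyAt` is a hypothetical name, and a run missed while the process was down is not caught up):

```python
from datetime import datetime


class DailyAt:
    """Say "run now" once per day, on the first poll during the given hour (hypothetical helper)."""

    def __init__(self, hour):
        self.hour = hour
        self.last_run_date = None

    def due(self, now: datetime) -> bool:
        if now.hour == self.hour and now.date() != self.last_run_date:
            # Remember the day so later polls during the same hour are skipped.
            self.last_run_date = now.date()
            return True
        return False
```

The interval callback would then call `guard.due(datetime.now())` on each tick and run the job only when it returns True.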

How does node-schedule work in this scenario?

I am new to Node.js and want to use node-schedule, but I have one question; please give me a suggestion.
https://github.com/node-schedule/node-schedule
When I set up a scheduler that runs every 5 minutes and a run does not complete within 5 minutes, will the scheduler start another thread or not?
Thanks.
Since jobs don't seem to have a mechanism to let the scheduler know they are done, jobs will be scheduled according to their scheduled time alone.
In other words: if you schedule a job to run every 5 minutes, it will be started every 5 minutes, even if the job itself takes more than 5 minutes to complete.
To clarify: this doesn't start a new thread for each job, as JS is single-threaded. If a job blocks the event loop (for instance by doing heavy calculations), it is possible for the scheduler to not be able to start a new job when its time has arrived, but blocking the event loop is not a good thing.
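If you do want to avoid overlapping runs, the usual approach is to have the job itself refuse to start while a previous run is still in progress. Below is a Python sketch of such a guard using a non-blocking lock; `NoOverlap` is a hypothetical name, and in Node.js the equivalent would be an in-progress flag checked at the start of the async job.

```python
import threading


class NoOverlap:
    """Wrap a job so a new run is skipped while the previous one is still going (hypothetical helper)."""

    def __init__(self, job):
        self._job = job
        self._running = threading.Lock()
        self.skipped = 0

    def __call__(self, *args, **kwargs):
        # Non-blocking acquire: fails immediately if a previous run holds the lock.
        if not self._running.acquire(blocking=False):
            self.skipped += 1
            return False
        try:
            self._job(*args, **kwargs)
        finally:
            self._running.release()
        return True
```

The scheduler keeps firing on time; the wrapper just turns a call that arrives during a long run into a counted no-op.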

CRON + Nodejs + multiple cores => behaviour?

I'm building a cron-like module into my service (using node-schedule) that will be required by each instance of my multi-core setup. Since they all run in their own threads and are all scheduled to run at the same time, will the jobs be called once per thread, or just once because they're all loading the same module?
If they do get called multiple times, then what is the best way to make sure the desired actions only get called once?
If you are using pm2 in cluster mode, you can use process.env.NODE_APP_INSTANCE to detect which instance is running. With the following code, your cron jobs will be called only once:
```javascript
// run cron jobs only on the first instance
if (process.env.NODE_APP_INSTANCE === '0') {
    // cron jobs
}
```
node-schedule runs inside a given node process and it schedules things that that particular node process asked it to schedule.
If you are running multiple node processes and each is using node-schedule, then all the node-schedule instances within those separate node processes are independent (no cooperation or coordination between them). If each node process asks its own node-schedule instance to run a particular task at 3pm on the first Wednesday of the month, then all the node processes will start running that task at that time.
If you only want the action carried out once, you have to coordinate among your node instances so that the action is scheduled in only one node process, or only schedule these types of operations in one of your node instances, not all of them.
The best way to handle this generically is to have a shared database that you write a "lock" entry to. For example, let's say every instance writes a DB entry such as {instanceId: "a", taskId: "myTask", timestamp: "2021-12-22:10:35"}.
All instances submit the same entry except for their own instanceId. You then put a unique index on taskId + timestamp so that only one gets accepted.
Then they all run a query to see whether their instance was the one accepted to run the cron job.
You could do the same thing but also add a "random" field holding a random number, and the instance with the lowest number wins.
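A minimal sketch of the insert-then-check lock, using Python's sqlite3 with an in-memory database as a stand-in for the shared database (in production every instance would need to reach the same database server). The table and column names are hypothetical, and `INSERT OR IGNORE` is SQLite syntax; PostgreSQL, for instance, spells it `ON CONFLICT DO NOTHING`. The unique constraint is on (task_id, slot) so that different tasks due in the same minute don't block each other.

```python
import sqlite3


def ensure_lock_table(conn):
    with conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS cron_locks (
                   instance_id TEXT NOT NULL,
                   task_id     TEXT NOT NULL,
                   slot        TEXT NOT NULL,
                   UNIQUE (task_id, slot)
               )"""
        )


def claim_run(conn, instance_id, task_id, slot):
    """Return True if this instance is the one allowed to run task_id for this slot."""
    with conn:
        # Every instance submits the same row with its own instance_id;
        # the unique constraint lets only the first insert through.
        conn.execute(
            "INSERT OR IGNORE INTO cron_locks (instance_id, task_id, slot) VALUES (?, ?, ?)",
            (instance_id, task_id, slot),
        )
    # Then each instance checks whether its row was the one accepted.
    row = conn.execute(
        "SELECT instance_id FROM cron_locks WHERE task_id = ? AND slot = ?",
        (task_id, slot),
    ).fetchone()
    return row is not None and row[0] == instance_id
```

Each instance would call `claim_run()` at the start of the scheduled job and return early when it gets False.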

cron jobs: Monitor time it takes for jobs to finish

I'm doing a research project that requires me to monitor cron jobs on an Ubuntu Linux system. I have collected data about the jobs' tasks and when they are started; I just don't know of a way to monitor how long they take to finish running.
I could compute the finish time minus the start time with something like this, but that would require doing it in the shell script of each cron job. That's not difficult by any means, but it seems a little silly that cron wouldn't log this in some way, so I'm trying to find an easier way :P
tl;dr Figure out time cron jobs take from start to finish
You could just put time in front of the commands in your crontab, and if you're getting notifications about cron script output, the timing will be sent to you.
For example, if you had:
```
0 1,13 * * * /maint/run_webalizer.sh
```
add time in front:
```
0 1,13 * * * time /maint/run_webalizer.sh
```
and you'll get output that looks like this (the "real" line is the time you want):
```
real    3m1.255s
user    0m37.890s
sys     0m3.492s
```
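If you don't get cron notifications, you can redirect the output to a file instead (time writes its report to stderr). Also note that the real/user/sys format above comes from bash's time keyword; cron runs commands with /bin/sh by default, so you may need SHELL=/bin/bash at the top of the crontab to get it.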
See man time. Maybe you could create a wrapper and tell cron to use it as your "shell", or something like that.
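A minimal sketch of such a wrapper in Python: instead of acting as cron's shell, cron runs it with the real command as arguments, and it appends the exit code and wall-clock duration to a log file. The script name and log path are hypothetical.

```python
import subprocess
import sys
import time

# Hypothetical log location; pick one the cron user can write to.
LOG_PATH = "/tmp/cron-timing.log"


def run_timed(argv, log_path=None):
    """Run argv and return (exit_code, elapsed_seconds), optionally appending a log line."""
    start = time.monotonic()          # monotonic clock: unaffected by wall-clock changes
    result = subprocess.run(argv)     # inherits stdout/stderr, so cron still mails the output
    elapsed = time.monotonic() - start
    if log_path is not None:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(log_path, "a") as log:
            log.write(f"{stamp} {' '.join(argv)} exit={result.returncode} elapsed={elapsed:.3f}s\n")
    return result.returncode, elapsed


if __name__ == "__main__" and len(sys.argv) > 1:
    code, _ = run_timed(sys.argv[1:], LOG_PATH)
    sys.exit(code)
```

A crontab entry would then look like `0 1,13 * * * /usr/bin/python3 /usr/local/bin/cron_timed.py /maint/run_webalizer.sh` (the script location is hypothetical), and the log gives one duration line per run.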
Cronitor (https://cronitor.io) is a tool I built exactly for this purpose. It uses http requests to record the start and end of your jobs.
You'll be notified if your job doesn't run on schedule, or if it runs for too long/too short. You can also configure it to send alerts to you via email, sms, but also Slack, Hipchat, Pagerduty and others.
I use the Jenkins CI to do this via its external-monitor-job plugin. Jenkins can track start and end times, track overall execution time over time, save the output of all jobs it tracks, and present success/failure conditions graphically.
https://wiki.jenkins-ci.org/display/JENKINS/Monitoring+external+jobs
