How to distribute Celery periodic tasks that run every 5 minutes? - cron

I have several tasks with 'schedule': crontab(minute='*/5').
They run at every minute that is divisible by 5.
Tasks scheduled with minute='*/5' all run at the same time.
Can I somehow spread them out? For example, something like minute='*/5' + 1 (minutes whose remainder is 1 after dividing by 5).

According to the Celery documentation, you can't use a format like minute='*/5' + 1. What you can do is list the minutes explicitly, like this:
minute='1,6,11,16,21,26,31,36,41,46,51,56'
P.S. Entries like this are common practice in the UNIX/Linux world.
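If you have several tasks to spread out, typing those lists by hand gets tedious. Here is a minimal sketch that builds the explicit minute string for a given step and offset. The helper name offset_minutes is made up for illustration and is not part of Celery.

```python
def offset_minutes(step: int, offset: int) -> str:
    """Build an explicit crontab minute list: every `step` minutes, shifted by `offset`.

    Hypothetical helper, not part of Celery.
    """
    if not 0 <= offset < step:
        raise ValueError("offset must satisfy 0 <= offset < step")
    # Minutes within the hour whose remainder after dividing by `step` is `offset`.
    return ",".join(str(minute) for minute in range(offset, 60, step))


print(offset_minutes(5, 1))  # 1,6,11,16,21,26,31,36,41,46,51,56
```

Each task can then get its own offset, e.g. crontab(minute=offset_minutes(5, 1)) for one task and crontab(minute=offset_minutes(5, 2)) for another. As far as I know, Celery's crontab parser also accepts a range with a step such as minute='1-59/5', which would be a shorter way to write the same thing, but check that against the documentation for your Celery version.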

Related

Prevent concurrent cron jobs in pg-boss

I’m considering pg-boss for running and distributing event-based jobs between instances of the same service. Apart from event-based jobs, one of my use cases is scheduled jobs. Some of them can take a while and may still be running when it’s time to trigger the next invocation, e.g. a job is set to run every 5 minutes but can take 8 minutes to complete. In that case I need the system to recognize that the previous run is still in progress and not trigger the same job again. Using the example of a job scheduled every 5 minutes that takes 8 minutes, I’d like something like the following to happen:
13:00 job triggered
13:05 job still runs, system sees it and doesn’t trigger once more even though it’s time
13:08 job done
13:10 next job run triggered
Is there an elegant way to achieve it with pg-boss without implementing my own locking mechanism?

Is it possible to run a scheduled task each midnight in Shopware?

We have read https://developer.shopware.com/docs/guides/plugins/plugins/plugin-fundamentals/add-scheduled-task which describes how to define a scheduled task that runs every x minutes.
Is it also possible to specify the execution time, for example each midnight or every day at 2 am, like in a crontab?
There is no such possibility in Shopware 6. It is easier to accomplish this with a CLI command and crontab. But if you have to use Shopware's scheduled task, you can trick it by setting the nextExecutionTime to the time at which you want the task to run.
For example, if today's date is 10.03.2022 and you want to execute the scheduled task every day at 2 am, set the nextExecutionTime to 2022-03-11 02:00:00.000 (use a future date) and runInterval to 86400 (24h). This way Shopware will start the task at 2 am and then set the nextExecutionTime to the next day at 2 am (give or take a couple of minutes, in my experience).
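To make the arithmetic concrete, here is a small sketch in Python (this is not Shopware code) that works out the first nextExecutionTime and the runInterval for a daily 2 am run. The helper name next_daily_run and the 12:00 "now" value are assumptions for illustration only.

```python
from datetime import datetime, timedelta

RUN_INTERVAL_SECONDS = 24 * 60 * 60  # one day, i.e. 86400


def next_daily_run(now: datetime, hour: int) -> datetime:
    """Return the next future moment at `hour`:00:00 (hypothetical helper)."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        # That time has already passed today, so use tomorrow.
        candidate += timedelta(days=1)
    return candidate


now = datetime(2022, 3, 10, 12, 0)  # assumed "current" time on 10.03.2022
print(next_daily_run(now, 2))  # 2022-03-11 02:00:00
print(RUN_INTERVAL_SECONDS)    # 86400
```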
I'm not aware of such a feature in Shopware core.
Probably the most straightforward way would be to add a "real" cron job, like you mentioned, which triggers a CLI command.
You can encapsulate the logic of the task in its own service, so that the scheduled task and the CLI command can both just use the service (if you want to keep both).

Infinite loop vs cron job

I have an uploader service which needs to run every 5 minutes, and it definitely finishes within 5 minutes, so there are never two parallel sessions.
I'm wondering what a good strategy would be for running this: either schedule it as a cron job on the host, or start a Go program with an infinite loop that executes the upload and then sleeps (Golang: Implementing a cron / executing tasks at a specific time).
If your task is...
On Unix
Stand alone
Periodic
Has an acceptable startup time
then cron will be better than rolling your own scheduler just for the one service. It guarantees that the process always runs at the correct time, and it has rudimentary error reporting. There's no need to add a watchdog in case your infinite loop hits an error; cron will simply run the process again in 5 minutes.
If cron is insufficient, look into other job schedulers before rolling your own.
"I have an uploader service which needs to run every 5 minutes, and it definitely finishes within 5 minutes, so there are never two parallel sessions."
These are famous last words. I would suggest adding in some form of locking. For example, write your PID to a file in /var/run and check if that process is running. There's even a little pidfile library for Go.
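A minimal sketch of that PID file check, written in Python rather than Go just to show the idea (Unix only). The function names are made up for illustration, and the path you pass in is up to you. Note that the check-then-write here is not atomic, so treat it as a starting point rather than a bulletproof lock.

```python
import os


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (Unix only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def acquire_pidfile(path: str) -> bool:
    """Return True if this process may run, False if a previous run is still alive."""
    try:
        with open(path) as f:
            old_pid = int(f.read().strip())
        if pid_is_running(old_pid):
            return False
    except (FileNotFoundError, ValueError):
        pass  # no PID file yet, or unreadable contents: treat it as stale
    with open(path, "w") as f:
        f.write(str(os.getpid()))
    return True
```

The uploader would call acquire_pidfile(...) first and exit immediately if it returns False. Removing the file on a clean exit is optional, since a stale PID is detected on the next run.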
Take a look at systemd: you can run a script with timers and set a maximum execution time for the script.
https://wiki.archlinux.org/index.php/Systemd/Timers

CRON + Nodejs + multiple cores => behaviour?

I'm building a cron-like module into my service (using node-schedule) that will get required into each instance of my multi-core setup. Since the instances all run their own threads and are all scheduled to run at the same time, I'm wondering whether the jobs will get called for every single thread or just once because they're all loading the same module.
If they do get called multiple times, then what is the best way to make sure the desired actions only get called once?
If you are using pm2 in cluster mode, you can use process.env.NODE_APP_INSTANCE to detect which instance is running. The following code makes sure your cron jobs are only scheduled once:
// run cron jobs only in the first instance
if (process.env.NODE_APP_INSTANCE === '0') {
  // cron jobs
}
node-schedule runs inside a given node process and it schedules things that that particular node process asked it to schedule.
If you are running multiple node processes and each is using node-schedule, then all the node-schedule instances within those separate node processes are independent (no cooperation or coordination between them). If each node process asks its own node-schedule instance to run a particular task at 3pm on the first Wednesday of the month, then all the node processes will start running that task at that time.
If you only want the action carried out once, then you either have to coordinate among your node instances so that the action only runs in one of them, or only schedule these types of operations in one of your node instances in the first place.
The best way to handle this generically is to have a shared database that you write a "lock" entry to. For example, say every instance writes a DB entry such as {instanceId: "a", taskId: "myTask", timestamp: "2021-12-22:10:35"}.
All instances submit the same entry except for their own instanceId. You then put a unique index on 'timestamp' (combined with 'taskId' if several tasks share the table) so that only one entry gets accepted.
Then they all run a query to see whether their instance was the one accepted to run the cron job.
You could do the same thing but also add a "random" field holding a random number, and the instance with the lowest number wins.
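A minimal sketch of the unique-index lock, using Python's built-in sqlite3 as a stand-in for the shared database. The table name cron_locks, the function names, and the choice to make the unique index cover taskId together with timestamp are assumptions for illustration. Here the failed insert itself tells an instance that it lost, so no follow-up query is needed.

```python
import sqlite3


def make_lock_table(conn: sqlite3.Connection) -> None:
    """Create the lock table; the UNIQUE constraint is what enforces 'only one wins'."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cron_locks ("
        " instance_id TEXT NOT NULL,"
        " task_id TEXT NOT NULL,"
        " timestamp TEXT NOT NULL,"
        " UNIQUE (task_id, timestamp))"
    )


def try_claim(conn: sqlite3.Connection, instance_id: str, task_id: str, timestamp: str) -> bool:
    """Return True if this instance won the right to run the task for this time slot."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO cron_locks (instance_id, task_id, timestamp) VALUES (?, ?, ?)",
                (instance_id, task_id, timestamp),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # another instance already claimed this slot


conn = sqlite3.connect(":memory:")
make_lock_table(conn)
print(try_claim(conn, "a", "myTask", "2021-12-22:10:35"))  # True
print(try_claim(conn, "b", "myTask", "2021-12-22:10:35"))  # False
```

In a real setup each node instance would hold its own connection to the same database server, and the timestamp would be rounded to the scheduled slot so that all instances submit the same value.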

Preventing cronjobs from overlapping

I have 3 different jobs set up in crontab (call them jobA, jobB, jobC) that run at different intervals and start at different times during the day. For example, jobA runs once per hour at 5 mins past the hour, jobB runs every 30 mins at 9 and 39 mins past the hour, and jobC runs every 15 mins. They are not dependent on each other, but for various reasons they can NOT be running at the same time.
The problem is that sometimes one of the jobs takes a long time to run and another one starts before the first one is done, causing issues.
Is there some way to queue or spool these jobs so that one will not start until the current running one has finished? I tried using this solution but this does not guarantee that the pending jobs will resume in the same order they were supposed to start. A queue would be best, but I cannot find anything about how to do this.
You can't do that using cron alone. Cron is used to run a specific command at a specific time. You can do it with the solution you proposed, but that adds a lot more complexity.
I suggest coding the requirement in a high-level language like Java and using a multi-threaded program to achieve what you need.
Control-M is another piece of scheduling software, with a lot of other features as well. You would be able to implement the above use case in it.
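As a rough sketch of what such a program could look like, here it is in Python rather than Java: a single worker thread pulls jobs from a FIFO queue, so jobs never overlap and run in the order they were submitted. The class name SerialJobQueue and the placeholder jobs are assumptions for illustration; something else (a timer, or cron calling into the program) would still have to submit jobA, jobB and jobC at their scheduled times.

```python
import queue
import threading


class SerialJobQueue:
    """Run submitted jobs one at a time, in submission order (illustrative sketch)."""

    def __init__(self) -> None:
        self._jobs: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, job) -> None:
        """Queue a zero-argument callable; it starts once all earlier jobs are done."""
        self._jobs.put(job)

    def wait(self) -> None:
        """Block until every queued job has finished."""
        self._jobs.join()

    def _run(self) -> None:
        while True:
            job = self._jobs.get()
            try:
                job()
            except Exception as exc:  # keep the worker alive if one job fails
                print(f"job failed: {exc!r}")
            finally:
                self._jobs.task_done()


finished = []
jobs = SerialJobQueue()
for name in ("jobA", "jobB", "jobC"):
    jobs.submit(lambda name=name: finished.append(name))
jobs.wait()
print(finished)  # ['jobA', 'jobB', 'jobC']
```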

Resources