How to avoid conflict of two scheduled jobs - cron

I have one scheduled cron job that (at the moment) runs every 5 minutes. The process looks in the database (Oracle) for messages with status "error" and then picks up all of those elements so they can continue through a specific flow (I am working with MuleSoft's Mule Runtime). I think that if the process runs on two or more nodes, the messages will be reprocessed two or more times: the first node can fetch 10 records and the second node can fetch the same 10 records, so both processes will send the same elements.
Is there a way to avoid this behavior? What is the best practice in this case? Should I look for another solution?
Please help me with this.

Related

Graceful ramp down of user load in locust

Is there a way to gracefully ramp down my user load while running load tests with the Locust framework?
For example, using the command below:
I want to run a single user in a loop for a period of 5 minutes, but what happens is that the last iteration ends abruptly, with, let's say, 5 requests on some of the tasks and 4 on others.
Yes! There is a new flag (developed by yours truly :) called --stop-timeout that will make locust wait for the users to complete (up to the specified time span)
There is (as of now) no way to do actual ramp down (gradually reducing the load by shutting down users over time), but hopefully this is enough for you. Such a feature might be added at a later time.
You can create your own Shape that lets you specify how many users should be present at any point in the test.
This is explained here: https://github.com/locustio/locust/pull/1505

What is a better way to make a longer delay inside a series of tasks?

I'm trying to build a workflow system, which will process a series of tasks & delays. Delay can be changed or removed from a running workflow.
What is a better way to make a longer delay inside a series of tasks (like 3-4 months)? Right now two approaches are floating around in my head:
Pre-calculate and save the delay time. Set up a scheduler that checks the delay repeatedly at a fixed interval (1 minute, maybe). This will make a lot of database queries, but the delay can be changed instantly.
Schedule a job for each delay. This can save a lot of database queries, but the problem is maintaining and changing the delay in these long-running jobs. Also, these jobs need to survive a server crash or restart.
Right now I'm not sure how to do it in a better way and still studying about it. If anyone has a similar experience, please share.
You can store the tasks in the database, like:
{
_id: String,
status: Enum,
executionTime: timestamp,
}
When you declare a new task, push a new entry into the DB.
At server start, or when a new task is declared, create a setTimeout that will wake up your node.js process when necessary.
Optimization
To avoid having X setTimeout calls, where X is the number of tasks to execute, keep only one setTimeout whose wait time equals the time until the closest task.
For example, you have three tasks: one must run in 1 hour, one in 2 hours and one in 3 hours. Use a setTimeout of 1 hour. When it is triggered, it executes task 1 and then looks at the remaining tasks to set the next timeout.
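A minimal sketch of the single-timer idea described above, assuming the tasks are held in an array that stands in for the database table. The names nextDelay, runDueTasks and reschedule are hypothetical. One thing worth knowing for delays of several months: setTimeout cannot wait longer than 2^31 - 1 milliseconds (about 24.8 days), so the sketch clamps the delay and simply re-checks when the timer fires.

```javascript
// Sketch only: `tasks` stands in for the database table and all
// function names are hypothetical.
const MAX_TIMEOUT_MS = 2 ** 31 - 1; // upper limit of setTimeout (~24.8 days)

// Milliseconds until the earliest pending task, or null if none is pending.
function nextDelay(tasks, now) {
  const pending = tasks.filter((t) => t.status === 'pending');
  if (pending.length === 0) return null;
  const earliest = Math.min(...pending.map((t) => t.executionTime));
  // Clamp: never negative, never above the setTimeout limit.
  return Math.min(Math.max(earliest - now, 0), MAX_TIMEOUT_MS);
}

// Execute every pending task whose time has come and mark it as done.
function runDueTasks(tasks, now, execute) {
  for (const task of tasks) {
    if (task.status === 'pending' && task.executionTime <= now) {
      execute(task);
      task.status = 'done';
    }
  }
}

let timer = null;

// Arm a single timer for the closest task; call this again whenever
// a task is added, changed or removed.
function reschedule(tasks, execute) {
  if (timer !== null) clearTimeout(timer);
  const delay = nextDelay(tasks, Date.now());
  if (delay === null) {
    timer = null;
    return;
  }
  timer = setTimeout(() => {
    runDueTasks(tasks, Date.now(), execute);
    reschedule(tasks, execute);
  }, delay);
}
```

Calling reschedule at server start (after loading the tasks from the DB) and after every change to a task keeps the timer pointed at the closest execution time, so a changed or removed delay takes effect immediately.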

CRON + Nodejs + multiple cores => behaviour?

I'm building in a CRON like module into my service (using node-schedule) that will get required into each instance of my multi-core setup and I'm wondering since they are all running their own threads and they are all scheduled to run at the same time, will they get called for every single thread or just once because they're all loading the same module.
If they do get called multiple times, then what is the best way to make sure the desired actions only get called once?
If you are using pm2 in cluster mode, you can use process.env.NODE_APP_INSTANCE to detect which instance is running. With the following code your cron jobs will be called only once.
// run cron jobs only for first instance
if (process.env.NODE_APP_INSTANCE === '0') {
  // cron jobs
}
node-schedule runs inside a given node process and it schedules things that that particular node process asked it to schedule.
If you are running multiple node processes and each is using node-schedule, then all the node-schedule instances within those separate node processes are independent (no cooperation or coordination between them). If each node process asks its own node-schedule instance to run a particular task at 3pm on the first Wednesday of the month, then all the node processes will start running that task at that time.
If you only want the action carried out once, then you have to coordinate among your node instances so that the action is scheduled in only one node process, or simply schedule these types of operations in just one of your node instances.
The best way to handle this in a generic way is to have a shared database that you write a "lock" entry to. As in, let's say all tasks wrote a DB entry such as {instanceId: "a", taskId: "myTask", timestamp: "2021-12-22:10:35"}.
All tasks would submit the same thing except with their own instanceId. You then have a unique index on 'timestamp' (combined with 'taskId' if several tasks share the table) so that only one entry gets accepted.
Then they all do a query and see if their node was the one that was accepted to do the cron.
You could do the same thing but also add a "random" field that generates a random number and the task with the lowest number wins.
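The lock-entry idea above can be sketched as follows. LockTable is a hypothetical in-memory stand-in for a real table with a unique index; here the key is assumed to be taskId plus timestamp. Unlike the description above, this sketch lets the insert result tell each instance directly whether it won, instead of running a follow-up query.

```javascript
// Sketch only: LockTable is a hypothetical in-memory stand-in for a
// database table with a unique index on (taskId, timestamp).
class LockTable {
  constructor() {
    this.rows = new Map(); // unique key -> instanceId that won
  }

  // Returns true if the row was inserted, false if the key already exists
  // (which is what a unique-index violation would signal in a real DB).
  insert({ instanceId, taskId, timestamp }) {
    const key = `${taskId}|${timestamp}`;
    if (this.rows.has(key)) return false;
    this.rows.set(key, instanceId);
    return true;
  }
}

// Each instance calls this when its scheduler fires; only the instance
// whose insert succeeds runs the job.
function runOnce(table, instanceId, taskId, timestamp, job) {
  const won = table.insert({ instanceId, taskId, timestamp });
  if (won) job();
  return won;
}
```

With a real database the same effect comes from attempting the insert and treating a unique-constraint error as "another instance already took this run".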

Preventing cronjobs from overlapping

I have 3 different jobs set up in crontab (call them jobA, jobB, jobC) that run at different intervals and start at different times during the day. For example, jobA runs once per hour at 5 mins past the hour, jobB runs every 30 mins at 9 and 39 mins past the hour, and jobC runs every 15 mins. They are not dependent on each other, but for various reasons they can NOT be running at the same time.
The problem is that sometimes one of the jobs takes a long time to run and another one starts before the first one is done, causing issues.
Is there some way to queue or spool these jobs so that one will not start until the current running one has finished? I tried using this solution but this does not guarantee that the pending jobs will resume in the same order they were supposed to start. A queue would be best, but I cannot find anything about how to do this.
You can't do that using cron. Cron is used to run a specific command at specific time. You can do it by the solution you proposed, but that adds a lot more complexity.
I suggest coding the requirement in a high-level language like Java and using a multi-threaded program to achieve what you need.
Control-M is another piece of scheduling software, with a lot of other features as well. You would be able to integrate the above use case into it.
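As a rough illustration of the queue the question asks for, here is a minimal sketch of a FIFO dispatcher. It assumes the crontab entries only enqueue a job name into one long-lived process, and a single worker starts the jobs one at a time in arrival order. The JobQueue class and its method names are hypothetical.

```javascript
// Sketch only: cron entries would enqueue a job name, and one worker
// would start jobs one at a time. All names are hypothetical.
class JobQueue {
  constructor() {
    this.pending = [];   // job names in arrival order
    this.running = null; // name of the job currently running, if any
  }

  // Called when a cron entry fires.
  enqueue(name) {
    this.pending.push(name);
  }

  // Returns the next job to start, or null if a job is still running
  // or nothing is queued.
  tryStart() {
    if (this.running !== null || this.pending.length === 0) return null;
    this.running = this.pending.shift();
    return this.running;
  }

  // Called by the worker when the running job has finished.
  finish() {
    this.running = null;
  }
}
```

Because jobs leave the queue strictly in the order they arrived, a job that fires while another is still running waits its turn instead of overlapping or being reordered.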

How to define frequency of a job in application by users?

I have an application that has to launch jobs repeatedly. But (yes, that would have been too easy without a but...) I would like users to define their backup frequency in the application.
In worst case, they would have to choose between :
weekly,
daily,
every 12 hours,
every 6 hours,
hourly
In best case, they should be able to use crontab expressions (see documentation for example)
How do I do this? Do I launch a job every minute that checks the last execution time and the frequency and then launches another job if needed? Do I create a sort of queue that will be executed by a master job?
Any clues, ideas, opinions, best practices, experiences are welcome!
EDIT : Solved this problem using Akka scheduler. Ok, this is a technical solution not a design answer but still everything works great.
Each user defined repetition is an actor that send messages every period to a new actor to execute the actual job.
There may be two ways to do this depending on your requirements/architecture:
If you can only use Play:
The user creates the job and the frequency it will run (crontab, whatever).
On saving the job, you calculate the first time it will have to be run. You then add an entry to a table JOBS with the execution time, job id, and any other information required. This is required as Play is stateless and information must be stored in the DB for later retrieval.
You have a job that queries the table for entries whose execution date is less than now. Retrieves the first, runs it, removes it from the table and adds a new entry for next execution. You should keep some execution counter so if a task fails (which means the entry is not removed from DB) it won't block execution of the other tasks by the job trying again and again.
The frequency of this job is set to run every second. That way while there is information in the table, you should execute the request around as often as they are required. As Play won't spawn a new job while the current one is working if you have enough tasks this one job will serve all. If not, it will be killed at some point and restored when required.
Of course, the crons of the users will not be too precise, as you have to account for your own cron delays plus execution delays on all the tasks in the queue, which will be run sequentially. Not the best approach, unless you somehow disallow crons that run every second or more often than every minute (to be safe). Checking the execution time of the crons and killing them if they run longer than a certain amount of time would be a good idea.
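The table-polling steps above can be sketched roughly like this. It is written in JavaScript rather than against Play, and the array stands in for the JOBS table. The field names (jobId, intervalMs, executionTime, attempts), the MAX_ATTEMPTS limit and the choice to compute the next run as "now plus interval" are all assumptions made for illustration.

```javascript
// Sketch only: `jobsTable` stands in for the JOBS table and every name
// here is hypothetical.
const MAX_ATTEMPTS = 3; // assumed limit so a failing entry cannot block the rest

// One polling pass: run the oldest due entry and schedule its next run.
// Returns the jobId that ran, or null if nothing ran successfully.
function pollOnce(jobsTable, now, runJob) {
  // Entries whose execution time has passed and that have not failed too often.
  const due = jobsTable
    .filter((e) => e.executionTime <= now && e.attempts < MAX_ATTEMPTS)
    .sort((a, b) => a.executionTime - b.executionTime);
  if (due.length === 0) return null;

  const entry = due[0];
  try {
    runJob(entry.jobId);
  } catch (err) {
    // Keep the entry but count the failure (the "execution counter").
    entry.attempts += 1;
    return null;
  }

  // Remove the processed entry and add a new one for the next execution.
  jobsTable.splice(jobsTable.indexOf(entry), 1);
  jobsTable.push({
    jobId: entry.jobId,
    intervalMs: entry.intervalMs,
    executionTime: now + entry.intervalMs,
    attempts: 0,
  });
  return entry.jobId;
}
```

The periodic job would call pollOnce on every tick. An entry that keeps failing stops being selected after MAX_ATTEMPTS tries, so it does not starve the other entries.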
If you can use more than Play:
The better alternative, I believe, is to use Quartz (see this) to create a future execution when the user creates the job, and reprogram it once the execution is over.
There was a discussion on Google Groups about it. As far as I remember, you must define a job which starts every 6 hours and checks which backups must be done. So you must remember when the last backup job finished and do the control yourself. I'm unsure if Quartz can handle such a requirement.
I looked in the source code (always a good source ;-)) and found a method every, which I think should do what you want. However, I'm unsure if this is a clever design, because if you have 1000 users you will then have 1000 jobs. I'm unsure if Play was built to handle such a large number of jobs.
[Update] For cron-expressions you should have a look into JobPlugin.scheduleForCRON()
There are several ways to solve this.
If you don't have a really huge load of jobs, I'd just persist them to a table using the required flexibility. Then check all of them every hour (or the lowest interval you support) and run those eligible. Simple.
Or, if you prefer to use cron syntax anyway, just write (export) jobs to a user crontab using a wrapper which calls back to your running app, or starts the job in a standalone process if that's possible.

Resources