I have a web application where users can schedule different jobs.
I'm not sure how to proceed with this.
All the Node.js schedulers out there basically read their schedules from within the code. I can of course implement a cron-like schedule that is read from a database, but I'm not sure that's the most effective way.
If I back the solution with a database, I would need to query that database, say every second, to see if there are any scheduled jobs that need to be handled. I can't read them once a day, because new jobs might be added on a regular basis.
Keeping them all in memory doesn't seem very efficient either.
Am I looking for a different kind of technology to handle this, than a scheduler+database?
We are talking around 10,000 jobs for the time being (as a maximum). They are mostly related to sending emails and/or giving notifications within the application itself.
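To make this concrete, the naive database-backed poller I have in mind would look something like the sketch below. This is illustration only: the jobs table and its columns are made up, and I am assuming PostgreSQL.

const { Pool } = require('pg'); // assuming PostgreSQL purely for illustration
const pool = new Pool();

async function handleJob(job) {
  // placeholder: send the email / in-app notification described by job.payload
}

setInterval(async () => {
  // Atomically claim all due jobs so none is picked up twice.
  const { rows } = await pool.query(
    `UPDATE jobs SET status = 'running'
      WHERE status = 'pending' AND run_at <= now()
      RETURNING id, type, payload`
  );
  for (const job of rows) {
    await handleJob(job);
    await pool.query(`UPDATE jobs SET status = 'done' WHERE id = $1`, [job.id]);
  }
}, 1000); // poll every second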
I have been researching how to efficiently solve the following use case and I am struggling to find the best solution.
Basically I have a Node.js REST API which handles requests for users from a mobile application. We want some requests to launch background tasks outside of the req/res flow, because they are CPU intensive or might just take a while to execute. We are trying to implement, or find an existing framework, that can handle different job queues in the following way (or at least one compatible with this use case):
Every user has their own set of job queues (there are different kinds of jobs).
The jobs within one specific queue have to be executed sequentially, only one at a time, but everything else can be executed in parallel (preferably with no queue hogging the workers, so that all queues get more or less the same priority).
Some queues might fill up with hundreds of tasks at a given time, but most likely they will be empty a lot of the time.
Queues need to be persistent.
We currently have a solution with RabbitMQ, with one queue for every kind of task, which all the users share. The users dump tasks into the same queues, which results in queues filling up with tasks from a specific user for a long time while the rest of the users wait for those tasks to be done before their own start to be consumed. We have looked into priority queues, but we don't think that's the way to go for our use case.
The first somewhat logical solution we thought of is to create temporary queues whenever a user needs to run background jobs and have them be deleted when empty. Nevertheless, we are not sure whether having that many queues is scalable, and we are also struggling with dynamically creating RabbitMQ queues, exchanges, etc. (we have even read somewhere that it might be an anti-pattern?).
We have done some more research, and maybe the way to go would be something else entirely, such as Kafka, or Redis-based tools like BullMQ.
What would you recommend?
If you're on AWS, have you considered SQS? There is no limit on the number of standard queues you can create, and in-flight messages can reach up to 120k. This would seem to satisfy your requirements above.
While the suggested SQS solution did prove to be very scalable, the amount of polling we would have to do (or the use of SNS) made it less than optimal for us. On the other hand, implementing a home-made solution based on database polling was too much for our use case, and we did not have the time or computational resources to consider adding a new database to our stack.
Luckily, we found that the Pro version of BullMQ has a "Group" feature that applies a round-robin strategy across groups of tasks within a single queue. This fit our use case perfectly and is what we ended up using.
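For anyone landing here later, the usage looks roughly like the sketch below, based on my reading of the BullMQ Pro docs; the queue and job names are made up.

const { QueuePro, WorkerPro } = require('@taskforcesh/bullmq-pro');

const connection = { host: 'localhost', port: 6379 }; // Redis connection, placeholder
const queue = new QueuePro('user-tasks', { connection });

async function enqueueForUser(userId, payload) {
  // One group per user; BullMQ Pro round-robins across groups.
  await queue.add('process-task', payload, { group: { id: userId } });
}

// At most one job at a time per group, while different groups run in parallel.
const worker = new WorkerPro('user-tasks', async (job) => {
  // ... process job.data ...
}, {
  connection,
  concurrency: 10,           // total parallelism across all groups
  group: { concurrency: 1 }, // sequential within a single user's group
});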
I have been looking for a time-based persistent scheduler. I looked into some libraries (Agenda, node-cron, node-schedule), but I couldn't find anything that satisfies my criteria.
My application sends out reminders to our customers based on their event timings. I am hesitant to run a regular cron job, because in this case it would have to run every 15 minutes or so, and each run means a database call. I am trying not to use resources unnecessarily.
In addition, I am already running a lot of cron jobs. In my case, when a job is completed, I want it to be cancelled/finished, not live on in memory until the server restarts.
I tried the libraries mentioned above (Agenda, node-cron, node-schedule) by setting exact timestamps, but the cron lives on forever even after the job is completed, and if I restart the server, all the scheduled jobs are gone. So persistence is also an issue I am facing.
My server uses Node.js. If there are any other languages/tools that make this work, I am all ears.
Looking forward to your help.
I tried following this solution, but it is for one predefined event. In my case, the number of reminders to be sent out is dynamic, and jobs have to be scheduled on the fly.
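For reference, the behaviour I am after looks roughly like the following with Agenda. This is only a sketch; the job name, data, and cleanup are illustrative.

const Agenda = require('agenda');

const agenda = new Agenda({ db: { address: 'mongodb://127.0.0.1/agenda' } });

agenda.define('send reminder', async (job) => {
  const { customerId, eventId } = job.attrs.data;
  // ... send the reminder here ...
  await job.remove(); // drop the job once it has run, instead of keeping it around
});

(async () => {
  await agenda.start();
  // Scheduled on the fly, persisted in Mongo, fired exactly once.
  await agenda.schedule(new Date('2021-01-01T10:00:00Z'), 'send reminder', {
    customerId: 42,
    eventId: 7,
  });
})();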
I know this might be a broad question, but I've been trying to find the right way to do this and I don't seem to be getting anywhere.
Basically, I have a bunch of objects saved in Mongo that contain events, like the one below:
{
  "date": "2020-09-09",
  "day": 1599573600000,  // epoch time
  "from": 1599595200000, // epoch time
  "to": 1599695200000    // epoch time
}
I need to fire some events, like sending a reminder SMS, etc., before the date specified in the from field.
I know I can write a cron job that regularly scans my entire Mongo collection, finds all the events that are due, and the rest is obvious.
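Something like the sketch below is what I mean; the reminderSent flag and the SMS helper are things I would add myself.

// Assumes an existing MongoDB driver `db` handle; a cron job would call
// this every few minutes.
async function sendDueReminders(db) {
  const now = Date.now();
  const due = await db.collection('events').find({
    from: { $gte: now, $lte: now + 15 * 60 * 1000 }, // starting in the next 15 min
    reminderSent: { $ne: true },                     // not reminded yet
  }).toArray();

  for (const event of due) {
    await sendReminderSms(event); // hypothetical helper
    await db.collection('events').updateOne(
      { _id: event._id },
      { $set: { reminderSent: true } }
    );
  }
}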
However, I feel like there must be a better way, because this can become extremely slow once our database grows to millions of events.
So the questions that I have are:
1- What are some other options besides cron jobs?
2- Is there any difference between running cron jobs in Node.js and running cron jobs in Google App Engine (a serverless instance)? Which one is better?
3- Is there any service out there that anyone has used?
Any direction would be appreciated.
I'm assuming you're trying to stay in the GCP ecosystem.
For scalability you could use cron to kick off a Google Dataflow pipeline, in which you define a step to be executed for each record that matches a given query. Dataflow will ramp up the number of workers as it goes to handle the scale.
If you're not at that level of scale, Dataflow can be a bit heavy and may feel like overkill for your current use case. In that case, you can use a combination of cron and Google Cloud Tasks, where you'd enqueue/launch a task per record. For large numbers of records, you could launch a task per batch of records (i.e. an injector pattern):
https://cloud.google.com/tasks/docs/manage-cloud-task-scaling#large-scalebatch_task_enqueues
Another option is just using Google Cloud Tasks with the schedule_time field. Here you'd enqueue the tasks when you originally write the record into the DB, instead of periodically querying to see which ones need to run:
https://cloud.google.com/tasks/docs/creating-http-target-tasks
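A rough sketch of that enqueue-at-write-time idea with the Node.js client follows; the project, location, queue, and URL are placeholders.

const { CloudTasksClient } = require('@google-cloud/tasks');
const client = new CloudTasksClient();

async function scheduleReminder(event) {
  const parent = client.queuePath('my-project', 'us-central1', 'reminders');
  await client.createTask({
    parent,
    task: {
      httpRequest: {
        httpMethod: 'POST',
        url: 'https://example.com/api/send-reminder', // placeholder endpoint
        headers: { 'Content-Type': 'application/json' },
        body: Buffer.from(JSON.stringify({ eventId: event._id })).toString('base64'),
      },
      // Fire 15 minutes before the "from" timestamp; no polling needed.
      scheduleTime: { seconds: Math.floor((event.from - 15 * 60 * 1000) / 1000) },
    },
  });
}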
2- Is there any difference between running cron jobs in Node.js and running cron jobs in Google App Engine (a serverless instance)? Which one is better?
I wasn't sure what you meant by your second question, because you can run Node.js in App Engine. In my experience, things work better when you keep everything within GCP.
Currently I am solving an engineering problem, and want to open the conversation to the SO community.
I want to implement a task scheduler. I have two separate instances of a Node.js application sitting behind an elastic load balancer (ELB). The problem is that when both instances come up, they both try to execute the same task logic, causing the tasks to run more than once.
My current solution is to use node-schedule to schedule tasks, then have each task check the database to see whether it has already been run within its specified interval.
The logic here is a little messy, and I am wondering if there is a more elegant way to go about this.
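For context, a simplified version of what we have now is sketched below. The lock collection and task name are illustrative; it assumes an existing `db` handle and a MongoDB driver version whose findOneAndUpdate returns { value }.

const schedule = require('node-schedule');

// Both instances schedule the job, but only the instance that wins the
// database "claim" actually runs the task for a given interval.
schedule.scheduleJob('*/15 * * * *', async () => {
  const windowStart = new Date(Date.now() - 15 * 60 * 1000);
  // Matches only if the task hasn't run in the current window, and bumps
  // lastRunAt atomically so the other instance's match fails.
  const claim = await db.collection('task_runs').findOneAndUpdate(
    { task: 'sendDigest', lastRunAt: { $lt: windowStart } },
    { $set: { lastRunAt: new Date() } }
  );
  if (!claim.value) return; // the other instance already claimed this run
  await runTask(); // the actual task logic (placeholder)
});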
Perhaps it is possible to set a particular environment variable on a specific instance, so that only that instance runs the tasks.
What do you all think?
What you are describing appears to be a perfect example of a use case for AWS Simple Queue Service.
https://aws.amazon.com/sqs/details/
Key points to look out for in your solution:
Make sure that you pick a visibility timeout that reflects your workload (so messages don't re-enter the queue while they are still being processed by another worker).
Don't store your workload in the message; reference it! A message can only be up to 256KB in size, and message size has an impact on performance and cost.
Make sure you understand billing! Billing is charged in 64KB chunks, meaning one 220KB message is charged as 4 x 64KB chunks/requests.
If you keep your messages small, you can save more money with batch requests, as your bang for buck will be far greater!
Use long polling to retrieve messages to get the most value out of your message requests (see the sketch after this list).
Grant your application permissions to SQS through an EC2 IAM role, as this is security best practice and the approach recommended by AWS.
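Here's a minimal long-polling consumer sketch with the AWS SDK for JavaScript v3; the queue URL and handler are placeholders.

const { SQSClient, ReceiveMessageCommand, DeleteMessageCommand } = require('@aws-sdk/client-sqs');

const sqs = new SQSClient({ region: 'us-east-1' });
const QueueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/tasks'; // placeholder

async function handleTask(body) {
  // placeholder: fetch the real workload using the reference in the message
}

async function poll() {
  const { Messages = [] } = await sqs.send(new ReceiveMessageCommand({
    QueueUrl,
    MaxNumberOfMessages: 10, // batch to get the most out of each request
    WaitTimeSeconds: 20,     // long polling
    VisibilityTimeout: 60,   // must comfortably exceed processing time
  }));
  for (const msg of Messages) {
    await handleTask(JSON.parse(msg.Body)); // keep the body small: a reference, not the payload
    // Delete only after successful processing, or the message becomes visible again.
    await sqs.send(new DeleteMessageCommand({ QueueUrl, ReceiptHandle: msg.ReceiptHandle }));
  }
  setImmediate(poll); // loop forever
}
poll();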
It's an excellent service, and should resolve your current need nicely.
Thanks!
Xavier.
I am developing an application using Azure Cloud Services and Web API. I would like to allow users who create a consultation session to change the price of that session; however, I would like to give all users 30 days to leave the session before the new price applies to the members currently signed up for it.
My first thought was to use queue storage and set the visibility timeout to the 30-day limit, but it seems like this could grow the queue really fast over time, especially when messages should sit unprocessed for 30 days; not to mention the ordering issues. I am looking at the task scheduler as well, but session price changes are not recurring; they happen at random. Is the queue idea a good approach, or is there a better and more efficient way to accomplish this?
What you are trying to do should be done with a relational database. You can use timestamps to record when the price for a session changed. I wouldn't use a queue at all for this; a queue is for passing messages in a distributed system. Your problem is just about tracking which prices changed on which sessions and when, and that data should be modeled in a database.
I think this scenario is better suited to Azure Scheduler. Programmatically create a job with a one-time recurrence, its date set 30 days out, so it runs once. When the scheduler triggers the job, have its action call back into one of your APIs/services to apply the price change and any other required updates, and have it remove the job from the scheduler as part of that action to keep the jobs list clean. In any case, the premium plan of an Azure Scheduler job collection gives you an unlimited number of jobs to run.
Hope this is exactly what you were looking for...
I would consider using Azure WebJobs. A WebJob basically gives you the ability to run a .NET console application within the context of an Azure Web App. It can be run on demand, continuously, or on a recurring schedule. If your processing requirements are low and allow for it, WebJobs can also run in the same process as your Web App, which saves you $$$ since they are free that way.
You could schedule the WebJob to run once or twice per day, examine the situation, and react as appropriate. Since it's really just a .NET worker role, you have ultimate flexibility.