How do you handle time based events in a cluster? - node.js

I have a Node.js application that runs in a cluster, therefore, there are many instances of an app running simultaneously and accepting requests from load balancer.
Consider I have a notion of a "subscription" in my app, and each subscription is stored in the central database with dateStart and dateEnd fields. For each subscription I need to send notifications, reminding clients about subscription expiration (e.g. 14, 7 and 3 days before expiration). Also, I will need to mark a subscription as expired and perform some additional logic, when time comes.
What are the best practices to handle such time-based events for multi-instance applications?
I can make my application to run expiration routine, e.g. every five minutes, but then I will have to deal with concurrency issues, because every instance will try to do so and we don't want notifications to be submitted twice.

I refactored the scheduled jobs for one of our systems when we clustered it a few years ago, a similar issue to what you are describing.
I created a cluster aware scheduled job monitor and used the DB to ensure only one was operating at any given time. Each generated their own unique GUID at startup and used it for an ID. At startup, they all look to the DB to see if a primary is running based on a table indicating ID, start time and last run. A primary is running if the recorded last run is with a specified time. If a primary is running, the rest stay running as backups and check on a given interval to take over if the primary were to die. If the primary dies, the one which takes over as primary marks the record with its ID and updates the times, then looks for jobs in other tables which would be similar to your subscriptions. The primary will continue to look for jobs at a configurable interval until it dies or is restarted.
During testing, I was able to spin up 50+ instances of the monitor which all constantly attempted to become primary. Only one would ever take over and during testing I would then manually kill the primary and watch the others all vie for primary, but only one would prevail. This approach relies on the DB record to only allow one of the threads to update the record using qualified updates based on the prior information in the record.

Related

Tasks that need to be performed on a certain date in Azure

I am developing an application using Azure Cloud Service and web api. I would like to allow users that create a consultation session the ability to change the price of that session, however I would like to allow all users 30 days to leave the session before the new price affects the price for all members currently signed up for the session. My first thought is to use queue storage and set the visibility timeout for the 30 day time limit, but this seems like this could grow the queue really fast over time, especially if the message should not run for 30 days; not to mention the ordering issues. I am looking at the task scheduler as well but the session pricing changes are not a recurring concept but more random. Is the queue idea a good approach or is there a better and more efficient way to accomplish this?
The stuff you are trying to do should be done with a relational database. You can use timestamps to record when prices for session changed. I wouldn't use a queue at all for this. A queue is more for passing messages in a distributed system. Your problem is just about tracking what prices changed on what sessions and when. That data should be modeled in a database.
I think this scenario is more suitable to use Azure Scheduler. Programatically create a Job with one time recurrence with set date as 30 days later to run once. Once this job gets triggered automatically by scheduler, assign an action to callback to one of your API/Service to do the price & other required updates and also remove this Job from the scheduler as part of this action to have a clean jobs list. Anyways premium plan of Azure Scheduler Job Collection will give you unlimited number of jobs to run.
Hope this is exactly what you were looking for...
I would consider using Azure WebJobs. A WebJob basically gives you the ability to run a .NET console application within the context of an Azure Web App. It can be run on demand, continuously, or in response to a reoccurring schedule. If your processing requirements are low and allow for it they can also run in the same process that your Web App is running in to save you $$$ as they are free that way.
You could schedule the WebJob to run once or twice per day and examine the situation and react as is appropriate. Since it's really just a .NET worker role you have ultimate flexibility.

Asynchronous recurring task scheduler in Windows Azure

We would like to make our customers able to schedule recurring tasks on a daily, weekly and monthly basis. Linear scalability is really important to us, that is why we use Windows Azure Table Storage instead of SQL Azure. The current design is the following:
- Scheduling information is stored in a Table Storage table. For example: Task A, daily; Task B, weekly; ...
- There are worker processes, which run hourly and query this table. Then decide, they have to run a given task or not.
But what if, multiple worker roles start to run the same task?
Some other requirements:
- The worker processes can be in different time zones.
Windows Azure Queue Storage could solve all cuncurrency problems mentioned above, but it also introduces some new issues:
- How many queue items should we generate?
- What if the customer changes the recurrence rate or revokes the scheduling?
So, my question is: how to design a recurring task scheduler with multiple asynchronous workers using Windows Azure Storage?
Perhaps the new Azure Scheduler service could help?
http://www.windowsazure.com/en-us/services/scheduler/
Some thoughts:
But what if, multiple worker roles start to run the same task?
This could very well happen. To avoid this, what you could do is have a worker role instance (any worker role instance from the pool) read from table and push messages in a queue. While this instance is doing this work, all other instances wait. To decide which instance does this work, you can make use of blob lease functionality.
Some other requirements: - The worker processes can be in different
time zones.
Not sure about this. Assuming you're talking about Cloud Services Worker Roles, they could be in different data centers but all of them will be in UTC time zone.
How many queue items should we generate?
It really depends on how much work needs to be done. You could put all messages in a queue. Only a maximum of 32 messages can be dequeued from a queue by a client at a time. So if you have say 100 tasks and thus 100 messages, each instance can only read up to 32 messages from the queue in a single call to queue service.
What if the customer changes the recurrence rate or revokes the
scheduling?
That should be OK as once the task is completed you must remove the message from the queue. Next time when the task is invoked, you can read from the table again and it will give you latest information about the task from the table.
I would continue using the Azure Table Storage, but mark the process as "in progress" before a worker starts working on it. Since ATS supports concurrency which is controlled by Etags, you can be assured that two processes won't be able to start the same process
I would, however, think about retry logic when jobs fail unexpectedly and have a process that restarts job that appear to have gone orphan

One-time jobs on Azure workers

What is the best way to do a one-time job on Azure?
Say we want to extend a table in the associated database with a double column. All the new entries will have this value computed by the worker(s) at insertion, but somebody has to take care of the entries that are already in the table. I thought of two alternatives:
a method called by the worker only if a database entry (say, "JobRun") is set to true, and the method would flip the entry to false.
a separate app that does the job, and which is downloaded and run manually using remote desktop (I cannot connect the local app to the Azure SQL server).
The first alternative is messy (how should I deal with the code at the next deployment? delete it? comment it? leave it there? also, what if I will have another job in the future? create a new database entry "Job2Run"?). The second one looks like a cheap hack. I am sure that there is a better way I could not think of.
If you want to run a job once you'll need to take into account the following:
Concurrency: While the job is running, make sure no other worker picks up the job and runs it at the same time (you can use leases for this. More info here).
Once the job is done, you'll need to keep track (in Table Storage, SQL Azure, ...) that the job completed successfully. The next time a worker tries to pick up the job, it will look in Table Storage / SQL Azure / ..., it will see that the job completed and skip the job.
Failure: Maybe your worker crashes during the job which should allow another worker to pick up the job without any issue.
In your specific use case I would also consider using a tool like dbup to manage updates to your schema and existing data with SQL Scripts. Tools like these keep track of which scripts have been executed by adding them in a table in the database.

Creating a Web Crawler using Windows Azure

I want to create a Web Crawler, that takes the content of some website and saves it in a blob storage. What is the right way to do that on Azure? Should I start a Worker role, and use the Thread.Sleep method to make it run once a day?
I also wonder, if I use this Worker Role, how would it work if I create two instances of it? I noticed using "Compute Emulator UI" that the command "Trace.WriteLine" works on both instances at the same time, can someone clarify this point.
I created the same crawler using php and set the cron job to start the script once a day, but it took 6 hours to grab the whole content, thats why I want to use Azure.
This is the right way to do it, as of Jan 2014 Microsoft introduced Azure WebJobs, where you can create a project (console for example), and run it as a scheduled task (occurrence once, recurrence)
https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/
http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx
Considering that a worker role is basically Windows 2008 Server, you can run the same code you'd run on-premises.
Consider, though, that there are several reasons why a role instance might reboot: OS updates, crash, etc. In these cases, it's possible you'd lose the work being done. So... you can handle this in a few ways:
Queue. Place a message on a command queue. If it's a once-a-day task, you can just push the message on the queue when done processing the previous message. Note that you can put an invisibility timeout on the message, so it doesn't appear for a day. In the event of failure during processing, the message will re-appear on the queue and a different instance can pick it up. You can also modify the message as you go, to keep track of your status.
Scheduler. Just make sure there's only one instance running (by way of a mutex). An easy way to do this is to attempt to obtain a write-lock on a blob (there can only be one).
One thing to consider is breaking up your web-crawl into separate tasks (url's?) and place those individually on the queue? With this, you'd be able to scale, running either multiple instances or, potentially, multiple threads in the same instance (since web-crawling is likely to be a blocking operation, rather than a cpu- and bandwidth-intensive one).
A single worker role running once a day is probably the best approach. I would not use thread sleep though, since you may want to restart the instance and then it may, depening on your programming, start before one day or later than one day. What about putting the task command as a message on the Azure Queue and dequeuing it once it has been picked up by a worker role, then adding a new task command on the Azure Queue once.

How to implement critical section in Azure

How do I implement critical section across multiple instances in Azure?
We are implementing a payment system on Azure.
When ever account balance is updated in the SQL-azure, we need to make sure that the value is 100% correct.
But we have multiple webroles running, thus they would be able to service two requests concurrently from different customers, that would potentially update current balance for one single product. Thus both instances may read the old amount from database at the same time, then both add the purchase to the old value and the both store the new amount in the database. Who ever saves first will have it's change overwritten. :-(
Thus we need to implement a critical section around all updates to account balance in the database. But how to do that in Azure? Guides suggest to use Azure storage queues for inter process communication. :-)
They ensure that the message does not get deleted from the queue until it has been processed.
Even if a process crash, then we are sure that the message will be processed by the next process. (as Azure guarantee to launch a new process if something hang)
I thought about running a singleton worker role to service requests on the queue. But Azure does not guarantee good uptime when you don't run minimum two instances in parallel. Also when I deploy new versions to Azure, I would have to stop the running instance before I can start a new one. Our application cannot accept that the "critical section worker role" does not process messages on the queue within 2 seconds.
Thus we would need multiple worker roles to guarantee sufficient small down time. In which case we are back to the same problem of implementing critical sections across multiple instances in Azure.
Note: If update transaction has not completed before 2 seconds, then we should role it back and start over.
Any idea how to implement critical section across instances in Azure would be deeply appreciated.
Doing synchronisation across instances is a complicated task and it's best to try and think around the problem so you don't have to do it.
In this specific case, if it is as critical as it sounds, I would just leave this up to SQL server (it's pretty good at dealing with data contentions). Rather than have the instances say "the new total value is X", call a stored procedure in SQL where you simply pass in the value of this transaction and the account you want to update. Somthing basic like this:
UPDATE Account
SET
AccountBalance = AccountBalance + #TransactionValue
WHERE
AccountId = #AccountId
If you need to update more than just one table, do it all in the same stored procedure and wrap it in a SQL transaction. I know it doesn't use any sexy technologies or frameworks, but it's much less complicated than any alternative I can think of.

Resources