We have a list of subscriptions and we are running thread for every subscription. It's long-running process.
We are fetching subscriptions list and running thread for every subscription. We want to run new thread for newly added subscriptions without stopping old threads.
We don't want to add list of subscription into an array or list and track from that. I'm looking for better alternative to that. Every subscription thread is not linked to each other.
We want to use TPL library or TPL dataflow if possible.
Related
We would like to make our customers able to schedule recurring tasks on a daily, weekly and monthly basis. Linear scalability is really important to us, that is why we use Windows Azure Table Storage instead of SQL Azure. The current design is the following:
- Scheduling information is stored in a Table Storage table. For example: Task A, daily; Task B, weekly; ...
- There are worker processes, which run hourly and query this table. Then decide, they have to run a given task or not.
But what if, multiple worker roles start to run the same task?
Some other requirements:
- The worker processes can be in different time zones.
Windows Azure Queue Storage could solve all cuncurrency problems mentioned above, but it also introduces some new issues:
- How many queue items should we generate?
- What if the customer changes the recurrence rate or revokes the scheduling?
So, my question is: how to design a recurring task scheduler with multiple asynchronous workers using Windows Azure Storage?
Perhaps the new Azure Scheduler service could help?
http://www.windowsazure.com/en-us/services/scheduler/
Some thoughts:
But what if, multiple worker roles start to run the same task?
This could very well happen. To avoid this, what you could do is have a worker role instance (any worker role instance from the pool) read from table and push messages in a queue. While this instance is doing this work, all other instances wait. To decide which instance does this work, you can make use of blob lease functionality.
Some other requirements: - The worker processes can be in different
time zones.
Not sure about this. Assuming you're talking about Cloud Services Worker Roles, they could be in different data centers but all of them will be in UTC time zone.
How many queue items should we generate?
It really depends on how much work needs to be done. You could put all messages in a queue. Only a maximum of 32 messages can be dequeued from a queue by a client at a time. So if you have say 100 tasks and thus 100 messages, each instance can only read up to 32 messages from the queue in a single call to queue service.
What if the customer changes the recurrence rate or revokes the
scheduling?
That should be OK as once the task is completed you must remove the message from the queue. Next time when the task is invoked, you can read from the table again and it will give you latest information about the task from the table.
I would continue using the Azure Table Storage, but mark the process as "in progress" before a worker starts working on it. Since ATS supports concurrency which is controlled by Etags, you can be assured that two processes won't be able to start the same process
I would, however, think about retry logic when jobs fail unexpectedly and have a process that restarts job that appear to have gone orphan
I want to create a Web Crawler, that takes the content of some website and saves it in a blob storage. What is the right way to do that on Azure? Should I start a Worker role, and use the Thread.Sleep method to make it run once a day?
I also wonder, if I use this Worker Role, how would it work if I create two instances of it? I noticed using "Compute Emulator UI" that the command "Trace.WriteLine" works on both instances at the same time, can someone clarify this point.
I created the same crawler using php and set the cron job to start the script once a day, but it took 6 hours to grab the whole content, thats why I want to use Azure.
This is the right way to do it, as of Jan 2014 Microsoft introduced Azure WebJobs, where you can create a project (console for example), and run it as a scheduled task (occurrence once, recurrence)
https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/
http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx
Considering that a worker role is basically Windows 2008 Server, you can run the same code you'd run on-premises.
Consider, though, that there are several reasons why a role instance might reboot: OS updates, crash, etc. In these cases, it's possible you'd lose the work being done. So... you can handle this in a few ways:
Queue. Place a message on a command queue. If it's a once-a-day task, you can just push the message on the queue when done processing the previous message. Note that you can put an invisibility timeout on the message, so it doesn't appear for a day. In the event of failure during processing, the message will re-appear on the queue and a different instance can pick it up. You can also modify the message as you go, to keep track of your status.
Scheduler. Just make sure there's only one instance running (by way of a mutex). An easy way to do this is to attempt to obtain a write-lock on a blob (there can only be one).
One thing to consider is breaking up your web-crawl into separate tasks (url's?) and place those individually on the queue? With this, you'd be able to scale, running either multiple instances or, potentially, multiple threads in the same instance (since web-crawling is likely to be a blocking operation, rather than a cpu- and bandwidth-intensive one).
A single worker role running once a day is probably the best approach. I would not use thread sleep though, since you may want to restart the instance and then it may, depening on your programming, start before one day or later than one day. What about putting the task command as a message on the Azure Queue and dequeuing it once it has been picked up by a worker role, then adding a new task command on the Azure Queue once.
I wonder if there's a way to use scheduled tasks with SQL Azure?
Every help is appreciated.
The point is, that I want to run a simple, single line statement every day and would like to prevent setting up a worker role.
There's no SQL Agent equivalent for SQL Azure today. You'd have to call your single-line statement from a background task. However, if you have a Web Role already, you can easily spawn a thread to handle this in your web role without having to create a Worker Role. I blogged about the concept here. To spawn a thread, you can either do it in the OnStart() event handler (where the Role instance is not yet added to the load balancer), or in the Run() method (where the Role instance has been added to the load balancer). Usually it's a good idea to do setup in the OnStart().
One caveat that might not be obvious, whether you execute this call in its own worker role or in a background thread of an existing Web Role: If you scale your Role to, say, two instances, you need to ensure that the daily call only occurs from one of the instances (otherwise you could end up with either duplicates, or a possibly-costly operation being performed multiple times). There are a few techniques you can use to avoid this, such as a table row-lock or an Azure Storage blob lease. With the former, you can use that row to store the timestamp of the last time the operation was executed. If you acquire the lock, you can check to see if the operation occurred within a set time window (maybe an hour?) to decide whether one of the other instances already executed it. If you fail to acquire the lock, you can assume another instance has the lock and is executing the command. There are other techniques - this is just one idea.
In addition to David's answer, if you have a lot of scheduled tasks to do then it might be worth looking at:
lokad.cloud - which has good handling of periodic tasks - http://lokadcloud.codeplex.com/
quartz.net - which is a good all-round scheduling solution - http://quartznet.sourceforge.net/
(You could use quartz.net within the thread that David mentioned, but lokad.cloud would require a slightly bigger architectural change)
I hope it is acceptable to talk about your own company. We have a web based service that allows you to do this. You can click this link to see more details on how to schedule execution of SQL Azure queries.
The overcome the issue of multiple roles executing the same task, you can check for role instance id and make sure that only the first instance will execute the task.
using Microsoft.WindowsAzure.ServiceRuntime;
String g = RoleEnvironment.CurrentRoleInstance.Id;
if (!g.EndsWith("0"))
{
return;
}
I've been reading bunch of articles regarding new TPL in .NET 4. Most of them recommend using Tasks as a replacement for Thread.QueueUserWorkItem. But from what I understand, tasks are not threads. So what happens in the following scenario where I want to use Producer/Consumer queue using new BlockingCollection class in .NET 4:
Queue is initialized with a parameter (say 100) to indicate number of worker tasks. Task.Factory.StartNew() is called to create a bunch of tasks.
Then new work item is added to the queue, the consumer takes this task and executes it.
Now based on the above, there is seems to be a limit of how many tasks you can execute at the same time, while using Thread.QueueUserWorkItem, CLR will use thread pool with default pool size.
Basically what I'm trying to do is figure out is using Tasks with BlockingCollection is appropriate in a scenario where I want to create a Windows service that polls a database for jobs that are ready to be run. If job is ready to be executed, the timer in Windows service (my only producer) will add a new work item to the queue where the work will then be picked up and executed by a worker task.
Does it make sense to use Producer/Consumer queue in this case? And what about number of worker tasks?
I am not sure about whether using the Producer/Consumer queue is the best pattern to use but with respect to the threads issue.
As I believe it. The .NET4 Tasks still run as thread however you do not have to worry about the scheduling of these threads as the .NET4 provides a nice interface to it.
The main advantages of using tasks are:
That you can queue as many of these up as you want with out having the overhead of 1M of memory for each queued workitem that you pass to Thread.QueueUserWorkItem.
It will also manages which threads and processors your tasks will run on to improve data flow and caching.
You can build in a hierarchy of dependancies for your tasks.
It will automatically use as many of the cores avaliable on your machine as possible.
I want to build an Azure application that has two worker roles and NO web roles. When the worker roles first start up I want ONLY ONE of the roles to do the following a single time:
Download and parse a master file then enqueue multiple "child" tasks based on the
contents of the master file
Enqueue a single master file download "child" task to run the next day
Each of the "child" tasks would then be done by both of the workers until the task queue was exhausted. Think of the whole things as "priming the pump"
This sort of thing is really easy if I add the the first "master" task manually in a queue by calling a web role but seems to be really hard to do in an auto-start mode.
Any help in this regard would be greatly appreciated!
Thanks.....
One possibility: instead of calling a web role, just load the queue directly. (It sounds like this is the sort of application you'll want to automatically spin up to do some work and then shut down again... if you're automating that, it should be trivial to also automate loading the queue.)
A (perhaps) better option: Use some sort of locking mechanism to make sure only one worker instance does the initialization work. One way to do this is to try to create the queue (or a blob, or an entity in a table). If it already exists, then the other instance is handling initialization. If the create succeeds, then it's this instance's job.
Note that it's always better to use a lease than a lock, in case the instance that's doing the initialization fails. Consider using a timeout (e.g. storing a timestamp in table storage or in the metadata of the blob or in the name of the queue...).
We did end-up with the exact same sort of problem, that's why we introduced a O/C mapper (object to cloud). Basically, you want to introduce two types of cloud services:
QueueService that consumes messages whenever available.
ScheduledService that triggers operations on a scheduled basis.
Then, as others suggested, in the cloud, you really prefer using leases instead of locks, in order to avoid your cloud app to end up freezed forever due to a temporary hardware (or infrastructure) issue.