Detecting queue deletion - Azure

A customer wants to flush the queue in one go. However, neither of the batch options seems to be available in Java, so they insist on deleting the queue altogether.
The question is: how do I detect that a queue is no longer there? Is there some kind of hook? Otherwise my listener will simply be silent and do nothing.
(The queue is created when the worker role starts if it doesn't exist yet, so generally restarting the role would work. The only problem is that there are many instances, so restarting them all is a bit problematic.)

NamespaceManager.QueueExists should get the job done for you.

Solution for user-specific background job queues

I have been researching how to efficiently solve the following use case and I am struggling to find the best solution.
Basically I have a Node.js REST API which handles requests for users from a mobile application. We want some requests to launch background tasks outside of the req/res flow because they are CPU intensive or might just take a while to execute. We are trying to implement or use any existing frameworks which are able to handle different job queues in the following way (or at least compatible with the use case):
Every user has their own set job queues (there are different kind of jobs).
The jobs within one specific queue have to be executed sequentially and only one job at a time but everything else can be executed in parallel (it would be preferable if there are no queues hogging the workers or whatever is actually consuming the tasks so all queues get more or less the same priority).
Some queues might fill up with hundreds of tasks at a given time but most likely they will be empty a lot of the time.
Queues need to be persistent.
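The sequential-per-queue, fair-across-queues requirement can be sketched in plain JavaScript (an in-memory toy with hypothetical names; a real deployment also needs the persistence requirement, which this sketch deliberately omits):

```javascript
// Round-robin scheduler over per-user FIFO queues. Each user's jobs run
// strictly one at a time; different users' jobs may run in parallel.
class FairScheduler {
  constructor() {
    this.queues = new Map(); // userId -> array of pending jobs (FIFO)
    this.busy = new Set();   // userIds that currently have a job running
  }

  enqueue(userId, job) {
    if (!this.queues.has(userId)) this.queues.set(userId, []);
    this.queues.get(userId).push(job);
  }

  // Pick the next runnable job: skip users with a job already in flight,
  // which keeps each user's queue strictly sequential.
  next() {
    for (const [userId, jobs] of this.queues) {
      if (jobs.length > 0 && !this.busy.has(userId)) {
        this.busy.add(userId);
        // Move this user to the back so other users get a turn (round robin).
        this.queues.delete(userId);
        this.queues.set(userId, jobs);
        return { userId, job: jobs.shift() };
      }
    }
    return null; // nothing runnable right now
  }

  done(userId) {
    this.busy.delete(userId);
  }
}
```

A worker pool would loop on `next()`, run the job, then call `done(userId)`; because busy users are skipped, one user dumping hundreds of tasks cannot starve the others.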
We currently have a solution with RabbitMQ, with one queue for every kind of task, shared by all users. Users dump tasks into the same queues, which results in a queue filling up with tasks from a specific user for a long time while the rest of the users wait for those tasks to finish before their own start being consumed. We have looked into priority queues, but we don't think that's the way to go for our use case.
The first somewhat logical solution we thought of is to create temporary queues whenever a user needs to run background jobs and have them deleted when empty. Nevertheless, we are not sure that having that many queues is scalable, and we are also struggling with dynamically creating RabbitMQ queues, exchanges, etc. (we have even read somewhere that it might be an anti-pattern?).
We have done some more research, and maybe the way to go would be something like Kafka, or a Redis-based solution such as BullMQ or similar.
What would you recommend?
If you're on AWS, have you considered SQS? There is no limit on number of standard queues created, and in flight messages can reach up to 120k. This would seem to satisfy your requirements above.
While the suggested SQS solution did prove to be very scalable, the amount of polling we would need to do (or the use of SNS) made it less than optimal. On the other hand, implementing a home-made solution based on database polling was too much for our use case, and we did not have the time or computational resources to consider adding a new database to our stack.
Luckily, we found that the Pro version of BullMQ has a "Group" functionality which performs a round-robin strategy for different tasks within a single queue. This fit our use case perfectly and is what we ended up using.

node: persist data after process termination

I'm using node-cache to cache data in a CLI application that observes changes in files and caches them to avoid reprocessing the data.
The problem is that this cache is destroyed on each command: every time the tool is invoked in the terminal, a new instance is spawned and the old one is destroyed, presumably along with its data.
I need to keep two things in cache/memory for a specific TTL, even if the process ends:
the processed data
the specific fs.FSWatcher instance (returned by fs.watch), watching and executing caching operations
The question is: how do I do it? I've been searching the internet for days and trying alternatives, and I can't find a solution.
I need to keep ... things in cache/memory, even if the process ends
That's, pretty much by definition, not possible. When a process terminates, all its resources are freed up for use by something else (barring a memory-leak bug in the OS itself).
It sounds like you need to refactor your app into a service that can run in the background and separate front-end that can communicate with it.

Cocoa OSX app hangs on dispatch_async

My app fires off a thread specifically for checking the status of a process. It fires every 5-10 seconds:
if (!monitorTask) {
    MYLog(100, @"Monitor task is dead");
    return;
}
dispatch_async(monitorTask, ^{  // hangs here
    MYLog(150, @"...Checking iTunes");
However it also seems to hang there every so often:
Any clues how to fix/catch this? The app is beachballing, but "running" in Xcode just fine, stuck on this instruction.
Are you sure that monitorTask is of type dispatch_queue_t? See the Apple guides for an example of how to create a serial queue.
Though I see in your comment that you are creating the queue correctly.
It is also possible that the queue may be deallocated before you can dispatch to it. You may need to perform some memory management on your queue to ensure that it isn't deallocated before you dispatch.
Lastly, it is important to note that serial queues are by and large used to protect shared resources. If you are just trying to perform a periodic check on a resource (a task that would only read, never write), then you are going to be better off using a concurrent queue (you should probably just use one of the four global queues).
Or, because you are seemingly checking perpetually throughout the lifetime of the application, you could even look into using a dispatch source, more specifically a timer dispatch source.
I think this may be an Xcode debugger problem, because I cannot replicate it while running the app outside of Xcode :-/

Scaling and selecting unique records per worker

I'm part of a project where we are having to deal with a lot of data in a stream. It's going to be passed to Mongo and from there it needs to be processed by workers to see if it needs to be persisted, amongst other things, or discarded.
We want to scale this horizontally. My question is, what methods are there for ensuring that each worker selects a unique record, that isn't already being processed by another worker?
Is a central main worker required to hand out jobs to the sub-workers? If that is the case, the bottleneck and point of failure is that central worker, right?
Any ideas or suggestions welcome.
Thanks!
Josh
You can use findAndModify to both select and flag a document atomically, making sure that only one worker gets to process it. My experience is that this can be slow due to excessive database locking, but that experience is based on MongoDB 2.x so it may not be an issue anymore on 3.x.
Also, with MongoDB it's difficult to "wait" for new jobs/documents (you can tail the oplog, but you'd have to do this from every worker and each one will wake up and perform the findAndModify() query, resulting in the aforementioned locking).
I think that ultimately you should consider using a proper messaging solution (write the data to MongoDB, write the _id to the broker, have the workers subscribe to the message queue, and if you configure things properly only one worker will get a job). Well-known brokers are RabbitMQ and nsq.io, and with a bit of extra work you can even use Redis.
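The claim-a-document pattern the answer describes can be sketched in plain JavaScript (an in-memory stand-in for the collection; against real MongoDB the find-and-flag would be a single atomic findOneAndUpdate with a filter like { status: 'new' } and an update setting status to 'processing'):

```javascript
// In-memory stand-in for the atomic claim pattern. In MongoDB the two
// steps below collapse into one findOneAndUpdate call, which is atomic
// server-side, so two workers can never claim the same document.
function claimNextJob(collection, workerId) {
  const doc = collection.find(d => d.status === 'new');
  if (!doc) return null;       // nothing left to process
  doc.status = 'processing';   // flag it so other workers skip it
  doc.workerId = workerId;     // record who claimed it, for diagnostics
  return doc;
}
```

Each worker repeatedly calls this; a returned document belongs exclusively to that worker until it finishes (or a cleanup job resets stale 'processing' flags).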

Creating a Web Crawler using Windows Azure

I want to create a Web Crawler, that takes the content of some website and saves it in a blob storage. What is the right way to do that on Azure? Should I start a Worker role, and use the Thread.Sleep method to make it run once a day?
I also wonder: if I use this worker role, how would it work if I create two instances of it? I noticed using the "Compute Emulator UI" that "Trace.WriteLine" output appears on both instances at the same time; can someone clarify this point?
I created the same crawler using PHP with a cron job to start the script once a day, but it took 6 hours to grab the whole content; that's why I want to move to Azure.
This is the right way to do it: as of January 2014, Microsoft offers Azure WebJobs, where you can create a project (a console application, for example) and run it as a scheduled task (one-off or recurring).
https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/
http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx
Considering that a worker role is basically Windows Server 2008, you can run the same code you'd run on-premises.
Consider, though, that there are several reasons why a role instance might reboot: OS updates, crash, etc. In these cases, it's possible you'd lose the work being done. So... you can handle this in a few ways:
Queue. Place a message on a command queue. If it's a once-a-day task, you can just push the message on the queue when done processing the previous message. Note that you can put an invisibility timeout on the message, so it doesn't appear for a day. In the event of failure during processing, the message will re-appear on the queue and a different instance can pick it up. You can also modify the message as you go, to keep track of your status.
Scheduler. Just make sure there's only one instance running (by way of a mutex). An easy way to do this is to attempt to obtain a write-lock on a blob (there can only be one).
One thing to consider is breaking up your web crawl into separate tasks (URLs?) and placing those individually on the queue. With this, you'd be able to scale, running either multiple instances or, potentially, multiple threads in the same instance (since web crawling is likely to be a blocking operation, rather than a cpu- and bandwidth-intensive one).
A single worker role running once a day is probably the best approach. I would not use Thread.Sleep, though, since you may want to restart the instance, and then it may, depending on your programming, start earlier or later than intended. What about putting the task command as a message on the Azure queue, dequeuing it once it has been picked up by a worker role, then adding a new task command to the Azure queue once the run completes?
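The queue behaviour described above (a dequeued message becomes invisible rather than deleted, and reappears if the worker dies) can be sketched with a toy queue; Azure Storage queues implement this server-side via the message's visibility timeout, here simulated with an explicit clock parameter:

```javascript
// Toy queue with a visibility timeout. Dequeue hides a message instead
// of deleting it; if the worker crashes before calling delete(), the
// message reappears after the timeout and another instance picks it up.
class VisibilityQueue {
  constructor() { this.messages = []; } // { id, body, invisibleUntil }

  enqueue(body, delayMs = 0, now = Date.now()) {
    const id = this.messages.length + 1;
    this.messages.push({ id, body, invisibleUntil: now + delayMs });
    return id;
  }

  dequeue(visibilityMs, now = Date.now()) {
    const msg = this.messages.find(m => m.invisibleUntil <= now);
    if (!msg) return null;
    msg.invisibleUntil = now + visibilityMs; // hide while being processed
    return { id: msg.id, body: msg.body };
  }

  delete(id) { // the worker calls this only after finishing the work
    this.messages = this.messages.filter(m => m.id !== id);
  }
}
```

The once-a-day trick from the answer maps onto `enqueue(body, oneDayMs)`: the next day's message is pushed with a delay, so it only becomes visible when it's time to crawl again.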