Azure Storage Queue - Retrieving hidden messages - azure

Is there a way to retrieve azure storage queue messages that are hidden? Background - I have been searching for an app/cmdlet/third party tool that would let me backup the entire queue including hidden messages (for troubleshooting purposes) but unable to find one.
I have also considered writing a powershell script to download all messages, but couldn't find a way to retrieve hidden ones.
Help will be greatly appreciated!

While I don't know if such a tool exists for Azure Storage Queues, have you considered Azure Service Bus Topics and Subscriptions for your queueing system? Under a topic and subscription model, you can set up the following architecture:
[Topic] Place messages on this queue. They get replicated to each subscription.
[Subscription1] Your backup process reads this queue and persists messages.
[Subscription2] Your application reads from this queue for normal operation.
This has a few benefits:
it decouples your backup and production systems, making it less likely that, for example, a faulty backup script ends up impacting production behavior
Locked ("hidden") messages apply only to the given subscription, so your backup queue will never have to deal with a message that is hidden or locked by the production queue.
Similar setups can certainly be achieved using storage queues, but Azure Service Bus has this sort of behavior built in.

Simple answer is that you can't download all messages from a queue. Messages that are hidden are hidden from all other callers including any 3rd party apps so you can't read those messages other than from the application which made them hidden in the 1st place.

You mention the reason for wanting to backup the queue as being for troubleshooting problems, depending on where your issues lie it might be worth taking at look at Azure Storage's Analytics capabilities. The logging infrastructure actually allows you to log every single transaction and greatly simplifies many troubleshooting scenarios. Take a look here for more information: http://blogs.msdn.com/b/windowsazurestorage/archive/tags/analytics+2d00+logging+_2600_amp_3b00_+metrics/.

Related

How does the Snowpipe cloud messaging mechanism exactly work on Azure?

I've successfully integrated Snowpipe with a container inside the Azure storage and loaded data into my target table, but now I can't exactly figure out how does Snowpipe actually works. Also, please let me know if there is already a good resource that answers this question, I'd be very grateful.
In my example, I tested a Snowpipe mechanism that uses cloud messaging. So, from my understanding, when a file is uploaded into an Azure container, Azure Event Grid sends an event message to an Azure queue, from which Snowpipe is notified that a new file is uploaded into the container. Then, Snowpipe in the background starts its loading process and imports the data into a target table.
If this is correct, I don't understand how does Azure queue informs Snowpipe about uploaded files. Is this connected to the "notification integration" inside Snowflake? Also, I don't understand what does it mean when they say on the Snowflake page that "Snowpipe copies the files into a queue, from which they are loaded into the target table...". Is this an Azure queue or some Snowflake queue?
I hope this question makes sense, any help or detailed explanation of the whole process is appreciated!
You've pretty much nailed it. to answer your specific questions... (and don't feel bad about them, this is definitely confusing)
how does Azure queue informs Snowpipe about uploaded files? Is this connected to the "notification integration" inside Snowflake?
Yes, this is the notification integration. But Azure is not "informing" the Snowpipe, it's the other way around. The Azure queue creates a notification that various other applications can subscribe to (this has no awareness of Snowflake). The notification integration on the snowflake side is snowflake's way to integrate with these external notifications
Snowpipe's queueing
Once snowflake recieves one of these notifications it puts that notification into a snowflake-side queue (or according to that page, the file itself. I was surprised by this, but the end result is the same). Snowpipes are wired up to that notification integration (as part of the create statement). The files are directed to the appropriate snowpipe based on the information in the "Stage" (also as part of the pipe create statement. I'm actually not certain if this part is a push or a pull). Then it runs the COPY INTO on that file.

Best approach for using azure service bus with azure function for ERP order import

I’m in need of some second opinions and guidance on how to use Azure Functions in combination with Azure Service Bus in the scenario described below. Coding is not an issue its about selecting the most appropriate method. Sadly, I have not found any good example of this online so now I’m reaching out for some help.
Scenario
I have an ecommerce customer that is sending a few thousand orders a day to an ERP system. The normal day operations are not an issue, but we would like to make the solution more robust to handle for example “Black Friday” surges. Currently the website can hold x amount of orders before that database is full and is forced to close or send order downstream. Currently the website sends order directly into the ERP system and it is this part I want to decouple with Azure Server Bus Queues. With this decoupling we can continue pushing new orders to the queue and consuming these at our own pace in the ERP without flooding any system.
My thoughts about how to set this up
The website can send messages directly to the Service Bus Queue. An Azure Function is bound to trigger on every new message in the queue and will send that message to the ERP system.
Same as above however the website first sends a message to an Azure Function that puts it into the queue.
The website sends messages to the queue like in point 1 or 2. Instead of binding a function to the queue we setup a scheduled function. The function will run frequently and send 1 message to the ERP system per run.
The website sends messages to the queue like in point 1 or 2. Here we do not send messages to the ERP system but instead the ERP system is the one who reads the queue. Do not like this approach but its possible to do and easy to administrate by ERP users.
Questions
If I go with point 1 or 2 above should the function responsible for delivering the message to the ERP system send 1 or multiple orders per trigger?
If I go with point 1 or 2 it should still be possible to flood the ERP system since they most likely trigger at the same time they get put in?
If the ERP system is down and the queue grows, do I need a separate scheduled function to handle the queue until it is empty?
We do not have to discuss the dead letter queue here, that is another topic.
How would you approach this or if you have done a similar solution what method did you use?
Thank you for your guidance much appriciated!
We've learned a lot in the past couple of years working with Azure Functions and Service Bus to solve similar scenarios you mentioned above. You're definitely on the right track regards to wanting to decouple in case of a surge. To give you some peace of mind with your choice to use Azure Service Bus, we normally push hundreds of events a minute through our topics and subscriptions and it holds up pretty well.
Let me just share some of the lessons we learned:
The concurrent number of incoming requests within the same second
was one of our breaking points. The website when written properly
will easily accommodate multiple incoming requests but we learned
about "port exhaustion" related to outbound web requests to our
Azure Function. Review the scope and lifetime of your web client and
the limits of your app service plan / web server.
If you choose to use a consumption plan for your Azure Function, be
aware that it sometimes takes a long time to start. Whatever is
hitting the function will have to implement retries (probably a good
practice anyway).
A Service Bus Message has a size limit (which can be increased, but
there's still a limit). We randomly hit it with one of our payloads
that contains a bulk of information. Know the worst case scenario
payload size you may encounter.
In the event something goes wrong and there are tens of thousands of
messages in the queue, there is no easy way to query what's in
there. Make sure you're fine with that otherwise consider
doing fast writes into a database that can be queried.
The Azure Function can be triggered by Service Bus and can spawn
multiple concurrent executions of the code (which is desired) and
with a limit. Be aware of any limitations with code to update your
ERP. You will have no control over Service Bus triggers.
Be conscious about the function's storage account, functions with
same name will have their trigger settings and locks stepping on
each other (dev vs. prod environment).
Connections to Azure Service Bus will sometimes fail, just the
nature of services hosted in the cloud. It only happens a few times
and recovers after a few seconds.
Consider doing this:
Website -> Azure API Management Gateway -> Azure Function A -> Service Bus -> Azure Function B -> ERP
Azure API Management with AppInsights enabled is a nice extra layer allowing you to secure, monitor, and route to your Function A. In cases where you need to route incoming requests to some emergency bucket it's a life saver.
Consider allowing function 1 to accept an array of your items. Enable AppInsights, add code for telemetry, providing preview of throughput in terms of orders.
Function B with a configurable timer trigger and some app configuration for number of messages to process from the queue. Allows you to throttle flow of data to your ERP. This may be debatable as you won't be able to scale this function out with multiple instances, but I'm assuming the original concern was to control the pace. Also enable same AppInsights, telemetry, logging, etc.
I'm hoping I don't draw too much criticism from this. We learned the hard way and eventually received some really good guidance from Azure architects and engineers later.

Difference between EventHub and Topic in Azure

I am building an application that needs to be able to create events, publish them, and have multiple consumers. The tutorials I have found so far suggest that Azure Topics are the right thing to use for this (multiple publishers and multiple subscribers), but I noticed an option in my Azure portal for EventHub and it seems like a highly scalable solution that may be a newer implementation of Pub/Sub for Azure. I have been searching for documentation comparing the two and haven't really found anything. Can someone explain why I would choose one of these solutions over the other.
The scenario I have is many clients in a Multi-Tenant application may create events at any time, those events need to be published to "n" subscribers for consumption. The subscribers need to be able to change without any change to the application (i.e. subscribers should be able to subscribe themselves to events without modifying publisher code).
Thanks for the help.
Event Hubs have below advantage as compared with Azure ServiceBus:
Its client-side cursor: Developers can use a client-side pointer, or
offset, to retrieve messages from a specific point in the event
stream
Partitioned consumer support: Throughput Unit and inbound messages
can be targeted at a specific partition through a user-defined
partition key value.
Significant time-based retention options

Multiple Instances of Azure Worker Roles for non-transaction integration tasks

We have an upcoming project where we'll need to integrate with 3rd parties over a variety of transports to get data from them.
Things like WCF Endpoints & Web API Rest Endpoints are fine.
However in 2 scenario's we'll need to either pick up auto-generated emails containing xml from a pop3 account OR pull the xml files from an External SFTP account.
I'm about to start prototyping these now, but I'm wondering are there any standard practices, patterns or guidelines about how to deal with these non-transactional systems, in a multi-instance worker role environment. i.e.
What happens if 2 workers connect to the pop account at the same time or the same FTP at the same time.
What happens if 1 worker deletes the file from the FTP while another is in mid-download.
Controlling duplication shouldn't be an issue, as we'll be logging everything on application side to a database, and everything should be uniquely identifiable so we'll be able to add if-not-exists-create-else-skip logic to the workers but I'm just wondering is there anything else I should be considering to make it more resilient/idempotent.
Just thinking out loud, since the data is primarily files and emails one possible thing you could do is instead of directly processing them via your worker roles first thing you do is save them in blob storage. So there would be some worker role instances which will periodically poll the POP3 server / SFTP site and pull the data from the there and push them in blob storage. When the blob is written, same instance can delete the data from the source as well. With this approach you don't have to worry about duplicate records because blob will be overwritten (assuming each message/file has a unique identifier and the name of the blob is that identifier).
Once the file is in your blob storage, you can write a message in a Windows Azure Queue which has details about this blob (may be blob URL etc.). Then using 'Get' semantics of Windows Azure Queues, your worker role instances start fetching and processing these messages. Because of Get semantic, once a message is fetched from the queue it becomes invisible to other callers (worker roles instances in this case). This way you could take care of duplicate message processing.
UPDATE
So I'm trying to combat against two competing instances pulling the same file at the same moment from the SFTP
For this, I'll pitch my favorite Master/Slave Concept:). Essentially the idea is that each instance will try to acquire a lease on a single blob. The instance which acquires the lease becomes the master and others slave. Master would fetch the data from SFTP while slaves will wait. I've described this concept in my blog post which you can read here: http://gauravmantri.com/2013/01/23/building-a-simple-task-scheduler-in-windows-azure/, though the context of the blog is somewhat different.
have a look the recently released Cloud Design Patterns. you might be able to find the corresponding pattern and sample code for what you need.

Windows Azure Inter-Role communication

I want to create an Azure application which does the following:
User is presented with a MVC 4 website (web role) which shows a list of commands.
When the user selects a command, it is broadcast to all worker roles.
Worker roles process the task, store the results and notify web role
Web role displays the combined results of the worker roles
From what I've been reading there seem to be two ways of doing this: the Windows Azure Service Bus or using Queues. Each worker role also stores the results in the database.
The Service Bus seems more appropriate with its publish/subscribe model, so all worker roles would get the same command and roughly the same time. Queues seem easier to use though.
Can the service bus be used locally with the emulator when developing? I am using a free trial and cannot keep the application constantly whilst still developing. Also, when using queues how can you notify the web role that processing is complete?
I agree. ServiceBus is a better choice for this messaging requirement. You could, with some effort, do the same with queues. But, you'll be writing a lot of code to implement things that the ServiceBus already gives you.
There is not a local emulator for ServiceBus like there is for the Azure Strorage service (queues/tables/blobs). However, you could still use the ServiceBus for messaging between roles while they are running locally in your development environment.
As for your last question about notifying the web role that processing is complete, there are a several ways to go here. Just a few thoughts (not exhaustive list)...
Table storage where the web role can periodically check the status of the unit of work.
Another ServiceBus Queue/topic for completed work.
Internal endpoints. You'll have to have logic to know if it's just an update from worker role N or if it is indicating a completed unit of work for all worker roles.
I agree with Rick's answer, but would also add the following things to think about:
If you choose the Service Bus Topic approach then as each worker role comes online it would need to generate a subscription to the topic. You'll need to think about subscription maintenance of when one of the workers has a failure and is recycled, or any number of reasons why a subscription may be out there.
Telling the web role that all the workers are complete is interesting. The options Rick provides are good ones, but you'll need to think about some things here. It means that the web role needs to know just how many workers are out there or some other mechanism to decide when all have reported done. You could have the situation of five worker roles receieving a message and start working, then one of them starts to repeatedly fail processing. The other four report their completion but now the web role is waiting on the fifth. How long do you wait for a reply? Can you continue? What if you just told the system to scale down and while the web role thinks there are 5 there is now only 4. These are things you'll need to to think about and they all depend on your requirements.
Based on your question, you could use either queue service and get good results. But each of them are going to have different challenges to overcome as well as advantages.
Some advantages of service bus queues is that it provides blocking receipt with a persistent connection (up to 100 connections), it can monitor messages for completion, and it can send larger messages (256KB).
Some advantages of storage queues over the service bus solution is that it's slightly faster (if 15 ms matters to you), you can use a single storage system (since you'll probably be using Storage for blob and table services anyways), and simple auto-scaling. If you need to auto-scale your worker roles based on the load, passing the the requests through a storage queue makes auto-scaling trivial -- you just setup auto-scaling in the Azure Cloud Service UI under the scale tab.
A more in-depth comparison of the two azure queue services can be found here: http://msdn.microsoft.com/en-us/library/hh767287.aspx
Also, when using queues how can you notify the web role that processing is complete?
For the Azure Storage Queues solution, I've written a library that can help: https://github.com/brentrossen/AzureDistributedService.
It provides a proxy layer that facilitates RPC style communication from web roles to worker roles and back through Storage Queues.

Resources