How exactly does the Snowpipe cloud messaging mechanism work on Azure?

I've successfully integrated Snowpipe with a container inside Azure storage and loaded data into my target table, but now I can't quite figure out how Snowpipe actually works. Also, please let me know if there is already a good resource that answers this question; I'd be very grateful.
In my example, I tested a Snowpipe mechanism that uses cloud messaging. So, from my understanding, when a file is uploaded into an Azure container, Azure Event Grid sends an event message to an Azure queue, from which Snowpipe is notified that a new file has been uploaded into the container. Then, Snowpipe starts its loading process in the background and imports the data into a target table.
If this is correct, I don't understand how the Azure queue informs Snowpipe about uploaded files. Is this connected to the "notification integration" inside Snowflake? Also, I don't understand what it means when the Snowflake page says that "Snowpipe copies the files into a queue, from which they are loaded into the target table...". Is this an Azure queue or some Snowflake queue?
I hope this question makes sense, any help or detailed explanation of the whole process is appreciated!

You've pretty much nailed it. To answer your specific questions... (and don't feel bad about them, this is definitely confusing)
How does the Azure queue inform Snowpipe about uploaded files? Is this connected to the "notification integration" inside Snowflake?
Yes, this is the notification integration. But Azure is not "informing" Snowpipe; it's the other way around. The Azure queue exposes notifications that various other applications can subscribe to (it has no awareness of Snowflake). The notification integration on the Snowflake side is Snowflake's way of integrating with these external notifications.
Snowpipe's queueing
Once Snowflake receives one of these notifications, it puts that notification into a Snowflake-side queue (or, according to that page, the file itself; I was surprised by this, but the end result is the same). Snowpipes are wired up to that notification integration (as part of the CREATE statement). The files are directed to the appropriate Snowpipe based on the information in the stage (also part of the pipe's CREATE statement; I'm actually not certain whether this part is a push or a pull). Then it runs the COPY INTO on that file.
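For reference, the Snowflake-side wiring is roughly a notification integration that points at the Azure storage queue, plus a pipe created with AUTO_INGEST that references that integration and a stage. Below is a minimal sketch that runs the DDL through the Python connector; every name, URL and the tenant ID are placeholders, so treat it as an outline rather than a drop-in script:

```python
# Sketch of the Snowflake-side wiring (all names/IDs/URLs are placeholders).
# Requires: pip install snowflake-connector-python
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    role="ACCOUNTADMIN", warehouse="MY_WH", database="MY_DB", schema="PUBLIC",
)
cur = conn.cursor()

# 1. Notification integration: Snowflake's handle on the Azure storage queue
#    that Event Grid writes "blob created" events into.
cur.execute("""
    CREATE NOTIFICATION INTEGRATION my_notification_int
      ENABLED = TRUE
      TYPE = QUEUE
      NOTIFICATION_PROVIDER = AZURE_STORAGE_QUEUE
      AZURE_STORAGE_QUEUE_PRIMARY_URI = 'https://myaccount.queue.core.windows.net/snowpipe-queue'
      AZURE_TENANT_ID = '<tenant-id>'
""")

# 2. The pipe: AUTO_INGEST ties it to the integration, and the stage in the
#    COPY statement determines which files it picks up.
cur.execute("""
    CREATE PIPE my_pipe
      AUTO_INGEST = TRUE
      INTEGRATION = 'MY_NOTIFICATION_INT'
      AS COPY INTO my_target_table FROM @my_azure_stage FILE_FORMAT = (TYPE = 'CSV')
""")
```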

Related

Capture SSE (Server-Sent Events) in Azure

I am having trouble identifying the best "tool" to solve the problem. I am using a Python library which publishes its data via Server-Sent Events (SSE) (see https://github.com/wattsight/wapi-python/blob/development/wapi/events.py).
I would like to constantly listen for new events. However, I am not sure which tool in Azure is appropriate. An Azure Function would have to run continuously, which seems like a misuse; SignalR requires control over the "sender" of events, and I don't know if Event Hub would be able to manage that job.
Thank you for letting me learn from your experience.
Azure Event Hubs is the right service for receiving new events. Besides that, it also provides other benefits like scalability, event storage, etc.
You can also consider using an Azure Function with an Event Hub trigger, but note that it has a limitation on the maximum size of incoming messages.
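As a rough sketch of the forwarding side (assuming the SSE listener runs in some long-lived host such as a container, WebJob or VM), the events could be pushed into an Event Hub with the Python SDK. The connection string, hub name and the sse_events() generator below are placeholders standing in for the wapi-python listener:

```python
# Sketch: forward incoming SSE events to an Azure Event Hub.
# Requires: pip install azure-eventhub
# Connection string, hub name and the event source are placeholders.
import json
from azure.eventhub import EventHubProducerClient, EventData

EVENTHUB_CONN_STR = "<event-hub-namespace-connection-string>"
EVENTHUB_NAME = "sse-events"


def sse_events():
    """Hypothetical stand-in for the wapi-python event listener:
    yields one dict per Server-Sent Event."""
    yield {"example": "event"}


producer = EventHubProducerClient.from_connection_string(
    EVENTHUB_CONN_STR, eventhub_name=EVENTHUB_NAME
)

with producer:
    for event in sse_events():
        batch = producer.create_batch()
        batch.add(EventData(json.dumps(event)))
        producer.send_batch(batch)  # one event per batch keeps the sketch simple
```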

Monitor the amount of blobs entering into an Azure container

Basically I have a storage account with a container that contains blobs of unhandled errors. My task is to somehow generate a metric that shows how many blobs were uploaded to that container every hour. I tried using the Azure built-in metrics, but it seems like those might limit me to the entire storage account and not just one container. I did some research on Power BI and thought that might be a good place to start, but again I came up empty.
If anyone has a good starting place for me, that would be incredible. I'm assuming that this will end up being something that requires some SQL queries, or perhaps something I can do programmatically in Visual Studio. Apologies if this was posted in the wrong place, but it seemed like the best fit in my opinion.
Thanks!
You should take a look at Azure Event Grid with Blob Storage integration. In short, whenever a blob is created, an event is raised by Azure Event Grid. You can consume this event and post the event data to an HTTP endpoint (or call an Azure Function), which can save information about the event in some persistent storage (Azure Tables, for example). You can then create reports by querying this data.
For more information about this, you may find this link helpful: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-event-overview.
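A rough sketch of that approach as a Python Azure Function with an Event Grid trigger; the table name, the STORAGE_CONNECTION app setting and the partitioning scheme are placeholder assumptions, and the target table is assumed to already exist:

```python
# Sketch: Azure Function (Python) triggered by Event Grid BlobCreated events,
# persisting one row per blob into Azure Table storage for later per-hour counts.
# Table name and the STORAGE_CONNECTION app setting are placeholder assumptions;
# the "blobcounts" table is assumed to have been created beforehand.
import os
from datetime import datetime, timezone

import azure.functions as func
from azure.data.tables import TableClient


def main(event: func.EventGridEvent):
    data = event.get_json()  # event payload, e.g. the blob url

    now = datetime.now(timezone.utc)
    entity = {
        "PartitionKey": now.strftime("%Y-%m-%dT%H"),  # one partition per hour
        "RowKey": event.id,                           # unique per event
        "BlobUrl": data.get("url", ""),
        "ProcessedAt": now.isoformat(),
    }

    table = TableClient.from_connection_string(
        os.environ["STORAGE_CONNECTION"], table_name="blobcounts"
    )
    table.create_entity(entity)
    # Counting rows per PartitionKey then gives "blobs uploaded per hour".
```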

Troubleshooting Issues with the Stream Analytics

We have an Azure Stream Analytics job configured with input coming from an Event Hub and output configured to be written to Table Storage.
The event hub gets the messages from the API (custom code written within the API sends messages to the event hub).
The issue we are facing is that Stream Analytics stops once in a while and we don't have any trace of an error, so we are clueless about the reason for the failure; possibly the input format is incorrect. Is there any way we can see the messages that are present in the event hub?
If you have the chance to use blob containers within your storage account, there are two possible solutions to begin with:
Configure "Diagnostics logs" for the Stream Analytics component and log everything to a blob container within your storage account. You can specifically activate three different settings: Execution, Authoring and AllMetrics. The corresponding blob containers will be created automatically by Stream Analytics. I was able to find errors within the Execution container in my storage account in the past.
You can define an input (named, for example, Raw-Storage-Input) retrieving all Event Hub messages and write its data into a blob container output (named 'Raw-Storage-Output') within your storage account by doing something along the lines of SELECT * INTO [Raw-Storage-Output] FROM [Raw-Storage-Input];. By doing this you might even be able to write the faulty messages to your blob container before Stream Analytics fails. However, this might not always work reliably.
There are probably more sophisticated ways I'm not aware of, but these options provided some help in the past for me.
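To directly answer "is there any way we can see the messages that are present in the event hub": you can also read the stream outside Stream Analytics with a plain consumer and dump the raw payloads, which often reveals malformed input. A minimal sketch with the Python SDK; the connection string, hub name and consumer group are placeholders, and using a dedicated consumer group avoids interfering with the Stream Analytics job:

```python
# Sketch: dump raw Event Hub messages to stdout to inspect their format.
# Requires: pip install azure-eventhub
# Connection string, hub name and consumer group are placeholders; use a
# dedicated consumer group so you don't disturb the Stream Analytics job.
from azure.eventhub import EventHubConsumerClient

client = EventHubConsumerClient.from_connection_string(
    "<event-hub-namespace-connection-string>",
    consumer_group="debug",
    eventhub_name="my-hub",
)


def on_event(partition_context, event):
    print(partition_context.partition_id, event.body_as_str())


with client:
    # starting_position="-1" reads from the beginning of the retained stream
    client.receive(on_event=on_event, starting_position="-1")
```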

Which Azure service to use for processing data from Event Hub?

I would appreciate some help picking out the best-suited Azure services for my scenario. I am just beginning with Azure services and my knowledge is pretty limited.
I have data from multiple sources, and of different shapes, coming into an Event Hub. I need to subscribe to the events from the Event Hub and, based on their format, process them and ultimately save them into an SQL Database. All components - events consumers, the SQL Database - need to be hosted in the cloud.
How would I implement this in an "Azure Orientated Architecture"?
In an off-cloud application, I would have competing consumers subscribing to the Event Hub. They would be console applications or Windows services, and each would process the events asynchronously (this is further simplified by the event processing being idempotent).
Ideally, the Azure equivalent of the above consumers would scale up and down automatically, so I would like to avoid using VMs that host console applications (where I would need to keep an eye on the VM's resources myself). In terms of scaling and deployment they would have to behave like App Services, however I'm under the impression that those are just for web applications. I've also briefly looked at Web Jobs, but those seem to poll data at various intervals, whereas I need a proper event subscriber that the Event Hub pushes data into.
Any help will be greatly appreciated!
Thank you.
Later Edit:
I've looked into Web Jobs and they do allow continuous processing, hence it looks like they can be used as automatically scaling subscribers.
Ideally I would like to write the code for the subscribers in F#. C# is the other option if that is not available.
You can see my post regarding IoT Hub. It's basically the same for Event Hubs.
(each of the examples in the post can be used on Event Hubs).
https://stackoverflow.com/a/38682324/6659347
In addition, for Event Hubs you can also use an Azure Function with an Event Hub trigger: a function that runs whenever the event hub receives a new event. That also covers your scaling requirement.
If you are working with multiple consumers, make sure to use Event Hub consumer groups so each consumer can read the stream independently.
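A rough sketch of the Azure Function option, shown in Python for brevity (the C#/F# bindings are analogous); the hub name, connection setting, consumer group and payload shape below are all placeholder assumptions:

```python
# Sketch: Azure Function with an Event Hub trigger.
# The binding is declared in function.json, roughly:
#   { "type": "eventHubTrigger", "name": "event", "direction": "in",
#     "eventHubName": "my-hub", "connection": "EVENTHUB_CONNECTION",
#     "consumerGroup": "my-consumer-group", "cardinality": "one" }
# All names/settings above are placeholders.
import json

import azure.functions as func


def main(event: func.EventHubEvent):
    payload = json.loads(event.get_body().decode("utf-8"))

    # Dispatch on the shape of the incoming event, then persist to SQL
    # (persistence omitted in this sketch).
    if "orderId" in payload:           # placeholder shape check
        process_order(payload)
    else:
        process_other(payload)


def process_order(payload: dict) -> None:
    print("order event:", payload)


def process_other(payload: dict) -> None:
    print("other event:", payload)
```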
I'd say use a WebJob in combination with an EventProcessor. I wrote some demo code that can easily be transferred to a WebJob: https://github.com/DeHeerSoftware/SemanticLogging.EventHub/tree/master/SemanticLogging.EventHub.Processor
See https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/#receive-messages-with-eventprocessorhost for official documentation.
I've created a WebJob myself using this approach. Works like a charm.

Azure Storage Queue - Retrieving hidden messages

Is there a way to retrieve Azure Storage queue messages that are hidden? Background: I have been searching for an app/cmdlet/third-party tool that would let me back up the entire queue, including hidden messages (for troubleshooting purposes), but have been unable to find one.
I have also considered writing a PowerShell script to download all messages, but couldn't find a way to retrieve hidden ones.
Help will be greatly appreciated!
While I don't know if such a tool exists for Azure Storage Queues, have you considered Azure Service Bus Topics and Subscriptions for your queueing system? Under a topic and subscription model, you can set up the following architecture:
[Topic] Place messages on this queue. They get replicated to each subscription.
[Subscription1] Your backup process reads this queue and persists messages.
[Subscription2] Your application reads from this queue for normal operation.
This has a few benefits:
It decouples your backup and production systems, making it less likely that, for example, a faulty backup script ends up impacting production behavior.
Locked ("hidden") messages apply only to the given subscription, so your backup queue will never have to deal with a message that is hidden or locked by the production queue.
Similar setups can certainly be achieved using storage queues, but Azure Service Bus has this sort of behavior built in.
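A minimal sketch of that layout with the Python Service Bus SDK; the connection string, topic and subscription names are placeholders, and the topic and both subscriptions are assumed to already exist (created in the portal or via ARM):

```python
# Sketch: one topic, two subscriptions ("backup" and "app"); every message sent
# to the topic is delivered to both. Names and connection string are placeholders,
# and the entities are assumed to already exist.
# Requires: pip install azure-servicebus
from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONN_STR = "<service-bus-namespace-connection-string>"
TOPIC = "work-items"

client = ServiceBusClient.from_connection_string(CONN_STR)

# Producer: publish one message to the topic.
with client.get_topic_sender(topic_name=TOPIC) as sender:
    sender.send_messages(ServiceBusMessage('{"job": 42}'))

# Backup process: drains its own copy, independent of the application.
with client.get_subscription_receiver(topic_name=TOPIC, subscription_name="backup") as receiver:
    for msg in receiver.receive_messages(max_message_count=10, max_wait_time=5):
        print("backing up:", str(msg))     # stand-in for writing to blob/table storage
        receiver.complete_message(msg)

# Application: locks ("hides") messages only within its own subscription.
with client.get_subscription_receiver(topic_name=TOPIC, subscription_name="app") as receiver:
    for msg in receiver.receive_messages(max_message_count=10, max_wait_time=5):
        print("processing:", str(msg))     # stand-in for the business logic
        receiver.complete_message(msg)
```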
The simple answer is that you can't download all messages from a queue. Messages that are hidden are hidden from all other callers, including any third-party apps, so you can't read those messages other than from the application that made them hidden in the first place.
You mention that the reason for wanting to back up the queue is troubleshooting problems; depending on where your issues lie, it might be worth taking a look at Azure Storage's analytics capabilities. The logging infrastructure actually allows you to log every single transaction and greatly simplifies many troubleshooting scenarios. Take a look here for more information: http://blogs.msdn.com/b/windowsazurestorage/archive/tags/analytics+2d00+logging+_2600_amp_3b00_+metrics/.
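For completeness, this is roughly what a script can get at: peeking a storage queue returns only the currently visible messages and does not change their visibility, so anything hidden by the consuming application simply won't show up. The queue name and connection string below are placeholders:

```python
# Sketch: peek at (up to 32) currently *visible* messages in a storage queue.
# Hidden/leased messages are not returned to any client, which is the
# limitation described above. Queue name and connection string are placeholders.
# Requires: pip install azure-storage-queue
from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string(
    "<storage-account-connection-string>", queue_name="my-queue"
)

# peek_messages does not dequeue or alter visibility; max is 32 per call.
for msg in queue.peek_messages(max_messages=32):
    print(msg.id, msg.content)
```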
