Basically I have a storage account with a container that contains blobs of unhandled errors. My task is to generate a metric that shows how many blobs were uploaded to that container every hour. I tried using the built-in Azure metrics, but it seems like those might limit me to the entire storage account rather than just one container. I did some research on Power BI and thought that might be a good place to start, but again I came up empty.
If anyone has a good starting place for me, that would be incredible. I'm assuming this will end up being something that requires some SQL queries, or perhaps something I can do programmatically in Visual Studio. Apologies if this was posted in the wrong place, but it seemed like the best fit in my opinion.
Thanks!
You should take a look at Azure Event Grid with its Blob Storage integration. In short, whenever a blob is created, Azure Event Grid raises an event. You can consume this event and post the event data to an HTTP endpoint (or have it call an Azure Function) that saves information about the event in some persistent store (Azure Table storage, for example). You can then create reports by querying this data.
For more information about this, you may find this link helpful: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-event-overview.
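To make that concrete, here is a minimal sketch of such a function in Python, assuming the Azure Functions v2 programming model with the azure-functions and azure-data-tables packages; the "unhandled-errors" container name, the "BlobUploads" table, and the "TABLES_CONNECTION_STRING" setting are placeholders for your own setup:

```python
# Minimal sketch: count BlobCreated events for one container by writing one
# row per event into an Azure Table, partitioned by hour. Names are placeholders.
import os
from datetime import datetime, timezone

import azure.functions as func
from azure.core.exceptions import ResourceExistsError
from azure.data.tables import TableServiceClient

app = func.FunctionApp()

@app.event_grid_trigger(arg_name="event")
def record_blob_created(event: func.EventGridEvent):
    if event.event_type != "Microsoft.Storage.BlobCreated":
        return
    blob_url = event.get_json().get("url", "")
    if "/unhandled-errors/" not in blob_url:        # hypothetical container name
        return

    table = TableServiceClient.from_connection_string(
        os.environ["TABLES_CONNECTION_STRING"]
    ).create_table_if_not_exists("BlobUploads")

    now = datetime.now(timezone.utc)
    try:
        table.create_entity({
            # Partition by hour so "uploads per hour" is one partition count.
            "PartitionKey": now.strftime("%Y-%m-%dT%H:00"),
            "RowKey": event.id,
            "BlobUrl": blob_url,
            "EventTime": str(event.event_time) if event.event_time else now.isoformat(),
        })
    except ResourceExistsError:
        pass  # Event Grid delivers at least once; ignore duplicate events
```

With the hour baked into the PartitionKey, "how many blobs were uploaded each hour" becomes a count of entities per partition, which you can query ad hoc or pull into Power BI.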
I've successfully integrated Snowpipe with a container inside Azure Storage and loaded data into my target table, but now I can't quite figure out how Snowpipe actually works. Also, please let me know if there is already a good resource that answers this question; I'd be very grateful.
In my example, I tested the Snowpipe mechanism that uses cloud messaging. From my understanding, when a file is uploaded into an Azure container, Azure Event Grid sends an event message to an Azure queue, from which Snowpipe is notified that a new file has been uploaded into the container. Then, in the background, Snowpipe starts its loading process and imports the data into a target table.
If this is correct, I don't understand how the Azure queue informs Snowpipe about uploaded files. Is this connected to the "notification integration" inside Snowflake? Also, I don't understand what it means when the Snowflake page says that "Snowpipe copies the files into a queue, from which they are loaded into the target table...". Is this an Azure queue or some Snowflake queue?
I hope this question makes sense; any help or detailed explanation of the whole process is appreciated!
You've pretty much nailed it. To answer your specific questions... (and don't feel bad about them, this is definitely confusing)
How does the Azure queue inform Snowpipe about uploaded files? Is this connected to the "notification integration" inside Snowflake?
Yes, this is the notification integration. But Azure is not "informing" Snowpipe; it's the other way around. The Azure queue creates a notification that various other applications can subscribe to (this has no awareness of Snowflake). The notification integration on the Snowflake side is Snowflake's way to integrate with these external notifications.
Snowpipe's queueing
Once Snowflake receives one of these notifications, it puts that notification into a Snowflake-side queue (or, according to that page, the file itself; I was surprised by this, but the end result is the same). Snowpipes are wired up to that notification integration (as part of the CREATE statement). The files are directed to the appropriate Snowpipe based on the information in the stage (also part of the pipe's CREATE statement; I'm actually not certain whether this part is a push or a pull). Then the pipe runs its COPY INTO on that file.
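For concreteness, here is a rough sketch of how those pieces are declared on the Snowflake side, run here through the snowflake-connector-python package. Every name below (integration, pipe, stage, table, queue URL, credentials) is a placeholder for your own setup, so treat it as an illustration rather than a recipe:

```python
# Rough sketch: wire an Azure storage queue to a Snowpipe via a notification
# integration. All identifiers and credentials are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<database>", schema="<schema>",
)
cur = conn.cursor()

# The notification integration is Snowflake's subscription to the Azure storage
# queue that Event Grid writes BlobCreated messages into.
cur.execute("""
CREATE NOTIFICATION INTEGRATION my_notification_int
  ENABLED = TRUE
  TYPE = QUEUE
  NOTIFICATION_PROVIDER = AZURE_STORAGE_QUEUE
  AZURE_STORAGE_QUEUE_PRIMARY_URI = 'https://<storage-account>.queue.core.windows.net/<queue-name>'
  AZURE_TENANT_ID = '<tenant-id>'
""")

# The pipe is wired to that integration; AUTO_INGEST makes it react to the
# notifications, and its COPY INTO names the stage (the container location)
# and the target table.
cur.execute("""
CREATE PIPE my_pipe
  AUTO_INGEST = TRUE
  INTEGRATION = 'MY_NOTIFICATION_INT'
  AS COPY INTO my_target_table FROM @my_azure_stage
""")
```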
I have the following requirements, for which I am considering Azure Logic Apps:
Files placed in Azure Blob Storage must be migrated to a custom destination (it can differ from case to case)
The number of files is roughly 1,000,000
When the process is over, we should have a report saying how many records (files) failed
If the process stops somewhere in the middle, the next run must pick up only the files that have not yet been migrated
The process must be as fast as it can be, and the files must be migrated within N hours
What worries me is that I cannot find any examples or articles (including the official Azure documentation) where the same thing is achieved with Azure Logic Apps.
I have some ideas about my requirements and Azure Logic App:
I think that I must use pagination for dealing with this amount of files because Azure Logic App will not be able to read millions of file names - https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-exceed-default-page-size-with-pagination
I can add a record into Azure Table Storage to track failed migrations (something like creating a record to say that the process started and updating it when the file is moved to the destination)
I have no idea how I can restart the Azure Logic App without using a custom tracking mechanism (for instance, it could be the same Azure Table Storage instance)
And the question about splitting the work across several units is still open
Do you think that Azure Logic Apps is the right choice for my needs, or should I consider something else? If Logic Apps can work for me, could you please share your thoughts and ideas on how I can achieve the given requirements?
I don't think Logic Apps is a good solution for implementing this requirement, because around 1,000,000 files is too many. For this requirement, I suggest you use Azure Data Factory.
To migrate data in Azure Blob Storage with Data Factory, you can refer to this document.
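Whichever service ends up doing the copy, the "resume where it stopped" requirement can be covered with a small tracking table, along the lines of idea 2 in your question. Here is a minimal sketch assuming the azure-storage-blob and azure-data-tables packages; the container and table names, the connection-string setting, and the migrate() helper are placeholders:

```python
# Minimal sketch: track migrated blobs in an Azure Table so a re-run only
# processes blobs that were not handled before. All names are placeholders.
import os

from azure.core.exceptions import ResourceExistsError
from azure.data.tables import TableServiceClient, UpdateMode
from azure.storage.blob import ContainerClient

conn_str = os.environ["STORAGE_CONNECTION_STRING"]          # placeholder setting
source = ContainerClient.from_connection_string(conn_str, "source-container")
tracking = TableServiceClient.from_connection_string(conn_str) \
    .create_table_if_not_exists("MigrationTracking")

def migrate(blob_name: str) -> None:
    """Placeholder for the actual copy/move to the custom destination."""
    ...

failed = 0
# list_blobs() pages through results with continuation tokens internally,
# so millions of names are streamed rather than held in memory at once.
for blob in source.list_blobs():
    row_key = blob.name.replace("/", "|")    # '/' is not allowed in a RowKey
    entity = {"PartitionKey": "run-1", "RowKey": row_key, "Status": "Pending"}
    try:
        tracking.create_entity(entity)
    except ResourceExistsError:
        continue                             # already tracked by a previous run
    try:
        migrate(blob.name)
        entity["Status"] = "Done"
    except Exception:
        failed += 1
        entity["Status"] = "Failed"
    tracking.update_entity(entity, mode=UpdateMode.MERGE)

print(f"Failed files: {failed}")             # the basis for the final report
```

Splitting the work across several workers then mostly comes down to giving each worker its own blob-name prefix (or its own slice of the listing), since the tracking table keeps the processing idempotent.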
We are using an Azure Storage account to store some files that are downloaded by our app on demand.
Even though there should be no write operations (at least none I could think of), we are exceeding the included write operations just some days into the billing period (see image).
Regarding the price it's still within limits, but I'd still like to know whether this is normal and how I can analyze the matter. Besides the storage we are using
Functions and
App Service (mobile app)
but none of them should cause that many write operations. I've checked the logs of our functions, and none of those that access the queues or the blobs have been active lately. There are some functions that run every now and then, but only once every few minutes, and those do not access the storage at all.
I don't know if this is related, but there is a kind of periodic ingress on our blob storage (see the image below). The period is roughly 1 h, but there is a baseline of 100 kB per 5 min.
Analyzing the metrics of the storage account further, I found that there is a constant stream of 1.90k transactions per hour for blobs and 1.3k transactions per hour for queues, which seems quite exceptional to me. (Please note that the resolution of this graph is 1 h, while the former has a resolution of 5 minutes.)
Is there anything else I can do to analyze where the write operations come from? It kind of bothers me, since it does not seem as if it's supposed to be like that.
I've had the exact same problem; after enabling Storage Analytics and inspecting the $logs container, I found many log entries indicating that upon every request towards my Azure Functions, these write operations occur against the following container object:
https://[function-name].blob.core.windows.net:443/azure-webjobs-hosts/locks/linkfunctions/host?comp=lease
In my Azure Functions code I do not explicitly write to any container or file as such, but I have the following two application settings configured:
AzureWebJobsDashboard
AzureWebJobsStorage
So I filed a support ticket with Azure with the following questions:
Are the write operations triggered by these application settings? I believe so, but could you please confirm.
Will the write operations stop if I delete these application settings?
Could you please describe, at a high level, in what context these operations occur (e.g. logging, resource locking, other)?
and I got the following answers from the Azure support team, respectively:
Yes, you are right. According to the log information, we can see "https://[function-name].blob.core.windows.net:443/azure-webjobs-hosts/locks/linkfunctions/host?comp=lease".
The azure-webjobs-hosts folder is associated with the function app and is created by default when the function app is created. When the function app is running, it records these logs in the storage account configured in AzureWebJobsStorage.
You can't stop these write operations because they record logs the Azure Functions runtime needs in the storage account. Please do not remove the application setting AzureWebJobsStorage. The Azure Functions runtime uses this storage account connection string for all functions except HTTP-triggered functions. Removing this application setting will leave your function app unable to start. By the way, you can remove AzureWebJobsDashboard; that stops the Monitor feature rather than the operations above.
These operations record the runtime logs of the function app. They occur when our backend allocates an instance for running the function app.
The best place to find information about storage usage is Storage Analytics, especially Storage Analytics Logging.
There's a special blob container called $logs in the same storage account which will have detailed information about every operation performed against that storage account. You can view the blobs in that blob container and find the information.
If you don't see this blob container in your storage account, then you will need to enable storage analytics on your storage account. However considering you can see the metrics data, my guess is that it is already enabled.
Regarding the source of these write operations, have you enabled diagnostics for your Functions and App Service? Those write diagnostic logs to blob storage. Storage Analytics itself also writes to the same account, and that will cause write operations too.
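If you'd rather go through $logs programmatically than eyeball the blobs, here is a minimal sketch using the azure-storage-blob package. It assumes the classic Storage Analytics log layout (one semicolon-delimited line per operation, with the operation type in the third field, and blobs organized as <service>/YYYY/MM/DD/hhmm/NNNNNN.log), so double-check the field positions against your own logs:

```python
# Minimal sketch: tally operation types found in the $logs container.
from collections import Counter

from azure.storage.blob import ContainerClient

logs = ContainerClient.from_connection_string(
    "<storage-connection-string>", "$logs"       # placeholder connection string
)

ops = Counter()
# Narrow the prefix (e.g. "blob/2024/01/15/") to the period you're investigating.
for blob in logs.list_blobs(name_starts_with="blob/"):
    text = logs.download_blob(blob.name).readall().decode("utf-8")
    for line in text.splitlines():
        fields = line.split(";")
        if len(fields) > 2:
            ops[fields[2]] += 1                  # operation type, e.g. PutBlob, AcquireBlobLease

for op, count in ops.most_common(20):
    print(op, count)
```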
In my case, I had Azure Application Insights, which generated 10K transactions per minute on its storage for Functions and App Services, even though there were only a few HTTP requests among them. I'm not sure what triggered them, but once I removed Application Insights, everything went back to normal.
We have an Azure Stream Analytics configured with input data configured from an Event Hub and output is configured to be written to Table Storage.
The Event Hub gets the messages from the API (custom code written within the API sends messages to the Event Hub).
The issue we are facing is that Stream Analytics stops once in a while, and we don't have any trace of an error. We are clueless about the reason for the failure; quite possibly the input format is incorrect. Is there any way we can see the messages that are present in the Event Hub?
In case you have the chance to use blob containers within your storage account, there might be two possible solutions to begin with:
Configure "Diagnostics logs" for the Stream Analytics component and log everything to a blob container within your storage account. You can specifically activate three different settings: Execution, Authoring and AllMetrics. The corresponding blob containers will be created automatically by Stream Analytics. I was able to find errors within the Execution container in my storage account in the past.
You can define an input (named, for example, 'Raw-Storage-Input') retrieving all Event Hub messages and write its data to a blob container output (named 'Raw-Storage-Output') within your storage account by doing something along the lines of SELECT * INTO [Raw-Storage-Output] FROM [Raw-Storage-Input];. By doing this you might even be able to write the faulty messages to your blob container before Stream Analytics fails. However, this might not always work reliably.
There are probably more sophisticated ways I'm not aware of, but these options have helped me in the past.
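To answer the "can we see the messages" part directly: you can also read the Event Hub yourself with a small consumer, for example with the azure-eventhub Python package (the connection string, hub name, and consumer group below are placeholders); use a dedicated consumer group so you don't interfere with the Stream Analytics job:

```python
# Minimal sketch: dump Event Hub messages so malformed payloads become visible.
from azure.eventhub import EventHubConsumerClient

client = EventHubConsumerClient.from_connection_string(
    conn_str="<event-hub-namespace-connection-string>",
    consumer_group="inspection",              # dedicated consumer group
    eventhub_name="<event-hub-name>",
)

def on_event(partition_context, event):
    # Print the raw payload of each message.
    print(partition_context.partition_id, event.body_as_str())

with client:
    # starting_position="-1" replays from the beginning of the retained stream.
    client.receive(on_event=on_event, starting_position="-1")
```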
I'm writing a small app that reads and writes from Azure Blob Storage (Images, documents, etc.)
I need to implement some logging that will log activities such as:
file uploaded
File deleted
File updates
etc.
So, basically I need my log to look something like this:
User John Doe created a container "containerName" on 2016-05-05
User Mike Smith removed a blob test.jpg
etc...
User IDs and other additional info will be passed in through the method.
Example: CreateImage(String CreatedBy)
Question:
What is the best way to store and create this type of log? The easiest would be a SQL database with an Audit table and all the necessary columns. But I know that Azure has Azure Diagnostics. Can that be used to store and query logs? For example, I will need to see all file manipulations by user, by date, etc.
I would go with one of these approaches:
1) Azure Storage Tables for logs. Here you can store everything you need regarding logs. Then, if you need functionality to get/filter/etc., you can look into LINQ to Azure Tables, or even LINQPad if you want desktop-ready software. However, some design considerations should be taken into account - design guidance is here.
2) Application Insights. Using the custom events functionality, you get powerful logging and can then see how things are going on the portal. You can attach metadata to a custom event and then aggregate/filter/view it using the convenient web interface. Or connect log4net to Application Insights if you want to stream logs to it. Application Insights can continuously export its logs into Azure Storage, so you can take that and dive into it later.
IMHO, I would not say that SQL Database is the appropriate store for logs - it seems like too much (in terms of resources, and maybe price) to keep logs in a full-fledged database. Not directly relevant, but there is interesting reading about working with a lot of records.
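To make option 1 concrete, here is a minimal sketch of what an audit entity and a per-user query could look like, shown here with the Python azure-data-tables package; the table name and the key scheme are just one possible design (partition by whatever you query most often):

```python
# Minimal sketch: audit log entities in Azure Table Storage, partitioned by user.
from datetime import datetime, timezone

from azure.data.tables import TableServiceClient

service = TableServiceClient.from_connection_string("<storage-connection-string>")
audit = service.create_table_if_not_exists("Audit")

def log_action(user: str, action: str, target: str) -> None:
    now = datetime.now(timezone.utc)
    audit.create_entity({
        # Partitioning by user makes "all actions by user X" a cheap query;
        # the inverted numeric RowKey keeps the newest entries first.
        "PartitionKey": user,
        "RowKey": f"{10**19 - int(now.timestamp() * 10**7):020d}",
        "Action": action,
        "Target": target,
        "EventTime": now.isoformat(),
    })

log_action("john.doe", "CreateContainer", "containerName")
log_action("mike.smith", "DeleteBlob", "test.jpg")

# Example query: everything John Doe did, newest first.
for entity in audit.query_entities("PartitionKey eq 'john.doe'"):
    print(entity["EventTime"], entity["Action"], entity["Target"])
```

Queries by date across all users would need a different partition scheme (or a second table keyed by date), which is exactly the kind of trade-off the design guidance linked above discusses.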