Azure Event Grid: Find out if a blob is being overwritten

I have wired up an EventGridTrigger in Azure Functions to listen for Microsoft.Storage.BlobCreated events. So far it seems to be working fine. However, I have observed that multiple events are raised for the same blob if the client overwrites it. I need to do some server-side processing only once per blob creation. Is there any metadata available that tells us how many times a blob has been overwritten?
As a workaround, I'm thinking of saving the blob URI to a Cosmos container as a primary key to see if it has ever been processed before, but this sounds like overkill for something this trivial.

Here are 3 possible solutions:
1. Store the blob URI in an Azure Table (same idea as your Cosmos suggestion, but cheaper).
2. Turn on blob versioning and check that it is the first version of the blob before processing. Turning on versioning has a cost.
3. Check the code that updates the blob. Is it possible to use the BlockBlobClient and only update the affected block? Will this avoid BlobCreated being triggered on update?
I would start by checking whether option 3 is possible. If not, go with option 1; a rough sketch of that approach follows below.
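Here is a minimal sketch of option 1, assuming the in-process Functions model with Microsoft.Azure.WebJobs.Extensions.EventGrid and the Azure.Data.Tables package; the "ProcessedBlobs" table name and the "blobs" partition key are made-up placeholders, not anything your setup requires.

using System;
using System.Threading.Tasks;
using Azure;
using Azure.Data.Tables;
using Azure.Messaging.EventGrid;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.EventGrid;
using Microsoft.Extensions.Logging;

public static class OnBlobCreated
{
    [FunctionName("OnBlobCreated")]
    public static async Task Run([EventGridTrigger] EventGridEvent evt, ILogger log)
    {
        var table = new TableClient(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"), "ProcessedBlobs");
        await table.CreateIfNotExistsAsync();

        // RowKey may not contain '/', so crudely escape the blob path taken from the event subject.
        string rowKey = evt.Subject.Replace("/", "|");

        try
        {
            // The insert fails with 409 Conflict if this blob has been seen before.
            await table.AddEntityAsync(new TableEntity("blobs", rowKey));
        }
        catch (RequestFailedException ex) when (ex.Status == 409)
        {
            log.LogInformation("Blob {subject} already processed, skipping.", evt.Subject);
            return;
        }

        // First time this blob has been seen: do the one-off processing here.
    }
}

Because the conditional insert is atomic, two overlapping events for the same blob cannot both get past the check.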

Related

Email notification about new files on Azure File share

What are possible ways to implement such a scenario?
I can think of an Azure Function that periodically checks the share for new files. Are there any other possibilities?
I have also been thinking about duplicating the files to Blob storage and generating the notifications from there.
A storage content trigger is only available for blobs out of the box. If you are willing to migrate to Blob storage, you can use a BlobTrigger Azure Function. For triggering on a File Share, these are my suggestions, as requested:
A TimerTrigger Azure Function that polls for files added since the previous run (a rough sketch follows after this answer).
A Recurrence trigger in a Logic App that polls and checks for new contents.
A continuous WebJob that keeps polling the File Share for new contents.
In my opinion, duplicating the files to Blob storage just to make the notification work is not a great option, because that duplication itself requires a polling mechanism, which you can already achieve with the options above, so it is unnecessary.
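A minimal sketch of the TimerTrigger suggestion, assuming the in-process Functions model and the Azure.Storage.Files.Shares package; the "myshare" share name and the 15-minute window are placeholders.

using System;
using System.Threading.Tasks;
using Azure.Storage.Files.Shares;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Timers;
using Microsoft.Extensions.Logging;

public static class PollFileShare
{
    [FunctionName("PollFileShare")]
    public static async Task Run(
        [TimerTrigger("0 */15 * * * *")] TimerInfo timer, ILogger log)
    {
        // Treat anything modified since the previous run (same window as the CRON schedule) as new.
        var lastRun = DateTimeOffset.UtcNow.AddMinutes(-15);

        var share = new ShareClient(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"), "myshare");
        var directory = share.GetRootDirectoryClient();

        await foreach (var item in directory.GetFilesAndDirectoriesAsync())
        {
            if (item.IsDirectory) continue;

            var props = await directory.GetFileClient(item.Name).GetPropertiesAsync();
            if (props.Value.LastModified > lastRun)
            {
                log.LogInformation("New file detected: {name}", item.Name);
                // send the email / notification here
            }
        }
    }
}

Since the window is derived from the schedule, a file landing exactly on a boundary could be missed or reported twice; persisting a watermark (for example the timestamp of the newest file handled) between runs is the more robust variant.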

Azure Functions - Table Storage Trigger with Azure Functions

I need a way to trigger Azure Functions when an entity is added to Azure Table storage. Is there a way to do this? When I tried to add a new Azure Function, I did not see an Azure Table storage trigger; I only see Queue and Blob triggers available.
If there is no support for an Azure Table storage trigger, should I use an HTTP trigger and have Azure Table storage as an input binding?
Thanks
There is no trigger binding for Table Storage.
Here's a detailed view on what is supported by the different bindings available today:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings#overview
If there is no support for an Azure Table storage trigger, should I use an HTTP trigger and have Azure Table storage as an input binding?
Yes, this approach would work and would allow you to pass the table data as an input while relying on a separate trigger (a minimal sketch follows below). Depending on the type of clients you are working with, and your requirements, using a queue trigger is also another good option.
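For illustration only, here is roughly what that HTTP trigger plus Table input combination could look like with the in-process model and the Storage/Tables binding extension; the TodoItem type, the "TodoItems" table and the "todo" partition key are invented names, and depending on the extension version the entity class may instead need to derive from TableEntity.

using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public class TodoItem
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Text { get; set; }
}

public static class GetTodoItem
{
    [FunctionName("GetTodoItem")]
    public static IActionResult Run(
        // The {rowKey} route parameter feeds the table binding below.
        [HttpTrigger(AuthorizationLevel.Function, "get", Route = "todo/{rowKey}")] HttpRequest req,
        // Looks up a single entity by partition and row key when the function is invoked.
        [Table("TodoItems", "todo", "{rowKey}", Connection = "AzureWebJobsStorage")] TodoItem item,
        ILogger log)
    {
        return item == null ? (IActionResult)new NotFoundResult() : new OkObjectResult(item);
    }
}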
@venki, what Fabio Cavalcante said is correct: Azure Functions doesn't have a trigger option for Table storage. But if your business needs to store data in Table storage and you, as the developer, decide to use Azure Functions in your architecture, you can still configure your Function to use data coming from Table storage as an input. This works really well.
There is also another way to give your Function an "automagic" trigger: use a Storage Queue (for smaller workloads) or Service Bus (when the business needs a more robust mechanism), as sketched below.
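A sketch of that "automagic" variant, reusing the hypothetical TodoItem type from the previous sketch: whatever writes the entity also drops its row key onto a queue, the function is triggered by the queue message, and the Table input binding fetches the matching row. The queue and table names are placeholders.

using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class OnTodoQueued
{
    [FunctionName("OnTodoQueued")]
    public static void Run(
        // The queue message body is expected to be the RowKey of the newly added entity.
        [QueueTrigger("new-todo-items", Connection = "AzureWebJobsStorage")] string rowKey,
        // {queueTrigger} expands to the message text, so the matching row is bound automatically.
        [Table("TodoItems", "todo", "{queueTrigger}", Connection = "AzureWebJobsStorage")] TodoItem item,
        ILogger log)
    {
        log.LogInformation("New table entity arrived: {text}", item?.Text);
    }
}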

Azure Data Factory Only Retrieve New Blob files from Blob Storage

I am currently copying blob files from an Azure Blob storage to an Azure SQL Database. It is scheduled to run every 15 minutes but each time it runs it repeatedly imports all blob files. I would rather like to configure it so that it only imports if any new files have arrived into the Blob storage. One thing to note is that the files do not have a date time stamp. All files are present in a single blob container. New files are added to the same blob container. Do you know how to configure this?
I'd preface this answer by saying that a change in your approach may be warranted...
Given what you've described, you're fairly limited on options. One approach is to have your scheduled job maintain knowledge of what it has already stored in the SQL DB: you loop over all the items within the container and check whether each one has been processed yet.
The container has a ListBlobs method that would work for this. Reference: https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-blobs/
foreach (var item in container.ListBlobs(null, true))
{
// Check if it has already been processed or not
}
Note that the number of blobs in the container may be an issue with this approach. If it is too large, consider creating a new container per hour/day/week/etc. to hold the blobs, assuming you can control this.
Please use CloudBlobContainer.ListBlobs(null, true, BlobListingDetails.Metadata) and check CloudBlob.Properties.LastModified for each listed blob.
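For illustration, a sketch combining the two suggestions above with the classic WindowsAzure.Storage SDK that the linked article uses: list the blobs with their properties and only hand off those modified since the last run. The connection string variable, container name and 15-minute watermark are placeholders.

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class IncrementalBlobScan
{
    static void Main()
    {
        var container = CloudStorageAccount
            .Parse(Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING"))
            .CreateCloudBlobClient()
            .GetContainerReference("mycontainer");

        // In a real job this watermark would be persisted (e.g. in the SQL DB) between runs.
        DateTimeOffset lastRun = DateTimeOffset.UtcNow.AddMinutes(-15);

        foreach (var item in container.ListBlobs(null, true, BlobListingDetails.Metadata))
        {
            if (item is CloudBlob blob &&
                blob.Properties.LastModified.HasValue &&
                blob.Properties.LastModified.Value > lastRun)
            {
                Console.WriteLine($"New or updated blob: {blob.Name}");
                // copy this blob's contents into Azure SQL here
            }
        }
    }
}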
Instead of a copy activity, I would use a custom DotNet activity within Azure Data Factory, and use the Blob Storage API (some of the answers here have described its use) together with the Azure SQL API to copy only the new files.
However, over time your blob location will accumulate a lot of files, so expect the job to take longer and longer (eventually longer than 15 minutes) as it iterates through every file on each run.
Can you explain your scenario further? Is there a reason you want to add data to the SQL tables every 15 minutes? Could you increase that to copy data every hour? Also, how is this data getting into Blob Storage? Is another Azure service putting it there, or is it an external application? If it is another service, consider moving it straight into Azure SQL and cutting out Blob Storage.
Another suggestion would be to create folders for the 15-minute intervals, named hhmm. For example, a sample folder would be called '0515'. You could even have parent folders for the year, month and day. This way you can write the data into these folders in Blob Storage, and Data Factory is capable of reading date and time folders and identifying new files that arrive in them.
I hope this helps! If you can provide some more information about your problem, I'd be happy to help you further.

Cold storage/WORM on Azure?

I have a requirement that states certain data in my application must be write once, read many. Is there a construct or best practice approach on Azure that would facilitate this?
We do not have any best practice on this topic, but I can think of two options which might work (a short sketch of both follows below):
Option 1: Use Append Block, a new blob type which is currently available with the newest storage service version. All writes to an Append Blob happen at the end of the blob. Updating and deleting existing blocks is not supported. To modify an Append Blob, you add blocks to the end of the blob via the new Append Block operation. Each appended block is accessible immediately. http://blogs.msdn.com/b/windowsazurestorage/archive/2015/04/13/introducing-azure-storage-append-blob.aspx has more information about Storage Append Blob.
Option 2: You can take snapshots of your blobs very frequently, as snapshots are read-only copies of your original blobs. However, someone who has authorized access to the snapshots will be able to delete them.
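A short sketch of both options, using the classic WindowsAzure.Storage SDK that the linked post is written against; the container and blob names are invented, and the snapshot call assumes 'contract.pdf' already exists.

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class WormStyleWrites
{
    static void Main()
    {
        var container = CloudStorageAccount
            .Parse(Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING"))
            .CreateCloudBlobClient()
            .GetContainerReference("audit-data");
        container.CreateIfNotExists();

        // Option 1: an append blob only ever grows; existing blocks cannot be rewritten or deleted.
        CloudAppendBlob log = container.GetAppendBlobReference("events.log");
        if (!log.Exists())
        {
            log.CreateOrReplace();
        }
        log.AppendText($"{DateTimeOffset.UtcNow:o} something happened{Environment.NewLine}");

        // Option 2: a snapshot is a read-only, point-in-time copy of an existing blob.
        CloudBlockBlob document = container.GetBlockBlobReference("contract.pdf");
        CloudBlockBlob snapshot = document.CreateSnapshot();
        Console.WriteLine($"Snapshot taken at {snapshot.SnapshotTime}");
    }
}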
We now have cold storage; refer to http://www.zdnet.com/article/microsoft-launches-cool-blob-azure-storage-at-1c-per-gb/. Hope this helps.

automatic backups of Azure blob storage?

I need to do an automatic, periodic backup of an Azure Blob storage account to another Azure Blob storage account.
This is in order to guard against any kind of malfunction in the software.
Are there any services which do that? Azure doesn't seem to have this built in.
As @Brent mentioned in the comments to Roberto's answer, the replicas are for HA; if you delete a blob, that delete is replicated instantly.
For blobs, you can very easily create asynchronous copies to a separate blob (even in a separate storage account). You can also make snapshots which capture a blob at a current moment in time. At first, snapshots don't cost anything, but if you start modifying the blocks/pages referred to by the snapshot, then new blocks/pages are allocated. Over time, you'll want to start purging your snapshots. This is a great way to keep data "as-is" over time and revert back to a snapshot if there's a malfunction in your software.
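A rough sketch of the cross-account asynchronous copy described above, using the classic WindowsAzure.Storage SDK; the connection string variables, container and blob names are placeholders, and a short-lived read SAS is generated so the destination account can pull the source blob.

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

class BlobBackup
{
    static void Main()
    {
        var source = CloudStorageAccount
            .Parse(Environment.GetEnvironmentVariable("SOURCE_CONNECTION_STRING"))
            .CreateCloudBlobClient()
            .GetContainerReference("data")
            .GetBlockBlobReference("important.dat");

        var backupContainer = CloudStorageAccount
            .Parse(Environment.GetEnvironmentVariable("BACKUP_CONNECTION_STRING"))
            .CreateCloudBlobClient()
            .GetContainerReference("backups");
        backupContainer.CreateIfNotExists();
        var backup = backupContainer.GetBlockBlobReference("important.dat");

        // Read-only SAS so the destination storage account can read the source blob.
        string sas = source.GetSharedAccessSignature(new SharedAccessBlobPolicy
        {
            Permissions = SharedAccessBlobPermissions.Read,
            SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddHours(1)
        });

        // The copy runs server-side; StartCopy returns immediately with a copy ID to poll on.
        string copyId = backup.StartCopy(new Uri(source.Uri.AbsoluteUri + sas));
        Console.WriteLine($"Copy started: {copyId}");
    }
}

Checking backup.CopyState after a later FetchAttributes call tells you when the server-side copy has finished.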
With queues, the malfunction story isn't quite the same, as typically you'd only have a small number of queue items present (at least that's the hope; if you have thousands of queue messages, it's typically a sign that your software is falling behind). In any event: when writing queue messages, you could also write them to blob storage for archive purposes, in case there's a malfunction. I wouldn't recommend using blob-based messaging for scaling/parallel processing, since blobs don't have the mechanisms that queues do, but you could use them manually in case of malfunction.
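A tiny sketch of that archive idea with the classic WindowsAzure.Storage SDK: every message that goes onto the queue is also written to a blob container as a timestamped record. The queue and container names are placeholders.

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Queue;

class ArchivedEnqueue
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse(
            Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING"));

        var queue = account.CreateCloudQueueClient().GetQueueReference("work-items");
        var archive = account.CreateCloudBlobClient().GetContainerReference("queue-archive");
        queue.CreateIfNotExists();
        archive.CreateIfNotExists();

        string payload = "{ \"orderId\": 42 }";

        // Enqueue for normal processing...
        queue.AddMessage(new CloudQueueMessage(payload));

        // ...and keep a timestamped copy in blob storage in case something goes wrong downstream.
        string blobName = $"{DateTimeOffset.UtcNow:yyyy/MM/dd/HHmmssfff}-{Guid.NewGuid():N}.json";
        archive.GetBlockBlobReference(blobName).UploadText(payload);
    }
}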
There's no copy function for tables. You'd need to write to two tables during your write.
Azure keeps 3 redundant copies of your data in different locations in the same data centre where your data is hosted (to guard against hardware failure).
This applies to blob, table and queue storage.
Additionally, you can enable geo-replication on all of your storage. Azure will automatically keep redundant copies of your data in separate data centres. This guards against anything happening to the data centre itself.
