Concurrency problem with Azure Blob Storage function

My Azure Function is processing multiple files in Blob Storage at the same time.
This causes duplicate records in Dynamics CRM because the function processes several files in parallel. How can I restrict the Azure Function to process one file at a time?

According to the section Trigger - concurrency and memory usage of the official document Azure Blob storage bindings for Azure Functions, quoted below:
The blob trigger uses a queue internally, so the maximum number of
concurrent function invocations is controlled by the queues
configuration in host.json. The default settings limit concurrency to
24 invocations. This limit applies separately to each function that
uses a blob trigger.
So you can set the queues.batchSize value to 1 in host.json, following the template content sketched below, to restrict a blob-triggered Azure Function to processing one file at a time.
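On the version 2.x and later runtime the queues settings live under extensions; on 1.x the queues block sits at the root of host.json. A minimal sketch for 2.x (the exact values are assumptions you can tune):

    {
      "version": "2.0",
      "extensions": {
        "queues": {
          "batchSize": 1,
          "newBatchThreshold": 0
        }
      }
    }

Per the docs, the maximum concurrency per instance is batchSize plus newBatchThreshold, so the combination above keeps a single instance to one blob at a time. Keep in mind the limit applies per instance; if the app scales out to multiple instances, each instance applies its own limit.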
For reference, here are two similar SO threads you can also consult.
Azure Functions - Limiting parallel execution
Throttling Azure Storage Queue processing in Azure Function App

Related

Azure Blob Triggers sometimes taking too much time to get triggered

I am using an App Service plan for my Azure Function and have added blob triggers, but when a file is uploaded to the blob container, the functions do not trigger, or sometimes it takes a long time before they start triggering.
The function should trigger as soon as a new file is uploaded to the blob container.
Any suggestions will be appreciated.
This is likely a case of cold start.
As per the note here:
When you're using a blob trigger on a Consumption plan, there can be
up to a 10-minute delay in processing new blobs. This delay occurs
when a function app has gone idle. After the function app is running,
blobs are processed immediately. To avoid this cold-start delay, use
an App Service plan with Always On enabled, or use the Event Grid
trigger.
For your case, consider an Event Grid trigger instead of a blob trigger; the Event Grid trigger has built-in support for blob events as well.
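In case it helps, switching to Event Grid amounts to swapping the trigger binding; a minimal function.json sketch (the binding name eventGridEvent is just an assumption) would look roughly like:

    {
      "bindings": [
        {
          "type": "eventGridTrigger",
          "direction": "in",
          "name": "eventGridEvent"
        }
      ]
    }

You then create an Event Grid subscription on the storage account, filtered to Microsoft.Storage.BlobCreated events, with the function as the endpoint.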
Since you say you are already running the functions on an App Service plan, it's likely that you don't have the Always On setting enabled. You can enable it on the app from the Application Settings -> General Settings tab in the portal:
Note that Always On is only applicable to Azure Functions apps bound to an App Service plan; it isn't available on the serverless Consumption plan.
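If you manage the app with an ARM template rather than through the portal, Always On corresponds to the siteConfig.alwaysOn property. A rough fragment for illustration (the names, plan, and apiVersion are placeholders, not taken from the question):

    {
      "type": "Microsoft.Web/sites",
      "apiVersion": "2022-03-01",
      "kind": "functionapp",
      "name": "my-function-app",
      "location": "[resourceGroup().location]",
      "properties": {
        "serverFarmId": "[resourceId('Microsoft.Web/serverfarms', 'my-app-service-plan')]",
        "siteConfig": {
          "alwaysOn": true
        }
      }
    }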
Another possible cause is if you don't clear the blobs out of the container after you process them.
From here:
If the blob container being monitored contains more than 10,000 blobs (across all containers), the Functions runtime scans log files to watch for new or changed blobs. This process can result in delays. A function might not get triggered until several minutes or longer after the blob is created.
And when using the Consumption plan, here's another link warning about the potential for delays.

How can I find the source of my Hot LRS Write Operations on Azure Storage Account?

We are using an Azure Storage account to store some files that are to be downloaded by our app on the user's demand.
Even though there should be no write operations (at least none I can think of), we are exceeding the included write operations just a few days into the billing period (see image).
Regarding the price it's still within limits, but I'd still like to know whether this is normal and how I can analyze the matter. Besides the storage we are using
Functions and
App Service (mobile app)
but none of them should cause that many write operations. I've checked the logs of our functions, and none of those that access the queues or the blobs have been active lately. There are some functions that run every now and then, but only once every few minutes, and those do not access the storage at all.
I don't know if this is related, but there is a kind of periodic ingress on our blob storage (see the image below). The period is around 1 h, but there is a baseline of 100 kB per 5 min.
Analyzing the metrics of the storage account further, I found that there is a constant stream of 1.90k transactions per hour for blobs and 1.3k transactions per hour for queues, which seems quite exceptional to me. (Please note that the resolution of this graph is 1 h, while the former has a resolution of 5 minutes.)
Is there anything else I can do to analyze where the write operations come from? It kind of bothers me, since it does not seem as if it's supposed to be like that.
I've had the exact same problem; after enabling Storage Analytics and inspecting the $logs container I found many log entries indicating that upon every request towards my Azure Functions, these write operations occur against the following container object:
https://[function-name].blob.core.windows.net:443/azure-webjobs-hosts/locks/linkfunctions/host?comp=lease
In my Azure Functions code I do not explicitly write to any container or file as such, but I have the following two Application Settings configured:
AzureWebJobsDashboard
AzureWebJobsStorage
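For context, both of these settings are just storage connection strings; when developing locally they would typically live in local.settings.json, along the lines of the sketch below (account name and key are placeholders):

    {
      "IsEncrypted": false,
      "Values": {
        "AzureWebJobsStorage": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net",
        "AzureWebJobsDashboard": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
      }
    }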
So I filed a support ticket with Azure asking the following questions:
Are the write operations triggered by these application settings? I believe so, but could you please confirm.
Will the write operations stop if I delete these application settings?
Could you please describe, in high level, in what context these operations occur (e.g. logging? resource locking, other?)
and I got the following answers from Azure support team, respectively:
Yes, you are right. According to the log information, we can see "https://[function-name].blob.core.windows.net:443/azure-webjobs-hosts/locks/linkfunctions/host?comp=lease".
This azure-webjobs-hosts folder is associated with the function app and is created by default when the function app is created. When the function app is running, it records these logs in the storage account configured with AzureWebJobsStorage.
You can't stop the write operations, because these operations record logs that the Azure Functions runtime needs in the storage account. Please do not remove the application setting AzureWebJobsStorage. The Azure Functions runtime uses this storage account connection string for all functions except HTTP-triggered functions. Removing this application setting will leave your function app unable to start. By the way, you can remove AzureWebJobsDashboard; that will stop the Monitor feature rather than the operations above.
These operations record runtime logs of the function app. They occur when the backend allocates an instance for running the function app.
The best place to find information about storage usage is Storage Analytics, especially Storage Analytics Logging.
There's a special blob container called $logs in the same storage account which will have detailed information about every operation performed against that storage account. You can view the blobs in that blob container and find the information.
If you don't see this blob container in your storage account, then you will need to enable storage analytics on your storage account. However considering you can see the metrics data, my guess is that it is already enabled.
Regarding the source of these write operations, have you enabled diagnostics for your Functions and App Service? These write diagnostic logs to blob storage. Also, Storage Analytics itself writes to the same account, and that will also cause write operations.
In my case, I had Azure Application Insights generating about 10K transactions per minute on its storage for functions and app services, even though there were only a few HTTPS requests among them. I'm not sure what triggered them, but once I removed App Insights, everything went back to normal.

Scalable Azure Function with blob trigger

I made an Azure Function on a Consumption plan with a blob trigger. Then I add lots of files to the blob container and I expect the Azure Function to be invoked every time a file is added.
And because I use an Azure Function on a Consumption plan, I would expect that there is no scalability problem, right? WRONG.
I can easily add files to the container faster than the Azure Function can process them. A hundred users can add to the container, but there seems to be only one instance of the Azure Function working at any one time, meaning it can easily fall behind.
I thought the platform would just create more instances of the Azure Function as needed. Well, it seems it doesn't.
Any advice how I can configure my Azure Function to be truly scalable with a blob trigger?
This is because you are affected by cold start.
As per the note here:
When you're using a blob trigger on a Consumption plan, there can be
up to a 10-minute delay in processing new blobs. This delay occurs
when a function app has gone idle. After the function app is running,
blobs are processed immediately. To avoid this cold-start delay, use
an App Service plan with Always On enabled, or use the Event Grid
trigger.
For your case, consider an Event Grid trigger instead of a blob trigger; the Event Grid trigger has built-in support for blob events as well.
When to consider Event Grid?
Use Event Grid instead of the Blob storage trigger for the following scenarios:
Blob storage accounts
High scale
Minimizing cold-start delay
Read more here
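To give a feel for what an Event Grid triggered function receives, a Microsoft.Storage.BlobCreated event looks roughly like this (abridged; subscription, account, container, and blob names are made up):

    {
      "topic": "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>",
      "subject": "/blobServices/default/containers/uploads/blobs/report.pdf",
      "eventType": "Microsoft.Storage.BlobCreated",
      "eventTime": "2020-05-01T12:00:00Z",
      "id": "00000000-0000-0000-0000-000000000000",
      "data": {
        "api": "PutBlob",
        "contentType": "application/pdf",
        "contentLength": 524288,
        "blobType": "BlockBlob",
        "url": "https://<account>.blob.core.windows.net/uploads/report.pdf"
      },
      "dataVersion": "",
      "metadataVersion": "1"
    }

Unlike the blob trigger's polling model, each such event is pushed to the function, which is why Event Grid both scales better and avoids the scan delays mentioned above.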
Update in 2020
Azure Functions has a new tier/plan called Premium where you can avoid the cold start.
E.g., YouTube Video

What is the mechanism that prevents a scaled-out Azure Function from being triggered by the same blob multiple times

Scenario:
An Azure Function hosted on an App Service plan and scaled out to 5 instances. The function is triggered by blob.
Question:
Is there any documentation that explains the mechanism that prevents a scaled-out Azure Function from processing the same blob multiple times? I am asking because more than one instance of the function is running.
Agree with @Peter; here is my understanding for reference, correct me if it doesn't make sense.
Blob trigger mechanism related info is stored in the Azure storage account for our function app (defined by the app setting AzureWebJobsStorage). Locks live in a blob container named azure-webjobs-hosts, and there's a queue azure-webjobs-blobtrigger-<FunctionAppName> for internal use.
See another part in the same comment.
Normally only 1 of N host instances is scanning for new blobs (based on a singleton host id lock). When it finds a new blob it adds a queue message for it and one of the N hosts processes it.
So in the first step, scanning for new blobs, the scale-out feature doesn't participate. The singleton host id lock is implemented by a blob lease as @Peter mentioned (check the blob locks/<FunctionAppName>/host in azure-webjobs-hosts).
Once the internal queue starts receiving messages about new blobs, the scale-out feature begins to work, as host instances fetch and process messages together. While a blob message is being processed, it can't be seen by other instances, and it is deleted afterwards.
Besides, to ensure that a processed blob never triggers the function again later (e.g. in the next round of scanning), there is another mechanism: blob receipts.
As far as I can tell blob leases are used.
This is backed by this comment made by an MS engineer working on the Azure Functions team.
The singleton mechanism used under the covers to ensure only one host processes a blob is based on the HostId. In regular scale out scenarios, the HostId is the same for all instances, so they collaborate via blob leases behind the scenes using the same lock blob scoped to the host id.

Azure EventHubs EventProcessorHost tries to access Azure storage queue

After enabling App Insights on a WebJob which listens for events on an Event Hub using the EventProcessor class, we see that it continuously tries to access a set of non-existent queues in the configured blob storage account. We have not configured any queues on this account.
There's no reference to a queue anywhere in my code, and it is my understanding that the EventProcessorHost uses blob storage and not queues in order to maintain state. So: why is it trying to access queues?
The queue access that you're seeing comes from the JobHost itself, not from any specific trigger type like EventHubs. The WebJobs SDK uses some storage resources itself behind the scenes for its own operation, e.g. control queues to track its own work, blobs for storage of log information shown in the Dashboard, etc.
In the specific case you mention above, those control queues that are being accessed are part of our Dashboard Invoke/Replay/Abort support. We have an open issue here in our repo tracking potential improvements we can make in this area. Please feel free to chime in on that issue.
