Azure EventHubs EventProcessorHost tries to access Azure storage queue

After enabling Application Insights on a WebJob that listens for events on an Event Hub using the EventProcessorHost class, we see that it continuously tries to access a set of non-existent queues in the configured blob storage account. We have not configured any queues on this account.
There's no reference to a queue anywhere in my code, and it is my understanding that the EventProcessorHost uses blob storage, not queues, to maintain state. So why is it trying to access queues?

The queue access you're seeing comes from the JobHost itself, not from any specific trigger type like Event Hubs. The WebJobs SDK uses some storage resources behind the scenes for its own operation, e.g. control queues to track its own work and blobs to store the log information shown in the Dashboard.
In the specific case you mention above, the control queues being accessed are part of our Dashboard Invoke/Replay/Abort support. We have an open issue in our repo tracking potential improvements we can make in this area. Please feel free to chime in on that issue.
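If you want to see which of these internal resources actually exist in your account, a minimal sketch along these lines (Python, azure-storage-blob and azure-storage-queue; it assumes the connection string is exposed through the AzureWebJobsStorage setting) lists the containers and queues whose names suggest SDK bookkeeping:

```python
import os

from azure.storage.blob import BlobServiceClient
from azure.storage.queue import QueueServiceClient

# The WebJobs SDK reads its storage connection string from AzureWebJobsStorage.
conn_str = os.environ["AzureWebJobsStorage"]

# Containers created by the host, e.g. azure-webjobs-hosts.
blob_service = BlobServiceClient.from_connection_string(conn_str)
for container in blob_service.list_containers(name_starts_with="azure-"):
    print("container:", container.name)

# Control queues used by the host (e.g. for Dashboard Invoke/Replay/Abort).
queue_service = QueueServiceClient.from_connection_string(conn_str)
for queue in queue_service.list_queues(name_starts_with="azure-"):
    print("queue:", queue.name)
```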

Related

No Event Grid events triggering when uploading files to Azure Blob Storage -- why?

I set up a simple scenario in Azure using a Storage Account, a Function App, and an Event Grid System Trigger. Blob uploads into the Storage Account should cause the Event Grid System Trigger to send a BlobCreated event to trigger the Azure Function.
I can see from the overview page in the Azure Portal that the Event Grid System Topic is configured for the correct storage account.
I have a subscription created for the Event Grid System Topic, and it subscribes to all of the events the storage account can generate, as I can see in the Azure Portal: all 6 event types are enabled, so I'm not filtering them out.
Despite this, when I upload blobs into a container I created in my storage account and watch for the events to show up in the metrics on my Event Grid System Topic, or see my Azure Function trigger, no events appear to ever be generated. Some interesting points about my storage account which may be worth mentioning are:
I am using a premium storage account
I am using a private vnet for my storage account
I suspected the network, but to rule that out I changed my storage account back to public access and tried again; it didn't change the behavior. From everything I can tell from the documentation, this should be working. Any ideas why it isn't?
I work at MS on the SDK team, and I reached out to an Event Grid team member directly for their opinion:
I looked into our service logs for the last two weeks and I could not find any events for this topic/event-subscription.
Can you please provide the specific time and region when you are uploading/deleting/editing the blobs to help the investigation? Also, is this specific to this storage account? Was this working before, or is this scenario working for other storage accounts? Can you please open a support ticket so we can handle this properly?
Thanks! If in any doubt about the process, feel free to reply to me; we'll monitor this thread.
[Edit: more info from Storage team]
We communicated with Azure Storage team and they confirmed that the behavior as described is by design and expected. Here are some additional details from Azure Storage Team:
The issue is that the customer is using a Premium_LRS StorageV2 account. These accounts only support premium page blobs and premium disks.
If the customer wants to store block blobs in the premium tier, they need to create a BlockBlobStorage account.
See subscript 5 in this table: https://learn.microsoft.com/en-us/azure/storage/common/storage-account-overview
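If the fix is to move to a premium block blob account as suggested, a minimal sketch with the azure-mgmt-storage Python SDK might look like the following; the subscription ID, resource group, account name and region are placeholders:

```python
# Sketch: create a premium BlockBlobStorage account (supports block blobs),
# as opposed to a Premium_LRS StorageV2 account that only supports page blobs
# and disks. All names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import Sku, StorageAccountCreateParameters

subscription_id = "<subscription-id>"
client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

poller = client.storage_accounts.begin_create(
    resource_group_name="<resource-group>",
    account_name="<uniqueaccountname>",
    parameters=StorageAccountCreateParameters(
        sku=Sku(name="Premium_LRS"),
        kind="BlockBlobStorage",  # premium block blob account
        location="westeurope",
    ),
)
account = poller.result()
print(account.kind, account.sku.name)
```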

How can I find the source of my Hot LRS Write Operations on Azure Storage Account?

We are using an Azure Storage account to store some files that are downloaded by our app on the user's demand.
Even though there should be no write operations (at least none I can think of), we are exceeding the included write operations just a few days into the billing period (see image).
Price-wise it's still within limits, but I'd still like to know whether this is normal and how I can analyze the matter. Besides the storage we are using
Functions and
App Service (mobile app)
but none of them should cause that many write operations. I've checked the logs of our Functions, and none of those that access the queues or the blobs have been active lately. There are some functions that run every now and then, but only once every few minutes, and those do not access the storage at all.
I don't know if this is related, but there is a kind of periodic ingress on our blob storage (see the image below). The period is roughly 1 h, but there is a baseline of 100 kB per 5 min.
Analyzing the metrics of the storage account further, I found that there is a constant stream of 1.90k transactions per hour for blobs and 1.3k transactions per hour for queues, which seems quite exceptional to me. (Please note that the resolution of this graph is 1 h, while the former has a resolution of 5 minutes.)
Is there anything else I can do to analyze where the write operations come from? It kind of bothers me, since it does not seem as if it's supposed to be like that.
I've had the exact same problem; after enabling Storage Analytics and inspecting the $logs container, I found many log entries indicating that upon every request towards my Azure Functions, write operations occur against the following container object:
https://[function-name].blob.core.windows.net:443/azure-webjobs-hosts/locks/linkfunctions/host?comp=lease
In my Azure Functions code I do not explicitly write to any container or file, but I do have the following two Application Settings configured:
AzureWebJobsDashboard
AzureWebJobsStorage
So I filed a support ticket in Azure with the following questions:
Are the write operations triggered by these application settings? I believe so, but could you please confirm?
Will the write operations stop if I delete these application settings?
Could you please describe, at a high level, in what context these operations occur (e.g. logging? resource locking? other?)
and I got the following answers from the Azure support team, respectively:
Yes, you are right. According to the log information, we can see "https://[function-name].blob.core.windows.net:443/azure-webjobs-hosts/locks/linkfunctions/host?comp=lease".
This azure-webjobs-hosts container is associated with the function app and is created by default when the function app is created. While the function app is running, it records these logs in the storage account configured with AzureWebJobsStorage.
You can't stop the write operations, because they record logs the Azure Functions runtime needs in that storage account. Please do not remove the application setting AzureWebJobsStorage. The Azure Functions runtime uses this storage account connection string for all functions except HTTP-triggered functions, and removing this setting will leave your function app unable to start. By the way, you can remove AzureWebJobsDashboard; that will stop the Dashboard monitoring rather than the operations above.
These operations record runtime logs of the function app. They occur when our backend allocates an instance for running the function app.
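To see these lease writes for yourself, a small sketch along these lines scans the Storage Analytics $logs container for operations against azure-webjobs-hosts. It assumes classic Storage Analytics logging is already enabled and that the connection string is available in an AZURE_STORAGE_CONNECTION_STRING environment variable:

```python
import os

from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)
logs = service.get_container_client("$logs")

# Blob-service log blobs live under the "blob/" prefix, named by date/hour.
for blob in logs.list_blobs(name_starts_with="blob/"):
    content = logs.download_blob(blob.name).readall().decode("utf-8")
    for line in content.splitlines():
        # Analytics log lines are semicolon-delimited; keep the ones that
        # touch the azure-webjobs-hosts container (e.g. the lock lease renewals).
        if "azure-webjobs-hosts" in line:
            print(line)
```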
The best place to find information about storage usage is Storage Analytics, especially Storage Analytics Logging.
There's a special blob container called $logs in the same storage account, which has detailed information about every operation performed against that storage account. You can view the blobs in that container to find this information.
If you don't see this blob container in your storage account, you will need to enable storage analytics on your storage account. However, considering you can see the metrics data, my guess is that it is already enabled.
Regarding the source of these write operations, have you enabled diagnostics for your Functions and App Service? These write diagnostic logs to blob storage. Storage analytics itself also writes to the same account, and that will also cause write operations.
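If analytics is not yet enabled, a minimal sketch for turning on classic Storage Analytics logging for the blob service with the azure-storage-blob Python package could look like this (the 7-day retention is just an example value); azure-storage-queue exposes an analogous QueueAnalyticsLogging for the queue service:

```python
import os

from azure.storage.blob import BlobAnalyticsLogging, BlobServiceClient, RetentionPolicy

service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)
# Enable read/write/delete logging so the $logs container starts filling up.
service.set_service_properties(
    analytics_logging=BlobAnalyticsLogging(
        read=True,
        write=True,
        delete=True,
        retention_policy=RetentionPolicy(enabled=True, days=7),  # example retention
    )
)
```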
In my case, I had Azure Application Insights causing 10K transactions per minute on the storage account for my Functions and App Services, even though there were only a few HTTP requests among them. I'm not sure what triggered them, but once I removed Application Insights, everything returned to normal.

What is the mechanism that prevents a scaled-out Azure Function from being triggered by the same blob multiple times?

Scenario:
An Azure Function hosted on an App Service plan and scaled out to 5 instances. The function is blob-triggered.
Question:
Is there any documentation that explains the mechanism that prevents a scaled-out Azure Function from processing the same blob multiple times? I am asking because more than one instance of the function is running.
I agree with @Peter; here is my understanding for reference, correct me if it doesn't make sense.
Blob trigger mechanism information is stored in the Azure storage account of our Function app (defined by the app setting AzureWebJobsStorage). Locks live in a blob container named azure-webjobs-hosts, and there's a queue azure-webjobs-blobtrigger-<FunctionAppName> for internal use.
See another part in the same comment.
Normally only 1 of N host instances is scanning for new blobs (based on a singleton host id lock). When it finds a new blob it adds a queue message for it and one of the N hosts processes it.
So in the first step, scanning for new blobs, the scale-out feature doesn't participate. The singleton host id lock is implemented with a blob lease, as @Peter mentioned (check the blob locks/<FunctionAppName>/host in azure-webjobs-hosts).
Once the internal queue starts receiving messages about new blobs, the scale-out feature begins to work, as host instances fetch and process messages together. While a blob message is being processed it can't be seen by other instances, and it is deleted afterwards.
Besides, to ensure that a blob that has already been processed never triggers the function again later (e.g. in the next round of scanning), another mechanism is used: blob receipts.
As far as I can tell blob leases are used.
It is backed by this comment made by an MS engineer working on the Azure Functions team.
The singleton mechanism used under the covers to ensure only one host processes a blob is based on the HostId. In regular scale out scenarios, the HostId is the same for all instances, so they collaborate via blob leases behind the scenes using the same lock blob scoped to the host id.
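For the curious, a rough sketch like the following (Python, azure-storage-blob and azure-storage-queue) can be used to peek at these internal artifacts in the account behind AzureWebJobsStorage; the blob-trigger queue name uses a placeholder function app name that you would substitute with your own:

```python
import os

from azure.storage.blob import BlobServiceClient
from azure.storage.queue import QueueClient

conn_str = os.environ["AzureWebJobsStorage"]

hosts = BlobServiceClient.from_connection_string(conn_str).get_container_client(
    "azure-webjobs-hosts"
)

# Singleton host lock blobs used for the blob scan.
for blob in hosts.list_blobs(name_starts_with="locks/"):
    print("lock:", blob.name)

# Blob receipts recording which blobs have already been processed.
for blob in hosts.list_blobs(name_starts_with="blobreceipts/"):
    print("receipt:", blob.name)

# Peek (without dequeuing) at the internal queue that fans blob work out
# to the host instances. "<functionappname>" is a placeholder.
queue = QueueClient.from_connection_string(
    conn_str, queue_name="azure-webjobs-blobtrigger-<functionappname>"
)
for msg in queue.peek_messages(max_messages=5):
    print("queued blob message:", msg.content)
```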

Is it possible to send messages to an Azure Service Bus Queue or Event Hub from a U-SQL script?

Is it possible for an Azure U-SQL script to put messages on an Azure Service Bus Queue or an Azure Event Hub? Please cite some documentation, if you can find it (since I can't find it).
As stated, this is not allowed.
A possible workaround would be to have the U-SQL script output a file with messages to blob storage and have an Azure Function pick those up and send them to an Azure Service Bus Queue or an Azure Event Hub.
I got my answer here.
U-SQL scripts cannot access any external services, including Azure services such as web apps (with only a few exceptions, like ADLS and WASB storage). This is to prevent an unintended DDoS attack, since U-SQL will automatically scale that request across potentially hundreds or thousands of nodes, all running over potentially millions of rows and issuing requests simultaneously. Please see Michael Rys' answer here for more information.
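As an illustration of the workaround mentioned above, a blob-triggered Azure Function in Python might look roughly like this; the blob binding name, the "usql-messages" queue name and the SERVICE_BUS_CONNECTION app setting are assumptions, and the function.json binding definition is omitted:

```python
import os

import azure.functions as func
from azure.servicebus import ServiceBusClient, ServiceBusMessage


def main(usqlblob: func.InputStream):
    # Each line of the U-SQL output file becomes one Service Bus message.
    lines = usqlblob.read().decode("utf-8").splitlines()

    client = ServiceBusClient.from_connection_string(
        os.environ["SERVICE_BUS_CONNECTION"]
    )
    with client, client.get_queue_sender("usql-messages") as sender:
        for line in lines:
            sender.send_messages(ServiceBusMessage(line))
```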

Is it possible to create a public queue in Windows Azure?

In Windows Azure it's possible to create public Blob Container. Such a container can be accessed by anonymous clients via the REST API.
Is it also possible to create a publicly accessible Queue?
The documentation for the Create Container operation explains how to specify the level of public access (with the x-ms-blob-public-access HTTP header) for a Blob Container. However, the documentation for the Create Queue operation doesn't list a similar option, leading me to believe that this isn't possible - but I'd really like to be corrected :)
At this time, Azure Queues cannot be made public.
As you have noted, this "privacy" is enforced by requiring all Storage API calls made against queues to be authenticated with a request signed with your key. There is no "public" concept similar to public containers in blob storage.
This would follow best practice in that, even in the cloud, you would not want to expose the internals of your infrastructure to the outside world. If you wanted to achieve this functionality, you could expose a very thin/simple "layer" app on top of queues. A simple WCF REST app in a web role could expose the queuing operations to your consumers, but handle the signing of API requests internally so you would not need the queues to be public.
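As an illustration of that thin-layer idea (using Flask and the azure-storage-queue Python package rather than the WCF web role suggested above), anonymous clients could POST to a small web app that signs the actual queue calls with the account key it keeps server-side; the endpoint and queue name below are made up for the sketch:

```python
import os

from azure.storage.queue import QueueClient
from flask import Flask, request

app = Flask(__name__)
queue = QueueClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"],  # account key stays server-side
    queue_name="public-intake",                     # illustrative queue name
)


@app.route("/messages", methods=["POST"])
def enqueue():
    # Accept the raw request body from an anonymous caller and enqueue it
    # using the credentials held by this app, not by the caller.
    queue.send_message(request.get_data(as_text=True))
    return "", 202
```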
You are right, Azure Storage queues aren't publicly accessible the way blobs (via their URIs) are. However, you may still be able to achieve a publicly consumable messaging infrastructure with the AppFabric Service Bus.
I think the best option would be to set up a worker role and provide access to the queue publicly in that manner, maybe with AppFabric Service Bus for extra connectivity/interactivity with external sources.
Otherwise, it's not really clear what the scope might be. The queue itself appears to be locked away at this time. :(
