I have two Azure Functions that live in the same Azure Function App, and they are both connected to the same instance of Application Insights:
TimerFunction uses a TimerTrigger and executes every 60 seconds and logs each log type for testing purposes.
BlobFunction uses a BlobTrigger and its functionality is irrelevant for this question.
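The question doesn't include code, but for context, a minimal in-process C# sketch of the timer side of this setup might look like the following (the function name and 60-second schedule come from the description above; everything else is illustrative):

```csharp
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class TimerFunction
{
    // Runs every 60 seconds and writes one entry per log level,
    // purely to see which entries end up in Application Insights.
    [FunctionName("TimerFunction")]
    public static void Run([TimerTrigger("0 */1 * * * *")] TimerInfo timer, ILogger log)
    {
        log.LogTrace("TimerFunction trace");
        log.LogDebug("TimerFunction debug");
        log.LogInformation("TimerFunction information");
        log.LogWarning("TimerFunction warning");
        log.LogError("TimerFunction error");
        log.LogCritical("TimerFunction critical");
    }
}
```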
It appears that when BlobFunction is enabled (it isn't being triggered, by the way), it clogs up Application Insights with polling telemetry, and I don't receive some of the log messages written by TimerFunction. If I disable BlobFunction, the logs I see in the development tools monitor for TimerFunction are all there.
This is shown in the screenshot below. TimerFunction and BlobFunction were both running until I disabled BlobFunction at 20:24, after which you can clearly see the logs working "normally"; then at 20:26 I re-enabled BlobFunction and the logs written by TimerFunction are again intermittent and missing my own logged info.
Here is the sample telemetry from the live metrics tab:
Am I missing something glaringly obvious here? What is going on?
FYI: My host.json file does not set any log levels; I took them all out in the process of testing this and it is currently a near-skeleton. I also changed BlobFunction to use an HttpTrigger instead and the issue disappeared, so I'm 99% certain it's because of the BlobTrigger.
EDIT:
I tried to add an Event Grid trigger instead, as Peter Bons suggested, but my resource group shows no storage account for some reason. Neither the approach in the linked article nor the one in this video (https://www.youtube.com/watch?v=0sEzimJYhME&list=WL) works for me; the options I see are just different, as shown below:
It is normal behavior that the polling is cluttering your logs. You can of course set a log level in host.json to filter out those messages, though you might lose some valuable other logging as well.
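A minimal host.json sketch of that kind of filter (the category names are illustrative; match them to whatever categories actually show up in your telemetry):

```json
{
  "version": "2.0",
  "logging": {
    "logLevel": {
      "default": "Information",
      "Host": "Warning",
      "Function": "Information"
    }
  }
}
```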
As for the possibly missing telemetry: it could very well be that some logs are dropped due to sampling, which is enabled by default. I would also not be surprised if some logging simply isn't shown in the portal. I've personally experienced logging being delayed by up to 10 minutes, or not available at all, on the Azure Functions log page in the portal. Try a direct query in Application Insights as well.
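Sampling is also controlled from host.json; here is a sketch that turns it off entirely while you are diagnosing (recent Functions runtimes call the property samplingSettings, some older ones call it sampling):

```json
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": false
      }
    }
  }
}
```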
Or you can go directly to the App Insights resource and create some queries yourself that filter out those messages using Search or Logs.
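For example, a query against the traces table that keeps only TimerFunction's own log statements might look like this (a sketch; the Category custom dimension is how the Functions host tags user logs, so adjust it if your dimensions look different):

```kusto
traces
| where timestamp > ago(1h)
| where tostring(customDimensions.Category) startswith "Function.TimerFunction"
| order by timestamp desc
```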
The other option is to not rely on polling via the BlobTrigger but instead use an Event Grid trigger that invokes the function once a blob is added. Here is an example of calling a function when an image is uploaded to an Azure Storage blob container. Because there is no polling involved, this is a much more efficient way of reacting to storage events.
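A minimal sketch of such an Event Grid triggered function, assuming the in-process C# model with the Microsoft.Azure.WebJobs.Extensions.EventGrid extension (the function name and log message are illustrative; newer versions of the extension use Azure.Messaging.EventGrid.EventGridEvent instead of the type shown here):

```csharp
using Microsoft.Azure.EventGrid.Models;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.EventGrid;
using Microsoft.Extensions.Logging;

public static class OnBlobCreated
{
    // Invoked only when Event Grid pushes a BlobCreated event for the container,
    // so the Functions host does no storage polling at all.
    [FunctionName("OnBlobCreated")]
    public static void Run([EventGridTrigger] EventGridEvent eventGridEvent, ILogger log)
    {
        log.LogInformation("Blob created: {subject}", eventGridEvent.Subject);
    }
}
```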
I have a long-running (Node.js) orchestrator in an Azure Function App that calls a couple hundred activity functions, sometimes with a group of 5 or so running in parallel via context.df.Task.all. I find that it will run steadily for about two hours, then the function app itself seems to abruptly stop: the logs stop appearing in the log stream, and the database records that the activity functions are supposed to be writing stop being written. There are no exceptions in the logs. It will remain paused or stalled like this indefinitely... until I restart the function app. Then it will come back to life and resume where it stopped, run for a while, and then stop again.
Does this behavior sound familiar to anyone?
Should I update the extension bundle to [4.0.0, 5.0.0)?
Could my storage account be the problem? Should I create a new one?
We are using the Premium plan. Could I be running up against a limit of some kind? If so, what is it, and what should I tell the IT team to increase?
As far as I know,
Should I update the extension bundle to [4.0.0, 5.0.0)?
I believe this issue is not related to extension bundles. An extension bundle is just a versioned, pre-built set of compatible binding extensions, libraries, and packages that is installed based on the version of the Function App; it governs which binding extensions are available to the app, not how long-running executions behave.
If a timeout value is defined in host.json, set it to unbounded (-1), since the function project is deployed to the Premium plan and can therefore support longer function execution durations.
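A sketch of what that looks like in host.json (only the timeout property matters here):

```json
{
  "version": "2.0",
  "functionTimeout": "-1"
}
```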
Could my storage account be the problem? Should I create a new one?
Instead of creating a new account, you can increase the capacity quota of the storage account (up to 5 PiB).
If the storage account is a concern, make sure that the function app and the storage account are in the same region to reduce latency.
Also, in a production environment it is better to allocate a separate storage account for each Azure Function App.
We are using the Premium plan. Could I be running up against a limit of some kind? If so, what is it, and what should I tell the IT team to increase?
Also, you mentioned in the question that the function app stalls with no executions and resumes from where it paused after a restart. Microsoft has documented cases where even long-running functions hosted on the Premium plan stop executing in exactly this way.
Refer to the MS Doc for more information.
I posted this on github earlier, but hoping I would get an answer here from the wider community.
In the Azure portal, one can look at live logs of a Function App in the Log Streaming tab. I have noticed that this often doesn't work for weeks at a time, and I am wondering if I am doing something obviously wrong. More details below:
I have a function that receives messages from a Service Bus. I am able to see the logs in Application Insights, and I can see that it's processing the requests as expected. The problem is that I don't see any logs in the "Log Streaming" tab in the portal. See image below:
The above image indicates that no lines were logged between 6:17:46 and 6:18:46. However, see the below image from application insights logs, and you can see there were clearly several requests that were processed during this time (and several log lines written).
I tried Edge browser and Chrome, and also private tab in Edge, but I see the same behavior.
Note that I end up seeing this for extended periods of time, and then sometimes it resolves itself. For example, I noticed it not working for weeks in June. Then, surprisingly, I saw it work for at least a week at the beginning of July. But now it is not working again.
Also, I see this behavior across all the function apps that I currently have. (So it's not limited to just one app).
I am using Azure Functions v2 .net core, C#.
I run a system on top of a bunch of Azure Functions and I'm just tidying up some last threads. I mostly abandoned the logging provided out of the box by Azure Functions because I found the flush timings to be super irregular, and I also wanted to consolidate the logs from all of my functions into one spot and be able to query them. This all works for the most part, but I have one annoying use case remaining: if a function binding is faulty (e.g. the Azure Function method signature is wrong because someone checked garbage into Git), the function won't be invoked, its own log won't even be written, and the error will instead be placed into a different file (the host log).
Now I guess I can just access the storage account that backs the Azure Function and pull the host log from there, but I was wondering if there is a better means of directly controlling/intercepting the logging in Azure Functions. Does anyone know if there is at least a way of getting it to flush more quickly?
You can see host logs as well as function logs in the associated Application Insights resource:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-monitoring#other-categories
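For example, a query like the following (a sketch; the Category custom dimension is how the Functions host tags its own log categories) surfaces the host-level entries alongside your function logs:

```kusto
traces
| where timestamp > ago(1h)
| where tostring(customDimensions.Category) startswith "Host."
| order by timestamp desc
```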
Our C# web app, running on Azure, uses System.Diagnostics.Trace to write trace statements for debugging/troubleshooting. Once we enable blob storage for these logs (using the "Application Logging (blob)" option in the Azure portal), the response time of our application slows down considerably. If I turn this option off, the web app speeds up again (though obviously we no longer get logs in blob storage).
Does anyone know if this is expected? We certainly write a lot of trace statements on every request (100 or so per request), but I would not think this is unusual for a web application. Is there some way to diagnose why enabling blob storage for the logs dramatically slows down the execution of these trace statements? Is writing a trace statement synchronous with the logs being updated in blob storage, for instance?
I was unable to find any information about how logging to blob storage in Azure was implemented. However, this is what I was able to deduce:
I confirmed that disabling the global lock had no effect. Therefore, the performance problem was not directly related to lock contention.
I also confirmed that if I turn AutoFlush off, the performance problem did not occur.
From further cross-referencing the source code of the .NET trace API, my conclusion is that when you enable blob storage for logs, a trace listener is injected into your application (the same way you might add a listener in web.config), and it synchronously writes every trace statement it receives to blob storage.
As such, there seem to be a few ways to work around this behavior:
Don't turn on AutoFlush; instead, flush manually on a periodic basis (see the sketch after this list). This prevents the synchronous blob writes from interrupting every log statement.
Write your own daemon that periodically copies local log files to blob storage, or something similar.
Don't use this blob storage feature at all but instead leverage the tracing functionality in Application Insights.
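For option 1, a minimal sketch of what the periodic flush could look like (the class name and interval are illustrative):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class TraceFlusher
{
    private static Timer _flushTimer;

    public static void Start()
    {
        // Keep AutoFlush off so individual Trace calls are not each followed
        // by a (potentially slow) synchronous write to blob storage.
        Trace.AutoFlush = false;

        // Flush the buffered trace output on a fixed interval instead.
        _flushTimer = new Timer(_ => Trace.Flush(), null,
            TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(30));
    }
}
```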
I ended up doing #3 because, as it turns out, we already had Application Insights configured and on, we just didn't realize it could handle trace logging and querying. After disabling sampling for tracing events, we now have a way to easily query for any log statement remotely and get the full set of traces subject to any criteria (keyword match, all traces for a particular request, all traces in a particular time period, etc.) Moreover, there is no noticeable synchronous overhead to writing log statements with the Application Insights trace listener, so nothing in our application has to change (we can continue using the .NET trace class). As a bonus, since Application Insights tracing is pretty flexible with the tracing source, we can even switch to another higher performance logging API (e.g. ETW or log4net) if needed and Application Insights still works.
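If it helps, the wiring for option 3 boils down to registering the Application Insights trace listener. Normally the Microsoft.ApplicationInsights.TraceListener NuGet package does this through web.config; the in-code equivalent below is just a sketch:

```csharp
using System.Diagnostics;
using Microsoft.ApplicationInsights.TraceListener;

public static class TracingSetup
{
    public static void Configure()
    {
        // Route everything written via System.Diagnostics.Trace to
        // Application Insights as trace telemetry.
        Trace.Listeners.Add(new ApplicationInsightsTraceListener());
        Trace.TraceInformation("Tracing now also flows to Application Insights.");
    }
}
```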
Ultimately, you should consider using Application Insights for storing and querying your traces. Depending on why you wanted your logs in blob storage in the first place, it may or may not meet your needs, but it worked for us.
When an Azure worker role stops (either because of an unhandled exception or because Run() finishes), what happens to local diagnostic information that has not yet been transferred? Microsoft documentation says diagnostics are transferred to storage at scheduled intervals or on demand, neither of which can cover an unhandled exception. Does this mean diagnostic information is always lost in this case? This seems particularly odd because crash dumps are part of the diagnostic data (set up by default in DiagnosticMonitorConfiguration.Directories). How then can you ever get a crash dump back (related to this question)?
To me it would be logical if diagnostics were also transferred when a role terminates, but this is not my experience.
It depends on what you mean by 'role stops'. The Diagnostic Monitor in SDK 1.3 and later is implemented as a background task that has no dependency on the RoleEntryPoint. So, if you mean your RoleEntryPoint is reporting itself as unhealthy or something like that, then your Diagnostic Monitor (DM) will still be responsive and will send data according to the configuration you have set up.
However, if you mean that a role stop is a scale down operation (shutting down the VM), then no, there is no flush of the data on disk. At that point, the VM is shutdown and the DM with it. Anything not already flushed (transferred) can be considered lost.
If you are only rebooting the VM, then in theory you will be connected back to the same resource VHDs that hold the buffered diagnostics data so you would not lose it, it would be transferred on next request. I am pretty sure that sticky storage is enabled on it, so it won't be cleaned on reboot.
HTH.
The diagnostic data is stored locally before it is transferred to storage. So that information is available to you there; you can review/verify this by using RDP to check it out.
I honestly have not tested to see if it gets transferred after the role stops. However, you can request transfers on demand. So using that approach, you could request the logs/dumps to be transferred one more time after the role has stopped.
I would suggest checking out a tool like Cerebrata Azure Diagnostics Manager to request on demand transfer of your logs, and also analyze the data.
I answered your other question as well. Part of my answer was to add the event that would allow you to change your logging and transfer settings on the fly.
Hope this helps
I think it works like this: local diagnostic data is stored in the local storage named "DiagnosticStore", which I guess has cleanOnRoleRecycle set to false. (I don't know how to verify this last bit - LocalResource has no corresponding attribute.) When the role is recycled that data remains in place and will eventually be uploaded by the new diagnostic monitor (assuming the role doesn't keep crashing before it can finish).