Application Insights dependency failures due to Table Storage concurrency exception - Azure

I have a background process that updates data in Table Storage. Under certain concurrency conditions, updates to the table fail with status code 409 ("precondition failed"), and this is handled in the background process. Even though the exception is handled in code, it still appears in the Application Insights Dependencies view as a failure. Is there a better way to handle these exceptions so that they don't appear as failures, given that they are already handled in the code?

If you are using Azure Table Storage together with Application Insights, you may see conflict errors such as 409.
A common pattern with the Azure.Data.Tables SDK is to check whether the table exists and create it if it does not.
In that case, CreateIfNotExists() simply attempts to create the table and suppresses the error when the table already exists. Even though your code never sees the error, the underlying 409 response is still recorded by Application Insights as a failed dependency call.
To avoid the dependency failure, you can do one of the following:
Configure a telemetry processor/filter in Application Insights that excludes these handled events.
Manually check whether the table exists before your background script tries to create it, so the 409 is never issued in the first place.
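One common way to filter such events is a custom ITelemetryProcessor (classic Application Insights SDK for .NET) that drops dependency telemetry for handled 409/412 responses. This is a sketch; the `Type` string to match ("Azure table") and the class name are assumptions you should verify against the telemetry you actually see:

```csharp
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

// Drops Azure Table dependency telemetry for handled 409 (conflict) or
// 412 (precondition failed) responses so they no longer show as failures.
public class Filter409Processor : ITelemetryProcessor
{
    private readonly ITelemetryProcessor _next;

    public Filter409Processor(ITelemetryProcessor next) => _next = next;

    public void Process(ITelemetry item)
    {
        if (item is DependencyTelemetry dep &&
            dep.Type == "Azure table" &&
            (dep.ResultCode == "409" || dep.ResultCode == "412"))
        {
            return; // swallow: the application handles these itself
        }
        _next.Process(item);
    }
}
```

The processor can be registered via `TelemetryConfiguration`'s `TelemetryProcessorChainBuilder` (`builder.Use(next => new Filter409Processor(next)); builder.Build();`). If you would rather keep the events but stop them counting as failures, set `dep.Success = true` instead of dropping them.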
References
Failing Dependencies from Application Insights logging
Conflict Error while using Application Insights and Table Storage

Related

Azure DocumentDB create/query document dependency error in Application Insights

I'm currently seeing many Dependency errors in Azure application insights, and I'm having trouble determining the root cause.
I currently have an API deployed as an app service within azure. The API is connected to a CosmosDB account for basic CRUD operations. While monitoring the default application insights, I've run across several Dependency Errors:
Type: Azure DocumentDB
Name: Create/Querydocument
Call status: false
res: undefined
This behavior seems to be very intermittent (maybe a problem with concurrency), but does not seem to actually be causing API errors as the query itself still appears to be completed successfully. Any thoughts on the root cause of the issue, or how to get details regarding the error specifically would be greatly appreciated.
Here is a screenshot of the end-to-end transaction for reference:
Dependency Error
Is your app running on Windows? Is it compiled as X64/Release?
The "failure" is related to this: https://learn.microsoft.com/azure/cosmos-db/sql/performance-tips-query-sdk?tabs=v3&pivots=programming-language-csharp#use-local-query-plan-generation
Your app seems to be performing cross-partition queries. When the SDK is not running on Windows, is not built as x64, or not all the DLLs that come with the NuGet package are copied, it needs to make an HTTP request to obtain the query plan.
What you are seeing is the SDK retrying the query-plan request because, for some reason, you are experiencing high latency (500 ms is quite high for an HTTP request).
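If the app is in fact not being built as x64, one fix could be forcing the platform target in the project file (a sketch; verify against your own build setup, and also check that the App Service "Platform" setting is 64 Bit):

```xml
<!-- In the API's .csproj: build as x64 so the Cosmos SDK can use the
     native ServiceInterop DLL for local query-plan generation. -->
<PropertyGroup>
  <PlatformTarget>x64</PlatformTarget>
</PropertyGroup>
```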

Blob trigger affecting application insight logging in azure functions

I have two azure functions that exist in the same azure function app and they are both connected to the same instance of application insights:
TimerFunction uses a TimerTrigger and executes every 60 seconds and logs each log type for testing purposes.
BlobFunction uses a BlobTrigger and its functionality is irrelevant for this question.
It appears that when BlobFunction is enabled (it isn't being triggered, by the way), it clogs up Application Insights with polling messages, as I don't receive some of the log messages written in TimerFunction. If I disable BlobFunction, then the logs I see in the development tools monitor for TimerFunction are all there.
This is shown in the screenshot below. TimerFunction and BlobFunction were both running until I disabled BlobFunction at 20:24, where you can clearly see the logs working "normally", then at 20:26 I re-enabled BlobFunction and the logs written by TimerFunction are again intermittent, and missing my own logged info.
Here is the sample telemetry from the live metrics tab:
Am I missing something glaringly obvious here? What is going on?
FYI: My host.json file does not set any log levels, I took them all out in the process of testing this and it is currently a near-skeleton. I also changed the BlobFunction to use a HttpTrigger instead, and the issue disappeared, so I'm 99% certain it's because of the BlobTrigger.
EDIT:
I tried to add an Event Grid trigger instead, as Peter Bons suggested, but my resource group shows no storage account for some reason. Neither the approach in the linked article nor the one in this video (https://www.youtube.com/watch?v=0sEzimJYhME&list=WL) works for me. The options are just different, as shown below:
It is normal behavior that the polling clutters your logs. You can of course set a log level in host.json to filter out those messages, though you might lose some valuable other logging as well.
As for possibly missing telemetry: it could very well be that some logs are dropped due to sampling, which is enabled by default. I would also not be surprised if some logging is simply not shown in the portal. I've personally experienced logging being delayed by up to 10 minutes, or not available at all, on the Azure Functions log page in the portal. Try a direct query in Application Insights as well.
Or you can go directly to the Application Insights resource and create some queries yourself that filter out those messages, using Search or Logs.
The other option is not to rely on polling with the BlobTrigger, but instead use an Event Grid trigger that invokes the function once a blob is added. Here is an example of calling a function when an image is uploaded to an Azure Storage blob container. Because there is no polling involved, this is a much more efficient way of reacting to storage events.
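As a sketch, the host.json filtering and sampling settings mentioned above could look like this (the category names and levels are illustrative; adjust them to the categories you actually see in your logs):

```json
{
  "version": "2.0",
  "logging": {
    "logLevel": {
      "default": "Information",
      "Host.Aggregator": "Warning"
    },
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": false
      }
    }
  }
}
```

Disabling sampling increases telemetry volume (and cost), so treat it as a diagnostic step rather than a permanent setting.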

ASP.NET WebApp in Azure using lots of CPU

We have a long running ASP.NET WebApp in Azure which has no real endpoints exposed – it serves a single functional purpose primarily reading and manipulating database data, effectively a batched, scheduled task, triggered by a timer every 30 seconds.
The app runs fine most of the time, but we are seeing occasional issues where the CPU load for the app jumps close to the maximum for the App Service Plan instantaneously rather than gradually, and the app stops executing any further timer triggers. We cannot find anything in the executing code to account for it (no signs of deadlocks, and all code paths have try/catch, so there should be no unhandled exceptions). More often than not we see errors getting a connection to a database, but it's not clear whether those are cause or symptom.
Note, this is the only resource within the AppService Plan. The Azure SQL database is in the same region and whilst utilised by other apps is very lightly used by them and they also exhibit none of the issues seen by the problem app.
It feels like this is infrastructure related but we have been unable to find anything to explain what is happening so if anyone has any suggestions for where we should be looking they would be gratefully received. We have enabled basic Application Insights (not SDK) but other than seeing CPU load spike prior to loss of app response there is little information of interest given our limited knowledge of how to best utilise Insights.
Based on your description, I can think of two troubleshooting steps. First, track the running state of your program through logging: write a log entry at the beginning and end of each batch run to record the status of each execution. If possible, also record request and response information. This gives you a complete record of when each task ran and how it finished.
Second, log before the program starts any database operation, including whether the database connection succeeded. Ideally, record which operation was running when the CPU load spiked, so you can analyze specifically what causes the database connection failures.
Because you cannot reproduce the problem, you can only narrow down the cause. If the two steps above still don't reveal where the problem is, adjust your timer so the program triggers once every 5 minutes instead of every 30 seconds, and see whether the behavior changes.
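The start/end logging suggested above could be sketched like this (ILogger-based; the class and message names are assumptions):

```csharp
using System;
using System.Diagnostics;
using Microsoft.Extensions.Logging;

public class BatchJob
{
    private readonly ILogger<BatchJob> _log;

    public BatchJob(ILogger<BatchJob> log) => _log = log;

    // Called by the 30-second timer trigger.
    public void Run()
    {
        var sw = Stopwatch.StartNew();
        _log.LogInformation("Batch run starting at {StartUtc}", DateTimeOffset.UtcNow);
        try
        {
            _log.LogInformation("Opening database connection");
            // ... open connection, read and manipulate data ...
        }
        catch (Exception ex)
        {
            // Record failures with elapsed time so spikes can be correlated.
            _log.LogError(ex, "Batch run failed after {Elapsed}", sw.Elapsed);
            throw;
        }
        _log.LogInformation("Batch run finished in {Elapsed}", sw.Elapsed);
    }
}
```

With entries like these flowing into Application Insights, you can correlate the CPU spike's timestamp with the last operation that started but never logged a "finished" entry.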

Performance impact of writing Azure diagnostic logs to blob storage

Our C# web app, running on Azure, uses System.Diagnostics.Trace to write trace statements for debugging/troubleshooting. Once we enable blob storage for these logs (using the "Application Logging (blob)" option in the Azure portal), the response time for our application slows down considerably. If I turn this option off, the web app speeds up again (though obviously we don't get logs in blob storage anymore).
Does anyone know if this is expected? We certainly write a lot of trace statements on every request (100 or so per request), but I would not think this is unusual for a web application. Is there some way to diagnose why enabling blob storage for the logs dramatically slows down the execution of these trace statements? Is writing the trace statement synchronous with the logs being updated in blob storage, for instance?
I was unable to find any information about how logging to blob storage in Azure was implemented. However, this is what I was able to deduce:
I confirmed that disabling the global lock had no effect. Therefore, the performance problem was not directly related to lock contention.
I also confirmed that if I turn AutoFlush off, the performance problem did not occur.
From further cross-referencing the source code of the .NET trace API, my conclusion is that enabling blob storage for logs injects a trace listener into your application (the same way you might add a listener in web.config), and that listener synchronously writes every trace statement it receives to blob storage.
As such, it seems that there are a few ways to workaround this behavior:
Don't turn on AutoFlush, but manually flush periodically. This will prevent the synchronous blob writes from interrupting every log statement.
Write your own daemon that will periodically copy local log files to blob storage or something like this
Don't use this blob storage feature at all but instead leverage the tracing functionality in Application Insights.
I ended up doing #3 because, as it turns out, we already had Application Insights configured and on, we just didn't realize it could handle trace logging and querying. After disabling sampling for tracing events, we now have a way to easily query for any log statement remotely and get the full set of traces subject to any criteria (keyword match, all traces for a particular request, all traces in a particular time period, etc.) Moreover, there is no noticeable synchronous overhead to writing log statements with the Application Insights trace listener, so nothing in our application has to change (we can continue using the .NET trace class). As a bonus, since Application Insights tracing is pretty flexible with the tracing source, we can even switch to another higher performance logging API (e.g. ETW or log4net) if needed and Application Insights still works.
Ultimately, you should consider using Application Insights for storing and querying your traces. Depending on why you wanted your logs in blob storage in the first place, it may or may not meet your needs, but it worked for us.
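For reference, workaround #1 above (disable AutoFlush and flush manually on a schedule) could be sketched like this; the 30-second interval is an arbitrary assumption:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Disable synchronous flushing so each Trace.WriteLine call does not
// block on the injected blob-storage listener.
Trace.AutoFlush = false;

// Flush buffered trace output every 30 seconds on a background timer.
var flushTimer = new Timer(
    _ => Trace.Flush(),
    state: null,
    dueTime: TimeSpan.FromSeconds(30),
    period: TimeSpan.FromSeconds(30));
```

The trade-off is that up to one interval's worth of trace output can be lost if the process crashes before the next flush.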

What happens to Azure diagnostic information when a role stops?

When an Azure worker role stops (either because of an unhandled exception or because Run() finishes), what happens to local diagnostic information that has not yet been transferred? Microsoft documentation says diagnostics are transferred to storage at scheduled intervals or on demand, neither of which can cover an unhandled exception. Does this mean diagnostic information is always lost in this case? This seems particularly odd because crash dumps are part of the diagnostic data (set up by default in DiagnosticMonitorConfiguration.Directories). How then can you ever get a crash dump back (related to this question)?
To me it would be logical if diagnostics were also transferred when a role terminates, but this is not my experience.
It depends on what you mean by 'role stops'. The Diagnostic Monitor in SDK 1.3 and later is implemented as a background task that has no dependency on the RoleEntryPoint. So, if you mean your RoleEntryPoint is reporting itself as unhealthy or something like that, then your DiagnosticMonitor (DM) will still be responsive and will send data according to the configuration you have setup.
However, if you mean that a role stop is a scale down operation (shutting down the VM), then no, there is no flush of the data on disk. At that point, the VM is shutdown and the DM with it. Anything not already flushed (transferred) can be considered lost.
If you are only rebooting the VM, then in theory you will be connected back to the same resource VHDs that hold the buffered diagnostics data so you would not lose it, it would be transferred on next request. I am pretty sure that sticky storage is enabled on it, so it won't be cleaned on reboot.
HTH.
The diagnostic data is stored locally before it is transferred to storage. So that information is available to you there; you can review/verify this by using RDP to check it out.
I honestly have not tested to see if it gets transferred after the role stops. However, you can request transfers on demand. So using that approach, you could request the logs/dumps to be transferred one more time after the role has stopped.
I would suggest checking out a tool like Cerebrata Azure Diagnostics Manager to request on demand transfer of your logs, and also analyze the data.
I answered your other question as well. Part of my answer was to add the event that would allow you to change your logging and transfer settings on the fly.
Hope this helps
I think it works like this: local diagnostic data is stored in the local storage named "DiagnosticStore", which I guess has cleanOnRoleRecycle set to false. (I don't know how to verify this last bit - LocalResource has no corresponding attribute.) When the role is recycled that data remains in place and will eventually be uploaded by the new diagnostic monitor (assuming the role doesn't keep crashing before it can finish).
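If you want to control this explicitly rather than rely on the default, the local resource can, as far as I know, be declared in ServiceDefinition.csdef with cleanOnRoleRecycle="false". This is a sketch; the role name and size are assumptions:

```xml
<!-- ServiceDefinition.csdef: keep buffered diagnostics data across
     role recycles so the new diagnostic monitor can still upload it. -->
<WorkerRole name="MyWorkerRole">
  <LocalResources>
    <LocalStorage name="DiagnosticStore"
                  cleanOnRoleRecycle="false"
                  sizeInMB="8192" />
  </LocalResources>
</WorkerRole>
```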