I'm currently seeing many Dependency errors in Azure application insights, and I'm having trouble determining the root cause.
I currently have an API deployed as an app service within azure. The API is connected to a CosmosDB account for basic CRUD operations. While monitoring the default application insights, I've run across several Dependency Errors:
Type: Azure DocumentDB
Name: Create/Querydocument
Call status: false
res: undefined
This behavior is very intermittent (possibly a concurrency problem), but it does not appear to cause actual API errors, since the query itself still seems to complete successfully. Any thoughts on the root cause, or on how to get more detail about the error, would be greatly appreciated.
Here is a screenshot of the end-to-end transaction for reference:
Dependency Error
Is your app running on Windows? Is it compiled as X64/Release?
The "failure" is related to this: https://learn.microsoft.com/azure/cosmos-db/sql/performance-tips-query-sdk?tabs=v3&pivots=programming-language-csharp#use-local-query-plan-generation
Your app seems to be performing cross-partition queries. When the SDK is not running on Windows, is not built as x64, or when not all of the DLLs that ship with the NuGet package are copied, it needs to make an HTTP request to obtain the query plan.
What you are seeing is the SDK retrying the query plan request because, for some reason, you are experiencing high latency (500 ms is quite high for an HTTP request).
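For illustration, here is a minimal sketch of how a query can be pinned to a single partition with the .NET SDK v3 so that no query-plan round trip is needed at all; the database, container, item type, and partition key value are placeholders, not taken from the question.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public record OrderItem(string id, string status);

public static class SinglePartitionQueryExample
{
    public static async Task QueryAsync(CosmosClient client)
    {
        // Placeholder database, container, and partition key values.
        Container container = client.GetContainer("MyDatabase", "MyContainer");

        QueryDefinition query = new QueryDefinition(
                "SELECT * FROM c WHERE c.status = @status")
            .WithParameter("@status", "active");

        FeedIterator<OrderItem> iterator = container.GetItemQueryIterator<OrderItem>(
            query,
            requestOptions: new QueryRequestOptions
            {
                // Pinning the partition key keeps the query single-partition,
                // so the SDK does not need a gateway round trip for the query plan.
                PartitionKey = new PartitionKey("customer-123")
            });

        while (iterator.HasMoreResults)
        {
            foreach (OrderItem item in await iterator.ReadNextAsync())
            {
                Console.WriteLine(item.id);
            }
        }
    }
}
```

Alternatively, running the app on Windows/x64 with all of the DLLs from the NuGet package deployed lets the SDK generate the query plan locally, as the linked article describes.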
Related
I am using Azure Application Insight on App Service
As shown in the picture taken from Application Insights, sometimes the API waits a long time before executing the real operation, which in this case happens 22 minutes later.
Any idea why it would "wait and do nothing"? This scenario only happens on and off.
Thanks
The issue may be caused by various factors, including
Please verify the Application Insights SDK version; if it is old, please upgrade it to the latest version.
It appears that ServerTelemetryChannel uses sync-over-async calls when disposing, due to async calls in long-running operations; the calling thread is also blocked by **GetAwaiter().GetResult()**. See the referenced tutorial for details.
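As a rough illustration of the sync-over-async pattern mentioned above, and of the async alternative in caller code, here is a sketch; it assumes an Application Insights SDK version recent enough to expose TelemetryClient.FlushAsync, and the class and method names are placeholders.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ApplicationInsights;

public static class TelemetryShutdown
{
    // Sync-over-async: blocking the calling thread on an async flush can
    // stall a long-running operation, similar to the behavior described above.
    public static void FlushBlocking(TelemetryClient client)
    {
        client.FlushAsync(CancellationToken.None).GetAwaiter().GetResult();
    }

    // Awaiting the flush keeps the calling thread free while telemetry is sent.
    public static async Task FlushGracefullyAsync(TelemetryClient client)
    {
        await client.FlushAsync(CancellationToken.None);
    }
}
```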
I have a background process that updates data in Table Storage. In certain conditions, due to concurrency, updates to Table Storage fail with status code 409 "precondition failed", and this is handled in the background process. Even though it is handled in the code, it appears in the Application Insights Dependencies as a failure. Is there a better way to handle these exceptions so that they don't appear as failures, since they are already handled in the code?
If you are using Azure Table Storage and Application Insights in your code, you may see some conflict errors like 409.
If you are using the Azure.Data.Tables SDK, you may be checking whether the table exists and creating it if it does not.
In that case, CreateIfNotExists() simply tries to create the table and hides the error if the table is already there. You won't notice this in your code, but you may still see the dependency failure in Application Insights, which captures this information in a log.
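For reference, a minimal sketch of that pattern, assuming the Azure.Data.Tables SDK; the connection string and table name are placeholders.

```csharp
using Azure.Data.Tables;

// Placeholder connection string and table name for illustration.
var tableClient = new TableClient("<connection-string>", "mytable");

// If the table already exists, the service answers 409 Conflict. The SDK
// swallows the error, so the code keeps working, but the failed HTTP call
// is still recorded as a dependency failure in Application Insights.
tableClient.CreateIfNotExists();
```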
To avoid the dependency failure, you can follow the steps below:
Catch and filter these expected events in Application Insights, for example with a telemetry processor (a sketch follows these steps).
Manually check whether the table exists before your background script tries to create it.
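As a sketch of the first option, assuming the Microsoft.ApplicationInsights SDK, a custom ITelemetryProcessor can mark the expected 409 responses as successful so they no longer count as failures. The class name and the exact dependency Type string here are assumptions you would need to verify against your own telemetry.

```csharp
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

public class ExpectedConflictFilter : ITelemetryProcessor
{
    private readonly ITelemetryProcessor _next;

    public ExpectedConflictFilter(ITelemetryProcessor next) => _next = next;

    public void Process(ITelemetry item)
    {
        // Mark handled 409 responses from Table Storage as successful so they
        // no longer show up as dependency failures. Verify the Type string
        // against the telemetry your app actually emits.
        if (item is DependencyTelemetry dependency
            && dependency.Type == "Azure table"
            && dependency.ResultCode == "409")
        {
            dependency.Success = true;
        }

        _next.Process(item);
    }
}
```

In ASP.NET Core the processor can be registered with services.AddApplicationInsightsTelemetryProcessor<ExpectedConflictFilter>(). The second option avoids the 409 altogether by querying the service for the table name (for example with TableServiceClient.Query) before calling create.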
References
Failing Dependencies from Application Insights logging
Conflict Error while using Application Insights and Table Storage
We are running Node.js in the App Engine standard environment, and while we try to be perfect programmers, we sometimes have a bug. The issue we're running into is that App Engine completely crashes the server every time and throws a 203 error.
We've tried all the standard error-handling approaches for Node, but it seems like App Engine is a special case. Has anyone seen or handled this issue before?
As it is stated in the answer https://stackoverflow.com/a/51769527/10301041:
"The error 203 means that Google App Engine detected that the RPC channel has closed unexpectedly and shuts down the instance. The request failure is caused by the instance shutting down."
An error in your code can be the cause of that. Another cause might be one of the project quotas.
If you are still running into the issue and can't identify the source of the error, I would suggest contacting GCP support, as is also suggested in the answer above.
I have an app service running that has 8 instances running in the service plan.
The app is written in ASP.NET Core, an older version than is currently available.
Occasionally I have an issue where the servers start returning a high number of 5xx errors after a period of sustained load.
It appears that only one instance is having an issue - which is causing the failed request rate to climb.
I've noticed that there is a corresponding increase in "locally written bytes" on the instance that is having problems. I am not writing any data locally, so I am confused about what this metric is actually measuring. In addition, the number of open connections rises and then stays high, and rebooting the problematic instance doesn't seem to achieve anything.
The only thing I suspect is that we are copying data from a user's request straight into Azure Blob Storage using UploadFromStreamAsync with the HttpRequest.Body, with the data coming from a mobile phone app.
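For context, a rough sketch of the kind of upload path described, assuming the older Microsoft.Azure.Storage.Blob client that exposes UploadFromStreamAsync; the route, container name, and dependency wiring are placeholders.

```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.Storage.Blob;

[ApiController]
[Route("api/upload")]
public class UploadController : ControllerBase
{
    private readonly CloudBlobContainer _container;

    public UploadController(CloudBlobClient blobClient)
    {
        // Placeholder container name.
        _container = blobClient.GetContainerReference("uploads");
    }

    [HttpPost("{fileName}")]
    public async Task<IActionResult> Post(string fileName)
    {
        CloudBlockBlob blob = _container.GetBlockBlobReference(fileName);

        // Stream the request body straight into Blob Storage without
        // buffering the payload to local disk first.
        await blob.UploadFromStreamAsync(Request.Body);

        return Ok();
    }
}
```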
Microsoft support suggested we switch to using Local Cache as an option to reduce issues with storage; however, this has not resolved the issue.
Can anyone tell me what "locally written bytes" actually measures? There is little documentation on this metric that I can find on Google.
We run an App Service in Azure that is configured with 8 nodes. After the latest restart of the application, only 1 node is responding. We can see this by looking at the Live Metrics stream data in Application Insights. Requests from clients mostly fail because they are directed to the dead nodes.
We run Windows environment with Java and Tomcat.
Any idea what could go wrong?
Sorry for the vague request, guys; it was kind of an emergency. Anyway, the problem is resolved, and I hope this information can be useful for someone getting into the same kind of trouble.
The problem appeared to be with the Azure infrastructure, which experienced a failure and switched to a failover storage resource. Our application could not be started after that. Advised by Microsoft support, we downgraded our service plan and then upgraded it again, causing all nodes to be recreated from scratch. That helped.