Getting time discrepancy between Application Insights and Postman - Azure

Context: During performance evaluation we noticed a discrepancy in response time between Postman and Application Insights.
While debugging we found that the response times obtained from Postman and from Application Insights differ for the same transaction, and we see this across different transactions. Why is that?
Current configuration: We have Application Insights for an application running within an App Service, and "Adaptive Sampling" is enabled for Application Insights.
Note: We are tracing each transaction with a tracing ID.
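To line up a single Postman transaction with its Application Insights entry, one option is to send a W3C traceparent header and then search for that trace ID (it surfaces as the operation ID) in the portal. A minimal sketch; the target URL and the way the IDs are generated here are illustrative, not the poster's actual setup:

```typescript
import { randomBytes } from "crypto";

// Build a W3C traceparent header: version-traceId-spanId-flags.
// Application Insights correlates incoming requests on this header,
// so the same trace ID shows up on the server-side telemetry.
const traceId = randomBytes(16).toString("hex");
const spanId = randomBytes(8).toString("hex");
const traceparent = `00-${traceId}-${spanId}-01`;

async function timedRequest(url: string): Promise<void> {
  const started = Date.now();
  const res = await fetch(url, { headers: { traceparent } });
  await res.text();
  // The client-side elapsed time includes network latency; compare it
  // with the duration logged in Application Insights for this trace ID.
  console.log(`traceId=${traceId} status=${res.status} clientMs=${Date.now() - started}`);
}

timedRequest("https://example.azurewebsites.net/api/health").catch(console.error);
```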
Comparison charts (smaller and medium evaluations): [screenshots omitted]
When the system is under stress, the response time in Postman climbs to minutes, but the response times reported in Application Insights differ drastically from it.
The difference in response time may be small for individual requests, but when the number of transactions grows and we look at the 95th (see the example below) and 99th percentiles, the picture changes completely. Any suggestions? Am I missing something?
For example: [screenshots of the Postman result and the Application Insights data omitted]
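If you want to reproduce the percentile comparison from raw timings rather than from the dashboards, a nearest-rank percentile over a list of latencies is enough; the sample data below is made up:

```typescript
// Nearest-rank percentile over a list of latency samples (in ms).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Made-up latencies: mostly fast, with a slow tail.
const latencies = [120, 130, 125, 140, 135, 128, 132, 3900, 4200, 138];

console.log("p50:", percentile(latencies, 50)); // median, barely moved by the tail
console.log("p95:", percentile(latencies, 95)); // the tail starts to dominate
console.log("p99:", percentile(latencies, 99)); // dominated by the slowest requests
```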

Posting the suggestion provided by Peter Bons as an answer so that it will be helpful for other community members.
As shown in the screenshot (omitted here), Application Insights does not capture the actual latency of sending the request to the API and getting the response back. The time you see in Application Insights is the total execution time on the Azure side.
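One way to see this split yourself is to log server-side processing time inside the app and compare it with the client's stopwatch; whatever the client measures beyond the server-side number is DNS, TLS, network transit, or queueing that the server-side telemetry never sees. A minimal Express sketch (route and port are illustrative):

```typescript
import express from "express";

const app = express();

// Log server-side processing time per request. This approximates what
// server-side telemetry records (time spent on the Azure side); the
// client's stopwatch additionally includes network and connection time.
app.use((req, res, next) => {
  const started = process.hrtime.bigint();
  res.on("finish", () => {
    const ms = Number(process.hrtime.bigint() - started) / 1e6;
    console.log(`${req.method} ${req.originalUrl} serverMs=${ms.toFixed(1)}`);
  });
  next();
});

app.get("/api/health", (_req, res) => {
  res.json({ ok: true });
});

app.listen(3000);
```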

Related

Azure slow communication between APIs

In some 1-5% of our requests, we are seeing slow communication between APIs (REST API requests). Both APIs are developed by us and hosted on Azure, each app service on its own app service plan in the same region, P1v2 tier.
What we are seeing on application insights is that POST or GET requests on origin API can take a few seconds to execute, while real execution time on destination API is only a few milliseconds.
Examples (first line is the POST request on the origin, second is the execution time on the destination API): [screenshots "slow req 1" and "slow req 2" omitted]
Our best guess is that the time difference is lost in communication between the components. We don't have an explanation for it, since the payload is really small and, in most cases, the communication takes less than 5 milliseconds.
We have ruled out component cold start as an explanation, since the slowness happens during constant load and no horizontal scaling was performed.
Do you have any idea what might cause it or how to do additional analysis in order to discover it?
If you're running multiple sites on the App Service Plan, enable the "Always On" setting for your web app: All Settings > Application Settings > Always On.
See here for details: https://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
When Always On is off, the site is shut down after 20 minutes of inactivity to free up resources for any additional websites that might be using the same App Service Plan.
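Always On is the proper fix; on tiers where it isn't available, a scheduled keep-alive ping is a common stopgap so the site never idles out. A throwaway sketch (the URL and interval are illustrative):

```typescript
// Ping the site more often than the 20-minute idle timeout so the worker
// process is never unloaded. Run this from any host that stays up, or
// replace it with a scheduler such as a cron job or a Logic App.
const TARGET = "https://example.azurewebsites.net/api/health";
const INTERVAL_MS = 10 * 60 * 1000; // every 10 minutes

async function ping(): Promise<void> {
  try {
    const res = await fetch(TARGET);
    console.log(`${new Date().toISOString()} keep-alive -> ${res.status}`);
  } catch (err) {
    console.error("keep-alive failed:", err);
  }
}

setInterval(ping, INTERVAL_MS);
ping();
```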
The amount of information it needs to collect, process, and then present requires some time and involves internal calls as well; that is why, depending on server load and usage, it can take around 6 to 7 seconds, sometimes even more.
To troubleshoot that latency, try the steps provided by Microsoft.

Differences between Insights response time and App Service response time

I noticed a difference in AVG and MAX response time between Insights (Performance / View in Logs / Response Time) and the response time I see in the web app's Overview / Response Time.
The values in Overview are higher (it seems that there are some requests that take 20 seconds and skew the average).
I can find no trace of these requests when looking at the Insights response times or when searching for the requests that take the longest.
The Insights values seem more correct, because the web app appears to be working properly. What could explain the difference?
There is a similar question to yours: What is the difference between Azure Monitor 'Response time' and AppInsights 'Duration'?
The response time on the Overview page is the Azure Monitor metric, just as the reply there describes:
Azure Monitor is gathering statistics on the web server as a whole. That's why it reports on CPU and memory usage in addition to response times.
The response time under Performance is the Application Insights metric:
Application Insights calculates the average request duration for all requests, without counting the time spent before a request reaches the application (for example, in the load balancer).

First request to a Google Cloud Node.js API taking more time than subsequent requests

I have a Node.js-based application running on Google App Engine. It accesses the database using the node-postgres module. I have noticed the following:
The first request I make from my machine (using Postman) takes longer (around 800 ms to 1.5 seconds). However, subsequent requests take much less time (around 200 ms to 350 ms).
I am unable to pinpoint the exact reason for this. It could be one of the following:
A new connection is initiated the first time I make a request to the server.
There is some issue with the database fetching using node-postgres (but since the problem occurs only on the first request, this is less likely).
I am worried about this because the logs show that almost 20% of my requests take around 2 seconds. When I viewed the logs for some of the slow requests, they appeared to be instantiating a new process, which led to the longer wait time.
What can I do to investigate further and resolve this issue?
Your first request takes more time than the others because App Engine standard has a startup time for each new instance. This time is really short, but it is there, and on top of it you have to add the time to set up the connection to the database. That is why the first request has a longer response time.
To better understand App Engine startup time, you can read the Best practices for App Engine startup time doc (a little old, but I think still really clear). To profile your App Engine application, you can read this public Medium blog post.
After this, you can set up a Stackdriver dashboard to understand whether your 20% of slow requests are due to the start of a new App Engine instance.
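One way to take the database setup cost out of the first user request is to open the connection pool during App Engine's warmup request instead. A minimal sketch, assuming an Express app and pg's Pool; the routes, environment variable, and pool size are illustrative, and warmup requests have to be enabled with inbound_services: warmup in app.yaml:

```typescript
import express from "express";
import { Pool } from "pg";

// Connection details are assumed to come from the environment.
const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 5 });

const app = express();

// App Engine standard sends GET /_ah/warmup to a new instance when warmup
// requests are enabled. Opening a client here pays the TCP/TLS and auth
// cost before real user traffic arrives.
app.get("/_ah/warmup", async (_req, res) => {
  const client = await pool.connect();
  try {
    await client.query("SELECT 1"); // verify the connection end to end
  } finally {
    client.release();
  }
  res.status(200).send("warmed");
});

app.get("/data", async (_req, res) => {
  // Later requests reuse pooled connections, skipping the connection
  // setup that made the first request slow.
  const { rows } = await pool.query("SELECT now() AS ts");
  res.json(rows[0]);
});

app.listen(Number(process.env.PORT) || 8080);
```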

Azure - App Availability percentage is Zero

Our API app is in UAT on Azure with a Standard S3 (large) service plan. What should we do if App Availability is zero? We are getting slow responses or timeouts, and when I restart the application it returns to normal. (We are using parallel programming with async/await.)
How do we find the root cause of the slowness?
Ensure that the "Always On" feature is enabled.
Such problems may be caused by application level issues, such as:
network requests taking a long time
application code or database queries being inefficient
application using high memory/CPU
application crashing due to an exception
You could enable web server diagnostics to fetch more details on the issue.
Detailed Error Logging - Detailed error information for HTTP status codes that indicate a failure (status code 400 or greater). This may contain information that can help determine why the server returned the error code.
Failed Request Tracing - Detailed information on failed requests, including a trace of the IIS components used to process the request and the time taken in each component. This can be useful if you are attempting to improve web app performance or isolate what is causing a specific HTTP error.
Web Server Logging - Information about HTTP transactions using the W3C extended log file format. This is useful when determining overall web app metrics, such as the number of requests handled or how many requests are from a specific IP address.
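These three switches live in the portal under App Service logs; they can also be flipped from code through the management SDK. A sketch, assuming the @azure/arm-appservice and @azure/identity packages (subscription, resource group, app name, and retention values are placeholders):

```typescript
import { DefaultAzureCredential } from "@azure/identity";
import { WebSiteManagementClient } from "@azure/arm-appservice";

// Placeholders: substitute your own subscription, resource group and app.
const subscriptionId = "<subscription-id>";
const resourceGroup = "<resource-group>";
const siteName = "<app-name>";

async function enableDiagnostics(): Promise<void> {
  const client = new WebSiteManagementClient(new DefaultAzureCredential(), subscriptionId);

  // Turn on web server logging, detailed errors, and failed request
  // tracing, mirroring the three log types described above.
  await client.webApps.updateDiagnosticLogsConfig(resourceGroup, siteName, {
    httpLogs: { fileSystem: { enabled: true, retentionInMb: 35, retentionInDays: 7 } },
    detailedErrorMessages: { enabled: true },
    failedRequestsTracing: { enabled: true },
  });
}

enableDiagnostics().catch(console.error);
```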
Also, Azure Application Insights collects telemetry from your application to help analyze its operation and performance. You can use this information to identify problems that may be occurring or to identify improvements to the application that would most impact users. This tutorial takes you through the process of analyzing the performance of both the server components of your application and the perspective of the client: https://learn.microsoft.com/en-us/azure/application-insights/app-insights-tutorial-performance
Ref: https://learn.microsoft.com/en-us/azure/app-service/app-service-web-troubleshoot-performance-degradation

Azure App Insights Sampling (ItemCount)

I have a question about Azure App Insights Sampling.
If the itemCount field is greater than 1 for a log item, does it mean that there was exactly the SAME request and it was sampled away?
My logs show one request that sends this message with itemCount = 2. The request ended with an OptimisticConcurrencyException, so my transaction was rolled back. Within this transaction I send a message to a 3rd-party service.
Most interestingly, they told me they received 2 messages from my service, and my database was updated (so a transaction was committed).
All of this would make sense if there were 2 requests, one returning a 200 code and the other a 500. But the App Insights log item about the OptimisticConcurrencyException has itemCount = 2, which would mean this exception was thrown twice (for both requests).
Besides this, I don't see any other requests that could have changed the data that this request was changing.
So could anybody explain to me how App Insights samples requests and errors?
This really depends on how/where your sampling occurred, as sampling could have occurred at 3 different places depending on how you have your app configured.
There's a fair amount of documentation about the various layers of sampling, but hypothetically:
The sampling algorithm decides which telemetry items to drop, and which ones to keep (whether it's in the SDK or in the Application Insights service). The sampling decision is based on several rules that aim to preserve all interrelated data points intact, maintaining a diagnostic experience in Application Insights that is actionable and reliable even with a reduced data set. For example, if for a failed request your app sends additional telemetry items (such as exception and traces logged from this request), sampling will not split this request and other telemetry. It either keeps or drops them all together. As a result, when you look at the request details in Application Insights, you can always see the request along with its associated telemetry items.
Update:
I got some more details from people on the team that do the sampling, and it works like this:
Sampling ratio is determined by the number of events per second occurring in the app
The AI SDK randomly selects requests to be sampled when the request begins (so, it is not known whether it will fail or succeed)
AI SDK assigns itemCount=<sampling ratio>
This would then explain the behavior you are seeing, when two requests (success + failure) were counted as two failures: the failed request was sampled "in", and so in telemetry, you'd have 2 failed requests (one request with itemCount=2) instead of a failed and a successful, because the successful one got sampled away.
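To make those mechanics concrete, here is a toy simulation of that head-based sampling decision (not the actual SDK code): the keep/drop choice is derived from the operation ID when the request starts, the same choice applies to every telemetry item of that request, and each kept item carries itemCount as the number of original items it stands for:

```typescript
interface Telemetry {
  operationId: string;
  kind: "request" | "exception";
  success?: boolean;
  itemCount: number;
}

// Toy head-based sampler: keep roughly 1 out of `ratio` operations. The
// decision hashes the operation ID, so all telemetry belonging to the
// same request is kept or dropped together, before the outcome is known.
function sampledIn(operationId: string, ratio: number): boolean {
  let hash = 0;
  for (const ch of operationId) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return hash % ratio === 0;
}

function record(sink: Telemetry[], item: Omit<Telemetry, "itemCount">, ratio: number): void {
  if (sampledIn(item.operationId, ratio)) {
    // Each kept item represents `ratio` original items.
    sink.push({ ...item, itemCount: ratio });
  }
}

const exported: Telemetry[] = [];
const RATIO = 2;

// Two real requests: one failed with an exception, one succeeded.
record(exported, { operationId: "op-a", kind: "request", success: false }, RATIO);
record(exported, { operationId: "op-a", kind: "exception" }, RATIO);
record(exported, { operationId: "op-b", kind: "request", success: true }, RATIO);

// Depending on which operation survives sampling, the exported telemetry
// can show a failed request with itemCount = 2 even though only one of
// the two real requests actually failed.
console.log(exported);
```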
