Sampling Intervals for Azure Metrics - azure

Could you please help me understand how Azure metrics are calculated? For example, in Azure API Management we can see the Requests/Total Requests graph and a number at a given point in time (say 6000 at 4:34 PM). A request count has no meaning at a single instant; it is a measure over a given period of time. When I researched this, I found that the number represents the number of requests received during the sampling interval. However, there is no further information on what that sampling interval is, and the Azure portal's metrics graphs have no setting to view or change it either.
So what is the sampling interval used for Azure metrics? Or, put differently, what does the total request count mean at a given point in time?
(However, Application Insights metrics do allow you to set the sampling interval.)
Could you please shed some light? Thanks.

I think the comment from AjayKumar-MSFT is correct, so I am summarizing it here in case it helps others:
Typically, 'Requests' is the total number of requests regardless of their resulting HTTP status code, whereas 'Requests In Application Queue' is the number of requests in the application request queue. You can always change the chart settings for more detailed information by going into the metric and opening the ellipsis (...) > Settings.
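To make that explanation concrete, here is a minimal, self-contained sketch (plain Java, no Azure SDK; the timestamps and the 1-minute grain are illustrative assumptions) of what a "Total" value at a point on the chart represents: the sum of everything recorded inside that point's time bucket, not an instantaneous value.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TimeGrainTotals {
    public static void main(String[] args) {
        // Hypothetical raw request timestamps. Azure Monitor only exposes the
        // pre-aggregated series; this just illustrates what one "Total" point means.
        List<Instant> requestTimes = List.of(
                Instant.parse("2024-01-01T16:34:05Z"),
                Instant.parse("2024-01-01T16:34:41Z"),
                Instant.parse("2024-01-01T16:35:12Z"));

        // Bucket by an assumed 1-minute time grain and count requests per bucket.
        Map<Instant, Long> totalsPerGrain = new TreeMap<>();
        for (Instant t : requestTimes) {
            totalsPerGrain.merge(t.truncatedTo(ChronoUnit.MINUTES), 1L, Long::sum);
        }

        // A chart point like "6000 at 4:34 PM" is such a bucket total over its
        // interval, not an instantaneous request count.
        totalsPerGrain.forEach((bucket, total) ->
                System.out.println(bucket + " -> " + total + " request(s)"));
    }
}
```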

Related

Azure App Service metrics aggregation for requests: why are Sum and Count different?

When looking at the metrics for our app services in Azure, I'm very confused by the Sum and Count aggregations for requests. According to the MS tech doc, they should be the same.
Count: The number of measurements captured during the aggregation interval.
When the metric is always captured with the value of 1, the count aggregation is equal to the sum aggregation. This scenario is common when the metric tracks the count of distinct events and each measurement represents one event. The code emits a metric record every time a new request arrives.
And from this MS tech doc as well:
Though not the case in this example, Count is equal to Sum in cases where a metric is always captured with the value of 1. This is common when a metric tracks the occurrence of a transactional event--for example, the number of HTTP failures mentioned in a previous example in this article.
So, let's say that for a specific period there are 10 HTTP requests: the count of requests is 10, and the sum of requests should also be 10.
But ours are all different. Below are one web app service's Sum and Count metrics; you can see they are very different. But why?
From the official REST API we can see that Count and Sum are still different.
If you want more explanation, you can refer to the following post, or raise a support ticket for help.
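To illustrate how the two aggregations can diverge, here is a small sketch under one possible assumption: the app emits pre-aggregated measurement records (values greater than 1) rather than one record of value 1 per request. The numbers are made up.

```java
public class SumVsCount {
    public static void main(String[] args) {
        // Hypothetical metric records reported within one aggregation interval.
        // If each record carried the value 1, Count would equal Sum; here it does not.
        double[] recordedValues = {1, 3, 2, 4}; // 4 records covering 10 requests in total

        long count = recordedValues.length;       // "Count": number of measurements captured
        double sum = 0;
        for (double v : recordedValues) {
            sum += v;                             // "Sum": sum of the measured values
        }

        System.out.println("Count = " + count + ", Sum = " + sum); // Count = 4, Sum = 10.0
    }
}
```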
Related Post:
Azure App Service Metrics - How to interpret Sum vs. Count related to requests?

Different reporting frequency for different types of metrics in micrometer

Can I set a different reporting frequency for different types of metrics in Micrometer? For example, I want to send endpoint metrics with a 10 s step and the others with a 5 s step.
There's a property for the reporting frequency per meter registry, but AFAICT there's no concept of a reporting frequency per meter. If you'd like to pursue this, you can create an issue to request the feature in its issue tracker: https://github.com/micrometer-metrics/micrometer/issues
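One workaround within the current API is to lean on the per-registry step: put the meters that need a different frequency on a second registry configured with its own step. A minimal sketch, using SimpleMeterRegistry purely for illustration (a push registry such as Graphite or Datadog exposes a similar step() on its config interface); the split of meters between the two registries is an assumption for the example.

```java
import io.micrometer.core.instrument.Clock;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleConfig;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.time.Duration;

public class PerRegistryStep {
    // Builds a registry whose step (reporting) interval is the given duration.
    static MeterRegistry registryWithStep(Duration step) {
        SimpleConfig config = new SimpleConfig() {
            @Override
            public String get(String key) { return null; } // use defaults for everything else
            @Override
            public Duration step() { return step; }        // per-registry reporting frequency
        };
        return new SimpleMeterRegistry(config, Clock.SYSTEM);
    }

    public static void main(String[] args) {
        // Meters are registered on whichever registry matches the desired frequency.
        MeterRegistry endpointRegistry = registryWithStep(Duration.ofSeconds(10));
        MeterRegistry defaultRegistry  = registryWithStep(Duration.ofSeconds(5));

        endpointRegistry.counter("http.endpoint.requests.custom").increment();
        defaultRegistry.counter("app.background.events").increment();
    }
}
```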

Azure VM stats - Network In/Out - what are the measurements?

It bothers me that I don't understand the measurement Azure uses for Network In/Out and a few other things.
On the Azure portal -> my VM -> Metrics -> [Host] Network In/Out, it says the metric is measured in bytes, but it also draws a graph over time. If it were plain bytes, it should be cumulative and therefore grow indefinitely, but it isn't, so I am inclined to believe it is measured per second or something like that. But the Azure docs claim that it is bytes, not bytes per second (link here).
Am I missing something obvious?
I am inclined to believe the data is in bytes per minute; at least for mine it appears that way. I set my graph to a 10-minute interval. Taking the mouse off the graph, the total bytes show at the bottom. Hovering over the individual sample points (10 in total), they average between 31 and 34 MB each. Adding them up, you get close to the total for the graph interval, 326 MB, and 10 * 32.5 is very close to that total, leading me to believe that each point on the graph is the sum over its individual interval (1 minute). That is what I am seeing, anyway. Terrible documentation from Microsoft; why not just specify this in the (i) hover tip on the graph?
@eddyP23 - if you add up all the points in your graph, I think you would come to the same conclusion: each point is the sum over its interval (1 minute). I am not sure how else to read this.
If it were bytes per second, the total for the complete 10-minute interval would be vastly larger.
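A quick back-of-the-envelope check of that reading, with made-up per-minute values chosen to roughly match the numbers quoted above:

```java
public class NetworkInCheck {
    public static void main(String[] args) {
        // Hypothetical 1-minute "Network In" chart points, in bytes (roughly 31-34 MB each).
        long[] bytesPerMinutePoint = {
                32_000_000L, 33_500_000L, 31_500_000L, 34_000_000L, 32_500_000L,
                31_000_000L, 33_000_000L, 32_500_000L, 34_000_000L, 32_000_000L};

        long total = 0;
        for (long b : bytesPerMinutePoint) {
            total += b;
        }

        // Interpreting each point as the sum over its 1-minute interval reproduces the
        // ~326 MB chart total; if each point were bytes *per second*, the 10-minute
        // total would be about 60x larger.
        double averageBytesPerSecond = (double) total / (bytesPerMinutePoint.length * 60);
        System.out.printf("total = %,d bytes, average rate = %,.0f bytes/s%n",
                total, averageBytesPerSecond);
    }
}
```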
Sorry for the delay.
therefore I am inclined to believe it is measured per second or something like that. But Azure docs claim that it is bytes and not bytes per second
You can find Network In under the VM's metrics:
Network In (bytes per second) is used to monitor your VM's performance.

Performance testing - Jmeter results

I am using JMeter (I started using it a few days ago) to simulate a load of 30 threads, using a CSV data file that contains login credentials for 3 system users.
The objective I set out to achieve was to measure 30 users (threads) logging in and navigating to a page via the menu over a time span of 30 seconds.
I have set my thread group as:
Number of threads: 30
Ramp-Up Period: 30
Loop Count: 10
I ran the test successfully. Now I'd like to understand what the results mean, what counts as a good or bad measurement, and what could be done to improve the results. Below is a table of the results collated in JMeter's Summary Report.
I have done some research, only to find blogs/sites telling me the same information that is defined on the jmeter.apache.org site. One blog (Nicolas Vahlas) that I came across gave me some very useful information, but it still hasn't helped me understand what to do next with my results.
Can anyone help me understand these results and what I could do next after executing this test plan? Or point me to an informative blog/site that will help me understand what to do next.
Many thanks.
In my opinion, the deviation is high.
You know your application better than any of us.
You should focus on whether the average response time you got, and the maximum response time (both how often it occurs and its value), are acceptable to you and your users. The same applies to throughput.
The report shows that the average response time is below 0.5 seconds and the maximum response time is below 1 second, which is generally acceptable, but that threshold should be defined by you (is it acceptable to your users?). If the answer is yes, try a higher load to check how the application scales.
Your requirement mentions that you need 30 concurrent users performing different actions. The response time of your requests is low and you have a ramp-up of 30 seconds. Can you please check the total number of active threads during the test? I believe the time during which there are actually 30 concurrent users in the system is pretty short, so the average response time you are seeing may be misleading. I would suggest running the test for longer, so that 30 concurrent users are in the system for a sustained period; that would give a reading that matches your requirements.
You can use the Aggregate Report instead of the Summary Report. In performance testing, the following are typically used for analysis:
Throughput - requests/second
Response time - 90th percentile
Target application resource utilization (CPU, processor queue length, and memory)
Normally the SLA for websites is 3 seconds, but this requirement changes from application to application.
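For reference, a minimal sketch of the nearest-rank 90th-percentile calculation mentioned above (the response times are invented):

```java
import java.util.Arrays;

public class Percentile90 {
    public static void main(String[] args) {
        // Hypothetical response times in milliseconds for one request label.
        long[] elapsedMs = {180, 210, 230, 260, 280, 310, 340, 450, 620, 900};
        Arrays.sort(elapsedMs);

        // Nearest-rank method: at least 90% of the samples completed at or below this value.
        int rank = (int) Math.ceil(0.90 * elapsedMs.length);
        long p90 = elapsedMs[rank - 1];

        System.out.println("90th percentile = " + p90 + " ms");
    }
}
```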
Your test results look good, assuming the users are actually logging into the system/portal.
Samples: the number of requests sent for a particular label/module.
Average: the average response time across the 300 samples.
Min: the minimum response time among the 300 samples (the fastest of them).
Max: the maximum response time among the 300 samples (the slowest of them).
Standard Deviation: a measure of the variation across the 300 samples.
Error: the failure percentage.
Throughput: the number of requests processed per second.
Hope this will help.
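To make those columns concrete, here is a small sketch that computes the same statistics from an invented set of samples (the response times, test duration, and error count are assumptions, not your data):

```java
import java.util.Arrays;

public class SummaryReportColumns {
    public static void main(String[] args) {
        // Hypothetical response times (ms) for one label, plus assumed duration and failures.
        long[] elapsedMs = {210, 340, 180, 450, 900, 260, 310, 150, 620, 280};
        long testDurationMs = 30_000;
        int failedSamples = 1;

        long min = Arrays.stream(elapsedMs).min().getAsLong();          // "Min"
        long max = Arrays.stream(elapsedMs).max().getAsLong();          // "Max"
        double avg = Arrays.stream(elapsedMs).average().getAsDouble();  // "Average"
        double variance = Arrays.stream(elapsedMs)
                .mapToDouble(t -> (t - avg) * (t - avg))
                .average()
                .getAsDouble();
        double stdDev = Math.sqrt(variance);                            // "Std. Dev."
        double errorPct = 100.0 * failedSamples / elapsedMs.length;     // "Error %"
        double throughput = 1000.0 * elapsedMs.length / testDurationMs; // "Throughput" (req/s)

        System.out.printf(
                "Samples=%d Average=%.1f Min=%d Max=%d StdDev=%.1f Error%%=%.1f Throughput=%.2f/s%n",
                elapsedMs.length, avg, min, max, stdDev, errorPct, throughput);
    }
}
```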

Logstash metrics plugin: What does events.rate_5m mean?

This should be a fairly easy question for Logstash veterans.
When I use the metrics plugin, what does events.rate_5m mean?
Does it mean: Number of events per second in a 5 minute window?
Does it mean: Number of events every 5 minutes?
Also, what's the difference between using this and timer.rate_5m?
The documentation isn't very clear and I have problems understanding it.
Thanks in advance!
Logstash uses the Metriks library to generate the metrics.
According to that site:
A meter that measures the mean throughput and the one-, five-, and fifteen-minute exponentially-weighted moving average throughputs.
and
A timer measures the average time as well as throughput metrics via a meter.
A meter counts events, while a timer is used to look at durations (you pass a name and a value into a timer).
To answer your specific question: rate_5m is the per-second event rate, exponentially weighted over the last 5 minutes, so it is neither "events per 5 minutes" nor a simple count within a fixed window.
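For intuition, here is a minimal sketch of how such a meter maintains its 5-minute, exponentially weighted per-second rate (modeled on the Coda Hale/Metriks approach; the 5-second tick interval and the class/field names are illustrative, not the library's actual code):

```java
public class EwmaMeter {
    // One-tick exponentially weighted moving average over a 5-minute window.
    private static final double TICK_SECONDS = 5.0;
    private static final double WINDOW_SECONDS = 300.0; // 5 minutes
    private static final double ALPHA = 1 - Math.exp(-TICK_SECONDS / WINDOW_SECONDS);

    private double ratePerSecond = 0.0;
    private long uncounted = 0;
    private boolean initialized = false;

    // Record that some events happened.
    public void mark(long events) {
        uncounted += events;
    }

    // Called every TICK_SECONDS; folds the events seen since the last tick into the average.
    public void tick() {
        double instantRate = uncounted / TICK_SECONDS;
        uncounted = 0;
        if (initialized) {
            ratePerSecond += ALPHA * (instantRate - ratePerSecond);
        } else {
            ratePerSecond = instantRate;
            initialized = true;
        }
    }

    // Events per second, weighted toward the last 5 minutes: the rate_5m analogue.
    public double rate5m() {
        return ratePerSecond;
    }
}
```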
