Azure CPU metric (Live Metrics Stream)

Below is the 'CPU Total' metric as displayed on the Azure Live Metrics page for our Web App, which is scaled out to 4 x S3 instances.
It's not clear to me (despite much research) whether this CPU Total is a percentage of the maximum CPU available for the instance or something else. I have noticed that the CPU Total has crept above 100% from time to time, which makes me question whether it really is a percentage of the total.
If this metric is not a percentage of the total: is there anywhere in the Azure portal that shows the CPU usage of your servers as a straight percentage of total capacity, and not a percentage multiplied by core count or anything else?

Related

How to configure an Azure alert for the max CPU held over x time period

I currently have an alert that triggers when average CPU is over 80% for the last 15 minutes. However, we have instances that stay over 100% for hours or days while others are very low, leaving the average under 80%. I want to find those instances that are higher than 80% for an extended period of time. What I don't want is to alert on a spike, and using the max seems to do just that: send out alerts for spikes rather than sustained load. Is there any way I can get such an alert configured?

Why does the Elastic Beanstalk CPU alarm check maximum, not average?

I set an alarm on average CPU utilization (1 minute) > 65 on my Node.js Elastic Beanstalk environment. While installing Node.js dependencies, the EC2 instance uses a lot of CPU resources.
However, I found that the "average" CPU usage didn't exceed this threshold, while the "maximum" CPU utilization did. Why does the Elastic Beanstalk alarm fire even though the average CPU utilization doesn't exceed the threshold?
Why is it happening? I'm tired of false positive CPU alarms :(
How do I solve this problem?
I set an alarm on average CPU utilization (1 minute) > 65 on my Node.js Elastic Beanstalk environment.
It means that the CloudWatch alarm will take the average over a 1-minute period and trigger if that average crosses 65.
In the first screenshot, it seems that the CPU utilization was high from roughly 9:57 until 10:07, about 10 minutes.
In the second screenshot, the average peaks at around 30 during the same period. Let's do some math to understand why:
CPU utilization was not consistently high. The maximum graph records the peak, while the average smooths it out: if the CPU runs at 90% for 3 seconds and 10% for the remaining 57 seconds, the one-minute average is only (90*3 + 10*57)/60 = 14%, even though the maximum for that minute is 90%.
Your case is very similar. That's why you see such different graphs for maximum and average.
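To make the averaging effect concrete, here is a minimal sketch in plain Python, using made-up per-second samples, of how a one-minute Average data point can stay far below the peak that the Maximum statistic reports:

    # Hypothetical per-second CPU samples for one minute:
    # 3 seconds at 90% (dependency-install spike), 57 seconds at 10%.
    samples = [90] * 3 + [10] * 57

    minute_avg = sum(samples) / len(samples)   # (90*3 + 10*57) / 60 = 14.0
    minute_max = max(samples)                  # 90

    print(f"Average statistic for the minute: {minute_avg:.0f}%")  # 14%
    print(f"Maximum statistic for the minute: {minute_max}%")      # 90%

With these numbers, an alarm on the Average statistic with a 65% threshold never fires, while an alarm on the Maximum statistic would.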

Prometheus: CPU process time total to a percentage

We started using Prometheus and Grafana as the main tools for monitoring our Service Fabric cluster. As a Prometheus target we use wmi_exporter with its predefined collectors: CPU, system, process, service, memory, etc. Our main goal was to monitor our product services on each instance of the node group in Azure Service Fabric.
For instance, we are using this PromQL query to calculate total CPU usage in %:
100 - (avg by (hostname) (irate(wmi_cpu_time_total{scaleset="name",mode="idle" }[5m])) * 100)
and the resulting metrics look more or less realistic.
That was the case until we started writing queries for individual services.
For services we use sum by (process,hostname)(irate(wmi_process_cpu_time_total{scaleset="name", process=~"processes"}[5m])) * 100, and the resulting metrics seem unrealistic from time to time, which is especially obvious when you compare them with the total CPU time %. I found an article about multiplying by 100 to get a percentage from CPU time, but with that I get values around 170% or more. Perhaps I need to divide by the number of CPU cores?
Regarding the query: I sum by process because the exporter gives me two different series for one process, one for user mode and one for privileged mode.
Can anyone help me with the correct calculation for the process CPU time metric and how to transform it into a percentage?
Thank you, I would be grateful for any help!
I hope this will help!
The result is pretty much the same as what the Windows Performance Monitor shows.
So, for the CPU % of running services (tasks, processes):
sum by (process,hostname)(irate(wmi_process_cpu_time_total{scaleset="name", process=~"processes"}[5m])) * 100 / 2
where 2 is the number of CPU cores on the instance.
First, sum all series for the running process: the exporter exposes separate user mode and privileged (kernel) mode series for the same process, so they need to be summed. The same grouping applies to hostname (instance, etc.); in my case I have Azure scale sets with 2 to 5 instances. The result must be multiplied by 100 to get a percentage and divided by the number of CPU cores.
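As a sanity check, the same arithmetic can be reproduced outside PromQL. The sketch below (plain Python, with made-up counter deltas for a hypothetical process) mirrors what the query does: sum the user and privileged mode rates, multiply by 100, and divide by the core count:

    # Hypothetical wmi_process_cpu_time_total counter deltas over a 60 s window,
    # in CPU-seconds, for one process on one host (assumed values).
    user_mode_delta = 45.0        # CPU-seconds spent in user mode
    privileged_mode_delta = 15.0  # CPU-seconds spent in privileged (kernel) mode
    window_seconds = 60.0
    num_cores = 2                 # cores on the instance, as in the query above

    # irate/rate is roughly the counter delta divided by the time window.
    total_rate = (user_mode_delta + privileged_mode_delta) / window_seconds  # CPU-seconds per second

    cpu_percent = total_rate * 100 / num_cores
    print(f"Process CPU usage: {cpu_percent:.0f}%")  # (45 + 15) / 60 * 100 / 2 = 50%

Without the division by num_cores, a process that fully occupies one of the two cores would show up as 100%, and a busy multi-threaded process could exceed 100%, which matches the 170% readings described in the question.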
Cheers!

Azure Analytics: Difference between log-based and standard metrics

We are using an App Service on Azure with Application Insights enabled. While looking at CPU usage we found that log-based metrics show average CPU at 40-80%, while standard metrics show CPU usage for the same period and resource at 150-300%.
Can someone explain why there is such a big difference, and how CPU usage can go up to 300%?
CPU can be counted per core (max value = #NumCores * 100) or normalized (averaged across all cores). For instance, if your app runs on a 4-core virtual machine, then 75% overall CPU utilization maps to 300% CPU-core utilization.
I guess in your case one metric is normalized and another isn't.
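A quick illustration of the two conventions, using the 4-core example from the answer above (plain Python, hypothetical numbers):

    num_cores = 4

    # Normalized view: average utilization across all cores, capped at 100%.
    normalized_cpu = 75.0

    # Per-core-sum view: the same load on a 0..(num_cores * 100) scale.
    core_sum_cpu = normalized_cpu * num_cores
    print(f"{normalized_cpu}% normalized == {core_sum_cpu}% summed across {num_cores} cores")  # 300.0%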

Azure: what is Time Aggregation, Max or Avg?

I am using SQL Azure (SQL Server) for my app. It was working perfectly until recently; the MAX DTU usage has been 100%, but the AVG DTU usage is around 50%.
Which value should I monitor to scale the service, MAX or AVG?
I found this on the net after lots of searching:
CPU max/min and average within that 1 minute. As 1 minute (60 seconds) is the finest granularity, if you choose, for example, Max, and the CPU has touched 100% even for 1 second, it will be shown as 100% for that entire minute. Perhaps the best is to use Average: in that case the average CPU utilization over those 60 seconds is shown for that 1-minute data point.
which sort of helped me understand what it all meant, but thanks to bradbury9 too for your input.
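For what it's worth, the quoted behaviour is easy to picture with a toy example (plain Python, made-up per-second DTU% samples): a single one-second spike to 100% pins the minute's Max at 100 while the Average barely moves.

    # Hypothetical per-second DTU% samples for one minute:
    # 59 seconds around 50%, plus one 1-second spike to 100%.
    samples = [50] * 59 + [100]

    print("MAX for the minute:", max(samples))                           # 100
    print("AVG for the minute:", round(sum(samples) / len(samples), 1))  # 50.8

That is why the quoted advice leans toward Average for scaling decisions: a raw Max can be driven to 100% by any brief spike.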
