We are using App Service on Azure with Application Insights enabled. While looking at CPU usage, we found that the log-based metrics show an average CPU of 40-80%, while the standard metrics show CPU usage of 150-300% for the same period and resource.
Can someone explain why the difference is so large, and how CPU usage can reach 300%?
CPU can be counted in cores (max value = number of cores * 100) or normalized (average across all cores). For instance, if your app runs on a 4-core virtual machine, then 75% overall CPU utilization maps to 300% CPU-core utilization.
I guess in your case one metric is normalized and the other isn't.
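To make the arithmetic concrete, here is a minimal Python sketch of the two conventions. The function names are hypothetical and only illustrate the conversion; they are not part of any Azure API.

```python
def to_per_core(normalized_percent: float, num_cores: int) -> float:
    """Convert a normalized CPU percentage (0-100) to the per-core scale,
    where the maximum is num_cores * 100."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    return normalized_percent * num_cores


def to_normalized(per_core_percent: float, num_cores: int) -> float:
    """Convert a per-core CPU percentage (0 to num_cores * 100) back to 0-100."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    return per_core_percent / num_cores


# The 4-core example from the answer: 75% overall equals 300% in core units.
print(to_per_core(75.0, 4))     # 300.0
print(to_normalized(300.0, 4))  # 75.0
```

If the two charts differ by a roughly constant factor equal to the core count, that suggests one of them is on the per-core scale.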
Related
We are running Grafana/Prometheus to monitor our CPU metrics and compute the aggregated CPU usage across all CPUs. The problem is that we have hyperthreading enabled, and when we stress the CPU the percentage exceeds 100%. My question is how to keep the reported CPU usage within 100%, even when the CPU is highly utilized.
P.S. I have tried setting the max and min limits in Grafana, but the graph still spikes above that limit.
Kindly give me the right query for this problem.
The queries I have tried are given below.
sum(irate(node_cpu_seconds_total{instance="localhost",job="node", mode!="idle"}[5m]))*100
100 - avg(irate(node_cpu_seconds_total{instance="localhost",job="node", mode!="idle"}[5m]))*100
We have also tried other similar queries.
If all you want is to "cap" a variable or expression result to a maximum value (that is, 100) you could simply use the Prometheus function clamp_max.
Thus, you could do:
clamp_max(<expr>, 100)
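This is probably the most helpful query. It averages the idle rate across all CPUs and subtracts it from 1, so the result stays between 0 and 100:
(1 - avg(irate(node_cpu_seconds_total{instance="$instance",job="$job",mode="idle"}[5m])))*100
Replace $instance and $job with your instance address and your node exporter job name.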
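To see why averaging the idle share keeps the result within 100% while summing busy time across CPUs does not, here is a minimal Python sketch. It assumes each input value is the fraction of time one logical CPU spent idle (what the per-second rate of an idle counter represents); the function names are hypothetical.

```python
def usage_from_avg_idle(idle_fractions):
    """Average the idle fraction over all logical CPUs, then invert.
    The result is always between 0 and 100."""
    if not idle_fractions:
        raise ValueError("need at least one CPU")
    avg_idle = sum(idle_fractions) / len(idle_fractions)
    return (1.0 - avg_idle) * 100.0


def usage_from_summed_busy(idle_fractions):
    """Sum the busy fraction of every logical CPU.
    The result can reach len(idle_fractions) * 100."""
    return sum(1.0 - idle for idle in idle_fractions) * 100.0


# Four logical CPUs (for example 2 cores with hyperthreading), mostly busy.
idle = [0.25, 0.5, 0.0, 0.25]
print(usage_from_avg_idle(idle))     # 75.0
print(usage_from_summed_busy(idle))  # 300.0
```

Clamping the summed value to 100 would hide the overshoot but lose information, whereas averaging reports the same load on a 0-100 scale.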
I have moved an Azure SQL database from one subscription to another.
After moving the database, the CPU utilisation is showing zero.
The CPU may be dead.
Please help me fix the issue.
screenshot attached
Try querying the DMV below (in SSMS or the Query Editor):
SELECT
AVG(avg_cpu_percent) AS 'Average CPU Utilization In Percent',
MAX(avg_cpu_percent) AS 'Maximum CPU Utilization In Percent',
AVG(avg_data_io_percent) AS 'Average Data IO In Percent',
MAX(avg_data_io_percent) AS 'Maximum Data IO In Percent',
AVG(avg_log_write_percent) AS 'Average Log Write I/O Throughput Utilization In Percent',
MAX(avg_log_write_percent) AS 'Maximum Log Write I/O Throughput Utilization In Percent',
AVG(avg_memory_usage_percent) AS 'Average Memory Usage In Percent',
MAX(avg_memory_usage_percent) AS 'Maximum Memory Usage In Percent'
FROM sys.dm_db_resource_stats;
Using this query I was able to see 76 percent CPU utilization. It is still not showing on the portal, and Microsoft's internal team is looking into it, but at least I could see that the CPU is not dead and is responding.
screenshot of result
How to debug this:
Fire up SSMS and run a few queries. A few SELECT statements will do.
If you can retrieve data, all is good. You probably moved to a higher tier with better performance.
We started using Prometheus and Grafana as the main tools for monitoring our Service Fabric cluster. As the Prometheus target we use wmi_exporter, with predefined parameters: CPU, system, process, service, memory, etc. Our main goal was to start monitoring our product services on each instance of the node group in Azure Service Fabric.
For instance, we are using this PromQL query to calculate total CPU usage in %:
100 - (avg by (hostname) (irate(wmi_cpu_time_total{scaleset="name",mode="idle"}[5m])) * 100)
and the metrics look more or less realistic.
That held until we started writing queries for services.
For services, we use:
sum by (process,hostname)(irate(wmi_process_cpu_time_total{scaleset="name", process=~"processes"}[5m])) * 100
and the metrics seem unrealistic from time to time, which is especially obvious when you compare them with the total CPU time %. I found an article about multiplying by 100 to get a % from CPU time, but in this case I get values around 170% or more. Perhaps I need to divide by the number of CPU cores?
Regarding the query, I use sum because I get two different series for one process, in two modes: user and privileged.
Can anyone please help me with the correct calculation for the process CPU time total metric and with converting it to a percentage?
Thank you, I would be grateful for any help!
I hope this will help!
The result is pretty much the same as the Windows performance manager.
So, for CPU % for running services (tasks, processes):
sum by (process,hostname)(irate(wmi_process_cpu_time_total{scaleset="name", process=~"processes"}[5m])) * 100 / 2
where 2 is the number of CPU cores.
First, you sum all series for the running process: the exporter reports user and kernel mode metrics separately for the same process ID, so they need to be added together. The same must be done per hostname (instance, etc.). In my case, I have Azure scale sets with 2 to 5 instances. The result must be multiplied by 100 to get a % and divided by the number of CPU cores.
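The same calculation, written out as a minimal Python sketch. It assumes the input is the per-second rate of CPU time for each mode of one process on one host; the function name and sample numbers are hypothetical.

```python
def process_cpu_percent(mode_rates, num_cores):
    """Sum CPU-seconds-per-second over all modes (e.g. user, privileged),
    convert to a percentage, and normalize by the core count."""
    if num_cores <= 0:
        raise ValueError("num_cores must be positive")
    total_rate = sum(mode_rates.values())
    return total_rate * 100.0 / num_cores


# A process on a 2-core host using 1.25 cores in user mode and 0.5 in kernel mode.
rates = {"user": 1.25, "privileged": 0.5}
print(process_cpu_percent(rates, 1))  # 175.0, the unnormalized figure
print(process_cpu_percent(rates, 2))  # 87.5, comparable with total CPU %
```

Without the division, a multithreaded process on a 2-core host can legitimately report up to 200%, which matches the values around 170% seen in the question.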
Cheers!
I am using Azure SQL Server for my app. The app was working perfectly until recently. The MAX DTU usage has been 100%, but the AVG DTU usage is around 50%.
Which value should I monitor to scale the services, MAX or AVG?
I found on the net after lots of searching:
The CPU max/min and average are computed within that 1 minute. Since 1 minute (60 seconds) is the finest granularity, if you choose max, for example, and the CPU has touched 100% even for 1 second, it will be shown as 100% for that entire minute. Perhaps the best option is to use the average. In this case the average CPU utilization over the 60 seconds will be shown for that 1-minute metric.
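A small Python sketch of that effect, assuming (hypothetically) one utilization sample per second within a minute; how Azure actually samples internally is not stated in the quote.

```python
def summarize_minute(samples):
    """Return (max, average) of the utilization samples in one minute."""
    if not samples:
        raise ValueError("need at least one sample")
    return max(samples), sum(samples) / len(samples)


# One second at 100%, the other 59 seconds at 40%.
samples = [100.0] + [40.0] * 59
peak, average = summarize_minute(samples)
print(peak)     # 100.0
print(average)  # 41.0
```

A single one-second spike drives the max to 100% for the whole minute, while the average barely moves, which is why the average is usually the steadier signal for scaling decisions.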
which sorta helped me with what it all meant, but thanks to bradbury9 too for your input.
Below shows the 'CPU Total' as displayed on the Azure Live Metrics page for our Web App which is scaled out to 4 x S3 instances.
It's not clear to me (despite much research) if this CPU Total is a percentage of the max CPU available for the instance or something else. I have noticed that the CPU Total has crept above 100% from time to time, which makes me question if it is a percentage of the total.
If this metric is not a % of the total: is there anywhere in the Azure portal that shows the CPU usage of your servers as a plain %, rather than a % multiplied by core count or anything else?