What is the difference between Disk IO Utilisation and Disk IO Saturation in Grafana? - performance-testing

Our product uses the Grafana tool to monitor a multi-node, multi-CPU, multi-executor application. I don't know much, as I am new to this field. At a particular moment I noticed that the "Disk IO Saturation" was close to 100% while the Utilisation was very low. Can anyone explain the difference between these two in more depth, or point me to a resource where I can read about it?
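For context, on a Linux node these two panels are usually derived from per-device disk statistics, and you can see the underlying signals side by side with iostat from the sysstat package (an illustrative sketch only; the exact column names depend on your sysstat version and on how the Grafana dashboard defines its queries):

# extended per-device statistics, refreshed every 5 seconds
iostat -x 5
# %util  ~ utilization: share of time the device was busy with at least one request in flight
# aqu-sz ~ saturation:  average number of requests queued or in service (avgqu-sz on older versions)
# await  ~ saturation:  average time a request spends waiting plus being serviced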

Related

How to identify the resource bound of the system

I want to test the performance of databases on top of different storage devices (i.e., SSD-1, SSD-2, HDD-1, HDD-2, ...). To expose the difference in performance across these devices, I want to make the database workload disk-I/O-intensive (using a write-intensive workload).
But how can I confirm that the system is bound by disk I/O rather than something else like CPU or memory accesses? Does simply using top to check the CPU usage (usage-in-percent / CPU-cores < 100%) work?
For example, I run
sysbench fileio --file-num=16 --file-total-size=2G --file-fsync-all=on --file-test-mode=seqwr --time=30 run
and monitor with top:
PID COMMAND %CPU TIME #TH #WQ #PORT MEM PURG CMPRS PGRP
33998 sysbench 54.8 00:03.29 2/1 0 11 1316K 0B 0B 3399
Does this mean the system is bound by disk I/O?
Thanks!
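As an illustrative sketch (assuming a Linux host with the sysstat package installed; the macOS top output above reports different columns), one common way to cross-check top is to watch CPU iowait and per-device statistics while the benchmark runs:

# terminal 1: run the workload
sysbench fileio --file-num=16 --file-total-size=2G --file-fsync-all=on --file-test-mode=seqwr --time=30 run

# terminal 2: watch CPU and per-device stats every second
iostat -x 1
# avg-cpu: high %iowait with high %idle  -> CPUs are mostly waiting on I/O, not computing
# device:  %util near 100 and aqu-sz > 0 -> the device has requests outstanding almost all the time
# note: on fast NVMe SSDs %util can reach 100% while the drive still has headroom,
#       because it services many requests in parallel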
Your question seems a bit vague. Are you asking about a way to identify the bottleneck of a generic system, or of a particular database, or about the disk performance of different drives?
I would say it's possible to design a benchmark that stresses disk more than other things, but probably not possible to propose a general algorithm for identifying a system's bottleneck.
The way you design/engineer a particular benchmark against a particular application usually involves at least some high-level familiarity with what the application is doing, and then reasoning about why a particular benchmark stresses resource A and not resource B. You might need to run other control benchmarks to conclude that resource B is really not bottlenecked, and that nothing else is either. Like all things engineering, iteration is your friend.
If you are trying to get an idea of, say, "how much does this particular SSD improve the performance of my Postgres database?", that really depends on the application workload. The answer may be none at all or quite a lot.
What I can tell you is that the sysbench fileio benchmark will say nothing about that question.
Conversely, the sysbench OLTP benchmark probably tells you not much about a "disk-constrained database workload", because I think it does too much table locking and lock contention in general for it to be disk constrained.
For example, in some recent benchmarking I found pg13 to be capable of nearly 400k (yes, nearly half a million) 100-byte record insertions per second on an old 8-core gaming PC with a consumer SSD, which was quite a bit more than the ~1000 transactions per second reported by the sysbench OLTP benchmark.
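To make that concrete, here is a rough, hypothetical sketch of the kind of insert-only run described above (the table, database name, and client counts are placeholders, not the exact setup used):

# one-off setup: a throwaway table for ~100-byte rows
psql -d testdb -c "CREATE TABLE kv (k bigint, v text);"

# insert.sql: one small insert per transaction
#   \set r random(1, 1000000000)
#   INSERT INTO kv (k, v) VALUES (:r, repeat('x', 100));

# drive it with 8 client connections for 30 seconds, skipping pgbench's default tables
pgbench -n -f insert.sql -c 8 -j 8 -T 30 testdb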

What is the difference between Memory and IO bandwidth and how do we measure each one?

What is the difference between memory and io bandwidth, and how do you measure each one?
I have many assumptions; forgive the verbosity of this two-part question.
The inspiration for these questions came from: "What is the meaning of IB read, IB write, OB read and OB write? They came as output of Intel® PCM while monitoring PCIe bandwidth", where Hadi explains:
DATA_REQ_OF_CPU is NOT used to measure memory bandwidth but i/o bandwidth.
I'm wondering if the difference between memory and I/O bandwidth is similar to the difference between DMA (direct memory access) and MMIO (memory-mapped I/O), or whether the bandwidth of both is I/O bandwidth?
I’m trying to use this picture to help visualize:
(Hopefully I have this right.) In x86 there are two address spaces: memory and I/O. Would I/O bandwidth be what is measured between the CPU (or DMA controller) and the I/O device, and memory bandwidth what is measured between the CPU and main memory, with all the data in both scenarios running through the memory bus? Just for clarity, do we all agree that the memory bus is the combination of the address and data buses? If so, that part of the image might be a little misleading...
If we can measure I/O bandwidth with the Intel® Performance Counter Monitor (PCM) by using the pcm-iio program, how would we measure memory bandwidth? Now I'm wondering why they would differ if they run through the same wires. Unless I just have this all wrong. The GitHub page for a lot of this test code is a bit overwhelming: https://github.com/opcm/pcm
Thank you
The DATA_REQ_OF_CPU event cannot be used to measure memory bandwidth for the following reasons:
1. Not all inbound memory requests from an IIO controller are serviced by a memory controller, because a request could also be serviced by the LLC (or by an LLC in the case of multiple sockets). Note, however, that on Intel processors that don't support DDIO, IO memory read requests may cause speculative read requests to be sent to memory in parallel with the LLC lookup.
2. The DATA_REQ_OF_CPU event has many subevents. The inbound memory metrics measured by the pcm-iio tool don't include all types of memory requests. Specifically, they don't include atomic memory reads and writes or IOMMU memory requests, which may consume memory bandwidth.
3. Some subevents count non-memory requests. For example, there are peer-to-peer requests (from one IIO to another).
4. An IO device may want to access memory on a NUMA node that is different from the node to which it's connected. In this case, it will consume memory bandwidth on a different NUMA node.
Now I realize the statement you quoted is a little ambiguous; I don't remember whether I was talking specifically about the metrics measured by pcm-iio or about the event in general, or whether "memory bandwidth" refers to total memory bandwidth or only the portion consumed by IO devices attached to an IIO. Interpreted in any of these ways, though, the statement is correct for the reasons mentioned above.
The pcm-iio tool only measures IO bandwidth. Use the pcm-memory tool instead for measuring memory bandwidth; it utilizes the performance events of the IMCs. It appears to me that none of the PCM tools can measure the memory bandwidth consumed by IO devices, which requires using the CBox events.
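As an illustrative sketch of that split (binary names and options vary a little between PCM releases; older builds append a .x suffix):

# DRAM read and write bandwidth per channel/socket, from the IMC counters, refreshed every second
sudo pcm-memory 1

# inbound/outbound PCIe bandwidth per IIO stack, refreshed every second
sudo pcm-iio 1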
The main source of information on uncore performance events is the Intel uncore manuals. You'll find nice figures in the Introduction chapters of these manuals that show how the different units of a processor are connected to each other.

How to check disk read or write utilization

I want to know the disk read/write utilization,
especially the read utilization on its own and the write utilization on its own.
I tried using iostat, but it shows the overall utilization.
How can I check only the disk read utilization or only the write utilization? (The process consists of both read and write work.)
Perhaps iotop is the right tool for you.
On the left of the screenshot you can see which process causes how much load, and the hdparm test I ran in the session on the right shows up clearly there.
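A minimal sketch of how that per-process read/write split looks from the command line (iotop needs root; these are the standard flags, but check your version's man page):

# interactive view: only processes actually doing I/O, with separate DISK READ / DISK WRITE columns
sudo iotop -o -P

# batch mode: log 10 samples, 5 seconds apart, for later comparison
sudo iotop -b -o -P -n 10 -d 5

# per-device (rather than per-process) split of reads vs writes: r/s, rkB/s vs w/s, wkB/s
iostat -x 5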
If you wish to get the results in a graphical format, then go with the Munin monitoring tool (http://munin-monitoring.org/). It provides graphical information about disk I/O per device, disk latency per device, disk utilization, and disk throughput.

Profiling resource usage - CPU, memory, hard-drive - of a long-running process on Linux?

We have a process that takes about 20 hours to run on our Linux box. We would like to make it faster, and as a first step need to identify bottlenecks. What is our best option to do so?
I am thinking of sampling the process's CPU, RAM, and disk usage every N seconds. So unless you have other suggestions, my specific questions would be:
How much should N be?
Which tool can provide accurate readings of these stats, with minimal interference or disruption from the fact that the tool itself is running?
Any other tips, nuggets of wisdom, or references to other helpful documents would be appreciated, since this seems to be one of these tasks where you can make a lot of time-consuming mistakes and false-starts as a newbie.
First of all, what you want and what you are asking are two different things.
Monitoring is required when you are running the process for the first time, i.e. when you don't know its resource utilization (CPU, memory, disk, etc.).
You can follow the procedure below to drill down to the bottleneck:
1. Monitor system resources (generally a 10-20 second interval should be fine, with Munin, Ganglia, or another tool); a command-line sketch follows this answer.
2. From this you should be able to identify whether your hardware is the bottleneck, i.e. whether you are running out of resources, e.g. 100% CPU utilization, very little free memory, high I/O wait, etc.
3. If that is your case, then think about upgrading the hardware or tuning the existing setup.
4. Then tune your application/utility. Use profilers/loggers to find out which method or process is taking the time, and try to tune it. If your code is single-threaded, consider parallelism. If a database is involved, try to tune your queries and DB parameters.
5. Then run the test again with monitoring to drill down further :)
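A minimal sketch of step 1 for a single long-running process, using pidstat from the sysstat package (12345 is a placeholder PID, and the 10-second interval matches the range suggested above):

# sample CPU (-u), memory (-r) and disk I/O (-d) for the process every 10 seconds,
# writing one line per sample to a log you can inspect or graph afterwards
pidstat -u -r -d -h -p 12345 10 > process_profile.log &

# system-wide context over the same period (run queue, memory, swap, block I/O, iowait)
vmstat 10 >> system_profile.log &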
I think a graph representation would be helpful for solving your problem, and I advise Munin.
It's a resource monitoring tool with a web interface. By default it monitors disk I/O, memory, CPU, load average, network usage... It's light and easy to install. It's also easy to develop your own plugins and set alert thresholds.
http://munin-monitoring.org/
Here is an example of what you can get from Munin : http://demo.munin-monitoring.org/munin-monitoring.org/demo.munin-monitoring.org/

Scaling in Windows Azure for IO Performance

Windows Azure advertises three types of IO performance levels:
Extra Small : Low
Small: Moderate
Medium and above: High
So, if I have an IO bound application (rather than CPU or Memory bound) and need at least 6 CPUs to process my work load - will I get better IO performance with 12-15 Extra Smalls, 6 Smalls, or 3 Mediums?
I'm sure this varies based on the application - is there an easy way to go about testing this? Are there any numbers that give a better picture of how much of an IO performance increase you get as you move to larger instance roles?
It seems like the IO performance for smaller roles could be equivalent to that of the larger ones; they are just the ones that get throttled down first if the overall load becomes too great. Does that sound right?
Windows Azure compute sizes offer approx. 100Mbps per core. Extra Small instances are much lower, at 5Mbps. See this blog post for more details. If you're IO-bound, the 6-Small setup is going to offer far greater bandwidth than 12 Extra-Smalls.
When you talk about processing your workload, are you working off a queue? If so, multiple worker roles, each a Small instance, could then each work with a 100Mbps pipe. You'd have to do some benchmarking to determine whether 3 Mediums give you enough of a performance boost to justify the larger VM size, knowing that when the workload is down, your "idle" cost footprint per hour is now 2 cores (Medium, $0.24) vs 1 (Small, $0.12).
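As a rough back-of-the-envelope, using only the per-core figures quoted above (and assuming, which is not a published number, that a Medium gets roughly 2 x 100Mbps):
12 Extra Smalls: 12 x 5Mbps ≈ 60Mbps aggregate
6 Smalls: 6 x 100Mbps = 600Mbps aggregate
3 Mediums: 3 x 200Mbps = 600Mbps aggregate (if the per-core figure scales linearly)
So on raw bandwidth alone, the Small and Medium configurations come out roughly even, and both are an order of magnitude ahead of the Extra Smalls.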
As I understand it, the amount of IO allowed per core is constant and supposed to be dedicated, but I haven't been able to get formal confirmation of this. This is likely different for Extra Small instances, which operate in a shared mode and are not dedicated like the other Windows Azure VM instances.
I'd imagine what you suspect is in fact true: even being IO-bound varies by application. I think you could accomplish your goal of timing by using timers and writing the output to a file on storage you could then retrieve. Do some math to figure out how many work units per hour you can process by cramming as many as possible through a Small and then a Medium instance. If your work-unit size fluctuates drastically, you might have to do some averaging too. I would always prefer smaller instances if possible and just spin up more copies as you need more firepower.
