Linux iostat "averaged" result over a period of time

I am using the iostat utility on my RedHat Linux server to monitor the performance of a disk. When I use "iostat -xd sdh 1", I get the perf result printed every second. When I use "iostat -xd sdh 5", I get the perf result printed every five seconds. My feeling is that the latter command is printing a snapshot of the perf numbers every five seconds, rather than averaging over the past 5 seconds. Am I correct in my understanding?
If so, is there a way I can make iostat print the perf number averaged over n seconds, or is there some other utility that will do that?
Currently, the perf number is fluctuating within a range, and I want to get a somewhat "stable" number. I am hoping that averaging over a period of time will give me such a number.
Thank you,
Ahmed.
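A hedged note on this: with the sysstat version of iostat, every report after the first covers the interval since the previous report, so the interval argument already gives an average rather than a snapshot. A sketch of getting a single figure averaged over a longer window (device name taken from the question):
iostat -xd sdh 60 2
Here the first report covers the time since boot, and the second is the average over the preceding 60 seconds.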

Related

How to find the average CPU of a process over X amount of time in Linux

I'm trying to find out how I can get/calculate the CPU utilization of a specific process over X amount of time (I write my code in Python on a Linux-based system).
What I want to get, for example, is the average CPU of a process over the last hour/day/10 minutes...
Is there a command or a calculation I can run?
*I can't run a command like "top" in the background for X time and calculate the CPU; I need it to be one set of commands or a calculation.
I tried researching the top command but I didn't find useful info for my case.
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu - gives the average consumption over the process lifetime
Can there be a way to use uptime or /proc/[pid]/stat to calculate this?
Thanks,
What about using pidstat?
$ pidstat -p 12345 10
Will output the stats for pid 12345 every 10 seconds. This includes CPU%.
From there you can run it on background, and redirect output to a file:
$ pidstat -p 12345 10 > my_pid_stats.txt &
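Alternatively (if I remember pidstat's behaviour correctly), giving it a report count makes it print an "Average:" summary line when the run completes, which yields one figure over the whole window, e.g. an hour as sixty 60-second reports:
$ pidstat -p 12345 60 60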
Here is a link with several examples. There is a lot of flexibility in the output, so you can probably customize it to better suit your needs:
https://www.thegeekstuff.com/2014/11/pidstat-examples/
pidstat is part of the sysstat package on Ubuntu, in case you decide to install it.
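If you would rather compute it yourself from /proc/[pid]/stat, as the question asks, a minimal sketch could look like this; the PID 12345 and the 60-second window are placeholders, and it assumes the process's comm field contains no spaces so that awk's field numbering lines up:
pid=12345; interval=60
hz=$(getconf CLK_TCK)                                      # kernel clock ticks per second
ticks() { awk '{print $14 + $15}' /proc/"$pid"/stat; }     # utime + stime, in ticks
t1=$(ticks); sleep "$interval"; t2=$(ticks)
awk -v d="$((t2 - t1))" -v s="$interval" -v hz="$hz" 'BEGIN { printf "avg CPU%%: %.1f\n", 100 * d / (s * hz) }'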

linux perf record: difference between count (-c) and frequency (-F) options

I'm trying to understand what the -c and -F options of perf record really do but I cannot explain what I'm seeing. I'm running these commands:
perf record -a -F <frequency> sleep 1
and
perf record -a -c <count> sleep 1
trying different values of frequency and count. The results I get are the following:
In the first table I set the frequency and in the second one the count. How do frequency and count affect the number of events? I thought the number of events was independent of the frequency and count, but clearly that's not the case. What does perf actually do?
Count and frequency are two fundamental switches that tune the rate of sampling when using perf record (which does sampling internally).
Count
When you run perf record -c <number>, you are specifying the sample period (where "number" is the sample period). That is, a sample will be recorded on every "number"th occurrence of the event, i.e. when the performance counter that keeps track of the number of events has overflowed.
I am guessing you are obtaining the number of events with the help of perf report. Note that perf report will never report the actual number of events, only an approximation. The number of events will keep changing as you keep tweaking the sample period. perf report only reads the perf.data file that perf record generates, and based on the size of that file it estimates the number of samples recorded (since it knows the size of a recorded sample in memory). The number of events is then obtained by -
Number of events = Fixed Sample Period * Number of samples collected
where Fixed Sample Period is what you specified with perf record -c.
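For example (illustrative numbers only): if you ran perf record -c 100000 and perf report estimates 2,000 samples in perf.data, the event count it derives is 100000 * 2000 = 200,000,000 events.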
Frequency
This is the other way of expressing the sampling period: you specify the average rate of samples per second (the frequency), which you can do using perf record -F. So perf record -F 1000 will record around 1000 samples per second, and these samples will be generated when the hardware/PMU counter corresponding to the event overflows. This means that the kernel will dynamically adjust the sampling period to make sure the sampling process adheres to the requested frequency.
This is how the sample period gets updated dynamically.
The higher the sampling frequency, the higher the number of samples collected (almost proportionally).
The variation in the sampling period can be seen by running the command -
sudo perf report -D -i perf.data | fgrep RECORD_SAMPLE
While the sampling period is still varying, the total number of events will keep changing along with it. When the sampling period remains fixed, the total number of events remains fixed and is obtained by the formula shown above. The total number of events is approximate in both cases.
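A hedged way to check this yourself, assuming your perf script supports the period output field (recent versions do): count the samples and sum their per-sample periods to recover the approximate event total.
sudo perf report -D -i perf.data | fgrep RECORD_SAMPLE | wc -l    # number of samples recorded
sudo perf script -i perf.data -F period | awk '{ sum += $1 } END { print sum }'    # approximate total events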

PERF STAT does not count memory-loads but counts memory-stores

Linux Kernel : 4.10.0-20-generic (also tried this on 4.11.3)
Ubuntu : 17.04
I have been trying to collect stats of memory accesses using perf stat. I am able to collect stats for memory-stores, but the count for memory-loads returns a 0 value.
Below are the details for memory-stores:
perf stat -e cpu/mem-stores/u ./libquantum_base.arnab 100
N = 100, 37 qubits required
Random seed: 33
Measured 3277 (0.200012), fractional approximation is 1/5.
Odd denominator, trying to expand by 2.
Possible period is 10.
100 = 4 * 25
Performance counter stats for './libquantum_base.arnab 100':
158,115,510 cpu/mem-stores/u
0.559922797 seconds time elapsed
For memory-loads, I get a 0 count, as can be seen below:
perf stat -e cpu/mem-loads/u ./libquantum_base.arnab 100
N = 100, 37 qubits required
Random seed: 33
Measured 3277 (0.200012), fractional approximation is 1/5.
Odd denominator, trying to expand by 2.
Possible period is 10.
100 = 4 * 25
Performance counter stats for './libquantum_base.arnab 100':
0 cpu/mem-loads/u
0.563806170 seconds time elapsed
I cannot understand why this does not count properly. Should I use a different event to get proper data?
The mem-loads event is mapped to the MEM_TRANS_RETIRED.LOAD_LATENCY_GT_3 performance monitoring unit event on Intel processors. The events MEM_TRANS_RETIRED.LOAD_LATENCY_* are special and can only be counted by using the p modifier. That is, you have to specify mem-loads:p to perf to use the event correctly.
MEM_TRANS_RETIRED.LOAD_LATENCY_* is a precise event and it only makes sense to be counted at the precise level. According to this Intel article (emphasis mine):
When a user elects to sample one of these events, special hardware is
used that can keep track of a data load from issue to completion.
This is more complicated than simply counting instances of an event
(as with normal event-based sampling), and so only some loads are
tracked. Loads are randomly chosen, the latency determined for each,
and the correct event(s) incremented (latency >4, >8, >16, etc). Due
to the nature of the sampling for this event, only a small percentage
of an application's data loads can be tracked at any one time.
As you can see, MEM_TRANS_RETIRED.LOAD_LATENCY_* by no means counts the total number of loads, and it is not designed for that purpose at all.
If you want to determine which instructions in your code are issuing load requests that take more than a specific number of cycles to complete, then MEM_TRANS_RETIRED.LOAD_LATENCY_* is the right performance event to use. In fact, that is exactly the purpose of perf-mem, and it achieves its purpose by using this event.
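As a hedged sketch of that workflow (perf-mem drives the precise latency event for you; the binary name is just the one from the question):
perf mem record ./libquantum_base.arnab 100
perf mem report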
If you want to count the total number of load uops retired, then you should use L1-dcache-loads, which is mapped to the MEM_UOPS_RETIRED.ALL_LOADS performance event on Intel processors.
On the other hand, mem-stores and L1-dcache-stores are mapped to the exact same performance event on all current Intel processors, namely, MEM_UOPS_RETIRED.ALL_STORES, which does count all retired store uops.
So in summary, if you are using perf-stat, you should (almost) always use L1-dcache-loads and L1-dcache-stores to count retired loads and stores, respectively. These are mapped to the raw events you have used in the answer you posted, only more portable because they also work on AMD processors.
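So a counting-mode invocation along those lines would be (binary name again taken from the question):
perf stat -e L1-dcache-loads,L1-dcache-stores ./libquantum_base.arnab 100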
I have used a Broadwell (CPU E5-2620) server machine to collect all of the below events.
To collect memory-load events, I had to use a numeric event value. I basically ran the below command -
./perf record -e "r81d0:u" -c 1 -d -m 128 ../../.././libquantum_base 20
Here r81d0 represents the raw event for counting "memory loads amongst all instructions retired". The "u", as can be understood, represents user space.
The below command, on the other hand,
./perf record -e "r82d0:u" -c 1 -d -m 128 ../../.././libquantum_base 20
has "r82d0:u" as a raw event representing "memory stores amongst all instructions retired in userspace".

How does perf associate events to functions?

More precisely, how does the perf tool associate PMU events to functions?
I already realized that when the kernel perf subsystem records the event counters it also records the Program Counter (PC), so it can associate the count with a function.
However, to get really fine-grained results, you need to sample the counters at a very high rate; otherwise you may associate counters with a group of functions.
But reading the counters and writing the sampled data (counters, PC, call stack) to the perf mmap space is very intrusive.
I read in some sources that this sampling only happens when the PMU counters overflow, but this can be very coarse unless I set the counters to overflow very quickly.
What am I missing here?
perf record is a statistical profiling tool. It either programs the hardware performance monitoring unit (PMU) to overflow after some number of counts (for example, with -e cycles -c 1000000 it writes -1000000 to the counter and enables counting of cycles; with -F, or without a freq/period argument, it autotunes the value), reprogramming the counter for the next period on every overflow interrupt, so that it gets several hundred or a few thousand samples per second. Or it can use the OS timer interrupt (-e task-clock) to get periodic samples. On every sample (i.e., on every interrupt from the hardware PMU) perf records the current PC (EIP) and/or the call stack; it does not record the current value of the counter (check the full dump of the data stored in perf.data with perf script or perf script -D, or the code of sample-event dumping: there is sample->ip but no current PMU count).
perf report will parse perf.data to get all the PCs recorded in it. It will count how many times each PC was sampled to build a histogram [PC] -> sample_count. Every PC will be associated with the exact function it belongs to (perf report will parse the memory map, since mmap events are recorded in perf.data too, open every binary used, and find the symbol table of every binary).
The actual code of perf report is in linux/tools/perf/builtin-report.c: cmd_report/__cmd_report -> perf_session__process_events -> some magic -> process_sample_event, which records all the ip (PC) values mentioned in perf.data with hist_entry_iter__add(&iter, &al, rep->max_stack, rep); into the histogram via hist_iter__report_callback:
hist_entry__inc_addr_samples(he, evsel->idx, al->addr);
. . . (perf/util/annotate.c) __symbol__inc_addr_samples
611 h->addr[offset]++;
Then it will output the collected histogram with report__browse_hists -> perf_evlist__tty_browse_hists -> hists__fprintf_nr_sample_events(hists, rep, evname, stdout);.
Every sample is already associated with an exact function (and a somewhat inexact instruction inside it, because of the out-of-order nature of CPUs and the imprecise PMU overflow event); this is how statistical profiling works. When your program runs for a short time (less than a second) and/or you have too low a sampling frequency, you may have few samples recorded in perf.data. But if you have more than several hundred samples, you can find the most CPU-heavy functions (they probably follow the Pareto rule and run for several dozen percent of the program's run time). When you want to see smaller functions (around several percent of running time), collect thousands or tens of thousands of samples and do some statistical estimation (you will not get the correct percentage for a function that runs for 0.1% of the time when you have only 100 or 1000 samples).
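A minimal sketch of that end-to-end flow (./myprog is a placeholder; the ip and sym fields of perf script show the recorded PC and the symbol it resolved to):
perf record -F 999 -g ./myprog     # sample at ~999 Hz, with call stacks
perf script -F ip,sym | head       # per-sample PC and resolved function
perf report --stdio                # histogram of samples per function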

Is the very first top command result on Linux credible?

When I execute a shell script bash.sh like this:
for((i = 0; i<=5 ;i++))
do
./test.sh &
done
and test.sh like this:
for((i = 0; i<=10000000 ;i++))
do
echo 'hh'
done
the CPU us (user) usage should be very high.
But when I use the top command to check the result, the CPU us value in the very first screen that top prints, before it refreshes, is very small.
So that CPU usage is not credible!
Are the other values in the first top screen credible, before it refreshes?
The load average reported in the top right is the average over the last 1, 5 and 15 minutes (from the top man page):
system load avg over the last 1, 5 and 15 minutes
So this value will be averaged over the last minute, and as a result your process with a high CPU usage will not have an immediate effect on it.
The %CPU column shows the usage since the last update:
The task's share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time
So on the first screen, the %CPU values are off and will change when the screen updates.
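A common workaround, sketched here rather than taken from the question, is to run top in batch mode for two iterations and only trust the second one, since it covers a real measurement interval:
top -b -d 5 -n 2    # batch mode, two snapshots 5 s apart; use the second one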
