How to keep the default events when using `perf stat` with custom events - linux

When the perf stat command is used, many default events are measured. For example, when I run perf stat ls, I obtain the following output:
Performance counter stats for 'ls':
0,55 msec task-clock # 0,598 CPUs utilized
0 context-switches # 0,000 /sec
0 cpu-migrations # 0,000 /sec
99 page-faults # 179,071 K/sec
2 324 694 cycles # 4,205 GHz
1 851 372 instructions # 0,80 insn per cycle
357 918 branches # 647,403 M/sec
12 897 branch-misses # 3,60% of all branches
0,000923884 seconds time elapsed
0,000993000 seconds user
0,000000000 seconds sys
Now, let's suppose I also want to measure the cache-references and cache-misses events.
If I run perf stat -e cache-references,cache-misses ls, the output is:
Performance counter stats for 'ls':
101 148 cache-references
34 261 cache-misses # 33,872 % of all cache refs
0,000973384 seconds time elapsed
0,001014000 seconds user
0,000000000 seconds sys
Is there a way to add events with the -e flag but also keep the default events shown when -e is not used (without having to list all of them explicitly in the command)?
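One possible workaround, offered as a sketch rather than a built-in perf feature, is to keep the default event names in a small shell helper and append the extra events to them. The list below is copied from the default output shown above; the real default set depends on the perf version and the CPU, so treat it as an assumption. The name build_event_list is hypothetical.

```shell
# Default events as they appear in the question's output (an assumption:
# other perf versions or CPUs may show a different default set).
DEFAULT_EVENTS="task-clock,context-switches,cpu-migrations,page-faults,cycles,instructions,branches,branch-misses"

# Hypothetical helper: print the default list, followed by any extra events.
build_event_list() {
    if [ -n "$1" ]; then
        printf '%s,%s\n' "$DEFAULT_EVENTS" "$1"
    else
        printf '%s\n' "$DEFAULT_EVENTS"
    fi
}

# Example use (needs perf installed), left commented out:
# perf stat -e "$(build_event_list cache-references,cache-misses)" ls
```

The command line still passes every event to -e, but the list lives in one place instead of being retyped for each run.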

Related

Linux perf record not generating any samples

I am trying to profile my userspace program on an Arria 10 FPGA board (with 2 ARM Cortex-A9 CPUs), which has PMU support. I am running Wind River Linux version 9.x. I built my kernel with almost all of the CONFIG_ options people suggested on the internet. Also, my program is compiled with the -fno-omit-frame-pointer and -g options.
What I see is that 'perf record' doesn't generate any samples at all. The 'perf stat true' output looks valid, though (I am not sure what to make of it). Does anyone have suggestions or ideas as to why no samples are being generated?
~: perf record --call-graph dwarf -- my_app
^C
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data ]
~: perf report -g graph --no-children
Error:
The perf.data file has no samples!
To display the perf.data header info, please use --header/--header-only options.
~: perf stat true
Performance counter stats for 'true':
1.095300 task-clock (msec) # 0.526 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
22 page-faults # 0.020 M/sec
1088056 cycles # 0.993 GHz
312708 instructions # 0.29 insn per cycle
29159 branches # 26.622 M/sec
16386 branch-misses # 56.20% of all branches
0.002082030 seconds time elapsed
I don't use a VM in this setup. The Arria 10 is an Intel FPGA with 2 ARM CPUs that support a PMU.
Edit:
1. I realize now that ARM CPU has HW PMU support (opposite to what I mentioned earlier). Even with HW PMU support, I am not able to do 'perf record' successfully.
This is an old question, but for people who find this via search:
perf record -e cpu-clock <command>
works for me. The problem seems to be that the default event (cycles) is not available.

CPU Usage Server and Monitoring info

I need to check the total CPU usage (%) from a shell script, preferably without installing any additional package (OS: SUSE Linux 12), so that I can monitor it and make sure it does not pass a critical threshold.
It is a large server with 2x E5-2667 v4 (8 cores each).
Looking over the questions I found something and I tried it:
1. top -bn1 | grep "Cpu(s)" | \sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | \awk '{print 100 - $1"%"}'
2. CPU_LOAD=$(sar -P ALL 1 2 |grep 'Average.*all' |awk -F" " '{print 100.0 -$NF}')
I also tried computing 100 - idle from iostat.
Is that really correct on a multi-CPU/multi-core system?
Is it correct to calculate the total CPU usage from the load average reported by uptime?
With this code I get the average for a single core, while I need the total CPU usage in %.
Regards,
Thanks
You are probably better off implementing the solution completely in awk:
top -bn1 | awk -F, '/id/ { for (i=1;i<=NF;i++) { if ( $i ~ /[[:digit:]]+\.[[:digit:]]+[[:blank:]]+id/ ) { split($i,arry," "); print arry[1]" - idle" } } }'
Take the output from top and select the lines containing id. For each comma-delimited field on such a line, match one or more digits, a decimal point, one or more digits, at least one blank, and then id. On a match, split the field on blanks into an array and print the first element, which is the idle percentage.
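As an alternative that needs no extra package and does not parse top, total usage across all cores can be derived from two readings of the aggregate cpu line in /proc/stat. This is a minimal sketch: the function name cpu_usage_pct is hypothetical, and counting iowait as idle time is a choice made here, not something stated in the question.

```shell
# Hypothetical helper: $1 and $2 are two aggregate "cpu ..." lines taken from
# /proc/stat some time apart. Prints the busy percentage over that interval.
cpu_usage_pct() {
    printf '%s\n%s\n' "$1" "$2" | LC_ALL=C awk '{
        total = 0
        # Fields 2..9: user nice system idle iowait irq softirq steal.
        # guest and guest_nice are skipped because user/nice already include them.
        for (i = 2; i <= NF && i <= 9; i++) total += $i
        idle = $5 + $6                      # idle + iowait (assumption)
        if (NR == 1) { t0 = total; i0 = idle } else { t1 = total; i1 = idle }
    }
    END {
        dt = t1 - t0
        if (dt <= 0) { print "0.0"; exit }
        printf "%.1f\n", 100 * (dt - (i1 - i0)) / dt
    }'
}

# Example use on a live system, left commented out:
# a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
# cpu_usage_pct "$a" "$b"
```

Because the aggregate line sums the counters of every core, the result is a percentage of the whole machine: 100% means all cores were busy for the full interval.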
If you would like to get any detailed stats you might also use perf.
In this example you may see the number of all CPU cycles during 1 second:
-bash-4.1# perf stat -a sleep 1
Performance counter stats for 'system wide':
4002.822144 task-clock (msec) # 3.999 CPUs utilized (100.00%)
22809 context-switches # 0.006 M/sec (100.00%)
1332 cpu-migrations # 0.333 K/sec (100.00%)
23794 page-faults # 0.006 M/sec
5409531172 cycles # 1.351 GHz (100.00%)
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
3874289082 instructions # 0.72 insns per cycle (100.00%)
715152901 branches # 178.662 M/sec (100.00%)
20583742 branch-misses # 2.88% of all branches
1.001065623 seconds time elapsed
You may also check uptime.
Uptime gives a one line display of the following information. The current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.
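Note that the load average counts tasks that are running or waiting, so it is not a CPU percentage. A rough way to relate it to the machine size is to divide it by the number of CPUs; the sketch below does only that, and the name load_per_cpu is hypothetical.

```shell
# Hypothetical helper: $1 is a load average, $2 is the number of CPUs.
# Prints the load per CPU; values near or above 1.00 suggest saturation.
load_per_cpu() {
    LC_ALL=C awk -v load="$1" -v ncpu="$2" 'BEGIN {
        if (ncpu <= 0) { print "invalid CPU count"; exit 1 }
        printf "%.2f\n", load / ncpu
    }'
}

# Example use on a live system, left commented out:
# load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```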

perf get time elapsed with field separator option

I have a program which parses the output of the Linux command perf. It requires the -x, option (the field separator option). I want to extract the elapsed time (not task-clock or cpu-clock) using perf. However, when I use the -x option, the elapsed time is not present in the output and I cannot find a corresponding perf event. Here are the sample outputs:
perf stat ls
============
Performance counter stats for 'ls':
0.934889 task-clock (msec) # 0.740 CPUs utilized
6 context-switches # 0.006 M/sec
0 cpu-migrations # 0.000 K/sec
261 page-faults # 0.279 M/sec
1,937,910 cycles # 2.073 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
1,616,944 instructions # 0.83 insns per cycle
317,016 branches # 339.095 M/sec
12,439 branch-misses # 3.92% of all branches
0.001262625 seconds time elapsed //here we have it
Now with field separator option
perf stat -x, ls
================
2.359807,task-clock
6,context-switches
0,cpu-migrations
261,page-faults
1863028,cycles
<not supported>,stalled-cycles-frontend
<not supported>,stalled-cycles-backend
1670644,instructions
325047,branches
12251,branch-misses
Any help is appreciated
# perf stat ls 2>&1 >/dev/null | tail -n 2 | sed 's/ \+//' | sed 's/ /,/'
0.002272536,seconds time elapsed
Starting with kernel 5.2-rc1, a new event called duration_time is exposed by perf stat to solve exactly this problem. The value of this event is exactly equal to the time elapsed value, but the unit is nanoseconds instead of seconds.
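A minimal sketch of how a parser might pick duration_time out of the -x, output and convert it to seconds. The exact column layout of the CSV varies between perf versions, so the helper looks for the event name in any column and assumes only that the value is in the first one; the name elapsed_seconds is hypothetical.

```shell
# Hypothetical helper: reads "perf stat -x," output on stdin and prints the
# duration_time value converted from nanoseconds to seconds.
elapsed_seconds() {
    LC_ALL=C awk -F, '{
        for (i = 2; i <= NF; i++)
            if ($i == "duration_time") {
                printf "%.9f\n", $1 / 1000000000
                exit
            }
    }'
}

# Example use (needs a perf version that has duration_time), commented out.
# perf stat writes its counters to stderr, hence the redirections:
# perf stat -x, -e duration_time ls 2>&1 >/dev/null | elapsed_seconds
```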

Perf does not support some performance events

I want to measure stalled cycles for my application using perf.
When I try: perf stat -B dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB) copied, 0.218456 s, 2.3 GB/s
Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':
218.420011 task-clock # 0.995 CPUs utilized
25 context-switches # 0.000 M/sec
1 CPU-migrations # 0.000 M/sec
255 page-faults # 0.001 M/sec
821,183,099 cycles # 3.760 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
1,526,427,190 instructions # 1.86 insns per cycle
292,281,624 branches # 1338.163 M/sec
1,013,837 branch-misses # 0.35% of all branches
0.219551862 seconds time elapsed
As you can see, I'm getting <not supported> for the stalled-cycles* events. I couldn't find a solution or explanation for this online.
My kernel version is 3.2.0-59, perf version is 3.2.54, and my CPU is an i7-3770.
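To check from a script which events a given machine reports as unsupported, the <not supported> marker can be filtered out of the machine-readable output. This sketch assumes that perf stat -x, prints lines such as `<not supported>,stalled-cycles-frontend`; newer perf versions add more columns, so the helper takes the first non-empty field after the marker. The name unsupported_events is hypothetical.

```shell
# Hypothetical helper: reads "perf stat -x," output on stdin and prints the
# name of every event whose value column is "<not supported>".
unsupported_events() {
    awk -F, '$1 == "<not supported>" {
        for (i = 2; i <= NF; i++)
            if ($i != "") { print $i; break }
    }'
}

# Example use (needs perf installed), left commented out:
# perf stat -x, true 2>&1 >/dev/null | unsupported_events
```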

Calculate the average of several "time" commands in Linux

I'm profiling a program on Linux, using the "time" command. The problem is that its output is not very statistically relevant, as it only runs the program once. Is there a tool or a way to get an average of several "time" runs? Possibly together with statistical information such as the deviation as well?
Here is a script I wrote to do something similar to what you are looking for. It runs the provided command 10 times, logging the real, user CPU and system CPU times to a file, and echoing them after each command output. It then uses awk to provide averages of each of the 3 columns in the file, but does not (yet) include standard deviation.
#!/bin/bash
rm -f /tmp/mtime.$$
for x in {1..10}
do
/usr/bin/time -f "real %e user %U sys %S" -a -o /tmp/mtime.$$ "$@"
tail -1 /tmp/mtime.$$
done
awk '{ et += $2; ut += $4; st += $6; count++ } END { printf "Average:\nreal %.3f user %.3f sys %.3f\n", et/count, ut/count, st/count }' /tmp/mtime.$$
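The script above stops short of the standard deviation. A minimal sketch of how it could be added for the real column, assuming the same "real X user Y sys Z" line format that the script logs; the name mean_stddev is hypothetical and the sample (n - 1) standard deviation is a choice made here.

```shell
# Hypothetical helper: reads lines of the form "real X user Y sys Z" on stdin
# and prints the mean and sample standard deviation of the real times.
mean_stddev() {
    LC_ALL=C awk '{ n++; sum += $2; sumsq += $2 * $2 }
    END {
        if (n < 2) { print "need at least 2 runs"; exit 1 }
        mean = sum / n
        var = (sumsq - n * mean * mean) / (n - 1)
        if (var < 0) var = 0        # guard against rounding error
        printf "real mean %.3f stddev %.3f\n", mean, sqrt(var)
    }'
}

# Example use with the log file written by the script, left commented out:
# mean_stddev < /tmp/mtime.$$
```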
Use hyperfine.
For example:
hyperfine 'sleep 0.3'
Will run the command sleep 0.3 multiple times, then output something like this:
hyperfine 'sleep 0.3'
Benchmark #1: sleep 0.3
Time (mean ± σ): 306.7 ms ± 3.0 ms [User: 2.8 ms, System: 3.5 ms]
Range (min … max): 301.0 ms … 310.9 ms 10 runs
perf stat does this for you with the -r (--repeat=<n>) option, with average and variance.
e.g. using a short loop in awk to simulate some work, short enough that CPU frequency ramp-up and other startup overhead might be a factor (Idiomatic way of performance evaluation?), although it seems my CPU ramped up to 3.9GHz pretty quickly, averaging 3.82 GHz.
$ perf stat -r5 awk 'BEGIN{for(i=0;i<1000000;i++){}}'
Performance counter stats for 'awk BEGIN{for(i=0;i<1000000;i++){}}' (5 runs):
37.90 msec task-clock # 0.968 CPUs utilized ( +- 2.18% )
1 context-switches # 31.662 /sec ( +-100.00% )
0 cpu-migrations # 0.000 /sec
181 page-faults # 4.776 K/sec ( +- 0.39% )
144,802,875 cycles # 3.821 GHz ( +- 0.23% )
343,697,186 instructions # 2.37 insn per cycle ( +- 0.05% )
93,854,279 branches # 2.476 G/sec ( +- 0.04% )
29,245 branch-misses # 0.03% of all branches ( +- 12.79% )
0.03917 +- 0.00182 seconds time elapsed ( +- 4.63% )
(Scroll to the right for variance.)
You can use taskset -c3 perf stat ... to pin the task to a specific core (#3 in that case) if you have a single-threaded task and want to minimize context-switches.
By default, perf stat uses hardware perf counters to profile things like instructions, core clock cycles (not the same thing as time on modern CPUs), and branch misses. This has pretty low overhead, especially with the counters in "counting" mode instead of perf record causing interrupts to statistically sample hot spots for events.
You could use -e task-clock to measure just that event without using HW perf counters. (Or, if your system is in a VM, or you didn't change the default /proc/sys/kernel/perf_event_paranoid, perf might not be able to ask the kernel to program any HW counters anyway.)
For more about perf, see
https://www.brendangregg.com/perf.html
https://perf.wiki.kernel.org/index.php/Main_Page
For programs that print output, it looks like this:
$ perf stat -r5 echo hello
hello
hello
hello
hello
hello
Performance counter stats for 'echo hello' (5 runs):
0.27 msec task-clock # 0.302 CPUs utilized ( +- 4.51% )
...
0.000890 +- 0.000411 seconds time elapsed ( +- 46.21% )
For a single run, (the default with no -r), perf stat will show time elapsed, and user / sys. But -r doesn't average those, for some reason.
As the commenter above mentioned, it sounds like you may want to run your program multiple times in a loop to get more data points. GNU time (/usr/bin/time, not the shell's built-in time keyword) has a -o option that writes its results to a text file, like so:
/usr/bin/time -o output.txt myprog
