I have a program which parses the output of the Linux command perf. It requires the -x, option (the field separator option). I want to extract the elapsed time (not task-clock or cpu-clock) using perf. However, when I use the -x option, the elapsed time is not present in the output, and I cannot find a corresponding perf event. Here are the sample outputs:
perf stat ls
============
Performance counter stats for 'ls':
0.934889 task-clock (msec) # 0.740 CPUs utilized
6 context-switches # 0.006 M/sec
0 cpu-migrations # 0.000 K/sec
261 page-faults # 0.279 M/sec
1,937,910 cycles # 2.073 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
1,616,944 instructions # 0.83 insns per cycle
317,016 branches # 339.095 M/sec
12,439 branch-misses # 3.92% of all branches
0.001262625 seconds time elapsed //here we have it
Now with the field separator option:
perf stat -x, ls
================
2.359807,task-clock
6,context-switches
0,cpu-migrations
261,page-faults
1863028,cycles
<not supported>,stalled-cycles-frontend
<not supported>,stalled-cycles-backend
1670644,instructions
325047,branches
12251,branch-misses
Any help is appreciated
# perf stat ls 2>&1 >/dev/null | tail -n 2 | sed 's/ \+//' | sed 's/ /,/'
0.002272536,seconds time elapsed
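The tail -n 2 in the pipeline above relies on the elapsed line being the second-to-last line. Newer perf versions print extra "seconds user" and "seconds sys" lines after it, so matching on the text may be more robust. A minimal sketch, assuming the human-readable output still contains the phrase "seconds time elapsed" (elapsed_csv is just an illustrative name, not part of perf):

```shell
# Read `perf stat` human-readable output on stdin and print
# "<value>,seconds time elapsed" for the elapsed-time line.
elapsed_csv() {
    awk '/seconds time elapsed/ { print $1 ",seconds time elapsed" }'
}

# Possible usage (perf stat reports on stderr, so swap the streams):
#   perf stat ls 2>&1 >/dev/null | elapsed_csv
```

If your locale prints decimal commas, running perf with LC_ALL=C should keep the value unambiguous.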
Starting with kernel 5.2-rc1, a new event called duration_time is exposed by perf stat to solve exactly this problem. The value of this event is exactly equal to the time elapsed value, but the unit is nanoseconds instead of seconds.
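As a rough sketch of how a parser could use this event: the helper below reads the -x, output, looks for the row whose event-name field is duration_time, and converts the counter value from nanoseconds to seconds. It assumes the counter value is the first field on the row, which matches the CSV layout described in the perf-stat man page when no -I or per-CPU options are used. The function name elapsed_seconds is made up for this example.

```shell
# Read `perf stat -x,` output on stdin and print the elapsed time in seconds.
# Assumption: the counter value is field 1 and the event name appears in a
# later field of the same row.
elapsed_seconds() {
    awk -F, '{
        for (i = 2; i <= NF; i++)
            if ($i == "duration_time") {
                printf "%.9f\n", $1 / 1e9   # nanoseconds -> seconds
                exit
            }
    }'
}

# Possible usage (requires a perf that knows duration_time):
#   perf stat -x, -e duration_time,task-clock ls 2>&1 >/dev/null | elapsed_seconds
```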
When the perf stat command is used, many default events are measured. For example, when I run perf stat ls, I obtain the following output:
Performance counter stats for 'ls':
0,55 msec task-clock # 0,598 CPUs utilized
0 context-switches # 0,000 /sec
0 cpu-migrations # 0,000 /sec
99 page-faults # 179,071 K/sec
2 324 694 cycles # 4,205 GHz
1 851 372 instructions # 0,80 insn per cycle
357 918 branches # 647,403 M/sec
12 897 branch-misses # 3,60% of all branches
0,000923884 seconds time elapsed
0,000993000 seconds user
0,000000000 seconds sys
Now, let's suppose I also want to measure the cache-references and cache-misses events.
If I run perf stat -e cache-references,cache-misses, the output is:
Performance counter stats for 'ls':
101 148 cache-references
34 261 cache-misses # 33,872 % of all cache refs
0,000973384 seconds time elapsed
0,001014000 seconds user
0,000000000 seconds sys
Is there a way to add events with the -e flag while keeping the default events that are shown when -e is not used (without having to list all of them explicitly in the command)?
I am trying to profile my userspace program on an Arria 10 FPGA board (with 2 ARM Cortex-A9 CPUs) which has PMU support. I am running Wind River Linux version 9.x. I built my kernel with almost all of the CONFIG_ options people suggested on the internet. Also, my program is compiled with the -fno-omit-frame-pointer and -g options.
What I see is that 'perf record' doesn't generate any samples at all. The 'perf stat true' output looks valid, though (I am not sure what to make of it). Does anyone have suggestions or ideas as to why no samples are being generated?
~: perf record --call-graph dwarf -- my_app
^C
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data ]
~: perf report -g graph --no-children
Error:
The perf.data file has no samples!
To display the perf.data header info, please use --header/--header-only options.
~: perf stat true
Performance counter stats for 'true':
1.095300 task-clock (msec) # 0.526 CPUs utilized
0 context-switches # 0.000 K/sec
0 cpu-migrations # 0.000 K/sec
22 page-faults # 0.020 M/sec
1088056 cycles # 0.993 GHz
312708 instructions # 0.29 insn per cycle
29159 branches # 26.622 M/sec
16386 branch-misses # 56.20% of all branches
0.002082030 seconds time elapsed
I don't use a VM in this setup. The Arria 10 is an Intel FPGA with 2 ARM CPUs that support a PMU.
Edit:
1. I realize now that the ARM CPU has HW PMU support (contrary to what I mentioned earlier). Even with HW PMU support, I am not able to run 'perf record' successfully.
This is an old question, but for people who find this via search:
perf record -e cpu-clock <command>
works for me. The problem seems to be that the default event (cycles) is not available.
I need to check the total CPU % usage from a shell script, preferably without installing any additional package (OS: SUSE Linux 12), so that I can monitor the level and make sure it does not pass the critical threshold.
It is a large server with 2x E5-2667 v4 (8 cores each).
Looking over the questions I found something and I tried it:
1- top -bn1 | grep "Cpu(s)" | sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | awk '{print 100 - $1"%"}'
2- CPU_LOAD=$(sar -P ALL 1 2 | grep 'Average.*all' | awk -F" " '{print 100.0 -$NF}')
I also tried computing 100 minus idle from iostat.
Is that really correct on a multi-CPU/multi-core system?
Is it correct to calculate the total CPU usage from the load average reported by uptime?
Using the code above I got the average of a single core, while I need the total CPU usage in %.
Regards,
Thanks
You are probably better off implementing the solution completely in awk:
top -bn1 | awk -F, '/id/ { for (i=1;i<=NF;i++) { if ( $i ~ /[[:digit:]]{2}\.[[:digit:]][[:blank:]]+id/ ) { split($i,arry," ");print arry[1]" - idle" } } }'
Take the output from top and check for any line containing id. If the condition is met, take each comma-delimited piece of data on the line and pattern match against two digits, a decimal point, one digit, one or more blanks and then id. If this matches, split the field on blank space into an array and print the first element. Note that an idle value below 10 does not match this pattern.
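If parsing top feels fragile, a package-free alternative (a different technique from the awk one-liner above) is to read the aggregate "cpu" line of /proc/stat twice and compute the busy share of the difference. That first line sums all CPUs, so the result is a total across every core. This is only a sketch: cpu_usage_between is a made-up helper name, and it treats idle plus iowait as idle time, which is a choice rather than the only valid definition.

```shell
# Print total CPU usage in % between two snapshots of the aggregate
# "cpu" line from /proc/stat.
# Fields after "cpu": user nice system idle iowait irq softirq steal ...
cpu_usage_between() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        $1 == "cpu" {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i   # user..steal
            idle = $5 + $6                                    # idle + iowait
            if (seen++) {
                dt = total - ptotal
                di = idle - pidle
                if (dt > 0) printf "%.1f\n", 100 * (dt - di) / dt
            }
            ptotal = total
            pidle = idle
        }'
}

# Possible usage:
#   a=$(head -n 1 /proc/stat); sleep 1; b=$(head -n 1 /proc/stat)
#   cpu_usage_between "$a" "$b"
```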
If you would like to get any detailed stats you might also use perf.
In this example you may see the number of all CPU cycles during 1 second:
-bash-4.1# perf stat -a sleep 1
Performance counter stats for 'system wide':
4002.822144 task-clock (msec) # 3.999 CPUs utilized (100.00%)
22809 context-switches # 0.006 M/sec (100.00%)
1332 cpu-migrations # 0.333 K/sec (100.00%)
23794 page-faults # 0.006 M/sec
5409531172 cycles # 1.351 GHz (100.00%)
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
3874289082 instructions # 0.72 insns per cycle (100.00%)
715152901 branches # 178.662 M/sec (100.00%)
20583742 branch-misses # 2.88% of all branches
1.001065623 seconds time elapsed
You may also check uptime.
Uptime gives a one line display of the following information. The current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.
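Regarding the load average: it counts runnable tasks (and, on Linux, tasks in uninterruptible sleep), so it is a saturation hint rather than a CPU percentage. If you still want a rough per-core figure, one option is to divide the 1-minute load by the number of cores. A small sketch, where load_percent is a hypothetical helper name:

```shell
# Express a load average as a percentage of the available cores.
# $1 = load average, $2 = number of cores.
load_percent() {
    awk -v load="$1" -v cores="$2" \
        'BEGIN { if (cores > 0) printf "%.1f\n", 100 * load / cores }'
}

# Possible usage:
#   load_percent "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A value above 100 here would mean more tasks are waiting than there are cores, not that the CPUs are more than fully used.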
I am trying to familiarize myself with perf and run it against various programs I wrote.
When I launch it against a program that is 100% single-threaded, perf shows that it takes two cores on the machine (task-clock event).
Here's the example output:
perf stat -a --per-core python3 test.py
Performance counter stats for 'system wide':
S0-C0 1 19004.951263 task-clock (msec) # 1.000 CPUs utilized (100.00%)
S0-C0 1 5,582 context-switches (100.00%)
S0-C0 1 19 cpu-migrations (100.00%)
S0-C0 1 3,746 page-faults
S0-C0 1 <not supported> cycles
S0-C0 1 <not supported> stalled-cycles-frontend
S0-C0 1 <not supported> stalled-cycles-backend
S0-C0 1 <not supported> instructions
S0-C0 1 <not supported> branches
S0-C0 1 <not supported> branch-misses
S0-C1 1 19004.950059 task-clock (msec) # 1.000 CPUs utilized (100.00%)
S0-C1 1 6,752 context-switches (100.00%)
S0-C1 1 25 cpu-migrations (100.00%)
S0-C1 1 935 page-faults
S0-C1 1 <not supported> cycles
S0-C1 1 <not supported> stalled-cycles-frontend
S0-C1 1 <not supported> stalled-cycles-backend
S0-C1 1 <not supported> instructions
S0-C1 1 <not supported> branches
S0-C1 1 <not supported> branch-misses
19.004688019 seconds time elapsed
It even shows that a simple sleep command takes two cores on my computer, and I can't explain this. I understand that the OS scheduler can move a process to a different core, but in that case the CPU utilization would reflect it.
Can anyone explain this?
According to the man page of the perf stat subcommand, you are using the -a option, which profiles the full system:
http://man7.org/linux/man-pages/man1/perf-stat.1.html
-a, --all-cpus
system-wide collection from all CPUs (default if no target is
specified)
In this "system-wide" mode, perf stat (and perf record too) counts events on (or, for record, profiles) all CPUs in the system. When used without a command argument, perf runs until interrupted by Ctrl-C. With a command argument, perf counts/profiles until the command exits. Typical usage is:
perf stat -a sleep 10 # Profile counting every CPU for 10 seconds
perf record -a sleep 10 # Profile with cycles every CPU for 10 seconds to perf.data
To get the stats of a single command, use single-process profiling (without the -a option):
perf stat python3 test.py
For profiling (perf record) you may run without the -a option, or you may use -a and later do some manual filtering in perf report, focusing only on the pids/tids/dsos of your application (this can be very useful if the profiled command sends interprocess requests to other daemons that do a lot of the CPU work).
The --per-core, -A, -C <cpulist> and --per-socket options are only for system-wide -a mode. Try --per-thread with the -p pid option to attach to a process.
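To see why the output in the question looks like "two cores used", it can help to add up the task-clock rows and divide by the elapsed time. In system-wide mode the result tends to come out close to the number of CPUs being counted, whatever the program does. The sketch below assumes the older output layout from the question, where the millisecond value sits directly before the task-clock column; cpus_counted is an illustrative name only.

```shell
# Read `perf stat -a --per-core` output on stdin, sum the task-clock
# values (msec) and divide by the elapsed seconds.
cpus_counted() {
    awk '
        /task-clock/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^task-clock/) {
                    v = $(i - 1)
                    gsub(/,/, "", v)      # drop thousands separators, if any
                    msec += v
                }
        }
        /seconds time elapsed/ { elapsed = $1 }
        END { if (elapsed > 0) printf "%.2f\n", msec / 1000 / elapsed }
    '
}

# Possible usage:
#   perf stat -a --per-core sleep 1 2>&1 >/dev/null | cpus_counted
```

With the two task-clock rows and the elapsed time from the question, this comes out at about 2, i.e. both cores were being counted for the whole run.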
I want to measure stalled cycles for my application using perf.
When I try: perf stat -B dd if=/dev/zero of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB) copied, 0.218456 s, 2.3 GB/s
Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':
218.420011 task-clock # 0.995 CPUs utilized
25 context-switches # 0.000 M/sec
1 CPU-migrations # 0.000 M/sec
255 page-faults # 0.001 M/sec
821,183,099 cycles # 3.760 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
1,526,427,190 instructions # 1.86 insns per cycle
292,281,624 branches # 1338.163 M/sec
1,013,837 branch-misses # 0.35% of all branches
0.219551862 seconds time elapsed
As you can see, I'm getting <not supported> for the stalled-cycles* events. I couldn't find a solution or explanation for this online.
My kernel version is 3.2.0-59, perf version is 3.2.54, and my CPU is an i7-3770.