why does perf record and annotate not work? - linux

I'm stumped, I read the perf tutorial and am trying to do a simple test beyond "perf stat" which works. However perf record either doesnt work ,or perf annotate shows no samples recorded. Running perf
For example(im running with sudo because without it i get a bunch of errors which i will post at the end):
sudo perf record -e cycles,instructions,cache-misses -a -c 1 ./FooExe
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 1.794 MB perf.data (~78393 samples) ]
.
sudo perf report -D -i perf.data |grep RECORD_SAMPLE |wc -l
Failed to open /tmp/perf-23796.map, continuing without symbols
20486
.
sudo perf annotate -d ./FooExe
the perf.data file has no samples! Press any key
So thats as far as i get. I tried to rebuild perf for my ssystem from source but that didnt seem to help either.
Im using Ubuntu 14.04 kernel 3.19.0-49-generic. This is on intel i7 I4510U cpu . I made sure to compile my program with symbols , but i get the same results regardless of which application i try to profile.
-- if i run without sudo :
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.
Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.
Samples in kernel modules won't be resolved at all.
If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.
Cannot read kernel map
Error:
You may not have permission to collect system-wide stats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid:
-1 - Not paranoid at all
0 - Disallow raw tracepoint access for unpriv
1 - Disallow cpu events for unpriv
2 - Disallow kernel profiling for unpriv

I just tried your command. The problem was that you used -a to profile all processes system-wide, so it never ran ./FooExe. You can confirm this with strace -f perf ... ./FooExe, and note the lack of any execve system call. And also the fact that it returns instantly, even if FooExe should have taken several seconds.
Here's an example of recording samples for a busy-loop awk command:
perf record -e cycles,instructions,cache-misses awk 'BEGIN{for(i=0;i<40000000;i++){}}'
Now perf report works. You don't need to specify the executable for the report command, because perf.data only has data for the one executable.
This works the same way with the ocperf.py wrapper, but you could record events for more uarch-specific events using symbolic names (instead of looking up codes and numeric arguments in -e):
$ ocperf.py record -e cycles,cache-misses,uops_dispatched_port.port_0 awk 'BEGIN{for(i=0;i<40000000;i++){}}'
perf record -e cycles,cache-misses,cpu/event=0xa1,umask=0x1,name=uops_dispatched_port_port_0,period=2000003/ awk 'BEGIN{for(i=0;i<40000000;i++){}}'
(warning lines about kernel symbols)
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.352 MB perf.data (7819 samples)
$ ocperf.py report

Related

LLC-Miss Report in the Linux Perf_Events

I ran the following command in the shell:
sudo perf record -F 60000 -e LLC-misses:u mplayer video.mp4
After the program finished execution, I ran this command:
sudo perf report
The report shows the locations of code (in MPlayer and the associated shared libraries) in the userspace with the percentage of LLC misses. Is it possible to determine the origin of the misses? i.e., if they are instruction misses or data misses?
One more question: when I change LLC-misses:u to LLC-misses:up the generated report will be empty. What is wrong with the PEBS switch addition?

perf: why don't I have "syscall" counters?

There are apparently some counters in Linux perf like syscall:sys_enter_select, but on my system perf list does not show any of them
Evidence that other people do have these counters is here: http://www.brendangregg.com/blog/2014-07-03/perf-counting.html
If I run perf top -e 'syscalls:sys_enter_*' it says:
Can't open event dir: Permission denied
invalid or unsupported event: 'syscalls:sys_enter_*'
Other event types (the ones in perf list) work fine.
What do I need to do to access syscall counters in perf? I'm using Linux kernel and perf version 3.10 on x86_64.
Some perf counters, including all the syscall ones, are only available to the root user. sudo perf list will show all the counters, including syscall ones assuming the kernel is built with CONFIG_HAVE_SYSCALL_TRACEPOINTS (see Grisha Levit's answer regarding that).
So, to make perf top -e 'syscalls:sys_enter_*' work, run it under sudo--even if you do not need sudo for other counters like cycles.
You must have an old version of perf to go with your crusty old 3.10 kernel.
On a modern system (x86-64 Arch Linux with Linux 4.15.8-1-ARCH, and a matching version of perf), perf answers this question for you:
$ perf stat -e 'syscalls:sys_enter_*stat*' ls -l
event syntax error: 'syscalls:sys_enter_*stat*'
\___ can't access trace events
Error: No permissions to read /sys/kernel/debug/tracing/events/syscalls/sys_enter_*stat*
Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug/tracing'
Run 'perf list' for a list of valid events
...
$ ll /sys/kernel/debug/ -d
drwx------ 33 root root 0 Mar 14 00:02 /sys/kernel/debug/
Interesting that you could make it world-readable, similar to how you can put kernel.perf_event_paranoid = 0 and kernel.yama.ptrace_scope = 0 in /etc/sysctl.d/99-local.conf for convenient debugging / tracing / profiling on a single-user desktop without using root all the time.
These will be missing if the kernel was not built with CONFIG_HAVE_SYSCALL_TRACEPOINTS.
You can check like so:
# grep TRACEPOINTS "/boot/config-$(uname -r)"
CONFIG_TRACEPOINTS=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y

Linux perf tool run issues

I am using perf tool to bench mark one of my projects. The issue I am facing is that wo get automatihen I run perf tool on my machine, everything works fine.
However, I am trying to run perf in automation servers to make it part of my check in process but I am getting the following error from automation servers
WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,
check /proc/sys/kernel/kptr_restrict.
Samples in kernel functions may not be resolved if a suitable vmlinux
file is not found in the buildid cache or in the vmlinux path.
Samples in kernel modules won't be resolved at all.
If some relocation was applied (e.g. kexec) symbols may be misresolved
even with a suitable vmlinux or kallsyms file.
Error:
Permission error - are you root?
Consider tweaking /proc/sys/kernel/perf_event_paranoid:
-1 - Not paranoid at all
0 - Disallow raw tracepoint access for unpriv
1 - Disallow cpu events for unpriv
2 - Disallow kernel profiling for unpriv
fp: Terminated
I tried changing /proc/sys/kernel/perf_event_paranoid to -1 and 0 but still see the same issue.
Anybody seen this before? Why would I need to run the command as root? I am able to run it on my machine without sudo.
by the way, the command is like this:
perf record -m 32 -F 99 -p xxxx -a -g --call-graph fp
You can't use -a (full system profiling) and sample kernel from non-root user: http://man7.org/linux/man-pages/man1/perf-record.1.html
Try running it without -a option and with event limited to userspace events by :u suffix:
perf record -m 32 -F 99 -p $PID -g --call-graph fp -e cycles:u
Or use software event for virtualized platforms without PMU passthrough
perf record -m 32 -F 99 -p $PID -g --call-graph fp -e cpu-clock:u

Building a flamegraph of multiple node.js processes while using --perf_basic_prof_only_functions

We currently use node clustering in order to get the most out of our machines and would like to be able to profile all processes simultaneously (only the function calls, we're using --perf_basic_prof_only_functions). While getting the information and building flamegraphs work fine, we seem to get a lot of entries for [perf-$PID.map] making it seem as though were either missing some invocation to tell one of the tools to account for multiple perf files.
Specifically, we're doing something similar to the following:
sudo perf record -F 99 -o perf.data -p $PIDS -g -- sleep 30
sudo perf script -i perf.data > out.nodestacks
# Using http://github.com/brendangregg/FlameGraph
./stackcollapse-perf.pl < ../out.nodestacks | ./flamegraph.pl > ../flame.svg
But looking at the output of perf script there are lots of entries similar to:
3881ddc630da [unknown] (/tmp/perf-20350.map)
3881dc5aae44 [unknown] (/tmp/perf-20350.map)
3881dc7d7275 [unknown] (/tmp/perf-20350.map)
3881dc7d6f4b [unknown] (/tmp/perf-20350.map)
3881dc7d6953 [unknown] (/tmp/perf-20350.map)
Has anyone else run into this issue? Thanks!
Did you try with --perf_basic_prof (instead of --perf_basic_prof_only_functions)?
At least that resolved for me some missing, not translated entries.
In my case this were entries like:
Builtin:JSEntryTrampoline
Stub:JSEntryStub

How to measure IOPS for a command in linux?

I'm working on a simulation model, where I want to determine when the storage IOPS capacity becomes a bottleneck (e.g. and HDD has ~150 IOPS, while an SSD can have 150,000). So I'm trying to come up with a way to benchmark IOPS in a command (git) for some of it's different operations (push, pull, merge, clone).
So far, I have found tools like iostat, however, I am not sure how to limit the report to what a single command does.
The best idea I can come up with is to determine my HDD IOPS capacity, use time on the actual command, see how long it lasts, multiply that by IOPS and those are my IOPS:
HDD ->150 IOPS
time df -h
real 0m0.032s
150 * .032 = 4.8 IOPS
But, this is of course very stupid, because the duration of the execution may have been related to CPU usage rather than HDD usage, so unless usage of HDD was 100% for that time, it makes no sense to measure things like that.
So, how can I measure the IOPS for a command?
There are multiple time(1) commands on a typical Linux system; the default is a bash(1) builtin which is somewhat basic. There is also /usr/bin/time which you can run by either calling it exactly like that, or telling bash(1) to not use aliases and builtins by prefixing it with a backslash thus: \time. Debian has it in the "time" package which is installed by default, Ubuntu is likely identical, and other distributions will be quite similar.
Invoking it in a similar fashion to the shell builtin is already more verbose and informative, albeit perhaps more opaque unless you're already familiar with what the numbers really mean:
$ \time df
[output elided]
0.00user 0.00system 0:00.01elapsed 66%CPU (0avgtext+0avgdata 864maxresident)k
0inputs+0outputs (0major+261minor)pagefaults 0swaps
However, I'd like to draw your attention to the man page which lists the -f option to customise the output format, and in particular the %w format which counts the number of times the process gave up its CPU timeslice for I/O:
$ \time -f 'ios=%w' du Maildir >/dev/null
ios=184
$ \time -f 'ios=%w' du Maildir >/dev/null
ios=1
Note that the first run stopped for I/O 184 times, but the second run stopped just once. The first figure is credible, as there are 124 directories in my ~/Maildir: the reading of the directory and the inode gives roughly two IOPS per directory, less a bit because some inodes were likely next to each other and read in one operation, plus some extra again for mapping in the du(1) binary, shared libraries, and so on.
The second figure is of course lower due to Linux's disk cache. So the final piece is to flush the cache. sync(1) is a familiar command which flushes dirty writes to disk, but doesn't flush the read cache. You can flush that one by writing 3 to /proc/sys/vm/drop_caches. (Other values are also occasionally useful, but you want 3 here.) As a non-root user, the simplest way to do this is:
echo 3 | sudo tee /proc/sys/vm/drop_caches
Combining that with /usr/bin/time should allow you to build the scripts you need to benchmark the commands you're interested in.
As a minor aside, tee(1) is used because this won't work:
sudo echo 3 >/proc/sys/vm/drop_caches
The reason? Although the echo(1) runs as root, the redirection is as your normal user account, which doesn't have write permissions to drop_caches. tee(1) effectively does the redirection as root.
The iotop command collects I/O usage information about processes on Linux. By default, it is an interactive command but you can run it in batch mode with -b / --batch. Also, you can a list of processes with -p / --pid. Thus, you can monitor the activity of a git command with:
$ sudo iotop -p $(pidof git) -b
You can change the delay with -d / --delay.
You can use pidstat:
pidstat -d 2
More specifically pidstat -d 2 | grep COMMAND or pidstat -C COMMANDNAME -d 2
The pidstat command is used for monitoring individual tasks currently being managed by the Linux kernel. It writes to standard output activities for every task selected with option -p or for every task managed by the Linux kernel if option -p ALL has been used. Not selecting any tasks is equivalent to specifying -p ALL but only active tasks (tasks with non-zero statistics values) will appear in the report.
The pidstat command can also be used for monitoring the child processes of selected tasks.
-C commDisplay only tasks whose command name includes the stringcomm. This string can be a regular expression.

Resources