I am using the perf tool for profiling.
I need to compare multiple reports generated by the perf record command.
I couldn't find any option in perf to do so. Is there a way to do it, or is manual interpretation the only way?
You can use perf diff for this purpose:
perf diff [oldfile] [newfile]
Link to man page: https://linux.die.net/man/1/perf-diff
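As a rough sketch of how this fits together, each run can be recorded into its own file with perf record -o and the files then passed to perf diff, with the first one acting as the baseline. The helper name perf_compare_plan, the ./myapp binary, and the perf.<tag>.data naming are made up for illustration. The function only prints the commands, so they can be reviewed before being piped to sh. Whether perf diff accepts more than two files depends on the perf version.

```shell
# Print the perf commands needed to record each run into its own file
# and then compare them against the first (baseline) run.
# Nothing is executed here; pipe the output to `sh` to run it.
perf_compare_plan() {
    app=$1; shift
    files=""
    for tag in "$@"; do
        # -o selects the output file instead of the default perf.data
        echo "perf record -o perf.$tag.data -- $app"
        files="$files perf.$tag.data"
    done
    echo "perf diff$files"
}

# Hypothetical usage: two runs of ./myapp tagged "old" and "new".
perf_compare_plan ./myapp old new
```

With the two tags above, the last line printed is perf diff perf.old.data perf.new.data.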
Online manpages like https://linux.die.net/man/1/perf-record suggest that the Linux perf command has an option that supports incremental profiling, i.e. merging the profiling data from multiple different runs, via perf record --append. However, on my system with perf version 4.15.18, the option is missing. Is my perf version too new, or too old, to use the --append option? Alternatively, if the --append option is missing, is there another way for me to merge/append perf results from multiple runs and do incremental profiling?
This question arose when doing sampling-based profiling using LLVM. In LLVM, instrumentation-based profiling supports merging profile data across multiple runs, and I was wondering if we can do the same thing with perf.
It was removed quite a while ago, see https://lore.kernel.org/patchwork/patch/391730/ and related discussion here: https://marc.info/?l=linux-kernel&m=137031146932578&w=2. Looks like the way --append is implemented is rather simple: simply by changing the write mode of profiling data to "append", and it doesn't work well with perf report, so they decided to remove it.
There is also the --timestamp-filename option, which appends a timestamp to the output filename and is potentially useful for batch-sampling programs with perf. When doing sampling-based optimization in LLVM, we can then use AutoFDO to convert the profiles into LLVM-readable profiles and use llvm-profdata merge to merge everything.
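A minimal sketch of that workflow, under some assumptions: the runs were recorded with perf record --timestamp-filename (AutoFDO typically also wants branch samples, e.g. -b), the files are named perf.data.<timestamp>, and AutoFDO's create_llvm_prof accepts --binary, --profile and --out as shown. The helper name merge_plan and the merged.prof output name are hypothetical. The function only prints the commands rather than running them.

```shell
# Emit one create_llvm_prof command per perf.data.<timestamp> file in a
# directory, followed by a single llvm-profdata merge over the results.
# Commands are only printed; pipe the output to `sh` to run them.
merge_plan() {
    binary=$1
    dir=$2
    profs=""
    for data in "$dir"/perf.data.*; do
        [ -e "$data" ] || continue   # the glob matched nothing
        prof="$data.prof"
        echo "create_llvm_prof --binary=$binary --profile=$data --out=$prof"
        profs="$profs $prof"
    done
    if [ -n "$profs" ]; then
        # -sample tells llvm-profdata these are sampling profiles
        echo "llvm-profdata merge -sample -output=merged.prof$profs"
    fi
}

# Hypothetical usage: merge_plan ./myapp /path/to/profiles | sh
```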
I know that I can get the total percentage of branch mispredictions during the execution of a program with perf stat. But how can I get the statistics for a specific branch (if or switch statement in C code)?
You can sample on the branch-misses event:
sudo perf record -e branch-misses <yourapp>
and then report it (and even selecting the function you're interested in):
sudo perf report -n --symbols=<yourfunction>
There you can access the annotated code and get some statistics for a given branch. Alternatively, annotate it directly with perf annotate and its --symbol option.
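The record and annotate steps can be combined in a small wrapper. This is only a sketch: the function name annotate_branch_misses and the branch.data file name are made up, and depending on the system's perf_event_paranoid setting it may need to run under sudo as in the commands above.

```shell
# Sample branch-misses for a command, then print the annotated code of
# one function so the misses can be attributed to individual branches.
# usage: annotate_branch_misses <function> <command> [args...]
annotate_branch_misses() {
    func=$1; shift
    if ! command -v perf >/dev/null 2>&1; then
        echo "perf not found in PATH" >&2
        return 127
    fi
    # Record into a dedicated file, then annotate only the given symbol.
    perf record -e branch-misses -o branch.data -- "$@" &&
    perf annotate -i branch.data --stdio --symbol="$func"
}

# Hypothetical usage: annotate_branch_misses my_function ./myapp --some-arg
```

Compiling with debug information (-g) lets the annotation interleave source lines with the assembly, which makes it easier to map a conditional jump back to the if or switch statement.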
I'm fascinated by the ability of 'perf' to record call graphs and am trying to understand how to use it to understand a new code base.
I compiled the code in debug mode, and ran unit tests using the following command:
perf record --call-graph dwarf make test
This creates a 230 MB perf.data. I then write out the call graph:
perf report --call-graph --stdio > callgraph.txt
This creates a 50 meg file.
Ideally, I would only like to see code belonging to the project, not kernel code, system calls, c++ standard libraries, even boost and whatever other third party software. Currently I see items like __GI___dl_iterate_phdr, _Unwind_Find_FDE, etc.
I love the flamegraph project. However, that visualization isn't good for code comprehension. Are there any other projects, write-ups, ideas, which might be helpful?
For a huge application, the output of perf report -g is too verbose to be worth dumping to an external file. A perf.data collected with -g works without file redirection in the interactive perf report TUI. To find the functions that took the most time, you can disable call-graph reporting, either by running perf record without -g or by using perf report --no-children.
There is the gprof2dot script (https://github.com/jrfonseca/gprof2dot) to visualize large perf report call graphs as a compact picture (graph).
There are also Brendan D. Gregg's interactive FlameGraphs in SVG/JS; he often notes in presentations that perf report -g output amounts to many megabytes of raw dump, filling a lot of A4 pages. Usage instructions for perf are at http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html#perf:
# git clone https://github.com/brendangregg/FlameGraph # or download it from github
# cd FlameGraph
# perf record -F 99 -g -- ../command
# perf script | ./stackcollapse-perf.pl > out.perf-folded
# ./flamegraph.pl out.perf-folded > perf-kernel.svg
PS: Why are you profiling the make process? Try to select some tests and profile only those. Use a lower sampling frequency to get a smaller perf.data file. Also disable kernel-mode samples with the :u suffix on the default event "cycles": perf record -F 99 -g -e cycles:u -- ../command
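To see only the project's own code, one option is to filter the folded stacks produced by stackcollapse-perf.pl before looking at them. The sketch below assumes the folded format is "frame1;frame2;... count" and that project functions can be recognized by a regular expression (here a hypothetical myproj:: namespace). The helper name filter_folded is made up, and this is not part of perf or FlameGraph.

```shell
# Keep only frames whose name matches a pattern in folded stacks
# ("frame1;frame2;... count"), dropping stacks left with no frames.
filter_folded() {
    awk -v pat="$1" '{
        count = $NF                      # last field is the sample count
        stack = $0
        sub(/ [0-9]+$/, "", stack)       # strip the trailing sample count
        n = split(stack, frames, ";")
        out = ""
        for (i = 1; i <= n; i++)
            if (frames[i] ~ pat)
                out = (out == "" ? frames[i] : out ";" frames[i])
        if (out != "") print out, count
    }'
}

# Hypothetical usage:
#   perf script | ./stackcollapse-perf.pl | filter_folded '^myproj::' > out.project-folded
```

For example, the line make;myproj::run;_Unwind_Find_FDE 5 becomes myproj::run 5, while a stack consisting only of make and libc frames is dropped.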
I am trying to get performance of individual functions within a process. How can I do it using perf tool? Is there any other tool for this?
For example, let's say the main function calls functions A, B, and C. I want to get the performance of the main function as well as of functions A, B, and C individually.
Is there a good document for understanding the perf source code?
Thank you.
What you want to do is user-land probing. Perf can only do part of it.
Try sudo perf top -p [pid] and then watch the scoreboard. It will show the list of functions sorted by CPU usage. Here is a snapshot of redis during a benchmark:
If you want more information about your user-land functions, such as I/O usage, latency, or memory usage, I strongly suggest using SystemTap. It is both a scripting language and a tool for profiling programs on Linux kernel-based operating systems. Here is a tutorial about it:
http://qqibrow.github.io/performance-profiling-with-systemtap/
You don't need to be an expert in SystemTap scripting; there are many good scripts available online.
For example, here is one that uses it to find the latency distribution of a specific function:
https://github.com/openresty/stapxx#func-latency-distr
See the Perforator tool, which is built for this: https://github.com/zyedidia/perforator.
Perforator uses the same perf_event_open API that perf uses, but also uses ptrace so that profiling can be selectively enabled only for certain regions of a program (such as functions). See the examples at the Github repository for details.
perf is documented at https://perf.wiki.kernel.org/index.php/Main_Page with a tutorial at https://perf.wiki.kernel.org/index.php/Tutorial
perf report gives the breakdown by "command", see https://perf.wiki.kernel.org/index.php/Tutorial#Sample_analysis_with_perf_report. perf annotate provides a way to select what commands to report, see "Source level analysis with perf annotate" in https://perf.wiki.kernel.org/index.php/Tutorial#Options_controlling_output_2.
I read somewhere that it is possible to convert perf.data (the output of the Linux perf record profiling tool) to a format that kcachegrind can parse and plot. However, I didn't find an application capable of doing this conversion, and kcachegrind does not open perf.data either.
Is this possible: use kcachegrind to see perf output? Which tool can I use?
There are two approaches for converting perf data to the callgrind format, but it's unclear which of them is more mature.
The one with more current commits called perfgrind can be found at https://github.com/ostash/perfgrind
However, it is stated to lack callgraph support, and commits came to a halt after announcement of a patch for the 2nd tool on the kernel mailing list, see lkml.org/lkml/2013/3/27/535.
The 2nd tool https://github.com/vitillo/perf approaches direct integration into the perf command, but has not yet seen an official release.
At least the perf 3.10.0 I tried does not support the proposed 'perf convert' syntax.