Using perf for application routine tracing

Using perf for application routine tracing - linux

Is it possible to capture the arguments to userspace function in a precompiled binary with the linux perf tool ? I couldn't figure this out via the documentation or google ?
If not any other suggestions ?
Thanks...

I don't know how to do it with perf but there other ways. If gdb is suitable then use it. If it is suitable for example because of performance problems then use SystemTap:
1) your precompiled binary has debug information and you can use gdb
just attach to a running process, put breakpoint and possibly add command for it:
break your_function
command
info args
continue
end
2) your precompiled binary does not have debug information and you can use gdb
In this situation you need to know a calling convetion. For example for x64_86
break your_function
command
info register rdi
continue
end
3) your precompiled binary has debug information, you cannot use gdb but can use SystemTap
Then something like this:
sudo stap params.stp -x <PID> 'process("your-process-name").function("your_function")'
> cat params.stp
function trace(extra)
{
printf("params:%s\n", extra)
}
probe $1.call { trace($$parms$$) }

Related

Create custom ./configure command line arguments

I'm updating a project to use autotools, and to maintain backwards compatibility with previous versions, I would like the user to be able to run ./configure --foo=bar to set a build option.
Based on reading the docs, it looks like I could set up ./configure --enable-foo, ./configure --with-foo, or ./configure foo=bar without any problem, but I'm not seeing anything allowing the desired behavior (specifically having a double dash -- before the option).
Any suggestions?

There's no way I know of doing this in configure.ac. You'll have to patch configure. This can be done by running the patching script in a bootstrap.sh after running autoreconf. You'll have to add your option to the ac_option processing loop. The case for --x looks like a promising one to copy or replace to inject your new option, something like:
--foo=*)
my_foo=$ac_optarg ;;
There's also some code that strips out commandline args when configure sometimes needs to be re-invoked. It'll be up to you to determine whether --foo should be stripped or not. I think this is probably why they don't allow this in the first place.
If it were me, I'd try and lobby for AC_ARG_WITH (e.g. --with-foo=bar). It seems like a lot less work.

in order to do that you have to add to your configure.ac something like this:
# Enable debugging mode
AC_ARG_ENABLE(debug,
AC_HELP_STRING([--enable-debug],[Show a lot of extra information when running]),
AM_CPPFLAGS="$AM_CPPFLAGS -DDEBUG"
debug_messages=yes,
debug_messages=no)
AC_SUBST(AM_CPPFLAGS)
AC_SUBST(AM_CXXFLAGS)
echo -e "\n--------- build environment -----------
Debug Mode : $debug_messages"
That is just a simple example to add for example a --enable-debug, it will set the DEBUG constant on the config.h file.
then your have to code something like this:
#include "config.h"
#ifdef DEBUG
// do debug
#else
// no debug
#endif

How can I get perf to find symbols in my program

When using perf report, I don't see any symbols for my program, instead I get output like this:
$ perf record /path/to/racket ints.rkt 10000
$ perf report --stdio
# Overhead Command Shared Object Symbol
# ........ ........ ................. ......
#
70.06% ints.rkt [unknown] [.] 0x5f99b8
26.28% ints.rkt [kernel.kallsyms] [k] 0xffffffff8103d0ca
3.66% ints.rkt perf-32046.map [.] 0x7f1d9be46650
Which is fairly uninformative.
The relevant program is built with debugging symbols, and the sysprof tool shows the appropriate symbols, as does Zoom, which I think is using perf under the hood.
Note that this is on x86-64, so the binary is compiled with -fomit-frame-pointer, but that's the case when running under the other tools as well.

This post is already over a year old, but since it came out at the top of my Google search results when I had the same problem, I thought I'd answer it here. After some more searching around, I found the answer given in this related StackOverflow question very helpful. On my Ubuntu Raring system, I then ended up doing the following:
Compile my C++ sources with -g (fairly obvious, you need debug symbols)
Run perf as
record -g dwarf -F 97 /path/to/my/program
This way perf is able to handle the DWARF 2 debug format, which is the standard format gcc uses on Linux. The -F 97 parameter reduces the sampling rate to 97 Hz. The default sampling rate was apparently too large for my system and resulted in messages like this:
Warning:
Processed 172390 events and lost 126 chunks!
Check IO/CPU overload!
and the perf report call afterwards would fail with a segmentation fault. With the reduced sampling rate everything worked out fine.
Once the perf.data file has been generated without any errors in the previous step, you can run perf report etc. I personally like the FlameGraph tools to generate SVG visualizations.
Other people reported that running
echo 0 > /proc/sys/kernel/kptr_restrict
as root can help as well, if kernel symbols are required.

In my case the solution was to delete the elf files which contained cached symbols from previous builds and were messing things up.
They are in ~/.debug/ folder

You can always use the '$ nm ' command.
here is some sample output:
Ethans-MacBook-Pro:~ phyrrus9$ nm a.out
0000000100000000 T __mh_execute_header
0000000100000f30 T _main
U _printf
0000000100000f00 T _sigint
U _signal
U dyld_stub_binder

I had this problem too, I couldn't see any userspace symbol, but I saw some kernel symbols. I thought this was a symbol loading issue. After tried all the possible solutions I could find, I still couldn't get it work.
Then I faintly remember that
ulimit -u unlimited
is needed. I tried and it magically worked.
I found from this wiki that this command is needed when you use too many file descriptors.
https://perf.wiki.kernel.org/index.php/Tutorial#Troubleshooting_and_Tips
my final command was
perf record -F 999 -g ./my_program
didn't need --call-graph

Make sure that you compile the program using -g option along with gcc(cc) so that debugging information is produced in the operating system's native format.
Try to do the following and check if there are debug symbols present in the symbol table.
$objdump -t your-elf
$readelf -a your-elf
$nm -a your-elf

How about your dev host machine? Is it also running the x86_64 OS?
If not, please make sure the perf is cross-compiled, because the perf depends on the objdump and other tools in toolchain.

I got the same problem with perf after overriding the name of my program via prctl(PR_SET_NAME)
As I can see your case is pretty similar:
70.06% ints.rkt [unknown]
Command you have executed (racket) is different from the one perf have seen.

you can check the value of kptr_restrict by cat /proc/kallsyms. If the addresses of the symbols in the result are all 0x000000, you can fix it by command echo 0 > sys/kernel/kptr_restrict . After this , you may get a wanted result of the perf report

linux - export output from apachetop to file

Is it possible to export output from apachetop to file? Something like this: "apachetop > file", but because apachetop is running "forever", so this command is also running forever. I just need to obtain actual output from this program and handle it in my GTK# application.
Every answer will be very appreciated.
Matej.

This might work:
{ apachetop > file 2>&1 & sleep 1; kill $! ; }
but no guarantees :)
Another way using linux is to find out the /dev/vcsN device that is being used when running the program and reading from that file directly. It contains a copy of the screen data for a given VT; I'm not sure if there is a applicable device for a pty.

Well indirectly apachetop is using the access.log file to get it's data.
Look at
/var/log/apache2/access.log
You'll simply have to parse the file to get the info you're looking for!/var/log/apache2/access.log

on-the-fly output redirection, seeing the file redirection output while the program is still running

If I use a command like this one:
./program >> a.txt &
, and the program is a long running one then I can only see the output once the program ended. That means I have no way of knowing if the computation is going well until it actually stops computing. I want to be able to read the redirected output on file while the program is running.
This is similar to opening a file, appending to it, then closing it back after every writing. If the file is only closed at the end of the program then no data can be read on it until the program ends. The only redirection I know is similar to closing the file at the end of the program.
You can test it with this little python script. The language doesn't matter. Any program that writes to standard output has the same problem.
l = range(0,100000)
for i in l:
if i%1000==0:
print i
for j in l:
s = i + j
One can run this with:
./python program.py >> a.txt &
Then cat a.txt .. you will only get results once the script is done computing.

From the stdout manual page:
The stream stderr is unbuffered.
The stream stdout is line-buffered
when it points to a terminal.
Partial lines will not appear until
fflush(3) or exit(3) is called, or
a new‐line is printed.
Bottom line: Unless the output is a terminal, your program will have its standard output in fully buffered mode by default. This essentially means that it will output data in large-ish blocks, rather than line-by-line, let alone character-by-character.
Ways to work around this:
Fix your program: If you need real-time output, you need to fix your program. In C you can use fflush(stdout) after each output statement, or setvbuf() to change the buffering mode of the standard output. For Python there is sys.stdout.flush() of even some of the suggestions here.
Use a utility that can record from a PTY, rather than outright stdout redirections. GNU Screen can do this for you:
screen -d -m -L python test.py
would be a start. This will log the output of your program to a file called screenlog.0 (or similar) in your current directory with a default delay of 10 seconds, and you can use screen to connect to the session where your command is running to provide input or terminate it. The delay and the name of the logfile can be changed in a configuration file or manually once you connect to the background session.
EDIT:
On most Linux system there is a third workaround: You can use the LD_PRELOAD variable and a preloaded library to override select functions of the C library and use them to set the stdout buffering mode when those functions are called by your program. This method may work, but it has a number of disadvantages:
It won't work at all on static executables
It's fragile and rather ugly.
It won't work at all with SUID executables - the dynamic loader will refuse to read the LD_PRELOAD variable when loading such executables for security reasons.
It's fragile and rather ugly.
It requires that you find and override a library function that is called by your program after it initially sets the stdout buffering mode and preferably before any output. getenv() is a good choice for many programs, but not all. You may have to override common I/O functions such as printf() or fwrite() - if push comes to shove you may just have to override all functions that control the buffering mode and introduce a special condition for stdout.
It's fragile and rather ugly.
It's hard to ensure that there are no unwelcome side-effects. To do this right you'd have to ensure that only stdout is affected and that your overrides will not crash the rest of the program if e.g. stdout is closed.
Did I mention that it's fragile and rather ugly?
That said, the process it relatively simple. You put in a C file, e.g. linebufferedstdout.c the replacement functions:
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
char *getenv(const char *s) {
static char *(*getenv_real)(const char *s) = NULL;
if (getenv_real == NULL) {
getenv_real = dlsym(RTLD_NEXT, "getenv");
setlinebuf(stdout);
}
return getenv_real(s);
}
Then you compile that file as a shared object:
gcc -O2 -o linebufferedstdout.so -fpic -shared linebufferedstdout.c -ldl -lc
Then you set the LD_PRELOAD variable to load it along with your program:
$ LD_PRELOAD=./linebufferedstdout.so python test.py | tee -a test.out
0
1000
2000
3000
4000
If you are lucky, your problem will be solved with no unfortunate side-effects.
You can set the LD_PRELOAD library in the shell, if necessary, or even specify that library system-wide (definitely NOT recommended) in /etc/ld.so.preload.

If you're trying to modify the behavior of an existing program try stdbuf (part of coreutils starting with version 7.5 apparently).
This buffers stdout up to a line:
stdbuf -oL command > output
This disables stdout buffering altogether:
stdbuf -o0 command > output

Have you considered piping to tee?
./program | tee a.txt
However, even tee won't work if "program" doesn't write anything to stdout until it is done. So, the effectiveness depends a lot on how your program behaves.

If the program writes to a file, you can read it while it is being written using tail -f a.txt.

Your problem is that most programs check to see if the output is a terminal or not. If the output is a terminal then output is buffered one line at a time (so each line is output as it is generated) but if the output is not a terminal then the output is buffered in larger chunks (4096 bytes at a time is typical) This behaviour is normal behaviour in the C library (when using printf for example) and also in the C++ library (when using cout for example), so any program written in C or C++ will do this.
Most other scripting languages (like perl, python, etc.) are written in C or C++ and so they have exactly the same buffering behaviour.
The answer above (using LD_PRELOAD) can be made to work on perl or python scripts, since the interpreters are themselves written in C.

The unbuffer command from the expect package does exactly what you are looking for.
$ sudo apt-get install expect
$ unbuffer python program.py | cat -
<watch output immediately show up here>

Compressing the core files during core generation

Is there way to compress the core files during core dump generation?
If the storage space is limited in the system, is there a way of conserving it in case of need for core dump generation with immediate compression?
Ideally the method would work on older versions of linux such as 2.6.x.

The Linux kernel /proc/sys/kernel/core_pattern file will do what you want: http://www.mjmwired.net/kernel/Documentation/sysctl/kernel.txt#191
Set the filename to something like |/bin/gzip -1 > /var/crash/core-%t-%p-%u.gz and your core files should be saved compressed for you.

For an embedded Linux systems, following script change perfectly works to generate compressed core files in 2 steps
step 1: create a script
touch /bin/gen_compress_core.sh
chmod +x /bin/gen_compress_core.sh
cat > /bin/gen_compress_core.sh #!/bin/sh exec /bin/gzip -f - >"/var/core/core-$1.$2.gz"
ctrl +d
step 2: update the core pattern file
cat > /proc/sys/kernel/core_pattern |/bin/gen_compress_core.sh %e %p ctrl+d

As suggested by other answer, the Linux kernel /proc/sys/kernel/core_pattern file is good place to start: http://www.mjmwired.net/kernel/Documentation/sysctl/kernel.txt#141
As documentation says you can specify the special character "|" which will tell kernel to output the file to script. As suggested you could use |/bin/gzip -1 > /var/crash/core-%t-%p-%u.gz as name, however it doesn't seem to work for me. I expect that the reason is that on my system kernel doesn't treat the > character as a output, rather it probably passes it as a parameter to gzip.
In order to avoid this problem, like other suggested you can create your file in some location I am using /home//crash/core.sh, create it using the following command, replacing with your user. Alternatively you can also obviously change the entire path.
echo -e '#!/bin/bash\nexec /bin/gzip -f - >"/home/<username>/crashes/core-$1-$2-$3-$4-$5.gz"' > ~/crashes/core.sh
Now this script will take 5 input parameters and concatenate them and add to core-path. The full paths must be specified in the ~/crashes/core.sh. Also the location of this script can be specified. Now lets tell kernel to use tour executable with parameters when generating file:
sudo sysctl -w kernel.core_pattern="|/home/<username>/crashes/core.sh %e %p %h %t"
Again should be replaced (or entire path to match location and name of core.sh script). Next step is to crash some program, lets create example crashing cpp file:
int main (){
int * a = nullptr;
int b = *a;
}
After compiling and running there are 2 options, either we will see:
Segmentation fault (core dumped)
Or
Segmentation fault
In case we see the latter, there are few possible reasons.
ulimit is not set, ulimit -c should specify what is limit for cores
apport or your distro core dump collector is not running, this should be investigated further
there is an error in script we wrote, I suggest than checking some basic dump path to check if the other things aren't reason the below should create /tmp/core.dump:
sudo sysctl -w kernel.core_pattern="/tmp/core.dump"
I know there is already an answer for this question however it wasn't obvious for me why it isn't working "out of the box" so I wanted to summarize my findings, hope it helps someone.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Using perf for application routine tracing - linux

Is it possible to capture the arguments to userspace function in a precompiled binary with the linux perf tool ? I couldn't figure this out via the documentation or google ? If not any other suggestions ? Thanks...

Related

Create custom ./configure command line arguments

How can I get perf to find symbols in my program

linux - export output from apachetop to file

on-the-fly output redirection, seeing the file redirection output while the program is still running

Compressing the core files during core generation

Categories

Resources