I'm looking for code examples on how to use the Linux system call ptrace() to trace the system calls of a process and all of its children, grandchildren, and so on, similar to the behaviour of strace when it is given its follow-forks flag -f.
I'm aware of the alternative of looking into the sources of strace but I'm asking for a clean tutorial first in the hopes of getting a more isolated explanation.
I'm going to use this to implement a fast, generic system call memoizer similar to https://github.com/nordlow/strace-memoize, but written in a compiled language. The code I want to extend with this logic is my fork of ministrace at https://github.com/nordlow/ministrace/blob/master/ministrace.c
RTFM PTRACE_SETOPTIONS in ptrace(2), specifically the PTRACE_O_TRACECLONE, PTRACE_O_TRACEFORK and PTRACE_O_TRACEVFORK flags. In a nutshell, once you set these options on a process, any time it creates children, those children will automatically be traced as well.
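For illustration, here is a minimal sketch (my own, not code from ministrace or strace) of a tracer that sets those options on its child and then services syscall stops from the child and every descendant; error handling and the actual syscall decoding are omitted.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char *argv[])
{
    if (argc < 2)
        return 1;

    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);   /* let the parent trace us */
        execvp(argv[1], &argv[1]);
        _exit(127);
    }

    /* Wait for the stop caused by execvp(), then ask the kernel to
     * auto-attach to every fork/vfork/clone descendant (the options are
     * inherited by those descendants) and to mark syscall stops with
     * bit 0x80 in the stop signal. */
    waitpid(child, NULL, 0);
    ptrace(PTRACE_SETOPTIONS, child, NULL,
           PTRACE_O_TRACESYSGOOD |
           PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACECLONE);
    ptrace(PTRACE_SYSCALL, child, NULL, NULL);

    for (;;) {
        int status;
        /* __WALL makes waitpid() report stops from every tracee, including
         * clone children that are not ordinary children of this process. */
        pid_t pid = waitpid(-1, &status, __WALL);
        if (pid == -1)
            break;                                   /* no tracees left */
        if (WIFEXITED(status) || WIFSIGNALED(status))
            continue;                                /* this tracee is gone */
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            /* syscall-enter or syscall-exit stop: inspect the registers
             * here (e.g. PTRACE_GETREGS) to decode the system call. */
        }
        ptrace(PTRACE_SYSCALL, pid, NULL, NULL);     /* run to next stop */
    }
    return 0;
}

The __WALL flag matters: without it, waitpid() may not report stops from clone()-created tracees that aren't ordinary children of the tracer.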
Related
I want to measure the performance of some kernel functions using Ftrace, but I want to measure it selectively for a particular argument value. This is because the same or other programs calling the same function (but with different arguments) pollute my Ftrace output logs.
Also, I don't want to set a PID filter, as it would not solve my issue (I'm running multiple parallel kernel threads, and the same program can also call that function with different arguments).
What's the best possible way of doing it without affecting the measurements? Is there any Ftrace functionality (or possibly customizing the trace points) that I'm missing?
We can use conditional tracepoints for this kind of case. This patch may also be helpful in understanding. One can check the file samples/trace_events/trace-events-sample.h in the Linux kernel source to see sample examples.
It became crystal clear to me after seeing the TRACE_EVENT_CONDITION() macro in samples/trace_events/trace-events-sample.h. Thanks to the author for providing detailed documentation there.
Moreover, one can use the pre-defined event tracepoints or define a new custom event tracepoint in include/trace/events/*.h and filter the trace logs by adding a condition in TRACE_DIR/tracing/events/EVENT/filter. This kernel documentation link is very helpful for understanding this.
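As a rough illustration modelled on trace-events-sample.h (the event name, argument and condition below are made up, not from the kernel tree), a conditional tracepoint only records an event when its TP_CONDITION expression is true:

/* Hypothetical tracepoint: record my_func() calls only when arg is 42. */
TRACE_EVENT_CONDITION(my_func_entry,

    TP_PROTO(int arg),

    TP_ARGS(arg),

    /* The event is recorded only when this expression evaluates to true. */
    TP_CONDITION(arg == 42),

    TP_STRUCT__entry(
        __field(int, arg)
    ),

    TP_fast_assign(
        __entry->arg = arg;
    ),

    TP_printk("arg=%d", __entry->arg)
);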
Let's say I have Chrome running, which has 100 different processes, not all of which are direct children. What's the best way to programmatically get all of the processes, from either procfs or some syscall (I believe getrusage only covers the calling process), given the PID of the main Chrome parent in the hierarchy?
Also, is there any API that's equivalent to PSAPI in Windows which provides OpenProcess, GetProcessMemoryInfo etc, that allows you to iterate through memory efficiently, rather than parsing the procfs?
Most efficient way please. No calling other processes like ps, pstree, pgrep, etc.
Side context: This is mostly an educational exercise to find the most efficient way to do this, which I started going down while trying to write a simple script in Node.js to get all the processes programmatically and then calculate the sum of the memory taken by the process tree, including each process.
I'm actually the author of a C++ library that is designed to do exactly that - pfs.
pfs attempts to make all the interesting information inside procfs accessible through a very simple API. If you find it lacking any useful information, please create an issue, and I'll try to add it.
Seeing that you require that information from Node.js, you might be able to use the library for "inspiration" or for research purposes (as in, understand where the information is located).
Regarding the process tree: procfs contains the parent PID of every process (you can find it under stat and/or status). You can enumerate all the running processes, store them in a container, and then iterate over it while building the tree or retaining whatever order you require.
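Since pfs is a C++ library and the question mentions Node.js, here is a rough, language-neutral sketch in plain C of the approach just described (the helper names are mine, not part of pfs): scan /proc once, read the PPid: line of each /proc/PID/status, then print every descendant of a given root PID.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <dirent.h>

#define MAX_PIDS 65536

static int pids[MAX_PIDS], ppids[MAX_PIDS], nprocs;

/* Read the parent PID of one process from /proc/<pid>/status. */
static int parent_of(int pid)
{
    char path[64], line[256];
    int ppid = -1;
    snprintf(path, sizeof path, "/proc/%d/status", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f))
        if (sscanf(line, "PPid:\t%d", &ppid) == 1)
            break;
    fclose(f);
    return ppid;
}

/* Scan /proc once and remember every (pid, parent pid) pair. */
static void scan_proc(void)
{
    DIR *d = opendir("/proc");
    struct dirent *e;
    while (d && (e = readdir(d)) != NULL && nprocs < MAX_PIDS) {
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;                       /* not a process directory */
        pids[nprocs] = atoi(e->d_name);
        ppids[nprocs] = parent_of(pids[nprocs]);
        nprocs++;
    }
    if (d)
        closedir(d);
}

/* Print pid and, recursively, every process whose parent is pid. */
static void print_tree(int pid, int depth)
{
    printf("%*s%d\n", depth * 2, "", pid);
    for (int i = 0; i < nprocs; i++)
        if (ppids[i] == pid)
            print_tree(pids[i], depth + 1);
}

int main(int argc, char *argv[])
{
    if (argc < 2)
        return 1;
    scan_proc();
    print_tree(atoi(argv[1]), 0);
    return 0;
}

The memory figures the question asks about live in the same status file (for example the VmRSS: line), so summing them over the tree is one more sscanf per process.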
My goal is to set virtual memory page permissions (as if the forked process had called mprotect) from the parent process. Can this be done with ptrace(2) or by some other magic?
Thanks!
It can be done (via ptrace() indeed; gdb can do this), but not without a lot of finagling, since in order to call a function in another process, you basically have to setup its registers and stack, etc. for execution, and then continue the process, which will execute the function. One program I know off the top of my head that might have some useful source/methodology for you to look at is injectso. If you do look at injectso, look at the inject_code() functions.
In addition, calling conventions vary by platform, so you'd have to re-jigger your code for each architecture/OS, etc.
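To make the "set up its registers and stack and let it run" idea concrete, here is a rough x86-64-only sketch (my own illustration, not code from injectso or gdb) that forces a stopped tracee to execute one mprotect(2) call: patch a syscall instruction over the bytes at the current instruction pointer, load the arguments into the syscall registers, single-step, read the result, then restore everything. Error handling is omitted, and it will not work as-is on other architectures or if the tracee happens to be stopped inside a system call.

/* x86-64 only; assumes the target can be attached to with ptrace. */
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <stddef.h>

long remote_mprotect(pid_t pid, unsigned long addr, unsigned long len, int prot)
{
    struct user_regs_struct saved, regs;

    ptrace(PTRACE_ATTACH, pid, NULL, NULL);
    waitpid(pid, NULL, 0);

    ptrace(PTRACE_GETREGS, pid, NULL, &saved);
    regs = saved;

    /* Remember the word at the current instruction pointer and patch its
     * first two bytes with a "syscall" instruction (0x0f 0x05). */
    unsigned long orig = ptrace(PTRACE_PEEKTEXT, pid, (void *)regs.rip, NULL);
    unsigned long patched = (orig & ~0xffffUL) | 0x050f;
    ptrace(PTRACE_POKETEXT, pid, (void *)regs.rip, (void *)patched);

    /* Load mprotect(2)'s number and arguments into the syscall registers. */
    regs.rax = SYS_mprotect;
    regs.rdi = addr;
    regs.rsi = len;
    regs.rdx = prot;
    ptrace(PTRACE_SETREGS, pid, NULL, &regs);

    /* Execute exactly the injected instruction, then collect the result. */
    ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL);
    waitpid(pid, NULL, 0);
    ptrace(PTRACE_GETREGS, pid, NULL, &regs);
    long ret = (long)regs.rax;              /* 0 on success, -errno on failure */

    /* Undo the patch, restore the registers, and let the tracee go. */
    ptrace(PTRACE_POKETEXT, pid, (void *)saved.rip, (void *)orig);
    ptrace(PTRACE_SETREGS, pid, NULL, &saved);
    ptrace(PTRACE_DETACH, pid, NULL, NULL);
    return ret;
}

If the parent forked the child itself, it can skip PTRACE_ATTACH and have the child call ptrace(PTRACE_TRACEME, ...) before execve() instead.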
I'm trying to write a program that automatically sets process priorities based on a configuration file (basically path - priority pairs).
I thought the best solution would be a kernel module that replaces the execve() system call. Too bad, the system call table isn't exported in kernel versions > 2.6.0, so it's not possible to replace system calls without really ugly hacks.
I do not want to do the following:
- Replace binaries with shell scripts that start and renice the binaries.
- Patch/recompile my stock Ubuntu kernel.
- Do ugly hacks like reading kernel executable memory and guessing the syscall table location.
- Poll the running processes.
I really want to be:
- Able to control the priority of any process based on its executable path and a configuration file. Rules apply to any user.
Do any of you have ideas on how to complete this task?
If you've settled for a polling solution, most of the features you want to implement already exist in the Automatic Nice Daemon. You can configure nice levels for processes based on process name, user and group. It's even possible to adjust process priorities dynamically based on how much CPU time it has used so far.
Sometimes polling is a necessity, and sometimes even the better option in the end -- believe it or not. It depends on a lot of variables.
If the polling overhead is low enough, it is far preferable to the added complexity, cost, and RISK of developing your own kernel hooks to get notified of the changes you need. That said, when hooks or notification events are available, or can easily be injected, they should certainly be used if the situation calls for it.
This is classic programmer 'perfection' thinking. As engineers, we strive for perfection. This is the real world though and sometimes compromises must be made. Ironically, the more perfect solution may be the less efficient one in some cases.
I develop a similar 'process and process priority optimization automation' tool for Windows called Process Lasso (not an advertisement, it's free). I had a similar choice to make and have a hybrid solution in place. Kernel-mode hooks are available for certain process-related events in Windows (creation and destruction), but they aren't exposed at user mode, and they aren't helpful for monitoring other process metrics. I don't think any OS is going to natively inform you of every change to every process metric. The overhead of that many different hooks might be much greater than simple polling.
Lastly, considering the HIGH frequency of process changes, it may be better to handle all changes at once (polling at interval) vs. notification events/hooks, which may have to be processed many more times per second.
You are RIGHT to stay away from scripts. Why? Because they are slow(er). Of course, the linux scheduler does a fairly good job at handling CPU bound threads by downgrading their priority and rewarding (upgrading) the priority of I/O bound threads -- so even in high loads a script should be responsive I guess.
There's another point of attack you might consider: replace the system's dynamic linker with a modified one which applies your logic. (See this paper for some nice examples of what's possible from the largely neglected art of linker hacking).
Where this approach will have problems is with purely statically linked binaries. I doubt there's much on a modern system which actually doesn't link something dynamically (things like busybox-static being the obvious exceptions, although you might regard the ability to get a minimal shell outside of your controls as a feature when it all goes horribly wrong), so this may not be a big deal. On the other hand, if the priority policies are intended to bring some order to an overloaded shared multi-user system then you might see smart users preparing static-linked versions of apps to avoid linker-imposed priorities.
Sure, just iterate through the /proc/nnn/exe symlinks to get the pathname of each running image. Only use the ones that resolve to a path with slashes; the others are kernel threads.
Check to see if you have already processed that one, otherwise look up the new priority in your configuration file and use renice(8) to tweak its priority.
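A minimal sketch of that loop in C, using setpriority(2) directly instead of spawning renice(8); lookup_priority() here is a hypothetical stand-in for your configuration-file lookup, and remembering which PIDs were already handled is left out:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <dirent.h>
#include <unistd.h>
#include <sys/resource.h>

/* Hypothetical stand-in for the configuration-file lookup: return the
 * desired nice value for an executable path, or -1000 if it isn't listed
 * (real nice values only range from -20 to 19). */
static int lookup_priority(const char *exe)
{
    if (strcmp(exe, "/usr/bin/foo") == 0)    /* example config entry */
        return 10;
    return -1000;
}

int main(void)
{
    DIR *d = opendir("/proc");
    struct dirent *e;
    char link[64], exe[4096];

    while (d && (e = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)e->d_name[0]))
            continue;                                  /* not a PID directory */
        int pid = atoi(e->d_name);
        snprintf(link, sizeof link, "/proc/%d/exe", pid);
        ssize_t n = readlink(link, exe, sizeof exe - 1);
        if (n <= 0)
            continue;                                  /* kernel thread or gone */
        exe[n] = '\0';
        int prio = lookup_priority(exe);
        if (prio != -1000)
            setpriority(PRIO_PROCESS, pid, prio);      /* what renice(8) does */
    }
    if (d)
        closedir(d);
    return 0;
}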
If you want to do it as a kernel module then you could look into making your own binary loader. See the following kernel source files for examples:
$KERNEL_SOURCE/fs/binfmt_elf.c
$KERNEL_SOURCE/fs/binfmt_misc.c
$KERNEL_SOURCE/fs/binfmt_script.c
They can give you a first idea where to start.
You could just modify the ELF loader to check for an additional section in ELF files and when found use its content for changing scheduling priorities. You then would not even need to manage separate configuration files, but simply add a new section to every ELF executable you want to manage this way and you are done. See objcopy/objdump of the binutils tools for how to add new sections to ELF files.
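To give an idea of what "check for an additional section" means in practice, here is a user-space sketch with a made-up section name .sched_priority (an in-kernel loader would use its own helpers rather than stdio) that locates such a section in a 64-bit ELF file and reads a priority value out of it:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <elf.h>

/* Return the priority stored in a ".sched_priority" section, or -1000
 * if the file is not ELF or has no such section. */
static int elf_priority(const char *path)
{
    int prio = -1000;
    Elf64_Ehdr eh;
    FILE *f = fopen(path, "rb");
    if (!f)
        return prio;

    if (fread(&eh, sizeof eh, 1, f) == 1 &&
        memcmp(eh.e_ident, ELFMAG, SELFMAG) == 0) {
        /* Read the section header table and the section-name string table. */
        Elf64_Shdr *sh = calloc(eh.e_shnum, sizeof *sh);
        fseek(f, eh.e_shoff, SEEK_SET);
        fread(sh, sizeof *sh, eh.e_shnum, f);

        char *names = malloc(sh[eh.e_shstrndx].sh_size);
        fseek(f, sh[eh.e_shstrndx].sh_offset, SEEK_SET);
        fread(names, 1, sh[eh.e_shstrndx].sh_size, f);

        /* Look for the (hypothetical) .sched_priority section. */
        for (int i = 0; i < eh.e_shnum; i++) {
            if (strcmp(names + sh[i].sh_name, ".sched_priority") == 0) {
                int value;
                fseek(f, sh[i].sh_offset, SEEK_SET);
                if (fread(&value, sizeof value, 1, f) == 1)
                    prio = value;
                break;
            }
        }
        free(names);
        free(sh);
    }
    fclose(f);
    return prio;
}

int main(int argc, char *argv[])
{
    if (argc > 1)
        printf("%d\n", elf_priority(argv[1]));
    return 0;
}

A section like that can be attached to an existing binary with objcopy --add-section .sched_priority=prio.bin executable, where prio.bin is a small file holding the value.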
Do any of you have ideas on how to complete this task?
As an idea, consider using apparmor in complain-mode. That would log certain messages to syslog, which you could listen to.
If the processes in question are started by executing an executable file with a known path, you can use the inotify mechanism to watch for events on that file. Executing it will trigger an IN_OPEN and an IN_ACCESS event.
Unfortunately, this won't tell you which process caused the event to trigger, but you can then check which /proc/*/exe symlinks point to the executable file in question and renice the process IDs you find.
E.g. here is a crude implementation in Perl using Linux::Inotify2 (which, on Ubuntu, is provided by the liblinux-inotify2-perl package):
perl -MLinux::Inotify2 -e '
use warnings;
use strict;
my $x = shift(@ARGV);
my $w = new Linux::Inotify2;
$w->watch($x, IN_ACCESS, sub
{
for (glob("/proc/*/exe"))
{
if (-r $_ && readlink($_) eq $x && m#^/proc/(\d+)/#)
{
system(@ARGV, $1)
}
}
});
1 while $w->poll
' /bin/ls renice
You can of course save the Perl code to a file, say onexecuting, prepend a first line #!/usr/bin/env perl, make the file executable, put it on your $PATH, and from then on use onexecuting /bin/ls renice.
Then you can use this utility as a basis for implementing various policies for renicing executables. (or doing other things).
What I want to do is create some kind of graph detailing the execution of (two) threads in Linux. I don't need to see what the threads do, just when they are scheduled and for how long; basically a timeline.
I've spent the last few hours searching the internet for a way to trace the scheduling of pthreads. Unfortunately, the two projects I found require either kernel recompilation (LTTng) or glibc patching (NPTL Trace Tool), neither of which I can do (large, centrally managed system on which I have no sudo rights).
Is there any other way to do something like this or will I have to resort to finding a laptop on which I can patch/recompile whatever I want?
Best regards
PS: I would have linked to both projects, but the site doesn't allow me (reputation < 10). The first search result on Google for the project names is the correct one though.
Superuser privileges are not needed to build an instrumented glibc / libpthread.so. The ptt_trace program that is part of NPTL Trace Tool will run your program using the instrumented library.
Maybe something like Intel's VTune?
There is also a tool called pthreadw (on sourceforge)
It's a wrapper library which intercepts calls to the usual functions of the pthread library, and reports stats, like typical times spent playing with locks, condition variables, etc...
It is not currently able to export traces, only textual summary reports.
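pthreadw's own source is on SourceForge; purely as an illustration of the interception technique such wrappers rely on (this is not pthreadw's actual code), an LD_PRELOAD shim that times pthread_mutex_lock() could look roughly like this:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Pointer to the real pthread_mutex_lock, resolved lazily. */
static int (*real_lock)(pthread_mutex_t *);

int pthread_mutex_lock(pthread_mutex_t *m)
{
    if (!real_lock)
        real_lock = (int (*)(pthread_mutex_t *))
                    dlsym(RTLD_NEXT, "pthread_mutex_lock");

    struct timespec a, b;
    clock_gettime(CLOCK_MONOTONIC, &a);
    int ret = real_lock(m);                 /* forward to the real function */
    clock_gettime(CLOCK_MONOTONIC, &b);

    long ns = (b.tv_sec - a.tv_sec) * 1000000000L + (b.tv_nsec - a.tv_nsec);
    fprintf(stderr, "lock %p waited %ld ns\n", (void *)m, ns);
    return ret;
}

Compiled with gcc -shared -fPIC -o shim.so shim.c -ldl and run via LD_PRELOAD=./shim.so ./your_program, this needs no root and no recompilation of the application, which fits the constraints in the question; the same pattern extends to condition-variable and other pthread entry points.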