Tracing pthread scheduling

What I want to do is create some kind of graph detailing the execution of (two) threads in Linux. I don't need to see what the threads do, just when they are scheduled and for how long, a time line basically.
I've spent the last few hours searching the internet for a way to trace the scheduling of pthreads. Unfortunately, the two projects I found require either kernel recompilation (LTTng) or glibc patching (NPTL Trace Tool), neither of which I can do (large, centrally managed system on which I have no sudo rights).
Is there any other way to do something like this or will I have to resort to finding a laptop on which I can patch/recompile whatever I want?
Best regards
PS: I would have linked to both projects, but the site doesn't allow me (reputation < 10). The first search result on Google for the project names is the correct one though.

Superuser privileges are not needed to build an instrumented glibc / libpthread.so. The ptt_trace program that is part of NPTL Trace Tool will run your program using the instrumented library.

Maybe something like Intel's VTune?

There is also a tool called pthreadw (on SourceForge).
It's a wrapper library which intercepts calls to the usual functions of the pthread library, and reports stats, like typical times spent playing with locks, condition variables, etc...
It is not currently able to export traces, only textual summary reports.

Related

Logging the kernel Ftrace point selectively for particular arguments

I want to measure the performance of some kernel functions using Ftrace, but only for a particular argument value. This is because the same program or other programs calling the same function (but with a different argument) pollute my Ftrace output logs.
Also, I don't want to set a PID filter, as it would not solve my issue (I'm running multiple parallel kernel threads, and the same program can also call that function with different arguments).
What's the best possible way of doing it without affecting the measurements? Is there any Ftrace functionality (or possibly customizing the trace points) that I'm missing?

Conditional tracepoints fit this kind of case. This patch may also help in understanding them. See samples/trace_events/trace-events-sample.h in the Linux kernel source for examples.
It became crystal clear to me after seeing the TRACE_EVENT_CONDITION() macro in samples/trace_events/trace-events-sample.h. Thanks to the author for providing detailed documentation there.
Moreover, one can use a predefined event tracepoint or define a new custom event tracepoint in include/trace/events/*.h and filter the trace logs by adding a condition in TRACE_DIR/tracing/events/EVENT/filter. This kernel documentation link is very helpful for understanding this.
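As a rough illustration of the filter-file step, here is a minimal Python sketch. It assumes the tracing directory is something like /sys/kernel/tracing or /sys/kernel/debug/tracing (this depends on the system and normally needs root), and the helper name set_event_filter is made up for this example. The event name and field in the usage note are only placeholders for whatever event and argument you actually trace.

```python
from pathlib import Path


def set_event_filter(tracing_dir, event, expression, enable=True):
    """Write a filter expression for one trace event and optionally enable it.

    tracing_dir: root of the tracing filesystem (assumption: passed by the caller).
    event:       "subsystem/event" path below the events/ directory.
    expression:  filter condition on the event's fields, e.g. "field == value".
    """
    event_dir = Path(tracing_dir) / "events" / event
    # The condition goes into the event's "filter" file.
    (event_dir / "filter").write_text(expression + "\n")
    if enable:
        # Writing 1 to "enable" switches the event on.
        (event_dir / "enable").write_text("1\n")
    return event_dir
```

For example, set_event_filter("/sys/kernel/tracing", "sched/sched_switch", "next_pid == 1234") would log only the context switches into PID 1234. For a custom tracepoint, the expression would name the argument field you exported in the event definition.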

Ptracing Process Trees

I'm looking for code examples on how to use the Linux system call ptrace() to trace system calls of a process and all its child, grandchild, etc processes. Similar to the behaviour of strace when it is fed the fork flag -f.
I'm aware of the alternative of looking into the sources of strace but I'm asking for a clean tutorial first in the hopes of getting a more isolated explanation.
I'm gonna use this to implement a fast generic system call memoizer similar to https://github.com/nordlow/strace-memoize but written in a compiled language. The code I want to extend with this logic is my fork of ministrace at https://github.com/nordlow/ministrace/blob/master/ministrace.c

RTFM PTRACE_SETOPTIONS with the PTRACE_O_TRACECLONE, PTRACE_O_TRACEFORK and PTRACE_O_TRACEVFORK flags. In a nutshell, if you set it on a process, any time it creates children, those will automatically be traced as well.

Control Linux Application Launch/Licensing

I need to employ some sort of licensing on some Linux applications that I don't have access to their code base.
What I'm thinking is having a separate process read the license key and check for the availability of that application. I would then need to ensure that process is run during every invocation of the respective application. Is there some feature of Linux that can assist in this? For example, something like the sudoers file, in which I detect which user is trying to launch which application and, if a given combination is met, run the license process check first.
Or can I do something like not allow the user to launch the (command-line) application by itself, and force them to pipe it to my license process like so:
/usr/bin/tm | license_process // whereas '/usr/bin/tm' would fail on its own

> I need to employ some sort of licensing on some Linux applications
Please note that license checks will generally cost you way more (in support and administration) than they are worth: anybody who wants to bypass the check and has a modicum of skill will do so, and will not pay for the license if he can't anyway (that is, by not implementing a licensing scheme you are generally not leaving any money on the table).
> that I don't have access to their code base.
That makes your task pretty much impossible: the only effective copy-protection schemes require that you rebuild your entire application and make it check the license in so many distinct places that the would-be attacker gets bored and goes away. You can read about such schemes here.
> I'm thinking is having a separate process read the license key and check for the availability of that application.
Any such scheme will be bypassed in under 5 minutes by someone skilled with strace and gdb. Don't waste your time.

You could write a wrapper binary that does the checks and then link in the real application as part of that binary. With some dlsym tricks you may be able to call the real main function from the wrapper's main function.
IDEA
-Read up on ELF hacking: http://www.linuxforums.org/articles/understanding-elf-using-readelf-and-objdump_125.html
-Use ld to rename the main function of the program you want to protect access to: http://fixunix.com/aix/399546-renaming-symbol.html
-Write a wrapper that does the checks and uses dlopen and dlsym to call the real main.
-Link the real application together with your wrapper as one binary.
Now you have an application with your custom checks, which are somewhat hard to break, but not impossible.
I have not tested this (I don't have the time), but it would be a fun experiment.

Automatically adjusting process priorities under Linux

I'm trying to write a program that automatically sets process priorities based on a configuration file (basically path - priority pairs).
I thought the best solution would be a kernel module that replaces the execve() system call. Too bad, the system call table isn't exported in kernel versions > 2.6.0, so it's not possible to replace system calls without really ugly hacks.
I do not want to do the following:
-Replace binaries with shell scripts, that start and renice the binaries.
-Patch/recompile my stock Ubuntu kernel
-Do ugly hacks like reading kernel executable memory and guessing the syscall table location
-Polling of running processes
I really want to be:
-Able to control the priority of any process based on its executable path and a configuration file. Rules apply to any user.
Does anyone of you have any ideas on how to complete this task?

If you've settled for a polling solution, most of the features you want to implement already exist in the Automatic Nice Daemon. You can configure nice levels for processes based on process name, user and group. It's even possible to adjust process priorities dynamically based on how much CPU time it has used so far.

Sometimes polling is a necessity, and even more optimal in the end -- believe it or not. It depends on a lot of variables.
If the polling overhead is low enough, polling is far preferable to the added complexity, cost, and RISK of developing your own kernel hooks to get notified of the changes you need. That said, when hooks or notification events are available, or can be easily injected, they should certainly be used if the situation calls for it.
This is classic programmer 'perfection' thinking. As engineers, we strive for perfection. This is the real world though and sometimes compromises must be made. Ironically, the more perfect solution may be the less efficient one in some cases.
I develop a similar 'process and process priority optimization automation' tool for Windows called Process Lasso (not an advertisement, it's free). I had a similar choice to make and have a hybrid solution in place. Kernel mode hooks are available for certain process-related events in Windows (creation and destruction), but not only are they not exposed at user mode, they also aren't helpful for monitoring other process metrics. I don't think any OS is going to natively inform you of any change to any process metric. The overhead for that many different hooks might be much greater than simple polling.
Lastly, considering the HIGH frequency of process changes, it may be better to handle all changes at once (polling at interval) vs. notification events/hooks, which may have to be processed many more times per second.
You are RIGHT to stay away from scripts. Why? Because they are slow(er). Of course, the Linux scheduler does a fairly good job of handling CPU-bound threads by downgrading their priority and rewarding (upgrading) the priority of I/O-bound threads, so even under high load a script should be responsive, I guess.

There's another point of attack you might consider: replace the system's dynamic linker with a modified one which applies your logic. (See this paper for some nice examples of what's possible from the largely neglected art of linker hacking).
Where this approach will have problems is with purely statically linked binaries. I doubt there's much on a modern system which actually doesn't link something dynamically (things like busybox-static being the obvious exceptions, although you might regard the ability to get a minimal shell outside of your controls as a feature when it all goes horribly wrong), so this may not be a big deal. On the other hand, if the priority policies are intended to bring some order to an overloaded shared multi-user system then you might see smart users preparing static-linked versions of apps to avoid linker-imposed priorities.

Sure, just iterate through /proc/nnn/exe to get the pathname of the running image. Only use the ones with slashes, the others are kernel procs.
Check to see if you have already processed that one, otherwise look up the new priority in your configuration file and use renice(8) to tweak its priority.
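A minimal Python sketch of that polling loop might look like the following. The function names, the rules dictionary (executable path to nice value) and the seen set are all made up for this example, and it uses os.setpriority instead of spawning renice(8). Note that PIDs get reused, so a long-running poller would need to expire entries from seen.

```python
import os


def find_renice_targets(rules, proc_root="/proc", seen=None):
    """Return (pid, nice) pairs for not-yet-seen processes whose exe path is in rules."""
    seen = set() if seen is None else seen
    targets = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        pid = int(entry)
        if pid in seen:
            continue  # already processed on an earlier pass
        try:
            exe = os.readlink(os.path.join(proc_root, entry, "exe"))
        except OSError:
            continue  # kernel thread, exited process, or no permission
        seen.add(pid)
        if exe in rules:
            targets.append((pid, rules[exe]))
    return targets


def apply_rules(rules, seen=None):
    """Set the nice value of every matching process; needs suitable privileges."""
    for pid, nice in find_renice_targets(rules, seen=seen):
        try:
            os.setpriority(os.PRIO_PROCESS, pid, nice)
        except OSError:
            pass  # process vanished or we may not change its priority
```

Calling apply_rules({"/usr/bin/make": 10}, seen) every second or so from a loop (with one shared seen set) gives the behaviour described above; the path and nice value here are just examples.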

If you want to do it as a kernel module then you could look into making your own binary loader. See the following kernel source files for examples:
$KERNEL_SOURCE/fs/binfmt_elf.c
$KERNEL_SOURCE/fs/binfmt_misc.c
$KERNEL_SOURCE/fs/binfmt_script.c
They can give you a first idea where to start.
You could just modify the ELF loader to check for an additional section in ELF files and when found use its content for changing scheduling priorities. You then would not even need to manage separate configuration files, but simply add a new section to every ELF executable you want to manage this way and you are done. See objcopy/objdump of the binutils tools for how to add new sections to ELF files.

> Does anyone of you have any ideas on how to complete this task?
As an idea, consider using apparmor in complain-mode. That would log certain messages to syslog, which you could listen to.

If the processes in question are started by executing an executable file with a known path, you can use the inotify mechanism to watch for events on that file. Executing it will trigger an IN_OPEN and an IN_ACCESS event.
Unfortunately, this won't tell you which process caused the event to trigger, but you can then check which /proc/*/exe are a symlink to the executable file in question and renice the process id in question.
E.g. here is a crude implementation in Perl using Linux::Inotify2 (which, on Ubuntu, is provided by the liblinux-inotify2-perl package):
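perl -MLinux::Inotify2 -e '
use warnings;
use strict;
# First argument: executable to watch. Remaining arguments: command to run with each PID.
my $x = shift(@ARGV);
my $w = Linux::Inotify2->new;
$w->watch($x, IN_ACCESS, sub
{
    # Find every process whose exe symlink points at the watched file.
    for (glob("/proc/*/exe"))
    {
        if (-r $_ && readlink($_) eq $x && m#^/proc/(\d+)/#)
        {
            system(@ARGV, $1)
        }
    }
});
# Block and dispatch inotify events forever.
1 while $w->poll
' /bin/ls renice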
You can of course save the Perl code to a file, say onexecuting, prepend a first line #!/usr/bin/env perl, make the file executable, put it on your $PATH, and from then on use onexecuting /bin/ls renice.
Then you can use this utility as a basis for implementing various policies for renicing executables (or doing other things).

Using callgrind/kcachegrind to get per-thread statistics

I'd like to be able to see how "expensive" each thread in my application is using callgrind. I profiled with the --separate-threads=yes option, which gives you a callgrind file for the whole app and then one per thread.
This is useful for viewing the profile of any given thread, but what I really want is just a sorted list of CPU time for each thread so I can see which threads are the biggest hogs.

Valgrind/Callgrind doesn't allow this, and neither does kcachegrind, though I think it would be a good improvement. Maybe some answers can be found on their mailing list.
A working but really tedious way would be to use the option --separate-threads=no and change your code so that each thread uses a different function or class name. Depending on your code's complexity, that could be the answer (for example computeData1(), computeData2(), ...).

Just open multiple profiles in kcachegrind at the same time.
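If a plain sorted list is all you are after, the per-thread output files can also be post-processed outside kcachegrind. The Python sketch below is untested against real profiles and rests on a few assumptions: that each file has an "events:" line and a "totals:" (or "summary:") line, that the per-thread files can be matched by a glob pattern you supply, and that the event you care about is the instruction count Ir, which is a proxy for cost rather than real CPU time. The function names are made up for this example.

```python
import glob


def read_totals(path):
    """Return {event_name: count} taken from the events: and totals:/summary: lines."""
    events, counts = [], []
    with open(path) as f:
        for line in f:
            if line.startswith("events:"):
                events = line.split()[1:]
            elif line.startswith("totals:"):
                # Prefer the final totals line when it is present.
                counts = [int(x) for x in line.split()[1:]]
            elif line.startswith("summary:") and not counts:
                # Fall back to the header summary if no totals were seen yet.
                counts = [int(x) for x in line.split()[1:]]
    return dict(zip(events, counts))


def rank_threads(pattern, event="Ir"):
    """Return (count, path) pairs for all files matching pattern, most expensive first."""
    rows = [(read_totals(p).get(event, 0), p) for p in glob.glob(pattern)]
    return sorted(rows, reverse=True)
```

Something like rank_threads("callgrind.out.12345-*") would then list the per-thread files of one run from most to least expensive; adjust the pattern to whatever names your Valgrind version actually produces.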
