Automatically adjusting process priorities under Linux - linux

I'm trying to write a program that automatically sets process priorities based on a configuration file (basically path - priority pairs).
I thought the best solution would be a kernel module that replaces the execve() system call. Unfortunately, the system call table isn't exported in kernel versions > 2.6.0, so it's not possible to replace system calls without really ugly hacks.
I do not want to do the following:
-Replace binaries with shell scripts that start and renice the binaries.
-Patch/recompile my stock Ubuntu kernel
-Do ugly hacks like reading kernel executable memory and guessing the syscall table location
-Poll running processes
I really want to be:
-Able to control the priority of any process based on its executable path and a configuration file. Rules apply to any user.
Does anyone have any ideas on how to accomplish this?

If you've settled for a polling solution, most of the features you want to implement already exist in the Automatic Nice Daemon. You can configure nice levels for processes based on process name, user and group. It can even adjust process priorities dynamically based on how much CPU time a process has used so far.

Sometimes polling is a necessity, and, believe it or not, sometimes it's even the better solution in the end. It depends on a lot of variables.
If the polling overhead is low enough, it easily beats the added complexity, cost, and RISK of developing your own kernel hooks to get notified of the changes you need. That said, when hooks or notification events are available, or can easily be injected, they should certainly be used if the situation calls for it.
This is classic programmer 'perfection' thinking. As engineers, we strive for perfection. This is the real world, though, and sometimes compromises must be made. Ironically, the more perfect solution may be the less efficient one in some cases.
I develop a similar 'process and process priority optimization automation' tool for Windows called Process Lasso (not an advertisement, it's free). I had a similar choice to make and have a hybrid solution in place. Kernel-mode hooks are available for certain process-related events in Windows (creation and destruction), but not only are they not exposed to user mode, they also aren't helpful for monitoring other process metrics. I don't think any OS is going to natively inform you of every change to every process metric, and the overhead of that many different hooks might be much greater than simple polling.
Lastly, considering the HIGH frequency of process changes, it may be better to handle all changes at once (polling at an interval) rather than via notification events/hooks, which may have to be processed many more times per second.
You are RIGHT to stay away from scripts. Why? Because they are slow(er). Of course, the Linux scheduler does a fairly good job at handling CPU-bound threads by downgrading their priority and rewarding (upgrading) the priority of I/O-bound threads, so even under high load a script should stay responsive, I guess.

There's another point of attack you might consider: replace the system's dynamic linker with a modified one which applies your logic. (See this paper for some nice examples of what's possible from the largely neglected art of linker hacking).
Where this approach will have problems is with purely statically linked binaries. I doubt there's much on a modern system which actually doesn't link something dynamically (things like busybox-static being the obvious exceptions, although you might regard the ability to get a minimal shell outside of your controls as a feature for when it all goes horribly wrong), so this may not be a big deal. On the other hand, if the priority policies are intended to bring some order to an overloaded shared multi-user system, then you might see smart users preparing statically linked versions of apps to avoid the linker-imposed priorities.

Sure, just iterate through /proc/nnn/exe to get the pathname of each running image. Only use the ones whose links resolve to paths with slashes; the others are kernel processes.
Check whether you have already processed that one; otherwise look up the new priority in your configuration file and use renice(8) to tweak its priority.
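For illustration, here is a minimal, untested Perl sketch of that polling loop; the config file location (/etc/priorities.conf) and its "path niceness" line format are made up for this example:

#!/usr/bin/env perl
# Minimal polling renicer sketch: reads "path niceness" pairs from a
# hypothetical /etc/priorities.conf and renices matching processes.
use strict;
use warnings;

my %want;                                   # exe path => nice level
open my $cfg, '<', '/etc/priorities.conf' or die "config: $!";
while (<$cfg>) {
    next if /^\s*(#|$)/;                    # skip comments and blank lines
    my ($path, $nice) = split ' ';
    $want{$path} = $nice;
}
close $cfg;

my %seen;                                   # pids already handled (pids can be reused; good enough for a sketch)
while (1) {
    for my $exe (glob '/proc/[0-9]*/exe') {
        my ($pid) = $exe =~ m{^/proc/(\d+)/};
        next if $seen{$pid};
        my $path = readlink $exe or next;   # kernel threads have no exe target
        next unless exists $want{$path};
        system 'renice', '-n', $want{$path}, '-p', $pid;
        $seen{$pid} = 1;
    }
    sleep 5;
}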

If you want to do it as a kernel module then you could look into making your own binary loader. See the following kernel source files for examples:
$KERNEL_SOURCE/fs/binfmt_elf.c
$KERNEL_SOURCE/fs/binfmt_misc.c
$KERNEL_SOURCE/fs/binfmt_script.c
They should give you an idea of where to start.
You could just modify the ELF loader to check for an additional section in ELF files and, when it is found, use its contents to change the scheduling priority. You then wouldn't even need to manage separate configuration files; simply add a new section to every ELF executable you want to manage this way and you are done. See objcopy/objdump from the binutils tools for how to add new sections to ELF files.
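For the binutils side, something along these lines should work; the section name .sched_prio and the idea of storing a plain nice value in it are just assumptions for this sketch (a stock loader knows nothing about them):

# store the desired nice level in a file and embed it as a new section
echo -n "10" > prio.txt
objcopy --add-section .sched_prio=prio.txt \
        --set-section-flags .sched_prio=noload,readonly \
        /usr/local/bin/myapp /usr/local/bin/myapp.prio

# inspect the new section to verify its contents
objdump -s -j .sched_prio /usr/local/bin/myapp.prio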

Does anyone have any ideas on how to accomplish this?
As an idea, consider using apparmor in complain-mode. That would log certain messages to syslog, which you could listen to.

If the processes in question are started by executing an executable file with a known path, you can use the inotify mechanism to watch for events on that file. Executing it will trigger an IN_OPEN and an IN_ACCESS event.
Unfortunately, this won't tell you which process caused the event to trigger, but you can then check which /proc/*/exe entries are symlinks to the executable file in question and renice the process IDs in question.
E.g. here is a crude implementation in Perl using Linux::Inotify2 (which, on Ubuntu, is provided by the liblinux-inotify2-perl package):
perl -MLinux::Inotify2 -e '
use warnings;
use strict;
my $x = shift(@ARGV);                  # path of the executable to watch
my $w = new Linux::Inotify2;
$w->watch($x, IN_ACCESS, sub
{
    # inotify does not say who accessed the file, so scan /proc for
    # processes whose executable is the watched file
    for (glob("/proc/*/exe"))
    {
        if (-r $_ && readlink($_) eq $x && m#^/proc/(\d+)/#)
        {
            system(@ARGV, $1)          # e.g. "renice <pid>"
        }
    }
});
1 while $w->poll
' /bin/ls renice
You can of course save the Perl code to a file, say onexecuting, prepend a first line #!/usr/bin/env perl, add a use Linux::Inotify2; line (since the -M switch is no longer there to load the module), make the file executable, put it on your $PATH, and from then on use onexecuting /bin/ls renice.
You can then use this utility as a basis for implementing various policies for renicing executables (or doing other things).

Related

How to get the last process that modified a particular file?

Hi,
Say I have a file called something.txt. I would like to find the most recent program to modify it, specifically the full path to said program (e.g. /usr/bin/nano). I only need to worry about files modified while my program is running, so I can add an event listener at program startup and find out which program modified the file while mine was running.
Thanks!
auditd in Linux can record file modifications, including which executable performed them.
See the following URL: xmodulo.com/how-to-monitor-file-access-on-linux.html
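For example, a watch on the file plus a search on its key should show the exe= field of the process that wrote to it. A sketch (the path and key name are placeholders, and auditd must be running with sufficient privileges):

# add a watch for writes/attribute changes to the file, tagged with a key
auditctl -w /path/to/something.txt -p wa -k something-watch

# later, list matching events; the exe= field holds the full path of the program
ausearch -k something-watch -i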
Something like this generally isn't going to be possible for arbitrary processes. If these aren't arbitrary processes, then you could use some sort of network bus (e.g. redis) to publish "write" messages. Otherwise your only other bet would be to implement your own filesystem using FUSE. Even with FUSE though, you may not always have access to the pid depending on who/what is writing to the file and the security setup of your OS.

How to make a mod_perl interpreter sticky by some conventions?

As it seems that mod_perl only manages Perl interpreters per VHOST, is there any way I can influence which cloned interpreter mod_perl selects to process a request? I've read through the configurable scopes and had a look at "modperl_interp_select" in the source, and I could see that if a request already has an interpreter associated with it, that one is selected by mod_perl.
else if (r) {
    if (is_subrequest && (scope == MP_INTERP_SCOPE_REQUEST)) {
        [...]
    }
    else {
        p = r->pool;
        get_interp(p);
    }
I would like to add some kind of handler that runs before mod_perl selects an interpreter to process a request, and select an interpreter myself to assign to the request, based on different criteria included in the request.
But I'm having trouble understanding whether such a handler can exist at all, or if everything regarding a request is already processed by an interpreter mod_perl has selected.
Additionally, I can see the APR::Pool API, but it doesn't seem to provide the capability to set user data on the current pool object, which is what mod_perl reads via "get_interp".
Could anyone help me with that? Thanks!
A bit of background: I have a dir structure in cgi-bin like the following:
cgi-bin
    software1
        customer1
            *.cgi
            *.pm
        customer2
            *.cgi
            *.pm
    software2
        customer1
            *.cgi
            *.pm
        customer2
            *.cgi
            *.pm
Each customer uses a private copy of the software, and the software uses itself: e.g. software1 of customer1 may talk to software2 of customer1 by loading some special client libs of software2 into its own Perl interpreter. To make things more complicated, software2 may even bring general/common parts of software1 along in its own private installation by using svn:externals. So I have a lot of the same software with the same Perl packages in one VHOST, and I can't guarantee that all of those private installations are always at the same version level.
It's quite a mixup, but one which is known to work, under the rules we have, within the same Perl interpreter.
But now comes mod_perl: it clones interpreters as needed and reuses them for requests to whichever subdirectory of cgi-bin it likes, and in this case things break. Suddenly an interpreter that has already processed software1 of customer1 is asked to process software2 of customer2, which uses common packages of software1; those packages were already loaded by the Perl interpreter and are reused because of %INC, instead of the private packages of software2, and so on.
Yes, there are different ways to deal with that, like VHOSTs and subdomains per software or per customer or whatever, but I would like to explore ways of keeping one VHOST and the current directory structure, using only what mod_perl or Apache httpd provides. One way would be if I could tell mod_perl to always use the same Perl interpreter for requests to the same directory. That way mod_perl would create its pool of interpreters and I would be responsible for selecting one of them per directory.
What I've learned so far is that it's not easily possible to influence mod_perl's choice of interpreter. If one wants to, it seems one would need to patch mod_perl at the C level or provide one's own C handler for httpd as a hook that runs before mod_perl. In the end mod_perl is only a combination of handlers for httpd itself, so placing another one in front of it that does some special things is possible. Whether it's wise to do so is another question, because one would have to deal with some mod_perl internals, like the fact that no interpreter is available at that point, and in my case I would also need a map somewhere from directories to the interpreters that should process them.
In the end it's not that easy, and I don't want to patch mod_perl or start writing a low-level C handler/hook for httpd.
For documentation purposes I want to mention two possible workarounds which came to mind:
Pool of Perl threads
The problem with the current mod_perl approach in my case is that it clones Perl interpreters at a low level in C, and those are run by threads provided by an httpd pool; every thread can run any interpreter at any given time, as long as it isn't already in use by another thread. With this approach it seems impossible to access the interpreters from within Perl itself without using low-level XS; in particular, it's not possible to manage the interpreters and threads with Perl's threads API, simply because they aren't Perl threads, they are Perl interpreters executed by httpd threads. In the end both behave the same, though, because during creation of Perl threads the current interpreter is cloned as well and associated OS threads are created and so on. But when using Perl's threads you have more influence over shared data and the like.
So a workaround for my current problem could be to not let mod_perl and its interpreters process a request, but instead create my own pool of Perl threads directly during startup in a VHOST, using PerlModule or similar. Those threads can be managed entirely within Perl; one could create queues to dispatch work in the form of absolute paths to requested CGI applications and such. Besides the thread pool itself, a handler would be needed that gets called instead of e.g. ModPerl::Registry to act as a dispatcher: it would decide, based on some criteria, which thread to use and put the requested path into its queue, and the thread itself could ultimately e.g. just create new instances of ModPerl::Registry to process the given file. Of course some glue would be needed here and there...
There are some downsides to this approach, of course: it sounds like a fair amount of work, it duplicates some of the functionality already implemented by mod_perl, especially regarding pool maintenance, and it doubles the number of threads and the memory used, because the mod_perl interpreters and threads would only be used to execute the dispatcher handler, while additional threads and interpreters would process the requests within the Perl threads. The number of threads shouldn't be a huge problem, though; the mod_perl one would just sleep and wait for the Perl thread to finish its work.
@INC hook with source code changes
Another, and I guess easier, approach would be to make use of @INC hooks for Perl's require, in combination again with my own mod_perl handler extending ModPerl::Registry. The key point is that the handler is the first place where the requested file is read, and its source can be changed before compiling it. Every
use XY::Z;
XY::Z->new(...);
could be changed to
use SomePrefix::XY::Z;
SomePrefix::XY::Z->new(...);
where SomePrefix would simply be the full path of the parent directory of the requested file, transformed into a valid Perl package name. ModPerl::Registry already does something similar when it transforms a requested CGI script into a mod_perl handler, so this works in general, and ModPerl::Registry already provides some logic for generating the package name and such. The change means that Perl won't find the packages automatically anymore, simply because they don't exist under the new name in any place known to Perl; that's where the @INC hook comes in.
The hook is responsible for recognizing such changed packages (simply by the SomePrefix in the name, or a marker in front of SomePrefix, or whatever) and mapping them to a file in the filesystem, providing a handle to the requested file which Perl can load during require. Additionally, the hook provides a callback which gets called by Perl for each line of the file read and functions as a source code filter, again changing each "package", "use" or "require" statement to have SomePrefix in front of it. This in turn makes the hook responsible for providing file handles to those packages, etc.
The key point here is changing the source code once, at runtime: instead of "XY/Z.pm", which Perl would require normally, which exists n times in my directory structure, and which would be stored as "XY/Z.pm" in %INC, one lets Perl require "SomePrefix/XY/Z.pm", which gets stored in %INC and is unique for every Perl interpreter used by mod_perl, because SomePrefix reflects the unique installation directory of the requested file. There's no room anymore for Perl to think it has already loaded XY::Z just because it processed a request from another directory before.
Of course this only works for straightforward "use ...;" statements; things like eval("require $package"); will make things a bit more complicated.
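To illustrate the mechanism, here is a minimal, untested sketch of such an @INC hook; the prefix name, the prefix-to-directory map and the naive regex rewrite are assumptions made for this example, not production code:

use strict;
use warnings;

# Hypothetical map from prefix package to the customer installation directory.
my %prefix_dir = (
    'SomePrefix_sw1_cust1' => '/var/www/cgi-bin/software1/customer1',
);

unshift @INC, sub {
    my ($self, $wanted) = @_;                  # e.g. "SomePrefix_sw1_cust1/XY/Z.pm"
    my ($prefix, $rest) = $wanted =~ m{^([^/]+)/(.+)$} or return;
    my $dir = $prefix_dir{$prefix} or return;  # not one of ours, let Perl keep looking

    open my $real, '<', "$dir/$rest" or return;
    my $src = do { local $/; <$real> };
    close $real;

    # Naive rewrite: prefix package/use/require statements so each customer
    # copy gets a unique entry in %INC.
    $src =~ s/\b(package|use|require)\s+(XY::)/$1 ${prefix}::$2/g;

    open my $fh, '<', \$src or return;         # in-memory filehandle for require
    return $fh;
};

# Now "require SomePrefix_sw1_cust1::XY::Z;" loads customer1's copy of XY::Z.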
Comments welcome... :-)
"PerlOptions" is a DIR scoped variable, not limited to VirtualHosts, so new interpreter pools can be created for any location. These directive can even be placed in .htaccess files for ease of configuration, but something like this in your httpd.conf should give you the desired effect:
<Location /cgi-bin/software1/customer1>
    PerlOptions +Parent
</Location>
<Location /cgi-bin/software1/customer2>
    PerlOptions +Parent
</Location>
<Location /cgi-bin/software2/customer1>
    PerlOptions +Parent
</Location>
<Location /cgi-bin/software2/customer2>
    PerlOptions +Parent
</Location>

Control Linux Application Launch/Licensing

I need to employ some sort of licensing on some Linux applications whose code base I don't have access to.
What I'm thinking is having a separate process read the license key and check for the availability of that application. I would then need to ensure that process is run during every invocation of the respective application. Is there some feature of Linux that can assist with this? For example, something like the sudoers file, where I detect which user and which application is being launched and, if a combination is met, run the license check process first.
Or can I do something like not allowing the user to launch the (command-line) application by itself, and force them to pipe it through my license process, as in:
/usr/bin/tm | license_process // whereas '/usr/bin/tm' would fail on its own
I need to employ some sort of licensing on some Linux applications
Please note that license checks will generally cost you way more (in support and administration) than they are worth: anybody who wants to bypass the check and has a modicum of skill will do so, and those who can't wouldn't have paid for the license anyway (that is, by not implementing a licensing scheme you are generally not leaving any money on the table).
whose code base I don't have access to.
That makes your task pretty much impossible: the only effective copy-protection schemes require that you rebuild your entire application and make it check the license in so many distinct places that the would-be attacker gets bored and goes away. You can read about such schemes here.
What I'm thinking is having a separate process read the license key and check for the availability of that application.
Any such scheme will be bypassed in under 5 minutes by someone skilled with strace and gdb. Don't waste your time.
You could write a wrapper binary that does the checks and then links in the real application as part of that binary; using some dlsym tricks you may be able to call the real application's main function from the wrapper's main function.
IDEA
read up on ELF-hacking: http://www.linuxforums.org/articles/understanding-elf-using-readelf-and-objdump_125.html
use ld to rename the main function of the program you want to protect access to (see the objcopy sketch after this list). http://fixunix.com/aix/399546-renaming-symbol.html
write a wrapper that does the checks and uses dlopen and dlsym to call the real main.
link together real application with your wrapper, as one binary.
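As a sketch of the renaming step, objcopy can also redefine the symbol at the object-file level (real_main is an arbitrary name chosen here, and this assumes you have an object file or relinkable binary to work on):

# rename the protected program's entry point so the wrapper can provide its own main()
objcopy --redefine-sym main=real_main application.o application_renamed.o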
Now you have an application that has your custom checks and is somewhat hard to break, but not impossible.
I have not tested this and don't have the time to, but it sounds like a fun experiment.

Using callgrind/kcachegrind to get per-thread statistics

I'd like to be able to see how "expensive" each thread in my application is, using callgrind. I profiled with the --separate-threads=yes option, which gives you one callgrind file for the whole app and then one per thread.
This is useful for viewing the profile of any given thread, but what I really want is just a sorted list of CPU time used by each thread, so I can see which threads are the biggest hogs.
Valgrind/Callgrind doesn't support this behaviour, and neither does KCachegrind, but I think it would be a good improvement. Maybe some answers can be found on their mailing list.
A working but really tedious way could be to use the option --separate-threads=no and update your code so that each thread uses a different function or class name. Depending on your code's complexity, that could be the answer (using computeData1(), computeData2(), ...).
Just open multiple profiles in kcachegrind at the same time.
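If a rough per-thread total is enough, the per-thread output files can also be summarized from the command line. A sketch, assuming the callgrind.out.<pid>-<threadID> naming that --separate-threads=yes produces (./myapp is a placeholder for your application):

valgrind --tool=callgrind --separate-threads=yes ./myapp

# print the totals section of each per-thread profile; the event counts
# (e.g. Ir) give a per-thread cost you can compare by hand
for f in callgrind.out.*-*; do
    echo "== $f =="
    callgrind_annotate "$f" | head -n 25
done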

Tracing pthread scheduling

What I want to do is create some kind of graph detailing the execution of (two) threads in Linux. I don't need to see what the threads do, just when they are scheduled and for how long; basically a timeline.
I've spent the last few hours searching the internet for a way to trace the scheduling of pthreads. Unfortunately, the two projects I found require either kernel recompilation (LTTng) or glibc patching (NPTL Trace Tool), neither of which I can do (large, centrally managed system, on which I have no sudo rights).
Is there any other way to do something like this or will I have to resort to finding a laptop on which I can patch/recompile whatever I want?
Best regards
PS: I would have linked to both projects, but the site doesn't allow me to (reputation < 10). The first Google search result for the project names is the correct one, though.
Superuser privileges are not needed to build an instrumented glibc / libpthread.so. The ptt_trace program that is part of NPTL Trace Tool will run your program using the instrumented library.
Maybe something like Intel's VTune?
There is also a tool called pthreadw (on SourceForge).
It's a wrapper library which intercepts calls to the usual functions of the pthread library and reports stats, like typical times spent playing with locks, condition variables, etc.
It is not currently able to export traces, only textual summary reports.
