Can I override a system function before calling fork? - linux

I'd like to be able to intercept filenames with a certain prefix from any of my child processes that I launch. This would be names like "pipe://pipe_name". I think wrapping the open() system call would be a good way to do this for my application, but I'd like to do it without having to compile a separate shared library and hooking it with the LD_PRELOAD trick (or using FUSE and having to have a mounted directory)
I'll be forking the processes myself, is there a way to redirect open() to my own function before forking and have it persist in the child after an exec()?
Edit: The thought behind this is that I want to implement multi-reader pipes by having an intermediate process tee() the data from one pipe into all the others. I'd like this to be transparent to my child processes, so that they can take a filename and open() it; if it's a pipe, I'll return the file descriptor for it, and if it's a normal file, I'll just pass it to the regular open() function. Any alternative way to do this that keeps it transparent to the child processes would be interesting to hear. I'd like to avoid compiling a separate library that has to be pre-linked, though.

I believe the answer here is no, it's not possible. To my knowledge, there are only three ways to achieve this:
1. The LD_PRELOAD trick: compile a .so that is preloaded to override the system call.
2. Implement a FUSE filesystem, pass the client program a path inside it, and intercept the calls there.
3. Use ptrace to intercept system calls and fix them up as needed.
Options 2 and 3 will be very slow, as they have to intercept every I/O call and switch back to user space to handle it. Probably 500% slower than normal. Option 1 requires building and maintaining an external shared library to link in.
For my application, where I just want to be able to pass in a path to a pipe, I realized I can open both ends in my process (via pipe()), then grab the path to the read end in /proc//fd and pass that path to the client program, which I think gives me everything I need.
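A minimal sketch of that idea, assuming Linux with procfs mounted. The helper name `pipe_with_proc_path` is made up for illustration, and the path is built from the parent's PID and the read-end descriptor number. Here the same process reopens the path to show that it behaves like a regular filename; in the real setup the path would be handed to the child instead.

```python
import os


def pipe_with_proc_path():
    """Create a pipe and return both ends plus a procfs path to the read end.

    Hypothetical helper: the path layout /proc/<pid>/fd/<fd> is the Linux
    procfs convention, nothing specific to the original application.
    """
    read_fd, write_fd = os.pipe()
    path = f"/proc/{os.getpid()}/fd/{read_fd}"
    return read_fd, write_fd, path


read_fd, write_fd, path = pipe_with_proc_path()
os.write(write_fd, b"hello\n")

# A client that only knows the filename can open() it like any other file.
reader = open(path, "rb")

# Close the write end so that the reader sees end-of-file after the data.
os.close(write_fd)
data = reader.read()
reader.close()
os.close(read_fd)

print(path, data)
```

The parent has to keep its own read descriptor open until the client has opened the path, because the /proc entry disappears as soon as that descriptor is closed.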

Related

How to simulate INotify failure in functional test?

I have a Linux application that uses inotify for tracking filesystem changes. I want to write a functional test suite that tests the application from the end-user perspective, and as part of it I'd like to test situations where the filesystem fails; in particular, I want to test inotify failure.
Specifically, I'd like to make the inotify_init(), inotify_add_watch(), and inotify_rm_watch() calls, as well as read() calls on the inotify file descriptors, return an error when the tests require it.
The problem is that I can't find a way to simulate inotify failure. Has anybody already encountered this problem and found a solution?
If you want to avoid any mocking whatsoever, your best bet is simply provoking errors by directly hitting OS limits. For example, inotify_init can fail with the EMFILE errno if the calling process has reached its limit on the number of open file descriptors. To reach such conditions with 100% precision you can use two tricks:
Dynamically manipulate the limits of a running process by changing values in procfs
Assign your app process to a dedicated cgroup and "suspend" it by giving it ~0% CPU time via the cgroups API (this is how Android throttles background apps and implements its energy-saving "Doze" mode).
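As a rough illustration of hitting a limit, the sketch below provokes EMFILE from inotify_init() inside a single process. It swaps in setrlimit(RLIMIT_NOFILE) for the procfs and cgroup tricks described above, purely because that is easy to show in a few lines. The function name `inotify_init_under_fd_limit` is hypothetical, and loading `libc.so.6` through ctypes assumes a glibc-based Linux system.

```python
import ctypes
import errno
import os
import resource

# Load libc before lowering the limit: loading a library needs descriptors.
libc = ctypes.CDLL("libc.so.6", use_errno=True)


def inotify_init_under_fd_limit(limit):
    """Call inotify_init() with the soft descriptor limit lowered to `limit`.

    Returns (fd, errno); errno is 0 when the call succeeded.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    try:
        fd = libc.inotify_init()
        err = ctypes.get_errno() if fd < 0 else 0
    finally:
        # Always restore the original limit so the rest of the test can run.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return fd, err


# With a soft limit of 0 no new descriptor can be allocated at all.
fd, err = inotify_init_under_fd_limit(0)
if fd >= 0:
    os.close(fd)
print(fd, errno.errorcode.get(err))
```

In a functional test the same limit would more likely be applied to the application process just before it starts, so that the application's own inotify_init() call is the one that fails.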
All possible error conditions of inotify are documented in the man pages of inotify, inotify_init and inotify_add_watch (I don't think that inotify_rm_watch can fail except for purely programming errors in your code).
Aside from ordinary errors (such as going over /proc/sys/fs/inotify/max_user_watches), inotify has several fault modes (queue space exhaustion, watch ID reuse), but those aren't "failures" in the strict sense of the word.
Queue exhaustion happens when someone performs filesystem changes faster than you can react. It is easy to reproduce: use cgroups to pause your program while it has an inotify descriptor open (so the event queue isn't drained) and rapidly generate lots of notifications by modifying the observed files/directories. Once /proc/sys/fs/inotify/max_queued_events unhandled events have accumulated and you unpause your program, it will receive IN_Q_OVERFLOW (and potentially miss some events that didn't fit into the queue).
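To check that the application actually notices the overflow, the test needs to recognize IN_Q_OVERFLOW in the bytes returned by read(). The sketch below parses raw inotify_event records and looks for that flag. The function names are made up, the constants are the values from the Linux inotify header, and the buffer is synthetic (built with struct.pack) rather than read from a real inotify descriptor.

```python
import struct

IN_CREATE = 0x00000100
IN_Q_OVERFLOW = 0x00004000

# struct inotify_event header: int wd; uint32_t mask, cookie, len;
EVENT_HEADER = struct.Struct("iIII")


def parse_events(buffer):
    """Split a buffer returned by read() into (wd, mask, cookie, name) tuples."""
    events = []
    offset = 0
    while offset + EVENT_HEADER.size <= len(buffer):
        wd, mask, cookie, name_len = EVENT_HEADER.unpack_from(buffer, offset)
        offset += EVENT_HEADER.size
        # The name field is padded with NUL bytes up to name_len.
        name = buffer[offset:offset + name_len].rstrip(b"\0")
        offset += name_len
        events.append((wd, mask, cookie, name))
    return events


def queue_overflowed(events):
    """True if any event carries IN_Q_OVERFLOW, meaning events were dropped."""
    return any(mask & IN_Q_OVERFLOW for _wd, mask, _cookie, _name in events)


# Synthetic buffer: one ordinary event followed by an overflow marker,
# which the kernel reports with wd set to -1 and no name.
sample = (
    EVENT_HEADER.pack(1, IN_CREATE, 0, 8) + b"a.txt\0\0\0"
    + EVENT_HEADER.pack(-1, IN_Q_OVERFLOW, 0, 0)
)
events = parse_events(sample)
print(events, queue_overflowed(events))
```

An application that sees the overflow marker generally has to assume its view of the watched tree is stale and rescan it.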
Watch ID reuse is tedious to reproduce, because modern kernels switched from file descriptor-like behavior to PID-like behavior for watch-IDs. You should use the same approach as when testing for PID reuse — create and destroy lots of inotify watches until the integer watch ID wraps around.
Inotify also has a couple of tricky corner cases that rarely occur during normal operation (for example, all Java bindings I know, including Android and OpenJDK, do not handle all of them correctly): the same-inode problem and handling IN_UNMOUNT.
The same-inode problem is well explained in the inotify documentation:
A successful call to inotify_add_watch() returns a unique watch descriptor for this inotify instance, for the filesystem object (inode) that corresponds to pathname. If the filesystem object was not previously being watched by this inotify instance, then the watch descriptor is newly allocated. If the filesystem object was already being watched (perhaps via a different link to the same object), then the descriptor for the existing watch is returned.
In plain words: if you watch two hard links to the same file, their numeric watch IDs will be the same. This behavior can easily result in losing track of the second inotify watch if you store watches in something like a hash map keyed by integer watch IDs.
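The effect can be observed directly. The following sketch, which assumes a glibc-based Linux system and calls inotify through ctypes, watches a file and a hard link to it and stores the paths in a dictionary keyed by watch descriptor. The helper names `add_watch` and `watch_all` are made up for illustration. Keeping a set of paths per descriptor is one possible way to avoid losing the second watch.

```python
import ctypes
import os
import tempfile

libc = ctypes.CDLL("libc.so.6", use_errno=True)
IN_MODIFY = 0x00000002


def add_watch(inotify_fd, path, mask=IN_MODIFY):
    """Thin wrapper around inotify_add_watch() that raises OSError on failure."""
    wd = libc.inotify_add_watch(inotify_fd, os.fsencode(path), mask)
    if wd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return wd


def watch_all(inotify_fd, paths):
    """Map each watch descriptor to the set of paths that share it."""
    paths_by_wd = {}
    for path in paths:
        paths_by_wd.setdefault(add_watch(inotify_fd, path), set()).add(path)
    return paths_by_wd


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "data.txt")
    hard_link = os.path.join(tmp, "alias.txt")
    open(original, "w").close()
    os.link(original, hard_link)  # second name for the same inode

    inotify_fd = libc.inotify_init()
    if inotify_fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    try:
        paths_by_wd = watch_all(inotify_fd, [original, hard_link])
    finally:
        os.close(inotify_fd)

# Both names end up under a single watch descriptor.
print(paths_by_wd)
```

A plain `dict[wd] = path` would have silently replaced the first path with the second here.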
The second issue is even harder to observe, and thus rarely properly supported despite not even being an error mode: unmounting a partition that is currently observed via inotify. The tricky part is that Linux does not let you unmount a filesystem while you have file descriptors open on it, but observing a file via inotify does not prevent the filesystem from being unmounted. If your app observes files on a separate filesystem and the user unmounts that filesystem, you have to be prepared to handle the resulting IN_UNMOUNT event.
All of the tests above should be possible to perform on a tmpfs filesystem.
After a bit of thinking, I came up with another solution. You can use the Linux "seccomp" facility to "mock" the results of individual inotify-related system calls. This approach is simple, robust and completely non-intrusive. You can conditionally adjust the behavior of syscalls while still using the original OS behavior in other cases. Technically this still counts as mocking, but the mocking layer is placed very deep, between the kernel code and the userspace syscall interface.
You don't need to modify the code of the program, just write a wrapper that installs a fitting seccomp filter before exec-ing your app (the code below uses libseccomp):
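// needs <seccomp.h>, <errno.h>, <stdlib.h>, <unistd.h>; link with -lseccomp
// pass control to kernel syscall code by default
scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
if (!ctx) exit(1);
// modify behavior of specific system call to return `EMFILE` error
if (seccomp_rule_add(ctx, SCMP_ACT_ERRNO(EMFILE), SCMP_SYS(inotify_init), 0) < 0) exit(1);
// load the filter into the kernel; it stays in effect across execve()
if (seccomp_load(ctx) < 0) exit(1);
execve(...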
A seccomp filter is a small program in classic BPF bytecode that the kernel runs on every system call, so it can express fairly flexible conditions. libseccomp allows you to install limited conditional filters (for example, comparing integer arguments of a system call to constant values). Note that a filter only sees the syscall number and the raw argument values and cannot dereference pointers, so on its own it cannot compare the file path passed to inotify_add_watch to a predefined value. For that kind of conditional behavior you would have to hand the decision to a user-space supervisor, for example via SECCOMP_RET_TRACE with ptrace or the seccomp user-notification mechanism (SECCOMP_RET_USER_NOTIF, Linux 5.0 and later).
Writing syscall filters might be tedious, and the behavior of a program under seccomp does not actually exercise the kernel implementation (seccomp filters are invoked by the kernel before control passes to the kernel syscall handler). So you may want to combine sparing use of seccomp with the more organic approach outlined in my other answer.
Probably not as non-intrusive as you would like, but the INotify class from inotify_simple is small. You could completely wrap it, delegate all of the methods, and inject the errors.
The code would look something like this:
from inotify_simple import INotify

class WrapINotify(object):
    # Class-level queues of exceptions to raise on the next matching call
    init_error_list = []
    add_watch_error_list = []
    rm_watch_error_list = []
    read_error_list = []

    def raise_if_error(self, error_list):
        if not error_list:
            return
        # Simulate INotify raising an exception
        exception = error_list.pop(0)
        raise exception

    def __init__(self):
        self.raise_if_error(WrapINotify.init_error_list)
        self.inotify = INotify()

    def add_watch(self, path, mask):
        self.raise_if_error(WrapINotify.add_watch_error_list)
        return self.inotify.add_watch(path, mask)

    def rm_watch(self, wd):
        self.raise_if_error(WrapINotify.rm_watch_error_list)
        return self.inotify.rm_watch(wd)

    def read(self, timeout=None, read_delay=None):
        self.raise_if_error(WrapINotify.read_error_list)
        return self.inotify.read(timeout, read_delay)

    def close(self):
        self.inotify.close()

    def __enter__(self):
        # Return the wrapper itself so that error injection still applies inside `with`
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
With this code, you would, somewhere else, do:
WrapINotify.add_watch_error_list.append(OSError(28, 'No space left on device'))
to inject the error. Of course, you could add more code to the wrapper class to implement different error injection schemes.

How to get the last process that modified a particular file?

Hi,
Say I have a file called something.txt. I would like to find the most recent program to modify it, specifically the full path to said program (eg. /usr/bin/nano). I only need to worry about files modified while my program is running, so I can add an event listener at program startup and find out what program modified it when my program was running.
Thanks!
auditd on Linux can track file modifications.
See: xmodulo.com/how-to-monitor-file-access-on-linux.html
Something like this generally isn't going to be possible for arbitrary processes. If these aren't arbitrary processes, then you could use some sort of network bus (e.g. redis) to publish "write" messages. Otherwise your only other bet would be to implement your own filesystem using FUSE. Even with FUSE though, you may not always have access to the pid depending on who/what is writing to the file and the security setup of your OS.

how to avoid deadlock with parallel named pipes?

I am working on a flow-based programming system called net2sh. It is currently based on shell tools connected by named pipes. Several processes work together to get the job done, communicating over named pipes, not unlike a production line in a factory.
In general it is working delightfully well, however there is one major problem with it. In the case where processes are communicating over two or more named pipes, the "sending" process and "receiving" process must open the pipes in the same order. This is because when a process opens a named pipe, it blocks until the other end has also been opened.
I want a way to avoid this, without spawning extra "helper" processes for each pipe, without having to hack existing components, and without having to mess with the program networks to avoid this problem.
Ideally I am looking for some "non-blocking fifo" option, where "open" on a fifo always succeeds immediately but subsequent operations may block if the pipe buffer is full (or empty, for read)... I'd even consider using a kernel patch to that effect. According to fifo(7) O_NONBLOCK does do something different when opening fifos, not what I want exactly, and in order to use that I would have to rewrite every existing shell tool such as cat.
Here is a minimal example which deadlocks:
mkfifo a b
(> a; > b; ) &
(< b; < a; ) &
wait
If you can help me to solve this sensibly I will be very grateful!
There is a good description of using O_NONBLOCK with named pipes here: How do I perform a non-blocking fopen on a named pipe (mkfifo)?
It sounds like you want it to work in your entire environment without changing any C code. Therefore, one approach would be to set LD_PRELOAD to some shared library which contains a wrapper for open(2) which adds O_NONBLOCK to the flags whenever pathname refers to a named pipe.
A concise example of using LD_PRELOAD to override a library function is here: https://www.technovelty.org/c/using-ld_preload-to-override-a-function.html
Whether this actually works in practice without breaking anything else, you'll have to find out for yourself (please let us know!).
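For reference, the behavior that fifo(7) describes for O_NONBLOCK can be seen in a few lines. This is only a sketch of the open() semantics on Linux, with a made-up helper name `try_open_nonblocking`; it does not implement the LD_PRELOAD wrapper itself.

```python
import errno
import os
import tempfile


def try_open_nonblocking(path, flags):
    """Open `path` with O_NONBLOCK added; return (fd, 0) or (None, errno)."""
    try:
        return os.open(path, flags | os.O_NONBLOCK), 0
    except OSError as exc:
        return None, exc.errno


with tempfile.TemporaryDirectory() as tmp:
    fifo = os.path.join(tmp, "a")
    os.mkfifo(fifo)

    # Write-only open without a reader does not block, but fails with ENXIO.
    early_writer, early_errno = try_open_nonblocking(fifo, os.O_WRONLY)

    # Read-only open succeeds immediately even though no writer exists yet.
    reader, reader_errno = try_open_nonblocking(fifo, os.O_RDONLY)

    # Now that a reader exists, the write-only open succeeds as well.
    writer, writer_errno = try_open_nonblocking(fifo, os.O_WRONLY)

    os.write(writer, b"ping")
    received = os.read(reader, 4)
    os.close(writer)
    os.close(reader)

print(errno.errorcode[early_errno], received)
```

The asymmetry is the catch: a wrapper that simply adds O_NONBLOCK makes the reading side open immediately, but the writing side gets an error instead of waiting when no reader is present, and the descriptor stays in non-blocking mode for later reads and writes unless the flag is cleared again.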

Setting memory permissions in forked process

My goal is to set virtual memory page permissions (as if the forked process called mprotect) from the parent process. Can this be done with ptrace(2) or by some other magic?
Thanks!
It can be done (via ptrace() indeed; gdb can do this), but not without a lot of finagling: in order to call a function in another process, you basically have to set up its registers and stack for execution, and then continue the process, which will execute the function. One program I know off the top of my head that might have some useful source/methodology for you to look at is injectso. If you do look at injectso, look at the inject_code() functions.
In addition, calling conventions vary by platform, so you'd have to re-jigger your code for each architecture/OS, etc.

A non-reentrant function in an API being used in a multi-threaded program

I'm using the Qt API in C++, but I imagine useful answers can come from people without any prior experience with Qt.
Qt has a function in its XML-handling class, called setContent(), which is specified as non-reentrant. When called, setContent() reads an XML file into memory, and returns it as a data structure.
As I understand it, a non-reentrant function is one that is not safe to call simultaneously from multiple threads even if the function is called to operate on different files/objects.
So based on this, my understanding is that I would not be able to have more than one thread that opens XML files using this function unless somehow both of these threads are protected against accessing the setContent() function at the same time.
Is this correct? If so, seems like a really poor way to write an API as this doesn't seem like a function at all that intuitively would raise multi-threading problems. In addition, no mutex is provided at all by the API.
So in order to use this function in my multi-threaded program, where more than one thread will be opening different XML files, what's the best way to handle access to the setContent() function? Should I create an extern mutex in a header file on its own that is included by every file that will access XML?
Looks like it's all about static QDomImplementation::InvalidDataPolicy invalidDataPolicy. It's the only static data that QDom*** classes use.
setContent and a bunch of global functions use its value when parsing, and if another thread changes it in the middle, the parse may obviously go wrong.
I suppose if your program never calls setInvalidDataPolicy(), you're safe to parse XML from different threads.
So based on this, my understanding is that I would not be able to have
more than one thread that opens XML files using this function unless
somehow both of these threads are protected against accessing the
setContent() function at the same time.
I think you're correct.
So in order to use this function in my multi-threaded program, where
more than one thread will be opening different XML files, what's the
best way to handle access to the setContent() function? Should I
create an extern mutex in a header file on its own that is included by
every file that will access XML?
Again, I tend to agree with you regarding the mutex. (By the way, Qt provides QMutex.) But I'm not sure what you mean by an extern mutex in a header file, so I would just make sure to instantiate exactly one mutex, and pass a pointer to this mutex to all the threads that require it.
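The "exactly one mutex, handed to every thread" arrangement looks roughly like this. The sketch uses Python's threading.Lock as a stand-in for QMutex, and `non_reentrant_parse` is a made-up placeholder for setContent(); it only counts how many callers are inside it at once so that the effect of the lock is visible.

```python
import threading
import time

active_calls = 0       # callers currently inside the parser
max_active_calls = 0   # highest concurrency ever observed


def non_reentrant_parse(name):
    """Placeholder for a non-reentrant library call such as setContent()."""
    global active_calls, max_active_calls
    active_calls += 1
    max_active_calls = max(max_active_calls, active_calls)
    time.sleep(0.01)  # pretend to parse
    active_calls -= 1
    return name.upper()


def worker(name, lock, results):
    # Every thread receives the same lock object and takes it around the call.
    with lock:
        results[name] = non_reentrant_parse(name)


parse_lock = threading.Lock()  # the single mutex, created exactly once
results = {}
threads = [
    threading.Thread(target=worker, args=(name, parse_lock, results))
    for name in ("a.xml", "b.xml", "c.xml", "d.xml")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(results, max_active_calls)
```

Passing the lock in explicitly, rather than reaching for a global, keeps it obvious which code paths are serialized.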
