Need help unit testing a Rust library that calls execve

Background:
I'm currently writing unit tests for a library that can start other binaries and guarantee that the binary dies after a timeout on Linux.
The unit test currently works by calling a binary that would normally sleep for 10 seconds and then create a file containing some data. The binary should be killed before those 10 seconds elapse, so if the timeout works, the file should never exist. The path to that binary is currently hardcoded, which is not what I want.
What I need help with:
The problem is that I want such a binary to be available when the crate is compiled, and then pass its path to the library under test (so I can invoke the binary via the execve syscall without hardcoding its location, allowing other users of my crate to compile it). This means I need a binary to be generated or fetched during compilation, and I need access to its path inside my unit test. Is there any decent approach to doing this?
The code for the binary can be written in any language as long as it works, though preferably Rust or C/C++. Worst case it could be precompiled, but I'd like to have it compiled on the fly so it also works on ARM and other architectures.
What have I tried:
The current method is simply to hardcode the binary's path and compile it manually with g++. This is not optimal, however, since anyone who downloads my crate from crates.io won't have that binary and thus cannot pass its unit tests.
I have been experimenting with the cc crate in build.rs, generating C++ code and then compiling it, but cc appears to be designed for compiling libraries, which is not what I want, since it attempts to link the result into the Rust library (I believe that's what it's doing). I have been googling for a few hours without finding any approach that solves this problem.

Related

Generating an additional artifact when crate is used as a dependency

I don't know if the title is phrased very well but I'll try my best to explain what I'm trying to achieve.
I have a project consisting of two crates, main-program and patch, the goal of the program is to capture audio from other processes on the system.
patch is a library crate that compiles to a DLL with a detour to hook into the system audio functions and sends the audio data over IPC.
In main-program I have some code that does the injection, as well as code to receive the data over IPC.
Currently, I just have a batch script that calls cargo build for each crate and then copies the DLL and EXE to my output folder.
Now, what I want to do, is break out the code that does the injection and the receiving of data, and together with the patch crate, I want to create a library, my-audio-capture-lib that I can publish for use by others.
The optimal result would be that someone can add my-audio-capture-lib to their Cargo.toml as a dependency, specify somewhere what filename they want the DLL to have, and then call a function like my-audio-capture-lib::capture_audio_from_pid in their code to receive audio data. When they build their project, they should get their binary as well as the DLL from my crate.
This however requires that at some point during the build process, my-audio-capture-lib produces the necessary DLL for injection. And I don't know how to do that, or if it's even possible to do.
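One mechanism worth knowing about (hedged: this is Cargo's unstable "artifact dependencies" feature, -Z bindeps, nightly-only at the time of writing, and the paths here are assumptions) is to declare the DLL-producing crate as an artifact dependency, so Cargo itself builds the cdylib whenever the library is built:

```toml
# Cargo.toml of my-audio-capture-lib (sketch; requires nightly -Z bindeps)
[dependencies]
patch = { path = "../patch", artifact = "cdylib" }
```

A build script in the consuming crate could then locate the built DLL through an environment variable along the lines of CARGO_CDYLIB_FILE_PATCH (check the unstable feature's documentation for the exact name on your toolchain) and copy it next to the final binary under whatever filename the user configured.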

How to use the binary output of a Cargo project as the input of another one?

In order to reduce the executable size of a Rust program (called runtime in my code), I am trying to compress it and then include it in a second program (called szl) that decompresses it and executes it.
I have done that by using a Cargo build script in szl that opens the output binary from runtime, compresses it, and then generates a file that is ready for use by include_bytes!.
The issue with this approach is the dependencies are not handled properly. For example, Cargo may try to build szl before runtime (and fail), and when the source code of runtime is modified, szl is not rebuilt.
Is there a way to tell Cargo that szl depends on the binary from runtime (and transitively on the source code of runtime), or should I use another approach such as an external Makefile?
While not exactly your use case, you might get it to work with the links manifest key. It would allow you to express a dependency between the two programs, and you can pass more information with DEP_FOO_KEY variables.
Before you go to such drastic measures, it might be worth trying other known strategies for reducing Rust binary size first (such as running strip, removing debug symbols, LTO, panic = "abort", etc.).
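Most of those size-reduction strategies can be expressed directly in the manifest (a sketch; `strip = true` requires a reasonably recent Cargo):

```toml
# Cargo.toml: common release-profile settings for smaller binaries
[profile.release]
strip = true        # strip symbols from the final binary
lto = true          # link-time optimisation
panic = "abort"     # drop the unwinding machinery
codegen-units = 1   # better optimisation at the cost of compile time
opt-level = "z"     # optimise for size rather than speed
```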

Customising Cabal libraries (I think?)

Perhaps it's just better to describe my problem.
I'm developing a Haskell library. But part of the library is written in C, and another part actually in raw LLVM. To actually get GHC to spit out the code I want I have to follow this process:
Run ghc -emit-llvm on both the code that uses the Haskell module and the "Main" module.
Run clang -emit-llvm on the C file
Now I've got three .ll files from above. I add the part of the library I've handwritten in raw LLVM and llvm-link these into one .ll file.
I then run LLVM's opt on the linked file.
Lastly, I feed the LLVM bitcode file back into GHC (which pleasantly accepts it) to produce an executable.
This process (with appropriate optimisation settings of course) seems to be the only way I can inline code from C, removing the function call overhead. Since many of these C functions are very small this is significant.
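The pipeline above might be sketched as a shell script (a rough illustration only: the file names are placeholders, and the exact GHC/Clang flags vary by version, so treat every flag here as an assumption to verify):

```
# Sketch of the build pipeline described above; Lib.hs, Main.hs,
# cbits.c and handwritten.ll are placeholder names.
ghc -fllvm -keep-llvm-files -c Lib.hs Main.hs    # emit .ll for the Haskell code
clang -S -emit-llvm -O2 cbits.c -o cbits.ll      # emit .ll for the C code
llvm-link Lib.ll Main.ll cbits.ll handwritten.ll -S -o linked.ll
opt -O3 linked.ll -S -o opt.ll                   # whole-program optimisation
ghc -fllvm opt.ll -o program                     # feed the result back to GHC
```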
Anyway, I want to be able to distribute the library, and for users to be able to use it as painlessly as possible whilst still gaining the optimisations from the process above. I understand it's going to be a bit more of a pain than an ordinary library (for example, you're forced to compile via LLVM), but making it as painless as possible is what I'm looking for advice on.
Any guidance will be appreciated, I don't expect a step by step answer because I think it will be complex, but just some ideas would be helpful.

Loading Linux libraries at runtime

I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
Here is my specific problem: I want to publish a Linux program in ELF binary form that should run on as many distributions as possible so my mandatory dependencies are as low as it gets: The only libraries required under any circumstances are libpthread, libX11, librt and libm (and glibc of course). I'm linking dynamically against these libraries when I build my program using gcc.
Optionally, however, my program should also support ALSA (sound interface), the Xcursor, Xfixes, and Xxf86vm extensions as well as GTK. But these should only be used if they are available on the user's system, otherwise my program should still run but with limited functionality. For example, if GTK isn't there, my program will fall back to terminal mode. Because my program should still be able to run without ALSA, Xcursor, Xfixes, etc. I cannot link dynamically against these libraries because then the program won't start at all if one of the libraries isn't there.
So I need to manually check if the libraries are present and then open them one by one using dlopen() and import the necessary function symbols using dlsym(). This, however, leads to all kinds of problems:
1) Library naming conventions:
Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000". These extensions seem to differ from system to system. So which one should I choose when calling dlopen()? Using a hardcoded name here seems like a very bad idea because the names differ from system to system. So the only workaround that comes to my mind is to scan the whole library path and look for filenames starting with a "libXcursor.so" prefix and then do some custom version matching. But how do I know that they are really compatible?
2) Library search paths: Where should I look for the *.so files after all? This is also different from system to system. There are some default paths like /usr/lib and /lib, but *.so files could also be in lots of other paths. So I'd have to open /etc/ld.so.conf and parse it to find out all the library search paths. That's not a trivial thing to do, because /etc/ld.so.conf files can also use include directives, which means I'd have to parse even more .conf files, guard against infinite loops caused by circular includes, etc. Is there really no easier way to find out the search paths for *.so?
So, my actual question is this: isn't there a more convenient, less hackish way of achieving what I want? Is it really so complicated to create a Linux program that has some optional dependencies like ALSA, GTK, or libXcursor but also works without them? Is there some kind of standard for doing this, or am I doomed to do it the hackish way?
Thanks for your comments/solutions!
I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
This isn't a design flaw as far as creators of the system are concerned; it's an advantage -- it encourages you to distribute programs in source form. Oh, you wanted to sell your software? Sorry, that's not the use case Linux is optimized for.
Library naming conventions: Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000".
Yes, this is called external library versioning. The consequence is that if you compiled your binaries using headers on a system that would normally give you libXcursor.so.1 as a runtime reference, then the only shared library you are compatible with is libXcursor.so.1, and trying to dlopen libXcursor.so.0.2000 will lead to unpredictable crashes.
Any system that provides libXcursor.so but not libXcursor.so.1 is either a broken installation, or is also incompatible with your binaries.
Library search paths: Where should I look for the *.so files after all?
You shouldn't be trying to dlopen any of these libraries using their full path. Just call dlopen("libXcursor.so.1", RTLD_NOW | RTLD_GLOBAL); and the runtime loader will search for the library in the system-appropriate locations.

Finding the shared library name to use with dlload

In my open-source project Artha I use libnotify for showing passive desktop notifications to the user.
Instead of linking against libnotify at build time, a lookup is made at runtime for the shared object (.so) file via dlload; if it is available on the target machine, Artha exposes the notification feature in its GUI. On app start, a call to dlload is made with libnotify.so.1 as the filename param, and if it returns a non-null pointer, the feature is exposed.
A recurring problem with this model is that every time the library's version number is bumped, Artha's code needs to be updated; currently libnotify.so.4 is the latest to entail such an occurrence.
Is there a Linux system call (irrespective of the distro the app is running on) that can tell me whether a particular library's shared object is available at runtime? I know there exists the brute-force option of enumerating the library version from 1 to, say, 10, but I find that solution ugly and inelegant.
Also, if this can be addressed via autoconf, that solution is welcome too, i.e. at build time, based on the target machine, the generated configure.h should have the right .so name to pass to dlload.
P.S.: I think good distros follow the style of creating links to libnotify.so.x so that a programmer can just do dlload("libnotify.so", RTLD_LAZY) and the right version numbered .so is loaded; unfortunately not all distros follow this, including Ubuntu.
The answer is: you don't.
dlopen() is not designed to deal with things like that, and trying to load whichever soversion you find on the system just because it happens to have the symbols you need is not a good way to do it.
Different sonames have different ABIs, and different ABIs mean that you may be calling the exact same symbol name while it expects a different set (or different sizes) of parameters, which will cause crashes or misbehaviour that are extremely difficult to debug.
You should have a read on how shared object versions work and what an ABI is.
The libfoo.so link is there for the link editor (ld) and is usually installed with the -devel packages for that reason; it might also very well not be a link but rather a text file with a linker script, oftentimes deliberately, to prevent exactly what you're trying to do.
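For instance, on one Debian-style glibc system, libc.so is not a symlink at all but a short linker script roughly along these lines (reproduced from memory as an illustration; exact paths vary by distribution and architecture):

```
/* GNU ld script */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib/x86_64-linux-gnu/libc.so.6
        /usr/lib/x86_64-linux-gnu/libc_nonshared.a
        AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) )
```

Passing a file like that to dlopen() would fail outright, which is one more reason the versioned soname, not the bare .so name, is what belongs in runtime loading code.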
