Call dlopen with file descriptor? - linux

I want to open a shared object as a data file and perform a verification check on it. The verification is a signature check, and I sign the shared object. If the verification is successful, I would like to load the currently opened shared object as a proper shared object.
First question: is it possible to call dlopen and load the shared object as a data file during the signature check so that code is not executed? According to the man pages, I don't believe so since I don't see a flag similar to RTLD_DATA.
Since I have the shared object open as a data file, I have the descriptor available. Upon successful verification, I would like to pass the descriptor to dlopen so the dynamic loader loads the shared object properly. I don't want to close the file then re-open it via dlopen because it could introduce a race condition (where the file verified is not the same file opened and executed).
Second question: how does one pass an open file to dlopen using a file descriptor so that dlopen performs customary initialization of the shared object?

On Linux, you probably could dlopen some /proc/self/fd/15 file (for file descriptor 15).
RTLD_DATA does not seems to exist. So if you want it, you have to patch your own dynamic loader. Perhaps doing that within MUSL Libc could be less hard. I still don't understand why you need it.
You have to trust the dlopen-ed plugin somehow (and it will run its constructor functions at dlopen time).
You could analyze the shared object plugin before dlopen-ing it by using some ELF parsing library, perhaps libelf or libbfd (from binutils); but I still don't understand what kind of analysis you want to make (and you really should explain that; in particular what happens if the plugin is indirectly linked to some bad behaving software). In other words you should explain more about your verification step. Notice that a shared object could overwrite itself....
Alternatively, don't use dlopen and just mmap your file (you'll need to parse some ELF and process relocations; see elf(5) and Levine's Linkers and Loaders for details, and look into the source code of your ld.so, e.g. in GNU glibc).
Perhaps using some JIT generation techniques might be useful (you would JIT generate code from some validated data), e.g. with GCCJIT, LLVM, or libjit or asmjit (or even LuaJit or SBCL) etc...
And if you have two file descriptors to the same shared object you probably won't have any race conditions.
An option is to build your ad-hoc static C or C++ source code analyzer (perhaps using some GCC plugin provided by you). That plugin might (with months, or perhaps years, of development efforts) check some property of the user C++ code. Beware of Rice's theorem (limiting the properties of every static source code analyzer). Then your program might (like my manydl.c does, or like RefPerSys will soon do, in mid 2020, or like the obsolete GCC MELT did a few years ago) take as input the user C++ code, run some static analysis on that C++ code (e.g. using your GCC plugin), compile that C++ code into a temporary shared object, and dlopen that shared object. So read Drepper's paper How to Write Shared Libraries.

Related

Passively inject shared object to a specific executable

I'm interested on injecting my own shared object to any future to run instance of a specific executable.
This executable gives me hard time since it is executed a lot and quite frequently which makes me reluctant to inject my code actively (using ptrace()).
The best option I thought of is to use some kind of ELF patcher (maybe https://github.com/NixOS/patchelf ?) in order to make the executable depends on my code. This option discourages me since I'm afraid any bug in the code will lead to an executable corruption.
Any other suggestion?
Rules:
Root permissions are granted
I can't load a kernel module
The injection must be inline - meaning, before the executable entry point was called (main())

Link to the same DLL twice - Implicit and explicit at the same time

The project on which I'm working, loads same library twice:
with LoadLibrary
statically loads the DLL with a lib file and " __declspec(dllimport/dllexport)".
What is happening in this case? Are these 2 "loadings" use same heap or share something else. E.g. is it same or similar as calling LoadLibrary twice?
My general problem is that I'm having stack corruption problems, when calling dll methods from exe via the second approach. And I'm wondering if the problem could be, because of the first loading? All projects use same RT, alignment and so on.
By "statically loads the DLL with a lib file and _declspec(dllimport/dllexport)" I assume you meant that that you compiled your executable with the .lib as a dependency, and at the runtime the .dll is automatically loaded by the exe (at the beginning). Here's a fragment from FreeLibrary (surprisingly) MSDN page:
The system maintains a per-process reference count for each loaded module. A module that was loaded at process initialization due to load-time dynamic linking has a reference count of one. The reference count for a module is incremented each time the module is loaded by a call to LoadLibrary. The reference count is also incremented by a call to LoadLibraryEx unless the module is being loaded for the first time and is being loaded as a data or image file.
So in other words, the .dll gets loaded at application startup (because you linked against it) and LoadLibrary just increments its ref count (). For more info you could also check DllMain, or this dll guide.
There's absolutely no reason to use both approaches for the same .dll in the same application.
The 2nd approach is the preferred one, if the .dll comes with a .h file (that holds the function definitions exported by the library, needed at compile time) and a .lib file (that instructs the liker to add references from the .dll file into the executable).
The 1st approach on the other hand is the only way if you only have the .dll file and you somehow have the signatures of the functions it exports. In that case you must define in your app pointers to those functions and initialize them using GetProcAddress. There are cases when this approach is preferred, for example when the functionality in the .dll is needed only in a corner case of the program flow, in that case there's no point to link against the .lib file and load the .dll at app startup if let's say in 99% of the cases it won't be required. Also, a major advantage of this approach: if the .dll is somehow deleted then only the functionality related to it won't work (LoadLibrary will fail), while using the other approach, the application won't start.
Now, without details i can't get to the bottom of this specific problem you'r running into. You say that you call a function "normally" (from its definition in the .h file), it fails while if you call it (with the same arguments) using a function pointer it succeeds? What's the stack error message?
Note: From my experience a typical reason for stack corruptions in scenarios like this one is calling convention mismatch between the caller and the callee (stdcall vs cdecl or viceversa). Also mixing Debug and Release could introduce problems.

.lib files and decompiling

I have a .exe which is compiled from a combination of .for (fortran), and .c source files.
It does not run on anything later than Win98, due to an error with the graphics server:
“access violation error in User 32.dll at Ox7e4467a9”
Unless there is some other way around the above error (?), I assume I have to recompile the .exe from source using a more modern graphics server. I have all the files to do this bar one .lib file!
Is it possible to pull any info on the missing lib file out of the current .exe I have?
It is possible to dis-assemble the .exe, but I don't think I gain much from this?
You probably can't "cut" the lib file from an executable. Even if you could somehow get the code from it, standard compilers and linker wouldn't know how to link against it, since it won't have the linking information needed (they are not included in the result binary).
However, if your problem is that your program works on Win98, but doesn't run on NT-based systems (XP, Vista, Win7), I think it would be easier to find out, what incompatibility is there that crashes the program. You mentioned that the access violation occurs in user32.dll. Start your program inside a debugger, take a look at which function the crash occurs. Make sure you have your PDB symbols loaded (so you can see names of internal non-public functions). Trace down which Win32 API is called and what are its parameters. Try to figure out, what should be at the memory that cannot be accessed.
Also without any other information, it's impossible to help you with that.
Once integrated into an image file (your exe), a library (your .lib) which is statically bound to an application (which is done by your linker) cannot be separated, differentiated from your own code, and thus, one cannot retrieve the code from a lib by decompiling the exe.

are ".o" files "loadable"?

I have been reading John R. Levine's Linkers and Loaders and I read that the properties of an object file will include one or more of the following.
file should be linkable
file should be loadable
file should be executable
Now, considering this example:
#include<stdio.h>
int main() {
printf("testing\n");
return 0;
}
Which I would compile and link with:
$ gcc -c t.c
$ gcc -o t t.o
I tried inspecting t.o using objdump and its type shows up as REL. What all properties does t.o satisfy? I believe that its linkable, non-executable. I would have believed that its not loadable(unless you create an .so file from the .o file); however the type REL means that its supposed to be relocated, and relocation would occur only in the context of loading, so I'm having a confusion here.
My doubts summarized :-
Are ".o" files loadable?
Reading resources regarding the sections present in a ".o", ".so" file - differences etc?
An object file (i.e., a file with the .o extension) is not loadable. This is because it lacks critical information about how to resolve all the symbols within it: in this case, the println symbol in particular would need additional information. (C compilers do not bind library identities into the object files they create, which is occasionally even useful.)
When you link the object file into a shared library (.so), you are adding that binding. Typically, you're also grouping a number of object files together and resolving references between them (plus a few more esoteric things). That then makes the result possible to load, since the loader can then just do resolution of references and loading of dependencies that it doesn't already know about.
Going from there to executable is typically just a matter of adding on the OS-defined program bootstrap. This is a small piece of code that the OS will start the program running by calling, and it typically works by loading the rest of the program and dependencies and then calling main() with information about the arguments. (It's also responsible for exiting cleanly if main returns.)
Just to set the context this link states somethings similar (emphasis for readability only);
A file may be linkable, used as input by a link editor or linking
loader. It my be executable, capable of being loaded into
memory and run as a program, loadable, capable of being loaded
into memory as a library along with a program, or any combination of
the three.
A .o file is a linker object file, which is according to this definition not executable and definitely linkable. Loadable is a tougher call, but since .o files are not loadable without some definitely not cross platform trickery, I'd say the spirit is that it's not loadable.

How tcamalloc gets linked to the main program

I want to know how malloc gets linked to the main program.Basically I have a program which uses several static and dynamic libraries.I am including all these in my makefile using option "-llibName1 -llibName2".
The documentation of TCmalloc says that we can override our malloc simply by calling "LD_PRELOAD=/usr/lib64/libtcmalloc.so".I am not able to understand how tcamlloc gets called to the all these static and dynamic libraries.Also how does tcmalloc also gets linked to STL libraries and new/delete operations of C++?
can anyone please give any insights on this.
"LD_PRELOAD=/usr/lib64/libtcmalloc.so" directs the loader to use libtcmalloc.so before any other shared library when resolving symbols external to your program, and because libtcmalloc.so defines a symbol named "malloc", that is the verison your program will use.
If you omit the LD_PRELOAD line, glibc.so (or whatever C library you have on your system) will be the first shared library to define a symbol named "malloc".
Note also that if you link against a static library which defines a symbol named "malloc" (and uses proper arguments, etc), or another shared library is loaded that defines a symbol named "malloc", your program will attempt to use that version of malloc.
That's the general idea anyway; the actual goings-on is quite interesting and I will have to direct to you http://en.wikipedia.org/wiki/Dynamic_linker as a starting point for more information.

Resources