Recovering the files that an ELF binary reads/writes - linux

I need to get the files that a binary uses.
I can view all the dependencies of an ELF binary in its .dynamic section, but can I also get the conf files of my binary?
For example, if a binary reads /etc/hosts, I want to see /etc/hosts in a section of my ELF file.
I do not see that in the documentation:
https://refspecs.linuxfoundation.org/LSB_1.1.0/gLSB/specialsections.html

I need to get the files that some binary executable uses.
You can't get (all of) them. A file path used by some executable could be computed at runtime (and that is very often the case, just think of the cat(1) program). Solving that problem (of reliably computing all the files used by a program) in general could be proved equivalent to the Halting problem.
However, in practice, the strings(1) utility might help you guess some of the files (statically) referred to by an executable.
You could also use strace(1) to understand (dynamically) what files are open(2)-ed during some particular execution.
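For a quick check along those lines, the strace route can even be driven from a program. A minimal sketch in C, assuming strace(1) is installed (the traced command, /bin/cat /etc/hosts, is just an illustration):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* strace writes to stderr, hence the 2>&1; -f follows child processes */
    FILE *p = popen("strace -f -e trace=open,openat /bin/cat /etc/hosts 2>&1", "r");
    if (!p)
        return 1;
    char line[1024];
    while (fgets(line, sizeof line, p))
        if (strstr(line, "open"))   /* matches both open( and openat( */
            fputs(line, stdout);
    pclose(p);
    return 0;
}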
Also read the documentation of your executable carefully. If it is free software, study its source code as well.

Related

Loading Linux libraries at runtime

I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
Here is my specific problem: I want to publish a Linux program in ELF binary form that should run on as many distributions as possible so my mandatory dependencies are as low as it gets: The only libraries required under any circumstances are libpthread, libX11, librt and libm (and glibc of course). I'm linking dynamically against these libraries when I build my program using gcc.
Optionally, however, my program should also support ALSA (sound interface), the Xcursor, Xfixes, and Xxf86vm extensions, as well as GTK. But these should only be used if they are available on the user's system; otherwise my program should still run, but with limited functionality. For example, if GTK isn't there, my program will fall back to terminal mode. Because my program should still be able to run without ALSA, Xcursor, Xfixes, etc., I cannot link dynamically against these libraries, because then the program won't start at all if one of them isn't there.
So I need to manually check if the libraries are present and then open them one by one using dlopen() and import the necessary function symbols using dlsym(). This, however, leads to all kinds of problems:
1) Library naming conventions:
Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000". These extensions seem to differ from system to system. So which one should I choose when calling dlopen()? Using a hardcoded name here seems like a very bad idea, because the names differ from system to system. The only workaround that comes to my mind is to scan the whole library path, look for filenames starting with the prefix "libXcursor.so", and then do some custom version matching. But how do I know that the versions found that way are really compatible?
2) Library search paths: Where should I look for the *.so files after all? This is also different from system to system. There are some default paths like /usr/lib and /lib but *.so files could also be in lots of other paths. So I'd have to open /etc/ld.so.conf and parse this to find out all library search paths. That's not a trivial thing to do because /etc/ld.so.conf files can also use some kind of include directive which means that I have to parse even more .conf files, do some checks against possible infinite loops caused by circular include directives etc. Is there really no easier way to find out the search paths for *.so?
So, my actual question is this: isn't there a more convenient, less hackish way of achieving what I want to do? Is it really so complicated to create a Linux program that has some optional dependencies like ALSA, GTK, libXcursor... but also works without them? Is there some kind of standard for doing what I want to do? Or am I doomed to do it the hackish way?
Thanks for your comments/solutions!
I think a major design flaw in Linux is the shared object hell when it comes to distributing programs in binary instead of source code form.
This isn't a design flaw as far as creators of the system are concerned; it's an advantage -- it encourages you to distribute programs in source form. Oh, you wanted to sell your software? Sorry, that's not the use case Linux is optimized for.
Library naming conventions: Shared objects often aren't simply called "libXcursor.so" but have some kind of version extension like "libXcursor.so.1" or even really funny things like "libXcursor.so.0.2000".
Yes, this is called external library versioning. The consequence that matters here: if you compiled your binaries using headers on a system that would normally give you libXcursor.so.1 as a runtime reference, then the only shared library you are compatible with is libXcursor.so.1, and trying to dlopen libXcursor.so.0.2000 will lead to unpredictable crashes.
Any system that provides libXcursor.so but not libXcursor.so.1 is either a broken installation, or is also incompatible with your binaries.
Library search paths: Where should I look for the *.so files after all?
You shouldn't be trying to dlopen any of these libraries using their full path. Just call dlopen("libXcursor.so.1", RTLD_NOW | RTLD_GLOBAL); and the runtime loader will search for the library in the system-appropriate locations. (Note the mode: glibc rejects a dlopen mode that includes neither RTLD_NOW nor RTLD_LAZY.)
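A minimal sketch of that optional-dependency pattern, with deliberately simplified error handling (link with -ldl; XcursorSupportsARGB is a real libXcursor entry point, chosen only as an example symbol):

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *h = dlopen("libXcursor.so.1", RTLD_NOW | RTLD_GLOBAL);
    if (!h) {
        /* library absent: degrade gracefully instead of failing to start */
        fprintf(stderr, "Xcursor unavailable (%s), running without it\n", dlerror());
        return 0;
    }
    /* resolve a symbol; we don't call it here since that needs an X Display */
    void (*fn)(void) = NULL;
    *(void **)&fn = dlsym(h, "XcursorSupportsARGB");
    printf("Xcursor loaded, symbol %s\n", fn ? "resolved" : "missing");
    dlclose(h);
    return 0;
}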

How to get a list of paths in /etc/ld.so.conf on Linux

What is the most portable and robust way to get the list of paths configured by /etc/ld.so.conf and the files it includes? Parsing the file manually does not seem like a good idea; the format is likely to change in future revisions.
To allow better understanding of the question, I will give you specific details below. Note that, despite these details, this is a general programming question, applicable to other situations.
There is a program called LuaRocks. It is a package manager for the Lua programming language (somewhat like Ruby gems or Python eggs). LuaRocks packages are called "rocks".
As a convenience feature, LuaRocks allows a rock author to specify a list of external dependencies for a rock, formulated as a list of C header files and / or dynamic library files. (.so on Linux.) If the specified file does not exist, the rock can't be installed.
Currently, on Linux, LuaRocks by default checks .so file existence by searching for the file in two hardcoded paths, /usr/lib and /usr/local/lib.
I believe that this is incorrect behaviour, and that it is broken by recent changes in Ubuntu and other Debian-based distributions.
Update: the paths are not hardcoded per se, but are user-configurable in the config file. Still, IMO, not the best solution.
Instead (as I understand it), LuaRocks should look the file up in the paths specified by /etc/ld.so.conf and the files it includes.
(Now please re-read the question above ;-) )
You shouldn't need to parse /etc/ld.so.conf or any of the config files - if you run 'ldconfig', it will scan the configured directories and generate a cache file.
Then, when you subsequently attempt a dlopen, it'll automatically find the files by going through the cached library directories. The same goes for compiling with -lSomeLib: you shouldn't need to specify -L/my/other/path if it's configured in ld.so.conf(.d).
autoconf accomplishes this by attempting to compile a test program that links to the shared library, but that's just a functional wrapper around the dlopen() call.
So, while other methods may not necessarily be 'wrong', at the root of it attempting to link to the library or doing a dlopen() are the 'most right' ways of doing it.
Consider this: if you attempt to link to a library in a directory that ISN'T cached in /etc/ld.so.cache, when you try to run the program it will fail because it won't be able to dlopen() the library!
Hence, any 'good' shared library will be in /etc/ld.so.cache and be linkable/dlopen()able; this means that gcc can use it to link and that the user-generated library or executable will be able to open it when it executes.
You can circumvent this by expressly setting the environment variable LD_LIBRARY_PATH or LD_PRELOAD, but each of these has its own caveats and should be avoided if possible for 'standard' use.
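If you do want to query the cache from a program instead of parsing the configs, one option is to ask ldconfig itself: ldconfig -p prints the cache contents. A minimal sketch (assumes ldconfig is on PATH; it often lives in /sbin):

#include <stdio.h>
#include <string.h>

/* returns 1 if a line of the cache listing mentions name, 0 if not, -1 on error */
static int lib_in_cache(const char *name)
{
    FILE *p = popen("ldconfig -p", "r");
    if (!p)
        return -1;
    char line[1024];
    int found = 0;
    while (fgets(line, sizeof line, p))
        if (strstr(line, name))
            found = 1;
    pclose(p);
    return found;
}

int main(void)
{
    printf("libz: %s\n", lib_in_cache("libz.so") > 0 ? "in cache" : "not found");
    return 0;
}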
A good write-up on writing shared libraries covers some of these issues and is a good read for anyone working on programmatic consumption of other shared libraries: Ulrich Drepper's How to write shared libraries.
According to the FHS, the following are valid locations for dynamic libraries:
/lib*/
/opt/*/lib*/
/usr/lib*/
/usr/local/lib*/
(And most likely ~/lib*/ as well.)
All entries in my /etc/ld.so.conf.d/* conform to this. Some entries reference subdirectories below the FHS dirs, which probably means that you can use the libraries in there without path information.
Now, I don't know enough about LuaRocks. If you're limited to Lua-path-style globs (only ?), you cannot match these and have to parse the configs. Otherwise, you could just try to find the libraries anywhere in these directories.
This would break on non-FHS-conforming systems (there the only option is to parse the config), and if a directory is not included in the config, the installer might see libraries that the linker cannot find.
These two risks seem acceptable to me, therefore I'd simply ignore the config and look in these dirs.
(Another possibility could be trying to link the library, this should automagically use the right path. However, this is platform-specific and maybe dangerous.)
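Putting the directory-scanning idea into code: a minimal sketch using glob(3), where libfoo is a placeholder for whatever library the rock declares:

#include <glob.h>
#include <stdio.h>

int main(void)
{
    const char *patterns[] = {
        "/lib*/libfoo.so*",
        "/usr/lib*/libfoo.so*",
        "/usr/local/lib*/libfoo.so*",
        "/opt/*/lib*/libfoo.so*",
    };
    glob_t g = {0};
    int flags = 0;
    for (size_t i = 0; i < sizeof patterns / sizeof *patterns; i++) {
        glob(patterns[i], flags, NULL, &g);
        flags |= GLOB_APPEND;   /* accumulate matches across patterns */
    }
    for (size_t i = 0; i < g.gl_pathc; i++)
        puts(g.gl_pathv[i]);    /* each match is a candidate library path */
    globfree(&g);
    return 0;
}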

Interpreted standard library

It's common for a programming language to come with a standard library implemented at least partly in the language itself.
In the case of an interpreted language, the obvious implementation is to read the library source files when the interpreter starts up, but this runs into the messy but persistent problem of making sure the interpreter knows where to find those files even when both are moved around. It would be cleaner if they could be embedded in the interpreter itself, so there is just a single executable.
I can see a simple way to do this by just translating the library source files to C literal strings, but I'm curious as to whether there are any pitfalls I'm overlooking or refinements to the method.
So my question is: which existing interpreted languages attach library source files, written in the language itself, to the interpreter?
Bytecode virtual machines often provide an answer to this: store the bytecode in files (*.pyc, *.rbc) and load the bytecoded versions of the libraries using a simpler mechanism.
Smalltalks do this by dumping the standard heap into a separate file called an "image".
As for single-file distribution, append the library file(s) to the end of the executable file and include logic for the interpreter to read its own binary and locate the embedded program data, or alternatively build the interpreter with the program data statically included.
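A minimal sketch of the append-and-read-back variant. The trailer layout (payload, then an 8-byte native-endian length, then the 8-byte magic "STDLIB!\n") is an assumption made up for this example, and /proc/self/exe makes it Linux-specific:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* returns the embedded payload, or NULL if none was appended */
char *load_embedded_stdlib(size_t *len_out)
{
    FILE *f = fopen("/proc/self/exe", "rb");
    if (!f)
        return NULL;
    uint64_t len;
    char magic[8];
    /* file layout: <executable> <payload> <8-byte len> <8-byte magic> */
    if (fseek(f, -16L, SEEK_END) != 0 ||
        fread(&len, 8, 1, f) != 1 ||
        fread(magic, 8, 1, f) != 1 ||
        memcmp(magic, "STDLIB!\n", 8) != 0) {
        fclose(f);
        return NULL;
    }
    char *buf = malloc(len + 1);
    if (!buf || fseek(f, -(long)(16 + len), SEEK_END) != 0 ||
        fread(buf, 1, len, f) != len) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    buf[len] = '\0';
    *len_out = len;
    return buf;
}

int main(void)
{
    size_t n = 0;
    char *src = load_embedded_stdlib(&n);
    if (src)
        printf("embedded stdlib: %zu bytes\n", n);
    else
        puts("no embedded payload");
    free(src);
    return 0;
}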

Getting a full path from addr2line

I'm trying to automate some debugging tasks. In certain cases, I print the value of $ra [this is a MIPS machine] and parts of the stack as hex addresses. During debugging, I use addr2line to convert them into file:line pairs.
I'd like to automate this procedure.
The problem is that addr2line returns a filename equivalent to the value of __FILE__ at compilation time; i.e., the name of the file as passed to the compiler. This is usually foo.c, sometimes src/foo.c. As my project has several hundred directories in total, this may not be enough to uniquely identify the file (there may be 1/foo.c, 2/foo.c, etc.). Even if it were deterministic, it seems rather inefficient to run find from my script for each argument [I suppose I could build a hash table and cache the results, but I'd like to keep this a straightforward bash script].
GDB seems to get the right file. If I look at the actual binary with debugging symbols, I can also see that right after the filename there appears to be the full path it is relative to [i.e., if __FILE__ is src/foo.c, and the file is really /home/me/projects/something/comp1/src/foo.c, I will see /home/me/projects/something/comp1 in the binary]. How can I get this programmatically?
Thanks.
This is very surprising behavior. I'm unable to reproduce it in:
Linux with gcc 4.1.2 and addr2line 2.17.50.0.6
Cygwin with gcc 4.3.4 and gcc 3.4.4 and addr2line 2.20.51.20100410
addr2line should rely on the debug information stored in the executable. And the debug information should contain absolute paths (regardless of what source path was given to the compiler) in order to avoid any ambiguities when using a debugger. Everywhere I try it, addr2line always shows an absolute path.
Assuming you are using make for your build system, one option, albeit a probably painful one, would be to change your makefiles to use a non-recursive strategy (something you really ought to be doing anyway). With such a system, only a single instance of make is running, from a single working directory (usually the top-level of your source tree). Therefore all invocations of the compiler specify the full path to the source file (relative to the root of the source tree). This would solve your problem if, in fact, addr2line always shows the filenames as they were specified to the compiler. Not the best solution, but one that would work. And as a side benefit, you'd get all the advantages of non-recursive make.
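As for the automation itself, the addr2line pipeline can be driven from C as well as from bash; a minimal sketch, with the binary path and the address as placeholders:

#include <stdio.h>

int main(void)
{
    /* -e names the binary to inspect; -f also prints the function name */
    FILE *p = popen("addr2line -f -e ./myprog 0x400a2c", "r");
    if (!p)
        return 1;
    char line[512];
    while (fgets(line, sizeof line, p))
        fputs(line, stdout);    /* function name, then file:line */
    return pclose(p);
}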

linux/gcc: ldd functionality from inside a C/C++ program

Is there a simple and efficient way to know whether a given dynamically linked ELF is missing a required .so, all from inside a C/C++ program?
I need a program with somewhat similar functionality as ldd, without trying to execute the ELF to find out the (met/unmet) dependencies in the system. Perhaps asking the ld-linux.so utility via some library? (I'm a newbie in this part of linux =)
NOTE: reading the source code of ldd was not very helpful for my intentions: it seems that ldd is in fact forking another process and executing the program.
If it's not possible to know whether a program has unmet dependencies without executing it, is there some way to, at least, quickly list the .so's required for that ELF, all from within my program?
Thanks in advance =)
As per ld.so(8), setting the environment variable LD_TRACE_LOADED_OBJECTS to a non-empty string will give ldd-like results (instead of executing the binary or library normally).
setenv("LD_TRACE_LOADED_OBJECTS", "1", 1);
FILE *ldd = popen("/lib/libz.so");
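A possible continuation of that snippet, reading the ldd-style report back (real code would also watch for "not found" markers in the output):

char line[512];
while (fgets(line, sizeof line, ldd))
    fputs(line, stdout);    /* one "libfoo.so.N => /path" line per dependency */
pclose(ldd);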
Have you tried the dlopen function? You can use it to load a dynamic library (or, in your case, to check whether a library can be loaded).
Getting the list of needed libraries is more difficult; take a look at the handle_dynamic function in the readelf source.
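For listing the required .so names without executing anything, the DT_NEEDED entries of the .dynamic section are what you want (they are what readelf -d prints). A minimal sketch for 64-bit, native-endian ELF files, with almost no validation:

#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <elf-file>\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f)
        return 1;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    rewind(f);
    char *buf = malloc(size);
    if (!buf || fread(buf, 1, size, f) != (size_t)size)
        return 1;
    fclose(f);

    Elf64_Ehdr *eh = (Elf64_Ehdr *)buf;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS64) {
        fprintf(stderr, "not a 64-bit ELF\n");
        return 1;
    }
    Elf64_Shdr *sh = (Elf64_Shdr *)(buf + eh->e_shoff);
    for (int i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_DYNAMIC)
            continue;
        /* sh_link of .dynamic is the index of its string table (.dynstr) */
        char *strtab = buf + sh[sh[i].sh_link].sh_offset;
        for (Elf64_Dyn *d = (Elf64_Dyn *)(buf + sh[i].sh_offset);
             d->d_tag != DT_NULL; d++)
            if (d->d_tag == DT_NEEDED)
                printf("NEEDED %s\n", strtab + d->d_un.d_val);
    }
    return 0;
}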
What about using ptrace() to trace all open() calls, to find everything the program depends on? (The output includes all opened files, not only libraries.) Filtering the output by the "/lib" prefix in the file name might help.

Resources