Getting a full path from addr2line - linux

I'm trying to automate some debugging tasks. In certain cases, I print the value of $ra [this is a MIPS machine] and parts of the stack as hex addresses. During debugging, I use addr2line to convert them into file:line pairs.
I'd like to automate this procedure.
The problem is that addr2line returns a filename that equivelent to the value of __FILE__ at compilation time; i.e., the name of the file as passed to the compiler. This is usually foo.c, sometimes src/foo.c. As my project has several hundred directories in total, this may not be enough to uniquely identify the file (there may be 1/foo.c, 2/foo.c, etc). Even if it was deterministic, it seems rather inefficient to start running find in my screen for each argument [I suppose I could build a hash table and save them, but I'd like to keep this as a straightforward bash script]
GDB seems to get the right file. If I look at the actual source file with debugging symbols, I can also see that right after the filename there appears to be the full path to the __FILE__ [i.e., if __FILE__ is src/foo.c, and it's really in /home/me/projects/something/comp1/src/foo.c, I will see /home/me/projects/something/comp1 in the file. How can I get this progmatically?
Thanks.

This is very surprising behavior. I'm unable to reproduce it in:
Linux with gcc 4.1.2 and addr2line 2.17.50.0.6
Cygwin with gcc 4.3.4 and gcc 3.4.4 and addr2line 2.20.51.20100410
addr2line should rely on the debug information stored in the executable. And the debug information should contain absolute paths (regardless of what source path was given to the compiler) in order to avoid any ambiguities when using a debugger. Everywhere I try it, addr2line always shows an absolute path.
Assuming you are using make for your build system, one option, albeit a probably painful one, would be to change your makefiles to use a non-recursive strategy (something you really ought to be doing anyway). With such a system, only a single instance of make is running, from a single working directory (usually the top-level of your source tree). Therefore all invocations of the compiler specify the full path to the source file (relative to the root of the source tree). This would solve your problem if, in fact, addr2line always shows the filenames as they were specified to the compiler. Not the best solution, but one that would work. And as a side benefit, you'd get all the advantages of non-recursive make.

Related

Linux elf binary recuperate file that binary read/writes

I need to get files that a binary uses.
I can view all dependency of an ELF binary in the .interp section, but can I get conf files of my binary?
For example if a binary reads /etc/host, I want to see /etc/host in a section of my ELF file.
I do not see that in the documentation:
https://refspecs.linuxfoundation.org/LSB_1.1.0/gLSB/specialsections.html
I need to get files that some binary executable uses.
You can't get (all of) them. A file path used by some executable could be computed at runtime (and that is very often the case, just think of the cat(1) program). Solving that problem (of reliably computing all the files used by a program) in general could be proved equivalent to the Halting problem.
However, in practice, the strings(1) utility might help you guess some of the files (statically) referred by an executable.
You could also use strace(1) to understand (dynamically) what files are open(2)-ed during some particular execution.
Read also carefully the documentation of your executable. If it is a free software, study also its source code.

Using GNU Standard Directory Variables inside executable

Often one needs the location of one of the standard GNU directories inside the executable. Unfortunately GNU autoconf does not provide a standard way to do this but suggests several work around, each having different disadvantages, a common way to access the installed location is this to add preprocess define for the location in CPPFLAGS:
AM_CPPFLAGS = -DDATADIR='"$(datadir)"'
However, the GNU Autoconf manual's section for defining directories contains the following sentence:
Note that all the previous solutions hard wire the absolute name of these directories in the executables, which is not a good property. You may try to compute the names relative to prefix, and try to find prefix at runtime, this way your package is relocatable.
Is there a library or any standard way to compute the GNU directories inside an executable as suggested in the quoted paragraph? Would that have other disadvantages compared to the preprocessor define mentioned above?
I think the docs are rather clear about this: The standard way is to not make any assumptions about the absolute path, and use relative paths instead. Especially you should not make any assumptions about ${prefix}
So if your application needs to access shared data, access it via ../share/foo/foodata.txt rather than using /usr/local/share/foo/foodata.txt; this way you can easily re-locate your application.
Afaik, there is no external library that computes the standard paths for you, based on your calling binary.
This is probably for two reasons:
if the binary indeed uses the standard paths, then it's trivial to calculate those paths yourself (using relative paths). what would a library do better?
if the binary does not use the standard paths (e.g. because the builder used something like the following (admittedly hypothetical) example), then the task of resolving these paths is virtually impossible; so a library won't help you either
ex:
./configure --sbindir=/home/me/sbin --bindir=/opt/foo/bin
make pkglibdir=/usr/lib/goo/
make install libdir=/usr/local/foo/lib/
A helper-library (or your application) might record all those paths into some auxiliary file (for additional lookups if the standard-paths fail), but I think the biggest problem is that there is no defined place where to store that file
libdir or datadir are obviously not good (as the data should help resolve these paths, so cannot rely on them)
putting the data into the same directory as the application binary breaks the assumption of bindir only containing executables.
putting the data into the application binary might require that binary to be modified during make install, which sounds very dirty as well.

Verifying two different build architectures (one a re-write of the other) are functionally equivalent

I'm re-writing a build that produces a number of things (shared/static libraries, jars, executables, etc). The question came up whether there's a way to verify that the results are functionally equivalent without doing a full top-to-bottom test of the resulting software.
However, that is proving to be more difficult to do than I anticipated.
As an example, I expected that the md5 of two objects produced from the same source (sun studio C++ compiler) and command-line parameters would have the same md5 hash, but that isn't the case. I can build the file, rename it, build again, and they have different hashes.
With that said ... is there a way do a quick check to verify that two files produced from separate build architectures of the same source tree (eg, two shared objects) are functionally equivalent?
edit I am sorry, I neglected to mention this is for a debug build ... when debugging flags aren't used the binaries are identical, but they've been using debugging flags by default for so many years their stuff breaks when you remove the debugging flags (part of the reason I'm re-writing the build is to take that particular 'feature' out of the build so we can get some proper testing going)
Windows DLLs have a link timestamp (TimeDateStamp) as part of PE image.
Looking at linker options, I don't see an option to suppress that. So re-linking a DLL (or an EXE) will always produce a different binary.
You could write a tool to zero out these timestamps (always at a fixed offset from file start), and compare MD5s afterwards. But you'll likely discover lots of other differences as well. In particular, any program that uses __DATE__ or __TIME__ builtins will give you trouble.
We've had to work quite hard to achieve bit-identical rebuilds (using GNU toolchain). It's possible (at least for open-source tools, on Linux), but not easy (as you've discovered).
I forgot about this question; I'm revisiting so I can give the answer I came up with.
objcopy can be used to produce a new binary file in different formats. It's been a few years since I worked on this, so the specifics escape me, but here's what I recall:
objcopy can strip various things out (debug info, symbol information, etc), but even after stripping stuff out I was still seeing different hashes between objects.
In the end I found I could convert it from ELF to other formats. I ended up dumping it to another format (I think I chose SREC) that consistently provided the same MD5 for objects built at different times with identical source/flags.
I'm betting I could have done this a better way with objcopy (or perhaps another binutils tool), but it was good enough to satisfy our concerns.

Gurus say that LD_LIBRARY_PATH is bad - what's the alternative?

I read some articles about problems in using the LD_LIBRARY_PATH, even as a part of a wrapper script:
http://linuxmafia.com/faq/Admin/ld-lib-path.html
http://blogs.oracle.com/ali/entry/avoiding_ld_library_path_the
In this case - what are the recommended alternatives?
Thanks.
You can try adding:
-Wl,-rpath,path/to/lib
to the linker options. This will save you the need to worry about the LD_LIBRARY_PATH environment variable, and you can decide at compile time to point to a specific library.
For a path relative to the binary, you can use $ORIGIN, eg
-Wl,-rpath,'$ORIGIN/../lib'
($ORIGIN may not work when statically linking to shared libraries with ld, use -Wl,--allow-shlib-undefined to fix this)
I've always set LD_LIBRARY_PATH, and I've never had a problem.
To quote you first link:
When should I set LD_LIBRARY_PATH? The short answer is never. Why? Some users seem to set this environment variable because of bad advice from other users or badly linked code that they do not know how to fix.
That is NOT what I call a definitive problem statement. In fact it brings to mind I don't like it. [YouTube, but SFW].
That second blog entry (http://blogs.oracle.com/ali/entry/avoiding_ld_library_path_the) is much more forthcoming on the nature of the problem... which appears to be, in a nutshell, library version clashes ThisProgram requires Foo1.2, but ThatProgram requires Foo1.3, hence you can't run both programs (easily). Note that most of these problems are negated by a simple wrapper script which sets the LD_LIBRARY_PATH for just the executing shell, which is (almost always) a separate child process of interactive shell.
Note also that the alternatives are pretty well explained in the post.
I'm just confused as to why you would post a question containing links to articles which apparently answer your question... Do you have a specific question which wasn't covered (clearly enough) in either of those articles?
the answer is in the first article you quoted.
In UNIX the location of a library can be specified with the -L dir option to the compiler.
....
As an alternative to using the -L and -R options, you can set the environment variable LD_RUN_PATH before compiling the code.
I find that the existing answers to not actually answer the question in a straightforward way:
LD_RUN_PATH is used by the linker (see ld) at the time you link your software. It is used only if you have no -rpath ... on the command line (-Wl,rpath ... on the gcc command line). The path(s) defined in that variable are added to the RPATH entry in your ELF binary file. (You can see that RPATH using objdump -x binary-filename—in most cases it is not there though! It appears in my development binaries, but once the final version gets installed RPATH gets removed.)
LD_LIBRARY_PATH is used at runtime, when you want to specify a directory that the dynamic linker (see ldd) needs to search for libraries. Specifying the wrong path could lead to loading the wrong libraries. This is used in addition to the RPATH value defined in your binary (as in 1.)
LD_RUN_PATH really causes no security threat unless you are a programmer and don't know how to use it. As I am using CMake to build my software, the -rpath is used all the time. That way I do not have to install everything to run my software. ldd can find all the .so files automatically. (the automake environment was supposed to do that too, but it was not very good at it, in comparison.)
LD_LIBRARY_PATH is a runtime variable and thus you have to be careful with it. That being said, many shared object would be really difficult to deal with if we did not have that special feature. Whether it is a security threat, probably not. If a hacker takes a hold of your computer, LD_LIBRARY_PATH is accessible to that hacker anyway. What could happen is that you use the wrong path(s) in that variable, your binary may not load, but if it loads you may end up with a crashing binary or at least a binary that does not work quite right. One concern is that over time you get new versions of the library and you are likely to forget to remove the LD_LIBRARY_PATH which means you may be using an unsecure version of the library.
The one other possibility for security is if the hacker installs a fake library of the same name as what the binary is searching, library that includes all the same functions, but that has some of those functions replaced with sneaky code. He can get that library loaded by changing the LD_LIBRARY_PATH variable. Then it will eventually get executed by the hacker. Again, if the hacker can add such a library to your system, he's already in and probably does not need to do anything like that in the first place (since he's in he has full control of your system anyway.) Because in reality, if the hacker can only place the library in his account he won't do anything much (unless your Unix box is not safe overall...) If the hacker can replace one of your /usr/lib/... libraries, he already has full access to your system. So LD_LIBRARY_PATH is not needed.

Are there good reasons not to exploit '#!/bin/make -f' at the top of a makefile to give an executable makefile?

Mostly for my amusement, I created a makefile in my $HOME/bin directory called rebuild.mk, and made it executable, and the first lines of the file read:
#!/bin/make -f
#
# Comments on what the makefile is for
...
all: ${SCRIPTS} ${LINKS} ...
...
I can now type:
rebuild.mk
and this causes make to execute.
What are the reasons for not exploiting this on a permanent basis, other than this:
The makefile is tied to a single directory, so it really isn't appropriate in my main bin directory.
Has anyone ever seen the trick exploited before?
Collecting some comments, and providing a bit more background information.
Norman Ramsey reports that this technique is used in Debian; that is interesting to know. Thank you.
I agree that typing 'make' is more idiomatic.
However, the scenario (previously unstated) is that my $HOME/bin directory already has a cross-platform main makefile in it that is the primary maintenance tool for the 500+ commands in the directory.
However, on one particular machine (only), I wanted to add a makefile for building a special set of tools. So, those tools get a special makefile, which I called rebuild.mk for this question (it has another name on my machine).
I do get to save typing 'make -f rebuild.mk' by using 'rebuild.mk' instead.
Fixing the position of the make utility is problematic across platforms.
The #!/usr/bin/env make -f technique is likely to work, though I believe the official rules of engagement are that the line must be less than 32 characters and may only have one argument to the command.
#dF comments that the technique might prevent you passing arguments to make. That is not a problem on my Solaris machine, at any rate. The three different versions of 'make' I tested (Sun, GNU, mine) all got the extra command line arguments that I type, including options ('-u' on my home-brew version) and targets 'someprogram' and macros CC='cc' WFLAGS=-v (to use a different compiler and cancel the GCC warning flags which the Sun compiler does not understand).
I would not advocate this as a general technique.
As stated, it was mostly for my amusement. I may keep it for this particular job; it is most unlikely that I'd use it in distributed work. And if I did, I'd supply and apply a 'fixin' script to fix the pathname of the interpreter; indeed, I did that already on my machine. That script is a relic from the first edition of the Camel book ('Programming Perl' by Larry Wall).
One problem with this for generally distributable Makefiles is that the location of make is not always consistent across platforms. Also, some systems might require an alternate name like gmake.
Of course one can always run the appropriate command manually, but this sort of defeats the whole purpose of making the Makefile executable.
I've seen this trick used before in the debian/rules file that is part of every Debian package.
To address the problem of make not always being in the same place (on my system for example it's in /usr/bin), you could use
#!/usr/bin/env make -f
if you're on a UNIX-like system.
Another problem is that by using the Makefile this way you cannot override variables, by doing, for example make CFLAGS=....
"make" is shorter than "./Makefile", so I don't think you're buying anything.
The reason I would not do this is that typing "make" is more idiomatic to building Makefile based projects. Imagine if every project you built you had to search for the differently named makefile someone created instead of just typing "make && make install".
You could use a shell alias for this too.
We can look at this another way: is it a good idea to design a language whose interpreter looks for a fixed filename if you don't give it one? What if python looked for Pythonfile in the absence of a script name? ;)
You don't need such a mechanism in order to have a convention based around a known name. Example: Autoconf's ./configure script.

Resources