Interpreting address in core file - linux

I am looking for any references or guidance on how to interpret the addresses in the backtrack of a core file.
For e.g. a typical backtrack would look like:
(gdb) where
#0 [0x000012345] in func1 + 123 (a.out + 0xABC)
#1 [0x000034345] in func2 + 567 (a.out + 0xea7)
..
I can compile with -g and get exact line numbers. But the executable in production environment would not have been compiled with -g in which case I get a stack as above.
I would like to know:
what the addresses and offsets
[0x000012345], +123 and 0xABC represent, how to interpret them and how to map them to line number in the source code.
Appreciate any help.

On most architectures, it is simply not possible to get good data out of core files without debugging information. Even simple things like stack unwinding will not work. An expert, familiar with the processor architecture and the ABI, will be able to determine many things from it, but it is a cumbersome, manual process.
For the rest of the answer, I assume you are using GCC. If you compile your executable with optimization and -g, GCC will create production-ready binaries with debugging information. Unless there are compiler bugs (which are rare), -g will not affect the generated code. The debugging information consists of data tables on the side (and this causes problems for debugging because the machine code and the original code may be very far apart in aspects such as the execution order of individual statements).
If you do not want to distribute production binaries with debugging information, you should still compile with -g, but remove the debugging information with strip before distributing the binaries, while retaining the unstripped binaries for your own records. When later analyzing a core file, you use these unstripped binaries, rather than the stripped ones you gave away. The in-memory offsets you quote are the same between stripped and unstripped binaries, which is why this works.
For more advanced usage, you could also separate the debugging information using tools such as eu-strip -f, but this does not seem necessary in your case.

Related

How to inspect Haskell bytecode

I am trying to figure out a bug (a serious performance downgrade). Unfortunately, I wasn't able to figure out why by going back many different versions of my code.
I am suspecting it could be some modifications to libraries that I've updated, not to mention in the meanwhile I've updated to GHC 7.6 from 7.4 (and if anybody knows if some laziness behavior has changed I would greatly appreciate it!).
I have an older executable of this code that does not have this bug and thus I wonder if there are any tools to tell me the library versions I was linking to from before? Like if it can figure out the symbols, etc.
GHC creates executables, which are notoriously hard to understand... On my Linux box I can view the assembly code by typing in
objdump -d <executable filename>
but I get back over 100K lines of code from just a simple "Hello, World!" program written in Haskell.
If you happen to have the GHC .hi files, you can get some information about the executable by typing in
ghc --show-iface <hi filename>
This won't give you the assembly code, but you can get some extra information that may prove useful.
As I mentioned in the comment above, on Linux you can use "ldd" to see what C-system libraries you used in the compile, but that is also probably less than useful.
You can try to use a disassembler, but those are generally written to disassemble to C, not anything higher level and certainly not Haskell. That being said, GHC compiles to C as an intermediary (at least it used to; has that changed?), so you might be able to learn something.
Personally I often find view system calls in action much more interesting than viewing pure assembly. On my Linux box, I can view all system calls by running using strace (use Wireshark for the network traffic equivalent):
strace <program executable>
This also will generate a lot of data, so it might only be useful if you know of some specific place where direct real world communication (i.e., changes to a file on the hard disk drive) goes wrong.
In all honesty, you are probably better off just debugging the problem from source, although, depending on the actual problem, some of these techniques may help you pinpoint something.
Most of these tools have Mac and Windows equivalents.
Since much has changed in the last 9 years, and apparently this is still the first result a search engine gives on this question (like for me, again), an updated answer is in order:
First of all, yes, while Haskell does not specify a bytecode format, bytecode is also just a kind of machine code, for a virtual machine. So for the rest of the answer I will treat them as the same thing. The “Core“ as well as the LLVM intermediate language, or even WASM could be considered equivalent too.
Secondly, if your old binary is statically linked, then of course, no matter the format your program is in, no symbols will be available to check out. Because that is what linking does. Even with bytecode, and even with just classic static #include in simple languages. So your old binary will be no good, no matter what. And given the optimisations compilers do, a classic decompiler will very likely never be able to figure out what optimised bits used to be partially what libraries. Especially with stream fusion and such “magic”.
Third, you can do the things you asked with a modern Haskell program. But you need to have your binaries compiled with -dynamic and -rdynamic, So not only the C-calling-convention libraries (e.g. .so), and the Haskell libraries, but also the runtime itself is dynamically loaded. That way you end up with a very small binary, consisting of only your actual code, dynamic linking instructions, and the exact data about what libraries and runtime were used to build it. And since the runtime is compiler-dependent, you will know the compiler too. So it would give you everything you need, but only if you compiled it right. (I recommend using such dynamic linking by default in any case as it saves memory.)
The last factor that one might forget, is that even the exact same compiler version might behave vastly differently, depending on what IT was compiled with. (E.g. if somebody put a backdoor in the very first version of GHC, and all GHCs after that were compiled with that first GHC, and nobody ever checked, then that backdoor could still be in the code today, with no traces in any source or libraries whatsoever. … Or for a less extreme case, that version of GHC your old binary was built with might have been compiled with different architecture options, leading to it putting more optimised instructions into the binaries it compiles for unless told to cross-compile.)
Finally, of course, you can profile even compiled binaries, by profiling their system calls. This will give you clues about which part of the code acted differently and how. (E.g. if you notice that your new binary floods the system with some slow system calls where the old one just used a single fast one. A classic OpenGL example would be using fast display lists versus slow direct calls to draw triangles. Or using a different sorting algorithm, or having switched to a different kind of data structure that fits your work load badly and thrashes a lot of memory.)

Determine judging parameters for a code in Linux

I am developing an online code judging software for c/c++/java codes.
I want to include various parameters for judging a code ,namely compilation time,execution time,memory usage ,just like the IDEONE API provides with.
How can i extract these parameters while compiling/executing a code in a LINUX environment?Are there any specific commands?
Also are there any other parameters which can be used to judge a code?
The judge verb is a bit strange in your question (which is perhaps too imprecise). Maybe you mean evaluate ?
Assuming the evaluated source code is compiled by a recent GCC compiler (i.e. version 4.7 or 4.8 of GCC) and that you can parameterize (or just repeat) its compilation, you could consider extending & customizing the GCC compiler for evaluation or metric purposes. This is possible either directly thru GCC plugins (painfully coded in C or C++), or thru MELT extensions (MELT is a domain specific language to extend and customize GCC).
You'll need some work to go this route, because you need to dive into GCC internals. The MELT probe might help you in understanding more the Gimple representation (inside GCC). You could also try compiling some sample code with gcc -fdump-tree-all which produce many dump files.
So the idea is that you would take time (days, perhaps weeks) to develop a MELT extension (e.g. in some file shiven.melt) for analysis, metrics and evaluation purposes, and that you would [re-] compile the example.c source code, e.g. with
gcc -fplugin=melt \
-fplugin-arg-melt-extra=shiven \
-fplugin-arg-melt-mode=shivenanalysis \
-c example.c
(of course you'll add other compiler flags, e.g. -O -I/some/include/dir/ ...)
Then, you could make a MELT extension to measure some characteristics of the compiled code, like number of functions, number of basic blocks, number of Gimple instructions, etc. This will happen at compilation time. Your MELT extension (in your file shiven.melt) could e.g. write some statistics in some database.
Extending GCC is meaningful for C, C++, Fortran, Ada .... source code, but much less for Java (because nobody uses GCC to compile Java, even if gcj exists, and because gcj probably supports a subset of some old Java standard).
Please subscribe to gcc-melt#googlegroups list and ask there for MELT related questions. Mention explicitly your MELT interest (perhaps your question) in your subscription.
There is the time command which gives you the execution time of a binary. With that you can get the compilation time, time gcc code.c, or execution time, time ./a.out. For memory usage you can use valgrind, or ps. With ps, if you are using stdin for input it should be simple. Just start the application, run ps at certain intervals in the backgound, and supply the input to the application.

Where did the first make binary come from?

I'm having to build gnu make from source for reasons too complicated to explain here.
I noticed to build it I require the make command itself, in the traditional fashion:
./configure
make install
So what if I didn't have the make binary already? Where did the first ever make binary come from?
From the same place the first gcc binary came from.
The first make was created probably using a shell script to do the build. After that, make would "make" itself.
It's a notable achievement in systems development when the platform becomes "self-hosting". That is the platform can build itself.
Things like "make make" and "gcc gcc.c".
Many language writers will create their language in another language (say, C) and when they have moved it far enough along, they will use that original bootstrap compiler to write a new compiler in the original language. Finally, they discard the original.
Back in the day, a friend was working on a debugger for OS/2, notable for being a multi-tasking operating system at the time. And he would regale about the times when they would be debugging the debugger, and find a bug. So, they would debug the debugger debugging the debugger. It's a novel concept and goes to the heart of computing and abstraction.
Inevitably, it all boils back to when someone keyed in something through a hardwire key pad or some other switches to get an initial program loaded. Then they leveraged that program to do other work, and it all just grows from there.
Stuart Feldman, then at AT&T, wrote the source code for make around the time of 7th Edition UNIX™, and used manual compilation (or maybe a shell script) until make was working well enough to be used to build itself. You can find the UNIX Programmer's Manual for 7th Edition online, and in particular, the original paper describing the original version of make, dated August 1978.
make is just one convenience tool. It is still possible to invoke cc, ld, etc. manually or via other scripting tools.
If you're building GNU make, have a look at build.sh in the source tree after running configure:
# Shell script to build GNU Make in the absence of any `make' program.
# build.sh. Generated from build.sh.in by configure.
Compiling C programs is not the only way to produce an executable file. The first make executable (or more notably the C compiler itself) could for example be an assembly program, or it could be hand coded in machine code. It could also be cross compiled on a completely different system.
The essence of make is that it is a simplified way of running some commands.
To make the first make, the author had to manually act as make, and run gcc or whatever toolset was available, rather than having it run automatically.

Verifying two different build architectures (one a re-write of the other) are functionally equivalent

I'm re-writing a build that produces a number of things (shared/static libraries, jars, executables, etc). The question came up whether there's a way to verify that the results are functionally equivalent without doing a full top-to-bottom test of the resulting software.
However, that is proving to be more difficult to do than I anticipated.
As an example, I expected that the md5 of two objects produced from the same source (sun studio C++ compiler) and command-line parameters would have the same md5 hash, but that isn't the case. I can build the file, rename it, build again, and they have different hashes.
With that said ... is there a way do a quick check to verify that two files produced from separate build architectures of the same source tree (eg, two shared objects) are functionally equivalent?
edit I am sorry, I neglected to mention this is for a debug build ... when debugging flags aren't used the binaries are identical, but they've been using debugging flags by default for so many years their stuff breaks when you remove the debugging flags (part of the reason I'm re-writing the build is to take that particular 'feature' out of the build so we can get some proper testing going)
Windows DLLs have a link timestamp (TimeDateStamp) as part of PE image.
Looking at linker options, I don't see an option to suppress that. So re-linking a DLL (or an EXE) will always produce a different binary.
You could write a tool to zero out these timestamps (always at a fixed offset from file start), and compare MD5s afterwards. But you'll likely discover lots of other differences as well. In particular, any program that uses __DATE__ or __TIME__ builtins will give you trouble.
We've had to work quite hard to achieve bit-identical rebuilds (using GNU toolchain). It's possible (at least for open-source tools, on Linux), but not easy (as you've discovered).
I forgot about this question; I'm revisiting so I can give the answer I came up with.
objcopy can be used to produce a new binary file in different formats. It's been a few years since I worked on this, so the specifics escape me, but here's what I recall:
objcopy can strip various things out (debug info, symbol information, etc), but even after stripping stuff out I was still seeing different hashes between objects.
In the end I found I could convert it from ELF to other formats. I ended up dumping it to another format (I think I chose SREC) that consistently provided the same MD5 for objects built at different times with identical source/flags.
I'm betting I could have done this a better way with objcopy (or perhaps another binutils tool), but it was good enough to satisfy our concerns.

Can autotools create multi-platform makefiles

I have a plugin project I've been developing for a few years where the plugin works with numerous combinations of [primary application version, 3rd party library version, 32-bit vs. 64-bit]. Is there a (clean) way to use autotools to create a single makefile that builds all versions of the plugin.
As far as I can tell from skimming through the autotools documentation, the closest approximation to what I'd like is to have N independent copies of the project, each with its own makefile. This seems a little suboptimal for testing and development as (a) I'd need to continually propagate code changes across all the different copies and (b) there is a lot of wasted space in duplicating the project so many times. Is there a better way?
EDIT:
I've been rolling my own solution for a while where I have a fancy makefile and some perl scripts to hunt down various 3rd party library versions, etc. As such, I'm open to other non-autotools solutions. For other build tools, I'd want them to be very easy for end users to install. The tools also need to be smart enough to hunt down various 3rd party libraries and headers without a huge amount of trouble. I'm mostly looking for a linux solution, but one that also works for Windows and/or the Mac would be a bonus.
If your question is:
Can I use the autotools on some machine A to create a single universal makefile that will work on all other machines?
then the answer is "No". The autotools do not even make a pretense at trying to do that. They are designed to contain portable code that will determine how to create a workable makefile on the target machine.
If your question is:
Can I use the autotools to configure software that needs to run on different machines, with different versions of the primary software which my plugin works with, plus various 3rd party libraries, not to mention 32-bit vs 64-bit issues?
then the answer is "Yes". The autotools are designed to be able to do that. Further, they work on Unix, Linux, MacOS X, BSD.
I have a program, SQLCMD (which pre-dates the Microsoft program of the same name by a decade and more), which works with the IBM Informix databases. It detects the version of the client software (called IBM Informix ESQL/C, part of the IBM Informix ClientSDK or CSDK) is installed, and whether it is 32-bit or 64-bit. It also detects which version of the software is installed, and adapts its functionality to what is available in the supporting product. It supports versions that have been released over a period of about 17 years. It is autoconfigured -- I had to write some autoconf macros for the Informix functionality, and for a couple of other gizmos (high resolution timing, presence of /dev/stdin etc). But it is doable.
On the other hand, I don't try and release a single makefile that fits all customer machines and environments; there are just too many possibilities for that to be sensible. But autotools takes care of the details for me (and my users). All they do is:
./configure
That's easier than working out how to edit the makefile. (Oh, for the first 10 years, the program was configured by hand. It was hard for people to do, even though I had pretty good defaults set up. That was why I moved to auto-configuration: it makes it much easier for people to install.)
Mr Fooz commented:
I want something in between. Customers will use multiple versions and bitnesses of the same base application on the same machine in my case. I'm not worried about cross-compilation such as building Windows binaries on Linux.
Do you need a separate build of your plugin for the 32-bit and 64-bit versions? (I'd assume yes - but you could surprise me.) So you need to provide a mechanism for the user to say
./configure --use-tppkg=/opt/tp/pkg32-1.0.3
(where tppkg is a code for your third-party package, and the location is specifiable by the user.) However, keep in mind usability: the fewer such options the user has to provide, the better; against that, do not hard code things that should be optional, such as install locations. By all means look in default locations - that's good. And default to the bittiness of the stuff you find. Maybe if you find both 32-bit and 64-bit versions, then you should build both -- that would require careful construction, though. You can always echo "Checking for TP-Package ..." and indicate what you found and where you found it. Then the installer can change the options. Make sure you document in './configure --help' what the options are; this is standard autotools practice.
Do not do anything interactive though; the configure script should run, reporting what it does. The Perl Configure script (note the capital letter - it is a wholly separate automatic configuration system) is one of the few intensively interactive configuration systems left (and that is probably mainly because of its heritage; if starting anew, it would most likely be non-interactive). Such systems are more of a nuisance to configure than the non-interactive ones.
Cross-compilation is tough. I've never needed to do it, thank goodness.
Mr Fooz also commented:
Thanks for the extra comments. I'm looking for something like:
./configure --use-tppkg=/opt/tp/pkg32-1.0.3 --use-tppkg=/opt/tp/pkg64-1.1.2
where it would create both the 32-bit and 64-bit targets in one makefile for the current platform.
Well, I'm sure it could be done; I'm not so sure that it is worth doing by comparison with two separate configuration runs with a complete rebuild in between. You'd probably want to use:
./configure --use-tppkg32=/opt/tp/pkg32-1.0.3 --use-tppkg64=/opt/tp/pkg64-1.1.2
This indicates the two separate directories. You'd have to decide how you're going to do the build, but presumably you'd have two sub-directories, such as 'obj-32' and 'obj-64' for storing the separate sets of object files. You'd also arrange your makefile along the lines of:
FLAGS_32 = ...32-bit compiler options...
FLAGS_64 = ...64-bit compiler options...
TPPKG32DIR = #TPPKG32DIR#
TPPKG64DIR = #TPPKG64DIR#
OBJ32DIR = obj-32
OBJ64DIR = obj-64
BUILD_32 = #BUILD_32#
BUILD_64 = #BUILD_64#
TPPKGDIR =
OBJDIR =
FLAGS =
all: ${BUILD_32} ${BUILD_64}
build_32:
${MAKE} TPPKGDIR=${TPPKG32DIR} OBJDIR=${OBJ32DIR} FLAGS=${FLAGS_32} build
build_64:
${MAKE} TPPKGDIR=${TPPKG64DIR} OBJDIR=${OBJ64DIR} FLAGS=${FLAGS_64} build
build: ${OBJDIR}/plugin.so
This assumes that the plugin would be a shared object. The idea here is that the autotool would detect the 32-bit or 64-bit installs for the Third Party Package, and then make substitutions. The BUILD_32 macro would be set to build_32 if the 32-bit package was required and left empty otherwise; the BUILD_64 macro would be handled similarly.
When the user runs 'make all', it will build the build_32 target first and the build_64 target next. To build the build_32 target, it will re-run make and configure the flags for a 32-bit build. Similarly, to build the build_64 target, it will re-run make and configure the flags for a 64-bit build. It is important that all the flags affected by 32-bit vs 64-bit builds are set on the recursive invocation of make, and that the rules for building objects and libraries are written carefully - for example, the rule for compiling source to object must be careful to place the object file in the correct object directory - using GCC, for example, you would specify (in a .c.o rule):
${CC} ${CFLAGS} -o ${OBJDIR}/$*.o -c $*.c
The macro CFLAGS would include the ${FLAGS} value which deals with the bits (for example, FLAGS_32 = -m32 and FLAGS_64 = -m64, and so when building the 32-bit version,FLAGS = -m32would be included in theCFLAGS` macro.
The residual issues in the autotools is working out how to determine the 32-bit and 64-bit flags. If the worst comes to the worst, you'll have to write macros for that yourself. However, I'd expect (without having researched it) that you can do it using standard facilities from the autotools suite.
Unless you create yourself a carefully (even ruthlessly) symmetric makefile, it won't work reliably.
As far as I know, you can't do that. However, are you stuck with autotools? Are neither CMake nor SCons an option?
We tried it and it doesn't work! So we use now SCons.
Some articles to this topic: 1 and 2
Edit:
Some small example why I love SCons:
env.ParseConfig('pkg-config --cflags --libs glib-2.0')
With this line of code you add GLib to the compile environment (env). And don't forget the User Guide which just great to learn SCons (you really don't have to know Python!). For the end user you could try SCons with PyInstaller or something like that.
And in comparison to make, you use Python, so a complete programming language! With this in mind you can do just everything (more or less).
Have you ever considered to use a single project with multiple build directories?
if your automake project is implemented in a proper way (i.e.: NOT like gcc)
the following is possible:
mkdir build1 build2 build3
cd build1
../configure $(YOUR_OPTIONS)
cd build2
../configure $(YOUR_OPTIONS2)
[...]
you are able to pass different configuration parameters like include directories and compilers (cross compilers i.e.).
you can then even run this in a single make call by running
make -C build1 -C build2 -C build3

Resources