Why does running objcopy --strip-all on a Rust program halve its size? - rust

When I run objcopy --strip-all on any Rust program, it halves its size. For example, if I compile a normal hello-world application with cargo build --release, it ends up as a 3 MB executable (on Linux). Then, when I run objcopy --strip-all on the executable, I end up with a 330 kB executable. Why does this happen?
I also tested this on Windows with x86_64-pc-windows-gnu as my toolchain, and it likewise lowered the size of the executable from 4 MB to 1 MB.
On Windows my toolchain is nightly 2021-07-22; on Linux it is nightly 2021-07-05.

When you generate a binary, the Rust compiler emits a lot of debugging information, as well as other data such as symbol names. Most other compilers do this as well, sometimes behind an option (e.g., -g). Having this data is very helpful for debugging, even if you're compiling in release mode.
What you're doing with --strip-all is removing all of this extra data. In Rust, most of the data you're removing is just debugging information, and in a typical Linux distro, this data is stripped out of the binary and stored in special debug packages so it can be used if needed, but otherwise not downloaded.
This data isn't absolutely needed to run the program, so you may decide to strip it (which is usually done with the strip binary). If size isn't a concern for you, keeping it to aid debugging in case of a problem may be more helpful.
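For illustration, a minimal sketch of that workflow on Linux (the project name and the exact sizes are just examples, not taken from the question):
cargo new hello && cd hello
cargo build --release
ls -lh target/release/hello            # several MB: code plus symbols and debug info
# Remove symbol and debug sections; strip(1) does essentially the same thing.
objcopy --strip-all target/release/hello hello-stripped
ls -lh hello-stripped                  # a few hundred kB: code and data only
# To keep the debug info around separately (the "debug package" approach):
objcopy --only-keep-debug target/release/hello hello.debug
objcopy --add-gnu-debuglink=hello.debug hello-stripped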

Related

"Illegal instruction" when running ARM code targeting my CPU

I'm compiling a rather large project for ARM. I'm using an AT91SAM9G25-EK as a devboard running a Debian ARM image. All libraries and executables in the image seem to be compiled for the armv4t instruction set.
My CPU is an ARM926EJ-S, which should run armv5tej code.
I'm using GCC to cross compile for my board. My CXX flags look like the following:
set(CMAKE_CXX_FLAGS "--signed-char --sysroot=${SYSROOT} -mcpu=arm926ej-s -mtune=arm926ej-s -mfloat-abi=softfp" CACHE STRING "" FORCE)
If I try to run this on my board, I get an Illegal Instruction signal (SIGILL) during initialization of one of my dependencies (using armv4t).
If I enable thumb mode (-mthumb -mthumb-interwork) it works, but uses Thumb for all the code, which in my case runs slower (I'm doing some serious number crunching).
In this case, if I specify one function to be compiled for ARM mode (using __attribute__((target("arm")))) it will run fine until that function is called, then exit with SIGILL.
I'm lost. Is it that bad that I'm linking against libraries built for armv4t? Am I misunderstanding the way ARM modes work? Is it something in the Linux kernel?
What softfp means is to use the soft-float calling convention between functions, but still use the hardware FPU within them. Assuming your cross-compiler is configured with a default -mfpu option other than "none" (run arm-whatever-gcc -v and look for --with-fpu= to check), then you've got a problem, because as far as I can see from the Atmel datasheet, the SAM9G25 doesn't have an FPU.
My first instinct would be to slap GDB on there, catch the signal and disassemble the offending instruction to make sure, but the fact that Thumb code works OK is already a giveaway (Thumb before ARMv6T2 doesn't include any coprocessor instructions, and thus can't make use of an FPU).
In short, use -mfloat-abi=soft to ensure the ARM code actually uses software floating-point and avoids poking a non-existent FPU. And if the "serious number crunching" involves a lot of floating-point, perhaps consider getting a different MCU...
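As a hedged sketch of how to check both points (the toolchain prefix and program name below are placeholders for whatever cross toolchain and binary are actually in use):
# See which FPU the cross-compiler was configured to assume by default.
arm-linux-gnueabi-gcc -v 2>&1 | grep -o -- '--with-fpu=[^ ]*'
# Catch the SIGILL under gdb on the target and disassemble the faulting
# instruction to confirm it is a VFP/FPU operation.
gdb ./myprog
# (gdb) run
# (gdb) x/i $pc
# Then rebuild with software floating-point so no FPU instructions are emitted,
# i.e. replace -mfloat-abi=softfp with -mfloat-abi=soft in the CMake flags above.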

Why does a shared object fail if it has extra symbols compared to the original?

I have a stripped ld.so that I want to replace with the unstripped version (so that valgrind works). I have ensured that I have the same version of glibc and the same cross-compiler.
I have compiled the shared object; calling 'file' on it shows that it is compiled correctly (the only differences from the original being that it is not stripped and is about 15% bigger). Unfortunately, it then causes a kernel panic (unable to init) on start-up. Stripping the newly compiled .so, readelf-ing it, and diff-ing it with the original shows that there are extra symbols in the new version of the .so. All of the old symbols are still present, so what I don't understand is why the kernel panics with those extra symbols there.
I would expect the extra symbols to have no effect on kernel start-up, as they should never be called, so why do I get a kernel panic?
NB: To be clear - I will still need to investigate why there are extra symbols, but my question is about why these unused symbols cause problems.
The kernel (assuming Linux) doesn't depend on or use ld.so in any way, shape or form. The reason it panics is most likely that it can't exec any of the user-level programs (such as /bin/init and /bin/sh), which do use ld.so.
As to why your init doesn't like your new ld.so, it's hard to tell. One common mistake is to try to replace ld.so with the contents of /usr/lib/debug/ld-X.Y.so. While that file looks like it's not much different from the original /lib/ld-X.Y.so, it is in fact very different, and can't be used to replace the original, only to debug the original (/usr/lib/debug/ld-X.Y.so usually contains only debug sections, but none of code and data sections of /lib/ld-X.Y.so, and so attempts to run it usually cause immediate SIGSEGV).
Perhaps you can set up a chroot, mimicking your embedded environment, and run /bin/ls in it? The error (or a core dump) this will produce will likely tell you what's wrong with your ld.so.
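A rough sketch of that kind of experiment (all paths are hypothetical and depend on where the target's root filesystem is available on the development machine):
# Copy (or mount) the target root filesystem somewhere local.
sudo mkdir -p /tmp/target-root
sudo cp -a /path/to/target/rootfs/. /tmp/target-root/
# Drop in the candidate loader, then try to run something simple inside it.
sudo cp ./ld-X.Y.so /tmp/target-root/lib/
sudo chroot /tmp/target-root /bin/ls
# Any loader error (or core dump) produced here is much easier to inspect
# than a kernel panic at boot.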

How to increase probability of Linux core dumps matching symbols?

I have a very complex cross-platform application. Recently my team and I have been running stress tests and have encountered several crashes (and core dumps accompanying them). Some of these core dumps are very precise, and show me the exact location where the crash occurred with around 10 or more stack frames. Others sometimes have just one stack frame with ?? being the only symbol!
What I'd like to know is:
Is there a way to increase the probability of core dumps pointing in the right direction?
Why isn't the number of stack frames reported consistent?
Any best-practice advice for managing core dumps?
Here's how I compile the binaries (in release mode):
Compiler and platform: g++ with glibc-2.3.2-95.50 on CentOS 3.6 x86_64 -- This helps me maintain compatibility with older versions of Linux.
All files are compiled with the -g flag.
Debug symbols are stripped from the final binary and saved in a separate file.
When I have a core dump, I use GDB with the executable which created the core, and the symbols file. GDB never complains that there's a mismatch between the core/binary/symbols.
Yet I sometimes get core dumps with no symbols at all! It's understandable that I'm linking against non-debug versions of libstdc++ and libgcc, but it would be nice if at least the stack trace showed me where in my code the faulty call originated (although it may ultimately end in ??).
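(For reference, the split-debug workflow described above looks roughly like this; myapp is only a placeholder name.)
# Compile with debug info even in release mode.
g++ -g -O2 -o myapp main.cpp
# Save the debug info to a separate file, strip it from the shipped binary,
# and record a link so GDB can find the symbols again later.
objcopy --only-keep-debug myapp myapp.debug
strip --strip-debug --strip-unneeded myapp
objcopy --add-gnu-debuglink=myapp.debug myapp
# Later, with a core dump:
gdb myapp core
# (gdb) symbol-file myapp.debug   # or point GDB at the symbol file explicitly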
Others sometimes have just one stack frame with "??" being the only symbol!
There can be many reasons for that, among others:
the stack frame was trashed (overwritten)
EBP/RBP (on x86/x64) is currently not holding any meaningful value — this can happen e.g. in units compiled with -fomit-frame-pointer or asm units that do so
Note that the second point may occur simply by, for example, glibc being compiled in such a way. Having the debug info for such system libraries installed could mitigate this (something like what the glibc-debug{info,source} packages are on openSUSE).
As an alternative, on a glibc system, you could use the backtrace function call (or backtrace_symbols or backtrace_symbols_fd) and filter out the results yourself, so only the symbols belonging to your own code are displayed. It's a bit more work, but then, you can really tailor it to your needs.
Note, though, that gdb has more control over the program than glibc, so glibc's backtrace call would naturally be unable to print a backtrace if gdb cannot do so either.
But shipping the source would be much easier :-)
Have you tried installing debugging symbols of the various libraries that you are using? For example, my distribution (Ubuntu) provides libc6-dbg, libstdc++6-4.5-dbg, libgcc1-dbg etc.
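For example, on a Debian/Ubuntu-style system that might look like the following (package names vary between releases):
# Install debug symbols for the C/C++ runtime libraries.
sudo apt-get install libc6-dbg libstdc++6-4.5-dbg libgcc1-dbg
# GDB looks for the separate debug files under /usr/lib/debug by default;
# the location can also be set explicitly inside GDB:
# (gdb) set debug-file-directory /usr/lib/debug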
If you're building with optimisation enabled (e.g. -O2), then the compiler can blur the boundary between stack frames, for example by inlining. I'm not sure that this would cause backtraces with just one stack frame, but in general the rule is to expect great debugging difficulty, since the code you are looking at in the core dump has been modified and so does not necessarily correspond to your source.

Is there any difference between executable binary files between distributions?

As all Linux distributions use the same kernel, is there any difference between their executable binary files?
If yes, what are the main differences? Or does that mean we can build a universal linux executable file?
All Linux distributions use the same binary format, ELF, but there are still some differences:
Different CPU architectures use different instruction sets.
The same CPU architecture may use different ABIs; an ABI defines how the register file is used and how routines are called and return. Different ABIs cannot work together.
Even with the same architecture and the same ABI, that still does not mean you can copy a binary from one distribution to another. Since most binaries are not statically linked, they depend on the libraries shipped by the distribution, and different distributions may use different versions or different build configurations of those libraries.
So if you want your program to run on every distribution, you may have to statically link a version that depends only on the kernel's syscalls; even then it will only run on one specific architecture.
If you really want to run a program on any architecture, then you have to compile binaries for all the architectures and use a shell script to start the right one.
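A minimal sketch of such a launcher script (the per-architecture binary names are just examples):
#!/bin/sh
# Pick the binary matching this machine's architecture and run it.
case "$(uname -m)" in
    x86_64)  exec ./myprog.x86_64  "$@" ;;
    i?86)    exec ./myprog.i386    "$@" ;;
    aarch64) exec ./myprog.aarch64 "$@" ;;
    arm*)    exec ./myprog.arm     "$@" ;;
    *)       echo "unsupported architecture: $(uname -m)" >&2; exit 1 ;;
esac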
All Linux ports (that is, the Linux kernel on different processors) use ELF as the file format for executables and libraries. A specific ELF binary is labeled with a single architecture/OS on which it can run (although some OSes have compatibility to run ELF binaries from other OSes).
Most ports have support for the older a.out format. (Some processors are new enough that there have never existed any a.out executables for them.)
Some ports support other executable file formats as well; for example, the PA-RISC port has support for HP-UX's old SOM executables, and the μcLinux (nonmmu) ports support their own FLAT format.
Linux also has binfmt_misc, which allows userspace to register handlers for arbitrary binary formats. Some distributions take advantage of this to be able to execute Windows, .NET, or Java applications -- it's really still launching an interpreter, but it's completely transparent to the user.
Linux on Alpha has support for loading Intel binaries, which are run via the em86 emulator.
It's possible to register binfmt_misc for executables of other architectures, to be run with qemu-user.
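For instance, on Debian-based systems the qemu-user-static package registers such binfmt_misc handlers, after which a foreign-architecture binary can be run directly (the binary name below is hypothetical, and a statically linked test program keeps the example simple):
sudo apt-get install qemu-user-static   # registers binfmt_misc handlers
./hello-arm                             # an ARM binary now runs transparently on an x86_64 host
qemu-arm-static ./hello-arm             # or invoke the emulator explicitly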
In theory, one could create a new format -- perhaps register a new "architecture" in ELF -- for fat binaries. The kernel binfmt loader would then have to be taught about this new format, and the ld-linux.so dynamic linker and the whole build toolchain would need to understand it as well. There's been little interest in such a feature, and as far as I know, nobody is working on anything like it.
Almost all Linux program files use the ELF standard.
Old Unixes also used COFF format. You may still find executables from times of yore in this format. Linux still has support for it (I don't know if it's compiled in current distros, though).
If you want to create a program that runs on all Linux distributions, you can consider using scripting languages (like Python and Perl) or a platform-independent programming language like Java.
Programs written in scripting languages are compiled at execution time, which means they are always compiled to match the platform they are executed on and, hence, should always work (given that the libraries are set up properly).
Programs written in Java, on the other hand, are compiled before distributing them, but can be executed on any Linux distribution as long as it has a Java VM installed.
Furthermore, programs written in Java can be run on other operating systems like MS Windows and Mac OS.
The same is true for many programs written in Python and Perl; however, whether a Python or Perl program will work on another operating system depends on what libraries are used by that program and whether these libraries are available on the other operating systems.

How to reduce compilation cost in GCC and make?

I am trying to build some big libraries, like Boost and OpenCV, from their source code via make and GCC under Ubuntu 8.10 on my laptop. Unfortunately, the compilation of those big libraries seems to be a big burden for my laptop (Acer Aspire 5000). Its fan gets louder and louder until, all of a sudden, my laptop shuts itself down without the OS turning off gracefully.
So I wonder how to reduce the compilation cost in the case of make and GCC.
I wouldn't mind if the compilation took much longer or used more space, as long as it could finish without my laptop shutting itself down.
Is building the debug version of libraries always less costly than building the release version, because there is no optimization?
Generally speaking, is it possible to specify just some part of a library to install instead of the full library? Can the rest be built and added in later if it's found to be needed?
Is it correct that if I restart my laptop, I can resume compilation from around where it was when my laptop shut itself down? For example, I noticed this seems to be true for OpenCV, because the progress percentage shown during its compilation does not restart from 0%. But I am not sure about Boost, since there is no obvious information for me to tell, and the compilation seems to take much longer.
UPDATE:
Thanks, brianegge and Levy Chen! How do I use the wrapper script for GCC and/or g++? Is it like defining an alias for GCC or g++? And how do I call a script to check sensors and wait until the CPU temperature drops before continuing?
I'd suggest creating a wrapper script for gcc and/or g++
#!/bin/bash
# Pause before each compile so the CPU gets a chance to cool down.
sleep 10
exec gcc "$@"
Save the above as "gccslow" or something, and then:
export CC="gccslow"
Alternatively, you can call the script gcc and put it at the front of your path. If you do that, be sure to invoke the real gcc by its full path inside the script; otherwise the script will call itself recursively.
A better implementation could call a script to check sensors and wait until the CPU temperature drops before continuing.
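A hedged sketch of that idea, assuming lm-sensors is installed and that the label printed by sensors matches the pattern below (both vary from machine to machine):
#!/bin/bash
# gcccool: wait for the CPU to cool down before each compile.
MAX_TEMP=75   # degrees Celsius; pick a threshold that suits the machine
cpu_temp() {
    # Take the first "Core 0" reading; the label differs between machines.
    sensors | awk '/Core 0/ { gsub(/[^0-9.]/, "", $3); print int($3); exit }'
}
while [ "$(cpu_temp)" -ge "$MAX_TEMP" ]; do
    sleep 10
done
exec gcc "$@"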
For your latter question: a well-written Makefile defines its dependencies as a directed acyclic graph (DAG), and it tries to satisfy those dependencies by compiling them in the order given by the DAG. Thus, once a file has been compiled, that dependency is satisfied and it need not be compiled again.
It can, however, be tricky to write good Makefiles, so sometimes the author will resort to a brute-force approach and recompile everything from scratch.
For your question: for such well-known libraries, I will assume the Makefile is written properly and that the build will resume from the last operation (with the caveat that it needs to rescan the DAG and recalculate the compilation order, which should be relatively cheap).
Instead of compiling the whole thing, you can compile each target separately; you have to examine the Makefile to identify them.
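For example (the final target name here is hypothetical; real names depend on the library's build system):
# For a hand-written Makefile, grep for rule names to spot individual targets.
grep -E '^[A-Za-z0-9_.-]+:' Makefile
# CMake-generated Makefiles provide a "help" target that lists them.
make help
# Then build only what is needed:
make some_single_library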
Tongue-in-cheek: What about putting the laptop into the fridge while compiling?
