Modifying contents of data section in ELF

Modifying contents of data section in ELF - linux

I have a program that initializes an array to 0. I have pointed the value of that array to a custom section using: __attribute__((section(".mysection")))
struct my_struct my_array[1] __attribute__((section(".mysection"))) = {
{0, 0},
};
The above is only so that we have a default and the linker marks the section as loadable and also includes it in the appropriate section list.
Now I wish to edit the generated ELF and modify the contents of that struct as I choose. I already have a binary file which contains the contents that I wish to have for that section.
I tried using --remove-section and --add-section but could not force the new section to be part of the sections.
Not sure if --update-section would help here, but the microcontroller I have doesn't have --update-section in objcopy and when I try the public version of it, that says that it doesnt support the bfd target.
FWIW, the ELF doesnt use any relocatable addresses etc. All addresses are physical addresses in memory.
Is there a way to achieve this? I just need to replace the contents of the section and modify its length.

In case other, more simple ways, are not suitable, you may implement such functionality by using ELFIO library.

Related

Linux ELF shared library issue

Currently I am working with ELF files and trying to deal with loading SO files. I am trying to "forcibly" link a new (a fake one, without actual calls to the code) SO dependency into executable file. To do that, I modified the .dynstr section contents (created a new section, filled it with the new contents, and resolved all sh_link fileds of Elf64_Shdr entries). Also I modified the .dynamic section (it has more than one null entry, so I modified one) to have DT_NEEDED type with linkage to the needed third-party SO name.
My small test app, being analyzed, appears to be fine (as readelf with -d option, or objdump -p, show). Nevertheless, when trying to run the application, it tells:
error while loading shared libraries: ��oU: cannot open shared object file: No such file or directory
Every time running, the name is different. This makes me think some addresses in the ELF loaded are invalid.
I understand that this way of patching is highly error-prone, but I am interested anyway. So, my question is: are there any ELF tools (like gdb or strace), which can debug image loading process (i.e. which can tell one what is wrong before entry point is hit)? Or are there any switches or options, which can help with this situation?
I have tried things like strace -d, but it would not tell anything interesting.

You do not mention patching DT_STRTAB and DT_STRSZ. These tags control how the dynamic loader locates the dynamic string table. The section headers are only used by the link editor, not at run time.

First of all, I did not manage to find any possibility to deal with sane debugging. My solution came in just because of hard-way raw ELF file hex bytes manual analysis.
My conception in general was right (forgot to mention the DT_STRTAB and DT_STRSZ modification though, thanks to Florian Weimer for reminding of it). The patchelf util (see in the postscriptum below) made me sure I am generally right.
The thing is: when you add a new section to the end, make sure you put data to the PLT right way. To add a new ".dynstr" section, I had to overwrite an auxiliary note segment (Elf**_Phdr::p_type == PT_NOTE) with a new segment, right for the new ".dynstr" section data. Not yet sure if such overwriting might cause some error.
It turned out that I put a raw ELF file ('offline') offset, but had to put this data RVA in the running image (after loading ELF into memory by the system loader, 'online'). Once I fixed it, the ELF started to work properly.
P.S. found a somewhat similar question: How can I change the filename of a shared library after building a program that depends on it? (a useful util for the same purpose I need, patchelf, is mentioned there; patchelf is available under Debian via APT, it is a nice tool for the stated purpose)

What is NT_GNU_BUILD_ID used for?

I was reading the help guide for golang ld and one of the options is
-B value
Add a NT_GNU_BUILD_ID note when using ELF. The value
should start with 0x and be an even number of hex digits.
Does anyone knows why one would use that flag ?
searching for NT_GNU_BUILD_ID does not provide any insightful answer.

This comes from the massive conversion from C to Go of cmd/new5l (Feb. 2015), translated from src/cmd/ld/pobj.c
That information was introduced in commit 7d507dc6e (Dec. 2013, for Go 1.3), a preparation for the new linker structure
NT_GNU_BUILD_ID is mentioned here as a unique build ID bitstring.
You see it employed for instance in Fedora release build
To embed an ID into both the stripped object and its .debug file, I've chosen to use an ELF note section.
strip et al can keep the section intact in both files when its type is SHT_NOTE.
The new section is canonically called .note.gnu.build-id, but the name is not normative, and the section can be merged with other SHT_NOTE sections.
The ELF note headers give name "GNU" and type 3 (NT_GNU_BUILD_ID) for a build ID note, of which there can be only one in a linked object (or an ET_REL file of the .ko style).
You can see it introduced in this patch in 2007:
This patch adds a new option to ld for ELF targets, --build-id.
It generates a synthetic ELF note section containing "unique build ID" bits
chosen by ld.
This is done as an ld option to be efficient and foolproof to enable for
every compilation (vs some script adding a generated object into the link).
It's done the way it is so it can use no more and no less than the exact
final ELF bits of the output to contribute to selecting a unique ID.
This is the best way to ensure that deterministic styles of ID generation
(i.e. cryptographic hash) will always yield identical results for repeated
builds reproduced precisely.

Difference in md5sums in two object files

I compile twice the same .c and .h files and get object files with the same size but different md5sums.
Here is the only difference from objdump -d:
1) cpcidskephemerissegment.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_ZN68_GLOBAL__N_sdk_segment_cpcidskephemerissegment.cpp_00000000_B8B9E66611MinFunctionEii>:
2) cpcidskephemerissegment.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_ZN68_GLOBAL__N_sdk_segment_cpcidskephemerissegment.cpp_00000000_8B65537811MinFunctionEii>:
What can be the reason? Thanks!

I guess, the compiler didn't know how to name this namespace and used path to the source file plus some random number.
The compiler must guarantee that a symbol in unnamed namespace does not conflict with any other symbol in your program. By default this is achieved by taking full filename of the source, and appending a random hash value to it (it's legal to compile the same source twice (e.g. with different macros) and link the two objects into a single program, and the unnamed namespace symbols must still be distinct, so using just the source filename without the seed is not enough).
If you know that you are not linking the same source file more than once, and want to have a bit-identical object file on re-compile, the solution is to add -frandom-seed="abcd" to your compile line (replace "abcd" with anything you want; it's common to use the filename as the value of random seed). Documentation here.

The reasons can be many:
Using macros like __DATE__ and __TIME__
Embedding counters that are incremented for each build (the Linux kernel does this)
Timestamps (or similarly variable quantities) embedded in the .comments ELF section. One example of a compiler that does this is the xlC compiler on AIX.
Different names as a result of name mangling (e.g. C++)
Changes in environment variables which are affecting the build process.
Compiler bug(s) (however unlikely)
To produce bit identical builds, you can use GCC's -frandom-seed parameter. There were situations where it could break things before GCC 4.3, but GCC now turns functions defined in anonymous namespaces into static symbols. However, you will always be safe if you compile each file using a different value for -frandom-seed, the simplest way being to use
the filename itself as the seed.

Finally I've found the answer!
c++filt command gave the original name of the function:
{unnamed namespace}: MinFunction(int, int)
In the source was:
namespace
{
MinFunction(int a, int b) { ... }
}
I named the namespace and got stable checksum of object file!
As I guess, the compiler didn't know how to name this namespace and used path to the source file plus some random number.

Trouble using Libelf/Elfio libraries : ELF not executable anymore

I'm using Libelf and Elfio to try and add a new section to ELF files. I would like it to be executable, just like .text.
This is my problem : with Libelf, as soon as I load (elf_begin()), update (elf_update()) and release (elf_end()) my ELF, it stops to be executable (seg fault when launching). readelf -S displays the sections but returns also the error :
readelf: Warning: the .dynamic section is not contained within the dynamic segment
I didn't find any function in Libelf to "add" the .dynamic section to the DYNAMIC segment.
But I can do that with Elfio (with the segment->add_section_index() function), but then I have to manually add every other section to every other segment, as Elfio seems to overwrite them when loading the ELF.
Has anyone any experience with those libraries ?
My final goal is to be able to create a new executable section in the ELF, and modify its entry point to jump and execute directly that new section, in order to create a packer.

Libelf does not manage the segments (i. e. the program header entries) of an executable ELF file. It does, however, by default re-layout the sections when you call elf_update().
After the re-layout, the program header entries will most probably contain obsolete offsets.
The loader will then try (or refuse) to load sections from file offsets that were only correct before the edit.
Thus the error message: the .dynamic section is now at another offset in the file, and the loader notices that it is not in the DYNAMIC segment anymore.
You can tell Libelf that you are responsible for the section layout by calling elf_flagelf(elf, ELF_C_SET, ELF_F_LAYOUT).
But then again, adding a new section will not be so easy anymore...

How to embed version information into shared library and binary?

On Linux, is there a way to embed version information into an ELF binary? I would like to embed this info at compile time so it can then be extract it using a script later. A hackish way would be to plant something that can be extracted using the strings command. Is there a more conventional method, similar to how Visual Studio plant version info for Windows DLLs (note version tab in DLL properties)?

One way to do it if using cvs or subversion is to have a special id string formatted specially in your source file. Then add a pre-commit hook to cvs or svn that updates that special variable with the new version of the file when a change is committed. Then, when the binary is built, you can use ident to extract that indformation. For example:
Add something like this to your cpp file:
static char fileid[] = "$Id: fname.cc,v 1.124 2010/07/21 06:38:45 author Exp $";
And running ident (which you can find by installing rcs) on the program should show the info about the files that have an id string in them.
ident program
program:
$Id: fname.cc,v 1.124 2010/07/21 06:38:45 author Exp $
Note As people have mentioned in the comments this technique is archaic. Having the source control system automatically change your source code is ugly and the fact that source control has improved since the days when cvs was the only option means that you can find a better way to achieve the same goals.

To extend the #sashang answer, while avoiding the "$Id:$" issues mentioned by #cdunn2001, ...
You can add a file "version_info.h" to your project that has only:
#define VERSION_MAJOR "1"
#define VERSION_MINOR "0"
#define VERSION_PATCH "0"
#define VERSION_BUILD "0"
And in your main.c file have the line:
static char version[] = VERSION_MAJOR "." VERSION_MINOR "." VERSION_PATCH "." VERSION_BUILD;
static char timestamp[] = __DATE__ " " __TIME__;
(or however you want to use these values in your program)
Then set up a pre-build step which reads the version_info.h file, bumps the numbers appropriately, and writes it back out again. A daily build would just bump the VERSION_BUILD number, while a more serious release would bump other numbers.
If your makefile lists this on your object's prerequisite list, then the build will recompile what it needs to.

The Intel Fortran and C++ compilers can certainly do this, use the -sox option. So, yes there is a way. I don't know of any widespread convention for embedding such information in a binary and I generally use Emacs in hexl-mode for reading the embedded information, which is quite hackish.
'-sox' also embeds the compiler options used to build an executable, which is very useful.

If you declare a variable called program_version or similar you can find out at which address the variable is stored at and then proceed to extract its value. E.g.
objdump -t --demangle /tmp/libtest.so | grep program_version
0000000000600a24 g O .data 0000000000000004 program_version
tells me that program_version resides at address 0000000000600a24 and is of size 4. Then just read the value at that address in the file.
Or you could just write a simple program that links the library in questions and prints the version, defined either as an exported variable or a function.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string