A quick question about ELF file headers: I can't seem to find anything useful on how to add or change fields in the ELF header. I'd like to be able to change the magic numbers and to add a build date to the header, and probably a few other things.
As I understand it the linker creates the header information, but I don't see anything in the LD script that refers to it (though I'm new to ld scripts).
I'm using gcc and building for ARM.
thanks!
Updates:
OK, maybe my first question should be: is it possible to create/edit the ELF header at link time?
I don't know of linker script commands that can do this, but you can do it post-link using the objcopy command. The --add-section option can be used to add a section containing arbitrary data to the ELF file. If the ELF header doesn't contain the fields you want, just make a new section and add them there.
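For example, a build date could be attached as an extra section like this (a sketch; the section name .build_date and the file names are arbitrary placeholders):
$ date > builddate.txt
$ objcopy --add-section .build_date=builddate.txt input.elf output.elf
$ readelf -p .build_date output.elf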
This link (teensy elf binary) was someone's answer to another question, but it goes into the intricacies of an ELF header in some detail.
You can create an object file with informative fields like a version number and link that file such that they are included in the resulting ELF binary.
Ident
For example, as part of your build process, you can generate, say, an info.c that contains one or more #ident directives:
#ident "Build: 1.2.3 (Halloween)"
#ident "Environment: example.org"
Compile it:
$ gcc -c info.c
Check if the information is included:
$ readelf -p .comment info.o
String dump of section '.comment':
[ 1] Build: 1.2.3 (Halloween)
[ 1a] Environment: example.org
[ 33] GCC: (GNU) 7.2.1 20170915 (Red Hat 7.2.1-2)
Alternatively, you can use objdump -s --section .comment info.o. Note that GCC also writes its own comment, by default.
Check the information after linking an ELF executable:
$ gcc -o main main.o info.o
$ readelf -p .comment main
String dump of section '.comment':
[ 0] GCC: (GNU) 7.2.1 20170915 (Red Hat 7.2.1-2)
[ 2c] Build: 1.2.3 (Halloween)
[ 45] Environment: example.org
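To tie this back to the original question about a build date: info.c can itself be generated as a build step, for example (a sketch; the exact wording of the string is arbitrary):
$ printf '#ident "Build date: %s"\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > info.c
$ gcc -c info.c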
Comment Section
Using #ident in a C translation unit is basically equivalent to creating a .comment section in an assembler file. Example:
$ cat info.s
.section .comment
.string "Build: 1.2.3 (Halloween)"
.string "Environment: example.org"
$ gcc -c info.s
$ readelf -p .comment info.o
String dump of section '.comment':
[ 0] Build: 1.2.3 (Halloween)
[ 19] Environment: example.org
Using an uncommon section name works, as well (e.g. .section .blahblah). But .comment is used and understood by other tools. GNU as also understands the .ident directive, and this is what GCC translates #ident to.
With Symbols
For data that you also want to access from the ELF executable itself, you need to create symbols.
Objcopy
Say you want to include some magic bytes stored in a data file:
$ cat magic.bin
2342
Convert it into an object file with GNU objcopy:
$ objcopy -I binary -O elf64-x86-64 -B i386 \
--rename-section .data=.rodata,alloc,load,readonly,data,contents \
magic.bin magic.o
Check for the symbols:
$ nm magic.o
0000000000000005 R _binary_magic_bin_end
0000000000000005 A _binary_magic_bin_size
0000000000000000 R _binary_magic_bin_start
Example usage:
#include <stdio.h>
#include <string.h>
#include <inttypes.h>
extern const char _binary_magic_bin_start[];
extern const char _binary_magic_bin_end[];
extern const unsigned char _binary_magic_bin_size;
static const size_t magic_bin_size = (uintptr_t) &_binary_magic_bin_size;
int main()
{
    char s[23];
    memcpy(s, _binary_magic_bin_start,
           _binary_magic_bin_end - _binary_magic_bin_start);
    s[magic_bin_size] = 0;
    puts(s);
    return 0;
}
Link everything together:
$ gcc -g -o main_magic main_magic.c magic.o
GNU ld
GNU ld is also able to turn data files into object files using an objcopy-compatible naming scheme:
$ ld -r -b binary magic.bin -o magic-ld.o
Unlike objcopy, it places the symbols into the .data instead of the .rodata section, though (cf. objdump -h magic.o).
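If you want the read-only placement anyway, one option (a sketch, reusing the section flags from the objcopy invocation above) is to rename the section after the fact:
$ objcopy --rename-section .data=.rodata,alloc,load,readonly,data,contents magic-ld.o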
incbin
In case GNU objcopy isn't available, one can use the GNU as .incbin directive to create the object file (assemble with gcc -c incbin.s):
.section .rodata
.global _binary_magic_bin_start
.type _binary_magic_bin_start, #object
_binary_magic_bin_start:
.incbin "magic.bin"
.size _binary_magic_bin_start, . - _binary_magic_bin_start
.global _binary_magic_bin_size
.type _binary_magic_bin_size, #object
.set _binary_magic_bin_size, . - _binary_magic_bin_start
.global _binary_magic_bin_end
.type _binary_magic_bin_end, #object
.set _binary_magic_bin_end, _binary_magic_bin_start + _binary_magic_bin_size
/* an alternative way to include the size */
.global _binary_magic_bin_len
.type _binary_magic_bin_len, #object
.size _binary_magic_bin_len, 8
_binary_magic_bin_len:
.quad _binary_magic_bin_size
xxd
A more portable alternative that requires neither GNU objcopy nor GNU as is to create an intermediate C file and compile and link that. For example with xxd:
$ xxd -i magic.bin | sed 's/\(unsigned\)/const \1/' > magic.c
$ gcc -c magic.c
$ nm magic.o
0000000000000000 R magic_bin
0000000000000008 R magic_bin_len
$ cat magic.c
const unsigned char magic_bin[] = {
0x32, 0x33, 0x34, 0x32, 0x0a
};
const unsigned int magic_bin_len = 5;
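The generated symbols can then be used like any other objects; a minimal sketch, assuming the magic.o from above is linked in:
#include <stdio.h>
/* symbols provided by the compiled magic.c above */
extern const unsigned char magic_bin[];
extern const unsigned int magic_bin_len;
int main(void)
{
    /* write the embedded bytes to stdout */
    fwrite(magic_bin, 1, magic_bin_len, stdout);
    return 0;
}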
I'm fairly sure that a sufficiently complex ld script can do what you want. However, I have no idea how.
On the other hand, elfsh can easily do all sorts of manipulations to ELF objects, so give it a whirl.
You might be able to use libmelf, a dead project on freshmeat, but available from LOPI - http://www.ipd.bth.se/ska/lopi.html
Otherwise, you can get the spec and (over)write the header yourself.
I haven't done this in a while, but can't you just append arbitrary data to an executable? If you always append fixed-size data, it is trivial to recover anything you append; variable-size data wouldn't be much harder. It is probably easier than messing with ELF headers and potentially ruining your executables.
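For instance, a sketch of the fixed-size variant (the 64-byte record size and the file name a.out are arbitrary choices):
$ printf '%-64s' "Build: 1.2.3 (Halloween)" >> a.out   # append a fixed 64-byte record
$ tail -c 64 a.out                                     # recover it later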
I didn't finish the book, but IIRC Linkers and Loaders by John Levine has the gory details you would need to do this.
On Solaris you can use elfedit, but I think you are really asking for solutions on Linux. Linux Is Not UniX :P
In Linux Console:
$ man ld
$ ld --verbose
HTH
Related
I have a working position-independent, freestanding Linux x86_64 hello world:
main.S
.text
.global _start
_start:
asm_main_after_prologue:
    /* Write */
    mov $1, %rax    /* syscall number */
    mov $1, %rdi    /* stdout */
    lea msg(%rip), %rsi /* buffer */
    mov $len, %rdx  /* len */
    syscall

    /* Exit */
    mov $60, %rax   /* syscall number */
    mov $0, %rdi    /* exit status */
    syscall
msg:
    .ascii "hello\n"
len = . - msg
which I can assemble and run with:
as -o main.o main.S
ld -o main.out main.o
./main.out
Since it is position independent due to the RIP-relative load, I now wanted to link it as a PIE and see it get loaded at a different random address every time, just for fun.
First I tried:
ld -pie -o main.out main.o
but then running it fails with:
-bash: ./main.out: No such file or directory
and readelf -Wa says that a weird interpreter /lib/ld64.so.1 was used instead of the regular one /lib64/ld-linux-x86-64.so.2 for some reason.
I then learnt that this is actually the recommended System V AMD64 ABI interpreter name, per section 5.2.1 "Program Interpreter".
In any case, I then try to force matters with:
ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie -o main.out main.o
and now it works: I get hello and the executable gets loaded to a different address every time according to GDB.
Finally, I wanted to make that executable statically linked as well, to make things even more minimal and possibly get rid of the explicit -dynamic-linker.
That's what I could not do, and this is why I'm asking here.
If I try either of:
ld -static -pie -o main.out main.o
ld -static -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie -o main.out main.o
-static does not seem to make any difference: I still get dynamic executables.
After quickly glancing at the kernel 5.0 source code in fs/binfmt_elf.c I saw this interesting comment:
* There are effectively two types of ET_DYN
* binaries: programs (i.e. PIE: ET_DYN with INTERP)
* and loaders (ET_DYN without INTERP, since they
* _are_ the ELF interpreter). The loaders must
so I guess when I achieve what I want, I will have a valid interpreter, and I'm so going to use my own minimal hello world as the interpreter of another program.
One thing I might try later on is see how some libc implementation compiles its loader and copy it.
Related question: Compile position-independent executable with statically linked library on 64 bit machine. That one mentions an external library, though, so hopefully this is more minimal and answerable.
Tested in Ubuntu 18.10.
You want to add --no-dynamic-linker to your link command:
$ ld main.o -o main.out -pie --no-dynamic-linker
$ file main.out
main.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, not stripped
$ ./main.out
hello
so I guess when I achieve what I want, I will have a valid interpreter, and I'm so going to use my own minimal hello world as the interpreter of another program.
I am not sure I understood what you are saying correctly. If you meant that main.out would have itself as its interpreter, that's wrong.
P.S. GLIBC-2.27 added support for -static-pie, so you no longer have to resort to assembly to get a statically linked PIE binary. But you'll have to use very recent GCC and GLIBC.
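For example, with a sufficiently new toolchain, something like this should produce a statically linked PIE from an ordinary C hello world (a sketch, not tested on the asker's setup):
$ gcc -static-pie -o hello hello.c
$ file hello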
I've done a bunch of reading on dynamic linker relocations and position independent code including procedure linkage tables and global offset tables. I don't understand why a statically linked executable needs a PLT and GOT. I compiled a hello world program on my ubuntu x86_64 machine and when I dump the section headers with readelf -S it shows PLT and GOT sections.
I also created a shared library with a simple increment function that I compiled with gcc -shared without -fpic and I also see PLT and GOT sections. I didn't expect this either.
I don't understand why a statically linked executable needs a PLT and GOT.
It doesn't.
I compiled a hello world program on my ubuntu x86_64 machine and when I dump the section headers with readelf -S it shows PLT and GOT sections.
This is an accident of implementation. The sections come from crt1.o, and there isn't a separate crt1s.o for fully-static linking, so you end up with .plt and .got entries from there.
You can strip these sections, and the binary will still work:
objcopy -R.got -R.plt a.out a.out2
Note: do not strip .rela.plt, as that section is still needed to implement IFUNCs.
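A quick way to check whether a binary actually relies on IFUNCs (my addition, not part of the original answer) is to look for IRELATIVE relocations:
$ readelf -rW a.out | grep IRELATIVE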
I found that GCC generates a .got and .got.plt when generating position-independent code and taking the address of a function defined in another source file.
My test files were:
part1.c:
extern void afunc();
int _start()
{
    return 0x55 & (__SIZE_TYPE__) afunc;
}
part2.c:
void afunc() {}
My test was (substitute your own gcc version):
for o in s 4 3 2 1 0
do
    aarch64-linux-gnu-gcc-10 -fPIC part1.c part2.c -o static.elf -static -nostdlib -O$o &&
        aarch64-linux-gnu-objdump -x static.elf | grep 'GLOBAL_OFFSET'
done
I get the following output for all optimization levels:
0000000000410fd8 l O .got 0000000000000000 _GLOBAL_OFFSET_TABLE_
Replace -fPIC with -fno-PIC and the section goes away.
You can tell if your compiler defaults to -fPIC by running this:
aarch64-linux-gnu-gcc-10 -mcmodel=large -x c - < /dev/null
If it does, I get this error:
cc1: sorry, unimplemented: code model ‘large’ with ‘-fPIC’
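Another check, assuming a typical distribution build of GCC, is to look for --enable-default-pie in the compiler's configuration; that makes -fPIE the default, which has a similar effect:
aarch64-linux-gnu-gcc-10 -v 2>&1 | grep -o 'enable-default-pie'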
I am very new to this. I have an ELF file input.out and need to create a hex executable from it. I am using objcopy to create an executable in Intel HEX format as follows:
objcopy -O ihex input.out out.hex
With this, out.hex contains data from all sections (.interp, .note.ABI-tag, etc.), but I am not sure whether all of it is required for the executable. Is the .text section alone enough to create the executable hex, so that I can just use the command below, or are more sections required?
objcopy -j.text -O ihex input.out out.hex
Also, is there any good reference to understand this in detail? I couldn't find much by Googling; I probably don't know what to search for.
It could work with:
objcopy -O ihex input.elf output.hex
Adding -S will strip useless sections.
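If you prefer to pick sections explicitly instead, something along these lines keeps just the loadable ones (the section names are the usual defaults and may differ with a custom linker script):
objcopy -O ihex -j .text -j .rodata -j .data input.out out.hex
.bss does not need to be included, since it only describes zero-initialized memory and has no contents in the file.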
Linux binaries are usually dynamically linked to the core system library (libc). This keeps the memory footprint of the binary quite small but binaries which are dependent on the latest libraries will not run on older systems. Conversely, binaries linked to older libraries will run happily on the latest systems.
Therefore, in order to ensure our application has good coverage during distribution we need to figure out the oldest libc we can support and link our binary against that.
How should we determine the oldest version of libc we can link to?
Work out which symbols in your executable are creating the dependency on the undesired version of glibc.
$ objdump -p myprog
...
Version References:
required from libc.so.6:
0x09691972 0x00 05 GLIBC_2.3
0x09691a75 0x00 03 GLIBC_2.2.5
$ objdump -T myprog | fgrep GLIBC_2.3
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3 realpath
Look within the depended-upon library to see if there are any symbols in older versions that you can link against:
$ objdump -T /lib/libc.so.6 | grep -w realpath
0000000000105d90 g DF .text 0000000000000021 (GLIBC_2.2.5) realpath
000000000003e7b0 g DF .text 00000000000004bf GLIBC_2.3 realpath
We're in luck!
Request the version from GLIBC_2.2.5 in your code:
#include <limits.h>
#include <stdlib.h>
__asm__(".symver realpath,realpath@GLIBC_2.2.5");
int main () {
    realpath ("foo", "bar");
}
Observe that GLIBC_2.3 is no longer needed:
$ objdump -p myprog
...
Version References:
required from libc.so.6:
0x09691a75 0x00 02 GLIBC_2.2.5
$ objdump -T myprog | grep realpath
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.2.5 realpath
For further information, see http://web.archive.org/web/20160107032111/http://www.trevorpounds.com/blog/?p=103.
Unfortunately, @Sam's solution doesn't work well in my situation. But following his approach, I found my own way to solve it.
This is my situation:
I'm writing a C++ program using the Thrift framework (an RPC middleware). I prefer static linking to dynamic linking, so my program is linked against libthrift.a statically instead of libthrift.so. However, libthrift.a is dynamically linked against glibc, and since my libthrift.a was built on my system with glibc 2.15, it uses the 2.14 version of memcpy (memcpy@GLIBC_2.14) provided by glibc 2.15.
But the problem is that our server machines only have glibc 2.5, which provides only memcpy@GLIBC_2.2.5. That is much older than memcpy@GLIBC_2.14, so of course my server program can't run on those machines.
And I found this solution:
Use .symver to obtain a reference to memcpy@GLIBC_2.2.5.
Write my own __wrap_memcpy function, which just calls memcpy@GLIBC_2.2.5 directly.
When linking my program, add the -Wl,--wrap=memcpy option to gcc/g++.
The code involved in steps 1 and 2 is here: https://gist.github.com/nicky-zs/7541169
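For reference, here is a minimal sketch of what steps 1-3 boil down to (my reconstruction under the assumptions above; the linked gist has the actual code):
/* memcpy_wrap.c -- compile this normally, then link the program with -Wl,--wrap=memcpy */
#include <string.h>

/* Step 1: bind undefined references to memcpy in this file to the old
   symbol version (assumes x86-64 glibc; adjust the version for your target). */
__asm__(".symver memcpy, memcpy@GLIBC_2.2.5");

/* Step 2: with --wrap=memcpy, every call to memcpy elsewhere in the program
   is redirected here, which simply forwards to memcpy@GLIBC_2.2.5. */
void *__wrap_memcpy(void *dest, const void *src, size_t n)
{
    return memcpy(dest, src, n);
}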
To do this in a more automated fashion, you can use the following script to create a list of all the symbols that are newer in your GLIBC than in a given version (set on line 2). It creates a glibc.h file (filename set by the script argument) which contains all the necessary .symver declarations. You can then add -include glibc.h to your CFLAGS to make sure it gets picked up everywhere in your compilation.
This is sufficient if you don't use any static libraries that were compiled without the above include. If you do, and you don't want to recompile, you can use objcopy to create a copy of the library with the symbols renamed to the old versions. The second-to-last line of the script creates a version of your system libstdc++.a that will link against the old glibc symbols. Adding -L. (or -Lpath/to/libstdc++.a/) will make your program statically link libstdc++ without linking in a bunch of new symbols. If you don't need this, delete the last two lines and the printf ... redeff line.
#!/bin/bash
maxver=2.9
headerf=${1:-glibc.h}
set -e
for lib in libc.so.6 libm.so.6 libpthread.so.0 libdl.so.2 libresolv.so.2 librt.so.1; do
    objdump -T /usr/lib/$lib
done | awk -v maxver=${maxver} -vheaderf=${headerf} -vredeff=${headerf}.redef -f <(cat <<'EOF'
BEGIN {
    split(maxver, ver, /\./)
    limit_ver = ver[1] * 10000 + ver[2] * 100 + ver[3]
}
/GLIBC_/ {
    gsub(/\(|\)/, "", $(NF-1))
    split($(NF-1), ver, /GLIBC_|\./)
    vers = ver[2] * 10000 + ver[3] * 100 + ver[4]
    if (vers > 0) {
        if (symvertext[$(NF)] != $(NF-1))
            count[$(NF)]++
        if (vers <= limit_ver && vers > symvers[$(NF)]) {
            symvers[$(NF)] = vers
            symvertext[$(NF)] = $(NF-1)
        }
    }
}
END {
    for (s in symvers) {
        if (count[s] > 1) {
            printf("__asm__(\".symver %s,%s@%s\");\n", s, s, symvertext[s]) > headerf
            printf("%s %s@%s\n", s, s, symvertext[s]) > redeff
        }
    }
}
EOF
)
sort ${headerf} -o ${headerf}
objcopy --redefine-syms=${headerf}.redef /usr/lib/libstdc++.a libstdc++.a
rm ${headerf}.redef
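Typical usage after running the script might look like this (a sketch; the script name gen_glibc_compat.sh and the program name are placeholders):
$ ./gen_glibc_compat.sh glibc.h
$ g++ -include glibc.h -L. -o myprog myprog.cpp
With -L. the linker finds the rewritten libstdc++.a before the system's shared libstdc++, so libstdc++ is linked statically against the old symbol versions.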
glibc 2.2 is a pretty common minimum version. However, finding a build platform for that version may be non-trivial.
Probably a better direction is to think about the oldest OS you want to support and build on that.
What's the best tool for converting PE binaries to ELF binaries?
Following is a brief motivation for this question:
Suppose I have a simple C program.
I compiled it using gcc for Linux (this gives an ELF binary) and using i586-mingw32msvc-gcc for Windows (this gives a PE binary).
I want to analyze these two binaries for similarities, using Bitblaze's static analysis tool, vine (http://bitblaze.cs.berkeley.edu/vine.html).
Now vine doesn't have good support for PE binaries, so I wanted to convert PE->ELF and then carry on with my comparison/analysis.
Since all the analysis has to run on Linux, I would prefer a utility/tool that runs on Linux.
Thanks
It is possible to rebuild an EXE as an ELF binary, but the resulting binary will segfault very soon after loading, due to the missing operating system.
Here's one method of doing it.
Summary
Dump the section headers of the EXE file.
Extract the raw section data from the EXE.
Encapsulate the raw section data in GNU linker script snippets.
Write a linker script to build an ELF binary, including those scripts from the previous step.
Run ld with the linker script to produce the ELF file.
Run the new program, and watch it segfault as it's not running on Windows (and it tries to call functions in the Import Address Table, which doesn't exist).
Detailed Example
Dump the section headers of the EXE file. I'm using objdump from the mingw cross compiler package to do this.
$ i686-pc-mingw32-objdump -h trek.exe
trek.exe: file format pei-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 AUTO 00172600 00401000 00401000 00000400 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .idata 00001400 00574000 00574000 00172a00 2**2
CONTENTS, ALLOC, LOAD, DATA
2 DGROUP 0002b600 00576000 00576000 00173e00 2**2
CONTENTS, ALLOC, LOAD, DATA
3 .bss 000e7800 005a2000 005a2000 00000000 2**2
ALLOC
4 .reloc 00013000 0068a000 0068a000 0019f400 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .rsrc 00000a00 0069d000 0069d000 001b2400 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
Use dd (or a hex editor) to extract the raw section data from the EXE. Here, I'm just going to copy the code and data sections (named AUTO and DGROUP in this example). You may want to copy additional sections though.
$ dd bs=512 skip=2 count=2963 if=trek.exe of=code.bin
$ dd bs=512 skip=2975 count=347 if=trek.exe of=data.bin
Note that I've converted the file offsets and section sizes from hex to decimal to use as skip and count, but I'm using a block size of 512 bytes in dd to speed up the process (example: 0x0400 = 1024 bytes = 2 blocks @ 512 bytes).
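For reference, the skip and count values can be derived from the objdump output with a quick shell calculation:
$ printf '%d %d\n' 0x00000400 0x00172600   # AUTO file offset and size
1024 1517056
$ printf '%d %d\n' 0x00173e00 0x0002b600   # DGROUP file offset and size
1523200 177664
1024/512 = 2 and 1517056/512 = 2963 give the skip and count for the code section; 1523200/512 = 2975 and 177664/512 = 347 give those for the data section.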
Encapsulate the raw section data in GNU ld linker script snippets (using the BYTE directive). These will be used to populate the sections.
cat code.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >code.ld
cat data.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >data.ld
Write a linker script to build an ELF binary, including those scripts from the previous step. Note I've also set aside space for the uninitialized data (.bss) section.
start = 0x516DE8;
ENTRY(start)
OUTPUT_FORMAT("elf32-i386")
SECTIONS {
    .text 0x401000 :
    {
        INCLUDE "code.ld";
    }
    .data 0x576000 :
    {
        INCLUDE "data.ld";
    }
    .bss 0x5A2000 :
    {
        . = . + 0x0E7800;
    }
}
Run GNU ld with the linker script to produce the ELF file. Note that I have to use the emulation mode elf_i386 since I'm using 64-bit Linux; otherwise a 64-bit ELF would be produced.
$ ld -o elf_trek -m elf_i386 elf_trek.ld
ld: warning: elf_trek.ld contains output sections; did you forget -T?
$ file elf_trek
elf_trek: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
statically linked, not stripped
Run the new program, and watch it segfault as it's not running on Windows.
$ gdb elf_trek
(gdb) run
Starting program: /home/quasar/src/games/botf/elf_trek
Program received signal SIGSEGV, Segmentation fault.
0x0051d8e6 in ?? ()
(gdb) bt
#0 0x0051d8e6 in ?? ()
#1 0x00000000 in ?? ()
(gdb) x/i $eip
=> 0x51d8e6: sub (%edx),%eax
(gdb) quit
IDA Pro output for that location:
0051D8DB ; size_t stackavail(void)
0051D8DB proc stackavail near
0051D8DB push edx
0051D8DC call [ds:off_5A0588]
0051D8E2 mov edx, eax
0051D8E4 mov eax, esp
0051D8E6 sub eax, [edx]
0051D8E8 pop edx
0051D8E9 retn
0051D8E9 endp stackavail
For porting binaries to Linux, this is kind of pointless, given the Wine project.
For situations like the OP's, it may be appropriate.
I've found a simpler way to do this. Use the strip command.
Example
strip -O elf32-i386 -o myprogram.elf myprogram.exe
The -O elf32-i386 option makes it write out the file in that format.
To see the supported formats, run:
strip --info
I am using the strip command from mxe, which on my system is actually named /opt/mxe/usr/bin/i686-w64-mingw32.static-strip.
I don't know whether this totally fits your needs, but is it an option for you to cross-compile with your MinGW version of gcc?
I mean to say: does it suit your needs to have i586-mingw32msvc-gcc compile directly to ELF-format binaries (instead of the PEs you're currently getting)? A description of how to do things in the other direction can be found here; I imagine it will be a little hacky, but entirely possible to make this work in your direction (I must admit I haven't tried it).