I'm compiling a static executable like this:
ld.lld out/main.o -o out/sm -Tstatic.ld -static
strip --strip-all out/sm
This is the linker script I'm using:
ENTRY(_start)
SECTIONS
{
. = 0x100e8;
.all : {
*(.bss*)
*(.text*)
*(.data*)
*(.rodata*)
*(COMMON*)
} :code
.shstrtab : {
*(.shstrtab)
}
/DISCARD/ : {
*(*)
}
}
PHDRS
{
code PT_LOAD FILEHDR PHDRS ;
}
The executable works as expected, but the strip command doesn't remove the .shstrtab section from the executable.
If I remove the .shstrtab section from the linker script, I get this error:
ld.lld out/main.o -o out/.sm -Tstatic.ld -static
ld.lld: error: discarding .shstrtab section is not allowed
Why is the .shstrtab section necessary? I've replaced all the standard section names and the executable still works as expected, so the program loading code doesn't care about the section names.
As an aside, is it possible to completely exclude the section headers in a linker script, since they aren't needed for a static executable?
Note: GNU linkers silently put .shstrtab in the output executable even if it is discarded.
Why is '.shstrtab' section mandatory?
Each section in the section table has a section name. It is stored as a reference to the section name table (.shstrtab).
So as long as there is at least one section in an ELF file, there must be a .shstrtab section (however, it might be named differently).
Indeed, it would be allowed to build an ELF file without any sections (but only with program headers).
However, I have never seen such an ELF file linked by a regular linker (only files that were intentionally created to be as small as possible or similar).
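To make the dependency concrete, here is a minimal Python sketch of how a reader resolves section names. It assumes a little-endian ELF64 file; the names section_names, EHDR and SHDR are made up for this illustration. Each section header stores only an offset (sh_name) into the string table section, and the file header's e_shstrndx field says which section is that string table.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, little-endian
SHDR = struct.Struct("<IIQQQQIIQQ")        # ELF64 section header

def section_names(elf: bytes) -> list[str]:
    """Return the name of every section, looked up via the section name table."""
    fields = EHDR.unpack_from(elf, 0)
    ident = fields[0]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    e_shoff, e_shentsize, e_shnum, e_shstrndx = (
        fields[6], fields[11], fields[12], fields[13])
    if e_shnum == 0:
        return []  # no section headers at all: no name table is needed
    headers = [SHDR.unpack_from(elf, e_shoff + i * e_shentsize)
               for i in range(e_shnum)]
    # The name table is itself a section; e_shstrndx is its index.
    str_off, str_size = headers[e_shstrndx][4], headers[e_shstrndx][5]
    table = elf[str_off:str_off + str_size]
    names = []
    for header in headers:
        start = header[0]                 # sh_name: offset into the table
        end = table.index(b"\0", start)   # names are NUL-terminated
        names.append(table[start:end].decode("ascii"))
    return names
```

Calling section_names(open("out/sm", "rb").read()) on the executable from the question should list .all and .shstrtab; a file with e_shnum equal to 0 yields an empty list, which is the "no sections" case mentioned above.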
Can you let me know what I'm doing wrong?
I'm new to assembly programming and am unfamiliar with the various options in ld.
I initially tried to use the yasm assembler, but then realised that the GNU assembler, as, is the way to go for the ARM architecture when writing GNU-compliant assembly code.
I had better luck running as from the binutils package, i.e. the GNU assembler. But the assembly code has to be ARM-compliant.
The following is the code within arm.s:
.text /* Start of the program code section */
.global main /* declares the main identifier */
.type main, %function
main: /* Address of the main function */
/* Program code would go here */
BR LR
/* Return to the caller */
.end /* End of the program */
The above was throwing an Illegal Instruction error. That can be fixed by substituting ret for BR LR. This is new in ARMv8.
ARM, a RISC architecture, is not supported by YASM.
My build file is as follows:
#!/usr/bin/env bash
#display usage
[ $# -eq 0 ] && { echo "Usage: $0 <File Name without extension> ";exit 1; }
set +e
rm -f $1.exe $1 $1.o
as -o $1.o $1.s
[ -e $1.o ] && { file $1.o;}
gcc -s -o $1.exe $1.o -fpic
ld -s -o $1 -pie --dynamic-linker /system/bin/linker64 /data/data/com.termux/files/usr/lib/crtbegin_dynamic.o $1.o -lc -lgcc -ldl /data/data/com.termux/files/usr/lib/crtend_android.o
[ -e $1.exe ] && { file $1.exe;nohup ./$1.exe; }
[ -e $1 ] && { file $1;nohup ./$1;}
set -e
The code was causing either a segmentation fault or a bus error earlier.
I was able to run a program or two without any segmentation or bus errors with the updated build file above. I set up the build file to produce two executables, one using gcc and the other ld, since some online tutorials use ld instead of gcc for the linking step. Using the verbose setting of gcc, you can look at the options passed to the linker and thus mimic the same for the linker independently.
There may be some redundant settings that I've missed.
You can access updates to the source code and build file at Learn Assembly.
Check out this resource from Keil: Arm Keil product guides.
More resources:
https://thinkingeek.com/2016/10/08/exploring-aarch64-assembler-chapter1/
How to link a gas assembly program that uses the C standard library with ld without using gcc?
While the above problem appears to be fixed for now, I have errors running the following code:
.text
.global main
main:
mov w0, #2
mov w7, #1 // request to exit program
svc 0
I obtain an illegal instruction error when I try to execute the code.
Secondly, if I alter the main to _start (since I don't want to be using main all the time), I have the following error from the buildrun script.
./buildrun myprogram
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: myprogram.o: in function `_start':
(.text+0x0): multiple definition of `_start'; /data/data/com.termux/files/usr/lib/crtbegin_dynamic.o:crtbegin.c:(.text+0x0): first defined here
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/usr/lib/crtbegin_dynamic.o: in function `_start_main':
crtbegin.c:(.text+0x38): undefined reference to `main'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: crtbegin.c:(.text+0x3c): undefined reference to `main'
clang-8: error: linker command failed with exit code 1 (use -v to see invocation)
ld: myprogram.o: in function `_start':
(.text+0x0): multiple definition of `_start'; /data/data/com.termux/files/usr/lib/crtbegin_dynamic.o:crtbegin.c:(.text+0x0): first defined here
ld: /data/data/com.termux/files/usr/lib/crtbegin_dynamic.o: in function `_start_main':
crtbegin.c:(.text+0x38): undefined reference to `main'
ld: crtbegin.c:(.text+0x3c): undefined reference to `main'
How do I create programs with entry points other than main?
I want to be able to :
Create a statically linked executable that works.
Create an executable that has a function named _start instead of main.
This file builds static executables that don't use main or call any library calls.
Create a dynamically linked executable with an entry point other than main.
My build file handles this, sort of, with the entry point as second parameter.
Create an executable that uses the supervisor call svc to exit without throwing an illegal instruction error, instead of using ret.
I was able to call svc by setting the system call number in register X8, as opposed to R7 on 32-bit ARM (ARMv7). Additionally, ARM64 uses different system call numbers, as listed in the following header file.
https://github.com/torvalds/linux/blob/v4.17/include/uapi/asm-generic/unistd.h
https://reverseengineering.stackexchange.com/q/16917
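As a rough summary of that difference, here is a small Python lookup table. The arm64 numbers (write = 64, exit = 93) are the ones used in the code in this post; the 32-bit ARM numbers (write = 4, exit = 1) are not from this post and are taken from the 32-bit ARM EABI syscall table. The names SYSCALL_ABI and syscall_setup are made up for this sketch.

```python
# Register that carries the syscall number, and a few syscall numbers, per ABI.
# Only write and exit are listed; this is an illustration, not a full table.
SYSCALL_ABI = {
    "arm":   ("r7", {"exit": 1,  "write": 4}),    # 32-bit ARM EABI
    "arm64": ("x8", {"exit": 93, "write": 64}),   # AArch64 (asm-generic numbering)
}

def syscall_setup(arch: str, name: str) -> str:
    """Return the instruction that loads the syscall number before 'svc #0'."""
    register, numbers = SYSCALL_ABI[arch]
    return f"mov {register}, #{numbers[name]}"

print(syscall_setup("arm", "exit"))    # mov r7, #1
print(syscall_setup("arm64", "exit"))  # mov x8, #93
```

This is why the earlier snippet fails on a 64-bit system: putting 1 in w7 leaves x8 holding whatever it had before, so svc 0 does not request an exit and execution runs on past the end of main.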
.data
.balign 8
labs: .asciz "Azeria Labs\n" //.asciz adds a null-byte to the end of the string
.balign 8
after_labs:
.set size_of_labs, after_labs - labs
.balign 8
addr_of_labs: .dword labs
.balign 8
.text
.global main
main:
mov x0, #1 //STDOUT
ldr x1,addr_of_labs //memory address of labs
mov w2, #size_of_labs //size of labs
mov x8,#64
svc #0x0 // invoke syscall
_exit:
mov x8, #93 //exit syscall
svc #0x0 //invoke syscall
The above code was ported from the example code listed below.
https://azeria-labs.com/writing-arm-shellcode/
Compacting the data section into one instead of splitting it as in the example from the site mitigates the relocation errors while linking.
Other useful references:
https://thinkingeek.com/2013/01/09/arm-assembler-raspberry-pi-chapter-1/
Check the comment by ehrt74 on the above post for the motivation to explore the svc call further.
Yasm is an x86 assembler. It cannot produce executables for an ARM processor.
The tutorials you are working with are describing x86 assembly. They are intended to be followed on an x86 system.
Linux separates the linker-time search path and run-time search path.
For the run-time search path, I found the rule for ld.so in its man page (8 ld.so):
DT_RPATH
environment LD_LIBRARY_PATH
DT_RUNPATH
ld.so.cache
/lib, /usr/lib
But for the linker-time search path, I had no luck with ld :(
The man page for ld (1 ld) says, besides the -L option:
The default set of paths searched (without being specified with -L) depends on which emulation mode ld is using, and in some cases also on how it was configured.
The paths can also be specified in a link script with the "SEARCH_DIR" command. Directories specified this way are searched at the point in which the linker script appears in the command line.
Does the "default set of paths" depending on emulation mode mean "SEARCH_DIR"?
misssprite, to look for the linker search path for specific ELF emulation just run ld -m<emulation> --verbose | grep SEARCH_DIR
Speaking about the ld itself, the library path search order is the following:
Directories specified via -L command line flags
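Directories in the LIBRARY_PATH environment variable (this one is read by the gcc driver, which hands the directories to ld as -L options; ld itself does not consult it)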
SEARCH_DIR variables in the linker script.
You can look what directories are specified in the default linker script by running ld --verbose | grep SEARCH_DIR. Note that = in the SEARCH_DIR values will be replaced by the value of --sysroot option if you specify it.
Usually ld is not invoked directly, but via a compiler driver, which passes several -L options to the linker. In the case of gcc or clang you can print the additional library search directories added by the compiler by invoking it with the -print-search-dirs option. Also note that if you specify machine-specific compiler flags (e.g. -m32, as misssprite mentioned), then the linker may use a different linker script according to the chosen ELF emulation. In the case of gcc you can use the -dumpspecs option to see how different compiler flags affect the linker invocation. But IMHO the simplest way to see the linker command line is to compile and link a simple program with -v specified.
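For scripting this, here is a small Python sketch of the grep SEARCH_DIR step. It works on text you have already captured from ld --verbose, and it assumes the entries are written as SEARCH_DIR("...") with double quotes, which is how the default scripts are usually printed. The function name search_dirs and the sample paths are made up for illustration.

```python
import re

def search_dirs(ld_verbose_output: str, sysroot: str = "") -> list[str]:
    """Return the SEARCH_DIR entries from 'ld --verbose' output, in order."""
    dirs = re.findall(r'SEARCH_DIR\("([^"]+)"\)', ld_verbose_output)
    # A leading '=' stands for the sysroot prefix (empty if there is none).
    return [sysroot + d[1:] if d.startswith("=") else d for d in dirs]

# Hypothetical excerpt of a default linker script.
sample = 'SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("/opt/extra/lib");'

print(search_dirs(sample))
print(search_dirs(sample, sysroot="/srv/sysroot"))
```

The list order matters because the directories are tried in that order, after anything given with -L.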
misssprite, there is no search for ld.so or ld-linux.so in the binutils's ld linker.
When a dynamic program is built with gcc, gcc passes the -dynamic-linker option to ld (via collect2): http://man7.org/linux/man-pages/man1/ld.1.html
-Ifile, --dynamic-linker=file
Set the name of the dynamic linker. This is only meaningful when
generating dynamically linked ELF executables. The default
dynamic linker is normally correct; don't use this unless you
know what you are doing.
The runtime loader for ELF, ld-linux.so, is registered as the interpreter in the dynamic ELF file, in the INTERP program header (the .interp section); check the output of readelf -l ./dynamic_application. As I understand it, this field holds a full path.
When ld is called directly (without gcc) or this option is not given, ld uses a hardcoded full path to ld.so, and this default is incorrect for most OSes, including Linux:
https://github.com/bneumeier/binutils/blob/db980de65ca9f296aae8db4d13ee884f0c18ac8a/bfd/elf64-x86-64.c#L510
/* The name of the dynamic interpreter. This is put in the .interp
section. */
#define ELF64_DYNAMIC_INTERPRETER "/lib/ld64.so.1"
#define ELF32_DYNAMIC_INTERPRETER "/lib/ldx32.so.1"
https://github.com/bneumeier/binutils/blob/db980de65ca9f296aae8db4d13ee884f0c18ac8a/gold/x86_64.cc#L816
template<>
const Target::Target_info Target_x86_64<64>::x86_64_info =
...
"/lib/ld64.so.1", // program interpreter
const Target::Target_info Target_x86_64<32>::x86_64_info =
...
"/libx32/ldx32.so.1", // program interpreter
The correct dynamic linker/loader path is hardcoded in gcc's machine spec files; grep the output of gcc -dumpspecs for ld-linux to see the -dynamic-linker option value.
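If you want to check which interpreter actually ended up in a binary without readelf, here is a minimal Python sketch that reads the INTERP program header directly. It assumes a little-endian ELF64 file; the names interpreter, EHDR and PHDR are made up for this illustration.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, little-endian
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header
PT_INTERP = 3                              # program header type of the interpreter path

def interpreter(elf: bytes):
    """Return the path stored in PT_INTERP, or None for a static executable."""
    fields = EHDR.unpack_from(elf, 0)
    ident = fields[0]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    for i in range(e_phnum):
        phdr = PHDR.unpack_from(elf, e_phoff + i * e_phentsize)
        p_type, p_offset, p_filesz = phdr[0], phdr[2], phdr[5]
        if p_type == PT_INTERP:
            # The path is stored as a NUL-terminated string in the file.
            return elf[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None
```

A binary linked by calling ld directly, without -dynamic-linker, would be expected to show the hardcoded default here rather than the path your distribution's gcc would have passed.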
I've done a bunch of reading on dynamic linker relocations and position independent code including procedure linkage tables and global offset tables. I don't understand why a statically linked executable needs a PLT and GOT. I compiled a hello world program on my ubuntu x86_64 machine and when I dump the section headers with readelf -S it shows PLT and GOT sections.
I also created a shared library with a simple increment function that I compiled with gcc -shared without -fpic and I also see PLT and GOT sections. I didn't expect this either.
I don't understand why a statically linked executable needs a PLT and GOT.
It doesn't.
I compiled a hello world program on my ubuntu x86_64 machine and when I dump the section headers with readelf -S it shows PLT and GOT sections.
This is an accident of implementation. The sections come from crt1.o, and there isn't a separate crt1s.o for fully-static linking, so you end up with .plt and .got entries from there.
You can strip these sections, and the binary will still work:
objcopy -R.got -R.plt a.out a.out2
Note: do not strip .rela.plt, as that section is still needed to implement IFUNCs.
I found that gcc generates a .got and a .got.plt when generating position independent code and taking the address of a function defined in another source file.
My test files were:
part1.c:
extern void afunc();
int _start()
{
return 0x55 & (__SIZE_TYPE__) afunc;
}
part2.c:
void afunc() {}
My test was (substitute your own gcc version):
for o in s 4 3 2 1 0
do
aarch64-linux-gnu-gcc-10 -fPIC part1.c part2.c -o static.elf -static -nostdlib -O$o &&
aarch64-linux-gnu-objdump -x static.elf | grep 'GLOBAL_OFFSET'
done
I get the following output for all optimization levels:
0000000000410fd8 l O .got 0000000000000000 _GLOBAL_OFFSET_TABLE_
Replace -fPIC with -fno-PIC and the section goes away.
You can tell if your compiler defaults to -fPIC by running this:
aarch64-linux-gnu-gcc-10 -mcmodel=large -x c - < /dev/null
If it does, I get this error:
cc1: sorry, unimplemented: code model ‘large’ with ‘-fPIC’
I have 2 obj files assembled with GNU as, they are:
a.o : my major program
b.o : some utility functions
a.o doesn't have an entry point. The final linked file will be loaded into memory and execution will jump to the very beginning of its load address, which is the first instruction of a.o.
Now I want to link them together with GNU ld. And I want to make a.o appear before b.o in the final file. How could I control this? Do I have to make a custom section and write in the linker script like this:
SECTIONS
{
. = 0x7c00;
.text : { *(.text) }
.my_custom_section : { *(.my_custom_section) }
.data : { *(.data) }
.bss : { *(.bss) }
}
OUTPUT_FORMAT(binary)
Update
Is there something wrong with this question? Did I post it wrong? If so, please let me know, guys. Many thanks.
Currently, I found that the command line sequence of the input files seems to be relevant.
If I do like this:
ld a.o b.o -o final.bin
Content from a.o will appear before b.o.
If I do like this:
ld b.o a.o -o final.bin
It will be otherwise.
Is it meant to be controlled like this?
According to the manual:
options which refer to files ... cause the file to be read at the point at which the option appears in the command line, relative to the object files and other file options
So the order of files in the binary is the order in which they appear on the command line.
Therefore, it is meant to be controlled as you mention in your update.
The order of the arguments to ld is in fact relevant.
Unless explicitly stated somehow, the entry point is the first code byte of the first file on the list.
The resulting executable always has the contents of the .o files in invocation order (with .a files it gets complicated).
What's the best tool for converting PE binaries to ELF binaries?
Following is a brief motivation for this question:
Suppose I have a simple C program.
I compiled it using gcc for Linux (this gives an ELF binary), and using i586-mingw32msvc-gcc for Windows (this gives a PE binary).
I want to analyze these two binaries for similarities, using Bitblaze's static analysis tool - vine(http://bitblaze.cs.berkeley.edu/vine.html)
Now vine doesn't have a good support for PE binaries, so I wanted to convert PE->ELF, and then carry on with my comparison/analysis.
Since all the analysis has to run on Linux, I would prefer a utility/tool that runs on Linux.
Thanks
It is possible to rebuild an EXE as an ELF binary, but the resulting binary will segfault very soon after loading, due to the missing operating system.
Here's one method of doing it.
Summary
Dump the section headers of the EXE file.
Extract the raw section data from the EXE.
Encapsulate the raw section data in GNU linker script snippets.
Write a linker script to build an ELF binary, including those scripts from the previous step.
Run ld with the linker script to produce the ELF file.
Run the new program, and watch it segfault as it's not running on Windows (and it tries to call functions in the Import Address Table, which doesn't exist).
Detailed Example
Dump the section headers of the EXE file. I'm using objdump from the mingw cross compiler package to do this.
$ i686-pc-mingw32-objdump -h trek.exe
trek.exe: file format pei-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 AUTO 00172600 00401000 00401000 00000400 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .idata 00001400 00574000 00574000 00172a00 2**2
CONTENTS, ALLOC, LOAD, DATA
2 DGROUP 0002b600 00576000 00576000 00173e00 2**2
CONTENTS, ALLOC, LOAD, DATA
3 .bss 000e7800 005a2000 005a2000 00000000 2**2
ALLOC
4 .reloc 00013000 0068a000 0068a000 0019f400 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .rsrc 00000a00 0069d000 0069d000 001b2400 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
Use dd (or a hex editor) to extract the raw section data from the EXE. Here, I'm just going to copy the code and data sections (named AUTO and DGROUP in this example). You may want to copy additional sections though.
$ dd bs=512 skip=2 count=2963 if=trek.exe of=code.bin
$ dd bs=512 skip=2975 count=347 if=trek.exe of=data.bin
Note, I've converted the file offsets and section sizes from hex to decimal to use as skip and count, but I'm using a block size of 512 bytes in dd to speed up the process (example: 0x0400 = 1024 bytes = 2 blocks of 512 bytes).
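The conversion can be written down as a short Python sketch; the function name dd_blocks is made up for this illustration. It takes the File off and Size columns from the objdump output and returns the skip and count values for dd, refusing values that are not a whole number of blocks.

```python
def dd_blocks(file_offset: int, size: int, block_size: int = 512) -> tuple[int, int]:
    """Convert a byte offset and byte size into dd's (skip, count) in blocks."""
    if file_offset % block_size or size % block_size:
        raise ValueError("offset and size must be multiples of the block size")
    return file_offset // block_size, size // block_size

print(dd_blocks(0x00000400, 0x00172600))  # AUTO section   -> (2, 2963)
print(dd_blocks(0x00173E00, 0x0002B600))  # DGROUP section -> (2975, 347)
```

These match the skip and count values used in the two dd commands above. If a section is not aligned to 512 bytes, fall back to bs=1 with the raw byte values, which is slower but always works.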
Encapsulate the raw section data in GNU ld linker script snippets (using the BYTE directive). This will be used to populate the sections.
cat code.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >code.ld
cat data.bin | hexdump -v -e '"BYTE(0x" 1/1 "%02X" ")\n"' >data.ld
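If hexdump's format strings feel opaque, the same transformation can be sketched in Python. The function name to_byte_directives is made up for this illustration; it emits one BYTE(0xNN) line per input byte, which is what the hexdump commands above produce.

```python
def to_byte_directives(data: bytes) -> str:
    """Render raw bytes as GNU ld BYTE() directives, one per line."""
    return "".join(f"BYTE(0x{b:02X})\n" for b in data)

print(to_byte_directives(b"\x90\xc3"), end="")
# BYTE(0x90)
# BYTE(0xC3)
```

To generate code.ld this way, read code.bin in binary mode and write the returned string to the output file. Expect the .ld file to be roughly eleven times the size of the binary, since each byte becomes an eleven-character line.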
Write a linker script to build an ELF binary, including those scripts from the previous step. Note I've also set aside space for the uninitialized data (.bss) section.
start = 0x516DE8;
ENTRY(start)
OUTPUT_FORMAT("elf32-i386")
SECTIONS {
.text 0x401000 :
{
INCLUDE "code.ld";
}
.data 0x576000 :
{
INCLUDE "data.ld";
}
.bss 0x5A2000 :
{
. = . + 0x0E7800;
}
}
Run the linker script with GNU ld to produce the ELF file. Note I have to use an emulation mode elf_i386 since I'm using 64-bit Linux, otherwise a 64-bit ELF would be produced.
$ ld -o elf_trek -m elf_i386 elf_trek.ld
ld: warning: elf_trek.ld contains output sections; did you forget -T?
$ file elf_trek
elf_trek: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
statically linked, not stripped
Run the new program, and watch it segfault as it's not running on Windows.
$ gdb elf_trek
(gdb) run
Starting program: /home/quasar/src/games/botf/elf_trek
Program received signal SIGSEGV, Segmentation fault.
0x0051d8e6 in ?? ()
(gdb) bt
#0 0x0051d8e6 in ?? ()
#1 0x00000000 in ?? ()
(gdb) x/i $eip
=> 0x51d8e6: sub (%edx),%eax
(gdb) quit
IDA Pro output for that location:
0051D8DB ; size_t stackavail(void)
0051D8DB proc stackavail near
0051D8DB push edx
0051D8DC call [ds:off_5A0588]
0051D8E2 mov edx, eax
0051D8E4 mov eax, esp
0051D8E6 sub eax, [edx]
0051D8E8 pop edx
0051D8E9 retn
0051D8E9 endp stackavail
For porting binaries to Linux, this is kind of pointless, given the Wine project.
For situations like the OP's, it may be appropriate.
I've found a simpler way to do this. Use the strip command.
Example
strip -O elf32-i386 -o myprogram.elf myprogram.exe
The -O elf32-i386 has it write out the file in that format.
To see supported formats run
strip --info
I am using the strip command from mxe, which on my system is actually named /opt/mxe/usr/bin/i686-w64-mingw32.static-strip.
I don't know whether this totally fits your needs, but is it an option for you to cross-compile with your MinGW version of gcc?
I mean to say: would it suit your needs to have i586-mingw32msvc-gcc compile directly to ELF format binaries (instead of the PEs you're currently getting)? A description of how to do things in the other direction can be found here - I imagine it will be a little hacky but entirely possible to make this work for you in the other direction (I must admit I haven't tried it).