Using ".init_array" section of ELF file - linux

When there is a need to run a piece of code on the program startup (on Linux), how to use correctly the .init_section of an executable file (ELF32-i386)? I have the following code (GNU Assembler) which has ctor initialization function, and the address of this function is placed inside .init_array section:
.intel_syntax noprefix
.data
s1: .asciz "Init code\n"
s2: .asciz "Main code\n"
.global _start
.global ctor
.text
ctor:
mov eax, 4 # sys_write()
mov ebx, 1 # stdout
mov ecx, offset s1
mov edx, 10
int 0x80
ret
.section .init_array
.long ctor
.text
_start:
mov eax, 4
mov ebx, 1
mov ecx, offset s2
mov edx, 10
int 0x80
mov eax, 1
mov ebx, 0
int 0x80
This code is assembled with:
as -o init.o init.asm
ld -o init init.o
When the resulting executable is run, only the "Main code" string is printed. How to use properly the .init_array section?
EDIT1: I want to use .init_array because there are multiple source files with their own init code. One can call all this code 'manually' on startup and modify it every time when source files are added to or removed from the project, but .init_array seems to be designed just for this case :
Before transferring control to an application, the runtime linker
processes any initialization sections found in the application and any
loaded dependencies. The initialization sections .preinit_array,
.init_array, and .init are created by the link-editor when a dynamic
object is built.
The runtime linker executes functions whose addresses are contained in
the .preinit_array and .init_array sections. These functions are
executed in the same order in which their addresses appear in the
array.
In case when an executable is created without gcc, the linker seems to not execute the startup code. I tried to write my own standard init routine which reads function pointers in .init_array, section and calls them. It works OK for one file, where one can mark the end of the section, for example, with zero. But with multiple files this zero can be relocated in the middle of the section. How can one correctly determine the size of a section assembled from multiple source files?

If you make a statically linked bare executable the way you're doing, with your own code at the _start entry point, your code just runs from that point. If you want something to happen, your code has to make it happen. There is no magic.
Using sections can be useful to group startup code from multiple source files together, so all the startup code is cold and can potentially be paged out, or at least not need a TLB entry.
So you "properly use" sections by putting functions there and calling them from code that runs sometime after _start.
In your code example, it looks like .init_array is a list of function pointers. I assume the standard CRT startup files read the ELF file and find the length of that section, then walk through it making indirect calls to those functions. Since you're making custom code, it's going to be faster just to call an init function that does everything.
dynamic linking:
The "runtime linker" is the ELF interpreter for dynamic binaries. It runs code in your process before _start, so yes, apparently it does process that ELF section and make magic happen.
So in response to your edit, your options are: implement this processing of .init_array yourself, or create dynamic executables. I'm pretty sure this procedure has been covered in other questions, and I don't have time to research a correct command line for a dynamic executable that still doesn't link libc. (Although you might just want to use gcc -nostartfiles, or something.)
If you're stuck, leave a comment. I may update this later when I have more time anyway, or feel free to edit in a working command.

For normal C programs the .init_array is traversed by a function that is called from _start before main gets called. A good description is on this site.
So I see two ways: You can simply link against the glibc start code. Or you have to find out another mechanism to solve this issue by yourself.

Related

On x64 Linux, what is the difference between syscall, int 0x80 and ret to exit a program?

I decided yesterday to learn assembly (NASM syntax) after years of C++ and Python and I'm already confused about the way to exit a program. It's mostly about ret because it's the suggested instruction on SASM IDE.
I'm speaking for main obviously. I don't care about x86 backward compatibility. Only the x64 Linux best way. I'm curious.
If you use printf or other libc functions, it's best to ret from main or call exit. (Which are equivalent; main's caller will call the libc exit function.)
If not, if you were only making other raw system calls like write with syscall, it's also appropriate and consistent to exit that way, but either way, or call exit are 100% fine in main.
If you want to work without libc at all, e.g. put your code under _start: instead of main: and link with ld or gcc -static -nostdlib, then you can't use ret. Use mov eax, 231 (__NR_exit_group) / syscall.
main is a real & normal function like any other (called with a valid return address), but _start (the process entry point) isn't. On entry to _start, the stack holds argc and argv, so trying to ret would set RIP=argc, and then code-fetch would segfault on that unmapped address. Nasm segmentation fault on RET in _start
System call vs. ret-from-main
Exiting via a system call is like calling _exit() in C - skip atexit() and libc cleanup, notably not flushing any buffered stdout output (line buffered on a terminal, full-buffered otherwise).
This leads to symptoms such as Using printf in assembly leads to empty output when piping, but works on the terminal (or if your output doesn't end with \n, even on a terminal.)
main is a function, called (indirectly) from CRT startup code. (Assuming you link your program normally, like you would a C program.) Your hand-written main works exactly like a compiler-generate C main function would. Its caller (__libc_start_main) really does do something like int result = main(argc, argv); exit(result);,
e.g. call rax (pointer passed by _start) / mov edi, eax / call exit.
So returning from main is exactly1 like calling exit.
Syscall implementation of exit() for a comparison of the relevant C functions, exit vs. _exit vs. exit_group and the underlying asm system calls.
C question: What is the difference between exit and return? is primarily about exit() vs. return, although there is mention of calling _exit() directly, i.e. just making a system call. It's applicable because C main compiles to an asm main just like you'd write by hand.
Footnote 1: You can invent a hypothetical intentionally weird case where it's different. e.g. you used stack space in main as your stdio buffer with sub rsp, 1024 / mov rsi, rsp / ... / call setvbuf. Then returning from main would involve putting RSP above that buffer, and __libc_start_main's call to exit could overwrite some of that buffer with return addresses and locals before execution reached the fflush cleanup. This mistake is more obvious in asm than C because you need leave or mov rsp, rbp or add rsp, 1024 or something to point RSP at your return address.
In C++, return from main runs destructors for its locals (before global/static exit stuff), exit doesn't. But that just means the compiler makes asm that does more stuff before actually running the ret, so it's all manual in asm, like in C.
The other difference is of course the asm / calling-convention details: exit status in EAX (return value) or EDI (first arg), and of course to ret you have to have RSP pointing at your return address, like it was on function entry. With call exit you don't, and you can even do a conditional tailcall of exit like jne exit. Since it's a noreturn function, you don't really need RSP pointing at a valid return address. (RSP should be aligned by 16 before a call, though, or RSP%16 = 8 before a tailcall, matching the alignment after call pushes a return address. It's unlikely that exit / fflush cleanup will do any alignment-required stores/loads to the stack, but it's a good habit to get this right.)
(This whole footnote is about ret vs. call exit, not syscall, so it's a bit of a tangent from the rest of the answer. You can also run syscall without caring where the stack-pointer points.)
SYS_exit vs. SYS_exit_group raw system calls
The raw SYS_exit system call is for exiting the current thread, like pthread_exit().
(eax=60 / syscall, or eax=1 / int 0x80).
SYS_exit_group is for exiting the whole program, like _exit.
(eax=231 / syscall, or eax=252 / int 0x80).
In a single-threaded program you can use either, but conceptually exit_group makes more sense to me if you're going to use raw system calls. glibc's _exit() wrapper function actually uses the exit_group system call (since glibc 2.3). See Syscall implementation of exit() for more details.
However, nearly all the hand-written asm you'll ever see uses SYS_exit1. It's not "wrong", and SYS_exit is perfectly acceptable for a program that didn't start more threads. Especially if you're trying to save code size with xor eax,eax / inc eax (3 bytes in 32-bit mode) or push 60 / pop rax (3 bytes in 64-bit mode), while push 231/pop rax would be even larger than mov eax,231 because it doesn't fit in a signed imm8.
Note 1: (Usually actually hard-coding the number, not using __NR_... constants from asm/unistd.h or their SYS_... names from sys/syscall.h)
And historically, it's all there was. Note that in unistd_32.h, __NR_exit has call number 1, but __NR_exit_group = 252 wasn't added until years later when the kernel gained support for tasks that share virtual address space with their parent, aka threads started by clone(2). This is when SYS_exit conceptually became "exit current thread". (But one could easily and convincingly argue that in a single-threaded program, SYS_exit does still mean exit the whole program, because it only differs from exit_group if there are multiple threads.)
To be honest, I've never used eax=252 / int 0x80 in anything, only ever eax=1. It's only in 64-bit code where I often use mov eax,231 instead of mov eax,60 because neither number is "simple" or memorable the way 1 is, so might as well be a cool guy and use the "modern" exit_group way in my single-threaded toy program / experiment / microbenchmark / SO answer. :P (If I didn't enjoy tilting at windmills, I wouldn't spend so much time on assembly, especially on SO.)
And BTW, I usually use NASM for one-off experiments so it's inconvenient to use pre-defined symbolic constants for call numbers; with GCC to preprocess a .S before running GAS you can make your code self-documenting with #include <sys/syscall.h> so you can use mov $SYS_exit_group, %eax (or $__NR_exit_group), or mov eax, __NR_exit_group with .intel_syntax noprefix.
Don't use the 32-bit int 0x80 ABI in 64-bit code:
What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? explains what happens if you use the COMPAT_IA32_EMULATION int 0x80 ABI in 64-bit code.
It's totally fine for just exiting, as long as your kernel has that support compiled in, otherwise it will segfault just like any other random int number like int 0x7f. (e.g. on WSL1, or people that built custom kernels and disabled that support.)
But the only reason you'd do it that way in asm would be so you could build the same source file with nasm -felf32 or nasm -felf64. (You can't use syscall in 32-bit code, except on some AMD CPUs which have a 32-bit version of syscall. And the 32-bit ABI uses different call numbers anyway so this wouldn't let the same source be useful for both modes.)
Related:
Why am I allowed to exit main using ret? (CRT startup code calls main, you're not returning directly to the kernel.)
Nasm segmentation fault on RET in _start - you can't ret from _start
Using printf in assembly leads to empty output when piping, but works on the terminal stdout buffer (not) flushing with raw system call exit
Syscall implementation of exit() call exit vs. mov eax,60/syscall (_exit) vs. mov eax,231/syscall (exit_group).
Can't call C standard library function on 64-bit Linux from assembly (yasm) code - modern Linux distros config GCC in a way that call exit or call puts won't link with nasm -felf64 foo.asm && gcc foo.o.
Is main() really start of a C++ program? - Ciro's answer is a deep dive into how glibc + its CRT startup code actually call main (including x86-64 asm disassembly in GDB), and shows the glibc source code for __libc_start_main.
Linux x86 Program Start Up
or - How the heck do we get to main()? 32-bit asm, and more detail than you'll probably want until you're a lot more comfortable with asm, but if you've ever wondered why CRT runs so much code before getting to main, that covers what's happening at a level that's a couple steps up from using GDB with starti (stop at the process entry point, e.g. in the dynamic linker's _start) and stepi until you get to your own _start or main.
https://stackoverflow.com/tags/x86/info lots of good links about this and everything else.

GCC compiled code: why integer declaration needs several statements?

I'm learning AT&T assembly,I know arrays/variables can be declared using .int/.long, or using .equ to declare a symbol, that's to be replaced by assembly.
They're declared insided either .data section(initialzed),or .bss section(uninitialzed).
But when I used gcc to compiled a very simple .c file with '-S' command line option to check the disassembly code, I noticed that:
(1) .s is not using both .data and .bss, but only .data
(2) The declaration of an integer(.long) cost several statements, some of them seems redundant or useless to me.
Shown as below, I've added some comments as per my questions.
$ cat n.c
int i=23;
int j;
int main(){
return 0;
}
$ gcc -S n.c
$ cat n.s
.file "n.c"
.globl i
.data
.align 4
.type i, #object #declare i, I think it's useless
.size i, 4 #There's '.long 23', we know it's 4 bytes, why need this line?
i:
.long 23 #Only this line is needed, I think
.comm j,4,4 #Why j is not put inside .bss .section?
.text
.globl main
.type main, #function
main:
.LFB0: #What does this symbol mean, I don't find it useful.
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0: #What does this symbol mean, I don't find it useful.
.size main, .-main
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609"
.section .note.GNU-stack,"",#progbits
All my questions are in the comments above, I re-emphasize here again:
.type i, #object
.size i, 4
i:
.long 23
I really think above code is redundant, should be as simple as:
i:
.long 23
Also, "j" doesn't have a symbol tag, and is not put inside .bss section.
Did I get wrong with anything? Please help to correct. Thanks a lot.
(I am guessing you are using some Linux system)
They're declared insided either .data section(initialzed),or .bss section(uninitialzed).
No, you have many other sections, notably .comm (for the "common" section, with initialized data common to several object files, that the linker would "merge") and .rodata for read-only data. The ELF format is flexible enough to permit many sections and many segments (some of which are not loaded -more precisely memory mapped- in memory).
The description of sections in ELF files is much more complex than what you believe. Take time to read more, e.g. Linkers and loaders by Levine. Read also the documentation and the scripts of GNU binutils and also ld(1) & as(1). use objdump(1) and readelf(1) to explore existing ELF executables, object files and shared objects. Read also execve(2) & elf(5)
But when I used gcc to compiled a very simple .c file with -S command line option
When examining the assembler file generated by gcc I strongly recommend passing at least -fverbose-asm to ask gcc to emit some additional and useful comments in the assembler file. I also usually recommend to use some optimization flag -e.g. -O1 at least (or perhaps -Og on recent versions of gcc).
I noticed that:
(1) .s is not using both .data and .bss, but only .data
No, your generated code use the .comm section and put the value of j there.
(2) The declaration of an integer(.long) cost several statements, some of them seems redundant or useless to me.
These are mostly not assembler statements (translated into machine code) but assembler directives; they are very useful (and they don't waste space in the memory segment produced by ld, but the ELF format has information elsewhere). In particular .size and .type are both needed because the symbol tables in ELF files contain more than addresses (it also has a notion of size, and a very primitive notion of type).
The .LFB0 is a gcc (actually cc1-) generated label. GCC does not care about generating useless labels (it is simpler for the assembler generator in the GCC backend), since they don't appear in object files.
There's '.long 23', we know it's 4 bytes,
you might know that a long is 4 bytes, but that information (size of j) should go into the ELF file so requires explicit assembler directives....
(I don't have space or time to explain the ELF format, you need to read many pages about it, and it is a lot more complex and more complete than what you believe)
BTW, Drepper's How To Write Shared Libraries is quite long (more than 40 pages) and explains a good deal about ELF files, focusing on shared libraries.

Running windows shell commands NASM X86 Assembly language

I am writing a simple assembly program that will just execute windows commands. I will attach the current working code below. The code works if I hard code the base address of WinExec which is a function from Kernel32.dll, I used another program called Arwin to locate this address. However a reboot breaks this because of the windows memory protection Address Space Layout randomization (ASLR)
What I am looking to do is find a way to execute windows shell commands without having to hard code a memory address into my code that will change at the next reboot. I have found similar code around but nothing that I either understand or fits the purpose. I know this can be written in C but I am specifically using assembler to keep the size as small as possible.
Thanks for you advice/help.
;Just runs a simple netstat command.
;compile with nasm -f bin cmd.asm -o cmd.bin
[BITS 32]
global _start
section .text
_start:
jmp short command
function: ;Label
;WinExec("Command to execute",NULL)
pop ecx
xor eax,eax
push eax
push ecx
mov eax,0x77e6e5fd ;Address found by arwin for WinExec in Kernel32.dll
call eax
xor eax,eax
push eax
mov eax,0x7c81cafa
call eax
command: ;Label
call function
db "cmd.exe /c netstat /naob"
db 0x00
Just an update to say I found a way for referencing windows API hashes to perform any action I want in the stack. This negates the need to hard code memory addresses and allows you to write dynamic shellcode.
There are defenses against this however this would still work against the myriad of un-patched and out of date machines still around.
The following two sites were useful in finding what I needed:
http://blog.harmonysecurity.com/2009_08_01_archive.html
https://www.scriptjunkie.us/2010/03/shellcode-api-hashes/

Understanding assembly language _start label in a C program

I had written a simple c program and was trying to do use GDB to debug the program. I understand the use of following in main function:
On entry
push %ebp
mov %esp,%ebp
On exit
leave
ret
Then I tried gdb on _start and I got the following
xor %ebp,%ebp
pop %esi
mov %esp,%ecx
and $0xfffffff0,%esp
push %eax
push %esp
push %edx
push $0x80484d0
push $0x8048470
push %ecx
push %esi
push $0x8048414
call 0x8048328 <__libc_start_main#plt>
hlt
nop
nop
nop
nop
I am unable to understand these lines, and the logic behind this.
Can someone provide any guidance to help explain the code of _start?
Here is the well commented assembly source of the code you posted.
Summarized, it does the following things:
establish a sentinel stack frame with ebp = 0 so code that walks the stack can find its end easily
Pop the number of command line arguments into esi so we can pass them to __libc_start_main
Align the stack pointer to a multiple of 16 bits in order to comply with the ABI. This is not guaranteed to be the case in some versions of Linux so it has to be done manually just in case.
The addresses of __libc_csu_fini, __libc_csu_init, the argument vector, the number of arguments and the address of main are pushed as arguments to __libc_start_main
__libc_start_main is called. This function (source code here) sets up some glibc-internal variables and eventually calls main. It never returns.
If for any reason __libc_start_main should return, a hlt instruction is placed afterwards. This instruction is not allowed in user code and should cause the program to crash (hopefully).
The final series of nop instructions is padding inserted by the assembler so the next function starts at a multiple of 16 bytes for better performance. It is never reached in normal execution.
for gnu tools the _start label is the entry point of the program. for the C language to work you need to have a stack you need to have some memory/variables zeroed and some set to the values you chose:
int x = 5;
int y;
int fun ( void )
{
static int z;
}
all three of these variables x,y,z are essentially global, one is a local global. since we wrote it that way we assume that when our program starts x contains the value 5 and it is assumed that y is zero. in order for those things to happen, some bootstrap code is required and that is what happens (and more) between _start and main().
Other toolchains may choose to use a different label to define the entry/start point, but gnu tools use _start. there may be other things your tools require before main() is called C++ for example requires more than C.

Loading an exe from an exe

I am exporting a function [using _declspec(dllexport)] from a C++ exe. The function works fine when invoked by the exe itself. I am loading this exe (lets call this exe1) from another exe [the test project's exe - I'll call this exe2] using static linking i.e I use exe1's .lib file while compiling exe2 and exe2 loads it into memory on startup just like any dll. This causes the function to fail in execution.
The exact problem is revealed in the disassembly for a switch case statement within the function.
Assembly code when exe1 invokes the function
switch (dwType)
0040FF84 mov eax,dword ptr [dwType]
0040FF87 mov dword ptr [ebp-4],eax
0040FF8A cmp dword ptr [ebp-4],0Bh
0040FF8E ja $LN2+7 (40FFD2h)
0040FF90 mov ecx,dword ptr [ebp-4]
0040FF93 jmp dword ptr (40FFE0h)[ecx*4]
Consider the final two instructions. The mov moves the passed in argument into ecx. At 40EFF0h we have addresses to the various instructions for the respective case statements. Thus, the jmp would take us to the relevant case instructions
Assembly code when exe2 invokes the function
switch (dwType)
0037FF84 mov eax,dword ptr [dwType]
0037FF87 mov dword ptr [ebp-4],eax
0037FF8A cmp dword ptr [ebp-4],0Bh
0037FF8E ja $LN2+7 (37FFD2h)
0037FF90 mov ecx,dword ptr [ebp-4]
0037FF93 jmp dword ptr [ecx*4+40FFE0h]
Spot whats going wrong? The instruction addresses. The code has now been loaded into a different spot in memory. When exe1 was compiled, the compiler assumed that we will always be launching it and hence it would always be loaded at 0x0040000 [as is the case with all windows exes]. So it hard-coded a few values like 40FFE0h into the instructions. Only in the second case 40FFE0 is as good as junk memory since the instruction address table we are looking for is not located there.
How can I solve this without converting exe1 to a dll?
just don't do it. It doesn't worth the bother.
I've tried doing what you're trying a while ago. You can possibly solve the non-relocatable exe problem by changing the option in the properties window under "Linker->Advenced->Fixed base address" but then you'll have other problems.
The thing that finally made me realize its a waste of time is realizing that the EXE doesn't have a DllMain() function. This means that the CRT library is not getting initialized and that all sorts of stuff don't work the way you expect it to.
Here's the question I posted about this a while back
Have you considered another way of doing this? Such as making the 2nd .exe into a .dll and invoking it with rundll32 when you want to use it as an executable?
Otherwise:
The generated assembly is fine. The problem is that Win32 portable executables have a base address (0x0040000 in this case) and a section that contain details locations of addresses so that they can be rebased when required.
So one for two things is happening:
- Either the compiler isn't including the IMAGE_BASE_RELOCATION records when it builds the .exe.
- Or the runtime isn't performing the base relocations when it dynamiclaly loads the .exe
- (possibly both)
If the .exe does contain the relocation records, you can read them and perform the base relocation yourself. You'll have to jump through hoops like making sure you have write access to the memory (VirtualAlloc etc.) but it's conceptually quite simple.
If the .exe doesn't contain the relocation records you're stuffed - either find a compiler option to force their inclusion, or find another way to do what you're doing.
Edit: As shoosh points out, you may run into other problems once you fix this one.

Resources