dlsym crash when called from assembler - linux

I have a small program in assembler that loads an .so file using dlopen, and then tries to load a function pointer using dlsym. Calling dlopen seems to be fine but it crashes when I call dlsym.
SECTION .text
;default rel
EXTERN dlopen ; loads a dynamic library
EXTERN dlsym ; retrieves the address for a symbol in the dynamic library
; inputs:
; rdi: the pointer to print
printHex:
sub rsp, 19 ; allocate space for the string 0x0123456789ABCDEF\n
mov BYTE [rsp + 0], '0'
mov BYTE [rsp + 1], 'x'
xor rcx, rcx ; int loop variable to 0
.LOOP1:
lea rsi, [rsp + rcx] ; rsi will be the offset where we will store the next hex character
mov rax, rdi
and rax, 0xf
sar rdi, 4 ; shift right 4 bits (divide by 16)
lea rdx, [hexLookUp + rax]
mov bl, [rdx]
mov BYTE [rsi +18], bl
dec rcx ; rcx--
cmp rcx, -16 ; while rcx > -16
jne .LOOP1
mov BYTE [rsp + 18], 10
; print
mov rax, 1 ; syscall: write
mov rdi, 1 ; stdout
mov rsi, rsp
mov rdx, 19
syscall
; release stack memory
add rsp, 19
ret
global _start ; "global" means that the symbol can be accessed in other modules. In order to refer to a global symbol from another module, you must use the "extern" keyword
_start:
; load the library
mov rdi, str_libX11so
mov rsi, 2; RTLD_NOW=2
call dlopen wrt ..plt
; PLT stands for Procedure Linkage Table:
; used to call external library functions whose address is not known at link time,
; so it must be resolved by the dynamic linker at run time
; more info: https://reverseengineering.stackexchange.com/questions/1992/what-is-plt-got
mov [ptr_libX11so], rax ; the previous function call returned the value in rax
mov rdi, rax
call printHex
; load the function
mov rdi, [str_libX11so]
mov rsi, fstr_XOpenDisplay
call dlsym wrt ..plt
mov [fptr_XOpenDisplay], rax
mov rdi, rax
call printHex
mov rax, 60 ; syscall: exit
mov rdi, 0 ; return code
syscall
hexLookUp: db "0123456789ABCDEF"
str_libX11so: db "libX11.so", 0
; X11 function names
fstr_XOpenDisplay: db "XOpenDisplay", 0
SECTION .data
ptr_libX11so: dq 0 ; ptr to the X11 library
; X11 function ptrs
fptr_XOpenDisplay: dq 0
I have tried to make the same program in C and it seems to work. So I must be doing something wrong.
extern void* dlopen(const char* name, int);
extern void* dlsym(void* restrict handle, const char* restrict name);
int main()
{
void* libX11so = dlopen("libX11.so", 2);
void (*XOpenDisplay)() = dlsym(libX11so, "XOpenDisplay");
}
I tried to disassemble the C version and compare, but I still can't figure out what the problem is.
An interesting thing I noticed is that the pointer returned by dlopen (which is different in each execution) is quite small in the asm version compared to the C version (e.g. 0x0000000001A932D vs 0x5555555592d0). But that could be because I'm using the -no-pie flag for linking:
nasm -f elf64 -g -F dwarf minimal.asm && gcc -nostartfiles -no-pie minimal.o -ldl -o minimal && ./minimal

I just noticed my mistake:
; load the function
mov rdi, [str_libX11so]
should be:
; load the function
mov rdi, [ptr_libX11so]
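For comparison, here is a minimal C sketch of the same two-step lookup. The helper name lookup_symbol is hypothetical; dlopen, dlsym and RTLD_NOW are the real <dlfcn.h> interfaces. It illustrates the point of the fix: the first argument to dlsym is the handle returned by dlopen, not the library name string.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: open `lib` (NULL means the running program and the
 * libraries it has already loaded) and look up `name` in it.
 * Returns NULL if either step fails. */
void *lookup_symbol(const char *lib, const char *name)
{
    assert(name != NULL);

    /* dlopen returns an opaque handle; this is the value the assembly
     * version stores in ptr_libX11so. */
    void *handle = dlopen(lib, RTLD_NOW);
    if (handle == NULL)
        return NULL;

    /* dlsym takes that handle as its first argument. Passing the bytes of
     * the name string instead (as in the original rdi load) is the bug.
     * The handle is deliberately not closed so that the returned pointer
     * stays valid. */
    return dlsym(handle, name);
}
```

Under these assumptions, lookup_symbol("libX11.so", "XOpenDisplay") would correspond to the two calls in the question. Depending on the glibc version, linking may still need -ldl, as in the build line above.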


msvc x64: how to omit extra size check before __chkstk() for _alloca(size_t)

When _alloca(size) is called with a size known only at run time, msvc x64 v19.* calls __chkstk(), but it also emits extra code that checks whether size+15 overflows; if it does, the size is set to 0x0ffffffffffffff0. See: godbolt.org/z/YT4xE8s4q
extern void * _alloca(size_t); //x64 msvc v19.*
int f()
{
size_t n = 3 & (size_t)f;
void * p = _alloca(n);
return 3 & (int)(size_t)p;
}
Compiled by x64 msvc v19.latest with options -O2 -GS-:
f PROC ; COMDAT
$LN5:
push rbp
sub rsp, 32 ; 00000020H
lea rbp, QWORD PTR [rsp+32]
lea rax, OFFSET FLAT:f
and eax, 3 ; eax = n
lea rcx, QWORD PTR [rax+15] ; rcx = n+15 for 16-byte align
cmp rcx, rax ; checks overflow
ja SHORT $LN3@f ; normally n+15 is above n
mov rcx, 1152921504606846960 ; 0ffffffffffffff0H
$LN3@f:
and rcx, -16 ; align the size
mov rax, rcx ; rax = argument for __chkstk
call __chkstk ; probe stack pages in sequence
sub rsp, rcx ; do allocation after probe
lea rax, QWORD PTR [rsp+32]
and eax, 3
mov rsp, rbp
pop rbp
ret 0
f ENDP
Such an "overflow check" is needless for a normal program; clang and gcc do not emit one.
It is unexpected that every _alloca gets these extra instructions (cmp + ja + mov = 15 bytes).
I tried __assume(n+15>n) and __assume(n<0xFFFFu), but neither helps; both seem to be ignored.
I guess the msvc backend (c2.dll) uses some hardcoded "snippet" to handle _alloca().
So the question is: is there an option, documented or undocumented, to disable the "overflow check"?
Or is there some global flag that controls the compiler backend's "snippet"?
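To make the size computation in the listing easier to follow, here is a small C model of it. The function name alloca_probe_size is hypothetical, and this is only a sketch of what the emitted instructions compute, not MSVC's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the size passed to __chkstk in the listing. */
uint64_t alloca_probe_size(uint64_t n)
{
    uint64_t padded = n + 15;            /* lea rcx, [rax+15]; wraps on overflow */

    if (!(padded > n))                   /* cmp rcx, rax / ja: skipped when no wrap */
        padded = 0x0ffffffffffffff0ULL;  /* mov rcx, 0ffffffffffffff0H */

    padded &= ~(uint64_t)15;             /* and rcx, -16: round down to 16 bytes */

    assert((padded & 15) == 0);          /* result is always 16-byte aligned */
    return padded;
}
```

For the n in the question (0 to 3) the branch is always taken and the result is 0 or 16; the fallback constant is only reached when n is within 15 of the maximum 64-bit value.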

Unable to use fopen More Than Once in YASM Assembly

Essentially what I am attempting to do is make a copy of a file--really, it's a lot more, but I can't get past this first hurdle--using YASM assembly on x86_64 Linux. My problem is that I seem to be unable to use fopen more than once.
My code so far:
segment .data
RW dd "w+", 0 ; RW -- read write (creates)
RO dd "r" , 0 ; RO -- read only
po dq 0 ; po -- pointer original
pn dq 0 ; pn -- pointer new
segment .text
global border
extern fopen
extern fclose
extern fputc
border:
push rbp
mov rbp, rsp
mov r8, rdi ; r8 -- the original file name
mov r9, rsi ; r9 -- the destination file
mov rdi, r8
lea rsi, [RO]
call fopen
mov [po], rax
mov rdi, r9
lea rsi, [RW]
call fopen
mov [pn], rax
mov rdi, "B" ; Just a test to know if worked.
mov rsi, [pn]
call fputc
EXIT:
mov rdi, [po]
call fclose
mov rdi, [pn]
call fclose
mov rsp, rbp
pop rbp
ret
And it's called from the following C program:
char* source = "TestNorm.txt";
char* dest = "TestDest.txt";
border(source, dest);
I've tried a few things, but ultimately it comes down to the second fopen not working--the file is not opened, and I obviously get a seg fault when I try to use the file pointer for it--but the first one works perfectly.
I'm utterly stumped on this one. Any help would be much appreciated.
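As a reference for what the routine is meant to do, here is a minimal C sketch of the same sequence. The name border_c and the return convention are hypothetical. In C the compiler keeps dest alive across the first fopen; in hand-written assembly under the System V AMD64 calling convention, r8 and r9 are caller-saved, so a value left in r9 may not survive a call, which is one thing worth checking here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical C counterpart of border(): open both files, write a test
 * byte to the destination, close both. Returns 0 on success, -1 on failure. */
int border_c(const char *source, const char *dest)
{
    assert(source != NULL && dest != NULL);

    FILE *po = fopen(source, "r");      /* po -- pointer original */
    if (po == NULL)
        return -1;

    FILE *pn = fopen(dest, "w+");       /* pn -- pointer new (created) */
    if (pn == NULL) {
        fclose(po);
        return -1;
    }

    int ok = (fputc('B', pn) != EOF);   /* same test byte as the assembly */
    if (fclose(pn) != 0)
        ok = 0;
    fclose(po);
    return ok ? 0 : -1;
}
```

Checking both fopen results before using them turns the segfault into a plain error return, which makes it easier to see which of the two calls failed.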

Printing `argv[]` with nasm

I'm trying to print the command line arguments given to my program, using nasm:
GLOBAL main
EXTERN printf
section .rodata
fmt db "Argument: %s", 10, 0
section .text
main:
push ebp ; push ebp0
mov ebp, esp ; [ebp1] == ebp0
push dword[ebp+8] ; push argc
call print_args
mov eax, 0 ; return(0)
mov esp, ebp ; pop
pop ebp ; stack frame
ret
print_args:
push ebp ; push ebp1
mov ebp, esp ; [ebp2] == ebp1
mov edi, dword[ebp+8] ; [ebp+8] == argc
jmp lop
postlop:
mov esp, ebp
pop ebp
ret
lop:
sub edi, 1
cmp edi, 0
jz postlop
mov esi, [ebp] ; [esi] == ebp1
mov ebx, [esi + 12] ; [esi+12] = [ebp1+12] = argv[0]?
push ebx
push fmt
call printf
jmp lop
However, this prints only garbage (I believe this should print argv[0], argc-1 times.).
I'm compiling my code with:
nasm -f elf32 main.asm
gcc -m32 main.o -o main.out
What is wrong?
By the way, using dword[ebp+8] works correctly to pick up argc.
I'm running this on ubuntu. Program does output Argument: ... argc-1 times, but the ... is garbage.
Just as [ebp+8] is argc, [esi + 12] is argv, i.e. the address of the array of argument addresses. Thus, in order to find argv[0], you have to dereference once more.
mov esi, [ebp] ; [esi] == ebp1
mov ebx, [esi + 12] ; [esi+12] = [ebp1+12] = argv
push dword [ebx] ; [ebx] = argv[0]
;^^^^^^^^^^^
push fmt
call printf
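The extra dereference in the answer above can be seen in C terms as well. This is a minimal sketch; format_args is a hypothetical helper that writes into a buffer instead of calling printf directly, and the format string is the one from the question.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: append "Argument: <arg>\n" for each argument to `out`.
 * Returns the number of characters written (excluding the final NUL). */
size_t format_args(int argc, char **argv, char *out, size_t cap)
{
    assert(argv != NULL && out != NULL && cap > 0);

    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < argc; i++) {
        /* argv is the address of the pointer array (what [ebp1+12] holds);
         * argv[i] is one more dereference and yields the string itself. */
        int n = snprintf(out + used, cap - used, "Argument: %s\n", argv[i]);
        if (n < 0 || (size_t)n >= cap - used) {
            out[used] = '\0';           /* drop a partially written entry */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

Passing argv itself where a string is expected, as the original assembly does, is the C equivalent of printf("%s", (char *)argv), which prints the bytes of the pointer array.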
I worked on this, and this is all you need.
Assemble it as follows (on my 32-bit Debian 9 VM):
$ nasm -felf32 -g args.s -o args.o
$ gcc -no-pie args.o -o args
segment .data
format db "%s",0x0a,0
segment .text
global main ; let the linker know about main
extern printf ; resolve printf from libc
main:
push ebp ; prepare stack frame for main
mov ebp, esp
sub esp, 8
mov edi, dword[ebp+8] ; get argc into edi
mov esi, dword[ebp+12] ; get argv (pointer to the array of argument pointers) into esi
start_loop:
xor eax, eax
push dword [esi] ; must dereference esi; points to argv
push format
call printf
add esi, 4 ; advance to the next pointer in argv
dec edi ; decrement edi from argc to 0
cmp edi, 0 ; when it hits 0, we're done
jnz start_loop ; loop until all argc arguments are printed
end_loop:
xor eax, eax
leave
ret

Segmentation fault ASM on linux at printf

The following is a program from a book (Introduction to 64 Bit Intel Assembly Language Programming for Linux, by Seyfarth, 2012), chap 9. The fault (in gdb) is:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7aa10a5 in __printf_size (fp=0x400400, info=0x0,
args=) at printf_size.c:199
199 printf_size.c: No such file or directory.
Until this chapter, I successfully used the following to "produce an object file", as recommended,
yasm -f elf64 -g dwarf2 -l exit.lst exit.asm
and then,
ld -o prgm prgm.o
This is the program as copied from the book (line 10 is push rbp; with a stray semicolon; I first rem'd the ; but got the same result):
segment .text
global main
extern printf
; void print_max ( long a, long b )
; {
a equ 0
b equ 8
print_max:
push rbp; ;normal stack frame
mov rbp, rsp
; leave space for a, b and max
sub rsp, 32
; int max;
max equ 16
mov [rsp+a], rdi ; save a
mov [rsp+b], rsi ; save b
; max = a;
mov [rsp+max], rdi
; if ( b > max ) max = b;
cmp rsi, rdi
jng skip
mov [rsp+max], rsi
skip:
; printf ( "max(%1d,%1d ) = %1d\n",
; a, b, max );
segment .data
fmt db 'max(%1d,%1d) = %1d',0xa,0
segment .text
lea rdi, [fmt]
mov rsi, [rsp+a]
mov rdx, [rsp+b]
mov rcx, [rsp+max]
call printf
; }
leave
ret
main:
push rbp
mov rbp, rsp
; print_max ( 100, 200 );
mov rdi, 100 ;first parameter
mov rsi, 200 ;second parameter
call print_max
xor eax, eax ;to return 0
leave
ret
After a similar segmentation fault with a previous program in this chap ("Hello World" example), I used
gcc -o prgm prgm.o
which had worked until this program.
Using gcc to link is the easiest way to go if you are going to use functions from the C library, since gcc takes care of a few things for you "behind the scenes" (it links the C startup code and libc, and sets the dynamic linker).
To use just ld, you need to name ld-linux-x86-64.so.2 as the dynamic linker and pass -lc to link against the C library.
Next, you are calling printf incorrectly. printf is variadic, so al must hold the number of vector registers used for its arguments; since you pass no floating-point arguments, you need to zero rax before the call.
Also, when you link with plain ld there is no C startup code for main to return to, so you cannot just ret from main; call exit instead.
lea rdi, [fmt]
mov rsi, [rsp+a]
mov rdx, [rsp+b]
mov rcx, [rsp+max]
xor rax, rax ; # of floating point registers used.
call printf
and:
; print_max ( 100, 200 );
mov rdi, 100 ;first parameter
mov rsi, 200 ;second parameter
call print_max
xor eax, eax ;to return 0
leave
xor rdi, rdi
call exit
ld -o $(APP) $(APP).o -lc -I/lib64/ld-linux-x86-64.so.2
and the output:
max(100,200) = 200
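For reference, the C routine that the listing's comments describe can be sketched as follows. format_max is a hypothetical name, and it writes into a buffer rather than to stdout. The sketch assumes %ld as the conversion, since a and b are long, whereas the listing as posted has %1d. When C code calls a variadic function such as snprintf or printf, the compiler sets al itself, which is the step the hand-written assembly was missing.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical buffer-based counterpart of print_max(long a, long b).
 * Returns the number of characters snprintf would write. */
int format_max(char *out, size_t cap, long a, long b)
{
    assert(out != NULL && cap > 0);

    long max = a;                       /* max = a; */
    if (b > max)                        /* if ( b > max ) max = b; */
        max = b;

    /* Variadic call: the compiler handles the vector-register count
     * that the assembly version has to put in al by hand. */
    return snprintf(out, cap, "max(%ld,%ld) = %ld\n", a, b, max);
}
```

Calling format_max(buf, sizeof buf, 100, 200) fills buf with the same line as the output shown above.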
Gunner gave an excellent summary. The program should have placed a 0 in rax. This can be done with "xor eax, eax", which is the normal way to zero a register in x86-64 mode: writing a 32-bit register zeroes the top half of the 64-bit register, and the lower half is the XOR of the two operands (with eax, eax the result is 0).

x64 bit assembly

I started assembly (nasm) programming not too long ago. I wrote a function in assembly, callable from C, that prints an integer. I got it working using the 32-bit extended registers (eax, ebx, ..), but when I rewrite it with the x64 registers (rax, rbx, ..) my implementation fails. Do any of you see what I missed?
main.c:
#include <stdio.h>
extern void printnum(int i);
int main(void)
{
printnum(8);
printnum(256);
return 0;
}
32 bit version:
; main.c: http://pastebin.com/f6wEvwTq
; nasm -f elf32 -o printnum.o printnum.asm
; gcc -o printnum printnum.o main.c -m32
section .data
_nl db 0x0A
nlLen equ $ - _nl
section .text
global printnum
printnum:
enter 0,0
mov eax, [ebp+8]
xor ebx, ebx
xor ecx, ecx
xor edx, edx
push ebx
mov ebx, 10
startLoop:
idiv ebx
add edx, 0x30
push dx ; With an odd number of digits this will screw up the stack, but that's ok
; because we'll reset the stack at the end of this function anyway.
; Needs fixing though.
inc ecx
xor edx, edx
cmp eax, 0
jne startLoop
push ecx
imul ecx, 2
mov edx, ecx
mov eax, 4 ; Prints the string (from stack) to screen
mov ebx, 1
mov ecx, esp
add ecx, 4
int 80h
mov eax, 4 ; Prints a new line
mov ebx, 1
mov ecx, _nl
mov edx, nlLen
int 80h
pop eax ; returns the amount of used characters
leave
ret
x64 version:
; main.c : http://pastebin.com/f6wEvwTq
; nasm -f elf64 -o object/printnum.o printnum.asm
; gcc -o bin/printnum object/printnum.o main.c -m64
section .data
_nl db 0x0A
nlLen equ $ - _nl
section .text
global printnum
printnum:
enter 0, 0
mov rax, [rbp + 8] ; Get the function args from the stack
xor rbx, rbx
xor rcx, rcx
xor rdx, rdx
push rbx ; The 0 byte of the string
mov rbx, 10 ; Divisor
startLoop:
idiv rbx ; modulo is in rdx
add rdx, 0x30
push dx
inc rcx ; increase the loop variable
xor rdx, rdx ; resetting the modulo
cmp rax, 0
jne startLoop
push rcx ; push the counter on the stack
imul rcx, 2
mov rdx, rcx ; string length
mov rax, 4
mov rbx, 1
mov rcx, rsp ; the string
add rcx, 4
int 0x80
mov rax, 4
mov rbx, 1
mov rcx, _nl
mov rdx, nlLen
int 0x80
pop rax
leave
ret ; return to the C routine
Thanks in advance!
I think your problem is that you're trying to use the 32-bit calling conventions in 64-bit mode. That won't fly, not if you're calling these assembly routines from C. The 64-bit calling convention is documented here: http://www.x86-64.org/documentation/abi.pdf
Also, don't open-code system calls. Call the wrappers in the C library. That way errno gets set properly, you take advantage of sysenter/syscall, you don't have to deal with the differences between the normal calling convention and the system-call argument convention, and you're insulated from certain low-level ABI issues. (Another of your problems is that write is system call number 1, not 4, for Linux/x86-64.)
Editorial aside: There are two, and only two, reasons to write anything in assembly nowadays:
1. You are writing one of the very few remaining bits of deep magic that cannot be written in C alone (a good example is the guts of libffi).
2. You are hand-optimizing an inner-loop subroutine that has been measured to be performance-critical and the C compiler doesn't do a good enough job on.
Otherwise just write whatever it is in C. Your successors will thank you.
EDIT: checked system call numbers.
I'm not sure if this answer is related to the problem you're seeing (since you didn't specify anything about what the failure is), but 64-bit code has a different calling convention than 32-bit code does. Both of the major 64-bit Intel ABIs (Windows & Linux/BSD/Mac OS) pass function parameters in registers and not on the stack. Your program appears to still be expecting them on the stack, which isn't the normal way to go about it.
Edit: Now that I see there is a C main() routine that calls your functions, my answer is exactly about the problem you're having.
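Following the advice above to call the C library wrappers instead of open-coding system calls, a printnum replacement might look like this. It is a minimal sketch: int_to_decimal and printnum_fd are hypothetical names, and only write() from <unistd.h> is a real interface. In C the argument arrives however the platform ABI dictates (in a register on x86-64), so nothing is read from [rbp+8].

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write the decimal digits of `value` into `out`
 * (at least 11 bytes, no NUL terminator) and return the length. */
size_t int_to_decimal(int value, char *out)
{
    assert(out != NULL);

    char tmp[12];
    size_t n = 0;
    /* Work on the unsigned magnitude so that INT_MIN does not overflow. */
    unsigned int magnitude =
        value < 0 ? 0u - (unsigned int)value : (unsigned int)value;

    do {                                /* digits come out least significant first */
        tmp[n++] = (char)('0' + magnitude % 10);
        magnitude /= 10;
    } while (magnitude != 0);

    size_t len = 0;
    if (value < 0)
        out[len++] = '-';
    while (n > 0)                       /* reverse into the output buffer */
        out[len++] = tmp[--n];
    return len;
}

/* Hypothetical printnum variant: print `i` and a newline to `fd` through
 * the write() wrapper. Returns what write() returns. */
ssize_t printnum_fd(int fd, int i)
{
    char buf[13];
    size_t len = int_to_decimal(i, buf);
    buf[len++] = '\n';
    return write(fd, buf, len);
}
```

With these assumptions, printnum(8) from main.c corresponds to printnum_fd(1, 8), where 1 is the file descriptor for stdout.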
