I'm trying to learn some assembly, and I'm starting out by outputting text to the screen. I'm starting to think it might be my environment and/or compilation: by now, I'm so frustrated that I've literally copy-pasted assembly code but it just won't call the system calls. Here is the source code (mainly adapted from https://en.wikibooks.org/wiki/X86_Assembly/Interfacing_with_Linux)
.section .data
msg: .ascii "Hello World\n"
.section .text
.global main
main:
movq $1, %rdi # write to stdout
movq $msg, %rsi # use string "Hello World"
movq $12, %rdx # write 12 characters
syscall # make syscall
movq $60, %rax # use the _exit syscall
movq $0, %rdi # error code 0
syscall # make syscall
I'm on a 64-bit machine running Kali Linux, and am compiling with GCC. Like so:
gcc -c test.s
gcc test.o -no-pie
I've debugged the program with GDB and the syscall instruction always sets the eax register to 0xffffffffffffffda (-38) which does not seem right...
Can anyone give an insight?
Syscalls usually return a negative value in case of error, the absolute value being the errno value itself.
In your case 38is ENOSYS: Function not implemented.
But what syscall function are you calling? Let's see, the function number is stored into rax (eax in 32-bits) before issuing the syscall and your program loads... nothing!
It looks like you lost one line in your copy/paste:
movq $1, %rax ; use the write syscall
Your code is missing the first instruction from the sample code:
movq $1, %rax ; use the write syscall
Without this code, it ends up executing an unexpected (and probably invalid) system call, based on whatever happened to be in %rax when main was called.
Related
This question already has an answer here:
Nasm segmentation fault on RET in _start
(1 answer)
Closed 11 months ago.
I wrote some simple assembly code like below:
.global _start
.text
_start:
call _sum
subq $0x8, %rsp
popq %rax
ret
_sum:
ret
In order to get the value of %rax after 'popq' instruction,
I assembled that code using 'as' and 'ld' command.
and I started gdb debugger by putting break point at '_start'
and the result comes like below:
B+> │0x400078 <_start> callq 0x400083 <_sum>
│ │0x40007d <_start+5> sub $0x8,%rsp
│ │0x400081 <_start+9> pop %rax
│ │0x400082 <_start+10> retq
│ │0x400083 <_sum> retq
However, before going into pop instruction,
There comes an error message saying that
Program received signal SIGSEGV, Segmentation fault.
Cannot access memory at address 0x1
(when I changed the $0x8 into $0x0~$0x7, it all worked.)
It seems like at the first stage the sum function might be the problem. because It literally does nothing but return.
So, How can I modify this code to get the value of %rax after the popq instruction?
Thanks.
I think probably this question is a duplicate, but anyway, there is one problem in your code.
.global _start
.text
_start:
call _sum
subq $0x8, %rsp
popq %rax
ret # <-- return to where?
_sum:
ret
A main in C has to can return because _start eventually calls main, but here, you are writing _start directly. It returns to nowhere if you put a ret.
In place of ret, put this instead.
movl $60, %eax # syscall number for sys_exit
movl $0, %edi # or whatever value you want your process
# to return with from 0 to 255;
# xor %edi, %edi is usually better if you want 0
syscall
Leave a comment if it still crashes.
BTW, I was assuming your platform is Linux (because of the AT&T syntax..). The syscalls can be different for a different platform.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
I'm trying to write an assembly program for Linux assembly x86 that erases a file from a directory. Any tips?
maybe something like this:
.section .data
fpath:
.asciz "/home/user/filename" # path to file to delete
.section .text
.globl _start
_start:
movl $10, %eax # unlink syscall
movl $fpath, %ebx # path to file to delete
int $0x80
movl %eax, %ebx # put syscall ret value in ebx
movl $1, %eax # exit syscall
int $0x80
then check the return value at the command line, 0 being success.
$>echo $?
I tried this code and it would not unlink a file if invoked from the same
directory as the file to delete, it would unlink files from other dirs.
This is 32 bit code so on a 64 bit os if your source file was un-link.s,
you would need to create the executable with:
$> as --32 -gdwarf2 un-link.s -o un-link.o
leave out the -gdwarf2 if you don't need to run it with gdb debugger,
then link with:
$> ld -m elf_i386 un-link.o -o un-link
hope it works for you
I see alot of people using int 0x80 for this, this is deprecated and for most things, you should not use it because it is extremely slow and is subject to possible removal at any point in future. The only cases where you should use int 0x80 would be in a case where space saving is much more important than speed or where sysenter is not supported on the target platform (pre-Pentium 4 CPUs).
In x86 Linux syscalls work by having the syscall code in eax and any arguments for it in the successive registers, ebx, ecx etc. The return value of the syscall will be placed in eax.
mov eax, 0xa ;0xa is the 'unlink' syscall, which removes a file
mov ebx, <location in memory of a null terminated string containing a path>
push <LABEL TO JUMP TO AFTER SYSENTER IS COMPLETE>
push ecx
push edx
push ebp
mov ebp, esp
sysenter
If you don't care about your registers being filled with garbage after the syscall finishes (such as in cases where you'll be overwriting the contents of those registers immediately anyway) you can use this trick to save some space and speed things up a little:
mov eax, 0xa ;0xa is the 'unlink' syscall, which removes a file
mov ebx, <location in memory of a null terminated string containing a path>
push <LABEL TO JUMP TO AFTER SYSENTER IS COMPLETE>
lea ebp, [esp-12]
sysenter
If you want to write assembly code that erases a file from a directory, you can use multiple methods, but one of the best ways is to use the linux systemcall 'unlink'. In order to have to use the proper system call, you have to figure out if you have an x86 system, or an x86-64 system, or any other kind of system. I will specify how to do this using only x86/x86-64.
So system calls work as following in linux:
1.
You put a number into return register (eax or rax depending on system) that corresponds to a system call's number, like $1 is sys_exit on x86 and $29 is sys_pause, you have to make sure you get the right number in %eax or it won't work. (I feel the need to emphasize this, this number is SYSTEM DEPENDENT, so an x86 syscall number in %eax will not do the same thing in an x86-64 system). Also, OS's may tamper with this as well, I will only talk about linux, I can't speak for any other OS.
2.
Then you move the arguments into the proper registers that is specified BY THE SYSTEM, again, find a reference for your specific system, and you will know which registers to put your arguments for the function into (some syscalls don't need arguments, so this step is unnecessary, why would you need an argument for sys_exit, idk).
3.
Then finally you use syscall for your system to let the system know to run a specific function (the one you specified by putting a number into %eax/%rax).
Let's look at an example of some assembly code that deletes a file in two different systems, x86, and x86-64 (windows has its own method, but I will not talk about that one, although it does exist). If I wanted to delete/unlink the file stored at address 0x7fff50 (let's say you don't want to specify fpath and you know the address)
In x86:
movl $10, %eax # defines which systemcall we are using (10th)
movl $0x7fff50, %ebx # moves the address of file we want to delete into %ebx (this is where the argument of sys_unlink(x86) is stored)
int $0x80 # equivalent of syscall that starts the appropriate function
In x86-64:
movq $87, %rax # defines which systemcall we are using (87th)
movq $0x7fff50, %rdi # moves the address of file we want to delete into %rdi (this is where the argument of sys_unlink(x86-64) is stored)
syscall # starts the appropriate function
If you are more of a normal person and don't know the absolute address of the file you are trying to delete, we have an easy way to write relative addresses of the file itself, like so (as copied from the other answer)
.section .data
fpath:
.asciz "/home/user/filename" # path to file to delete
then the system itself will figure out where the absolute path (something like 7fff34fb) to the file is, and move that into the register when you compile the assembly. Might look like this
In x86-64:
.section .data
fpath:
.asciz "/home/user/filename" # path to file to delete
.section .text
.globl _start
_start:
movq $87, %rax # defines which systemcall we are using (87th)
movq $fpath, %rdi # moves the address of file we want to delete into %rdi
# (this is where the argument of sys_unlink(x86-64) is stored)
syscall # starts the appropriate function
You can access the output of the function (usually 0 for success or 1 for fail status) by looking at the %eax register. After each syscall, an integer is returned in %eax
You can put this text into a text file 'program.s' (I would suggest using notepad/notepad++, '.s' stands for assembly code) and compile it using gcc with $ gcc -c program.s to get the object code, and you can print and look at the object code and the hex keys involved with each command using $ objdump -d program.o
For more information about how to look up which syscall is which number and which register to put the arguements in, go here:
For x86-64: https://filippo.io/linux-syscall-table/
movl %ebx, %esi
movl $.LC1, %edi
movl $0, %eax
call printf
I use the following asm code to print what is in EBX register. When I use
movl $1,%eax
int 0x80
and the echo $? I get the correct answer but segmentation fault in the first case. I am using the GNU Assembler and AT&T syntax. How can I fix this problem?
Judging by the code, you are probably in 64 bit mode (please confirm) in which case pointers are 64 bit in size. In a position-depended executable on Linux movl $.LC1, %edi is safe and what compilers use, but to make your code position-independent and able to handle symbol addresses being outside the low 32 bits you can use leaq .LC1(%rip), %rdi.
Furthermore, make sure that:
you are preserving value of rbx in your function
stack pointer is aligned as required
This code works for me in 64 bit:
.globl main
main:
push %rbx
movl $42, %ebx
movl %ebx, %esi
leaq .LC1(%rip), %rdi
movl $0, %eax
call printf
xor %eax, %eax
pop %rbx
ret
.data
.LC1: .string "%d\n"
Edit: As Jester noted, this answer only applies to x86 (32 bits) asm whereas the sample provided is more likely for x86-64.
That's because printf has a variable number of arguments. The printf call doesn't restore the stack for you, you need to do it yourself.
In your example, you'd need to write (32 bits assembly):
push %ebx
push $.LC1
call printf
add $8, %esp // 8 : 2 argument of 4 bytes
For my own sick pleasure, I'm writing a small program in x86_64 assembly for Linux. However, I've encountered a segfault that makes absolutely no sense to me, in an instruction comparing an immediate operand to a register. What gives?
Here's the code leading up to the crash:
_start:
sub $8, %rsp
mov %rsp, %rbx
lea le_string(%rip), %rsi
mov %rsi, %rdi
add $8, %rdi
mov $26, %cl
mov (%rsi), %al
cmp 'A', %al /* This line segfaults */
/* snip code that never runs */
le_string:
.ascii "YrFgevat"
I'm assembling with gcc -nostdlib, which is calling the GNU assembler.
Dumping the registers after the crash reveals:
%rsi contains the expected pointer to the string
%al contains the expected first character in the string
%rip points to an instruction that doesn't touch memory
Please ignore the lack of normal calling conventions—I'm not calling out to anything besides the syscall interface, and this crashes before it's even gotten that far!
'A' is being interpreted as an address after all. If you want to use it as a constant, you need to write:
cmp $'A', %al
This description is valid for Linux 32 bit:
When a Linux program begins, all pointers to command-line arguments are stored on the stack. The number of arguments is stored at 0(%ebp), the name of the program is stored at 4(%ebp), and the arguments are stored from 8(%ebp).
I need the same information for 64 bit.
Edit:
I have working code sample which shows how to use argc, argv[0] and argv[1]: http://cubbi.com/fibonacci/asm.html
.globl _start
_start:
popq %rcx # this is argc, must be 2 for one argument
cmpq $2,%rcx
jne usage_exit
addq $8,%rsp # skip argv[0]
popq %rsi # get argv[1]
call ...
...
}
It looks like parameters are on the stack. Since this code is not clear, I ask this question. My guess that I can keep rsp in rbp, and then access these parameters using 0(%rbp), 8(%rbp), 16(%rbp) etc. It this correct?
Despite the accepted answer being more than sufficient, I would like to give an explicit answer, as there are some other answers which might confuse.
Most important (for more information see examples below): in x86-64 the command line arguments are passed via stack:
(%rsp) -> number of arguments
8(%rsp) -> address of the name of the executable
16(%rsp) -> address of the first command line argument (if exists)
... so on ...
It is different from the function parameter passing in x86-64, which uses %rdi, %rsi and so on.
One more thing: one should not deduce the behavior from reverse engineering of the C main-function. C runtime provides the entry point _start, wraps the command line arguments and calls main as a common function. To see it, let's consider the following example.
No C runtime/GCC with -nostdlib
Let's check this simple x86-64 assembler program, which do nothing but returns 42:
.section .text
.globl _start
_start:
movq $60, %rax #60 -> exit
movq $42, %rdi #return 42
syscall #run kernel
We build it with:
as --64 exit64.s -o exit64.o
ld -m elf_x86_64 exit64.o -o exit64
or with
gcc -nostdlib exit64.s -o exit64
run in gdb with
./exit64 first second third
and stop at the breakpoint at _start. Let's check the registers:
(gdb) info registers
...
rsi 0x0 0
rdi 0x0 0
...
Nothing there. What about the stack?
(gdb) x/5g $sp
0x7fffffffde40: 4 140737488347650
0x7fffffffde50: 140737488347711 140737488347717
0x7fffffffde60: 140737488347724
So the first element on the stack is 4 - the expected argc. The next 4 values look a lot like pointers. Let's look at the second pointer:
(gdb) print (char[5])*(140737488347711)
$1 = "first"
As expected it is the first command line argument.
So there is experimental evidence, that the command line arguments are passed via stack in x86-64. However only by reading the ABI (as the accepted answer suggested) we can be sure, that this is really the case.
With C runtime
We have to change the program slightly, renaming _start into main, because the entry point _start is provided by the C runtime.
.section .text
.globl main
main:
movq $60, %rax #60 -> exit
movq $42, %rdi #return 42
syscall #run kernel
We build it with (C runtime is used per default):
gcc exit64gcc.s -o exit64gcc
run in gdb with
./exit64gcc first second third
and stop at the breakpoint at main. What is at the stack?
(gdb) x/5g $sp
0x7fffffffdd58: 0x00007ffff7a36f45 0x0000000000000000
0x7fffffffdd68: 0x00007fffffffde38 0x0000000400000000
0x7fffffffdd78: 0x00000000004004ed
It does not look familiar. And registers?
(gdb) info registers
...
rsi 0x7fffffffde38 140737488346680
rdi 0x4 4
...
We can see that rdi contains the argc value. But if we now inspect the pointer in rsi strange things happen:
(gdb) print (char[5])*($rsi)
$1 = "\211\307???"
But wait, the second argument of the main function in C is not char *, but char ** also:
(gdb) print (unsigned long long [4])*($rsi)
$8 = {140737488347644, 140737488347708, 140737488347714, 140737488347721}
(gdb) print (char[5])*(140737488347708)
$9 = "first"
And now we found our arguments, which are passed via registers as it would be for a normal function in x86-64.
Conclusion:
As we can see, the is a difference concerning passing of command line arguments between code using C runtime and code which doesn't.
It looks like section 3.4 Process Initialization, and specifically figure 3.9, in the already mentioned System V AMD64 ABI describes precisely what you want to know.
I do believe what you need to do is check out the x86-64 ABI. Specifically, I think you need to look at section 3.2.3 Parameter Passing.