I wrote this code, which reads from stdin and writes to stdout:
#include <stdio.h>
#include <unistd.h>
int main() /* copy input to output */
{
char buf[BUFSIZ];
int n;
while ((n = read(0, buf, BUFSIZ)) > 0)
write(1, buf, n);
return 0;
}
I then converted it into assembly code (a .s file) in 32-bit AT&T syntax:
.text
.globl _start
_start:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp #16-byte alignment
subl $8224, %esp #space for local variables
jmp _READ
_WRITE:
movl 8220(%esp), %eax
movl %eax, 8(%esp)
leal 28(%esp), %eax
movl %eax, 4(%esp)
movl $1, (%esp)
call write
int $0x80
_READ:
movl $8192, 8(%esp) #buffer length
leal 28(%esp), %eax
movl %eax, 4(%esp)
movl $0, (%esp)
call read
movl %eax, 8220(%esp)
cmpl $0, 8220(%esp)
jg _WRITE
movl $0, %eax
leave
ret
It works fine, but I'm not sure how to make the "read" and "write" system calls in plain assembly (i.e. moving numbers into certain registers and using "int 0x80" to execute the system calls).
My goal is to make it work even if it is built with the "-nostdlib" option.
Hint: 32-bit x86 is old, slow, weird and deprecated. You should use amd64 instead.
The list of system calls for Linux i386 is available in Linux source code:
https://github.com/torvalds/linux/blob/master/arch/x86/entry/syscalls/syscall_32.tbl
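The numbers are also available in the asm/unistd_32.h header. You can and should #include <asm/unistd.h> so you can use $__NR_write instead of $4, which makes your asm source code self-documenting. (This requires running the file through the C preprocessor, for example by giving it a .S suffix and building it with gcc.)
The system call number goes in eax. The arguments go, in order, in ebx, ecx, edx, esi, edi, ebp. So the code becomes: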
.text
.globl _start
_start:
pushl %ebp
movl %esp, %ebp
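andl $-16, %esp #16-byte alignment
subl $8224, %esp #space for local variables
jmp _READ
_WRITE:
movl 8220(%esp), %edx #number of bytes returned by read
leal 28(%esp), %ecx #buffer address
movl $1, %ebx #fd 1 = stdout
movl $4, %eax #sys_write
int $0x80
_READ:
movl $8192, %edx #buffer length
leal 28(%esp), %ecx #buffer address
movl $0, %ebx #fd 0 = stdin
movl $3, %eax #sys_read
int $0x80
movl %eax, 8220(%esp) #save the byte count
cmpl $0, 8220(%esp)
jg _WRITE #loop while read returned more than 0 bytes
movl $1, %eax #sys_exit
movl $0, %ebx #exit status 0
int $0x80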
Assemble and link with:
$ as --32 hel.s -o hel.o
$ ld -melf_i386 hel.o -o hel
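For comparison, the same "number first, then the arguments in order" pattern can be sketched in C through libc's syscall() wrapper. This is only an illustration and not the -nostdlib solution asked for: it assumes Linux and <sys/syscall.h>, and copy_fd is a hypothetical helper name. SYS_read and SYS_write expand to the right numbers for the target architecture (3 and 4 on i386), which is the same benefit as writing $__NR_write in the assembly source.

```c
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from fd `in` to fd `out`
 * using raw system calls. Returns total bytes copied, or -1 on error. */
long copy_fd(int in, int out)
{
    char buf[8192];
    long total = 0;
    long n;

    assert(in >= 0 && out >= 0);

    /* Syscall number first, then the arguments in order, just like
     * eax, ebx, ecx, edx in the assembly version. */
    while ((n = syscall(SYS_read, in, buf, sizeof buf)) > 0) {
        long off = 0;
        while (off < n) {                 /* handle short writes */
            long w = syscall(SYS_write, out, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

Calling copy_fd(0, 1) from main would behave like the C program in the question.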
http://www.linuxjournal.com/article/4048
See also
What are the calling conventions for UNIX & Linux system calls on i386 and x86-64
https://blog.packagecloud.io/eng/2016/04/05/the-definitive-guide-to-linux-system-calls/
I have a task that requires me to read a string into a char* with scanf in assembly. I tried this code:
.data
INPUT_STRING: .string "Give me a string: "
SCANF_STRING: .string "%s"
PRINTF_STRING: .string "String: %s\n"
.text
.globl main
.type main, #function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $32, %esp
pushl $INPUT_STRING
call printf #printf("Give me a string: ")
addl $4, %esp
pushl -12(%ebp) # char*
pushl $SCANF_STRING # "%s"
call scanf #scanf("%s", char*)
addl $8, %esp
pushl -12(%ebp)
pushl PRINTF_STRING
call printf #printf("String: %s\n")
addl $16, %esp
movl -4(%ebp), %ecx
xorl %eax, %eax
leave
leal -4(%ecx), %esp
ret
The first printf prints correctly, and then the program waits for input (so scanf is called), but as soon as I enter anything I get a segmentation fault.
I know that the char* should be initialized somehow, but how can I do that at the assembly level?
I am compiling it on 64-bit Manjaro with gcc -m32.
GCC's stack-alignment code on entry to main is over-complicated:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $32, %esp
...
leave
leal -4(%ecx), %esp
ret
Do it like this:
pushl %ebp
movl %esp, %ebp
subl $32, %esp # Space for 32 local bytes
andl $-16, %esp # Alignment by 16
...
leave
ret
The version of the i386 System V ABI used on modern Linux does guarantee/require 16-byte stack alignment before a call, so you could have re-aligned with 3 pushes (including the push %ebp) instead of an and. Unlike x86-64, most i386 library functions don't get compiled to use movaps or movdqa 16-byte aligned load/store on locals in their stack space, so you can often get away with unaligning the stack like you're doing with PUSHes before scanf. (ESP % 16 == 0 when you call printf the first time, though; that's correct.)
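To make the effect of andl $-16, %esp concrete, here is a small C sketch of the same mask arithmetic. align_down_16 is a hypothetical helper, and the address in the usage note is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring `andl $-16, %esp`:
 * -16 is 0xFFFFFFF0 as a 32-bit value, so the AND clears the low
 * four bits and rounds the address down to a multiple of 16. */
uint32_t align_down_16(uint32_t addr)
{
    uint32_t aligned = addr & (uint32_t)-16;
    assert(aligned % 16 == 0 && addr - aligned < 16);
    return aligned;
}
```

For example, align_down_16(0xffffd12c) gives 0xffffd120. Because the stack grows downward, rounding down only moves %esp further into unused space.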
You want to use 12 bytes of the local stack frame for the string. scanf needs the start address of those 12 bytes. The address of that area isn't known at compile time. The operand -12(%ebp) in a pushl gives you the value stored at this address, not the address itself. LEA is the instruction that calculates an address. So you have to insert this instruction to get the address at run time and pass it to the C function:
leal -12(%ebp), %eax
pushl %eax # char*
And this is the working example (minor mistakes also corrected):
.data
INPUT_STRING: .string "Give me a string: "
SCANF_STRING: .string "%11s" ##### Accept only 11 characters (-1 because terminating null)
PRINTF_STRING: .string "String: %s\n"
.text
.globl main
.type main, #function
main:
pushl %ebp
movl %esp, %ebp
subl $32, %esp
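mov $32, %ecx # Byte count: fill the 32 local bytes with 'X' (ASCII 88)
mov %esp, %edi # Destination. Note: %edi is callee-saved in the i386 ABI and is not restored here
mov $88, %al
rep stosb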
pushl $INPUT_STRING
call printf # printf("Give me a string: ")
addl $4, %esp
leal -12(%ebp), %eax
pushl %eax # char*
pushl $SCANF_STRING # "%s"
call scanf # scanf("%s", char*)
addl $8, %esp
leal -12(%ebp), %eax
pushl %eax # char*
pushl $PRINTF_STRING ##### '$' was missing
call printf # printf("String: %s\n")
addl $8, %esp ##### 16 was wrong. Only 2 DWORDs of 4 bytes each were pushed
leave
ret
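In C terms, the fix amounts to passing the address of a local buffer rather than whatever value happens to be stored in it, together with a field width that leaves room for the terminating NUL. The sketch below uses sscanf on an in-memory string so that it can run without a terminal; read_word is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse one whitespace-delimited word of at most
 * 11 characters from `input` into the caller's 12-byte buffer.
 * Returns the conversion count reported by sscanf (1 on success). */
int read_word(const char *input, char out[12])
{
    assert(input != NULL && out != NULL);
    /* `out` is the buffer's address (what `leal -12(%ebp), %eax`
     * computes); "%11s" stores at most 11 characters plus the NUL. */
    return sscanf(input, "%11s", out);
}
```

With a 12-byte buffer, longer input is cut off after 11 characters instead of overrunning the stack frame.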
The goal of my code below is to mimic the cat program. It takes input and then writes it back out.
Such as:
$ Hi how are you
$ Hi how are you
$ good
$ good
What is happening though is:
$ Hi how are you
$ Hi how are you
$ good
$ good
$ow are you
My string is not being removed from my buffer, so if the second input is shorter, the leftover characters that were not overwritten are printed as well. Does anyone know how to clear the buffer so this does not occur? Thanks.
.file "rand.c"
.section .data
.buffer:
.space 10
.len:
.space 1
.globl _start
_start:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
loop:
movl $3, %eax
movl $1, %ebx
movl $.buffer, %ecx
movl $100, %edx
int $0x80
movl $4, %eax
movl $1, %ebx
movl $.buffer, %ecx
movl $100, %edx
int $0x80
jmp loop
movl $1, %eax
movl $0, %ebx
int $0x80
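A plausible explanation for the leftover characters is that the write call always sends a fixed length (100) instead of the byte count that the read call returned in %eax, so stale bytes from the earlier, longer line go out again. The C sketch below imitates two reads into one buffer and shows what remains in it. fake_read is a hypothetical stand-in for the real system call, and demo is likewise an illustrative name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for read(): copies `line` into `buf` without
 * clearing the rest, and returns the number of bytes "read". */
size_t fake_read(char *buf, size_t cap, const char *line)
{
    size_t n = strlen(line);
    assert(n <= cap);
    memcpy(buf, line, n);
    return n;
}

/* Reads "Hi how are you\n" and then "good\n" into the same buffer and
 * returns the count from the second read. */
size_t demo(char buf[100])
{
    memset(buf, 0, 100);
    fake_read(buf, 100, "Hi how are you\n");      /* 15 bytes */
    size_t n = fake_read(buf, 100, "good\n");     /* 5 bytes  */
    /* buf now starts with "good\nw are you\n": the tail of the first
     * line is still there. Writing only n bytes emits just "good\n". */
    return n;
}
```

In the assembly loop, the equivalent change would be to copy the return value of the read call from %eax into %edx before the write call, and to leave the loop when that value is zero or negative. Note also that the code reserves only 10 bytes for .buffer while asking for up to 100, and that it reads from descriptor 1 where stdin is conventionally descriptor 0.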
I wrote this to print argv[0] in x86:
.section .data
newline: .int 0xa, 0
.section .text
.globl _start
_start:
sub %al, %al
movl 4(%esp), %edi /* Pointer to argv[0]. */
sub %ecx, %ecx /* Set %ecx to 0.*/
not %ecx /* Set %ecx to -1.*/
repne scasb /* Search for %al over and over.*/
not %ecx /* Set %ecx to |%ecx| - 1.*/
dec %ecx
movl %ecx, %edx /* Move the strlen of argv[0] into %edx.*/
movl $4, %eax
movl $1, %ebx
movl 4(%esp), %ecx
int $0x80
movl $newline, %ecx
movl $1, %edx
int $0x80
movl $1, %eax
movl $0, %ebx
int $0x80
When I run this file ("print"), the output is this:
[08:27 assembly]$ ./print test
./print[08:30 assembly]$
When I ran this through gdb, the actual string length held in edx is 27, and the string it's checking is "/home/robert/assembly/print", not "./print". So I changed the %esp offsets to 8, to check argv[1]. With the same command as before, the output is this:
test
[08:33 assembly]$
Why does checking argv[0] cause the strange output, when argv[1] does as expected?
I think gdb is "helping" you by adding the full path to argv[0]. After printing, %eax holds the number of characters printed, so you'll want to reload %eax for sys_write again to print the $newline (%ebx should still be okay) - by luck, "test" is the right length. Lord knows what system call you're getting with that longer string!
I'd say you're doing good! (might be a good idea to check argc to make sure argv[1] is there before you try to print it).
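The point about %eax can be seen from C as well: a write call returns the number of bytes written, and on i386 that return value arrives in the same register that must hold the system call number for the next call. In the sketch below, print_arg is a hypothetical helper that also checks argc as suggested; it uses libc's syscall() wrapper and assumes Linux.

```c
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: write argv[index] and a newline to `fd`.
 * Returns the byte count reported for the string itself,
 * or -1 if the argument does not exist. */
long print_arg(int fd, int argc, char **argv, int index)
{
    assert(argv != NULL);
    if (index < 0 || index >= argc)
        return -1;                      /* argv[index] is not there */

    /* The result is the number of bytes written: 4 for "test", which
     * happens to equal the i386 number for write. */
    long n = syscall(SYS_write, fd, argv[index], strlen(argv[index]));

    /* The second call names SYS_write explicitly again, which is what
     * reloading %eax with 4 does in the assembly version. */
    syscall(SYS_write, fd, "\n", (size_t)1);
    return n;
}
```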
asm_execve.s:
.section .data
file_to_run:
.ascii "/bin/sh"
.section .text
.globl main
main:
pushl %ebp
movl %esp, %ebp
subl $0x8, %esp # array of two pointers. array[0] = file_to_run array[1] = 0
movl file_to_run, %edi
movl %edi, -0x4(%ebp)
movl $0, -0x8(%ebp)
movl $11, %eax # sys_execve
movl file_to_run, %ebx # file to execute
leal -4(%ebp), %ecx # command line parameters
movl $0, %edx # environment block
int $0x80
leave
ret
makefile:
NAME = asm_execve
$(NAME) : $(NAME).s
gcc -o $(NAME) $(NAME).s
Program is executed, but sys_execve is not called:
alex#alex32:~/project$ make
gcc -o asm_execve asm_execve.s
alex#alex32:~/project$ ./asm_execve
alex#alex32:~/project$
Expected output is:
alex#alex32:~/project$ ./asm_execve
$ exit
alex#alex32:~/project$
This Assembly program is supposed to work like the following C code:
char *data[2];
data[0] = "/bin/sh";
data[1] = NULL;
execve(data[0], data, NULL);
Something wrong in system call parameters?
The execve system call is being called, but you are indeed passing it bad parameters.
(You can see this by running your executable using strace.)
There are three problems:
1. .ascii does not 0-terminate the string. (You might get lucky, as there is nothing following it in your .data section in this example, but that's not guaranteed...) Add a 0, or use .asciz (or .string) instead.
2. movl file_to_run, %edi moves the value pointed to by the file_to_run symbol into %edi, i.e. the first 4 bytes of the string (0x6e69622f). The address of the string is just the value of the symbol itself, so you need to use the $ prefix for literal values: movl $file_to_run, %edi. Similarly, you need to say movl $file_to_run, %ebx a few lines further down. (This is a common source of confusion between AT&T syntax and Intel syntax!)
3. The parameters are placed on the stack in the wrong order: -0x8(%ebp) is a lower address than -0x4(%ebp). So the address of the command string should be written to -0x8(%ebp), the 0 should be written to -0x4(%ebp), and the leal instruction should be leal -8(%ebp), %ecx.
Fixed code:
.section .data
file_to_run:
.asciz "/bin/sh"
.section .text
.globl main
main:
pushl %ebp
movl %esp, %ebp
subl $0x8, %esp # array of two pointers. array[0] = file_to_run array[1] = 0
movl $file_to_run, %edi
movl %edi, -0x8(%ebp)
movl $0, -0x4(%ebp)
movl $11, %eax # sys_execve
movl $file_to_run, %ebx # file to execute
leal -8(%ebp), %ecx # command line parameters
movl $0, %edx # environment block
int $0x80
leave
ret
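The three fixes can be cross-checked with a short C sketch. All names below are made up for illustration, and the dword is assembled byte by byte to model a little-endian 32-bit load such as the one x86 performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Like .ascii: exactly 7 bytes, no terminating NUL. */
static const char no_nul[7] = { '/', 'b', 'i', 'n', '/', 's', 'h' };
/* Like .asciz / .string: 8 bytes including the NUL. */
static const char with_nul[] = "/bin/sh";

/* What `movl file_to_run, %edi` loads: the first four bytes of the
 * string, combined as a little-endian 32-bit value. */
uint32_t first_dword(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    assert(s != NULL);
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Fill a two-entry argv array: element 0 sits at the lower address,
 * as -8(%ebp) does relative to -4(%ebp) in the fixed code. */
void build_argv(const char *argv[2], const char *path)
{
    argv[0] = path;   /* lower address: the program name */
    argv[1] = NULL;   /* higher address: the terminator  */
}
```

Here first_dword(with_nul) evaluates to 0x6e69622f, the value mentioned above, which is clearly not a usable pointer to the string.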
You don't actually need to pass anything in the other arguments. Linux accepts NULL for both argv and envp (this is not portable), so on x86 the following simpler code will also work:
.section .data
file_to_run:
.asciz "/bin/sh"
.section .text
.globl main
main:
pushl %ebp
movl %esp, %ebp
movl $11, %eax # sys_execve
movl $file_to_run, %ebx # file to execute
movl $0, %ecx # argv: NULL is accepted here
movl $0, %edx # envp: NULL is accepted too
int $0x80
leave
ret
This will essentially open a shell terminal after invoking the system call.
I'm trying to print a range of ascii characters with this assembly program.
I'm trying to do it using only the registers, but haven't been having much luck. Everything looks fine to me, but I'm a novice at assembly programming and might have missed something obvious. Any insight will be appreciated. Thanks :)
.text
.global _start
_start:
movl $1, %edx
movl $65, %ebx
start_loop:
addl $1, %ebx
movl $0x04, %eax
int $0x80
cmpl $126, %ebx
jle start_loop
jmp start_loop
exit
movl $0, %ebx
movl $1, %eax
int $0x80
You are invoking the sys_write system call. sys_write() takes three arguments: the file descriptor of the output device (1 for stdout), the address of the buffer holding the data to be printed, and the size of that data. So you have to store the file descriptor in %ebx, the address of the buffer in %ecx, and the size of the data in %edx. To store the file descriptor, you can use the following instruction:
movl $1, %ebx // store 1 (stdout) in ebx
To store the size of the data, you can use:
movl $1, %edx // size is 1 byte
Now you have to store the address of the buffer: you need to put your data somewhere in memory and store the address of that memory in %ecx. Assuming you want to store the data on the stack itself, you can do it like this:
subl $4, %esp // get 4 bytes of memory in the stack
movl $65, (%esp) // store data in the memory where esp points to
movl %esp, %ecx // store address of the data in the ecx
Now you can issue the int 0x80:
movl $4, %eax // store syscall number (sys_write) in eax
int $0x80 // issue the trap interrupt
As a whole you can write the following code:
movl $1, %ebx
subl $0x4, %esp
movl $64, (%esp)
start_loop:
movl (%esp), %eax
addl $1, %eax
movl %eax, (%esp)
movl %esp, %ecx
movl $1, %edx
movl $0x04, %eax
int $0x80
movl (%esp), %eax
cmpl $126, %eax
jl start_loop # stop once '~' (126) has been printed; jle would also print 127
addl $0x4, %esp
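The same idea in C, as a rough sketch: write needs the address of a buffer, so each character is stored in a one-byte variable and its address is passed. print_range is a hypothetical helper, and the POSIX write wrapper is used here instead of int $0x80.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: write the characters first..last (inclusive)
 * to `fd`, one byte per write call. Returns the number of bytes
 * written, or -1 on the first failed write. */
int print_range(int fd, char first, char last)
{
    int count = 0;
    assert(first <= last);
    for (int c = first; c <= last; c++) {
        char byte = (char)c;           /* the value must live in memory */
        if (write(fd, &byte, 1) != 1)  /* pass its address, length 1    */
            return -1;
        count++;
    }
    return count;
}
```

Calling print_range(1, 'A', '~') writes the 62 characters from A through ~ to stdout.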
See Linux System Calls Part2 at http://www.rulingminds.com/syscallspart2 to know more about registers and system calls usage.
"Thank you very much for the informative answer, but is there a way to store and retrieve the value to be printed in a register without pointing to it?" -- this should probably have been edited into the question.
If you insist on using only syscalls (int $0x80) to interface with the system, then the answer is no. You have to pass a buffer to write somehow, and rulingminds' answer applies.
Using the libc putchar(3), it's straightforward. I use %ebx to hold the ASCII code because on Linux this register is preserved across function calls. Simply assemble with gcc filename.S (remember to use -m32 if you are on x86_64).
.text
.extern putchar
.global main
main:
# make room for argument to putchar on the stack
sub $4, %esp
# initialize ebx with first value to print
mov $'A', %ebx
1:
# give character to print as argument
mov %ebx, (%esp)
call putchar
# move to next character
inc %ebx
# are we done?
cmp $'~', %ebx
jle 1b
# print newline
movl $10, (%esp)
call putchar
# adjust stack back to normal
add $4, %esp
# return 0 from main
mov $0, %eax
ret