Popping and printing argc from the stack - linux

According to this paper and a few Stack Overflow posts, argc is at the top of the stack and argv is below it.
I've tried about 3-4 different ways of doing it:
Popping it into an initialized variable (.data) - output done by calling printf.
Popping it into uninitialized space (.bss) - output done by calling sys_write()
A mixture of the above + tweaks.
Someone on a forum told me that argc and argv aren't on the stack, which I don't understand; how are other people doing it with similar code?
Here's an example of what I've attempted (3 days worth of knowledge - try not to giggle):
section .bss
argc: resd 1 ; alloc 4 bytes for popped value
section .text
global _start
_start:
pop dword[argc] ; pop argc, place in var
mov ebx,0x01 ; file descriptor = STDOUT
mov ecx,argc ; var (addr) - points to buffer
mov edx,1 ; length of buffer (single digit)
mov eax,0x04 ; syscall number for sys_write()
int 0x80 ; request the kernel to make syscall
exit:
mov ebx,0x00 ; arg for sys_exit() - sys_exit(0)
mov eax,0x01 ; syscall number for sys_exit()
int 0x80 ; request the kernel to make syscall
Solution:
section .data
msg db "Value: %d", 10, 0 ; printf needs a quoted, 0-terminated format string (10 = newline)
section .text
global main
extern printf
main:
push dword[esp+4]
push msg
call printf
add esp,8
mov eax,0
ret

The process of getting argc looks OK to me (for a 32-bit Linux machine), although if your code is main, you're 4 bytes off: when main is called by the C startup code, the top of the stack holds the return address into that code, so argc is at [esp+4]. (At _start, with no C runtime, argc really is at [esp].)
Also, the sys_write system call expects a pointer to a string in ecx. What you're giving it is a pointer to an integer, which isn't the same thing. If you want to print the value of argc, you'll have to convert it to a string first (or use the printf function).
Here's some example code (I'm using the GNU assembler since I don't have NASM on this machine):
.section .rodata
format: .asciz "%d\n"
.text
.globl main
.type main, @function # '#' starts a comment in x86 GAS, so use @function
main:
pushl 4(%esp) # push argc
pushl $format # push the format string
call printf
addl $8,%esp # pop the arguments
movl $0, %eax # return value
ret
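If you'd rather avoid printf and still handle more than one digit, the number has to be turned into characters before write() sees it, and write() must receive a pointer to those characters. A minimal C sketch of that conversion (the assembly comments elsewhere here already use C equivalents; `format_decimal` and `write_decimal` are hypothetical helper names, and only non-negative values such as argc are handled):

```c
#include <stddef.h>
#include <unistd.h>

/* Convert a non-negative int to decimal ASCII followed by '\n'.
   Digits come out least-significant first, so they are built from the
   end of a scratch buffer backwards. Returns the number of bytes stored,
   or 0 if out is too small. */
static size_t format_decimal(int value, char *out, size_t cap)
{
    char tmp[12];                       /* up to 10 digits plus '\n' */
    size_t pos = sizeof tmp;
    tmp[--pos] = '\n';
    unsigned u = (unsigned)value;       /* caller guarantees value >= 0 */
    do {
        tmp[--pos] = (char)('0' + u % 10);  /* lowest remaining digit */
        u /= 10;
    } while (u != 0);
    size_t len = sizeof tmp - pos;
    if (len > cap)
        return 0;
    for (size_t i = 0; i < len; i++)
        out[i] = tmp[pos + i];
    return len;
}

/* Print the number on fd: write() gets a pointer to the text and its length. */
static ssize_t write_decimal(int fd, int value)
{
    char buf[12];
    size_t len = format_decimal(value, buf, sizeof buf);
    return write(fd, buf, len);
}
```

In assembly the same loop is a repeated `div` by 10, storing `remainder + '0'` from the end of a buffer.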

Related

Print ARGC in NASM without printf

Any good NASM/Intel Assembly programmers out there? If so, I have a question for you!
Every tutorial I can find online uses "printf" to print the actual value of ARGC to the screen (fd: /dev/stdout). Is it not possible to simply print it with sys_write(), for example:
SEGMENT .data ; nothing here
SEGMENT .text ; sauce
global _start
_start:
pop ECX ; get ARGC value
mov EAX, 4 ; sys_write()
mov EBX, 1 ; /dev/stdout
mov EDX, 1 ; a single byte
int 0x80
mov EAX, 1 ; sys_exit()
mov EBX, 0 ; return 0
int 0x80
SEGMENT .bss ; nothing here
When I run this, I get no output at all. I have tried copying ESP into EBP and using byte[EBP+4] (I was told the brackets dereference the memory address).
I can confirm that comparing the value against a constant works. For instance,
this code works:
pop ebp ; put the first argument on the stack
mov ebp, esp ; make a copy
cmp byte[ebp+4],0x5 ; does it equal 5?
je _good ; goto _good, &good, good()
jne _bad ; goto _bad, &bad, bad()
When we "pop" the stack, we technically should get the full number of arguments, no? Oh, btw, I compile with:
nasm -f elf test.asm -o test.o
ld -o test test.o
Not sure if that is relevant. Let me know if I need to provide more information, or format my code for readability.
There are at least two problems:
You need to pass a pointer to the thing you want to print.
You probably want to convert to text.
Something like this should work:
SEGMENT .text ; sauce
global _start
_start:
mov ecx, esp ; pointer to ARGC on stack
add byte [esp], '0' ; convert to text assuming single digit
mov EAX, 4 ; sys_write()
mov EBX, 1 ; /dev/stdout
mov EDX, 1 ; a single byte
int 0x80
mov EAX, 1 ; sys_exit()
mov EBX, 0 ; return 0
int 0x80
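Why does `add byte [esp], '0'` touch the right byte? On little-endian x86 the lowest-addressed byte of the 32-bit argc is its least significant byte, so for argc 0 to 9 adding '0' to that byte yields the ASCII digit. A C sketch that mirrors the trick byte for byte (it assumes a little-endian machine; `low_byte_digit` and `print_low_byte_digit` are hypothetical names):

```c
#include <string.h>
#include <unistd.h>

/* Treat the int's storage as bytes and add '0' to the first
   (lowest-addressed) byte. On little-endian x86 that is the low byte,
   so 0..9 become '0'..'9'; 10 becomes ':' because nothing carries
   into a second digit. */
static char low_byte_digit(int value)
{
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);          /* memory order, like [esp] */
    bytes[0] = (unsigned char)(bytes[0] + '0');   /* add byte [esp], '0' */
    return (char)bytes[0];
}

/* Like sys_write with ecx = pointer and edx = 1: write() is handed the
   ADDRESS of the byte to print, not the byte itself. */
static ssize_t print_low_byte_digit(int fd, int value)
{
    char c = low_byte_digit(value);
    return write(fd, &c, 1);
}
```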
Everyone's comments were very helpful! I am honored that you all pitched in and helped! I have used @Jester's code (the answer above),
which works perfectly when assembled, linked and loaded. The sys_write() system call requires a pointer, as in the common "Hello World" example, where the symbol "msg" is a pointer, as seen in the code below.
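SECTION .data ; initialized data
msg: db "Hello World!",0xa
len: equ $ - msg ; length of msg in bytes
SECTION .text ; workflow
global _start
_start:
mov EAX, 4 ; sys_write()
mov EBX, 1 ; /dev/stdout
mov ECX, msg ; a pointer!
mov EDX, len ; number of bytes to write
int 0x80
mov EAX, 1 ; sys_exit()
mov EBX, 0 ; return 0
int 0x80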
So first, we move the stack pointer into the counter register, ECX, with the code,
mov ecx, esp ; ecx now contains a pointer!
and then convert it to an ASCII character by adding '0' to the value pointed to by ESP (which is ARGC), dereferencing it with square brackets as [ESP], like so (this only works while ARGC is a single digit, 0 to 9),
add byte[esp], '0' ; update the value stored at "esp"
Again, thank you all for the great help! <3
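The "msg is a pointer" point has a direct C counterpart: an array name decays to a pointer to its first byte, and `sizeof` gives the length NASM computes with `$ - msg`. A small C sketch under that analogy (`write_hello` and `MSG_LEN` are hypothetical names):

```c
#include <unistd.h>

/* Like `msg: db "Hello World!",0xa`. The C literal also appends a '\0',
   which write() does not need. */
static const char msg[] = "Hello World!\n";

/* sizeof counts the trailing '\0', so subtract 1 to match `$ - msg`. */
#define MSG_LEN (sizeof msg - 1)

static ssize_t write_hello(int fd)
{
    return write(fd, msg, MSG_LEN);   /* pointer + length, like ECX + EDX */
}
```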

Why do I need to use [ ] (square brackets) when moving data from register to memory, but not when other way around?

This is the code I have and it works fine:
section .bss
bufflen equ 1024
buff: resb bufflen
whatread: resb 4
section .data
section .text
global main
main:
nop
read:
mov eax,3 ; Specify sys_read
mov ebx,0 ; Specify standard input
mov ecx,buff ; Where to read to...
mov edx,bufflen ; How long to read
int 80h ; Tell linux to do its magic
; Eax currently has the return value from linux system call..
add eax, 30h ; Convert number to ASCII digit
mov [whatread],eax ; Store how many bytes has been read to memory at loc **whatread**
mov eax,4 ; Specify sys_write
mov ebx,1 ; Specify standart output
mov ecx,whatread ; Get the address of whatread to ecx
mov edx,4 ; number of bytes to be written
int 80h ; Tell linux to do its work
mov eax, 1;
mov ebx, 0;
int 80h
Here is a simple run and output:
koray@koray-VirtualBox:~/asm/buffasm$ nasm -f elf -g -F dwarf buff.asm
koray@koray-VirtualBox:~/asm/buffasm$ gcc -o buff buff.o
koray@koray-VirtualBox:~/asm/buffasm$ ./buff
p
2koray@koray-VirtualBox:~/asm/buffasm$ ./buff
ppp
4koray@koray-VirtualBox:~/asm/buffasm$
My question is: what is going on with these two instructions:
mov [whatread],eax ; Store how many byte reads info to memory at loc whatread
mov ecx,whatread ; Get the address of whatread in ecx
Why does the first one work with [] but the other one without?
When I try replacing the second line above with:
mov ecx,[whatread] ; Get the address of whatread in ecx
the executable does not run properly; it does not show anything in the console.
Using brackets and not using brackets are basically two different things:
A bracket means that the value in the memory at the given address is meant.
An expression without a bracket means that the address (or value) itself is meant.
Examples:
mov ecx, 1234
Means: Write the value 1234 to the register ecx
mov ecx, [1234]
Means: Write the value that is stored in memory at address 1234 to the register ecx
mov [1234], ecx
Means: Write the value stored in ecx to the memory at address 1234
mov 1234, ecx
... makes no sense (in this syntax) because 1234 is a constant number which cannot be changed.
Linux "write" syscall (INT 80h, EAX=4) requires the address of the value to be written, not the value itself!
This is why you do not use brackets at this position!
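In C terms, `mov [whatread], eax` is an assignment to the variable, and `mov ecx, whatread` is taking its address with `&`. A C sketch of the question's program logic under that mapping (single-digit counts only; `store_count` and `write_count` are hypothetical names):

```c
#include <stdint.h>
#include <unistd.h>

static uint32_t whatread;   /* like `whatread: resb 4` in .bss */

/* `add eax, 30h` then `mov [whatread], eax`: the brackets mean
   "the memory at whatread", i.e. assign to the variable. */
static void store_count(uint32_t nbytes)
{
    whatread = nbytes + 0x30;
}

/* `mov ecx, whatread` (no brackets) passes the ADDRESS, which is what
   write() needs. One byte is written here; on little-endian x86 that is
   the digit. The original's edx = 4 also sends three 0 bytes. */
static ssize_t write_count(int fd)
{
    return write(fd, &whatread, 1);
}
```

Passing the value instead, as `mov ecx, [whatread]` does, is like calling `write(fd, (void *)(uintptr_t)whatread, 4)`: the digit's code (0x32 for '2') gets used as an address, which is almost certainly unmapped, so the call fails with EFAULT and nothing is printed.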

Are there any examples of programs that generate text files as output in NASM?

I need to make a program that outputs a text file with a .dna extension. I don't know if I can really do that, or if the text file will even be compatible with what I need to compare it with afterwards. Anyway, I'm not really sure how to do this. I tried to look for some examples for NASM, but I didn't find much. I have an idea of what I'd need to do, but I just don't know what to call to create a file.
Afterwards I'd need to write stuff into it, and I'm not really sure how to go about that. Could anyone point me to some examples? I just need to see what is required so I can write my own.
Here's an example using system calls. Basically, you just open the file, write some data to it, then close and exit:
; nasm -f elf file.asm
; ld -m elf_i386 file.o
BITS 32
section .data
; don't forget the 0 terminator if it takes a C string!
filename: db 'test.txt', 0
; an error message to be printed with write(). The function doesn't
; use a C string so no need for a 0 here, but we do need length.
error_message: db 'Something went wrong.', 10 ; 10 == \n
; this next line means current location minus the error_message location
; which works out the message length.
; many of the system calls use pointer+length pairs instead of
; 0 terminated strings.
error_message_length: equ $ - error_message
; a message we'll write to our file, same as the error message
hello: db 'Hello, file!', 10 ; the 10 is a newline at the end
hello_length: equ $ - hello
fd: dd 0 ; this is like a global int variable in C
; global variables are generally a bad idea and there's other
; ways to do it, but for simplicity I'm using one here as the
; other ways are a bit more work in asm
section .text
global _start
_start:
; first, open or create the file. in C it would be:
; // $ man 2 creat
; int fd = creat("test.txt", 0644); // the second argument is the permission mode
; we get the syscall numbers from /usr/include/asm/unistd_32.h
mov eax, 8 ; creat
mov ebx, filename ; first argument
mov ecx, 644O ; the suffix O means Octal in nasm, like the leading 0 in C. see: http://www.nasm.us/doc/nasmdoc3.html
int 80h ; calls the kernel
test eax, eax ; the raw syscall returns -errno (a negative number) on error, not -1
js error
mov [fd], eax ; the return value is in eax - the file descriptor
; now, we'll write something to the file
; // man 2 write
; write(fd, hello_pointer, hello_length)
mov eax, 4 ; write
mov ebx, [fd]
mov ecx, hello
mov edx, hello_length
int 80h
test eax, eax ; negative return value = -errno
; a normal program should also close the file on a write error
; since it is open, but meh: since we just terminate, the kernel
; will clean up after us
js error
; and now we close the file
; // man 2 close
; close(fd);
mov eax, 6 ; close
mov ebx, [fd]
int 80h
; and now close the program by calling exit(0);
mov eax, 1 ; exit
mov ebx, 0 ; return value
int 80h
error:
mov eax, 4 ; write
mov ebx, 1 ; write to stdout - file #1
mov ecx, error_message ; pointer to the string
mov edx, error_message_length ; length of the string
int 80h ; print it
mov eax, 1 ; exit
mov ebx, 1 ; return value
int 80h
The executable will be called a.out if you copied my link command above; the -o option to ld changes that.
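For comparison, the same creat/write/close sequence through the C library wrappers, as a sketch (`write_text_file` is a hypothetical name). Note the different error convention: the libc wrappers return -1 and set errno, while the raw int 80h calls above return a negative error code in eax.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* creat(path, 0644), write(), close(), like the NASM example.
   Returns 0 on success, -1 on failure. */
static int write_text_file(const char *path, const char *text)
{
    int fd = creat(path, 0644);          /* open(path, O_CREAT|O_WRONLY|O_TRUNC, 0644) */
    if (fd == -1)
        return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);    /* pointer + length, like ecx/edx */
    if (n < 0 || (size_t)n != len) {     /* this sketch treats a short write as an error */
        close(fd);
        return -1;
    }
    return close(fd);                    /* 0 on success, -1 on error */
}
```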
We can also call C functions, which helps if you need to write out things like numbers.
; nasm -f elf file.asm
; gcc -m32 file.o -nostdlib -lc # notice that we're using gcc to link, makes things a bit easier
; # the options are: -m32, 32 bit, -nostdlib, don't try to use the C lib cuz it will look for main()
; # and finally, -lc to add back some of the C standard library we want
BITS 32
; docs here: http://www.nasm.us/doc/nasmdoc6.html
; we declare the C functions as external symbols. On Linux (ELF), C symbols have no leading underscore; some other platforms add one.
extern fopen
extern fprintf
extern fclose
section .data
; don't forget the 0 terminator if it takes a C string!
filename: db 'test.txt', 0
filemode: db 'wt', 0 ; the mode for fopen in C
format_string: db 'Hello with a number! %d is it.', 10, 0 ; new line and 0 terminator
; an error message to be printed with write(). The function doesn't
; use a C string so no need for a 0 here, but we do need length.
error_message: db 'Something went wrong.', 10 ; 10 == \n
; this next line means current location minus the error_message location
; which works out the message length.
; many of the system calls use pointer+length pairs instead of
; 0 terminated strings.
error_message_length: equ $ - error_message
fp: dd 0 ; this is like a global int variable in C
; global variables are generally a bad idea and there's other
; ways to do it, but for simplicity I'm using one here as the
; other ways are a bit more work in asm
section .text
global _start
_start:
; first, open or create the file. in C it would be:
; FILE* fp = fopen("test.txt", "wt");
; arguments for C functions are pushed onto the stack, right to left.
push filemode ; "wt"
push filename ; "test.txt"
call fopen
add esp, 8 ; we need to clean up our own stack. Since we pushed two four-byte items, we need to pop the 8 bytes back off. Alternatively, we could have called pop twice, but a single add instruction keeps our registers cleaner.
; the return value is in eax, store it in our fp variable after checking for errors
; in C: if(fp == NULL) goto error;
cmp eax, 0 ; check for null
je error
mov [fp], eax;
; call fprintf(fp, "format string with %d", 55);
; the 55 is just a random number to print
mov eax, 55
push eax ; all arguments are pushed, right to left. We want a 4 byte int equal to 55, so eax is it
push format_string
mov eax, [fp] ; again using eax as an intermediate to store our 4 bytes as we push to the stack
push eax
call fprintf
add esp, 12 ; 3 dwords (12 bytes) to clean up this time
; fclose(fp);
mov eax, [fp] ; again using eax as an intermediate to store our 4 bytes as we push to the stack
push eax
call fclose
add esp, 4 ; pop fclose's single argument
; the rest is unchanged from the above example
; and now close the program by calling exit(0);
mov eax, 1 ; exit
mov ebx, 0 ; return value
int 80h
error:
mov eax, 4 ; write
mov ebx, 1 ; write to stdout - file #1
mov ecx, error_message ; pointer to the string
mov edx, error_message_length ; length of the string
int 80h ; print it
mov eax, 1 ; exit
mov ebx, 1 ; return value
int 80h
There's a lot more that can be done here, like a few techniques to eliminate those global variables, better error checking, or even writing a C-style main() in assembly. But this should get you started writing out a text file. Tip: writing to a file works the same as writing to the screen; you just need to open or create the file first!
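The second NASM example is a hand translation of roughly this C function, sketched here for reference (`write_number_file` is a hypothetical name; the mode is plain "w"):

```c
#include <stdio.h>

/* fopen, fprintf, fclose, as in the second NASM example.
   Returns 0 on success, -1 on any failure. */
static int write_number_file(const char *path, int number)
{
    FILE *fp = fopen(path, "w");          /* NULL on error, like `cmp eax, 0` */
    if (fp == NULL)
        return -1;
    int printed = fprintf(fp, "Hello with a number! %d is it.\n", number);
    /* fclose flushes stdio's buffer, so a failed flush is reported here */
    if (fclose(fp) != 0 || printed < 0)
        return -1;
    return 0;
}
```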
BTW don't mix the system calls and the C library functions at the same time. The C library (fprintf etc) buffers data, the system calls don't. If you mix them, the data might end up written to the file in a surprising order.
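If you do have to mix them on one file, flushing the stdio buffer before each raw write keeps the order predictable. A C sketch of that fix (`write_in_order` is a hypothetical name):

```c
#include <stdio.h>
#include <unistd.h>

/* "first" goes through stdio, "second" through the write() wrapper on the
   same file. stdio holds "first" in its buffer until flushed, so fflush()
   before the raw write keeps the intended order. */
static int write_in_order(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    fputs("first\n", fp);                          /* buffered for now */
    if (fflush(fp) != 0) {                         /* hand it to the kernel first */
        fclose(fp);
        return -1;
    }
    if (write(fileno(fp), "second\n", 7) != 7) {   /* unbuffered */
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

Without the fflush, "first" would typically end up after "second", because stdio writes its buffer only when it fills up or when fclose runs.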
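The code is similar in 64-bit mode, but the details differ: system calls use the syscall instruction with different call numbers and argument registers (rdi, rsi, rdx, ...), and C functions take their first arguments in registers instead of on the stack.
Finally, this same pattern can be used to translate almost any C code to asm: the C calling convention works the same way for every function, and the Linux system call convention (argument placement, etc.) follows a consistent pattern too.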
Further reading:
http://en.wikipedia.org/wiki/X86_calling_conventions#cdecl on the C calling convention
http://docs.cs.up.ac.za/programming/asm/derick_tut/syscalls.html on linux system calls
What is the purpose of EBP in the following code? is another SO answer I wrote a while ago about local variables in asm; it has hints about one way to get rid of that global and describes how the C compiler does it. (The other way to get rid of that global is to keep the fd/fp in a register and push and pop it on the stack when you need to free up the register for something else.)
And the man pages referenced in the code for each function. From your linux prompt, do things like man 2 write or man 3 fprintf to see more. (System calls are in manual section 2 and C functions are in manual section 3).

Does int 0x80 overwrite register values? [duplicate]

This question already has an answer here:
What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
Closed 4 years ago.
I wrote a program which is supposed to behave like a for/while loop, printing a string of text a certain number of times.
Here is the code:
global _start
section .data
msg db "Hello World!",10 ; define the message
msgl equ $ - msg ; define message length
; use minimal size of storage space
imax dd 0x00001000 ; defines imax to be big!
section .text
_start:
mov r8, 0x10 ; [struck out: put imax in r8d, this will be our 'i']
; just attempt 10 iterations
_loop_entry: ; loop entry point
mov eax, 4 ; setup the message to print
mov ebx, 1 ; write, stdout, message, length
mov ecx, msg
mov edx, msgl
int 0x80 ; print message
; this is valid because registers do not change
dec r8 ; decrease i and jump on not zero
cmp r8,1 ; compare values to jump
jnz _loop_entry
mov rax, 1 ; exit with zero
mov rbx, 0
int 0x80
The problem I have is the program runs into an infinite loop. I ran it inside gdb and the cause is:
int 0x80 is called to print the message, and this works correctly; however, after the interrupt finishes, the contents of r8 are set to zero rather than the value they should be. r8 is where the counter sits, counting (down) the number of times the string is printed.
Does int 0x80 modify register values? I noticed that rax, rbx, rcx, rdx were not affected in the same way.
Test Results
Answer: YES! It does modify r8.
I have changed two things in my program. First, I now use cmp r8, 0, to get Hello World! the correct number of times, and
I have added
mov [i], r8 ; put away i
after _loop_entry:
and also I have added
mov r8, [i] ; get i back
after the first int 0x80.
Here is my now working program. More info to come on performance against C++.
;
; main.asm
;
;
; To be used with main.asm, as a test to see if optimized c++
; code can be beaten by me, writing a for / while loop myself.
;
;
; Absolute minimum code to be competitive with asm.
global _start
section .data
msg db "Hello World!",10 ; define the message
msgl equ $ - msg ; define message length
; use minimal size of storage space
imax dd 0x00001000 ; defines imax to be big!
i dd 0x0 ; defines i
section .text
_start:
mov r8, 0x10 ; put imax in r8d, this will be our 'i'
_loop_entry: ; loop entry point
mov [i], r8 ; put away i
mov eax, 4 ; setup the message to print
mov ebx, 1 ; write, stdout, message, length
mov ecx, msg
mov edx, msgl
int 0x80 ; print message
; this is valid because registers do not change
mov r8, [i] ; get i back
dec r8 ; decrease i and jump on not zero
cmp r8,0 ; compare values to jump
jnz _loop_entry
mov rax, 1 ; exit with zero
mov rbx, 0
int 0x80
int 0x80 just causes a software interrupt. In your case it's being used to make a system call. Whether or not any registers are affected will depend on the particular system call you're invoking and the system call calling convention of your platform. Read your documentation for the details.
Specifically, from the System V Application Binary Interface x86-64™ Architecture Processor Supplement (PDF), Appendix A, x86-64 Linux Kernel Conventions:
The interface between the C library and the Linux kernel is the same as for the user-level applications...
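For user-level applications, r8 is a scratch register, which means it's caller-saved. If you want it to be preserved over the system call, you'll need to do it yourself. (Strictly, that rule is about function calls: the same appendix notes that the 64-bit syscall instruction destroys rcx and r11, with the result returned in rax. What clobbers r8 here is calling the 32-bit int 0x80 interface from 64-bit code, which the linked duplicate covers; depending on the kernel version, that path may zero r8-r11 on return.)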
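In C, the compiler does this bookkeeping for you: r8 is caller-saved in the same ABI, so a loop counter that must survive a call is kept in a callee-saved register such as rbx or spilled to the stack, much like the `mov [i], r8` fix. A C sketch of the question's loop for comparison (`print_n_times` is a hypothetical name):

```c
#include <stddef.h>
#include <unistd.h>

/* Print msg `count` times; the compiler keeps i alive across write().
   Returns total bytes written, or -1 on the first error. */
static long print_n_times(int fd, const char *msg, size_t len, int count)
{
    long total = 0;
    for (int i = count; i > 0; i--) {   /* dec r8 / jnz _loop_entry */
        ssize_t n = write(fd, msg, len);
        if (n < 0)
            return -1;                  /* stop instead of looping on errors */
        total += n;
    }
    return total;
}
```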

Reading from a file in assembly

I'm trying to learn assembly -- x86 in a Linux environment. The most useful tutorial I can find is Writing A Useful Program With NASM. The task I'm setting myself is simple: read a file and write it to stdout.
This is what I have:
section .text ; declaring our .text segment
global _start ; telling where program execution should start
_start: ; this is where code starts getting exec'ed
; get the filename in ebx
pop ebx ; argc
pop ebx ; argv[0]
pop ebx ; the first real arg, a filename
; open the file
mov eax, 5 ; open(
mov ecx, 0 ; read-only mode
int 80h ; );
; read the file
mov eax, 3 ; read(
mov ebx, eax ; file_descriptor,
mov ecx, buf ; *buf,
mov edx, bufsize ; *bufsize
int 80h ; );
; write to STDOUT
mov eax, 4 ; write(
mov ebx, 1 ; STDOUT,
; mov ecx, buf ; *buf
int 80h ; );
; exit
mov eax, 1 ; exit(
mov ebx, 0 ; 0
int 80h ; );
A crucial problem here is that the tutorial never mentions how to create a buffer, the bufsize variable, or indeed variables at all.
How do I do this?
(An aside: after at least an hour of searching, I'm vaguely appalled at the low quality of resources for learning assembly. How on earth does any computer run when the only documentation is the hearsay traded on the 'net?)
Ohh, this is going to be fun.
Assembly language doesn't have variables. Those are a higher-level language construct. In assembly language, if you want variables, you make them yourself. Uphill. Both ways. In the snow.
If you want a buffer, you're going to have to either use some region of your stack as the buffer (after calling the appropriate stack-frame-setup instructions), or use some region on the heap. If your heap is too small, you'll have to make another system call (another INT 80h) to beg the operating system for more (via brk, the system call behind the C library's sbrk).
Another alternative is to learn about the ELF format and create a global variable in the appropriate section (.data for initialized data, .bss for uninitialized space such as a buffer).
The end result of any of these methods is a memory location you can use. But your only real "variables" like you're used to from the now-wonderful-seeming world of C are your registers. And there aren't very many of them.
The assembler might help you out with useful macros. Read the assembler documentation; I don't remember them off the top of my head.
Life is tough down there at the ASM level.
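You must declare your buffer in the .bss section. bufsize is simplest as an assembler constant (equ), so that mov edx, bufsize loads the number itself rather than an address:
bufsize equ 1024 ; a constant, not stored in memory
section .bss
buf resb bufsize ; reserve 1024 uninitialized bytes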
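After the call to open, the file handle is in eax. You rightly move it to ebx, where the call to read will look for it. Unfortunately, at that point you have already overwritten eax with 3, the syscall number for read, so do the mov ebx, eax before the mov eax, 3. Likewise, read returns the number of bytes read in eax; pass that to write as the length in edx instead of bufsize.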
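The whole read-a-file-to-stdout task, sketched in C to show the data flow the assembly has to reproduce: open's return value feeds read, and read's return value is the length for write (`copy_file` and `BUFSIZE` are hypothetical names; the static buffer typically lands in .bss, like `resb`):

```c
#include <fcntl.h>
#include <unistd.h>

#define BUFSIZE 1024          /* like `bufsize equ 1024` */
static char buf[BUFSIZE];     /* zero-initialized static storage, typically .bss */

/* open -> read -> write, repeated until read() returns 0 (end of file). */
static int copy_file(const char *path, int outfd)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    ssize_t n;
    while ((n = read(fd, buf, BUFSIZE)) > 0) {
        if (write(outfd, buf, (size_t)n) != n) {   /* length = bytes actually read */
            close(fd);
            return -1;
        }
    }
    close(fd);
    return n == 0 ? 0 : -1;   /* n < 0 means read() failed */
}
```

Calling `copy_file(argv[1], STDOUT_FILENO)` would match the question's intent of printing the first command-line argument's file.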
