Reading from a file in assembly - linux

I'm trying to learn assembly -- x86 in a Linux environment. The most useful tutorial I can find is Writing A Useful Program With NASM. The task I'm setting myself is simple: read a file and write it to stdout.
This is what I have:
section .text ; declaring our .text segment
global _start ; telling where program execution should start
_start: ; this is where code starts getting exec'ed
; get the filename in ebx
pop ebx ; argc
pop ebx ; argv[0]
pop ebx ; the first real arg, a filename
; open the file
mov eax, 5 ; open(
mov ecx, 0 ; read-only mode
int 80h ; );
; read the file
mov eax, 3 ; read(
mov ebx, eax ; file_descriptor,
mov ecx, buf ; *buf,
mov edx, bufsize ; *bufsize
int 80h ; );
; write to STDOUT
mov eax, 4 ; write(
mov ebx, 1 ; STDOUT,
; mov ecx, buf ; *buf
int 80h ; );
; exit
mov eax, 1 ; exit(
mov ebx, 0 ; 0
int 80h ; );
A crucial problem here is that the tutorial never mentions how to create a buffer, the bufsize variable, or indeed variables at all.
How do I do this?
(An aside: after at least an hour of searching, I'm vaguely appalled at the low quality of resources for learning assembly. How on earth does any computer run when the only documentation is the hearsay traded on the 'net?)

Ohh, this is going to be fun.
Assembly language doesn't have variables. Those are a higher-level language construct. In assembly language, if you want variables, you make them yourself. Uphill. Both ways. In the snow.
If you want a buffer, you're going to have to either use some region of your stack as the buffer (after executing the appropriate stack-frame-setup instructions), or use some region on the heap. If your heap is too small, you'll have to make another system call (another INT 80h) to beg the operating system for more, via brk (sbrk is the C library's wrapper around it).
Another alternative is to learn about the ELF format and create a global variable in the appropriate section: .data for initialized data, .bss for uninitialized (zero-filled) buffers.
The end result of any of these methods is a memory location you can use. But your only real "variables" like you're used to from the now-wonderful-seeming world of C are your registers. And there aren't very many of them.
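To make those options concrete, here is a rough C analogy. It is a sketch, not assembly: the names bss_buf, data_msg and buffers_demo are invented for illustration, and it assumes a hosted C environment. A zero-initialized file-scope array typically lands in .bss, an initialized constant in .rodata or .data, a local array on the stack, and malloc memory on the heap.

```c
#include <stdlib.h>
#include <string.h>

/* .bss: zero-filled at program start, like "buf resb 1024" */
static char bss_buf[1024];

/* initialized data, like a "db" line; const puts it in .rodata rather than .data */
static const char data_msg[] = "hello\n";

/* Copy data_msg into a stack, a heap and a .bss buffer.
   Returns 1 if all three copies match, 0 if malloc fails or they differ. */
int buffers_demo(void)
{
    char stack_buf[sizeof data_msg];           /* stack: gone when we return */
    char *heap_buf = malloc(sizeof data_msg);  /* heap: malloc may grow it via brk or mmap */
    if (heap_buf == NULL)
        return 0;

    memcpy(bss_buf, data_msg, sizeof data_msg);
    memcpy(stack_buf, data_msg, sizeof data_msg);
    memcpy(heap_buf, data_msg, sizeof data_msg);

    int ok = memcmp(bss_buf, stack_buf, sizeof data_msg) == 0 &&
             memcmp(heap_buf, stack_buf, sizeof data_msg) == 0;
    free(heap_buf);
    return ok;
}
```

In every case you end up with nothing more than an address; C just hides which section or region it points into.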
The assembler might help you out with useful macros. Read the assembler documentation; I don't remember them off the top of my head.
Life is tough down there at the ASM level.

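You must declare your buffer in the .bss section. The size is simplest as an assembly-time constant defined with equ, so that mov edx, bufsize loads the number 1024 itself rather than an address. (If you store it in .data instead, e.g. bufsize dd 1024, you have to load it with mov edx, [bufsize].)
bufsize equ 1024
section .bss
buf resb bufsize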

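After the call to open, the file handle is in eax. You rightly move eax to ebx, where the call to read will look for it. Unfortunately, at this point you have already overwritten it with 3, the syscall number for read. Swap the two instructions (mov ebx, eax first, then mov eax, 3). Likewise, before the write call, move the byte count that read returned in eax into edx; otherwise write sends the whole buffer.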
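For comparison, here is the same open/read/write sequence as a C sketch using the POSIX wrappers for those system calls. It assumes a POSIX system, and cat_file and copy_fd are hypothetical names, not from the tutorial. It shows both points: the descriptor returned by open must reach read before anything else reuses it, and write is given the byte count that read returned. It also loops until read returns 0, so files larger than the buffer are copied completely.

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy everything readable from in_fd to out_fd using a fixed-size buffer.
   Returns 0 on success, -1 on error. */
static int copy_fd(int in_fd, int out_fd)
{
    char buf[1024];                              /* same role as "buf resb 1024" */
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf); /* syscall 3 */
        if (n == 0)
            return 0;                            /* end of file */
        if (n < 0)
            return -1;
        /* write exactly the n bytes that were read, not the whole buffer */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off)); /* syscall 4 */
            if (w < 0)
                return -1;
            off += w;
        }
    }
}

/* open(path, O_RDONLY), copy its contents to out_fd, then close it. */
int cat_file(const char *path, int out_fd)
{
    int fd = open(path, O_RDONLY);   /* syscall 5; the descriptor comes back in eax */
    if (fd < 0)
        return -1;
    int rc = copy_fd(fd, out_fd);    /* fd is handed to read before eax is reused */
    close(fd);
    return rc;
}
```

Calling cat_file(argv[1], 1) from a main function would reproduce the intended program.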

Related

Joining strings from registers and printing them (CPUID)

Starting to learn NASM assembly, I was looking at some assembly questions here on Stack Overflow and found this one:
Concatenating strings from registers and printing them
I believe this question is not a duplicate, because I am trying to replicate the code in NASM, and things were not very clear in the other question.
I decided to replicate this code in NASM, but I did not quite understand the MASM code in question.
I learned about CPUID and wrote some test programs.
In short, I'd like to know how to concatenate registers and then print them on the screen using NASM.
I want to print 'ebx' + 'edx' + 'ecx' because this is how the CPUID output is organized, judging by what I see in GDB.
I called CPUID with eax=1
"String" is not a very precise term. The Vendor Identification String of CPUID/EAX=0 contains only 12 ASCII characters, packed into 3 DWORD registers. There is no terminating character as in C, nor length information as in Pascal. But it's always the same registers and always 3*4 = 12 bytes. This is ideal for the write syscall:
section .bss
buff resb 12
section .text
global _start
_start:
mov eax, 0
cpuid
mov dword [buff+0], ebx ; Fill the first four bytes
mov dword [buff+4], edx ; Fill the second four bytes
mov dword [buff+8], ecx ; Fill the third four bytes
mov eax, 4 ; SYSCALL write
mov ebx, 1 ; File descriptor = STDOUT
mov ecx, buff ; Pointer to ASCII string
mov edx, 12 ; Count of bytes to send
int 0x80 ; Call Linux kernel
mov eax, 1 ; SYSCALL exit
mov ebx, 0 ; Exit Code
int 80h ; Call Linux kernel
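The packing can be checked without executing cpuid at all. The C sketch below (store_le32 and cpuid_vendor are hypothetical names) stores three 32-bit values byte by byte, least-significant byte first, which is what mov dword [buff+N], reg does on little-endian x86. The input values are the ones an Intel CPU returns for leaf 0 ("GenuineIntel"); reading the real registers from C needs compiler-specific intrinsics, which this sketch deliberately avoids.

```c
#include <stdint.h>

/* Store one 32-bit register value at p the way x86 "mov dword [p], reg"
   does: least-significant byte first (little-endian). */
static void store_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Assemble the 12-byte CPUID leaf-0 vendor string from EBX, EDX, ECX,
   in that order. Like buff, the result is NOT NUL-terminated. */
void cpuid_vendor(unsigned char out[12], uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    store_le32(out + 0, ebx);
    store_le32(out + 4, edx);
    store_le32(out + 8, ecx);
}
```

With ebx = 0x756e6547, edx = 0x49656e69 and ecx = 0x6c65746e the twelve bytes spell GenuineIntel, which is why the register order is ebx, edx, ecx rather than alphabetical.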

Why do I need to use [ ] (square brackets) when moving data from register to memory, but not when other way around?

This is the code I have and it works fine:
section .bss
bufflen equ 1024
buff: resb bufflen
whatread: resb 4
section .data
section .text
global main
main:
nop
read:
mov eax,3 ; Specify sys_read
mov ebx,0 ; Specify standard input
mov ecx,buff ; Where to read to...
mov edx,bufflen ; How long to read
int 80h ; Tell linux to do its magic
; EAX now holds the return value of the system call: the number of bytes read
add eax, 30h ; Convert the count to an ASCII digit (only works for 0-9)
mov [whatread],eax ; Store the digit in memory at location whatread
mov eax,4 ; Specify sys_write
mov ebx,1 ; Specify standard output
mov ecx,whatread ; Put the address of whatread in ecx
mov edx,4 ; Number of bytes to be written
int 80h ; Tell linux to do its work
mov eax, 1 ; Specify sys_exit
mov ebx, 0 ; Exit code 0
int 80h
Here is a simple run and output:
koray#koray-VirtualBox:~/asm/buffasm$ nasm -f elf -g -F dwarf buff.asm
koray#koray-VirtualBox:~/asm/buffasm$ gcc -o buff buff.o
koray#koray-VirtualBox:~/asm/buffasm$ ./buff
p
2koray#koray-VirtualBox:~/asm/buffasm$ ./buff
ppp
4koray#koray-VirtualBox:~/asm/buffasm$
My question is about these two instructions:
mov [whatread],eax ; Store how many bytes were read into memory at location whatread
mov ecx,whatread ; Get the address of whatread in ecx
Why does the first one work with [] but the other one without?
When I try replacing the second line above with:
mov ecx,[whatread] ; Get the address of whatread in ecx
the executable does not run properly; it does not show anything in the console.
Using brackets and not using brackets are basically two different things:
A bracket means that the value in the memory at the given address is meant.
An expression without a bracket means that the address (or value) itself is meant.
Examples:
mov ecx, 1234
Means: Write the value 1234 to the register ecx
mov ecx, [1234]
Means: Write the value that is stored in memory at address 1234 to the register ecx
mov [1234], ecx
Means: Write the value stored in ecx to the memory at address 1234
mov 1234, ecx
... makes no sense (in this syntax) because 1234 is a constant number which cannot be changed.
The Linux write syscall (INT 80h, EAX=4) expects the address of the data to be written in ECX, not the value itself!
This is why you do not use brackets at this position!
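C programmers may find the distinction easier to see with pointers: an assembly label is an address, like a pointer, and the brackets play the role of the * operator. The sketch below maps each instruction to C; whatread_storage and the helper functions are hypothetical names chosen for illustration.

```c
#include <stdint.h>
#include <unistd.h>

/* "whatread: resb 4" reserves 4 bytes; the label itself is only an address.
   In C the storage is an object and the label is a pointer to it. */
static uint32_t whatread_storage;
static uint32_t *const whatread = &whatread_storage;

/* mov [whatread], eax   ->   *whatread = eax   (brackets: store into memory) */
void store_count(uint32_t eax) { *whatread = eax; }

/* mov ecx, whatread     ->   ecx = whatread    (no brackets: the address) */
const void *load_address(void) { return whatread; }

/* mov ecx, [whatread]   ->   ecx = *whatread   (brackets: the value there) */
uint32_t load_value(void) { return *whatread; }

/* write() needs a pointer, just like ECX for INT 80h / EAX=4, so it gets the
   address. Passing load_value() instead would hand write() a number where it
   expects an address: the same mistake as mov ecx, [whatread]. */
ssize_t print_whatread(int fd) { return write(fd, load_address(), sizeof *whatread); }
```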

Are there any examples of programs that generate text files as output in NASM?

I need to make a program that outputs a text file with a .dna extension. I don't know if I can really do that, or whether the text file will even be compatible with what I need to compare it with afterwards. Anyway, I'm not really sure how to do this. I tried to look for NASM examples, but I didn't find much. I have an idea of what I'd need to do, but I just don't know what to call to create a file.
Afterwards I'd need to write data into it, and I'm not sure how to go about that. Could anyone point me to some examples? I just need to see what is required so I can write my own.
Here's an example using system calls. Basically, you just open the file, write some data to it, then close and exit:
; nasm -f elf file.asm
; ld -m elf_i386 file.o
BITS 32
section .data
; don't forget the 0 terminator if it takes a C string!
filename: db 'test.txt', 0
; an error message to be printed with write(). The function doesn't
; use a C string so no need for a 0 here, but we do need length.
error_message: db 'Something went wrong.', 10 ; 10 == \n
; this next line means current location minus the error_message location
; which works out the message length.
; many of the system calls use pointer+length pairs instead of
; 0 terminated strings.
error_message_length: equ $ - error_message
; a message we'll write to our file, same as the error message
hello: db 'Hello, file!', 10 ; the 10 is a newline at the end
hello_length: equ $ - hello
fd: dd 0 ; this is like a global int variable in C
; global variables are generally a bad idea and there are other
; ways to do it, but for simplicity I'm using one here as the
; other ways are a bit more work in asm
section .text
global _start
_start:
; first, open or create the file. in C it would be:
; // $ man 2 creat
; int fd = creat("test.txt", 0644); // the second argument is the permission mode
; we get the syscall numbers from /usr/include/asm/unistd_32.h
mov eax, 8 ; creat
mov ebx, filename ; first argument
mov ecx, 644O ; the suffix O means Octal in nasm, like the leading 0 in C. see: http://www.nasm.us/doc/nasmdoc3.html
int 80h ; calls the kernel
test eax, eax ; the raw syscall returns a negative errno value (not just -1) on error
js error
mov [fd], eax ; the return value is in eax - the file descriptor
; now, we'll write something to the file
; // man 2 write
; write(fd, hello_pointer, hello_length)
mov eax, 4 ; write
mov ebx, [fd]
mov ecx, hello
mov edx, hello_length
int 80h
test eax, eax ; write also returns a negative errno value on error
; a normal program should also close the file on a write error,
; since it is still open, but because we terminate right away
; the kernel will clean up after us
js error
; and now we close the file
; // man 2 close
; close(fd);
mov eax, 6 ; close
mov ebx, [fd]
int 80h
; and now close the program by calling exit(0);
mov eax, 1 ; exit
mov ebx, 0 ; return value
int 80h
error:
mov eax, 4 ; write
mov ebx, 1 ; write to stdout - file #1
mov ecx, error_message ; pointer to the string
mov edx, error_message_length ; length of the string
int 80h ; print it
mov eax, 1 ; exit
mov ebx, 1 ; return value
int 80h
The executable will be called a.out if you used the link command above; the -o option to ld sets a different name. The text file it creates is test.txt.
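The C version quoted in the comments can be sketched as a complete function. This assumes a POSIX system, and write_text_file is a hypothetical name. Unlike the assembly, it also closes the file when a write fails and retries short writes.

```c
#include <fcntl.h>
#include <unistd.h>

/* Create (or truncate) path with mode 0644, write len bytes of data, close it.
   Returns 0 on success, -1 on failure. */
int write_text_file(const char *path, const char *data, size_t len)
{
    int fd = creat(path, 0644);      /* syscall 8 in the 32-bit table */
    if (fd < 0)
        return -1;
    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, data + off, len - off);
        if (n < 0) {
            close(fd);               /* close on error too */
            return -1;
        }
        off += (size_t)n;
    }
    return close(fd);                /* 0 on success */
}
```

A call such as write_text_file("out.dna", text, length) would work the same way; the extension is just part of the name and does not change the contents.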
We can also call C functions, which helps if you need to write out things like numbers.
; nasm -f elf file.asm
; gcc -m32 file.o -nostdlib -lc # notice that we're using gcc to link, makes things a bit easier
; # the options are: -m32 for 32 bit, -nostdlib so gcc doesn't link the C startup files and libraries (the startup code would look for main())
; # and finally, -lc to add back some of the C standard library we want
BITS 32
; docs here: http://www.nasm.us/doc/nasmdoc6.html
; we declare the C functions as external symbols. On Linux (ELF) C symbols have no leading underscore; some other platforms add one.
extern fopen
extern fprintf
extern fclose
section .data
; don't forget the 0 terminator if it takes a C string!
filename: db 'test.txt', 0
filemode: db 'wt', 0 ; the mode for fopen in C
format_string: db 'Hello with a number! %d is it.', 10, 0 ; new line and 0 terminator
; an error message to be printed with write(). The function doesn't
; use a C string so no need for a 0 here, but we do need length.
error_message: db 'Something went wrong.', 10 ; 10 == \n
; this next line means current location minus the error_message location
; which works out the message length.
; many of the system calls use pointer+length pairs instead of
; 0 terminated strings.
error_message_length: equ $ - error_message
fp: dd 0 ; this is like a global int variable in C
; global variables are generally a bad idea and there are other
; ways to do it, but for simplicity I'm using one here as the
; other ways are a bit more work in asm
section .text
global _start
_start:
; first, open or create the file. in C it would be:
; FILE* fp = fopen("test.txt", "wt");
; arguments for C functions are pushed onto the stack, right to left.
push filemode ; "wt"
push filename ; "test.txt"
call fopen
add esp, 8 ; we need to clean up our own stack. Since we pushed two four-byte items, we need to pop the 8 bytes back off. Alternatively, we could have called pop twice, but a single add instruction keeps our registers cleaner.
; the return value is in eax, store it in our fp variable after checking for errors
; in C: if(fp == NULL) goto error;
cmp eax, 0 ; check for null
je error
mov [fp], eax;
; call fprintf(fp, "format string with %d", 55);
; the 55 is just a random number to print
mov eax, 55
push eax ; all arguments are pushed, right to left. We want a 4 byte int equal to 55, so eax is it
push format_string
mov eax, [fp] ; again using eax as an intermediate to store our 4 bytes as we push to the stack
push eax
call fprintf
add esp, 12 ; 3 dwords (12 bytes) to clean up this time
; fclose(fp);
mov eax, [fp] ; again using eax as an intermediate to store our 4 bytes as we push to the stack
push eax
call fclose
add esp, 4 ; clean up the single 4-byte argument
; the rest is unchanged from the above example
; and now close the program by calling exit(0);
mov eax, 1 ; exit
mov ebx, 0 ; return value
int 80h
error:
mov eax, 4 ; write
mov ebx, 1 ; write to stdout - file #1
mov ecx, error_message ; pointer to the string
mov edx, error_message_length ; length of the string
int 80h ; print it
mov eax, 1 ; exit
mov ebx, 1 ; return value
int 80h
There's a lot more that can be done here, such as techniques to eliminate those global variables, better error checking, or even writing a C-style main() in assembly. But this should get you started writing a text file. Tip: writing to a file is the same as writing to the screen; you just need to open or create it first!
By the way, don't mix system calls and C library functions on the same file. The C library (fprintf etc.) buffers data; the system calls don't. If you mix them, the data might end up in the file in a surprising order.
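If you really must use both on the same file, flush the stdio buffer before each raw write. A minimal C sketch, assuming POSIX fileno(); mixed_output is a hypothetical name. Without the fflush call, "first" would still be sitting in the FILE buffer when the raw write lands at offset 0, so the file would read second-then-first.

```c
#define _POSIX_C_SOURCE 200809L  /* for fileno() under strict ISO C modes */
#include <stdio.h>
#include <unistd.h>

/* Write one line through stdio and one through the raw write() system call,
   keeping them in order. Returns 0 on success, -1 on error. */
int mixed_output(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    fprintf(fp, "first (stdio)\n");          /* buffered inside fp */
    if (fflush(fp) != 0) {                   /* push it to the kernel now */
        fclose(fp);
        return -1;
    }

    const char second[] = "second (write)\n";
    if (write(fileno(fp), second, sizeof second - 1) < 0) { /* unbuffered */
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```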
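The code is similar in 64-bit, but system calls use the syscall instruction, with different call numbers and arguments in rdi, rsi, rdx and so on, and C functions receive their first six integer arguments in registers (rdi, rsi, rdx, rcx, r8, r9) instead of on the stack.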
Finally, this same pattern can be used to translate almost any C code to asm: the C calling convention is the same for every function, and the Linux system call convention (which registers hold which arguments) follows a consistent pattern too.
Further reading:
http://en.wikipedia.org/wiki/X86_calling_conventions#cdecl on the C calling convention
http://docs.cs.up.ac.za/programming/asm/derick_tut/syscalls.html on linux system calls
What is the purpose of EBP in the following code? is another SO answer I wrote a while ago about local variables in asm. It has hints about one way to get rid of that global and describes how the C compiler does it. (The other way to get rid of the global is to keep the fd/fp in a register, pushing it onto the stack and popping it back whenever you need to free up the register for something else.)
And the man pages referenced in the code for each function. From your linux prompt, do things like man 2 write or man 3 fprintf to see more. (System calls are in manual section 2 and C functions are in manual section 3).

Does int 0x80 overwrite register values? [duplicate]

This question already has an answer here:
What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
(1 answer)
Closed 4 years ago.
I wrote a program which is supposed to behave like a for/while loop, printing a string of text a certain number of times.
Here is the code:
global _start
section .data
msg db "Hello World!",10 ; define the message
msgl equ $ - msg ; define message length
; use minimal size of storage space
imax dd 0x00001000 ; defines imax to be big!
section .text
_start:
mov r8, 0x10 ; r8 will be our 'i' (imax is no longer used here)
; just attempt 10 iterations
_loop_entry: ; loop entry point
mov eax, 4 ; setup the message to print
mov ebx, 1 ; write, stdout, message, length
mov ecx, msg
mov edx, msgl
int 0x80 ; print message
; this is valid because registers do not change
dec r8 ; decrease i and jump on not zero
cmp r8,1 ; compare values to jump
jnz _loop_entry
mov rax, 1 ; exit with zero
mov rbx, 0
int 0x80
The problem I have is the program runs into an infinite loop. I ran it inside gdb and the cause is:
int 0x80 is called to print the message, and this works correctly, however after the interrupt finishes, the contents of r8 is set to zero, rather than the value it should be. r8 is where the counter sits, counting (down) the number of times the string is printed.
Does int 0x80 modify register values? I noticed that rax, rbx, rcx, rdx were not affected in the same way.
Test Results
Answer: YES! It does modify r8.
I have changed two things in my program. First, I now use cmp r8, 0, so Hello World! is printed the correct number of times. Second, I have added
mov [i], r8 ; put away i
right after _loop_entry:, and
mov r8, [i] ; get i back
right after the first int 0x80.
Here is my now working program. More info to come on performance against C++.
;
; main.asm
;
;
; To be used with main.asm, as a test to see if optimized c++
; code can be beaten by me, writing a for / while loop myself.
;
;
; Absolute minimum code to be competitive with asm.
global _start
section .data
msg db "Hello World!",10 ; define the message
msgl equ $ - msg ; define message length
; use minimal size of storage space
imax dd 0x00001000 ; defines imax to be big!
i dq 0x0 ; defines i (dq: 8 bytes, so mov [i], r8 fits)
section .text
_start:
mov r8, 0x10 ; put imax in r8d, this will be our 'i'
_loop_entry: ; loop entry point
mov [i], r8 ; put away i
mov eax, 4 ; setup the message to print
mov ebx, 1 ; write, stdout, message, length
mov ecx, msg
mov edx, msgl
int 0x80 ; print message
; r8 is not preserved across int 0x80, so reload i
mov r8, [i] ; get i back
dec r8 ; decrease i and jump on not zero
cmp r8,0 ; compare values to jump
jnz _loop_entry
mov rax, 1 ; exit with zero
mov rbx, 0
int 0x80
int 0x80 just causes a software interrupt. In your case it's being used to make a system call. Whether or not any registers are affected will depend on the particular system call you're invoking and the system call calling convention of your platform. Read your documentation for the details.
Specifically, from the System V Application Binary Interface x86-64™ Architecture Processor Supplement, Appendix A, x86-64 Linux Kernel Conventions:
The interface between the C library and the Linux kernel is the same as for the user-level applications...
For user-level applications, r8 is a scratch register, which means it's caller-saved. If you want it to be preserved over the system call, you'll need to do it yourself.
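The same idea seen from C: the loop counter has to survive every system call. A compiler handles this automatically because it knows which registers a call may clobber, so it keeps the counter in a callee-saved register or spills it to the stack; the fixed assembly does the same by hand with mov [i], r8 and mov r8, [i]. A sketch assuming POSIX write(); print_n_times is a hypothetical name.

```c
#include <unistd.h>

/* Print msg (len bytes) count times to fd.
   Returns the number of writes that succeeded. */
int print_n_times(int fd, const char *msg, size_t len, int count)
{
    int done = 0;
    /* i must stay live across each write() call; the compiler decides where
       to keep it so that the call cannot clobber it */
    for (int i = count; i != 0; i--) {
        if (write(fd, msg, len) != (ssize_t)len)
            break;
        done++;
    }
    return done;
}
```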

Implementing cat>fileName command in NASM

I am trying to implement the cat>filename command in NASM on Ubuntu 11.04 using system calls. My program compiles and runs successfully (or seems to). But whenever I run cat filename, it shows "No such file or directory", yet I can see the file in the directory. And if I try to open the file by double-clicking it, it shows "You do not have the permissions necessary to open the file." Can you please help me find the errors in my code?
The code is following:
section .data
msg: dd "%d",10,0
msg1: db "cat>",0
length: equ $-msg1
section .bss
a resb 100
len1 equ $-a
b resd 1
c resb 100
len2 equ $-c
section .text
global main
main:
mov eax,4 ;;it will print cat>
mov ebx,1
mov ecx,msg1
mov edx,length
int 80h
start:
mov eax,3 ;;it will take the file name as input
mov ebx,0
mov ecx,a
mov edx,len1
int 80h
mov eax,5 ;;it will create the file by giving owner read/write/exec permission
mov ebx,a
mov ecx,0100
mov edx,1c0h
int 80h
cmp eax,0
jge inputAndWrite
jmp errorSegment
inputAndWrite:
mov [b],eax
mov eax,3 ;;take the input lines
mov ebx,0
mov ecx,c
mov edx,len2
int 80h
mov edx,eax ;;write the input lines in the file
mov eax,4
mov ebx,[b]
mov ecx,c
int 80h
jmp done
errorSegment:
jmp done
done:
mov eax, 1
xor ebx, ebx
int 80h
P.S. I have edited the code above based on RageD's suggestions. Yet the file I create does not contain any of the input lines given in the inputAndWrite segment. I would appreciate your suggestions.
Your major problem with permissions is that permissions are in octal and you have listed them in decimal. You are looking for 0700 in base 8, not base 10, which is 1c0h in hexadecimal. The following change should fix your permissions problem:
;; This is file creation
mov eax, 5 ;; open is syscall 5
mov ebx, a ;; Pointer to the file name
mov ecx, 241h ;; O_WRONLY | O_CREAT | O_TRUNC (updated, see edit)
mov edx, 1c0h ;; 0700 octal: read/write/execute for the owner
For reference, a handy quick guide to Linux system calls (perhaps somewhat outdated, but mostly correct) is the Linux System Call Table. It is extremely helpful for remembering how the registers need to be set, etc.
Another critical issue is writing to the file; I think a few things got mixed up. First of all, be careful with your length constants. The assembler evaluates $ at the point where it appears, so len1 equ $ - a is the distance from a to that point, including everything declared in between. Each length must therefore directly follow its own buffer:
section .bss
a resb 100
len1 equ $ - a
b resd 1
c resb 100
len2 equ $ - c
Doing this makes sure your reads are sized properly (although your input is still limited by the buffer sizes).
Another crucial issue is how you're trying to write to the file: you mixed up the syscall registers.
;; Write to file
mov edx, eax ;; Amount of data to write (bytes returned by read)
mov eax, 4 ;; Write syscall
mov ebx, [b] ;; File descriptor saved in b after open
mov ecx, c ;; Buffer to write out
int 80h ;; Call the kernel
From here, I would make a few more adjustments. First, to end cleanly (no segfault), use the exit syscall. A ret only works if something called your code and left a return address on the stack (for example the C runtime calling main); a standalone program entered at _start has nothing to return to. The code for the exit syscall is below:
;; Exit
mov eax, 1 ;; Exit is syscall 1
xor ebx, ebx ;; This is the return value
int 80h ;; Interrupt
Also, I assume the file name is typed at the terminal and ended with a newline. If so, strip the newline from the file name; otherwise the newline becomes part of the name, which is why cat filename reports "No such file or directory". The simplest way is to overwrite the last character (the newline) with a null terminator. So, right after reading the file name, I would place code similar to this:
;; Null-terminate the last character - this assumes it directly follows the read call
;; and so the contents of eax are the amount of bytes read
mov ebx, eax ;; How many bytes read (or offset to current null-terminator)
sub ebx, 1 ;; Offset in array to the last valid character
add ebx, a ;; Add the memory address (i.e. in C this looks like &a[OFFSET])
mov BYTE [ebx], 0 ;; Null-terminated
Finally, it is polite in larger projects to close your file descriptors when you're done. It may not be necessary here since you are immediately exiting, but that would look something like:
;; Close fd
mov eax, 6 ;; close() is syscall 6
mov ebx, [b] ;; File descriptor to close
int 80h
EDIT
Sorry, I missed the writing issue. You are opening your file with the decimal value 100 (64h). That value happens to include O_CREAT (40h), which is why the file gets created, but its access-mode bits are 0, i.e. O_RDONLY, so every write fails. What you want is O_WRONLY (1) for write access, together with O_CREAT (40h) so the file is created if it is missing and O_TRUNC (200h) so old contents are discarded, as the shell's > does: 241h in total. (O_RDWR, read and write, would be 2.) You do not need the sync system call (syscall number 0x24, no arguments) to see the data: write hands it straight to the kernel, and sync only forces it out to disk. So the updated code to open the file properly should look like this:
; Open file
mov eax, 5
mov ebx, a
mov ecx, 241h ; O_WRONLY | O_CREAT | O_TRUNC
mov edx, 1c0h ; 0700 octal
int 80h
Hope this helps. Good luck!
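Putting the fixes together, here is a C sketch of the whole program. cat_to_file is a hypothetical name; the sketch assumes Linux/POSIX and reads the file name and the data from two separate descriptors so it can be exercised without a terminal. It strips the newline from the file name and opens the file write-only, creating or truncating it like the shell's >, with mode 0700.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read a file name from name_fd, create or truncate that file with mode 0700,
   then copy one read() worth of input from data_fd into it.
   Returns the number of bytes written, or -1 on error. */
ssize_t cat_to_file(int name_fd, int data_fd)
{
    char name[100 + 1];   /* like "a resb 100", plus room for a NUL */
    char data[100];       /* like "c resb 100" */

    ssize_t n = read(name_fd, name, sizeof name - 1);
    if (n <= 0)
        return -1;
    /* Replace the Enter key's '\n' with a NUL; otherwise terminate after the
       last byte. Without this the file name would end in a newline. */
    if (name[n - 1] == '\n')
        name[n - 1] = '\0';
    else
        name[n] = '\0';

    /* on x86 Linux: O_WRONLY | O_CREAT | O_TRUNC == 01 | 0100 | 01000 octal == 241h */
    int fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0700);
    if (fd < 0)
        return -1;

    ssize_t got = read(data_fd, data, sizeof data);
    ssize_t put = (got > 0) ? write(fd, data, (size_t)got) : got;
    close(fd);
    return put;
}
```

Called as cat_to_file(0, 0), it reads both the name and the text from standard input, like the assembly program.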

Resources