How to open a file in assembler and modify it? - linux

I'm starting to learn Assembler and I'm working in Unix. I want to open a file and write 'Hello world' on it.
section .data
textoutput db 'Hello world!', 10
lentext equ $ - textoutput
filetoopen db 'hi.txt'
section .text
global _start
_start:
mov eax, 5 ;open
mov ebx, filetoopen
mov ecx, 2 ;read and write mode
int 80h
mov eax, 4
mov ebx, filetoopen ;I'm not sure what do i have to put here, what is the "file descriptor"?
mov ecx, textoutput
mov edx, lentext
mov eax, 1
mov ebx, 0
int 80h ; finish without errors
But when I compile it, it doesn't do anything. What am I doing wrong?
When I open a file where does the file descriptor value return to?

This is x86 Linux (x86 is not the only assembly language, and Linux is not the only Unix!)...
section .data
textoutput db 'Hello world!', 10
lentext equ $ - textoutput
filetoopen db 'hi.txt'
The filename string requires a 0-byte terminator: filetoopen db 'hi.txt', 0
section .text
global _start
_start:
mov eax, 5 ;open
mov ebx, filetoopen
mov ecx, 2 ;read and write mode
2 is the O_RDWR flag for the open syscall. If you want the file to be created if it doesn't already exist, you will need the O_CREAT flag as well; and if you specify O_CREAT, you need a third argument which is the permissions mode for the file. If you poke around in the C headers, you'll find that O_CREAT is defined as 0100 - beware of the leading zero: this is an octal constant! You can write octal constants in nasm using the o suffix.
So you need something like mov ecx, 0102o to get the right flags and mov edx, 0666o to set the permissions.
int 80h
The return code from a syscall is passed in eax. Here, this will be the file descriptor (if the open succeeded) or a small negative number, which is a negative errno code (e.g. -1 for EPERM). Note that the convention for returning error codes from a raw syscall is not quite the same as the C syscall wrappers (which generally return -1 and set errno in the case of an error)...
mov eax, 4
mov ebx, filetoopen ;I'm not sure what do i have to put here, what is the "file descriptor"?
...so here you need to mov ebx, eax first (to save the open result before eax is overwritten) then mov eax, 4. (You might want to check that the result is not negative first, and handle the failure to open in some way if it is.)
mov ecx, textoutput
mov edx, lentext
Missing int 80h here.
mov eax, 1
mov ebx, 0
int 80h ; finish without errors
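Putting the corrections above together, a complete version might look like the following. This is only a sketch, assuming NASM and the 32-bit int 80h conventions; the close call (syscall 6), the .fail/.exit labels and the exit status 1 on failure are my own additions, not part of the original code.

```nasm
; Sketch only: 32-bit Linux, NASM syntax.
; The error path and the close call are illustrative assumptions.

section .data
textoutput  db 'Hello world!', 10
lentext     equ $ - textoutput
filetoopen  db 'hi.txt', 0      ; path must end with a 0 byte

section .text
global _start
_start:
    mov eax, 5                  ; sys_open
    mov ebx, filetoopen         ; pointer to the path
    mov ecx, 0102o              ; O_RDWR | O_CREAT (octal)
    mov edx, 0666o              ; permissions, needed because of O_CREAT
    int 80h
    test eax, eax
    js .fail                    ; negative eax is a negated errno code

    mov ebx, eax                ; save the file descriptor first
    mov eax, 4                  ; sys_write
    mov ecx, textoutput
    mov edx, lentext
    int 80h

    mov eax, 6                  ; sys_close, ebx still holds the descriptor
    int 80h

    xor ebx, ebx                ; exit status 0
    jmp .exit
.fail:
    mov ebx, 1                  ; exit status 1 (assumed choice)
.exit:
    mov eax, 1                  ; sys_exit
    int 80h
```

On a 64-bit host this would be assembled with nasm -f elf32 and linked with ld -m elf_i386. Without O_TRUNC, an existing hi.txt is overwritten from the start rather than emptied first.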

Did you read the Linux Assembly HOWTO? It covers your question.
You can also compile some C code with gcc -S -fverbose-asm -O1 and look at the generated assembly. For example, with foo.c, run gcc -S -Wall -fverbose-asm -O1 foo.c (as a command in a terminal), then look (using some editor, perhaps GNU Emacs) into the generated foo.s assembler file.
Finally, I don't think it is worth bothering a lot about assembler. In 2020, a recent GCC compiler will surely generate better code than what you could write (if you invoke it with optimizations, at least -O2). See this draft report for more.

This is an x64 Linux sample:
; Program to open and write to file
; Compile with:
; nasm -f elf64 -o writeToFile64.o writeToFile64.asm
; Link with:
; ld -m elf_x86_64 -o writeToFile64 writeToFile64.o
; Run with:
; ./writeToFile64
;==============================================================================
; Author : Rommel Samanez
;==============================================================================
global _start
%include 'basicFunctions.asm'
section .data
fileName: db "testFile.txt",0
fileFlags: dq 0102o ; create file + read and write mode
fileMode: dq 00600o ; user has read write permission
fileDescriptor: dq 0
section .rodata ; read only data section
msg1: db "Write this message to the test File.",0ah,0
msglen equ $ - msg1 - 1 ; length without the trailing 0 byte, so it is not written to the file
msg2: db "File Descriptor=",0
section .text
_start:
mov rax,2 ; sys_open
mov rdi,fileName ; const char *filename
mov rsi,[fileFlags] ; int flags
mov rdx,[fileMode] ; int mode
syscall
mov [fileDescriptor],rax
mov rsi,msg2
call print
mov rax,[fileDescriptor]
call printnumber
call printnewline
; write a message to the created file
mov rax,1 ; sys_write
mov rdi,[fileDescriptor]
mov rsi,msg1
mov rdx,msglen
syscall
; close file Descriptor
mov rax,3 ; sys_close
mov rdi,[fileDescriptor]
syscall
call exit

It depends on which assembler you are using and whether you expect to use the C runtime. This case appears to be the Hello World text example from Rosetta Code, which uses nasm. Since you have a _start label, you don't need the C runtime, so you assemble this to an ELF object file and link it into a program:
nasm -felf hello.asm
ld hello.o -o hello
Now you can run the hello program.
A slightly more portable example that uses the C runtime to do the work rather than linux syscalls might look like the sample below. If you link this as described it can use printf to do the printing.
;;; helloworld.asm -
;;;
;;; NASM code for Windows using the C runtime library
;;;
;;; For windows - change printf to _printf and then:
;;; nasm -fwin32 helloworld.asm
;;; link -subsystem:console -out:helloworld.exe -nodefaultlib -entry:main
;;; helloworld.obj msvcrt.lib
;;; For gcc (linux, unix etc):
;;; nasm -felf helloworld.asm
;;; gcc -o helloworld helloworld.o
extern printf
section .data
message:
db 'Hello, World', 10, 0
section .text
global main
main:
push dword message ; push function parameters
call printf ; call C library function
add esp, 4 ; clean up the stack
mov eax, 0 ; exit code 0
ret
For information about file descriptors, read the open(2) manual page or look at Wikipedia. A file descriptor is how POSIX refers to an open I/O stream. In your case, stdout.

Related

Why is 64-bit NASM insisting on the RSI register ? Why can't I put "hello world" into RCX register and use SYSCALL?

I have this x86 assembly code for a "hello world" program.
global _start
section .text
_start:
mov eax, 1 ; system call for write
mov ebx, 1 ; file handle 1 is stdout
mov ecx, message ; address of string to output
mov edx, message_len ; length of the string
syscall ; invoke operating system to do the write
mov eax, 60 ; system call for exit
mov ebx, 0 ; exit code 0
syscall ; invoke operating system to exit
section .data
message: db "Hello, World!!!!", 10 ; newline at the end
message_len equ $-message ; length of the string
This doesn't compile with nasm -felf64 hello.asm && ld hello.o && ./a.out on a 64-bit Linux machine.
But if I change the third line mov ecx, message to mov rsi, message it works!
My question is why is 64-bit NASM insisting on the RSI register? Because I have seen people compiling with ECX on 32-bit Arch Linux.
x86 does not use the same calling convention as x64.
In x86, the first argument is EBX which contains the descriptor, ECX contains the buffer, EDX contains the length and EAX contains the system call ordinal.
In x64, the first argument is contained in RDI, second in RSI, third in RDX and fourth in R10 (not RCX, which the syscall instruction itself overwrites), while RAX contains the ordinal for the system call.
That's why your call is working on x86 but needs to be adjusted to work on x64 as well.
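As a rough sketch of the adjusted program for the 64-bit syscall interface, reusing the labels from the question: the descriptor and the exit code go into EDI here, which is my reading of the register list above rather than something the question's one-register change covers.

```nasm
; Sketch only: 64-bit Linux, NASM syntax.
; Assumed build: nasm -felf64 hello.asm && ld hello.o && ./a.out

global _start

section .text
_start:
    mov eax, 1              ; system call 1 is write on x86-64
    mov edi, 1              ; first argument: file handle 1 is stdout
    mov rsi, message        ; second argument: address of string
    mov edx, message_len    ; third argument: length of the string
    syscall                 ; invoke operating system to do the write

    mov eax, 60             ; system call 60 is exit on x86-64
    xor edi, edi            ; first argument: exit code 0
    syscall                 ; invoke operating system to exit

section .data
message: db "Hello, World!!!!", 10      ; newline at the end
message_len equ $ - message             ; length of the string
```

Writing to a 32-bit register such as EDI clears the upper half of the matching 64-bit register, so the small constants do not need the full RDI or RAX forms.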

assembly code unexpectedly printing .shstrtab.text.data

i am very new to assembly although i have lots of c and c++ experience.
my assembly code is supposed to print hello world like all first programs in a new language.
it prints out hello world but also prints out some extra text:
hello world!
.shstrtab.text.data
and here is my assembly program:
section .text
global _start ;for the linker
_start:
mov edx, length ; message length
mov ecx, message ; message to write
mov ebx, 1 ; file descriptor stdout
mov eax, 4 ; system call number
int 0x80 ; call kernel
mov eax, 1 ;system call number for sys_exit to exit program
int 0x80 ; call kernel
section .data
message db "hello world!"
length DD 10
if you know how to fix this also explain why is this happening.
thanks.
extra info: i am using nasm assembler with ld linker
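So the problem is in how length is used: mov edx, length loads the address of the length variable, not its value. The answer is to use mov edx, [length]. Thanks to Jester for pointing that out.
Alternatively, define length equ $ - message instead of length dd 10. That also gives the real length of the string (12 bytes) rather than the hard-coded 10.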
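A minimal sketch of the program with the equ fix applied, assuming the same 32-bit int 0x80 setup as the question. The trailing newline on the message and the explicit exit code in ebx are my additions.

```nasm
; Sketch only: 32-bit Linux, NASM syntax.

section .text
global _start               ; for the linker
_start:
    mov edx, length         ; length is an assemble-time constant here
    mov ecx, message        ; message to write
    mov ebx, 1              ; file descriptor stdout
    mov eax, 4              ; system call number for sys_write
    int 0x80                ; call kernel

    mov eax, 1              ; system call number for sys_exit
    xor ebx, ebx            ; exit code 0 (added, the original left ebx as 1)
    int 0x80                ; call kernel

section .data
message db "hello world!", 10   ; trailing newline added
length  equ $ - message         ; byte count of message, no storage used
```

Because equ defines a constant rather than a memory location, mov edx, length now loads the byte count itself, so the write stops at the end of the string instead of running on into whatever follows it in the file image.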

Segmentation Fault on simple ASM code

I tried to create a NASM example under 64-bit Ubuntu and run it after assembling and linking it into an ELF executable. It returns the error message below when I execute it:
nasm -f elf64 -o firstasm.o firstasm.asm
ld -o firstasm firstasm.o
firstasm
Segmentation fault (core dumped)
My NASM code is below; it tries to perform simple write() and exit() calls:
section .data ;Data segment
msg db "This line is test", 0x0a
section .text ;text segment
global _start ;Default entry point for ELF linking
_start:
; SYSCALL : write (1,msg,14)
xor rax,rax
xor rbx,rbx
xor rcx,rcx
xor rdx,rdx
mov rax,64 ; make a syscall write 4
mov rbx,1 ; put 1 into rbx and also stdout is 1
mov rcx,msg ;put address of string in rcx
mov rdx,19 ; put length of string into rdx
int 0x80 ; call kernel to made syscall
; SYSCALL : exit(0)
xor rax,rax
xor rbx,rbx
mov rax,93 ; make a syscall exit 93
mov rbx, 0 ; store 0 argument into rbx, success to exit
int 0x80
Can someone point out what the problem is in my NASM code and suggest how to fix the "Segmentation fault (core dumped)"? Thanks to anyone who can help.
Uh, where are you getting the system call numbers? Are you pulling them out of the air?
64bit sys_exit = 60
32bit sys_exit = 1
64bit sys_write = 1
32bit sys_write = 4
Linux 64-bit System Call List
Linux 32-bit System Call List
Linux System Call Table for x86_64
The above link will show what registers are used for what.
The 32-bit system call, int 0x80, does not use the 64-bit registers, and the register parameters are different. The 64-bit system call is syscall.
32 bit sys_exit:
mov ebx, ERR_CODE
mov eax, sys_exit ; 1
int 80h
64 bit sys_exit:
mov rdi, ERR_CODE
mov rax, sys_exit ; 60
syscall
see the difference?
if you want to create an inc file of the system call names and numbers for YOUR system (maybe they are different for some reason)
grep __NR /usr/include/asm/unistd_64.h | grep define | sed -e 's/\#/\%/' -e 's/__NR_/sys_/' > unistd_64.inc
of course, adjust the path to unistd_64.h for your system. It will be the same for 32 bit but the file is called unistd_32.h I believe.
Now that I showed you the difference between the exit sys call, and with the provided links, you can fix your write system call to be correct.
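For comparison, here is a sketch of how the whole program might look with the 64-bit numbers and registers. The sys_write/sys_exit/stdout names follow the style used above but are defined locally with equ instead of coming from a generated inc file, and msglen is my addition so the length is not hard-coded.

```nasm
; Sketch only: 64-bit Linux, NASM syntax.
; Assumed build: nasm -f elf64 -o firstasm.o firstasm.asm && ld -o firstasm firstasm.o

sys_write   equ 1           ; x86-64 system call numbers
sys_exit    equ 60
stdout      equ 1

section .data               ; data segment
msg     db "This line is test", 0x0a
msglen  equ $ - msg         ; computed length instead of a hand-counted one

section .text               ; text segment
global _start               ; default entry point for ELF linking
_start:
    ; write(1, msg, msglen)
    mov rax, sys_write
    mov rdi, stdout         ; first argument: file descriptor
    mov rsi, msg            ; second argument: address of string
    mov rdx, msglen         ; third argument: length of string
    syscall

    ; exit(0)
    mov rax, sys_exit
    xor rdi, rdi            ; first argument: exit code 0
    syscall
```

The xor instructions that cleared each register before loading it are dropped, since every mov overwrites the whole register anyway.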

nasm, x86_64, linux, "hello world" program: when linking, it says "relocation truncated to fit"

[section .data]
strHello db "Hello World"
STRLEN equ $-strHello
MessageLength equ 9
Message db "hi!!!! "
[section .text]
global main
main:
mov edx,STRLEN;
mov ecx,strHello;
mov ebx,1
mov eax,4
int 0x80
call DispStr
mov ebx,0
mov eax,1
int 0x80
DispStr:
mov ax,MessageLength
mov dh,0
mul dh
add ax,Message
mov bp,ax
mov ax,ds
mov es,ax
mov cx,MessageLength
mov ax,01301h
mov bx,0007h
mov dl,0
int 10h
ret
Compile and run:
$ nasm -f elf64 helloworld.asm -o helloworld.o
$ gcc -s -o helloworld helloworld.o
helloworld.o: In function `DispStr':
helloworld.asm:(.text+0x31): relocation truncated to fit: R_X86_64_16 against `.data'
collect2: ld return 1
This exact error happens because at:
add ax,Message
ax is only 16-bit wide, but Message is a 64-bit wide address, so it won't fit during relocation.
I have explained this error in detail at: https://stackoverflow.com/a/32639540/895245
The solution in this case is to use a linker script as mentioned at: Using .org directive with data in .data section: In connection with ld
This repository contains working examples of boot sectors and BIOS: https://github.com/cirosantilli/x86-bare-metal-examples/tree/d217b180be4220a0b4a453f31275d38e697a99e0
Since you're in 64-bit mode, you won't be able to use BIOS functions (i.e. the int 10h instruction). Even if you could, BIOS uses a different addressing mechanism, so attempting to use the address of Message wouldn't work anyway.
Also, wouldn't the first 3 lines of the DispStr function zero out ax? (since you're multiplying by dh, which was just set to zero)

nasm calling subroutine from another file

I'm doing a project that attaches a subroutine that I wrote to a main file included by the teacher. He gave us the instructions for making our subroutine global but apparently I'm an idiot. The two asm files are in the same folder, I'm using nasm -f elf -g prt_dec.asm and ld prt_dec and then doing the same for main.asm. Here's the relevant code in the main.asm:
SECTION .text ; Code section.
global _start ; let loader see entry point
extern prt_dec
_start:
mov ebx, 17
mov edx, 214123
mov edi, 2223187809
mov ebp, 1555544444
mov eax, dword 0x0
call prt_dec
call prt_lf
The line call prt_dec throws "undefined reference to prt_dec" when I use ld main.o
Here's a code segment from my prt_dec.asm:
Section .text
global prt_dec
global _start
start:
prt_dec:
(pushing some stuff)
L1_top:
(code continues)
You want to call a routine in another asm file or object file?
If you are assembling prt_dec.asm and linking multiple asm files into your main program, here is a sample: two asm files assembled and linked together. NOTE: hello.asm DOES NOT have a start label!
Main asm file: hellothere.asm
sys_exit equ 1
extern Hello
global _start
section .text
_start:
call Hello
mov eax, sys_exit
xor ebx, ebx
int 80H
Second asm file: hello.asm
sys_write equ 4
stdout equ 1
global Hello
section .data
szHello db "Hello", 10
Hello_Len equ ($ - szHello)
section .text
Hello:
mov edx, Hello_Len
mov ecx, szHello
mov eax, sys_write
mov ebx, stdout
int 80H
ret
makefile:
APP = hellothere
$(APP): $(APP).o hello.o
	ld -o $(APP) $(APP).o hello.o
$(APP).o: $(APP).asm
	nasm -f elf $(APP).asm
hello.o: hello.asm
	nasm -f elf hello.asm
Now, if you just want to separate your code into multiple asm files, you can include them into your main source: with %include "asmfile.asm" at the beginning of your main source file and just assemble and link your main file.
