assembly code unexpectedly printing .shstrtab.text.data - linux

I am very new to assembly, although I have lots of C and C++ experience.
My assembly code is supposed to print hello world, like every first program in a new language.
It prints hello world but also prints some extra text:
hello world!
.shstrtab.text.data
Here is my assembly program:
section .text
global _start ;for the linker
_start:
mov edx, length ; message length
mov ecx, message ; message to write
mov ebx, 1 ; file descriptor stdout
mov eax, 4 ; system call number
int 0x80 ; call kernel
mov eax, 1 ;system call number for sys_exit to exit program
int 0x80 ; call kernel
section .data
message db "hello world!"
length DD 10
If you know how to fix this, please also explain why it is happening.
Thanks.
Extra info: I am using the NASM assembler with the ld linker.

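The problem is mov edx, length: in NASM a bare label is the address of the variable, not its value. The write call is therefore asked for far more than 12 bytes and also prints whatever follows the message in memory, which here apparently includes the section name table (.shstrtab.text.data). The fix is mov edx, [length], which loads the stored value. Note that the stored value should then be 12, not 10, because "hello world!" is 12 bytes long. Thanks to Jester for pointing that out.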

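Alternatively, let the assembler compute the length: put length equ $ - message on the line right after message, instead of length dd 10. An equ symbol is a constant rather than a memory location, so the original mov edx, length then works unchanged.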
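Since the asker comes from C, here is a rough C analogy of the three ways of getting the length. This is only a sketch: the function names are made up for illustration, and C cannot express the NASM syntax exactly.

```c
#include <stddef.h>
#include <stdint.h>

/* Same data as in the question: a 12-byte message and a stored length. */
static const char message[] = "hello world!";
static uint32_t length_var = 10;              /* like: length dd 10 */

/* Like `mov edx, length`: the address of the variable, a large number. */
uintptr_t length_as_address(void) { return (uintptr_t)&length_var; }

/* Like `mov edx, [length]`: the value stored at that address. */
uint32_t length_as_value(void) { return length_var; }

/* Like `length equ $ - message`: a constant computed at build time
 * (sizeof counts the terminating 0 that C adds, hence the -1). */
size_t length_as_constant(void) { return sizeof message - 1; }
```

length_as_value() gives the hand-written 10, while length_as_constant() gives the real 12, which is why the equ form is the safer one.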

Related

Why is 64-bit NASM insisting on the RSI register ? Why can't I put "hello world" into RCX register and use SYSCALL?

I have this x86 assembly code for a "hello world" program.
global _start
section .text
_start:
mov eax, 1 ; system call for write
mov ebx, 1 ; file handle 1 is stdout
mov ecx, message ; address of string to output
mov edx, message_len ; length of the string
syscall ; invoke operating system to do the write
mov eax, 60 ; system call for exit
mov ebx, 0 ; exit code 0
syscall ; invoke operating system to exit
section .data
message: db "Hello, World!!!!", 10 ; newline at the end
message_len equ $-message ; length of the string
This doesn't work when built and run with nasm -felf64 hello.asm && ld hello.o && ./a.out on a 64-bit Linux machine.
But if I change the third line mov ecx, message to mov rsi, message it works!
My question is why is 64-bit NASM insisting on the RSI register? Because I have seen people compiling with ECX on 32-bit Arch Linux.
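32-bit x86 and x86-64 Linux do not use the same system call convention.
On 32-bit x86 (int 0x80), EAX holds the system call number and the arguments go in EBX, ECX, EDX, ESI, EDI and EBP, in that order. For write, that means EBX is the file descriptor, ECX the buffer and EDX the length.
On x86-64 (syscall), RAX holds the system call number and the arguments go in RDI, RSI, RDX, R10, R8 and R9. Note that the fourth argument is R10, not RCX as in the function calling convention, because the syscall instruction itself overwrites RCX and R11. The system call numbers also differ: write is 4 and exit is 1 on 32-bit, but 1 and 60 on x86-64.
That's why mov ecx, message works with int 0x80 on 32-bit, but the buffer address has to go into RSI (and the descriptor into RDI) when you use syscall on x86-64.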
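If you want to see the same system call without worrying about which register is which, the C library's generic syscall() wrapper does the register placement for the current architecture. A minimal sketch, assuming Linux with glibc; raw_write is a made-up helper name:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issue the write system call directly.
 * SYS_write expands to the right number for the target (1 on x86-64,
 * 4 on 32-bit x86), and syscall() loads the argument registers.
 * Unlike the raw instruction, the wrapper returns -1 and sets errno
 * on failure instead of returning a negative errno value. */
long raw_write(int fd, const void *buf, size_t len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

For example, raw_write(1, "hi\n", 3) writes three bytes to stdout and returns the number of bytes written.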

linux syscall uname for x86

I am just starting to learn assembly (NASM) and have a question. I want to write assembly code that gets information about the operating system, using the uname syscall. I am on 32-bit x86 Linux. There is plenty of information about this syscall online; I found these links:
https://github.com/hc0d3r/asm/blob/master/uname.asm
Uname syscall in buffer overflow
But I am on a 32-bit x86 system, so I tried to rewrite the code for it. As I understand it, I should put the syscall number (0x7a, or 122) in eax and the address of the result buffer in ebx.
I used the first link as an example but got an error. Can you help me solve this problem?
This is my main code:
extern printf
SYS_WRITE equ 4
SYS_UNAME equ 122
SYS_EXIT equ 60
STDOUT equ 1
section .data
str: db '%s',10,0
UTSNAME_SIZE equ 65
space db ' '
break_line db 0xa
section .bss
uname_res resb UTSNAME_SIZE*5
section .text
global main
main:
mov eax, 0x7A
mov ebx, uname_res
int 80h
push dword [uname_res]
push dword str
call printf
mov eax, 1
int 80h
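and I got this error:
Segmentation fault (core dumped)
The error is in the printf call. Sorry for my poor English.
I wrote working code for 32-bit x86 Linux. Have a look, it may be useful:
https://github.com/OlegInfoSecurity/uname_x86
The error occurred when printing the result. push dword [uname_res] pushes the first four bytes of the buffer instead of its address, while printf's %s needs a pointer to the string, i.e. push dword uname_res. After I changed the output code, the program works.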
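For comparison, this is roughly what the same thing looks like in C, where the compiler takes care of passing the address. It is only a sketch using the libc uname() wrapper rather than the raw syscall, and kernel_summary is a made-up name:

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: format "sysname release" into out.
 * Returns 0 on success, -1 if uname() fails. */
int kernel_summary(char *out, size_t n)
{
    struct utsname u;

    if (uname(&u) != 0)          /* pass the ADDRESS of the buffer */
        return -1;

    /* u.sysname is an array, so it decays to the pointer %s expects.
     * Passing its first bytes as a value is what crashed the asm code. */
    snprintf(out, n, "%s %s", u.sysname, u.release);
    return 0;
}
```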

Joining strings from registers and printing them (CPUID)

Starting to learn NASM assembly, I was looking at some assembly questions here in Stack Overflow and found this one here:
Concatenating strings from registers and printing them
I believe this question is not a duplicate, because I am trying to replicate the code in NASM and some things were not very clear in the other question.
I decided to replicate this code in NASM, but I did not quite understand the MASM code in question.
I learned about CPUID and did some testing programs.
So I'd like to know how we can concatenate registers and then print them on the screen using NASM.
I want to print 'ebx' + 'edx' + 'ecx' because this is how the CPUID output is organized by what I see in GDB.
I called CPUID with eax=1
"String" is not a very precise term. The Vendor Identification String of CPUID/EAX=0 contains only 12 ASCII characters, packed into 3 DWORD registers. There is no termination character like in C nor a length information like in PASCAL. But it's always the same registers and it's always 3*4=12 bytes. This is ideal for the write-syscall:
section .bss
buff resb 12
section .text
global _start
_start:
mov eax, 0
cpuid
mov dword [buff+0], ebx ; Fill the first four bytes
mov dword [buff+4], edx ; Fill the second four bytes
mov dword [buff+8], ecx ; Fill the third four bytes
mov eax, 4 ; SYSCALL write
mov ebx, 1 ; File descriptor = STDOUT
mov ecx, buff ; Pointer to ASCII string
mov edx, 12 ; Count of bytes to send
int 0x80 ; Call Linux kernel
mov eax, 1 ; SYSCALL exit
mov ebx, 0 ; Exit Code
int 80h ; Call Linux kernel
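The three mov dword stores above just lay the register bytes out in memory, lowest byte first. Here is a small C sketch of that packing, so you can check the byte order without running CPUID (the vendor string comes from leaf 0, i.e. eax=0). vendor_from_regs is a made-up name, and the register values in the usage note are the well-known ones for "GenuineIntel":

```c
#include <stdint.h>

/* Hypothetical helper: unpack the EBX, EDX, ECX values returned by
 * CPUID with EAX=0 into a 0-terminated string. out must hold 13 bytes. */
void vendor_from_regs(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };   /* this order matters */

    for (int r = 0; r < 3; r++)
        for (int i = 0; i < 4; i++)               /* low byte comes first */
            out[4 * r + i] = (char)((regs[r] >> (8 * i)) & 0xFF);

    out[12] = '\0';   /* CPUID gives no terminator; added here for C */
}
```

With ebx=0x756e6547, edx=0x49656e69 and ecx=0x6c65746e this fills out with "GenuineIntel".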

How to open a file in assembler and modify it?

I'm starting to learn Assembler and I'm working in Unix. I want to open a file and write 'Hello world' on it.
section .data
textoutput db 'Hello world!', 10
lentext equ $ - textoutput
filetoopen db 'hi.txt'
section .text
global _start
_start:
mov eax, 5 ;open
mov ebx, filetoopen
mov ecx, 2 ;read and write mode
int 80h
mov eax, 4
mov ebx, filetoopen ;I'm not sure what do i have to put here, what is the "file descriptor"?
mov ecx, textoutput
mov edx, lentext
mov eax, 1
mov ebx, 0
int 80h ; finish without errors
But when I compile it, it doesn't do anything. What am I doing wrong?
When I open a file where does the file descriptor value return to?
This is x86 Linux (x86 is not the only assembly language, and Linux is not the only Unix!)...
section .data
textoutput db 'Hello world!', 10
lentext equ $ - textoutput
filetoopen db 'hi.txt'
The filename string requires a 0-byte terminator: filetoopen db 'hi.txt', 0
section .text
global _start
_start:
mov eax, 5 ;open
mov ebx, filetoopen
mov ecx, 2 ;read and write mode
2 is the O_RDWR flag for the open syscall. If you want the file to be created if it doesn't already exist, you will need the O_CREAT flag as well; and if you specify O_CREAT, you need a third argument which is the permissions mode for the file. If you poke around in the C headers, you'll find that O_CREAT is defined as 0100 - beware of the leading zero: this is an octal constant! You can write octal constants in nasm using the o suffix.
So you need something like mov ecx, 0102o to get the right flags and mov edx, 0666o to set the permissions.
int 80h
The return code from a syscall is passed in eax. Here, this will be the file descriptor (if the open succeeded) or a small negative number, which is a negative errno code (e.g. -1 for EPERM). Note that the convention for returning error codes from a raw syscall is not quite the same as the C syscall wrappers (which generally return -1 and set errno in the case of an error)...
mov eax, 4
mov ebx, filetoopen ;I'm not sure what do i have to put here, what is the "file descriptor"?
...so here you need to mov ebx, eax first (to save the open result before eax is overwritten), then mov eax, 4. (You should also check that the result is not negative first, and handle the failure to open in some way if it is.)
mov ecx, textoutput
mov edx, lentext
Missing int 80h here.
mov eax, 1
mov ebx, 0
int 80h ; finish without errors
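If it helps to see the whole sequence in one place, here is the same open / write / close flow as a C sketch. It goes through the libc wrappers, so errors show up as -1 plus errno rather than as a negative return value. write_hello is a made-up name, and the claim that O_RDWR | O_CREAT equals 0102 octal holds for x86 Linux:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: create/open path, write the greeting, and
 * return the number of bytes written, or -1 on failure. */
long write_hello(const char *path)
{
    static const char text[] = "Hello world!\n";

    /* O_RDWR | O_CREAT is the 0102o from the answer (on x86 Linux);
     * 0666 is the permissions argument that O_CREAT requires. */
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;                       /* open failed, errno says why */

    /* write takes the descriptor returned by open, not the file name */
    long n = (long)write(fd, text, sizeof text - 1);
    close(fd);
    return n;
}
```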
Did you read the Linux Assembly HOWTO? It covers your question.
You can also compile some C code with gcc -S -fverbose-asm -O1 and look at the generated assembly. For example, with foo.c, run gcc -S -Wall -fverbose-asm -O1 foo.c (as a command in some terminal) then look (using some editor -perhaps GNU emacs) into the generated foo.s assembler file.
Finally, I don't think it is worth bothering a lot about assembler. In 2020, a recent GCC compiler will in most cases generate better code than what you could write (if you invoke it with optimizations, at least -O2). See this draft report for more.
This is a x64 Linux sample
; Program to open and write to file
; Compile with:
; nasm -f elf64 -o writeToFile64.o writeToFile64.asm
; Link with:
; ld -m elf_x86_64 -o writeToFile64 writeToFile64.o
; Run with:
; ./writeToFile64
;==============================================================================
; Author : Rommel Samanez
;==============================================================================
global _start
%include 'basicFunctions.asm'
section .data
fileName: db "testFile.txt",0
fileFlags: dq 0102o ; create file + read and write mode
fileMode: dq 00600o ; user has read write permission
fileDescriptor: dq 0
section .rodata ; read only data section
msg1: db "Write this message to the test File.",0ah,0
msglen equ $ - msg1
msg2: db "File Descriptor=",0
section .text
_start:
mov rax,2 ; sys_open
mov rdi,fileName ; const char *filename
mov rsi,[fileFlags] ; int flags
mov rdx,[fileMode] ; int mode
syscall
mov [fileDescriptor],rax
mov rsi,msg2
call print
mov rax,[fileDescriptor]
call printnumber
call printnewline
; write a message to the created file
mov rax,1 ; sys_write
mov rdi,[fileDescriptor]
mov rsi,msg1
mov rdx,msglen
syscall
; close file Descriptor
mov rax,3 ; sys_close
mov rdi,[fileDescriptor]
syscall
call exit
It depends on which assembler you are using and whether you expect to use the C runtime. This case, which appears to be the Hello World text example from rosettacode, uses nasm. Since you define a _start label, you do not need the C runtime, so you assemble this to an ELF object file and link it into a program:
nasm -felf hello.asm
ld hello.o -o hello
Now you can run the hello program.
A slightly more portable example that uses the C runtime to do the work rather than linux syscalls might look like the sample below. If you link this as described it can use printf to do the printing.
;;; helloworld.asm -
;;;
;;; NASM code for Windows using the C runtime library
;;;
;;; For windows - change printf to _printf and then:
;;; nasm -fwin32 helloworld.asm
;;; link -subsystem:console -out:helloworld.exe -nodefaultlib -entry:main
;;; helloworld.obj msvcrt.lib
;;; For gcc (linux, unix etc):
;;; nasm -felf helloworld.asm
;;; gcc -o helloworld helloworld.o
extern printf
section .data
message:
db 'Hello, World', 10, 0
section .text
global main
main:
push dword message ; push function parameters
call printf ; call C library function
add esp, 4 ; clean up the stack
mov eax, 0 ; exit code 0
ret
For information about file descriptors, read the open(2) manual page or look at Wikipedia. A file descriptor is how POSIX refers to an open I/O stream. In your case, stdout.

NASM Length of an Argument

I am writing a simple program to display a name supplied by the user. The result is that I should be able to enter the command and get the expected result.
Command
./hello John
Result
Hello, John.
Yet when the program gets around to displaying the name, it doesn't. I believe it has something to do with calculating the length of the argument. Could you please take a look at my code and tell me what you think?
; hello.asm
;
; Assemble: nasm -f elf hello.asm
; Link: ld -o hello hello.o
; Run: ./hello <name>
section .data
period: db ".", 10
periodLen: equ $-period
helloMsg: db "Hello, "
helloMsgLen: equ $-helloMsg
usageMsg: db "Usage: hello <name>", 10
usageMsgLen: equ $-usageMsg
section .text
global _start
_start:
pop eax ; Get number of arguments
cmp eax, 2 ; If one argument
jne _help ; Not equal, show help + exit
mov eax, 4 ; System call to write
mov ebx, 1 ; Write to console
mov ecx, helloMsg ; Display "Hello, "
mov edx, helloMsgLen ; Length of hello message
int 80h
mov eax, 4 ; System call to write
mov ebx, 1 ; Write to console
pop ecx ; Get program name
pop ecx ; Get name
mov edx, $ ; Beginning of line
sub edx, ecx ; Get length of name
int 80h
mov eax, 4 ; System call to write
mov ebx, 1 ; Write to console
mov ecx, period ; Display a period
mov edx, periodLen ; Length of period
int 80h
mov eax, 1 ; System call to exit
mov ebx, 0 ; No errors
int 80h
_help:
mov eax, 4 ; System call to write
mov ebx, 1 ; Write to console
mov ecx, usageMsg ; Display usage message
mov edx, usageMsgLen ; Length of usage message
int 80h
mov eax, 1 ; System call to exit
mov ebx, 0 ; No errors
int 80h
Ok, since you never used a debugger, I'll show you how. First, compile with nasm -f elf -g hello.asm. The -g switch helps the debugger, this way you can set breakpoints etc. Now start it typing gdb ./hello -q and type break 34. This tells gdb to stop at line 34. Run the program (type run emi (emi is my name :P)). You should see something like this:
blackbear@blackbear-laptop:~$ gdb ./hello -q
Reading symbols from /home/blackbear/hello...done.
(gdb) break 34
Breakpoint 1 at 0x80480a9: file hello.asm, line 34.
(gdb) run emi
Starting program: /home/blackbear/hello emi
Hello,
Breakpoint 1, _start () at hello.asm:34
34 pop ecx ; Get name
(gdb)
Ok, let's see what ecx is, typing display (char *) $ecx:
(gdb) display (char *) $ecx
1: (char *) $ecx = 0xbffff63e "/home/blackbear/hello"
You can use step to continue by one instruction:
(gdb) step
35 mov edx, $ ; Beginning of line
1: (char *) $ecx = 0xbffff654 "emi"
Ok, here we are. ecx points to my name, so the problem isn't here. Now we don't need to watch ecx anymore, so using undisplay gdb won't show it anymore. But we need to check edx:
(gdb) undisplay
Delete all auto-display expressions? (y or n) y
(gdb) display $edx
2: $edx = 7
(gdb) step
36 sub edx, ecx ; Get length of name
2: $edx = 134512810
(gdb) step
37 int 80h
2: $edx = 1208257110
Mmh, guess you didn't expect this, right? :) The problem seems to be here: mov edx, $. I don't get that $ (never used NASM), could you please explain?
EDIT
Ok got it. You misunderstood what the tutorial said. In NASM, $ evaluates to the address of the line it appears on, here the mov edx, $ instruction itself, not to the end of a string. In fact:
36 sub edx, ecx ; Get length of name
11: $edx = 134512810
(gdb) display (void *) $edx
12: (void *) $edx = (void *) 0x80480aa
(gdb) display (void *) $eip
13: (void *) $eip = (void *) 0x80480af
Now edx contains the address of the instruction mov edx, $ itself, which is 5 bytes long (1 opcode byte + 4 address bytes); that's why eip - edx = 5. Subtracting ecx (a stack address) from it gives a meaningless length.
To get the length of the argument you have to count the bytes up to the terminating 0 yourself, the way strlen() does. I can't help you with the NASM code for that, though, since NASM isn't my assembler. :)
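For what it's worth, the counting loop is short. Here it is as a C sketch (arg_length is a made-up name that does the same job as strlen()); the assembly version is the same idea: walk a pointer forward until the byte it points at is 0, then subtract the start address.

```c
#include <stddef.h>

/* Hypothetical helper: count the bytes before the terminating 0,
 * which is how the length of an argv string has to be found. */
size_t arg_length(const char *s)
{
    const char *p = s;

    while (*p != '\0')      /* stop at the 0 byte that ends the argument */
        p++;

    return (size_t)(p - s); /* end address minus start address */
}
```

arg_length("emi") returns 3, which is the value edx should hold before the second int 80h.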

Resources