Does int 0x80 overwrite register values? [duplicate] - linux

This question already has an answer here:
What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
(1 answer)
Closed 4 years ago.
I wrote a program which is supposed to behave like a for while loop, printing a string of text a certain number of times.
Here is the code:
global _start
section .data
msg db "Hello World!",10 ; define the message
msgl equ $ - msg ; define message length
; use minimal size of storage space
imax dd 0x00001000 ; defines imax to be big!
section .text
_start:
mov r8, 0x10 ; <s> put imax in r8d, this will be our 'i' </s>
; just attempt 10 iterations
_loop_entry: ; loop entry point
mov eax, 4 ; setup the message to print
mov ebx, 1 ; write, stdout, message, length
mov ecx, msg
mov edx, msgl
int 0x80 ; print message
; this is valid because registers do not change
dec r8 ; decrease i and jump on not zero
cmp r8,1 ; compare values to jump
jnz _loop_entry
mov rax, 1 ; exit with zero
mov rbx, 0
int 0x80
The problem I have is the program runs into an infinite loop. I ran it inside gdb and the cause is:
int 0x80 is called to print the message, and this works correctly, however after the interrupt finishes, the contents of r8 is set to zero, rather than the value it should be. r8 is where the counter sits, counting (down) the number of times the string is printed.
Does int 0x80 modify register values? I noticed that rax, rbx, rcx, rdx were not affected in the same way.
Test Results
Answer: YES! It does modify r8.
I have changed two things in my program. Firstly I now cmp r8, 0, to get Hello World! the correct number of times, and
I have added
mov [i], r8 ; put away i
After _loop_entry:
and also I have added
mov r8, [i] ; get i back
after the first int 0x80.
Here is my now working program. More info to come on performance against C++.
;
; main.asm
;
;
; To be used with main.asm, as a test to see if optimized c++
; code can be beaten by me, writing a for / while loop myself.
;
;
; Absolute minimum code to be competative with asm.
global _start
section .data
msg db "Hello World!",10 ; define the message
msgl equ $ - msg ; define message length
; use minimal size of storage space
imax dd 0x00001000 ; defines imax to be big!
i dd 0x0 ; defines i
section .text
_start:
mov r8, 0x10 ; put imax in r8d, this will be our 'i'
_loop_entry: ; loop entry point
mov [i], r8 ; put away i
mov eax, 4 ; setup the message to print
mov ebx, 1 ; write, stdout, message, length
mov ecx, msg
mov edx, msgl
int 0x80 ; print message
; this is valid because registers do not change
mov r8, [i] ; get i back
dec r8 ; decrease i and jump on not zero
cmp r8,0 ; compare values to jump
jnz _loop_entry
mov rax, 1 ; exit with zero
mov rbx, 0
int 0x80

int 0x80 just causes a software interrupt. In your case it's being used to make a system call. Whether or not any registers are affected will depend on the particular system call you're invoking and the system call calling convention of your platform. Read your documentation for the details.
Specifically, from the System V Application Binary Interface x86-64™ Architecture Processor Supplement [PDF link], Appendix A, x86-64 Linux Kernel Conventions:
The interface between the C library and the Linux kernel is the same as for the user-level applications...
For user-level applications, r8 is a scratch register, which means it's caller-saved. If you want it to be preserved over the system call, you'll need to do it yourself.

Related

Joining strings from registers and printing them (CPUID)

Starting to learn NASM assembly, I was looking at some assembly questions here in Stack Overflow and found this one here:
Concatenating strings from registers and printing them
I believe that this question is not duplicated because I am trying to
replicate the code in NASM and also things were not very clear in the
other question.
I decided to replicate this code in NASM, but I did not quite understand the MASM code in question.
I learned about CPUID and did some testing programs.
In order, I'd like to know how we can concatenate registers and then print them on the screen USING NASM.
I want to print 'ebx' + 'edx' + 'ecx' because this is how the CPUID output is organized by what I see in GDB.
I called CPUID with eax=1
"String" is not a very precise term. The Vendor Identification String of CPUID/EAX=0 contains only 12 ASCII characters, packed into 3 DWORD registers. There is no termination character like in C nor a length information like in PASCAL. But it's always the same registers and it's always 3*4=12 bytes. This is ideal for the write-syscall:
section .bss
buff resb 12
section .text
global _start
_start:
mov eax, 0
cpuid
mov dword [buff+0], ebx ; Fill the first four bytes
mov dword [buff+4], edx ; Fill the second four bytes
mov dword [buff+8], ecx ; Fill the third four bytes
mov eax, 4 ; SYSCALL write
mov ebx, 1 ; File descriptor = STDOUT
mov ecx, buff ; Pointer to ASCII string
mov edx, 12 ; Count of bytes to send
int 0x80 ; Call Linux kernel
mov eax, 1 ; SYSCALL exit
mov ebx, 0 ; Exit Code
int 80h ; Call Linux kernel

Display contents of register

hi i need help displaying contents of a register.my code is below.i have been able to display values of the data register but i want to display flag states. eg 1 or 0. and it would be helpful if to display also the contents of other registers like esi,ebp.
my code is not printing the states of the flags ..what am i missing
section .text
global _start ;must be declared for using gcc
_start : ;tell linker entry point
mov eax,msg ; moves message "rubi" to eax register
mov [reg],eax ; moves message from eax to reg variable
mov edx, 8 ;message length
mov ecx, [reg];message to write
mov ebx, 1 ;file descriptor (stdout)
mov eax, 4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax, 100
mov ebx, 100
cmp ebx,eax
pushf
pop dword eax
mov [save_flags],eax
mov edx, 8 ;message length
mov ecx,[save_flags] ;message to write
mov ebx, 1 ;file descriptor (stdout)
mov eax, 4 ;system call number (sys_write)
int 0x80
mov eax, 1 ;system call number (sys_exit)
int 0x80 ;call kernel
section .data
msg db "rubi",10
section .bss
reg resb 100
save_flags resw 100
I'm not going for anything fancy here since this appears to be a homework assignment (two people have asked the same question today). This code should be made as a function, and it can have its performance enhanced. Since I don't get an honorary degree or an A in the class it doesn't make sense to me to offer the best solution, but one you can work from:
BITS_TO_DISPLAY equ 32 ; Number of least significant bits to display (1-32)
section .text
global _start ; must be declared for using gcc
_start : ; tell linker entry point
mov edx, msg_len ; message length
mov ecx, msg ; message to write
mov ebx, 1 ; file descriptor (stdout)
mov eax, 4 ; system call number (sys_write)
int 0x80 ; call kernel
mov eax, 100
mov ebx, 100
cmp ebx,eax
pushf
pop dword eax
; Convert binary to string by shifting the right most bit off EAX into
; the carry flag (CF) and convert the bit into a '0' or '1' and place
; in the save_flags buffer in reverse order. Nul terminate the string
; in the event you ever wish to use printf to print it
mov ecx, BITS_TO_DISPLAY ; Number of bits of EAX register to display
mov byte [save_flags+ecx], 0 ; Nul terminate binary string in case we use printf
bin2ascii:
xor bl, bl ; BL = 0
shr eax, 1 ; Shift right most bit into carry flag
adc bl, '0' ; bl = bl + '0' + Carry Flag
mov [save_flags-1+ecx], bl ; Place '0'/'1' into string buffer in reverse order
dec ecx
jnz bin2ascii ; Loop until all bits processed
mov edx, BITS_TO_DISPLAY ; message length
mov ecx, save_flags ; address of binary string to write
mov ebx, 1 ; file descriptor (stdout)
mov eax, 4 ; system call number (sys_write)
int 0x80
mov eax, 1 ;system call number (sys_exit)
int 0x80 ;call kernel
section .data
msg db "rubi",10
msg_len equ $ - msg
section .bss
save_flags resb BITS_TO_DISPLAY+1 ; Add one byte for nul terminator in case we use printf
The idea behind this code is that we continually shift the bits (using the SHR instruction) in the EAX register to the right one bit at a time. The bit that gets shifted out of the register gets placed in the carry flag (CF). We can use ADC to add the value of the carry flag (0/1) to ASCII '0' to get an ASCII value of '0` and '1'. We place these bytes into destination buffer in reverse order since we are moving from right to left through the bits.
BITS_TO_DISPLAY can be set between 1 and 32 (since this is 32-bit code). If you are interested in the bottom 8 bits of a register set it to 8. If you want to display all the bits of a 32-bit register, specify 32.
Note that you can pop directly into memory.
And if you want to binary dump register and flag data with write(2), your system call needs to pass a pointer to the buffer, not the data itself. Use a mov-immediate to get the address into the register, rather than doing a load. Or lea to use a RIP-relative addressing mode. Or pass a pointer to where it's sitting on the stack, instead of copying it to a global!
mov edx, 8 ;message length
mov ecx,[save_flags] ;message to write ;;;;;;; <<<--- problem
mov ebx, 1 ;file descriptor (stdout)
mov eax, 4 ;system call number (sys_write)
int 0x80
Passing a bad address to write(2) won't cause your program to receive a SIGSEGV, like it would if you used that address in user-space. Instead, write will return EFAULT. And you're not checking the return status from your system calls, so your code doesn't notice.
mov eax,msg ; moves message "rubi" to eax register
mov [reg],eax ; moves message from eax to reg variable
mov ecx, [reg];
This is silly. You should just mov ecx, msg to get the address of msg into ecx, rather than bouncing it through memory.
Are you building for 64bit? I see you're using 8 bytes for a message length. If so, you should be using the 64bit function call ABI (with syscall, not int 0x80). The system-call numbers are different. See the table in one of the links at x86. The 32bit ABI can only accept 32bit pointers. You will have a problem if you try to pass a pointer that has any of the high32 bits set.
You're probably also going to want to format the number into a string, unless you want to pipe your program's output into hexdump.

NASM get console size

I am new to NASM (and assembler in general) and I am looking for way to get console size (number of console cols and rows) in NASM. Like AH=0Fh and INT 10h: http://en.wikipedia.org/wiki/INT_10H
Now, I understand, that in NASM (and linux in general) I can not do BIOS interruption, so there have to be other way.
The idea is to print some output to fill the screen and then wait for user to press ENTER until print more output.
If you are programming in Linux, then you must use the available system calls to achieve your aims. It's not that there are no interrupts. The system call itself is executed with an interrupt call. Outside of the kernel, however, you will be unable to access them and, since the kernel runs in protected mode, even if you could they likely wouldn't do what you would expect.
To your question, however. To obtain the console size you would need to make use of the ioctl system call. This is accessed with the value of 0x36 in EAX. I'd suggest that you have a read through the manual page for ioctl and you may also find this system call table very useful!
This is a problem I've got to deal with a time ago. The code for unistd.inc and termio.inc can be found here in the includes folder. The program can be found and the makefile you can find in de tree programs/basics/terminal-winsize.
The vaules for rows and columns you can get on any terminal (console). xpixels and ypixels you can get only from some terminals. (xterm yes, gnome-terminal depends). So if you don't get the x and y pixels (screensize) on from some terminals, the terminal is text-based I guess. Correct me if it has another reason for this behaviour.
You can convert this program easily to 32 bits since it make use of nasmx macros for the syscalls,. The only thing you have to do is to replace the 64 bit registers in 32 bit registers and put some parameters in the right register. Look for agguro on github to see all include files.
I hope this is helpfull to you
; Name: winsize
; Build: see makefile
; Run: ./winsize
; Description: Show the screen dimension of a terminal in rows/columns.
BITS 64
[list -]
%include "unistd.inc"
%include "termio.inc"
[list +]
section .bss
buffer: resb 5
.end:
.length: equ $-buffer
lf: resb 1
section .data
WINSIZE winsize
; keep the lengths the same or the 'array' construction will fail!
array: db "rows : "
db "columns : "
db "xpixels : "
db "ypixels : "
.length: equ $-array
.items: equ 4
.itemsize: equ array.length / array.items
section .text
global _start
_start:
mov BYTE[lf], 10 ; end of line in byte after
buffer
; fetch the winsize structure data
syscall ioctl, STDOUT, TIOCGWINSZ, winsize
; initialize pointers and used variables
mov rsi, array ; pointer to array of strings
mov rcx, array.items ; items in array
.nextVariable:
; print the text associated with the winsize variable
push rcx ; save remaining strings to process
push rdx ; save winsize pointer
syscall write, STDOUT, rsi, array.itemsize
pop rax ; restore winsize pointer
push rax ; save winsize pointer
; convert variable to decimal
mov ax, WORD[rax] ; get value form winsize structure
mov rdi, buffer.end-1
.repeat:
xor rbx, rbx ; convert value in decimal
mov bx, 10
xor rdx, rdx
div bx
xchg rax, rdx
or al, "0"
std
stosb
xchg rax, rdx
cmp al, 0
jnz .repeat
push rsi ; save pointer to text
; print the variable value
mov rsi, rdi
mov rdx, buffer.end ; length of variable
sub rdx, rsi
inc rsi
syscall write, STDOUT, rsi, rdx
pop rsi
pop rdx
; calculate pointer to next variable value in winsize
add rdx, 2
; calculate pointer to next string in strings
add rsi, array.itemsize
; if all strings processed
pop rcx ; remaining arrayitems
loop .nextVariable
; exit the program
syscall exit, 0

Why do I need to use [ ] (square brackets) when moving data from register to memory, but not when other way around?

This is the code I have and it works fine:
section .bss
bufflen equ 1024
buff: resb bufflen
whatread: resb 4
section .data
section .text
global main
main:
nop
read:
mov eax,3 ; Specify sys_read
mov ebx,0 ; Specify standard input
mov ecx,buff ; Where to read to...
mov edx,bufflen ; How long to read
int 80h ; Tell linux to do its magic
; Eax currently has the return value from linux system call..
add eax, 30h ; Convert number to ASCII digit
mov [whatread],eax ; Store how many bytes has been read to memory at loc **whatread**
mov eax,4 ; Specify sys_write
mov ebx,1 ; Specify standart output
mov ecx,whatread ; Get the address of whatread to ecx
mov edx,4 ; number of bytes to be written
int 80h ; Tell linux to do its work
mov eax, 1;
mov ebx, 0;
int 80h
Here is a simple run and output:
koray#koray-VirtualBox:~/asm/buffasm$ nasm -f elf -g -F dwarf buff.asm
koray#koray-VirtualBox:~/asm/buffasm$ gcc -o buff buff.o
koray#koray-VirtualBox:~/asm/buffasm$ ./buff
p
2koray#koray-VirtualBox:~/asm/buffasm$ ./buff
ppp
4koray#koray-VirtualBox:~/asm/buffasm$
My question is: What is with these 2 instructions:
mov [whatread],eax ; Store how many byte reads info to memory at loc whatread
mov ecx,whatread ; Get the address of whatread in ecx
Why the first one works with [] but the other one without?
When I try replacing the second line above with:
mov ecx,[whatread] ; Get the address of whatread in ecx
the executable will not run properly, it will not shown anything in the console.
Using brackets and not using brackets are basically two different things:
A bracket means that the value in the memory at the given address is meant.
An expression without a bracket means that the address (or value) itself is meant.
Examples:
mov ecx, 1234
Means: Write the value 1234 to the register ecx
mov ecx, [1234]
Means: Write the value that is stored in memory at address 1234 to the register ecx
mov [1234], ecx
Means: Write the value stored in ecx to the memory at address 1234
mov 1234, ecx
... makes no sense (in this syntax) because 1234 is a constant number which cannot be changed.
Linux "write" syscall (INT 80h, EAX=4) requires the address of the value to be written, not the value itself!
This is why you do not use brackets at this position!

How to print a number in assembly NASM?

Suppose that I have an integer number in a register, how can I print it? Can you show a simple example code?
I already know how to print a string such as "hello, world".
I'm developing on Linux.
If you're already on Linux, there's no need to do the conversion yourself. Just use printf instead:
;
; assemble and link with:
; nasm -f elf printf-test.asm && gcc -m32 -o printf-test printf-test.o
;
section .text
global main
extern printf
main:
mov eax, 0xDEADBEEF
push eax
push message
call printf
add esp, 8
ret
message db "Register = %08X", 10, 0
Note that printf uses the cdecl calling convention so we need to restore the stack pointer afterwards, i.e. add 4 bytes per parameter passed to the function.
You have to convert it in a string; if you're talking about hex numbers it's pretty easy. Any number can be represented this way:
0xa31f = 0xf * 16^0 + 0x1 * 16^1 + 3 * 16^2 + 0xa * 16^3
So when you have this number you have to split it like I've shown then convert every "section" to its ASCII equivalent.
Getting the four parts is easily done with some bit magic, in particular with a right shift to move the part we're interested in in the first four bits then AND the result with 0xf to isolate it from the rest. Here's what I mean (soppose we want to take the 3):
0xa31f -> shift right by 8 = 0x00a3 -> AND with 0xf = 0x0003
Now that we have a single number we have to convert it into its ASCII value. If the number is smaller or equal than 9 we can just add 0's ASCII value (0x30), if it's greater than 9 we have to use a's ASCII value (0x61).
Here it is, now we just have to code it:
mov si, ??? ; si points to the target buffer
mov ax, 0a31fh ; ax contains the number we want to convert
mov bx, ax ; store a copy in bx
xor dx, dx ; dx will contain the result
mov cx, 3 ; cx's our counter
convert_loop:
mov ax, bx ; load the number into ax
and ax, 0fh ; we want the first 4 bits
cmp ax, 9h ; check what we should add
ja greater_than_9
add ax, 30h ; 0x30 ('0')
jmp converted
greater_than_9:
add ax, 61h ; or 0x61 ('a')
converted:
xchg al, ah ; put a null terminator after it
mov [si], ax ; (will be overwritten unless this
inc si ; is the last one)
shr bx, 4 ; get the next part
dec cx ; one less to do
jnz convert_loop
sub di, 4 ; di still points to the target buffer
PS: I know this is 16 bit code but I still use the old TASM :P
PPS: this is Intel syntax, converting to AT&T syntax isn't difficult though, look here.
Linux x86-64 with printf
main.asm
default rel ; make [rel format] the default, you always want this.
extern printf, exit ; NASM requires declarations of external symbols, unlike GAS
section .rodata
format db "%#x", 10, 0 ; C 0-terminated string: "%#x\n"
section .text
global main
main:
sub rsp, 8 ; re-align the stack to 16 before calling another function
; Call printf.
mov esi, 0x12345678 ; "%x" takes a 32-bit unsigned int
lea rdi, [rel format]
xor eax, eax ; AL=0 no FP args in XMM regs
call printf
; Return from main.
xor eax, eax
add rsp, 8
ret
GitHub upstream.
Then:
nasm -f elf64 -o main.o main.asm
gcc -no-pie -o main.out main.o
./main.out
Output:
0x12345678
Notes:
sub rsp, 8: How to write assembly language hello world program for 64 bit Mac OS X using printf?
xor eax, eax: Why is %eax zeroed before a call to printf?
-no-pie: plain call printf doesn't work in a PIE executable (-pie), the linker only automatically generates a PLT stub for old-style executables. Your options are:
call printf wrt ..plt to call through the PLT like traditional call printf
call [rel printf wrt ..got] to not use a PLT at all, like gcc -fno-plt.
Like GAS syntax call *printf#GOTPCREL(%rip).
Either of these are fine in a non-PIE executable as well, and don't cause any inefficiency unless you're statically linking libc. In which case call printf can resolve to a call rel32 directly to libc, because the offset from your code to the libc function would be known at static linking time.
See also: Can't call C standard library function on 64-bit Linux from assembly (yasm) code
If you want hex without the C library: Printing Hexadecimal Digits with Assembly
Tested on Ubuntu 18.10, NASM 2.13.03.
It depends on the architecture/environment you are using.
For instance, if I want to display a number on linux, the ASM code will be different from the one I would use on windows.
Edit:
You can refer to THIS for an example of conversion.
I'm relatively new to assembly, and this obviously is not the best solution,
but it's working. The main function is _iprint, it first checks whether the
number in eax is negative, and prints a minus sign if so, than it proceeds
by printing the individual numbers by calling the function _dprint for
every digit. The idea is the following, if we have 512 than it is equal to: 512 = (5 * 10 + 1) * 10 + 2 = Q * 10 + R, so we can found the last digit of a number by dividing it by 10, and
getting the reminder R, but if we do it in a loop than digits will be in a
reverse order, so we use the stack for pushing them, and after that when
writing them to stdout they are popped out in right order.
; Build : nasm -f elf -o baz.o baz.asm
; ld -m elf_i386 -o baz baz.o
section .bss
c: resb 1 ; character buffer
section .data
section .text
; writes an ascii character from eax to stdout
_cprint:
pushad ; push registers
mov [c], eax ; store ascii value at c
mov eax, 0x04 ; sys_write
mov ebx, 1 ; stdout
mov ecx, c ; copy c to ecx
mov edx, 1 ; one character
int 0x80 ; syscall
popad ; pop registers
ret ; bye
; writes a digit stored in eax to stdout
_dprint:
pushad ; push registers
add eax, '0' ; get digit's ascii code
mov [c], eax ; store it at c
mov eax, 0x04 ; sys_write
mov ebx, 1 ; stdout
mov ecx, c ; pass the address of c to ecx
mov edx, 1 ; one character
int 0x80 ; syscall
popad ; pop registers
ret ; bye
; now lets try to write a function which will write an integer
; number stored in eax in decimal at stdout
_iprint:
pushad ; push registers
cmp eax, 0 ; check if eax is negative
jge Pos ; if not proceed in the usual manner
push eax ; store eax
mov eax, '-' ; print minus sign
call _cprint ; call character printing function
pop eax ; restore eax
neg eax ; make eax positive
Pos:
mov ebx, 10 ; base
mov ecx, 1 ; number of digits counter
Cycle1:
mov edx, 0 ; set edx to zero before dividing otherwise the
; program gives an error: SIGFPE arithmetic exception
div ebx ; divide eax with ebx now eax holds the
; quotent and edx the reminder
push edx ; digits we have to write are in reverse order
cmp eax, 0 ; exit loop condition
jz EndLoop1 ; we are done
inc ecx ; increment number of digits counter
jmp Cycle1 ; loop back
EndLoop1:
; write the integer digits by poping them out from the stack
Cycle2:
pop eax ; pop up the digits we have stored
call _dprint ; and print them to stdout
dec ecx ; decrement number of digits counter
jz EndLoop2 ; if it's zero we are done
jmp Cycle2 ; loop back
EndLoop2:
popad ; pop registers
ret ; bye
global _start
_start:
nop ; gdb break point
mov eax, -345 ;
call _iprint ;
mov eax, 0x01 ; sys_exit
mov ebx, 0 ; error code
int 0x80 ; край
Because you didn't say about number representation I wrote the following code for unsigned number with any base(of course not too big), so you could use it:
BITS 32
global _start
section .text
_start:
mov eax, 762002099 ; unsigned number to print
mov ebx, 36 ; base to represent the number, do not set it too big
call print
;exit
mov eax, 1
xor ebx, ebx
int 0x80
print:
mov ecx, esp
sub esp, 36 ; reserve space for the number string, for base-2 it takes 33 bytes with new line, aligned by 4 bytes it takes 36 bytes.
mov edi, 1
dec ecx
mov [ecx], byte 10
print_loop:
xor edx, edx
div ebx
cmp dl, 9 ; if reminder>9 go to use_letter
jg use_letter
add dl, '0'
jmp after_use_letter
use_letter:
add dl, 'W' ; letters from 'a' to ... in ascii code
after_use_letter:
dec ecx
inc edi
mov [ecx],dl
test eax, eax
jnz print_loop
; system call to print, ecx is a pointer on the string
mov eax, 4 ; system call number (sys_write)
mov ebx, 1 ; file descriptor (stdout)
mov edx, edi ; length of the string
int 0x80
add esp, 36 ; release space for the number string
ret
It's not optimised for numbers with base of power of two and doesn't use printf from libc.
The function print outputs the number with a new line. The number string is formed on stack. Compile by nasm.
Output:
clockz
https://github.com/tigertv/stackoverflow-answers/tree/master/8194141-how-to-print-a-number-in-assembly-nasm

Resources