Itoa assembly implementation, div operation causes segfault? [duplicate] - linux

This question already has answers here:
8086 assembly on DOSBox: Bug with idiv instruction?
(1 answer)
Why should EDX be 0 before using the DIV instruction?
(2 answers)
Closed 4 years ago.
So I'm trying to implement itoa, which converts an int into a string.
So far, the implementation is working if I don't loop in the .loop section, and stick to small numbers. As soon as it loops, my program segfaults.
Here is the code:
section .data
buffer times 11 db 0
section .text
global ft_itoa
extern ft_strrevd
extern malloc
ft_itoa:
mov rcx, 1 ;initialize our counter at 1 for the terminating null byte
mov rax, rdi ;move number in RAX for DIV instruction
push rbx ;save RBX
mov bl, 10
.check_negative:
and edi, 0xf0000000
mov rdi, buffer
jz .loop ;number is positive, proceed to main loop
not rax ;else
inc rax ;compute absolute value with binary complement
mov r9, 1 ;set neg flag
.loop:
cmp rax, 0
jz .check_neg_flag
div bl
add ah, 48 ;convert int to char
mov byte[rdi + rcx - 1], ah ;copy char in buffer
sub ah, 48
inc rcx
jmp .loop ;commenting this line prevents crash
.check_neg_flag:
cmp r9, 1
jne .dup
mov byte[rdi + rcx - 1], '-'
inc rcx
.dup:
mov byte[rdi + rcx - 1], 0
call ft_strrevd ;copy buffer string in memory and return pointer
.end:
pop rbx ;restore RBX
ret
It's most likely caused by the div, but I'm having trouble understanding how it works.
If anyone could point me towards a solution it'd be highly appreciated.

So in order to fix this I have to use div ebx instead of div bl, and xor edx, edx before each div.
Here is a working version:
section .data
buffer times 11 db 0
nega db "neg",0
posi db "pos",0
section .text
global ft_itoa
extern ft_strdup
extern malloc
ft_itoa:
xor rcx, rcx ;initialize counter
xor r9, r9 ;set neg flag to 0
mov eax, edi ;move number in RAX for DIV instruction
push rbx ;save RBX
mov ebx, 10
.check_negative:
and edi, 0x80000000
mov rdi, buffer
jz .divide ;number is positive, proceed to main loop
not eax ;else
inc eax ;compute absolute value with binary complement
inc r9 ;set neg flag
.divide:
xor edx, edx
div ebx
add edx, 48 ;convert int to char
push rdx
inc rcx
cmp eax, 0
jnz .divide
.check_neg_flag:
cmp r9, 1
jne .buff_string
mov byte[rdi], '-'
.buff_string:
pop rdx
mov byte[rdi + r9], dl
dec rcx
inc r9
cmp rcx, 0
jnz .buff_string
.dup:
mov byte[rdi + r9], 0
call ft_strdup ;copy buffer string in memory and return pointer
pop rbx ;restore RBX
ret

Related

Windows multicore program worked before but suddenly threads execute randomly

Below is a multithreaded (4-core) NASM 64 program. It's not minimal, but it's complete; I posted the complete code because a minimal example may not reveal the problem in this code.
This is my first multithreaded multicore program in NASM, and I was happy to see yesterday afternoon that it worked correctly (all four cores) and returned the values I was expecting. I ran it successfully several times yesterday and again several times this morning.
Suddenly, about 10 minutes after my last run, without making any changes to the code, without re-assembling the dll, I ran it again and this time the threads execute randomly -- not all threads execute, and the configuration of threads that execute varies from run to run -- sometimes thread 1 only; sometimes threads 2-4 but not thread 1, sometimes only thread 2 and 4, etc.
My code did not have ExitThread or CloseHandle calls, so I added them in (see label_899:), rebooted and ran it again. The first run after a fresh reboot still shows random execution of threads.
I thought the problem may be that CreateThread was failing, so I also returned the thread handles. Every time, all four thread handles were created even though all did not execute, so that's not it.
I have done a lot of research but I haven't seen this issue discussed at all. Even though this is NASM, the same thing could also be applicable to C or C++ multithreading. I use WaitForMultipleObjects to wait on all threads to complete.
This is a dll, and the entry point is Main_Entry_fn. The dll is called from Python using ctypes, but I doubt that makes any difference. I use ctypes frequently and have never had a problem with it.
Thanks for any ideas.
P.S. ignore the realloc code because the buffers are large enough that realloc is never called.
UPDATE: Here is how I constructed the progam flow: the entry point is Main_Entry_fn toward the bottom of the program listing (it's an export because it has to be exported for a dll call). Main_Entry_fn takes the input data from the calling function, preserves registers (because the whole program is register optimized) and allocates output buffers. Main_Entry_fn then calls Init_Cores_fn (top of program listing), where we create four arrays with data for the registers on entry to each core (remember all vars are stored in registers -- see the call to DupThreadInfo), and creates four threads in a loop. The threads all call Prime_Number_fn, and the threads are set to start immediately upon creation.
Register assignments:
r15 list_of_results_ptr (pointer to list_of_results)
r14 list_of_results_ctr (counter for list_of_results)
r13 list_of_results_length
r12 numbers_ptr (pointer to "numbers" input array)
r11 numbers_ctr
r10 numbers_length
r9 num
r8 i
xmm15 num_float
xmm14: result
; Header Section
[BITS 64]
[default rel]
extern malloc, calloc, realloc, free
global Main_Entry_fn
export Main_Entry_fn
global FreeMem_fn
export FreeMem_fn
extern CreateThread, CloseHandle, ExitThread
extern WaitForMultipleObjects, GetCurrentThread
section .data align=16
Return_Pointer_Array: dq 0, 0, 0
Input_Length_Array: dq 0, 0,
list_of_results_ptr: dq 0
list_of_results_ctr: dq 0
list_of_results_length: dq 0
data_master_ptr: dq 0
initial_dynamic_length: dq 0
internal_dynamic_length: dq 5000
XMM_Stack: dq 0, 0, 0
numbers_ptr: dq 0
numbers_length: dq 0
numbers: dq 0
list_of_results: dq 0
num_float: dq 0.0
loop_counter_401: dq 0
numbers_ctr: dq 0
num: dq 0
result: dq 0
const_1: dq 1
i: dq 0
range_loop_start_i: dq 0
range_loop_end_i: dq 0
const_0: dq 0
; New vars for threads:
ThreadCount: times 4 dq 0
ThreadInfo: times 10 dq 0
ThreadInfo2: times 10 dq 0
ThreadInfo3: times 10 dq 0
ThreadInfo4: times 10 dq 0
TestInfo: times 4 dq 0
ThreadHandles: times 4 dq 0
Return_Data_Array: times 8 dq 0 ; ptr,ptr,ptr,ptr,length,length,length,length
StartByte: dq 0
stride: dq 8
output_buffer_pointers: times 4 dq 0 ; for malloc
Division_Size: dq 0
Division_Start: dq 0
section .text
; ______________________________________
Init_Cores_fn:
; Calculate the data divisions
mov rax,r10
mov rbx,4 ;cores
xor rdx,rdx
div rbx
mov [Division_Size],rax
mov [Division_Start],rax
; Populate the ThreadInfo array with vars to pass
; ThreadInfo: length, startbyte, stride, vars into registers on entry to each core
mov rdi,ThreadInfo
mov rax,0 ;ThreadInfoLength
mov [rdi],rax ; length (number of vars into registers plus 3 elements)
mov rax,[stride]
mov [rdi+16],rax ; 8 x number of cores (32 in this example)
; Vars (these registers are populated on main entry)
mov rbp,output_buffer_pointers
mov rax,[rbp]
mov [rdi+24],rax ;for r15
mov [rdi+32],r14
mov [rdi+40],r13
mov [rdi+48],r12
mov rax,[Division_Start]
mov [rdi+56],rax
mov rax,0
mov [rdi+64],rax
call DupThreadInfo
mov rbp,rsp ; preserve caller's stack frame
sub rsp,56 ; Shadow space (was 32)
; _____
label_0:
mov rax,[StartByte]
cmp rax,0
jne sb2
mov rdi,ThreadInfo
jmp sb5
sb2:cmp rax,8
jne sb3
mov rdi,ThreadInfo2
jmp sb5
sb3:cmp rax,16
jne sb4
mov rdi,ThreadInfo3
jmp sb5
sb4:cmp rax,24
jne sb5
mov rdi,ThreadInfo4
sb5:
; _____
; Create Threads
mov rcx,0 ; lpThreadAttributes (Security Attributes)
mov rdx,0 ; dwStackSize
mov r8,PrimeNumber_fn ; lpStartAddress (function pointer)
mov r9,rdi ; lpParameter (array of data passed to each core)
mov rax,0
mov [rsp+32],rax ; use default creation flags
mov rdi,ThreadCount
mov [rsp+40],rdi ; ThreadID
call CreateThread
; Move the handle into ThreadHandles array (returned in rax)
mov rdi,ThreadHandles
mov rcx,[StartByte]
mov [rdi+rcx],rax
mov rax,[StartByte]
add rax,8
mov [StartByte],rax
mov rbx,32 ; Four cores
cmp rax,rbx
jl label_0
; _____
; Wait
mov rcx,4 ;rax ; number of handles
mov rdx,ThreadHandles ; pointer to handles array
mov r8,0 ; wait for all threads to complete
mov r9,5000 ; milliseconds to wait
call WaitForMultipleObjects
; _____
mov rsp,rbp
jmp label_900
; ______________________________________
PrimeNumber_fn:
; Populate registers
;(note: rcx is the return value for ThreadProc)
mov rdi,rcx
mov rax,[rdi]
mov r15,[rdi+24]
mov r13,[rdi+40]
mov r12,[rdi+48]
mov r10,[rdi+56]
xor r11,r11
xor r9,r9
mov r8,[rdi+8] ; start byte
pxor xmm15,xmm15
pxor xmm15,xmm14
pxor xmm15,xmm13
; Get the ThreadID based on startbyte
mov rax,[rdi+72] ; 0, 8, 16, 24 for 4-core
push rax
;______
label_401:
mov rdi,r12 ; Pointer
cmp r8,r10
jge label_899
movsd xmm0,qword[rdi+r8]
movsd xmm15,xmm0
add r8,8
;______
cvttsd2si rax,xmm15
mov r9,rax
;______
label_8010:
label_801:
cmp r9,[const_1]
jle label_401
;______
label_12010:
mov rax,2
sub rax,1
movq xmm13,rax
label_1201:
movq rcx,xmm13
movq xmm1,[const_1]
addsd xmm13,xmm1
movq rcx,xmm13
mov rdx,r9
cmp rcx,rdx
jge label_12020
;______
label_16010:
label_1601:
mov rax,r9
movq rbx,xmm13
xor rdx,rdx
div rbx
mov rax,rdx
mov rdx,0
cmp rax,rdx
jne label_16020
;______
movq xmm14,[const_0]
;______
movq rax,xmm14
cvtsi2sd xmm0,rax
movsd [r15+r14],xmm14
add r14,8
mov rax,r13
cmp r14,rax
jl next_17
;[Irrelevant code omitted]
next_17:
;______
jmp label_401
;______
label_1602:
label_16020:
;______
mov rax,r9
cvtsi2sd xmm14,rax
;______
movq rax,xmm14
cvtsi2sd xmm0,rax
movsd [r15+r14],xmm14
add r14,8
mov rax,r13
cmp r14,rax
jl next_21
;[Irrelevant code omitted]
next_21:
jmp label_1201
;______
label_1202:
label_12020:
;______
movq xmm14,[const_0]
;______
movq rax,xmm14
cvtsi2sd xmm0,rax
movsd [r15+r14],xmm14
add r14,8
mov rax,r13
cmp r14,rax
jl next_25
;[Irrelevant code omitted]
next_25:
jmp label_401
;______
label_899:
pop rax
mov rdi,Return_Data_Array
mov [rdi+rax],r15
mov [rdi+rax+32],r14 ; 32 = four cores
call ExitThread
call GetCurrentThread
mov rcx,rax
call CloseHandle
ret
; __________
label_900:
exit_label_for_PrimeNumber_fn:
mov rdi,Return_Data_Array ; Final return to the calling process
mov rax,rdi
ret
;__________
;Free the memory
FreeMem_fn:
sub rsp,40
call free
add rsp,40
ret
; __________
; Main Entry
Main_Entry_fn:
push rdi
push rbp
push rbx
push r15
xor r15,r15
push r14
xor r14,r14
push r13
xor r13,r13
push r12
xor r12,r12
push r11
xor r11,r11
push r10
xor r10,r10
push r9
xor r9,r9
movsd [XMM_Stack+0],xmm15
movsd [XMM_Stack+8],xmm14
movsd [XMM_Stack+16],xmm13
push r8
xor r8,r8
mov [numbers_ptr],rcx
mov [data_master_ptr],rdx
; Now assign lengths
lea rdi,[data_master_ptr]
mov rbp,[rdi]
xor rcx,rcx
movsd xmm0,qword[rbp+rcx]
cvttsd2si rax,xmm0
mov [numbers_length],rax
add rcx,8
; __________
; malloc for dynamic arrays
lea rdi,[data_master_ptr]
mov rbp,[rdi]
movsd xmm0,qword[rbp]
cvttsd2si rax,xmm0
mov r8,rax
mov rdx,10
mul rdx
mov rdx,10000000
cmp rax,rdx
jl malloc_next
mov rax,r8
malloc_next:
mov rax,50000000
mov [initial_dynamic_length],rax
;__________
; Output buffer #1
mov rcx,qword[initial_dynamic_length] ; Initial size
xor rax,rax
sub rsp,40
call malloc
mov rdi,output_buffer_pointers
mov [rdi],rax
add rsp,40
mov rax,qword[initial_dynamic_length]
mov [list_of_results_length],rax
;__________
; Output buffer #2
mov rcx,qword[initial_dynamic_length] ; Initial size
xor rax,rax
sub rsp,40
call malloc
mov rdi,output_buffer_pointers
mov [rdi+8],rax
add rsp,40
mov rax,qword[initial_dynamic_length]
mov [list_of_results_length],rax
;__________
; Output buffer #3
mov rcx,qword[initial_dynamic_length] ; Initial size
xor rax,rax
sub rsp,40
call malloc
mov rdi,output_buffer_pointers
mov [rdi+16],rax
add rsp,40
mov rax,qword[initial_dynamic_length]
mov [list_of_results_length],rax
;__________
; Output buffer #4
mov rcx,qword[initial_dynamic_length] ; Initial size
xor rax,rax
sub rsp,40
call malloc
mov rdi,output_buffer_pointers
mov [rdi+24],rax
add rsp,40
mov rax,qword[initial_dynamic_length]
mov [list_of_results_length],rax
; __________
; Write variables to assigned registers
lea rdi,[rel list_of_results_ptr]
mov r15,qword[rdi]
mov r14,[list_of_results_ctr]
mov r13,[list_of_results_length]
lea rdi,[rel numbers_ptr]
mov r12,qword[rdi]
mov r11,[numbers_ctr]
mov r10,[numbers_length]
mov r9,[num]
movsd xmm15,[num_float]
movsd xmm14,[result]
movsd xmm13,[i]
mov r8,[loop_counter_401]
; __________
call Init_Cores_fn
exit_label_for_Main_Entry_fn:
pop r8
movsd xmm13,[XMM_Stack+0]
movsd xmm14,[XMM_Stack+8]
movsd xmm15,[XMM_Stack+16]
pop r9
pop r10
pop r11
pop r12
pop r13
pop r14
pop r15
pop rbx
pop rbp
pop rdi
ret
;_____________
DupThreadInfo:
mov rdi,ThreadInfo2
; StartByte and EndByte
mov rax,[Division_Start]
mov [rdi+8],rax
add rax,[Division_Size]
mov [Division_Start],rax
sub rax,8
mov [rdi+56],rax
mov rax,[stride]
mov [rdi+16],rax ; 8 x number of cores (32 in this example)
; Vars (these registers are populated on main entry)
mov rbp,output_buffer_pointers
mov rax,[rbp+8]
mov [rdi+24],rax
mov rax,8
mov [rdi+32],rax
mov [rdi+40],r13
mov [rdi+48],r12
; 56 is above
mov rax,8
mov [rdi+64],rax
mov rax,8 ; Thread number based on startbyte, for Return_Data_Array
mov [rdi+72],rax
; _____
mov rdi,ThreadInfo3
; StartByte and EndByte
mov rax,[Division_Start]
mov [rdi+8],rax
mov rbx,[Division_Size]
add rax,rbx
mov [Division_Start],rax
sub rax,8
mov [rdi+56],rax
mov rax,[stride]
mov [rdi+16],rax ; 8 x number of cores (32 in this example)
; Vars (these registers are populated on main entry)
mov rbp,output_buffer_pointers
mov rax,[rbp+16]
mov [rdi+24],rax
mov rax,16
mov [rdi+32],rax
mov [rdi+40],r13
mov [rdi+48],r12
; [rdi+56] is above
mov rax,16
mov [rdi+64],rax
mov rax,16 ; Thread number based on startbyte, for Return_Data_Array
mov [rdi+72],rax
mov rdi,ThreadInfo4
; StartByte and EndByte
mov rax,[Division_Start]
mov [rdi+8],rax
mov rax,[numbers_length] ; final segment goes to the end
mov [rdi+56],rax
mov rax,[stride]
mov [rdi+16],rax ; 8 x number of cores (32 in this example)
; Vars (these registers are populated on main entry)
mov rbp,output_buffer_pointers
mov rax,[rbp+24]
mov [rdi+24],rax
mov rax,24
mov [rdi+32],rax
mov [rdi+40],r13
mov [rdi+48],r12
; [rdi+56] is above
mov rax,24
mov [rdi+64],rax
mov rax,24 ; Thread number based on startbyte, for Return_Data_Array
mov [rdi+72],rax
ret

How to compare the count of command line arguments correctly in NASM?

I am learning x86_64 NASM assembly on Ubuntu 16.10 on Docker for Mac.
The following program takes two command line arguments, and sum these.
If number of command line arguments is not two, print error message (jump to argcError).
When I exec this program, it jump to argcError section despite passed to two command line arguments.
Why this program jump to argError?
section .data
SYS_WRITE equ 1
STD_IN equ 1
SYS_EXIT equ 60
EXIT_CODE equ 0
NEW_LINE db 0xa
WRONG_ARGC db "Must be two command line arguments", 0xa
section .text
global _start
_start:
pop rcx
cmp rcx, 3
jne argcError
add rsp, 8
pop rsi
call str_to_int
mov r10, rax
pop rsi
call str_to_int
mov r11, rax
add r10, r11
argcError:
mov rax, 1
mov rdi, 1
mov rsi, WRONG_ARGC
mov rdx, 35
syscall
jmp exit
str_to_int:
xor rax, rax
mov rcx, 10
next:
cmp [rsi], byte 0
je return_str
mov bl, [rsi]
sub bl, 48
mul rcx ; rax = rax * rcx
add rax, rbx
inc rsi
jmp next
return_str:
ret
int_to_str:
mov rdx, 0
mov rbx, 10
div rbx
add rdx, 48
add rdx, 0x0
push rdx
inc r12
cmp rax, 0x0
jne int_to_str
jmp print
print:
; calculate byte length of number string
mov rax, 1
mul r12
mov r12, 8
mul r12
mov rdx, rax
; print sum
mov rax, SYS_WRITE
mov rdi, STD_IN
mov rsi, rsp
syscall
jmp printNewline
printNewline:
mov rax, SYS_WRITE
mov rdi, STD_IN
mov rsi, NEW_LINE
mov rdx, 1
syscall
jmp exit
exit:
mov rax, SYS_EXIT
mov rdi, EXIT_CODE
syscall
There probably other errors in your code as pointed out by Micheal Petch, but the way you've initialized RSI is incorrect. Yes, ESP does point to the number of arguments passed, but popping it off the stack and then adding 8 to ESP again is functionally equivalent too.
mov rcx, [rsp]
Then by popping into RSI it only becomes a copy of RCX. If you want to do that it should look like this
pop rcx
.......
add rsp, 24 ; Now RSP is pointing to proper place in array of pointers
pop rsi
add rsp, 16 ; Now point to pointer to second argument
pop rsi
An alternative would be this next example only because my personal preference is not to use stack pointer for other than that which it was intended.
mov rsi, rsp
lodsq ; Read # of arguments passed by OS
add rsi, 8 ; bounce over application name
cmp al, 3
jnz argError
push rsi
lodsq
mov rsi, rax ; RSI points to first agument
call Convert
pop rsi
lodsq
mov rsi, rax
call Convert

write number to file using NASM

How do I write a variable to a file using NASM?
For example, if I execute some mathematical operation - how do I write the result of the operation to write a file?
My file results have remained empty.
My code:
%include "io.inc"
section .bss
result db 2
section .data
filename db "Downloads/output.txt", 0
section .text
global CMAIN
CMAIN:
mov eax,5
add eax,17
mov [result],eax
PRINT_DEC 2,[result]
jmp write
write:
mov EAX, 8
mov EBX, filename
mov ECX, 0700
int 0x80
mov EBX, EAX
mov EAX, 4
mov ECX, [result]
int 0x80
mov EAX, 6
int 0x80
mov eax, 1
int 0x80
jmp exit
exit:
xor eax, eax
ret
You have to implement ito (integer to ascii) subsequently len for this manner. This code tested and works properly in Ubuntu.
section .bss
answer resb 64
section .data
filename db "./output.txt", 0
section .text
global main
main:
mov eax,5
add eax,44412
push eax ; Push the new calculated number onto the stack
call itoa
mov EAX, 8
mov EBX, filename
mov ECX, 0x0700
int 0x80
push answer
call len
mov EBX, EAX
mov EAX, 4
mov ECX, answer
movzx EDX, di ; move with extended zero edi. length of the string
int 0x80
mov EAX, 6
int 0x80
mov eax, 1
int 0x80
jmp exit
exit:
xor eax, eax
ret
itoa:
; Recursive function. This is going to convert the integer to the character.
push ebp ; Setup a new stack frame
mov ebp, esp
push eax ; Save the registers
push ebx
push ecx
push edx
mov eax, [ebp + 8] ; eax is going to contain the integer
mov ebx, dword 10 ; This is our "stop" value as well as our value to divide with
mov ecx, answer ; Put a pointer to answer into ecx
push ebx ; Push ebx on the field for our "stop" value
itoa_loop:
cmp eax, ebx ; Compare eax, and ebx
jl itoa_unroll ; Jump if eax is less than ebx (which is 10)
xor edx, edx ; Clear edx
div ebx ; Divide by ebx (10)
push edx ; Push the remainder onto the stack
jmp itoa_loop ; Jump back to the top of the loop
itoa_unroll:
add al, 0x30 ; Add 0x30 to the bottom part of eax to make it an ASCII char
mov [ecx], byte al ; Move the ASCII char into the memory references by ecx
inc ecx ; Increment ecx
pop eax ; Pop the next variable from the stack
cmp eax, ebx ; Compare if eax is ebx
jne itoa_unroll ; If they are not equal, we jump back to the unroll loop
; else we are done, and we execute the next few commands
mov [ecx], byte 0xa ; Add a newline character to the end of the character array
inc ecx ; Increment ecx
mov [ecx], byte 0 ; Add a null byte to ecx, so that when we pass it to our
; len function it will properly give us a length
pop edx ; Restore registers
pop ecx
pop ebx
pop eax
mov esp, ebp
pop ebp
ret
len:
; Returns the length of a string. The string has to be null terminated. Otherwise this function
; will fail miserably.
; Upon return. edi will contain the length of the string.
push ebp ; Save the previous stack pointer. We restore it on return
mov ebp, esp ; We setup a new stack frame
push eax ; Save registers we are going to use. edi returns the length of the string
push ecx
mov ecx, [ebp + 8] ; Move the pointer to eax; we want an offset of one, to jump over the return address
mov edi, 0 ; Set the counter to 0. We are going to increment this each loop
len_loop: ; Just a quick label to jump to
movzx eax, byte [ecx + edi] ; Move the character to eax.
movsx eax, al ; Move al to eax. al is part of eax.
inc di ; Increase di.
cmp eax, 0 ; Compare eax to 0.
jnz len_loop ; If it is not zero, we jump back to len_loop and repeat.
dec di ; Remove one from the count
pop ecx ; Restore registers
pop eax
mov esp, ebp ; Set esp back to what ebp used to be.
pop ebp ; Restore the stack frame
ret ; Return to caller

Displaying 64-bit register in ASM

This is my first attempt in 64-bit assembly under Linux. I am using FASM.
I am converting a 64-bit register hex value to string. It is working fine until it reaches the final digit. I can't figure out exactly what's wrong with my code. Maybe there is something about 64-programming that I don't know or with the syscall (I am a linux noob as well)
format ELF64 executable 3
entry start
segment readable executable
start:
mov rax,3c5677h ;final '7' is not displayed
push rax
call REG
;call line
xor edi,edi ;exit
mov eax,60
syscall
;----------------------------------
REG:push rbp ;stack frame setup
mov rbp,rsp
sub rsp,8 ;space for local char
mov rax,[rbp+16];arg
lea r9,[rsp-8] ;local char
mov rcx,16 ;divisor
mov rsi,16 ;16 hex digits for register
.begin: ;get the digit
xor rdx,rdx ;by division
div rcx ;of 16
push rdx ;from back to front
dec rsi
test rsi,rsi
jz .disp
jmp .begin
.disp: ;convert and display digit
inc rsi
pop rax ;In reverse order
add rax,30h ;convert digit to string
cmp rax,39h ;if alpha
jbe .normal
add rax,7 ;add 7
.normal:
mov [r9],rax ;copy the value
push rsi ;save RSI for syscall
mov rsi,r9 ;address of char
mov edx,1 ;size
mov edi,1 ;stdout
mov eax,1 ;sys_write
syscall
pop rsi ;restore RSI for index
cmp rsi,16
je .done
jmp .disp
.done:
add rsp,8 ;stack balancing
pop rbp
ret
Thanks in advance for your help.
I believe the problem printing the last digit comes from how you load r9. If on entry, rsp was 100. You subtract 8 (rsp = 92), then load r9 with rsp - 8 (r9 = 84). Presumably you meant r9 to 100, so try changing that to:
lea r9, [rsp+8]
For a more efficient solution, how about something more like this (assumes value in rbx):
mov r9, 16 ; How many digits to print
mov rsi, rsp ; memory to write digits to
sub rsp, 8 ; protect our stack
mov edx, 1 ; size is always 1
mov edi, 1 ; stdout is always 1
.disp:
rol rbx, 4 ; Get the next nibble
mov cl, bl ; copy it to scratch
and cl, 15 ; mask out extra bits
add cl, 0x30 ; Convert to char
cmp cl, 0x39 ; if alpha
jbe .normal
add cl, 7 ; Adjust for letters
.normal:
mov [rsi], cl ; copy the value
mov eax, 1 ; sys_write
syscall ; overwrites rcx, rax, r11
dec r9 ; Finished a digit
jnz .disp ; Are we done?
add rsp, 8 ; Done with the memory

Printing a number in x86-64 assembly

Okay, all I am trying to do is print a number (up to 18446744073709551616) in x86-64 assembly for Linux. Can anyone please tell me why this program will not work? All that happens is that it runs and exits. Thank you for all the help you can give!
GLOBAL _start
SECTION .text
;PRINTCHAR
; MOV [LETTER],RAX
;
; MOV RAX,1
; MOV RDI,1
; MOV RSI,LETTER
; MOV RDX,1
; SYSCALL
; RET
PRINTDEC:
MOV R9,18 ;SO IT CAN POINT TO THE END OF THE BUFFER
MOV R10,0
START:
MOV R8,NUMBER
MOV RDX,0 ;CLEAR OUT RDX TO AVOID ERRORS
MOV RBX,10 ;WHAT TO DIVIDE BY
DIV RBX ;DIVIDE OUR NUMBER BY TEN
CMP RAX,0 ;IF OUR QUOTENT IS ZERO THEN WE ARE DONE, PRINT THE BUFFER
JE END
JMP ADDBUF
ADDBUF:
ADD R8,R9 ;MOV TO THE CURRENT LOCATION IN OUR BUFFER
ADD RDX,0x30
; ADD R8,R10
MOV [R8],RDX ;MOV THE LAST NUMBER IN OUR BUFFER TO RDX
DEC R9
INC R10
JMP START
END:
ADD R8,R9 ;add the very last digit
MOV [R8],RDX
INC R10
MOV RAX,1
MOV RDI,1
MOV RSI,R8
MOV RDX,R10
SYSCALL
RET
_start:
MOV RAX,55
CALL PRINTDEC
MOV RAX,60
MOV RDI,0
SYSCALL
SECTION .bss
LETTER: RESB 1
NUMBER: RESB 19
PRINTDEC:
LEA R9, [NUMBER + 18] ; last character of buffer
MOV R10, R9 ; copy the last character address
MOV RBX, 10 ; base10 divisor
DIV_BY_10:
XOR RDX, RDX ; zero rdx for div
DIV RBX ; rax:rdx = rax / rbx
ADD RDX, 0x30 ; convert binary digit to ascii
TEST RAX,RAX ; if rax == 0 exit DIV_BY_10
JZ LAST_REMAINDER
MOV byte [R9], DL ; save remainder
SUB R9, 1 ; decrement the buffer address
JMP DIV_BY_10
LAST_REMAINDER:
TEST DL, DL ; if DL (last remainder) != 0 add it to the buffer
JZ CHECK_BUFFER
MOV byte [R9], DL ; save remainder
SUB R9, 1 ; decrement the buffer address
CHECK_BUFFER:
CMP R9, R10 ; if the buffer has data print it
JNE PRINT_BUFFER
MOV byte [R9], '0' ; place the default zero into the empty buffer
SUB R9, 1
PRINT_BUFFER:
ADD R9, 1 ; address of last digit saved to buffer
SUB R10, R9 ; end address minus start address
ADD R10, 1 ; R10 = length of number
MOV RAX, 1 ; NR_write
MOV RDI, 1 ; stdout
MOV RSI, R9 ; number buffer address
MOV RDX, R10 ; string length
SYSCALL
RET

Resources