For context I'm using NASM on a 64 bit Debian distro.
I'm still learning Assembly as part of writing my own programming language but I recently ran into a problem that I'm not sure how to handle. The following is a snippet of code that my compiler spits out:
section .text
global _start
section .var_1 write
char_1 db 'Q', 0
section .var_2 write
string_1 db 'Asdf', 0
section .var_3 write
char_2 db 'W', 0
section .text
_start:
push 4 ; String length onto stack
push string_1
;; Push a raw char onto the stack
mov bl, [char_1]
push bx
pop ax
pop rbx
pop rcx
mov byte [rbx+rcx], al
If I then print out the value of string_1, I see AsdfWQ. As I understand it, this is because of the mov command I am using to append combined with the fact that I have some data declared after the string's termination character. I've been trying to search around on Google with no luck about how to resolve this problem (partially because I don't know exactly what to search for). Conceptually I would think I could move the address of everything after string_1 by the offset of the length of what I'm appending but this seems highly inefficient if I had something like 40 different pieces of data after that. So what I'm trying to sort out is, how do I manage dynamic data that could increase or decrease in size in assembly?
Edit
Courtesy of fuz pointing out that dynamic memory allocation via the brk calls works, I've revised the program a little but am still experiencing some issues:
section .var_1 write
hello_string db '', 0
section .var_2 write
again_string db 'Again!', 0
section .text
_start:
;; Get current break address
mov rdi, 0
mov rax, 12
syscall
;; Attempt to allocate 8 bytes for string
mov rdi, rax
add rdi, 8
mov rax, 12
syscall
;; Set the memory address to some label
mov qword [hello_string], rax
;; Try declaring a string
mov byte [hello_string], 'H'
mov byte [hello_string+1], 'e'
mov byte [hello_string+2], 'l'
mov byte [hello_string+3], 'l'
mov byte [hello_string+4], 'o'
mov byte [hello_string+5], ','
mov byte [hello_string+6], ' '
mov byte [hello_string+7], 0
;; Print the string
mov rsi, hello_string
mov rax, 1
mov rdx, 8
mov rdi, 1
syscall
;; Print the other string
mov rsi, again_string
mov rax, 1
mov rdx, 5
mov rdi, 1
syscall
This results in Hello, ello, which means that I'm still overwriting data associated with the again_string label? But I was under the impression that using brk to allocate would do so after the data had been initialized?
Edit: This is similar to this: Reset a string variable to print multiple user inputs in a loop (NASM Assembly). But it is not the same issue.
From the other post, I was able to prevent additional characters from being printed. However, I still cannot prevent those additional characters from being read when the program goes back to the point in which it asks the user for input.
I'm creating a program that asks a user for input, and then prints it. Afterwards, it asks the user to enter 'y' if they want to print another text, or press anything else to close the program.
My issue is that if the user enters more characters than expected, those extra characters don't go away, and when the program goes back to ask the user for input, there's no chance to enter input because the program takes the remaining characters from the last time it received input.
For example:
The user is asked to enter text to print, and they enter: "Heyyyyyyyyyyyyyyyyyyyyyyyyyyyyy"
Leftover is "yyy"
At this point, the program should ask the user to enter 'y' to repeat the process, or anything else to close the program.
Output:
Heyyyyyyyyyyyyyyyyyyyyyyyyyy
Wanna try again? If yes, enter y. If not, enter anything else to close the program
Enter your text:
Output: yy
Wanna try again? If yes, enter y. If not, enter anything else to close the program.
And only now it asks for user input again
Since "yyy" is still in buffer, the user doesn't get a chance to actually enter input in this case.
What can I do to fix this issue? I've just started to learn about this recently, so I'd appreciate any help.
Thanks.
This is what I've done
prompt db "Type your text here.", 0h
retry db "Wanna try again? If yes, enter y. If not, enter anything else to close the program"
section .bss
text resb 50
choice resb 2
section .text
global _start
_start:
mov rax, 1 ;Just asking the user to enter input
mov rdi, 1
mov rsi, prompt
mov rdx, 21
syscall
mov rax, 0 ;Getting input and saving it on var text
mov rdi, 0
mov rsi, text
mov rdx, 50
syscall
mov r8, rax ;This is what I added to prevent additional characters
;from being printed
mov rax, 1 ;Printing the user input
mov rdi, 1
mov rsi, text
mov rdx, r8
syscall
mov rax, 1 ;Asking if user wants to try again
mov rdi, 1
mov rsi, retry
mov rdx, 83
syscall
mov rax, 0 ;Getting input and saving it on var choice
mov rdi, 0
mov rsi, choice
mov rdx, 2
syscall
mov r8b, [choice] ;If choice is different from y, go to end and close the program. Otherwise, go back to start.
cmp byte r8b, 'y'
jne end
jmp _start
end:
mov rax, 60
mov rdi, 0
syscall
The simple way to clear stdin is to check whether the 2nd char in choice is the '\n' (0xa). If it isn't, then characters remain unread in stdin. You already know how to read from stdin, so in that case just keep reading stdin until the '\n' is read (see footnote 1), e.g.
mov rax, 0 ;Getting input and saving it on var choice
mov rdi, 0
mov rsi, choice
mov rdx, 2
syscall
cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
jne end
cmp byte [choice + 1], 0xa ; is 2nd char '\n' (if yes done, jump start)
je _start
empty: ; chars remain in stdin unread
mov rax, 0 ; read 1-char from stdin into choice
mov rdi, 0
mov rsi, choice
mov rdx, 1
syscall
cmp byte [choice], 0xa ; check if char '\n'?
jne empty ; if not, repeat
jmp _start
Beyond that, you should determine your prompt lengths when you declare them, e.g.
prompt db "Type your text here. ", 0h
plen equ $-prompt
retry db "Try again (y/n)? ", 0h
rlen equ $-retry
That way you do not have to hardcode lengths in case you change your prompts, e.g.
_start:
mov rax, 1 ;Just asking the user to enter input
mov rdi, 1
mov rsi, prompt
mov rdx, plen
syscall
mov rax, 0 ;Getting input and saving it on var text
mov rdi, 0
mov rsi, text
mov rdx, 50
syscall
mov r8, rax
mov rax, 1 ;Printing the user input
mov rdi, 1
mov rsi, text
mov rdx, r8
syscall
mov rax, 1 ;Asking if user wants to try again
mov rdi, 1
mov rsi, retry
mov rdx, rlen
syscall
If you put it all together, you can do:
section .data
prompt db "Type your text here. ", 0h
plen equ $-prompt
retry db "Try again (y/n)? ", 0h
rlen equ $-retry
section .bss
text resb 50
choice resb 2
section .text
global _start
_start:
mov rax, 1 ;Just asking the user to enter input
mov rdi, 1
mov rsi, prompt
mov rdx, plen
syscall
mov rax, 0 ;Getting input and saving it on var text
mov rdi, 0
mov rsi, text
mov rdx, 50
syscall
mov r8, rax
mov rax, 1 ;Printing the user input
mov rdi, 1
mov rsi, text
mov rdx, r8
syscall
mov rax, 1 ;Asking if user wants to try again
mov rdi, 1
mov rsi, retry
mov rdx, rlen
syscall
mov rax, 0 ;Getting input and saving it on var choice
mov rdi, 0
mov rsi, choice
mov rdx, 2
syscall
cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
jne end
cmp byte [choice + 1], 0xa ; is 2nd char '\n' (if yes done, jump start)
je _start
empty: ; chars remain in stdin unread
mov rax, 0 ; read 1-char from stdin into choice
mov rdi, 0
mov rsi, choice
mov rdx, 1
syscall
cmp byte [choice], 0xa ; check if char '\n'?
jne empty ; if not, repeat
jmp _start
end:
mov rax, 60
mov rdi, 0
syscall
Example Use/Output
$ ./bin/emptystdin
Type your text here. abc
abc
Try again (y/n)? y
Type your text here. def
def
Try again (y/n)? yes please!
Type your text here. geh
geh
Try again (y/n)? yyyyyyyyyyyyyeeeeeeeeeesssssssss!!!!
Type your text here. ijk
ijk
Try again (y/n)? n
Now even a cat stepping on the keyboard at your (y/n)? prompt won't cause problems. There are probably more elegant ways to handle this that would be more efficient than repetitive one-byte reads with syscall, but this will handle the issue.
Additional Considerations And Error-Checks
As mentioned above, reading and checking a character at a time isn't a very efficient approach, though it is conceptually the easiest extension without making other changes. @PeterCordes makes a number of good points in the comments below about approaches that are more efficient and, more importantly, about error conditions that can arise and should be protected against as well.
For starters, when you are looking for information on how to use an individual system call, Anatomy of a system call, part 1 provides a bit of background on approaching their use. Supplement it with the Linux manual pages; for read, see man 2 read for details on the parameter types, the return type, and the return values.
The original solution above does not address what happens if the user generates a manual end-of-file by pressing Ctrl+d or if a read error actually occurs. It simply addressed the user-input and emptying-stdin question asked. With any user input (not just the yes/no input, but all inputs), you must validate that the input succeeded by checking the return before you use the value. For purposes here, you can consider zero input (manual end-of-file) or a negative return (read error) as a failed input.
To check whether you have at least one valid character of input, you can simply check the return (read returns the number of characters read, and sys_read places that value in rax after the syscall). A zero or negative value indicates that no input was received. A check could be:
cmp rax, 0 ; check for 0 bytes read or error
jle error
You can write a short diagnostic to the user and then handle the error as wanted. This example simply exits after outputting a diagnostic, e.g.
readerr db 0xa, "eof or read error", 0xa, 0x0
rderrsz equ $-readerr
...
; your call to read here
cmp rax, 0 ; check for 0 bytes read or error
jle error
...
error:
mov rax, 1 ; output the readerr string and jmp to end
mov rdi, 1
mov rsi, readerr
mov rdx, rderrsz
syscall
jmp end
Now moving on to a more efficient manner of emptying stdin. The biggest hindrance indicated in the original answer was the repeated system calls to sys_read to read one character at a time, reusing your 2-byte choice buffer. The obvious solution is to make choice bigger, or just use stack space to read more characters each time (you can look at the comments for a couple of approaches). Here, for example, we will increase choice to 128 bytes, which in the case of the anticipated "y\n" input will only make use of two of those bytes, but in the case of an excessively long input will read 128 bytes at a time until the '\n' is found. For setup you have:
choicesz equ 128
...
section .bss
text resb 50
choice resb 128
Now after you ask for (y/n)? your read would be:
mov rax, 0 ; Getting input and saving it on var choice
mov rdi, 0
mov rsi, choice
mov rdx, choicesz
syscall
cmp rax, 0 ; check for 0 bytes read (eof) or error
jle error
cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
jne end ; not a 'y', doesn't matter what's in stdin, end
Now there are two conditions to check. First, compare the number of characters read with your buffer size choicesz; if the number of characters read is less than choicesz, no characters remain unread in stdin. Second, if the return equals the buffer size, you may or may not have characters remaining in stdin. You need to check the last character in the buffer to see if it is the '\n' to determine whether you have read all the input. If the last character is anything other than the '\n', characters remain unread (unless the user just happened to generate a manual end-of-file at the 128th character). You can check as:
empty:
cmp eax, choicesz ; compare chars read and buffer size
jb _start ; buffer not full - nothing remains in stdin
cmp byte [choice + choicesz - 1], 0xa ; if full - check if last byte \n, done
je _start
mov rax, 0 ; fill choice again from stdin and repeat checks
mov rdi, 0
mov rsi, choice
mov rdx, choicesz
syscall
cmp rax, 0 ; check for 0 bytes read or error
jle error
jmp empty
(note: as noted above, there is a further case, not covered here, where the user enters valid input but then generates a manual end-of-file instead of just pressing Enter after the 128th character (or a multiple of 128). There you can't just look for a '\n', because it doesn't exist, and if there are no more characters and you call sys_read again, it will block waiting on input. Conceivably you will need to use a non-blocking read and putback of a single character to break that ambiguity -- that is left to you)
A complete example with the improvements would be:
section .data
prompt db "Type your text here. ", 0x0
plen equ $-prompt
retry db "Try again (y/n)? ", 0x0
rlen equ $-retry
textsz equ 50
choicesz equ 128
readerr db 0xa, "eof or read error", 0xa, 0x0
rderrsz equ $-readerr
section .bss
text resb 50
choice resb 128
section .text
global _start
_start:
mov rax, 1 ; Just asking the user to enter input
mov rdi, 1
mov rsi, prompt
mov rdx, plen
syscall
mov rax, 0 ; Getting input and saving it on var text
mov rdi, 0
mov rsi, text
mov rdx, textsz
syscall
cmp rax, 0 ; check for 0 bytes read or error
jle error
mov r8, rax
mov rax, 1 ; Printing the user input
mov rdi, 1
mov rsi, text
mov rdx, r8
syscall
mov rax, 1 ; Asking if user wants to try again
mov rdi, 1
mov rsi, retry
mov rdx, rlen
syscall
mov rax, 0 ; Getting input and saving it on var choice
mov rdi, 0
mov rsi, choice
mov rdx, choicesz
syscall
cmp rax, 0 ; check for 0 bytes read (eof) or error
jle error
cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
jne end ; not a 'y', doesn't matter what's in stdin, end
empty:
cmp eax, choicesz ; compare chars read and buffer size
jb _start ; buffer not full - nothing remains in stdin
cmp byte [choice + choicesz - 1], 0xa ; if full - check if last byte \n, done
je _start
mov rax, 0 ; fill choice again from stdin and repeat checks
mov rdi, 0
mov rsi, choice
mov rdx, choicesz
syscall
cmp rax, 0 ; check for 0 bytes read or error
jle error
jmp empty
end:
mov rax, 60
mov rdi, 0
syscall
error:
mov rax, 1 ; output the readerr string and jmp to end
mov rdi, 1
mov rsi, readerr
mov rdx, rderrsz
syscall
jmp end
There are surely more efficient ways to optimize this, but for purposes of discussing "How do I empty stdin?", this second approach, which uses a larger buffer to alleviate the repetitive calls to sys_read to read one character at a time, is a good step forward. "How do I completely optimize the check?" is a whole separate question.
Let me know if you have further questions.
Footnotes:
1. In this circumstance, where the user is typing input, the user generates a '\n' by pressing Enter, allowing you to check for the '\n' as the final character when emptying stdin. The user can also generate a manual end-of-file by pressing Ctrl+d, so the '\n' isn't guaranteed. There are still many other ways stdin can be filled, such as redirecting a file as input; there should be an ending '\n' for the file to be POSIX compliant, but there too it isn't a guarantee.
I have written assembly code to print the numbers from 1 to 9, but it only prints 1; no other number is printed and only one output is received. It means that the loop is also not being run. I can't figure out what is wrong with my code.
section .bss
lena equ 1024
outbuff resb lena
section .data
section .text
global _start
_start:
nop
mov cx,0
incre:
inc cx
add cx,30h
mov [outbuff],cx
cmp cx,39h
jg done
cmp cx,39h
jl print
print:
mov rax,1 ;sys_write
mov rdi,1
mov rsi,outbuff
mov rdx,lena
syscall
jmp incre
done:
mov rax,60 ;sys_exit
mov rdi,0
syscall
My OS is 64-bit Linux. This code is built using nasm with the following commands: nasm -f elf64 -g -o num.o num.asm and ld -o num num.o
Answer rewritten after some experimentation.
There are two errors in your code, and a few inefficiencies.
First, you add 0x30 to the number (to turn it from the number 1 into the ASCII character '1'). However, you do that addition inside the loop. As a result, on your first iteration cx is 0x31, on the second 0x62 ("b"), on the third 0x93 (an invalid UTF-8 sequence), etc.
Just initialize cx to 0x30 and remove the add from inside the loop.
But there's another problem. RCX is clobbered during system calls: the syscall instruction saves the return address in RCX (and RFLAGS in R11), so the counter in cx is lost after every write. Replacing cx with r12, which is preserved across the call, causes the program to work.
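If you would rather keep the counter in rcx, another option is to save it around the syscall. This is only a minimal sketch, assuming a one-byte buffer and the same sys_write and sys_exit numbers used in your code; the label next is made up for illustration:

```nasm
section .bss
outbuff resb 1              ; one byte is enough for a single digit

section .text
global _start
_start:
    mov cl, '1'             ; start directly at ASCII '1'
next:
    mov [outbuff], cl       ; store the current digit
    push rcx                ; syscall overwrites rcx (and r11), so save the counter
    mov rax, 1              ; sys_write
    mov rdi, 1              ; stdout
    mov rsi, outbuff
    mov rdx, 1              ; write exactly one byte
    syscall
    pop rcx                 ; restore the counter
    inc cl
    cmp cl, '9'
    jbe next                ; repeat while the digit is <= '9'

    mov rax, 60             ; sys_exit
    xor edi, edi            ; exit status 0
    syscall
```

Using a register the kernel preserves, as in the listings here, avoids the extra push and pop on every iteration.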
In addition to that, you pass the buffer's length to write, but it only has one character. The program so far:
section .bss
lena equ 1024
outbuff resb lena
section .data
section .text
global _start
_start:
nop
mov r12,30h
incre:
inc r12
mov [outbuff],r12
cmp r12,39h
jg done
cmp r12,39h
jl print
print:
mov rax,1 ;sys_write
mov rdi,1
mov rsi,outbuff
mov rdx,1
syscall
jmp incre
done:
mov rax,60 ;sys_exit
mov rdi,0
syscall
Even now, the code is needlessly inefficient. You have two compares on the same condition, and one of them branches to the very next instruction.
Also, your code would be shorter and a little faster if you moved the loop's exit condition to the end of the loop. Also, cx is a 16-bit register and r12 is a 64-bit register, but we actually only need 8 bits. Using larger registers than needed can make the instructions and their immediates take up more space in memory and the cache. We therefore switch to the 8-bit variant of r12. After these changes, we get:
section .bss
lena equ 1024
outbuff resb lena
section .data
section .text
global _start
_start:
nop
mov r12b,30h
incre:
inc r12b
mov [outbuff],r12b
mov rax,1 ;sys_write
mov rdi,1
mov rsi,outbuff
mov rdx,1
syscall
cmp r12b,39h
jl incre
mov rax,60 ;sys_exit
mov rdi,0
syscall
There's still lots more you can do. For example, you call the write system call 9 times, instead of filling the buffer and then calling it once (despite the fact that you've allocated a 1024-byte buffer). It will probably also be faster to initialize r12 with zero (xor r12, r12) and then add 0x30 (not relevant for the 8-bit version of the register).
I have a 64-bit NASM assembly assignment to capitalize the letters of an input text (all letters should be lowercase, except those at the beginning of a sentence). I'm totally new to assembler and I can't find anywhere how I should read each char from the string incrementally, when I read the text like this:
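The "fill the buffer, then write once" idea could look something like the following. It is a sketch rather than a tuned implementation; it assumes a 9-byte buffer, and the label fill is a made-up name:

```nasm
section .bss
outbuff resb 9              ; room for the digits '1'..'9'

section .text
global _start
_start:
    mov rdi, outbuff        ; rdi = next free position in the buffer
    mov al, '1'             ; first digit to store
fill:
    mov [rdi], al           ; append the digit
    inc rdi
    inc al
    cmp al, '9'
    jbe fill                ; keep going until '9' has been stored

    mov rax, 1              ; sys_write
    mov rdi, 1              ; stdout
    mov rsi, outbuff
    mov rdx, 9              ; all nine digits in a single call
    syscall

    mov rax, 60             ; sys_exit
    xor edi, edi            ; exit status 0
    syscall
```

Because no system call happens inside the loop, nothing is clobbered there, so the loop can use any scratch registers it likes.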
section .data
prompt db "Enter your text: ", 10
length equ $ - prompt
text times 255 db 0
textsize equ $ - text
section .text
global main
main:
mov rax, 1
mov rdi, 1
mov rsi, prompt
mov rdx, length
syscall ;print prompt
mov rax, 0
mov rdi, 0
mov rsi, text
mov rdx, textsize
syscall ;read text input from keyboard
exit:
mov rax, 60
mov rdi, 0
syscall
Also, I'm not sure how to find out when the text is over, so I could know when I have to exit the program. Should I do some operations with the text size, or is there some kind of special symbol which shows the EOL? Thank you for your answers.
After returning from sys_read (syscall with rax=0), the RAX register contains the number of characters actually read. Notice that in Linux, when reading from a terminal, sys_read returns as soon as \n is entered, even if there is more room in the buffer provided.
Then organize a loop from 0 to RAX and process each character the way you want:
test rax, rax ; 0 means end-of-file, negative means error
jle exit ; nothing to process in either case
mov byte [text+rax], 0 ; zero-terminate for future use (NASM syntax has no "ptr");
; this assumes fewer than textsize bytes were read
mov rcx, rax ; rcx will be the character counter.
mov rsi, text ; a pointer to the current character. Start from the beginning.
process_loop:
mov al, [rsi] ; load the current character
; here process al, according to your needs...
; .....
inc rsi
dec rcx
jnz process_loop
The above code can be optimized of course, for example to use string instructions or loop instructions, but IMO, this way is better for a beginner.
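To give an idea of what "process al" might look like, here is a simplified sketch that upper-cases every ASCII lowercase letter in place and prints the result. It does not implement the sentence rule from the assignment. It assumes a plain _start entry point linked with ld (rather than main), and the labels upper_loop and next_char are made up for illustration:

```nasm
textsize equ 255

section .bss
text resb textsize

section .text
global _start
_start:
    mov rax, 0              ; sys_read
    mov rdi, 0              ; stdin
    mov rsi, text
    mov rdx, textsize
    syscall
    test rax, rax
    jle exit                ; 0 = end-of-file, negative = error
    mov r8, rax             ; keep the byte count for the write below
    mov rcx, rax            ; character counter
    mov rsi, text           ; pointer to the current character
upper_loop:
    mov al, [rsi]
    cmp al, 'a'
    jb next_char            ; below 'a': not a lowercase letter
    cmp al, 'z'
    ja next_char            ; above 'z': not a lowercase letter
    sub al, 'a' - 'A'       ; lowercase and uppercase ASCII differ by 32
    mov [rsi], al           ; store the converted character back
next_char:
    inc rsi
    dec rcx
    jnz upper_loop

    mov rax, 1              ; sys_write
    mov rdi, 1              ; stdout
    mov rsi, text
    mov rdx, r8             ; write exactly the bytes that were read
    syscall
exit:
    mov rax, 60             ; sys_exit
    xor edi, edi            ; exit status 0
    syscall
```

For the actual assignment you would additionally keep a flag saying whether the next letter starts a sentence, and choose between upper-casing and lower-casing based on it.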