NASM append to string with memory immediately after - string

For context I'm using NASM on a 64 bit Debian distro.
I'm still learning Assembly as part of writing my own programming language but I recently ran into a problem that I'm not sure how to handle. The following is a snippet of code that my compiler spits out:
section .text
global _start
section .var_1 write
char_1 db 'Q', 0
section .var_2 write
string_1 db 'Asdf', 0
section .var_3 write
char_2 db 'W', 0
section .text
_start:
push 4 ; String length onto stack
push string_1
;; Push a raw char onto the stack
mov bl, [char_1]
push bx
pop ax
pop rbx
pop rcx
mov byte [rbx+rcx], al
If I then print out the value of string_1, I see AsdfWQ. As I understand it, this is because of the mov command I am using to append combined with the fact that I have some data declared after the string's termination character. I've been trying to search around on Google with no luck about how to resolve this problem (partially because I don't know exactly what to search for). Conceptually I would think I could move the address of everything after string_1 by the offset of the length of what I'm appending but this seems highly inefficient if I had something like 40 different pieces of data after that. So what I'm trying to sort out is, how do I manage dynamic data that could increase or decrease in size in assembly?
Edit
Courtesy of fuz pointing out that dynamic memory allocation via the brk calls works, I've revised the program a little but am still experience come issues:
section .var_1 write
hello_string db '', 0
section .var_2 write
again_string db 'Again!', 0
section .text
_start:
;; Get current break address
mov rdi, 0
mov rax, 12
syscall
;; Attempt to allocate 8 bytes for string
mov rdi, rax
add rdi, 8
mov rax, 12
syscall
;; Set the memory address to some label
mov qword [hello_string], rax
;; Try declaring a string
mov byte [hello_string], 'H'
mov byte [hello_string+1], 'e'
mov byte [hello_string+2], 'l'
mov byte [hello_string+3], 'l'
mov byte [hello_string+4], 'o'
mov byte [hello_string+5], ','
mov byte [hello_string+6], ' '
mov byte [hello_string+7], 0
;; Print the string
mov rsi, hello_string
mov rax, 1
mov rdx, 8
mov rdi, 1
syscall
;; Print the other string
mov rsi, again_string
mov rax, 1
mov rdx, 5
mov rdi, 1
syscall
This results in Hello, ello, which means that I'm still overwriting data associated with the again_string label? But I was under the impression that using brk to allocate would do so after the data had been initialized?

Related

print out skewed text in x86, but can't see why the 'monitored command dumped core' timeout occurs

I am trying to write skewed Text on Terminal via assembly to get my skills going.
Example [] = spaces :
H
[]E
[][]L
[][][] L
[][][][] O
But I seem to run into a timeout, because I am trying to move a character between registers / get the character via memory address - like getting a char at an index. But it won't really work. I tried working with allocating a byte to increment in the .bss section before, but that lead to a different error which I couldn't work around (relocation truncated to fit: R_X86_64_8 against `.bss')
Maybe there is a better approach to handle this, and if you could help me get on the right path I would be so so happy. To get more context, here is the 'problematic' snippet of my latest code.
inc rcx ; increase pointer of rcx
space_loop:
push rdi ;save destination register
cmp byte [rcx], 0 ;end of string found
je eos ; jump to exit subroutine
mov rax, 4; SYS Write
mov rbx, 0 ; STDOUT
cmp rsp,[rbp] ; compare spaceIndex to String Index
je print_char
mov rdi, space ;print space
mov rdx,1 ; length = 1
syscall
inc byte [rsp] ; increase spaceIndex
jnz space_loop
print_char:
;print out char
mov rax, 4
mov rbx, 0
mov rcx,[rcx] ; <- this line breaks it dont know how to access index here
mov rdx, 1
syscall
call new_Line ; make line break
inc rcx ; increase rcx counter
inc byte [rbp] ; increase index counter
mov byte [rsp], 0 ;reset space counter
jmp space_loop; next Line

Why is this register value in x86 assembly from user input different than expected?

Whenever the user inputs s, the expected value in the rax register that the buffer is moved to would be 73, but instead it is a73. Why is this? I need these two values to be equal in order to perform the jumps I need for the user input menu.
On any user input, the information in the register is always preceded by an a, while the register that I use to check for the value is not. This makes it impossible to compare them for a jump.
Any suggestions?
section .data
prompt: db 'Enter a command: '
section .bss
buffer: resb 100; "reserve" 32 bytes
section .text ; code
global _start
_start:
mov rax, 4 ; write
mov rbx, 1 ; stdout
mov rcx, prompt ; where characters start
mov rdx, 0x10 ; 16 characters
int 0x80
mov rax, 3 ; read
mov rbx, 0 ; from stdin
mov rcx, buffer ; start of storage
mov rdx, 0x10; no more than 64 (?) chars
int 0x80
mov rax, [buffer]
mov rbx, "s"
cmp rax, rbx
je _s
; return to Linux
mov rax, 1
mov rbx, 0
int 0x80
_s:
add r8, [buffer]
; dump buffer that was read
mov rdx, rax ; # chars read
mov rax, 4 ; write
mov rbx, 1 ; to stdout
mov rcx, buffer; starting point
int 0x80
jmp _start
If the user types s, followed by <enter>, the memory starting at the address of buffer will contain bytes ['s', '\n', '\0', '\0', ...] (where the newline byte '\n' is from pressing <enter> and the null bytes '\0' are from the .bss section being initialized to 0). As integers, represented in hex, the corresponding values in memory are [0x73, 0x0A, 0x00, 0x00, ...].
The mov rax, [buffer] instruction will copy 8 bytes of memory starting at the address of buffer to the rax register. The byte ordering is little endian on x86, so the 8 bytes will be loaded from memory in reversed order, resulting in rax having 0x0000000000000A73.
Workarounds
This workaround is based on Peter Cordes's comment below. The idea is to compare 1) the first byte starting at the address of buffer with 2) the byte 's'. This would replace the three lines in your question that 1) move [buffer] to rax, 2) move 's' to rbx, and 3) cmp rax, rbx.
cmp byte [buffer], 's'
je _s
This would check that the first character entered is 's', even if followed by other characters. If your intent is to check that only a single character 's' is entered (optionally followed by '\n' in the case that <enter> was pressed to end the input, as opposed to <ctrl-d>), a more thorough approach could utilize the value returned by the read system call, which indicates how many bytes were read.
Without checking how many characters are read, you might want to clear the buffer on each iteration. As is, a user could enter 's' on one iteration, followed by <ctrl-d> on the next iteration, and the buffer would still start with an 's'.
Band-aid Workarounds
(I had originally proposed the following two ideas as workarounds, but they have their own problems that Peter Cordes's identifies in the comments below)
To work around the issue, one option could be to add the newline to your target for comparison.
mov rax, [buffer]
mov rbx, `s\n` ; the second operand was formerly "s"
cmp rax, rbx
je _s
Alternatively, specifying that the read system call only consume 1 byte could be another approach to address the issue.
mov rax, 3 ; read
mov rbx, 0 ; from stdin
mov rcx, buffer ; start of storage
mov rdx, 0x01 ; the second operand was formerly 0x10
int 0x80

Update the string that has already been printed every second in NASM

I am completely new to the assembly thing and googling for several hours and searching on SO didn't clear things out so I came to ask here.
What I want to achieve:
[first second]: hello (stays on the screen for 1 second)
[second second]: world (hello disappeared and now we have `world` in place of it)
And this flow is in an infinite loop
In other words, I want my terminal's stdout to flicker(change) between hello and world without appending any newlines, writing strings or any other things, I just want the existing, already printed text to be replaced with some another text.
I have written the infinite loop that will print hello, then wait for a second, then print world, and wait for a second. I have also put this code in an infinite loop.
Here is the code I have as of now:
section .data
hello db "hello",10,0
world db "world",10,0
delay dq 1,0
section .text
global _start
_start:
mov rax, 1
mov rdi, 1
mov rsi, hello
mov rdx, 6
syscall
mov rax, 35
mov rdi, delay
mov rsi, 0
syscall
mov rax, 1
mov rdi, 1
mov rsi, world
mov rdx, 6
syscall
mov rax, 35
mov rdi, delay
mov rsi, 0
syscall
call _start
Note that I use elf64 asm format and it is highly appreciated to suggest a solution in that format.
Print a \r carriage return (ASCII code 13) to put the cursor at the beginning of the line without scrolling the terminal the way \n line feed (ASCII 10) does.
Then you can overwrite the last thing you wrote. If it's shorter, you can print spaces after your visible characters to "erase" the later characters still there.
e.g. since both words are the same length you could do this:
section .rodata
hello db 13, "hello" ; `\rhello`
hello_len equ $ - hello
world db 13, "world"
world_len equ $ - world
Notice that , 0 is not needed in your data because you're passing these buffers to write not printf, so they don't need to be 0-terminated implicit-length C strings. You also don't need to hard-code mov rdx, 6, you can use mov rdx, hello_len with the assembler calculating that for you.
For the sleeping part, you can use the sleep libc function, but for raw system calls you'll have to use nanosleep. (Like you're already doing.)
For the looping, don't use call _start; use jmp. You don't want to push a return address; that would eventually stack overflow (after about 1 million seconds: 8MiB stack size limit and call pushes an 8-byte return address.)
Solved this using helpful guidance of Peter Cordes.
The working code looks like this:
section .data
hello db "hello",13,0
world db "world",13,0
delay dq 1,0
section .text
global _start
_start:
mov rax, 1
mov rdi, 1
mov rsi, hello
mov rdx, 6
syscall
mov rax, 35
mov rdi, delay
mov rsi, 0
syscall
mov rax, 1
mov rdi, 1
mov rsi, world
mov rdx, 6
syscall
mov rax, 35
mov rdi, delay
mov rsi, 0
syscall
jmp _start

strlen in NASM Linux

Excuse me again. I am trying understand learn assembly languaje. However I have many problems. I am trying working with strings in NASM. I have copy a string constant to string variable. The maximum size is 50. So I want verify this bound. However this program throw a segmentation fault. I use a example in MASM, so perhaps exist a use error with NASM syntax.
My program is the following:
section .data
MAXTEXTSIZE equ 50
_cte_hola db "Hola", 0
_cte_mundo db "Mundo", 0
section .bss
MAIN_d resb MAXTEXTSIZE+1
section .text
global _start
strlen:
mov bx, 0
strl01:
cmp WORD [SI+BX],0 t
je strend
inc bx
jmp strl01
strend:
ret
strcpy:
call strlen
cmp bx, MAXTEXTSIZE
jle copiarsizeok
mov bx, MAXTEXTSIZE
copiarsizeok:mov cx, bx
cld
rep movsb
mov al,0
mov BYTE [DI], al
ret
_start:
mov ds, ax
mov es, ax
mov si, [MAIN_d]
mov di, [_cte_hola]
call strcpy
mov eax, 1
mov ebx, 0
int 80h
Thanks in advance and excuse me. My question are stupid for a assembly programmer.
I believe you are trying to make 32bit program in Linux, but your examples are 16bit.
In Linux, all pointers are 32bit. So, use extended registers: esi, edi, ebx etc. You still can use 8 and 16bit registers for arithmetics and data processing but not as memory pointers.
In strlen you have to compare byte [esi+ebx], 0 not word.
Don't set the segment registers in Linux. They will be set by the OS and you can't touch them. In Linux all memory is one flat area and you don't have to use segment registers anymore.
Here's a more concrete example of how you could write your strlen function (which is the first of your problems)
section .data
MAXTEXTSIZE equ 50
_cte_hola db "Hola", 0xa, 0
_cte_mundo db "Mundo", 0
section .bss
MAIN_d resb MAXTEXTSIZE+1
section .text
global _start
strlen:
mov ebx, 0
strlen_loop:
cmp BYTE [esi+ebx], 0
je strlen_end
inc ebx
jmp strlen_loop
strlen_end:
mov eax, ebx
ret
_start:
mov esi, _cte_hola
call strlen ; Get the length of _cte_hola
mov edx, eax ; The length was stored in eax by strlen
mov ecx, _cte_hola
mov ebx,1
mov eax, 4
int 0x80 ; Write to stdout
mov eax, 1
int 0x80 ; Exit
There are definitely better ways of implementing this (I'd use repne to implement strlen, for example) but I wanted to keep it close to your implementation.
Hope this helps!

How to read each char from string NASM assembly 64bit linux

I have a 64bit NASM assembly assignment to capitalize (all letters should be lowercase,except those which are at the beginning of the sentence) letters of input text. I'm totally new to assembler and I can't find anywhere how I should read each char from string incrementally, when I read the text like this:
section .data
prompt db "Enter your text: ", 10
length equ $ - prompt
text times 255 db 0
textsize equ $ - text
section .text
global main
main:
mov rax, 1
mov rdi, 1
mov rsi, prompt
mov rdx, length
syscall ;print prompt
mov rax, 0
mov rdi, 0
mov rsi, text
mov rdx, textsize
syscall ;read text input from keyboard
exit:
mov rax, 60
mov rdi, 0
syscall
Also, I'm not sure how to find out when the text is over, so I could know when I have to exit the program. Should I do some operations with text size or there is some king of special symbol which shows the EOL? Thank you for your answers.
After returning from sys_read (syscall rax=0) RAX register should contain the number of characters actually has been read. Notice, that in Linux, sys_read will return when /n is accepted, even if there is more place in the buffer provided.
Then organize a loop from 0 to RAX and process each character the way you want:
mov byte ptr [text+rax], 0 ; make the string zero terminated for future use.
mov rcx, rax ; rcx will be the character counter.
mov rsi, text ; a pointer to the current character. Start from the beginning.
process_loop:
mov al, [rsi] ; is it correct NASM syntax?
; here process al, according to your needs...
; .....
inc rsi
dec rcx
jnz process_loop
The above code can be optimized of course, for example to use string instructions or loop instructions, but IMO, this way is better for a beginner.

Resources