Joining two strings together in NASM

I looked all over Google for ways to do this. I found some, but they seemed overly complex for what I need. For starters, I need this to be done in a loop, and the place where I'm putting my strings is initially empty, so I'm sure that is bound to create some issues.
Anyway, this is my code:
%include "io.mac"
.DATA
filename_msg db 'Enter the file name: ', 0
number_prompt_msg db 'Enter the number of bases: ',0 ;asks for the number of bases to be used
finish_msg db 'Operation completed, DNA file generated',0 ;tells the user when the file is complete
error_msg db 'Operation failed, please try again', 10
base_A db 'A',0
base_C db 'C',0
base_G db 'G',0
base_T db 'T',0
base_length equ $ - base_A
;----------------------------------------------------------------------------------------------------
.UDATA
number_of_bases rest 1 ;defined by the user
random_number resb 1
filename: resd 20 ;defined by the user
base rest 1
file_descriptor resd 1 ;used to generate the file
characters_to_write rest 1
;-------------------------------------------------------------------
;start of code, and message prompts for the user
.CODE
.STARTUP
;asks user for filename
ask_details:
PutStr filename_msg
GetStr filename, 300
;asks user for the number of bases
PutStr number_prompt_msg
GetLInt [number_of_bases]
;------------------------------------------------------------
;file creation
mov EAX, 8 ;creates the file
mov EBX, filename
mov ECX, 644O ;octal instruction
int 80h ;kernel interrupt
cmp EAX,0 ;throws error if something is amiss
jbe error
mov [file_descriptor],EAX
mov ECX,[number_of_bases]
;-------------------------------------------------------------
;randomization of base numbers
writing_loop:
rdtsc
mov EAX, EDX
mov EDX, 0
div ECX
mov EDX, 0
mov EBX, 4
div EBX
mov [random_number], EDX
mov EDX, 0
mov EAX,[random_number]
cmp EAX,0
je assignment_A
cmp EAX,1
je assignment_C
cmp EAX,2
je assignment_G
cmp EAX,3
je assignment_T
join_char:
mov [base + EBX],EBX
loop writing_loop
PutStr base
.EXIT
;------------------------------------------------------------
;file generation error message
error:
PutStr error_msg
jmp ask_details
;------------------------------------------------------------
;assignments
assignment_A:
mov EBX, [base_A]
jmp join_char
assignment_C:
mov EBX, [base_C]
jmp join_char
assignment_T:
mov EBX, [base_T]
jmp join_char
assignment_G:
mov EBX, [base_G]
jmp join_char
First it compares a random number I obtained with rdtsc. Depending on what comes up, it assigns a letter to EBX; this letter (base_A, base_C, base_T or base_G) is then supposed to go into base. I tried using
mov [base + EBX],EBX
but that just printed an empty space. I used this because it seemed to work in the examples I looked at, but I'm not really sure how concatenating works in NASM. Does anyone know a simple method to concatenate those characters together, if it is possible? This is really minor, so I'm hoping I don't have to add a lot of code; the only thing I need this string for is to write it to a file later. I would do that without the string, but I need all my registers to write to the file, so I can't do it letter by letter.
EDIT: What I need to know is how to join each letter once it has been picked. base is empty at first; after a letter is picked, it gets put in there. When the loop runs again, another letter is picked, and I need to add it to base after the ones already there.
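One way to picture the appending step is a buffer plus an index register that always holds the number of bytes stored so far. The sketch below is a minimal, standalone illustration and not the asker's program: it does not use io.mac, the names letters and base, the count of 16 and the 64-byte buffer size are assumptions, and masking the low two bits of rdtsc stands in for the division in the question (it is not a good random source). Two things differ from `mov [base + EBX],EBX`: the offset is a separate counter (edi) rather than the letter itself, and only one byte (al) is stored rather than a four-byte register.

```nasm
; Build (assumed): nasm -f elf32 append.asm && ld -m elf_i386 -o append append.o
section .data
letters db "ACGT"               ; lookup table: index 0..3 -> base letter

section .bss
base    resb 64                 ; destination buffer, initially empty

section .text
global _start
_start:
    xor edi, edi                ; edi = number of bytes stored so far
    mov ecx, 16                 ; how many letters to append (hypothetical)
.next:
    rdtsc                       ; edx:eax = timestamp counter
    and eax, 3                  ; keep the low two bits -> 0..3
    mov al, [letters + eax]     ; fetch one letter (a single byte)
    mov [base + edi], al        ; append it at the current end of the buffer
    inc edi                     ; advance the end-of-string index
    dec ecx
    jnz .next

    mov byte [base + edi], 10   ; finish with a newline
    inc edi

    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; stdout
    mov ecx, base
    mov edx, edi                ; length = number of bytes appended
    int 80h

    mov eax, 1                  ; sys_exit
    xor ebx, ebx
    int 80h
```

Because edi ends up holding the length, the same buffer and count could be handed to a later sys_write on the file descriptor instead of stdout.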

Related

To display characters in reverse order using nasm [infinite loop running]

The program is used to accept characters and display them in reverse order.
The code is included here:
section .bss
num resb 1
section .text
global _start
_start:
call inputkey
call outputkey
;Output the number entered
mov eax, 1
mov ebx, 0
int 80h
inputkey:
;Read and store the user input
mov eax, 3
mov ebx, 2
mov ecx, num
mov edx, 1
int 80h
cmp ecx, 1Ch
je .sub2
push ecx
jmp inputkey
.sub2:
push ecx
ret
outputkey:
pop ecx
;Output the message
mov eax, 4
mov ebx, 1
;mov ecx, num
mov edx, 1
int 80h
cmp ecx, 1Ch
je .sub1
jmp outputkey
.sub1:
ret
The commands to assemble, link and run the program (logic.asm) are given here:
nasm -f elf logic.asm
ld -m elf_i386 -s -o logic logic.o
./logic

There are a few problems with the code. Firstly, for the sys_read syscall (eax = 3) you supplied 2 as the file descriptor, however 2 refers to stderr, but in this case you'd want stdin, which is 0 (I like to remember it as the non-zero numbers 1 and 2 being the output).
Next, an important thing to realize about the ret instruction is that it pops the value off the top of the stack and returns to it (treating it as an address). Meaning that even if you got to the .sub2 label, you'd likely get a segfault. With this in mind, the stack also tends to not be permanent storage, as in it is not preserved throughout procedures, so I'd recommend just making your buffer larger to e.g. 256 bytes and increment a value to point to an index in the buffer. (Using a fixed-size buffer will keep you from getting into the complications of memory allocation early, though if you want to go down that route you could do an external malloc call or just an mmap syscall.)
To demonstrate what I mean by an index into the reserved buffer:
section .bss
buf resb 256
; ...
inputkey:
xor esi, esi ; clear esi register, we'll use it as the index
.l1: ; the registers are reloaded on every pass because int 80h returns its result in eax
mov eax, 3 ; sys_read
mov ebx, 0 ; stdin file descriptor
lea ecx, [buf+esi] ; load the address of buf + esi
mov edx, 1 ; read one byte
int 80h
cmp eax, 1 ; anything other than 1 byte read means end-of-file or error
jne .e1
cmp byte [buf+esi], 0x0a ; check for a \n character, meaning the user hit enter
je .e1
inc esi
cmp esi, 255 ; stay inside the 256-byte buffer
jb .l1
.e1:
ret
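In this case, we also get to preserve esi up until the output. It holds the index just past the last character typed, so to reverse the input we just print in descending order.
outputkey:
.l2:
test esi, esi ; if esi is zero it will set the ZF flag: nothing left to print
jz .e2
dec esi ; step back to the previous character
mov eax, 4 ; sys_write (reloaded each pass, eax holds the return value after int 80h)
mov ebx, 1 ; stdout
lea ecx, [buf+esi]
mov edx, 1
int 80h
jmp .l2
.e2:
ret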
Note: I haven't tested this code, so if there are any issues with it let me know.

Strange Characters when printing a string in x86 assembly language?

My code is required to have a string that will be printed to the console, alongside a string-length routine that counts the length instead of me manually putting the length of the string in the edx register. But I am getting strange characters printed right after the string is printed.
global _start
section .text
_start:
mov edi, message
call _strlen
mov edx, eax
mov eax, 4
mov ebx, 1
mov ecx, message
int 80h
mov eax, 1
mov ebx, 5
int 80h
section .data
message: db "My name is Stanley Hudson", 0Ah
_strlen:
push ebx
push ecx
mov ebx, edi
xor al, al
mov ecx, 0xffffffff
repne scasb ; REPeat while Not Equal [edi] != al
sub edi, ebx ; length = offset of (edi – ebx)
mov eax, edi
pop ebx
pop ecx
ret
Here is the output

strlen searches for a 0 byte terminating the string, but your string doesn't have one, so it goes until it does find a zero byte and returns a value that's too large.
You want to write
message: db "My name is Stanley Hudson", 0Ah, 0
; ^^^
Another bug is that your _strlen function is apparently in the .data section, because you didn't go back to section .text after your string. It evidently still runs in this 32-bit build, which means the .data section ended up executable here, but that is not something to rely on and it's surely not what you intend.
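Putting those two fixes together, a corrected program might look like the sketch below. It also changes two things the answer above does not mention, so treat them as my own reading of the posted code: the registers are popped in the reverse order of the pushes (the original pops ebx and ecx swapped), and one is subtracted from the count because repne scasb stops with edi one byte past the terminator. The exit status of 0 and the build line are assumptions.

```nasm
; Build (assumed): nasm -f elf32 hello.asm && ld -m elf_i386 -o hello hello.o
global _start

section .data
message: db "My name is Stanley Hudson", 0Ah, 0   ; the 0 terminates the string

section .text
_start:
    mov edi, message
    call _strlen            ; eax = length, terminator not counted
    mov edx, eax
    mov eax, 4              ; sys_write
    mov ebx, 1              ; stdout
    mov ecx, message
    int 80h

    mov eax, 1              ; sys_exit
    xor ebx, ebx
    int 80h

; in: edi = address of a zero-terminated string; out: eax = length
; edi is modified by the scan
_strlen:
    push ebx
    push ecx
    mov ebx, edi            ; remember the start of the string
    xor al, al              ; byte to search for: 0
    mov ecx, 0xffffffff     ; no practical limit on the scan
    cld                     ; scan forward
    repne scasb             ; stops with edi one past the 0 byte
    sub edi, ebx            ; bytes scanned, including the terminator
    lea eax, [edi - 1]      ; drop the terminator from the count
    pop ecx                 ; restore in reverse order of the pushes
    pop ebx
    ret
```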

To get rid of the special characters write the strlen function before the start process and create a new register for the newline character

Clear input buffer Assembly x86 (NASM)

Edit: This is similar to this: Reset a string variable to print multiple user inputs in a loop (NASM Assembly). But it is not the same issue.
From the other post, I was able to prevent additional characters from being printed. However, I still cannot prevent those additional characters from being read when the program goes back to the point in which it asks the user for input.
I'm creating a program that asks a user for input, and then prints it. Afterwards, it asks the user to enter 'y' if they want to print another text, or press anything else to close the program.
My issue is that if the user enters more characters than expected, those extra characters don't go away, and when the program goes back to ask the user for input, there's no chance to enter input because the program takes the remaining characters from the last time it received input.
For example:
The user is asked to enter text to print, and they enter: "Heyyyyyyyyyyyyyyyyyyyyyyyyyyyyy"
Leftover is "yyy"
At this point, the program should ask the user to enter 'y' to repeat the process, or anything else to close the program.
Output:
Heyyyyyyyyyyyyyyyyyyyyyyyyyy
Wanna try again? If yes, enter y. If not, enter anything else to close the program
Enter your text:
Output: yy
Wanna try again? If yes, enter y. If not, enter anything else to close the program.
And only now it asks for user input again
Since "yyy" is still in buffer, the user doesn't get a chance to actually enter input in this case.
What can I do to fix this issue? I've just started to learn about this recently, so I'd appreciate any help.
Thanks.
This is what I've done
prompt db "Type your text here.", 0h
retry db "Wanna try again? If yes, enter y. If not, enter anything else to close the program"
section .bss
text resb 50
choice resb 2
section .text
global _start
_start:
mov rax, 1 ;Just asking the user to enter input
mov rdi, 1
mov rsi, prompt
mov rdx, 21
syscall
mov rax, 0 ;Getting input and saving it on var text
mov rdi, 0
mov rsi, text
mov rdx, 50
syscall
mov r8, rax ;This is what I added to prevent additional characters
;from being printed
mov rax, 1 ;Printing the user input
mov rdi, 1
mov rsi, text
mov rdx, r8
syscall
mov rax, 1 ;Asking if user wants to try again
mov rdi, 1
mov rsi, retry
mov rdx, 83
syscall
mov rax, 0 ;Getting input and saving it on var choice
mov rdi, 0
mov rsi, choice
mov rdx, 2
syscall
mov r8b, [choice] ;If choice is different from y, go to end and close the program. Otherwhise, go back to start.
cmp byte r8b, 'y'
jne end
jmp _start
end:
mov rax, 60
mov rdi, 0
syscall

The simple way to clear stdin is to check if the 2nd char in choice is the '\n' (0xa). If it isn't, then characters remain in stdin unread. You already know how to read from stdin, so in that case, just read stdin until the '\n' is read [1], e.g.
mov rax, 0 ;Getting input and saving it on var choice
mov rdi, 0
mov rsi, choice
mov rdx, 2
syscall
cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
jne end
cmp byte [choice + 1], 0xa ; is 2nd char '\n' (if yes done, jump start)
je _start
empty: ; chars remain in stdin unread
mov rax, 0 ; read 1-char from stdin into choice
mov rdi, 0
mov rsi, choice
mov rdx, 1
syscall
cmp byte [choice], 0xa ; check if char '\n'?
jne empty ; if not, repeat
jmp _start
Beyond that, you should determine your prompt lengths when you declare them, e.g.
prompt db "Type your text here. ", 0h
plen equ $-prompt
retry db "Try again (y/n)? ", 0h
rlen equ $-retry
That way you do not have to hardcode lengths in case you change your prompts, e.g.
_start:
mov rax, 1 ;Just asking the user to enter input
mov rdi, 1
mov rsi, prompt
mov rdx, plen
syscall
mov rax, 0 ;Getting input and saving it on var text
mov rdi, 0
mov rsi, text
mov rdx, 50
syscall
mov r8, rax
mov rax, 1 ;Printing the user input
mov rdi, 1
mov rsi, text
mov rdx, r8
syscall
mov rax, 1 ;Asking if user wants to try again
mov rdi, 1
mov rsi, retry
mov rdx, rlen
syscall
Putting it all together, you can do:
prompt db "Type your text here. ", 0h
plen equ $-prompt
retry db "Try again (y/n)? ", 0h
rlen equ $-retry
section .bss
text resb 50
choice resb 2
section .text
global _start
_start:
mov rax, 1 ;Just asking the user to enter input
mov rdi, 1
mov rsi, prompt
mov rdx, plen
syscall
mov rax, 0 ;Getting input and saving it on var text
mov rdi, 0
mov rsi, text
mov rdx, 50
syscall
mov r8, rax
mov rax, 1 ;Printing the user input
mov rdi, 1
mov rsi, text
mov rdx, r8
syscall
mov rax, 1 ;Asking if user wants to try again
mov rdi, 1
mov rsi, retry
mov rdx, rlen
syscall
mov rax, 0 ;Getting input and saving it on var choice
mov rdi, 0
mov rsi, choice
mov rdx, 2
syscall
cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
jne end
cmp byte [choice + 1], 0xa ; is 2nd char '\n' (if yes done, jump start)
je _start
empty: ; chars remain in stdin unread
mov rax, 0 ; read 1-char from stdin into choice
mov rdi, 0
mov rsi, choice
mov rdx, 1
syscall
cmp byte [choice], 0xa ; check if char '\n'?
jne empty ; if not, repeat
jmp _start
end:
mov rax, 60
mov rdi, 0
syscall
Example Use/Output
$ ./bin/emptystdin
Type your text here. abc
abc
Try again (y/n)? y
Type your text here. def
def
Try again (y/n)? yes please!
Type your text here. geh
geh
Try again (y/n)? yyyyyyyyyyyyyeeeeeeeeeesssssssss!!!!
Type your text here. ijk
ijk
Try again (y/n)? n
Now even a cat stepping on the keyboard at your (y/n)? prompt won't cause problems. There are probably more elegant ways to handle this that would be more efficient than repetitive one-byte reads with syscall, but this will handle the issue.
Additional Considerations And Error-Checks
As mentioned above, reading and checking one character at a time isn't a very efficient approach, though it is conceptually the easiest extension without making other changes. @PeterCordes makes a number of good points in the comments below about approaches that are more efficient and, more importantly, about error conditions that can arise and should be protected against.
For starters, when you are looking for information on how to use an individual system call, Anatomy of a system call, part 1 provides a bit of background. Supplement it with the Linux manual page (for read, man 2 read) for details on the parameter types, the return type and the return values.
The original solution above does not address what happens if the user generates a manual end-of-file by pressing Ctrl+d or if a read error actually occurs. It simply addressed the user-input and emptying-stdin question asked. With any user input, before you use the value, you must validate that the input succeeded by checking the return (not just for the yes/no input, but for all inputs). For purposes here, you can consider zero input (manual end-of-file) or a negative return (read error) as a failed input.
To check whether you have at least one valid character of input, you can simply check the return (read returns the number of characters read, and the syscall places that value in rax). A zero or negative value indicates that no input was received. A check could be:
cmp rax, 0 ; check for 0 bytes read or error
jle error
You can write a short diagnostic to the user and then handle the error as wanted, this example simply exits after outputting a diagnostic, e.g.
readerr db 0xa, "eof or read error", 0xa, 0x0
rderrsz equ $-readerr
...
; your call to read here
cmp rax, 0 ; check for 0 bytes read or error
jle error
...
error:
mov rax, 1 ; output the readerr string and jmp to end
mov rdi, 1
mov rsi, readerr
mov rdx, rderrsz
syscall
jmp end
Now moving on to a more efficient manner for emptying stdin. The biggest hindrance indicated in the original answer was the repeated system calls to sys_read to read one character at a time, reusing your 2-byte choice buffer. The obvious solution is to make choice bigger, or just use stack space to read more characters each time (you can look at the comments for a couple of approaches). Here, for example, we will increase choice to 128 bytes, which in the case of the anticipated "y\n" input will only make use of two of those bytes, but in the case of an excessively long input will read 128 bytes at a time until the '\n' is found. For setup you have:
choicesz equ 128
...
section .bss
text resb 50
choice resb 128
Now after you ask for (y/n)? your read would be:
mov rax, 0 ; Getting input and saving it on var choice
mov rdi, 0
mov rsi, choice
mov rdx, choicesz
syscall
cmp rax, 0 ; check for 0 bytes read (eof) or error
jle error
cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
jne end ; not a 'y', doesn't matter what's in stdin, end
Now there are two conditions to check. First, compare the number of characters read with your buffer size choicesz: if the number of characters read is less than choicesz, no characters remain unread in stdin. Second, if the return equals the buffer size, you may or may not have characters remaining in stdin. You need to check the last character in the buffer to see if it is the '\n', which indicates whether you have read all the input. If the last character is anything other than the '\n', characters remain unread (unless the user just happened to generate a manual end-of-file at the 128th character). You can check as:
empty:
cmp eax, choicesz ; compare chars read and buffer size
jb _start ; buffer not full - nothing remains in stdin
cmp byte [choice + choicesz - 1], 0xa ; if full - check if last byte \n, done
je _start
mov rax, 0 ; fill choice again from stdin and repeat checks
mov rdi, 0
mov rsi, choice
mov rdx, choicesz
syscall
cmp rax, 0 ; check for 0 bytes read or error
jle error
jmp empty
(note: as noted above, there is a further case not covered here, such as where the user enters valid input but then generates a manual end-of-file instead of pressing Enter after the 128th character (or a multiple of 128). There you can't just look for a '\n', because it doesn't exist, and if there are no more characters and you call sys_read again, it will block waiting on input. Conceivably you will need to use a non-blocking read and putback of a single character to break that ambiguity -- that is left to you)
A complete example with the improvements would be:
prompt db "Type your text here. ", 0x0
plen equ $-prompt
retry db "Try again (y/n)? ", 0x0
rlen equ $-retry
textsz equ 50
choicesz equ 128
readerr db 0xa, "eof or read error", 0xa, 0x0
rderrsz equ $-readerr
section .bss
text resb 50
choice resb 128
section .text
global _start
_start:
mov rax, 1 ; Just asking the user to enter input
mov rdi, 1
mov rsi, prompt
mov rdx, plen
syscall
mov rax, 0 ; Getting input and saving it on var text
mov rdi, 0
mov rsi, text
mov rdx, textsz
syscall
cmp rax, 0 ; check for 0 bytes read or error
jle error
mov r8, rax
mov rax, 1 ; Printing the user input
mov rdi, 1
mov rsi, text
mov rdx, r8
syscall
mov rax, 1 ; Asking if user wants to try again
mov rdi, 1
mov rsi, retry
mov rdx, rlen
syscall
mov rax, 0 ; Getting input and saving it on var choice
mov rdi, 0
mov rsi, choice
mov rdx, choicesz
syscall
cmp rax, 0 ; check for 0 bytes read (eof) or error
jle error
cmp byte [choice], 'y' ; check 1st byte of choice, no need for r8b
jne end ; not a 'y', doesn't matter what's in stdin, end
empty:
cmp eax, choicesz ; compare chars read and buffer size
jb _start ; buffer not full - nothing remains in stdin
cmp byte [choice + choicesz - 1], 0xa ; if full - check if last byte \n, done
je _start
mov rax, 0 ; fill choice again from stdin and repeat checks
mov rdi, 0
mov rsi, choice
mov rdx, choicesz
syscall
cmp rax, 0 ; check for 0 bytes read or error
jle error
jmp empty
end:
mov rax, 60
mov rdi, 0
syscall
error:
mov rax, 1 ; output the readerr string and jmp to end
mov rdi, 1
mov rsi, readerr
mov rdx, rderrsz
syscall
jmp end
There are surely more efficient ways to optimize this, but for purposes of discussing "How do I empty stdin?", this second approach, which uses a larger buffer to alleviate the repetitive calls to sys_read that read one character at a time, is a good step forward. "How do I completely optimize the check?" is a whole separate question.
Let me know if you have further questions.
Footnotes:
1. In this circumstance, where the user is typing input, the user generates a '\n' by pressing Enter, allowing you to check for the '\n' as the final character when emptying stdin. The user can also generate a manual end-of-file by pressing Ctrl+d, so the '\n' isn't guaranteed. There are still many other ways stdin can be filled, such as redirecting a file as input. There should be an ending '\n' for the file to be POSIX compliant, but there too it isn't a guarantee.

How to read each char from string NASM assembly 64bit linux

I have a 64bit NASM assembly assignment to capitalize (all letters should be lowercase,except those which are at the beginning of the sentence) letters of input text. I'm totally new to assembler and I can't find anywhere how I should read each char from string incrementally, when I read the text like this:
section .data
prompt db "Enter your text: ", 10
length equ $ - prompt
text times 255 db 0
textsize equ $ - text
section .text
global main
main:
mov rax, 1
mov rdi, 1
mov rsi, prompt
mov rdx, length
syscall ;print prompt
mov rax, 0
mov rdi, 0
mov rsi, text
mov rdx, textsize
syscall ;read text input from keyboard
exit:
mov rax, 60
mov rdi, 0
syscall
Also, I'm not sure how to find out when the text is over, so I could know when I have to exit the program. Should I do some operations with the text size, or is there some kind of special symbol which shows the EOL? Thank you for your answers.

After returning from sys_read (syscall with rax=0), the RAX register contains the number of characters actually read. Notice that in Linux, when reading from a terminal, sys_read returns as soon as a \n is received, even if there is more room in the buffer provided.
Then organize a loop from 0 to RAX and process each character the way you want:
test rax, rax ; 0 means end of input, a negative value means an error
jle exit ; nothing to process in either case
mov byte [text+rax], 0 ; make the string zero terminated for future use (needs a spare byte: read at most textsize-1).
mov rcx, rax ; rcx will be the character counter.
mov rsi, text ; a pointer to the current character. Start from the beginning.
process_loop:
mov al, [rsi] ; load the current character into al
; here process al, according to your needs...
; .....
inc rsi
dec rcx
jnz process_loop
The above code can be optimized of course, for example to use string instructions or loop instructions, but IMO, this way is better for a beginner.
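To make the "process al" step concrete, here is a sketch of the loop applied to the capitalization task from the question. It is an illustration under simplifying assumptions, not a complete solution: only '.' is treated as the end of a sentence, only ASCII letters are handled, the prompt is omitted, and the register roles (bl as a "sentence start" flag, r8 as the saved length) are my own choices.

```nasm
; Build (assumed): nasm -f elf64 caps.asm && ld -o caps caps.o
section .bss
text     resb 255
textsize equ $ - text

section .text
global _start
_start:
    mov eax, 0              ; sys_read
    mov edi, 0              ; stdin
    mov rsi, text
    mov edx, textsize
    syscall
    test rax, rax
    jle .exit               ; end-of-file or error: nothing to process

    mov r8, rax             ; r8 = number of bytes read (kept for the write)
    mov rcx, rax            ; rcx = bytes left to process
    mov rsi, text           ; rsi = pointer to the current character
    mov bl, 1               ; bl = 1 while the next letter starts a sentence
.loop:
    mov al, [rsi]
    mov dl, al
    or  dl, 0x20            ; dl = lowercase form if al is a letter
    cmp dl, 'a'
    jb  .not_letter
    cmp dl, 'z'
    ja  .not_letter
    test bl, bl
    jz  .store              ; mid-sentence: keep the lowercase form
    and dl, 0xDF            ; sentence start: clear bit 5 -> uppercase
    xor bl, bl              ; the following letters are mid-sentence
.store:
    mov [rsi], dl
    jmp .next
.not_letter:
    cmp al, '.'
    jne .next
    mov bl, 1               ; a period starts a new sentence
.next:
    inc rsi
    dec rcx
    jnz .loop

    mov eax, 1              ; sys_write
    mov edi, 1              ; stdout
    mov rsi, text
    mov rdx, r8             ; write back exactly what was read
    syscall
.exit:
    mov eax, 60             ; sys_exit
    xor edi, edi
    syscall
```

The counter is loaded into rcx only after the read, because the syscall instruction itself overwrites rcx and r11.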

linux nasm print multiple characters

I am trying to write a program that will allow me to print multiple characters (strings of characters or integers). The problem that I am having is that my code only prints one of the characters, and then newlines and stays in an infinite loop. Here is my code:
SECTION .data
len EQU 32
SECTION .bss
num resb len
output resb len
SECTION .text
GLOBAL _start
_start:
Read:
mov eax, 3
mov ebx, 1
mov ecx, num
mov edx, len
int 80h
Point:
mov ecx, num
Print:
mov al, [ecx]
inc ecx
mov [output], al
mov eax, 4
mov ebx, 1
mov ecx, output
mov edx, len
int 80h
cmp al, 0
jz Exit
Clear:
mov eax, 0
mov [output], eax
jmp Print
Exit:
mov eax, 1
mov ebx, 0
int 80h
Could someone point out what I am doing wrong?
Thanks,
Rileyh

In the first time you enter the Print section, ecx is pointing to the start of the string and you use it to copy a single character to the start of the output string. But a few more instructions down, you overwrite ecx with the pointer to the output string, and never restore it, therefore you never manage to copy and print the rest of the string.
Also, why are you calling write() with a single character string with the aim to loop over it to print the entire string? Why not just pass num directly in instead of copying a single character to output and passing that?
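A minimal sketch of that suggestion follows: read once, then pass num straight to a single write, using the byte count that the read returned as the length. It keeps the names num and len from the question; reading from file descriptor 0 (stdin) rather than 1 and the check for an empty or failed read are my additions.

```nasm
; Build (assumed): nasm -f elf32 echo.asm && ld -m elf_i386 -o echo echo.o
len equ 32

section .bss
num resb len

section .text
global _start
_start:
    mov eax, 3      ; sys_read
    mov ebx, 0      ; stdin
    mov ecx, num
    mov edx, len
    int 80h
    test eax, eax
    jle .exit       ; nothing read (end-of-file) or an error

    mov edx, eax    ; print exactly as many bytes as were read
    mov eax, 4      ; sys_write
    mov ebx, 1      ; stdout
    mov ecx, num
    int 80h
.exit:
    mov eax, 1      ; sys_exit
    xor ebx, ebx
    int 80h
```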

In your last question, you showed message as a zero-terminated string, so cmp al, 0 would indicate the end of the string. sys_read does NOT create a zero-terminated string! (we can stuff a zero in there if we need it - e.g. as a filename for sys_open) sys_read will read a maximum of edx characters. sys_read from stdin returns when, and only when, the "enter" key is hit. If fewer than edx characters were entered, the string is terminated with a linefeed character (10 decimal or 0xA or 0Ah hex) - you could look for that... But, if the pesky user types more than edx characters, only edx characters go into your buffer, the "excess" remains in the OS's buffer (and can cause trouble later!). In this case your string is NOT terminated with a linefeed, so looking for it will fail. sys_read returns the number of characters actually read - up to edx - including the linefeed - in eax. If you don't want to include the linefeed in the length, you can decrement eax.
As an experiment, do a sys_read with some small number (say 4) in edx, then exit the program. Type "abcdls"(enter) and watch the "ls" be executed. If some joker typed "abcdrm -rf ."... well, don't!!!
Safest thing is to flush the OS's input buffer.
mov ecx, num
mov edx, len
mov ebx, 0 ; stdin
mov eax, 3 ; sys_read
int 80h
cmp byte [ecx + eax - 1], 10 ; got linefeed?
push eax ; save read length - doesn't alter flags
je good
flush:
mov ecx, dummy_buf
mov edx, 1
mov ebx, 0 ; stdin
mov eax, 3 ; sys_read
int 80h
cmp eax, 1 ; end-of-file or error - stop flushing
jne good
cmp byte [ecx], 10
jne flush
good:
pop eax ; restore length from first sys_read
Instead of defining dummy_buf in .bss (or .data), we could put it on the stack - trying to keep it simple here. This is imperfect - we don't know if our string is linefeed-terminated or not, and we don't check for error (unlikely reading from stdin). You'll find you're writing much more code dealing with errors and "idiot user" input than "doing the work". Inevitable! (it's a low-level language - we've gotta tell the CPU Every Single Thing!)
sys_write doesn't know about zero-terminated strings, either! It'll print edx characters, regardless of how much garbage that might be. You want to figure out how many characters you actually want to print, and put that in edx (that's why I saved/restored the original length above).
You mention "integers" and use num as a variable name. Neither of these functions know about "numbers" except as ascii codes. You're reading and writing characters. Converting a single-digit number to and from a character is easy - add or subtract '0' (48 decimal or 30h). Multiple digits are more complicated - look around for an example, if that's what you need.
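For the multiple-digit case, one common approach is to walk the characters left to right, multiplying the running total by ten and adding each new digit. The sketch below assumes unsigned decimal input with no overflow checking; the label digits and the sample text "123" are hypothetical stand-ins for whatever sys_read delivered, and the result is returned as the exit status only so that it can be inspected from the shell.

```nasm
; Build (assumed): nasm -f elf32 atoi.asm && ld -m elf_i386 -o atoi atoi.o
section .data
digits db "123", 10         ; sample input, linefeed-terminated like typed text

section .text
global _start
_start:
    mov esi, digits         ; esi = pointer to the current character
    xor eax, eax            ; eax = value accumulated so far
.next:
    movzx ecx, byte [esi]   ; fetch one character
    sub ecx, '0'            ; character -> digit value
    cmp ecx, 9
    ja  .done               ; anything outside '0'..'9' ends the number
    imul eax, eax, 10       ; shift the earlier digits one decimal place
    add eax, ecx            ; append the new digit
    inc esi
    jmp .next
.done:
    mov ebx, eax            ; exit status = low 8 bits of the value
    mov eax, 1              ; sys_exit
    int 80h
```

Running the program and then `echo $?` should show 123. Going the other way (integer to characters) is the mirror image: divide by ten repeatedly and add '0' to each remainder, which yields the digits in reverse order.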
Best,
Frank
