linux nasm assembly isolating characters in a string

linux nasm assembly isolating characters in a string - linux

If I had a string declared like this:
message db "ABCDEFGHIJ",0
how could I create a pointer that would allow me to point to a specific character in this string, such as the 'A' character. And also, how could I create a loop that would allow me to increment the pointer and as a result, cycle through the entire string?

mov ecx, message ; Masm would use "offset"
top:
mov al, [ecx] ; get a character
inc ecx ; get ready for next one
cmp al, 0 ; end of string?
jz done
; do something intelligent with al
jmp top
done:

Related

How to skip spaces at the beginning of the string in assembly x86

What loop should I write If I want to skip all spaces at the beginning of the string and start to do something with the string when my code reaches the first symbol of the string. If I have a string like
a=' s o m e word'
the code should start when 's' is reached. It should be some kind of loop but I still don't know how to write it correctly.
My try:
mov si, offset buff+2 ;buffer
mov ah, [si]
loop_skip_space:
cmp ah,20h ;chech if ah is space
jnz increase ;if yes increase si
jmp done ;if no end of loop
increase:
inc si
loop loop_skip_space
done:

In this 16-bit code that fetches its string offset at buff+2, I believe it's a safe bet to consider this string to belong to the input gotten from executing the DOS.BufferedInput function 0Ah.
The code snippets below are based on this assumption. Next observations about the OP's code remain valid anyway.
The mov ah, [si] must be part of the loop. You need to verify different characters from the string, so loading following bytes is necessary.
Your code should exit the loop upon finding a non-space character. Currently you exit upon the first space.
The loop instruction requires setting up the CX register with the length of the string. You omitted that.
mov si, offset buff+2
mov cl, [si-1]
mov ch, 0
jcxz done ; String could start off empty
loop_skip_space:
cmp byte ptr [si], ' '
jne done
inc si ; Skip the current space
loop loop_skip_space
done:
Here SI points at the first non-space character in the string with CX characters remaining. CX could be zero!
You can write better code if you stop using the loop instruction, because it's an instruction that is said to be slow. See Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?.
Avoiding loop and no longer requiring the use of CX
mov si, offset buff+2
mov cl, [si-1]
test cl, cl
jz done ; String could start off empty
loop_skip_space:
cmp byte ptr [si], ' '
jne done
inc si ; Skip the current space
dec cl
jnz loop_skip_space
done:
Avoiding loop and using the delimiter 13 (carriage return)
mov si, offset buff+1
loop_skip_space:
inc si
cmp byte ptr [si], ' '
je loop_skip_space

look at the example
STRLEN EQU 9
STRING DB 'Assembler'
CLD ;clear direction flag
MOV AL,' ' ;symbol to find.
MOV ECX,STRLEN ;length of string
LEA EDI,STRING ;string pointer.
REPE SCASB ;search itself
JNZ K20 ;jump if not found
DEC EDI ;
; EDI points to your first not space char
K20: RET

Reversing a string using stack in x86 NASM

I'm trying to write a function in x86 NASM assembly which reverses order of characters in a string passed as argument. I tried implementing it using stack but ended up getting error message
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
Code below:
section .text
global reverse
reverse:
push ebp ; epilogue
mov ebp, esp
mov eax, [ebp+8]
xor ecx, ecx ; ecx = 0
push ebx ; saved register
push_eax:
mov edx, [eax] ; edx = char at eax
test edx, edx
jz inc_eax ; if edx == 0, move eax pointer back and pop eax
push edx
inc eax
inc ecx ; counter + 1
jmp push_eax
inc_eax:
sub eax, ecx ; move eax back to beginning of string
mov ebx, ecx ; to move eax back at the end of function
pop_eax:
test ecx, ecx ; loop counter == 0
jz end
pop edx
mov [eax], edx ; char at eax = edx
inc eax ; eax++
dec ecx ; ecx--
jmp pop_eax
end:
sub eax, ebx
pop ebx ; saved register
mov esp, ebp
pop ebp
ret
C declaration:
extern char* reverse(char*);
I've read somewhere that you get this error when trying to for instance write something in an array that is longer than allocated but i don't see how would that function do it? Also when instead of using ebx at the end I manually move the pointer in eax back (string in C of length 9 -> sub eax, 9) I get the reversed string at the output followed by 2nd, 3rd and 4th char. (No matter the length of the string I declare in C). So for instanceinput: "123456789"
output: "987654321234" but that only happens when I move eax manually, using ebx like in the code above outputs some trash.

Peter's answer is the answer you are looking for. However, may I comment on the technique? Must you use the stack? Do you already know the length of the string, or must you calculate/find that yourself?
For example, if you already know the length of the string, can you place a pointer at the first and another at the end and simply exchange the characters, moving each pointer toward the center until they meet? This has the advantage of not assuming there is enough room on the stack for the string. In fact, you don't even touch the stack except for the prologue and epilogue. (Please note you comment that the epilogue is at the top, when it is an 'ending' term.)
If you do not know the length of the string, to use the above technique, you must find the null char first. By doing this, you have touched each character in the string already, before you even start. Advantage, it is now loaded in to the cache. Disadvantage, you must touch each character again, in essence, reading the string twice. However, since you are using assembly, a repeated scasb instruction is fairly fast and has the added advantage of auto-magically placing a pointer near the end of the string for you.
I am not expecting an answer by asking these questions. I am simply suggesting a different technique based on certain criteria of the task. When I read the question, the following instantly came to mind:
p[i] <-> p[n-1]
i++, n--
loop until n <= i
Please note that you will want to check that 'n' is actually greater than 'i' before you make the first move. i.e.: it isn't a zero length string.

If this is a string of 1-byte characters, you want movzx edx, byte [eax] byte loads and mov [eax], dl byte stores.
You're doing 4-byte stores, which potentially steps on bytes past the end of the array. You also probably overread until you find a whole dword on the stack that's all zero. test edx, edx is fine if you correctly zero-extended a byte into EDX, but loading a whole word probably resulted in overread.
Use a debugger to see what you're doing to memory around the input arg.
(i.e. make sure you aren't writing past the end of the array, which is probably what happened here, stepping on the buffer-overflow detection cookie.)

How to split / truncate a string variable value in Assembly?

I am currently working on a project and for storage's sake I would like to cut off a variable in assembly, and (optionally) make that the value of a register, such as eax.
I will need code that works with NASM using Intel syntax.
For example, if the variable "msg" is set to "29ak49", I want to take a part of that, like "9ak4", and put it in a register, or something similar.

As Peter Cordes mentioned in the comments, you can always add a null terminator (0) into the existing string's buffer to right truncate; if you don't mind modifying the original string data.
The example below will retrieve a substring without modifying the original string.
If you have the address of a variable, and you know where you want to truncate it, you can take the address of the starting position of the data, and add an offset to left truncate. To right truncate you can just read as many characters as you need from the new offset.
For example in x86:
msg db '29ak49' ; create a string (1 byte per char)
;; left truncate
mov esi, msg ; get the address of the start of the string
add esi, OFFSET_INTO_DATA ; offset into the string (1 byte per char)
;; right truncate
mov edi, NUM_CHARS ; number of characters to take
.loop:
movzx eax, byte [esi] ; get the value of the next character
;; do something with the character in eax
inc esi
dec edi
jnz .loop
;; end loop
EDIT:
The following is a runable test implementation as a 32-bit Linux application that prints out the substring selected based on OFFSET_INTO_DATA and NUM_CHARS (note: the algorithm is the same, but the registers have changed):
section .text
global _start
_start:
;; left truncate
mov esi, msg ; get the address of the start of the string
add esi, OFFSET_INTO_DATA ; offset into the string (1 byte per char)
;; right truncate
mov edi, NUM_CHARS ; number of characters to take
.loop:
mov ecx, esi ; get the address of the next character
call print_char_32
inc esi
dec edi
jnz .loop
jmp halt
;;; input: ecx -> character to display
print_char_32:
mov edx, 1 ; PRINT
mov ebx, 1 ;
mov eax, 4 ;
int 0x80 ;
ret
halt:
mov eax, 1 ; EXIT
int 0x80 ;
jmp halt
section .data
msg db '29ak49' ; create a string (1 byte per char)
OFFSET_INTO_DATA EQU 1
NUM_CHARS EQU 3
Compiled with:
nasm -f elf substring.asm ; ld -m elf_i386 -s -o substring substring.o

Printing a string in assembly using no predefined function

I have to define a function in assembly that will allow me to loop through a string of declared bytes and print them using a BIOS interrupt. I'm in 16 bit real mode. This is an exercise on writing a little bootloader from a textbook, but it seems that it was only a draft and it's missing some stuff.
I have been given the following code:
org 0x7c00
mov bx, HELLO_MSG
call print_string
mov bx, GOODBYE_MSG
call print_string
jmp $ ;hang so we can see the message
%include "print_string.asm"
HELLO_MSG:
db 'Hello, World!', 0
GOODBYE_MSG:
db 'Goodbye!', 0
times 510 - ($ - $$) db 0
dw 0xaa55
My print_string.asm looks like this:
print_string:
pusha
mov ah, 0x0e
loop:
mov al, bl
cmp al, 0
je return
int 0x10
inc bx
jmp loop
return:
popa
ret
I have some idea of what I'm doing, but the book doesn't explain how to iterate through something. I know how to do it in C but this is my first time using assembly for something other than debugging C code. What happens when I boot through the emulator is that it prints out a couple of lines of gibberish and eventually hangs there for me to see my failure in all it's glory. Hahaha.

Well, it looks like it loads the address of the string into the BX register before calling the function.
The actual function looks like it is trying to loop through the string, using BX as a pointer and incrementing it (inc bx) until it hits the ASCII NUL at the end of the string (cmp al, 0; je return)...
...but something is wrong. The "mov al, bl" instruction does not look correct, because that would move the low 8 bits of the address into al to be compared for an ASCII NUL, which does not make a lot of sense. I think it should be something more like "mov al, [bx]"; i.e. move the byte referenced by the BX address into the AL register -- although it has been a long time since I've worked with assembly so I might not have the syntax correct.
Because of that bug, the 10h interrupt would also be printing random characters based on the address of the string rather than the contents of the string. That would explain the gibberish you're seeing.

I think the issue is you cannot count on the int preserving any of your registers, so you need to protect them. Plus, what Steven pointed out regarding loading of your string address:
; Print the string whose address is in `bx`, segment `ds`
; String is zero terminated
;
print_string:
pusha
loop:
mov al, [bx] ; load what `bx` points to
cmp al, 0
je return
push bx ; save bx
mov ah, 0x0e ; load this every time through the loop
; you don't know if `int` preserves it
int 0x10
pop bx ; restore bx
inc bx
jmp loop
return:
popa
ret

How do I convert a string representing a signed hex int into its signed int Doubleword number in 80x86?

So I am taking an assembly language course and I am stuck on this problem from the book:
Using the windows 32 console (so I have an io.h to use), I am supposed to take a valid hex value inputted by the user and then display the actual hex value in the register EAX. So if the user entered "AB CD E2 18", then after the procedure EAX would hold the value: ABCDE218.
The parts that I am stuck on are the A-F values. If I use A for example, I can get the bits to read 00000010, but I don't know how to change that into its hex value A. Here is what I have so far:
.586
.MODEL FLAT
.CODE
hexToInt PROC
push ebp ; save base pointer
mov ebp, esp ; establish stack frame
sub esp, 4 ; local space for sign
push ebx ; Save registers
push edx
push esi
pushfd ; save flags
mov esi,[ebp+8] ; get parameter (source addr)
WhileBlankD:
cmp BYTE PTR [esi],' ' ; space?
jne EndWhileBlankD ; exit if not
inc esi ; increment character pointer
jmp WhileBlankD ; and try again
EndWhileBlankD:
mov eax,1 ; default sign multiplier
IfPlusD:cmp BYTE PTR [esi],'+' ; leading + ?
je SkipSignD ; if so, skip over
IfMinusD:
cmp BYTE PTR [esi],'-' ; leading - ?
jne EndIfSignD ; if not, save default +
mov eax,-1 ; -1 for minus sign
SkipSignD:
inc esi ; move past sign
EndIfSignD:
mov [ebp-4],eax ; save sign multiplier
mov eax,0 ; number being accumulated
WhileDigitD:
cmp BYTE PTR [esi],'0' ; compare next character to '0'
jb EndWhileDigitD ; not a digit if smaller than '0'
cmp BYTE PTR [esi],'9' ; compare to '9'
ja TestForHexD
mov bl,[esi] ; ASCII character to BL
and ebx,0000000Fh ; convert to single-digit integer
and eax, ebx
shl eax, 4
inc esi
jmp WhileDigitD
TestForHexD:
cmp BYTE PTR [esi], 'F'
ja EndWhileDigitD
mov bl, [esi]
sub bl, 31h
and ebx, 000000FFh
or al, bl
shl eax, 4
inc esi ; increment character pointer
jmp WhileDigitD ; go try next character
EndWhileDigitD:
; if value is < 80000000h, multiply by sign
cmp eax,80000000h ; 80000000h?
jnb endIfMaxD ; skip if not
imul DWORD PTR [ebp-4] ; make signed number
endIfMaxD:
popfd ; restore flags
pop esi ; restore registers
pop edx
pop ebx
mov esp, ebp ; delete local variable space
pop ebp
ret ; exit
hexToInt ENDP
END
The TestForHex label is where I am trying to convert the ASCII string to hex. I was looking around and read that I could accomplish my goal by shifting and masking, but I can't figure it out and I can't find any examples. At this point I am sure its something really small that I am just over looking, but I am stuck.

There are some bugs in your code.
First, in 0 ... 9 string to integer conversion code, you don't do ASCII to binary conversion as you should do, but instead you do and ebx,0Fh, which is incorrect. You need to subtract '0' (30h) from each ASCII character, like this:
mov bl,[esi]
sub bl,'0' ; sub bl,30h
Then, also in 0 ... 9 string to integer conversion code:
and eax, ebx
If the number consists of only 0...9 digits, and eax, ebx will produce always 0. It should be:
or al,bl
Then, you do shl eax,4, even if you don't know if there will be more digits. That means that the number will be 16 times bigger than it should.
Then, you give the example input with spaces, but your code does not handle spaces (20h) properly, it ends reading input for any value below '0' (30h), it seems to accept only leading spaces (skip this if you don't want to accept spaces in between).
So, the entire code block above should be:
WhileDigitD:
cmp byte ptr [esi], ' ' ; delete this if you don't want spaces in between.
je next_char ; delete this if you don't want spaces in between.
cmp BYTE PTR [esi],'0' ; compare next character to '0'
jb EndWhileDigitD ; not a digit if smaller than '0'
cmp BYTE PTR [esi],'9' ; compare to '9'
ja TestForHexD
mov bl,[esi] ; ASCII character to BL
sub bl,'0' ; sub bl,30h -> convert ASCII to binary.
shift_eax_by_4_and_add_bl:
shl eax,4 ; shift the current value 4 bits to left.
or al,bl ; add the value of the current digit.
next_char:
inc esi
jmp WhileDigitD
I also added labels next_char and shift_eax_by_4_and_add_bl. The reason for next_char should be evident, shift_eax_by_4_and_add_bl is to minimize duplicate code of 0...9 and A...F code blocks, see below.
You don't check that that the hexadecimal A...F digit is within range A ... F, only that it's below or equal to F. Otherwise it has same bug with shl eax,4. And as usually duplicate code should be avoided, I added shift_eax_by_4_and_add_bl label to minimize duplicate code.
So I think it should be:
Edit: corrected sub bl,31h -> sub bl,('A'-0Ah).
TestForHexD:
cmp BYTE PTR [esi], 'A'
jb EndWhileDigitD
cmp BYTE PTR [esi], 'F'
ja EndWhileDigitD
mov bl,[esi]
sub bl,('A'-0Ah) ; sub bl,55 -> convert ASCII to binary.
jmp shift_eax_by_4_and_add_bl

If you need to convert a character (for simplicity, say, in upper case) representing a hex digit into the value of that digit you need to do this:
IF char >= 'A'
value = char - 'A' + 10
ELSE
value = char - '0'
ENDIF
If you need to do the reverse, you do the reverse:
IF value >= 10
char = value - 10 + 'A'
ELSE
char = value + '0'
ENDIF
Here you exploit the fact that the ASCII characters 0 through 9 have consecutive ASCII codes and so do the ASCII characters A through F.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string