Converting a string of numbers into an integer in Assembly x86

I'm trying to convert a user-entered string of digits to an integer.
For example, if the user enters "1234" as a string, I want 1234 stored in a DWORD variable.
I'm using lodsb and stosb to get the individual bytes. My problem is that I can't get the algorithm right. My code is below:
mov ecx, (SIZEOF num)-1
mov esi, OFFSET num
mov edi, OFFSET ints
cld
counter:
lodsb
sub al,48
stosb
loop counter
I know that the ECX counter is also going to be a bit off, because it covers the entire buffer rather than just the 4 bytes entered: it's actually 9, because the string is 10 bytes.
I was trying to multiply the individual bytes by powers of 10, but I'm pretty new to Assembly and can't get the syntax right. If anybody can help with the algorithm, that would be great. Thanks!

A simple implementation might be
mov ecx, digitCount
mov esi, numStrAddress
cld ; We want to move upward in mem
xor edx, edx ; edx = 0 (We want to have our result here)
xor eax, eax ; eax = 0 (We need that later)
counter:
imul edx, 10 ; Multiply prev digits by 10
lodsb ; Load next char to al
sub al,48 ; Convert to number
add edx, eax ; Add new number
; Here we used that the upper bytes of eax are zeroed
loop counter ; Move to next digit
; edx now contains the result
mov [resultIntAddress], edx
Of course there are ways to improve it, like avoiding the use of imul.
EDIT: Fixed the ecx value
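For readers who find the accumulate-and-multiply loop easier to follow in a high-level language, here is a minimal C sketch of the same idea. The function name digits_to_u32 is hypothetical, and the sketch assumes every input byte is an ASCII digit and that the value fits in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the assembly loop above.
   Assumes s[0..count-1] are all ASCII digits ('0'..'9'). */
uint32_t digits_to_u32(const char *s, size_t count)
{
    uint32_t result = 0;                      /* plays the role of EDX */
    for (size_t i = 0; i < count; i++) {
        result *= 10;                         /* imul edx, 10 */
        result += (uint32_t)(s[i] - '0');     /* sub al,48 / add edx,eax */
    }
    return result;
}
```

For example, digits_to_u32("1234", 4) yields 1234. One difference from the assembly: with a count of 0 the C loop simply does nothing, whereas the loop instruction decrements ECX before testing it, so entering the assembly loop with ECX = 0 would not stop immediately.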

Related

NASM x64 read numbers to array and write out

I have got a big problem with reading 10 numbers from keyboard to array and then writing them out.
mov rcx, arr + qword [n]*8 - I don't know how to modify it properly, because actually it causes an error.
Additionally how should I set mov rdx, 1 when I want to read numbers like: 12 123 1234 not only digits?
I would be grateful for any kind of help.
global main
section .text
main:
mov rbp, rsp; for correct debugging
mov rdi, 0
_in:
mov rax, 3
mov rbx, 0
mov rcx, arr + rdi*8
mov rdx, 1
int 80h
mov rax, 3
mov rbx, 0
mov rcx, blank
mov rdx, 1
int 80h
inc qword [n]
cmp qword [n], 10
jz _next
jmp _in
inc rdi
_next:
mov qword [n], 0
mov rdi, 0
_out:
mov rax, 4
mov rbx, 1
mov rcx, arr + rdi*8
mov rdx, 1
int 80h
mov rax, 4
mov rbx, 1
mov rcx, nl
mov rdx, 1
int 80h
inc qword [n]
cmp qword [n], 10
jz _end
jmp _out
inc rdi
_end:
mov rax, 1
mov rbx, 0
ret
section .data
arr times 10 dq 0
blank db 0
n dq 0
nl db 10

About input and data structures.
Keep reading per char (rdx=1), in a loop. Input chars will be for example '1', '2', '3', '4', '5', 10, the character 10 is new line (maybe check also for other whitespace chars like 13, 9, 32 too, or even flip the test, any char out of '0'..'9' range is end of number).
While reading digits, decide if you want to store them as strings, or as numbers.
If strings, then write every new digit into memory at address arr+n*8+input_char_index, and probably put a zero value as terminator after the number (your current array can hold at most 7-character-long strings + zero for each "n"), or store the string length in a separate array, or as the first byte of the element with the first char at +1 offset after the length byte, etc. (you can design your data structure as you wish). To display such a string, just load its address (conceptually lea rcx,[arr+n*8]) and calculate its length with strlen (it reads and counts char by char until 0 is found), or load the length if you have it stored somewhere, and sys_write it.
If you want to store numbers, zero some spare register ahead of input (for example rdi), then for every digit read do add rdi,rdi followed by lea rdi,[rdi+rdi*4] (together that's rdi *= 10), then convert the input character from an ASCII digit to a 64-bit 0-9 value and add it to rdi. Loop until a non-digit or newline is read (note that a 64-bit unsigned number can overflow once the input reaches 20 digits, and a signed one at 19). After the end of input, store the value into arr; arr will then contain numerical QWORD values.
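As a rough C model of that input loop, the sketch below stops at the first non-digit, multiplies by 10 using the same "double, then times five" split as the add/lea pair, and reports overflow. The name parse_u64 and its return convention are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: accumulates leading ASCII digits of s into *out.
   *consumed receives the number of digits read.
   Returns false if no digit was found or the value would overflow
   64 bits (in the overflow case *out and *consumed are left untouched). */
bool parse_u64(const char *s, uint64_t *out, size_t *consumed)
{
    uint64_t value = 0;
    size_t i = 0;
    while (s[i] >= '0' && s[i] <= '9') {        /* any other char ends the number */
        uint64_t digit = (uint64_t)(s[i] - '0');
        if (value > (UINT64_MAX - digit) / 10)  /* value*10 + digit would not fit */
            return false;
        uint64_t doubled = value + value;       /* add rdi, rdi         */
        value = doubled + doubled * 4;          /* lea rdi, [rdi+rdi*4] */
        value += digit;
        i++;
    }
    *consumed = i;
    *out = value;
    return i > 0;
}
```

Calling it on "12345\n" stores 12345 and reports 5 digits consumed; the newline is what ends the loop.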
To output them, you have to do the conversion in the opposite direction, from a numerical value into some memory buffer, producing ASCII characters digit by digit (have a big enough buffer; again, 20+ chars is safe for a 64-bit value). Once the number is stored in memory as an ASCII string and you know its length, you can sys_write it to stdout.
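The reverse conversion can be sketched in C as repeated division by 10. The helper name u64_to_decimal is hypothetical; it produces the digits least-significant first into a scratch array and then reverses them, which is one of several possible approaches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the decimal digits of value into buf
   (no terminator) and returns how many characters were written.
   buf must have room for at least 20 bytes. */
size_t u64_to_decimal(uint64_t value, char *buf)
{
    char tmp[20];                    /* 2^64-1 has 20 decimal digits */
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + value % 10);   /* lowest digit first */
        value /= 10;
    } while (value != 0);
    for (size_t i = 0; i < n; i++)   /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    return n;
}
```

The returned length is what would go into the length argument of the write system call.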
You may also consider to follow some more tutorials first and re-read some theory about common data structures/etc, memory, string encodings, registers, x86 addressing modes, .... before writing your own code (as it feels to me that you are guessing a bit too much, how things work).

Can I multiply a register's value by an immediate number to add the result to another register?

Learning Assembly with NASM, Ubuntu, 32 bits.
My array in .data:
ary db 1,2,2,4,5 ; Five elements of one byte each
And some number:
tmp db 2 ; Holds the number 2
Let's say I want to print the element at index 4 in the array (so it would be 5).
I know I could do this:
mov EAX,4
mov EBX,0
mov ECX,ary ; Put the array's address in ECX
add ECX,4 ; Move address four bytes to the right
add byte [ECX],'0' ; The value at this address to ASCII
mov EDX,1
int 0x80
However, for whatever reasons, I decided that instead of writing the constant number 4, I want to do it by multiplying my variable (which is 2) by 2.
This is the updated code:
mov EAX,[tmp] ; Put the number 2 in EAX
mov ECX,ary ; Put the array's address in ECX
add ECX,EAX * 2 ; Move (2 * 2) = 4 bytes to the right
add byte [ECX],'0' ; Decimal to ASCII
mov EAX,4
mov EBX,0
mov EDX,1
int 0x80
This doesn't work at add ECX,EAX * 2:
invalid operand type
But why? Doesn't EAX evaluate to 2? That would make it equivalent to
add ECX,2 * 2
Curiously, these do work:
add ECX,EAX * 1 ; Moves by 2
add ECX,EAX * 0 ; Moves by 0
The above suggests to me that the answer is no, and that multiplying by 1 or 0 only works because the assembler doesn't actually need to do any multiplication to know the answer in the first place.
Does this mean that to achieve what I want, I do have to use the mul instruction?

You CAN do multiplication and adding in one instruction if you use lea:
lea ECX,[ECX+EAX*2]

In x86, although lea supports multiplication by a constant, the add instruction doesn't support an operand that multiplies a register by a constant. It supports additive offsets, but not multiplication. I assume, as you noted, that the assembler is being somewhat forgiving in accepting add ECX,EAX*0 and add ECX,EAX*1 as equivalent to add ECX,0 and add ECX,EAX, respectively.
You would instead need to do something like this:
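mov ECX,ary ; Put the array's address in ECX
movzx EAX,byte [tmp] ; Put the number 2 in EAX (tmp is a single byte, so zero-extend it)
shl EAX,1 ; Multiply by 2 (mul cannot take an immediate operand)
add ECX,EAX ; Move (2 * 2) = 4 bytes to the right
add byte [ECX],'0' ; Decimal to ASCII
mov EAX,4 ; sys_write
mov EBX,1 ; File descriptor 1 = stdout
mov EDX,1 ; Write one byte
int 0x80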

The LEA instruction can perform two additions and one limited multiplication at once. The common syntax is:
lea reg, [offset+reg+const*reg]
Here, reg is a general-purpose register, offset is a constant, and const is one of 1, 2, 4 or 8.
This makes the instruction useful for computing fairly complex expressions in a single step:
The equation from the question:
add ECX,EAX * 2
can be computed this way:
lea ecx, [ecx+2*eax]
There are many other uses:
lea eax, [ebx+2*ebx] ; eax = 3*ebx
lea eax, [eax+4*eax] ; eax = 5*eax
lea eax, [ecx+8*ecx] ; eax = 9*ecx
lea eax, [1234+ebx+8*ecx]
Note, that FASM allows shorter syntax for the above examples:
lea eax, [3*ebx]
lea eax, [5*eax]
lea eax, [9*ecx]
An additional advantage of lea is that it does not affect the flags. It also executes quickly on most x86 CPUs.
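To make the arithmetic concrete, here is a small C model of what a 32-bit lea computes. The function lea32 is purely illustrative (it is not a real intrinsic), and it assumes the caller passes a scale of 1, 2, 4 or 8.

```c
#include <stdint.h>

/* Hypothetical model of lea dst, [disp + base + scale*index] in 32-bit mode.
   The sum wraps modulo 2^32 and, like lea, touches no flags.
   scale is assumed to be 1, 2, 4 or 8. */
uint32_t lea32(uint32_t base, uint32_t index, uint32_t scale, uint32_t disp)
{
    return base + index * scale + disp;
}
```

With ecx = 100 and eax = 2, lea32(100, 2, 2, 0) gives 104, which is what lea ecx, [ecx+2*eax] leaves in ecx. Using the same register as base and index gives the multiply-by-3, 5 and 9 forms: lea32(x, x, 2, 0) is 3*x, and so on.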

Copy string from BSS variable to BSS variable in Assembly

Let's suppose I have two string variables created in the .BSS section.
var1 resw 5 ; this is "abcde" (UNICODE)
var2 resw 5 ; here I will copy the first one
How would I do this with NASM?
I tried something like this:
mov ebx, var2 ; Here we will copy the string
mov dx, 5 ; Length of the string
mov esi, dword var1 ; The variable to be copied
.Copy:
lodsw
mov [ebx], word ax ; Copy the character into the address from EBX
inc ebx ; Increment the EBX register for the next character to copy
dec dx ; Decrement DX
cmp dx, 0 ; If DX is 0 we reached the end
jg .Copy ; Otherwise copy the next one
So, first problem is that the string is not copied as UNICODE but as ASCII and I don't know why. Secondly, I know there might be some not recommended use of some registers. And lastly, I wonder if there is some quicker way of doing this (maybe there are instructions specially created for this kind of operations with strings). I'm talking about 8086 processors.

inc ebx ; Increment the EBX register for the next character to copy
A word is 2 bytes, but you're only stepping ebx 1 byte ahead. Replace inc ebx with add ebx,2.

Michael already answered about the obvious problem in the code shown.
But there is another layer to understand. It does not matter whether you copy the string from one buffer to another by bytes, words or double words. The result is always an exact copy of the string.
So how to copy the string is a matter of optimization. Using rep movsd is a common and fast way, though the fastest method depends on the CPU.
Here is one example:
; ecx contains the length of the string in bytes
; esi - the address of the source, aligned on dword
; edi - the address of the destination aligned on dword
push ecx
shr ecx, 2
rep movsd
pop ecx
and ecx, 3
rep movsb
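The same split (whole double words first, then the leftover bytes) can be modelled in C as below. The function name copy_dwords_then_bytes is hypothetical, and the sketch assumes the two buffers do not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the shr/rep movsd/and/rep movsb sequence.
   Assumes src and dst do not overlap. */
void copy_dwords_then_bytes(void *dst, const void *src, size_t len_bytes)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = len_bytes >> 2;     /* shr ecx, 2 */
    size_t tail = len_bytes & 3;        /* and ecx, 3 */

    for (size_t i = 0; i < dwords; i++) {   /* rep movsd */
        uint32_t chunk;
        memcpy(&chunk, s, sizeof chunk);    /* 4-byte load  */
        memcpy(d, &chunk, sizeof chunk);    /* 4-byte store */
        s += 4;
        d += 4;
    }
    for (size_t i = 0; i < tail; i++)       /* rep movsb */
        *d++ = *s++;
}
```

For the five-character UTF-16 string from the question, len_bytes is 10, so the loop copies two double words and then two single bytes. The result is byte-for-byte identical to the source, which is why the element size used for copying does not affect the encoding.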

How do I convert a string representing a signed hex int into its signed int Doubleword number in 80x86?

So I am taking an assembly language course and I am stuck on this problem from the book:
Using the windows 32 console (so I have an io.h to use), I am supposed to take a valid hex value inputted by the user and then display the actual hex value in the register EAX. So if the user entered "AB CD E2 18", then after the procedure EAX would hold the value: ABCDE218.
The parts that I am stuck on are the A-F values. If I use A for example, I can get the bits to read 00000010, but I don't know how to change that into its hex value A. Here is what I have so far:
.586
.MODEL FLAT
.CODE
hexToInt PROC
push ebp ; save base pointer
mov ebp, esp ; establish stack frame
sub esp, 4 ; local space for sign
push ebx ; Save registers
push edx
push esi
pushfd ; save flags
mov esi,[ebp+8] ; get parameter (source addr)
WhileBlankD:
cmp BYTE PTR [esi],' ' ; space?
jne EndWhileBlankD ; exit if not
inc esi ; increment character pointer
jmp WhileBlankD ; and try again
EndWhileBlankD:
mov eax,1 ; default sign multiplier
IfPlusD:cmp BYTE PTR [esi],'+' ; leading + ?
je SkipSignD ; if so, skip over
IfMinusD:
cmp BYTE PTR [esi],'-' ; leading - ?
jne EndIfSignD ; if not, save default +
mov eax,-1 ; -1 for minus sign
SkipSignD:
inc esi ; move past sign
EndIfSignD:
mov [ebp-4],eax ; save sign multiplier
mov eax,0 ; number being accumulated
WhileDigitD:
cmp BYTE PTR [esi],'0' ; compare next character to '0'
jb EndWhileDigitD ; not a digit if smaller than '0'
cmp BYTE PTR [esi],'9' ; compare to '9'
ja TestForHexD
mov bl,[esi] ; ASCII character to BL
and ebx,0000000Fh ; convert to single-digit integer
and eax, ebx
shl eax, 4
inc esi
jmp WhileDigitD
TestForHexD:
cmp BYTE PTR [esi], 'F'
ja EndWhileDigitD
mov bl, [esi]
sub bl, 31h
and ebx, 000000FFh
or al, bl
shl eax, 4
inc esi ; increment character pointer
jmp WhileDigitD ; go try next character
EndWhileDigitD:
; if value is < 80000000h, multiply by sign
cmp eax,80000000h ; 80000000h?
jnb endIfMaxD ; skip if not
imul DWORD PTR [ebp-4] ; make signed number
endIfMaxD:
popfd ; restore flags
pop esi ; restore registers
pop edx
pop ebx
mov esp, ebp ; delete local variable space
pop ebp
ret ; exit
hexToInt ENDP
END
The TestForHexD label is where I am trying to convert the ASCII string to hex. I was looking around and read that I could accomplish my goal by shifting and masking, but I can't figure it out and I can't find any examples. At this point I am sure it's something really small that I am just overlooking, but I am stuck.

There are some bugs in your code.
First, in the 0 ... 9 conversion code, you mask with and ebx,0Fh. That happens to give the right value for '0' ... '9' (30h ... 39h), but it hides the intent and does not carry over to the letters. It is clearer to subtract '0' (30h) from each ASCII character, like this:
mov bl,[esi]
sub bl,'0' ; sub bl,30h
Then, also in 0 ... 9 string to integer conversion code:
and eax, ebx
If the number consists of only 0...9 digits, and eax, ebx will always produce 0, because eax starts at 0. It should be:
or al,bl
Then, you do shl eax,4 after adding each digit, even when no more digits follow. That means the result will be 16 times bigger than it should be.
Then, you give example input with spaces, but your code does not handle spaces (20h) in the middle of the number: it stops reading at any character below '0' (30h) and only skips leading spaces (ignore this point if you don't want to accept spaces in between).
So, the entire code block above should be:
WhileDigitD:
cmp byte ptr [esi], ' ' ; delete this if you don't want spaces in between.
je next_char ; delete this if you don't want spaces in between.
cmp BYTE PTR [esi],'0' ; compare next character to '0'
jb EndWhileDigitD ; not a digit if smaller than '0'
cmp BYTE PTR [esi],'9' ; compare to '9'
ja TestForHexD
mov bl,[esi] ; ASCII character to BL
sub bl,'0' ; sub bl,30h -> convert ASCII to binary.
shift_eax_by_4_and_add_bl:
shl eax,4 ; shift the current value 4 bits to left.
or al,bl ; add the value of the current digit.
next_char:
inc esi
jmp WhileDigitD
I also added the labels next_char and shift_eax_by_4_and_add_bl. The reason for next_char should be evident; shift_eax_by_4_and_add_bl lets the 0...9 and A...F code blocks share the shift-and-add step instead of duplicating it, see below.
In the A...F branch, you don't check that the character is within the range A ... F, only that it's below or equal to F. It also has the same shl eax,4 bug.
So I think it should be:
Edit: corrected sub bl,31h -> sub bl,('A'-0Ah).
TestForHexD:
cmp BYTE PTR [esi], 'A'
jb EndWhileDigitD
cmp BYTE PTR [esi], 'F'
ja EndWhileDigitD
mov bl,[esi]
sub bl,('A'-0Ah) ; sub bl,55 -> convert ASCII to binary.
jmp shift_eax_by_4_and_add_bl
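As a cross-check of the corrected control flow, here is a minimal C model of the digit loop. The name hex_to_u32 is hypothetical; the sketch only covers the digit accumulation (sign handling from the original procedure is left out) and, like the assembly, accepts upper-case A-F only.

```c
#include <stdint.h>

/* Hypothetical C model of the corrected WhileDigitD/TestForHexD loop.
   Skips spaces, accepts 0-9 and upper-case A-F, and stops at the first
   other character. Sign handling is intentionally omitted. */
uint32_t hex_to_u32(const char *s)
{
    uint32_t value = 0;                         /* plays the role of EAX */
    for (; *s != '\0'; s++) {
        unsigned digit;
        if (*s == ' ')
            continue;                           /* je next_char */
        else if (*s >= '0' && *s <= '9')
            digit = (unsigned)(*s - '0');       /* sub bl,'0' */
        else if (*s >= 'A' && *s <= 'F')
            digit = (unsigned)(*s - ('A' - 10));/* sub bl,('A'-0Ah) */
        else
            break;                              /* EndWhileDigitD */
        value = (value << 4) | digit;           /* shl eax,4 / or al,bl */
    }
    return value;
}
```

Note that the shift happens before the new digit is merged in, so the last digit is never shifted away; with the input "AB CD E2 18" the function returns ABCDE218h.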

If you need to convert a character (for simplicity, say, in upper case) representing a hex digit into the value of that digit you need to do this:
IF char >= 'A'
value = char - 'A' + 10
ELSE
value = char - '0'
ENDIF
If you need to do the reverse, you do the reverse:
IF value >= 10
char = value - 10 + 'A'
ELSE
char = value + '0'
ENDIF
Here you exploit the fact that the ASCII characters 0 through 9 have consecutive ASCII codes and so do the ASCII characters A through F.
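Translated into C, the two directions might look like the sketch below. The function names are hypothetical, and the range checks (returning -1 for a non-hex character) are an addition that the pseudocode above does not include.

```c
/* Hypothetical helper: value of an upper-case hex digit, or -1 if c is
   not one of '0'..'9' / 'A'..'F' (the range check is an addition). */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: the reverse mapping. Assumes 0 <= value <= 15. */
char hex_digit_char(int value)
{
    return (char)(value >= 10 ? value - 10 + 'A' : value + '0');
}
```

Converting a value with hex_digit_char and feeding the result back into hex_digit_value returns the original value for every input from 0 to 15.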

How would I find the length of a string using NASM?

I'm trying to make a program using NASM that takes input from command line arguments. Since string length is not provided, I'm trying to make a function to compute my own. Here is my attempt, which takes a pointer to a string in the ebx register, and returns the length of the string in ecx:
len:
push ebx
mov ecx,0
dec ebx
count:
inc ecx
inc ebx
cmp ebx,0
jnz count
dec ecx
pop ebx
ret
My method is to go through the string, character by character, and check if it's null. If it's not, I increment ecx and go to the next character. I believe the problem is that cmp ebx,0 is incorrect for what I'm trying to do. How would I properly go about checking whether the character is null? Also, are there other things that I could be doing better?

You are comparing the value in ebx with 0 which is not what you want. The value in ebx is the address of a character in memory so it should be dereferenced like this:
cmp byte[ebx], 0
Also, the last push ebx should be pop ebx.

Here is how I do it in a 64-bit Linux executable that checks argv[1]. The kernel starts a new process with argc and argv[] on the stack, as documented in the x86-64 System V ABI.
_start:
pop rsi ; number of arguments (argc)
pop rsi ; argv[0] the command itself (or program name)
pop rsi ; rsi = argv[1], a pointer to a string
mov ecx, 0 ; counter
.repeat:
lodsb ; byte in AL
test al,al ; check if zero
jz .done ; if zero then we're done
inc ecx ; increment counter
jmp .repeat ; repeat until zero
.done:
; string is unchanged, ecx contains the length of the string
; unused, we look at command line args instead
section .rodata
asciiz: db "This is a string with 36 characters.", 0
This is slow and inefficient, but easy to understand.
For efficiency, you'd want
only 1 branch in the loop (Why are loops always compiled into "do...while" style (tail jump)?)
avoid a false dependency by loading with movzx instead of merging into the previous RAX value (Why doesn't GCC use partial registers?).
subtract pointers after the loop instead of incrementing a counter inside.
And of course SSE2 is always available in x86-64, so we should use that to check in chunks of 16 bytes (after reaching an alignment boundary). See optimized hand-written strlen implementations like in glibc. (https://code.woboq.org/userspace/glibc/sysdeps/x86_64/strlen.S.html).

Here is how I would have coded it:
len:
push ebx
mov eax, ebx
lp:
cmp byte [eax], 0
jz lpend
inc eax
jmp lp
lpend:
sub eax, ebx
pop ebx
ret
(The result is in eax). Likely there are better ways.
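For comparison, the same "advance a pointer, then subtract" approach looks like this in C. The name str_length is hypothetical, and in real C code the standard strlen would normally be used instead.

```c
#include <stddef.h>

/* Hypothetical equivalent of the routine above: walk to the terminating
   zero byte, then subtract the start address from the end address. */
size_t str_length(const char *s)
{
    const char *p = s;          /* mov eax, ebx        */
    while (*p != '\0')          /* cmp byte [eax], 0   */
        p++;                    /* inc eax             */
    return (size_t)(p - s);     /* sub eax, ebx        */
}
```

Because the length comes from the pointer difference, no separate counter has to be maintained inside the loop.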
