How do I convert a string representing a signed hex int into its signed int Doubleword number in 80x86?

So I am taking an assembly language course and I am stuck on this problem from the book:
Using the Windows 32 console (so I have an io.h to use), I am supposed to take a valid hex value entered by the user and end up with the actual hex value in register EAX. So if the user entered "AB CD E2 18", then after the procedure EAX would hold the value ABCDE218.
The parts that I am stuck on are the A-F values. If I use A for example, I can get the bits to read 00000010, but I don't know how to change that into its hex value A. Here is what I have so far:
.586
.MODEL FLAT
.CODE
hexToInt PROC
push ebp ; save base pointer
mov ebp, esp ; establish stack frame
sub esp, 4 ; local space for sign
push ebx ; Save registers
push edx
push esi
pushfd ; save flags
mov esi,[ebp+8] ; get parameter (source addr)
WhileBlankD:
cmp BYTE PTR [esi],' ' ; space?
jne EndWhileBlankD ; exit if not
inc esi ; increment character pointer
jmp WhileBlankD ; and try again
EndWhileBlankD:
mov eax,1 ; default sign multiplier
IfPlusD:cmp BYTE PTR [esi],'+' ; leading + ?
je SkipSignD ; if so, skip over
IfMinusD:
cmp BYTE PTR [esi],'-' ; leading - ?
jne EndIfSignD ; if not, save default +
mov eax,-1 ; -1 for minus sign
SkipSignD:
inc esi ; move past sign
EndIfSignD:
mov [ebp-4],eax ; save sign multiplier
mov eax,0 ; number being accumulated
WhileDigitD:
cmp BYTE PTR [esi],'0' ; compare next character to '0'
jb EndWhileDigitD ; not a digit if smaller than '0'
cmp BYTE PTR [esi],'9' ; compare to '9'
ja TestForHexD
mov bl,[esi] ; ASCII character to BL
and ebx,0000000Fh ; convert to single-digit integer
and eax, ebx
shl eax, 4
inc esi
jmp WhileDigitD
TestForHexD:
cmp BYTE PTR [esi], 'F'
ja EndWhileDigitD
mov bl, [esi]
sub bl, 31h
and ebx, 000000FFh
or al, bl
shl eax, 4
inc esi ; increment character pointer
jmp WhileDigitD ; go try next character
EndWhileDigitD:
; if value is < 80000000h, multiply by sign
cmp eax,80000000h ; 80000000h?
jnb endIfMaxD ; skip if not
imul DWORD PTR [ebp-4] ; make signed number
endIfMaxD:
popfd ; restore flags
pop esi ; restore registers
pop edx
pop ebx
mov esp, ebp ; delete local variable space
pop ebp
ret ; exit
hexToInt ENDP
END
The TestForHexD label is where I am trying to convert the ASCII character to its hex value. I read that I could accomplish my goal by shifting and masking, but I can't figure it out and I can't find any examples. At this point I am sure it's something really small that I am just overlooking, but I am stuck.

There are some bugs in your code.
First, in the 0 ... 9 string-to-integer conversion code you convert with and ebx,0000000Fh. For '0'...'9' (30h...39h) that mask happens to leave the right value, since the low nibble is the digit, but the conventional ASCII-to-binary conversion is to subtract '0' (30h) from each ASCII character, which is also the pattern the A...F code needs:
mov bl,[esi]
sub bl,'0' ; sub bl,30h
Then, also in the 0 ... 9 conversion code:
and eax, ebx
EAX starts at 0 and AND can never set a bit, so if the number consists only of 0...9 digits, and eax, ebx will always produce 0. It should be:
or al,bl
Then, you do shl eax,4 after adding each digit, even though you don't know yet whether more digits will follow. That makes the number 16 times bigger than it should be. Shift first, then add the new digit.
Then, your example input contains spaces, but your code does not handle embedded spaces (20h). It stops reading input at any value below '0' (30h), so it only accepts leading spaces. The code below skips embedded spaces. Delete the two marked lines if you don't want to accept spaces in between.
So, the entire code block above should be:
WhileDigitD:
cmp byte ptr [esi], ' ' ; delete this if you don't want spaces in between.
je next_char ; delete this if you don't want spaces in between.
cmp BYTE PTR [esi],'0' ; compare next character to '0'
jb EndWhileDigitD ; not a digit if smaller than '0'
cmp BYTE PTR [esi],'9' ; compare to '9'
ja TestForHexD
mov bl,[esi] ; ASCII character to BL
sub bl,'0' ; sub bl,30h -> convert ASCII to binary.
shift_eax_by_4_and_add_bl:
shl eax,4 ; shift the current value 4 bits to left.
or al,bl ; add the value of the current digit.
next_char:
inc esi
jmp WhileDigitD
I also added the labels next_char and shift_eax_by_4_and_add_bl. The purpose of next_char should be evident. shift_eax_by_4_and_add_bl lets the 0...9 and A...F code blocks share code (see below).
In the A...F code you don't check that the hexadecimal digit is within the range A ... F, only that it's below or equal to F. Apart from that, it has the same shl eax,4 bug. Since duplicate code should usually be avoided, it jumps to shift_eax_by_4_and_add_bl instead of repeating the shift and OR.
So I think it should be:
Edit: corrected sub bl,31h -> sub bl,('A'-0Ah).
TestForHexD:
cmp BYTE PTR [esi], 'A'
jb EndWhileDigitD
cmp BYTE PTR [esi], 'F'
ja EndWhileDigitD
mov bl,[esi]
sub bl,('A'-0Ah) ; sub bl,55 -> convert ASCII to binary.
jmp shift_eax_by_4_and_add_bl
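If you want to check the logic outside the assembler, here is a rough C model of the corrected loop. It shifts first and then ORs in the digit, skips embedded blanks, and accepts only '0'...'9' and 'A'...'F'. This is a sketch under assumptions, not a translation of the whole procedure. The name hex_to_int is hypothetical, only upper-case digits are accepted, and it returns the 32-bit pattern that would end up in EAX. Like the original code, it applies the sign only when the value is below 80000000h.

```c
#include <stdint.h>

/* Rough C model of the corrected hexToInt (hypothetical name).
   Returns the 32-bit pattern that would end up in EAX. */
uint32_t hex_to_int(const char *s)
{
    int negative = 0;
    uint32_t value = 0;                     /* mov eax,0 */

    while (*s == ' ')                       /* WhileBlankD: skip leading blanks */
        s++;
    if (*s == '+') {                        /* IfPlusD */
        s++;
    } else if (*s == '-') {                 /* IfMinusD */
        negative = 1;
        s++;
    }

    for (;; s++) {
        char c = *s;
        uint32_t digit;

        if (c == ' ')                       /* embedded blank: go to next_char */
            continue;
        if (c >= '0' && c <= '9')
            digit = (uint32_t)(c - '0');    /* sub bl,'0' */
        else if (c >= 'A' && c <= 'F')
            digit = (uint32_t)(c - ('A' - 10)); /* sub bl,('A'-0Ah) */
        else
            break;                          /* EndWhileDigitD */

        value = (value << 4) | digit;       /* shl eax,4 then or al,bl */
    }

    /* As in the original: apply the sign only when value < 80000000h.
       Unsigned 0 - value is the two's-complement result imul by -1 gives. */
    if (negative && value < 0x80000000u)
        return (uint32_t)0 - value;
    return value;
}
```

For example, hex_to_int("AB CD E2 18") yields 0xABCDE218, the value from the question.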

If you need to convert a character (for simplicity, say, in upper case) representing a hex digit into the value of that digit you need to do this:
IF char >= 'A'
value = char - 'A' + 10
ELSE
value = char - '0'
ENDIF
If you need to do the reverse, you do the reverse:
IF value >= 10
char = value - 10 + 'A'
ELSE
char = value + '0'
ENDIF
Here you exploit the fact that the ASCII characters 0 through 9 have consecutive ASCII codes and so do the ASCII characters A through F.
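The two IF/ELSE rules translate almost one-to-one into C. This is a minimal sketch that assumes upper-case digits and an ASCII character set. The function names are hypothetical. C itself only guarantees that '0'...'9' are consecutive, so the letter rule relies on ASCII.

```c
/* Hex digit character -> value (assumes '0'-'9' or upper-case 'A'-'F'). */
int hex_char_to_value(char c)
{
    if (c >= 'A')
        return c - 'A' + 10;
    return c - '0';
}

/* Value 0-15 -> hex digit character (upper case). */
char hex_value_to_char(int value)
{
    if (value >= 10)
        return (char)(value - 10 + 'A');
    return (char)(value + '0');
}
```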

How to find the hamming distance for strings that are not necessarily equal length?

I have an assignment asking me to find the hamming distance of two user-input strings that are not necessarily equal in length.
So, I made the following algorithm:
Read both strings
check the length of each string
compare the length of the strings
if(str1 is shorter)
set counter to be the length of str1
END IF
if(str1 is longer)
set counter to be the length of str2
END IF
if(str1 == str2)
set counter to be length of str1
END IF
loop through each digit of the strings
if(str1[digitNo] XOR str2[digitNo] == 1)
inc al
END IF
the final al value is the hamming distance of the strings, print it.
But I'm stuck at step 3 (comparing the lengths) and can't get it working. Any help?
I tried storing the values in different registers, but none of that worked.
; THIS IS THE CODE I GOT
.model small
.data
str1 db 255
db ?
db 255 dup(?)
msg1 db 13,10,"Enter first string:$"
str2 db 255
db ?
db 255 dup(?)
msg2 db 13,10,"Enter second string:$"
one db "1"
count db ?
.code
.startup
mov ax,@data
mov ds,ax
; printing first message
mov ah, 9
mov dx, offset msg1
int 21h
; reading first string
mov ah, 10
mov dx, offset str1
int 21h
; printing second message
mov ah, 9
mov dx, offset msg2
int 21h
; reading second string
mov ah, 10
mov dx, offset str2
int 21h
; setting the values of the registers to zero
mov si, 0
mov di, 0
mov cx, 0
mov bx, 0
; checking the length of the first string
mov bl, str1+1
add bl, 30h
mov ah, 02h
mov dl, bl
int 21h
; checking the length of the second string
mov bl, str2+1
add bl, 30h
mov ah, 02h
mov dh, bl
int 21h
; comparing the length of the strings
cmp dl,dh
je equal
jg str1Greater
jl str1NotGreater
; if the strings are equal we jump here
equal:
mov cl, dl
call theLoop
; if the first string is greater than the second, we jump here and set counter of str1
str1Greater:
; if the second string is greater than the first, we jump here and set counter to length of str2
Str1NotGreater:
; this is the loop that finds and prints the hamming distance
;we find it by looping over the strings and taking the xor for each 2, then incrementing counter of ones for each xor == 1
theLoop:
end
So, in the code I provided, it's supposed to print the length of each string (it prints the lengths next to each other), but it seems to always keep printing the length of the first string, twice. The register used to store the length of the first string is dl, and the register used to store the length of the second is dh, if I change it back to dl, it would then print the correct length, but I want to compare the lengths, and I think it won't be possible to do so if I save it in dl both times.

but it seems to always keep printing the length of the first string, twice.
When outputting a character with the DOS function 02h you don't get to choose which register to use to supply the character! It's always DL.
Since after printing both lengths you still want to work with these lengths it will be better to not destroy them in the first place. Put the 1st length in BL and the second length in BH. For outputting you copy these in turn to DL where you do the conversion to a character. This of course can only work for strings of at most 9 characters.
; checking the length of the first string
mov BL, str1+1
mov dl, BL
add dl, 30h
mov ah, 02h
int 21h
; checking the length of the second string
mov BH, str2+1
mov dl, BH
add dl, 30h
mov ah, 02h
int 21h
; comparing the length of the strings
cmp BL, BH
ja str1LONGER
jb str1SHORTER
; if the strings are equal we ** FALL THROUGH ** here
equal:
mov cl, BL
mov ch, 0
call theLoop
; !!!! You need some way out at this point. Don't fall through here !!!!
; if the first string is greater than the second, we set counter of str1
str1LONGER:
; if the second string is greater than the first, we set counter to length of str2
Str1SHORTER:
; this is the loop that finds and prints the hamming distance
;we find it by looping over the strings and taking the xor for each 2, then incrementing counter of ones for each xor == 1
theLoop:
Additional notes
Lengths are unsigned numbers. Use the unsigned conditions above and below.
Talking about longer and shorter makes more sense for strings.
Don't use 3 jumps if a mere fall through in the code can do the job.
Your code in theLoop will probably use CX as a counter. Don't forget to zero CH. Either using 2 instructions like I did above or else use movzx cx, BL if you're allowed to use instructions that surpass the original 8086.
Bonus
mov si, offset str1+2
mov di, offset str2+2
mov al, 0
MORE:
mov dl, [si]
cmp dl, [di]
je EQ
inc al
EQ:
inc si
inc di
loop MORE
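Here is a C model of what the Bonus loop computes: the number of positions, within the shorter length, where the two strings differ. The function name and the count_length_difference flag are hypothetical. Adding the length difference is one common convention for unequal lengths, but the loop above does not do it. Note one difference in the empty case. The C for loop does nothing when the shorter length is 0. The assembly loop instruction with CX = 0 would instead run 65536 times, so a jcxz before MORE: may be worth adding.

```c
#include <stddef.h>

/* Counts differing positions over the shorter length (as the Bonus loop
   does); optionally adds the length difference (an assumed convention). */
size_t hamming_distance(const char *s1, size_t len1,
                        const char *s2, size_t len2,
                        int count_length_difference)
{
    size_t n = len1 < len2 ? len1 : len2;   /* unsigned compare, like jb/ja */
    size_t distance = 0;                    /* mov al, 0 */

    for (size_t i = 0; i < n; i++)
        if (s1[i] != s2[i])                 /* cmp dl, [di] / je EQ */
            distance++;                     /* inc al */

    if (count_length_difference)
        distance += len1 > len2 ? len1 - len2 : len2 - len1;
    return distance;
}
```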

Converting a string of numbers into an integer in Assembly x86

I'm trying to convert a user inputted string of numbers to an integer.
For example, user enters "1234" as a string I want 1234 stored in a DWORD variable.
I'm using lodsb and stosb to get the individual bytes. My problem is I can't get the algorithm right for it. My code is below:
mov ecx, (SIZEOF num)-1
mov esi, OFFSET num
mov edi, OFFSET ints
cld
counter:
lodsb
sub al,48
stosb
loop counter
I also know that the ECX count is off: it covers the entire string rather than just the 4 digits, so it's 9 because the string is 10 bytes.
I was trying to use powers of 10 to multiply the individual bytes but I'm pretty new to Assembly and can't get the right syntax for it. If anybody can help with the algorithm that would be great. Thanks!

A simple implementation might be
mov ecx, digitCount
mov esi, numStrAddress
cld ; We want to move upward in mem
xor edx, edx ; edx = 0 (We want to have our result here)
xor eax, eax ; eax = 0 (We need that later)
counter:
imul edx, 10 ; Multiply prev digits by 10
lodsb ; Load next char to al
sub al,48 ; Convert to number
add edx, eax ; Add new number
; Here we used that the upper bytes of eax are zeroed
loop counter ; Move to next digit
; edx now contains the result
mov [resultIntAddress], edx
Of course there are ways to improve it, like avoiding the use of imul.
EDIT: Fixed the ecx value
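Here is one way to avoid imul, sketched in C: multiply by 10 as (x << 3) + (x << 1), that is, 8x + 2x. The function name is hypothetical. It assumes the string holds exactly count decimal digits and that the result fits in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Same accumulation as the loop above, with the multiply by 10 done
   by two shifts and an add instead of imul. */
uint32_t decimal_to_uint(const char *digits, size_t count)
{
    uint32_t result = 0;                                /* xor edx, edx */
    for (size_t i = 0; i < count; i++) {
        uint32_t digit = (uint32_t)(digits[i] - '0');   /* sub al,48 */
        result = (result << 3) + (result << 1) + digit; /* result*10 + digit */
    }
    return result;
}
```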

x86 Assembly String Buffer Number to ASCII

I was writing an x86 assembly program to output a number in hexadecimal. The program was assembled using nasm and the image file was run by qemu. The behavior of the program confused me a lot. As the working program below shows, I don't have to add 0x30 to a digit to get it to print the character of that digit.
; Boot sector code offset: 0x7c00
[org 0x7c00]
mov dx, 0x1fb6 ; The hexadecimal to be printed
call print_hex ; call the function
jmp $ ; jump infinitely
%include "print_string.asm" ; Include the print_string function
print_hex:
pusha ; push all registers to stack
mov ax, 0x4 ; rotate through the number four times
print_hex_loop:
cmp ax, 0x0 ; compare the counter with 0
jle print_hex_end ; if it is zero then jump to the end
mov cx, dx ; move dx to cx
and cx, 0x000F ; take the lower four binary digits of cx
cmp cx, 0xa ;compare the digits with 0xa
jge print_hex_letter ; if it is 0xa or larger, jump to printing a letter
add cx, 0x0 ; otherwise print the ascii of a number
jmp print_hex_modify_string ; jump to routine for modifying the template
print_hex_letter:
add cx, 0x7 ; print the ascii of a letter
print_hex_modify_string:
mov bx, HEX_OUT ; bring the address of HEX_OUT into bx
add bx, 0x1 ; skip the 0x
add bx, ax ; add the bias
add byte [bx], cl ; move the character into its position
shr dx, 4 ; shift right 4 bits
sub ax, 0x1 ; subtract 1 from the counter
jmp print_hex_loop ; jump back to the start of the function
print_hex_end:
mov bx, HEX_OUT ; move the address of HEX_OUT to bx
call print_string ; call the function print_string
popa ; pop all registers from stack
ret ; return to calling function
HEX_OUT:
db '0x0000',0 ; The template string for printing
times 510-($-$$) db 0 ; fill zeros
dw 0xaa55 ; MAGIC_FLAG for boot
boot_sect.asm
print_string:
pusha
mov ah, 0x0e
mov al, [bx]
print_string_loop:
cmp al, 0x0
je print_string_end
int 0x10
add bx, 0x1
mov al, [bx]
jmp print_string_loop
print_string_end:
popa
ret
print_string.asm
The output of this program is what I expected, but when I tried to add 0x30 on the numerals to get the ASCII code of the digits, the output was gibberish. Is there some trick to it or am I missing some key points here?
Thanks!

The answer to your original question:
Because you write each digit with add byte [bx], cl, and the buffer already contains '0' (30h) in every digit position, adding just the digit value already yields the right ASCII code. Adding another 0x30 on top produces gibberish. It only works the first time, though. Calling print_hex a second time will produce gibberish again, because the HEX_OUT content has already been modified (trivia: which hex number, printed first, would still allow a second value to be printed correctly?).
Now, just for fun, here is how I would probably write print_hex myself. Maybe it will give you some additional ideas for your x86 ASM programming. I commented it heavily to explain why I'm doing things the way I do:
First, I would separate out the formatting function so I could reuse it elsewhere. Its inputs are the number and a pointer to the target buffer. I'm using a LUT (look-up table) for the ASCII conversion, because the code is simpler. If you care about size, you can do the conversion with branching code in fewer bytes, and use the slower pusha/popa to save registers.
format_hex:
; dx = number, di = 4B output buffer for "%04X" format of number.
push bx ; used as temporary to calculate digits ASCII
push si ; used as pointer to buffer for writing chars
push dx
lea si,[di+4] ; buffer.end() pointer
format_hex_loop:
mov bx,dx ; bx = temporary to extract single digit
dec si ; si = where to write next digit
and bx,0x000F ; separate last digit (needs whole bx for LUT indexing)
shr dx,4 ; shift original number one hex-digit (4 bits) to right
mov bl,[format_hex_ascii_lut+bx] ; convert digit 0-15 value to ASCII
mov [si],bl ; write it into buffer
cmp di,si ; compare buffer.begin() with pointer-to-write
jb format_hex_loop ; loop till first digit was written
pop dx ; restore original values of all modified regs
pop si
pop bx
ret
format_hex_ascii_lut: ; LUT for 0-15 to ASCII conversion
db '0123456789ABCDEF'
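Here is the same LUT-driven logic, filling digits from the end, written in C as a rough model for checking results on a desktop machine. The names mirror the assembly labels, but everything else is an assumption. Like the assembly, it writes exactly four upper-case digits and no terminator. Because it overwrites the digits instead of adding to them, calling it a second time on the same buffer still gives the right result.

```c
#include <stdint.h>

/* 16-entry look-up table, like format_hex_ascii_lut (the literal's NUL is unused). */
static const char hex_lut[] = "0123456789ABCDEF";

/* Writes exactly four upper-case hex digits of number into out[0..3],
   starting with the last digit; no terminator is written. */
void format_hex(uint16_t number, char *out)
{
    char *p = out + 4;                 /* lea si,[di+4]: buffer end */
    while (p > out) {                  /* cmp di,si / jb format_hex_loop */
        --p;                           /* dec si */
        *p = hex_lut[number & 0x000F]; /* and bx,0Fh + LUT lookup */
        number >>= 4;                  /* shr dx,4 */
    }
}
```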
Then for convenience a print_hex function may be added too, providing its own buffer for formatting with "0x" and nul terminator:
print_hex:
; dx = number to print
push di
push bx
; format the number
mov di,HEX_OUT+2
call format_hex
; print the result to screen
lea bx,[di-2] ; bx = HEX_OUT
; HEX_OUT was already set with "0x" and nul-terminator, otherwise I would do:
; mov word [bx],'0x'
; mov byte [bx+6],0
call print_string
pop bx
pop di
ret
HEX_OUT:
db '0x1234',0 ; The template string for printing
And finally example usage from the boot code:
mov dx,0x1fb6 ; The hexadecimal to be printed
call print_hex
mov dx,ax ; works also when called second time
call print_hex ; (but would be nicer to print some space between them)
jmp $ ; loop infinitely
(I did verify this code to some extent, checking that it assembles and runs. However, I only tested it in separate parts and in a 32-bit environment, patching a few lines to make it 32-bit, so some bug may have slipped in. I don't have a 16-bit environment to verify it as complete boot code.)

Move to end of string within buffer - Assembly Language

I am trying to take in a string and then see if the last value in the string is an EOL character. I figured I would use the length of the string read in and then add it to the address of the buffer to find the last element. This does not seem to work.
Edit: I apologize that I did not include more information. Variables are defined as such:
%define BUFLEN 256
SECTION .bss ; uninitialized data section
buf: resb BUFLEN ; buffer for read
newstr: resb BUFLEN ; converted string
rlen: resb 4
Then a Linux system call (int 80h) is made to read a string from the user, like so:
; read user input
;
mov eax, SYSCALL_READ ; read function
mov ebx, STDIN ; Arg 1: file descriptor
mov ecx, buf ; Arg 2: address of buffer
mov edx, BUFLEN ; Arg 3: buffer length
int 080h
Then we go into our loop:
test_endl:
mov ecx, [rlen]
mov esi, buf
add esi, ecx ; i want to move 'rlen' bytes into buf
mov al, [esi] ; to see what the last element is
cmp al, 10 ; compare it to EOL
jnz L1_init
dec ecx ; and then decrease 'rlen' if it is an EOL
mov [rlen], ecx
I am using NASM to assemble and am writing for an i386 machine.

Adding the length of the string to the address of the buffer gives you the address of the byte just past the end of the string.
Based on you saying that
you want to see if the last value in the string is an EOL character
you aim to decrease 'rlen' if it is an EOL (*)
I conclude that you consider the possible EOL character part of the string, as defined by its length rlen. If you didn't, (*) wouldn't make sense.
Use mov al,[esi-1] to see what the last element is!
test_endl:
mov ecx, [rlen]
mov esi, buf
add esi, ecx ; i want to move 'rlen' bytes into buf
mov al, [esi-1] ; to see what the last element is
cmp al, 10 ; compare it to EOL
jnz L1_init
dec ecx ; and then decrease 'rlen' if it is an EOL
mov [rlen], ecx
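Here is the off-by-one in C terms, as a minimal sketch: the last byte of a string of length rlen is buf[rlen - 1]. The function name is hypothetical. Unlike the assembly above, it also checks rlen > 0 first. With rlen = 0, the [esi-1] access would read the byte before the buffer.

```c
#include <stddef.h>

/* Returns rlen, decremented by one if the last byte read is a line feed. */
size_t strip_trailing_newline(const char *buf, size_t rlen)
{
    if (rlen > 0 && buf[rlen - 1] == '\n')   /* mov al,[esi-1] / cmp al,10 */
        rlen--;                              /* dec ecx */
    return rlen;
}
```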

This is a much more roundabout way (literally) of getting to the end of the string. I loop through all the characters of the string, using rlen as the counter. Then, once the loop is complete, I make the comparison and decrement rlen as necessary.
mov esi, buf ; start of the string
mov ecx, [rlen] ; number of characters to scan (assumes rlen > 0)
test_loop:
mov al, [esi] ; get a character
inc esi ; update source pointer
dec ecx ; update char count
jnz test_loop ; loop to top if more chars
cmp al, 10 ; comparison
jne L1_init ; if not EOL jump to L1_init
mov ecx, [rlen] ; decrease the size of rlen if necessary
dec ecx
mov [rlen], ecx

How can I read one integer each time?

I'm using an assembly library to make a program that reads three integers from standard input. When the reading is done in the console it works perfectly, but when I use a file as input, it reads the three integers at once.
This is the strace for console:
read(0, "3000\n", 512) = 5
read(0, "2000\n", 512) = 5
read(0, "1000\n", 512) = 5
And this from input file:
read(0, "3000\n2000\n1000\n", 512) = 15
read(0, "", 512) = 0
read(0, "", 512) = 0
Here are the procedures:
;--------------------------------------------------------
ReadInt:
;
; Reads a 32-bit signed decimal integer from standard
; input, stopping when the Enter key is pressed.
; All valid digits occurring before a non-numeric character
; are converted to the integer value. Leading spaces are
; ignored, and an optional leading + or - sign is permitted.
; All spaces return a valid integer, value zero.
; Receives: nothing
; Returns: If CF=0, the integer is valid, and EAX = binary value.
; If CF=1, the integer is invalid and EAX = 0.
;--------------------------------------------------------
push edx
push ecx
; Input a signed decimal string.
mov edx,digitBuffer
mov ecx,MAX_DIGITS
call ReadString
mov ecx,eax ; save length in ECX
; Convert to binary (EDX -> string, ECX = length)
call ParseInteger32 ; returns EAX, CF
pop ecx
pop edx
ret
;--------------- End of ReadInt ------------------------
;--------------------------------------------------------
ReadString:
;
; Reads a string from the keyboard and places the characters
; in a buffer.
; Receives: EDX offset of the input buffer
; ECX = maximum characters to input (including terminal null)
; Returns: EAX = size of the input string.
; Comments: Stops when Enter key (0Dh,0Ah) is pressed. If the user
; types more characters than (ECX-1), the excess characters
; are ignored.
; Written by Kip Irvine and Gerald Cahill
; Modified by Curtis Wong
;--------------------------------------------------------
enter 8, 0 ; bufSize: ebp - 4
; bytesRead: ebp - 8
pushad
mov edi,edx ; set EDI to buffer offset
mov dword [ebp - 4],ecx ; save buffer size
call ReadKeys
mov dword [ebp - 8], eax
cmp eax,0
jz .L5 ; skip move if zero chars input
cld ; search forward
mov ecx, dword [ebp - 4] ; repetition count for SCASB
dec ecx
mov al,NL ; scan for 0Ah (Line Feed) terminal character
repne scasb
jne .L1 ; if not found, jump to L1
;if we reach this line, length of input string <= (bufsize - 2)
dec dword [ebp - 8] ; second adjustment to bytesRead
dec edi ; 0Ah found: back up two positions
cmp edi,edx ; don't back up to before the user's buffer
jae .L2
mov edi,edx ; 0Ah must be the only byte in the buffer
jmp .L2 ; and jump to L2
.L1: mov edi,edx ; point to last byte in buffer
add edi,dword [ebp - 4]
dec edi
mov byte [edi],0 ; insert null byte
; Clear excess characters from the buffer, 1 byte at a time
.L6: call BufferFlush
jmp .L5
.L2: mov byte [edi],0 ; insert null byte
.L5: popad
mov eax, dword [ebp - 8]
leave
ret
;--------------- End of ReadString --------------------

You will need to buffer the input and split it yourself, because consoles and files behave slightly differently. A console sends you data as soon as someone presses Return, that is, line by line.
Files will send you as much data as possible per call to read().
To make your code work, you will have to write a readline() function that reads the input byte by byte and returns when it sees a line feed.
Or you can use an internal buffer, fill it with as much data as possible, find the first line, return that, repeat until the buffer is empty, try to read more data, return EOF when there is no more data from the input.
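Here is a minimal C sketch of the second approach: an internal buffer that hands out one line per call. All names here are hypothetical. To keep it testable, input comes through a fill callback. fill_from_memory stands in for a file: it returns everything at once, like the redirected read() in the strace. A real program would call read(0, dst, cap) there instead. Lines longer than the internal buffer come back in pieces, and lines longer than out_cap - 1 are truncated.

```c
#include <stddef.h>
#include <string.h>

/* Fills dst with up to cap bytes; returns bytes read, 0 at end of input. */
typedef long (*fill_fn)(void *ctx, char *dst, size_t cap);

struct line_reader {
    char buf[512];
    size_t start, end;          /* unread bytes are buf[start..end) */
    fill_fn fill;
    void *ctx;
};

void line_reader_init(struct line_reader *r, fill_fn fill, void *ctx)
{
    r->start = r->end = 0;
    r->fill = fill;
    r->ctx = ctx;
}

/* Copies the next line (without '\n') into out, NUL-terminated.
   Returns its length, or -1 at end of input. Requires out_cap >= 1. */
long read_line(struct line_reader *r, char *out, size_t out_cap)
{
    for (;;) {
        char *line = r->buf + r->start;
        size_t avail = r->end - r->start;
        char *nl = memchr(line, '\n', avail);
        size_t len;

        if (nl != NULL) {                     /* a complete line is buffered */
            len = (size_t)(nl - line);
            r->start += len + 1;              /* consume the line and its '\n' */
        } else {
            long n = 0;
            if (r->start > 0) {               /* slide leftover bytes to the front */
                memmove(r->buf, line, avail);
                r->start = 0;
                r->end = avail;
                line = r->buf;
            }
            if (r->end < sizeof r->buf)
                n = r->fill(r->ctx, r->buf + r->end, sizeof r->buf - r->end);
            if (n > 0) {
                r->end += (size_t)n;
                continue;                     /* rescan with the new data */
            }
            if (avail == 0)
                return -1;                    /* end of input, nothing buffered */
            len = avail;                      /* last line without '\n', or full buffer */
            r->start = r->end;
        }
        if (len > out_cap - 1)
            len = out_cap - 1;                /* truncate lines that don't fit */
        memcpy(out, line, len);
        out[len] = '\0';
        return (long)len;
    }
}

/* Stand-in for a redirected file: hands out as much as fits per call. */
struct mem_source { const char *data; size_t len; size_t pos; };

long fill_from_memory(void *ctx, char *dst, size_t cap)
{
    struct mem_source *src = ctx;
    size_t n = src->len - src->pos;
    if (n > cap)
        n = cap;
    memcpy(dst, src->data + src->pos, n);
    src->pos += n;
    return (long)n;
}
```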

As Aaron points out, the problem is that sys_read behaves differently when stdin is redirected. You could fix it as he suggests, or you could use Along32's ReadString together with a "homemade" atoi.
;--------------------
atoi:
push ebx
mov edx, [esp + 8] ; pointer to string
xor ebx, ebx ; assume not negative
cmp byte [edx], '-'
jnz notneg
inc ebx ; indicate negative
inc edx ; move past the '-'
notneg:
xor eax, eax ; clear "result"
.top:
movzx ecx, byte [edx]
inc edx
cmp ecx, byte '0'
jb .done
cmp ecx, byte '9'
ja .done
; we have a valid character - multiply
; result-so-far by 10, subtract '0'
; from the character to convert it to
; a number, and add it to result.
lea eax, [eax + eax * 4]
lea eax, [eax * 2 + ecx - '0']
jmp short .top
.done:
test ebx, ebx
jz notminus
neg eax
notminus:
pop ebx
ret
;------------------------
That expects the address of the string to be pushed on the stack (and removed by the caller afterwards). I think you could instead comment out the mov edx, [esp + 8] line and pass the address in edx (not tested!). That would be more like the rest of the Along32 code. Unlike Along32's code, it returns with edx pointing just past the byte that stopped processing, and with ecx (just cl, really) holding that "invalid" byte. I think you could call it repeatedly on the string returned by ReadString. Save the integer (in eax) and, if ecx is LF, call it again without touching edx. When ecx is zero, you're done. Hope you find it helpful.
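Here is a C model of how the suggested repeated-call scheme would behave. It assumes the pointer is passed in and updated (like EDX) and that the stopping byte is reported back (like CL). The name parse_int is hypothetical, and, as in the assembly, overflow is not checked. One thing the model makes visible: with a trailing LF, the last call returns 0 with a stop byte of 0, so that final value should probably be discarded. The pointer also ends up past the terminating NUL, so don't call it again after that.

```c
#include <stdint.h>

/* Parses an optional '-' and decimal digits starting at *p.
   *stop receives the byte that ended the number (like CL), and *p is
   left pointing just past it (like EDX). Assumes no overflow. */
int32_t parse_int(const char **p, char *stop)
{
    const char *s = *p;
    int negative = 0;
    uint32_t result = 0;                      /* xor eax, eax */
    char c = '\0';

    if (*s == '-') {                          /* cmp byte [edx], '-' */
        negative = 1;
        s++;
    }
    for (;;) {
        c = *s++;                             /* movzx ecx, byte [edx] / inc edx */
        if (c < '0' || c > '9')
            break;                            /* jb/ja .done */
        result = result * 10 + (uint32_t)(c - '0');  /* the two lea's */
    }
    *p = s;
    *stop = c;
    return negative ? -(int32_t)result : (int32_t)result;  /* neg eax */
}
```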
