assembly 8086 adding a space in a string

assembly 8086 adding a space in a string - string

I ran into a problem in my program. Basically what i want to do check if there is a space after a dot in a string and if there isn't i add a space right after the dot. However i don't get how to go about this since my buffer is limited size, therefore if i add the space, the last letter of the buffer will be erased? or am i doing this wrong? thank you for the help in advance :)
For example: Hello.Hi = Hello. Hi
MOV cx, ax
MOV si, offset readBuf
MOV di, offset writeBuf
work:
MOV dl, [si]
CMP dl, '.'
JE dot
increase:
MOV [di], dl
INC si
INC di
LOOP work
dot:
CMP dl+1, ' '
JNE noSpace
JMP increase
noSpace:

There are a few problems with the code. The first one is this line:
CMP dl+1, ' '
This is adding 1 to the value in dl and comparing that to a space character which is not what you want. What you want is to compare the next character, so you'll have to load it into a register with MOV dl, [si] or similar.
The second problem is algorithm. It's often easiest to start with psuedo-code and then create the assembly language version from that. For example:
load a character
is there room left?
if not, exit
if so, save the char
does char == period?
if not, go to 1
is there room left?
if not, exit
if so, save space char
load a character
does char == space?
if so, go to 1
if not, go to 2
Note that "load a character" means both fetching the character and incrementing si, and "save a character" means both saving the character and incrementing di. Also note that steps 2, 3 and 4 are identical to steps 7, 8 and 9. This suggests a potential for a subroutine or macro so that you only have to write (and debug!) the code once and can use it multiple times.

Related

How do I edit an 8 character string with a NASM Assembler based on user input? [duplicate]

Ok, so I'm fairly new to assembly, infact, I'm very new to assembly. I wrote a piece of code which is simply meant to take numerical input from the user, multiply it by 10, and have the result expressed to the user via the programs exit status (by typing echo $? in terminal)
Problem is, it is not giving the correct number, 4x10 showed as 144. So then I figured the input would probably be as a character, rather than an integer. My question here is, how do I convert the character input to an integer so that it can be used in arithmetic calculations?
It would be great if someone could answer keeping in mind that I'm a beginner :)
Also, how can I convert said integer back to a character?
section .data
section .bss
input resb 4
section .text
global _start
_start:
mov eax, 3
mov ebx, 0
mov ecx, input
mov edx, 4
int 0x80
mov ebx, 10
imul ebx, ecx
mov eax, 1
int 0x80

Here's a couple of functions for converting strings to integers, and vice versa:
; Input:
; ESI = pointer to the string to convert
; ECX = number of digits in the string (must be > 0)
; Output:
; EAX = integer value
string_to_int:
xor ebx,ebx ; clear ebx
.next_digit:
movzx eax,byte[esi]
inc esi
sub al,'0' ; convert from ASCII to number
imul ebx,10
add ebx,eax ; ebx = ebx*10 + eax
loop .next_digit ; while (--ecx)
mov eax,ebx
ret
; Input:
; EAX = integer value to convert
; ESI = pointer to buffer to store the string in (must have room for at least 10 bytes)
; Output:
; EAX = pointer to the first character of the generated string
int_to_string:
add esi,9
mov byte [esi],STRING_TERMINATOR
mov ebx,10
.next_digit:
xor edx,edx ; Clear edx prior to dividing edx:eax by ebx
div ebx ; eax /= 10
add dl,'0' ; Convert the remainder to ASCII
dec esi ; store characters in reverse order
mov [esi],dl
test eax,eax
jnz .next_digit ; Repeat until eax==0
mov eax,esi
ret
And this is how you'd use them:
STRING_TERMINATOR equ 0
lea esi,[thestring]
mov ecx,4
call string_to_int
; EAX now contains 1234
; Convert it back to a string
lea esi,[buffer]
call int_to_string
; You now have a string pointer in EAX, which
; you can use with the sys_write system call
thestring: db "1234",0
buffer: resb 10
Note that I don't do much error checking in these routines (like checking if there are characters outside of the range '0' - '9'). Nor do the routines handle signed numbers. So if you need those things you'll have to add them yourself.

The basic algorith for string->digit is: total = total*10 + digit, starting from the MSD. (e.g. with digit = *p++ - '0' for an ASCII string of digits). So the left-most / Most-Significant / first digit (in memory, and in reading order) gets multiplied by 10 N times, where N is the total number of digits after it.
Doing it this way is generally more efficient than multiplying each digit by the right power of 10 before adding. That would need 2 multiplies; one to grow a power of 10, and another to apply it to the digit. (Or a table look-up with ascending powers of 10).
Of course, for efficiency you might use SSSE3 pmaddubsw and SSE2 pmaddwd to multiply digits by their place-value in parallel: see Is there a fast way to convert a string of 8 ASCII decimal digits into a binary number? and arbitrary-length How to implement atoi using SIMD?. But the latter probably isn't a win when numbers are typically short. A scalar loop is efficient when most numbers are only a couple digits long.
Adding on to #Michael's answer, it may be useful to have the int->string function stop at the first non-digit, instead of at a fixed length. This will catch problems like your string including a newline from when the user pressed return, as well as not turning 12xy34 into a very large number. (Treat it as 12, like C's atoi function). The stop character can also be the terminating 0 in a C implicit-length string.
I've also made some improvements:
Don't use the slow loop instruction unless you're optimizing for code-size. Just forget it exists and use dec / jnz in cases where counting down to zero is still what you want to do, instead of comparing a pointer or something else.
2 LEA instructions are significantly better than imul + add: lower latency.
accumulate the result in EAX where we want to return it anyway. (If you inline this instead of calling it, use whatever register you want the result in.)
I changed the registers so it follows the x86-64 System V ABI (First arg in RDI, return in EAX).
Porting to 32-bit: This doesn't depend on 64-bitness at all; it can be ported to 32-bit by just using 32-bit registers. (i.e. replace rdi with edi, rax with ecx, and rax with eax). Beware of C calling-convention differences between 32 and 64-bit, e.g. EDI is call-preserved and args are usually passed on the stack. But if your caller is asm, you can pass an arg in EDI.
; args: pointer in RDI to ASCII decimal digits, terminated by a non-digit
; clobbers: ECX
; returns: EAX = atoi(RDI) (base 10 unsigned)
; RDI = pointer to first non-digit
global base10string_to_int
base10string_to_int:
movzx eax, byte [rdi] ; start with the first digit
sub eax, '0' ; convert from ASCII to number
cmp al, 9 ; check that it's a decimal digit [0..9]
jbe .loop_entry ; too low -> wraps to high value, fails unsigned compare check
; else: bad first digit: return 0
xor eax,eax
ret
; rotate the loop so we can put the JCC at the bottom where it belongs
; but still check the digit before messing up our total
.next_digit: ; do {
lea eax, [rax*4 + rax] ; total *= 5
lea eax, [rax*2 + rcx] ; total = (total*5)*2 + digit
; imul eax, 10 / add eax, ecx
.loop_entry:
inc rdi
movzx ecx, byte [rdi]
sub ecx, '0'
cmp ecx, 9
jbe .next_digit ; } while( digit <= 9 )
ret ; return with total in eax
This stops converting on the first non-digit character. Often this will be the 0 byte that terminates an implicit-length string. You could check after the loop that it was a string-end, not some other non-digit character, by checking ecx == -'0' (which still holds the str[i] - '0' integer "digit" value that was out of range), if you want to detect trailing garbage.
If your input is an explicit-length string, you'd need to use a loop counter instead of checking a terminator (like #Michael's answer), because the next byte in memory might be another digit. Or it might be in an unmapped page.
Making the first iteration special and handling it before jumping into the main part of the loop is called loop peeling. Peeling the first iteration allows us to optimize it specially, because we know total=0 so there's no need to multiply anything by 10. It's like starting with sum = array[0]; i=1 instead of sum=0, i=0;.
To get nice loop structure (with the conditional branch at the bottom), I used the trick of jumping into the middle of the loop for the first iteration. This didn't even take an extra jmp because I was already branching in the peeled first iteration. Reordering a loop so an if()break in the middle becomes a loop branch at the bottom is called loop rotation, and can involve peeling the first part of the first iteration and the 2nd part of the last iteration.
The simple way to solve the problem of exiting the loop on a non-digit would be to have a jcc in the loop body, like an if() break; statement in C before the total = total*10 + digit. But then I'd need a jmp and have 2 total branch instructions in the loop, meaning more overhead.
If I didn't need the sub ecx, '0' result for the loop condition, I could have used lea eax, [rax*2 + rcx - '0'] to do it as part of the LEA as well. But that would have made the LEA latency 3 cycles instead of 1, on Sandybridge-family CPUs. (3-component LEA vs. 2 or less.) The two LEAs form a loop-carried dependency chain on eax (total), so (especially for large numbers) it would not be worth it on Intel. On CPUs where base + scaled-index is no faster than base + scaled-index + disp8 (Bulldozer-family / Ryzen), then sure, if you have an explicit length as your loop condition and don't want to check the digits at all.
I used movzx to load with zero extension in the first place, instead of doing that after converting the digit from ASCII to integer. (It has to be done at some point to add into 32-bit EAX). Often code that manipulates ASCII digits uses byte operand-size, like mov cl, [rdi]. But that would create a false dependency on the old value of RCX on most CPUs.
sub al,'0' saves 1 byte over sub eax,'0', but causes a partial-register stall on Nehalem/Core2 and even worse on PIII. Fine on all other CPU families, even Sandybridge: it's a RMW of AL, so it doesn't rename the partial reg separately from EAX. But cmp al, 9 doesn't cause a problem, because reading a byte register is always fine. It saves a byte (special encoding with no ModRM byte), so I used that at the top of the function.
For more optimization stuff, see http://agner.org/optimize, and other links in the x86 tag wiki.
The tag wiki also has beginner links, including an FAQ section with links to integer->string functions, and other common beginner questions.
Related:
How do I print an integer in Assembly Level Programming without printf from the c library? is the reverse of this question, integer -> base10string.
Is there a fast way to convert a string of 8 ASCII decimal digits into a binary number? highly optimized SSSE3 pmaddubsw / pmaddwd for 8-digit integers.
How to implement atoi using SIMD? using a shuffle to handle variable-length
Conversion of huge decimal numbers (128bit) formatted as ASCII to binary (hex) handles long strings, e.g. a 128-bit integer that takes 4x 32-bit registers. (It's not very efficient, and might be better to convert in multiple chunks and then do extended-precision multiplies by 1e9 or something.)
Convert from ascii to integer in AT&T Assembly Inefficient AT&T version of this.

NASM Comparison Always Falls to Default [duplicate]

Ok, so I'm fairly new to assembly, infact, I'm very new to assembly. I wrote a piece of code which is simply meant to take numerical input from the user, multiply it by 10, and have the result expressed to the user via the programs exit status (by typing echo $? in terminal)
Problem is, it is not giving the correct number, 4x10 showed as 144. So then I figured the input would probably be as a character, rather than an integer. My question here is, how do I convert the character input to an integer so that it can be used in arithmetic calculations?
It would be great if someone could answer keeping in mind that I'm a beginner :)
Also, how can I convert said integer back to a character?
section .data
section .bss
input resb 4
section .text
global _start
_start:
mov eax, 3
mov ebx, 0
mov ecx, input
mov edx, 4
int 0x80
mov ebx, 10
imul ebx, ecx
mov eax, 1
int 0x80

Here's a couple of functions for converting strings to integers, and vice versa:
; Input:
; ESI = pointer to the string to convert
; ECX = number of digits in the string (must be > 0)
; Output:
; EAX = integer value
string_to_int:
xor ebx,ebx ; clear ebx
.next_digit:
movzx eax,byte[esi]
inc esi
sub al,'0' ; convert from ASCII to number
imul ebx,10
add ebx,eax ; ebx = ebx*10 + eax
loop .next_digit ; while (--ecx)
mov eax,ebx
ret
; Input:
; EAX = integer value to convert
; ESI = pointer to buffer to store the string in (must have room for at least 10 bytes)
; Output:
; EAX = pointer to the first character of the generated string
int_to_string:
add esi,9
mov byte [esi],STRING_TERMINATOR
mov ebx,10
.next_digit:
xor edx,edx ; Clear edx prior to dividing edx:eax by ebx
div ebx ; eax /= 10
add dl,'0' ; Convert the remainder to ASCII
dec esi ; store characters in reverse order
mov [esi],dl
test eax,eax
jnz .next_digit ; Repeat until eax==0
mov eax,esi
ret
And this is how you'd use them:
STRING_TERMINATOR equ 0
lea esi,[thestring]
mov ecx,4
call string_to_int
; EAX now contains 1234
; Convert it back to a string
lea esi,[buffer]
call int_to_string
; You now have a string pointer in EAX, which
; you can use with the sys_write system call
thestring: db "1234",0
buffer: resb 10
Note that I don't do much error checking in these routines (like checking if there are characters outside of the range '0' - '9'). Nor do the routines handle signed numbers. So if you need those things you'll have to add them yourself.

The basic algorith for string->digit is: total = total*10 + digit, starting from the MSD. (e.g. with digit = *p++ - '0' for an ASCII string of digits). So the left-most / Most-Significant / first digit (in memory, and in reading order) gets multiplied by 10 N times, where N is the total number of digits after it.
Doing it this way is generally more efficient than multiplying each digit by the right power of 10 before adding. That would need 2 multiplies; one to grow a power of 10, and another to apply it to the digit. (Or a table look-up with ascending powers of 10).
Of course, for efficiency you might use SSSE3 pmaddubsw and SSE2 pmaddwd to multiply digits by their place-value in parallel: see Is there a fast way to convert a string of 8 ASCII decimal digits into a binary number? and arbitrary-length How to implement atoi using SIMD?. But the latter probably isn't a win when numbers are typically short. A scalar loop is efficient when most numbers are only a couple digits long.
Adding on to #Michael's answer, it may be useful to have the int->string function stop at the first non-digit, instead of at a fixed length. This will catch problems like your string including a newline from when the user pressed return, as well as not turning 12xy34 into a very large number. (Treat it as 12, like C's atoi function). The stop character can also be the terminating 0 in a C implicit-length string.
I've also made some improvements:
Don't use the slow loop instruction unless you're optimizing for code-size. Just forget it exists and use dec / jnz in cases where counting down to zero is still what you want to do, instead of comparing a pointer or something else.
2 LEA instructions are significantly better than imul + add: lower latency.
accumulate the result in EAX where we want to return it anyway. (If you inline this instead of calling it, use whatever register you want the result in.)
I changed the registers so it follows the x86-64 System V ABI (First arg in RDI, return in EAX).
Porting to 32-bit: This doesn't depend on 64-bitness at all; it can be ported to 32-bit by just using 32-bit registers. (i.e. replace rdi with edi, rax with ecx, and rax with eax). Beware of C calling-convention differences between 32 and 64-bit, e.g. EDI is call-preserved and args are usually passed on the stack. But if your caller is asm, you can pass an arg in EDI.
; args: pointer in RDI to ASCII decimal digits, terminated by a non-digit
; clobbers: ECX
; returns: EAX = atoi(RDI) (base 10 unsigned)
; RDI = pointer to first non-digit
global base10string_to_int
base10string_to_int:
movzx eax, byte [rdi] ; start with the first digit
sub eax, '0' ; convert from ASCII to number
cmp al, 9 ; check that it's a decimal digit [0..9]
jbe .loop_entry ; too low -> wraps to high value, fails unsigned compare check
; else: bad first digit: return 0
xor eax,eax
ret
; rotate the loop so we can put the JCC at the bottom where it belongs
; but still check the digit before messing up our total
.next_digit: ; do {
lea eax, [rax*4 + rax] ; total *= 5
lea eax, [rax*2 + rcx] ; total = (total*5)*2 + digit
; imul eax, 10 / add eax, ecx
.loop_entry:
inc rdi
movzx ecx, byte [rdi]
sub ecx, '0'
cmp ecx, 9
jbe .next_digit ; } while( digit <= 9 )
ret ; return with total in eax
This stops converting on the first non-digit character. Often this will be the 0 byte that terminates an implicit-length string. You could check after the loop that it was a string-end, not some other non-digit character, by checking ecx == -'0' (which still holds the str[i] - '0' integer "digit" value that was out of range), if you want to detect trailing garbage.
If your input is an explicit-length string, you'd need to use a loop counter instead of checking a terminator (like #Michael's answer), because the next byte in memory might be another digit. Or it might be in an unmapped page.
Making the first iteration special and handling it before jumping into the main part of the loop is called loop peeling. Peeling the first iteration allows us to optimize it specially, because we know total=0 so there's no need to multiply anything by 10. It's like starting with sum = array[0]; i=1 instead of sum=0, i=0;.
To get nice loop structure (with the conditional branch at the bottom), I used the trick of jumping into the middle of the loop for the first iteration. This didn't even take an extra jmp because I was already branching in the peeled first iteration. Reordering a loop so an if()break in the middle becomes a loop branch at the bottom is called loop rotation, and can involve peeling the first part of the first iteration and the 2nd part of the last iteration.
The simple way to solve the problem of exiting the loop on a non-digit would be to have a jcc in the loop body, like an if() break; statement in C before the total = total*10 + digit. But then I'd need a jmp and have 2 total branch instructions in the loop, meaning more overhead.
If I didn't need the sub ecx, '0' result for the loop condition, I could have used lea eax, [rax*2 + rcx - '0'] to do it as part of the LEA as well. But that would have made the LEA latency 3 cycles instead of 1, on Sandybridge-family CPUs. (3-component LEA vs. 2 or less.) The two LEAs form a loop-carried dependency chain on eax (total), so (especially for large numbers) it would not be worth it on Intel. On CPUs where base + scaled-index is no faster than base + scaled-index + disp8 (Bulldozer-family / Ryzen), then sure, if you have an explicit length as your loop condition and don't want to check the digits at all.
I used movzx to load with zero extension in the first place, instead of doing that after converting the digit from ASCII to integer. (It has to be done at some point to add into 32-bit EAX). Often code that manipulates ASCII digits uses byte operand-size, like mov cl, [rdi]. But that would create a false dependency on the old value of RCX on most CPUs.
sub al,'0' saves 1 byte over sub eax,'0', but causes a partial-register stall on Nehalem/Core2 and even worse on PIII. Fine on all other CPU families, even Sandybridge: it's a RMW of AL, so it doesn't rename the partial reg separately from EAX. But cmp al, 9 doesn't cause a problem, because reading a byte register is always fine. It saves a byte (special encoding with no ModRM byte), so I used that at the top of the function.
For more optimization stuff, see http://agner.org/optimize, and other links in the x86 tag wiki.
The tag wiki also has beginner links, including an FAQ section with links to integer->string functions, and other common beginner questions.
Related:
How do I print an integer in Assembly Level Programming without printf from the c library? is the reverse of this question, integer -> base10string.
Is there a fast way to convert a string of 8 ASCII decimal digits into a binary number? highly optimized SSSE3 pmaddubsw / pmaddwd for 8-digit integers.
How to implement atoi using SIMD? using a shuffle to handle variable-length
Conversion of huge decimal numbers (128bit) formatted as ASCII to binary (hex) handles long strings, e.g. a 128-bit integer that takes 4x 32-bit registers. (It's not very efficient, and might be better to convert in multiple chunks and then do extended-precision multiplies by 1e9 or something.)
Convert from ascii to integer in AT&T Assembly Inefficient AT&T version of this.

How do i reverse a string on emu8086 assembly language [duplicate]

I have to do a simple calculator in assembly using EMU8086, but every time I try to launch it EMU8086 gives this error:
INT 21h, AH=09h -
address: 170B5
byte 24h not found after 2000 bytes.
; correct example of INT 21h/9h:
mov dx, offset msg
mov ah, 9
int 21h
ret
msg db "Hello$"
I checked the other stuff, but there were no mistakes:
data segment
choice db ?
snum1 db 4 dup(?)
snum2 db 4 dup(?)
sres db 4 dup(?)
num1 db ?
num2 db ?
res db ?
;;menu1 db "Chose a function to procced", 10, 13, "Add [+]", 10, 13, "Sub [-]", 10, 13
;;menu2 db "Mul [*]", 10, 13, "Div [/]", 10, 13, "Mod [%]", 10, 13, "Pow [^]", 10, 13, "Exit [x]$"
messStr db "Enter Your Choice:",10,13,"",10,13,"Add --> +",10,13,"Sub --> -",10,13,"Mul --> *",10,13,"Div --> /",10,13,"Mod --> %",10,13,"Pow --> ^",10,13,"Exit --> X",10,13,"$"
msg1 db "Enter first number$"
msg2 db "Enter second number$"
msg3 db "Press any key to procced$"
msg4 db "The result is $"
ends
stack segment
dw 128 dup(0)
ends
code segment
assume cs:code, ds:data, ss:stack
newline proc ;; new line
push ax
push dx
mov ah, 2
mov DL, 10
int 21h
mov ah, 2
mov DL, 13
int 21h
pop dx
pop ax
ret
endp
printstr proc ;; print string
push BP
mov BP, SP
push dx
push ax
mov dx, [BP+4]
mov ah, 9
int 21h
pop ax
pop dx
pop BP
ret 2
endp
inputstr proc ;; collect input
push BP
mov BP, SP
push bx
push ax
mov bx, [BP+4]
k1:
mov ah, 1
int 21h
cmp al, 13
je sofk
mov [bx], al
inc bx
jmp k1
sofk:
mov byte ptr [bx], '$'
pop ax
pop bx
pop BP
ret 2
endp
getNums proc ;; get the numbers
call newline
push offset msg1
call printstr
call newline
push offset snum1
call inputstr
call newline
push offset msg2
call printstr
call newline
push offset snum2
call inputstr
ret
endp
start:
mov ax, data
mov ds, ax
mov ax, stack
mov ss, ax
;; print the main menu
call newline
push offset msg4
call printstr
;; collect the input
call newline
mov bx, offset choice
mov ah, 1
int 21h
mov [bx], al
;; check it
mov al, choice
cmp al, '+'
jne cexit
call getNums
jmp cont
cexit:
cmp al, 'x'
je cend
cont:
;; pause before going to the main menu
call newline
push offset msg3
call printstr
mov bx, offset choice
mov ah, 1
int 21h
call newline
call newline
call newline
jmp start
cend:
mov ax, 4c00h
int 21h
ends
end start
I cut most of the code segment because it wasn't important here.
After experimenting with the code I found that the problem was related to the lengths of the messages in the data segment. menu1 & menu2 were too long and any message after them can't be printed (msg1 & msg2 are printed, but nothing after them). I checked if I should merge menu1 & menu2, but it didn't help out. Please help me find out what is wrong with it.

The error message means you use int 21h / AH=09h on a string that didn't end with a $ (ASCII 24h). The system-call handler checked 2000 bytes without finding one.
Often, that means your code or data is buggy, e.g. in a fixed string you forgot a $ at the end, or if copying bytes into a buffer then you maybe overwrote or never stored a '$' in the first place.
But in this case, it appears that EMU8086 has a bug assembling push offset msg4. (In a way that truncates the 00B5h 16-bit address to 8-bit, and sign-extends back to 16, creating a wrong pointer that points past where any $ characters are in your data.)
Based on the error message below I know you are using EMU8086 as your development environment.
INT 21h, AH=09h -
address: 170B5
byte 24h not found after 2000 bytes.
; correct example of INT 21h/9h:
mov dx, offset msg
mov ah, 9
int 21h
ret
msg db "Hello$"
I'm no expert on EMU8086 by any stretch of the imagination. I do know why your offsets don't work. I can't tell you if there is a proper way to resolve this, or if it's an EMU8086 bug. Someone with a better background on this emulator would know.
You have created a data segment with some variables. It seems okay to me (but I may be missing something). I decided to load up EMU8086 to actually try this code. It assembled without error. Using the debugger I single stepped to the push offset msg1 line near the beginning of the program. I knew right away from the instruction encoding what was going on. This is the decoded instruction I saw:
It shows the instruction was encoded as push 0b5h where 0b5h is the offset. The trouble is that it is encoded as a push imm8 . The two highlighted bytes on the left hand pane show it was encoded with these bytes:
6A B5
If you review an instruction set reference you'll find the encodings for PUSH instruction encoded with 6A is listed as:
Opcode* Instruction Op/En 64-Bit Mode Compat/Leg Mode Description
6A ib PUSH imm8 I Valid Valid Push imm8.
You may say that B5 fits within a byte (imm8) so what is the problem? The smallest value that can be pushed onto the stack with push in 16-bit mode is a 16-bit word. Since a byte is smaller than a word, the processor takes the byte and sign extends it to make a 16-bit value. The instruction set reference actually says this:
If the source operand is an immediate of size less than the operand size, a sign-extended value is pushed on the stack
B5 is binary 10110101 . The sign bit is the left most bit. Since it is 1 the upper 8 bits placed onto the stack will be 11111111b (FF). If the sign bit is 0 then then 00000000b is placed in the upper 8 bits. The emulator didn't place 00B5 onto the stack, it placed FFB5. That is incorrect! This can be confirmed if I step through the push 0b5h instruction and review the stack. This is what I saw:
Observe that the value placed on the stack is FFB5. I could not find an appropriate syntax (even using the word modifier) to force EMU8086 to encode this as push imm16. A push imm16 would be able to encode the entire word as push 00b5 which would work.
Two things you can do. You can place 256 bytes of dummy data in your data segment like this:
data segment
db 256 dup(?)
choice db ?
... rest of data
Why does this work? Every variable defined after the dummy data will be an offset that can't be represented in a single byte. Because of this EMU8086 is forced to encode push offset msg1 as a word push.
The cleaner solution is to use the LEA instruction. This is the load effective address instruction. It takes a memory operand and computes the address (in this case the offset relative to the data segment). You can replace all your code that uses offset with something like:
lea ax, [msg1]
push ax
AX can be any of the general purpose 16-bit registers. Once in a register, push the 16-bit register onto the stack.
Someone may have a better solution for this, or know a way to resolve this. If so please feel free to comment.
Given the information above, you may ask why did it seem to work when you moved the data around? The reason is that the way you reorganized all the strings (placing the long one last) caused all the variables to start with offsets that were less than < 128. Because of this the PUSH of an 8-bit immediate offset sign extended a 0 in the top bits when placed on the stack. The offsets would be correct. Once the offsets are >= 128 (and < 256) the sign bit is 1 and the value placed on the stack sign will have an upper 8 bits of 1 rather than 0.
There are other bugs in your program, I'm concentrating on the issue directly related to the error you are receiving.

I reviewed your code and concentrated on the following sequence of instructions:
mov bx, offset choice ; here you set BX to the address of 'choice'
mov ah, 1
int 21h ; here you 'READ CHARACTER FROM STANDARD INPUT, WITH ECHO'
mov [bx], al ; because INT 21h does preserve BX, you are writing back the result of the interrupt call (AL) back to the memory location at BX, which is named 'choice'
;; check it
mov al, choice ; HERE you are moving a BYTE variable named 'choice' to AL, overwriting the result of the last INT 21h call
cmp al, '+' ; ... and compare this variable to the ASCII value of '+'
jne cexit ; if this variable is unequal to '+' you jump to 'cexit'
call getNums ; otherwise you try to get another number from the input/STANDARD CONSOLE
So your sequence
mov bx, offset choice ; here you set BX to the address of 'choice'
...
mov [bx], al ; because INT 21h does preserve BX, you ...
...
mov al, choice
essentially means, that you are setting BX to the address of 'choice', then setting 'choice'([BX]) to AL and copying it back to AL.
This is redundant.
After that, you compare that char to '+' and...
if that char equals to '+', you get the next char with call getNums and then continue with cont:.
if that char does not equal to '+', you compare it to 'x', the exit-char. If it's not 'x', you fall through to cont:
No error here.
So your problem with menu1 and menu2 may stem from some escape characters included in your strings like %,/,\. For example, % is a MACRO character in some assemblers which may create problems.

simple solution is that your strings should always end in '$'
change DUP(?) to DUP('$') and all other strings end with ,'$'

string comparison in 8086

I have problem with this question. I don't know what it wants from me.
Question : Write a procedure that compares a source string at DS:SI to a destination string at ES:DI and sets the flags accordingly. If the source is less than the destination, carry flag is set. if string are equal , the zero flag is set. if the source is greater than the destination , the zero and carry flags are both cleared.
My Answer :
MOV ESI , STRING1
MOV EDI, STRING2
MOV ECX, COUNT
CLD
REPE CMPSB
Still I am not sure about it. Is it true or should I try something else ?
p.s: I don't understand why people vote down this question. What is wrong with my question ? I think we are all here for learning. Or not ? Miss I something ?

If the problem statement says the pointers are already in SI and DI when you're called, you shouldn't clobber them.
16-bit code often doesn't stick to a single calling convention for all functions, and passing (the first few) args in registers is usually good (fewer instructions, and avoids store/reload). 32-bit x86 calling conventions usually use stack args, but that's obsolete. Both the Windows x64 and Linux/Mac x86-64 System V ABIs / calling conventions use register args.
The problem statement doesn't mention a count, though. So you're implementing strcmp for strings terminated by a zero-byte, rather than memcmp for known-length blocks of memory. You can't use a single rep instruction, since you need to check for non-equal and for end-of-string. If you just pass some large size, and the strings are equal, repe cmpsb would continue past the terminator.
repe cmpsb is usable if you know the length of either string. e.g. take a length arg in CX to avoid the problem of running past the terminator in both strings.
But for performance, repe cmpsb isn't fast anyway (like 2 to 3 cycles per compare, on Skylake vs. Ryzen. Or even 4 cycles per compare on Bulldozer-family). Only rep movs and rep stos are efficient on modern CPUs, with optimized microcode that copies or stores 16 (or 32 or 64) bytes at a time.
There are 2 major conventions for storing strings in memory: Explicit-length strings (pointer + length) like C++ std::string, and implicit length strings where you just have a pointer, and the end of string is marked by a sentinel / terminator. (Like C char* that uses a 0 byte, or DOS string-print functions that use '$' as a terminator.)
A useful observation is that you only need to check for the terminator in one of the strings. If the other string has a terminator and this one doesn't, it will be a mismatch.
So you want to load a byte into a register from one string, and check it for the teminator and against memory for the other string.
(If you need to actually use ES:DI instead of just DI with the default DS segment base, you can use cmp al, [es: bx + di] (NASM syntax, adjust as needed like maybe cmp al, es: [bx + di] I think). Probably the question intended for you to use lodsb and scasb, because scasb uses ES:DI.)
;; inputs: str1 pointer in DI, str2 pointer in SI
;; outputs: BX = mismatch index, or one-past-the-terminators.
;; FLAGS: ZF=1 for equal strings (je), ZF=0 for mismatch (jne)
;; clobbers: AL (holds str1's terminator or mismatching byte on return)
strcmp:
xor bx, bx
.innerloop: ; do {
mov al, [si + bx] ; load a source byte
cmp al, [di + bx] ; check it against the other string
jne .mismatch ; if (str1[i] != str2[i]) break;
inc bx ; index++
test al, al ; check for 0. Use cmp al, '$' for a $ terminator
jnz .innerloop ; }while(str1[i] != terminator);
; fall through (ZF=1) means we found the terminator
; in str1 *and* str2 at the same position, thus they match
.mismatch: ; we jump here with ZF=0 on mismatch
; sete al ; optionally create an integer in AL from FLAGS
ret
Usage: put pointers in SI/DI, call strcmp / je match, because the match / non-match state is in FLAGS. If you want to turn the condition into an integer, 386 and later CPUs allow sete al to create a 0 or 1 in AL according to the equals condition (ZF==1).
Using sub al, [mem] instead of cmp al, [mem], we'd get al = str1[i] - str2[i], giving us a 0 only if the strings matched. If your strings only hold ASCII values from 0..127, that can't cause signed overflow, so you can use it as a signed return value that actually tells you which string sorts before/after the other. (But if there could be high-ASCII 128..255 bytes in the string, we'd need to zero- or sign-extend to 16-bit first to avoid signed overflow for a case like (unsigned)5 - (unsigned)254 = (signed)+7 because of 8-bit wraparound.
Of course, with our FLAGS return value, the caller can already use ja or jb (unsigned compare results), or jg / jl if they want to treat the strings as holding signed char. This works regardless of the range of input bytes.
Or inline this loop so jne mismatch jumps somewhere useful directly.
16-bit addressing modes are limited, but BX can be a base, and SI and DI can both be indices. I used an index increment instead of inc si and inc di. Using lodsb would also be an option, and maybe even scasb to compare it to the other string. (And then check the terminator.)
Performance
Indexed addressing modes can be slower on some modern x86 CPUs, but this does save instructions in the loop (so it's good for true 8086 where code-size matters). Although to really tune for 8086, I think lodsb / scasb would be your best bet, replacing the mov load and cmp al, [mem], and also the inc bx. Just remember to use cld outside the loop if your calling convention doesn't guarantee that.
If you care about modern x86, use movzx eax, byte [si+bx] to break the false dependency on the old value of EAX, for CPUs that don't rename partial registers separately. (Breaking the false dep is especially important if you use sub al, [str2] because that would turn it into a 2-cycle loop-carried dependency chain through EAX, on CPUs other than PPro through Sandybridge. IvyBridge and later doesn't rename AL separately from EAX, so mov al, [mem] is a micro-fused load+merge uop.)
If the cmp al,[bx+di] micro-fuses the load, and macro-fuses with jne into one compare-and-branch uop, the whole loop could be only 4 uops total on Haswell, and could run at 1 iteration per clock for large inputs. The branch mispredict at the end will make small-input performance worse, unless branching goes the way way every time for a small enough input. See https://agner.org/optimize/. Recent Intel and AMD can do 2 loads per clock.
Unrolling could amortize the inc bx cost, but that's all. With a taken + not-taken branch inside the loop, no current CPUs can run this faster than 1 cycle per iteration. (See Why are loops always compiled into "do...while" style (tail jump)? for more about the do{}while loop structure). To go faster we'd need to check multiple bytes at once.
Even 1 byte / cycle is very slow compared to 16 bytes per 1 or 2 cycles with SSE2 (using some clever tricks to avoid reading memory that might fault).
See https://www.strchr.com/strcmp_and_strlen_using_sse_4.2 for more about using x86 SIMD for string compare, and also glibc's SSE2 and later optimized string functions.
GNU libc's fallback scalar strcmp implementation looks decent (translated from AT&T to Intel syntax, but with the C preprocessor macros and stuff left in. L() makes a local label).
It only uses this when SSE2 or better isn't available. There are bithacks for checking a whole 32-bit register for any zero bytes, which could let you go faster even without SIMD, but alignment is a problem. (If the terminator could be anywhere, you have to be careful when loading multiple bytes at once to not read from any memory pages that you aren't sure contain at least 1 byte of valid data, otherwise you could fault.)
strcmp:
mov ecx,DWORD PTR [esp+0x4]
mov edx,DWORD PTR [esp+0x8] # load pointer args
L(oop): mov al,BYTE PTR [ecx] # movzx eax, byte ptr [ecx] would be avoid a false dep
cmp al,BYTE PTR [edx]
jne L(neq)
inc ecx
inc edx
test al, al
jnz L(oop)
xorl eax, eax
/* when strings are equal, pointers rest one beyond
the end of the NUL terminators. */
ret
L(neq): mov eax, 1
mov ecx, -1
cmovb eax, ecx
ret

display different attributes of character in a string in assembly

I would like to know if its possible to change the attributes of each character in a string?
For example in the string "hello" the character 'h' will have a different color, the same with 'e' and so on.
I use AH, 06 to call every character in the string. Then use AH, 09 INT 10h to change the attribute of each character but then its not working.
I want to know how can AL (in AH, 09) get the DL (AH, 06) and change the attribute of every character.
is this possible?
thanks for the help
here's my code
`
.DATA
hello DB "hello$"
.CODE
START:
MOV AX, #DATA
MOV DS, AX
LEA SI, hello
MOV CX, 0005H
E: MOV AH, 06H
MOV DL, [SI]
INC SI
;INT 21H
LOOP E
MOV CX, 0005H
MOV AH, 09H
MOV AL, [SI]
INC SI
MOV BL, 0001H
H: INT 10H
INC BL
LOOP H
MOV AX, 4C00H
INT 21H
END START `

First off this code is not Windows, it's 16-bit DOS code that calls the BIOS video routines.
The main body calls INT 10H, the documentation for that call is here: http://en.wikipedia.org/wiki/INT_10H
For int 10H,9 this is the relevant line:
Write character and attribute at cursor position AH=09h AL = Character, BH = Page Number, BL = Color, CX = Number of times to print character
This means there are a couple of errors you're making:
You cannot use CX as a loop counter, because it's a parameter for the call.
The color goes into bl so don't hardcode that.
bh is the page number, but you're not setting bh anywhere.
Increasing bl and then later resetting it back to 1 will obviously fix it at 1.
You've already increased si through the whole length of the string in the first loop, so in the second loop you're reading past the end of the string (a classic buffer overrun). At the start of the second loop you need to repeat the lea.
Ever since the 80486 using loop is a bad idea because it's much slower than the equivalent sub reg,1; jnz label; besides loop is tied to the cx register which is awkward.
If you're using bios int calls speed is hardly a requirement, but that's not the point.
If you want to learn x86 assembly you should also learn not to use the old cisc instructions on new processors.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string