Strange execution times on string reversal

Last week I read this answer and its accompanying comments about a string
reversal routine.
I decided to test it myself and stumbled upon an apparent anomaly.
For some reason the second routine performs very differently depending on
whether the string is shorter than 128 bytes or not.
UsingXchg               UsingMov                UsingBswap

    add eax, ebx            add eax, ebx            mov edx, eax
    jmp B                   jmp B                   add eax, ebx
A:  mov dl, [ebx]       A:  mov dl, [ebx]           jmp B
    xchg dl, [eax]          mov cl, [eax]       A:  sub eax, 4
    mov [ebx], dl           mov [eax], dl           mov esi, [ebx]
    inc ebx                 mov [ebx], cl           mov edi, [eax]
B:  dec eax                 inc ebx                 bswap esi
    cmp ebx, eax        B:  dec eax                 bswap edi
    jb A                    cmp ebx, eax            mov [eax], esi
                            jb A                    mov [ebx], edi
                                                    add ebx, 4
                                                B:  sub edx, 8
                                                    jnb A
                                                    jmp D
On entry: EBX is address                        C:  mov dl, [ebx]
          EAX is length                             mov cl, [eax]
                                                    mov [eax], dl
                                                    mov [ebx], cl
                                                    inc ebx
                                                D:  dec eax
                                                    cmp ebx, eax
                                                    jb C
Next come the times (in msec) I measured for a large number of repeats.
StringSize   UsingXchg   UsingMov   UsingBswap
        26        30.7       17.4          6.4
        52        60.9       33.4         12.2
        78        90.5       49.4         17.9
       104       121.9       65.4         22.7
                         /------\
       127               | 79.0 |
       128               | 27.3 |
                         \------/
       130       152.1       27.6         27.9
       156       181.9       30.7         33.5
       182       211.8       34.4         39.2
       208       241.8       37.2         44.0
       260       301.6       43.6         54.9
      2600      2996.5      337.2        538.4
     26000     29949.2     3226.5       5350.4
Both UsingXchg and UsingBswap scale regularly with the string size, but
UsingMov clearly shifts gear at the 128-byte mark.
I suspect caches are at the heart of this little problem, but then why don't
all 3 routines show the same effect?
I'd like to conclude that the best string reversal routine is UsingBswap
with a bypass in case the string has more than 127 bytes.
Would this be a valid conclusion at all?
    mov edx, eax    ;Remaining string length
    add eax, ebx    ;Turn EAX into a pointer
    cmp edx, 127
    ja D            ;Don't use Bswap on long strings
    jmp B
A:  sub eax, 4
    ...
D:  dec eax
    cmp ebx, eax
    jb C            ;Still 2 bytes left
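To make the proposed bypass easier to reason about, here is a minimal Python sketch of the same control flow. It only models what the routines do to the bytes, not how fast they run. The function names and the THRESHOLD constant are hypothetical; the cutoff of 127 is taken from the measurements above and would presumably differ on other machines.

```python
THRESHOLD = 127  # assumption: cutoff taken from the timings above, machine-specific


def reverse_bytewise(buf: bytearray) -> None:
    """Swap single bytes from both ends inward (the UsingMov idea)."""
    lo, hi = 0, len(buf) - 1
    while lo < hi:
        buf[lo], buf[hi] = buf[hi], buf[lo]
        lo += 1
        hi -= 1


def reverse_dwordwise(buf: bytearray) -> None:
    """Swap 4-byte groups from both ends, reversing each group (the bswap idea)."""
    lo, hi = 0, len(buf)
    # Main loop: runs while at least 8 bytes remain between the cursors.
    while hi - lo >= 8:
        front = buf[lo:lo + 4]
        back = buf[hi - 4:hi]
        buf[lo:lo + 4] = back[::-1]    # byte-reversed back group goes to the front
        buf[hi - 4:hi] = front[::-1]   # byte-reversed front group goes to the back
        lo += 4
        hi -= 4
    # Tail: fewer than 8 bytes are left in the middle, finish byte by byte.
    hi -= 1
    while lo < hi:
        buf[lo], buf[hi] = buf[hi], buf[lo]
        lo += 1
        hi -= 1


def reverse_hybrid(buf: bytearray) -> None:
    """Use the dword variant for short strings and bypass it for long ones."""
    if len(buf) > THRESHOLD:
        reverse_bytewise(buf)
    else:
        reverse_dwordwise(buf)
```

Calling reverse_hybrid(bytearray(b"Hello, world")) reverses the buffer in place; both branches give the same result, so the threshold only affects which loop does the work.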

Related

How to print in reverse in assembly language using DOSBox Debug?

This is one of the examples my instructor gave me but I need to find a way to reverse the order of what I input.
Here's the code:
e200 "Name: $"
e300 "Hello, $"
a100
mov ah, 09
mov dx, 200
int 21
mov bx, 400
mov ah, 01
int 21
mov [bx], al
inc bx
cmp bx, 405
jne 10a
mov cl, 24
mov [bx], cl
mov ah, 02
mov dl, 0a
int 21
mov dl, 0d
int 21
mov ah, 09
mov dx, 300
int 21
mov ah, 09
mov dx, 400
int 21
int 20
Output is:
Name: Maria
Hello, Maria
Expected Output (reverse):
Name: Maria
Hello, airaM
Tested on current lDebug and (creating the executable only) on Microsoft's Debug from MS-DOS version 2 (available under MIT license). This should solve your task.
f 100 4FF 90
a 100
mov ah, 09
mov dx, 500
int 21
mov ah, 0A
mov dx, 800
int 21
mov ah, 09
mov dx, 600
int 21
xor cx, cx
mov cl, [801]
mov bx, cx
add bx, 801
jcxz 190

a 130
mov ah, 02
mov dl, [bx]
int 21
dec bx
loop 130

a 190
mov ah, 09
mov dx, 700
int 21
mov ax, 4C00
int 21

e 800 FF 0 0D
e 500 "Name: $"
e 600 0D 0A "Hello, $"
e 700 0D 0A "$"
g
q
800 up to below 901 holds a maximum size buffer for interrupt 21h service 0Ah, which we initialise to FF (buffer size 255), 0 (nothing to recall), 0D (indicate end of recallable input).
500, 600, and 700 contain dollar-terminated messages to be used with interrupt 21h service 09h.
The output loop counter is initialised to CX = CL = length in bytes of input line, as returned by service 0Ah, excluding the final Carriage Return.
The offset to output is initialised to BX = 801 + CX, which is the same as 802 + CX - 1. 802 is the address of the first byte of the returned input. 802 + CX is the address behind the last byte of the returned input. The minus one serves as an adjustment to point at the last byte instead of behind it.
jcxz skips the loop for empty names.
mov dl, [bx] loads a byte into the low half of DX.
Interrupt 21h service 02h is used to output a byte.
dec bx decrements the offset stored in BX.
The loop 130 instruction decrements CX and jumps back if the resulting CX is non-zero.
The f command is to fill a part of the code segment with all nop instructions. Along with the different a commands this allows us to place and use fixed offsets for jump targets in the code, without having to know exactly how long each instruction will be.
Enter the name input, then indicate its end using the Enter key.
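The pointer arithmetic described above can be checked with a small Python model. This is only a sketch of the buffer layout and the output loop, not of DOS itself; make_input_buffer and reversed_output are hypothetical helper names, and indices are relative to the start of the buffer (so index 1 corresponds to address 801).

```python
def make_input_buffer(name: bytes, capacity: int = 255) -> bytearray:
    """Build the buffer as service 0Ah would leave it after `name` is typed."""
    if len(name) >= capacity:
        raise ValueError("input plus Carriage Return must fit in the buffer")
    buf = bytearray(2 + capacity)
    buf[0] = capacity              # byte 0: buffer size (FF in the script)
    buf[1] = len(name)             # byte 1: characters read, excluding the CR
    buf[2:2 + len(name)] = name    # bytes 2..: the input itself
    buf[2 + len(name)] = 0x0D      # terminating Carriage Return
    return buf


def reversed_output(buf: bytearray) -> bytes:
    """Collect the bytes the output loop would print."""
    count = buf[1]                 # CX = CL = [801]
    index = 1 + count              # BX = 801 + CX, i.e. the last input byte
    out = bytearray()
    while count != 0:              # jcxz skips the loop for an empty name
        out.append(buf[index])     # mov dl, [bx] then service 02h
        index -= 1                 # dec bx
        count -= 1                 # loop: decrement CX, repeat while non-zero
    return bytes(out)
```

For the name Maria the model walks from the final a back to the M, which is the order the loop at 130 prints the characters in.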
To generate an executable, prepend the command f 100 9FF 0, then instead of the g command run this:
r bx
0
r cx
900
n test.com
w 100
q

Why do add or imul give wrong results in nested loops in x86-64?

I am implementing at least 3 nested loops in x86-64 using the NASM assembler, and to see how they work I'm keeping a counter and printing its value on each iteration. The issue appears when the second iteration of the outer loop begins: the counter register does not seem to be reset to 0. I've tried mov rbp, 0, xor rbp, rbp and sub rbp, 10, but all results that should be less than 8 come out wrong; for example, a result that must be 0 is shown as 10. Here's my code:
test.asm
global main
section .data
newLine db 10,0
section .text
global _start
_start:
mov r12, 0
mov rbp, 0
jmp t1
b1:
mov r13, 0
jmp t2
b2:
mov r14, 0
jmp t3
b3:
printVal rbp
print newLine
inc r14
t3:
cmp r14, 3
jl b3
add rbp, 6
inc r13
t2:
cmp r13, 3
jl b2
inc r12
mov rbp, 0
xor rbp, rbp
print newLine
t1:
cmp r12, 2
jl b1
exit
Result
0
0
0
6
6
6
12
12
12
10
10
10
16
16
16
12
12
12
Expected Result
0
0
0
6
6
6
12
12
12
0
0
0
6
6
6
12
12
12
linux64.inc
section .bss
digitSpace resb 100
digitSpacePos resb 8
printSpace resb 8
%macro print 1
mov rax, %1
mov [printSpace], rax
mov rbx, 0
%%printLoop:
mov cl, [rax]
cmp cl, 0
je %%endPrintLoop
inc rbx
inc rax
jmp %%printLoop
%%endPrintLoop:
mov rax, 1
mov rdi, 0
mov rsi, [printSpace]
mov rdx, rbx
syscall
%endmacro
%macro printVal 1
mov rax, %1
%%printRAX:
mov rcx, digitSpace
mov [digitSpacePos], rcx
%%printRAXLoop:
mov rdx, 0
mov rbx, 10
div rbx
push rax
add rdx, 48
mov rcx, [digitSpacePos]
mov [rcx], dl
inc rcx
mov [digitSpacePos], rcx
pop rax
cmp rax, 0
jne %%printRAXLoop
%%printRAXLoop2:
mov rcx, [digitSpacePos]
mov rax, 1
mov rdi, 1
mov rsi, rcx
mov rdx, 1
syscall
mov rcx, [digitSpacePos]
dec rcx
mov [digitSpacePos], rcx
cmp rcx, digitSpace
jge %%printRAXLoop2
%endmacro
%macro exit 0
mov rax, 60
mov rdi, 0
syscall
%endmacro
sh
#!/bin/bash
nasm -f elf64 test.asm -o test.o
ld test.o -o test
./test
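One reading that reproduces the posted output exactly: the counter is reset correctly, but printVal starts printing one position past the last digit it stored, so a digit left over from an earlier, longer number is printed first. Below is a Python model of the macro's two loops as written in linux64.inc. It is a sketch under that assumption, not a trace of the real program; print_val and its skip_stale flag are hypothetical names.

```python
def print_val(value: int, digit_space: bytearray, skip_stale: bool = False) -> bytes:
    """Return the bytes the printVal macro would write for `value`.

    `digit_space` models the digitSpace buffer and keeps its contents
    between calls, just like the .bss variable does.
    """
    pos = 0
    # First loop: store digits least-significant first, advancing the position.
    while True:
        value, digit = divmod(value, 10)
        digit_space[pos] = 48 + digit
        pos += 1
        if value == 0:
            break
    # After the loop, pos is one PAST the last digit that was stored.
    if skip_stale:
        pos -= 1  # hypothetical fix: start at the last stored digit instead
    # Second loop: print from pos down to the start of the buffer.
    out = bytearray()
    while pos >= 0:
        out.append(digit_space[pos])
        pos -= 1
    return bytes(out)
```

With a fresh zero-filled buffer, the sequence 0, 6, 12, 0, 6, 12 first prints an invisible NUL byte before each number; once 12 has been printed, its leading 1 stays in the buffer and gets printed in front of the following 0 and 6, giving 10 and 16. Stepping the position back by one before the second loop, as the skip_stale flag does in the model, avoids the extra byte.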

Strange Nasm error: invalid combination of opcode and operands

I have an assembler file with this code:
global _start
_start:
mov eax, -2
imul eax, c
mov ebx, eax
mov eax, 82
imul eax, d
sub ebx, eax
div 4
mov eax, 1
mov ebx, 0
int 0x80
section .data
a: db 10
c: db 3
d: db 2
I compile it with:
nasm -f elf c1.asm
I get an error:
c1.asm:15: error: invalid combination of opcode and operands
What is the problem with my code, and how can I fix it?
mov eax, -2
imul eax, c
mov ebx, eax
mov eax, 82
imul eax, d
sub ebx, eax
*** Here's something amiss
div 4
These first 6 lines place a result in EBX, but the div instruction will always use the EAX register and certainly not the EBX register! You need to move the result from EBX to EAX with mov eax, ebx prior to doing the division.
div 4
The div instruction doesn't take an immediate for an operand! Even if it did, you would still have to supply some info about the size of the operation.
This solution keeps the division:
xor edx, edx
div dword [four] ;divide EDX:EAX by 4 -> quotient in EAX
four dd 4
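A much better solution is to not divide at all and just shift EAX to the right 2 times (div and shr both treat EAX as unsigned; if the value is signed, as it is here, the signed counterparts are idiv and sar):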
shr eax, 2
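Since the expression in the question produces a negative number, it may help to see how the unsigned and signed variants differ. The following Python sketch models 32-bit arithmetic under the assumption that the intended values are c = 3 and d = 2; the helper names are hypothetical, and idiv and sar are shown only for comparison with the div and shr above.

```python
MASK32 = 0xFFFFFFFF


def to_u32(x: int) -> int:
    """Interpret the low 32 bits of x as an unsigned value."""
    return x & MASK32


def to_s32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def div_unsigned(eax: int, divisor: int) -> int:
    """Model of `xor edx, edx` followed by `div`: unsigned quotient."""
    return to_u32(eax) // divisor


def idiv_signed(eax: int, divisor: int) -> int:
    """Model of `cdq` followed by `idiv`: signed quotient, truncated toward zero."""
    n = to_s32(eax)
    q = abs(n) // abs(divisor)
    return -q if (n < 0) != (divisor < 0) else q


def shr(eax: int, count: int) -> int:
    """Logical shift right: zeros are shifted in from the left."""
    return to_u32(eax) >> count


def sar(eax: int, count: int) -> int:
    """Arithmetic shift right: the sign bit is replicated (rounds down)."""
    return to_s32(eax) >> count


value = -2 * 3 - 82 * 2   # the expression from the question, with c = 3 and d = 2
```

Here value is -170. The unsigned forms (div_unsigned(value, 4) and shr(value, 2)) both see the bit pattern as a large positive number, while idiv_signed(value, 4) and sar(value, 2) keep the sign but round differently: idiv truncates toward zero, sar rounds toward negative infinity.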
imul eax, c
...
imul eax, d
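These imul instructions assemble fine, but as written they multiply by the addresses of c and d. To multiply by the values stored there, use memory operands and define the variables as dwords: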
imul eax, [c]
...
imul eax, [d]
c dd 3
d dd 2

How to compute the product of six numbers in assembly language

I am new to assembly and I am trying to create a program that simply computes the product 1 * 2 * 3 * 4 * 5 * 6 and stores the result in the AL register. I am told that I can accomplish this in a single statement.
This is what I have so far:
MOV product, 1 * 2 * 3 * 4 * 5 * 6
MOV al, product
However, this produces the message error A2070: inval on the first line.
I also tried to do it like this:
IMUL AH, 1, 2
IMUL BH, 3, 4
IMUL BL, 5, 6
IMUL CL, BH, BL
IMUL AL, CL, AH
But every line of that produces an error referring to the size of the arguments being different.
Could someone tell me the best way I can accomplish computing this product?
Note that 1*2*3*4*5*6 = 720, so it won't fit into an 8-bit register; you should use a 16-bit one, such as ax. If you are allowed to use compile-time multiplication, then of course mov ax, 1*2*3*4*5*6 should work. The assembler simply turns that into mov ax, 720, no surprise there.
As for the second version, IMUL does not accept two immediate operands. If you want to use this approach, you will need something like this:
mov ax, 2
imul ax, ax, 3
imul ax, ax, 4
imul ax, ax, 5
imul ax, ax, 6
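The register-size point can be illustrated with a short Python sketch that keeps only as many bits as the register holds after every multiplication. The function name product_in_register is hypothetical; the masking stands in for what a fixed-width register does to an overflowing result.

```python
def product_in_register(factors, bits: int) -> int:
    """Multiply the factors, truncating to `bits` bits after every step."""
    mask = (1 << bits) - 1
    acc = 1
    for factor in factors:
        acc = (acc * factor) & mask  # only the low `bits` bits survive
    return acc


in_ax = product_in_register(range(1, 7), 16)  # fits: 720 is below 65536
in_al = product_in_register(range(1, 7), 8)   # truncated: 720 mod 256
```

With 16 bits the result is the full 720; with 8 bits only 208 is left, which is why AL cannot hold the product.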

This program is incomplete, but I'm supposed to ask the user for 10 2-digit numbers and then give the average

However, I always get a segmentation fault. I don't know what that means or how to approach it.
Can someone please explain?
segment .data
;welcome message
welcomeMsg db "This NASM program calculates the average of 10 2-digit numbers", 0xA
msgLength equ $-welcomeMsg
endingMsg db "The average is: ", 0xA
endingLength equ $-endingMsg
numbers db "enter 10 2-digit numbers",0xA
numbersLength equ $-numbers
segment .bss
arrayOfNumber: resb 30 ;store the number entered by the user in arrayOfNumber
arrayOfNumberLength equ $-arrayOfNumber
total : resw 1 ; store the sum value of 10 numbers
segment .text
global _start
_start:
mov eax, 4 ;write()
mov ebx, 1 ;where? output
mov ecx, welcomeMsg ; what do we want to output
mov edx, msgLength ; the length of welcomeMsg
int 0x80
mov eax, 4
mov ebx, 1
mov ecx, numbers
mov edx, numbersLength
int 0x80
;Accept input and store it into arrayOfNumber
sub esi, esi
mov esi, 10
loop1:
mov eax, 3 ;read
mov ebx, 0 ; from the standard input(keyboard/console)
mov ecx, arrayOfNumber ; storing at memory location arrayOfNumber
mov edx, 3 ;3 bytes
int 0x80
dec esi
jnz loop1 ;if esi is not equal to zero goto loop1
sub esi, esi
sub edi,edi
sub ecx, ecx
sub ebx, ebx
loop2:
sub eax, eax
mov eax, [arrayOfNumber+edi]
sub eax, 30
add edi, 1
imul eax, 10
mov ebx, [arrayOfNumber+edi]
sub ebx, 30
add edi, 2
add ebx, eax
add ecx, ebx
cmp edi, 30
jne loop2
sub edi, edi
mov [total], ecx
mov edi, 2
mov ax, [total]
convert:
mov bx, 10
sub dx, dx
div bx
add dl, '0'
mov [total + edi], dl
dec edi
cmp ax, 0
jne convert
print:
mov eax, 4
mov ebx, 1
mov ecx, total
mov edx, 3
int 0x80
mov eax, 1
mov ebx, 0
int 0x80
This is the whole program and its output. I no longer get the segmentation fault, but I do not get the right sum of the following numbers.
This NASM program calculates the average of 10 2-digit numbers
enter 10 2-digit numbers
12
12
12
12
12
12
12
12
12
12
264
However, I should get 120 as the sum of these integers, not 264. Could you help me, please?
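For comparison, here is a Python sketch of what loop2 appears to be meant to compute. It assumes the ten entries are stored one after another, each as two ASCII digits followed by a newline; parse_two_digit and sum_entries are hypothetical names. As far as I can tell, the posted code differs from this in three ways: it subtracts decimal 30 instead of the character '0' (which is 48, or 0x30), it loads four bytes at a time into eax and ebx instead of one, and the read loop stores every entry at the same address.

```python
def parse_two_digit(entry: bytes) -> int:
    """Convert two ASCII digits to their numeric value."""
    tens = entry[0] - ord("0")   # '0' is 48 decimal (0x30), not 30
    ones = entry[1] - ord("0")
    return tens * 10 + ones


def sum_entries(buffer: bytes, count: int = 10, stride: int = 3) -> int:
    """Sum `count` two-digit entries laid out `stride` bytes apart."""
    total = 0
    for i in range(count):
        start = i * stride       # each entry: two digits plus a newline
        total += parse_two_digit(buffer[start:start + 2])
    return total


entries = b"12\n" * 10           # the input from the sample run
total = sum_entries(entries)
average = total // 10
```

For the sample input this gives a sum of 120 and an average of 12.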
