Most effective code to search character in string - search

Let's say we have a given string of chars
DataString DB 'AGIJKSZ', 0FFH ;
What would be the most time-effective procedure to find let's say J in it?
By time-effective I mean least amount of clock ticks.
It's a x86 processor with these instruction sets:
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, EM64T, VT-x, AES, AVX, AVX2, FMA3, TSX
Let's assume that both string and searched character can be changed but only via editing the code, and we're always looking for a single character. The string is ASCII. End of string is marked by FF
Answer should be just setting EAX to found 1 / not found 0.
This is what I can think of
FindChar_1 PROC
MOV ESI, OFFSET DataString ;
SI
MOV AH, 'J' ;
Check_End:
CMP BYTE PTR [ESI], 0FFH ;
JE Not_Find ;
CMP AH, [ESI] ;
'DataString'
JE Got_Equal ;
ADD ESI, 1 ;
JMP Check_End ;
Got_Equal:
MOV DL, [ESI] ;
JMP Done
Not_Find:
MOV EAX,0 ;
RET ;
Done:
MOV EAX,1 ;
RET ;
FindChar_1 ENDP
EDIT:
Now I realize that there I something else I should have mentioned. I'm using masm32 so instructions I can use are limited to the very basic ones.

When you need the code be fast, avoid memory access, jumps and complex instructions. I would scan the string twice: once to find the end marker, and then to find the searched character:
FindChar_2 PROC
MOV ESI, OFFSET DataString
XOR EAX,EAX
XOR ECX,ECX
XOR EDX,EDX
NOT EAX ; Let AL=EndOfString marker.
NOT ECX ; Let ECX=Max.integer.
MOV EDI,ESI
CLD
REPNE SCASB ; ECX -= String size (-9 in this example).
NOT ECX ; ECX= String size (8).
MOV AL,'J' ; The searched needle.
MOV EDI,ESI ; Restore the haystack pointer.
REPNE SCASB ; The actual search.
; It returns CF if not found, because the needle is below 0FFH.
CMC ; Invert the logic.
MOV EAX,EDX ; Return false.
ADC EAX,EDX ; Return true if found.
RET
FindChar_2 ENDP

Related

Split string in assembly

I'm using a code for split a string with a delimiter, but it save me the "right side" and I need the "left side" of the word.
For example, if the input is 15,20,x, then the outputs should be:
15
20
x
But it show me:
15,20,x
20,x
x
This is the code that I'm using
split:
mov esi,edi
mov ecx, 0
not ecx
mov al, ','
cld
repne scasb
not ecx
lea eax, [ecx-1]
;push eax
mov eax, edi
call println ;Here is where I want to print the "left side"
;pop eax
xchg esi, edi
mov eax, esi
call length
mov edi, esi
mov ecx, eax
cmp ecx, 1
je fin
jg split
ret
fin:
ret
After repne scasb the contents of ECX has changed from -1 to -4, you need to NOT ECX and then DEC ECX to obtain ECX=2 (size of the member "15"). Then println ECX bytes of the text at ESI and repeat split: There is a rub: as the last member "x" is not terminated with comma, repne scasb will crash. You should limit ECX to the total size of input text prior to scan. I tried this variant with EuroAssembler:
; Assembled on Ubuntu with
; wine euroasm.exe split.asm
; Linked with
; ld split.obj -o split -e Main -m elf_i386
; Run with
; ./split
EUROASM
split PROGRAM Format=COFF, Entry=Main:
INCLUDE linapi.htm,cpuext32.htm ; Library which defines StdInput,StdOutput.
[.text]
Main: StdOutput ="Enter comma-separated members: "
StdInput aString ; Read aString from console, return number of bytes in ECX.
MOV EDI,aString ; Pointer to the beginning of text.
LEA EBX,[EDI+ECX] ; Pointer to the end of text.
split: MOV ESI,EDI ; Position of the 1st byte.
MOV ECX,EBX
SUB ECX,EDI ; How many bytes is left in unparsed substring.
JNA fin:
MOV AL,','
REPNE SCASB
MOV ECX,EDI
DEC ECX ; Omit the delimiter.
SUB ECX,ESI
StdOutput ESI, Size=ECX, Eol=Yes
JMP split:
fin: TerminateProgram
[.bss]
aString DB 256 * BYTE
ENDPROGRAM split
And it worked well:
./split
Enter comma-separated members: 15,20,x
15
20
x

Disregard a strings' space characters in ASM

Trying to find the number of characters in a string and disregard all the " " space characters
I have a C++ portion that passes the strings to asm and here is my asm
works fine, only thing is that the space characters are being counted as well.
stringLength PROC PUBLIC
PUSH ebp ; save caller base pointer
MOV ebp, esp ; set our base pointer
SUB esp, (1 * 4) ; allocate uint32_t local vars
PUSH edi
PUSH esi
; end prologue
MOV esi, [ebp+8] ;gets the string
xor ebx, ebx
COMPARE:
MOV al, [esi + ebx]
CMP al, 0 ;compare character of string with 0
JE FINALE ;if = to 0 go to end
INC ebx ;counter
CMP al, ' ' ;compare with sapce
JE SPACE ;go get rid of the space and keep going
INC al ;otherwise inc al to next character and repeat
JMP COMPARE
SPACE:
DEC ebx ;get rid of the extra space
INC al
JMP COMPARE ;goes back to compare
FINALE:
MOV eax,ebx ; bring back the counter
ADD esp, (2 * 4) ; clear the stack
POP esi
POP edi
MOV esp, ebp ; deallocate locals
POP ebp ; restore caller base pointer
RET
stringLength ENDP ; end the procedure
END stringLength
You are doing a lot of useless stuff and not doing anything to count ignoring the spaces.
You don't really need to setup a new stack frame, for such a simple routine you can do everything in clobbered registers, or at most save a few registers on the stack;
That inc al is pointless - you are incrementing the character value, just to discard it at the next loop iteration.
push fmt and then you clean the stack immediately? What sense does it make?
mov ebx, 0- nobody does that, the idiomatic way to zero a register is xor ebx,ebx (the instruction encoding is more compact);
cmp al, 0 given that you are only interested in equality, you can just do test al, al (more compact);
you read [ebp+12] but never actually use it - is that supposed to be an unused parameter?
As for the algorithm itself, you'll just have to keep a separate counter to count non-space characters; actually, you can just keep ebx for that, and increment directly esi to iterate over characters. For example:
xor ebx, ebx
COMPARE:
mov al, [esi]
cmp al, ' '
jne nonspace
inc ebx
nonspace:
test al, al
jz FINALE
inc esi
jmp COMPARE
FINALE:
Now, this can be streamlined further exploiting the fact that the eax is going to be the return value, and that you can clobber freely ecx and edx, so:
stringLength PROC PUBLIC
mov ecx,[esp+4] ; get the string
xor eax,eax ; zero the counter
compare:
mov dl,[ecx] ; read character
cmp dl,' '
jne nospace
inc eax ; increase counter if it's a space
nospace:
test dl,dl
jz end ; go to end if we reached the NUL
inc ecx ; next character
jmp compare
end:
ret ; straight return, nothing else to do
stringLength ENDP ; end the procedure
edit: about the updated version
COMPARE:
MOV al, [esi + ebx]
CMP al, 0
JE FINALE
INC ebx
CMP al, " " ; I don't know what assembler you are using,
; but typically character values are in single quotes
JE SPACE
INC al ; this makes no sense! you are incrementing
; the _character value_, not the position!
; it's going to be overwritten at the next iteration
JMP COMPARE
SPACE:
INC eax ; you cannot use eax as a counter, you are already
; using it (al) as temporary store for the current
; character!
JMP COMPARE
I think we need to use whole the eax register to compare values. In such manner:
; inlet data:
; eax - pointer to first byte of string
; edx - count of bytes in string
; ecx - result (number of non-space chars)
push esi
mov ecx, 0
mov esi, eax
##compare: cmp edx, 4
jl ##finalpass
mov eax, [esi]
xor eax, 20202020h ; 20h - space char
cmp al, 0
jz ##nextbyte0
inc ecx
##nextbyte0: cmp ah, 0
jz ##nextbyte1
inc ecx
##nextbyte1: shr eax, 16
cmp al, 0
jz ##nextbyte2
inc ecx
##nextbyte2: cmp ah, 0
jz ##nextbyte3
inc ecx
##nextbyte3: add esi, 4
sub edx, 4
jmp ##compare
##finalpass: and edx, edx
jz ##fine
mov al, [esi]
cmp al, 20h
jz ##nextbyte4
inc ecx
##nextbyte4: inc esi
dec edx
jmp ##finalpass
##fine: pop esi
; save the result data and restore stack

Comparing character with Intel x86_64 assembly

I'm new to assembly (Intel x86_64) and I am trying to recode some functions from the C library. I am on a 64-bit Linux and compiling with NASM.
I have an error with the strchr function and I can't find a solution...
As a reminder here is an extract from the man page of strchr :
char *strchr(const char *s, int c);
The strchr() function returns a pointer to the first occurrence of the character c in the string s.
Here is what I tried :
strchr:
push rpb
mov rbp, rsp
mov r12, rdi ; get first argument
mov r13, rsi ; get second argument
call strchr_loop
strchr_loop:
cmp [r12], r13 ; **DON'T WORK !** check if current character is equal to character given in parameter...
je strchr_end ; go to end
cmp [r12], 0h ; test end of string
je strchr_end ; go to end
inc r12 ; move to next character of the string
jmp strchr_loop ; loop
strchr_end
mov rax, r12 ; set function return
mov rsp, rbp
pop rbp
This return a pointer on the ned of the string and don't find the character...
I think it's this line which doesn't work :
cmp [r12], r13
I tested with this and it worked :
cmp [r12], 97 ; 97 = 'a' in ASCII
The example :
char *s;
s = strchr("blah", 'a');
printf("%s\n", s);
Returned :
ah
But I can't make it work with a register comparison. What am I doing wrong, and how can I fix it?
First, thanks for your help ! I think I have a better understanding of what I am doing.
I was stuck with the problem of receiving a 8bits parameter instead of the 64bits rdi... But a friend shows me that the first 8bit parameter is also in the sil register.
So here's my working code :
strchr:
push rpb
mov rbp, rsp
call strchr_loop
strchr_loop:
cmp byte [rdi], sil ; check if current character is equal to character given in parameter
je strchr_end ; go to end
cmp byte [rdi], 0h ; test end of string
je strchr_end ; go to end
inc rdi ; move to next character of the string
jmp strchr_loop ; loop
strchr_end
mov rax, rdi ; set function return
mov rsp, rbp
pop rbp
Please feel free to tell me if there is a way to improve it and thanks again !
Here is a fix for your assembly code, which implements the strchr(3), in x86-64 Assembly as defined in the man pages:
asm_strchr:
push rbp
mov rbp, rsp
strchr_loop:
cmp byte [rdi], 0 ; test end of string
je fail_end ; go to end
cmp byte [rdi], sil ; check if current character is equal to character given in parameter
je strchr_end ; go to end
inc rdi ; move to next character of the string
jmp strchr_loop ; loop
strchr_end:
mov rax, rdi ; set function return
mov rsp, rbp
pop rbp
ret
fail_end:
mov rax, 0
mov rsp, rbp
pop rbp
ret

replace a string in assembly

I wanna get a source string ,find a key in it and replace the key with a replace string so i copy the rest of source and the replace string in the result .
it outputs the correct prompt when the key doesnt exist in the source string : "The key does not appear in the string."
but when the source contains the key it stucks and doesnt continue running
(it looks sth in found label part have been missed and have an overflow)
can anyone help to correct the found part ?
any help will be appreciate :)
; program to search for one string embedded in another
; author: R. Detmer revised: 10/97
.386
.MODEL FLAT
ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD
INCLUDE io.h
cr EQU 0dh ; carriage return character
Lf EQU 0ah ; linefeed character
.STACK 4096 ; reserve 4096-byte stack
.DATA
prompt1 BYTE "String to search? ", 0
prompt2 BYTE cr, Lf, "Key to search for? ", 0
prompt3 BYTE cr, Lf, "Word to replace? ", 0
source BYTE 100 DUP (?)
key BYTE 20 DUP (?)
replace BYTE 20 DUP (?)
srcLength DWORD ?
keyLength DWORD ?
repLength DWORD ?
BeginLength DWORD ?
restLength DWORD ?
cpyLength DWORD ?
lastPosn DWORD ?
restPosition DWORD ?
firstParam DWORD ?
secondParam DWORD ?
keyPosition DWORD ?
failure BYTE cr,Lf,Lf,"The key does not appear in the string.",cr,Lf,0
success BYTE cr,Lf,Lf, " The result string is : " ,cr,Lf,Lf
result BYTE 200 DUP (?)
PUBLIC _start ; make entry point public
.CODE
_start: output prompt1 ; ask for
input source,100 ; and input source string
lea eax, source ; find length of string
push eax ; length parameter
call strlen
mov srcLength,eax ; save length of source
output prompt2 ; ask for
input key,20 ; and input key string
lea eax, key ; find length of string
push eax ; length parameter
call strlen
mov keyLength,eax ; save length of key
output prompt3 ; ask for
input replace,20 ; and input replace string
lea eax, replace ; find length of string
push eax ; length parameter
call strlen
dec eax
mov repLength,eax ; save length of replace
; calculate last position of source to check
mov eax,srcLength
sub eax,keyLength
inc eax ; srcLength − keyLength + 1
mov lastPosn, eax
cld ; left to right comparison
mov eax,1 ; starting position
whilePosn: cmp eax,lastPosn ; position <= last_posn?
jnle endWhilePosn ; exit if past last position
lea esi,source ; address of source string
add esi,eax ; add position
dec esi ; address of position to check is incremented automatically
lea edi,key ; address of key
mov ecx,keyLength ; number of positions to check
repe cmpsb ; check
jz found ; exit on success
inc eax ; increment position
jmp whilePosn ; repeat
endWhilePosn:
output failure ; the search failed
jmp quit ; exit
;-------------------------------------------------------------
found:
mov keyPosition, eax ; position of key
mov ebx, eax ;copy start position of key
lea eax, source
sub ebx, eax ;position - source address
mov BeginLength, ebx ;begin Source length (before key)
add ebx, keyLength
mov eax, srcLength
sub eax, ebx
mov restLength, eax ;rest of Source length (after key)
mov eax, keyPosition
add eax, keyLength ; position + key
mov restPosition, eax
;source begin to result
lea eax, result
mov firstParam, eax ; destination address
lea eax, source
mov secondParam, eax
mov eax, BeginLength ; copy length
mov cpyLength, eax
mov esi,firstParam ;initial source address
mov edi,secondParam ;destination
mov ecx ,cpyLength
rep movsb ;copy bytes
;replace to result
mov eax, firstParam
add eax , BeginLength
mov firstParam, eax ; address of rest of result
lea eax, replace
mov secondParam, eax ; string to replace
mov eax, repLength ; copy length
mov cpyLength, eax
mov esi,firstParam ;initial source address
mov edi,secondParam ;destination
mov ecx ,cpyLength
rep movsb ;copy bytes
;Rest to result
mov eax, firstParam
add eax , repLength
mov firstParam, eax ; address of rest of result
mov eax, restPosition
mov secondParam, eax
mov eax, restLength
mov cpyLength, eax
mov esi,firstParam ;initial source address
mov edi,secondParam ;destination
mov ecx ,cpyLength
rep movsb ;copy bytes
mov BYTE PTR [edi],0 ;terminate destination string
output success
quit:
INVOKE ExitProcess, 0 ; exit with return code 0
;----------------------------------------------------------
strlen PROC NEAR32
; find length of string whose address is passed on stack
; length returned in EAX
push ebp ; establish stack frame
mov ebp, esp
pushf ; save flags
push ebx ; and EBX
sub eax, eax ; length := 0
mov ebx, [ebp+8] ; address of string
whileChar: cmp BYTE PTR [ebx], 0 ; null byte?
je endWhileChar ; exit if so
inc eax ; increment length
inc ebx ; point at next character
jmp whileChar ; repeat
endWhileChar:
pop ebx ; restore registers and flags
popf
pop ebp
ret 4 ; return, discarding parameter
strlen ENDP
END
found:
mov keyPosition, eax ; position of key
mov ebx, eax ;copy start position of key
lea eax, source
sub ebx, eax ;position - source address
mov BeginLength, ebx ;begin Source length (before key)
In these lines you have subtracted things that cannot be subtracted.
When you get at the label found, EAX has a 1-based relative position index that you copy to the EBX register. This value ranges from 1 to 100. Now you subtract the absolute address of your source buffer. This could be in the millions. That's clearly a mistake. It becomes disastrous when later on you use it as a loop counter and start corrupting memory!
success BYTE cr,Lf,Lf, " The result string is : " ,cr,Lf,Lf
You forgot to zero-terminate the success message.
It will disrupt your final macro call output success and so it would seem that the program didn't correctly replace the string.

Move to end of string within buffer - Assembly Languge

I am trying to take in a string and then see if the last value in the string is an EOL character. I figured I would use the length of the string read in and then add it to the address of the buffer to find the last element. This does not seem to work.
Edit: I apologize that I did not include more information. Variables are defined as such:
%define BUFLEN 256
SECTION .bss ; uninitialized data section
buf: resb BUFLEN ; buffer for read
newstr: resb BUFLEN ; converted string
rlen: resb 4
Then a dos interrupt is called to accept a string from the user like so:
; read user input
;
mov eax, SYSCALL_READ ; read function
mov ebx, STDIN ; Arg 1: file descriptor
mov ecx, buf ; Arg 2: address of buffer
mov edx, BUFLEN ; Arg 3: buffer length
int 080h
Then we go into our loop:
test_endl:
mov ecx, [rlen]
mov esi, buf
add esi, ecx ; i want to move 'rlen' bytes into buf
mov al, [esi] ; to see what the last element is
cmp al, 10 ; compare it to EOL
jnz L1_init
dec ecx ; and then decrease 'rlen' if it is an EOL
mov [rlen], ecx\
I am user NASM to compile and writing for an i386 machine.
Adding the length of the string to the address of the buffer gives access to the byte behind the string.
Based on you saying that
you want to see if the last value in the string is an EOL character
you aim to decrease 'rlen' if it is an EOL (*)
I conclude that you consider the possible EOL character part of the string as defined by its length rlen. If you don't then (*) doesn't make sense.
Use mov al,[esi-1] to see what the last element is!
test_endl:
mov ecx, [rlen]
mov esi, buf
add esi, ecx ; i want to move 'rlen' bytes into buf
mov al, [esi-1] ; to see what the last element is
cmp al, 10 ; compare it to EOL
jnz L1_init
dec ecx ; and then decrease 'rlen' if it is an EOL
mov [rlen], ecx
This is a much more roundabout way (literally) of getting to the end of the string. I loop through all the characters in the string based on what the size of the counter, rlen, is. Then, once the loop is complete, I make the comparison and decrement rlen as necessary.
test_loop:
mov al, [esi] ; get a character
inc esi ; update source pointer
dec ecx ; update char count
jnz test_loop ; loop to top if more chars
cmp al, 10 ; comparison
jne L1_init ; if not EOL jump to L1_init
mov ecx, [rlen] ; decrease the size of rlen if necessary
dec ecx
mov [rlen], ecx

Resources