Comparing character with Intel x86_64 assembly - linux

I'm new to assembly (Intel x86_64) and I am trying to recode some functions from the C library. I am on a 64-bit Linux and compiling with NASM.
I have an error with the strchr function and I can't find a solution...
As a reminder here is an extract from the man page of strchr :
char *strchr(const char *s, int c);
The strchr() function returns a pointer to the first occurrence of the character c in the string s.
Here is what I tried :
strchr:
push rpb
mov rbp, rsp
mov r12, rdi ; get first argument
mov r13, rsi ; get second argument
call strchr_loop
strchr_loop:
cmp [r12], r13 ; **DON'T WORK !** check if current character is equal to character given in parameter...
je strchr_end ; go to end
cmp [r12], 0h ; test end of string
je strchr_end ; go to end
inc r12 ; move to next character of the string
jmp strchr_loop ; loop
strchr_end
mov rax, r12 ; set function return
mov rsp, rbp
pop rbp
This return a pointer on the ned of the string and don't find the character...
I think it's this line which doesn't work :
cmp [r12], r13
I tested with this and it worked :
cmp [r12], 97 ; 97 = 'a' in ASCII
The example :
char *s;
s = strchr("blah", 'a');
printf("%s\n", s);
Returned :
ah
But I can't make it work with a register comparison. What am I doing wrong, and how can I fix it?

First, thanks for your help ! I think I have a better understanding of what I am doing.
I was stuck with the problem of receiving a 8bits parameter instead of the 64bits rdi... But a friend shows me that the first 8bit parameter is also in the sil register.
So here's my working code :
strchr:
push rpb
mov rbp, rsp
call strchr_loop
strchr_loop:
cmp byte [rdi], sil ; check if current character is equal to character given in parameter
je strchr_end ; go to end
cmp byte [rdi], 0h ; test end of string
je strchr_end ; go to end
inc rdi ; move to next character of the string
jmp strchr_loop ; loop
strchr_end
mov rax, rdi ; set function return
mov rsp, rbp
pop rbp
Please feel free to tell me if there is a way to improve it and thanks again !

Here is a fix for your assembly code, which implements the strchr(3), in x86-64 Assembly as defined in the man pages:
asm_strchr:
push rbp
mov rbp, rsp
strchr_loop:
cmp byte [rdi], 0 ; test end of string
je fail_end ; go to end
cmp byte [rdi], sil ; check if current character is equal to character given in parameter
je strchr_end ; go to end
inc rdi ; move to next character of the string
jmp strchr_loop ; loop
strchr_end:
mov rax, rdi ; set function return
mov rsp, rbp
pop rbp
ret
fail_end:
mov rax, 0
mov rsp, rbp
pop rbp
ret

Related

String reverse function x86 NASM assembly

I'm trying to write a function that reverses order of characters in a string using x86 NASM assembly language. I tried doing it using registers (I know it's more efficient to do it using stack) but I keep getting a segmentation fault, the c declaration looks as following
extern char* reverse(char*);
The assembly segment:
section .text
global reverse
reverse:
push ebp ; prologue
mov ebp, esp
mov eax, [ebp+8] ; eax <- points to string
mov edx, eax
look_for_last:
mov ch, [edx] ; put char from edx in ch
inc edx
test ch, ch
jnz look_for_last ; if char != 0 loop
sub edx, 2 ; found last
swap: ; eax = first, edx = last (characters in string)
test eax, edx
jg end ; if eax > edx reverse is done
mov cl, [eax] ; put char from eax in cl
mov ch, [edx] ; put char from edx in ch
mov [edx], cl ; put cl in edx
mov [eax], ch ; put ch in eax
inc eax
dec edx
jmp swap
end:
mov eax, [ebp+8] ; move char pointer to eax (func return)
pop ebp ; epilogue
ret
It seems like the line causing the segmentation fault is
mov cl, [eax]
Why is that happening? In my understanding eax never goes beyond the bounds of the string so there always is something in [eax]. How come I get a segmentation fault?
Ok I figured it out, I mistakenly used test eax, edx instead of which I should have used cmp eax, edx. It works now.

Most effective code to search character in string

Let's say we have a given string of chars
DataString DB 'AGIJKSZ', 0FFH ;
What would be the most time-effective procedure to find let's say J in it?
By time-effective I mean least amount of clock ticks.
It's a x86 processor with these instruction sets:
MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, EM64T, VT-x, AES, AVX, AVX2, FMA3, TSX
Let's assume that both string and searched character can be changed but only via editing the code, and we're always looking for a single character. The string is ASCII. End of string is marked by FF
Answer should be just setting EAX to found 1 / not found 0.
This is what I can think of
FindChar_1 PROC
MOV ESI, OFFSET DataString ;
SI
MOV AH, 'J' ;
Check_End:
CMP BYTE PTR [ESI], 0FFH ;
JE Not_Find ;
CMP AH, [ESI] ;
'DataString'
JE Got_Equal ;
ADD ESI, 1 ;
JMP Check_End ;
Got_Equal:
MOV DL, [ESI] ;
JMP Done
Not_Find:
MOV EAX,0 ;
RET ;
Done:
MOV EAX,1 ;
RET ;
FindChar_1 ENDP
EDIT:
Now I realize that there I something else I should have mentioned. I'm using masm32 so instructions I can use are limited to the very basic ones.
When you need the code be fast, avoid memory access, jumps and complex instructions. I would scan the string twice: once to find the end marker, and then to find the searched character:
FindChar_2 PROC
MOV ESI, OFFSET DataString
XOR EAX,EAX
XOR ECX,ECX
XOR EDX,EDX
NOT EAX ; Let AL=EndOfString marker.
NOT ECX ; Let ECX=Max.integer.
MOV EDI,ESI
CLD
REPNE SCASB ; ECX -= String size (-9 in this example).
NOT ECX ; ECX= String size (8).
MOV AL,'J' ; The searched needle.
MOV EDI,ESI ; Restore the haystack pointer.
REPNE SCASB ; The actual search.
; It returns CF if not found, because the needle is below 0FFH.
CMC ; Invert the logic.
MOV EAX,EDX ; Return false.
ADC EAX,EDX ; Return true if found.
RET
FindChar_2 ENDP

Disregard a strings' space characters in ASM

Trying to find the number of characters in a string and disregard all the " " space characters
I have a C++ portion that passes the strings to asm and here is my asm
works fine, only thing is that the space characters are being counted as well.
stringLength PROC PUBLIC
PUSH ebp ; save caller base pointer
MOV ebp, esp ; set our base pointer
SUB esp, (1 * 4) ; allocate uint32_t local vars
PUSH edi
PUSH esi
; end prologue
MOV esi, [ebp+8] ;gets the string
xor ebx, ebx
COMPARE:
MOV al, [esi + ebx]
CMP al, 0 ;compare character of string with 0
JE FINALE ;if = to 0 go to end
INC ebx ;counter
CMP al, ' ' ;compare with sapce
JE SPACE ;go get rid of the space and keep going
INC al ;otherwise inc al to next character and repeat
JMP COMPARE
SPACE:
DEC ebx ;get rid of the extra space
INC al
JMP COMPARE ;goes back to compare
FINALE:
MOV eax,ebx ; bring back the counter
ADD esp, (2 * 4) ; clear the stack
POP esi
POP edi
MOV esp, ebp ; deallocate locals
POP ebp ; restore caller base pointer
RET
stringLength ENDP ; end the procedure
END stringLength
You are doing a lot of useless stuff and not doing anything to count ignoring the spaces.
You don't really need to setup a new stack frame, for such a simple routine you can do everything in clobbered registers, or at most save a few registers on the stack;
That inc al is pointless - you are incrementing the character value, just to discard it at the next loop iteration.
push fmt and then you clean the stack immediately? What sense does it make?
mov ebx, 0- nobody does that, the idiomatic way to zero a register is xor ebx,ebx (the instruction encoding is more compact);
cmp al, 0 given that you are only interested in equality, you can just do test al, al (more compact);
you read [ebp+12] but never actually use it - is that supposed to be an unused parameter?
As for the algorithm itself, you'll just have to keep a separate counter to count non-space characters; actually, you can just keep ebx for that, and increment directly esi to iterate over characters. For example:
xor ebx, ebx
COMPARE:
mov al, [esi]
cmp al, ' '
jne nonspace
inc ebx
nonspace:
test al, al
jz FINALE
inc esi
jmp COMPARE
FINALE:
Now, this can be streamlined further exploiting the fact that the eax is going to be the return value, and that you can clobber freely ecx and edx, so:
stringLength PROC PUBLIC
mov ecx,[esp+4] ; get the string
xor eax,eax ; zero the counter
compare:
mov dl,[ecx] ; read character
cmp dl,' '
jne nospace
inc eax ; increase counter if it's a space
nospace:
test dl,dl
jz end ; go to end if we reached the NUL
inc ecx ; next character
jmp compare
end:
ret ; straight return, nothing else to do
stringLength ENDP ; end the procedure
edit: about the updated version
COMPARE:
MOV al, [esi + ebx]
CMP al, 0
JE FINALE
INC ebx
CMP al, " " ; I don't know what assembler you are using,
; but typically character values are in single quotes
JE SPACE
INC al ; this makes no sense! you are incrementing
; the _character value_, not the position!
; it's going to be overwritten at the next iteration
JMP COMPARE
SPACE:
INC eax ; you cannot use eax as a counter, you are already
; using it (al) as temporary store for the current
; character!
JMP COMPARE
I think we need to use whole the eax register to compare values. In such manner:
; inlet data:
; eax - pointer to first byte of string
; edx - count of bytes in string
; ecx - result (number of non-space chars)
push esi
mov ecx, 0
mov esi, eax
##compare: cmp edx, 4
jl ##finalpass
mov eax, [esi]
xor eax, 20202020h ; 20h - space char
cmp al, 0
jz ##nextbyte0
inc ecx
##nextbyte0: cmp ah, 0
jz ##nextbyte1
inc ecx
##nextbyte1: shr eax, 16
cmp al, 0
jz ##nextbyte2
inc ecx
##nextbyte2: cmp ah, 0
jz ##nextbyte3
inc ecx
##nextbyte3: add esi, 4
sub edx, 4
jmp ##compare
##finalpass: and edx, edx
jz ##fine
mov al, [esi]
cmp al, 20h
jz ##nextbyte4
inc ecx
##nextbyte4: inc esi
dec edx
jmp ##finalpass
##fine: pop esi
; save the result data and restore stack

replace a string in assembly

I wanna get a source string ,find a key in it and replace the key with a replace string so i copy the rest of source and the replace string in the result .
it outputs the correct prompt when the key doesnt exist in the source string : "The key does not appear in the string."
but when the source contains the key it stucks and doesnt continue running
(it looks sth in found label part have been missed and have an overflow)
can anyone help to correct the found part ?
any help will be appreciate :)
; program to search for one string embedded in another
; author: R. Detmer revised: 10/97
.386
.MODEL FLAT
ExitProcess PROTO NEAR32 stdcall, dwExitCode:DWORD
INCLUDE io.h
cr EQU 0dh ; carriage return character
Lf EQU 0ah ; linefeed character
.STACK 4096 ; reserve 4096-byte stack
.DATA
prompt1 BYTE "String to search? ", 0
prompt2 BYTE cr, Lf, "Key to search for? ", 0
prompt3 BYTE cr, Lf, "Word to replace? ", 0
source BYTE 100 DUP (?)
key BYTE 20 DUP (?)
replace BYTE 20 DUP (?)
srcLength DWORD ?
keyLength DWORD ?
repLength DWORD ?
BeginLength DWORD ?
restLength DWORD ?
cpyLength DWORD ?
lastPosn DWORD ?
restPosition DWORD ?
firstParam DWORD ?
secondParam DWORD ?
keyPosition DWORD ?
failure BYTE cr,Lf,Lf,"The key does not appear in the string.",cr,Lf,0
success BYTE cr,Lf,Lf, " The result string is : " ,cr,Lf,Lf
result BYTE 200 DUP (?)
PUBLIC _start ; make entry point public
.CODE
_start: output prompt1 ; ask for
input source,100 ; and input source string
lea eax, source ; find length of string
push eax ; length parameter
call strlen
mov srcLength,eax ; save length of source
output prompt2 ; ask for
input key,20 ; and input key string
lea eax, key ; find length of string
push eax ; length parameter
call strlen
mov keyLength,eax ; save length of key
output prompt3 ; ask for
input replace,20 ; and input replace string
lea eax, replace ; find length of string
push eax ; length parameter
call strlen
dec eax
mov repLength,eax ; save length of replace
; calculate last position of source to check
mov eax,srcLength
sub eax,keyLength
inc eax ; srcLength − keyLength + 1
mov lastPosn, eax
cld ; left to right comparison
mov eax,1 ; starting position
whilePosn: cmp eax,lastPosn ; position <= last_posn?
jnle endWhilePosn ; exit if past last position
lea esi,source ; address of source string
add esi,eax ; add position
dec esi ; address of position to check is incremented automatically
lea edi,key ; address of key
mov ecx,keyLength ; number of positions to check
repe cmpsb ; check
jz found ; exit on success
inc eax ; increment position
jmp whilePosn ; repeat
endWhilePosn:
output failure ; the search failed
jmp quit ; exit
;-------------------------------------------------------------
found:
mov keyPosition, eax ; position of key
mov ebx, eax ;copy start position of key
lea eax, source
sub ebx, eax ;position - source address
mov BeginLength, ebx ;begin Source length (before key)
add ebx, keyLength
mov eax, srcLength
sub eax, ebx
mov restLength, eax ;rest of Source length (after key)
mov eax, keyPosition
add eax, keyLength ; position + key
mov restPosition, eax
;source begin to result
lea eax, result
mov firstParam, eax ; destination address
lea eax, source
mov secondParam, eax
mov eax, BeginLength ; copy length
mov cpyLength, eax
mov esi,firstParam ;initial source address
mov edi,secondParam ;destination
mov ecx ,cpyLength
rep movsb ;copy bytes
;replace to result
mov eax, firstParam
add eax , BeginLength
mov firstParam, eax ; address of rest of result
lea eax, replace
mov secondParam, eax ; string to replace
mov eax, repLength ; copy length
mov cpyLength, eax
mov esi,firstParam ;initial source address
mov edi,secondParam ;destination
mov ecx ,cpyLength
rep movsb ;copy bytes
;Rest to result
mov eax, firstParam
add eax , repLength
mov firstParam, eax ; address of rest of result
mov eax, restPosition
mov secondParam, eax
mov eax, restLength
mov cpyLength, eax
mov esi,firstParam ;initial source address
mov edi,secondParam ;destination
mov ecx ,cpyLength
rep movsb ;copy bytes
mov BYTE PTR [edi],0 ;terminate destination string
output success
quit:
INVOKE ExitProcess, 0 ; exit with return code 0
;----------------------------------------------------------
strlen PROC NEAR32
; find length of string whose address is passed on stack
; length returned in EAX
push ebp ; establish stack frame
mov ebp, esp
pushf ; save flags
push ebx ; and EBX
sub eax, eax ; length := 0
mov ebx, [ebp+8] ; address of string
whileChar: cmp BYTE PTR [ebx], 0 ; null byte?
je endWhileChar ; exit if so
inc eax ; increment length
inc ebx ; point at next character
jmp whileChar ; repeat
endWhileChar:
pop ebx ; restore registers and flags
popf
pop ebp
ret 4 ; return, discarding parameter
strlen ENDP
END
found:
mov keyPosition, eax ; position of key
mov ebx, eax ;copy start position of key
lea eax, source
sub ebx, eax ;position - source address
mov BeginLength, ebx ;begin Source length (before key)
In these lines you have subtracted things that cannot be subtracted.
When you get at the label found, EAX has a 1-based relative position index that you copy to the EBX register. This value ranges from 1 to 100. Now you subtract the absolute address of your source buffer. This could be in the millions. That's clearly a mistake. It becomes disastrous when later on you use it as a loop counter and start corrupting memory!
success BYTE cr,Lf,Lf, " The result string is : " ,cr,Lf,Lf
You forgot to zero-terminate the success message.
It will disrupt your final macro call output success and so it would seem that the program didn't correctly replace the string.

x86-64 Bit Assembly Linux Input

I'm trying to input into my program... All it does is run through and print a '0' to the screen. I'm pretty sure that the PRINTDECI function works, I made it a while ago and it works. Do I just have to loop over the input code and only exit when I enter a certain value? I'm not sure how I would do that... Unless it's by ACSII values which might suck.... Anyways, here's my code (Yasm(nasm clone), Intel Syntax):
GLOBAL _start
SECTION .text
PRINTDECI:
LEA R9,[NUMBER + 18] ; last character of buffer
MOV R10,R9 ; copy the last character address
MOV RBX,10 ; base10 divisor
DIV_BY_10:
XOR RDX,RDX ; zero rdx for div
DIV RBX ; rax:rdx = rax / rbx
ADD RDX,0x30 ; convert binary digit to ascii
TEST RAX,RAX ; if rax == 0 exit DIV_BY_10
JZ CHECK_BUFFER
MOV byte [R9],DL ; save remainder
SUB R9,1 ; decrement the buffer address
JMP DIV_BY_10
CHECK_BUFFER:
MOV byte [R9],DL
SUB R9,1
CMP R9,R10 ; if the buffer has data print it
JNE PRINT_BUFFER
MOV byte [R9],'0' ; place the default zero into the empty buffer
SUB R9,1
PRINT_BUFFER:
ADD R9,1 ; address of last digit saved to buffer
SUB R10,R9 ; end address minus start address
ADD R10,1 ; R10 = length of number
MOV RAX,1 ; NR_write
MOV RDI,1 ; stdout
MOV RSI,R9 ; number buffer address
MOV RDX,R10 ; string length
SYSCALL
RET
_start:
MOV RCX, SCORE ;Input into Score
MOV RDX, SCORELEN
MOV RAX, 3
MOV RBX, 0
SYSCALL
MOV RAX, [SCORE]
PUSH RAX ;Print Score
CALL PRINTDECI
POP RAX
MOV RAX,60 ;Kill the Code
MOV RDI,0
SYSCALL
SECTION .bss
SCORE: RESQ 1
SCORELEN EQU $-SCORE
Thanks for any help!
- Kyle
As a side note, the pointer in RCX goes to a insanely large number according to DDD... So I'm thinking I have to get it to pause and wait for me to type, but I have no idea how to do that...
The 'setup' to call syscall 0 (READ) on x86_64 system is:
#xenon:~$ syscalls_lookup read
read:
rax = 0 (0x0)
rdi = unsigned int fd
rsi = char *buf
rdx = size_t count
So your _start code should be something like:
_start:
mov rax, 0 ; READ
mov rdi, 0 ; stdin
mov rsi, SCORE ; buffer
mov rdx, SCORELEN ; length
syscall
The register conventions and syscall numbers for x86_64 are COMPLETELY different than those for i386.
Some conceptual issues you seem to have:
READ does not do ANY interpretation on what you type, you seem to be expecting it to let you type a number (say, 57) and have it return the value 57. Nope. It'll return '5', '7', 'ENTER', 'GARBAGE'... Your SCORELEN is probably 8 (length of resq 1), so you'll read, AT MOST, 8 bytes. or Characters, if you wish to call them that. And unless you type the EOF char (^D), you'll need to type those 8 characters before the READ call will return to your code.
You have to convert the characters you receive into a value... You can do it the easy way and link with ATOI() in the C library, or write your own parser to convert the characters into a value by addition and multiplication (it's not hard, see code below).
Used below, here as a reference:
#xenon:~$ syscalls_lookup write
write:
rax = 1 (0x1)
rdi = unsigned int fd
rsi = const char *buf
rdx = size_t count
Ugh.... So many... I'll just rewrite bits:
global _start
section .text
PRINTDECI:
; input is in RAX
lea r9, [NUMBER + NUMBERLEN - 1 ] ; + space for \n
mov r10, r9 ; save end position for later
mov [r9], '\n' ; store \n at end
dec r9
mov rbx, 10 ; base10 divisor
DIV_BY_10:
xor rdx, rdx ; zero rdx for div
div rbx : rax = rdx:rax / rbx, rdx = remainder
or dl, 0x30 ; make REMAINDER a digit
mov [r9], dl
dec r9
or rax, rax
jnz DIV_BY_10
PRINT_BUFFER:
sub r10, r9 ; get length (r10 - r9)
inc r9 ; make r9 point to initial character
mov rax, 1 ; WRITE (1)
mov rdi, 1 ; stdout
mov rsi, r9 ; first character in buffer
mov rdx, r10 ; length
syscall
ret
MAKEVALUE:
; RAX points to buffer
mov r9, rax ; save pointer
xor rcx, rcx ; zero value storage
MAKELOOP:
mov al, [r9] ; get a character
or al, al ; set flags
jz MAKEDONE ; zero byte? we're done!
and rax, 0x0f ; strip off high nybble and zero rest of RAX (we're lazy!)
add rcx, rcx ; value = value * 2
mov rdx, rcx ; save it
add rcx, rcx ; value = value * 4
add rcx, rcx ; value = value * 8
add rcx, rdx ; value = value * 8 + value * 2 (== value * 10)
add rcx, rax ; add new digit
jmp MAKELOOP ; do it again
MAKEDONE:
mov rax, rcx ; put value in RAX to return
ret
_start:
mov rax, 0 ; READ (0)
mov rdi, 0 ; stdin
mov rsi, SCORE ; buffer
mov rdx, SCORELEN ; length
syscall
; RAX contains HOW MANY CHARS we read!
; -OR-, -1 to indicate error, really
; should check for that, but that's for
; you to do later... right? (if RAX==-1,
; you'll get a segfault, just so you know!)
add rax, SCORE ; get position of last byte
movb [rax], 0 ; force a terminator at end
mov rax, SCORE ; point to beginning of buffer
call MAKEVALUE ; convert from ASCII to a value
; RAX now should have the VALUE of the string of characters
; we input above. (well, hopefully, right?)
mov [VALUE], rax ; store it, because we can!
; it's stored... pretend it's later... we need value of VALUE!
mov rax, [VALUE] ; get the VALUE
call PRINTDECI ; convert and display value
; all done!
mov rax, 60 ; EXIT (60/0x3C)
mov rdi, 0 ; exit code = 0
syscall
section .bss
SCORE: resb 11 ; 10 chars + zero terminator
SCORELEN equ $-SCORE
NUMBER: resb 19 ; 18 chars + CR terminator
NUMBERLEN equ $-NUMBER
I'm going to say that this should work first time, it's off-the-cuff for me, haven't tested it, but it should be good. We read up to 10 chars, terminate it with a zero, convert to a value, then convert to ascii and write it out.
To be more proper, you should save registers to the stack in each subroutine, well, certain ones, and really, only if you're going to interface with libraries... doing things yourself lets you have all the freedom you want to play with the registers, you just have to remember what you put where!
Yes, someone is going to say "why didn't you just multiply by 10 instead of weird adding?" ... uh... because it's easier on the registers and I don't have to set it all up in rdx:rax. Besides, it's just as readable and understandable, especially with the comments. Roll with it! This isn't a competition, it's learning!
Machine code is fun! Gotta juggle all the eggs in your head though... no help from the compiler here!
Technically, you should check return result (RAX) of the syscalls for READ and WRITE, handle errors appropriately, yadda yadda yadda.... learn to use your debugger (gdb or whatever).
Hope this helps.

Resources