glibc's strlen() uses a neat bit-manipulation trick to check 4 bytes at a time, which makes it much faster than the byte-by-byte routines most other implementations use. Is there something similar for comparing two strings in assembly? I've been reading implementations of C library functions, and I'm particularly interested in the string-handling parts, but I haven't found anything like this yet.
I have to make this function as fast as possible because it's the heart of my application. (Please don't recommend a hash table.)
Any assembler is welcome, but I'm only somewhat familiar with Intel's assembly syntax, so if the assembly you provide uses a different syntax, please comment it.
You can compare word by word (e.g., 32 bits or 64 bits at a time). You just need to be careful not to read past the end of the string. If you are creating the strings yourself, you could pad them with zeroes so their length is a multiple of the word size; then you don't even need to check.
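To make the padding idea concrete, here is a minimal C sketch rather than tuned assembly. The name padded_equal and the 8-byte word size are my own assumptions for illustration; it only works if the caller really has zero-padded both buffers to the same multiple of 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: both buffers must already be zero-padded by the
   caller to padded_len bytes, and padded_len must be a multiple of 8. */
static int padded_equal(const char *a, const char *b, size_t padded_len)
{
    assert(padded_len % sizeof(uint64_t) == 0);
    for (size_t i = 0; i < padded_len; i += sizeof(uint64_t)) {
        uint64_t wa, wb;
        /* memcpy sidesteps alignment and aliasing problems; compilers
           normally turn it into a single 64-bit load. */
        memcpy(&wa, a + i, sizeof wa);
        memcpy(&wb, b + i, sizeof wb);
        if (wa != wb)
            return 0;   /* the strings differ somewhere in this word */
    }
    return 1;           /* every word matched */
}
```

Because of the padding, the loop never needs to look for the terminator inside a word: two equal strings have identical padding, and two different strings differ in at least one word.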
Assuming zero terminated strings (although the same applies for memcmp()); the fastest way to do string comparisons in assembly depends on the length/s of the strings, and the specific CPU.
In general; SSE or AVX has a high setup cost but gives faster throughput once it's running, which makes it the best choice if you're comparing very long strings (especially if most of the characters match).
Alternatively, something that does one byte at a time using general purpose registers will typically have a very low setup cost and lower throughput, which makes it the best choice if you're comparing lots of small strings (or even lots of large strings where the first few characters are likely to be different).
If you're doing this for a specific application, then you can determine the average number of characters compared and find the best approach for that average. You can also have different functions for different cases - e.g. implement a strcmp_small() and a strcmp_large() if there's a mixture.
Despite all this, if the performance of string comparisons matters a lot, then it's very likely that the fastest way to compare strings is not comparing strings at all. Basically, the words "I have to make this function as fast possible because it's the heart of my application" should make everyone wonder why better ways of implementing the application aren't possible.
StrCompCase (case sensitive), as implemented in the FreshLib/strlib.asm library.
Here is some code that uses dword comparison:
Note that it first checks the lengths of the strings. That is because in the mentioned library the strings are length-prefixed, so StrLen is an instant O(1) operation, and scanning for the terminating NULL is provided only as a fallback (see the second part of this answer).
Comparing lengths before the actual comparison makes the check O(1) for strings of different lengths, which may significantly improve performance when searching big arrays.
Then the comparison proceeds by dwords and, at the end, if the string length is not a multiple of 4, the remaining 1..3 bytes are compared byte by byte.
proc StrCompCase, .str1, .str2
begin
push eax ecx esi edi
mov eax, [.str1]
mov ecx, [.str2]
cmp eax, ecx
je .equal
test eax, eax
jz .noteq
test ecx, ecx
jz .noteq
stdcall StrLen, eax
push eax
stdcall StrLen, ecx
pop ecx
cmp eax, ecx
jne .noteq
stdcall StrPtr, [.str1]
mov esi,eax
stdcall StrPtr, [.str2]
mov edi,eax
mov eax, ecx
shr ecx, 2
repe cmpsd
jne .noteq
mov ecx, eax
and ecx, 3
repe cmpsb
jne .noteq
.equal:
stc
pop edi esi ecx eax
return
.noteq:
clc
pop edi esi ecx eax
return
endp
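For readers who find the control flow easier to follow in C, the same three steps (length check, dword loop, byte tail) might look roughly like this. It is only a sketch: struct lstr and lstr_equal are hypothetical names, and the layout is not FreshLib's actual string format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical length-carrying string, for illustration only. */
struct lstr {
    size_t      len;   /* number of bytes in data */
    const char *data;  /* not necessarily zero-terminated */
};

static int lstr_equal(const struct lstr *a, const struct lstr *b)
{
    assert(a != NULL && b != NULL);
    if (a->len != b->len)          /* O(1) rejection for different lengths */
        return 0;

    size_t i = 0;
    for (; i + 4 <= a->len; i += 4) {   /* dword part, like repe cmpsd */
        uint32_t wa, wb;
        memcpy(&wa, a->data + i, sizeof wa);
        memcpy(&wb, b->data + i, sizeof wb);
        if (wa != wb)
            return 0;
    }
    for (; i < a->len; i++)             /* remaining 0..3 bytes, like repe cmpsb */
        if (a->data[i] != b->data[i])
            return 0;
    return 1;
}
```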
About the StrLen code:
Here is the implementation of StrLen.
You can see that, if possible, it uses length-prefixed strings, which makes the execution time O(1). If this is not possible, it falls back to a scanning algorithm that checks 8 bytes per iteration; it is pretty fast, but still O(n).
proc StrLen, .hString ; proc StrLen [hString]
begin
mov eax, [.hString]
cmp eax, $c0000000
jb .pointer
stdcall StrPtr, eax
jc .error
mov eax, [eax+string.len]
clc
return
.error:
xor eax, eax
stc
return
.pointer:
push ecx edx esi edi
; align on dword
.byte1:
test eax, 3
jz .scan
cmp byte [eax], 0
je .found
inc eax
jmp .byte1
.scan:
mov ecx, [eax]
mov edx, [eax+4]
lea eax, [eax+8]
lea esi, [ecx-$01010101]
lea edi, [edx-$01010101]
not ecx
not edx
and esi, ecx
and edi, edx
and esi, $80808080
and edi, $80808080
or esi, edi
jz .scan
sub eax, 9
; byte 0 was found: so search by bytes.
.byteloop:
lea eax, [eax+1]
cmp byte [eax], 0
jne .byteloop
.found:
sub eax, [.hString]
clc
pop edi esi edx ecx
return
endp
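The heart of the scanning loop is the expression (v - $01010101) and (not v) and $80808080, which is nonzero exactly when some byte of v is zero. A rough C rendering follows, checking one dword per iteration instead of two to keep it short. The names has_zero_byte and strlen_words are mine, and note the loud assumption: like the assembly, it may read up to 3 bytes past the terminator within an aligned dword, which is harmless on x86 but is not strictly portable C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if at least one of the four bytes of v is zero. */
static uint32_t has_zero_byte(uint32_t v)
{
    return (v - 0x01010101u) & ~v & 0x80808080u;
}

/* Sketch of a word-at-a-time strlen (hypothetical name). */
static size_t strlen_words(const char *s)
{
    assert(s != NULL);
    const char *p = s;

    /* Step byte by byte until p is dword-aligned. */
    while (((uintptr_t)p & 3) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }
    /* Scan aligned dwords until one contains a zero byte.
       Assumption: reading the whole aligned dword is safe. */
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);
        if (has_zero_byte(w))
            break;
        p += 4;
    }
    /* Locate the exact zero byte inside that dword. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```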
Note that zero-terminated strings have both performance and security issues.
It is better to use size-prefixed strings. For example, the mentioned library uses dynamic strings, where each string has a dword field at offset -4 (string.len in the above code) that holds the current length of the string.
The first rule for faster-than-byte-per-byte comparison is to malloc the strings, or .align 16 any constant strings, to ensure:
robustness against security violations (reading past the allocated area)
best alignment for xmm (or 64-bit) processing
I'm trying to write a function in x86 NASM assembly which reverses order of characters in a string passed as argument. I tried implementing it using stack but ended up getting error message
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
Code below:
section .text
global reverse
reverse:
push ebp ; epilogue
mov ebp, esp
mov eax, [ebp+8]
xor ecx, ecx ; ecx = 0
push ebx ; saved register
push_eax:
mov edx, [eax] ; edx = char at eax
test edx, edx
jz inc_eax ; if edx == 0, move eax pointer back and pop eax
push edx
inc eax
inc ecx ; counter + 1
jmp push_eax
inc_eax:
sub eax, ecx ; move eax back to beginning of string
mov ebx, ecx ; to move eax back at the end of function
pop_eax:
test ecx, ecx ; loop counter == 0
jz end
pop edx
mov [eax], edx ; char at eax = edx
inc eax ; eax++
dec ecx ; ecx--
jmp pop_eax
end:
sub eax, ebx
pop ebx ; saved register
mov esp, ebp
pop ebp
ret
C declaration:
extern char* reverse(char*);
I've read somewhere that you get this error when, for instance, you write something into an array that is longer than what was allocated, but I don't see how this function would do that. Also, when instead of using ebx at the end I manually move the pointer in eax back (string in C of length 9 -> sub eax, 9), I get the reversed string at the output followed by the 2nd, 3rd and 4th chars (no matter the length of the string I declare in C). So for instance:
input: "123456789"
output: "987654321234"
But that only happens when I move eax manually; using ebx like in the code above outputs some trash.
Peter's answer is the answer you are looking for. However, may I comment on the technique? Must you use the stack? Do you already know the length of the string, or must you calculate/find that yourself?
For example, if you already know the length of the string, can you place one pointer at the first character and another at the last, and simply exchange the characters, moving each pointer toward the center until they meet? This has the advantage of not assuming there is enough room on the stack for the string. In fact, you don't even touch the stack except for the prologue and epilogue. (Please note that your comment labels the code at the top as the epilogue; the entry sequence is the prologue, and the epilogue is the matching sequence at the end.)
If you do not know the length of the string, you must find the null character first in order to use the above technique. By doing this, you have already touched each character in the string before you even start. The advantage is that the string is now loaded into the cache. The disadvantage is that you must touch each character again, in essence reading the string twice. However, since you are using assembly, a repeated scasb instruction (repne scasb) is fairly fast and has the added advantage of automatically leaving a pointer near the end of the string for you.
I am not expecting an answer by asking these questions. I am simply suggesting a different technique based on certain criteria of the task. When I read the question, the following instantly came to mind:
p[i] <-> p[n-1]
i++, n--
loop until n <= i
Please note that you will want to check that 'n' is actually greater than 'i' before you make the first move. i.e.: it isn't a zero length string.
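Written out in C, the pointer-exchange idea above could look something like this. It is only a sketch with a name of my own choosing (reverse_in_place); it uses strlen to find the end, which corresponds to the "you do not know the length" case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reverses a zero-terminated string in place and returns it. */
static char *reverse_in_place(char *s)
{
    assert(s != NULL);
    size_t n = strlen(s);
    if (n < 2)                 /* empty or single-char string: nothing to do */
        return s;

    size_t i = 0;              /* index of the first unswapped char */
    size_t j = n - 1;          /* index of the last unswapped char */
    while (i < j) {
        char t = s[i];         /* exchange one byte at each end */
        s[i] = s[j];
        s[j] = t;
        i++;
        j--;
    }
    return s;
}
```

Only single bytes are read and written, and nothing outside the n characters of the string is touched, so there is no stack usage that grows with the string and no store past the end of the array.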
If this is a string of 1-byte characters, you want movzx edx, byte [eax] byte loads and mov [eax], dl byte stores.
You're doing 4-byte stores, which potentially step on bytes past the end of the array. You also probably overread until you find a whole dword on the stack that's all zero. test edx, edx is fine if you correctly zero-extended a byte into EDX, but loading a whole dword probably resulted in overread.
Use a debugger to see what you're doing to memory around the input arg.
(i.e. make sure you aren't writing past the end of the array, which is probably what happened here, stepping on the buffer-overflow detection cookie.)
I'm trying to convert a user inputted string of numbers to an integer.
For example, user enters "1234" as a string I want 1234 stored in a DWORD variable.
I'm using lodsb and stosb to get the individual bytes. My problem is I can't get the algorithm right for it. My code is below:
mov ecx, (SIZEOF num)-1
mov esi, OFFSET num
mov edi, OFFSET ints
cld
counter:
lodsb
sub al,48
stosb
loop counter
I know that the ECX counter is going to be a bit off also because it's reading the entire string not just the 4 bytes, so it's actually 9 because the string is 10 bytes.
I was trying to use powers of 10 to multiply the individual bytes but I'm pretty new to Assembly and can't get the right syntax for it. If anybody can help with the algorithm that would be great. Thanks!
A simple implementation might be
mov ecx, digitCount
mov esi, numStrAddress
cld ; We want to move upward in mem
xor edx, edx ; edx = 0 (We want to have our result here)
xor eax, eax ; eax = 0 (We need that later)
counter:
imul edx, 10 ; Multiply prev digits by 10
lodsb ; Load next char to al
sub al,48 ; Convert to number
add edx, eax ; Add new number
; Here we used that the upper bytes of eax are zeroed
loop counter ; Move to next digit
; edx now contains the result
mov [resultIntAddress], edx
Of course there are ways to improve it, like avoiding the use of imul.
EDIT: Fixed the ecx value
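The multiply-by-10-and-add loop is perhaps easier to see in C. This is a minimal sketch with a made-up name (parse_decimal); unlike the assembly it stops at the first non-digit, and like the assembly it does not check for overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts up to digit_count ASCII digits to an integer. */
static uint32_t parse_decimal(const char *s, size_t digit_count)
{
    assert(s != NULL);
    uint32_t result = 0;
    for (size_t i = 0; i < digit_count; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c < '0' || c > '9')       /* stop at newline or any other non-digit */
            break;
        /* Shift the previous digits one decimal place, then add the new one. */
        result = result * 10u + (uint32_t)(c - '0');
    }
    return result;
}
```

For "1234" the running value goes 1, 12, 123, 1234, so no table of powers of 10 is needed.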
I am trying to understand how to use pointers in assembly. After reading some tutorials around the internet, I think I had understood some concepts, but when I tried them out, it didn't work. Below are some attempts to translate C to ASM.
C
const char *s = "foo";
unsigned z = *(unsigned*)s;
if(!(z & 0xFF))
do_something();
if(!(z & 0xFFFF))
do_b_something();
(This isn't the full code. It's a word check, so there are two more statements that check 0xFF0000 and 0xFF000000, respectively.)
ASM:
mov ebp,str
mov eax,ebp
mov eax,[eax]
and eax,0xFF
cmp eax,0
je etc
mov eax,[eax]
and eax,0xFFFF
cmp eax,0
je etc
It returns a seg fault.
And this attempt:
mov eax,dword ptr [eax]
which is what the gcc compiler generates and which you can see in other assembly code, returns
invalid symbol
in the FASM assembler. Is this really not supported by FASM, or am I missing something?
I think this is what you are attempting to do:
mov ebp,str
mov eax,ebp
mov ebx,[eax]
test ebx,0xFF
jz low_byte_empty
do_something:
; some code here...
low_byte_empty:
test ebx,0xFFFF
jz low_word_empty
do_b_something:
; some code here.
low_word_empty:
Explanation:
First, as JasonD already mentions in his answer, you are loading a pointer to eax, then doing a logical and to it, then you are using the result still in eax to address memory (some memory offset in the range 0x0 ... 0xFF).
So what goes wrong in your code: you can't keep in the same register both a pointer to a memory address and a value at the same time. So I chose to load the value from [eax] to ebx, you can also use some other 32-bit general register (ecx, edx, esi, edi) according to your needs.
Then, you don't need cmp to check whether a register is zero: all cmp does is perform a subtraction and set the flags, and ZF (the zero flag) has already been set by and, so cmp is unnecessary here. Since we don't need the result of the AND either and only want the flags updated, test is the better choice. test performs exactly the same logical AND as and; the only difference is that test does not store the result and only updates the flags.
It's not at all clear what you're trying to do in the original code - doesn't look right.
However this:
mov eax,[eax]
and eax,0xFF
cmp eax,0
je etc
mov eax,[eax]
Isn't going to work. You're overwriting the contents of EAX with the value stored at the address in EAX, manipulating that value, and then trying to reload it after the branch without restoring the original pointer.
The following variant is simpler, smaller, faster and uses only one register.
mov eax, str
mov eax,[eax]
test al, al
jz low_byte_empty
do_something_byte:
; some code here...
low_byte_empty:
test ah, ah
jz low_word_empty
do_something_word:
; some code here
low_word_empty:
I am trying to write a program that will allow me to print multiple characters (strings of characters or integers). The problem that I am having is that my code only prints one of the characters, and then newlines and stays in an infinite loop. Here is my code:
SECTION .data
len EQU 32
SECTION .bss
num resb len
output resb len
SECTION .text
GLOBAL _start
_start:
Read:
mov eax, 3
mov ebx, 1
mov ecx, num
mov edx, len
int 80h
Point:
mov ecx, num
Print:
mov al, [ecx]
inc ecx
mov [output], al
mov eax, 4
mov ebx, 1
mov ecx, output
mov edx, len
int 80h
cmp al, 0
jz Exit
Clear:
mov eax, 0
mov [output], eax
jmp Print
Exit:
mov eax, 1
mov ebx, 0
int 80h
Could someone point out what I am doing wrong?
Thanks,
Rileyh
The first time you enter the Print section, ecx points to the start of the string and you use it to copy a single character to the start of the output string. But a few instructions further down, you overwrite ecx with the pointer to the output string and never restore it, so you never manage to copy and print the rest of the string.
Also, why are you calling write() with a single character string with the aim to loop over it to print the entire string? Why not just pass num directly in instead of copying a single character to output and passing that?
In your last question, you showed message as a zero-terminated string, so cmp al, 0 would indicate the end of the string. sys_read does NOT create a zero-terminated string! (we can stuff a zero in there if we need it - e.g. as a filename for sys_open)
sys_read will read a maximum of edx characters. sys_read from stdin returns when, and only when, the "enter" key is hit. If fewer than edx characters were entered, the string is terminated with a linefeed character (10 decimal or 0xA or 0Ah hex) - you could look for that... But, if the pesky user types more than edx characters, only edx characters go into your buffer, the "excess" remains in the OS's buffer (and can cause trouble later!). In this case your string is NOT terminated with a linefeed, so looking for it will fail.
sys_read returns the number of characters actually read - up to edx - including the linefeed - in eax. If you don't want to include the linefeed in the length, you can decrement eax.
As an experiment, do a sys_read with some small number (say 4) in edx, then exit the program. Type "abcdls"(enter) and watch the "ls" be executed. If some joker typed "abcdrm -rf ."... well, don't!!!
Safest thing is to flush the OS's input buffer.
mov ecx, num
mov edx, len
mov ebx, 1
mov eax, 3
int 80h
cmp byte [ecx + eax - 1], 10 ; got linefeed?
push eax ; save read length - doesn't alter flags
je good
flush:
mov ecx, dummy_buf
mov edx, 1
mov ebx, 1
mov eax, 3
int 80h
cmp byte [ecx], 10
jne flush
good:
pop eax ; restore length from first sys_read
Instead of defining dummy_buf in .bss (or .data), we could put it on the stack - trying to keep it simple here. This is imperfect - we don't know if our string is linefeed-terminated or not, and we don't check for error (unlikely reading from stdin). You'll find you're writing much more code dealing with errors and "idiot user" input than "doing the work". Inevitable! (it's a low-level language - we've gotta tell the CPU Every Single Thing!)
sys_write doesn't know about zero-terminated strings, either! It'll print edx characters, regardless of how much garbage that might be. You want to figure out how many characters you actually want to print, and put that in edx (that's why I saved/restored the original length above).
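The C wrappers read() and write() behave the same way as the raw system calls, so the point can be sketched in C as well. The function name echo_once and the 32-byte buffer are just assumptions for the example; the important part is that the byte count returned by read() is what gets passed to write(), not the buffer size.

```c
#include <assert.h>
#include <unistd.h>

/* Reads once from in_fd and writes exactly the bytes that were read
   to out_fd. Returns the number of bytes read, 0 on EOF, -1 on error. */
static ssize_t echo_once(int in_fd, int out_fd)
{
    assert(in_fd >= 0 && out_fd >= 0);
    char buf[32];

    /* read() does not zero-terminate; n is the only reliable length. */
    ssize_t n = read(in_fd, buf, sizeof buf);
    if (n <= 0)
        return n;

    /* write() may write fewer bytes than requested, so loop. */
    ssize_t done = 0;
    while (done < n) {
        ssize_t w = write(out_fd, buf + done, (size_t)(n - done));
        if (w < 0)
            return -1;
        done += w;
    }
    return n;
}
```

Calling echo_once(0, 1) echoes one line from stdin to stdout without printing whatever garbage follows it in the buffer.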
You mention "integers" and use num as a variable name. Neither of these functions know about "numbers" except as ascii codes. You're reading and writing characters. Converting a single-digit number to and from a character is easy - add or subtract '0' (48 decimal or 30h). Multiple digits are more complicated - look around for an example, if that's what you need.
Best,
Frank
I'm trying to make a program using NASM that takes input from command line arguments. Since string length is not provided, I'm trying to make a function to compute my own. Here is my attempt, which takes a pointer to a string in the ebx register, and returns the length of the string in ecx:
len:
push ebx
mov ecx,0
dec ebx
count:
inc ecx
inc ebx
cmp ebx,0
jnz count
dec ecx
pop ebx
ret
My method is to go through the string, character by character, and check if it's null. If it's not, I increment ecx and go to the next character. I believe the problem is that cmp ebx,0 is incorrect for what I'm trying to do. How would I properly go about checking whether the character is null? Also, are there other things that I could be doing better?
You are comparing the value in ebx with 0 which is not what you want. The value in ebx is the address of a character in memory so it should be dereferenced like this:
cmp byte[ebx], 0
Also, the last push ebx should be pop ebx.
Here is how I do it in a 64-bit Linux executable that checks argv[1]. The kernel starts a new process with argc and argv[] on the stack, as documented in the x86-64 System V ABI.
_start:
pop rsi ; number of arguments (argc)
pop rsi ; argv[0] the command itself (or program name)
pop rsi ; rsi = argv[1], a pointer to a string
mov ecx, 0 ; counter
.repeat:
lodsb ; byte in AL
test al,al ; check if zero
jz .done ; if zero then we're done
inc ecx ; increment counter
jmp .repeat ; repeat until zero
.done:
; string is unchanged, ecx contains the length of the string
; unused, we look at command line args instead
section .rodata
asciiz: db "This is a string with 36 characters.", 0
This is slow and inefficient, but easy to understand.
For efficiency, you'd want
only 1 branch in the loop (Why are loops always compiled into "do...while" style (tail jump)?)
avoid a false dependency by loading with movzx instead of merging into the previous RAX value (Why doesn't GCC use partial registers?).
subtract pointers after the loop instead of incrementing a counter inside.
And of course SSE2 is always available in x86-64, so we should use that to check in chunks of 16 bytes (after reaching an alignment boundary). See optimized hand-written strlen implementations like in glibc. (https://code.woboq.org/userspace/glibc/sysdeps/x86_64/strlen.S.html).
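As a plain C illustration of the "subtract pointers after the loop" point (not of the SSE2 version), something like the following keeps only the pointer increment and one conditional branch inside the loop. The name strlen_bytes is mine, and whether the compiler emits a tail-jump loop is up to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-at-a-time strlen: no counter in the loop, the length is the
   distance between the terminator and the start of the string. */
static size_t strlen_bytes(const char *s)
{
    assert(s != NULL);
    const char *p = s;
    while (*p != '\0')      /* the only work per byte: load, compare, advance */
        p++;
    return (size_t)(p - s); /* pointer subtraction replaces the counter */
}
```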
Here is how I would have coded it:
len:
push ebx
mov eax, ebx
lp:
cmp byte [eax], 0
jz lpend
inc eax
jmp lp
lpend:
sub eax, ebx
pop ebx
ret
(The result is in eax). Likely there are better ways.