Inline assembly operand size conflict - visual-c++

I am working on this for class, and per the instructor's guidelines we have to write the program using inline assembly in C++. The purpose of the program is to take a string of any length and reverse it. The error I'm getting is an operand size conflict, and from what I can tell it's in the first line of the __asm block. There could be other issues, but the only one that shows up in Visual Studio is the conflict. Here is my asm block:
int _tmain(int argc, _TCHAR* argv[])
{
char string[] = "Hi There!";
__asm
{ // reverse a string of any length
lea ecx, string
lea eax, string
mov esi, eax // esi points to start of string
add eax, ecx
mov edi, eax
dec edi // edi points to end of string
shr ecx, 1 // ecx is count (length/2)
jz done // if string is 0 or 1 characters long, done
reverseLoop:
mov al, [esi] // load characters
mov bl, [edi]
mov [esi], bl // and swap
mov [edi], al
inc esi // adjust pointers
dec edi
dec ecx // and loop
jnz reverseLoop
done:
}
printf(string);
return 0;
}
I made the changes, and now I am getting this: Unhandled exception at 0x00e71416 in String Reverse.exe: 0xC0000005: Access violation reading location 0x0087ef6f. Despite trying other suggestions, I have still not been able to get it to run properly. I think the issue might be in the registers I'm referencing or the add eax line, but I'm not really sure.

mov ecx, [string]
"string" is an array of char, so you are trying to move 8 bits into a 32-bit register. If it were a global variable, you'd use the offset keyword. But it is not; it is stored on the stack, which requires you to use the LEA instruction (load effective address), like this:
lea ecx,string
which the compiler automatically translates into something like:
lea ecx,[ebp-20]
with the -20 adjustment depending on where it is located on the stack. The ECX register now points to the first char in the string.

Related

Reversing a string using stack in x86 NASM

I'm trying to write a function in x86 NASM assembly which reverses order of characters in a string passed as argument. I tried implementing it using stack but ended up getting error message
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)
Code below:
section .text
global reverse
reverse:
push ebp ; epilogue
mov ebp, esp
mov eax, [ebp+8]
xor ecx, ecx ; ecx = 0
push ebx ; saved register
push_eax:
mov edx, [eax] ; edx = char at eax
test edx, edx
jz inc_eax ; if edx == 0, move eax pointer back and pop eax
push edx
inc eax
inc ecx ; counter + 1
jmp push_eax
inc_eax:
sub eax, ecx ; move eax back to beginning of string
mov ebx, ecx ; to move eax back at the end of function
pop_eax:
test ecx, ecx ; loop counter == 0
jz end
pop edx
mov [eax], edx ; char at eax = edx
inc eax ; eax++
dec ecx ; ecx--
jmp pop_eax
end:
sub eax, ebx
pop ebx ; saved register
mov esp, ebp
pop ebp
ret
C declaration:
extern char* reverse(char*);
I've read somewhere that you get this error when, for instance, you write to an array past its allocated length, but I don't see how this function would do that. Also, when instead of using ebx at the end I manually move the pointer in eax back (string in C of length 9 -> sub eax, 9), I get the reversed string at the output followed by the 2nd, 3rd and 4th chars (no matter the length of the string I declare in C). So for instance:
input: "123456789"
output: "987654321234"
That only happens when I move eax manually; using ebx as in the code above outputs some trash.
Peter's answer is the answer you are looking for. However, may I comment on the technique? Must you use the stack? Do you already know the length of the string, or must you calculate/find that yourself?
For example, if you already know the length of the string, can you place one pointer at the first character and another at the last and simply exchange the characters, moving each pointer toward the center until they meet? This has the advantage of not assuming there is enough room on the stack for the string. In fact, you don't even touch the stack except for the prologue and epilogue. (Please note that your comment labels the code at the top as the epilogue; that is the prologue. "Epilogue" is the term for the ending.)
If you do not know the length of the string, then to use the above technique you must find the null char first. By doing this, you have touched each character in the string already, before you even start. Advantage: it is now loaded into the cache. Disadvantage: you must touch each character again, in essence reading the string twice. However, since you are using assembly, a repeated scasb instruction is fairly fast and has the added advantage of auto-magically placing a pointer near the end of the string for you.
I am not expecting an answer by asking these questions. I am simply suggesting a different technique based on certain criteria of the task. When I read the question, the following instantly came to mind:
p[i] <-> p[n-1]
i++, n--
loop until n <= i
Please note that you will want to check that 'n' is actually greater than 'i' before you make the first move. i.e.: it isn't a zero length string.
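To make the two-pointer idea above concrete, here is a minimal C sketch of it. The function name reverse_in_place is made up for illustration, and it assumes a null-terminated string of 1-byte characters in writable memory. It is only one way to express the technique, not the asker's required stack-based approach.

```c
#include <stddef.h>
#include <string.h>

/* Reverse s in place by swapping characters from both ends.
   Hypothetical helper; assumes s is a writable, null-terminated string. */
static void reverse_in_place(char *s)
{
    size_t n = strlen(s);      /* first pass: find the terminator */
    if (n < 2)
        return;                /* empty or one-character string: nothing to swap */

    size_t i = 0;              /* index of the front character */
    size_t j = n - 1;          /* index of the back character */
    while (i < j) {
        char tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
        i++;
        j--;
    }
}
```

Calling it on a char array holding "123456789" leaves "987654321" in the array. The early return is the "check before the first move" mentioned above: without it, n - 1 would wrap around for an empty string.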
If this is a string of 1-byte characters, you want movzx edx, byte [eax] byte loads and mov [eax], dl byte stores.
You're doing 4-byte stores, which potentially step on bytes past the end of the array. You also probably overread until you find a whole dword on the stack that's all zero. test edx, edx is fine if you correctly zero-extended a byte into EDX, but loading a whole dword probably resulted in overread.
Use a debugger to see what you're doing to memory around the input arg.
(i.e. make sure you aren't writing past the end of the array, which is probably what happened here, stepping on the buffer-overflow detection cookie.)
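A rough C illustration of why the store width matters, assuming a hypothetical helper bytes_modified_by_store and an 8-byte scratch buffer filled with a guard value (neither is from the question). The 4-byte memcpy plays the role of mov [eax], edx and the single-byte assignment plays the role of mov [eax], dl.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD 0xAA   /* arbitrary filler so modified bytes are easy to spot */

/* Store one character into a guard-filled buffer using a store of
   `width` bytes (4, or anything else for a 1-byte store) and return
   how many bytes of the buffer were changed. Hypothetical helper. */
static int bytes_modified_by_store(size_t width)
{
    unsigned char buf[8];
    memset(buf, GUARD, sizeof buf);

    uint32_t value = 'X';               /* character zero-extended to 32 bits */
    if (width == 4)
        memcpy(buf, &value, 4);         /* analogous to: mov [eax], edx */
    else
        buf[0] = (unsigned char)value;  /* analogous to: mov [eax], dl */

    int changed = 0;
    for (size_t i = 0; i < sizeof buf; i++)
        if (buf[i] != GUARD)
            changed++;
    return changed;
}
```

The 1-byte store changes one byte; the 4-byte store changes four, so storing the last character of a string that way also rewrites the three bytes after it. On the stack, those bytes can belong to the terminator, a neighbouring variable, or the stack-protector cookie.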

Converting a string of numbers into an integer in Assembly x86

I'm trying to convert a user inputted string of numbers to an integer.
For example, user enters "1234" as a string I want 1234 stored in a DWORD variable.
I'm using lodsb and stosb to get the individual bytes. My problem is I can't get the algorithm right for it. My code is below:
mov ecx, (SIZEOF num)-1
mov esi, OFFSET num
mov edi, OFFSET ints
cld
counter:
lodsb
sub al,48
stosb
loop counter
I know that the ECX counter is going to be a bit off also because it's reading the entire string not just the 4 bytes, so it's actually 9 because the string is 10 bytes.
I was trying to use powers of 10 to multiply the individual bytes but I'm pretty new to Assembly and can't get the right syntax for it. If anybody can help with the algorithm that would be great. Thanks!
A simple implementation might be
mov ecx, digitCount
mov esi, numStrAddress
cld ; We want to move upward in mem
xor edx, edx ; edx = 0 (We want to have our result here)
xor eax, eax ; eax = 0 (We need that later)
counter:
imul edx, 10 ; Multiply prev digits by 10
lodsb ; Load next char to al
sub al,48 ; Convert to number
add edx, eax ; Add new number
; Here we used that the upper bytes of eax are zeroed
loop counter ; Move to next digit
; edx now contains the result
mov [resultIntAddress], edx
Of course there are ways to improve it, like avoiding the use of imul.
EDIT: Fixed the ecx value
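For comparison, the same multiply-by-10-and-add loop can be sketched in C. The name digits_to_u32 is made up here, and the sketch assumes every one of the count characters is an ASCII digit; like the assembly above, it does no validation and no overflow check.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert `count` ASCII digits to an unsigned 32-bit value.
   Hypothetical helper; assumes all characters are '0'..'9'. */
static uint32_t digits_to_u32(const char *digits, size_t count)
{
    uint32_t result = 0;
    for (size_t i = 0; i < count; i++) {
        /* shift previous digits one decimal place, then add the new one */
        result = result * 10 + (uint32_t)(digits[i] - '0');
    }
    return result;
}
```

For the question's example, digits_to_u32("1234", 4) walks through 1, 12, 123, 1234. No explicit powers of 10 are needed, because each step multiplies everything accumulated so far by 10.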

Move to end of string within buffer - Assembly Language

I am trying to take in a string and then see if the last value in the string is an EOL character. I figured I would use the length of the string read in and then add it to the address of the buffer to find the last element. This does not seem to work.
Edit: I apologize that I did not include more information. Variables are defined as such:
%define BUFLEN 256
SECTION .bss ; uninitialized data section
buf: resb BUFLEN ; buffer for read
newstr: resb BUFLEN ; converted string
rlen: resb 4
Then a system call (int 80h) is made to read a string from the user, like so:
; read user input
;
mov eax, SYSCALL_READ ; read function
mov ebx, STDIN ; Arg 1: file descriptor
mov ecx, buf ; Arg 2: address of buffer
mov edx, BUFLEN ; Arg 3: buffer length
int 080h
Then we go into our loop:
test_endl:
mov ecx, [rlen]
mov esi, buf
add esi, ecx ; i want to move 'rlen' bytes into buf
mov al, [esi] ; to see what the last element is
cmp al, 10 ; compare it to EOL
jnz L1_init
dec ecx ; and then decrease 'rlen' if it is an EOL
mov [rlen], ecx
I am using NASM to assemble and writing for an i386 machine.
Adding the length of the string to the address of the buffer gives access to the byte just past the end of the string, not to its last character.
Based on you saying that
you want to see if the last value in the string is an EOL character
you aim to decrease 'rlen' if it is an EOL (*)
I conclude that you consider the possible EOL character part of the string as defined by its length rlen. If you don't then (*) doesn't make sense.
Use mov al,[esi-1] to see what the last element is!
test_endl:
mov ecx, [rlen]
mov esi, buf
add esi, ecx ; i want to move 'rlen' bytes into buf
mov al, [esi-1] ; to see what the last element is
cmp al, 10 ; compare it to EOL
jnz L1_init
dec ecx ; and then decrease 'rlen' if it is an EOL
mov [rlen], ecx
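The off-by-one is perhaps easier to see in C. This is a small sketch with a made-up function name, strip_trailing_newline, taking the buffer and the number of bytes read. The len > 0 guard is my addition for the case where nothing was read; the assembly above does not have it.

```c
#include <stddef.h>

/* Return the length with one trailing '\n' (byte value 10) removed, if present.
   Hypothetical helper; buf holds len bytes and need not be null-terminated. */
static size_t strip_trailing_newline(const char *buf, size_t len)
{
    /* the last character is at buf[len - 1]; buf[len] is one past the end */
    if (len > 0 && buf[len - 1] == '\n')
        return len - 1;
    return len;
}
```

Here buf[len - 1] corresponds to mov al, [esi-1] after add esi, ecx, and the returned value is what would be written back to rlen.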
This is a much more roundabout way (literally) of getting to the end of the string. I loop through all the characters in the string based on what the size of the counter, rlen, is. Then, once the loop is complete, I make the comparison and decrement rlen as necessary.
test_loop:
mov al, [esi] ; get a character
inc esi ; update source pointer
dec ecx ; update char count
jnz test_loop ; loop to top if more chars
cmp al, 10 ; comparison
jne L1_init ; if not EOL jump to L1_init
mov ecx, [rlen] ; decrease the size of rlen if necessary
dec ecx
mov [rlen], ecx

How would I find the length of a string using NASM?

I'm trying to make a program using NASM that takes input from command line arguments. Since string length is not provided, I'm trying to make a function to compute my own. Here is my attempt, which takes a pointer to a string in the ebx register, and returns the length of the string in ecx:
len:
push ebx
mov ecx,0
dec ebx
count:
inc ecx
inc ebx
cmp ebx,0
jnz count
dec ecx
pop ebx
ret
My method is to go through the string, character by character, and check if it's null. If it's not, I increment ecx and go to the next character. I believe the problem is that cmp ebx,0 is incorrect for what I'm trying to do. How would I properly go about checking whether the character is null? Also, are there other things that I could be doing better?
You are comparing the value in ebx with 0 which is not what you want. The value in ebx is the address of a character in memory so it should be dereferenced like this:
cmp byte[ebx], 0
Also, the last push ebx should be pop ebx.
Here is how I do it in a 64-bit Linux executable that checks argv[1]. The kernel starts a new process with argc and argv[] on the stack, as documented in the x86-64 System V ABI.
_start:
pop rsi ; number of arguments (argc)
pop rsi ; argv[0] the command itself (or program name)
pop rsi ; rsi = argv[1], a pointer to a string
mov ecx, 0 ; counter
.repeat:
lodsb ; byte in AL
test al,al ; check if zero
jz .done ; if zero then we're done
inc ecx ; increment counter
jmp .repeat ; repeat until zero
.done:
; string is unchanged, ecx contains the length of the string
; unused, we look at command line args instead
section .rodata
asciiz: db "This is a string with 36 characters.", 0
This is slow and inefficient, but easy to understand.
For efficiency, you'd want
only 1 branch in the loop (Why are loops always compiled into "do...while" style (tail jump)?)
avoid a false dependency by loading with movzx instead of merging into the previous RAX value (Why doesn't GCC use partial registers?).
subtract pointers after the loop instead of incrementing a counter inside.
And of course SSE2 is always available in x86-64, so we should use that to check in chunks of 16 bytes (after reaching an alignment boundary). See optimized hand-written strlen implementations like in glibc. (https://code.woboq.org/userspace/glibc/sysdeps/x86_64/strlen.S.html).
Here is how I would have coded it:
len:
push ebx
mov eax, ebx
lp:
cmp byte [eax], 0
jz lpend
inc eax
jmp lp
lpend:
sub eax, ebx
pop ebx
ret
(The result is in eax). Likely there are better ways.
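The same shape in C, as a sketch: walk a pointer to the terminator, then subtract the start pointer instead of keeping a separate counter. The name string_length is made up to avoid clashing with the standard strlen, and the input is assumed to be null-terminated.

```c
#include <stddef.h>

/* Length of a null-terminated string, computed by pointer subtraction.
   Hypothetical helper mirroring the assembly above. */
static size_t string_length(const char *s)
{
    const char *p = s;       /* like: mov eax, ebx */
    while (*p != '\0')       /* like: cmp byte [eax], 0 / jz lpend */
        p++;                 /* like: inc eax */
    return (size_t)(p - s);  /* like: sub eax, ebx */
}
```

Note that the loop tests *p, the byte the pointer refers to, and not p itself. That is the same distinction as cmp byte [ebx], 0 versus cmp ebx, 0 in the original question.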

Microsoft inline assembly and references or Why does BYTE PTR [ByteRef] not work in this situation?

Okay so I have a C++ function in which I am trying to use inline assembly
void ToggleBit(unsigned char &Byte, unsigned int Bit)
{
/* In C:
* Byte ^= (1<<Bit);
*/
__asm
{
push edx
push ecx
mov ecx, Bit
xor edx, edx
mov edx, 1
sal dl, cl
xor BYTE PTR [Byte], dl
pop ecx
pop edx
}
}
This should work, right? Since Byte is a reference (which is essentially a constant pointer), it must be dereferenced to access the data... but it didn't work!
Upon debugging the following code:
mov edx, Byte
;edx = 0x0040f9d3
mov bl, BYTE PTR [Byte]
;bl = 0xd3
I don't understand why this would happen at all.
As you say, a reference is the same as a pointer in assembly. To access the reference/pointer, you must first read the pointer value, and then dereference it:
mov ecx, Byte ; Or mov ecx, [Byte] which is the same thing
xor [ecx], dl
When you access the value at BYTE PTR [Byte], it accesses the first byte of the pointer value (the pointed-to address) instead of the pointed-to value.
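A C analogue may help, with an explicit pointer standing in for the C++ reference. Both function names are hypothetical. first_byte_of_pointer mimics what mov bl, BYTE PTR [Byte] ends up reading, and toggle_bit does the load-then-dereference that the two-instruction fix performs. The sketch assumes bit is less than 8.

```c
#include <string.h>

/* Read the first byte of the pointer variable itself, not of what it
   points to. On little-endian x86 this is the low byte of the address.
   Hypothetical helper for illustration only. */
static unsigned char first_byte_of_pointer(unsigned char *p)
{
    unsigned char b;
    memcpy(&b, &p, 1);   /* copies 1 byte of the pointer's own storage */
    return b;
}

/* Toggle one bit of the pointed-to byte; assumes bit < 8. */
static void toggle_bit(unsigned char *byte, unsigned int bit)
{
    /* load the pointer value, then dereference it:
       mov ecx, Byte / xor [ecx], dl */
    *byte ^= (unsigned char)(1u << bit);
}
```

This matches the debugger output in the question: the pointer value was 0x0040f9d3 and the byte read through BYTE PTR [Byte] was 0xd3, the low byte of that address.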

Resources