Some inline assembler questions

Some inline assembler questions - visual-c++

I already asked similar question here, but I still get some errors, so I hope you could tell me what am I doing wrong. Just know that I know assembler, and I have done several projects in 8051 assembler, and even it is not the same, is close to x86 asm.
There is block of code I tried in VC++ 2010 Express (I am trying to get information from CPUID instruction): `
int main()
{
char a[17]; //containing array for the CPUID string
a[16] = '\0'; //null termination for the std::cout
void *b=&a[0];
int c=0; //predefined value which need to be loaded into eax before cpuid
_asm
{
mov eax,c;
cpuid;
mov [b],eax;
mov [b+4],ebx;
mov [b+8],ecx;
mov [b+12],edx;
}
std::cout<<a;
}`
So, to quick summarize, I tried to create void pointer to the first element of array, and than using indirect addressing just move values from registers. But this approach gives me "stack around b variable is corrupted run-time error" but I don´t know why.
Please help. Thanks. And this is just for study purposes, i know there are functions for CPUID....
EDIT: Also, how can you use direct addressing in x86 VC++ 2010 inline assembler? I mean common syntax for immediate number load in 8051 is mov src,#number but in VC++ asm its mov dest,number without # sign. So how to tell the compiler you want to access memory cell adress x directly?

The reason your stack is corrupted is because you're storing the value of eax in b. Then storing the value of ebx at the memory location b+4, etc. The inline assembler syntax [b+4] is equivalent to the C++ expression &(b+4), if b were a byte pointer.
You can see this if you watch b and single-step. As soon as you execute mov [b],eax, the value of b changes.
One way to fix the problem is to load the value of b into an index register and use indexed addressing:
mov edi,[b]
mov [edi],eax
mov [edi+4],ebx
mov [edi+8],ecx
mov [edi+12],edx
You don't really need b at all, to hold a pointer to a. You can load the index register directly with the lea (load effective address) instruction:
lea edi,a
mov [edi],eax
... etc ...
If you're fiddling with inline assembler, it's a good idea to open the Registers window in the debugger and watch how things change when you're single stepping.
You can also directly address the memory:
mov dword ptr a,eax
mov dword ptr a+4,ebx
... etc ...
However, direct addressing like that required more code bytes than the indexed addressing in the previous example.
I think the above, with the lea (Load Effective Address) instruction and the direct addressing I showed answers your final question.

The advice to open the Registers window in the debugger watch how things change is not going to work in VC++ 2010 Express.
You might be just as surprised as myself to find out that the VC++ 2010 Express is MISSING the registers window. This is especially surprising since stepping in disassembly works.
The only workaround I know is to open a watch window and type the register names in the Name field. Type in EAX EBX ECX EDX ESI EDI EIP ESP EBP EFL and CS DS ES SS FS GS if you want
ST1 ST2 ST3 ST4 ST5 ST6 ST7 also work in the watch window.
You will probably also want to set the Value to Hexadecimal by right clicking in the Watch window and checking Hexadecimal Display.

Related

Is it possible for a program to read itself?

Theoretical question. But let's say I have written an assembly program. I have "labelx:" I want the program to read at this memory address and only this size and print to stdout.
Would it be something like
jmp labelx
And then would i then use the Write syscall , making sure to read from the next instruction from labelx:
mov rsi,rip
mov rdi,0x01
mov rdx,?
mov rax,0x01
syscall
to then output to stdout.
However how would I obtain the size to read itself? Especially if there is a
label after the code i want to read or code after. Would I have to manually
count the lines?
mov rdx,rip+(bytes*lines)
And then syscall with populated registers for the syscall to write to from rsi to rdi. Being stdout.
Is this Even possible? Would i have to use the read syscall first, as the write system call requires rsi to be allocated memory buffer. However I assumed .text is already allocated memory and is read only. Would I have to allocate onto the stack or heap or a static buffer first before write, if it's even possible in the first place?
I'm using NASM syntax btw. And pretty new to assembly. And just a question.

Yes, the .text section is just bytes in memory, no different from section .rodata where you might normally put msg: db "hello", 10. x86 is a Von Neumann architecture (not Harvard), so there's no distinction between code pointers and data pointers, other than what you choose to do with them. Use objdump -drwC -Mintel on a linked executable to see the machine-code bytes, or GDB's x command in a running process, to see bytes anywhere.
You can get the assembler to calculate the size by putting labels at the start/end of the part you want, and using mov edx, prog_end - prog_start in the code at the point where you want that size in RDX.
See How does $ work in NASM, exactly? for more about subtracting two labels (in the same section) to get a size. (Where $ is an implicit label at the start of the current line, although $ isn't likely what you want here.)
To get the current address into a register, you need a RIP-relative LEA, not mov, because RIP isn't a general-purpose register and there's no special form of mov that reads it.
here:
lea rsi, [rel here] ; with DEFAULT REL you could just use [here]
mov edi, 1 ; stdout fileno
mov edx, .end - here ; assemble-time constant size calculation
mov eax, 1 ; __NR_write
syscall
.end:
This is fully position-independent, unlike if you used mov esi, here. (How to load address of function or label into register)
The LEA could use lea rsi, [rel $] to assemble to the same machine-code bytes, but you want a label there so you can subtract them.
I optimized your MOV instructions to use 32-bit operand-size, implicitly zero-extending into the full 64-bit RDX and RAX. (And RDI, but write(int fd, void *buf, size_t len) only looks at EDI anyway for the file descriptor).
Note that you can write any bytes of any section; there's nothing special about having a block of code write itself. In the above example, put the start/end labels anywhere. (e.g. foo: and .end:, and mov edx, foo.end - foo taking advantage of how NASM local labels work, by appending to the previous non-local label, so you can reference them from somewhere else. Or just give them both non-dot names.)

Can't understand assembly mov instruction between register and a variable

I am using NASM assembler on linux 64 bit.
There is something with variables and registers I can't understand.
I create a variable named "msg":
msg db "hello, world"
Now when I want to write to the stdout I move the msg to rsi register, however I don't understand the mov instruction bitwise ... the rsi register consists of 64 bit , while the msg variable has 12 symbols which is 8 bits each , which means the msg variable has a size of 12 * 8 bits , which is greater than 64 bits obviously.
So how is this even possible to make an instruction like:
mov rsi, msg , without overflowing the memory allocated for rsi.
Or does the rsi register contain the memory location of the first symbol of the string and after writing 1 symbol it changes to the memory location of the next symbol?
Sorry if I wrote complete nonsense, I'm new to assembly and i just can't get the grasp of it for a while.

In NASM syntax (unlike MASM syntax) mov rsi, symbol puts the address of the symbol into RSI. (Inefficiently with a 64-bit absolute immediate; use a RIP-relative LEA or mov esi, symbol instead. How to load address of function or label into register in GNU Assembler)
mov rsi, [symbol] would load 8 bytes starting at symbol. It's up to you to choose a useful place to load 8 bytes from when you write an instruction like that.
mov rsi, msg ; rsi = address of msg. Use lea rsi, [rel msg] instead
movzx eax, byte [rsi+1] ; rax = 'e' (upper 7 bytes zeroed)
mov edx, [msg+6] ; rdx = ' wor' (upper 4 bytes zeroed)
Note that you can use mov esi, msg because symbol addresses always fit in 32 bits (in the default "small" code model, where all static code/data goes in the low 2GB of virtual address space). NASM makes this optimization for you with assemble-time constants (like mov rax, 1), but probably it can't with link-time constants. Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?
and after writing 1 symbol it changes to the memory location of the next symbol?
No, if you want that you have to inc rsi. There is no magic. Pointers are just integers that you manipulate like any other integers, and strings are just bytes in memory.
Accessing registers doesn't magically modify them.
There are instructions like lodsb and pop that load from memory and increment a pointer (rsi or rsp respectively), but x86 doesn't have any pre/post-increment/decrement addressing modes, so you can't get that behaviour with mov even if you want it. Use add/sub or inc/dec.

Disclaimer: I'm not familiar with the flavor of assembly that you're dealing with, so the following is more general. The particular flavor may have more features than what I'm used to. In general, assembly deals with single byte/word entities where the size depends on the processor. I've done quite a bit of work on 8 and 16-bit processors, so that is where my answer is coming from.
General statements about Assembly:
Assembly is just like a high level language, except you have to handle a lot more of the details. So if you're used to some operation in say C, you can start there and then break the operation down even further.
For instance, if you have declared two variables that you want to add, that's pretty easy in C:
x = a + b;
In assembly, you have to break that down further:
mov R1, a * get value from a into register R1
mov R2, b * get value from b into register R2
add R1,R2 * perform the addition (typically goes into a particular location I'll call it the accumulator
mov x, acc * store the result of the addition from the accumulator into x
Depending on the flavor of assembly and the processor, you may be able to directly refer to variables in the addition instruction, but like I said I would have to look at the specific flavor you're working with.
Comments on your specific question:
If you have a string of characters, then you would have to move each one individually using a loop of some sort. I would set up a register to contain the starting address of your string, and then increment that register after each character is moved. It acts like a pointer in C. You will need to have some sort of indication for the termination of the string or another value that tells the size of the string, so you know when to stop.

Read and display float or double number in assembly in Microsoft Visual Studio

I am using Microsoft Visual Studio 2015 to learn inline assembly programming. I have read a lot of posts on stackoverflow including the most relevant one this, but after I tried the methods the result is still 0.0000. I used the float first and store the value to fpu but the reuslt is the same and tried passing value to eax and still no help.
Here is my code:
#include "stdafx.h"
int _tmain(int argc, _TCHAR* argv[])
{
char input1[] = "Enter number: \n";
char input_format[] = "%lf";
double afloat;
char output[] = "The number is: %lf";
_asm {
lea eax, input1
push eax
call printf_s
add esp, 4
lea eax, input_format
push eax
lea eax, afloat
push eax
call scanf_s
add esp, 8
sub esp, 8
fstp [esp]
lea eax, output
push eax
call printf
add esp, 12
}
return 0;
}
The result:

You are attempting to print the wrong value. In fact, the code should just be causing nonsense to be printed to the terminal. You got quite lucky that you see 0.0. Let's look specifically at the part of the code that retrieves the floating point value, which is your call to scanf_s:
lea eax, input_format
push eax
lea eax, afloat
push eax
call scanf_s
add esp, 8
sub esp, 8
fstp [esp]
First of all, I don't see any logic in adding 8 to your stack pointer (esp) and then immediately subtracting 8 from the stack pointer. Performing these two operations back-to-back just cancel each other out. As such, these two instructions can be deleted.
Second, you are pushing the arguments on the stack in the wrong order. The cdecl calling convention (used by the CRT functions, including printf_s and scanf_s) passes argument in reverse order, from right to left. Therefore, to call scanf_s, you would first push the address of the floating-point value into which the input should be stored, and then push the address of the format control string buffer. Your code is wrong, because it pushes the arguments from left to right. You get lucky with printf_s, because you're only passing one argument, but because you're passing two arguments to scanf_s, bad things happen.
The third problem is that you appear to be assuming that scanf_s returns the output directly.
If it did, and you had requested a floating point value, you would be correct that the cdecl calling convention would have it returning that value at the top of the floating-point stack, in FP(0), and thus you would be correctly popping that value and storing it in a memory location (fstp [esp]).
While scanf_s does return a value (an integer value indicating the number of fields that were successfully converted and assigned), it does not return the value from the standard input stream. There is, in fact, no way that it could do this, since it supports arbitrary types of inputs. This is why it uses a pointer to a memory location to store the value. You probably knew this already, since you arranged to pass that pointer as a parameter to the function.
Now, why did you get an output of 0.0 in the final call to printf? Because of the fstp [esp] instruction. This pops the top value off of the floating-point stack, and stores it in the memory address contained in esp. I have already pointed out that scanf_s does not place any value(s) on the floating-point stack, so technically, it contains meaningless/garbage data. But in your case, you were lucky enough that FP(0) actually contained 0.0, so that is what got printed. Why does FP(0) contain 0.0? I'm guessing because this is a debug build that you're running, and the CRT is zeroing the stack. Or maybe because that's what FSTP pops off the stack when the stack is empty. I don't know, and I don't see that documented anywhere. But it doesn't really matter what happens when you write incorrect code, because you should strive to only write correct code!
Here's what correct code might look like:
; Get address of 'input1' buffer, and push it onto the stack
; in preparation of a call to 'printf_s'. Then, make the call.
lea eax, DWORD PTR [input1]
push eax
call printf_s
add esp, 4 ; clean up the stack after calling 'printf_s'
; Call 'scanf_s' to retrieve the floating-point input.
; Note that in the __cdecl calling convention, arguments are always
; passed on the stack in *reverse* order!
lea eax, DWORD PTR [afloat]
push eax
lea eax, DWORD PTR [input_format]
push eax
call scanf_s
; (no need to clean up the stack here; we're about to reuse that space!)
; The 'scanf_s' function stored the floating-point value that the user entered
; in the 'afloat' variable. So, we need to load this value onto the top
; of the floating point stack in order to pass it to 'printf_s'
fld QWORD PTR [afloat]
fstp QWORD PTR [esp]
; Get the address of the 'output' buffer and push it onto the stack,
; and then call 'printf_s'. Again, this is pushed last because
; __cdecl demands that arguments are passed right-to-left.
lea eax, DWORD PTR [output]
push eax
call printf_s
add esp, 12 ; clean up the stack after 'scanf_s' and 'printf_s'
Note that you could optimize this code further by deferring the stack cleanup after the initial call to printf_s. You can just wait and do the cleanup later at the end of the function. Functionally, this is equivalent, but an optimizing compiler will often choose to defer stack cleanup to produce more efficient code because it can interleave it within other time-consuming instructions.
Also note that you technically do not need the DWORD PTR directives that I've used in the code because the inline assembler (and MASM syntax in general) tries to read your mind and assemble the code that you meant to write. However, I like to write it to be explicit. It just means that the value you're loading is DWORD-sized (32 bits). All pointers on 32-bit x86 are 32 bits in size, as are int values and single-precision float values. QWORD means 64 bits, like an __int64 value or a double value.
Warning: When I first tested this code in MSVC using inline assembly, I couldn't get it to run. It worked fine when assembled separately using MASM, but when written using inline assembly, I couldn't execute it without getting an "access violation" error. In fact, when I tried your original code, I got the same error. Initially, I couldn't figure out how you were able to get your code to run, either!
Finally, I was able to diagnose the problem: by default, my MSVC project had been configured to dynamically link to the C runtime (CRT) library. The implementation of printf/printf_s is apparently doing some type of debug check that is causing this code to fail. I still am not entirely sure what the purpose of this validation code is, or exactly how it works, but it appears to be checking a certain offset within the stack for a sentinel value. Anyway, when I switched to statically linking to the CRT, everything runs as expected.
At least, when I compiled the code as a "release" build. In a "debug" build, the compiler can't tell that you need floating-point support (since all the floating-point stuff is in the inline assembly, which it can't parse), so it fails to tell the linker to link in the floating-point support from the CRT. As a result, the application bombs as soon as you run it and try to use scanf_s with a floating-point value. There are various ways of fixing this, but the simplest way would be to simply explicitly initialize the afloat value (it doesn't matter what you initialize it to; 0.0 or any other value would work just fine).
I suppose you are statically linking the CRT and running a release build, which is why your original code was executing. In that case, the code I've shown will both execute and return the correct results. However, if you're trying to learn assembly language, I strongly recommend avoiding inline assembly and writing functions directly in MASM. This is supported from within the Visual Studio environment, too; for setup instructions, see here (same for all versions).

Have you used a debugger to look at registers / memory?
fstp [esp] is extremely suspicious, because ST0 should be empty at that point. scanf returns an integer, and the calling convention requires the x87 stack to be empty on call/return, except for FP return values.
I forget what happens when you FST while the x87 stack is empty. If you get zero, that would explain it, since this is what you're passing to printf.
add esp, 8 / sub esp, 8 is completely redundant. There's no point in doing that. You can just take it out. (Or comment it out but leave it there as a reminder that you've optimized by reusing the arg space on the stack instead of popping it and pushing new stuff.)
Since scanf writes the double at the address you passed, you could avoid copying it by getting it to write it near the bottom of the stack, and then adjust ESP to right below it. Push a format string and you're ready to call printf, with the double already where it needs to be on the stack as the second arg.
sub esp, 8 ; reserve 8 bytes for storing a `double` on the stack
push esp ; push a pointer to the bottom of that 16B we just reserved
push OFFSET input_format ; (only works if it's in static storage, but it's a string constant so you should have done that instead of initializing a non-const array with a literal inside `main`.)
call scanf ; scanf("%lf", ESP_before_we_pushed_args)
add esp, 8
; TODO: check return value and handle error
; esp now points at the double stored by scanf
push OFFSET output_format
call printf
add esp, 12 ; "pop" the args: format string and the double we made space for before the scanf call.
If the calling convention / ABI you're using requires that you make function calls with ESP 16B-aligned, you can't shorten it quite this much: you'll need an lea eax, [esp+4] / push eax or something instead of push esp, because the double can't be right above scanf's second arg but also be printf's second arg if both calls have the stack 16B-aligned. So have scanf store the double at a higher address, and add esp, 12 to reach it.
IDK if MSVC-style inline-asm guarantees any kind of stack alignment in the first place. Inline-asm seems to make this problem more complicated in some ways than just writing the whole function in asm.

Thank you for all the help and caveats offered. There are still problems I need to configure and fix, but after reading most float point operators on this website , I finally come up with a solution to read and display float point number with Microsoft Visual Studio. I am sorry for all the bad conventions I've used in this post and lack of comments, I will learn to program in a better style. Thanks again!
Here is my working code(different from original post since I want to use float point number there):
#include "stdafx.h"
int _tmain(int argc, _TCHAR* argv[])
{
char input1[] = "Enter number: \n";
char input_format[] = "%f";
char output_format[] = "%.2f\n"; //for better ouput
float afloat;
char output[] = "The number is:";
_asm {
lea eax, input1
push eax
call printf_s
add esp, 4
call flushall; // flush all streams and clear all buffers
mov afloat, 0; // give the variable afloat a default
lea eax, afloat;
push eax;
lea eax, input_format; // read the floating number user inputs
push eax;
call scanf_s;
add esp, 8;
//leave space for more operations.
lea eax, output;
push eax;
call printf; // print the output message
add esp, 4;
sub esp, 8; // reserve stack for a double in stack
fld dword ptr afloat; // load float
fstp qword ptr[esp]; // IMPORTANT: convert to double and store, because printf expects a double not a float
lea eax, output_format; // import the floating number format '%f'
push eax;
call printf; // print the variable afloat
add esp, 12;
}
return 0;
}
The result looks like this:

Understanding ASM. Why does this work in Windows?

Me and a couple of friends are fiddling with a very strange issue. We encountered a Crash in our application inside of a small assembler portion (used to speed up the process). The error was caused by fiddling with the stackpointer and not resetting it at the end, it looked like this:
push ebp
mov ebp, esp
; do stuff here including sub and add on esp
pop ebp
When correctly it should be written as:
push ebp
mov ebp, esp
; do stuff here including sub and add on esp
mov esp,ebp
pop ebp
Now what our mindbreak is: Why does this work in Windows? We found the error as we ported the application to Linux, where we encountered the crash. Neither in Windows or Android (using the NDK) we encountered any issues and would never have found this error. Is there any Stackpointer recovery? Is there a protection against misusing the stackpointer?

the ebp esp usage, is called a stack frame, and its purpose is to allocate variables on the stack, and afterward have a quick way to restore the stack back before the ret instruction. All new versions of x86 CPU can compress these instructions together using enter / leave instructions instead.
esp is the actual stack pointer used by the CPU when doing push/pop/call/ret.
ebp is a user-manipulated base pointer, more or less all compilers use this as a stack-pointer for local storage.
If the mov esp, ebp instruction is missing, the stack will misbehave if esp != ebp when the CPU reaches pop ebp, but only then.

it seems the compiler takes care of your stack in windows:
The only way I can imagine is:
Microsoft Visual C takes special care of functions that are B{__stdcall}. Since the number of parameters is known at compile time, the compiler encodes the parameter byte count in the symbol name itself.
The __stdcall convention is mainly used by the Windows API, and it's a bit more compact than __cdecl. The main difference is that any given function has a hard-coded set of parameters, and this cannot vary from call to call like it can in C (no "variadic functions").
see:
http://unixwiz.net/techtips/win32-callconv-asm.html
and:
https://en.wikipedia.org/wiki/X86_calling_conventions

How to programmatically get the exact address in the memory space according to the assembly statement

Hummm,hope i will express my question clearly.
Now i have an assembly statement string like:
movss ****,xmm0; (intel style)
After this assembly statement being executed, a piece of the process's memory changed.
So **** must be the memory address, it could be something like:
DWORD PTR [eax],
DWORD PTR [eax + 0x4],
DWORD PTR [ebp - 0x4]
...Some style i have not seen, if you know that, please do tell me.
My problem is how to programmatically get the memory address through analyzing the assembly statement string.
For example:
if **** to be :
DWORD PTR [eax]
Perhaps I could search the string, and find EAX register, then get the data from EAX register, maybe it is the memory address, maybe not, it depends on the Addressing System, am i right? Finally，how could i get the exact address？

Is it possible for you to use the LEA "Load Effective Address" instruction ? I'm sorry, I'm not too familiar with Intel assembly, but that's what I would try.
This may help: http://en.wikibooks.org/wiki/X86_Assembly/Data_Transfer#Load_Effective_Address
Hope this helps!

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string