Loading an exe from an exe - visual-c++

I am exporting a function [using __declspec(dllexport)] from a C++ exe. The function works fine when invoked by the exe itself. I am loading this exe (let's call it exe1) from another exe [the test project's exe, which I'll call exe2] using implicit linking, i.e. I link exe2 against exe1's .lib import library, and exe2 loads exe1 into memory on startup just like any DLL. This causes the function to fail when it executes.
The exact problem is revealed in the disassembly for a switch case statement within the function.
Assembly code when exe1 invokes the function
switch (dwType)
0040FF84 mov eax,dword ptr [dwType]
0040FF87 mov dword ptr [ebp-4],eax
0040FF8A cmp dword ptr [ebp-4],0Bh
0040FF8E ja $LN2+7 (40FFD2h)
0040FF90 mov ecx,dword ptr [ebp-4]
0040FF93 jmp dword ptr (40FFE0h)[ecx*4]
Consider the final two instructions. The mov moves the passed-in argument into ecx. At 40FFE0h we have the addresses of the instructions for the respective case statements (a jump table). Thus, the jmp takes us to the relevant case instructions.
Assembly code when exe2 invokes the function
switch (dwType)
0037FF84 mov eax,dword ptr [dwType]
0037FF87 mov dword ptr [ebp-4],eax
0037FF8A cmp dword ptr [ebp-4],0Bh
0037FF8E ja $LN2+7 (37FFD2h)
0037FF90 mov ecx,dword ptr [ebp-4]
0037FF93 jmp dword ptr [ecx*4+40FFE0h]
Spot what's going wrong? The instruction addresses. The code has now been loaded into a different spot in memory. When exe1 was built, the compiler and linker assumed that we would always be launching it, and hence that it would always be loaded at 0x00400000 (the default base address for Windows exes). So a few absolute values like 40FFE0h were hard-coded into the instructions. In the second case, however, 40FFE0h is as good as junk memory, since the jump table we are looking for is no longer located there; it has moved along with the code to 37FFE0h.
How can I solve this without converting exe1 to a dll?

Just don't do it. It isn't worth the bother.
I tried doing what you're trying a while ago. You can possibly solve the non-relocatable exe problem by changing the option in the properties window under "Linker->Advanced->Fixed Base Address" (setting it to No, /FIXED:NO, makes the linker keep the relocation information), but then you'll have other problems.
What finally made me realize it's a waste of time was noticing that an EXE doesn't have a DllMain() function. This means that the CRT is not getting initialized, and all sorts of things don't work the way you expect them to.
Here's the question I posted about this a while back

Have you considered another way of doing this? Such as making the 2nd .exe into a .dll and invoking it with rundll32 when you want to use it as an executable?
Otherwise:
The generated assembly is fine. The problem is that Win32 portable executables have a preferred base address (0x00400000 in this case) and a relocation section (.reloc) that records the locations of absolute addresses, so that the image can be rebased when required.
So one of two things is happening:
- Either the linker isn't including the IMAGE_BASE_RELOCATION records when it builds the .exe.
- Or the loader isn't performing the base relocations when it dynamically loads the .exe.
- (possibly both)
If the .exe does contain the relocation records, you can read them and perform the base relocation yourself. You'll have to jump through hoops like making sure you have write access to the memory (VirtualAlloc etc.), but it's conceptually quite simple.
If the .exe doesn't contain the relocation records, you're stuffed: either find a linker option to force their inclusion, or find another way to do what you're doing.
Edit: As shoosh points out, you may run into other problems once you fix this one.
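To make "perform the base relocation yourself" more concrete, here is a minimal sketch of the fix-up loop, under several assumptions: the image has already been copied into a writable buffer indexed by RVA, only 32-bit IMAGE_REL_BASED_HIGHLOW entries (type 3) and padding entries (type 0) occur, and the host is little-endian. BaseRelocationBlock is a local stand-in with the same layout as winnt.h's IMAGE_BASE_RELOCATION so the code builds without windows.h, and apply_base_relocations is a hypothetical helper name, not a Windows API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Local stand-ins for the winnt.h definitions (same values and layout),
// so this sketch builds without <windows.h>.
constexpr uint16_t kRelBasedAbsolute = 0;  // IMAGE_REL_BASED_ABSOLUTE: padding, skip
constexpr uint16_t kRelBasedHighLow = 3;   // IMAGE_REL_BASED_HIGHLOW: 32-bit address

struct BaseRelocationBlock {  // layout of IMAGE_BASE_RELOCATION
    uint32_t VirtualAddress;  // RVA of the 4 KiB page this block patches
    uint32_t SizeOfBlock;     // bytes in this header plus its 16-bit entries
};

// Hypothetical helper: patch every HIGHLOW slot listed in `relocs` so that
// `image` (indexed by RVA) works at actualBase instead of preferredBase.
// Assumes a little-endian host. Returns false on malformed or unsupported data.
bool apply_base_relocations(std::vector<uint8_t>& image,
                            const std::vector<uint8_t>& relocs,
                            uint32_t preferredBase, uint32_t actualBase) {
    const uint32_t delta = actualBase - preferredBase;  // unsigned wrap-around is intended
    std::size_t pos = 0;
    while (pos + sizeof(BaseRelocationBlock) <= relocs.size()) {
        BaseRelocationBlock block;
        std::memcpy(&block, relocs.data() + pos, sizeof block);
        if (block.SizeOfBlock < sizeof block || block.SizeOfBlock > relocs.size() - pos)
            return false;
        const std::size_t count = (block.SizeOfBlock - sizeof block) / sizeof(uint16_t);
        for (std::size_t i = 0; i < count; ++i) {
            uint16_t entry;
            std::memcpy(&entry, relocs.data() + pos + sizeof block + i * sizeof entry,
                        sizeof entry);
            const uint16_t type = entry >> 12;       // high 4 bits: relocation type
            const uint16_t offset = entry & 0x0FFF;  // low 12 bits: offset within the page
            if (type == kRelBasedAbsolute) continue;
            if (type != kRelBasedHighLow) return false;  // other types not handled here
            const std::size_t at = std::size_t(block.VirtualAddress) + offset;
            if (at + 4 > image.size()) return false;
            uint32_t value;
            std::memcpy(&value, image.data() + at, 4);
            value += delta;  // shift the absolute address by the load delta
            std::memcpy(image.data() + at, &value, 4);
        }
        pos += block.SizeOfBlock;
    }
    return true;
}
```

For the disassembly in the question, the jmp at 0x40FF93 is encoded as FF 24 8D followed by its 32-bit displacement, so the displacement sits at RVA 0xFF96; rebasing from 0x00400000 to 0x00370000 turns 0x0040FFE0 into 0x0037FFE0, which is exactly where the jump table ended up.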

Related

Understanding ASM. Why does this work in Windows?

A couple of friends and I are puzzling over a very strange issue. We encountered a crash in our application inside a small assembler portion (used to speed up the process). The error was caused by fiddling with the stack pointer and not resetting it at the end; it looked like this:
push ebp
mov ebp, esp
; do stuff here including sub and add on esp
pop ebp
Correctly, it should be written as:
push ebp
mov ebp, esp
; do stuff here including sub and add on esp
mov esp,ebp
pop ebp
Now what puzzles us is: why does this work on Windows? We found the error when we ported the application to Linux, where we encountered the crash. Neither on Windows nor on Android (using the NDK) did we encounter any issues, and we would never have found this error there. Is there some kind of stack pointer recovery? Is there a protection against misusing the stack pointer?
The ebp/esp usage is called a stack frame, and its purpose is to allocate variables on the stack and afterwards have a quick way to restore the stack before the ret instruction. x86 CPUs since the 80186 can combine these instructions using enter/leave instead (leave is equivalent to mov esp, ebp followed by pop ebp).
esp is the actual stack pointer used by the CPU when doing push/pop/call/ret.
ebp is a base (frame) pointer that the code manipulates itself; more or less all compilers use it as a fixed reference point for accessing local storage.
If the mov esp, ebp instruction is missing, the stack will misbehave if esp != ebp when the CPU reaches pop ebp, but only then.
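The "but only then" in the last point can be shown with a toy model instead of real hardware. In this sketch, ToyCpu and run_frame are made-up names and the "stack" is just a 64-entry array; it runs push ebp / mov ebp, esp / sub esp, N, optionally restores esp, and returns the value pop ebp would load. With no locals reserved the missing mov esp, ebp is harmless; with locals reserved, pop ebp picks up a local slot instead of the caller's ebp (and on real hardware the following ret would pop the wrong return address as well).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a 32-bit x86 stack (NOT a real emulator): 64 dwords of
// memory, addressed in bytes, growing downward like the real one.
struct ToyCpu {
    std::vector<uint32_t> mem = std::vector<uint32_t>(64, 0);
    uint32_t esp = 64 * 4;  // start just past the top of the toy stack
    uint32_t ebp = 0;

    void push(uint32_t v) { esp -= 4; mem[esp / 4] = v; }
    uint32_t pop() { uint32_t v = mem[esp / 4]; esp += 4; return v; }
};

// Runs a prologue/epilogue and returns the value "pop ebp" restores:
//   push ebp / mov ebp, esp / sub esp, localBytes
//   [mov esp, ebp]   <- only when restoreEsp is true
//   pop ebp
// localBytes should be a multiple of 4 and well below 256 for this toy.
uint32_t run_frame(uint32_t callerEbp, uint32_t localBytes, bool restoreEsp) {
    ToyCpu cpu;
    cpu.ebp = callerEbp;
    cpu.push(cpu.ebp);                  // push ebp
    cpu.ebp = cpu.esp;                  // mov ebp, esp
    cpu.esp -= localBytes;              // sub esp, localBytes (reserve locals)
    if (restoreEsp) cpu.esp = cpu.ebp;  // mov esp, ebp (what leave also does)
    return cpu.pop();                   // pop ebp reads whatever esp points at
}
```

For example, run_frame(0x1234, 8, true) and run_frame(0x1234, 0, false) both give back 0x1234, while run_frame(0x1234, 8, false) does not.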
It seems the compiler takes care of your stack on Windows.
The only way I can imagine it is this:
Microsoft Visual C takes special care of functions that are __stdcall. Since the number of parameters is known at compile time, the compiler encodes the parameter byte count in the symbol name itself.
The __stdcall convention is mainly used by the Windows API, and it's a bit more compact than __cdecl. The main difference is that any given function has a hard-coded set of parameters, and this cannot vary from call to call like it can in C (no "variadic functions").
see:
http://unixwiz.net/techtips/win32-callconv-asm.html
and:
https://en.wikipedia.org/wiki/X86_calling_conventions
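As a small illustration of the name decoration described above, here is a sketch of MSVC's 32-bit __stdcall rule for C functions: a leading underscore, the name, an @, and the decimal number of argument bytes, with each argument rounded up to a multiple of 4. decorate_stdcall is a hypothetical helper; it ignores C++ name mangling and x64, where this decoration is not used. The same byte count is what the callee pops with ret N, which is how a __stdcall function cleans up its own arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper reproducing MSVC's 32-bit __stdcall decoration for a
// C (extern "C") function: "_" + name + "@" + argument bytes in decimal.
// Each argument occupies a multiple of 4 bytes on the x86 stack.
std::string decorate_stdcall(const std::string& name,
                             const std::vector<std::size_t>& argSizes) {
    std::size_t bytes = 0;
    for (std::size_t size : argSizes)
        bytes += (size + 3) / 4 * 4;  // round each argument up to a 4-byte slot
    return "_" + name + "@" + std::to_string(bytes);
}
```

For example, WinExec(LPCSTR, UINT) takes two 4-byte arguments, so decorate_stdcall("WinExec", {4, 4}) yields "_WinExec@8", and the function returns with ret 8.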

Running windows shell commands NASM X86 Assembly language

I am writing a simple assembly program that will just execute Windows commands. I will attach the current working code below. The code works if I hard-code the address of WinExec, which is a function from Kernel32.dll; I used another program called Arwin to locate this address. However, a reboot breaks this because of Windows' Address Space Layout Randomization (ASLR).
What I am looking to do is find a way to execute Windows shell commands without having to hard-code a memory address into my code that will change at the next reboot. I have found similar code around, but nothing that I either understand or that fits the purpose. I know this can be written in C, but I am specifically using assembler to keep the size as small as possible.
Thanks for your advice/help.
;Just runs a simple netstat command.
;compile with nasm -f bin cmd.asm -o cmd.bin
[BITS 32]
global _start
section .text
_start:
jmp short command
function: ;Label
;WinExec("Command to execute",NULL)
pop ecx
xor eax,eax
push eax
push ecx
mov eax,0x77e6e5fd ;Address found by arwin for WinExec in Kernel32.dll
call eax
xor eax,eax
push eax
mov eax,0x7c81cafa
call eax
command: ;Label
call function
db "cmd.exe /c netstat /naob"
db 0x00
Just an update to say I found a way of referencing Windows API functions by hash to perform any action I want from the stack. This removes the need to hard-code memory addresses and allows you to write dynamic shellcode.
There are defenses against this; however, it would still work against the myriad of unpatched and out-of-date machines still around.
The following two sites were useful in finding what I needed:
http://blog.harmonysecurity.com/2009_08_01_archive.html
https://www.scriptjunkie.us/2010/03/shellcode-api-hashes/

Using ".init_array" section of ELF file

When there is a need to run a piece of code on program startup (on Linux), how do I correctly use the .init_array section of an executable file (ELF32-i386)? I have the following code (GNU Assembler), which has a ctor initialization function whose address is placed in the .init_array section:
.intel_syntax noprefix
.data
s1: .asciz "Init code\n"
s2: .asciz "Main code\n"
.global _start
.global ctor
.text
ctor:
mov eax, 4 # sys_write()
mov ebx, 1 # stdout
mov ecx, offset s1
mov edx, 10
int 0x80
ret
.section .init_array
.long ctor
.text
_start:
mov eax, 4
mov ebx, 1
mov ecx, offset s2
mov edx, 10
int 0x80
mov eax, 1
mov ebx, 0
int 0x80
This code is assembled with:
as -o init.o init.asm
ld -o init init.o
When the resulting executable is run, only the "Main code" string is printed. How do I use the .init_array section properly?
EDIT1: I want to use .init_array because there are multiple source files with their own init code. One could call all this code 'manually' on startup and modify that call sequence every time source files are added to or removed from the project, but .init_array seems to be designed for exactly this case:
Before transferring control to an application, the runtime linker
processes any initialization sections found in the application and any
loaded dependencies. The initialization sections .preinit_array,
.init_array, and .init are created by the link-editor when a dynamic
object is built.
The runtime linker executes functions whose addresses are contained in
the .preinit_array and .init_array sections. These functions are
executed in the same order in which their addresses appear in the
array.
When an executable is created without gcc, the startup code doesn't seem to be executed. I tried to write my own standard init routine, which reads the function pointers in the .init_array section and calls them. It works OK for one file, where one can mark the end of the section, for example, with a zero. But with multiple files, this zero can end up in the middle of the section. How can one correctly determine the size of a section assembled from multiple source files?
If you make a statically linked bare executable the way you're doing, with your own code at the _start entry point, your code just runs from that point. If you want something to happen, your code has to make it happen. There is no magic.
Using sections can be useful to group startup code from multiple source files together, so all the startup code is cold and can potentially be paged out, or at least not need a TLB entry.
So you "properly use" sections by putting functions there and calling them from code that runs sometime after _start.
In your code example, it looks like .init_array is a list of function pointers. I assume the standard CRT startup files read the ELF file and find the length of that section, then walk through it making indirect calls to those functions. Since you're making custom code, it's going to be faster just to call an init function that does everything.
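One way to get the bounds of a section that collects entries from several object files is to let the linker provide them. The sketch below swaps in a technique neither answer mentions: GNU ld (and lld) define __start_SECNAME and __stop_SECNAME for an output section whose name is a valid C identifier, so the table needs no terminating zero. It uses a custom section (my_init, a made-up name) rather than .init_array itself, because in a normal gcc-linked program glibc already runs the .init_array entries; the default GNU ld linker script similarly brackets .init_array with __init_array_start and __init_array_end. It assumes GCC or Clang on an ELF target, and REGISTER_INIT, init_a, init_b and run_my_inits are illustrative names.

```cpp
#include <cassert>

using init_fn = void (*)();

// Each source file can drop function pointers into the same named section;
// the linker concatenates all contributions into one output section.
#define REGISTER_INIT(fn) \
    static init_fn fn##_entry __attribute__((used, section("my_init"))) = fn

// Defined by GNU ld / lld because "my_init" is a valid C identifier:
// they mark the first entry and one past the last entry of the whole section.
extern "C" init_fn __start_my_init[];
extern "C" init_fn __stop_my_init[];

static int g_calls = 0;
static void init_a() { g_calls += 1; }
static void init_b() { g_calls += 10; }
REGISTER_INIT(init_a);
REGISTER_INIT(init_b);

// Walks the table between the linker-provided bounds; no zero terminator needed.
// Returns how many functions were called.
int run_my_inits() {
    int called = 0;
    for (init_fn* p = __start_my_init; p != __stop_my_init; ++p, ++called)
        (*p)();
    return called;
}
```

In the question's pure-assembly setup, the same idea would mean having the custom startup routine loop between those linker-defined symbols instead of scanning for a zero entry.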
dynamic linking:
The "runtime linker" is the ELF interpreter for dynamic binaries. It runs code in your process before _start, so yes, apparently it does process that ELF section and make magic happen.
So in response to your edit, your options are: implement this processing of .init_array yourself, or create dynamic executables. I'm pretty sure this procedure has been covered in other questions, and I don't have time to research a correct command line for a dynamic executable that still doesn't link libc. (Although you might just want to use gcc -nostartfiles, or something.)
If you're stuck, leave a comment. I may update this later when I have more time anyway, or feel free to edit in a working command.
For normal C programs the .init_array is traversed by a function that is called from _start before main gets called. A good description is on this site.
So I see two ways: You can simply link against the glibc start code. Or you have to find out another mechanism to solve this issue by yourself.

Linux assembly x86 | trying to get stack value, syntax error

I have been messing around with Linux assembly on an x86 machine.
Basically, my question is this: I pushed a couple of values onto the stack, moved the stack pointer into the base pointer, and moved the value 8 into a register to use as an offset to one of the pushed values. In the end I wanted to load that value into %ebx for the exit system call, so that it becomes the exit status, but I get an error and have no clue why.
Error is: junk (%ebp) after register
Example:
.section .data
.section .text
.globl _start
_start:
pushl $50
pushl $20
movl %esp,%ebp
movl $8,%edx
movl %edx(%ebp),%ebx ## Supposed to be return value at system termination // PROBLEM HERE
movl $1,%eax ## System call
int $0x80 # Terminate program
I think part of the problem might be that in x86 the stack actually grows downwards, not up. You're adding to the base pointer, which is giving junk, where you have to subtract from it. I don't have an x86 machine so I can't test this, but have you tried something like movl -%edx(%ebp),%ebx?
Oops, I reversed the direction of the operands in my head. In this case, your stack looks like this:
1952 - ???
1948 - 50
1944 - 20 <- ebp <- esp
So when you take ebp+8, you aren't getting either pushed value (20 is at ebp+0 and 50 at ebp+4); you're getting address 1952, which holds something you didn't push (at _start on Linux, it is argc).
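The stack layout after those pushes can be checked with a small model. ToyStack is a made-up stand-in that, like the real stack, decrements esp before each store; the starting address 1952 is taken from the example above, and only slots the code wrote count as known. It shows that after pushl $50; pushl $20; movl %esp,%ebp, the value pushed last (20) is at the lower address.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy model of the i386 stack (not an emulator): only slots this code wrote
// are known, everything else counts as "unknown contents".
struct ToyStack {
    std::map<uint32_t, uint32_t> mem;  // address -> 32-bit value
    uint32_t esp;

    explicit ToyStack(uint32_t top) : esp(top) {}

    // pushl: decrement esp by 4 first, then store at the new esp.
    void pushl(uint32_t value) { esp -= 4; mem[esp] = value; }

    // Like movl disp(%ebp),%reg: returns false if the slot was never written.
    bool load(uint32_t address, uint32_t& out) const {
        auto it = mem.find(address);
        if (it == mem.end()) return false;
        out = it->second;
        return true;
    }
};
```

With ebp = esp = 1944 after the two pushes, load(ebp + 0) gives 20, load(ebp + 4) gives 50, and load(ebp + 8) reports an unknown slot. So to load 50 through a register offset, %edx would have to hold 4, used with the (%ebp,%edx) addressing form.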
Check out the links in https://stackoverflow.com/tags/x86/info. I updated them recently, and added the info about using gdb to single-step asm.
What do you mean "get an error"? Segmentation fault? Syntax error? (The normal syntax is (%ebp, %edx). Only numeric-constant displacements go outside the parens, e.g. -4(%ebp, %edx))
Also, if you're going to use stack frame pointers at all, do the mov %esp, %ebp after pushing any registers you want to preserve, but before pushing args to any functions you're going to call. However, there's no need to use %ebp that way at all. gcc defaults to -fomit-frame-pointer since 4.4, I think. A frame pointer can make it easier to keep track of where your local variables are, though, if you're pushing/popping stuff.
You might want to just start with 64-bit asm, instead of messing around with the obsolete x86 args-on-the-stack ABI.
This just made me think of what's probably wrong with your code. You're probably getting a segfault (but you didn't say whether it was that, a syntax error, or something else), probably because you built your code in 64-bit mode. Build a 32-bit binary, or change your code to use %rsp.

Some inline assembler questions

I already asked a similar question here, but I still get some errors, so I hope you can tell me what I am doing wrong. For context, I know assembler: I have done several projects in 8051 assembler, which, even though it is not the same, is close to x86 asm.
Here is the block of code I tried in VC++ 2010 Express (I am trying to get information from the CPUID instruction):
int main()
{
char a[17]; //containing array for the CPUID string
a[16] = '\0'; //null termination for the std::cout
void *b=&a[0];
int c=0; //predefined value which need to be loaded into eax before cpuid
_asm
{
mov eax,c;
cpuid;
mov [b],eax;
mov [b+4],ebx;
mov [b+8],ecx;
mov [b+12],edx;
}
std::cout<<a;
}
So, to summarize quickly: I tried to create a void pointer to the first element of the array, and then, using indirect addressing, move the values from the registers into it. But this approach gives me a "Stack around the variable 'b' was corrupted" run-time error, and I don't know why.
Please help. Thanks. And this is just for study purposes; I know there are functions for CPUID....
EDIT: Also, how can you use direct addressing in the x86 VC++ 2010 inline assembler? I mean, the common syntax for an immediate load in 8051 is mov dest,#number, but in VC++ asm it's mov dest,number without the # sign. So how do you tell the compiler you want to access memory cell address x directly?
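The reason your stack is corrupted is that you're storing the value of eax in b itself, then storing the value of ebx at the memory location 4 bytes past b, etc. In the inline assembler, [b+4] refers to the memory at the address of the variable b plus 4 (roughly *(DWORD*)((char*)&b + 4) in C++), not to the memory that b points to. Since b is only 4 bytes long, those stores overwrite whatever lies next to it on the stack.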
You can see this if you watch b and single-step. As soon as you execute mov [b],eax, the value of b changes.
One way to fix the problem is to load the value of b into an index register and use indexed addressing:
mov edi,[b]
mov [edi],eax
mov [edi+4],ebx
mov [edi+8],ecx
mov [edi+12],edx
You don't really need b at all, to hold a pointer to a. You can load the index register directly with the lea (load effective address) instruction:
lea edi,a
mov [edi],eax
... etc ...
If you're fiddling with inline assembler, it's a good idea to open the Registers window in the debugger and watch how things change when you're single stepping.
You can also directly address the memory:
mov dword ptr a,eax
mov dword ptr a+4,ebx
... etc ...
However, direct addressing like that requires more code bytes than the indexed addressing in the previous example.
I think the above, with the lea (Load Effective Address) instruction and the direct addressing I showed answers your final question.
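As a side note on the question's goal, here is a sketch that reads the CPUID vendor string without hand-written inline assembly, assuming MSVC's __cpuid from <intrin.h> or GCC/Clang's __get_cpuid from <cpuid.h> on an x86 target (the intrinsic route also works on x64 MSVC, where inline assembly is not available). It shows a detail the question's code misses: leaf 0 returns the highest supported leaf in EAX and the vendor string in EBX, EDX, ECX order, so storing eax, ebx, ecx, edx back to back would not produce a readable string. vendor_from_regs and read_cpu_vendor are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>  // __cpuid
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>   // __get_cpuid
#endif

// Builds the 12-character vendor string from CPUID leaf 0 registers.
// The order is EBX, EDX, ECX (not EBX, ECX, EDX); assumes a little-endian host.
std::string vendor_from_regs(uint32_t ebx, uint32_t edx, uint32_t ecx) {
    char buf[12];
    std::memcpy(buf + 0, &ebx, 4);  // e.g. "Genu"
    std::memcpy(buf + 4, &edx, 4);  // e.g. "ineI"
    std::memcpy(buf + 8, &ecx, 4);  // e.g. "ntel"
    return std::string(buf, sizeof buf);
}

// Reads the running CPU's vendor string; returns "" on non-x86 targets.
std::string read_cpu_vendor() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int r[4];  // r[0]=EAX, r[1]=EBX, r[2]=ECX, r[3]=EDX
    __cpuid(r, 0);
    return vendor_from_regs(uint32_t(r[1]), uint32_t(r[3]), uint32_t(r[2]));
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) return "";
    return vendor_from_regs(ebx, edx, ecx);
#else
    return "";
#endif
}
```

For example, the register values 0x756E6547, 0x49656E69, 0x6C65746E (EBX, EDX, ECX) spell "GenuineIntel".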
The advice to open the Registers window in the debugger and watch how things change is not going to work in VC++ 2010 Express.
You might be just as surprised as myself to find out that the VC++ 2010 Express is MISSING the registers window. This is especially surprising since stepping in disassembly works.
The only workaround I know is to open a Watch window and type the register names in the Name field. Type in EAX EBX ECX EDX ESI EDI EIP ESP EBP EFL, and CS DS ES SS FS GS if you want.
ST1 ST2 ST3 ST4 ST5 ST6 ST7 also work in the watch window.
You will probably also want to set the Value to Hexadecimal by right clicking in the Watch window and checking Hexadecimal Display.
