Running windows shell commands NASM X86 Assembly language - security

I am writing a simple assembly program that will just execute windows commands. I will attach the current working code below. The code works if I hard code the base address of WinExec which is a function from Kernel32.dll, I used another program called Arwin to locate this address. However a reboot breaks this because of the windows memory protection Address Space Layout randomization (ASLR)
What I am looking to do is find a way to execute windows shell commands without having to hard code a memory address into my code that will change at the next reboot. I have found similar code around but nothing that I either understand or fits the purpose. I know this can be written in C but I am specifically using assembler to keep the size as small as possible.
Thanks for you advice/help.
;Just runs a simple netstat command.
;compile with nasm -f bin cmd.asm -o cmd.bin
[BITS 32]
global _start
section .text
_start:
jmp short command
function: ;Label
;WinExec("Command to execute",NULL)
pop ecx
xor eax,eax
push eax
push ecx
mov eax,0x77e6e5fd ;Address found by arwin for WinExec in Kernel32.dll
call eax
xor eax,eax
push eax
mov eax,0x7c81cafa
call eax
command: ;Label
call function
db "cmd.exe /c netstat /naob"
db 0x00

Just an update to say I found a way for referencing windows API hashes to perform any action I want in the stack. This negates the need to hard code memory addresses and allows you to write dynamic shellcode.
There are defenses against this however this would still work against the myriad of un-patched and out of date machines still around.
The following two sites were useful in finding what I needed:
http://blog.harmonysecurity.com/2009_08_01_archive.html
https://www.scriptjunkie.us/2010/03/shellcode-api-hashes/

Related

Using ".init_array" section of ELF file

When there is a need to run a piece of code on the program startup (on Linux), how to use correctly the .init_section of an executable file (ELF32-i386)? I have the following code (GNU Assembler) which has ctor initialization function, and the address of this function is placed inside .init_array section:
.intel_syntax noprefix
.data
s1: .asciz "Init code\n"
s2: .asciz "Main code\n"
.global _start
.global ctor
.text
ctor:
mov eax, 4 # sys_write()
mov ebx, 1 # stdout
mov ecx, offset s1
mov edx, 10
int 0x80
ret
.section .init_array
.long ctor
.text
_start:
mov eax, 4
mov ebx, 1
mov ecx, offset s2
mov edx, 10
int 0x80
mov eax, 1
mov ebx, 0
int 0x80
This code is assembled with:
as -o init.o init.asm
ld -o init init.o
When the resulting executable is run, only the "Main code" string is printed. How to use properly the .init_array section?
EDIT1: I want to use .init_array because there are multiple source files with their own init code. One can call all this code 'manually' on startup and modify it every time when source files are added to or removed from the project, but .init_array seems to be designed just for this case :
Before transferring control to an application, the runtime linker
processes any initialization sections found in the application and any
loaded dependencies. The initialization sections .preinit_array,
.init_array, and .init are created by the link-editor when a dynamic
object is built.
The runtime linker executes functions whose addresses are contained in
the .preinit_array and .init_array sections. These functions are
executed in the same order in which their addresses appear in the
array.
In case when an executable is created without gcc, the linker seems to not execute the startup code. I tried to write my own standard init routine which reads function pointers in .init_array, section and calls them. It works OK for one file, where one can mark the end of the section, for example, with zero. But with multiple files this zero can be relocated in the middle of the section. How can one correctly determine the size of a section assembled from multiple source files?
If you make a statically linked bare executable the way you're doing, with your own code at the _start entry point, your code just runs from that point. If you want something to happen, your code has to make it happen. There is no magic.
Using sections can be useful to group startup code from multiple source files together, so all the startup code is cold and can potentially be paged out, or at least not need a TLB entry.
So you "properly use" sections by putting functions there and calling them from code that runs sometime after _start.
In your code example, it looks like .init_array is a list of function pointers. I assume the standard CRT startup files read the ELF file and find the length of that section, then walk through it making indirect calls to those functions. Since you're making custom code, it's going to be faster just to call an init function that does everything.
dynamic linking:
The "runtime linker" is the ELF interpreter for dynamic binaries. It runs code in your process before _start, so yes, apparently it does process that ELF section and make magic happen.
So in response to your edit, your options are: implement this processing of .init_array yourself, or create dynamic executables. I'm pretty sure this procedure has been covered in other questions, and I don't have time to research a correct command line for a dynamic executable that still doesn't link libc. (Although you might just want to use gcc -nostartfiles, or something.)
If you're stuck, leave a comment. I may update this later when I have more time anyway, or feel free to edit in a working command.
For normal C programs the .init_array is traversed by a function that is called from _start before main gets called. A good description is on this site.
So I see two ways: You can simply link against the glibc start code. Or you have to find out another mechanism to solve this issue by yourself.

NASM - Use labels for code loaded from disk

As a learning experience, I'm writing a boot-loader for BIOS in NASM on 16-bit real mode in my x86 emulator, Qemu.
BIOS loads your boot-sector at address 0x7C00. NASM assumes you start at 0x0, so your labels are useless unless you do something like specify the origin with [org 0x7C00] (or presumably other techniques). But, when you load the 2nd stage boot-loader, its RAM origin is different, which complicates the hell out of using labels in that newly loaded code.
What's the recommended way to deal with this? It this linker territory? Should I be using segment registers instead of org?
Thanks in advance!
p.s. Here's the code that works right now:
[bits 16]
[org 0x7c00]
LOAD_ADDR: equ 0x9000 ; This is where I'm loading the 2nd stage in RAM.
start:
mov bp, 0x8000 ; set up the stack
mov sp, bp ; relatively out of the way
call disk_load ; load the new instructions
; at 0x9000
jmp LOAD_ADDR
%include "disk_load.asm"
times 510 - ($ - $$) db 0
dw 0xaa55 ;; end of bootsector
seg_two:
;; this is ridiculous. Better way?
mov cx, LOAD_ADDR + print_j - seg_two
jmp cx
jmp $
print_j:
mov ah, 0x0E
mov al, 'k'
int 0x10
jmp $
times 2048 db 0xf
You may be making this harder than it is (not that this is trivial by any means!)
Your labels work fine and will continue to work fine. Remember that, if you look under the hood at the machine code generated, your short jumps (everything after seg_two in what you've posted) are relative jumps. This means that the assembler doesn't actually need to compute the real address, it simply needs to calculate the offset from the current opcode. However, when you load your code into RAM at 0x9000, that is an entirely different story.
Personally, when writing precisely the kind of code that you are, I would separate the code. The boot sector stops at the dw 0xaa55 and the 2nd stage gets its own file with an ORG 0x9000 at the top.
When you compile these to object code you simply need to concatenate them together. Essentially, that's what you're doing now except that you are getting the assembler to do it for you.
Hope this makes sense. :)

Some inline assembler questions

I already asked similar question here, but I still get some errors, so I hope you could tell me what am I doing wrong. Just know that I know assembler, and I have done several projects in 8051 assembler, and even it is not the same, is close to x86 asm.
There is block of code I tried in VC++ 2010 Express (I am trying to get information from CPUID instruction): `
int main()
{
char a[17]; //containing array for the CPUID string
a[16] = '\0'; //null termination for the std::cout
void *b=&a[0];
int c=0; //predefined value which need to be loaded into eax before cpuid
_asm
{
mov eax,c;
cpuid;
mov [b],eax;
mov [b+4],ebx;
mov [b+8],ecx;
mov [b+12],edx;
}
std::cout<<a;
}`
So, to quick summarize, I tried to create void pointer to the first element of array, and than using indirect addressing just move values from registers. But this approach gives me "stack around b variable is corrupted run-time error" but I donĀ“t know why.
Please help. Thanks. And this is just for study purposes, i know there are functions for CPUID....
EDIT: Also, how can you use direct addressing in x86 VC++ 2010 inline assembler? I mean common syntax for immediate number load in 8051 is mov src,#number but in VC++ asm its mov dest,number without # sign. So how to tell the compiler you want to access memory cell adress x directly?
The reason your stack is corrupted is because you're storing the value of eax in b. Then storing the value of ebx at the memory location b+4, etc. The inline assembler syntax [b+4] is equivalent to the C++ expression &(b+4), if b were a byte pointer.
You can see this if you watch b and single-step. As soon as you execute mov [b],eax, the value of b changes.
One way to fix the problem is to load the value of b into an index register and use indexed addressing:
mov edi,[b]
mov [edi],eax
mov [edi+4],ebx
mov [edi+8],ecx
mov [edi+12],edx
You don't really need b at all, to hold a pointer to a. You can load the index register directly with the lea (load effective address) instruction:
lea edi,a
mov [edi],eax
... etc ...
If you're fiddling with inline assembler, it's a good idea to open the Registers window in the debugger and watch how things change when you're single stepping.
You can also directly address the memory:
mov dword ptr a,eax
mov dword ptr a+4,ebx
... etc ...
However, direct addressing like that required more code bytes than the indexed addressing in the previous example.
I think the above, with the lea (Load Effective Address) instruction and the direct addressing I showed answers your final question.
The advice to open the Registers window in the debugger watch how things change is not going to work in VC++ 2010 Express.
You might be just as surprised as myself to find out that the VC++ 2010 Express is MISSING the registers window. This is especially surprising since stepping in disassembly works.
The only workaround I know is to open a watch window and type the register names in the Name field. Type in EAX EBX ECX EDX ESI EDI EIP ESP EBP EFL and CS DS ES SS FS GS if you want
ST1 ST2 ST3 ST4 ST5 ST6 ST7 also work in the watch window.
You will probably also want to set the Value to Hexadecimal by right clicking in the Watch window and checking Hexadecimal Display.

Loading an exe from an exe

I am exporting a function [using _declspec(dllexport)] from a C++ exe. The function works fine when invoked by the exe itself. I am loading this exe (lets call this exe1) from another exe [the test project's exe - I'll call this exe2] using static linking i.e I use exe1's .lib file while compiling exe2 and exe2 loads it into memory on startup just like any dll. This causes the function to fail in execution.
The exact problem is revealed in the disassembly for a switch case statement within the function.
Assembly code when exe1 invokes the function
switch (dwType)
0040FF84 mov eax,dword ptr [dwType]
0040FF87 mov dword ptr [ebp-4],eax
0040FF8A cmp dword ptr [ebp-4],0Bh
0040FF8E ja $LN2+7 (40FFD2h)
0040FF90 mov ecx,dword ptr [ebp-4]
0040FF93 jmp dword ptr (40FFE0h)[ecx*4]
Consider the final two instructions. The mov moves the passed in argument into ecx. At 40EFF0h we have addresses to the various instructions for the respective case statements. Thus, the jmp would take us to the relevant case instructions
Assembly code when exe2 invokes the function
switch (dwType)
0037FF84 mov eax,dword ptr [dwType]
0037FF87 mov dword ptr [ebp-4],eax
0037FF8A cmp dword ptr [ebp-4],0Bh
0037FF8E ja $LN2+7 (37FFD2h)
0037FF90 mov ecx,dword ptr [ebp-4]
0037FF93 jmp dword ptr [ecx*4+40FFE0h]
Spot whats going wrong? The instruction addresses. The code has now been loaded into a different spot in memory. When exe1 was compiled, the compiler assumed that we will always be launching it and hence it would always be loaded at 0x0040000 [as is the case with all windows exes]. So it hard-coded a few values like 40FFE0h into the instructions. Only in the second case 40FFE0 is as good as junk memory since the instruction address table we are looking for is not located there.
How can I solve this without converting exe1 to a dll?
just don't do it. It doesn't worth the bother.
I've tried doing what you're trying a while ago. You can possibly solve the non-relocatable exe problem by changing the option in the properties window under "Linker->Advenced->Fixed base address" but then you'll have other problems.
The thing that finally made me realize its a waste of time is realizing that the EXE doesn't have a DllMain() function. This means that the CRT library is not getting initialized and that all sorts of stuff don't work the way you expect it to.
Here's the question I posted about this a while back
Have you considered another way of doing this? Such as making the 2nd .exe into a .dll and invoking it with rundll32 when you want to use it as an executable?
Otherwise:
The generated assembly is fine. The problem is that Win32 portable executables have a base address (0x0040000 in this case) and a section that contain details locations of addresses so that they can be rebased when required.
So one for two things is happening:
- Either the compiler isn't including the IMAGE_BASE_RELOCATION records when it builds the .exe.
- Or the runtime isn't performing the base relocations when it dynamiclaly loads the .exe
- (possibly both)
If the .exe does contain the relocation records, you can read them and perform the base relocation yourself. You'll have to jump through hoops like making sure you have write access to the memory (VirtualAlloc etc.) but it's conceptually quite simple.
If the .exe doesn't contain the relocation records you're stuffed - either find a compiler option to force their inclusion, or find another way to do what you're doing.
Edit: As shoosh points out, you may run into other problems once you fix this one.

Why does this NASM code print my environment variables?

I'm just finishing up a computer architecture course this semester where, among other things, we've been dabbling in MIPS assembly and running it in the MARS simulator. Today, out of curiosity, I started messing around with NASM on my Ubuntu box, and have basically just been piecing things together from tutorials and getting a feel for how NASM is different from MIPS. Here is the code snippet I'm currently looking at:
global _start
_start:
mov eax, 4
mov ebx, 1
pop ecx
pop ecx
pop ecx
mov edx, 200
int 0x80
mov eax, 1
mov ebx, 0
int 0x80
This is saved as test.asm, and assembled with nasm -f elf test.asm and linked with ld -o test test.o. When I invoke it with ./test anArgument, it prints 'anArgument', as expected, followed by however many characters it takes to pad that string to 200 characters total (because of that mov edx, 200 statement). The interesting thing, though, is that these padding characters, which I would have expected to be gibberish, are actually from the beginning of my environment variables, as displayed by the env command. Why is this printing out my environment variables?
Without knowing the actual answer or having the time to look it up, I'm guessing that the environment variables get stored in memory after the command line arguments. Your code is simply buffer overflowing into the environment variable strings and printing them too.
This actually makes sense, since the command line arguments are handled by the system/loader, as are the environment variables, so it makes sense that they are stored near each other. To fix this, you would need to find the length of the command line arguments and only print that many characters. Or, since I assume they are null terminated strings, print until you reach a zero byte.
EDIT:
I assume that both command line arguments and environment variables are stored in the initialized data section (.data in NASM, I believe)
In order to understand why you are getting environment variables, you need to understand how the kernel arranges memory on process startup. Here is a good explanation with a picture (scroll down to "Stack layout").
As long as you're being curious, you might want to work out how to print the address of your string (I think it's passed in and you popped it off the stack). Also, write a hex dump routine so you can look at that memory and other addresses you're curious about. This may help you discover things about the program space.
Curiosity may be the most important thing in your programmer's toolbox.
I haven't investigated the particulars of starting processes but I think that every time a new shell starts, a copy of the environment is made for it. You may be seeing the leftovers of a shell that was started by a command you ran, or a script you wrote, etc.

Resources