Why does this NASM code print my environment variables?


I'm just finishing up a computer architecture course this semester where, among other things, we've been dabbling in MIPS assembly and running it in the MARS simulator. Today, out of curiosity, I started messing around with NASM on my Ubuntu box, and have basically just been piecing things together from tutorials and getting a feel for how NASM is different from MIPS. Here is the code snippet I'm currently looking at:
global _start
_start:
mov eax, 4
mov ebx, 1
pop ecx
pop ecx
pop ecx
mov edx, 200
int 0x80
mov eax, 1
mov ebx, 0
int 0x80
This is saved as test.asm, and assembled with nasm -f elf test.asm and linked with ld -o test test.o. When I invoke it with ./test anArgument, it prints 'anArgument', as expected, followed by however many characters it takes to pad that string to 200 characters total (because of that mov edx, 200 statement). The interesting thing, though, is that these padding characters, which I would have expected to be gibberish, are actually from the beginning of my environment variables, as displayed by the env command. Why is this printing out my environment variables?

Without knowing the actual answer or having the time to look it up, I'm guessing that the environment variables are stored in memory right after the command line arguments. Your code simply reads past the end of the argument string into the environment variable strings and prints them too.
This makes sense: the command line arguments and the environment variables are both set up by the system/loader, so it is natural for them to be stored near each other. To fix this, you would need to find the length of the command line argument and print only that many characters. Or, since I assume they are null-terminated strings, print until you reach a zero byte.
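To make the difference between the two approaches concrete, here is a toy Python model (not the real process memory). It assumes, as is typical on Linux, that the argument strings sit directly before the environment strings and that each one ends with a zero byte. The function names and the sample environment values are made up for illustration.

```python
# Toy model only: this byte string stands in for the region of process
# memory that holds the argument strings followed by the environment strings.
def build_memory(args, env):
    """Lay out NUL-terminated argument strings, then environment strings."""
    return b"".join(s.encode() + b"\0" for s in list(args) + list(env))

def fixed_length_read(memory, start, count=200):
    """What the question's code does: output `count` bytes no matter what."""
    return memory[start:start + count]

def read_until_nul(memory, start):
    """The suggested fix: stop at the first zero byte."""
    end = memory.index(b"\0", start)
    return memory[start:end]

# Hypothetical sample data standing in for ./test anArgument and its environment.
memory = build_memory(["./test", "anArgument"],
                      ["SHELL=/bin/bash", "HOME=/home/user"])
start = memory.index(b"anArgument")

print(fixed_length_read(memory, start, 40))  # runs on into the environment strings
print(read_until_nul(memory, start))         # only the argument itself
```

In the real program, the same idea means counting bytes from the argument's address up to the zero byte and passing that count in edx instead of the constant 200.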
EDIT:
I assume that both command line arguments and environment variables are stored in the initialized data section (.data in NASM, I believe)

In order to understand why you are getting environment variables, you need to understand how the kernel arranges memory on process startup. Here is a good explanation with a picture (scroll down to "Stack layout").

As long as you're being curious, you might want to work out how to print the address of your string (I think it's passed in and you popped it off the stack). Also, write a hex dump routine so you can look at that memory and other addresses you're curious about. This may help you discover things about the program space.
Curiosity may be the most important thing in your programmer's toolbox.
I haven't investigated the particulars of starting processes but I think that every time a new shell starts, a copy of the environment is made for it. You may be seeing the leftovers of a shell that was started by a command you ran, or a script you wrote, etc.

Related

GDB: reading with the x command doesn't work

I'm trying to reverse engineer an ELF 64-bit program. I've set a breakpoint at the address of a <strcmp@plt> call. I read here that the values being compared are stored in rax and rbx. When I use the x command (here I use x/s to get string output, but I've tried plain x as well) I get an error saying <error: Cannot access memory at address *some address*>; the exact command is x/s $rax. The print command does work, but it gives me raw data (hex, I think?) and I need the string. Is there a way to convert the value to a string? My system is 64-bit Windows 10, and I'm using gdb in the Windows Subsystem for Linux.
EDIT
I start my GDB session with gdb R (name of the program)
Then I run disass main to find the address where my input is compared; that's where the strcmp@plt call is.
I copy the address and set a breakpoint using b * 0x8001168.
After I inserted the breakpoint, I run run TestArg.
Now the program halted at my breakpoint.
I run info registers to see if there's something in it, there is.
When I try x/s $rax, I get the following output.
The print command does work, but I need the string value.

I read here that the values that are being compared are stored in rax and rbx.
That blog post appears to be plain wrong -- there is no way for parameters to strcmp() to be in rax and rbx on x86_64 -- the Linux / x86_64 calling convention requires them to be in rdi and rsi.
Looking at their register values, rax happens to contain the same value as rdi, and rdx happens to contain the same value as rsi.
The fact that they use rax and rdx without mentioning (or apparently understanding) why, and don't actually show the disassembly they refer to, indicates low-quality content. You should probably stop reading this source and use something more reliable.

Running windows shell commands NASM X86 Assembly language

I am writing a simple assembly program that will just execute Windows commands. I will attach the current working code below. The code works if I hard-code the base address of WinExec, which is a function from Kernel32.dll; I used another program called Arwin to locate this address. However, a reboot breaks this because of the Windows memory protection Address Space Layout Randomization (ASLR).
What I am looking to do is find a way to execute Windows shell commands without having to hard-code a memory address that will change at the next reboot. I have found similar code around, but nothing that I either understand or that fits the purpose. I know this can be written in C, but I am specifically using assembler to keep the size as small as possible.
Thanks for your advice/help.
;Just runs a simple netstat command.
;compile with nasm -f bin cmd.asm -o cmd.bin
[BITS 32]
global _start
section .text
_start:
jmp short command
function: ;Label
;WinExec("Command to execute",NULL)
pop ecx
xor eax,eax
push eax
push ecx
mov eax,0x77e6e5fd ;Address found by arwin for WinExec in Kernel32.dll
call eax
xor eax,eax
push eax
mov eax,0x7c81cafa
call eax
command: ;Label
call function
db "cmd.exe /c netstat /naob"
db 0x00

Just an update to say I found a way for referencing windows API hashes to perform any action I want in the stack. This negates the need to hard code memory addresses and allows you to write dynamic shellcode.
There are defenses against this however this would still work against the myriad of un-patched and out of date machines still around.
The following two sites were useful in finding what I needed:
http://blog.harmonysecurity.com/2009_08_01_archive.html
https://www.scriptjunkie.us/2010/03/shellcode-api-hashes/

Exploiting technique POP RET doesn't work

I have a virtual machine running Windows XP Professional SP3 x86 (Spanish), and I've disabled DEP.
I was running the exploit POPPOPRET_JMPESP.pl for "Easy RM to MP3 Converter" (yes, the program from the Corelan tutorial), and it didn't work, so I ran 2 tests:
The first, which was successful: I replaced the address of the JMP ESP that jumps to the beginning of the shellcode with "CCCC" (\x43\x43\x43\x43), which produced an error when the program tried to execute that address.
The second: I put a valid address in the JMP ESP slot and used a shellcode consisting of many breakpoints. Here an error is produced because of the address the program tried to execute, and the JMP ESP does NOT point only to the breakpoints I placed.
The original stack is:
Buffer: will be filled with As
RET ADDRESS: will be replaced with the address of a POP POP RET
4 bytes of junk: will be replaced with "XXXX"
ESP points here before the POP POP RET instructions are executed; these bytes will be replaced with 4 NOPs
4 bytes of junk: will be replaced with 4 NOPs
4 bytes of junk where ESP will point afterwards: will be replaced with the address of a JMP ESP, which the RET instruction we placed will take off the stack
Here is the beginning of the shellcode, which is where the JMP ESP will land when executed

Are you correctly getting to your shellcode?
If yes, check your shellcode for any weird characters. That particular program will stop copying input at NULL characters (0x00) as well as a couple of others. The easiest way to check is to find the last character in your shellcode that was copied correctly and exclude the one after it. You can use Metasploit to generate shellcode that excludes specific characters.
If no, check your offsets and padding.

Retrieving command line args in gas

I am struggling to find a way to retrieve the first character of the first command line argument in GAS. To clarify what I mean, here is how I do it in NASM:
main:
pop ebx
pop ebx
pop ebx ; get first argument string address into EBX register
cmp byte [ebx], 45 ; compare the first char of the argument string to ASCII dash ('-', dec value 45)
...
EDIT: Literal conversion to AT&T syntax and compiling it in GAS won't produce expected results. EBX value will not be recognized as a character.

I'm not sure I understand why you would want, in 2011, to code an entire application in assembly (unless fun is your main motivation, and coding thousands of assembly lines is fun to you).
And if you do that, you probably don't want to call the entry point of your program main (in C on GNU/Linux, that function is called from crt0.o or similar), but more probably _start.
And if you want to understand the detailed way of starting an application in assembly, read the Assembly Howto and the Linux ABI supplement for x86-64 and similar documents for your particular system.

Ok I figured it out myself. Entry point should NOT be called main, but _start. Thanks Basile for a hint, +1.

Hardware VGA Text Mode IO in old dos assembly Issue

After reading about at least the first 3 or 4 chapters of about 4 different books on assembly programming I got to a stage where I can put "Hello World" on a dosbox console using MASM 6.11. Imagine my delight!!
The first version of my program used DOS Function 13h.
The second version of my program used BIOS Function 10h
I now want to do the third version using direct hardware output. I have read the parts of the books that explain that the screen is divided into 80x25 characters on a VGA monitor (I'm not bothered about detecting CGA and all that, so my program uses segment address 0B800h for colour VGA, because DOSBox is great and all, and I want to move on to Windows assembler sometime before I'm 90 years old). I have read that each character on the hardware screen takes 2 bytes (one for the attribute and one for the character itself, so you have 80x25x2=4000 bytes). The odd bytes hold the attribute, and the even bytes the ASCII character.
But my problem is this. No matter how I try, I can't get my program to output a simple black and white string (the colour is just the attribute, which I assume I can change reasonably easily; the string is just an array of bytes) 5 lines from the top of the screen and 20 characters in from the left edge (which is just the number of character cells from a zero-based index into the 4000-byte array). If my calculation is correct, the starting position of my string within the 4000 bytes is (5x80+20)x2 = 840.
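As a quick check of that arithmetic, here is a small Python sketch of the offset calculation for 80x25 text mode. The constant and function names are made up for illustration; only the 80-column width and the 2 bytes per cell come from the description above.

```python
COLUMNS = 80         # character cells per row in 80x25 text mode
BYTES_PER_CELL = 2   # one character byte plus one attribute byte

def cell_offset(row, col):
    """Byte offset of the character cell at zero-based (row, col)."""
    return (row * COLUMNS + col) * BYTES_PER_CELL

print(cell_offset(5, 20))   # row 5, column 20, as in the question
```

The character goes at the returned (even) offset and its attribute at the next (odd) byte.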
How do I separate the attribute from the character (I got it to work partially, but it only shows every second character followed by a bunch of random junk, which is how I figured out I need some sort of byte pair for the attribute and text), or how do I set it up so that both are recognised together? How do I control the position of the text on the screen once the calculations are made? Where am I going wrong?
I have tried looking around the web for this seemingly simple question but am unable to find a solution. Is there anyone who used to program in DOS and x86 Assembly that can tell me how to do this easy little program by not using BIOS or DOS functions, just with the hardware.
I would really appreciate a simple code snippet if possible, or a reference to some site or free e-book. I don't want to buy a big book on DOS console programming which will end up useless when I move to Windows shortly. The only reason I am focused on this is because I want to learn true assembly, not some macro language or some pretentious high-level language that claims to be assembly.
I am trying to build a library of routines that will make assembly easier to learn, so people don't have to work through 3 to 6 chapters across 10 books of theory essentially explaining the same stuff again and again, when really all that is needed is enough to know how to get some output, assign values to variables, get some input, and do some loops and decisions. The theory can come along later, and by the time they get to loops and decisions most people will have done enough assembler to have all the theory anyway. I believe assembly should be taught no differently than any other language, starting with a simple hello world program, then getting input, etc. I want to make this possible. But hey, I'm just a beginner; maybe my thoughts will change when I learn more.
One other note: I know for a fact the problem is NOT with DOSBox, as I have a very old PC running true MS-DOS V6.2 and the program still doesn't work (but gives almost identical output). In fact, DOSBox actually runs some of my old programs even better than true DOS, GEM Desktop being one example. I just wanted to get that cleared up before people suggest it's a problem with the emulator. It can't be, not with such simple programs. No, I'm afraid the problem is with my little brain not fully understanding what is needed.
Can anyone out there please help!!
Below is the program I used (MASM 6.1 under DOSBox on Win 7 64-bit). It uses BIOS Interrupt 10h Function 13h, sub-function 0. I want to do the very same using direct hardware IO.
.model small
.stack
.data ;part of the program containing data
;Constants - None
;Variables
MyMsg db 'Hello World'
.code
Main:
GetAddress:
mov ax,@data ;Gets address of data segment
mov es,ax ;Loads segment address into es register
mov bp,OFFSET MyMsg ;Load offset into BP
SetAttributes:
mov bl,01001111b ;BG/FG Colour attributes
mov cx,11 ;Length of string in data segment
SetRowAndCol:
mov dh,24 ;Set the row to start printing at
mov dl,68 ;Set the column to start printing at
GetFunctionAndSub:
mov ah,13h ;BIOS Int 10h Function 13h - String Output
mov al,0 ;BIOS Sub-Function (0-3)
Execute:
int 10h ;BIOS Interrupt 10h
EndProg:
mov ax,4c00h ;Terminate program return 0 to OS
int 21h ;DOS Interrupt 21h
end Main
end
I want to have this in a format that is easy to explain. So here is my current attempt. I've almost got it, but it only prints the attributes; getting the characters on screen is a problem. (Occasionally, when I modify it slightly, I get every second character with random attributes. I think I know the technicalities of why, but don't know enough assembler to fix it.)
.model small
.stack
.data
;Constants
ScreenSeg equ 0B800h
;Variables
MyMsg db 'Hello World'
StrLen equ $-MyMsg
.code
Main:
SetSeg:
mov ax, ScreenSeg ;set segment register:
mov ds, ax
InitializeStringLoop: ;Display all characters: - Not working :( Y!
mov cx, StrLen ;number of characters.
mov di, 00h ;start from byte 'h'
OutputString:
mov [di], offset byte ptr MyMsg[di]
add di, 2 ;skip over next attribute code in vga memory.
loop OutputString
InitializeAttributeLoop:;Color all characters: - Attributes are working fine.
mov cx, StrLen ;number of characters.
mov di, 01h ;start from byte after 'h'
;Assuming I have all chars with same attributes - fine for now - later I would make this
;into a procedure that I will just pass the details into. - But for now I just want a
;basic output tutorial.
OutputAttributes:
mov [di], 11101100b ;light red(1100) on yellow(1110)
add di, 2 ;skip over next ascii code in vga memory.
loop OutputAttributes
EndPrg:
mov ax, 4C00h
int 21h
end Main
Of course I want to reduce the instructions used to the bare essentials (for teaching purposes: less to cover when teaching others). Hence the reason I did not use MOVSB/W/D etc. with REP. I opted instead for an easy-to-explain manual loop using standard MOV, INC, ADD, etc. These instructions are basic enough and easy to explain to newcomers, so if possible I would like to keep it as close to this as possible.
I know essentially all that seems to be wrong is the loop for the actual string handler. It's not letting me increment the address the way I want it to. It's embarrassing to me because I am actually quite a good programmer in C++, C#, VB, and Delphi (way back when). I know you wouldn't think that, given I can't even get a loop right in assembler, but it is such a different language. There are 2 or 3 loops in high-level languages, and it seems there is an infinite combination of ways to do loops in assembler depending on the instructions. So I say "Simple Loop", but in reality there is little simple about it.
I hope someone can help me with this; you would be saving my assembly career and ensuring I eventually become a good assembly teacher. Thanks in advance, and especially for reading this far.

The typical convention would be to use ds:si as source, and es:di as destination.
So it would end up being similar to (untested):
mov ax, @data
mov ds, ax
mov ax, ScreenSeg
mov es, ax
...
mov si, offset MyMsg
OutputString:
mov al, byte ptr ds:[si]
mov byte ptr es:[di], al
add si, 1 ; next character from string
add di, 2 ; skip over next attribute code in vga memory.
loop OutputString
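To see what the loop is doing to memory, here is a small Python model of it (a sketch, not DOS code). The bytearray stands in for the 4000-byte text buffer at segment 0B800h, and the function name is made up. Unlike the assembly above, this model also writes the attribute byte next to each character, which is an extension for illustration; the attribute value is the one from the question.

```python
def write_text(video, offset, text, attribute):
    """Store each character at an even offset and its attribute in the odd byte after it."""
    di = offset
    for ch in text.encode("ascii"):   # source index (si) advances by 1 per character
        video[di] = ch                # character byte
        video[di + 1] = attribute     # attribute byte
        di += 2                       # destination index (di) advances by 2 per cell
    return video

video = bytearray(80 * 25 * 2)        # stand-in for the text-mode buffer
write_text(video, 840, "Hello World", 0b11101100)
print(video[840:846])
```

The point is that the source pointer steps by 1 while the destination pointer steps by 2, which is why the two cannot share one index register as in the question's version.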

I would suggest getting the Masm32 Package if you don't already have it. It is mainly geared towards easily using Assembly Language in "Windows today", which is very nice, but also will teach you a lot about Asm and also shows where to get the Intel Chip manuals that were mentioned in an earlier reply that are indispensable.
I started programming in the 80's and so I can see why you are so interested in the nitty gritty of it, I know I miss it. If more people were to start out there, it would pay off for them very much. You are doing a great service!
I am playing with exactly what you are talking about, Direct Hardware, and I have also learned that Windows has changed some of the DOS services and BIOS services have changed too, so that some don't work any more. I am in fact writing a small .com program and running it from Win7 in a Command Prompt Window, Prints a msg and waits for a key, Pretty cool considering it's Win7 in 2012!
In fact it was BIOS 10h - 0Eh that did not work, so I tried DOS 21h 02h to write to the screen and it worked. The code is below; because it is a .com (Command Program), I thought it might be of use to you.
; This makes a .com program (64k Limit, Code, Data and all
; have to fit in this space. Used for small utilities and
; good for very fast tasks. In fact DOS Commands are mostly
; small .com programs like this (except more useful)!
;
; Assemble with Masm using
; c:\masm32\bin\ml /AT /c bfc.asm
; Link with Masm's Link16 using
; c:\masm32\bin\link16 bfc.obj,bfc.com;
;
; Link16 is the key to making this 16bit .com (Command) file
SEGMT SEGMENT
org 100h
Start:
push CS
pop DS
MOV SI, OFFSET Message
Next:
MOV ah, 02h ; Write Char to Standard out
MOV dl, [si] ; Char
INT 21h ; Write it
INC si ; Next Char
CMP byte ptr[si], 0 ; Done?
JNE Next ; Nope
WaitKey:
XOR ah, ah ; 0
INT 16h ; Wait for any Key
ExitHere:
MOV ah, 4Ch ; Exit with Return Code
xor al, al ; Return Code
INT 21h
Message db "It Works in Windows 7!", 0
SEGMT ENDS
END Start

I used to do all of what you are talking about. Trying to remember the details. Michael Abrash is a name you should be googling for. Mode X, for example, a 320x240 256-color mode, was very popular, as it broke the 16-color boundary and at the time the games looked really good.
I think that on-the-metal register programming was doable but painful, and you likely need to get the programmer's reference/datasheet for the chip you are interested in. As time passed from CGA to EGA to VGA to VESA, the way things worked changed as well. I think typically you use int something calls to get access to the page frame, then you could fill that in directly. VESA, I think, worked that way. VESA was a big lifesaver when it came to video card support; before then you had to write your own drivers for each chip (if you didn't want the ugly standard modes).
I would look at mode-x or vesa and go from there. You need to have a bit of a hacker inside to get through some of this anyway, it is very rare to find a datasheet/programmers reference manual that is complete and accurate, you always have to just shove some bytes around to see what happens. Start filling those memory blocks that are supposed to be the page frames until you see something change on the screen...
I don't see any specific graphics programming books in my library other than the Abrash books like the Graphics Programming Black Book, which was at the tail end of this period of time. I have BIOS and DOS programmer's references and used Ralf Brown's list too. I am sure that I had copies of the manuals for popular video chips (before the internet, remember, you called a human on that phone thing with a cord hanging out of it; the human took a printed manual, sometimes nicely bound, sometimes just a staple in the corner if that, put it in an envelope and mailed it to you, and that was your only copy unless you ran it through the copier). I have stacks of printed stuff that, sorry to say, I am not going to go through to answer this question. I will keep this question in my mind though and look around some more for info; actually I may have some of my old programs handy, drawing fractals and other such things (as direct as practical to the video card/memory).
EDIT.
I know you are looking for text mode stuff, and this is a graphics mode but it may or may not shed some light on what you are trying to do. combination of int calls and filling pages and palette memory directly.
http://dwelch.s3.amazonaws.com/fly.tar.gz
