Retrieving command line args in gas

Retrieving command line args in gas - linux

I am struggling to find a way to retrieve first character of the first command line argument in GAS. To clarify what I mean here how I do it in NASM:
main:
pop ebx
pop ebx
pop ebx ; get first argument string address into EBX register
cmp byte [ebx], 45 ; compare the first char of the argument string to ASCII dash ('-', dec value 45)
...
EDIT: Literal conversion to AT&T syntax and compiling it in GAS won't produce expected results. EBX value will not be recognized as a character.

I'm not sure to understand why you want, in 2011, to code an entire application in assembly (unless fun is your main motivation, and coding thousands of assembly lines is fun to you).
And if you do that, you probably don't want to call the entry point of your program main (in C on Gnu/Linux, that function is called from crt0.o or similar), but more probably start.
And if you want to understand the detailed way of starting an application in assembly, read the Assembly Howto and the Linux ABI supplement for x86-64 and similar documents for your particular system.

Ok I figured it out myself. Entry point should NOT be called main, but _start. Thanks Basile for a hint, +1.

Related

Exploiting technique POP RET doesn't work

I've a virtual machine with Windows XP Professional SP3 x86 Spanish, and I've disabled DEP.
Well, I was executing the exploit POPPOPRET_JMPESP.pl for "Easy RM to MP3 Converter" (yes, the program of the tutorial Corelan), and didn't work, so I've done 2 tests:
The first, and successful, replacing the JMP ESP that jumps to the beginning of the shellcode, for "CCCC" (\x43\x43\x43\x43), and produce error trying to execute this direction.
Source
Screenshot
And the second, putting to the JMP ESP a valid direction and a shellcode that are many breakpoints. Here an error is produced due to the direction that the program has tried to execute, and JMP ESP NO points ONLY to the breakpoints put.
Source
Screenshot
The original stack is:
Buffer will be filled with As
RET ADDRESSS will be substituted with the direction of a POP POP RET
4 bytes of junk will be substituted for "XXXX"
Here points ESP before it's executed the POP POP RET instructions, and
it will be substituted with 4 NOPs
4 bytes of junk that it will be sustituted with 4 NOPs
4 bytes of junk where ESP will point after, and
will be substited for a JMP ESP, and that will take of the stack
the RET instruction we have put
Here is the beginning of the shellcode and is where will
points the JMP ESP when be executed

Are you correctly getting to your shellcode?
If yes, check your shellcode for any weird characters. That particular program will stop copying input at NULL characters (0x00) as well as a couple others. The easiest way to do this is find the last character in your shellcode that was copied correctly and excluding the one after that. You can use metasploit to generate shellcode excluding specific characters.
If no, check your offsets and padding.

Need guidance on understanding basic assembly

I've dabbled in and out of trying to get a grasp on how to do some simple programming in assembly. I am going over a tutorial hello world program and most of the stuff they have explained makes sense, but they are really glossing over it. I would like some help in understanding some different parts of the program. Here is their tutorial example -
section .text
global main ;must be declared for linker (ld)
main: ;tells linker entry point
mov edx,len ;message length
mov ecx,msg ;message to write
mov ebx,1 ;file descriptor (stdout)
mov eax,4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax,1 ;system call number (sys_exit)
int 0x80 ;call kernel
section .data
msg db 'Hello, world!', 0xa ;our dear string
len equ $ - msg ;length of our dear string
There is the text section and the data section. The data section seems to hold our user defined info for the program. It looks like the "frame" of the program is in the text section and the "meat" is in the data section... ? I assume the program when compiled executes the text section with data from the data section filled into the text section? The bss/text/data section interaction is kind of foreign to me. Also in the data section where the msg and len.... variables? are mentioned, they are followed by some information i'm not sure what to make of. msg is followed by db, what does this mean? Then the text, and then 0xa, what is the 0xa for? Also len is followed by equ, does this mean equals? len equals dollarsign minus msg variable? What is the dollar sign? A sort of operator? Also the instructions in the text section, mov ebx,1 apparently, or seems to tell the program to utilize STDOUT? Is moving 1 to the ebx register a standard instruction for setting stdout?
Perhaps someone has a little more thorough tutorial to recommend? I am looking to get dirty with assembly and need to teach myself some of the... "core fundamentals" if you will. Thanks for all the help!

[NB - I don't know what assembler dialect you're using, so I just took some "best guesses" at some parts of this stuff. If someone can help clarify, that would be great.]
It looks like the "frame" of the program is in the text section and the "meat" is in the data section... ?
The text section contains the executable instructions that make up your program. The data section contains the data that said program is going to operate on. The reason there are two different sections is to allow the program loader and operating system to be able to provide you with some protections. The text section can be loaded in to read-only memory, for example, and the data section can be loaded into memory marked as "non-executable", so code isn't accidentally (or maliciously) executed from that region.
I assume the program when compiled executes the text section with data from the data section filled into the text section?
The program (instructions in the text section) normally references symbols and manipulates data in the data section, if that's what you're asking.
The bss/text/data section interaction is kind of foreign to me.
The BSS section is similar to the data section, except it's all zero-initialized. That means it doesn't need to actually take up space in the executable file. The program loader just has to make an appropriately sized block of zero bytes in memory. Your program doesn't have a BSS section.
Also in the data section where the msg and len.... variables? are mentioned, they are followed by some information i'm not sure what to make of. msg is followed by db, what does this mean?
msg and len are variables of a sort, yes. msg is a global variable pointing to the string that follows - the db means data byte, indicating that the assembler should just emit the literal bytes that follow. len is being set to the length of the string (more below).
Then the text, and then 0xa, what is the 0xa for?
0x0a is the hexadecimal value of an ASCII newline character.
Also len is followed by equ, does this mean equals?
Yes.
len equals dollarsign minus msg variable? What is the dollar sign? A sort of operator?
The $ means "the current location". As the assembler is going about its job, it keeps track of how many bytes of data and code it's generated in a counter. So this code is saying: "subtract the location of the msg label from the current location and store that number as len". Since the "current location" is just past the end of the string, you get the length there.
Also the instructions in the text section, mov ebx,1 apparently, or seems to tell the program to utilize STDOUT? Is moving 1 to the ebx register a standard instruction for setting stdout?
The program is making a system call via the int 0x80 instruction. Before that, it has to set things up in a way the OS expects - in this case that looks like putting a 1 in ebx1 to mean stdout, along with the other three registers - the message length in edx, a pointer to the message in ecx, and the system call number in eax. I'd guess you're on linux - you can look up a system call table from google without too much trouble, I'm sure.
Perhaps someone has a little more thorough tutorial to recommend?
Sorry, not off the top of my head.

Hardware VGA Text Mode IO in old dos assembly Issue

After reading about at least the first 3 or 4 chapters of about 4 different books on assembly programming I got to a stage where I can put "Hello World" on a dosbox console using MASM 6.11. Imagine my delight!!
The first version of my program used DOS Function 13h.
The second version of my program used BIOS Function 10h
I now want to do the third version using direct hardware output. I have read the parts of the books that explain the screen is divided into 80x25 on a VGA monitors (not bothered about detecting CGA and all that so my program uses memory address 0B800h for colour VGA, because DOSBox is great and all, and my desire to move to Win Assembler sometime before im 90 years old). I have read that each character on the hardware screen is 2 bytes (1 for the attribute and one for the character itself, therefore you have 80x25x2=4000 bytes). The odd bytes describe the attribute, and the even bytes the ASCII character.
But my problem is this. No matter how I try, I cant get my program to output a simple black and white (which is just the attribute, I assume I can change this reasonably easily) string (which is just an array of bytes) 5 lines from the top of the screen, and 20 characters in from the left edge (which is just the number of blank characters away from a zero based index with 4000 bytes long). (if my calc is correct that is 5x80=400+20=420x2=840 is the starting position of my string within the array of 4000 bytes)
How do I separate the attribute from the character (I got it to work partially but it only shows every second character, then a bunch of random junk (thats how I figured I need some sort of byte pair for the attribute and text), or how do I set it up such that both are recognised together. How do I control he position of the text on the screen once the calcs are made. Where am I going wrong.
I have tried looking around the web for this seemingly simple question but am unable to find a solution. Is there anyone who used to program in DOS and x86 Assembly that can tell me how to do this easy little program by not using BIOS or DOS functions, just with the hardware.
I would really appreicate a simple code snippet if possible. Or a refrence to some site or free e-book. I dont want buying a big book on dos console programming which will end up useless when I move to windows shortly. The only reason I am focused on this is because I want to learn true assembly, not some macro language or some pretensious high level language that claims to be assembly.
I am trying to build a library of routines that will make Assembly easier to learn so people dont have to work though all the 3 to 6 chapters across 10 books of theory esentially explaining again and again the same stuff when really all that is needed is enough to know how to get some output, assign values to variables, get some input, and do some loops and decisions. The theory can come along later, and by the time they get to loops and decisions most people will have done enough assembler to have all the theory anyway. I beleive assembly should be taught no different than any other language starting with a simple hello world program then getting input ect. I want to make this possible. But hey, I'm just a beginner, maybe my taughts will change when I learn more.
One other note, I know for a fact the problem is NOT with DOSBox as I have a very old PC running true MS-DOS V6.2 and the program still doesnt work (but gives almost identical output). In fact, DOSBox actually runs some of my old programs even better than True dos. Gem desktop being one example. Just wanted to get that cleared before people try suggesting its a problem with the emulator. It cant be, not with such simple programs. No im afraid the problem is with my little brain not fully understanding what is needed.
Can anyone out there please help!!
Below is the program I used (MASM 6.1 Under DOSBox on Win 7 64-bit). It uses BIOS Intrrupt 10h Function 13h sub function 0. I want to do the very same using direct hardware IO.
.model small
.stack
.data ;part of the program containing data
;Constants - None
;Variables
MyMsg db 'Hello World'
.code
Main:
GetAddress:
mov ax,#data ;Gets address of data segment
mov es,ax ;Loads segment address into es regrister
mov bp,OFFSET MyMsg ;Load Offset into DX
SetAttributes:
mov bl,01001111b ;BG/FG Colour attributes
mov cx,11 ;Length of string in data segment
SetRowAndCol:
mov dh,24 ;Set the row to start printing at
mov dl,68 ;Set the column to start printing at
GetFunctionAndSub:
mov ah,13h ;BIOS Function 10h - String Output
mov al,0 ;BIOS Sub-Function (0-3)
Execute:
int 10h ;BIOS Interrupt 10h
EndProg:
mov ax,4c00h ;Terminate program return 0 to OS
int 21h ;DOS Interrupt 21h
end Main
end
I want to have this in a format that is easy to explain. So here is my current workings. I've almost got it. But it only prints the attributes, getting the characters on screen is a problem. (Ocasionally when I modify it slightly, I get every second character with random attributes (I think I know the technicalities of why, but dont know enough assembler to fix it)).
.model small
.stack
.data
;Constants
ScreenSeg equ 0B800h
;Variables
MyMsg db 'Hello World'
StrLen equ $-MyMsg
.code
Main:
SetSeg:
mov ax, ScreenSeg ;set segment register:
mov ds, ax
InitializeStringLoop: ;Display all characters: - Not working :( Y!
mov cx, StrLen ;number of characters.
mov di, 00h ;start from byte 'h'
OutputString:
mov [di], offset byte ptr MyMsg[di]
add di, 2 ;skip over next attribute code in vga memory.
loop OutputString
InitializeAttributeLoop:;Color all characters: - Atributes are working fine.
mov cx, StrLen ;number of characters.
mov di, 01h ;start from byte after 'h'
;Assuming I have all chars with same attributes - fine for now - later I would make this
;into a procedure that I will just pass the details into. - But for now I just want a
;basic output tutorial.
OutputAttributes:
mov [di], 11101100b ;light red(1100) on yellow(1110)
add di, 2 ;skip over next ascii code in vga memory.
loop OutputAttributes
EndPrg:
mov ax, 4C00h
int 21h
end Main
Of course I want to reduce the instructions used to the bare bones essentials. (for proper tuition purposes, less to cover when teaching others). Hense the reason I did not use MOVSB/W/D ect with REP. I opted instead for an easy to explain manual loop using standard MOV, INC, ADD ect. These are instructions that are basic enough and easy to explain to newcommers. So if possible I would like to keep it as close to this as possible.
I know esentially all that seems to be wrong is the loop for the actual string handler. Its not letting me increment the address the way I want it to. Its embarasssing to me cause I am actually quite a good progammer using C++, C#, VB, and Delphi (way back when)). I know you wouldnt think that given I cant even get a loop right in assembler, but it is such a different language. There are 2 or 3 loops in high level languages, and it seems there are an infinate combination of ways to do loops in assembler depending on the instructions. So I say "Simple Loop", but in reality there is little simple about it.
I hope someone can help me with this, you would be saving my assembly carreer, and ensuring I eventually become a good assembly teacher. Thanks in advance, and especially for reading this far.

The typical convention would be to use ds:si as source, and es:di as destination.
So it would end up being similar to (untested):
mov ax, #data
mov ds, ax
mov ax, ScreenSeg
mov es, ax
...
mov si, offset MyMsg
OutputString:
mov al, byte ptr ds:[si]
mov byte ptr es:[di], al
add si, 1 ; next character from string
add di, 2 ; skip over next attribute code in vga memory.
loop OutputString

I would suggest getting the Masm32 Package if you don't already have it. It is mainly geared towards easily using Assembly Language in "Windows today", which is very nice, but also will teach you a lot about Asm and also shows where to get the Intel Chip manuals that were mentioned in an earlier reply that are indispensable.
I started programming in the 80's and so I can see why you are so interested in the nitty gritty of it, I know I miss it. If more people were to start out there, it would pay off for them very much. You are doing a great service!
I am playing with exactly what you are talking about, Direct Hardware, and I have also learned that Windows has changed some of the DOS services and BIOS services have changed too, so that some don't work any more. I am in fact writing a small .com program and running it from Win7 in a Command Prompt Window, Prints a msg and waits for a key, Pretty cool considering it's Win7 in 2012!
In fact it was BIOS 10h - 0Eh that did not work and so I tried Dos 21h 02h to write to the screen and it worked. The code is below because it is a .com (Command Program) i thought it might be of use to you.
; This makes a .com program (64k Limit, Code, Data and all
; have to fit in this space. Used for small utilities and
; good for very fast tasks. In fact DOS Commands are mostly
; small .com programs like this (except more useful)!
;
; Assemble with Masm using
; c:\masm32\bin\ml /AT /c bfc.asm
; Link with Masm's Link16 using
; c:\masm32\bin\link16 bfc.obj,bfc.com;
;
; Link16 is the key to making this 16bit .com (Command) file
SEGMT SEGMENT
org 100h
Start:
push CS
pop DS
MOV SI, OFFSET Message
Next:
MOV ah, 02h ; Write Char to Standard out
MOV dl, [si] ; Char
INT 21h ; Write it
INC si ; Next Char
CMP byte ptr[si], 0 ; Done?
JNE Next ; Nope
WaitKey:
XOR ah, ah ; 0
INT 16h ; Wait for any Key
ExitHere:
MOV ah, 4Ch ; Exit with Return Code
xor al, al ; Return Code
INT 21h
Message db "It Works in Windows 7!", 0
SEGMT ENDS
END Start

I used to do all of what you are talking about. Trying to remember the details. Michael Abrash is a name you should be googling for. Mode-X for example a 200 something by 200 (240x200?) 256 color mode was very popular as it broke the 16 color boundary and at the time the games looked really good.
I think that the on the metal register programming was doable but painful and you likely need to get the programmers reference/datasheet for the chip you are interested in. As time passed from CGA to EGA to VGA to VESA the way things worked changed as well. I think typically you use int something calls to get access to the page frame then you could fill that in directly. VESA I think worked that way, VESA was a big livesaver when it came to video card support, you used to have to write your own drivers for each chip before then (if you didnt want the ugly standard modes).
I would look at mode-x or vesa and go from there. You need to have a bit of a hacker inside to get through some of this anyway, it is very rare to find a datasheet/programmers reference manual that is complete and accurate, you always have to just shove some bytes around to see what happens. Start filling those memory blocks that are supposed to be the page frames until you see something change on the screen...
I dont see any specific graphics programming books in my library other than the abrash books like the graphics programming black book, which was at the tail end of this period of time. I have bios and dos programmers references and used ralf browns list too. I am sure that I had copies of the manuals for popular video chips (before the internet remember you called a human on that phone thing with a cord hanging out of it, the human took a printed manual, sometimes nicely bound sometimes just a staple in the corner if that, put it in an envelope and mailed it to you and that was your only copy unless you ran it through the copier). I have stacks of printed stuff that, sorry to say, am not going to go through to answer this question. I will keep this question in my mind though and look around some more for info, actually I may have some of my old programs handy, drawing fractals and other such things (direct as practical to the video card/memory).
EDIT.
I know you are looking for text mode stuff, and this is a graphics mode but it may or may not shed some light on what you are trying to do. combination of int calls and filling pages and palette memory directly.
http://dwelch.s3.amazonaws.com/fly.tar.gz

Loading an exe from an exe

I am exporting a function [using _declspec(dllexport)] from a C++ exe. The function works fine when invoked by the exe itself. I am loading this exe (lets call this exe1) from another exe [the test project's exe - I'll call this exe2] using static linking i.e I use exe1's .lib file while compiling exe2 and exe2 loads it into memory on startup just like any dll. This causes the function to fail in execution.
The exact problem is revealed in the disassembly for a switch case statement within the function.
Assembly code when exe1 invokes the function
switch (dwType)
0040FF84 mov eax,dword ptr [dwType]
0040FF87 mov dword ptr [ebp-4],eax
0040FF8A cmp dword ptr [ebp-4],0Bh
0040FF8E ja $LN2+7 (40FFD2h)
0040FF90 mov ecx,dword ptr [ebp-4]
0040FF93 jmp dword ptr (40FFE0h)[ecx*4]
Consider the final two instructions. The mov moves the passed in argument into ecx. At 40EFF0h we have addresses to the various instructions for the respective case statements. Thus, the jmp would take us to the relevant case instructions
Assembly code when exe2 invokes the function
switch (dwType)
0037FF84 mov eax,dword ptr [dwType]
0037FF87 mov dword ptr [ebp-4],eax
0037FF8A cmp dword ptr [ebp-4],0Bh
0037FF8E ja $LN2+7 (37FFD2h)
0037FF90 mov ecx,dword ptr [ebp-4]
0037FF93 jmp dword ptr [ecx*4+40FFE0h]
Spot whats going wrong? The instruction addresses. The code has now been loaded into a different spot in memory. When exe1 was compiled, the compiler assumed that we will always be launching it and hence it would always be loaded at 0x0040000 [as is the case with all windows exes]. So it hard-coded a few values like 40FFE0h into the instructions. Only in the second case 40FFE0 is as good as junk memory since the instruction address table we are looking for is not located there.
How can I solve this without converting exe1 to a dll?

just don't do it. It doesn't worth the bother.
I've tried doing what you're trying a while ago. You can possibly solve the non-relocatable exe problem by changing the option in the properties window under "Linker->Advenced->Fixed base address" but then you'll have other problems.
The thing that finally made me realize its a waste of time is realizing that the EXE doesn't have a DllMain() function. This means that the CRT library is not getting initialized and that all sorts of stuff don't work the way you expect it to.
Here's the question I posted about this a while back

Have you considered another way of doing this? Such as making the 2nd .exe into a .dll and invoking it with rundll32 when you want to use it as an executable?
Otherwise:
The generated assembly is fine. The problem is that Win32 portable executables have a base address (0x0040000 in this case) and a section that contain details locations of addresses so that they can be rebased when required.
So one for two things is happening:
- Either the compiler isn't including the IMAGE_BASE_RELOCATION records when it builds the .exe.
- Or the runtime isn't performing the base relocations when it dynamiclaly loads the .exe
- (possibly both)
If the .exe does contain the relocation records, you can read them and perform the base relocation yourself. You'll have to jump through hoops like making sure you have write access to the memory (VirtualAlloc etc.) but it's conceptually quite simple.
If the .exe doesn't contain the relocation records you're stuffed - either find a compiler option to force their inclusion, or find another way to do what you're doing.
Edit: As shoosh points out, you may run into other problems once you fix this one.

Why does this NASM code print my environment variables?

I'm just finishing up a computer architecture course this semester where, among other things, we've been dabbling in MIPS assembly and running it in the MARS simulator. Today, out of curiosity, I started messing around with NASM on my Ubuntu box, and have basically just been piecing things together from tutorials and getting a feel for how NASM is different from MIPS. Here is the code snippet I'm currently looking at:
global _start
_start:
mov eax, 4
mov ebx, 1
pop ecx
pop ecx
pop ecx
mov edx, 200
int 0x80
mov eax, 1
mov ebx, 0
int 0x80
This is saved as test.asm, and assembled with nasm -f elf test.asm and linked with ld -o test test.o. When I invoke it with ./test anArgument, it prints 'anArgument', as expected, followed by however many characters it takes to pad that string to 200 characters total (because of that mov edx, 200 statement). The interesting thing, though, is that these padding characters, which I would have expected to be gibberish, are actually from the beginning of my environment variables, as displayed by the env command. Why is this printing out my environment variables?

Without knowing the actual answer or having the time to look it up, I'm guessing that the environment variables get stored in memory after the command line arguments. Your code is simply buffer overflowing into the environment variable strings and printing them too.
This actually makes sense, since the command line arguments are handled by the system/loader, as are the environment variables, so it makes sense that they are stored near each other. To fix this, you would need to find the length of the command line arguments and only print that many characters. Or, since I assume they are null terminated strings, print until you reach a zero byte.
EDIT:
I assume that both command line arguments and environment variables are stored in the initialized data section (.data in NASM, I believe)

In order to understand why you are getting environment variables, you need to understand how the kernel arranges memory on process startup. Here is a good explanation with a picture (scroll down to "Stack layout").

As long as you're being curious, you might want to work out how to print the address of your string (I think it's passed in and you popped it off the stack). Also, write a hex dump routine so you can look at that memory and other addresses you're curious about. This may help you discover things about the program space.
Curiosity may be the most important thing in your programmer's toolbox.
I haven't investigated the particulars of starting processes but I think that every time a new shell starts, a copy of the environment is made for it. You may be seeing the leftovers of a shell that was started by a command you ran, or a script you wrote, etc.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string