NASM assembler data error - nasm

After:section .data, the line reads:number db 7 dup 0. The error message is:Comma expected after operand 1. But, there is no operand. Why the error message?

NASM does not use dup as MASM does, instead it uses TIMES
number times 7 db 0
Section 3.2.5 from the manual:
3.2.5 TIMES: Repeating Instructions or Data

Related

GDB: reading with the x command doesn't work

I'm trying to reverse engineer an ELF 64-bit program. I've set a breakpoint on the pointer of a <strcmp#plt>. I read here that the values that are being compared are stored in rax and rbx. When I use the x command (here I use the x/s command, to get string output, but i've tried with x as well) I get an error saying <error: Cannont access memory at address *some address*>, the exact command is x/s $rax. The print function does work, but that gives me raw data (hex i think?) and I need the string, are there ways to convert the value to string? My system is 64-bit windows 10, I'm using gdb in the linux subsystem in windows.
EDIT
I start my GDB session with gdb R (name of the program)
Program info:
Then I run disass main to find the address where my input is compared, that's where the strcmp#plt is.
I copy the address and set a breakpoint using b * 0x8001168.
After I inserted the breakpoint, I run run TestArg.
Now the program halted at my breakpoint.
I run info registers to see if there's something in it, there is.
When I try x/s $rax, I get the following output.
The print command does work, but I need the string value.
I read here that the values that are being compared are stored in rax and rbx.
That blog post appears to be plain wrong -- there is no way for parameters to strcmp() to be in rax and rbx on x86_64 -- the Linux / x86_64 calling convention requires them to be in rdi and rsi.
Looking at their register values, rax happens to contain the same value as rdi, and rdx happens to contain the same value as rsi.
The fact that they
use rax and rdx without mentioning (or apparently understanding) why and,
don't actually show the disassembly they refer to
indicates a low-quality content. You should probably stop reading this source, and use something more reliable.

Why does a write system call print a bunch of junk when you mov edx, [msgLen] from db $-msg - same as with the address?

For some reason [msgLen] and msgLen produce the same result. I know that I ought to use equ or something, but I want to know why this isn't working. Is NASM ignoring the dereference? It prints my string and a bunch of junk bc it thinks the string is bigger than it is right? Thanks!
Please see the assembly below:
_start:
mov edx,[msgLen]
mov ecx,msg
mov ebx,1
mov eax,4
int 0x80
mov eax,1
int 0x80
section .data
msg db 'Zoom',0xA
msgLen db $-msg
Use a debugger to look at EDX after the load, or use strace ./a.out to see the length you actually pass to the system call. It will be different (address vs. value loaded). Both ways are buggy in different ways that happen to produce large values, so the effect only happens to be the same. Debugging / tracing tools would have clearly shown you that NASM isn't "ignoring the dereference", though.
mov edx, msgLen is the address, right next to msg. It will be something like 0x804a005 if you link into a standard Linux non-PIE static executable with ld -melf_i386 -o foo foo.o.
mov edx, [msgLen] loads 4 bytes from that address (the width of EDX), where you only assembled the size into 1 byte in the data section. The 3 high bytes come from whatever comes next in the part of the file that's mapped to that memory page.
When I assemble with just nasm -felf32 foo.asm and link with ld -melf_i386, that happens to be 3 bytes of zeros, so I don't get any garbage printed from the dword-load version. (x86 is little-endian).
But it seems your non-zero bytes are debug info. I've found NASM's debug info is the opposite of helpful, sometimes making GDB's disassembly confused (layout reg / layout asm sometimes fail to disassemble a whole block after a label) so I don't use it.
But if I use nasm -felf32 -Fdwarf, then I do get 7173 from that dword load that goes past the end of the .data section. That's a different large number, so this is just wrong in a different way, not the same problem. 7173 is 0x1c05, so it corresponds to db 5, 0x1c, 0, 0. i.e. your calculated length of 5 is the low byte, but there's a 0x1c after it. yasm -gdwarf2 gives me 469762053 = 0x1c000005.
If you'd used db $-msg, 0,0,0 or dd $-msg, you could load a whole dword. (To load and zero-extend a byte into a dword register, use movzx edx, byte [mem])
write() behaviour with a large length
If you give it some very large length, write will go until it reaches an unreadable page, then return the length it actually wrote. (It doesn't check the whole buffer for readability before starting to copy_from_user. And it returns the number of bytes written if that's non-zero before encountering an unreadable page. You can only get -EFAULT if an unreadable page is encountered right away, not if you pass a huge length that includes some unmapped pages later.)
e.g. with mov edx, msgLen (the label address)
$ nasm -felf32 -Fdwarf foo.asm
$ ld -melf_i386 -o foo foo.o
$ strace -o foo.tr ./foo # write trace to a file so it doesn't mix with terminal output
Zoom
foo.asmmsgmsgLen__bss_start_edata_end.symtab.strtab.shstrtab.text.data.debug_aranges.debug_info.debug_abbrev.debug_lin! ' 6& 9B_
$ cat foo.tr
execve("./foo", ["./foo"], 0x7fffdf062e20 /* 53 vars */) = 0
write(1, "Zoom\n\5\34\0\0\0\2\0\0\0\0\0\4\0\0\0\0\0\0\220\4\10\35\0\0\0\0\0"..., 134520837) = 4096
exit(1) = ?
+++ exited with 1 +++
134520837 is the length you passed, 0x804a005 (the address of msgLen). The system call writes 4096 bytes, 1 whole page, before getting to an unmapped page and stopping early.** It doesn't matter number you pass higher than that, because there's only 1 page before the end of the mapping. (And msg: is apparently right at the start of that page.)
On the terminal (where \0 prints as empty), you mostly just see the printable characters; pipe into hexdump -C if you want a better look at the binary data. It includes bits of metadata from the file, because the kernel's ELF program loader works by mmaping the file into memory (with a MAP_PRIVATE read-write no-exec mapping for that part). Use readelf -a and look at the ELF program headers: they tell the kernel which parts of the file to map into memory where, with what permissions.
Fun fact: If I redirect to /dev/null (strace ./foo > /dev/null), the kernel's write handler for that special device driver doesn't even check the buffer for permissions, so write() actually does return 134520837.
write(1, "Zoom\n\5\0\0[...]\0\0\220\4\10"..., 134520837) = 134520837
Using EQU properly
And yes, you should be using equ so you can use the actual length as an immediate. Having the assembler calculate it and then assemble that byte into the data section is less convenient. This prints exactly the right size, and is more efficient than using an absolute address to reference 1 constant byte.
mov edx, msg.len ; mov r32, imm32
...
section .rodata
msg: db 'Zoom',0xA
.len equ $-msg
(Using a NASM . local label is unrelated to using equ; I'm showing that too because I like how it helps organize the global namespace.)
Also semi-related to how ld lays out sections into ELF segments for the program-loader: recent ld pads sections for 4k page alignment or something like that to avoid getting data mapped where it doesn't need to be. Especially out of executable pages.

Exploiting technique POP RET doesn't work

I've a virtual machine with Windows XP Professional SP3 x86 Spanish, and I've disabled DEP.
Well, I was executing the exploit POPPOPRET_JMPESP.pl for "Easy RM to MP3 Converter" (yes, the program of the tutorial Corelan), and didn't work, so I've done 2 tests:
The first, and successful, replacing the JMP ESP that jumps to the beginning of the shellcode, for "CCCC" (\x43\x43\x43\x43), and produce error trying to execute this direction.
Source
Screenshot
And the second, putting to the JMP ESP a valid direction and a shellcode that are many breakpoints. Here an error is produced due to the direction that the program has tried to execute, and JMP ESP NO points ONLY to the breakpoints put.
Source
Screenshot
The original stack is:
Buffer will be filled with As
RET ADDRESSS will be substituted with the direction of a POP POP RET
4 bytes of junk will be substituted for "XXXX"
Here points ESP before it's executed the POP POP RET instructions, and
it will be substituted with 4 NOPs
4 bytes of junk that it will be sustituted with 4 NOPs
4 bytes of junk where ESP will point after, and
will be substited for a JMP ESP, and that will take of the stack
the RET instruction we have put
Here is the beginning of the shellcode and is where will
points the JMP ESP when be executed
Are you correctly getting to your shellcode?
If yes, check your shellcode for any weird characters. That particular program will stop copying input at NULL characters (0x00) as well as a couple others. The easiest way to do this is find the last character in your shellcode that was copied correctly and excluding the one after that. You can use metasploit to generate shellcode excluding specific characters.
If no, check your offsets and padding.

Retrieving command line args in gas

I am struggling to find a way to retrieve first character of the first command line argument in GAS. To clarify what I mean here how I do it in NASM:
main:
pop ebx
pop ebx
pop ebx ; get first argument string address into EBX register
cmp byte [ebx], 45 ; compare the first char of the argument string to ASCII dash ('-', dec value 45)
...
EDIT: Literal conversion to AT&T syntax and compiling it in GAS won't produce expected results. EBX value will not be recognized as a character.
I'm not sure to understand why you want, in 2011, to code an entire application in assembly (unless fun is your main motivation, and coding thousands of assembly lines is fun to you).
And if you do that, you probably don't want to call the entry point of your program main (in C on Gnu/Linux, that function is called from crt0.o or similar), but more probably start.
And if you want to understand the detailed way of starting an application in assembly, read the Assembly Howto and the Linux ABI supplement for x86-64 and similar documents for your particular system.
Ok I figured it out myself. Entry point should NOT be called main, but _start. Thanks Basile for a hint, +1.

Why does this NASM code print my environment variables?

I'm just finishing up a computer architecture course this semester where, among other things, we've been dabbling in MIPS assembly and running it in the MARS simulator. Today, out of curiosity, I started messing around with NASM on my Ubuntu box, and have basically just been piecing things together from tutorials and getting a feel for how NASM is different from MIPS. Here is the code snippet I'm currently looking at:
global _start
_start:
mov eax, 4
mov ebx, 1
pop ecx
pop ecx
pop ecx
mov edx, 200
int 0x80
mov eax, 1
mov ebx, 0
int 0x80
This is saved as test.asm, and assembled with nasm -f elf test.asm and linked with ld -o test test.o. When I invoke it with ./test anArgument, it prints 'anArgument', as expected, followed by however many characters it takes to pad that string to 200 characters total (because of that mov edx, 200 statement). The interesting thing, though, is that these padding characters, which I would have expected to be gibberish, are actually from the beginning of my environment variables, as displayed by the env command. Why is this printing out my environment variables?
Without knowing the actual answer or having the time to look it up, I'm guessing that the environment variables get stored in memory after the command line arguments. Your code is simply buffer overflowing into the environment variable strings and printing them too.
This actually makes sense, since the command line arguments are handled by the system/loader, as are the environment variables, so it makes sense that they are stored near each other. To fix this, you would need to find the length of the command line arguments and only print that many characters. Or, since I assume they are null terminated strings, print until you reach a zero byte.
EDIT:
I assume that both command line arguments and environment variables are stored in the initialized data section (.data in NASM, I believe)
In order to understand why you are getting environment variables, you need to understand how the kernel arranges memory on process startup. Here is a good explanation with a picture (scroll down to "Stack layout").
As long as you're being curious, you might want to work out how to print the address of your string (I think it's passed in and you popped it off the stack). Also, write a hex dump routine so you can look at that memory and other addresses you're curious about. This may help you discover things about the program space.
Curiosity may be the most important thing in your programmer's toolbox.
I haven't investigated the particulars of starting processes but I think that every time a new shell starts, a copy of the environment is made for it. You may be seeing the leftovers of a shell that was started by a command you ran, or a script you wrote, etc.

Resources