I'm trying to disassemble an app written in assembly. I'm on Linux, x64:
$ objdump -d my_app
my_app: file format elf64-x86-64
That's it. What's wrong with it? It's not a simple hello world of a few lines, it's around 200 lines of code.
The same with gdb:
$ gdb -q my_app
Reading symbols from my_app...(no debugging symbols found)...done.
(gdb)
And
$ radare2 my_app
Warning: Cannot initialize section headers
Warning: Cannot initialize strings table
Warning: Cannot initialize dynamic strings
Warning: Cannot initialize dynamic section
-- Calculate checksums for the current block with the commands starting with '#' (#md5, #crc32, #all, ..)
update:
$ objdump -D my_app
my_app: file format elf64-x86-64
compiling:
$ fasm my_app.asm
# => my_app
update2:
; simplified
format ELF64 executable 3
include "import64.inc"
interpreter "/lib64/ld-linux-x86-64.so.2"
needed "libc.so.6"
import printf, close
segment readable
A equ 123
B equ 222
C equ 333
segment readable writeable
struc s1 a, b, c {
.a1 dw a
.b1 dw b
.c dd c
}
msg:
.m1 db "aaa", 0
.m2 db "bbb", 0
.m3 db "ccc", 0
segment readable executable
entry $
mov rax, 2
mov rdi, "something.txt"
mov rsi, 0
syscall
; .............
; omitted
Asking fasm to directly produce an ELF binary without the use of a linker will only create segments but no sections in the output. This confuses some tools. In particular objdump -d is specifically documented to operate on sections. Note that gdb can still debug and disassemble it, if you give it some addresses, e.g. the entry point.
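One way to check the "segments but no sections" claim is to read the two counts straight out of the ELF file header. The following is only a minimal Python sketch, not part of the original answer: the name elf64_header_counts and the sample header are hypothetical, and it assumes a little-endian ELF64 file.

```python
import struct

# ELF64 file header (little-endian): 16-byte e_ident, then e_type, e_machine,
# e_version, e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize,
# e_phnum, e_shentsize, e_shnum, e_shstrndx.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")


def elf64_header_counts(data):
    """Return (e_phnum, e_shnum) read from the first 64 bytes of an ELF64 file."""
    if len(data) < ELF64_EHDR.size:
        raise ValueError("too short for an ELF64 header")
    fields = ELF64_EHDR.unpack_from(data)
    ident = fields[0]
    if ident[:4] != b"\x7fELF" or ident[4] != 2:  # EI_CLASS 2 means 64-bit
        raise ValueError("not an ELF64 file")
    return fields[10], fields[12]  # e_phnum, e_shnum


# Hypothetical sample header: 3 program headers (segments), 0 section headers.
sample = ELF64_EHDR.pack(
    b"\x7fELF\x02\x01\x01" + bytes(9),  # e_ident
    2, 62, 1,                           # e_type, e_machine (x86-64), e_version
    0x400000, 64, 0, 0,                 # e_entry, e_phoff, e_shoff, e_flags
    64, 56, 3,                          # e_ehsize, e_phentsize, e_phnum
    64, 0, 0,                           # e_shentsize, e_shnum, e_shstrndx
)
print(elf64_header_counts(sample))
```

On a real file, something like elf64_header_counts(open("my_app", "rb").read(64)) would report the counts; a section count of 0 matches what objdump -d has nothing to work with.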
Related
Ok.
So I have been messing around with assembly, and I was wondering: just HOW does a linked ELF64 file look, and can I directly write a linked file in plain text? (Like create a file, e.g. "main", write the hex values of the system calls, and then run it without linking or assembling.)
I have tried objdump -x main, but I don't think this is the entire ELF file, because there is too little information.
Here is the output:
main: file format elf64-x86-64
Contents of section .text:
4000b0 b8040000 00bb0100 0000b9d0 006000ba .............`..
4000c0 0c000000 cd80b801 000000cd 80 .............
Contents of section .data:
6000d0 48454c4c 4f2c2057 4f524c44 HELLO, WORLD
My assembler code:
section .data
msg: db "HELLO, WORLD"
len: equ $-msg
section .text
;write
mov eax, 4
mov ebx, 1
mov ecx, msg
mov edx, len
int 80h;
;quit
mov eax, 1
int 80h;
EDIT: My Compiler is finished now, I just stuck with assembler and let NASM/ld do the job
If you want to see the entire structure of your executable try:
objdump -D some_exe
and if you want to see your file in hex format do:
xxd some_exe
or
hexdump some_exe
can i directly write a linked file in plain-text?
Well... Theoretically you can if you know exactly the instructions of the executable and you write them in binary to a plaintext file.
For example, for any given executable exe_file you can do this:
touch temp_file plaintext_file
xxd -p exe_file > temp_file
xxd -p -r temp_file > plaintext_file
chmod u+x plaintext_file
The plaintext_file will be an executable identical to your exe_file. If you modify temp_file between steps 2 and 3, you are editing the executable by hand, although you are unlikely to change anything specific in a controlled way unless you have a very deep understanding of the ELF64 format (which I don't, and I'm not sure what can be achieved with this).
Note: I know step 1 is redundant, I used it for demonstrating that you are starting with 2 simple plaintext files.
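The same round trip can be sketched in Python to make it clearer what the hex text actually is. This is an illustration only: the helper names to_plain_hex and from_plain_hex are hypothetical, and the output is merely similar in spirit to xxd -p, not guaranteed to match it byte for byte.

```python
def to_plain_hex(data, width=30):
    """Render bytes as lines of lowercase hex digits (roughly what `xxd -p` prints)."""
    hex_text = data.hex()
    step = width * 2  # two hex digits per byte
    lines = [hex_text[i:i + step] for i in range(0, len(hex_text), step)]
    return "\n".join(lines) + "\n"


def from_plain_hex(text):
    """Turn hex text back into bytes, ignoring all whitespace (like `xxd -p -r`)."""
    return bytes.fromhex("".join(text.split()))


# First ten bytes of the .text dump above: mov eax, 4 / mov ebx, 1
sample = bytes.fromhex("b804000000bb01000000")
as_text = to_plain_hex(sample)
print(as_text, end="")
print(from_plain_hex(as_text) == sample)
```

Editing the text between the two calls corresponds to editing temp_file by hand: the bytes come back exactly as typed, with nothing checking that they still form a valid ELF file.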
I'm learning assembly with NASM for a class I have in college. I would like to link the C Runtime Library with ld, but I just can't seem to wrap my head around it. I have a 64 bit machine with Linux Mint installed.
The reason I'm confused is that -- to my knowledge -- instead of linking the C runtime, gcc copies the things that you need into your program. I might be wrong though, so don't hesitate to correct me on this, please.
What I have done up to this point is link it using gcc. That produces a mess of machine code that I'm unable to follow, though, even for a small program like swapping rax with rbx, which isn't great for learning purposes. (Please note that the program works.)
I'm not sure if it's relevant, but these are the commands that I'm using to compile and link:
# compilation
nasm -f elf64 swap.asm
# gcc
gcc -o swap swap.o
# ld, no c runtime
ld -s -o swap swap.o
Thank you in advance!
Conclusion:
Now that I have a proper answer to the question, here are a few things that I would like to mention. Linking glibc dynamically can be done like in Z boson's answer (for 64 bit systems). If you would like to do it statically, do follow this link (that I'm re-posting from Z boson's answer).
Here's an article that Jester posted, about how programs start in linux.
To see what gcc does to link your .o-s, try this command out: gcc -v -o swap swap.o. Note that 'v' stands for 'verbose'.
Also, you should read this if you are interested in 64 bit assembly.
Thank you for your answers and helpful insight! End of speech.
Here is an example which uses libc without using GCC.
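extern printf
extern _exit
section .data
hello: db 'Hello world!',10,0 ; printf needs a NUL-terminated string
section .text
global _start
_start:
xor eax, eax ; al = 0: no vector registers used for varargs
mov edi, hello ; first argument: format string
call printf
xor edi, edi ; exit status 0 (printf may have clobbered edi)
jmp _exit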
Compile and link like this:
nasm -f elf64 hello.asm
ld hello.o -dynamic-linker /lib64/ld-linux-x86-64.so.2 -lc -m elf_x86_64
This has worked fine so far for me but for static linkage it's complicated.
If you want to call simple library functions like atoi, but still avoid using the C runtime, you can do that. (i.e. you write _start, rather than just writing a main that gets called after a bunch of boiler-plate code runs.)
gcc -o swap -nostartfiles swap.o
As people say in comments, some parts of glibc depend on constructors/destructors run from the standard startup files. Probably this is the case for stdio (puts/printf/scanf/getchar), and maybe malloc. A lot of functions are "pure" functions that just process the input they're given, though. sprintf/sscanf might be ok to use.
For example:
$ cat >exit64.asm <<EOF
section .text
extern exit
global _start
_start:
xor edi, edi
jmp exit ; doesn't return, so optimize like a tail-call
;; or make the syscall directly, if the jmp is commented
mov eax, 231 ; exit(0)
syscall
; movl eax, 1 ; 32bit call
; int 0x80
EOF
$ yasm -felf64 exit64.asm && gcc -nostartfiles exit64.o -o exit64-dynamic
$ nm exit64-dynamic
0000000000601020 D __bss_start
0000000000600ec0 d _DYNAMIC
0000000000601020 D _edata
0000000000601020 D _end
U exit@@GLIBC_2.2.5
0000000000601000 d _GLOBAL_OFFSET_TABLE_
00000000004002d0 T _start
$ ltrace ./exit64-dynamic
enable_breakpoint pid=11334, addr=0x1, symbol=(null): Input/output error
exit(0 <no return ...>
+++ exited (status 0) +++
$ strace ... # shows the usual system calls by the runtime dynamic linker
I'm new to assembly language. I finished writing a simple program, so I ran the following commands:
nasm -o learn.bin learn.asm
to assemble the code then
chmod +x learn.bin
and then finally to run it
./learn.bin
but the last returned an error
bash: ./learn.bin: cannot execute binary file
I'm running Ubuntu on an Intel Atom CPU.
Any help would be awesome.
Thanks in advance
The error message sounds like you don't have a proper ELF executable header on it. It IS possible to assemble a file using Nasm's -f bin output format (the default, if you don't specify an output format). But it needs an ELF header stuffed into it.
The usual way would be nasm -f elf32 learn.asm (or perhaps -f elf64 if you've got 64-bit code). This "should" produce "learn.o", if all goes well. Then you've got to link this "linkable object" file using ld -o learn learn.o (add -m elf_i386 if you're linking 32-bit code with a 64-bit ld... which you probably are). Or, depending on the code, gcc -o learn learn.o (add -m32 for 64-bit gcc). I see that Jester has just told you that (in fewer words).
Here's an example of a file that "should" work the way you're trying to do it:
[map all hkhw.map] ; optional
;==========================
bits 32
ORIGIN equ 8048000h
org ORIGIN
section .text
code_offset equ 0
code_addr:
;--------------------------- ELF header----------------------
dd $464c457f,$00010101,0,0,$00030002,1,main,$34,0,0,$00200034,2,0
dd 1,code_offset,code_addr,code_addr,code_filez,code_memsz,5,4096
dd 1,data_offset,data_addr,data_addr,data_filez,data_memsz,6,4096
main:
;--------- your code goes here -------------------------------
push byte 4
pop eax
xor ebx, ebx
mov ecx, msg
push byte msg_len
pop edx
int 80h
push byte 1
pop eax
int 80h
;------------ constant data -----------------------
; (note that we're in .text, not .rdata)
align 4
;-------------------------------------------------------------
align 4
code_memsz equ $ - $$
code_filez equ code_memsz
data_addr equ (ORIGIN+code_memsz+4095)/4096*4096 + (code_filez % 4096)
data_offset equ code_filez
section .data vstart=data_addr
;------------ initialized data -------------
msg db "Hello from Nasm, all by itself!", 10
msg_len equ $ - msg
;---------------------------------------------------------------------------
idat_memsz equ $ - $$
bss_addr equ data_addr + ($ - $$)
section .bss vstart=bss_addr
;------------- uninitialized data ----------------------
;-------------------------------------------------
udat_memsz equ $ - $$
data_memsz equ idat_memsz + udat_memsz
data_filez equ idat_memsz
;========================
Well... that didn't format well. Probably unreadable. Try Nasm Forum. We can help you more if you post the code
You can't (normally) run plain binary files under linux. You'll have to create an ELF executable by first asking nasm to produce an object file and then using a linker. Note that your code should also of course be written for linux. There are plenty of examples on the internet, see this tutorial for example.
I'm running on Ubuntu 12.10 64bit.
I am trying to debug a simple assembly program in GDB. However, GDB's text UI mode (-tui) seems unable to find the source code of my assembly file. I've rebuilt the project in the current directory and searched Google to no avail. Please help me out here.
My commands:
nasm -f elf64 -g -F dwarf hello.asm
gcc -g hello.o -o hello
gdb -tui hello
Debug information seems to be loaded: I can set a breakpoint at main(), but the top half of the screen still says '[ No Source Available ]'.
Here is hello.asm if you're interested:
; hello.asm a first program for nasm for Linux, Intel, gcc
;
; assemble: nasm -f elf -l hello.lst hello.asm
; link: gcc -o hello hello.o
; run: hello
; output is: Hello World
SECTION .data ; data section
msg: db "Hello World",10 ; the string to print, 10=LF (newline)
len: equ $-msg ; "$" means "here"
; len is a value, not an address
SECTION .text ; code section
global main ; make label available to linker
main: ; standard gcc entry point
mov edx,len ; arg3, length of string to print
mov ecx,msg ; arg2, pointer to string
mov ebx,1 ; arg1, where to write, screen
mov eax,4 ; write command to int 80 hex
int 0x80 ; interrupt 80 hex, call kernel
mov ebx,0 ; exit code, 0=normal
mov eax,1 ; exit command to kernel
int 0x80 ; interrupt 80 hex, call kernel
The claim (made in another answer) that the assembler isn't producing line-number information is false.
The assembler does produce line-number information (note the -g -F dwarf bits).
On the other hand, the asker assembles what is obviously 32-bit code as 64-bit, which may or may not work.
Now if there are bugs in NASM's debugging output we need to know that.
A couple of quick experiments show that addr2line (but not gdb!) decodes NASM-generated line-number information correctly using stabs but not using dwarf, so there is probably something wrong in the way NASM generates DWARF... but also something odd with gdb.
(Tested with GNU addr2line version 2.22.52.0.1-10.fc17 20120131 and GNU gdb (GDB) Fedora (7.4.50.20120120-52.fc17).)
The problem in this case is that the assembler isn't producing line-number information for the debugger. So although the source is there (if you do "list" in gdb, it shows a listing of the source file, at least when I follow your steps), the debugger needs line-number information from the file to know what line corresponds to what address. It can't do that with the information given.
As far as I can find, there isn't a way to get NASM to issue the .loc directive that is used by as when using gcc for example. But as isn't able to take your source file without generating a gazillion errors [even with -msyntax=intel -mmnemonic=intel -- you would think that should work].
So unless someone more clever can come up with a way to generate the .loc entries which gives the debugger line number information, I'm not entirely sure how we can answer your question in a way that you'll be happy with.
Is there a way to execute a flat binary image in Linux, using a syntax something like:
nasm -f bin -o foo.bin foo.asm
runbinary foo.bin
The Linux kernel can load several different binary formats - ELF is just the most common, though the a.out format is also pretty well known.
The supported binary formats are controlled by which binfmt modules are loaded or compiled in to the kernel (they're under the Filesystem section of the kernel config). There's a binfmt_flat for uClinux BFLT flat format binaries which are pretty minimal - they can even be zlib compressed which will let you make your binary even smaller, so this could be a good choice.
It doesn't look like nasm natively supports this format, but it's pretty easy to add the necessary header manually as Jim Lewis describes for ELF. There's a description of the format here.
Is there some reason you don't want to use "-f elf" instead of "-f bin"?
I think Linux won't run a binary that's not in ELF format. I can't find a tool that converts flat binaries to ELF, but you can cheat by putting the ELF information in foo.asm, using the technique described here:
We can look at the ELF specification, and /usr/include/linux/elf.h, and executables created by the standard tools, to figure out what our empty ELF executable should look like. But, if you're the impatient type, you can just use the one I've supplied here:
BITS 32
org 0x08048000
ehdr: ; Elf32_Ehdr
db 0x7F, "ELF", 1, 1, 1, 0 ; e_ident
times 8 db 0
dw 2 ; e_type
dw 3 ; e_machine
dd 1 ; e_version
dd _start ; e_entry
dd phdr - $$ ; e_phoff
dd 0 ; e_shoff
dd 0 ; e_flags
dw ehdrsize ; e_ehsize
dw phdrsize ; e_phentsize
dw 1 ; e_phnum
dw 0 ; e_shentsize
dw 0 ; e_shnum
dw 0 ; e_shstrndx
ehdrsize equ $ - ehdr
phdr: ; Elf32_Phdr
dd 1 ; p_type
dd 0 ; p_offset
dd $$ ; p_vaddr
dd $$ ; p_paddr
dd filesize ; p_filesz
dd filesize ; p_memsz
dd 5 ; p_flags
dd 0x1000 ; p_align
phdrsize equ $ - phdr
_start:
; your program here
filesize equ $ - $$
This image contains an ELF header, identifying the file as an Intel 386 executable, with no section header table and a program header table containing one entry. Said entry instructs the program loader to load the entire file into memory (it's normal behavior for a program to include its ELF header and program header table in its memory image) starting at memory address 0x08048000 (which is the default address for executables to load), and to begin executing the code at _start, which appears immediately after the program header table. No .data segment, no .bss segment, no commentary: nothing but the bare necessities.
So, let's add in our little program:
; tiny.asm
org 0x08048000
;
; (as above)
;
_start:
mov bl, 42
xor eax, eax
inc eax
int 0x80
filesize equ $ - $$
and try it out:
$ nasm -f bin -o a.out tiny.asm
$ chmod +x a.out
$ ./a.out ; echo $?
42
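To see the byte layout that the NASM listing above describes, here is a rough Python sketch that packs the same Elf32_Ehdr and Elf32_Phdr fields by hand. It is an illustration under assumptions, not part of the quoted article: build_tiny_elf, TINY_CODE and the constants are hypothetical names, and the seven code bytes are a hand encoding of the four instructions in tiny.asm.

```python
import struct

LOAD_ADDR = 0x08048000   # same value as the org directive in the listing
EHDR_SIZE = 52           # size of an Elf32_Ehdr
PHDR_SIZE = 32           # size of an Elf32_Phdr

# mov bl, 42 / xor eax, eax / inc eax / int 0x80, encoded by hand
TINY_CODE = bytes.fromhex("b32a31c040cd80")


def build_tiny_elf(code):
    """Return a 32-bit ELF image: file header, one program header, then code."""
    filesize = EHDR_SIZE + PHDR_SIZE + len(code)
    entry = LOAD_ADDR + EHDR_SIZE + PHDR_SIZE  # code starts right after the headers
    ehdr = struct.pack(
        "<16sHHIIIIIHHHHHH",
        b"\x7fELF\x01\x01\x01" + bytes(9),  # e_ident
        2,          # e_type
        3,          # e_machine
        1,          # e_version
        entry,      # e_entry
        EHDR_SIZE,  # e_phoff (phdr - $$)
        0,          # e_shoff
        0,          # e_flags
        EHDR_SIZE,  # e_ehsize
        PHDR_SIZE,  # e_phentsize
        1,          # e_phnum
        0, 0, 0,    # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIIIIIII",
        1, 0, LOAD_ADDR, LOAD_ADDR,  # p_type, p_offset, p_vaddr, p_paddr
        filesize, filesize,          # p_filesz, p_memsz
        5, 0x1000,                   # p_flags, p_align
    )
    return ehdr + phdr + code


image = build_tiny_elf(TINY_CODE)
print(len(image))
```

The image comes to 52 + 32 + 7 = 91 bytes. Written to a file and marked executable, it should behave like the a.out above on a system that can run 32-bit executables, though that part is untested here.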
Minimally, Linux will need to figure out the format of the executable and it will get that from the first bytes. For example, if it's a script that will be #!, shebang. If it's ELF that will be 0x7F 'E' 'L' 'F'. Those magic numbers will determine the handler from a lookup.
So you're gonna need a header with a recognized magic number. You can get a list of shebang supported formats in /proc/sys/fs/binfmt_misc. Getting a list of native binary formats is (unfortunately) a little trickier.
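The magic-number check described above can be mimicked in a few lines of Python. This is only a toy sketch: guess_format is a hypothetical helper that knows just the two magic numbers named here, whereas the kernel consults its registered binary format handlers.

```python
def guess_format(first_bytes):
    """Guess an executable's format from its leading magic bytes."""
    if first_bytes.startswith(b"#!"):
        return "script"
    if first_bytes.startswith(b"\x7fELF"):
        return "elf"
    return "unknown"


# mov eax, 4 as raw machine code: what a headerless flat binary might start with
flat_binary = bytes.fromhex("b804000000")

for sample in (b"#!/bin/sh\n", b"\x7fELF\x01\x01\x01", flat_binary):
    print(guess_format(sample))
```

The flat binary falls through to "unknown", which is the situation behind a "cannot execute binary file" error: nothing recognizes the first bytes.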
bFLT may be a good choice. Indeed, it's a popular embedded executable format. But you can also squeeze ELF down quite far. This article got an ELF executable down to 45 bytes. That said, you'd be squeezing it down mostly by hand rather than by tool.