I'm learning assembly with NASM for a class I have in college. I would like to link the C Runtime Library with ld, but I just can't seem to wrap my head around it. I have a 64 bit machine with Linux Mint installed.
The reason I'm confused is that -- to my knowledge -- instead of linking the C runtime, gcc copies the things that you need into your program. I might be wrong though, so don't hesitate to correct me on this, please.
What I did up to this point is, to link it using gcc. That produces a mess of a machine code that I'm unable to follow though, even for a small program like swapping rax with rbx, which isn't that great for learning purposes. (Please note that the program works.)
I'm not sure if it's relevant, but these are the commands that I'm using to compile and link:
# compilation
nasm -f elf64 swap.asm
# gcc
gcc -o swap swap.o
# ld, no c runtime
ld -s -o swap swap.o
Thank you in advance!
Conclusion:
Now that I have a proper answer to the question, here are a few things that I would like to mention. Linking glibc dynamically can be done like in Z boson's answer (for 64 bit systems). If you would like to do it statically, do follow this link (that I'm re-posting from Z boson's answer).
Here's an article that Jester posted, about how programs start in linux.
To see what gcc does to link your .o-s, try this command out: gcc -v -o swap swap.o. Note that 'v' stands for 'verbose'.
Also, you should read this if you are interested in 64 bit assembly.
Thank you for your answers and helpful insight! End of speech.
Here is an example which uses libc without using GCC.
extern printf
extern _exit
section .data
hello: db 'Hello world!',10
section .text
global _start
_start:
xor eax, eax
mov edi, hello
call printf
mov rax, 0
jmp _exit
Compile and link like this:
nasm -f elf64 hello.asm
ld hello.o -dynamic-linker /lib64/ld-linux-x86-64.so.2 -lc -m elf_x86_64
This has worked fine so far for me but for static linkage it's complicated.
If you want to call simple library functions like atoi, but still avoid using the C runtime, you can do that. (i.e. you write _start, rather than just writing a main that gets called after a bunch of boiler-plate code runs.)
gcc -o swap -nostartfiles swap.o
As people say in comments, some parts of glibc depend on constructors/destructors run from the standard startup files. Probably this is the case for stdio (puts/printf/scanf/getchar), and maybe malloc. A lot of functions are "pure" functions that just process the input they're given, though. sprintf/sscanf might be ok to use.
For example:
$ cat >exit64.asm <<EOF
section .text
extern exit
global _start
_start:
xor edi, edi
jmp exit ; doesn't return, so optimize like a tail-call
;; or make the syscall directly, if the jmp is commented
mov eax, 231 ; exit(0)
syscall
; movl eax, 1 ; 32bit call
; int 0x80
EOF
$ yasm -felf64 exit64.asm && gcc -nostartfiles exit64.o -o exit64-dynamic
$ nm exit64-dynamic
0000000000601020 D __bss_start
0000000000600ec0 d _DYNAMIC
0000000000601020 D _edata
0000000000601020 D _end
U exit##GLIBC_2.2.5
0000000000601000 d _GLOBAL_OFFSET_TABLE_
00000000004002d0 T _start
$ ltrace ./exit64-dynamic
enable_breakpoint pid=11334, addr=0x1, symbol=(null): Input/output error
exit(0 <no return ...>
+++ exited (status 0) +++
$ strace ... # shows the usual system calls by the runtime dynamic linker
Related
I have a shellcode file.
Then use ndisasm to build the assembly code.
ndisasm -b 64 shellcode > shellcode.asm
cat shellcode.asm | cut -c29->key.asm
I add 2 lines to the key.asm file
global_start:
_start:
$vi key.asm
global_start:
_start:
xor eax,eax
push rax
push qword 0x79237771
push qword 0x76772427
push qword 0x25747320
. . .
. . .
. . .
push qword 0x20757577
push rsp
pop rsi
mov edi,esi
mov edx,edi
cld
mov ecx,0x80
mov ebx,0x41
xor eax,eax
push rax
lodsb
xor eax,ebx
An then I assemble and link it to 64 bit executables
$nasm -f elf64 -g -F stabs key.asm
$ld -o key key.o
It gives me a warning
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080
I tested it out with gcc
gcc -o key key.o
I still get an error almost the same as the first one
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o: In
function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status
And when I run ./key with gdb after I use $ld NOT $gcc
$gdb -q ./key
$run
I get a seg fault
Starting program: /mnt/c/Users/owner/Documents/U
M/Computer_Security/ExtraCredit/key
Program received signal SIGSEGV, Segmentation fault.
0x000000000040013a in global_start ()
If I debug after run with gcc then the file will not be found because of exit status
Can you explain why does it happen? And how can I fix this problem? Thanks
It gives me a warning
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080
This isn't actually a problem, as long as you're fine with the entry point being the start of the text segment (i.e. the first instruction in your shellcode).
You got the error because you left out the space between the global keyword and the _start symbol name. i.e. use global _start, or don't bother. What you did defined a label called global_start, as you can see from your later error message.
You segfault on lodsb because you truncated a stack address with mov edi,esi instead of mov rdi, rsi. And if you fix that then you fall off the end of your code into garbage instructions because you don't make an exit system call. You're already running this inside gdb, use it!
I tested it out with gcc: gcc -o key key.o
I still get an error almost the same as the first one
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crt1.o:
In function `_start':
(.text+0x20): undefined reference to `main'
No, that's a totally different error. If you had exported _start correctly, you would have gotten an error for conflicting definitions of _start (between your code and the CRT start files).
This error is that the _start definition in crt1.o (provided by gcc) has a reference to main, but your code doesn't provide a main. This is what happens when you try to compile a C or C++ program that doesn't define main.
To link with gcc, use -nostdlib to omit the CRT start files and all other libraries. (i.e. link pretty much exactly like you were doing manually with ld.)
gcc -nostdlib -static key.o -o key # static executable: just your code
Or dynamically linked without the CRT start files, using your _start.
gcc -nostdinc -no-pie key.o -o key
You can call libc functions from code linked that way, but only on Linux or other platforms where dynamic linking takes care of running libc initialization functions.
If you statically link libc, you can only call functions like printf if you first call all the libc init functions that the normal CRT startup code does. (Not going into detail here because this code doesn't use libc)
Your code is wrong. Between global and _start must have a space. That is one of your problems.
section .text
global _start
_start:
xor eax,eax
push rax
...
In addition, to get why the segmentation fault is happening, you have to debug it. You could look the assembly instruction that does the segfault.
x/5i $eip
I've been following this tutorial for an intro to assembly on Linux.
section .text
global _start ;must be declared for linker (ld)
_start:
mov edx,len ;message length
mov ecx,msg ;message to write
mov ebx,1 ;file descriptior
mov eax,4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax,1 ;system call number (sys_exit)
int 0x080 ;call kernel
section .data
msg db 'Hello, world!', 0xa ;the string
len equ $ - msg ;length of the string
I've then had problems compiling it. I've looked around and found (on SO) that I should compile it like this:
nasm -f elf64 hello.asm
gcc -o hello hello.o
But I keep getting this error from GCC:
hello.o: In function `_start':
hello.asm:(.text+0x0): multiple definition of `_start'
/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/crt1.o:(.text+0x0): first defined here
/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status
(NB: I'm running Debian Linux on a 64 bit Intel i7)
If you are going to learn assembly, then you are much better served learning to use the assembler nasm and the linker ld without relying on gcc. There is nothing wrong with using gcc, but it masks part of the linking process that you need to understand going forward.
Learning assembly in the current environment (generally building on x86_64 but using examples that are written in x86 32-bit assembler), you must learn to build for the proper target and the language (syscall) differences between the two. Your code example is 32-bit assembler. As such your nasm compile string is incorrect:
nasm -f elf64 hello.asm
The -f elf64 attempts to compile a 64-bit object file, but the instructions in your code are 32-bit instructions. (It won't work)
Understanding and using ld provides a better understanding of the differences. Rather than using gcc, you can use nasm and ld to accomplish the same thing. For example (with slight modification to the code):
msg db 0xa, 'Hello, StackOverflow!', 0xa, 0xa ;the string
You compile and build with:
nasm -f elf -o hello-stack_32.o hello-stack_32.asm
ld -m elf_i386 -o hello-stack_32 hello-stack_32.o
Note the use of -f elf for 32-bit code in the nasm call and the -m elf_i386 linker option to create a compatible executable.
output:
Hello, StackOverflow!
If you are serious about learning assembler, there are a number of good references on the web. One of the best is The Art of Assembly. (it is written primarily for 8086 and x86, but the foundation it provides is invaluable). In addition, looking at the executables you create in binary can be helpful. Take a look at Binary Vi (BVI). It is a good tool.
bvi screenshot
You should add -nostdlib when linking your binary.
gcc -o hello hello.o -nostdlib
I'm interested in building a static ELF program without (g)libc, using unistd.h provided by the Linux headers.
I've read through these articles/question which give a rough idea of what I'm trying to do, but not quite:
http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html
Compiling without libc
https://blogs.oracle.com/ksplice/entry/hello_from_a_libc_free
I have basic code which depends only on unistd.h, of which, my understanding is that each of those functions are provided by the kernel, and that libc should not be needed. Here's the path I've taken that seems the most promising:
$ gcc -I /usr/include/asm/ -nostdlib grabbytes.c -o grabbytesstatic
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400144
/tmp/ccn1mSkn.o: In function `main':
grabbytes.c:(.text+0x38): undefined reference to `open'
grabbytes.c:(.text+0x64): undefined reference to `lseek'
grabbytes.c:(.text+0x8f): undefined reference to `lseek'
grabbytes.c:(.text+0xaa): undefined reference to `read'
grabbytes.c:(.text+0xc5): undefined reference to `write'
grabbytes.c:(.text+0xe0): undefined reference to `read'
collect2: error: ld returned 1 exit status
Before this, I had to manually define SEEK_END and SEEK_SET according to the values found in the kernel headers. Else it would error saying that those were not defined, which makes sense.
I imagine that I need to link into an unstripped vmlinux to provide the symbols to utilize. However, I read through the symbols and while there were plenty of llseeks, they were not llseek verbatim.
So my question can go in a few directions:
How can I specify an ELF file to utilize symbols from? And I'm guessing if/how that's possible, the symbols won't match up. If this is correct, is there an existing header file which will redefine llseek and default_llseek or whatever is exactly in the kernel?
Is there a better way to write Posix code in C without a libc?
My goal is to write or port fairly standard C code using (perhaps solely) unistd.h and invoke it without libc. I'm probably okay without a few unistd functions, and am not sure which ones exist "purely" as kernel calls or not. I love assembly, but that's not my goal here. Hoping to stay as strictly C as possible (I'm fine with a few external assembly files if I have to), to allow for a libc-less static system at some point.
Thank you for reading!
If you're looking to write POSIX code in C, the abandonment of libc is not going to be helpful. Although you could implement a syscall function in assembler, and copy structures and defines from the kernel header, you would essentially be writing your own libc, which almost certainly would not be POSIX compliant. With all the great libc implementations out there, there's almost no reason to begin implementing your own.
dietlibc and musl libc are both frugal libc implementations which yield impressively small binaries The linker is generally smart; as long as a library is written to avoid the accidentally pulling in numerous dependencies, only the functions you use will actually be linked into your program.
Here is a simple hello world program:
#include<unistd.h>
int main(){
char str[] = "Hello, World!\n";
write(1, str, sizeof str - 1);
return 0;
}
Compiling it with musl below yeilds a binary of a less than 3K
$ musl-gcc -Os -static hello.c
$ strip a.out
$ wc -c a.out
2800 a.out
dietlibc produces an even smaller binary, less than 1.5K:
$ diet -Os gcc hello.c
$ strip a.out
$ wc -c a.out
1360 a.out
This is far from ideal, but a little bit of (x86_64) assembler has me down to just under 5KB (but most of that is "other things than code" - the actual code is under 1KB [771 bytes to be precise], but the file size is much larger, I think because the code size is rounded to 4KB, and then some header/footer/extra stuff is added to that]
Here's what I did:
gcc -g -static -nostdlib -o glibc start.s glibc.c -Os -lc
glibc.c contains:
#include <unistd.h>
int main()
{
const char str[] = "Hello, World!\n";
write(1, str, sizeof(str));
_exit(0);
}
start.s contains:
.globl _start
_start:
xor %ebp, %ebp
mov %rdx, %r9
mov %rsp, %rdx
and $~16, %rsp
push $0
push %rsp
call main
hlt
.globl _exit
_exit:
// We known %RDI already has the exit code...
mov $0x3c, %eax
syscall
hlt
That main point of this is not to show that it's not the system call part of glibc that takes up a lot of space, but the "prepar things" - and beware that if you were to call for example printf, possibly even (v)sprintf, or exit(), or any other "standard library" function, you are in the land of "nobody knows what will happen".
Edit: Updated "start.s" to put argc/argv in the right places:
_start:
xor %ebp, %ebp
mov %rdx, %r9
pop %rdi
mov %rsp, %rsi
and $~16, %rsp
push %rax
push %rsp
// %rdi = argc, %rsi=argv
call main
Note that I've changed which register contains what thing, so that it matches main - I had them slightly wrong order in the previous code.
I'm running on Ubuntu 12.10 64bit.
I am trying to debug a simple assembly program in GDB. However GDB's gui mode (-tui) seems unable to find the source code of my assembly file. I've rebuilt the project in the currently directory and searched google to no avail, please help me out here.
My commands:
nasm -f elf64 -g -F dwarf hello.asm
gcc -g hello.o -o hello
gdb -tui hello
Debug information seems to be loaded, I can set a breakpoint at main() but the top half the screen still says '[ No Source Available ]'.
Here is hello.asm if you're interested:
; hello.asm a first program for nasm for Linux, Intel, gcc
;
; assemble: nasm -f elf -l hello.lst hello.asm
; link: gcc -o hello hello.o
; run: hello
; output is: Hello World
SECTION .data ; data section
msg: db "Hello World",10 ; the string to print, 10=cr
len: equ $-msg ; "$" means "here"
; len is a value, not an address
SECTION .text ; code section
global main ; make label available to linker
main: ; standard gcc entry point
mov edx,len ; arg3, length of string to print
mov ecx,msg ; arg2, pointer to string
mov ebx,1 ; arg1, where to write, screen
mov eax,4 ; write command to int 80 hex
int 0x80 ; interrupt 80 hex, call kernel
mov ebx,0 ; exit code, 0=normal
mov eax,1 ; exit command to kernel
int 0x80 ; interrupt 80 hex, call kernel
This statement is false.
The assembler does produce line number information (note the -g -F dwarf) bits.
On the other hand he assembles what is obviously 32-bit code as 64 bits, which may or may not work.
Now if there are bugs in NASM's debugging output we need to know that.
A couple of quick experiments shows that addr2line (but not gdb!) does decode NASM-generated line number information correctly using stabs but not using dwarf, so there is probably something wrong in the way NASM generates DWARF... but also something odd with gdb.
GNU addr2line version 2.22.52.0.1-10.fc17 20120131, GNU gdb (GDB) Fedora (7.4.50.20120120-52.fc17)).
The problem in this case is that the assembler isn't producing line-number information for the debugger. So although the source is there (if you do "list" in gdb, it shows a listing of the source file - at least when I follow your steps, it does), but the debugger needs line-number information from the file to know what line corresponds to what address. It can't do that with the information given.
As far as I can find, there isn't a way to get NASM to issue the .loc directive that is used by as when using gcc for example. But as isn't able to take your source file without generating a gazillion errors [even with -msyntax=intel -mmnemonic=intel -- you would think that should work].
So unless someone more clever can come up with a way to generate the .loc entries which gives the debugger line number information, I'm not entirely sure how we can answer your question in a way that you'll be happy with.
I'm fairly new to Linux (Ubuntu 10.04) and a total novice to assembler. I was following some tutorials and I couldn't find anything specific to Linux.
So, my question is, what is a good package to compile/run assembler and what are the command line commands to compile/run for that package?
The GNU assembler is probably already installed on your system. Try man as to see full usage information. You can use as to compile individual files and ld to link if you really, really want to.
However, GCC makes a great front-end. It can assemble .s files for you. For example:
$ cat >hello.s <<"EOF"
.section .rodata # read-only static data
.globl hello
hello:
.string "Hello, world!" # zero-terminated C string
.text
.global main
main:
push %rbp
mov %rsp, %rbp # create a stack frame
mov $hello, %edi # put the address of hello into RDI
call puts # as the first arg for puts
mov $0, %eax # return value = 0. Normally xor %eax,%eax
leave # tear down the stack frame
ret # pop the return address off the stack into RIP
EOF
$ gcc hello.s -no-pie -o hello
$ ./hello
Hello, world!
The code above is x86-64. If you want to make a position-independent executable (PIE), you'd need lea hello(%rip), %rdi, and call puts#plt.
A non-PIE executable (position-dependent) can use 32-bit absolute addressing for static data, but a PIE should use RIP-relative LEA. (See also Difference between movq and movabsq in x86-64 neither movq nor movabsq are a good choice.)
If you wanted to write 32-bit code, the calling convention is different, and RIP-relative addressing isn't available. (So you'd push $hello before the call, and pop the stack args after.)
You can also compile C/C++ code directly to assembly if you're curious how something works:
$ cat >hello.c <<EOF
#include <stdio.h>
int main(void) {
printf("Hello, world!\n");
return 0;
}
EOF
$ gcc -S hello.c -o hello.s
See also How to remove "noise" from GCC/clang assembly output? for more about looking at compiler output, and writing useful small functions that will compile to interesting output.
The GNU assembler (gas) and NASM are both good choices. However, they have some differences, the big one being the order you put operations and their operands.
gas uses AT&T syntax (guide: https://stackoverflow.com/tags/att/info):
mnemonic source, destination
nasm uses Intel style (guide: https://stackoverflow.com/tags/intel-syntax/info):
mnemonic destination, source
Either one will probably do what you need. GAS also has an Intel-syntax mode, which is a lot like MASM, not NASM.
Try out this tutorial: http://asm.sourceforge.net/intro/Assembly-Intro.html
See also more links to guides and docs in Stack Overflow's x86 tag wiki
If you are using NASM, the command-line is just
nasm -felf32 -g -Fdwarf file.asm -o file.o
where 'file.asm' is your assembly file (code) and 'file.o' is an object file you can link with gcc -m32 or ld -melf_i386. (Assembling with nasm -felf64 will make a 64-bit object file, but the hello world example below uses 32-bit system calls, and won't work in a PIE executable.)
Here is some more info:
http://www.nasm.us/doc/nasmdoc2.html#section-2.1
You can install NASM in Ubuntu with the following command:
apt-get install nasm
Here is a basic Hello World in Linux assembly to whet your appetite:
http://web.archive.org/web/20120822144129/http://www.cin.ufpe.br/~if817/arquivos/asmtut/index.html
I hope this is what you were asking...
There is also FASM for Linux.
format ELF executable
segment readable executable
start:
mov eax, 4
mov ebx, 1
mov ecx, hello_msg
mov edx, hello_size
int 80h
mov eax, 1
mov ebx, 0
int 80h
segment readable writeable
hello_msg db "Hello World!",10,0
hello_size = $-hello_msg
It comiles with
fasm hello.asm hello
My suggestion would be to get the book Programming From Ground Up:
http://nongnu.askapache.com/pgubook/ProgrammingGroundUp-1-0-booksize.pdf
That is a very good starting point for getting into assembler programming under linux and it explains a lot of the basics you need to understand to get started.
The assembler(GNU) is as(1)
3 syntax (nasm, tasm, gas ) in 1 assembler, yasm.
http://www.tortall.net/projects/yasm/
For Ubuntu 18.04 installnasm . Open the terminal and type:
sudo apt install as31 nasm
nasm docs
For compiling and running:
nasm -f elf64 example.asm # assemble the program
ld -s -o example example.o # link the object file nasm produced into an executable file
./example # example is an executable file