I'm currently trying to learn assembly on my Trisquel distribution (which I guess uses Ubuntu under the hood?). For some reason, I'm stuck on the very first step of creating and executing an assembly snippet.
.section data
.section text
.globl _start
_start:
movl $1, %eax # syscall for exiting a program
movl $0, %ebx # status code to be returned
int $0x80
When I try to assemble and link it into an executable and then run that executable, I get something like:
> as myexit.s -o myexit.o && ld myexit.o -o myexit
> ./myexit
bash: ./myexit: cannot execute binary file
I'm not sure what exactly is going on here. After searching around, it seems that this error usually pops up when trying to execute a 32-bit executable on a 64-bit OS, or maybe vice versa, which isn't the case for me.
Here is the output of the file and uname commands:
$ file myexit
myexit: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
$ uname -a
Linux user 2.6.35-28-generic #50trisquel2-Ubuntu SMP Tue May 3 00:54:52 UTC 2011 i686 GNU/Linux
Can someone help me out in understanding what exactly is going wrong here? Thanks.
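.section text
is incorrect: it creates a section literally named text, whereas your code needs to be in the .text section. Because the assembler does not recognize the name text, the section gets no flags (it is neither allocated in memory nor executable), which would explain why the kernel refuses to run the linked file. Replace that with: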
.data
.text
.globl _start
_start:
...
Related
I'm writing an assembly-language program like this:
.data
.equ b,3
.text
.globl _start
_start:
movl $2,%ebx
movl $b,%ecx
movl $1,%eax
int $0x80
I assemble it on 64-bit Ubuntu. I want a 32-bit binary, so in the shell I can do:
$ as my.s -32
$ ld a.out -o my
OK, no problem. I wish to use scons to manage this process, so I have SConstruct:
Program('my.s')
This first runs 'as my.s -o my.o' and then 'gcc my.o -o my', which reports an error about the redefinition of '_start'.
My problem is:
How can I pass '-32' option to make sure I compile out 32bit version object file?
How can I specify the linker to be 'ld' but not 'gcc', to make sure I can use '_start' as entry point in my assembly source file?
For passing flags to the assembler, ASFLAGS should work.
For passing flags to the linker, LINKFLAGS should work.
For setting which executable to use for linker, LINK (or SHLINK) should do the trick.
All these are listed in the manpage: http://scons.org/doc/production/HTML/scons-man.html
Likely the following should work for you:
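env = Environment(tools=['as', 'gnulink'])
env['ASFLAGS'] = '-32'                 # assemble a 32-bit object file
env['LINK'] = 'ld'                     # link with ld directly, so _start is the entry point
env['LINKFLAGS'] = ['-m', 'elf_i386']  # assumption: on a 64-bit host ld also needs this to emit 32-bit output
env.Program('my', ['my.s'])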
I'm learning assembly with NASM for a class I have in college. I would like to link the C Runtime Library with ld, but I just can't seem to wrap my head around it. I have a 64 bit machine with Linux Mint installed.
The reason I'm confused is that -- to my knowledge -- instead of linking the C runtime, gcc copies the things that you need into your program. I might be wrong though, so don't hesitate to correct me on this, please.
What I have done up to this point is link it using gcc. That produces a mess of machine code that I'm unable to follow, though, even for a small program like swapping rax with rbx, which isn't great for learning purposes. (Please note that the program works.)
I'm not sure if it's relevant, but these are the commands that I'm using to compile and link:
# compilation
nasm -f elf64 swap.asm
# gcc
gcc -o swap swap.o
# ld, no c runtime
ld -s -o swap swap.o
Thank you in advance!
Conclusion:
Now that I have a proper answer to the question, here are a few things that I would like to mention. Linking glibc dynamically can be done like in Z boson's answer (for 64 bit systems). If you would like to do it statically, do follow this link (that I'm re-posting from Z boson's answer).
Here's an article that Jester posted, about how programs start in linux.
To see what gcc does to link your .o-s, try this command out: gcc -v -o swap swap.o. Note that 'v' stands for 'verbose'.
Also, you should read this if you are interested in 64 bit assembly.
Thank you for your answers and helpful insight! End of speech.
Here is an example which uses libc without using GCC.
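extern printf
extern _exit
section .data
hello: db 'Hello world!',10,0 ; printf needs a zero-terminated string
section .text
global _start
_start:
xor eax, eax ; AL = 0: no vector registers are used by this variadic call
mov edi, hello ; first argument: address of the format string
call printf
xor edi, edi ; exit status 0 goes in EDI, the first-argument register
jmp _exit ; tail call; note that _exit does not flush stdio buffers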
Compile and link like this:
nasm -f elf64 hello.asm
ld hello.o -dynamic-linker /lib64/ld-linux-x86-64.so.2 -lc -m elf_x86_64
This has worked fine so far for me but for static linkage it's complicated.
If you want to call simple library functions like atoi, but still avoid using the C runtime, you can do that. (i.e. you write _start, rather than just writing a main that gets called after a bunch of boiler-plate code runs.)
gcc -o swap -nostartfiles swap.o
As people say in comments, some parts of glibc depend on constructors/destructors run from the standard startup files. Probably this is the case for stdio (puts/printf/scanf/getchar), and maybe malloc. A lot of functions are "pure" functions that just process the input they're given, though. sprintf/sscanf might be ok to use.
For example:
$ cat >exit64.asm <<EOF
section .text
extern exit
global _start
_start:
xor edi, edi
jmp exit ; doesn't return, so optimize like a tail-call
;; or make the syscall directly, if the jmp is commented out
mov eax, 231 ; __NR_exit_group in the 64-bit ABI
syscall
; mov eax, 1 ; 32-bit ABI: __NR_exit
; int 0x80
EOF
$ yasm -felf64 exit64.asm && gcc -nostartfiles exit64.o -o exit64-dynamic
$ nm exit64-dynamic
0000000000601020 D __bss_start
0000000000600ec0 d _DYNAMIC
0000000000601020 D _edata
0000000000601020 D _end
U exit@@GLIBC_2.2.5
0000000000601000 d _GLOBAL_OFFSET_TABLE_
00000000004002d0 T _start
$ ltrace ./exit64-dynamic
enable_breakpoint pid=11334, addr=0x1, symbol=(null): Input/output error
exit(0 <no return ...>
+++ exited (status 0) +++
$ strace ... # shows the usual system calls by the runtime dynamic linker
I'm using 64-bit Ubuntu and trying to build a .asm file with NASM, but I get the error below when I try to build the following code. What I'm trying to do is build an executable by first assembling an object file from the source:
$ nasm -f elf hello.asm
and then, once hello.o has been created, producing the executable itself from the object file by invoking the linker:
$ ld -s -o hello hello.o
This will finally build the hello executable.
I'm following this tutorial http://www.faqs.org/docs/Linux-HOWTO/Assembly-HOWTO.html
Error:
i386 architecture of input file `hello.o' is incompatible with i386:x86-64 output
Code:
section .data ;section declaration
msg db "Hello, world!",0xa ;our dear string
len equ $ - msg ;length of our dear string
section .text ;section declaration
;we must export the entry point to the ELF linker or
global _start ;loader. They conventionally recognize _start as their
;entry point. Use ld -e foo to override the default.
_start:
;write our string to stdout
mov edx,len ;third argument: message length
mov ecx,msg ;second argument: pointer to message to write
mov ebx,1 ;first argument: file handle (stdout)
mov eax,4 ;system call number (sys_write)
int 0x80 ;call kernel
;and exit
mov ebx,0 ;first syscall argument: exit code
mov eax,1 ;system call number (sys_exit)
int 0x80 ;call kernel
This looks like it may be a simple mismatch between what's produced by nasm and what ld is trying to make:
i386 architecture of input file 'hello.o' is incompatible with i386:x86-64 output
In other words, nasm has produced a 32-bit object file hello.o and ld wants to take that and make a 64-bit executable file.
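To see which side of that mismatch a given file is on, you can look at the same ELF header fields that the file command reports. The following is only a minimal sketch: describe_elf and fake_header are hypothetical helper names, and the headers are built by hand for demonstration rather than read from a real hello.o.

```python
import struct

# Values from the ELF specification.
ELF_CLASS = {1: "32-bit", 2: "64-bit"}      # e_ident[EI_CLASS], byte 4
ELF_MACHINE = {3: "i386", 62: "x86-64"}     # e_machine: EM_386, EM_X86_64


def describe_elf(header: bytes) -> str:
    """Return e.g. '32-bit i386' given the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class = ELF_CLASS.get(header[4], "unknown class")
    # e_ident[EI_DATA], byte 5: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return f"{elf_class} {ELF_MACHINE.get(machine, 'other machine')}"


def fake_header(elf_class: int, machine: int) -> bytes:
    """Build a minimal little-endian ELF header prefix (demonstration only)."""
    ident = b"\x7fELF" + bytes([elf_class, 1, 1]) + bytes(9)  # 16 bytes
    return ident + struct.pack("<HH", 1, machine)  # e_type = ET_REL, e_machine


print(describe_elf(fake_header(1, 3)))    # the kind of object nasm -f elf makes
print(describe_elf(fake_header(2, 62)))   # the kind of object nasm -f elf64 makes
```

On a real file you would pass open("hello.o", "rb").read(20) instead of a hand-built header. ld complains when the class and machine of the input do not match the output format it was asked to produce.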
The nasm -hf command should give you the available output formats:
valid output formats for -f are (`*' denotes default):
* bin flat-form binary files (e.g. DOS .COM, .SYS)
ith Intel hex
srec Motorola S-records
aout Linux a.out object files
aoutb NetBSD/FreeBSD a.out object files
coff COFF (i386) object files (e.g. DJGPP for DOS)
elf32 ELF32 (i386) object files (e.g. Linux)
elf ELF (short name for ELF32)
elf64 ELF64 (x86_64) object files (e.g. Linux)
as86 Linux as86 (bin86 version 0.3) object files
obj MS-DOS 16-bit/32-bit OMF object files
win32 Microsoft Win32 (i386) object files
win64 Microsoft Win64 (x86-64) object files
rdf Relocatable Dynamic Object File Format v2.0
ieee IEEE-695 (LADsoft variant) object file format
macho32 NeXTstep/OpenStep/Rhapsody/Darwin/MacOS X (i386) object files
macho MACHO (short name for MACHO32)
macho64 NeXTstep/OpenStep/Rhapsody/Darwin/MacOS X (x86_64) object files
dbg Trace of all info passed to output stage
I see that your linked tutorial asks you to run:
nasm -f elf hello.asm
Try using:
nasm -f elf64 hello.asm
instead, and you may find ld stops complaining about the input file.
You need to tell the linker to produce an i386 output file, since you're writing i386 assembly:
ld -m elf_i386 -s -o hello hello.o
How to compile, link, and run a nasm app on Ubuntu 64 bit.
Install nasm:
sudo apt-get install nasm
Save a file with filename hello.asm:
section .data
hello: db 'Hello world!',10 ; 'Hello world!' plus a linefeed character
helloLen: equ $-hello ; Length of the 'Hello world!' string
; (I'll explain soon)
section .text
global _start
_start:
mov eax,4 ; The system call for write (sys_write)
mov ebx,1 ; File descriptor 1 - standard output
mov ecx,hello ; Put the offset of hello in ecx
mov edx,helloLen ; helloLen is a constant, so we don't need to say
; mov edx,[helloLen] to get its actual value
int 80h ; Call the kernel
mov eax,1 ; The system call for exit (sys_exit)
mov ebx,0 ; Exit with return code of 0 (no error)
int 80h
Compile it:
nasm -f elf64 hello.asm
Link it:
ld -s -o hello hello.o
Run it:
el@apollo:~$ ./hello
Hello world!
It works! What now? Ask your favorite compiler to emit the assembly code that it would normally pass on to be converted to machine code. Google search: "convert php/java/python/c++ program to assembly"
Perspective: With all the people today attempting to tear down and get rid of general purpose computing for the general public, it's imperative that we teach the new students the concepts of how to build a general purpose turing machine from core principles, on up through the bare metal, then finally assemblers and programming languages.
How does learning assembly aid in programming?
99% of computer programs out there are 10 to 100 times slower than they could be if optimized, only because programmers don't know what delays are being forced on them by their favorite high-level compiler or interpreter.
A thorough understanding of the full stack here means you can coerce your programs to have that coveted property of only taking nanoseconds to do the job at hand. Time == money. So this knowledge of how to shun anything that takes longer than a few nanoseconds to complete saves time, and therefore money.
https://softwareengineering.stackexchange.com/questions/156722/how-does-learning-assembly-aid-in-programming
I'm fairly new to Linux (Ubuntu 10.04) and a total novice to assembler. I was following some tutorials and I couldn't find anything specific to Linux.
So, my question is, what is a good package to compile/run assembler and what are the command line commands to compile/run for that package?
The GNU assembler is probably already installed on your system. Try man as to see full usage information. You can use as to compile individual files and ld to link if you really, really want to.
However, GCC makes a great front-end. It can assemble .s files for you. For example:
$ cat >hello.s <<"EOF"
.section .rodata # read-only static data
.globl hello
hello:
.string "Hello, world!" # zero-terminated C string
.text
.global main
main:
push %rbp
mov %rsp, %rbp # create a stack frame
mov $hello, %edi # put the address of hello into RDI
call puts # as the first arg for puts
mov $0, %eax # return value = 0. Normally xor %eax,%eax
leave # tear down the stack frame
ret # pop the return address off the stack into RIP
EOF
$ gcc hello.s -no-pie -o hello
$ ./hello
Hello, world!
The code above is x86-64. If you want to make a position-independent executable (PIE), you'd need lea hello(%rip), %rdi, and call puts@plt.
A non-PIE executable (position-dependent) can use 32-bit absolute addressing for static data, but a PIE should use RIP-relative LEA. (See also Difference between movq and movabsq in x86-64: neither movq nor movabsq is a good choice.)
If you wanted to write 32-bit code, the calling convention is different, and RIP-relative addressing isn't available. (So you'd push $hello before the call, and pop the stack args after.)
You can also compile C/C++ code directly to assembly if you're curious how something works:
$ cat >hello.c <<EOF
#include <stdio.h>
int main(void) {
printf("Hello, world!\n");
return 0;
}
EOF
$ gcc -S hello.c -o hello.s
See also How to remove "noise" from GCC/clang assembly output? for more about looking at compiler output, and writing useful small functions that will compile to interesting output.
The GNU assembler (gas) and NASM are both good choices. However, they have some differences, the big one being the order in which you write an instruction's operands.
gas uses AT&T syntax (guide: https://stackoverflow.com/tags/att/info):
mnemonic source, destination
nasm uses Intel style (guide: https://stackoverflow.com/tags/intel-syntax/info):
mnemonic destination, source
Either one will probably do what you need. GAS also has an Intel-syntax mode, which is a lot like MASM, not NASM.
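To make the operand-order difference concrete, here is a toy sketch that rewrites a very small subset of Intel-style lines into AT&T style. The names intel_to_att and to_att_operand are hypothetical, and the converter assumes only "mnemonic destination, source" lines with 32-bit register names and plain decimal or 0x-prefixed immediates. Memory operands and everything else are deliberately not handled.

```python
import re

# Toy subset: eight 32-bit registers and simple integer immediates only.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}
IMMEDIATE = re.compile(r"0x[0-9a-fA-F]+|[0-9]+")


def to_att_operand(operand: str) -> str:
    """Add the AT&T prefix: % for registers, $ for immediates."""
    if operand in REGISTERS:
        return "%" + operand
    if IMMEDIATE.fullmatch(operand):
        return "$" + operand
    raise ValueError(f"operand not supported by this toy converter: {operand!r}")


def intel_to_att(line: str) -> str:
    """Swap 'mnemonic destination, source' into 'mnemonic source, destination'."""
    mnemonic, operands = line.split(None, 1)
    destination, source = (part.strip() for part in operands.split(","))
    return f"{mnemonic} {to_att_operand(source)}, {to_att_operand(destination)}"


print(intel_to_att("mov eax, 4"))     # mov $4, %eax
print(intel_to_att("mov ebx, eax"))   # mov %eax, %ebx
```

The swap of source and destination is the part that trips people up. The % and $ prefixes are the other visible difference in these simple cases.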
Try out this tutorial: http://asm.sourceforge.net/intro/Assembly-Intro.html
See also more links to guides and docs in Stack Overflow's x86 tag wiki
If you are using NASM, the command-line is just
nasm -felf32 -g -Fdwarf file.asm -o file.o
where 'file.asm' is your assembly file (code) and 'file.o' is an object file you can link with gcc -m32 or ld -melf_i386. (Assembling with nasm -felf64 will make a 64-bit object file, but the hello world example below uses 32-bit system calls, and won't work in a PIE executable.)
Here is some more info:
http://www.nasm.us/doc/nasmdoc2.html#section-2.1
You can install NASM in Ubuntu with the following command:
apt-get install nasm
Here is a basic Hello World in Linux assembly to whet your appetite:
http://web.archive.org/web/20120822144129/http://www.cin.ufpe.br/~if817/arquivos/asmtut/index.html
I hope this is what you were asking...
There is also FASM for Linux.
format ELF executable
segment readable executable
start:
mov eax, 4
mov ebx, 1
mov ecx, hello_msg
mov edx, hello_size
int 80h
mov eax, 1
mov ebx, 0
int 80h
segment readable writeable
hello_msg db "Hello World!",10,0
hello_size = $-hello_msg
It compiles with
fasm hello.asm hello
My suggestion would be to get the book Programming from the Ground Up:
http://nongnu.askapache.com/pgubook/ProgrammingGroundUp-1-0-booksize.pdf
That is a very good starting point for getting into assembler programming under linux and it explains a lot of the basics you need to understand to get started.
The GNU assembler is as(1).
Three syntaxes (NASM, TASM, GAS) in one assembler: yasm.
http://www.tortall.net/projects/yasm/
For Ubuntu 18.04, install nasm. Open the terminal and type:
sudo apt install as31 nasm
nasm docs
For compiling and running:
nasm -f elf64 example.asm # assemble the program
ld -s -o example example.o # link the object file nasm produced into an executable file
./example # example is an executable file
I've run into an interesting problem. I forgot that I'm using a 64-bit machine and OS, and wrote 32-bit assembly code. I don't know how to write 64-bit code.
This is the x86 32-bit assembly code for Gnu Assembler (AT&T syntax) on Linux.
//hello.S
#include <asm/unistd.h>
#include <syscall.h>
#define STDOUT 1
.data
hellostr:
.ascii "hello wolrd\n";
helloend:
.text
.globl _start
_start:
movl $(SYS_write) , %eax //ssize_t write(int fd, const void *buf, size_t count);
movl $(STDOUT) , %ebx
movl $hellostr , %ecx
movl $(helloend-hellostr) , %edx
int $0x80
movl $(SYS_exit), %eax //void _exit(int status);
xorl %ebx, %ebx
int $0x80
ret
Now, this code should run fine on a 32-bit processor and a 32-bit OS, right? As we know, 64-bit processors are backward compatible with 32-bit processors, so that wouldn't be a problem either. The problem arises from differences in the system calls and the call mechanism between 64-bit and 32-bit OSes. I don't know why, but they changed the system call numbers between 32-bit Linux and 64-bit Linux.
asm/unistd_32.h defines:
#define __NR_write 4
#define __NR_exit 1
asm/unistd_64.h defines:
#define __NR_write 1
#define __NR_exit 60
Anyway, using macros instead of literal numbers pays off: it ensures the correct system call numbers.
This is what I do to preprocess, assemble, and link the program:
$cpp hello.S hello.s //pre-processor
$as hello.s -o hello.o //assemble
$ld hello.o // linker : converting relocatable to executable
It's not printing hello world.
In gdb it shows:
Program exited with code 01.
I don't know how to debug in gdb. Using a tutorial, I tried to debug it by executing instruction by instruction and checking the registers at each step, but it always shows me "Program exited with code 01". It would be great if someone could show me how to debug this.
(gdb) break _start
Note: breakpoint -10 also set at pc 0x4000b0.
Breakpoint 8 at 0x4000b0
(gdb) start
Function "main" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Temporary breakpoint 9 (main) pending.
Starting program: /home/claws/helloworld
Program exited with code 01.
(gdb) info breakpoints
Num Type Disp Enb Address What
8 breakpoint keep y 0x00000000004000b0 <_start>
9 breakpoint del y <PENDING> main
I tried running strace. This is its output:
execve("./helloworld", ["./helloworld"], [/* 39 vars */]) = 0
write(0, NULL, 12 <unfinished ... exit status 1>
Explain the parameters of write(0, NULL, 12) system call in the output of strace?
What exactly is happening? I want to know exactly why it's exiting with exit status 1.
Can someone please show me how to debug this program using gdb?
Why did they change the system call numbers?
Kindly change this program appropriately so that it can run correctly on this machine.
EDIT:
After reading Paul R's answer, I checked my files:
claws@claws-desktop:~$ file ./hello.o
./hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
claws@claws-desktop:~$ file ./hello
./hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
I agree with him that these should be ELF 32-bit relocatable and executable files. But that doesn't answer my questions. All of my questions still stand. What exactly is happening in this case? Can someone please answer my questions and provide an x86-64 version of this code?
Remember that everything by default on a 64-bit OS tends to assume 64-bit. You need to make sure that you are (a) using the 32-bit versions of your #includes where appropriate (b) linking with 32-bit libraries and (c) building a 32-bit executable. It would probably help if you showed the contents of your makefile if you have one, or else the commands that you are using to build this example.
FWIW I changed your code slightly (_start -> main):
#include <asm/unistd.h>
#include <syscall.h>
#define STDOUT 1
.data
hellostr:
.ascii "hello wolrd\n" ;
helloend:
.text
.globl main
main:
movl $(SYS_write) , %eax //ssize_t write(int fd, const void *buf, size_t count);
movl $(STDOUT) , %ebx
movl $hellostr , %ecx
movl $(helloend-hellostr) , %edx
int $0x80
movl $(SYS_exit), %eax //void _exit(int status);
xorl %ebx, %ebx
int $0x80
ret
and built it like this:
$ gcc -Wall test.S -m32 -o test
verified that we have a 32-bit executable:
$ file test
test: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.4, dynamically linked (uses shared libs), not stripped
and it appears to run OK:
$ ./test
hello wolrd
As noted by Paul, if you want to build 32-bit binaries on a 64-bit system, you need to use the -m32 flag, which may not be available by default on your installation (some 64-bit Linux distros don't include 32-bit compiler/linker/lib support by default).
On the other hand, you could instead build your code as 64-bit, in which case you need to use the 64-bit calling conventions. In that case, the system call number goes in %rax, the arguments go in %rdi, %rsi, and %rdx, and the call is made with the syscall instruction instead of int $0x80.
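As a rough illustration of the two conventions side by side, the sketch below models which register receives which value for one system call under each ABI. The call numbers are the ones quoted from asm/unistd_32.h and asm/unistd_64.h in the question. The function name registers_for and the string "hellostr" standing in for the buffer address are assumptions made for this example only.

```python
# Argument registers, in order, for each Linux x86 system call ABI.
INT80_ARG_REGS = ("ebx", "ecx", "edx", "esi", "edi", "ebp")   # number in eax
SYSCALL_ARG_REGS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")   # number in rax

# Call numbers as quoted from asm/unistd_32.h and asm/unistd_64.h.
NR_32 = {"write": 4, "exit": 1}
NR_64 = {"write": 1, "exit": 60}


def registers_for(abi: str, name: str, *args) -> dict:
    """Return the register contents needed just before making the call."""
    if abi == "int80":
        number_reg, arg_regs, table = "eax", INT80_ARG_REGS, NR_32
    elif abi == "syscall":
        number_reg, arg_regs, table = "rax", SYSCALL_ARG_REGS, NR_64
    else:
        raise ValueError(f"unknown ABI: {abi!r}")
    registers = {number_reg: table[name]}
    registers.update(zip(arg_regs, args))
    return registers


# write(1, hellostr, 12) under each convention:
print(registers_for("int80", "write", 1, "hellostr", 12))
print(registers_for("syscall", "write", 1, "hellostr", 12))
```

Both the call number and the argument registers change between the two ABIs, so an x86-64 port has to change the numbers, the registers, and the instruction that enters the kernel.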
Edit
Best place I've found for this is www.x86-64.org, specifically abi.pdf
64-bit CPUs can run 32-bit code, but they have to use a special mode to do it. Those instructions are all valid in 64-bit mode, so nothing stopped you from building a 64-bit executable.
Your code builds and runs correctly with gcc -m32 -nostdlib hello.S. That's because -m32 defines __i386, so /usr/include/asm/unistd.h includes <asm/unistd_32.h>, which has the right constants for the int $0x80 ABI.
See also Assembling 32-bit binaries on a 64-bit system (GNU toolchain) for more about _start vs. main with/without libc and static vs. dynamic executables.
$ file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, BuildID[sha1]=973fd6a0b7fa15b2d95420c7a96e454641c31b24, not stripped
$ strace ./a.out > /dev/null
execve("./a.out", ["./a.out"], 0x7ffd43582110 /* 64 vars */) = 0
strace: [ Process PID=2773 runs in 32 bit mode. ]
write(1, "hello wolrd\n", 12) = 12
exit(0) = ?
+++ exited with 0 +++
Technically, if you'd used the right call numbers, your code would happen to work from 64-bit mode as well (see What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?). But int 0x80 is not recommended in 64-bit code. (Actually, it's never recommended. For efficiency, 32-bit code should call through the kernel's exported VDSO page so it can use sysenter for fast system calls on CPUs that support it.)
But that doesn't answer my questions. What exactly is happening in this case?
Good question.
On Linux, int $0x80 with eax=1 is sys_exit(ebx), regardless of what mode the calling process was in. The 32-bit ABI is available in 64-bit mode (unless your kernel was compiled without i386 ABI support), but don't use it. Your exit status is from movl $(STDOUT), %ebx.
(BTW, there's a STDOUT_FILENO macro defined in unistd.h, but you can't #include <unistd.h> from a .S because it also contains C prototypes which aren't valid asm syntax.)
Notice that __NR_exit from unistd_32.h and __NR_write from unistd_64.h are both 1, so your first int $0x80 exits your process. You're using the wrong system call numbers for the ABI you're invoking.
strace is decoding it incorrectly, as if you'd invoked syscall (because that's the ABI a 64-bit process is expected to use). See What are the calling conventions for UNIX & Linux system calls on x86-64.
eax=1 / syscall means write(fd=edi, buf=rsi, len=rdx), and this is how strace is incorrectly decoding your int $0x80.
rdi and rsi are 0 (aka NULL) on entry to _start, and your code sets rdx=12 with movl $(helloend-hellostr) , %edx.
Linux initializes registers to zero in a fresh process after execve. (The ABI says undefined, Linux chooses zero to avoid info leaks). In your statically-linked executable, _start is the first user-space code that runs. (In a dynamic executable, the dynamic linker runs before _start, and does leave garbage in registers).
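The mismatch described above can be sketched as a small model: take the register state at the first int $0x80 of the 64-bit build and decode it once the way the kernel does for int $0x80 and once the way strace assumes for syscall. This is only an illustration. The decode function is a hypothetical helper, the address 0x6000d8 for hellostr is made up, and registers are keyed by their 64-bit names, with the int 0x80 path reading the same registers that eax, ebx, ecx, and edx alias.

```python
# Call numbers as quoted from asm/unistd_32.h and asm/unistd_64.h.
NAMES_32 = {1: "exit", 4: "write"}
NAMES_64 = {1: "write", 60: "exit"}
ARG_COUNT = {"exit": 1, "write": 3}


def decode(abi: str, regs: dict) -> str:
    """Render a system call the way each ABI would interpret the registers."""
    if abi == "int80":
        # int 0x80: number in eax, arguments in ebx, ecx, edx.
        table, arg_regs = NAMES_32, ("rbx", "rcx", "rdx")
    elif abi == "syscall":
        # syscall: number in rax, arguments in rdi, rsi, rdx.
        table, arg_regs = NAMES_64, ("rdi", "rsi", "rdx")
    else:
        raise ValueError(f"unknown ABI: {abi!r}")
    name = table[regs["rax"]]
    args = [str(regs[reg]) for reg in arg_regs[:ARG_COUNT[name]]]
    return f"{name}({', '.join(args)})"


# State at the first int $0x80: SYS_write expanded to 1 (64-bit headers),
# STDOUT = 1 in ebx, the buffer address in ecx, the length 12 in edx,
# and rdi/rsi still zero from process startup.
state = {"rax": 1, "rbx": 1, "rcx": 0x6000D8, "rdx": 12, "rdi": 0, "rsi": 0}

print(decode("int80", state))     # what the kernel actually does
print(decode("syscall", state))   # what strace displays
```

The first line corresponds to the real exit status of 1, and the second to the write(0, NULL, 12) that strace printed.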
See also the x86 tag wiki for more asm links.