Compile/run assembler in Linux?

Compile/run assembler in Linux? - linux

I'm fairly new to Linux (Ubuntu 10.04) and a total novice to assembler. I was following some tutorials and I couldn't find anything specific to Linux.
So, my question is, what is a good package to compile/run assembler and what are the command line commands to compile/run for that package?

The GNU assembler is probably already installed on your system. Try man as to see full usage information. You can use as to compile individual files and ld to link if you really, really want to.
However, GCC makes a great front-end. It can assemble .s files for you. For example:
$ cat >hello.s <<"EOF"
.section .rodata # read-only static data
.globl hello
hello:
.string "Hello, world!" # zero-terminated C string
.text
.global main
main:
push %rbp
mov %rsp, %rbp # create a stack frame
mov $hello, %edi # put the address of hello into RDI
call puts # as the first arg for puts
mov $0, %eax # return value = 0. Normally xor %eax,%eax
leave # tear down the stack frame
ret # pop the return address off the stack into RIP
EOF
$ gcc hello.s -no-pie -o hello
$ ./hello
Hello, world!
The code above is x86-64. If you want to make a position-independent executable (PIE), you'd need lea hello(%rip), %rdi, and call puts#plt.
A non-PIE executable (position-dependent) can use 32-bit absolute addressing for static data, but a PIE should use RIP-relative LEA. (See also Difference between movq and movabsq in x86-64 neither movq nor movabsq are a good choice.)
If you wanted to write 32-bit code, the calling convention is different, and RIP-relative addressing isn't available. (So you'd push $hello before the call, and pop the stack args after.)
You can also compile C/C++ code directly to assembly if you're curious how something works:
$ cat >hello.c <<EOF
#include <stdio.h>
int main(void) {
printf("Hello, world!\n");
return 0;
}
EOF
$ gcc -S hello.c -o hello.s
See also How to remove "noise" from GCC/clang assembly output? for more about looking at compiler output, and writing useful small functions that will compile to interesting output.

The GNU assembler (gas) and NASM are both good choices. However, they have some differences, the big one being the order you put operations and their operands.
gas uses AT&T syntax (guide: https://stackoverflow.com/tags/att/info):
mnemonic source, destination
nasm uses Intel style (guide: https://stackoverflow.com/tags/intel-syntax/info):
mnemonic destination, source
Either one will probably do what you need. GAS also has an Intel-syntax mode, which is a lot like MASM, not NASM.
Try out this tutorial: http://asm.sourceforge.net/intro/Assembly-Intro.html
See also more links to guides and docs in Stack Overflow's x86 tag wiki

If you are using NASM, the command-line is just
nasm -felf32 -g -Fdwarf file.asm -o file.o
where 'file.asm' is your assembly file (code) and 'file.o' is an object file you can link with gcc -m32 or ld -melf_i386. (Assembling with nasm -felf64 will make a 64-bit object file, but the hello world example below uses 32-bit system calls, and won't work in a PIE executable.)
Here is some more info:
http://www.nasm.us/doc/nasmdoc2.html#section-2.1
You can install NASM in Ubuntu with the following command:
apt-get install nasm
Here is a basic Hello World in Linux assembly to whet your appetite:
http://web.archive.org/web/20120822144129/http://www.cin.ufpe.br/~if817/arquivos/asmtut/index.html
I hope this is what you were asking...

There is also FASM for Linux.
format ELF executable
segment readable executable
start:
mov eax, 4
mov ebx, 1
mov ecx, hello_msg
mov edx, hello_size
int 80h
mov eax, 1
mov ebx, 0
int 80h
segment readable writeable
hello_msg db "Hello World!",10,0
hello_size = $-hello_msg
It comiles with
fasm hello.asm hello

My suggestion would be to get the book Programming From Ground Up:
http://nongnu.askapache.com/pgubook/ProgrammingGroundUp-1-0-booksize.pdf
That is a very good starting point for getting into assembler programming under linux and it explains a lot of the basics you need to understand to get started.

The assembler(GNU) is as(1)

3 syntax (nasm, tasm, gas ) in 1 assembler, yasm.
http://www.tortall.net/projects/yasm/

For Ubuntu 18.04 installnasm . Open the terminal and type:
sudo apt install as31 nasm
nasm docs
For compiling and running:
nasm -f elf64 example.asm # assemble the program
ld -s -o example example.o # link the object file nasm produced into an executable file
./example # example is an executable file

Related

How to link the C Runtime Library with 'ld'?

I'm learning assembly with NASM for a class I have in college. I would like to link the C Runtime Library with ld, but I just can't seem to wrap my head around it. I have a 64 bit machine with Linux Mint installed.
The reason I'm confused is that -- to my knowledge -- instead of linking the C runtime, gcc copies the things that you need into your program. I might be wrong though, so don't hesitate to correct me on this, please.
What I did up to this point is, to link it using gcc. That produces a mess of a machine code that I'm unable to follow though, even for a small program like swapping rax with rbx, which isn't that great for learning purposes. (Please note that the program works.)
I'm not sure if it's relevant, but these are the commands that I'm using to compile and link:
# compilation
nasm -f elf64 swap.asm
# gcc
gcc -o swap swap.o
# ld, no c runtime
ld -s -o swap swap.o
Thank you in advance!
Conclusion:
Now that I have a proper answer to the question, here are a few things that I would like to mention. Linking glibc dynamically can be done like in Z boson's answer (for 64 bit systems). If you would like to do it statically, do follow this link (that I'm re-posting from Z boson's answer).
Here's an article that Jester posted, about how programs start in linux.
To see what gcc does to link your .o-s, try this command out: gcc -v -o swap swap.o. Note that 'v' stands for 'verbose'.
Also, you should read this if you are interested in 64 bit assembly.
Thank you for your answers and helpful insight! End of speech.

Here is an example which uses libc without using GCC.
extern printf
extern _exit
section .data
hello: db 'Hello world!',10
section .text
global _start
_start:
xor eax, eax
mov edi, hello
call printf
mov rax, 0
jmp _exit
Compile and link like this:
nasm -f elf64 hello.asm
ld hello.o -dynamic-linker /lib64/ld-linux-x86-64.so.2 -lc -m elf_x86_64
This has worked fine so far for me but for static linkage it's complicated.

If you want to call simple library functions like atoi, but still avoid using the C runtime, you can do that. (i.e. you write _start, rather than just writing a main that gets called after a bunch of boiler-plate code runs.)
gcc -o swap -nostartfiles swap.o
As people say in comments, some parts of glibc depend on constructors/destructors run from the standard startup files. Probably this is the case for stdio (puts/printf/scanf/getchar), and maybe malloc. A lot of functions are "pure" functions that just process the input they're given, though. sprintf/sscanf might be ok to use.
For example:
$ cat >exit64.asm <<EOF
section .text
extern exit
global _start
_start:
xor edi, edi
jmp exit ; doesn't return, so optimize like a tail-call
;; or make the syscall directly, if the jmp is commented
mov eax, 231 ; exit(0)
syscall
; movl eax, 1 ; 32bit call
; int 0x80
EOF
$ yasm -felf64 exit64.asm && gcc -nostartfiles exit64.o -o exit64-dynamic
$ nm exit64-dynamic
0000000000601020 D __bss_start
0000000000600ec0 d _DYNAMIC
0000000000601020 D _edata
0000000000601020 D _end
U exit##GLIBC_2.2.5
0000000000601000 d _GLOBAL_OFFSET_TABLE_
00000000004002d0 T _start
$ ltrace ./exit64-dynamic
enable_breakpoint pid=11334, addr=0x1, symbol=(null): Input/output error
exit(0 <no return ...>
+++ exited (status 0) +++
$ strace ... # shows the usual system calls by the runtime dynamic linker

NASM Hello World on Linux: undefined reference to `main'

I've been following this tutorial for an intro to assembly on Linux.
section .text
global _start ;must be declared for linker (ld)
_start:
mov edx,len ;message length
mov ecx,msg ;message to write
mov ebx,1 ;file descriptior
mov eax,4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax,1 ;system call number (sys_exit)
int 0x080 ;call kernel
section .data
msg db 'Hello, world!', 0xa ;the string
len equ $ - msg ;length of the string
I've then had problems compiling it. I've looked around and found (on SO) that I should compile it like this:
nasm -f elf64 hello.asm
gcc -o hello hello.o
But I keep getting this error from GCC:
hello.o: In function `_start':
hello.asm:(.text+0x0): multiple definition of `_start'
/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/crt1.o:(.text+0x0): first defined here
/usr/lib/gcc/x86_64-linux-gnu/4.7/../../../x86_64-linux-gnu/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status
(NB: I'm running Debian Linux on a 64 bit Intel i7)

If you are going to learn assembly, then you are much better served learning to use the assembler nasm and the linker ld without relying on gcc. There is nothing wrong with using gcc, but it masks part of the linking process that you need to understand going forward.
Learning assembly in the current environment (generally building on x86_64 but using examples that are written in x86 32-bit assembler), you must learn to build for the proper target and the language (syscall) differences between the two. Your code example is 32-bit assembler. As such your nasm compile string is incorrect:
nasm -f elf64 hello.asm
The -f elf64 attempts to compile a 64-bit object file, but the instructions in your code are 32-bit instructions. (It won't work)
Understanding and using ld provides a better understanding of the differences. Rather than using gcc, you can use nasm and ld to accomplish the same thing. For example (with slight modification to the code):
msg db 0xa, 'Hello, StackOverflow!', 0xa, 0xa ;the string
You compile and build with:
nasm -f elf -o hello-stack_32.o hello-stack_32.asm
ld -m elf_i386 -o hello-stack_32 hello-stack_32.o
Note the use of -f elf for 32-bit code in the nasm call and the -m elf_i386 linker option to create a compatible executable.
output:
Hello, StackOverflow!
If you are serious about learning assembler, there are a number of good references on the web. One of the best is The Art of Assembly. (it is written primarily for 8086 and x86, but the foundation it provides is invaluable). In addition, looking at the executables you create in binary can be helpful. Take a look at Binary Vi (BVI). It is a good tool.
bvi screenshot

You should add -nostdlib when linking your binary.
gcc -o hello hello.o -nostdlib

GDB complains No Source Available

I'm running on Ubuntu 12.10 64bit.
I am trying to debug a simple assembly program in GDB. However GDB's gui mode (-tui) seems unable to find the source code of my assembly file. I've rebuilt the project in the currently directory and searched google to no avail, please help me out here.
My commands:
nasm -f elf64 -g -F dwarf hello.asm
gcc -g hello.o -o hello
gdb -tui hello
Debug information seems to be loaded, I can set a breakpoint at main() but the top half the screen still says '[ No Source Available ]'.
Here is hello.asm if you're interested:
; hello.asm a first program for nasm for Linux, Intel, gcc
;
; assemble: nasm -f elf -l hello.lst hello.asm
; link: gcc -o hello hello.o
; run: hello
; output is: Hello World
SECTION .data ; data section
msg: db "Hello World",10 ; the string to print, 10=cr
len: equ $-msg ; "$" means "here"
; len is a value, not an address
SECTION .text ; code section
global main ; make label available to linker
main: ; standard gcc entry point
mov edx,len ; arg3, length of string to print
mov ecx,msg ; arg2, pointer to string
mov ebx,1 ; arg1, where to write, screen
mov eax,4 ; write command to int 80 hex
int 0x80 ; interrupt 80 hex, call kernel
mov ebx,0 ; exit code, 0=normal
mov eax,1 ; exit command to kernel
int 0x80 ; interrupt 80 hex, call kernel

This statement is false.
The assembler does produce line number information (note the -g -F dwarf) bits.
On the other hand he assembles what is obviously 32-bit code as 64 bits, which may or may not work.
Now if there are bugs in NASM's debugging output we need to know that.
A couple of quick experiments shows that addr2line (but not gdb!) does decode NASM-generated line number information correctly using stabs but not using dwarf, so there is probably something wrong in the way NASM generates DWARF... but also something odd with gdb.
GNU addr2line version 2.22.52.0.1-10.fc17 20120131, GNU gdb (GDB) Fedora (7.4.50.20120120-52.fc17)).

The problem in this case is that the assembler isn't producing line-number information for the debugger. So although the source is there (if you do "list" in gdb, it shows a listing of the source file - at least when I follow your steps, it does), but the debugger needs line-number information from the file to know what line corresponds to what address. It can't do that with the information given.
As far as I can find, there isn't a way to get NASM to issue the .loc directive that is used by as when using gcc for example. But as isn't able to take your source file without generating a gazillion errors [even with -msyntax=intel -mmnemonic=intel -- you would think that should work].
So unless someone more clever can come up with a way to generate the .loc entries which gives the debugger line number information, I'm not entirely sure how we can answer your question in a way that you'll be happy with.

Linux (64-bit), nasm and gdb

I was searching other threads without luck.
My problem is perhaps simple but frustrating.
I'm compiling two files on 64-bit Ubuntu 11.04:
nasm -f elf64 -g file64.asm
gcc -g -o file file.c file64.o
Then I debug the resulting executables with gdb.
With C, everything is OK.
However, when debugging assembly, the source code is "not visible" to the debugger. I'm getting the following output:
(gdb) step
Single stepping until exit from function line,
which has no line number information.
0x0000000000400962 in convert ()
A quick investigation with:
objdump --source file64.o
shows that the assembly source code (and line information) is contained in the file.
Why can't I see it in a debug session? What am I doing wrong?
These problems arose after moving to 64-bit Ubuntu. In the 32-bit Linux it worked (as it should).

With NASM, I've had much better experience in gdb when using the dwarf debugging format. gdb then treats the assembly source as if it were any other language (i.e., no disassemble commands necessary)
nasm -f elf64 -g -F dwarf file64.asm
(Versions 2.03.01 and later automatically enable -g if -F is specified.)
I'm using NASM version 2.10.07. I'm not sure if that makes a difference or not.

GDB is a source-level (or symbolic) debugger, which means that it's supposed to work with 'high-level programming languages' ... which is not you're case!
But wait a second, because, from a debugger's point of view, debugging ASM programs is way easier than higher level languages: there's almost nothing to do! The program binary always contains the assembly instruction, there're just written in their machine format, instead of ascii format.
And GDB has the ability to convert it for you. Instead of executing list to see the code, use disassemble to see a function code:
(gdb) disassemble <your symbol>
Dump of assembler code for function <your symbol>:
0x000000000040051e <+0>: push %rbp
0x000000000040051f <+1>: mov %rsp,%rbp
=> 0x0000000000400522 <+4>: mov 0x20042f(%rip),%rax
0x0000000000400529 <+11>: mov %rax,%rdx
0x000000000040052c <+14>: mov $0x400678,%eax
0x0000000000400531 <+19>: mov %rdx,%rcx
or x/5i $pc to see 5 i nstruction after your $pc
(gdb) x/5i $pc
=> 0x400522 <main+4>: mov 0x20042f(%rip),%rax
0x400529 <main+11>: mov %rax,%rdx
0x40052c <main+14>: mov $0x400678,%eax
0x400531 <main+19>: mov %rdx,%rcx
0x400534 <main+22>: mov $0xc,%edx
then use stepi (si) instread of step and nexti (ni) instead of next.
display $pc could also be useful to print the current pc whenever the inferior stops (ie, after each nexti/stepi.

For anyone else stuck with the broken things on NASM (the bug is not fixed so far): just download the NASM git repository and switch to version 2.7, which is probably the last version that works fine, i.e. supports gdb. Building from source this outdated version is only a workaround (you don't have support for the last ISA for example), but it's sufficient for most students.

GDB might not know where to search for your source files. Try to explicitly tell it with directory.

Error when trying to run .asm file on NASM on Ubuntu

I'm using ubuntu 64-bit and trying to run a .asm file on NASM. But it returns this error when I try to run the following code. What Iḿ trying to do is build an executable by compiling (or assembling) object file from the source
$ nasm -f elf hello.asm, and then after created the file hello.o is producing executable file itself from the object file by invoking linker
$ ld -s -o hello hello.o
This will finally build hello executable.
I'm following this tutorial http://www.faqs.org/docs/Linux-HOWTO/Assembly-HOWTO.html
Error:
i386 architecture of input file `hello.o' is incompatible with i386:x86-64 output
Code:
section .data ;section declaration
msg db "Hello, world!",0xa ;our dear string
len equ $ - msg ;length of our dear string
section .text ;section declaration
;we must export the entry point to the ELF linker or
global _start ;loader. They conventionally recognize _start as their
;entry point. Use ld -e foo to override the default.
_start:
;write our string to stdout
mov edx,len ;third argument: message length
mov ecx,msg ;second argument: pointer to message to write
mov ebx,1 ;first argument: file handle (stdout)
mov eax,4 ;system call number (sys_write)
int 0x80 ;call kernel
;and exit
mov ebx,0 ;first syscall argument: exit code
mov eax,1 ;system call number (sys_exit)
int 0x80 ;call kernel

This looks like it may be a simple mismatch between what's produced by nasm and what ld is trying to make:
i386 architecture of input file 'hello.o' is incompatible with i386:x86-64 output
In other words, nasm has produced a 32-bit object file hello.o and ld wants to take that and make a 64-bit executable file.
The nasm -hf command should give you the available output formats:
valid output formats for -f are (`*' denotes default):
* bin flat-form binary files (e.g. DOS .COM, .SYS)
ith Intel hex
srec Motorola S-records
aout Linux a.out object files
aoutb NetBSD/FreeBSD a.out object files
coff COFF (i386) object files (e.g. DJGPP for DOS)
elf32 ELF32 (i386) object files (e.g. Linux)
elf ELF (short name for ELF32)
elf64 ELF64 (x86_64) object files (e.g. Linux)
as86 Linux as86 (bin86 version 0.3) object files
obj MS-DOS 16-bit/32-bit OMF object files
win32 Microsoft Win32 (i386) object files
win64 Microsoft Win64 (x86-64) object files
rdf Relocatable Dynamic Object File Format v2.0
ieee IEEE-695 (LADsoft variant) object file format
macho32 NeXTstep/OpenStep/Rhapsody/Darwin/MacOS X (i386) object files
macho MACHO (short name for MACHO32)
macho64 NeXTstep/OpenStep/Rhapsody/Darwin/MacOS X (x86_64) object files
dbg Trace of all info passed to output stage
I see that your linked tutorial asks you to run:
nasm -f elf hello.asm
Try using:
nasm -f elf64 hello.asm
instead, and you may find ld stops complaining about the input file.

You need to tell the linker to produce an i386 output file, since you're writing i386 assembly:
ld -m elf_i386 -s -o hello hello.o

How to compile, link, and run a nasm app on Ubuntu 64 bit.
Install nasm:
sudo apt-get install nasm
Save a file with filename hello.asm:
section .data
hello: db 'Hello world!',10 ; 'Hello world!' plus a linefeed character
helloLen: equ $-hello ; Length of the 'Hello world!' string
; (I'll explain soon)
section .text
global _start
_start:
mov eax,4 ; The system call for write (sys_write)
mov ebx,1 ; File descriptor 1 - standard output
mov ecx,hello ; Put the offset of hello in ecx
mov edx,helloLen ; helloLen is a constant, so we don't need to say
; mov edx,[helloLen] to get it's actual value
int 80h ; Call the kernel
mov eax,1 ; The system call for exit (sys_exit)
mov ebx,0 ; Exit with return code of 0 (no error)
int 80h
Compile it:
nasm -f elf64 hello.asm
Link it:
ld -s -o hello hello.o
Run it
el#apollo:~$ ./hello
Hello world!
It works! What now? Request that your favorite compiler generate the assembly code that it would have been normally passed on to be converted to machine code. Google search: "convert php/java/python/c++ program to assembly"
Perspective: With all the people today attempting to tear down and get rid of general purpose computing for the general public, it's imperative that we teach the new students the concepts of how to build a general purpose turing machine from core principles, on up through the bare metal, then finally assemblers and programming languages.
How does learning assembly aid in programming?
99% of computer programs out there are 10 to 100 times slower than they could optimized to be only because programmers don't know what delays are being forced on them by their favorite high level compiler or interpreter.
A thorough understanding of the full stack here means you can coerce your programs to have that coveted property of only taking nanoseconds to do the job at hand. Time == money. So this knowledge of how to shun anything that takes longer than a few nanoseconds to complete saves time, and therefore money.
https://softwareengineering.stackexchange.com/questions/156722/how-does-learning-assembly-aid-in-programming

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string