How do you assemble, link and run a .s file in linux?


I'm getting a weird error message when trying to assemble and run a .s file written in AT&T syntax. I'm not sure whether I'm using the correct architecture to begin with, whether I have syntax errors, or whether I'm using the wrong commands to assemble and link. I'm completely lost and do not know where to begin.
So basically, I have a file called yea.s, which contains some simple assembler instructions. I then try to assemble it using the command as yea.s -o yea.o and then link it using ld yea.o -o yea. When running ld, I get this weird message: ld: warning: cannot find entry symbol _start; defaulting to 000000440000.
This is the program I'm trying to run; it's very simple and doesn't really do anything.
resMsg: .asciz "xxxxxxxx"
.text
.global main
main:
pushq $0
ret
I just cannot figure out what's going on. Obviously, this is for school homework. I'm not looking for the answer to the homework, but this is the starting point from which I can actually start coding. I just can't figure out how to simply run the program, which the assignment doesn't explain. Anyway, thanks in advance, guys!

Linux executables require an entry point to be specified. The entry point is the address of the first instruction to be executed in your program. If not specified otherwise, the link editor looks for a symbol named _start to use as an entry point. Your program does not contain such a symbol, thus the linker complains and picks the beginning of the .text section as the entry point. To fix this problem, rename main to _start.
Note further that unlike on DOS, there is nothing to return to from _start. So your attempt to return is going to cause a crash. Instead, call the system call sys_exit to exit the program:
mov $0, %edi # exit status
mov $60, %eax # system call number
syscall # perform exit call
Alternatively, if you want to use the C runtime environment and call functions from the C library, leave your program as is and instead assemble and link using the C compiler driver cc:
cc -o yea yea.s
If you do so, the C runtime environment provides the entry point for you and eventually tries to call a function main which is where your code comes in. This approach is required if you want to call functions from the C library. If you do it this way, make sure that main follows the SysV ABI (calling convention).
Note that even then your code is incorrect. The return value of a function is given in the eax (resp. rax) register and not pushed on the stack. To return zero from main, write
mov $0, %eax # exit status
ret # return from function

In all currently supported versions of Ubuntu open the terminal and type:
sudo apt install as31 nasm
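as31: Intel 8031/8051 assembler
This is a fast, simple, easy to use Intel 8031/8051 assembler. It targets 8051 microcontrollers and cannot assemble x86 code, so it is optional for the commands below.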
nasm: General-purpose x86 assembler
Netwide Assembler. NASM will currently output flat-form binary files, a.out, COFF and ELF Unix object files, and Microsoft 16-bit DOS and Win32 object files.
If you are using NASM in Ubuntu 18.04, the commands to compile and run an .asm file named example.asm are:
nasm -f elf64 example.asm # assemble the program
ld -s -o example example.o # link the object file nasm produced into an executable file
./example # example is an executable file

Related

Assembly executable on Termux now produces Illegal instruction error [duplicate]

This question already has an answer here:
How to implement system call in ARM64?
(1 answer)
Closed 3 years ago.
Can you let me know what I'm doing wrong?
I'm new to assembly programming and am unfamiliar with the various options in ld.
I initially tried to use the yasm assembler, but then realised that as is the way to go for the ARM architecture when writing GNU-compliant assembly code.
Better luck running as from the binutils package, i.e. the GNU assembler. But the assembly code has to be ARM-compliant.
The following is the code within arm.s:
.text /* Start of the program code section */
.global main /* declares the main identifier */
.type main, %function
main: /* Address of the main function */
/* Program code would go here */
BR LR
/* Return to the caller */
.end /* End of the program */
The above was throwing an Illegal Instruction error. That can be fixed
by substituting ret for BR LR. This is new to ARM V8.
ARM, a RISC architecture, is not supported by YASM.
My build file is as follows:
#!/usr/bin/env bash
#display usage
[ $# -eq 0 ] && { echo "Usage: $0 <File Name without extension> ";exit 1; }
set +e
rm -f $1.exe $1 $1.o
as -o $1.o $1.s
[ -e $1.o ] && { file $1.o;}
gcc -s -o $1.exe $1.o -fpic
ld -s -o $1 -pie --dynamic-linker /system/bin/linker64 /data/data/com.termux/files/usr/lib/crtbegin_dynamic.o $1.o -lc -lgcc -ldl /data/data/com.termux/files/usr/lib/crtend_android.o
[ -e $1.exe ] && { file $1.exe;nohup ./$1.exe; }
[ -e $1 ] && { file $1;nohup ./$1;}
set -e
The code was causing either a segmentation fault or a bus error earlier.
I was able to run a program or two without any segmentation or bus errors with the updated build file above. I set up the build file to produce two executables, one using gcc and the other ld, since some online tutorials use ld instead of gcc for the linking step. Using the verbose setting of gcc, you can look at the options passed to the linker and thus mimic the same for the linker independently.
There may be some redundant settings that I've missed.
You can access updates to the source code and build file at
Learn Assembly.
Check out this resource from Keil here. arm Keil product guides
More resources:
https://thinkingeek.com/2016/10/08/exploring-aarch64-assembler-chapter1/
How to link a gas assembly program that uses the C standard library with ld without using gcc?
While the above problem appears to be fixed for now, I have errors running the following code:
.text
.global main
main:
mov w0, #2
mov w7, #1 // request to exit program
svc 0
I obtain an illegal instruction error when I try to execute the code.
Secondly, if I alter the main to _start (since I don't want to be using main all the time), I have the following error from the buildrun script.
./buildrun myprogram
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: myprogram.o: in function `_start':
(.text+0x0): multiple definition of `_start'; /data/data/com.termux/files/usr/lib/crtbegin_dynamic.o:crtbegin.c:(.text+0x0): first defined here
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: /data/data/com.termux/files/usr/lib/crtbegin_dynamic.o: in function `_start_main':
crtbegin.c:(.text+0x38): undefined reference to `main'
/data/data/com.termux/files/usr/bin/aarch64-linux-android-ld: crtbegin.c:(.text+0x3c): undefined reference to `main'
clang-8: error: linker command failed with exit code 1 (use -v to see invocation)
ld: myprogram.o: in function `_start':
(.text+0x0): multiple definition of `_start'; /data/data/com.termux/files/usr/lib/crtbegin_dynamic.o:crtbegin.c:(.text+0x0): first defined here
ld: /data/data/com.termux/files/usr/lib/crtbegin_dynamic.o: in function `_start_main':
crtbegin.c:(.text+0x38): undefined reference to `main'
ld: crtbegin.c:(.text+0x3c): undefined reference to `main'
How do I create programs with entry points other than main?
I want to be able to :
Create a statically linked executable that works.
Create an executable that has a function named _start instead of main.
This file builds static executables that don't use main or call any library calls.
Create a dynamically linked executable with an entry point other than main.
My build file handles this, sort of, with the entry point as second parameter.
Create an executable that uses supervisor call svc to exit without throwing an illegal instruction error as against using ret.
I was able to call svc by putting the system call number in register x8, rather than r7 as on 32-bit ARM (ARMv7). Additionally, ARM64 uses different system call numbers, which are listed in the following header file.
https://github.com/torvalds/linux/blob/v4.17/include/uapi/asm-generic/unistd.h
https://reverseengineering.stackexchange.com/q/16917
.data
.balign 8
labs: .asciz "Azeria Labs\n" //.asciz adds a null-byte to the end of the string
.balign 8
after_labs:
.set size_of_labs, after_labs - labs
.balign 8
addr_of_labs: .dword labs
.balign 8
.text
.global main
main:
mov x0, #1 //STDOUT
ldr x1,addr_of_labs //memory address of labs
mov w2, #size_of_labs //size of labs
mov x8,#64
svc #0x0 // invoke syscall
_exit:
mov x8, #93 //exit syscall
svc #0x0 //invoke syscall
The above code was ported from the example code listed below.
https://azeria-labs.com/writing-arm-shellcode/
Compacting the data section into one instead of splitting it as in the example from the site mitigates the relocation errors while linking.
Other useful references:
https://thinkingeek.com/2013/01/09/arm-assembler-raspberry-pi-chapter-1/
Check the comment by ehrt74 on the above post for the motivation to explore the svc call further.

Yasm is an x86 assembler. It cannot produce executables for an ARM processor.
The tutorials you are working with are describing x86 assembly. They are intended to be followed on an x86 system.

Is it possible to assemble and run raw CPU instructions using `as`?

There are a couple of related questions here.
Consider a program consisting only of the following two instructions
movq 1, %rax
cpuid
If I throw this into a file called Foo.asm, and run as Foo.asm, where as is the portable GNU assembler, I will get a file called a.out, of size 665 bytes on my system.
If I then chmod 700 a.out and try ./a.out, I will get an error saying cannot execute binary file.
Why is the file so large, if I am merely trying to translate two asm instructions into binary?
Why can the binary not be executed? I am providing valid instructions, so I would expect the CPU to be able to execute them.
How can I get exactly the binary opcodes for the asm instructions in my input file, instead of a bunch of extra stuff?
Once I have the answer to 3, how can I get my processor to execute them? (Assuming that I am not running privileged instructions.)

Why is the file so large, if I am merely trying to translate two asm instructions into binary?
Because the assembler creates a relocatable object file which includes additional information, like memory Sections and Symbol tables.
Why can the binary not be executed?
Because it is an (relocatable) object file, not a loadable file. You need to link it in order to make it executable so that it can be loaded by the operating system:
$ ld -o Foo a.out
You also need to give the linker a hint about where your program starts, by specifying the _start symbol.
But then, still, the Foo executable is larger than you might expect since it still contains additional information (e.g. the elf header) required by the operating system to actually launch the program.
Also, if you launch the executable now, it will result in a segmentation fault, since you are loading the contents of address 1, which is not mapped into your address space, into rax. Still, if you fix this, the program will run into undefined code at the end - you need to make sure to gracefully exit the program through a syscall.
A minimal running example (assumed x86_64 architecture) would look like
.globl _start
_start:
movq $1, %rax
cpuid
mov $60, %rax # System-call "sys_exit"
mov $0, %rdi # exit code 0
syscall
How can I get exactly the binary opcodes for the asm instructions in my input file, instead of a bunch of extra stuff?
You can use objcopy to generate a raw binary image from an object file:
$ objcopy -O binary a.out Foo.bin
Then, Foo.bin will only contain the instruction opcodes.
nasm has a -f bin option which creates a binary-only representation of your assembly code. I used this to implement a bare boot loader for VirtualBox (warning: undocumented, prototype only!) to directly launch binary code inside a VirtualBox image without an operating system.
Once I have the answer to 3, how can I get my processor to execute them?
You will not be able to directly execute the raw binary file under Linux. You will need to write your own loader for that or not use an operating system at all. For an example, see my bare boot loader link above - this writes the opcodes into the boot loader of a VirtualBox disc image, so that the instructions are getting executed when launching the VirtualBox machine.

The old MS-DOS COM file format does not include a header. It really only contains the binary executable code. The code size can, however, not exceed 64kb. I don't know whether Linux can execute these.

You can write the opcodes into a file using a hex editor. Then you just need to wrap them in an ELF header so that Linux knows how to execute the file.
Here's an example:
hexedit myfile.bin
Now just write your opcodes inside the file using the hexeditor.
After that you need to add the ELF header. You could do this by hand and write the ELF header into your .bin file, but that's a bit tricky. The easiest method is to use a few commands (in this example, for 64-bit).
objcopy --input-target=binary --output-target=elf64-x86-64 myfile.bin myfile.o
ld -o myfile myfile.o -T binary.ld
You will also need a linker script. I called mine binary.ld.
These are the contents of binary.ld:
ENTRY(_start);
SECTIONS
{
_start = 0x0;
}
Now you can execute your program: ./myfile

Perhaps there's something like exe2bin utility for the GNU tool set. I've used various versions of exe2bin with Microsoft tools, and the ARM toolkit has the ability to produce binaries, but I don't recall if it was directly from the linked output or something like exe2bin.

GDB complains No Source Available

I'm running on Ubuntu 12.10 64bit.
I am trying to debug a simple assembly program in GDB. However, GDB's GUI mode (-tui) seems unable to find the source code of my assembly file. I've rebuilt the project in the current directory and searched Google to no avail; please help me out here.
My commands:
nasm -f elf64 -g -F dwarf hello.asm
gcc -g hello.o -o hello
gdb -tui hello
Debug information seems to be loaded, I can set a breakpoint at main() but the top half the screen still says '[ No Source Available ]'.
Here is hello.asm if you're interested:
; hello.asm a first program for nasm for Linux, Intel, gcc
;
; assemble: nasm -f elf -l hello.lst hello.asm
; link: gcc -o hello hello.o
; run: hello
; output is: Hello World
SECTION .data ; data section
msg: db "Hello World",10 ; the string to print, 10=LF (newline)
len: equ $-msg ; "$" means "here"
; len is a value, not an address
SECTION .text ; code section
global main ; make label available to linker
main: ; standard gcc entry point
mov edx,len ; arg3, length of string to print
mov ecx,msg ; arg2, pointer to string
mov ebx,1 ; arg1, where to write, screen
mov eax,4 ; write command to int 80 hex
int 0x80 ; interrupt 80 hex, call kernel
mov ebx,0 ; exit code, 0=normal
mov eax,1 ; exit command to kernel
int 0x80 ; interrupt 80 hex, call kernel

This statement is false.
The assembler does produce line number information (note the -g -F dwarf bits).
On the other hand, the asker assembles what is obviously 32-bit code as 64-bit, which may or may not work.
Now, if there are bugs in NASM's debugging output, we need to know that.
A couple of quick experiments show that addr2line (but not gdb!) decodes NASM-generated line number information correctly using stabs but not using dwarf, so there is probably something wrong in the way NASM generates DWARF... but also something odd with gdb.
(Tested with GNU addr2line version 2.22.52.0.1-10.fc17 20120131 and GNU gdb (GDB) Fedora (7.4.50.20120120-52.fc17).)

The problem in this case is that the assembler isn't producing line-number information for the debugger. The source is there (if you do "list" in gdb, it shows a listing of the source file, at least when I follow your steps), but the debugger needs line-number information from the file to know which line corresponds to which address. It can't do that with the information given.
As far as I can find, there isn't a way to get NASM to issue the .loc directive that is used by as when using gcc for example. But as isn't able to take your source file without generating a gazillion errors [even with -msyntax=intel -mmnemonic=intel -- you would think that should work].
So unless someone more clever can come up with a way to generate the .loc entries which gives the debugger line number information, I'm not entirely sure how we can answer your question in a way that you'll be happy with.

Linux (64-bit), nasm and gdb

I was searching other threads without luck.
My problem is perhaps simple but frustrating.
I'm compiling two files on 64-bit Ubuntu 11.04:
nasm -f elf64 -g file64.asm
gcc -g -o file file.c file64.o
Then I debug the resulting executables with gdb.
With C, everything is OK.
However, when debugging assembly, the source code is "not visible" to the debugger. I'm getting the following output:
(gdb) step
Single stepping until exit from function line,
which has no line number information.
0x0000000000400962 in convert ()
A quick investigation with:
objdump --source file64.o
shows that the assembly source code (and line information) is contained in the file.
Why can't I see it in a debug session? What am I doing wrong?
These problems arose after moving to 64-bit Ubuntu. In the 32-bit Linux it worked (as it should).

With NASM, I've had much better experience in gdb when using the dwarf debugging format. gdb then treats the assembly source as if it were any other language (i.e., no disassemble commands necessary)
nasm -f elf64 -g -F dwarf file64.asm
(Versions 2.03.01 and later automatically enable -g if -F is specified.)
I'm using NASM version 2.10.07. I'm not sure if that makes a difference or not.

GDB is a source-level (or symbolic) debugger, which means that it's supposed to work with 'high-level programming languages' ... which is not your case!
But wait a second, because, from a debugger's point of view, debugging ASM programs is way easier than debugging higher-level languages: there's almost nothing to do! The program binary always contains the assembly instructions; they're just written in their machine format instead of ASCII format.
And GDB has the ability to convert it for you. Instead of executing list to see the code, use disassemble to see a function code:
(gdb) disassemble <your symbol>
Dump of assembler code for function <your symbol>:
0x000000000040051e <+0>: push %rbp
0x000000000040051f <+1>: mov %rsp,%rbp
=> 0x0000000000400522 <+4>: mov 0x20042f(%rip),%rax
0x0000000000400529 <+11>: mov %rax,%rdx
0x000000000040052c <+14>: mov $0x400678,%eax
0x0000000000400531 <+19>: mov %rdx,%rcx
or x/5i $pc to see the 5 instructions starting at your $pc
(gdb) x/5i $pc
=> 0x400522 <main+4>: mov 0x20042f(%rip),%rax
0x400529 <main+11>: mov %rax,%rdx
0x40052c <main+14>: mov $0x400678,%eax
0x400531 <main+19>: mov %rdx,%rcx
0x400534 <main+22>: mov $0xc,%edx
then use stepi (si) instead of step and nexti (ni) instead of next.
display $pc could also be useful to print the current pc whenever the inferior stops (i.e., after each nexti/stepi).

For anyone else stuck with the broken things on NASM (the bug is not fixed so far): just download the NASM git repository and switch to version 2.7, which is probably the last version that works fine, i.e. supports gdb. Building from source this outdated version is only a workaround (you don't have support for the last ISA for example), but it's sufficient for most students.

GDB might not know where to search for your source files. Try to explicitly tell it with directory.

Compile/run assembler in Linux?

I'm fairly new to Linux (Ubuntu 10.04) and a total novice to assembler. I was following some tutorials and I couldn't find anything specific to Linux.
So, my question is, what is a good package to compile/run assembler and what are the command line commands to compile/run for that package?

The GNU assembler is probably already installed on your system. Try man as to see full usage information. You can use as to assemble individual files and ld to link them if you really, really want to.
However, GCC makes a great front-end. It can assemble .s files for you. For example:
$ cat >hello.s <<"EOF"
.section .rodata # read-only static data
.globl hello
hello:
.string "Hello, world!" # zero-terminated C string
.text
.global main
main:
push %rbp
mov %rsp, %rbp # create a stack frame
mov $hello, %edi # put the address of hello into RDI
call puts # as the first arg for puts
mov $0, %eax # return value = 0. Normally xor %eax,%eax
leave # tear down the stack frame
ret # pop the return address off the stack into RIP
EOF
$ gcc hello.s -no-pie -o hello
$ ./hello
Hello, world!
The code above is x86-64. If you want to make a position-independent executable (PIE), you'd need lea hello(%rip), %rdi, and call puts@plt.
A non-PIE executable (position-dependent) can use 32-bit absolute addressing for static data, but a PIE should use RIP-relative LEA. (See also "Difference between movq and movabsq in x86-64": neither movq nor movabsq is a good choice.)
If you wanted to write 32-bit code, the calling convention is different, and RIP-relative addressing isn't available. (So you'd push $hello before the call, and pop the stack args after.)
You can also compile C/C++ code directly to assembly if you're curious how something works:
$ cat >hello.c <<EOF
#include <stdio.h>
int main(void) {
printf("Hello, world!\n");
return 0;
}
EOF
$ gcc -S hello.c -o hello.s
See also How to remove "noise" from GCC/clang assembly output? for more about looking at compiler output, and writing useful small functions that will compile to interesting output.

The GNU assembler (gas) and NASM are both good choices. However, they have some differences, the big one being the order you put operations and their operands.
gas uses AT&T syntax (guide: https://stackoverflow.com/tags/att/info):
mnemonic source, destination
nasm uses Intel style (guide: https://stackoverflow.com/tags/intel-syntax/info):
mnemonic destination, source
Either one will probably do what you need. GAS also has an Intel-syntax mode, which is a lot like MASM, not NASM.
Try out this tutorial: http://asm.sourceforge.net/intro/Assembly-Intro.html
See also more links to guides and docs in Stack Overflow's x86 tag wiki

If you are using NASM, the command-line is just
nasm -felf32 -g -Fdwarf file.asm -o file.o
where 'file.asm' is your assembly file (code) and 'file.o' is an object file you can link with gcc -m32 or ld -melf_i386. (Assembling with nasm -felf64 will make a 64-bit object file, but the hello world example below uses 32-bit system calls, and won't work in a PIE executable.)
Here is some more info:
http://www.nasm.us/doc/nasmdoc2.html#section-2.1
You can install NASM in Ubuntu with the following command:
apt-get install nasm
Here is a basic Hello World in Linux assembly to whet your appetite:
http://web.archive.org/web/20120822144129/http://www.cin.ufpe.br/~if817/arquivos/asmtut/index.html
I hope this is what you were asking...

There is also FASM for Linux.
format ELF executable
segment readable executable
start:
mov eax, 4
mov ebx, 1
mov ecx, hello_msg
mov edx, hello_size
int 80h
mov eax, 1
mov ebx, 0
int 80h
segment readable writeable
hello_msg db "Hello World!",10,0
hello_size = $-hello_msg
It compiles with
fasm hello.asm hello

My suggestion would be to get the book Programming From Ground Up:
http://nongnu.askapache.com/pgubook/ProgrammingGroundUp-1-0-booksize.pdf
That is a very good starting point for getting into assembler programming under linux and it explains a lot of the basics you need to understand to get started.

The assembler(GNU) is as(1)

Three syntaxes (NASM, TASM, GAS) in one assembler: yasm.
http://www.tortall.net/projects/yasm/

For Ubuntu 18.04, install nasm. Open the terminal and type:
sudo apt install as31 nasm
nasm docs
For compiling and running:
nasm -f elf64 example.asm # assemble the program
ld -s -o example example.o # link the object file nasm produced into an executable file
./example # example is an executable file
