Why is yasm generating incorrect debugging information? - linux

I have an x86_64 assembly program I'm trying to debug on Linux, but when I try to use gdb, it skips around randomly and loops through the same couple instructions or repeats instructions. It also seems to loop through different instructions depending on where I set a breakpoint.
I tried researching this problem online, and I saw a number of people having this same issue with C++ when compilers were optimizing too aggressively and generating incorrect debugging information. I didn't see anything about assembly, but I believe yasm might be the problem here as well.
Here's my Makefile.
myprog : myprog.o
	gcc -static -fdwarf2-cfi-asm myprog.o -o myprog
myprog.o : myprog.asm
	yasm -f elf64 -g dwarf2 myprog.asm -o myprog.o
Note that I'm statically linking because I can't get dynamic linking to work. I might ask a separate question about that in the future.
And here's more or less what the gdb session looks like.
...
(gdb)# n
65 call findrepl
(gdb)# n
73 mov rdi, str3
(gdb)# n
75 call findrepl
(gdb)# n
75 call findrepl
(gdb)# n
65 call findrepl
...
Using s, si, or ni does the same as n shown above. I haven't had this issue with my previous assembly programs. Sometimes gdb will say that I've executed something like xor eax, eax but then show the output from a call I made to printf.
I'm relatively new to assembly programming and gdb, so in the back of my mind I wonder if this is my fault. Is there a way to fix this? I would also like to know if there are any workarounds, as I can't seem to debug it without using gdb.

After much trial and error, I discovered the '-tui' option for gdb and the 'layout asm' command. There's probably a better way to do this, but here's basically what you type in to make it work:
[user@comp ~/prog/]$ gdb -tui myprog
(gdb)# layout asm
(gdb)# break main
(gdb)# run
And then you can go wild. By default, it's disassembled into AT&T notation. You can check which notation it's currently displaying with show disassembly-flavor. You can change it with set disassembly-flavor intel or set disassembly-flavor att.
For more information, run help tui, help layout, or help set disassembly-flavor in gdb.
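The same steps can also be kept in a GDB command file so they don't have to be retyped every session. This is only a sketch: the file name asm.gdb is made up for illustration, myprog is the binary from the question, and the Intel flavor is just one of the two choices mentioned above.

```shell
# Write a small GDB command file (asm.gdb is a hypothetical name).
cat > asm.gdb <<'EOF'
set disassembly-flavor intel
layout asm
break main
run
EOF

# Then start GDB in TUI mode and have it execute the file:
#   gdb -tui -x asm.gdb myprog
```

Once the program stops at main, stepping with si or ni follows the disassembly window instead of the (possibly wrong) line table.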

Related

How to debug assembly?

I have the following assembly file test which I want to debug,
How can I do that?
Note I am working with x86-64 and att syntax, plus I don't have access to c code.
I want to stop after each line and be able to see the registers in a table (I remember there is such an option).
I tried:
gdb test
r
but I get:
Starting program:
No executable file specified.
Use the "file" or "exec-file" command.
After running GDB on the executable (see Footnote 1):
Use start or starti to set a breakpoint in main or _start respectively and run the program.
Or set breakpoints yourself with b 12 to set a breakpoint on source line 12 (if you built with enough debug info for this to work), or b *0x00401007 to set a breakpoint on an address you copy/pasted from disas output.
layout asm / layout reg puts GDB into text-UI mode with "windows" in your terminal for disassembly and registers. (This can be a bit flaky, you sometimes need control-L to redraw the screen, and sometimes GDB crashes when your process exits although I'm not sure if that's specifically from TUI.)
Otherwise without TUI mode, info reg and disas can be useful.
See the bottom of https://stackoverflow.com/tags/x86/info for more asm debugging tips.
Especially strace ./test is highly useful to see the system calls your program makes, decoded into C style. In toy programs you're playing with for your own experimentation, this basically works as an alternative to checking for error return values.
Footnote 1: You're not doing that part correctly:
No executable file specified.
That means no file called test existed in the directory where you ran gdb test.
You have to assemble + link test.S into an executable called test before you can run GDB on that file. If ls -l test shows it, then gdb test can debug it. (And ./test can run it.)
Often gcc -no-pie foo.S is a good choice to make debugging easier: addresses will be fixed at link time, so objdump -drwC -Mintel test output will match the addresses you see at run-time. And the addresses will be numerically smaller, so it's easier to visually spot a code (.text) address vs. .rodata (modern ld puts it in a separate page so it can avoid exec permission) vs. .data / .bss.
Either way, stack addresses are still easy to distinguish from code: 0x555... or 0x0000...XXXXXX is in the executable, 0x7fffff... is on the stack, and other addresses from mmap are randomized. (But libc also gets mapped to a high address near the stack, with or without PIE.)
(Or if you're writing _start instead of main, gcc -nostdlib -static foo.S implies -no-pie)

How do you assemble, link and run a .s file in linux?

I'm getting a weird error message when trying to assemble and run a .s file using AT&T syntax. Not sure if I'm even using the correct architecture to begin with, or if I'm having syntax errors, if I'm not using the correct commands to assemble and link, etc. Completely lost and I do not know where to begin.
So basically, I have a file called yea.s, which contains some simple assembler instructions. I then try to assemble it using the command as yea.s -o yea.o and then link it using ld yea.o -o yea. When running ld, I get this weird message: ld: warning: cannot find entry symbol _start; defaulting to 000000440000.
This is the program I'm trying to run; it's very simple and doesn't really do anything.
resMsg: .asciz "xxxxxxxx"
.text
.global main
main:
pushq $0
ret
I just cannot figure out what's going on. Obviously, this is for school homework. I'm not looking for the answer to the homework, but this is the starting point from which I can actually start the coding. And I just can't figure out how to simply run the program, which the assignment doesn't explain. Anyway, thanks in advance guys!
Linux executables require an entry point to be specified. The entry point is the address of the first instruction to be executed in your program. If not specified otherwise, the link editor looks for a symbol named _start to use as an entry point. Your program does not contain such a symbol, thus the linker complains and picks the beginning of the .text section as the entry point. To fix this problem, rename main to _start.
Note further that unlike on DOS, there is nothing to return to from _start. So your attempt to return is going to cause a crash. Instead, call the system call sys_exit to exit the program:
mov $0, %edi # exit status
mov $60, %eax # system call number
syscall # perform exit call
Alternatively, if you want to use the C runtime environment and call functions from the C library, leave your program as is and instead assemble and link using the C compiler driver cc:
cc -o yea yea.s
If you do so, the C runtime environment provides the entry point for you and eventually tries to call a function main which is where your code comes in. This approach is required if you want to call functions from the C library. If you do it this way, make sure that main follows the SysV ABI (calling convention).
Note that even then your code is incorrect. The return value of a function is given in the eax (resp. rax) register and not pushed on the stack. To return zero from main, write
mov $0, %eax # exit status
ret # return from function
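Put together, the first option (renaming main to _start and exiting through the system call) could look like the sketch below. It assumes x86-64 Linux with GNU as and ld installed; the file name yea_start.s is made up so that the original yea.s is left untouched.

```shell
# Freestanding variant: entry point _start, exit via the exit system call.
# yea_start.s is a hypothetical file name used only for this sketch.
cat > yea_start.s <<'EOF'
        .text
        .globl _start
_start:
        mov $0, %edi        # exit status
        mov $60, %eax       # system call number of exit on x86-64 Linux
        syscall             # perform exit call
EOF

# Assemble, link, and run, but only if the GNU tools are available.
if command -v as >/dev/null && command -v ld >/dev/null; then
    as yea_start.s -o yea_start.o && ld yea_start.o -o yea_start && ./yea_start
    echo "exit status: $?"
fi
```

For the second option, the program keeps its main label and is built with cc -o yea yea.s as shown above, so no explicit ld step is needed.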
In all currently supported versions of Ubuntu open the terminal and type:
sudo apt install as31 nasm
as31: Intel 8031/8051 assembler
This is a fast, simple, easy to use Intel 8031/8051 assembler.
nasm: General-purpose x86 assembler
Netwide Assembler. NASM will currently output flat-form binary files, a.out, COFF and ELF Unix object files, and Microsoft 16-bit DOS and Win32 object files.
If you are using NASM in Ubuntu 18.04, the commands to compile and run an .asm file named example.asm are:
nasm -f elf64 example.asm # assemble the program
ld -s -o example example.o # link the object file nasm produced into an executable file
./example # example is an executable file

GDB Debugger Error

I am very sorry if my English is bad. This problem has been bothering me for days.
I have a simple C source code with a sub function which I am examining. First I am creating the .out file with gcc. This file I am examining with GDB. But if I want to disassemble the called function I always get an error message from gdb.
Prolog:
unix@unix-laptop:~/booksrc $ gcc -g stack_example.c
unix@unix-laptop:~/booksrc $ gdb -q ./a.out
Using host libthread_db library "/lib/tls/i686/cmov/libthread_db.so.1".
(gdb) disass main
Dump of assembler code for function main:
0x08048357 <main+0>: push %ebp
0x08048358 <main+1>: mov %esp,%ebp
0x0804835a <main+3>: sub $0x18,%esp
0x0804835d <main+6>: and $0xfffffff0,%esp
0x08048360 <main+9>: mov $0x0,%eax
0x08048365 <main+14>: sub %eax,%esp
0x08048367 <main+16>: movl $0x4,0xc(%esp)
0x0804836f <main+24>: movl $0x3,0x8(%esp)
0x08048377 <main+32>: movl $0x2,0x4(%esp)
0x0804837f <main+40>: movl $0x1,(%esp)
0x08048386 <main+47>: call 0x8048344 <test_function>
0x0804838b <main+52>: leave
0x0804838c <main+53>: ret
End of assembler dump.
(gdb) disass test_function()
You can't do that without a process to debug.
(gdb)
Do you have an idea for the reason of the error? I have already used google but I can't find anything to solve the problem. I also searched for the instructions to be sure that the syntax is right.
http://visualgdb.com/gdbreference/commands/disassemble
Thanks for reading,
Intersect!
The syntax (of the gdb command) is disass function-name so you should type
disass test_function
Read the genuine GDB documentation.
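But you typed disass test_function(), and the trailing parentheses are wrong: GDB evaluates test_function() as an expression that calls the function, which it can only do in a running process, hence the error message.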
Be sure that you compiled your source code with gcc -Wall -g
At last, you could ask gcc to output an assembler file. Try for instance to compile your source.c file with
gcc -O1 -S -fverbose-asm source.c
(you could omit the -O1 or replace it with -g if you wanted to)
then look with an editor (or some pager) into the generated source.s assembly file.
Maybe the function isn't there because it was inlined during compilation. I had never seen your error message before, sorry.
Please try to compile with the following additional flags:
-O0 -g
You can also see all function start addresses with:
objdump -x <filename>
This gives you a list of symbols in your executable file which includes all the start points of functions.
You can also disassemble your code with:
objdump -d <filename>

Is it possible to assemble and run raw CPU instructions using `as`?

There are a couple of related questions here.
Consider a program consisting only of the following two instructions
movq 1, %rax
cpuid
If I throw this into a file called Foo.asm, and run as Foo.asm, where as is the portable GNU assembler, I will get a file called a.out, of size 665 bytes on my system.
If I then chmod 700 a.out and try ./a.out, I will get an error saying cannot execute binary file.
1. Why is the file so large, if I am merely trying to translate two asm instructions into binary?
2. Why can the binary not be executed? I am providing valid instructions, so I would expect the CPU to be able to execute them.
3. How can I get exactly the binary opcodes for the asm instructions in my input file, instead of a bunch of extra stuff?
4. Once I have the answer to 3, how can I get my processor to execute them? (Assuming that I am not running privileged instructions.)
Why is the file so large, if I am merely trying to translate two asm instructions into binary?
Because the assembler creates a relocatable object file which includes additional information, like memory sections and symbol tables.
Why can the binary not be executed?
Because it is a (relocatable) object file, not a loadable file. You need to link it in order to make it executable so that it can be loaded by the operating system:
$ ld -o Foo a.out
You also need to give the linker a hint about where your program starts, by specifying the _start symbol.
But even then, the Foo executable is larger than you might expect, since it still contains additional information (e.g. the ELF header) required by the operating system to actually launch the program.
Also, if you launch the executable now, it will result in a segmentation fault, since you are loading the contents of address 1, which is not mapped into your address space, into rax. Still, if you fix this, the program will run into undefined code at the end - you need to make sure to gracefully exit the program through a syscall.
A minimal running example (assumed x86_64 architecture) would look like
.globl _start
_start:
movq $1, %rax
cpuid
mov $60, %rax # System-call "sys_exit"
mov $0, %rdi # exit code 0
syscall
How can I get exactly the binary opcodes for the asm instructions in my input file, instead of a bunch of extra stuff?
You can use objcopy to generate a raw binary image from an object file:
$ objcopy -O binary a.out Foo.bin
Then, Foo.bin will only contain the instruction opcodes.
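As a rough sketch of that step, the commands below assemble the two instructions and dump the resulting raw bytes. Everything here is an assumption for illustration: x86-64 GNU binutils, the file names Foo.s, Foo.o and Foo.bin, and the added -j .text, which restricts the copy to the code section in case the assembler emitted other sections (such as notes) into the object file.

```shell
# Two-instruction input file (Foo.s is a hypothetical name).
cat > Foo.s <<'EOF'
        movq $1, %rax
        cpuid
EOF

# Assemble, extract only the .text bytes, and print them in hex.
# Skipped entirely if the tools are not installed.
if command -v as >/dev/null && command -v objcopy >/dev/null; then
    as Foo.s -o Foo.o &&
    objcopy -O binary -j .text Foo.o Foo.bin &&
    od -An -tx1 Foo.bin
fi
```

The od output is just the machine code of the two instructions, with no headers or symbol tables around it.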
nasm has a -f bin option which creates a binary-only representation of your assembly code. I used this to implement a bare boot loader for VirtualBox (warning: undocumented, prototype only!) to directly launch binary code inside a VirtualBox image without an operating system.
Once I have the answer to 3, how can I get my processor to execute them?
You will not be able to directly execute the raw binary file under Linux. You will need to write your own loader for that or not use an operating system at all. For an example, see my bare boot loader link above - this writes the opcodes into the boot loader of a VirtualBox disc image, so that the instructions are getting executed when launching the VirtualBox machine.
The old MS-DOS COM file format does not include a header. It really only contains the binary executable code. The code size cannot, however, exceed 64 KB. I don't know whether Linux can execute these.
You can write the opcodes into a file using a hex editor. Then you just need to wrap it in an ELF header so that Linux knows how to execute it.
Here's an example:
hexedit myfile.bin
Now just write your opcodes inside the file using the hexeditor.
After that you need to add the ELF header. You could do this by hand and write the ELF header into your .bin file, but that's a bit tricky. The easiest method is to use a few commands (in this example for 64-bit).
objcopy --input-target=binary --output-target=elf64-x86-64 myfile.bin myfile.o
ld -o myfile myfile.o -T binary.ld
You will also need a linker script. I called mine binary.ld, for example.
These are the contents of binary.ld:
ENTRY(_start);
SECTIONS
{
_start = 0x0;
}
Now you can execute your program: ./myfile
Perhaps there's something like exe2bin utility for the GNU tool set. I've used various versions of exe2bin with Microsoft tools, and the ARM toolkit has the ability to produce binaries, but I don't recall if it was directly from the linked output or something like exe2bin.

Linux (64-bit), nasm and gdb

I was searching other threads without luck.
My problem is perhaps simple but frustrating.
I'm compiling two files on 64-bit Ubuntu 11.04:
nasm -f elf64 -g file64.asm
gcc -g -o file file.c file64.o
Then I debug the resulting executables with gdb.
With C, everything is OK.
However, when debugging assembly, the source code is "not visible" to the debugger. I'm getting the following output:
(gdb) step
Single stepping until exit from function line,
which has no line number information.
0x0000000000400962 in convert ()
A quick investigation with:
objdump --source file64.o
shows that the assembly source code (and line information) is contained in the file.
Why can't I see it in a debug session? What am I doing wrong?
These problems arose after moving to 64-bit Ubuntu. In the 32-bit Linux it worked (as it should).
With NASM, I've had much better experience in gdb when using the dwarf debugging format. gdb then treats the assembly source as if it were any other language (i.e., no disassemble commands necessary)
nasm -f elf64 -g -F dwarf file64.asm
(Versions 2.03.01 and later automatically enable -g if -F is specified.)
I'm using NASM version 2.10.07. I'm not sure if that makes a difference or not.
GDB is a source-level (or symbolic) debugger, which means that it's supposed to work with 'high-level programming languages' ... which is not your case!
But wait a second, because, from a debugger's point of view, debugging ASM programs is way easier than higher level languages: there's almost nothing to do! The program binary always contains the assembly instructions; they're just written in their machine format instead of ASCII format.
And GDB has the ability to convert it for you. Instead of executing list to see the code, use disassemble to see a function code:
(gdb) disassemble <your symbol>
Dump of assembler code for function <your symbol>:
0x000000000040051e <+0>: push %rbp
0x000000000040051f <+1>: mov %rsp,%rbp
=> 0x0000000000400522 <+4>: mov 0x20042f(%rip),%rax
0x0000000000400529 <+11>: mov %rax,%rdx
0x000000000040052c <+14>: mov $0x400678,%eax
0x0000000000400531 <+19>: mov %rdx,%rcx
or x/5i $pc to see the 5 instructions starting at your $pc
(gdb) x/5i $pc
=> 0x400522 <main+4>: mov 0x20042f(%rip),%rax
0x400529 <main+11>: mov %rax,%rdx
0x40052c <main+14>: mov $0x400678,%eax
0x400531 <main+19>: mov %rdx,%rcx
0x400534 <main+22>: mov $0xc,%edx
then use stepi (si) instead of step and nexti (ni) instead of next.
display $pc could also be useful to print the current pc whenever the inferior stops (i.e., after each nexti/stepi).
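The commands above can be collected into a command file for a quick non-interactive run. This is only a sketch: step.gdb is a made-up file name, file is the executable built in the question, display/i $pc is used as a variant of display $pc that also prints the instruction, and info registers rip assumes an x86-64 target.

```shell
# Write a GDB command file (step.gdb is a hypothetical name).
cat > step.gdb <<'EOF'
display/i $pc
break main
run
x/5i $pc
stepi
stepi
info registers rip
EOF

# Run it in batch mode against the executable from the question:
#   gdb -q -batch -x step.gdb ./file
```

Because of the display line, each stepi prints the instruction at the new $pc, so the session reads like a trace even without line number information.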
For anyone else stuck with the broken things on NASM (the bug is not fixed so far): just download the NASM git repository and switch to version 2.7, which is probably the last version that works fine, i.e. supports gdb. Building this outdated version from source is only a workaround (you don't have support for the latest ISA extensions, for example), but it's sufficient for most students.
GDB might not know where to search for your source files. Try to explicitly tell it with directory.

Resources