I'm learning AT&T assembly,I know arrays/variables can be declared using .int/.long, or using .equ to declare a symbol, that's to be replaced by assembly.
They're declared insided either .data section(initialzed),or .bss section(uninitialzed).
But when I used gcc to compiled a very simple .c file with '-S' command line option to check the disassembly code, I noticed that:
(1) .s is not using both .data and .bss, but only .data
(2) The declaration of an integer(.long) cost several statements, some of them seems redundant or useless to me.
Shown as below, I've added some comments as per my questions.
$ cat n.c
int i=23;
int j;
int main(){
return 0;
}
$ gcc -S n.c
$ cat n.s
.file "n.c"
.globl i
.data
.align 4
.type i, #object #declare i, I think it's useless
.size i, 4 #There's '.long 23', we know it's 4 bytes, why need this line?
i:
.long 23 #Only this line is needed, I think
.comm j,4,4 #Why j is not put inside .bss .section?
.text
.globl main
.type main, #function
main:
.LFB0: #What does this symbol mean, I don't find it useful.
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0: #What does this symbol mean, I don't find it useful.
.size main, .-main
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.2) 5.4.0 20160609"
.section .note.GNU-stack,"",#progbits
All my questions are in the comments above, I re-emphasize here again:
.type i, #object
.size i, 4
i:
.long 23
I really think above code is redundant, should be as simple as:
i:
.long 23
Also, "j" doesn't have a symbol tag, and is not put inside .bss section.
Did I get wrong with anything? Please help to correct. Thanks a lot.
(I am guessing you are using some Linux system)
They're declared insided either .data section(initialzed),or .bss section(uninitialzed).
No, you have many other sections, notably .comm (for the "common" section, with initialized data common to several object files, that the linker would "merge") and .rodata for read-only data. The ELF format is flexible enough to permit many sections and many segments (some of which are not loaded -more precisely memory mapped- in memory).
The description of sections in ELF files is much more complex than what you believe. Take time to read more, e.g. Linkers and loaders by Levine. Read also the documentation and the scripts of GNU binutils and also ld(1) & as(1). use objdump(1) and readelf(1) to explore existing ELF executables, object files and shared objects. Read also execve(2) & elf(5)
But when I used gcc to compiled a very simple .c file with -S command line option
When examining the assembler file generated by gcc I strongly recommend passing at least -fverbose-asm to ask gcc to emit some additional and useful comments in the assembler file. I also usually recommend to use some optimization flag -e.g. -O1 at least (or perhaps -Og on recent versions of gcc).
I noticed that:
(1) .s is not using both .data and .bss, but only .data
No, your generated code use the .comm section and put the value of j there.
(2) The declaration of an integer(.long) cost several statements, some of them seems redundant or useless to me.
These are mostly not assembler statements (translated into machine code) but assembler directives; they are very useful (and they don't waste space in the memory segment produced by ld, but the ELF format has information elsewhere). In particular .size and .type are both needed because the symbol tables in ELF files contain more than addresses (it also has a notion of size, and a very primitive notion of type).
The .LFB0 is a gcc (actually cc1-) generated label. GCC does not care about generating useless labels (it is simpler for the assembler generator in the GCC backend), since they don't appear in object files.
There's '.long 23', we know it's 4 bytes,
you might know that a long is 4 bytes, but that information (size of j) should go into the ELF file so requires explicit assembler directives....
(I don't have space or time to explain the ELF format, you need to read many pages about it, and it is a lot more complex and more complete than what you believe)
BTW, Drepper's How To Write Shared Libraries is quite long (more than 40 pages) and explains a good deal about ELF files, focusing on shared libraries.
Related
For some reason [msgLen] and msgLen produce the same result. I know that I ought to use equ or something, but I want to know why this isn't working. Is NASM ignoring the dereference? It prints my string and a bunch of junk bc it thinks the string is bigger than it is right? Thanks!
Please see the assembly below:
_start:
mov edx,[msgLen]
mov ecx,msg
mov ebx,1
mov eax,4
int 0x80
mov eax,1
int 0x80
section .data
msg db 'Zoom',0xA
msgLen db $-msg
Use a debugger to look at EDX after the load, or use strace ./a.out to see the length you actually pass to the system call. It will be different (address vs. value loaded). Both ways are buggy in different ways that happen to produce large values, so the effect only happens to be the same. Debugging / tracing tools would have clearly shown you that NASM isn't "ignoring the dereference", though.
mov edx, msgLen is the address, right next to msg. It will be something like 0x804a005 if you link into a standard Linux non-PIE static executable with ld -melf_i386 -o foo foo.o.
mov edx, [msgLen] loads 4 bytes from that address (the width of EDX), where you only assembled the size into 1 byte in the data section. The 3 high bytes come from whatever comes next in the part of the file that's mapped to that memory page.
When I assemble with just nasm -felf32 foo.asm and link with ld -melf_i386, that happens to be 3 bytes of zeros, so I don't get any garbage printed from the dword-load version. (x86 is little-endian).
But it seems your non-zero bytes are debug info. I've found NASM's debug info is the opposite of helpful, sometimes making GDB's disassembly confused (layout reg / layout asm sometimes fail to disassemble a whole block after a label) so I don't use it.
But if I use nasm -felf32 -Fdwarf, then I do get 7173 from that dword load that goes past the end of the .data section. That's a different large number, so this is just wrong in a different way, not the same problem. 7173 is 0x1c05, so it corresponds to db 5, 0x1c, 0, 0. i.e. your calculated length of 5 is the low byte, but there's a 0x1c after it. yasm -gdwarf2 gives me 469762053 = 0x1c000005.
If you'd used db $-msg, 0,0,0 or dd $-msg, you could load a whole dword. (To load and zero-extend a byte into a dword register, use movzx edx, byte [mem])
write() behaviour with a large length
If you give it some very large length, write will go until it reaches an unreadable page, then return the length it actually wrote. (It doesn't check the whole buffer for readability before starting to copy_from_user. And it returns the number of bytes written if that's non-zero before encountering an unreadable page. You can only get -EFAULT if an unreadable page is encountered right away, not if you pass a huge length that includes some unmapped pages later.)
e.g. with mov edx, msgLen (the label address)
$ nasm -felf32 -Fdwarf foo.asm
$ ld -melf_i386 -o foo foo.o
$ strace -o foo.tr ./foo # write trace to a file so it doesn't mix with terminal output
Zoom
foo.asmmsgmsgLen__bss_start_edata_end.symtab.strtab.shstrtab.text.data.debug_aranges.debug_info.debug_abbrev.debug_lin! ' 6& 9B_
$ cat foo.tr
execve("./foo", ["./foo"], 0x7fffdf062e20 /* 53 vars */) = 0
write(1, "Zoom\n\5\34\0\0\0\2\0\0\0\0\0\4\0\0\0\0\0\0\220\4\10\35\0\0\0\0\0"..., 134520837) = 4096
exit(1) = ?
+++ exited with 1 +++
134520837 is the length you passed, 0x804a005 (the address of msgLen). The system call writes 4096 bytes, 1 whole page, before getting to an unmapped page and stopping early.** It doesn't matter number you pass higher than that, because there's only 1 page before the end of the mapping. (And msg: is apparently right at the start of that page.)
On the terminal (where \0 prints as empty), you mostly just see the printable characters; pipe into hexdump -C if you want a better look at the binary data. It includes bits of metadata from the file, because the kernel's ELF program loader works by mmaping the file into memory (with a MAP_PRIVATE read-write no-exec mapping for that part). Use readelf -a and look at the ELF program headers: they tell the kernel which parts of the file to map into memory where, with what permissions.
Fun fact: If I redirect to /dev/null (strace ./foo > /dev/null), the kernel's write handler for that special device driver doesn't even check the buffer for permissions, so write() actually does return 134520837.
write(1, "Zoom\n\5\0\0[...]\0\0\220\4\10"..., 134520837) = 134520837
Using EQU properly
And yes, you should be using equ so you can use the actual length as an immediate. Having the assembler calculate it and then assemble that byte into the data section is less convenient. This prints exactly the right size, and is more efficient than using an absolute address to reference 1 constant byte.
mov edx, msg.len ; mov r32, imm32
...
section .rodata
msg: db 'Zoom',0xA
.len equ $-msg
(Using a NASM . local label is unrelated to using equ; I'm showing that too because I like how it helps organize the global namespace.)
Also semi-related to how ld lays out sections into ELF segments for the program-loader: recent ld pads sections for 4k page alignment or something like that to avoid getting data mapped where it doesn't need to be. Especially out of executable pages.
I am learning assembly, but I'm having trouble understanding how a program is executed by a CPU when using gnu/linux on arm. I will elaborate.
Problem:
I want my program to return 5 as it's exit status.
Assembly for this is:
.text
.align 2
.global main
.type main, %function
main:
mov w0, 5 //move 5 to register w0
ret //return
I then assemble it with:
as prog.s -o prog.o
Everything ok up to here. I understand I then have to link my object file to the C library in order to add additional code that will make my program run. I then link with(paths are omitted for clarity):
ld crti.o crtn.o crt1.o libc.so prog.o ld-linux-aarch64.so.1 -o prog
After this, things work as expected:
./prog; echo $?
5
My problem is that I can't figure out what the C standard library is actually doing here. I more or less understand crti/n/1 are adding entry code to my program (eg the .init and .start sections), but no clue what's libc purpose.
I am interested at what would be a minimal assembly implementation of "returning 5 as exit status"
Most resources on the web focus on the instructions and program flow once you are in main. I am really interested at what are all the steps that go on once I execute with ./. I am now going through computer architecture textbooks, but I hope I can get a little help here.
The C language starts at main() but for C to work you in general need at least a minimal bootstrap. For example before main can be called from the C bootstrap
1) stack/stackpointer
2) .data initialized
3) .bss initalized
4) argc/argv prepared
And then there is C library which there are many/countless C libraries and each have their own designs and requirements to be satisfied before main is called. The C library makes system calls into the system so this starts to become system (operating system, Linux, Windows, etc) dependent, depending on the design of the C library that may be a thin shim or heavily integrated or somewhere in between.
Likewise for example assuming that the operating system is taking the "binary" (binary formats supported by the operating system and rules for that format are defined by the operating system and the toolchain must conform likewise the C library (even though you see the same brand name sometimes assume toolchain and C library are separate entities, one designed to work with the other)) from a non volatile media like a hard drive or ssd and copying the relevant parts into memory (some percentage of the popular, supported, binary file formats, are there for debug or file format and not actually code or data that is used for execution).
So this leaves a system level design option of does the binary file format indicate .data, .bss, .text, etc (note that .data, .bss, .text are not standards just convention most people know what that means even if a particular toolchain did not choose to use those names for sections or even the term sections).
If so the operating systems loader that takes the program and loads it into memory can choose to put .data in the right place and zero .bss for you so that the bootstrap does not have to. In a bare-metal situation the bootstrap would normally handle the read/write items because it is not loaded from media by some other software it is often simply in the address space of the processor on a rom of some flavor.
Likewise argv/argc could be handled by the operating systems tool that loads the binary as it had to parse out the location of the binary from the command line assuming the operating system has/uses a command line interface. But it could as easily simply pass the command line to the bootstrap and the bootstrap has to do it, these are system level design choices that have little to do with C but everything to do with what happens before main is called.
The memory space rules are defined by the operating system and between the operating system and the C library which often contains the bootstrap due to its intimate nature but I guess the C library and bootstrap could be separate. So linking plays a role as well, does this operating system support protection is it just read/write memory and you just need to spam it in there or are there separate regions for read/only (.text, .rodata, etc) and read/write (.data, .bss, etc). Someone needs to handle that, linker script and bootstrap often have a very intimate relationship and the linker script solution is specific to a toolchain not assumed to be portable, why would it, so while there are other solutions the common solution is that there is a C library with a bootstrap and linker solution that are heavily tied to the operating system and target processor.
And then you can talk about what happens after main(). I am happy to see you are using ARM not x86 to learn first, although aarch64 is a nightmare for a first one, not the instruction set just the execution levels and all the protections, you can go a long long way with this approach but there are some things and some instructions you cannot touch without going bare metal. (assuming you are using a pi there is a very good bare-metal forum with a lot of good resources).
The gnu tools are such that binutils and gcc are separate but intimately related projects. gcc knows where things are relative to itself so assuming you combined gcc with binutils and glibc (or you just use the toolchain you found), gcc knows relative to where it executed to find these other items and what items to pass when it calls the linker (gcc is to some extent just a shell that calls a preprocessor a compiler the assembler then linker if not instructed not to do these things). But the gnu binutils linker does not. While as distasteful as it feels to use, it is easier just to
gcc test.o -o test
rather than figure out for that machine that day what all you need on the ld command line and what paths and depending on design the order on the command line of the arguments.
Note you can probably get away with this as a minimum
.global main
.type main, %function
main:
mov w0, 5 //move 5 to register w0
ret //return
or see what gcc generates
unsigned int fun ( void )
{
return 5;
}
.arch armv8-a
.file "so.c"
.text
.align 2
.p2align 4,,11
.global fun
.type fun, %function
fun:
mov w0, 5
ret
.size fun, .-fun
.ident "GCC: (GNU) 10.2.0"
I am used to seeing more fluff in there:
.arch armv5t
.fpu softvfp
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global fun
.syntax unified
.arm
.type fun, %function
fun:
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
mov r0, #5
bx lr
.size fun, .-fun
.ident "GCC: (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609"
.section .note.GNU-stack,"",%progbits
Either way, you can look up each of the assembly language items and decide if you really need them or not, depends in part on if you feel the need to use a debugger or binutils tools to tear apart the binary (do you really need to know the size of fun for example in order to learn assembly language?)
If you wish to control all of the code and not link with a C library you are more than welcome to you need to know the memory space rules for the operating system and create a linker script (the default one may in part be tied to the C library and is no doubt overly complicated and not something you would want to use as a starting point). In this case being two instructions in main you simply need the one address space valid for the binary, however the operating system enters (ideally using the ENTRY(label), which could be main if you want but often is not _start is often found in linker scripts but is not a rule either, you choose. And as pointed out in comments you would need to make the system call to exit the program. System calls are specific to the operating system and possibly version and not specific to a target (ARM), so you would need to use the right one in the right way, very doable, your whole project linker script and assembly language could be maybe a couple dozen lines of code total. We are not here to google those for you so you would be on your own for that.
Part of your problem here is you are searching for compiler solutions when the compiler in general has absolutely nothing to do with any of this. A compiler takes one language turns it into another language. An assembler same deal but one is simple and the other usually machine code, bits. (some compilers output bits not text as well). It is equivalent to looking up the users manual for a table saw to figure out how to build a house. The table saw is just a tool, one of the tools you need, but just a generic tool. The compiler, specific gnu's gcc, is generic it does not even know what main() is. Gnu follows the Unix way so it has a separate binutils and C library, separate developments, and you do not have to combine them if you do not want to, you can use them separately. And then there is the operating system so half your question is buried in operating system details, the other half in a particular C library or other solution to connect main() to the operating system.
Being open source you can go look at the bootstrap for glibc and others and see what they do. Understanding this type of open source project the code is nearly unreadable, much easier to disassemble sometimes, YMMV.
You can search for the Linux system calls for arm aarch64 and find the one for exit, you can likely see that the open source C libraries or bootstrap solutions you find that are buried under what you are using today, will call exit but if not then there is some other call they need to make to return back to the operating system. It is unlikely it is a simple ret with a register that holds a return value, but technically that is how someone could choose to do it for their operating system.
I think you will find for Linux on arm that Linux is going to parse the command line and pass argc/argv in registers, so you can simply use them. And likely going to prep .data and .bss so long as you build the binary correctly (you link it correctly).
Here's a bare minimum example.
Run it with:
gcc -c thisfile.S && ld thisfile.o && ./a.out
Source code:
#include <sys/syscall.h>
.global _start
_start:
movq $SYS_write, %rax
movq $1, %rdi
movq $st, %rsi
movq $(ed - st), %rdx
syscall
movq $SYS_exit, %rax
movq $1, %rdi
syscall
st:
.ascii "\033[01;31mHello, OS World\033[0m\n"
ed:
I have following code
.data
result: .byte 1
.lcomm input 1
.lcomm cha 2
.text
(some other code, syscalls)
At first everything is fine. When a syscall (eg. read) is called, the value at label 'result' changed to some random trash value.
Anyone know what's wrong?
P.S. Environment
Debian x86_64 latest
Running in virtualbox
Using as -g ld emacs make latest
-----edit-----
(continue)
.global _start
_start:
mov $3,%rax
mov $0,%rbx
mov $input,%rcx
mov $1,%rdx
int $0x80
(sys_exit)
The value of 'input' was changed properly, but the value of 'result' changed to random value as well after
int $0x80
You're looking at 4 bytes starting at result, which includes input as the 2nd or 3rd byte. (That's why the value goes up by a multiple of 256 or 65536, leaving the low byte = 1 if you print (char)result). This would be more obvious if you use p /x to print as hex.
GDB's default behaviour for print result when there was no debug info was to assume int. Now, because of user errors like this, with gdb 8.1 on Arch Linux, print result says 'result' has unknown type; cast it to its declared type
GAS + ld unexpectedly (to me anyway) merge the BSS and data segments into one page, so your variables are adjacent even though you put them in different sections that you'd expect to be treated differently. (BSS being backed by anonymous zeroed pages, data being backed by a private read-write mapping of the file).
After building with gcc -nostdlib -no-pie test.S, I get:
(gdb) p &result
$1 = (<data variable, no debug info> *) 0x600126
(gdb) p &input
$2 = (<data variable, no debug info> *) 0x600128 <input>
Unlike using .section .bss and reserving space manually, .lcomm is free to pad if it wants. Presumably for alignment, maybe here so the BSS starts on an 8-byte boundary. When I built with clang, I got input in the byte after result (at different addresses).
I investigated by adding a large array with .lcomm arr, 888332. Once I realized it wasn't storing literal zeros for the BSS in the executable, I used readelf -a a.out to check further:
(related: What's the difference of section and segment in ELF file format)
...
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x0000000000000126 0x0000000000000126 R E 0x200000
LOAD 0x0000000000000126 0x0000000000600126 0x0000000000600126
0x0000000000000001 0x00000000000d8e1a RW 0x200000
NOTE 0x00000000000000e8 0x00000000004000e8 0x00000000004000e8
0x0000000000000024 0x0000000000000024 R 0x4
Section to Segment mapping:
Segment Sections...
00 .note.gnu.build-id .text
01 .data .bss
02 .note.gnu.build-id
...
So yes, the .data and .bss sections ended up in the same ELF segment.
I think what's going on here is that the ELF metadata says to map MemSize of 0xd8e1a bytes (of zeroed pages) starting at virt addr 0x600126. and LOAD 1 byte from offset 0x126 in the file to virtual address 0x600126.
So instead of just an mmap, the ELF program loader has to copy data from the file into an otherwise-zeroed page that's backing the BSS and .data sections.
It's probably a larger .data section that would be required for the linker to decide to use separate segments.
When there is a need to run a piece of code on the program startup (on Linux), how to use correctly the .init_section of an executable file (ELF32-i386)? I have the following code (GNU Assembler) which has ctor initialization function, and the address of this function is placed inside .init_array section:
.intel_syntax noprefix
.data
s1: .asciz "Init code\n"
s2: .asciz "Main code\n"
.global _start
.global ctor
.text
ctor:
mov eax, 4 # sys_write()
mov ebx, 1 # stdout
mov ecx, offset s1
mov edx, 10
int 0x80
ret
.section .init_array
.long ctor
.text
_start:
mov eax, 4
mov ebx, 1
mov ecx, offset s2
mov edx, 10
int 0x80
mov eax, 1
mov ebx, 0
int 0x80
This code is assembled with:
as -o init.o init.asm
ld -o init init.o
When the resulting executable is run, only the "Main code" string is printed. How to use properly the .init_array section?
EDIT1: I want to use .init_array because there are multiple source files with their own init code. One can call all this code 'manually' on startup and modify it every time when source files are added to or removed from the project, but .init_array seems to be designed just for this case :
Before transferring control to an application, the runtime linker
processes any initialization sections found in the application and any
loaded dependencies. The initialization sections .preinit_array,
.init_array, and .init are created by the link-editor when a dynamic
object is built.
The runtime linker executes functions whose addresses are contained in
the .preinit_array and .init_array sections. These functions are
executed in the same order in which their addresses appear in the
array.
In case when an executable is created without gcc, the linker seems to not execute the startup code. I tried to write my own standard init routine which reads function pointers in .init_array, section and calls them. It works OK for one file, where one can mark the end of the section, for example, with zero. But with multiple files this zero can be relocated in the middle of the section. How can one correctly determine the size of a section assembled from multiple source files?
If you make a statically linked bare executable the way you're doing, with your own code at the _start entry point, your code just runs from that point. If you want something to happen, your code has to make it happen. There is no magic.
Using sections can be useful to group startup code from multiple source files together, so all the startup code is cold and can potentially be paged out, or at least not need a TLB entry.
So you "properly use" sections by putting functions there and calling them from code that runs sometime after _start.
In your code example, it looks like .init_array is a list of function pointers. I assume the standard CRT startup files read the ELF file and find the length of that section, then walk through it making indirect calls to those functions. Since you're making custom code, it's going to be faster just to call an init function that does everything.
dynamic linking:
The "runtime linker" is the ELF interpreter for dynamic binaries. It runs code in your process before _start, so yes, apparently it does process that ELF section and make magic happen.
So in response to your edit, your options are: implement this processing of .init_array yourself, or create dynamic executables. I'm pretty sure this procedure has been covered in other questions, and I don't have time to research a correct command line for a dynamic executable that still doesn't link libc. (Although you might just want to use gcc -nostartfiles, or something.)
If you're stuck, leave a comment. I may update this later when I have more time anyway, or feel free to edit in a working command.
For normal C programs the .init_array is traversed by a function that is called from _start before main gets called. A good description is on this site.
So I see two ways: You can simply link against the glibc start code. Or you have to find out another mechanism to solve this issue by yourself.
I'm new here same as I'm new with assembly. I hope that you can help me to start.
I'm using 32bit (i686) Ubuntu to make programs in assembly, using gcc compiler.
I know that general-purpose-registers are 32bit (4 bytes) max, but what when I have to operate on 64 bit numbers? Intel's instruction says that higher bits are stored in %edx and lower in %eax
Great...
So how can I do something with this 2-registers number? I have to convert 64bit dec to hex, then save it to memory and show on the screen.
How to make the 64bit quadword at start of the program in .data section?
EDIT:
When I defined global variable llu (long long unsigned) in C and compiled to assembly it made:
.data
a:
.long <low bits>
.long <high bits>
It is because the parameters are saved in the stack backwards or something more?
Write a trivial C program that uses long long numbers (which have
64-bits on Linux/ix86).
Compile that program into assembly with gcc -S t.c.
Study the resulting assembly.
Modify your program to do something more complicated, and repeat steps 2 and 3.
After several iterations, you should have a good handle on what you need to do in assembly.
When I defined global variable llu (long long unsigned) in C and compiled to assembly it made:
.data
a:
.long <low bits>
.long <high bits>
It is because the parameters are saved in the stack backwards or something more?