Colon at end of section directive causes segfault - nasm

It's a semi-common error for me to accidentally include a colon (:) at the end of a section directive:
section .text:
_start:
When I do this with the .text section, the program gets a SIGSEGV before the first instruction executes, and I'm curious why.

maintenance info sections in GDB shows what's going on.
Without a colon:
gef➤ maintenance info sections
Exec file:
`my/path', file type elf64-x86-64.
[0] 0x00401000->0x00401005 at 0x00001000: .text ALLOC LOAD READONLY CODE HAS_CONTENTS
With a colon:
gef➤ maintenance info sections
Exec file:
`my/path', file type elf64-x86-64.
[0] 0x00401000->0x00401005 at 0x00001000: .text: ALLOC LOAD READONLY DATA HAS_CONTENTS
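Note .text ... CODE versus .text: ... DATA.
So the colon is taken literally as part of the section name. As a result, the section isn't recognized as .text, doesn't get .text's default flags, and isn't marked as executable. Since _start lives in that section, the code is mapped without execute permission, and fetching the very first instruction faults.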
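To make the CODE/DATA difference concrete: BFD, the library GDB uses to read ELF files, derives these labels from each section's sh_type and sh_flags. The Python sketch below approximates that mapping. It assumes NASM gives .text the flags alloc, exec, nowrite and gives an unrecognized name such as .text: its default alloc, noexec, nowrite flags; bfd_style_labels is a hypothetical helper and a simplification, not BFD's actual code.

```python
# ELF constants from the ELF specification.
SHT_PROGBITS = 1
SHT_NOBITS = 8
SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4


def bfd_style_labels(sh_type: int, sh_flags: int) -> list:
    """Approximate the flag words GDB prints in 'maintenance info sections'."""
    labels = []
    has_contents = sh_type != SHT_NOBITS
    if sh_flags & SHF_ALLOC:
        labels.append("ALLOC")
        if has_contents:
            labels.append("LOAD")
    if not (sh_flags & SHF_WRITE):
        labels.append("READONLY")
    # CODE only when the section is marked executable; otherwise loaded data.
    if sh_flags & SHF_EXECINSTR:
        labels.append("CODE")
    elif (sh_flags & SHF_ALLOC) and has_contents:
        labels.append("DATA")
    if has_contents:
        labels.append("HAS_CONTENTS")
    return labels


# .text gets exec by default; an unknown name like ".text:" (assumed) does not.
for name, flags in ((".text", SHF_ALLOC | SHF_EXECINSTR), (".text:", SHF_ALLOC)):
    print(name, " ".join(bfd_style_labels(SHT_PROGBITS, flags)))
# .text ALLOC LOAD READONLY CODE HAS_CONTENTS
# .text: ALLOC LOAD READONLY DATA HAS_CONTENTS
```

The two printed lines line up with the two GDB listings in the question; the only difference is the missing SHF_EXECINSTR bit.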

Related

NASM elf file size difference with uppercase and lowercase letters in section

I wrote a simple "Hello world" in assembly under Debian Linux:
; Define variables in the data section
SECTION .data
hello: db 'Hello world!',10
helloLen: equ $-hello
; Code goes in the text section
SECTION .text
GLOBAL _start
_start:
mov eax,4 ; 'write' system call = 4
mov ebx,1 ; file descriptor 1 = STDOUT
mov ecx,hello ; string to write
mov edx,helloLen ; length of string to write
int 80h ; call the kernel
; Terminate program
mov eax,1 ; 'exit' system call
mov ebx,0 ; exit with error code 0
int 80h ; call the kernel
After assembling and linking with
nasm -f elf64 hello.asm -o hello.o
ld -o hello hello.o
I got a 9048-byte binary.
Then I changed two lines in the code: from .data to .DATA and .text to .TEXT:
SECTION .DATA
SECTION .TEXT
and got a 4856-byte binary.
Changing them to
SECTION .dAtA
SECTION .TeXt
produced a 4856-byte binary, too.
NASM is said to be a case-insensitive assembler. What is the difference, then?
You're free to use whatever names you like for ELF sections, but if you don't use standard names, it becomes your responsibility to specify the section flags. (If you use standard names, you get to take advantage of the default flag settings for those names.) Section names are case-sensitive, and .data and .text are known to NASM. .DATA, .dAtA, etc. are not, so both of your sections get the same default flags. Nothing distinguishes them from each other, which allows ld to combine them into a single segment.
That automatically makes your executable smaller. With the standard flags for .text and .data, one of those is read-only and the other is read-write, which means that they cannot be placed into the same memory page. In your example program, both sections are quite small, so they could fit in a single memory page. Thus, using non-standard names makes your executable one page smaller, but one of the sections will have incorrect writability.
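The page arithmetic above can be sketched as a toy model in Python: sections with the same permissions may share pages, while each distinct permission set needs pages of its own. The 4096-byte page size and the section sizes (about 34 bytes of code and 13 bytes of data, counted from the hello-world source) are assumptions, and pages_needed is a hypothetical helper that ignores ELF headers, alignment padding and ld's real layout rules.

```python
from math import ceil

PAGE_SIZE = 4096  # assumed x86-64 page size


def pages_needed(sections, page_size=PAGE_SIZE):
    """Count pages when each distinct permission set must get its own pages.

    sections: iterable of (size_in_bytes, permissions) pairs.
    Simplified model: no headers, no alignment, no real linker layout.
    """
    totals = {}
    for size, perms in sections:
        totals[perms] = totals.get(perms, 0) + size
    return sum(ceil(total / page_size) for total in totals.values())


# Standard names: code is r-x, data is rw-, so they cannot share a page.
print(pages_needed([(34, "r-x"), (13, "rw-")]))  # 2
# Non-standard names: both sections get the same default flags.
print(pages_needed([(34, "default"), (13, "default")]))  # 1
```

The one-page difference in this model roughly matches the 9048 versus 4856 byte sizes reported in the question.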

Why is '.shstrtab' section mandatory?

I'm compiling a static executable like this:
ld.lld out/main.o -o out/sm -Tstatic.ld -static
strip --strip-all out/sm
This is the linker script I'm using:
ENTRY(_start)
SECTIONS
{
    . = 0x100e8;
    .all : {
        *(.bss*)
        *(.text*)
        *(.data*)
        *(.rodata*)
        *(COMMON*)
    } :code
    .shstrtab : {
        *(.shstrtab)
    }
    /DISCARD/ : {
        *(*)
    }
}
PHDRS
{
    code PT_LOAD FILEHDR PHDRS ;
}
The executable works as expected, but strip doesn't remove the .shstrtab section from the executable.
If I remove the .shstrtab section from the linker script, I get this error:
ld.lld out/main.o -o out/.sm -Tstatic.ld -static
ld.lld: error: discarding .shstrtab section is not allowed
Why is the .shstrtab section necessary? I've replaced all the standard section names and the executable still works as expected, so the program loading code doesn't care about the section names.
As an aside, is it possible to exclude the section headers completely in a linker script, since they aren't needed for a static executable?
Note: GNU linkers silently put .shstrtab in the output executable even if it is discarded.
Why is '.shstrtab' section mandatory?
Each section in the section table has a name. It is not stored as a string but as an offset (sh_name) into the section name string table (.shstrtab), whose section index is recorded in the e_shstrndx field of the ELF header.
So as long as there is at least one section in an ELF file, there must be a .shstrtab section (although it might be named differently).
In fact, it is allowed to build an ELF file without any sections at all, with only program headers.
However, I have never seen such an ELF file produced by a regular linker, only files that were deliberately crafted to be as small as possible, or similar.
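The name lookup described in this answer can be sketched in Python: a section's sh_name is just a byte offset into .shstrtab, and the name is the NUL-terminated string starting there. The string table below and the helper section_name are hypothetical, chosen to match the .all and .shstrtab sections from the linker script in the question.

```python
def section_name(shstrtab: bytes, sh_name: int) -> str:
    """Return the NUL-terminated name that starts at offset sh_name."""
    end = shstrtab.index(b"\0", sh_name)
    return shstrtab[sh_name:end].decode("ascii")


# Hypothetical .shstrtab contents. Offset 0 holds the empty name used by
# the mandatory null section header at index 0.
table = b"\0.all\0.shstrtab\0"
print([section_name(table, off) for off in (0, 1, 6)])  # ['', '.all', '.shstrtab']
```

Without the table, the sh_name offsets in the section headers would point at nothing, which is why a linker that emits section headers also has to emit a string table for their names.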

What sections are necessary in a minimal dynamically-linked ELF program?

I assembled a simple "Hello, world" program and linked it using TCC, which produced a 4196-byte executable.
The program has 31 sections: ['', '.text', '.data', '.bss', '.symtab', '.strtab', '.rel.text', '.rodata', '.rodata.cst4', '.note.GNU-stack', '.init', '.rel.init', '.gnu.linkonce.t.__x86.get_pc_thunk.bx', '.fini', '.rel.fini', '.text.unlikely', '.text.__x86.get_pc_thunk.bx', '.eh_frame', '.rel.eh_frame', '.preinit_array', '.init_array', '.fini_array', '.interp', '.dynsym', '.dynstr', '.hash', '.dynamic', '.got', '.plt', '.rel.got', '.shstrtab']. That seems like a lot for such a simple binary. Which ones are actually necessary for my program to run?
Here's the source code and the way I compiled it:
extern printf
global main
section .data
msg: db "Hello World!", 0
section .text
main:
;; printf(msg)
push msg
call printf
add esp, 4
;; return 0
mov eax, 0
ret
nasm main.asm -f elf32 && tcc main.o -o main
Tested on the 32bit/ubuntu:16.04 Docker image.
Note: this question is different from this one in that I'm not looking for a teensy Linux ELF, but for one that allows me to call dynamic symbols. I believe that due to the nature of dynamic linking, I need some extra sections.
I believe that due to the nature of dynamic linking, I need some extra sections.
Your belief is mistaken. No section is necessary at runtime; only segments matter.
A runnable dynamically-linked ELF binary will have at least one PT_LOAD segment, a PT_INTERP segment, and a PT_DYNAMIC segment.
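Since only segments matter, a check for these three can look at the program header table alone and ignore the section headers entirely. A minimal Python sketch, assuming a little-endian ELF32 file like the one produced by the elf32/TCC build in the question; program_header_types and missing_segments are hypothetical helpers.

```python
import struct

# p_type values from the ELF specification.
PT_LOAD, PT_DYNAMIC, PT_INTERP = 1, 2, 3
REQUIRED = {PT_LOAD: "PT_LOAD", PT_INTERP: "PT_INTERP", PT_DYNAMIC: "PT_DYNAMIC"}


def program_header_types(image: bytes) -> list:
    """Return the p_type of every program header in a little-endian ELF32 image."""
    if image[:4] != b"\x7fELF" or image[4] != 1 or image[5] != 1:
        raise ValueError("sketch only handles little-endian ELF32 files")
    # Elf32_Ehdr: e_phoff at offset 28, e_phentsize at 42, e_phnum at 44.
    (e_phoff,) = struct.unpack_from("<I", image, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 42)
    # p_type is the first 4-byte field of each Elf32_Phdr entry.
    return [struct.unpack_from("<I", image, e_phoff + i * e_phentsize)[0]
            for i in range(e_phnum)]


def missing_segments(image: bytes) -> list:
    """Names of required segment types absent from the program header table."""
    present = set(program_header_types(image))
    return [name for ptype, name in REQUIRED.items() if ptype not in present]
```

Reading the whole main executable from the question and passing it to missing_segments would be expected to return an empty list, while a statically linked binary would typically lack PT_INTERP and PT_DYNAMIC.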

Does GNU Assembler add its own entry point?

Say I have the following Assembly code:
.section .text
.globl _start
_start:
If I created an executable file using the following commands:
as 1.s -o 1.o
ld 1.o -o 1
Will the GNU Assembler add its own entry point to my executable which calls _start or will _start be the actual entry point?
See this question for more details.
The file crt0.o (or crt1.o, depending on the system) that contains the startup code mentioned in the other question was also written in assembler.
So the assembler does not add an entry point of its own. What the linker ("ld") does is search all the object files it is given (which were in fact all created using "as") for a symbol named "_start", and that symbol becomes the entry point.
You are of course free to add crt0.o to your assembler-written program when linking with "ld". In that case, however, you MUST NOT name your symbol "_start"; name it "main" in your assembler file instead:
.globl main
.text
main:
...
Otherwise "ld" will print an error message (multiple definition of `_start'), because it finds two definitions of the symbol "_start".
You can check it this way:
objdump -x 1 # n.b. 1 is the name of your program
This will print, among other things:
start address 0x000000...
Take the address it gives you and search for it elsewhere in the output. I think you will find that it matches the start of the .text section as well as the address of the _start symbol. If so, _start is indeed the ELF entry point.
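The "start address" objdump reports is the e_entry field of the ELF header, so it can also be read directly. A minimal Python sketch, assuming a little-endian ELF64 file such as the one as and ld produce on x86-64; entry_point is a hypothetical helper.

```python
import struct


def entry_point(image: bytes) -> int:
    """Return e_entry from a little-endian ELF64 header."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_CLASS 2 = ELFCLASS64, EI_DATA 1 = little-endian.
    if image[4] != 2 or image[5] != 1:
        raise ValueError("sketch only handles little-endian ELF64 files")
    # e_entry follows e_ident (16 bytes), e_type (2), e_machine (2), e_version (4).
    (e_entry,) = struct.unpack_from("<Q", image, 24)
    return e_entry
```

Passing the first 64 bytes of the program 1 to entry_point and formatting the result with hex() should give the same value as the start address line from objdump -x.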

Specify start address in nasm?

Consider a file with only the simple 32-bit x86 assembly statement:
call 0xc1066580
If I assemble this file with nasm -f elf I get:
0: e8 7c 65 06 c1 call 0xc1066581
If I use GCC and specify -Ttext=0 and -nostdlib I get:
0: e8 7b 65 06 c1 call c1066580
-nostdlib
Do not use the standard system startup files or libraries when linking. No startup files and only the libraries you specify are passed to the linker, and options specifying linkage of the system libraries, such as -static-libgcc or -shared-libgcc, are ignored.
But what exactly does -Ttext=0 do? I use it to specify the entry address at which EIP starts when the program is loaded and executed. I'm unable to find -Ttext in the man pages; searching online, I found this:
"-Ttext is an alias for "--section-start=text", which reads as:
--section-start=sectionname=org
Locate a section in the output file at the absolute address given
by org. You may use this option as many times as necessary to
locate multiple sections in the command line. org must be a single
hexadecimal integer; for compatibility with other linkers, you may
omit the leading 0x usually associated with hexadecimal values.
Note: there should be no white space between sectionname, the
equals sign ("="), and org."
From http://www.linuxquestions.org/questions/linux-general-1/gcc-creating-a-huge-executable-image-redhat-2-6-18-8-el5-x86_64-linux-759302/
However, I can't find --section or sectionname in my man page either, and when I try to replace -Ttext with --section-name I get an "unrecognized argument" error (this is GCC 4.7.2, if it is relevant).
Could someone tell me if this explanation (of -Ttext) is accurate and where I can find it in my manual? If it is not accurate, what does -Ttext really do?
My other question is: how does one pass an argument similar to -Ttext to nasm? In other words, what do I need to do to make nasm produce the same output as gcc does?
I tried assembling the same statements (with nasm and gcc) on both a 64-bit and a 32-bit system, and I get the same results.
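-Ttext is a linker option, so it is documented in the ld manual rather than in gcc's; gcc passes it through to ld. Running ld --help gives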
-Ttext ADDRESS Set address of .text section
If we assemble the following program using gcc -Ttext=8 -nostdlib -o test test.s
.globl _start
_start:
movl test,%ebx
test:
And dump the section headers (objdump -h test):
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000007 0000000000000008 0000000000000008 00200008 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
...and the code (objdump -d test):
0000000000000008 <_start>:
8: 8b 1c 25 0f 00 00 00 mov 0xf,%ebx
We can see that the .text section has a starting address of 8 and a size of 7. That is, all references to symbols within the section have been offset by the starting address we specified (8), but there was no padding involved (the section size did not grow as a result of having changed its address).
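You should be able to accomplish the same thing with NASM by using the ORG directive, although NASM only accepts ORG with the flat binary output format (-f bin); for an ELF object, the address is set at link time instead, for example with ld -Ttext=0. "NASM's ORG does exactly what the directive says: origin. Its sole function is to specify one offset which is added to all internal address references within the section".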
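Why the start address changes the encoded bytes at all: a near call (opcode E8) stores its target as a 32-bit displacement measured from the address of the next instruction, so the encoding can only be finished once it is known where the call itself will sit. A small Python sketch of that arithmetic (call_rel32 is a hypothetical helper) reproduces the bytes from the -Ttext=0 output in the question.

```python
def call_rel32(insn_address: int, target: int) -> bytes:
    """Encode a 5-byte x86 near call: opcode E8 followed by a little-endian rel32."""
    # The displacement is relative to the end of the 5-byte call instruction.
    rel = (target - (insn_address + 5)) & 0xFFFFFFFF
    return b"\xe8" + rel.to_bytes(4, "little")


print(call_rel32(0, 0xC1066580).hex(" "))  # e8 7b 65 06 c1, as with -Ttext=0
print(call_rel32(8, 0xC1066580).hex(" "))  # e8 73 65 06 c1, same target from address 8
```

The same absolute target yields a different displacement once the call moves, which is the offsetting the -Ttext answer describes.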
