x64 Portable Executable section order - 64-bit

Can the 3 essential sections: .data (resources), .rdata (imports), and .text (instructions) in the Portable Executable (.exe) file format be in any order as long as the 'Address of Entry Point' field points to the .text section? It seems like having the instructions (.text) be first is a big pain in the butt since you have to calculate the imports and resources sections to actually WRITE the instructions section...
This is what I'm going off of: https://i.imgur.com/LIImg.jpg
What about for run-time performance?

As Hans already answered, the linker is free to arrange sections in any order it sees fit. The only exception is grouped sections such as .text$A and .text$B, whose contributions must be sorted lexicographically by the suffix following the $.
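To illustrate the $ rule, here is a minimal sketch of how grouped contributions could be merged. The function name and the sample payloads are made up for illustration, and a real linker also handles alignment and padding, which this ignores.

```python
from collections import defaultdict

def merge_grouped_sections(contributions):
    """Merge (section_name, payload) pairs, ordering by the suffix after '$'."""
    groups = defaultdict(list)
    for name, payload in contributions:
        base, _, suffix = name.partition("$")
        groups[base].append((suffix, payload))
    merged = {}
    for base, parts in groups.items():
        # Stable sort: contributions with an equal suffix keep arrival order.
        parts.sort(key=lambda part: part[0])
        merged[base] = b"".join(payload for _, payload in parts)
    return merged

# Hypothetical contributions, deliberately out of order.
contributions = [
    (".text$B", b"BB"),
    (".data", b"dd"),
    (".text$A", b"AA"),
    (".text", b"tt"),
]
merged = merge_grouped_sections(contributions)
print(merged[".text"])
```

A plain .text contribution has an empty suffix in this model, so it sorts ahead of .text$A.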
The order in which the sections are written by the linker is not of great significance to how easy it is to produce the final binary, either. Typically, the binary file isn't written sequentially as the sections are computed; rather, the section contents are produced in buffers, and the references between code and data are kept symbolic (in a relocatable format) until the sections are written to the final executable.
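A toy model may make this concrete. The sketch below is not how any real linker is implemented; the data layout, the `link` function and the 0x1000 base and alignment are all assumptions chosen for illustration. It keeps references symbolic until every section has an address, so the section order only changes the numbers that get patched in.

```python
import struct

# Hypothetical toy input: raw bytes plus symbolic references of the form
# (offset_in_section, target_section, offset_in_target).
sections = {
    ".text": {"data": bytearray(8), "relocs": [(0, ".rdata", 4)]},
    ".rdata": {"data": bytearray(b"\x00" * 4 + b"DATA"), "relocs": []},
}

def link(sections, order, base=0x1000, align=0x1000):
    # Pass 1: assign an address to every section in the requested order.
    addresses = {}
    cursor = base
    for name in order:
        addresses[name] = cursor
        size = len(sections[name]["data"])
        cursor += (size + align - 1) // align * align
    # Pass 2: patch each symbolic reference with a 32-bit absolute address.
    image = {}
    for name in order:
        data = bytearray(sections[name]["data"])
        for offset, target, target_offset in sections[name]["relocs"]:
            struct.pack_into("<I", data, offset, addresses[target] + target_offset)
        image[name] = bytes(data)
    return addresses, image

addr_a, image_a = link(sections, [".text", ".rdata"])
addr_b, image_b = link(sections, [".rdata", ".text"])
ref_a = struct.unpack_from("<I", image_a[".text"], 0)[0]
ref_b = struct.unpack_from("<I", image_b[".text"], 0)[0]
print(hex(ref_a), hex(ref_b))
```

Both orders take the same two passes; only the patched address differs.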
The part of the question relating to performance has more to do with how the image loader in Windows works, rather than the linker. Because the loader does not need the sections in any particular order, there is no additional overhead (e.g. related to sorting) when unpacking the sections into the memory view of the image file. Relocations and matching between import and export tables are done in any case, and the amount of work is decided by other factors. Hence, the order decided by the linker does not in itself affect the loading time.
For normal Windows API or Native binaries (not CLR), the section names are not important either--only the characteristics of each section, which decide e.g. the access rights of the memory mapped pages in the image (whether they are read-only, writable, executable, etc.). For example, the import table may be placed in a section named .idata rather than .rdata, or the section may be named something completely different.
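As a rough sketch of that point, the snippet below decodes the access rights from a 40-byte PE section header. The header is synthetic and the section name "mycode" is made up; the flag values are the IMAGE_SCN_* constants from the PE/COFF specification.

```python
import struct

IMAGE_SCN_CNT_CODE = 0x00000020
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

# Name, VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics (40 bytes in total).
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")

def describe_section(raw):
    """Return (name, access string) for one raw section header."""
    fields = SECTION_HEADER.unpack(raw)
    name = fields[0].rstrip(b"\x00").decode("ascii")
    flags = fields[-1]
    access = "".join(
        letter if flags & bit else "-"
        for letter, bit in (
            ("r", IMAGE_SCN_MEM_READ),
            ("w", IMAGE_SCN_MEM_WRITE),
            ("x", IMAGE_SCN_MEM_EXECUTE),
        )
    )
    return name, access

# Synthetic header: a code section that is deliberately not named .text.
code_header = SECTION_HEADER.pack(
    b"mycode", 0x200, 0x1000, 0x200, 0x400, 0, 0, 0, 0,
    IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ,
)
print(describe_section(code_header))
```

The access string comes entirely from the Characteristics field; the name plays no part in it.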

The format of a PE file is described in detail by the pecoff.doc document (direct link to a Word2003 file). What you are asking about is covered in chapter 4, it talks about the Section Table. The most relevant detail:
The number of entries in the Section Table is given by the NumberOfSections field in the file header. Entries in the Section Table are numbered starting from one. The code and data memory section entries are in the order chosen by the linker.
So no, this is not cast in stone; sections can appear in any order.
It seems like having the instructions (.text) be first is a big pain
As hinted by the pecoff language, it is a linker implementation detail. And to Microsoft's linker, and probably almost any other linker, it is not actually a big pain. Its first and foremost job is to generate the executable code, and there tends to be a lot of it. Not all of the code is used, just what is needed to resolve the dependencies. That is a very common scenario; a static C runtime library would be a classic example. Your program does not call every possible runtime function, so the linker only links in what is needed.
Relocations and imports are minor by comparison; there are just not nearly as many of them. So it is a lot more efficient to generate the code first, keep track in memory of the relocations and imports that code requires, and write them to the PE file later.
Your assumption that it is "better" the other way around is not accurate. To a linker anyway.

Related

Does the .so file still contain information about labels

In what phase of compilation does the compiler replace a label with the actual address?
I understand that in an instruction like jmp abc, abc is just a note that will eventually be replaced with an actual address. Is that right?
Does the final .so file still contain information about the label, or is the label replaced with the actual address when the file is loaded into memory?
TL;DR - your question is hard to answer, because it is mixing a few concepts. For typical assembler labels, we use PC-relative addressing and the labels are resolved at assemble time. For other 'external' labels, there are many cases and the resolution depends on the case.
There are four conceptual ways to address on almost all CPUs, and definitely on the ARM.
PC relative address. Current instruction +/- offset.
Absolute address. This is the one you are conceptually thinking of.
Register computed address. Calculated at run time. ldr pc, [rn, #xx]
Table based addressing. Global offset table, etc. Much like register-computed addresses. ldr pc, [Rbase, Rindex, lsl #2]
The first two fit in a single instruction and are very efficient. The first is most desirable, as the code can execute at ANY address as long as it maintains its original layout (i.e., you don't load it by splitting the code up).
In the table above, there is also the concept of 'build time' and 'run time'. The distinction is the difference between a linker and a loader. You have tagged this 'linux' and refer to an 'so' or shared library. Also, you are referring to assembler 'labels'. They are very similar concepts, but can be different as they will be one of the four classes of addressing above.
Most often in assembler, the labels are PC relative. PC-relative addressing needs no additional structure, except that the chunk of code must stay contiguous. In the case of assembler that is a 'module' (a compilation unit, for a compiler), or that is processed by the assembler to produce an 'object', PC-relative addressing will be used.
The object format can be annotated with external addresses, and there are many choices in how an assembler may output these addresses. They are generally controlled by 'pseudo-ops'. The annotation is a note (a separate section with a defined format) in the object file; the instruction is semi-complete in this form. It may prepare to use an offset table, use a register-based computation (like r9+constant), etc.
For the typical case of linking (done at build time), we will either use PC relative or absolute. If we fix our binary to only run at one address, the assembler can set up for absolute addressing and resolve these through linking. In this case, the binary must be loaded at a fixed address. The assembler 'modules' or object files can be completely glued together to have everything resolved. Then there are no 'load' time fix-ups. Other determining factors are whether code and data are separate, and whether the system is using an MMU. It is often desirable to keep code constant, so that many processes can use the same RAM/ROM pages while having separate data. As well as being memory efficient, this can provide some form of security: although it is not extremely robust, it will prevent accidental code overwrites and will provide debugging help in the form of SIGSEGV.
It is possible to write a PC-relative initialization routine which will do the fix-ups to create a table in your own binary. So a 'loader' is just to determine where you are running and then make calculations. For statically shared libraries, you typically know the libraries you will run, but not where they are. For dynamically shared libraries, you might not even know at compile time what the library is that you will run.
A Linux distribution can use either. If you have some sort of standard Linux desktop distribution (Ubuntu/Debian, Redhat, etc.), you will have something based on ARM ELF LSB and dynamic shared libraries. You need to use the assembler pseudo-ops to support this type of addressing or use a compiler to do it for you. The majority of all 'labels' in a shared library will be PC relative and not show up. Some labels can show up for debugging reasons (man strip) and some are absolutely needed to resolve addresses at run time.
I have also asked a question that I find related some time ago, Using GCC pre-processor as an assembler... So the key concept is that the assembler is generally 'two pass' and needs to do these local address fix-ups. Then this question asks a 2nd level Concept A/B where we are adding shared libraries. The online book Linkers and Loaders is a great resource if you want to know more.
See also:
Static linked shared libraries
Thumb start function
What is the point of busybox?
The final executable has to have all addresses, otherwise it would not work.
The thing to remember is that there is static linking and dynamic linking (e.g. using shared libraries). With static linking, the binary file has all addresses resolved. With dynamic linking, addresses are resolved during loading: the binary carries relocation information that the dynamic linker replaces with actual addresses. But at the end of the day, the loaded binary in memory has all addresses.
In what phase of compilation does the compiler replace label into actual addr
The compiler can substitute the actual address when it knows the destination address, for example for a call to a function in the same compilation unit.
When the destination address is outside the compilation unit and out of reach for the compiler, the compiler leaves relocation information in the object file. The linker then replaces it with an actual address at link time.

How can I convert dynamically linked application to statically one?

I have an application, say gedit, which is dynamically linked and I don't have the source code. So I cannot compile it as I like. What I want to do is make it statically linked and move it to a system which doesn't have the necessary libraries to run that application. So is it possible to do it and how?
It is theoretically possible. You basically have to do the same job that the dynamic linker does, with some modifications, i.e.
dump all sections from the original file
resolve symbols
locate libraries
instead of loading them into memory, assemble them into a "virtual image"
resolve internal links
dump the whole thing into an independent file.
So objdump, readelf, and objcopy will be some of your friends.
The task is not easy and the result will be neither automatic, nor (probably) stable.
You may want to check out this code by someone else that tried the same, by actually intercepting the dynamic linker (i.e. all steps above, except the last) and dumping the results to disk.
It is based on this tool, so it's anyone's bet whether it works on the newest kernels.
(It probably doesn't - and you need at least to patch it to reflect the new structures. This is my attempt at doing so. Caveat emptor).

How to modify an ELF file in a way that changes data length of parts of the file?

I'm trying to modify the executable contents of my own ELF files to see if this is possible. I have written a program that reads and parses ELF files, searches for the code that it should update, changes it, then writes it back after updating the sh_size field in the section header.
However, this doesn't work. If I simply exchange some bytes with other bytes, it works. However, if I change the size, it fails. I'm aware that some sh_offsets are immediately adjacent to each other; however, this shouldn't matter when I'm reducing the size of the executable code.
Of course, there might be a bug in my program (or more than one), but I've already painstakingly gone through it.
Instead of asking for help with debugging my program I'm just wondering, is there anything else than the sh_size field I need to update in order to make this work (when reducing the size)? Is there anything that would make changing the length fail other than that field?
Edit:
It seems that Andy Ross was perfectly correct. Even in this very simple program I have come across some indirect addressing in __libc_start_main that I cannot trivially modify to update the offset it will reach.
I was curious though, what would be the best approach to still trying to get as far as possible with this problem? I know I cannot solve this in every case, but for some simple programs, it should be possible to update what is required to make it run? Should I try writing my own virtual machine or try developing a "debugger" that would replace each suspected problem instruction with INT 3? Any ideas?
The text segment is likely internally linked with relative offsets. So one function might be trying to jump to, say, "current address plus 194 bytes". If you move things around such that the jump target is now 190 bytes, you will obviously break things. The same is true of constant data on some architectures (e.g. x86-64 but not i686). There is no simple way short of a complete disassembly to know where the internal references are, and in fact it's computationally undecidable to find them all (i.e. trying to figure out all possible jump targets of a runtime-computed branch is the Halting Problem).
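A small sketch of the "plus 194 bytes" case, using the x86 jmp rel32 encoding (opcode 0xE9 followed by a signed 32-bit displacement counted from the end of the instruction). The code bytes are synthetic and the helper name is made up.

```python
import struct

def jmp_rel32_target(code, offset):
    """Decode `jmp rel32` at `offset` and return the offset it jumps to."""
    if code[offset] != 0xE9:
        raise ValueError("not a jmp rel32 instruction")
    (displacement,) = struct.unpack_from("<i", code, offset + 1)
    # The displacement is relative to the end of the 5-byte instruction.
    return offset + 5 + displacement

# Synthetic code: jump over 194 bytes of NOP filler to a RET (0xC3).
code = bytes([0xE9]) + struct.pack("<i", 194) + b"\x90" * 194 + b"\xC3"
target = jmp_rel32_target(code, 0)
print(target, hex(code[target]))

# Remove 4 filler bytes without updating the displacement.
patched = code[:5] + code[9:]
stale_target = jmp_rel32_target(patched, 0)
actual_ret = patched.index(0xC3)
print(stale_target, actual_ret, len(patched))
```

After the cut, the RET sits 190 bytes past the jump, but the instruction still encodes 194, so the stale target now points past the end of the code.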
Basically, this isn't solvable in the general case, so if you have an ELF binary from someone else you're trying to patch, you'll need to try other techniques. But with (great!) care it's possible to produce a library where all internal references go through the GOT/PLT which can be sliced up and relinked like this. What are you trying to accomplish?
is there anything else than the sh_size field I need to update in order to make this work
It sounds like you are patching a fully-linked binary (ET_EXEC or ET_DYN). Please note that .sh_size is not used for anything after the static link is done. You can strip the entire section table, and the binary will continue to work fine. What matters at runtime are the segments in the ELF, not sections.
ELF stands for Executable and Linking Format, and "executable" and "linking" form the dual nature of ELF. Sections are used at (static) link time and are combined into segments, which are used at execution time (aka runtime, aka dynamic linking time).
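To make the split visible, here is a minimal sketch that reads the ELF64 file header and reports the program header (segment) and section header counts separately. The header is synthetic and models a little-endian x86-64 ET_EXEC whose section table was stripped; the function name is hypothetical.

```python
import struct

# e_ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
# e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx (64 bytes).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

def summarize_elf64_header(raw):
    (ident, e_type, _machine, _version, entry, phoff, shoff, _flags,
     _ehsize, _phentsize, phnum, _shentsize, shnum, _shstrndx) = ELF64_HEADER.unpack_from(raw)
    # ident[4] == 2 means ELFCLASS64, ident[5] == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 header")
    return {
        "type": e_type,
        "entry": entry,
        "segments": phnum,
        "phoff": phoff,
        "sections": shnum,
        "shoff": shoff,
    }

# Synthetic header: ET_EXEC (2), EM_X86_64 (62), two segments, no sections.
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
header = ELF64_HEADER.pack(ident, 2, 62, 1, 0x401000, 64, 0, 0, 64, 56, 2, 0, 0, 0)
summary = summarize_elf64_header(header)
print(summary)
```

On a real binary, readelf -l lists the segments and readelf -S lists the sections, which shows the same two views.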
Of course you haven't told us exactly what your patching strategy is when you are shrinking your binary, and in what way the result is broken. It is very likely that Andy Ross's answer is the real cause of your breakage.

"Function-level linking" (i.e. COMDAT generation) in MASM assembly?

Is there any way to make MASM generate COMDATs for functions, so that unused functions are removed by the linker?
(i.e. I'm looking for the equivalents of /Gy for MASM.)
Not straightforward, but doable; discussed here and here.
The first step involves putting each function into a separate segment with names like .text$a, .text$b, etc. This way, the assembler won't unite them into a single .text section, but the linker eventually will; there's a special rule in Microsoft linkers regarding the stuff past the $ character in the section name. The assembler will emit an .obj file with multiple code sections. I've tried that, I can confirm that it does. At least one flavor of MASM does. :)
Then they suggest running a utility over the object file that will mark your sections as COMDATs. The said utility seems to be lost to time and bit decay, but its action can be roughly deduced. It reads and parses a COFF .obj file, goes through the sections and slaps a COMDAT flag on all .text sections. I assume it's just a flag; could be more. As a first step to its recreation, I'd suggest compiling a C file with /Gy, then without, and comparing the two .obj files via some low-level PE/COFF browser. I didn't go this far, since my scenario was rather different.
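For the comparison step, a rough sketch like the following could list which sections of a COFF object carry the IMAGE_SCN_LNK_COMDAT flag (0x00001000). It only reads the flag and does not recreate the lost utility. It assumes the classic 20-byte COFF file header (not the /bigobj variant) and does not resolve long section names stored in the string table. The object bytes below are synthetic.

```python
import struct

IMAGE_SCN_LNK_COMDAT = 0x00001000
# Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
# NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes).
COFF_HEADER = struct.Struct("<HHIIIHH")
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")  # 40 bytes per section

def comdat_sections(obj):
    """Return a list of (section name, has COMDAT flag) in table order."""
    _machine, count, _, _, _, optional_size, _ = COFF_HEADER.unpack_from(obj, 0)
    table = COFF_HEADER.size + optional_size
    result = []
    for index in range(count):
        fields = SECTION_HEADER.unpack_from(obj, table + index * SECTION_HEADER.size)
        name = fields[0].rstrip(b"\x00").decode("ascii", "replace")
        result.append((name, bool(fields[-1] & IMAGE_SCN_LNK_COMDAT)))
    return result

# Synthetic x64 object (machine 0x8664) with two section headers and no data.
obj = COFF_HEADER.pack(0x8664, 2, 0, 0, 0, 0, 0)
obj += SECTION_HEADER.pack(b".text$a", 0, 0, 0, 0, 0, 0, 0, 0, 0x60000020 | IMAGE_SCN_LNK_COMDAT)
obj += SECTION_HEADER.pack(b".data", 0, 0, 0, 0, 0, 0, 0, 0, 0xC0000040)
flags = comdat_sections(obj)
print(flags)
```

Running this over the /Gy and non-/Gy objects would show whether the flag is the only difference in the section table; any difference in the symbol table would need a separate look.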

Simple way to reorder ELF file sections

I'm looking for a simple way to reorder the ELF file sections. I've got a sequence of custom sections that I would like all to be aligned in a certain order.
The only way I've found how to do it is to use a Linker script. However the documentation indicates that specifying a custom linker script overrides the default. The default linker script has a lot of content in it that I don't want to have to duplicate in my custom script just to get three sections always together in a certain order. It does not seem very flexible to hard code the linker behavior like that.
Why do I want to do this? I have a section of data that I need to know the run-time memory location of (beginning and end). So I've created two additional sections and put sentinel variables in them. I then want to use the memory locations of those variables to know the extent of the unknown section in memory.
.markerA
int markerA;
.targetSection
... Lots of variables ...
.markerB
int markerB;
In the above example, I would know that the data in .targetSection is between the address of markerA and markerB.
Is there another way to accomplish this? Are there libraries that would let me read in the currently executing ELF image and determine section location and size?
You can obtain the addresses of loaded sections by analyzing the ELF file format. Details may be found e.g. in the Tool Interface Standard (TIS) Portable Formats Specification, version 1.2 (http://refspecs.freestandards.org/elf/elf.pdf).
For a quick impression of which information is available, it is worth taking a look at readelf:
readelf -S <filename>
returns a list of all sections contained in <filename>.
Sections that are mapped into memory carry the A (alloc) flag; most of them have type PROGBITS.
The address you are looking for is displayed in the Addr column.
To obtain the memory location, you have to add the load address of your executable / shared object.
There are a few ways to determine the load address of your executable/shared object:
you may parse /proc/[pid]/maps (the first column contains the load address). [pid] is the process id
if you know one function contained in your file you can apply dlsym to receive a pointer to the function. That pointer is the input parameter for dladdr returning a Dl_info struct containing the requested load address
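The /proc/[pid]/maps option can be sketched as below. The function name and the sample text are made up for illustration, and the sketch assumes the lowest mapping of the file is the one at file offset 0, which is the usual layout but not guaranteed.

```python
def load_address(maps_text, path):
    """Return the lowest start address among mappings backed by `path`."""
    starts = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].strip() == path:
            start, _, _end = fields[0].partition("-")
            starts.append(int(start, 16))
    if not starts:
        raise LookupError("no mapping found for " + path)
    return min(starts)

# Hypothetical excerpt in the format of /proc/[pid]/maps.
sample = """\
55d0c8a00000-55d0c8a01000 r--p 00000000 08:01 1234 /usr/bin/demo
55d0c8a01000-55d0c8a02000 r-xp 00001000 08:01 1234 /usr/bin/demo
7f1e2c000000-7f1e2c021000 rw-p 00000000 00:00 0
"""
base = load_address(sample, "/usr/bin/demo")
print(hex(base))
```

For the running process, the text could be read with open("/proc/self/maps").read() on Linux.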
To get some ELF information, the libelf library may be a helpful companion (I discovered it after studying the above-mentioned TIS, so I only took a short look at it and I don't know deeper details).
I hope this sketch of a possible solution will help.
You may consider using GCC's initializers to reference the variables which would go into a separate section otherwise and maintain all their pointers in an array. I recommend using initializers because this works file-independently.
You may look at ELFIO library. It contains WriteObj and Writer examples. By using the library, you will be able to create/modify ELF binary file programmatically.
I'm afraid overriding the default linker script is the simple solution.
Since you are worried that it might not be flexible (although I don't think the default linker script changes that often), you could write a script that generates a linker script based on the host system's default ld script ("ld --verbose") and inserts your special sections into it.
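A minimal sketch of such a generator follows. It relies on GNU ld printing its default script between two lines of "=" characters. The helper names, the choice of .data as the insertion point and the shortened sample output are all assumptions for illustration.

```python
def extract_default_script(verbose_output):
    """Return the text between the two '=====' separator lines."""
    lines = verbose_output.splitlines()
    marks = [i for i, line in enumerate(lines) if line.startswith("=====")]
    if len(marks) < 2:
        raise ValueError("separator lines not found")
    return "\n".join(lines[marks[0] + 1:marks[1]])

def insert_before(script, section_name, snippet):
    """Insert `snippet` before the first line whose first token is `section_name`."""
    out = []
    inserted = False
    for line in script.splitlines():
        if not inserted and line.split()[:1] == [section_name]:
            out.append(snippet)
            inserted = True
        out.append(line)
    if not inserted:
        raise ValueError("section not found: " + section_name)
    return "\n".join(out)

# The three sections from the question, kept together and in order.
CUSTOM = """  .markerA : { KEEP(*(.markerA)) }
  .targetSection : { KEEP(*(.targetSection)) }
  .markerB : { KEEP(*(.markerB)) }"""

# Shortened, hypothetical stand-in for the output of `ld --verbose`.
sample = """GNU ld (GNU Binutils) 2.40
using internal linker script:
==================================================
SECTIONS
{
  .text : { *(.text) }
  .data : { *(.data) }
}
==================================================
"""
script = insert_before(extract_default_script(sample), ".data", CUSTOM)
print(script)
```

In real use, the input would come from something like subprocess.run(["ld", "--verbose"], capture_output=True, text=True).stdout, and the generated file would be passed to the linker with -T.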

Resources