Does the .so file still contain infomation about label - linux

In what phase of compilation does the compiler replace label into actual addr
I understanding instructions like jmp abc where abc is just a note and will be replace to actual address eventually, does it ?
Does the final .so file still contain infomation about label or the label is replace to actual addr when its load in the memory ?

TL;DR - your question is hard to answer, because it is mixing a few concepts. For typical assembler labels, we use PC relative and labels are resolve at assemble time. For other 'external' labels, there are many cases and the resolution depends on the case.
There are four conceptual ways to address on almost all CPUs, and definitely on the ARM.
PC relative address. Current instruction +/- offset.
Absolute address. This is the one you are conceptually thinking of.
Register computed address. Calculated at run time. ldr pc, [rn, #xx]
Table based addressing. Global offset table, etc. Much like registers computed addresses. ldr pc, [Rbase, Rindex, lsl #2]
The first two fit in a single instruction and are very efficient. The first is most desirable as the code can execute at ANY address as long as it maintains it's original layout (ie, you don't load it by splitting the code up).
In the table above, there is also the concept of 'build time' and 'run time'. The distinction is the difference between a linker and a loader. You have tagged this 'linux' and refer to an 'so' or shared library. Also, you are referring to assembler 'labels'. They are very similar concepts, but can be different as they will be one of the four classes of addressing above.
Most often in assembler, the labels are PC relative. There is no additional structure to be implemented with PC relative, except to keep the chunk of code continuous. In the case of an assembler that is a 'module' (compilation unit, for a compile) or is processed by the assembler and produced an 'object', it will use a PC relative addressing.
The object format can be annotate with external addresses and there are many choices in how an assembler may output these address. They are generally controlled by 'psuedo-ops'. That is a note (separate section with defined format) in the object file; the instruction is semi-complete in this form. It may prepare to use an offset table, use a register based compute (like r9+constant), etc.
For the typical case of linking (done at build time), we will either use PC relative or absolute. If we fix our binary to only run at one address, the assembler can setup for absolute addressing and resolve these through linking. In this case, the binary must be loaded at a fixed address. The assembler 'modules' or object files can be completely glued together to have everything resolved. Then there is no 'load' time fix ups. Other determining factor are whether code/data are separate, and whether the system is using an MMU. It is often desirable to keep code constant, so that many processes can use the same RAM/ROM pages, but they will have separate data. As well as memory efficient, this can provide some form of security (although it is not extremely robust) it will prevent accidental code overwrites and will provide debugging help in the form of SIGSEGV.
It is possible to write a PC-relative initialization routine which will do the fix-ups to create a table in your own binary. So a 'loader' is just to determine where you are running and then make calculations. For statically shared libraries, you typically know the libraries you will run, but not where they are. For dynamically shared libraries, you might not even know at compile time what the library is that you will run.
A Linux distribution can use either. If you have some sort of standard Linux Desktop distribution, (Ubuntu/Debian, Redhat, etc). You will have something base on ARM ELF LSB and dynamic shared libraries. You need to use the assembler pseudo ops to support this type of addressing or use a compiler to do it for you. The majority of all 'labels' in a shared library will be PC relative and not show up. Some labels can show up for debugging reasons (man strip) and some are absolutely needed to resolve addresses at run time.
I have also asked a question that I find related some time ago, Using GCC pre-processor as an assembler... So the key concept is that the assembler is generally 'two pass' and needs to do these local address fix ups. Then this question asks a 2nd level Concept A/B where we are adding shared libraries. The online book Linkers and Loaders is a great resource if you want to known more.
See also:
Static linked shared libraries
Thumb start function
What is the point of busybox?

Final executable has to have all addresses, otherwise it would not work.
Thing to remember is there are static linking and dynamic linking (eg using shared libraries). In case of static linkage binary file has all addresses resolved. In case of dynamic linkage addresses are resolved during loading, while binary has relocation information that are replaced with actual addresses by dynamic linker. But by the end of a day, loaded binary in memory has all addresses.
In what phase of compilation does the compiler replace label into
actual addr
Compiler could replace with actual address when it knows destination address. For example that's a call to function in same compilation unit.
When destination address is outside of compilation unit and out of reach for compiler, compiler leaves a relocation information in object file. Linker replace that with an actual address in memory at same time.

Related

x64 Portable Executable section order

Can the 3 essential sections: .data (resources), .rdata (imports), and .text (instructions) in the Portable Executable (.exe) file format be in any order as long as the 'Address of Entry Point' field points to the .text section? It seems like having the instructions (.text) be first is a big pain in the butt since you have to calculate the imports and resources sections to actually WRITE the instructions section...
This is what I'm going off of: https://i.imgur.com/LIImg.jpg
What about for run-time performance?
As already answered by Hans, the linker is free to arrange sections in any order, as seen best fit. The only exception is named sections like .text$A and .text$B, where the sections must be sorted in lexicographical order according to the suffix following the $.
The order in which the sections are written by the linker is not of great significance to how easy it is to produce the final binary, either. Typically, the binary file isn't written sequentially as the sections are computed; rather, the section contents are produced in buffers, and the references between code and data are kept symbolic (in a relocatable format) until the sections are written to the final executable.
The part of the question relating to performance has more to do with how the image loader in Windows works, rather than the linker. Because the loader does not need the sections in any particular order, there is no additional overhead (e.g. related to sorting) when unpacking the sections into the memory view of the image file. Relocations and matching between import and export tables are done in any case, and the amount of work is decided by other factors. Hence, the order decided by the linker does not in itself affect the loading time.
For normal Windows API or Native binaries (not CLR), the section names are not important either--only the characteristics of each section, which decide e.g. the access rights of the memory mapped pages in the image (whether they are read-only, writable, executable, etc.). For example, the import table may be placed in a section named .idata rather than .rdata, or the section may be named something completely different.
The format of a PE file is described in detail by the pecoff.doc document (direct link to a Word2003 file). What you are asking about is covered in chapter 4, it talks about the Section Table. The most relevant detail:
The number of entries in the Section Table is given by the NumberOfSections field in the file header. Entries in the Section Table are numbered starting from one. The code and data memory section entries are in the order chosen by the linker.
So no, this is not cast in stone, sections can appear in any order.
It seems like having the instructions (.text) be first is a big pain
As hinted by the pecoff language, it is a linker implementation detail. And to Microsoft's linker, and probably most any other linker, it is not actually a big pain. It's first and foremost job is to generate the executable code and there tends to be a lot of it. And not all of the code is used, just what is needed to resolve the dependencies. Which is a very common scenario, a static C runtime library would be a classic example. Your program does not call every possible runtime function, the linker only links in what is needed.
Details like relocations and imports are a minor detail, there are just not nearly as many of them. So it is a lot more efficient to first generate the code and keep track of the required relocations and imports to match that code in memory, to write them to the PE file later.
Your assumption that it is "better" the other way around is not accurate. To a linker anyway.

How to randomize each segment of Linux ELF

We know we can randomize the code,data/stack/heap by compile the code as PIE. While the code and data always have a fixed offset on each loading.
Is there a way that by adding some compile/link flags we can set code/data offset a random value?
The ASLR randomization algorithm always rounds to the page boundary, so there is no practical ability to change this at load-time.
If you want to adjust addresses at compile time, you can try altering the function address alignment (e.g. using -falign-functions=...); but you would be leaving windows in the code that could be used as trampoline locations.
Those offsets are defined in the ELF section data, so altering the offsets would involve altering those values before they get to the linker e.g. by making modifications to the link script.
If you pass -Wl,-verbose, you'll get a dump of the linker script that's used to generate the binary, and you could make adjustments to that script - it's different if you compile a pie file rather than a regular binary.
I don't know of any convenient flags that would allow you to alter the offset of the segments. Using the -Ttext option disables ASLR for the code segment, which is exactly the opposite of what you want.
Mind you, the linked paper seems to indicate that tools could be written to perform this without too much difficulty.
Too long for a comment, and probably not the answer you're looking for.

How can I know the addresses and sizes of all functions (including shared libraries) when the program is running under Linux?

The program is already loaded into memory. And I need to know all the function addresses and their sizes within the program source code (using tools like nm is OK). All functions mean, to include loaded shared library functions like "printf", and should be the real function address, not the PLT address. How could I implement that?
I am not sure that your question always makes sense, even if Employed Russian's answer gives a practically useful clue.
First, static functions in a stripped executable or library (including static functions inside shared libraries like libc) have no visible ELF symbols
Second, some compilers are able (using cloning of functions or other techniques) when optimizing strongly to have function code which is non-contiguous, e.g. because two functions share a piece of common machine code.
In a certain sense, this also happens when the compiler is optimizing tail-calls.
And most compilers are able to inline function calls (in particular to functions which are not defined as inline). With link-time optimization (e.g. code compiled and linked with gcc -flto -O3) it may happen even between several translation units.
You could experiment with dladdr(3) & backtrace(3). You'll find out that function code might have surprising or even poorly defined "boundaries".
I need to know all the function addresses and their sizes within the program source code
(using tools like nm is OK)
You could read /proc/self/maps to find out all ELF images currently mapped into your process, and run nm on each one.
That will give you all function addresses (for shared libraries, you would need to adjust nm output by the relocation (which you also get from /proc/self/maps), and most of function sizes.

examining text segment in a statically linked executable

I have a statically linked application binary that links against multiple user libraries and the pthread library. The application only uses a limited set of functions from each of these libraries. From a previous post Size of a library and the executable and from my experiments I realize that the linker only includes functions(in the executable) that are used/needed and not the entire contents of library.
I want to find out which functions from each of the respective libraries are linked
to the executable and their addresses (VMA). Ultimately I want to compile a list that contains the start and the end Virtual Memory Addresses (VMAs) for the each of the libraries based on the functions (in the library) that are mapped to the text segment.
One approach to do this is to make a list of functions in the library and then look for each of these functions in the executable and the corresponding Virtual Memory address it is mapped to. But this seems rather tedious to me. Is there a simpler way to accomplish this? Thanks.
I want to find out which functions from each of the respective libraries are linked to the executable and their addresses (VMA).
Add -Wl,-Map=foo.map argument to your link line. The resulting foo.map file will tell you all of the above.
Ultimately I want to compile a list that contains the start and the end Virtual Memory Addresses (VMAs) for the each of the libraries
That assumes that the linker does not re-order functions (and thus all functions from a single library occupy continuous range of text addresses). This assumption is probably true in simple cases, but in no way is guaranteed. See e.g. this patch.

How to modify an ELF file in a way that changes data length of parts of the file?

I'm trying to modify the executable contents of my own ELF files to see if this is possible. I have written a program that reads and parses ELF files, searches for the code that it should update, changes it, then writes it back after updating the sh_size field in the section header.
However, this doesn't work. If I simply exchange some bytes, with other bytes, it works. However, if I change the size, it fails. I'm aware of that some sh_offsets are immediately adjacent to each other; however this shouldn't matter when I'm reducing the size of the executable code.
Of course, there might be a bug in my program (or more than one), but I've already painstakingly gone through it.
Instead of asking for help with debugging my program I'm just wondering, is there anything else than the sh_size field I need to update in order to make this work (when reducing the size)? Is there anything that would make changing the length fail other than that field?
Edit:
It seems that Andy Ross was perfectly correct. Even in this very simple program I have come across some indirect addressing in __libc_start_main that I cannot trivially modify to update the offset it will reach.
I was curious though, what would be the best approach to still trying to get as far as possible with this problem? I know I cannot solve this in every case, but for some simple programs, it should be possible to update what is required to make it run? Should I try writing my own virtual machine or try developing a "debugger" that would replace each suspected problem instruction with INT 3? Any ideas?
The text segment is likely internally linked with relative offsets. So one function might be trying to jump to, say, "current address plus 194 bytes". If you move things around such that the jump target is now 190 bytes, you will obviously break things. The same is true of constant data on some architectures (e.g. x86-64 but not i686). There is no simple way short of a complete disassembly to know where the internal references are, and in fact it's computationally undecidable to find them all (i.e. trying to figure out all possible jump targets of a runtime-computed branch is the Halting Problem).
Basically, this isn't solvable in the general case, so if you have an ELF binary from someone else you're trying to patch, you'll need to try other techniques. But with (great!) care it's possible to produce a library where all internal references go through the GOT/PLT which can be sliced up and relinked like this. What are you trying to accomplish?
is there anything else than the sh_size field I need to update in order to make this work
It sounds like you are patching a fully-linked binary (ET_EXEC or ET_DYN). Please note that .sh_size is not used for anything after the static link is done. You can strip the entire section table, and the binary will continue to work fine. What matters at runtime are the segments in the ELF, not sections.
ELF stands for executable and linking format, and the executable and linking form "dual nature" of the ELF. Sections are used at (static) link time to combine into segments; which are used at execution time (aka runtime, aka dynamic linking time).
Of course you haven't told us exactly what your patching strategy is when you are shrinking your binary, and in what way the result is broken. It is very likely that Andy Ross's answer is the real cause of your breakage.

Resources