As far as I understand it, the ELF format names all external symbols that are linked by a linker. What I want to do is load an unlinked ELF, and link it in memory, dynamically.
Can someone explain to me the necessary details of how the linking is done and what changes where in memory?
[update]
To clarify and defend the question from being too broad again, here is some more information.
I understand jump addresses and can obtain the jump addresses of in-memory functions using C. So when loading a C program I get all those addresses.
The parameter passing is known and hard-wired; there is nothing to do about it.
Within the ASM or C object file (ELF format) I find a symbol table, which should be related to the problem, but I do not know where in the opcodes I have to adapt it, and how.
And please do not tell me to read the GNU libc stuff. That is not the problem to face. I neither want to dig into the .so format nor do I want to fiddle with their opcode parsing stuff.
So how is the symbol table handled and expressed in memory? It should be a simple answer for someone who knows how it is done.
ELF is divided into sections on disk, but into segments in memory.
Basically the loading process means doing the section-to-segment mapping, and then processing fixups (including global tables like the GOT).
Loading by symbol is a different kind of loading. It probably does the first process, and then searches for the symbol in special tables.
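To make "searches for the symbol in special tables" concrete, here is a minimal sketch (my own illustration, assuming a 64-bit ELF on Linux, error handling omitted) that walks the section headers of an ELF file, finds a symbol table (.symtab or .dynsym), and looks up one symbol by name. An in-memory linker would then patch the relocation sites or GOT slots that refer to that symbol with the resolved address.

/* lookup.c - print the value of a named symbol from an ELF file.
 * Usage: ./lookup <elf-file> <symbol> */
#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <elf-file> <symbol>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    unsigned char *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    Elf64_Ehdr *eh = (Elf64_Ehdr *)base;
    Elf64_Shdr *sh = (Elf64_Shdr *)(base + eh->e_shoff);

    for (int i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_SYMTAB && sh[i].sh_type != SHT_DYNSYM)
            continue;

        /* sh_link of a symbol table section points at its string table */
        Elf64_Sym  *syms   = (Elf64_Sym *)(base + sh[i].sh_offset);
        const char *strtab = (const char *)(base + sh[sh[i].sh_link].sh_offset);
        size_t      nsyms  = sh[i].sh_size / sizeof(Elf64_Sym);

        for (size_t j = 0; j < nsyms; j++)
            if (strcmp(strtab + syms[j].st_name, argv[2]) == 0)
                printf("%s: value 0x%lx, section index %u\n", argv[2],
                       (unsigned long)syms[j].st_value,
                       (unsigned)syms[j].st_shndx);
    }
    return 0;
}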
Anyway, the best free resource to get a quick insight is the free ebook "Linkers and Loaders" by John R. Levine.
Related
In what phase of compilation does the compiler replace a label with the actual address?
I understand that in instructions like jmp abc, abc is just a label and will eventually be replaced with an actual address, won't it?
Does the final .so file still contain information about the label, or is the label replaced with an actual address when it is loaded into memory?
TL;DR - your question is hard to answer because it mixes a few concepts. For typical assembler labels, we use PC-relative addressing and labels are resolved at assembly time. For other 'external' labels, there are many cases and the resolution depends on the case.
There are four conceptual ways to address on almost all CPUs, and definitely on the ARM.
PC relative address. Current instruction +/- offset.
Absolute address. This is the one you are conceptually thinking of.
Register computed address. Calculated at run time. ldr pc, [rn, #xx]
Table based addressing. Global offset table, etc. Much like register-computed addresses. ldr pc, [Rbase, Rindex, lsl #2]
The first two fit in a single instruction and are very efficient. The first is most desirable, as the code can execute at ANY address as long as it maintains its original layout (i.e., you don't load it by splitting the code up).
In the table above, there is also the concept of 'build time' and 'run time'. The distinction is the difference between a linker and a loader. You have tagged this 'linux' and refer to an 'so' or shared library. Also, you are referring to assembler 'labels'. They are very similar concepts, but can be different as they will be one of the four classes of addressing above.
Most often in assembler, the labels are PC relative. There is no additional structure to be implemented with PC relative addressing, except to keep the chunk of code contiguous. In the case of an assembler file that is a 'module' (a compilation unit, for a compile) and is processed by the assembler to produce an 'object', it will use PC relative addressing.
The object format can be annotated with external addresses, and there are many choices in how an assembler may output these addresses. They are generally controlled by 'pseudo-ops'. That is a note (a separate section with a defined format) in the object file; the instruction is semi-complete in this form. It may prepare to use an offset table, use a register-based computation (like r9+constant), etc.
For the typical case of linking (done at build time), we will either use PC relative or absolute. If we fix our binary to run at only one address, the assembler can set up for absolute addressing and resolve these through linking. In this case, the binary must be loaded at a fixed address. The assembler 'modules' or object files can then be completely glued together with everything resolved, and there are no 'load' time fix-ups. Other determining factors are whether code and data are separate, and whether the system is using an MMU. It is often desirable to keep code constant, so that many processes can use the same RAM/ROM pages, while they have separate data. As well as being memory efficient, this can provide some form of security (although it is not extremely robust): it will prevent accidental code overwrites and will provide debugging help in the form of SIGSEGV.
It is possible to write a PC-relative initialization routine which will do the fix-ups to create a table in your own binary. So a 'loader' only has to determine where you are running and then do the calculations. For statically linked shared libraries, you typically know which libraries you will run, but not where they are. For dynamically shared libraries, you might not even know at compile time what library you will run.
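As a rough sketch of that idea (the names and the link-time address are entirely hypothetical; this is not a real loader): the routine finds out where it actually landed by comparing the run-time address of a known symbol with its link-time address, and applies that difference ("slide") to its own address table.

/* Sketch: compute the load slide of a position-independent module and
 * fix up a table of link-time addresses.  ANCHOR_LINK_ADDR and the table
 * contents stand in for values that would be produced at build time. */
#include <stddef.h>
#include <stdint.h>

char anchor_symbol;                        /* any object with a known link-time address */
#define ANCHOR_LINK_ADDR 0x1000UL          /* hypothetical address assigned by the linker */

static uintptr_t link_time_addrs[16];      /* filled in at build time */
static void     *run_time_addrs[16];       /* usable pointers after fix-up */

void fixup_table(size_t count)
{
    uintptr_t slide = (uintptr_t)&anchor_symbol - ANCHOR_LINK_ADDR;

    for (size_t i = 0; i < count; i++)
        run_time_addrs[i] = (void *)(link_time_addrs[i] + slide);
}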
A Linux distribution can use either. If you have some sort of standard Linux desktop distribution (Ubuntu/Debian, Red Hat, etc.), you will have something based on the ARM ELF LSB and dynamic shared libraries. You need to use the assembler pseudo-ops to support this type of addressing, or use a compiler to do it for you. The majority of all 'labels' in a shared library will be PC relative and not show up. Some labels can show up for debugging reasons (man strip), and some are absolutely needed to resolve addresses at run time.
I have also asked a question that I find related, some time ago: Using GCC pre-processor as an assembler... The key concept is that the assembler is generally 'two pass' and needs to do these local address fix-ups. This question then adds a second level of the concept, where we are adding shared libraries. The online book Linkers and Loaders is a great resource if you want to know more.
See also:
Static linked shared libraries
Thumb start function
What is the point of busybox?
The final executable has to have all addresses resolved, otherwise it would not work.
The thing to remember is that there are static linking and dynamic linking (e.g. using shared libraries). In the case of static linking, the binary file has all addresses resolved. In the case of dynamic linking, addresses are resolved during loading: the binary carries relocation information, which the dynamic linker replaces with actual addresses. But at the end of the day, the loaded binary in memory has all addresses.
In what phase of compilation does the compiler replace a label with the actual address?
The compiler can replace a label with an actual address when it knows the destination address; for example, a call to a function in the same compilation unit.
When the destination address is outside the compilation unit and out of the compiler's reach, the compiler leaves relocation information in the object file. The linker then replaces that relocation with an actual address in memory.
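For example (my own illustration, not part of the original answer): in the fragment below the compiler can emit a fixed PC-relative call to local_helper, but the references to external_func and external_var can only be recorded as relocations for the linker (or dynamic linker) to fill in later.

/* foo.c -- compile with "gcc -c foo.c" and inspect with "objdump -dr foo.o":
 * the call to external_func and the load of external_var carry relocation
 * entries; the call to local_helper does not need one. */

extern int external_var;           /* defined in another compilation unit */
int external_func(int x);          /* likewise only declared here */

static int local_helper(int x)     /* same compilation unit: address known */
{
    return x + 1;
}

int foo(int x)
{
    int a = local_helper(x);                 /* resolvable at assembly time */
    return external_func(a) + external_var;  /* needs relocations */
}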
Can the 3 essential sections: .data (resources), .rdata (imports), and .text (instructions) in the Portable Executable (.exe) file format be in any order as long as the 'Address of Entry Point' field points to the .text section? It seems like having the instructions (.text) be first is a big pain in the butt since you have to calculate the imports and resources sections to actually WRITE the instructions section...
This is what I'm going off of: https://i.imgur.com/LIImg.jpg
What about for run-time performance?
As already answered by Hans, the linker is free to arrange sections in any order it sees fit. The only exception is named sections like .text$A and .text$B, where the contributions must be sorted in lexicographical order according to the suffix following the $.
The order in which the sections are written by the linker is not of great significance to how easy it is to produce the final binary, either. Typically, the binary file isn't written sequentially as the sections are computed; rather, the section contents are produced in buffers, and the references between code and data are kept symbolic (in a relocatable format) until the sections are written to the final executable.
The part of the question relating to performance has more to do with how the image loader in Windows works, rather than the linker. Because the loader does not need the sections in any particular order, there is no additional overhead (e.g. related to sorting) when unpacking the sections into the memory view of the image file. Relocations and matching between import and export tables are done in any case, and the amount of work is decided by other factors. Hence, the order decided by the linker does not in itself affect the loading time.
For normal Windows API or Native binaries (not CLR), the section names are not important either--only the characteristics of each section, which decide e.g. the access rights of the memory mapped pages in the image (whether they are read-only, writable, executable, etc.). For example, the import table may be placed in a section named .idata rather than .rdata, or the section may be named something completely different.
The format of a PE file is described in detail by the pecoff.doc document (direct link to a Word2003 file). What you are asking about is covered in chapter 4, it talks about the Section Table. The most relevant detail:
The number of entries in the Section Table is given by the NumberOfSections field in the file header. Entries in the Section Table are numbered starting from one. The code and data memory section entries are in the order chosen by the linker.
So no, this is not cast in stone, sections can appear in any order.
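You can observe this directly. The sketch below (Windows only, my own example, with minimal validation) walks the section table of the running executable and prints the sections in whatever order the linker happened to write them.

/* Print the section table of the current module in linker order. */
#include <stdio.h>
#include <windows.h>

int main(void)
{
    BYTE *base = (BYTE *)GetModuleHandle(NULL);   /* image base of the running EXE */
    IMAGE_DOS_HEADER *dos = (IMAGE_DOS_HEADER *)base;
    IMAGE_NT_HEADERS *nt  = (IMAGE_NT_HEADERS *)(base + dos->e_lfanew);
    IMAGE_SECTION_HEADER *sec = IMAGE_FIRST_SECTION(nt);

    for (WORD i = 0; i < nt->FileHeader.NumberOfSections; i++) {
        /* Name is 8 bytes and not necessarily NUL-terminated */
        printf("%-8.8s  VA 0x%08lX  raw size 0x%lX\n",
               (const char *)sec[i].Name,
               (unsigned long)sec[i].VirtualAddress,
               (unsigned long)sec[i].SizeOfRawData);
    }
    return 0;
}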
It seems like having the instructions (.text) be first is a big pain
As hinted by the pecoff language, it is a linker implementation detail. And to Microsoft's linker, and probably most any other linker, it is not actually a big pain. Its first and foremost job is to generate the executable code, and there tends to be a lot of it. And not all of the code is used, just what is needed to resolve the dependencies. Which is a very common scenario; a static C runtime library would be a classic example. Your program does not call every possible runtime function; the linker only links in what is needed.
Details like relocations and imports are a minor detail, there are just not nearly as many of them. So it is a lot more efficient to first generate the code and keep track of the required relocations and imports to match that code in memory, to write them to the PE file later.
Your assumption that it is "better" the other way around is not accurate. To a linker anyway.
I'm trying to modify the executable contents of my own ELF files to see if this is possible. I have written a program that reads and parses ELF files, searches for the code that it should update, changes it, then writes it back after updating the sh_size field in the section header.
However, this doesn't work. If I simply exchange some bytes with other bytes, it works. However, if I change the size, it fails. I'm aware that some sh_offsets are immediately adjacent to each other; however, this shouldn't matter when I'm reducing the size of the executable code.
Of course, there might be a bug in my program (or more than one), but I've already painstakingly gone through it.
Instead of asking for help with debugging my program, I'm just wondering: is there anything other than the sh_size field I need to update in order to make this work (when reducing the size)? Is there anything that would make changing the length fail, other than that field?
Edit:
It seems that Andy Ross was perfectly correct. Even in this very simple program I have come across some indirect addressing in __libc_start_main that I cannot trivially modify to update the offset it will reach.
I was curious though, what would be the best approach to still trying to get as far as possible with this problem? I know I cannot solve this in every case, but for some simple programs, it should be possible to update what is required to make it run? Should I try writing my own virtual machine or try developing a "debugger" that would replace each suspected problem instruction with INT 3? Any ideas?
The text segment is likely internally linked with relative offsets. So one function might be trying to jump to, say, "current address plus 194 bytes". If you move things around such that the jump target is now 190 bytes, you will obviously break things. The same is true of constant data on some architectures (e.g. x86-64 but not i686). There is no simple way short of a complete disassembly to know where the internal references are, and in fact it's computationally undecidable to find them all (i.e. trying to figure out all possible jump targets of a runtime-computed branch is the Halting Problem).
Basically, this isn't solvable in the general case, so if you have an ELF binary from someone else you're trying to patch, you'll need to try other techniques. But with (great!) care it's possible to produce a library where all internal references go through the GOT/PLT which can be sliced up and relinked like this. What are you trying to accomplish?
is there anything other than the sh_size field I need to update in order to make this work
It sounds like you are patching a fully-linked binary (ET_EXEC or ET_DYN). Please note that .sh_size is not used for anything after the static link is done. You can strip the entire section table, and the binary will continue to work fine. What matters at runtime are the segments in the ELF, not sections.
ELF stands for Executable and Linking Format, and "executable" and "linking" are the "dual nature" of ELF. Sections are used at (static) link time and are combined into segments, which are used at execution time (aka runtime, aka dynamic linking time).
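To see the run-time side of that dual nature, glibc's dl_iterate_phdr() reports the program headers (segments) of every loaded object. A small sketch (Linux/glibc, my own example):

#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Print the PT_LOAD segments of every object in this process.  These are
 * program headers, not section headers -- the section table can be stripped
 * without breaking anything. */
static int show_segments(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size; (void)data;
    printf("%s (base 0x%lx)\n",
           info->dlpi_name[0] ? info->dlpi_name : "[main executable]",
           (unsigned long)info->dlpi_addr);

    for (int i = 0; i < info->dlpi_phnum; i++)
        if (info->dlpi_phdr[i].p_type == PT_LOAD)
            printf("  PT_LOAD at 0x%lx, memsz 0x%lx\n",
                   (unsigned long)(info->dlpi_addr + info->dlpi_phdr[i].p_vaddr),
                   (unsigned long)info->dlpi_phdr[i].p_memsz);
    return 0;
}

int main(void)
{
    dl_iterate_phdr(show_segments, NULL);
    return 0;
}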
Of course you haven't told us exactly what your patching strategy is when you are shrinking your binary, and in what way the result is broken. It is very likely that Andy Ross's answer is the real cause of your breakage.
We have a need to access kernel global vars in net/ipv4/af_inet.c that are not exported explicitly from a loadable kernel module. We are using 2.6.18 kernel currently.
kallsyms_lookup_name doesn't appear to be available anymore (not exported)
__symbol_get returns NULL (upon further reading, symbol_get/__symbol_get looks through the kernel's and existing modules' symbol tables, which contain only exported symbols, and it is there to make sure the module from which a symbol is exported is actually loaded)
Is there any way to access symbols that are not exported, from a kernel module?
After doing a lot of reading and looking at answers people provided, it appears that it would be very hard to find one method across many kernel versions since the kAPI changes significantly over time.
You can use the method you mentioned before, getting it from /proc/kallsyms, or just use the address given in System.map (which is the same thing); it may seem hackish, but this is how I've seen it done before (never really had to do it myself). Either that, or you can build your own custom kernel where you actually EXPORT_SYMBOL whatever it is you want exported, but this is not as portable.
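A hedged sketch of what that can look like in practice (the symbol name and the int type are placeholders): look the address up in /proc/kallsyms or System.map when loading the module and hand it in as a module parameter, then cast it to the right type.

/* Sketch of a module that receives the address of a non-exported kernel
 * variable as a load-time parameter, e.g.:
 *   insmod peek.ko target_addr=0x$(awk '$3=="some_symbol"{print $1}' /proc/kallsyms)
 */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>

static unsigned long target_addr;
module_param(target_addr, ulong, 0);
MODULE_PARM_DESC(target_addr, "Address of the non-exported symbol");

static int __init peek_init(void)
{
    int *p;

    if (!target_addr)
        return -EINVAL;
    p = (int *)target_addr;    /* cast to whatever type the symbol really has */
    printk(KERN_INFO "peek: value at %lx is %d\n", target_addr, *p);
    return 0;
}

static void __exit peek_exit(void) { }

module_init(peek_init);
module_exit(peek_exit);
MODULE_LICENSE("GPL");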
If performance is not a big concern, you can traverse the whole list of symbols with kallsyms_on_each_symbol() (exported by the kernel for GPL'd modules) and check the names to get the ones you need. I would not recommend doing so unless there is no other choice though.
If you would like to go this way, here is an example from one of our projects. See the usage of kallsyms_on_each_symbol() as well as the code of symbol_walk_callback(), the other parts are irrelevant to this question.
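The project mentioned above is the place to look for the real code; a stripped-down sketch of the same idea (my own approximation, not their implementation, using the older callback signature that still receives the owning module) looks roughly like this:

#include <linux/kallsyms.h>
#include <linux/module.h>
#include <linux/string.h>

struct sym_request {
    const char   *name;
    unsigned long addr;
};

/* Called once per kernel symbol; remember the address of the one we want. */
static int symbol_walk_callback(void *data, const char *name,
                                struct module *mod, unsigned long addr)
{
    struct sym_request *req = data;

    if (mod == NULL && strcmp(name, req->name) == 0) {   /* core kernel only */
        req->addr = addr;
        return 1;                     /* non-zero stops the iteration */
    }
    return 0;
}

static unsigned long lookup_unexported(const char *name)
{
    struct sym_request req = { .name = name, .addr = 0 };

    kallsyms_on_each_symbol(symbol_walk_callback, &req);
    return req.addr;                  /* 0 if the symbol was not found */
}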
I'm looking for a simple way to reorder the ELF file sections. I've got a sequence of custom sections that I would like all to be aligned in a certain order.
The only way I've found to do it is to use a linker script. However, the documentation indicates that specifying a custom linker script overrides the default. The default linker script has a lot of content in it that I don't want to have to duplicate in my custom script just to get three sections always placed together in a certain order. It does not seem very flexible to hard-code the linker behavior like that.
Why do I want to do this? I have a section of data that I need to know the run-time memory location of (beginning and end). So I've created two additional sections and put sentinel variables in them. I then want to use the memory locations of those variables to know the extent of the unknown section in memory.
.markerA
int markerA;
.targetSection
... Lots of variables ...
.markerB
int markerB;
In the above example, I would know that the data in .targetSection is between the address of markerA and markerB.
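For completeness, this is roughly how the markers above could be emitted from C with GCC section attributes (a sketch; the section names are just the ones from the layout):

/* Sentinel variables in their own sections around the data of interest;
 * the desired link order is .markerA, .targetSection, .markerB. */
__attribute__((section(".markerA")))       int markerA;
__attribute__((section(".targetSection"))) int someVariable;   /* ... lots of variables ... */
__attribute__((section(".markerB")))       int markerB;

/* If the linker keeps that order, everything placed in .targetSection lies
 * between &markerA and &markerB at run time. */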
Is there another way to accomplish this? Are there libraries that would let me read in the currently executing ELF image and determine section location and size?
You can obtain the addresses of loaded sections by analyzing the ELF file format. Details may be found e.g. in
Tool Interface Standard (TIS)
Portable Formats Specification,
version 1.2
(http://refspecs.freestandards.org/elf/elf.pdf)
For a short impression of which information is available, it's worth taking a look at readelf:
readelf -S <filename>
returns a list of all sections contained in <filename>.
The sections which are mapped into memory are of type PROGBITS.
The address you are looking for is displayed in the column Addr.
To obtain the memory location you have to add the load address of your executable / shared object.
There are a few ways to determine the load address of your executable/shared object:
you may parse /proc/[pid]/maps (the first column contains the load address), where [pid] is the process id
if you know one function contained in your file, you can apply dlsym to get a pointer to that function. That pointer is the input parameter for dladdr, which returns a Dl_info struct containing the requested load address (see the sketch below)
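A minimal sketch of the dladdr approach (my own example; known_function stands for any function you know lives in the file of interest; link with -ldl on older glibc versions):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

void known_function(void) { }        /* any function in the file of interest */

int main(void)
{
    Dl_info info;

    if (dladdr((void *)known_function, &info) != 0) {
        printf("object : %s\n", info.dli_fname);
        printf("base   : %p\n", info.dli_fbase);   /* load address to add to Addr */
    }
    return 0;
}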
To get at ELF information, the library libelf may be a helpful companion (I discovered it after studying the above mentioned TIS, so I only took a short look at it and don't know the deeper details).
I hope this sketch of a possible solution will help.
You may consider using GCC's initializers to reference the variables which would otherwise go into a separate section, and maintain all their pointers in an array. I recommend using initializers because this works independently of the file layout.
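If I read that suggestion correctly, it could look roughly like this (a sketch using GCC constructor attributes; all names are made up):

/* Instead of relying on section placement, register each variable's address
 * in a table before main() runs. */
#include <stddef.h>

#define MAX_TRACKED 64

static void  *tracked[MAX_TRACKED];
static size_t tracked_count;

static void track(void *p)
{
    if (tracked_count < MAX_TRACKED)
        tracked[tracked_count++] = p;
}

/* One registration per variable of interest, in whichever file defines it. */
static int importantVariable;

__attribute__((constructor))
static void register_importantVariable(void)
{
    track(&importantVariable);
}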
You may look at ELFIO library. It contains WriteObj and Writer examples. By using the library, you will be able to create/modify ELF binary file programmatically.
I'm afraid overriding the default linker script is the simplest solution.
Since you are worried that it might not be flexible (though I don't think the default linker script changes that often), you could write a script that generates a linker script based on the host system's default ld script ("ld --verbose") and inserts your special sections into it.