When ld-linux (Linux's loader) loads an application, it loads its ELF data structures to memory, builds some structures (e.g., GOT), and passes the execution to the entry point of the loaded application.
Is the loading of this application's code and data done into the loader's address space? Does the execution of the application's code occur in the loader's address space?
If not, what is the mechanism ld-linux uses to pass the execution to the loaded instructions?
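Answer (EDIT): Yes. The application's code and data are loaded into the same address space as the loader, and the loader and the application run in that one address space. Passing execution is therefore an ordinary jump to the application's entry point.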
http://grahamwideman.wordpress.com/2009/02/09/the-linux-loader-and-how-it-finds-libraries/ http://www.tenouk.com/ModuleW.html These links cover assemblers and linkers too. The hierarchy of ld-linux (Linux's loader) is explained very well in the second URL.
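One way to check this for yourself is to look at /proc/<pid>/maps for a running process. Below is a minimal Python sketch (not taken from the linked articles); parse_maps and executable_objects are hypothetical helper names, and the parsing assumes the usual six-column layout of the maps file.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps(text):
    """Parse the text of a /proc/<pid>/maps file into Mapping records."""
    mappings = []
    for line in text.splitlines():
        # Columns: address range, perms, offset, device, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], path))
    return mappings


def executable_objects(mappings):
    """Return the distinct files that have an executable mapping, in order."""
    seen = []
    for m in mappings:
        if "x" in m.perms and m.path.startswith("/") and m.path not in seen:
            seen.append(m.path)
    return seen
```

Feeding it the contents of /proc/self/maps from a dynamically linked program should list the program itself, the ld-linux loader and the shared libraries side by side, all inside one address space.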
Thanks & Regards,
Alok
When a shared library is not compiled as PIC, it can still be linked with the executable through load-time relocation.
If I understand correctly, the dynamic loader looks for the entries listed in the relocation table and modifies them according to the memory mapping. That is, the code of the shared library is adapted for the current process at load time.
My question is: how can another process use the same shared library at the same time? Does the loader guarantee that the memory mappings of the two processes are consistent? Or can the library not be shared, so the OS just loads another copy of the shared library into memory?
When a shared library is not compiled as PIC, it can still be linked with the executable through load-time relocation.
This statement needs qualifiers: it is only possible on some platforms (e.g. ELF32 on i386), but not on others (e.g. ELF64 on x86_64 with the default small memory model).
the dynamic loader looks for the entries listed in the relocation table and modifies them according to the memory mapping. That is, the code of the shared library is adapted for the current process at load time.
Correct. Note that any memory pages that the loader had to update become unshared.
how can another process use the same shared library at the same time
The other process will have its own copies of the loader-updated pages, but will share any unmodified pages with the first process.
does the loader guarantee that the memory mappings of the two processes are consistent?
No.
Or the library cannot be shared, and the OS would just load another copy of shared library into the memory?
Not quite: depending on how many pages need to be modified by the loader, some sharing can still happen.
P.S. When you build the shared library with -fPIC, the number of pages that need updating by the loader is minimized (all the places to be updated are grouped together in the .got section, instead of having these places spread throughout the .text section).
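To make the page-count argument concrete, here is a small Python sketch. The relocation offsets are made-up numbers for illustration only (they do not come from a real library), and a 4096-byte page is assumed.

```python
PAGE_SIZE = 4096  # assumed page size


def pages_touched(reloc_offsets, page_size=PAGE_SIZE):
    """Return the set of page indices that contain at least one relocated slot."""
    return {offset // page_size for offset in reloc_offsets}


# Hypothetical non-PIC library: six relocations scattered through .text.
scattered = [0x0123, 0x1450, 0x2A10, 0x3F00, 0x5008, 0x7FF0]

# Hypothetical PIC library: the same six slots packed into a GOT at 0x9000.
grouped = [0x9000 + 8 * i for i in range(6)]

print(len(pages_touched(scattered)))  # 6 pages become private to the process
print(len(pages_touched(grouped)))    # 1 page becomes private to the process
```

Every page in the returned set is one the loader has to write to, so it is a page that stops being shared between processes.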
It appears that there is no information in the executable about where vdso should be. Assuming that I can control how the program is compiled, linked and written, how can I force it to be at the address I want it to be?
On a 32-bit x86 system:
Is there a way to reserve a particular range of virtual address space in a process memory map to stop ld.so (dynamic linker) from loading any shared objects into that range?
I want to use at least two 1G ranges of virtual memory to map two 1G huge pages. However, ld.so loads the shared libraries in the middle of the address space, so I can't map the 1G huge pages.
The compiler can't do this job, and neither can linker scripts. ld.so is loaded into the executable by the loader, then ld.so loads other shared libraries. However, ld.so itself is even in the middle of the mapped space.
The entry points of ld.so and libc.so are at a higher address, which can't be changed for our application.
Entry point address: 0x46c38810
Thanks,
Jiangtao
ld.so is loaded into the executable by the loader,
No: ld.so is the loader, and it is loaded into the process by the kernel.
You do have a few choices:
The easiest solution is to link the binary fully statically. Note that on Linux such a binary could still dlopen other shared libraries, although this is not a well-supported or well-tested thing to do.
A harder solution is to build your own patched ld.so, and make your application use that ld.so (using the -Wl,--dynamic-linker=... flag).
If you don't want to do that, rtldi may help (it will run before ld.so).
The entry point addresses in the shared libraries are edited by prelink.
prelink exists to avoid load address conflicts between shared libraries, and to optimize and speed up the run-time loader. By default it is on in our system.
prelink is a program that modifies ELF shared libraries and ELF dynamically linked binaries, assigning a unique virtual address space slot to each library, so that the time the dynamic linker needs to perform relocations at startup decreases significantly. Because there are fewer relocations, run-time memory consumption decreases as well.
/usr/sbin/prelink -avmR
prelinks all binaries found in directories specified in /etc/prelink.conf and all their dependent libraries, assigning libraries unique virtual address space slots
With prelink disabled, the libraries are no longer loaded in the middle of the address space, so we can mmap another 1G of memory.
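To see whether a layout leaves room for a 1G mapping, one can scan /proc/<pid>/maps for holes. The Python sketch below is only an illustration: free_gaps is a hypothetical helper, the two sample layouts are invented (the 0x46c00000 library address is merely chosen to resemble the entry point quoted above), and it assumes a 1G huge page needs a 1G-aligned start address.

```python
GIB = 1 << 30


def free_gaps(maps_text, min_size=GIB, align=GIB):
    """Return (aligned_start, end) for each hole between mappings that can
    hold at least min_size bytes starting at an align-aligned address."""
    ranges = []
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        ranges.append((start, end))
    ranges.sort()

    gaps = []
    for (_, prev_end), (next_start, _) in zip(ranges, ranges[1:]):
        first = -(-prev_end // align) * align  # round prev_end up to alignment
        if next_start - first >= min_size:
            gaps.append((first, next_start))
    return gaps


# Hypothetical 32-bit layout with a library prelinked into the middle.
PRELINKED = """\
08048000-08049000 r-xp 00000000 08:01 100 /usr/bin/app
46c00000-46c20000 r-xp 00000000 08:01 200 /lib/ld-2.5.so
bffeb000-c0000000 rw-p 00000000 00:00 0 [stack]
"""

# Hypothetical layout with the library placed near the top instead.
PLAIN = """\
08048000-08049000 r-xp 00000000 08:01 100 /usr/bin/app
b7e00000-b7e20000 r-xp 00000000 08:01 200 /lib/ld-2.5.so
bffeb000-c0000000 rw-p 00000000 00:00 0 [stack]
"""

print(free_gaps(PRELINKED))
print([(hex(a), hex(b)) for a, b in free_gaps(PLAIN)])
```

In the first layout no aligned 1G hole survives; in the second one a hole starting at 0x40000000 opens up.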
I read in some documents that when a shared library is compiled with the -fPIC argument,
the .text segment of the .so will be shared at the dynamic linking stage of a forked process
(i.e. the processes will map the .so to the same physical address).
I am interested in who (the kernel or ld.so) accomplishes this, and how.
Maybe I should trace the code, but I don't know where to start.
Nevertheless, I tried to verify the statement.
I decided to check the address of a function like printf, which is in libc.so, which every C program links against.
I got the virtual address of printf in the process and needed the physical address. I tried writing a kernel module, passing the address value to the kernel, and calling virt_to_phys. But it did not work, because virt_to_phys only works for kmalloc addresses.
So a process page table lookup might be the way to find which physical address the virtual address maps to. Is there any way to do a page table lookup? Or is there another way to carry out this verification experiment?
thanks in advance!
The dynamic loader uses mmap(2) with MAP_PRIVATE and appropriate permissions. You can see exactly what it does by running a command under strace -e file,mmap. For instance:
strace -e file,mmap ls
All the magic comes from mmap(2). mmap(2) creates mappings in the calling process; they are usually backed either by a file or by swap (anonymous mappings). In a file-backed mapping, MAP_PRIVATE means that writes to the memory don't update the file; instead, the written page is copied and backed by swap from that point on (copy-on-write).
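The private-mapping behaviour is easy to try out. Here is a minimal sketch using Python's mmap module, whose ACCESS_COPY mode requests a private copy-on-write mapping (MAP_PRIVATE on Unix). The function name and the temporary file are just scaffolding for the demonstration; this is not what ld.so itself runs.

```python
import mmap
import os
import tempfile


def private_write_leaves_file_untouched():
    """Write through a private mapping and report (memory view, file contents)."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"original")
        path = tmp.name
    try:
        with open(path, "r+b") as f:
            # ACCESS_COPY: writes affect this mapping only, never the file.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
                m[:8] = b"modified"
                in_memory = bytes(m[:8])
        with open(path, "rb") as f:
            on_disk = f.read()
    finally:
        os.unlink(path)
    return in_memory, on_disk


print(private_write_leaves_file_untouched())  # (b'modified', b'original')
```

The write lands in a private copy of the page, which is the same thing that happens when the loader patches a relocated page of a library.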
The dynamic loader gets the info it needs from ELF's program headers, which you can view with:
readelf -l libfoo.so
From these, the dynamic loader determines what to map as code, read-only data, data and bss (zero-filled segment with zero size in file, non-zero size in memory, and a name only matched in crypticness by Lisp's car and cdr).
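As a rough illustration of what the loader reads, here is a Python sketch that pulls the PT_LOAD entries out of an ELF image. It assumes a little-endian ELF64 file; load_segments is a hypothetical helper name, and the field layout follows the standard Elf64_Ehdr and Elf64_Phdr structures.

```python
import struct

PT_LOAD = 1
PF_X, PF_W, PF_R = 1, 2, 4

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes


def load_segments(data):
    """Return one dict per PT_LOAD program header found in an ELF64 image."""
    ident = data[:16]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    hdr = EHDR.unpack_from(data, 0)
    e_phoff, e_phentsize, e_phnum = hdr[5], hdr[9], hdr[10]

    segments = []
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = PHDR.unpack_from(
            data, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue
        perms = (("r" if p_flags & PF_R else "-")
                 + ("w" if p_flags & PF_W else "-")
                 + ("x" if p_flags & PF_X else "-"))
        segments.append({
            "vaddr": p_vaddr,
            "offset": p_offset,
            "perms": perms,
            "filesz": p_filesz,
            # Bytes present in memory but not in the file: the bss part.
            "bss_bytes": p_memsz - p_filesz,
        })
    return segments
```

A segment whose bss_bytes is non-zero is the one that carries the zero-filled tail described above.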
So, in fact, both code and data are shared until a write causes copy-on-write. That is why marking constant data as constant is a potentially important space optimization (see DSO howto).
You can get more info from the mmap(2) manpage and from Documentation/nommu-mmap.txt (read the MMU case; the no-MMU case is for embedded devices, like ADSL routers and the Nintendo DS).
Shared libraries are just a particular use of mapped files.
The address which a file is mapped at in a process's address space has nothing to do with whether it is shared or not.
Pages can be shared even if they are mapped at different addresses.
To find out if pages are being shared, do the following:
Find the addresses that the file(s) are mapped at by examining /proc/pid/maps.
There is a tool which extracts data from /proc/pid/pagemap; find it and use it. It tells you exactly which pages of a mapping are present and which physical location each one is at.
If two processes have a page mapped in at the same physical address, it is, of course, shared.
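If you cannot find such a tool, the pagemap file is simple enough to read by hand. The Python sketch below assumes the documented layout of one 64-bit entry per virtual page (bit 63 = present, bit 62 = swapped, bits 0-54 = page frame number). The function names are hypothetical. Note that on recent kernels the PFN field reads as zero unless the reader has CAP_SYS_ADMIN, so run it as root.

```python
import os
import struct

ENTRY_SIZE = 8  # one 64-bit entry per virtual page


def pagemap_offset(vaddr, page_size=4096):
    """Byte offset in /proc/<pid>/pagemap of the entry covering vaddr."""
    return (vaddr // page_size) * ENTRY_SIZE


def decode_entry(entry):
    """Split a pagemap entry into (present, swapped, pfn or None)."""
    present = bool(entry >> 63 & 1)
    swapped = bool(entry >> 62 & 1)
    pfn = entry & ((1 << 55) - 1) if present else None
    return present, swapped, pfn


def lookup(pid, vaddr):
    """Read and decode the pagemap entry for vaddr in process pid (Linux only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(pagemap_offset(vaddr, page_size))
        (entry,) = struct.unpack("=Q", f.read(ENTRY_SIZE))
    return decode_entry(entry)
```

Calling lookup for the same library page in two processes and comparing the PFNs tells you whether that page is physically shared.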
I would like to collect here what happens when you run an executable on Windows, Linux and OSX. In particular, I would like to understand the exact order of operations. My guess is that the executable file (PE, ELF or Mach-O) is loaded by the kernel (though I don't know the various sections of the ELF (Executable and Linkable Format) and their meaning), then the dynamic linker resolves the references, then the __init part of the executable is run, then main, then __fini, and then the program is complete. I am sure this is very rough, and probably wrong.
Edit: the question is now CW. I am filling it in for Linux. If anyone wants to do the same for Windows and OSX, it would be great.
This is just at a very high and abstract level of course!
Executable - No Shared Library:
Client request to run application
->Shell informs kernel to run binary
->Kernel allocates memory from the pool to fit the binary image into
->Kernel loads binary into memory
->Kernel jumps to specific memory address
->Kernel starts processing the machine code located at this location
->If machine code has stop
->Kernel releases memory back to pool
Executable - Shared Library
Client request to run application
->Shell informs kernel to run binary
->Kernel allocates memory from the pool to fit the binary image into
->Kernel loads binary into memory
->Kernel jumps to specific memory address
->Kernel starts processing the machine code located at this location
->Kernel pushes current location into an execution stack
->Kernel jumps out of current memory to a shared memory location
->Kernel executes code from this shared memory location
->Kernel pops back the last memory location and jumps to that address
->If machine code has stop
->Kernel releases memory back to pool
JavaScript/.NET/Perl/Python/PHP/Ruby (Interpreted Languages)
Client request to run application
->Shell informs kernel to run binary
->Kernel has a hook that recognises that the binary image needs a JIT
->Kernel calls JIT
->JIT loads the code and jumps to a specific address
->JIT reads the code and compiles the instruction into the
machine code that the interpreter is running on
->Interpreter passes machine code to the kernel
->kernel executes the required instruction
->JIT then increments the program counter
->If code has a stop
->JIT releases application from its memory pool
As routeNpingme says, registers are set inside the CPU and the magic happens!
Update: Yeah, I cant speell properly today!
Ok, Answering my own question. This will be done progressively, and only for Linux (and maybe Mach-O). Feel free to add more stuff to your personal answers, so that they get upvoted (and you can get badges, since it's now CW).
I'll start halfway, and build the rest as I find out. This document was made on x86_64 with gcc (GCC) 4.1.2.
Opening the file, initialization
In this section, we describe what happens when the program is invoked, from the kernel point of view, until the program is ready to be executed.
The ELF is opened.
The kernel maps the loadable segment that contains the .text section into memory and marks it read-only and executable (the kernel works from the program headers, not from section names).
The kernel loads the .data section.
The kernel sets up the .bss section and initializes all of its content to zero.
The kernel transfers control to the dynamic linker (whose path is stored inside the ELF file, in the .interp section). The dynamic linker loads the shared libraries and resolves the references to them (function calls may be resolved lazily, on first use).
Control is transferred to the application.
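The .interp lookup in the steps above can be sketched in Python. This is only an illustration, assuming a little-endian ELF64 file; interpreter_path is a hypothetical helper, not a kernel or glibc API.

```python
import struct

PT_INTERP = 3

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes


def interpreter_path(data):
    """Return the dynamic linker path named by PT_INTERP, or None if absent."""
    ident = data[:16]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    hdr = EHDR.unpack_from(data, 0)
    e_phoff, e_phentsize, e_phnum = hdr[5], hdr[9], hdr[10]

    for i in range(e_phnum):
        fields = PHDR.unpack_from(data, e_phoff + i * e_phentsize)
        p_type, p_offset, p_filesz = fields[0], fields[2], fields[5]
        if p_type == PT_INTERP:
            # The segment holds a NUL-terminated path string.
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None
```

On a typical x86_64 glibc system, passing it the bytes of a dynamically linked binary would be expected to give something like /lib64/ld-linux-x86-64.so.2, while a fully static binary has no PT_INTERP entry and gives None.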
Execution of the program
the function _start gets invoked, as the ELF header specifies it as the entry point for the executable
_start calls __libc_start_main in glibc (through the PLT) passing the following information to it
the address of the actual main function
the argc value
the argv address
the address of the _init routine
the address of the _fini routine
a function pointer for the atexit() registration
the highest stack address available
_init gets called
calls call_gmon_start to initialize gmon profiling. not really related to execution.
calls frame_dummy, which wraps __register_frame_info(eh_frame section address, bss section address). This registers the .eh_frame unwind information used for exception handling; the second argument points to a bookkeeping object that lives in .bss.
calls __do_global_ctors_aux, whose role is to call all the global constructors listed in the .ctors section.
main gets called
main ends
_fini gets called, which in turn calls __do_global_dtors_aux to run all destructors as specified in the .dtors section.
the program exits.
On Windows, first the image is loaded into memory. The kernel analyzes which libraries (read "DLL") it is going to require and loads them too.
It then edits the program image to insert the memory addresses of each of the library functions it requires. These addresses have a space in the .EXE binary already, but they are just filled with zeros.
Each DLL's DllMain() procedure then gets executed, one by one, in dependency order, starting with the DLLs that the others depend on most.
Once all libraries were loaded and got ready, finally the image is started, and whatever happens now will depend on language used, compiler used, and the program routine itself.
As soon as the image is loaded into memory, magic takes over.
Well, depending on your exact definition, you have to account for JIT compilers for languages like .NET and Java. When you run a .NET "exe", which isn't technically "executable", the JIT compiler steps in and compiles it.