how to reserve a particular range of virtual memory from a linux process - linux

i86-32 bits system:
Is there a way to reserve a particular range of virtual address space in a process memory map to stop ld.so (dynamic linker) from loading any shared objects into that range?
I want to use at least 2 1G virtual memory to map the two 1G huge pages, however, ld.so load the shared library in the middle, so I can't map the 1G huge pages.
Compiler can't do this job. linker scripts can't as well. ld.so is loaded into the executable by the loader, then ld.so loads other shared libraries. however, ld.so itself even in the middle of the mapped space.
entry point of ld.so and libc.so are at a higher address, which can't be changed for our application.
Entry point address: 0x46c38810
Thanks,
Jiangtao

ld.so is loaded into the executable by the loader,
No: ld.so is the loader, and it is loaded into the process by the kernel.
You do have a few choices:
the easiest solution is link the binary fully-statically. Note that on Linux such binary could still dlopen other shared libraries, although this is not a well-supported or well-tested thing to do.
harder solution is to build your own patched ld.so, and make your application use that ld.so (using -Wl,--dynamic-linker=... flag).
if you don't want to do that, rtldi may help (it will run before ld.so).

The entry point address in the shared libs are edited by the prelink.
prelink is to avoid conflicts of load address of shared libraries, to optimize and speed-up run-time loader. By default it's on in our system.
prelink is a program that modifies ELF shared libraries and ELF dynamically linked binaries and assigns a unique virtual address space slot to each libs. In such a way that the time needed for the dynamic linker to perform relocations at startup significantly decreases. Due to fewer relocations, the run-time memory consumption decreases as well.
/usr/sbin/prelink -avmR
prelinks all binaries found in directories specified in /etc/prelink.conf and all their dependent libraries, assigning libraries unique virtual address space slots
By disabling the prelink, the entry point is not in the middle of the lib. so we can get another 1G memory mmaped.

Related

How can process share shared library with load-time relocation?

When shared library was not complied as PIC, it still can be linked with the executable thorugh load-time relocation.
If I understand correctly, dynamic loader would look for entries listed in relocation table, and modifies them according to the memory mapping. That is, the code of shared library was adapted for the current process during load time.
My question is, how can another process uses the same shared library at the same time, dose the loader guarantee the memory mapping of the two processes are consistent? Or the library cannot be shared, and the OS would just load another copy of shared library into the memory?
When shared library was not complied as PIC, it still can be linked with the executable thorugh load-time relocation.
This statement needs qualifiers: this is only possible on some platforms (e.g. ELF32 ix86), but not others (e.g. ELF64 x86_64 with default medium memory model).
dynamic loader would look for entries listed in relocation table, and modifies them according to the memory mapping. That is, the code of shared library was adapted for the current process during load time.
Correct. Note that any memory pages that the loader had to update become unshared.
how can another process uses the same shared library at the same time
The other process will have its own copies of the loader-updated pages, but will share any unmodified pages with the first process.
dose the loader guarantee the memory mapping of the two processes are consistent?
No.
Or the library cannot be shared, and the OS would just load another copy of shared library into the memory?
Not quite: depending on how many pages need to be modified by the loader, some sharing can still happen.
P.S. When you build the shared library with -fPIC, the number of pages that need updating by the loader is minimized (all the places to be updated are grouped together in the .got section, instead of having these places spread throughout the .text section).

I am wondering if pie does anything if the aslr is turned off on the system? Or is pie dependant on aslr?

I am also wondering what does pie and what does aslr effect in memory as I understand, aslr randomizes addresses of libc base,stack and heap. And pie randomizes elf base and with that .text,.data,.bss,.rodata...
Is that correct or am I getting it wrong?
PIE requires position-independent code, costing a small amount of performance. (Or a large amount like 15% on ISAs like 32-bit x86 that don't easily support PC-relative addressing). See 32-bit absolute addresses no longer allowed in x86-64 Linux?.
With ASLR disabled, e.g. when running under GDB or with it disabled system-wide, Linux chooses a base address of 0x0000555555555000 to map the executable, so objdump -d addresses relative to the file start like 0x4000 end up at that high virtual address.
A PIE executable is an ELF shared object (like a .so), as opposed to an ELF "executable". An ELF executable has a base address in the ELF headers, set by the linker, so absolute addresses can be hard-coded into the machine code and data without needing relocations for fixups. (That's why there's no way to ASLR a proper ELF executable, only a PIE).
The mechanism that supports PIE was originally just a fun hack of putting an entry point in a library. Later people realized it would be useful for ASLR for static code/data in executables to be possible, so this existing support became official. (Or something like that; I haven't read up on the history.)
But anyway, ASLR is enabled by PIE, but PIE is a thing even with ASLR disabled, if you want the most general non-technical description.

Which libraries appear in /proc/$PID/pmaps?

On Linux you can inspect /proc/$PID/pmaps to see the libraries loaded by a particular program, and a program can open /proc/self/pmaps to examine the libraries it itself has loaded.
I know pmaps will only contain dynamic libraries, and obviously the kernel can't predict which libraries we might dlopen at a later point, so I expect those aren't included in /proc/self/maps. But I'm unsure of a few other other scenarios:
Are libraries that have been linked at build time but we haven't called any function in yet included? My understanding is the Linux delays linking symbols until the first time they are used, so I'm not sure if they'll show up.
Does pmaps contain all the libraries used recursively? E.g. if I look at each library in pmaps and run ldd on it, and then run ldd on those, ad nauseum, I shouldn't find any new libraries that weren't in the original pmaps? I tried this on a couple binaries and it appears to be so but maybe I'm getting lucky.
Are libraries that have been linked at build time but we haven't called any function in yet included?
Yes: the runtime loader will mmap every library that your executable directly depends on, before your program starts running.
You can find the list of such libraries by running
readelf -d a.out | grep NEEDED
Does pmaps contain all the libraries used recursively?
Yes: if a library that you directly depend on itself depends on some other library, the runtime loader will mmap the recursive dependencies as well.
My understanding is the Linux delays linking symbols until the first time they are used
That is mosty correct for function symbols, but false for data symbols, which can't be resolved lazily.
Also, whether the symbols are resolved lazily or not depends on LD_BIND_NOW environment variable, and on an equivalent setting in the executable dynamic section, controlled by -znow linker flag.
None of that changes the mmap pciture though; if you have a DT_NEEDED entry for foo.so in your dynamic section, then foo.so will be mmaped (and will show in /proc/self/*map*) independent of lazy or non-lazy resolution.
/proc/$pid/maps is not only going to list libraries that are loaded, but also ALL other mapped memory segments.
Read this thread and the article in there:
Understanding Linux /proc/id/maps

does dynamic library shared global variable in linux

As we know, linux call ldconfig to load all *.so libraries and then link the applications who use the shared library. However, I am confused how the global variable is working in this case. Since there is only one copy of shared library across all these application, do they share the global variables in the shared library? If yes, then how they synchronize?
Thanks,
No it is not shared - the code/text section of the library is shared - the data portion is unique to each process that uses the library
As I commented:
Levine's book on linkers and loaders is a useful reference.
Linux dynamic linker ld.so is free software, part of GNU libc and you can study and improve its source code
the dynamic linker is ld.so not ldconfig (which just updated cached information used by ld.so).
the ld.so linker is using the mmap(2) system call to project some .so segments into the virtual address space of the process; the "text" segment (for code and read-only constants) uses MAP_SHARED with PROT_READ. The "data" segment (for global or static variables in C or C++) uses MAP_PRIVATE with PROT_WRITE
you would learn a lot by strace-ing your program to get a feeling of the involved system calls.

How the share library be shared by different processes?

I read some documents that share library comiled with -fPIC argument,
the .text seqment of the .so will be shared at process fork's dynamic linking stage
(eq. the process will map the .so to the same physical address)
i am interested in who (the kernel or ld.so ) and how to accomplish this?
maybe i should trace the code, but i dont know where to start it.
Nevertheless, i try to verify the statement.
I decide to check the function address like printf which is in the libc.so that all c program will link.
I get the printf virtual address of the process and need to get the physical address. Tried to write a kernel module and pass the address value to kernel, then call virt_to_phys. But it did not work cause the virt_to_phys only works for kmalloc address.
So, process page table look-at might be the solution to find the virtual address map to physical address. Were there any ways to do page table look-at? Or othere ways can fit the verify experiment?
thanks in advance!
The dynamic loader uses mmap(2) with MAP_PRIVATE and appropriate permissions. You can see what it does exactly by running a command from strace -e file,mmap. For instance:
strace -e file,mmap ls
All the magic comes from mmap(2). mmap(2) creates mappings in the calling process, they are usually backed either by a file or by swap (anonymous mappings). In a file-backed mapping, MAP_PRIVATE means that writes to the memory don't update the file, and cause that page to be backed by swap from that point on (copy-on-write).
The dynamic loader gets the info it needs from ELF's program headers, which you can view with:
readelf -l libfoo.so
From these, the dynamic loader determines what to map as code, read-only data, data and bss (zero-filled segment with zero size in file, non-zero size in memory, and a name only matched in crypticness by Lisp's car and cdr).
So, in fact, code and also data is shared, until a write causes copy-on-write. That is why marking constant data as constant is a potentially important space optimization (see DSO howto).
You can get more info on the mmap(2) manpage, and in Documentation/nommu-mmap.txt (the MMU case, no-MMU is for embedded devices, like ADSL routers and the Nintendo DS).
Shared libraries just a particular use of mapped files.
The address which a file is mapped at in a process's address space has nothing to do with whether it is shared or not.
Pages can be shared even if they are mapped at different addresses.
To find out if pages are being shared, do the following:
Find the address that the file(s) are mapped at by examining /proc/pid/maps
There is a tool which extracts data from /proc/pid/pagemap - find it and use it. This gives you info as to exactly which page(s) of a mapping are present and what physical location they are at
If two processes have a page mapped in at the same physical address, it is of course, shared.

Resources