Why are shared files mapped 4 times in a process? - Linux

I was trying to understand the maps in the /proc file system in Linux. I observed that every shared file was mapped 4 times, with different offsets and permissions. I concluded that these must be the different sections in the ELF and hence are mapped differently (.text, .data, .rodata, etc).
But what was surprising is that two of the mappings always had the same offset in the file. Consider this example:
7fb8eebd6000-7fb8eebe0000 r-xp 00000000 08:06 3285700 /lib/x86_64-linux-gnu/libnss_files-2.19.so
7fb8eebe0000-7fb8eeddf000 ---p 0000a000 08:06 3285700 /lib/x86_64-linux-gnu/libnss_files-2.19.so
7fb8eeddf000-7fb8eede0000 r--p 00009000 08:06 3285700 /lib/x86_64-linux-gnu/libnss_files-2.19.so
7fb8eede0000-7fb8eede1000 rw-p 0000a000 08:06 3285700 /lib/x86_64-linux-gnu/libnss_files-2.19.so
The 2nd and the 4th entries are mapped at the same offset in the file, but with different permissions.
Upon running objdump --headers on the mentioned .so file, file offset 0xa000 seems to be the .got.plt section:
24 .got.plt 00000160 000000000020a000 000000000020a000 0000a000 2**3
CONTENTS, ALLOC, LOAD, DATA
Can someone shed light on why it is mapped twice?
I know that the PLT is patched the first time a function is called, and hence might need write permission, but why is there another mapping with no permissions at all?
Edit: I checked a few other shared library mappings, and it is not always the .got.plt section that is mapped twice. But there is always one region that is mapped twice, and the duplicate mapping always has ---p permissions.

Related

What is this memory section in /proc/maps

I am debugging a memory-growth issue by dumping periodic snapshots of /proc/<pid>/maps to get a summary of leaked areas. While certain sections are obvious, I could not make sense of the one below. Here is a diff of one memory area of the process under observation, taken at a 1-minute interval. Does this represent a stack area (since it grows downward) or something else?
7f0ffe072000-7f0ffeb72000 rw-p 00000000 00:00 0 --> (before)
7f0ffdff2000-7f0ffeb72000 rw-p 00000000 00:00 0 --> (after)
The stack has a proper tag ([stack]) in maps, hence I believe the above section is not the stack. Could someone shed light on this memory area?

Why does the data segment start at a non-page boundary?

I am trying to understand the relationship between a compiled program on Linux and how it is loaded into main memory.
I understand that when a program is loaded into memory, all its virtual pages go into page frames of main memory.
Below is a snippet of the readelf output for my program.
readelf --segments a.out
Elf file type is DYN (Shared object file)
Entry point 0x1060
There are 13 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
......
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000600 0x0000000000000600 R 0x1000
LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
0x0000000000000fc5 0x0000000000000fc5 R E 0x1000
LOAD 0x0000000000002000 0x0000000000002000 0x0000000000002000
0x0000000000000190 0x0000000000000190 R 0x1000
LOAD 0x0000000000002db8 0x0000000000003db8 0x0000000000003db8
0x00000000000002cc 0x00000000000002d0 RW 0x1000
.......
Here,
(1) The first segment (R) starts at virtual address 0x0000000000000000 and has size 0x600. It consumes 1 page.
(2) The second segment (R E, the text segment) starts at virtual address 0x0000000000001000 and has size 0xfc5. It consumes 1 page.
(3) The third segment (R) starts at virtual address 0x0000000000002000 and has size 0x190. It consumes 1 page.
(4) The fourth segment (RW, the data segment) starts at address 0x0000000000003db8. Why?
The fourth segment should have started at 0x0000000000003000, since the third segment has a size of only 0x190 bytes and the Align for this segment is 0x1000 (4096), which is a page boundary.
LOAD 0x0000000000002db8 0x0000000000003db8 0x0000000000003db8
0x00000000000002cc 0x00000000000002d0 RW 0x1000
After loading this program, the virtual memory mappings are:
cat /proc/4125/maps
56028da61000-56028da62000 r--p 00000000 08:05 1442315 a.out
56028da62000-56028da63000 r-xp 00001000 08:05 1442315 a.out
56028da63000-56028da64000 r--p 00002000 08:05 1442315 a.out
56028da64000-56028da65000 r--p 00002000 08:05 1442315 a.out
56028da65000-56028da66000 rw-p 00003000 08:05 1442315 a.out
The linker has put segment 4 starting in a page shared with segment 3, to save 0x0248 bytes of file space. That page is to be mapped into virtual memory in two different places, once at relative address 0x2000 (read-only) and again at relative address 0x3000 (read-write). Since the zeroth page of virtual memory stays unmapped to catch null pointer dereferences, the base address for the program is 0x1000 and the program's virtual memory will contain:
0x3000 - 0x3190: read-only version of segment 3
0x3db8 - 0x4000: read-only version of part of segment 4 (will not be used)
0x4000 - 0x4190: read-write version of segment 3 (will not be used)
0x4db8 - 0x5084: read-write version of segment 4
Now, even though I said the page at 0x4000 would be read-write, the output from /proc/NNN/maps shows it as read-only. It looks like something in the C startup code actually calls mprotect at runtime to change the permissions on that page; you can see it happening with strace.
If you dump the section headers (not the segment headers), you should see that the address range 0x4db8-0x5000 corresponds to the sections .init_array, .fini_array, .dynamic and .got, with the normal writable .data section starting at 0x5000. I haven't looked into it, but I presume that .init_array and friends need to be writable during initialization, but can be read-only for the rest of the program's execution.
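To illustrate that pattern (this is not the loader's actual code, just a minimal sketch of the same mprotect idea applied to an anonymous page):
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    long page = sysconf(_SC_PAGESIZE);

    /* Grab one writable page and fill it in, playing the role of "initialization". */
    char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    strcpy(p, "written once during startup");

    /* Then drop the write permission, analogous to the startup code turning the
       .init_array/.fini_array/.dynamic/.got page read-only after initialization. */
    if (mprotect(p, page, PROT_READ) != 0) { perror("mprotect"); return 1; }

    printf("%s\n", p);      /* reads still work */
    /* p[0] = 'X';          a write here would now fault with SIGSEGV */
    return 0;
}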
The alignment field doesn't mean that the first byte of segment 4 is page-aligned. Rather, the segment's start is rounded down to a multiple of the alignment (to virtual address 0x3000), and the containing page of the file is mapped there on a 0x1000-byte boundary. As explained at "What is p_align in elf header?", the rule is that Offset and VirtAddr must be congruent mod Align. And they are: when either one is divided by 0x1000, the remainder is 0xdb8.
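To make the congruence concrete, here is a small sketch using the numbers from the fourth LOAD header above; it checks that Offset and VirtAddr are congruent mod Align and prints the page-rounded addresses the loader would actually map:
#include <stdio.h>

int main(void) {
    /* Values copied from the fourth LOAD header in the readelf output above. */
    unsigned long p_offset = 0x2db8, p_vaddr = 0x3db8, p_align = 0x1000;

    /* The rule: Offset and VirtAddr must leave the same remainder mod Align. */
    printf("offset mod align = 0x%lx\n", p_offset % p_align);   /* 0xdb8 */
    printf("vaddr  mod align = 0x%lx\n", p_vaddr % p_align);    /* 0xdb8 */

    /* The loader maps whole pages, so the mapping really begins at the
       rounded-down file page and virtual page. */
    printf("file page 0x%lx -> virtual page 0x%lx\n",
           p_offset & ~(p_align - 1), p_vaddr & ~(p_align - 1));
    return 0;
}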

Is it possible to add a customized name for the (non file-backed) mmap region?

Just curious whether it is possible to specify a name for a non-file-backed mmap region? Something like the [New VMA area] in the following example:
$ cat /proc/12345/maps
...
7fc062ef2000-7fc062f84000 r-xp 00000000 08:01 22688328 /usr/local/musl/lib/libc.so
7fc062f85000-7fc062f86000 r--p 00092000 08:01 22688328 /usr/local/musl/lib/libc.so
7fc062f86000-7fc062f87000 rw-p 00093000 08:01 22688328 /usr/local/musl/lib/libc.so
7fc062f87000-7fc062f8a000 rw-p 00000000 00:00 0 [New VMA area]
7fff6c384000-7fff6c3a5000 rw-p 00000000 00:00 0 [stack]
7fff6c3bd000-7fff6c3c0000 r--p 00000000 00:00 0 [vvar]
The content of maps comes from the show_map_vma function in fs/proc/task_mmu.c. Looking at it, if you want a custom name for a non-file-backed mapping, it'd need to come from either vma->vm_ops->name or arch_vma_name. arch_vma_name is architecture-specific (as you'd expect from the name) and serves only to add a few hardcoded names for certain regions, so it's not useful to you. That leaves vma->vm_ops->name as your only possibility, but when you call mmap with MAP_ANONYMOUS, vma_set_anonymous sets vma->vm_ops to NULL. Thus, strictly what you asked for is impossible without custom kernel code.
However, it may still be possible to do what you want. If you actually just want the memory to not be backed by disk, you can create an FD with the memfd_create syscall or a file in a non-disk-backed filesystem such as a tmpfs (e.g., /dev/shm, like shm_open uses). Either of these approaches will give you control over the filename that is used (e.g., /memfd:some_name (deleted) or /dev/shm/some_name).
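For illustration, here is a minimal sketch of the memfd_create route (it assumes glibc 2.27 or later, which declares memfd_create in <sys/mman.h>; the name "my_region" is arbitrary):
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    /* An anonymous, RAM-backed file; its name ends up in /proc/<pid>/maps
       as "/memfd:my_region (deleted)". */
    int fd = memfd_create("my_region", MFD_CLOEXEC);
    if (fd < 0) { perror("memfd_create"); return 1; }

    size_t len = 3 * 4096;
    if (ftruncate(fd, len) < 0) { perror("ftruncate"); return 1; }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Pause so the mapping can be inspected with: grep memfd /proc/<pid>/maps */
    printf("pid %d, mapping at %p; press Enter to exit\n", getpid(), p);
    getchar();

    munmap(p, len);
    close(fd);
    return 0;
}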

GDB find value within memory

I am having trouble with the find command in gdb. What I am trying to do is find a specific value, e.g. 964, within the memory of a process. I have been successful, but my solution takes way too long!
What I did:
cat /proc/16617/maps
Then I get the libc_malloc parts; I know the value is within those:
7e43c000-7e451000 rw-p 00000000 00:00 0 [anon:libc_malloc]
7e452000-7e45b000 rw-p 00000000 00:00 0 [anon:libc_malloc]
7e470000-7e47c000 rw-p 00000000 00:00 0 [anon:libc_malloc]
7e47d000-7e490000 rw-p 00000000 00:00 0 [anon:libc_malloc]
7e4cc000-7e4dc000 rw-p 00000000 00:00 0 [anon:libc_malloc]
Then I could use the examine command to investigate each value. This works, but it takes so much time. I guess the find command within gdb would be much faster at this.
0x78b0e070: 0 829055599 57 59
0x78b0e080: 2 2024857820 2024857860 2024857900
0x78b0e090: 1970810756 4587520 71 0
0x78b0e0a0: 0 2024857756 2024857756 0
0x78b0e0b0: 0 27 1970675312 1
0x78b0e0c0: 1 2024857728 0 43
0x78b0e0d0: 23 23 0 1936029041
The addresses do not match because these are from different runs, but it is working :)
x/dw 0x78B0E19c --> Result 964
My find command was something like this:
find 0x419a1000, 0x7e5b6000, 964
But I get these results:
0x419a3058
0x419a30c0
0x419a348c
0x419a3e7d
0x419a3fec
5 patterns found.
And this is not true. It can't be that the value is at just 5 addresses; it should be many more. It also seems that it's not searching correctly, because the hits are all within the 0x419a3xxx range.
I am searching for an int value on an ARM architecture, but that shouldn't matter. What am I doing wrong? Could you please provide some examples of how to find all occurrences of an int value (964) within the address space of a process, and reasonably fast? :)
Thank you!

Why do Linux program .text sections start at 0x08048000 and stack tops start at 0xbfffffff?

According to Assembly Primer For Hackers (Part 2) Virtual Memory Organization, Linux program .text sections start at 0x08048000 and stack tops start at 0xbfffffff. What is the significance of these numbers? Why not start .text at 0x00000000 (or 0x00000020 or 0x00000040 to go the next 32 or 64 bits past NULL)? Why not start the top of the stack at 0xffffffff?
Let's start by saying this: most of the time, the various sections do not need to be placed at a specific location; what matters more is the layout. Nowadays, the stack top is actually randomised.
0x08048000 is the default address at which ld starts the first PT_LOAD segment on Linux/x86. On Linux/amd64 the default is 0x400000, and you can change the default by using a custom linker script. You can also change where the .text section starts with the -Wl,-Ttext,0xNNNNNNNN flag to gcc. To understand why .text is not mapped at address 0, keep in mind that the NULL pointer is usually represented as ((void *) 0) for convenience. It is useful, then, that the zero page is mapped inaccessible, to trap uses of NULL pointers. The memory before the start of .text is actually used by a lot of things; take cat /proc/self/maps as an example:
$ cat /proc/self/maps
001c0000-00317000 r-xp 00000000 08:01 245836 /lib/libc-2.12.1.so
00317000-00318000 ---p 00157000 08:01 245836 /lib/libc-2.12.1.so
00318000-0031a000 r--p 00157000 08:01 245836 /lib/libc-2.12.1.so
0031a000-0031b000 rw-p 00159000 08:01 245836 /lib/libc-2.12.1.so
0031b000-0031e000 rw-p 00000000 00:00 0
00376000-00377000 r-xp 00000000 00:00 0 [vdso]
00852000-0086e000 r-xp 00000000 08:01 245783 /lib/ld-2.12.1.so
0086e000-0086f000 r--p 0001b000 08:01 245783 /lib/ld-2.12.1.so
0086f000-00870000 rw-p 0001c000 08:01 245783 /lib/ld-2.12.1.so
08048000-08051000 r-xp 00000000 08:01 2244617 /bin/cat
08051000-08052000 r--p 00008000 08:01 2244617 /bin/cat
08052000-08053000 rw-p 00009000 08:01 2244617 /bin/cat
09ab5000-09ad6000 rw-p 00000000 00:00 0 [heap]
b7502000-b7702000 r--p 00000000 08:01 4456455 /usr/lib/locale/locale-archive
b7702000-b7703000 rw-p 00000000 00:00 0
b771b000-b771c000 r--p 002a1000 08:01 4456455 /usr/lib/locale/locale-archive
b771c000-b771e000 rw-p 00000000 00:00 0
bfbd9000-bfbfa000 rw-p 00000000 00:00 0 [stack]
What we see here is the C library, the dynamic loader ld.so and the kernel VDSO (kernel mapped dynamic code library that provides some interfaces to the kernel). Note that the start of the heap is also randomised.
There's not much of significance.
The stack typically grows downwards (to the lower addresses) and so it's somewhat reasonable (but not mandatory) to place it at high addresses and have some room for its expansion towards the lower addresses.
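A quick way to see that layout from inside a process is to print the address of some code and of a local variable; this is only a rough sketch, and the exact addresses vary from system to system (and with ASLR):
#include <stdio.h>

int main(void) {
    int on_stack = 0;
    /* Code sits near the (low) load address, the stack near the top of the
       user address space, with a large gap in between for growth. */
    printf("code  (main)     : %p\n", (void *)main);
    printf("stack (local var): %p\n", (void *)&on_stack);
    return 0;
}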
As for not using address 0 for program sections, there's some logic here. First, a lot of software uses 0 for NULL, a legal invalid pointer in C and C++, which should not be dereferenced. A lot of software has bugs in that it actually attempts to read or write memory at address 0 without proper pointer validation. If you make the memory area around address 0 inaccessible to the program, you can spot some of these bugs (the program will crash or stop in the debugger). Also, since NULL is a legal invalid pointer, there should be no data or code at that address (if there is, you are unable to distinguish a pointer to it from NULL).
On the x86 platform the memory around address 0 is typically made inaccessible by means of virtual to physical address translation. The page tables get set up in such a way that the entry for virtual address 0 is not backed up by a page of physical memory, and a page is usually 4 KB in size and not just a handful of bytes. That's why if you take out address 0, you take out addresses 1 through 4095 as well. It's also reasonable to take out more than 4 KB of the address space at address 0. The reason for that is pointers to structures in C and C++. You can have a NULL pointer to a structure and when you dereference it, the attempted memory access occurs at the address contained in the pointer (0) plus the distance between the structure member you're trying to access and the beginning of the structure (0 for the first member, greater than 0 for the rest).
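As a small illustration of the struct-member point (a hypothetical struct, not anything from the programs above):
#include <stdio.h>
#include <stddef.h>

struct request {
    char payload[8000];   /* pushes the next member far past address 0 */
    int  status;
};

int main(void) {
    /* If a NULL struct request pointer were dereferenced as p->status,
       the faulting access would be at roughly address 8000, not at 0 itself,
       which is why more than one page at address 0 is often left unmapped. */
    printf("offset of status: %zu\n", offsetof(struct request, status));
    return 0;
}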
There may be some other considerations for choosing specific ranges of addresses for programs, but I cannot speak for all of them. The OS may want to keep some program-related stuff (data structures) within the program itself, so why not use a fixed location for that near one of the ends of the accessible portion of the address space?
