GDB find value within memory - search

I am having trouble with the find command in GDB. What I am trying to do is find a specific value, e.g. 964, within the memory of a process. I have been successful, but my solution takes way too long!
What I did:
cat /proc/16617/maps
Then I pick out the libc_malloc parts, since I know the value is within those.
7e43c000-7e451000 rw-p 00000000 00:00 0 [anon:libc_malloc]
7e452000-7e45b000 rw-p 00000000 00:00 0 [anon:libc_malloc]
7e470000-7e47c000 rw-p 00000000 00:00 0 [anon:libc_malloc]
7e47d000-7e490000 rw-p 00000000 00:00 0 [anon:libc_malloc]
7e4cc000-7e4dc000 rw-p 00000000 00:00 0 [anon:libc_malloc]
Then I could use the examine command to inspect each value. This works, but it takes so much time. I guess the find command within GDB would be much faster at this.
0x78b0e070: 0 829055599 57 59
0x78b0e080: 2 2024857820 2024857860 2024857900
0x78b0e090: 1970810756 4587520 71 0
0x78b0e0a0: 0 2024857756 2024857756 0
0x78b0e0b0: 0 27 1970675312 1
0x78b0e0c0: 1 2024857728 0 43
0x78b0e0d0: 23 23 0 1936029041
The addresses do not match across listings because they come from several runs, but it is working :)
x/dw 0x78B0E19c --> Result 964
My find command was something like this:
find 0x419a1000, 0x7e5b6000, 964
But I only get these results:
0x419a3058
0x419a30c0
0x419a348c
0x419a3e7d
0x419a3fec
5 patterns found.
And this is not true. It can't be possible that the value is at just 5 addresses; it should be way more. It also seems that it is not searching correctly, because all the hits fall within the 0x419a3xxx range.
I am searching for an int value on an ARM architecture, but that shouldn't matter. What am I doing wrong? Could you please provide some examples of how to find all occurrences of an int value (964) within the address space of a process, and do it reasonably fast? :)
Thank you!
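A sketch of one faster approach, using the libc_malloc ranges from the maps listing above: run find over each readable region separately, with an explicit 4-byte word size. (Some GDB versions stop a search as soon as they hit unreadable memory, which could explain why all hits above fall inside the small 0x419a3xxx range; searching region by region avoids crossing unmapped gaps.)
find /w 0x7e43c000, 0x7e451000, 964
find /w 0x7e452000, 0x7e45b000, 964
find /w 0x7e470000, 0x7e47c000, 964
find /w 0x7e47d000, 0x7e490000, 964
find /w 0x7e4cc000, 0x7e4dc000, 964
After each search, GDB stores the number of matches in $numfound and the address of the last match in $_, which makes it easy to compare the per-region results against the manual x/dw inspection.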

Related

What is this memory section in /proc/maps

I am debugging a memory increase issue by dumping periodic snapshots of /proc//maps to get a summary of leaked areas. While certain sections are obvious, I could not comprehend the one below. Here is a diff of the memory areas of the process under observation over a 1-minute interval. Does this represent a stack area (since it grows downward) or something else?
7f0ffe072000-7f0ffeb72000 rw-p 00000000 00:00 0 --> (before)
7f0ffdff2000-7f0ffeb72000 rw-p 00000000 00:00 0 --> (after)
The stack has a proper tag in maps, hence I believe the above section is not the stack. Could someone offer guidance on this memory area?
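One way to get more detail on such a region, as a sketch (the pid is a placeholder): pull its entry out of /proc/<pid>/smaps, whose VmFlags line (on reasonably recent kernels) includes gd for mappings that grow down like a stack:
awk '/^7f0ffdff2000/,/^VmFlags/' /proc/<pid>/smaps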

Is it possible to add a customized name for the (non file-backed) mmap region?

Just curious whether it is possible to specify a name for a non-file-backed mmap region? Something like the [New VMA area] in the following example:
$ cat /proc/12345/maps
...
7fc062ef2000-7fc062f84000 r-xp 00000000 08:01 22688328 /usr/local/musl/lib/libc.so
7fc062f85000-7fc062f86000 r--p 00092000 08:01 22688328 /usr/local/musl/lib/libc.so
7fc062f86000-7fc062f87000 rw-p 00093000 08:01 22688328 /usr/local/musl/lib/libc.so
7fc062f87000-7fc062f8a000 rw-p 00000000 00:00 0 [New VMA area]
7fff6c384000-7fff6c3a5000 rw-p 00000000 00:00 0 [stack]
7fff6c3bd000-7fff6c3c0000 r--p 00000000 00:00 0 [vvar]
The content of maps comes from the show_map_vma function in fs/proc/task_mmu.c. Looking at it, if you want a custom name for a non-file-backed mapping, it'd need to come from either vma->vm_ops->name or arch_vma_name. arch_vma_name is architecture-specific (as you'd expect from the name) and serves only to add a few hardcoded names for certain regions, so it's not useful to you. That leaves vma->vm_ops->name as your only possibility, but when you call mmap with MAP_ANONYMOUS, vma_set_anonymous sets vma->vm_ops to NULL. Thus, strictly speaking, what you asked for is impossible without custom kernel code.
However, it may still be possible to do what you want. If you actually just want the memory to not be backed by disk, you can create an FD with the memfd_create syscall or a file in a non-disk-backed filesystem such as a tmpfs (e.g., /dev/shm, like shm_open uses). Either of these approaches will give you control over the filename that is used (e.g., /memfd:some_name (deleted) or /dev/shm/some_name).
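A rough sketch of the memfd_create route (the region name my_region is made up for illustration; it needs Linux 3.17+ and a libc that exposes memfd_create, e.g. glibc 2.27+):
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    /* Create an anonymous, RAM-backed file; the chosen name becomes part of the maps entry */
    int fd = memfd_create("my_region", MFD_CLOEXEC);
    if (fd < 0) { perror("memfd_create"); return 1; }
    if (ftruncate(fd, 4096) < 0) { perror("ftruncate"); return 1; }

    void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* The region now shows up as "/memfd:my_region (deleted)" rather than as an unnamed mapping */
    char cmd[64];
    snprintf(cmd, sizeof cmd, "grep my_region /proc/%d/maps", getpid());
    system(cmd);

    munmap(p, 4096);
    close(fd);
    return 0;
}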

Determining the source of Qemu guest instructions when using in_asm

I'm trying to gather statistics about the percentage of library code that is used vs. executed. To do this, I'm invoking Qemu-user with the -d in_asm flag. I log this to a file and get a sizeable listing of the translated instructions that looks like this:
----------------
IN:
0x4001a0f1e9: 48 83 c4 30 addq $0x30, %rsp
0x4001a0f1ed: 85 c0 testl %eax, %eax
0x4001a0f1ef: 74 b7 je 0x4001a0f1a8
----------------
IN:
0x4001a0f1f1: 49 8b 0c 24 movq (%r12), %rcx
0x4001a0f1f5: 48 83 7c 24 50 00 cmpq $0, 0x50(%rsp)
0x4001a0f1fb: 0f 84 37 01 00 00 je 0x4001a0f338
----------------
To map blocks to their associated files, I extract /proc/pid/maps for the qemu process and compare the addresses of the executed instructions to the address ranges of files within the guest program. This appears to work reasonably well; however, the majority of the executed instructions appear to lie outside any of the files contained within the maps file. The bottom of the guest address space is listed as follows:
.
.
.
40020a0000-4002111000 r--p 00000000 103:02 2622381 /lib/x86_64-linux-gnu/libpcre.so.3.13.3
4002111000-4002112000 r--p 00070000 103:02 2622381 /lib/x86_64-linux-gnu/libpcre.so.3.13.3
4002112000-4002113000 rw-p 00071000 103:02 2622381 /lib/x86_64-linux-gnu/libpcre.so.3.13.3
4002113000-4002115000 rw-p 00000000 00:00 0
555555554000-5555555a1000 r--p 00000000 103:02 12462104 /home/name/Downloads/qemu-5.2.0/exe/bin/qemu-x86_64
The guest program appears to end at 0x4002115000, with a sizeable gap between the guest and Qemu, which begins at 0x555555554000. I can match instructions in the libraries to the actual binaries, so the approach isn't entirely faulty. However, there are almost 60,000 executed blocks whose origin is between 0x400aa20000 and 0x407c8ae138. This region of memory is nominally unmapped; however, Qemu seems to be translating and successfully executing code here. The program appears to run correctly, so I am unsure where these instructions originate. I had initially thought it might be the vDSO, but the range appears to be much too large, and there are too many separate addresses. I looked at the preceding code for a couple of these blocks and it was in ld.so, but I can't say whether all the calls are generated there. I think it's possible that this is kernel code, but I'm not sure how to validate whether or not that is true. I'm at a loss as to how to approach this problem.
Is there a way to trace the provenance of these instructions, perhaps using the gdb stub or some other logging functionality?
When you are searching in /proc/pid/maps, the corresponding modules may already be unloaded. Running LD_DEBUG=files <your qemu command line> will print module loading info, including each module's load address and size. Search there for the missing code addresses.
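For example (qemu-x86_64 and the guest binary path are placeholders for your actual command line; the loader writes these messages to stderr, and the exact wording can vary between glibc versions):
LD_DEBUG=files qemu-x86_64 ./guest-binary 2> ld_files.log
grep 'base:' ld_files.log      # each object's load base and size as reported by the dynamic loader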

Why are shared files mapped 4 times in a process

I was trying to understand the maps in the /proc file system in Linux. I observed that every shared file was mapped 4 times, with different offsets and permissions. I concluded that these must be the different sections in the ELF and hence are mapped differently (.text, .data, .rodata, etc.).
But what was surprising is that two of the mappings always had the same offset in the file. Consider this example:
7fb8eebd6000-7fb8eebe0000 r-xp 00000000 08:06 3285700 /lib/x86_64-linux-gnu/libnss_files-2.19.so
7fb8eebe0000-7fb8eeddf000 ---p 0000a000 08:06 3285700 /lib/x86_64-linux-gnu/libnss_files-2.19.so
7fb8eeddf000-7fb8eede0000 r--p 00009000 08:06 3285700 /lib/x86_64-linux-gnu/libnss_files-2.19.so
7fb8eede0000-7fb8eede1000 rw-p 0000a000 08:06 3285700 /lib/x86_64-linux-gnu/libnss_files-2.19.so
The 2nd and the 4th entries are mapped at the same offset in the file, with different permissions.
Upon running objdump --headers on the mentioned .so file, file offset 0xa000 seems to be the .got.plt section.
24 .got.plt 00000160 000000000020a000 000000000020a000 0000a000 2**3
CONTENTS, ALLOC, LOAD, DATA
Can someone throw light on why it is mapped twice?
I know that the PLT is patched the first time a function is visited and hence might need write permission, but why is there another mapping without any read/write permissions?
Edit: I checked a few other shared library mappings, and there it is not the .got.plt section that is mapped twice. But there is always one section that is mapped twice, and the duplicate mapping always has ---p permissions.
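For cross-checking, it may help to compare the maps entries against the ELF program headers rather than the section headers (same library as above; the output columns vary a little between binutils versions):
readelf -lW /lib/x86_64-linux-gnu/libnss_files-2.19.so | grep LOAD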

Why do Linux program .text sections start at 0x08048000 and stack tops start at 0xbfffffff?

According to Assembly Primer For Hackers (Part 2) Virtual Memory Organization, Linux program .text sections start at 0x08048000 and stack tops start at 0xbfffffff. What is the significance of these numbers? Why not start .text at 0x00000000 (or 0x00000020 or 0x00000040, to go the next 32 or 64 bits past NULL)? Why not start the top of the stack at 0xffffffff?
Let's start by saying this: most of the time, the various sections do not need to be placed at a specific location; what matters more is the layout. Nowadays, the stack top is actually randomised, see here.
0x08048000 is the default address on which ld starts the first PT_LOAD segment on Linux/x86. On Linux/amd64 the default is 0x400000 and you can change the default by using a custom linker script. You can also change where .text section starts with the -Wl,-Ttext,0xNNNNNNNN flag to gcc. To understand why .text is not mapped at address 0, keep in mind that the NULL pointer is usually mapped to ((void *) 0) for convenience. It is useful, then, that the zero page is mapped inaccessible to trap uses of NULL pointers. The memory before the start of .text is actually used by a lot of things; take cat /proc/self/maps as an example:
$ cat /proc/self/maps
001c0000-00317000 r-xp 00000000 08:01 245836 /lib/libc-2.12.1.so
00317000-00318000 ---p 00157000 08:01 245836 /lib/libc-2.12.1.so
00318000-0031a000 r--p 00157000 08:01 245836 /lib/libc-2.12.1.so
0031a000-0031b000 rw-p 00159000 08:01 245836 /lib/libc-2.12.1.so
0031b000-0031e000 rw-p 00000000 00:00 0
00376000-00377000 r-xp 00000000 00:00 0 [vdso]
00852000-0086e000 r-xp 00000000 08:01 245783 /lib/ld-2.12.1.so
0086e000-0086f000 r--p 0001b000 08:01 245783 /lib/ld-2.12.1.so
0086f000-00870000 rw-p 0001c000 08:01 245783 /lib/ld-2.12.1.so
08048000-08051000 r-xp 00000000 08:01 2244617 /bin/cat
08051000-08052000 r--p 00008000 08:01 2244617 /bin/cat
08052000-08053000 rw-p 00009000 08:01 2244617 /bin/cat
09ab5000-09ad6000 rw-p 00000000 00:00 0 [heap]
b7502000-b7702000 r--p 00000000 08:01 4456455 /usr/lib/locale/locale-archive
b7702000-b7703000 rw-p 00000000 00:00 0
b771b000-b771c000 r--p 002a1000 08:01 4456455 /usr/lib/locale/locale-archive
b771c000-b771e000 rw-p 00000000 00:00 0
bfbd9000-bfbfa000 rw-p 00000000 00:00 0 [stack]
What we see here is the C library, the dynamic loader ld.so and the kernel VDSO (a kernel-mapped dynamic code library that provides some interfaces to the kernel). Note that the start of the heap is also randomised.
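As a small illustration of the -Wl,-Ttext flag mentioned above (hello.c and the address 0x800000 are arbitrary placeholders; this assumes a non-PIE build, and results can differ between linker versions):
gcc -no-pie -Wl,-Ttext,0x800000 -o hello hello.c
readelf -SW hello | grep ' \.text'     # .text should now be placed at the requested address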
There's not much of significance.
The stack typically grows downwards (to the lower addresses) and so it's somewhat reasonable (but not mandatory) to place it at high addresses and have some room for its expansion towards the lower addresses.
As for not using address 0 for program sections, there's some logic here. First, a lot of software uses 0 for NULL, a legal invalid pointer in C and C++, which should not be dereferenced. A lot of software has bugs in that it actually attempts to read or write memory at address 0 without proper pointer validation. If you make the memory area around address 0 inaccessible to the program, you can spot some of these bugs (the program will crash or stop in the debugger). Also, since NULL is a legal invalid pointer, there should be no data or code at that address (if there is, you are unable to distinguish a pointer to it from NULL).
On the x86 platform the memory around address 0 is typically made inaccessible by means of virtual to physical address translation. The page tables get set up in such a way that the entry for virtual address 0 is not backed up by a page of physical memory, and a page is usually 4 KB in size and not just a handful of bytes. That's why if you take out address 0, you take out addresses 1 through 4095 as well. It's also reasonable to take out more than 4 KB of the address space at address 0. The reason for that is pointers to structures in C and C++. You can have a NULL pointer to a structure and when you dereference it, the attempted memory access occurs at the address contained in the pointer (0) plus the distance between the structure member you're trying to access and the beginning of the structure (0 for the first member, greater than 0 for the rest).
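A tiny sketch of that last point about structure members (the struct and its fields are invented for illustration):
#include <stddef.h>
#include <stdio.h>

struct widget {
    char header[64];     /* pushes the next member 64 bytes past the start of the struct */
    int  value;
};

int main(void) {
    /* If a struct widget pointer were NULL, accessing ->value would fault at
       address 0 + offsetof(struct widget, value), i.e. at 64 -- still inside the
       unmapped zero page, so the bad dereference is still caught. */
    printf("offsetof(struct widget, value) = %zu\n", offsetof(struct widget, value));
    return 0;
}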
There may be some other considerations for choosing specific ranges of addresses for programs, but I cannot speak for all of them. The OS may want to keep some program-related stuff (data structures) within the program itself, so why not use a fixed location for that near one of the ends of the accessible portion of the address space?