How to get address space layout from Intel Pin on Linux?

I want to get the address space layout from Intel Pin on Linux.
At first, I tried reading /proc/PID/maps to get the address space layout. But at what point should that code run?
If you put it before PIN_StartProgram, the maps file will not yet contain some regions, such as the heap.
If you put it in Fini, registered with PIN_AddFiniFunction(Fini, 0);, it should work. However, when you trace a single ls execution, no output related to the address space layout appears. That's weird.

Perhaps not the best solution, but it worked for me. The main problem is that when the tool starts, the address space is not prepared yet. You can wait until all of the images are loaded and then read the contents of procfs.
So you should add an instrumentation function for each image. For example, add the following statement to the main function:
IMG_AddInstrumentFunction(Image, 0);
Then you should read procfs, every time an image is loaded. This is because you do not know which image is the last image loaded (Of course, if you know which image is the last one, you can simply read the file only once, after that image is loaded):
VOID Image(IMG img, VOID *v)
{
...
/* open /proc/PID/maps and read its contents */
...
}
This way you always have the latest mappings of the address space during the execution of the program. You should still be careful with layout changes at runtime, such as the heap growing through the brk() system call, because these happen without any image being loaded.
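One way to fill in the elided body of the callback above is a small helper that copies /proc/self/maps to an output stream. This is only a sketch: dump_maps is a hypothetical name, not part of the Pin API, and it assumes that standard C file I/O is usable from the tool and that /proc/self refers to the traced process because the tool runs inside it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a Pin API): copy the current process's memory
 * map to `out`. Returns the number of chunks read with fgets(), normally
 * one per mapping, or -1 if the maps file cannot be opened. */
int dump_maps(FILE *out)
{
    assert(out != NULL);

    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[4096];
    int count = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        fputs(line, out);
        count++;
    }

    fclose(maps);
    return count;
}
```

Calling dump_maps(stderr) from Image() would then print a fresh snapshot after every image load.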

Pin has a more fine-grained approach to address space layout. You can get callbacks for image loads using IMG_AddInstrumentFunction(), and get callbacks for heap allocations by instrumenting malloc() and free() calls using RTN_Replace() or even instrumenting mmap(), brk() and other syscalls for heap allocation with PIN_AddSyscallEntryFunction().
You can find examples for the use of these APIs in the Pin tutorial and the examples in the Pin kit.

Related

Where accessing swap space is handled in Linux kernel?

I am working as research assistant and this question is really vital for our group:
We are looking for a way to inject delays when a process (e.g. a Python program) is using swap space. For example, when the kernel would normally swap pages in from swap space to main memory so the process can work on them, we need to insert a delay before that happens. (I know it makes little sense in the real world to add a delay in the kernel when accessing swap space or main memory, but doing so lets us simulate something that is important to us.)
What I did before:
I already tried adding a printk statement to the following sections in the kernel but none of them seem to be the exact location of handling swap-in and swap-out for a process.
In the memory.c file, in the function do_swap_page(). It did not always work.
In the memcontrol.c file, in the function mem_cgroup_swapin_charge_page. It only works when the application is newly allocating swap space (and not whenever it is actually using the swap).
In the swap_state.c file, in the functions swapin_readahead() and swap_vma_readahead(). Neither of them works, even when the process is using swap space.
Note 1: I am using the latest version of kernel (6.0.9). But I don't think it will be different with different versions of Linux kernel.
Note 2: I am measuring the amount of swap used with the command free -m. More information is also available in the /proc/meminfo file. So I am sure that my process is using swap space, but I don't know how to catch it in the kernel source code.
I have been looking for an answer for weeks without success. Any help is much appreciated. Thank you.

Control V4L2/VB2 Buffer Allocation?

I am trying to write a V4L2 compliant driver for a special camera device I have, but the device doesn't seem particularly friendly with V4L2's buffer system. Instead of separately allocated buffers, it wants a single contiguous block of memory capable of holding a set number of buffers (usually 4), and it then provides a status register telling you which is the latest (updated after each frame is DMA'ed to the host). So it basically needs only a single large DMA-allocated memory chunk to work with, not 4 chunks that are most likely separated.
How can I use this with V4L? Everything I see about VIDIOC_CREATE_BUFS, VIDIOC_REQBUFS and such does internal allocation of the buffers, and I can't get anything V4L-based (like qv4l2) to work without a successful QBUF and DQBUF that uses their internal structure.
How can this be done?

Just for completeness, I finally found a solution in the "meye" driver. I removed everything VB2 and wrote my own reqbuf, querybuf, qbuf, and dqbuf, along with my own mmap routines to handle the allocation. And it all works!

How to portably extend a file accessed using mmap()

We're experimenting with changing SQLite, an embedded database system,
to use mmap() instead of the usual read() and write() calls to access
the database file on disk. Using a single large mapping for the entire
file. Assume that the file is small enough that we have no trouble
finding space for this in virtual memory.
So far so good. In many cases using mmap() seems to be a little faster
than read() and write(). And in some cases much faster.
Resizing the mapping in order to commit a write-transaction that
extends the database file seems to be a problem. In order to extend
the database file, the code could do something like this:
ftruncate(); // extend the database file on disk
munmap(); // unmap the current mapping (it's now too small)
mmap(); // create a new, larger, mapping
then copy the new data into the end of the new memory mapping.
However, the munmap/mmap is undesirable as it means the next time each
page of the database file is accessed a minor page fault occurs and
the system has to search the OS page cache for the correct frame to
associate with the virtual memory address. In other words, it slows
down subsequent database reads.
On Linux, we can use the non-standard mremap() system call instead
of munmap()/mmap() to resize the mapping. This seems to avoid the
minor page faults.
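For reference, the Linux-only path could look roughly like the sketch below. grow_file_mapping is a hypothetical name, and MREMAP_MAYMOVE is an assumption about what the caller can tolerate: with it, the kernel may move the mapping, so any pointers into the old mapping must be recomputed from the returned address.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: grow a shared file mapping from old_size to
 * new_size bytes. Returns the (possibly moved) mapping address, or NULL
 * on failure. Linux only, because mremap() is non-standard. */
void *grow_file_mapping(int fd, void *map, size_t old_size, size_t new_size)
{
    assert(map != NULL && new_size > old_size);

    /* Extend the file first so the new pages are backed by file data. */
    if (ftruncate(fd, (off_t)new_size) != 0) {
        perror("ftruncate");
        return NULL;
    }

    void *p = mremap(map, old_size, new_size, MREMAP_MAYMOVE);
    if (p == MAP_FAILED) {
        perror("mremap");
        return NULL;
    }
    return p;
}
```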
QUESTION: How should this be dealt with on other systems, like OSX,
that do not have mremap()?
We have two ideas at present. And a question regarding each:
1) Create mappings larger than the database file. Then, when extending
the database file, simply call ftruncate() to extend the file on
disk and continue using the same mapping.
This would be ideal, and seems to work in practice. However, we're
worried about this warning in the man page:
"The effect of changing the size of the underlying file of a
mapping on the pages that correspond to added or removed regions of
the file is unspecified."
QUESTION: Is this something we should be worried about? Or an anachronism
at this point?
2) When extending the database file, use the first argument to mmap()
to request a mapping corresponding to the new pages of the database
file located immediately after the current mapping in virtual
memory. Effectively extending the initial mapping. If the system
can't honour the request to place the new mapping immediately after
the first, fall back to munmap/mmap.
In practice, we've found that OSX is pretty good about positioning
mappings in this way, so this trick works there.
QUESTION: if the system does allocate the second mapping immediately
following the first in virtual memory, is it then safe to eventually
unmap them both using a single big call to munmap()?

Option 2 will work, but you don't have to rely on the OS happening to have space available: you can reserve your address space beforehand so that your fixed mappings will always succeed.
For instance, to reserve one gigabyte of address space, call:
mmap(NULL, 1U << 30, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
This reserves one gigabyte of contiguous address space without actually allocating any memory or resources. You can then perform future mappings over this space and they will succeed. So mmap the file into the beginning of the space returned, then mmap further sections of the file as needed using the MAP_FIXED flag. These calls will succeed because the address space is already allocated and reserved by you.
Note: Linux also has the MAP_NORESERVE flag, which is the behavior you would want for the initial mapping if you were allocating RAM, but in my testing it is ignored, as PROT_NONE is sufficient to say you don't want any resources allocated yet.
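The reserve-then-extend scheme from this answer can be sketched as follows. The function names reserve_region and extend_mapping are hypothetical, and the sketch assumes the file only ever grows in page-size multiples and never beyond the reserved size.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Reserve `size` bytes of contiguous address space; PROT_NONE means no
 * memory is committed. Returns NULL on failure. */
void *reserve_region(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap (reserve)");
        return NULL;
    }
    return p;
}

/* Grow the file to new_size and map the added range [old_size, new_size)
 * at base + old_size, on top of the reservation. old_size must be a
 * multiple of the page size. Returns 0 on success, -1 on failure. */
int extend_mapping(void *base, int fd, off_t old_size, off_t new_size)
{
    assert(base != NULL && new_size > old_size);

    if (ftruncate(fd, new_size) != 0)
        return -1;

    void *want = (char *)base + old_size;
    void *got = mmap(want, (size_t)(new_size - old_size),
                     PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED,
                     fd, old_size);
    return got == MAP_FAILED ? -1 : 0;
}
```

Note that MAP_FIXED silently replaces whatever is already mapped at the requested address, so it is only safe here because the target range lies inside the caller's own reservation.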

I think #2 is the best currently available solution. In addition, on 64-bit systems you may create your mapping explicitly at an address that the OS would never choose for a mapping (for example 0x6000 0000 0000 0000 on Linux), to avoid the case where the OS cannot place the new mapping immediately after the first one.
It is always safe to unmap multiple mappings with a single munmap call. You can even unmap part of a mapping if you wish to do so.

Use fallocate() instead of ftruncate() where available. If it is not available, open the file in O_APPEND mode and grow it by writing some zeroes. This greatly reduces fragmentation.
Use "Huge pages" if available; this greatly reduces overhead on big mappings.
pread()/pwrite()/preadv()/pwritev() with a reasonably large block size are not really slow. The calls themselves are much faster than the underlying I/O can actually be performed.
I/O errors when using mmap() surface as a signal (typically SIGBUS) rather than as an error code such as EIO.
Most SQLite write performance problems come down to how transactions are used (i.e. you should check when COMMIT is actually performed).
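As a rough illustration of the first point, a file can be grown with preallocation and a fallback. This sketch uses posix_fallocate() rather than the Linux-specific fallocate() named above, and falls back to ftruncate() instead of writing zeroes; grow_file is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make sure the file behind fd is at least new_size
 * bytes long, never shrinking it. Prefers posix_fallocate() so that the
 * blocks are really allocated; falls back to ftruncate(), which may
 * leave a sparse file. Returns 0 on success, -1 on failure. */
int grow_file(int fd, off_t new_size)
{
    assert(fd >= 0);

    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    if (st.st_size >= new_size)
        return 0;                       /* already large enough */

    /* posix_fallocate() returns 0 on success or an error number. */
    if (posix_fallocate(fd, 0, new_size) == 0)
        return 0;

    return ftruncate(fd, new_size) == 0 ? 0 : -1;
}
```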

Minimal core dump (stack trace + current frame only)

Can I configure what goes into a core dump on Linux? I want to obtain something like the Windows mini-dumps (minimal information about the stack frame when the app crashed). I know you can set a max size for the core files using ulimit, but this does not allow me to control what goes inside the core (i.e. there is no guarantee that if I set the limit to 64kb it will dump the last 16 pages of the stack, for example).
Also, I would like to set it in a programmatic way (from code), if possible.
I have looked at the /proc/PID/coredump_filter file mentioned by man core, but it seems too coarse grained for my purposes.
To provide a little context: I need tiny core files for multiple reasons. I need to collect them over the network from numerous (thousands of) clients; furthermore, these are embedded devices with small SD cards and GPRS modems for the network connection. So anything above ~200k is out of the question.
EDIT: I am working on an embedded device which runs Linux 2.6.24. The processor is PowerPC. Unfortunately, powerpc-linux is not supported in Breakpad at the moment, so Google Breakpad is not an option.

I have "solved" this issue in two ways:
I installed a signal handler for SIGSEGV, and used backtrace/backtrace_symbols to print out the stack trace. I compiled my code with -rdynamic, so even after stripping the debug info I still get a backtrace with meaningful names (while keeping the executable compact enough).
I stripped the debug info using strip and put it in a separate file, which I will store somewhere safe; from there, I will use addr2line with the info saved from the backtrace (addresses) to understand where the problem happened. This way I have to store only a few bytes.
Alternatively, I found I could use the /proc/self/coredump_filter to dump no memory (setting its content to "0"): only thread and proc info, registers, stacktrace etc. are saved in the core. See more in this answer
I still lose information that could be precious (global and local variable(s) content, params..). I could easily figure out which page(s) to dump, but unfortunately there is no way to specify a "dump-these-pages" for normal core dumps (unless you are willing to go and patch the maydump() function in the kernel).
For now, I'm quite happy with these 2 solutions (it is better than nothing). My next moves will be:
see how difficult would be to port Breakpad to powerpc-linux: there are already powerpc-darwin and i386-linux so.. how hard can it be? :)
try to use google-coredumper to dump only a few pages around the current ESP (that should give me locals and parameters) and around "&some_global" (that should give me globals).
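The signal-handler approach from the first point of this answer might look like the sketch below. It is not the author's code: it uses backtrace_symbols_fd() instead of backtrace_symbols() because the former writes directly to a file descriptor without calling malloc(), and install_crash_handler is a hypothetical name. As in the answer, meaningful function names depend on linking with -rdynamic.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FRAMES 32

/* Print the raw stack trace to stderr, then die with the original signal. */
static void crash_handler(int sig)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    /* Writes one line per frame straight to the descriptor. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);

    /* Restore the default action and re-raise, so the exit status still
     * reflects the crash. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Hypothetical helper: install crash_handler for `sig` (e.g. SIGSEGV).
 * Returns 0 on success, -1 on failure. */
int install_crash_handler(int sig)
{
    assert(sig > 0);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

The addresses printed this way are what would later be fed to addr2line together with the separately stored debug info.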

Can I write-protect every page in the address space of a Linux process?

I'm wondering if there's a way to write-protect every page in a Linux
process' address space (from inside of the process itself, by way of
mprotect()). By "every page", I really mean every page of the
process's address space that might be written to by an ordinary
program running in user mode -- so, the program text, the constants,
the globals, and the heap -- but I would be happy with just constants,
globals, and heap. I don't want to write-protect the stack -- that
seems like a bad idea.
One problem is that I don't know where to start write-protecting
memory. Looking at /proc/pid/maps, which shows the sections of memory
in use for a given pid, they always seem to start with the address
0x08048000, with the program text. (In Linux, as far as I can tell,
the memory of a process is laid out with the program text at the
bottom, then constants above that, then globals, then the heap, then
an empty space of varying size depending on the size of the heap or
stack, and then the stack growing down from the top of memory at
virtual address 0xffffffff.) There's a way to tell where the top of
the heap is (by calling sbrk(0), which simply returns a pointer to the
current "break", i.e., the top of the heap), but not really a way to
tell where the heap begins.
If I try to protect all pages from 0x08048000 up to the break, I
eventually get an mprotect: Cannot allocate memory error. I don't know why mprotect would be
allocating memory anyway -- and Google is not very helpful. Any ideas?
By the way, the reason I want to do this is because I want to create a
list of all pages that are written to during a run of the program, and
the way that I can think of to do this is to write-protect all pages,
let any attempted writes cause a write fault, then implement a write
fault handler that will add the page to the list and then remove the write
protection. I think I know how to implement the handler, if only I could
figure out which pages to protect and how to do it.
Thanks!

You receive ENOMEM from mprotect() if you try to call it on pages that aren't mapped.
Your best bet is to open /proc/self/maps, and read it a line at a time with fgets() to find all the mappings in your process. For each writeable mapping (indicated in the second field) that isn't the stack (indicated in the last field), call mprotect() with the right base address and length (calculated from the start and end addresses in the first field).
Note that you'll need to have your fault handler already set up at this point, because the act of reading the maps file itself will likely cause writes within your address space.
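Parsing the fields mentioned above could be sketched like this. The struct and function names are hypothetical, and the filter only excludes the main thread's [stack] entry; other special regions may need their own handling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of /proc/<pid>/maps (hypothetical layout). */
struct mapping {
    unsigned long start, end;   /* first field: start-end addresses */
    char perms[5];              /* second field, e.g. "rw-p" */
    char path[256];             /* last field, empty for anonymous maps */
};

/* Parse one maps line. Returns 1 on success, 0 if the line is malformed. */
int parse_maps_line(const char *line, struct mapping *m)
{
    assert(line != NULL && m != NULL);

    m->path[0] = '\0';
    /* Offset, device and inode are skipped with %*s. */
    int n = sscanf(line, "%lx-%lx %4s %*s %*s %*s %255[^\n]",
                   &m->start, &m->end, m->perms, m->path);
    return n >= 3;
}

/* Writable mappings other than the stack are candidates for mprotect(). */
int should_protect(const struct mapping *m)
{
    return m->perms[1] == 'w' && strcmp(m->path, "[stack]") != 0;
}
```

For each line read with fgets() for which should_protect() returns 1, the call would then be mprotect((void *)m.start, m.end - m.start, PROT_READ).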

Start simple. Write-protect a few pages and make sure your signal handler works for these pages. Then worry about expanding the scope of the protection. For example, you probably do not need to write-protect the code section: operating systems can implement write-or-execute protection semantics on memory that will prevent code sections from ever being written to:
http://en.wikipedia.org/wiki/Self-modifying_code#Operating_systems
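A minimal version of the "start simple" step, limited to one anonymous region, might look like the sketch below. All names are hypothetical. It assumes Linux, where calling mprotect() from a SIGSEGV handler works in practice even though POSIX does not list it as async-signal-safe, and it records at most MAX_DIRTY pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_DIRTY 64

static void *dirty_pages[MAX_DIRTY];        /* pages written so far */
static volatile sig_atomic_t dirty_count;
static char *region_base;
static size_t region_len;
static long page_size;

/* On a write fault inside the tracked region: record the page, then
 * make it writable so the faulting instruction can be restarted. */
static void on_write_fault(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t lo = (uintptr_t)region_base;

    if (addr < lo || addr >= lo + region_len || dirty_count >= MAX_DIRTY) {
        signal(sig, SIG_DFL);   /* not ours: let the fault recur and kill us */
        return;
    }
    void *page = (void *)(addr & ~((uintptr_t)page_size - 1));
    dirty_pages[dirty_count++] = page;
    mprotect(page, (size_t)page_size, PROT_READ | PROT_WRITE);
}

/* Allocate npages, write-protect them and install the fault handler.
 * Returns the region, or NULL on failure. */
char *track_region(size_t npages)
{
    assert(npages > 0);
    page_size = sysconf(_SC_PAGESIZE);
    region_len = npages * (size_t)page_size;
    dirty_count = 0;

    void *p = mmap(NULL, region_len, PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    region_base = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return NULL;

    if (mprotect(region_base, region_len, PROT_READ) != 0)
        return NULL;
    return region_base;
}

/* Number of distinct pages written since track_region(). */
int dirty_page_count(void)
{
    return (int)dirty_count;
}
```

Each page faults only once, on its first write, so later writes to the same page run at full speed and are not counted again.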
