Where accessing swap space is handled in Linux kernel? - linux

I am working as research assistant and this question is really vital for our group:
We are looking for a way to inject delays when a process (i.e. a python program) is using swap space. For example, if in a normal way, it is swapping-in pages from swap space to main memory to work on it, we need to make a delay before that. (I know it doesn't make sense in real world to make delay in kernel when accessing swap space or main memory, but with doing that we will be able to simulate something which is important for us to do so.)
What I did before:
I already tried adding a printk statement to the following sections in the kernel but none of them seem to be the exact location of handling swap-in and swap-out for a process.
In the memory.c file, in the function do_swap_page(). It didn't work always.
In the memcontrol.c file, in the function mem_cgroup_swapin_charge_page. It just works when the application is newly allocating the swap space. (And not whenever it is using the swap actually)
In the swap_state.c file, in the function swapin_readahead() and also in the function swap_vma_readahead(). None of them works even when the process is using swap space.
Note 1: I am using the latest version of kernel (6.0.9). But I don't think it will be different with different versions of Linux kernel.
Note 2: I am measuring the amount of swap used with the command free -m. Also, more information is available in the /proc/meminfo file. So, I am sure that my process is using swap space, but I don't know how to catch it in the kernel sorce code.
It has been weeks that I a looking for an answer but no success. Any help is so appretiated. Thank you.

Related

Linux kernel : logging to a specific file

I am trying to edit the linux kernel. I want some information to be written out to a file as a part of the debugging process. I have read about the printk function. But i would like to add text to a particular file (file other from the default files that keep debug logs).
To cut it short: I would kind of like to specify the "destination" in the printk function (or at least some work-around it)
How can I achieve this? Will using fwrite/fopen work (if yes, will it work without causing much overhead compared to printk, since they are implemented differently)?
What other options do i have?
Using fopen and fwrite will certainly not work. Working with files in kernel space is generally a bad idea.
It all really depends on what you are doing in the kernel though. In some configurations, there may not even be a hard disk for you to write to. If however, you are working at a stage where you can have certain assumptions about the running kernel, you probably actually want to write a kernel module rather than edit the kernel itself. For all you care, a kernel module is just as good as any other part of the kernel, but they are inserted when the kernel is already up and running.
You may also be thinking of doing so for debugging, or have output of a kernel-level application (e.g. an application that you are forced to run at kernel level for real-time constraints etc). In that case, kio may be of interest to you, but if you want to use it, do make sure you understand why.
kio is a library I wrote just for those "kernel-level applications", which makes a kernel module see a /proc file as if it's a user of it (rather than a provider). To make it work, you should have a user-space application also opening that virtual file and redirect it to wherever you want to write your log. Something along the lines of opening the file with kopen in write mode and in user space tell cat /proc/your_file > ~/log_file.
Note: I still recommend printk unless you really know what you are doing. Since you are thinking of fopen in kernel space, I don't think you really know what you are doing.

Write module of kernel (Linux), which to save the page of process from removing to the swap

Need to save the page of process (the user part!) from removing to the swap.
I need to do it in the kernel, only. (language C I know)
(Maybe insert hook in shrink_page_list?)
I have IDs of processes, which need to save and threshold amount of physical memory in the system (We fill, while it isn't filled). IDs and threshold write in /proc, /dev or /sys.
How to approach this?
What files to look at?
What tutorials to read?
Maybe there are examples that are somehow are related with this task.
Info: I compilling kernel of Debian Lenny, use Qemu for start it on my Ubuntu.
See get_user_pages. http://www.makelinux.net/ldd3/chp-15-sect-3.
Use get_user_pages, you can get whatever page you want and keep it locked in memory.
Even better, look at the comments on the source at
http://lxr.free-electrons.com/source/mm/gup.c#L637

Address space identifiers using qemu for i386 linux kernel

Friends, I am working on an in-house architectural simulator which is used to simulate the timing-effect of a code running on different architectural parameters like core, memory hierarchy and interconnects.
I am working on a module takes the actual trace of a running program from an emulator like "PinTool" and "qemu-linux-user" and feed this trace to the simulator.
Till now my approach was like this :
1) take objdump of a binary executable and parse this information.
2) Now the emulator has to just feed me an instruction-pointer and other info like load-address/store-address.
Such approaches work only if the program content is known.
But now I have been trying to take traces of an executable running on top of a standard linux-kernel. The problem now is that the base kernel image does not contain the code for LKM(Loadable Kernel Modules). Also the daemons are not known when starting a kernel.
So, my approach to this solution is :
1) use qemu to emulate a machine.
2) When an instruction is encountered for the first time, I will parse it and save this info. for later.
3) create a helper function which sends the ip, load/store address when an instruction is executed.
i am stuck in step2. how do i differentiate between different processes from qemu which is just an emulator and does not know anything about the guest OS ??
I can modify the scheduler of the guest OS but I am really not able to figure out the way forward.
Sorry if the question is very lengthy. I know I could have abstracted some part but felt that some part of it gives an explanation of the context of the problem.
In the first case, using qemu-linux-user to perform user mode emulation of a single program, the task is quite easy because the memory is linear and there is no virtual memory involved in the emulator. The second case of whole system emulation is a lot more complex, because you basically have to parse the addresses out of the kernel structures.
If you can get the virtual addresses directly out of QEmu, your job is a bit easier; then you just need to identify the process and everything else functions just like in the single-process case. You might be able to get the PID by faking a system call to get_pid().
Otherwise, this all seems quite a bit similar to debugging a system from a physical memory dump. There are some tools for this task. They are probably too slow to run for every instruction, though, but you can look for hints there.

Minimal core dump (stack trace + current frame only)

Can I configure what goes into a core dump on Linux? I want to obtain something like the Windows mini-dumps (minimal information about the stack frame when the app crashed). I know you can set a max size for the core files using ulimit, but this does not allow me to control what goes inside the core (i.e. there is no guarantee that if I set the limit to 64kb it will dump the last 16 pages of the stack, for example).
Also, I would like to set it in a programmatic way (from code), if possible.
I have looked at the /proc/PID/coredump_filter file mentioned by man core, but it seems too coarse grained for my purposes.
To provide a little context: I need tiny core files, for multiple reasons: I need to collect them over the network, for numerous (thousands) of clients; furthermore, these are embedded devices with little SD cards, and GPRS modems for the network connection. So anything above ~200k is out of question.
EDIT: I am working on an embedded device which runs linux 2.6.24. The processor is PowerPC. Unfortunately, powerpc-linux is not supported in breakpad at the moment, so google breakpad is not an option
I have "solved" this issue in two ways:
I installed a signal handler for SIGSEGV, and used backtrace/backtrace_symbols to print out the stack trace. I compiled my code with -rdynamic, so even after stripping the debug info I still get a backtrace with meaningful names (while keeping the executable compact enough).
I stripped the debug info and put it in a separate file, which I will store somewhere safe, using strip; from there, I will use add22line with the info saved from the backtrace (addresses) to understand where the problem happened. This way I have to store only a few bytes.
Alternatively, I found I could use the /proc/self/coredump_filter to dump no memory (setting its content to "0"): only thread and proc info, registers, stacktrace etc. are saved in the core. See more in this answer
I still lose information that could be precious (global and local variable(s) content, params..). I could easily figure out which page(s) to dump, but unfortunately there is no way to specify a "dump-these-pages" for normal core dumps (unless you are willing to go and patch the maydump() function in the kernel).
For now, I'm quite happy with there 2 solutions (it is better than nothing..) My next moves will be:
see how difficult would be to port Breakpad to powerpc-linux: there are already powerpc-darwin and i386-linux so.. how hard can it be? :)
try to use google-coredumper to dump only a few pages around the current ESP (that should give me locals and parameters) and around "&some_global" (that should give me globals).

How do I measure net used disk space change due to activity by a given process in Linux?

I'd like to monitor disk space requirements of a running process. Ideally, I want to be able to point to a process and find out the net change in used disk space attributable to it. Is there an easy way of doing this in Linux? (I'm pretty sure it would be feasible, though maybe not very easy, to do this in Solaris with DTrace)
Probably you'll have to ptrace it (or get strace to do it for you and parse the output), and then try to work out what disc is being used.
This is nontrivial, as your tracing process will need to understand which file operations use disc space - and be free of race conditions. However, you might be able to do an approximation.
Quite a lot of things can use up disc space, because most Linux filesystems support "holes". I suppose you could count holes as well for accounting purposes.
Another problem is knowing what filesystem operations free up disc space - for example, opening a file for writing may, in some cases, truncate it. This clearly frees up space. Likewise, renaming a file can free up space if it's renamed over an existing file.
Another issue is processes which invoke helper processes to do stuff - for example if myprog does a system("rm -rf somedir").
Also it's somewhat difficult to know when a file has been completely deleted, as it might be deleted from the filesystem but still open by another process.
Happy hacking :)
If you know the PID of the process to monitor, you'll find plenty of information about it in /proc/<PID>.
The file /proc/<PID>/io contains statistics about bytes read and written by the process, it should be what you are seeking for.
Moreover, in /proc/<PID>/fd/ you'll find links to all the files opened by your process, so you could monitor them.
there is Dtrace for linux is available
http://librenix.com/?inode=13584
Ashitosh

Resources