Compare a running process in memory with an executable on disk - Linux

I have a big project which will load an executable (let's call it greeting) into memory, but for some reason (e.g. there are many files called greeting under different directories), I need to know whether the process in memory is exactly the one I want to use.
I know how to compare two files: diff, cmp, cksum, and so on. But is there any way to compare a process in memory with an executable on the hard disk?

According to this answer, you can get the contents of the in-memory version of the binary from the proc file system. I think you can then cksum the original and the in-memory version.
According to the man page of proc(5), under Linux 2.2 and later, the file is a symbolic link containing the actual pathname of the executed command. Apparently, the binary is loaded into memory, and /proc/[pid]/exe points to the content of the binary in memory.
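
For instance, here is a minimal C sketch of that idea: it compares /proc/PID/exe byte by byte against a candidate file on disk. The PID and candidate path are taken from the command line (both are placeholders for whatever your project knows), and reading /proc/PID/exe requires appropriate permissions (same user or root).

/* Minimal sketch: compare /proc/PID/exe against a candidate file on disk.
 * PID and candidate path come from argv; adapt to your project.
 * Returns 0 on match, 1 on mismatch, 2 on error. */
#include <stdio.h>
#include <string.h>

static int same_contents(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int result = -1;

    if (fa && fb) {
        char bufa[4096], bufb[4096];
        size_t na, nb;
        result = 0;
        do {
            na = fread(bufa, 1, sizeof bufa, fa);
            nb = fread(bufb, 1, sizeof bufb, fb);
            if (na != nb || memcmp(bufa, bufb, na) != 0) {
                result = 1;   /* contents differ */
                break;
            }
        } while (na > 0);
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return result;            /* -1 on open error */
}

int main(int argc, char **argv)
{
    char exe[64];

    if (argc != 3) {
        fprintf(stderr, "usage: %s PID /path/to/candidate\n", argv[0]);
        return 2;
    }
    snprintf(exe, sizeof exe, "/proc/%s/exe", argv[1]);
    switch (same_contents(exe, argv[2])) {
    case 0:  puts("match");     return 0;
    case 1:  puts("different"); return 1;
    default: perror("fopen");   return 2;
    }
}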

Related

If the size of the file exceeds the maximum size of the file system, what happens?

For example, on a FAT32 partition the maximum file size is 4GB, but I was able to create a 5GB file with vim. When I saved the file and opened it again, the console output was broken like a staircase. I have three questions.
1) If the size of the file exceeds the maximum size of the file system, what happens?
2) In my case, why did it break?
3) In Unix, the stat() system call can succeed only up to 2GB (2^31 - 1). Does this have anything to do with the file system? Is there a relationship between the limits in stat() and the limits of each file system?
If the size of the file exceeds the maximum size of the file system, what happens?
By definition, that can never happen. What really happens is that some system call (probably write(2) ...) fails, and the code doing the writing should handle that case.
Notice that FAT32 filesystems restrict the maximal size of a file to 4 gigabytes. Use a better file system on your USB key if you want more (or split large files into smaller chunks with split(1) before copying them to your FAT32-formatted USB key).
If you use <stdio.h>, notice that fflush(3), fprintf(3), fclose(3) (and most other standard I/O functions) can fail, e.g. because the underlying write(2) fails.
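As a minimal illustration (the output path output.dat is hypothetical), note how a write failure may only surface at fclose(3) time, so its result must not be ignored:

/* Minimal sketch: a failing write(2) (e.g. EFBIG when a file hits the
 * filesystem's size limit) only surfaces through the return values of
 * fprintf/fflush/fclose. */
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("output.dat", "w");   /* hypothetical path */
    if (!f) { perror("fopen"); return EXIT_FAILURE; }

    if (fprintf(f, "some data\n") < 0) {
        fprintf(stderr, "fprintf: %s\n", strerror(errno));
        fclose(f);
        return EXIT_FAILURE;
    }
    /* fclose flushes buffered data; the last write error often shows
     * up only here. */
    if (fclose(f) != 0) {
        perror("fclose");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}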
the console output was broken like a staircase
Probably because your pseudoterminal was in a broken state. See stty(1), reset(1), termios(3), and read The TTY demystified.
In Unix, the stat() system call can succeed only up to 2GB (2^31 - 1)
You are misunderstanding stat(2). Read its documentation again: with a 64-bit off_t (the default on 64-bit Linux, and enabled with _FILE_OFFSET_BITS=64 on 32-bit systems), stat() reports file sizes well beyond 2GB.
Read Advanced Linux Programming, then syscalls(2).
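
A small sketch of what stat(2) actually reports; with a 64-bit off_t it handles files beyond 2GB (the default path bigfile is a placeholder):

/* Minimal sketch: stat() is not limited to 2GB. With a 64-bit off_t
 * (use -D_FILE_OFFSET_BITS=64 on 32-bit systems) st_size reports
 * sizes far beyond 2^31 - 1. */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    struct stat st;
    const char *path = argc > 1 ? argv[1] : "bigfile";  /* placeholder */

    if (stat(path, &st) != 0) { perror("stat"); return 1; }
    printf("%s: %lld bytes\n", path, (long long)st.st_size);
    return 0;
}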
I was able to create a 5GB file with vim
To understand the behavior of vim, first read its documentation, then study its source code (it is free software, so you can, and perhaps should, do so).
You could also use strace(1) to understand what system calls are done by some command or process.

Does timestamp of a file in /proc filesystem indicate the time at which parameter sample is taken?

Does the timestamp of a file in the /proc filesystem indicate the time at which the parameter's sample was taken? If not, how can I get the timestamp at which a parameter in a file under /proc was updated?
In general, you should not put much trust in the result of stat(2) on /proc/ pseudo-files, and that includes both the apparent file size and the modification or access times. It is well known that many /proc/ files have a zero size (as returned by stat) but non-empty content (so they should be read sequentially, like you would read a pipe), e.g. /proc/self/maps.
Read more about proc(5), and also Documentation/filesystems/proc.txt in your kernel source tree.
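A minimal sketch of that discrepancy: stat(2) reports zero size for /proc/self/maps, yet a sequential read returns plenty of content:

/* Minimal sketch: stat(2) reports size 0 for /proc/self/maps,
 * but reading it sequentially yields real content. */
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;
    char buf[4096];
    size_t total = 0, n;
    FILE *f;

    if (stat("/proc/self/maps", &st) == 0)
        printf("stat() size: %lld\n", (long long)st.st_size);  /* 0 */

    f = fopen("/proc/self/maps", "r");
    if (!f) { perror("fopen"); return 1; }
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        total += n;                 /* read sequentially, like a pipe */
    fclose(f);
    printf("bytes actually read: %zu\n", total);                /* > 0 */
    return 0;
}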
Indeed, some files appear to have a timestamp related to the (current) time of opening or stat-ing them (e.g. /proc/self/maps, /proc/$$/mounts, or /proc/sys/kernel/randomize_va_space), but other files have a timestamp related to boot time, e.g. /proc/interrupts.
The /proc/ filesystem is really arcane; the ultimate reference on it is the kernel source code. I would not rely on parameter-like files (I guess you mean files like /proc/sys/kernel/randomize_va_space ...) having a meaningful mtime.
Of course, the kernel does not keep track of the modification times of many of its parameters.

Linux: implement a count of open file descriptors in C

1) Is there any alternative to looping through /proc in order to get the total number of open file descriptors?
I used the following dirs:
/proc/PID/fd/*
/proc/PID/maps
/proc/PID/cwd
/proc/PID/root
/proc/PID/exe
2) The number differs from what lsof | wc -l and cat /proc/sys/fs/file-nr report.
3) Can loaded dynamically linked libraries and current working directories be counted as open file descriptors?
I am implementing a count of all open file descriptors in C for Linux.
How you count this depends on what information you are interested in.
Looking through /proc/PID/fd/* will give you the number of open file descriptors. However, one caveat is that two processes may actually share a file descriptor: if you fork, the child process inherits the file descriptors from its parent, and this method will then count each of them twice, once for each process.
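For illustration, a minimal sketch of that counting loop: it walks /proc/PID/fd for every numeric directory. It needs sufficient privileges to see other users' processes, and races with processes exiting are simply ignored.

/* Minimal sketch: count open file descriptors system-wide by walking
 * /proc/PID/fd. Run as root to see every process; directories that
 * vanish or are unreadable are skipped. */
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

int main(void)
{
    DIR *proc = opendir("/proc");
    struct dirent *p;
    long total = 0;

    if (!proc) { perror("/proc"); return 1; }
    while ((p = readdir(proc)) != NULL) {
        char path[288];
        DIR *fds;
        struct dirent *f;

        if (!isdigit((unsigned char)p->d_name[0]))
            continue;                   /* not a PID directory */
        snprintf(path, sizeof path, "/proc/%s/fd", p->d_name);
        fds = opendir(path);
        if (!fds)
            continue;                   /* process gone or no access */
        while ((f = readdir(fds)) != NULL)
            if (isdigit((unsigned char)f->d_name[0]))
                total++;
        closedir(fds);
    }
    closedir(proc);
    printf("open file descriptors: %ld\n", total);
    return 0;
}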
/proc/PID/maps will show you the memory map of the process, which can include the loaded executable itself and dynamically linked libraries, but also includes things that don't correspond to files, like the heap, the stack, the vdso section (a virtual shared object exported by the kernel), and so on.
lsof will list a variety of ways that files can be in use, which includes more than just file descriptors; it also covers the executable and shared libraries, but not the memory regions in /proc/PID/maps that don't correspond to files, like the stack, heap, vdso section, etc.
/proc/sys/fs/file-nr will report the number of open kernel file handles. A kernel file handle is different from a file descriptor; there can be more than one file descriptor pointing to the same file handle, for instance after calling dup or dup2.
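A tiny sketch of that distinction: after dup(2) there are two file descriptors but a single open file description, so they share one file offset (the file read here, /etc/hostname, is arbitrary):

/* Minimal sketch: dup(2) creates a second file descriptor referring
 * to the same open file description (kernel file handle), so the two
 * fds share a single file offset. */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    int fd1 = open("/etc/hostname", O_RDONLY);  /* arbitrary file */
    int fd2;
    char c;

    if (fd1 < 0) { perror("open"); return 1; }
    fd2 = dup(fd1);                 /* two fds, one file handle */

    if (read(fd1, &c, 1) != 1 ||    /* advances the shared offset */
        read(fd2, &c, 1) != 1) {    /* so this reads the *second* byte */
        perror("read");
        return 1;
    }
    printf("second byte: %c\n", c);

    close(fd1);
    close(fd2);
    return 0;
}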
These differences explain why you're getting different numbers from these different ways of counting. The question is, what purpose are you using this count for? That will help answer which way of counting you should actually use.
1) No, but it seems you are confused about what constitutes an open file descriptor, as your second question suggests.
2) See http://codingtragedy.blogspot.com/2015/04/nofile-ulimit-n-rlimitnofile-most.html - while it explains the handling of resource limits, which may seem irrelevant, it also explains the difference between a file descriptor and a 'struct file' (which is most likely what you want), and it even covers your lsof usage.
3) Again, it is unclear what your actual question is. The current working directory is not a file descriptor; it is represented only by an inode reference. A process may or may not keep an fd for a linked library around, but the mapping itself occupies a 'struct file'.

Does executing a binary on a RAMDisk reload the executable into memory?

Let's say I have two copies of the same 10MB binary executable, A and B.
If I have plenty of available memory and run ./A, my understanding is that A will be loaded into memory and run from there. This will take around 10MB of RAM to accomplish.
If I have plenty of available memory, create a RAMDisk, copy B to the RAMDisk, and run ./B from the RAMDisk, my understanding is that B will be (re)loaded into memory and run from there. This will take around 10MB of RAM for the executable, plus the memory in use for the RAMDisk.
Is this correct? Is a RAMDisk smart enough to say "oh, I already have that binary executable in memory, let's just run it in place"? Even if it were, wouldn't the loader still have to do its magic to run the thing?
I'm using QNX and running ELF without COFF binaries, but I would appreciate answers for any *Nix system.
I would really expect it to be loaded; typical ELF binaries are not an "execute in place" format.
There are things that need to happen at load time, like relocating any position-independent code and, of course, dynamic library loading, and the file system on the RAM disk knows nothing about those.

Code segment sharing between two processes

Suppose we run two processes back to back, say:
$ grep abc abc.txt ==> pid 100
$ grep def def.txt ==> pid 101
I read in the book "Beginning Linux Programming", chapter 11, that the code section of the processes would be shared, as it is read-only. Is that so? I think the code section would be shared only if grep were compiled as a shared library.
One more question: in the case of shared libraries, how does the OS know whether the library has already been loaded? Suppose two processes simultaneously call a shared library function; how are the virtual addresses of the two processes translated to a physical address pointing at the same location in RAM?
The OS doesn't load whole files into memory anymore. Instead, files are memory-mapped. This means an inode and an offset of a file on disk are connected to a page in memory. This makes it pretty simple to find out whether some part of a file has already been loaded. Also, you only need to keep part of a file in RAM (after setup, you don't need the setup code anymore, so you can "forget" it and reuse those pages for something more useful).
The libraries and executables are not read in; they are mapped into memory with mmap(2). Basically, when you mmap() something with the MAP_SHARED flag, other processes that map the same file get the same memory pages.
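A minimal sketch of such a mapping, mapping /bin/grep read-only, roughly the way the loader maps a text segment; note that unmodified MAP_PRIVATE pages also remain shared between processes through the page cache:

/* Minimal sketch: map a file's contents read-only with mmap(2).
 * Processes mapping the same file are backed by the same physical
 * pages in the page cache, which is how the code sections of grep
 * (or of a shared library) end up shared. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    int fd = open("/bin/grep", O_RDONLY);
    struct stat st;
    void *p;

    if (fd < 0 || fstat(fd, &st) != 0) { perror("open/fstat"); return 1; }

    /* PROT_READ + MAP_PRIVATE, like the loader maps text segments;
     * unmodified pages stay shared across processes. */
    p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    printf("magic: %.3s\n", (const char *)p + 1);  /* prints "ELF" */
    munmap(p, st.st_size);
    close(fd);
    return 0;
}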
