Code segment sharing between two processes - linux

Suppose we run two processes back to back say :-
$ grep abc abc.txt ==> pid 100
$ grep def def.txt ==> pid 101
I read in the book "Beginning Linux programming" chapter# 11 that the code section of the processes would be shared, as it is read only. Is it so? I think if grep is compiled as shared library only then the code section would be shared.
One more question, in case of shared libraries how does the OS knows that the library has already been loaded or not? Suppose if 2 processes are simultaneously calling a shared library function then how does the virtual address of two processes be converted to physical address pointing the same location in RAM?

The OS doesn't load files into memory anymore. Instead, files are memory mapped. This means an inode and an offset of a file on disk will be connected to a page in memory. This makes it pretty simple to find out if some part of a file has already been loaded. Also, you can keep only part of a file in RAM (after setup, you don't need the setup code anymore, so you can "forget" about it and reuse those pages for something more useful).

The libraries and executables are not loaded, but mapped into memory with mmap(2). Basically, when you mmap() something with MAP_SHARED flag, others who map the same file will get the same memory pages.

Related

Compare a running process in memory with an executable in disk

I have a big project which will load an executable (let's call it greeting) into memory, but for some reason (e.g. there are many files called greeting under different directories), I need to know if the process in memory is exactly the one I want to use.
I know how to compare two files: diff, cmp, cksum and so on. But is there any way to compare a process in memory with an executable in hard disk?
According this answer you can get the contents of the memory version of the binary from the proc file system. I think you can cksum the original and the in memory version.
According to the man page of /proc, under Linux 2.2 and later, the
file is a symbolic link containing the actual pathname of the executed
command. Apparently, the binary is loaded into memory, and
/proc/[pid]/exe points to the content of the binary in memory.

What is a memory image in *nix systems?

In the book Advanced Programming in the Unix Environment 3rd Edition, Chapter 10 -- Signals, Page 315, when talking about the actions taken by the processes that receive a signal , the author says
When the default action is labeled "terminate+core", it means that a memory image of the process is left in the file named core of the current working directory of the process.
What is a memory image? When is this created, what's the content of it, and what is it used for?
A memory image is simply a copy of the process's virtual memory, saved in a file. It's used when debugging the program, as you can examine the values of the program's variables and determine which functions were being called at the time of the failure.
As the documentation you quoted says, this file is created when the process is terminated due to a signal that has the "terminate+core" default action.'
A memory image is often called a core image. See core(5) and the core dump wikipage.
Grossly speaking, a core image describes the process virtual address space (and content) at time of crash (including call stacks of each active thread and writable data segments for global data and heaps, but often excluding text or code segments which are read-only and given in the executable ELF file or in shared libraries). It also contains the register state (for each thread).
The name core is understandable only by old guys like me (having seen computers built in the 1960 & 1970-s like IBM/360, PDP-10 and early PDP-11, both used for developing the primordial Unix), since long time ago (1950-1970) random access memory was made with magnetic core memory.
If you have compiled all your source code with debug information (e.g. using gcc -g -Wall) you can do some post-mortem debugging (after yourprogram crashed and dumped a core file!) using gdb as
gdb yourprogram core
and the first gdb command you'll try is probably bt to get the backtrace.
Don't forget to enable core dumps, with the setrlimit(2) syscall generally done in your shell with e.g. ulimit  -c
Several signals can dump core, see signal(7). A common cause is a segmentation violation, like when you dereference a NULL or bad pointer, which gives a SIGSEGV signal which (often) dumps a core file in the current directory.
See also gcore(1).

Linux Implement open file descriptors C

1) Is it any alternative to looping through /proc in order to get the total number of open file descriptors?
I used the following dirs:
/proc/PID/fd/*
/proc/PID/maps
/proc/PID/cwd
/proc/PID/root
/proc/PID/exe
2) The number is different from lsof | wc -l and cat /proc/sys/fs/file-nr
3) Loaded dynamically linked libraries and current working directories can be counted as open file descriptors?
Implementation all open file descriptors in C for Linux
How you count this depends on what information you are interested in.
Looking through /proc/PID/fd/* will give you the number of open file descriptors. However, one caveat is that two processes may actually share a file descriptor, if you fork then the child process inherits the file descriptor from its parent, and this method will then count it twice, once for each process.
/proc/PID/maps will show you the memory map of the process, which can include the loaded executable itself and dynamically linked libraries, but also includes things that don't correspond to files like the heap, the stack, the vdso section which is a virtual shared object exported by the kernel, and so on.
lsof will list a variety of ways that files can be in use, which includes more than just file descriptors; it also includes the executable and shared libraries, but does not include the memory regions that don't correspond to files that show up in /proc/PID/maps like the stack, heap, vdso section, etc.
/proc/sys/fs/file-nr will report the number of open kernel file handles. A kernel file handle is different than a file descriptor; there can be more than one file descriptor open that point to the same file handle, for instance, by calling dup or dup2.
These differences explain why you're getting different numbers from these different ways of counting. The question is, what purpose are you using this count for? That will help answer which way of counting you should actually use.
1) no, but it seems you are confused as to what constitutes an open file descriptor, as suggested by your second question
2) see http://codingtragedy.blogspot.com/2015/04/nofile-ulimit-n-rlimitnofile-most.html - while it explains handling of a resource limits which may seem irrelevant, it also explains the difference between a file descriptor and a 'struct file' which you most likely want, and it even covers your lsof usage.
3) Again, it is unclear what is your actual question. current working directory is not a file descriptor and is only represented with an inode. A process may or may not keep fd for a linked library around, but mapping itself occupies a 'struct file'.

How the share library be shared by different processes?

I read some documents that share library comiled with -fPIC argument,
the .text seqment of the .so will be shared at process fork's dynamic linking stage
(eq. the process will map the .so to the same physical address)
i am interested in who (the kernel or ld.so ) and how to accomplish this?
maybe i should trace the code, but i dont know where to start it.
Nevertheless, i try to verify the statement.
I decide to check the function address like printf which is in the libc.so that all c program will link.
I get the printf virtual address of the process and need to get the physical address. Tried to write a kernel module and pass the address value to kernel, then call virt_to_phys. But it did not work cause the virt_to_phys only works for kmalloc address.
So, process page table look-at might be the solution to find the virtual address map to physical address. Were there any ways to do page table look-at? Or othere ways can fit the verify experiment?
thanks in advance!
The dynamic loader uses mmap(2) with MAP_PRIVATE and appropriate permissions. You can see what it does exactly by running a command from strace -e file,mmap. For instance:
strace -e file,mmap ls
All the magic comes from mmap(2). mmap(2) creates mappings in the calling process, they are usually backed either by a file or by swap (anonymous mappings). In a file-backed mapping, MAP_PRIVATE means that writes to the memory don't update the file, and cause that page to be backed by swap from that point on (copy-on-write).
The dynamic loader gets the info it needs from ELF's program headers, which you can view with:
readelf -l libfoo.so
From these, the dynamic loader determines what to map as code, read-only data, data and bss (zero-filled segment with zero size in file, non-zero size in memory, and a name only matched in crypticness by Lisp's car and cdr).
So, in fact, code and also data is shared, until a write causes copy-on-write. That is why marking constant data as constant is a potentially important space optimization (see DSO howto).
You can get more info on the mmap(2) manpage, and in Documentation/nommu-mmap.txt (the MMU case, no-MMU is for embedded devices, like ADSL routers and the Nintendo DS).
Shared libraries just a particular use of mapped files.
The address which a file is mapped at in a process's address space has nothing to do with whether it is shared or not.
Pages can be shared even if they are mapped at different addresses.
To find out if pages are being shared, do the following:
Find the address that the file(s) are mapped at by examining /proc/pid/maps
There is a tool which extracts data from /proc/pid/pagemap - find it and use it. This gives you info as to exactly which page(s) of a mapping are present and what physical location they are at
If two processes have a page mapped in at the same physical address, it is of course, shared.

how does kernel handle new file creation

I wish to understand the way kernel works when a user/app tries to create a file in a directorty.
The background - We have a java applicaiton which consumes messages over JMS, processes it and then writes the XML to an outbound queue+a local directory. Yesterday we obeserved unsual delays in writing to the directory. On 'ls|wc -l' we found >300,000 files in there. Did a quick strace on the process and found it full of mutex calls (More than 3/4 calls in the strace were mutex).
So i thought that new file creation is taking time becasue the system has to every time check certain things (e.g name of files to make sure that the new file with a specific name can be created) amongst 300,000 files and then create a file.
I cleared the directory and the applicaiton resumed to normal service levels.
My questions
Was my analysis correct (It seems cuz the app started working fine after a clear down)?
More imporatant, how does the kernel work when you try to creat a new file in directory.
Can the abnormal number of mutex calls be attributed to the high number of files in the directory?
Many thanks
J
Please read about the Linux Filesystem, i-nodes and d-nodes.
http://en.wikipedia.org/wiki/Inode_pointer_structure
The file system is organized into fixed-sized blocks. If your directory is relatively small, it fits in the direct blocks and things are fast. If your directory is not too big, it fits in the direct blocks and some indirect blocks, and is still reasonably fast. If your directory becomes too big, it spills into double indirect blocks and becomes slow.
Actual sizes depend on file system and kernel configuration.
Rule of thumb is to keep the directory under 12 blocks, depending on your block size. Many systems use 8K blocks; a fast directory is under 98,304 bytes.
A file entry is something like 16*4 bytes in size (IIRC), so plan on no more than 1500 files per directory as a practical upper limit.
Directories with large numbers of entries are often slow - how slow depends on the underlying filesystem.
The common solution is to create a hierarchy of directories, so each dir only has a few hundred entries.
Mutex system calls are a result of the application (probably something in the JVM or the Java libraries) making mutex calls.
Synchronisation internal to the kernel you will not see via strace, as this only examines system calls themselves.
A directory with lots of files should not become inefficient if you are using a filesystem which uses directory indexes; most now do (ext3 does optionally but it's normally enabled nowadays).
Non-indexed directories (like those used on the bad old filesystems - ext2, vfat etc) get really bad with lots of files, and you'll see the "open" system call taking a lot longer.

Resources