How to avoid shared memory leaks - linux

I'm using shared memory between 2 processes on SUSE Linux, and I'm wondering how I can avoid shared memory leaks in case one process crashes, or both. Does a leak occur in this case? If yes, how can I avoid it?

You could allocate space for two counters in the shared memory region: one for each process. Every few seconds, each process increments its counter, and checks that the other counter has been incremented as well. That makes it easy for these two processes, or an external watchdog, to tear down the shared memory if somebody crashes or exits.

If the subprocess is a simple fork() from the parent process, then mmap() with MAP_SHARED should work.
If the subprocess does an exec() to start a different executable, you can often pass file descriptors from shm_open() or a similar non-portable system call (see Is there anything like shm_open() without filename?). On many operating systems, including Linux, you can shm_unlink() the name right after shm_open(), so the object can't leak when your processes die, and use fcntl() to clear the close-on-exec flag on the shm file descriptor so that your child process can inherit it across exec. This is not well defined by the POSIX standard, but it appears to be very portable in practice.
If you need to use a filename instead of just a file descriptor number to pass the shared memory object to an unrelated process, then you have to figure out some way to shm_unlink() the file yourself when it's no longer needed; see John Zwinck's answer for one method.

Related

Time waste of execv() and fork()

I am currently learning about fork() and execv() and I had a question regarding the efficiency of the combination.
I was shown the following standard code:
pid = fork();
if (pid < 0) {
    // handle fork error
}
else if (pid == 0) {
    execv("son_prog", argv_son);
}
else {
    // do father code
}
I know that fork() clones the entire process (copying the entire heap, etc.) and that execv() replaces the current address space with that of the new program. With this in mind, doesn't it make it very inefficient to use this combination? We are copying the entire address space of a process and then immediately overwriting it.
So my question:
What is the advantage that is achieved by using this combo (instead of some other solution) that makes people still use this, even though we have waste?
You have to create a new process somehow. There are very few ways for a userspace program to accomplish that. POSIX used to have vfork() alongside fork(), and some systems have their own mechanisms, such as the Linux-specific clone(), but since 2008, POSIX specifies only fork() and the posix_spawn() family. The fork + exec route is more traditional, is well understood, and has few drawbacks (see below). The posix_spawn family is designed as a special-purpose substitute for use in contexts that present difficulties for fork(); you can find details in the "Rationale" section of its specification.
This excerpt from the Linux man page for vfork() may be illuminating:
Under Linux, fork(2) is implemented using copy-on-write pages, so the only penalty incurred by fork(2) is the time and memory required to duplicate the parent’s page tables, and to create a unique task structure for the child. However, in the bad old days a fork(2) would require making a complete copy of the caller’s data space, often needlessly, since usually immediately afterwards an exec(3) is done. Thus, for greater efficiency, BSD introduced the vfork() system call, which did not fully copy the address space of the parent process, but borrowed the parent’s memory and thread of control until a call to execve(2) or an exit occurred. The parent process was suspended while the child was using its resources. The use of vfork() was tricky: for example, not modifying data in the parent process depended on knowing which variables are held in a register.
(Emphasis added)
Thus, your concern about waste is not well-founded for modern systems (not limited to Linux), but it was indeed an issue historically, and there were indeed mechanisms designed to avoid it. These days, most of those mechanisms are obsolete.
Another answer states:
However, in the bad old days a fork(2) would require making a complete copy of the caller’s data space, often needlessly, since usually immediately afterwards an exec(3) is done.
Obviously, one person's bad old days are a lot younger than others remember.
The original UNIX systems did not have the memory for running multiple processes, and they did not have an MMU for keeping several processes in physical memory, ready to run at the same logical address space: they swapped out to disk the processes that weren't currently running.
The fork system call was almost entirely the same as swapping out the current process to disk, except for the return value and for not replacing the remaining in-memory copy by swapping in another process. Since you had to swap out the parent process anyway in order to run the child, fork+exec was not incurring any overhead.
It's true that there was a period of time when fork+exec was awkward: when there were MMUs that provided a mapping between logical and physical address space, but page faults did not retain enough information for copy-on-write and a number of other virtual-memory/demand-paging schemes to be feasible.
This situation was painful enough, not just for UNIX, that page fault handling of the hardware was adapted to become "replayable" pretty fast.
Not any longer. There's something called COW (copy-on-write): data is copied only when one of the two processes (parent or child) tries to write to it.
In the past:
The fork() system call copied the address space of the calling process (the parent) to create a new process (the child).
The copying of the parent's address space into the child was the most expensive part of the fork() operation.
Now:
A call to fork() is frequently followed almost immediately by a call to exec() in the child process, which replaces the child's memory with a new program. This is what the shell typically does, for example. In this case, the time spent copying the parent's address space is largely wasted, because the child process will use very little of its memory before calling exec().
For this reason, later versions of Unix took advantage of virtual memory hardware to allow the parent and child to share the memory mapped into their respective address spaces until one of the processes actually modifies it. This technique is known as copy-on-write. To do this, on fork() the kernel would copy the address space mappings from the parent to the child instead of the contents of the mapped pages, and at the same time mark the now-shared pages read-only. When one of the two processes tries to write to one of these shared pages, the process takes a page fault. At this point, the Unix kernel realizes that the page was really a "virtual" or "copy-on-write" copy, and so it makes a new, private, writable copy of the page for the faulting process. In this way, the contents of individual pages aren't actually copied until they are actually written to. This optimization makes a fork() followed by an exec() in the child much cheaper: the child will probably only need to copy one page (the current page of its stack) before it calls exec().
It turns out all those COW page faults are not at all cheap when the process has a few gigabytes of writable RAM. They're all gonna fault once even if the child has long since called exec(). Because the child of fork() is no longer allowed to allocate memory even for the single threaded case (you can thank Apple for that one), arranging to call vfork()/exec() instead is hardly more difficult now.
The real advantage to the vfork()/exec() model is you can set the child up with an arbitrary current directory, arbitrary environment variables, and arbitrary fs handles (not just stdin/stdout/stderr), an arbitrary signal mask, and some arbitrary shared memory (using the shared memory syscalls) without having a twenty-argument CreateProcess() API that gets a few more arguments every few years.
It turned out the "oops, I leaked handles being opened by another thread" gaffe from the early days of threading was fixable in userspace without process-wide locking, thanks to /proc. The same would not be possible in the giant CreateProcess() model without a new OS version and convincing everybody to call the new API.
So there you have it. An accident of design ended up far better than the directly designed solution.
It's not that expensive (relative to spawning a process directly), especially with copy-on-write forks like you find in Linux, and it's kind of elegant for:
when you really just want to fork off a clone of the current process (I find this to be very useful for testing)
for when you need to do something just before loading the new executable
(redirect filedescriptors, play with signal masks/dispositions, uids, etc.)
POSIX now has posix_spawn that effectively allows you to combine fork-and-exec (possibly more efficiently than fork+exec; if it is more efficient, it'll usually be implemented through some cheaper but less robust fork (clone/vfork) followed by exec), but the way it achieves #2 is through a ton of relatively messy options, which can never be as complete, powerful, and clean as just allowing you to run arbitrary code just before the new process image is loaded.
A process created by exec() et al, will inherit its file handles from the parent process (including stdin, stdout, stderr). If the parent changes these after calling fork() but before calling exec() then it can control the child's standard streams.

linux: munmap shared memory in one single call

If a process calls mmap(...,MAP_ANONYMOUS | MAP_SHARED,...) and forks N children, is it possible for any one of these processes (parent or descendants) to munmap() the memory for all processes in one go, thus releasing the physical memory, or does each of these processes have to munmap() individually?
(I know the memory will be unmapped on process exit, but the children won't exit yet).
Alternatively, is there a way to munmap memory from another process? I'm thinking of a call something like munmap(pid,...).
Or is there a way to achieve what I am looking for using non-anonymous mappings and performing an operation on the related file descriptor (e.g closing the file)?
My processes are performance sensitive, and I would like to avoid performing lots of IPC when it becomes known that the shared memory will no longer be used by anyone.
No, there is no way to unmap memory in one go.
If you don't need mapped memory in child processes at all, you may mark mappings with madvise(MADV_DONTFORK) before forking.
In emergency situations, you may invoke syscalls from inside external processes by using gdb:
Figure out PID of target process
List mapped memory with cat /proc/<PID>/maps
Attach to process using gdb: gdb -p <PID> (it will suspend execution of target process)
Run from gdb: call munmap(0x<address>, 0x<size>) for each region you need to unmap
Exit gdb (execution of process is resumed)
It must be obvious that if your process tries to access unmapped memory, it will receive SIGSEGV. So, you must be 100% sure what you are doing.

Is linux fork insecure

I was reading this article
It says that fork creates a copy of the process, and the fork man page also says so:
The entire virtual address space of the parent is replicated in the child
Does this mean the child process can read all of my process's memory state?
Can the child process dump the entire parent memory state, so it can be analysed to extract a parent variable and its value?
But the article also says that two processes cannot read each other's data.
So I am confused.
Yes, the child process can read a pristine copy of all of the parent process state (but when writing, only its own address space is affected) just after a fork(2). However, most of the time, the child would eventually use execve(2) to start a new program, and that would "clear" and replace the copy of the original parent's address space (by a fresh address space). Notice that execve and mmap(2) (see also shared memory in shm_overview(7)...) are the common ways to change the address space in virtual memory of some process (and how the kernel handles page faults).
The kernel uses (and sets up the MMU for) lazy copy on write machinery to make the child's address space a copy of the parent's one, so fork is quite efficient in practice.
Read also proc(5), then try the following commands:
cat /proc/self/maps
cat /proc/$$/maps
sudo cat /proc/1/maps
and understand what is happening
Read also the wikipage on fork, and the Advanced Linux Programming book.
There is no insecurity, because if the child is changing some data (e.g. a variable, a heap or stack location, ...) it does not affect the parent process.
If the program doing the fork is keeping some password in some virtual memory location, the child process would be able to read that location as long as it is executing the same program. Once the child did a successful execve (which is the common situation, and what any shell is doing) the previous address space is gone and replaced by a new one, described in the ELF executable of that exec-ed program.
There is no "lie" or "insecurity" in that Unix model. But contrarily to several other operating systems, Unix & POSIX have two separate system calls for creating a new process (fork) and executing a new program (execve). Other systems might have some single spawn operation mixing the two abilities. posix_spawn is often implemented by a mixture of fork & execve (and so are system(3) & popen(3), also using waitpid(2) & /bin/sh....).
The advantage of that Unix approach (having separated fork & execve) is that after the fork and before the execve in the child you can do a lot of useful things (e.g. closing useless file descriptors, ...). Operating Systems not separating the two features may need to have a quite complex spawning primitive.
There are rare occasions where a fork is not followed by some execve. Some MPI implementations might do that, and you might also do that. But then you know that you are able to read all the parent's address space through your own copy, so what you felt was an insecurity becomes a useful feature. In the old days you had the obsolete vfork, which blocked the parent. There is no need to use it today; actually, fork is often implemented through clone(2), which you should not use directly in practice (see futex(7)...) but only through POSIX pthreads. But thinking of fork as a magical cloner of your process might help.
When coding (even in C) don't forget to test against failure of fork and of execve. See perror(3)
PS. the fork syscall is as difficult to understand as the multiverse idea. Both are "forking" the time!
When you call fork(), the new process will get access to the copy of the parent process memory (i.e. variables, file descriptors etc).
This is in contrast with threads, where all threads share the same memory space, i.e. a variable modified in one thread will be seen with its new value by all other threads.
So if, after forking, the parent process modifies memory, the child process will not see that change: because the memory has been copied, the child process's memory is not altered.

Linux Kernel Procfs multiple read/writes

How does the Linux kernel handle multiple reads/writes to procfs? For instance, if two processes write to procfs at once, is one process queued (i.e. a kernel trap actually blocks one of the processes), or is there a kernel thread running for each core?
The concern is if you have a buffer used within a function (static to the global space), do you have to protect it or will the code be run sequentially?
It depends on each and every procfs file implementation. No one can even give you a definite answer, because each driver can implement its own procfs folder and files (you didn't specify any specific files). Quick browsing in http://lxr.free-electrons.com/source/fs/proc/ shows that some files do use locks.
Either way, you can't use the global buffer without protection, because a context switch can always occur; if not in the kernel, then it can catch your reader thread right after it finishes the read syscall and before it starts to process the read data.

Cause a Linux poll event on a shared memory file

Two Linux processes open and mmap the same /dev/shm/ shared memory file and use it as common memory. Question: what is the simplest and best way for one process to "wake up" the other process to notify that it should look in the memory?
For example, can one process cause a poll() event for the other process's file descriptor?
The solution doesn't need to be portable but I would like it to be simple.
That's why POSIX has condition variables.
Define a shared POSIX condition variable and its associated mutex in the shared memory region.
Then have one thread wait on the condition variable and the other signal the condition variable event when it wants the other thread to look in the memory.
There's a lot of material on the web on condition variables.
Here is one pretty good short one: https://computing.llnl.gov/tutorials/pthreads/#ConditionVariables
You may also consider using a semaphore (a POSIX named semaphore) to solve this.
One simple example, using shared memory (System V in the example, but you can use POSIX too) and a POSIX semaphore, is in the link:
How can 2 processes talk to each other without pipe()?
