linux: munmap shared memory in on single call - linux

If a process calls mmap(...,MAP_ANONYMOUS | MAP_SHARED,...) and forks N children, is it possible for any one of these processes (parent or descendants) to munmap() the memory for all processes in one go, thus releasing the physical memory, or does every of these processes have to munmap() individually?
(I know the memory will be unmapped on process exit, but the children won't exit yet).
Alternatively, is there a way to munmap memory from another process? I'm thinking of a call something like munmap(pid,...).
Or is there a way to achieve what I am looking for using non-anonymous mappings and performing an operation on the related file descriptor (e.g closing the file)?
My processes are performance sensitive, and I would like to avoid performing lots of IPC when it becomes known that the shared memory will no longer be used by anyone.

No, there is no way to unmap memory in one go.
If you don't need mapped memory in child processes at all, you may mark mappings with madvise(MADV_DONTFORK) before forking.
In emergency situations, you may invoke syscalls from inside external processes by using gdb:
Figure out PID of target process
List mapped memory with cat /proc/<PID>/maps
Attach to process using gdb: gdb -p <PID> (it will suspend execution of target process)
Run from gdb: call munmap(0x<address>, 0x<size>) for each region you need to unmap
Exit gdb (execution of process is resumed)
It must be obvious that if your process tries to access unmapped memory, it will receive SIGSEGV. So, you must be 100% sure what you are doing.

Related

What happens to allocated memory of other threads when forking

I have a huge application that needs to fork itself at some point. The application is multithreaded and has about 200MB of allocated memory. What I want to do now to ensure that the data allocated by the process wont get duplicated is to start a new thread and fork inside of this thread. From what I have read, only the thread that calls fork will be duplicated, but what will happen to the allocated memory? Will that still be there? The purpose of this is to restart the application with other startup parameters, when its forked, it will call main with my new parameters, thus getting hopefully a new process of the same program. Now before you ask: I cannot assure that the binary of that process will still be in the same place as when I started the process, otherwise I could just fork and exec whats in /proc/self/exe.
Threads are execution units inside the big bag of resources that a process is. A process is the whole thing that you can access from any thread in the process: all the threads, all the file descriptors, all the other resources. So memory is absolutely not tied to a thread, and forking from a thread has no useful effect. Everything still needs to be copied over since the point of forking is creating a new process.
That said, Linux has some tricks to make it faster. Copying 2 gigabytes worth of RAM is neither fast or efficient. So when you fork, Linux actually gives the new process the same memory (at first), but it uses the virtual memory system to mark it as copy-on-write: as soon as one process needs to write to that memory, the kernel intercepts it and allocates distinct memory so that the other process isn't affected.

Is linux fork insecure

I was reading this article
It says that the fork create a copy of itself and fork man also says so
. The entire virtual address space of the parent is replicated in the child
Does this mean child process can read all my process memory state ?
Can child process dump the entire parent memory state and it can be analysed to extract parent variable and its value. ?
But the article also says that two process cannot ready each other data.
So i am confused ?
Yes, the child process can read a pristine copy of all of the parent process state (but when writing, only its own address space is affected) just after a fork(2). However, most of the time, the child would eventually use execve(2) to start a new program, and that would "clear" and replace the copy of the original parent's address space (by a fresh address space). Notice that execve and mmap(2) (see also shared memory in shm_overview(7)...) are the common ways to change the address space in virtual memory of some process (and how the kernel handles page faults).
The kernel uses (and sets up the MMU for) lazy copy on write machinery to make the child's address space a copy of the parent's one, so fork is quite efficient in practice.
Read also proc(5), then type the follow commands:
cat /proc/self/maps
cat /proc/$$/maps
sudo cat /proc/1/maps
and understand what is happening
Read also the wikipage on fork, and the Advanced Linux Programming book.
There is no insecurity, because if the child is changing some data (e.g. a variable, a heap or stack location, ...) it does not affect the parent process.
If the program doing the fork is keeping some password in some virtual memory location, the child process would be able to read that location as long as it is executing the same program. Once the child did a successful execve (which is the common situation, and what any shell is doing) the previous address space is gone and replaced by a new one, described in the ELF executable of that exec-ed program.
There is no "lie" or "insecurity" in that Unix model. But contrarily to several other operating systems, Unix & POSIX have two separate system calls for creating a new process (fork) and executing a new program (execve). Other systems might have some single spawn operation mixing the two abilities. posix_spawn is often implemented by a mixture of fork & execve (and so are system(3) & popen(3), also using waitpid(2) & /bin/sh....).
The advantage of that Unix approach (having separated fork & execve) is that after the fork and before the execve in the child you can do a lot of useful things (e.g. closing useless file descriptors, ...). Operating Systems not separating the two features may need to have a quite complex spawning primitive.
There are rare occasions where a fork is not followed by some execve. Some MPI implementations might do that, and you might also do that. But then you know that you are able to read all the parent's address space thru your own copy - so what you felt was an insecurity is becoming a useful feature. In the old days you had the obsolete vfork which blocked the parents. There is not need to use it today; actually, fork is often implemented thru clone(2) which you should not use directly in practice (see futex(7)...) but only thru POSIX pthreads. But thinking of fork as a magical cloner of your process might help.
When coding (even in C) don't forget to test against failure of fork and of execve. See perror(3)
PS. the fork syscall is as difficult to understand as the multiverse idea. Both are "forking" the time!
When you call fork(), the new process will get access to the copy of the parent process memory (i.e. variables, file descriptors etc).
This is in contrast with threads, where all threads share the same memory space, i.e. variable modified in one thread will get a new value in all other threads.
So if, after forking, parent process modifies memory, the child process will not see that change - because the memory has been copied, the child process' memory would not get altered.

Prevent fork() from duplicating memory mapping of the process (mmap'ed)

I Have a Linux device driver that implements mmap sets of operations (vm_operations),
And a process which memory maps the device driver memory space using mmap calls.
The process sometimes call fork() to perform a task and then destroy the child process.
this is causing extensive use of mmap calls on the child processes to duplicate the memory maps of the parent.
I want to avoid these duplications and actually make all the memory maps private to the parent only.
Is this possible on Linux ?
The duplication of a mapping can be avoided by calling madvise(address, length, MADV_DONTFORK) on the memory mapping in question before calling fork.

How to avoid shared memory leaks

I'm using shared memory between 2 processes on Suse Linux and I'm wondering how can I avoid the shared memory leaks in case one process crashes or both. Does a leak occur in this case? If yes, how can I avoid it?
You could allocate space for two counters in the shared memory region: one for each process. Every few seconds, each process increments its counter, and checks that the other counter has been incremented as well. That makes it easy for these two processes, or an external watchdog, to tear down the shared memory if somebody crashes or exits.
If the subprocess is a simple fork() from the parent process, then mmap() with MAP_SHARED should work.
If the subprocess does an exec() to start a different executable, you can often pass file descriptors from shm_open() or a similar non-portable system call (see Is there anything like shm_open() without filename?) On many operating systems, including Linux, you can shm_unlink() the file descriptor from shm_open() so it doesn't leak memory when your processes die, and use fcntl() to clear the close-on-exec flag on the shm file descriptor so that your child process can inherit it across exec. This is not well defined in the POSIX standard but it appears to be very portable in practice.
If you need to use a filename instead of just a file descriptor number to pass the shared memory object to an unrelated process, then you have to figure out some way to shm_unlink() the file yourself when it's no longer needed; see John Zwinck's answer for one method.

Can I coredump a process that is blocking on disk activity (preferably without killing it)?

I want to dump the core of a running process that is, according to /proc/<pid>/status, currently blocking on disk activity. Actually, it is busy doing work on the GPU (should be 4 hours of work, but it has taken significantly longer now). I would like to know how much of the process's work has been done, so it'd be good to be able to dump the process's memory. However, as far as I know, "blocking on disk activity" means that it's not possible to interrupt the process in any way, and coredumping a process e.g. using gdb requires interrupting and temporarily stopping the process in order to attach via ptrace, right?
I know that I could just read /proc/<pid>/{maps,mem} as root to get the (maybe inconsistent) memory state, but I don't know any way to get hold of the process's userspace CPU register values... they stay the same while the process is inside the kernel, right?
You can probably run gcore on your program. It's basically a wrapper around GDB that attaches, uses the gcore command, and detaches again.
This might interrupt your IO (as if it received a signal, which it will), but your program can likely restart it if written correctly (and this may occur in any case, due to default handling).

Resources