How the parent is restored after vfork()


Since vfork() creates the child process in the same address space as the parent, how is the parent process restored when the child calls execv()? exec loads the file and runs it in that same address space, which belongs to the parent and hence also to the child.

When execv follows a true vfork, it does some of the work of fork: it allocates a new memory space into which to load the new program image and copies inheritable things like environment variables into it. Meanwhile, even vfork saves a bit of the parent’s state on the side, so that execv can restore the parent’s stack and instruction pointers once the child is separated.
For example, on Linux vfork calls the common process-copying code via _do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, ...) (renamed kernel_clone() in Linux 5.10). copy_mm reacts to CLONE_VM by simply reusing the parent's memory space, with no call to dup_mm. Meanwhile, _do_fork reacts to CLONE_VFORK by pointing the child's vfork_done at a completion and suspending the caller until the memory space is no longer in use. If the child gives it up via execve, the path goes through exec_mmap and mm_release, which sees vfork_done and wakes the parent.
So, really, execve (which also calls copy_strings) is always "allocating a new memory space and copying environment variables into it"; after a normal fork, however, this is not observable because it happens at the same time as releasing the non-shared space created by the fork.

Related

How is it that a child process that calls exec() right after forking wouldn't need a separate copy of the parent's address space?

I am reading "Linux Kernel Development, Second Edition" by Robert Love. (Yes, it's a bit outdated). I understand from Chapter 3: Process Management that in COW (copy-on-write), the parent and child processes share the parent's address space until one of the processes writes to the address space. This is to prevent the unnecessary duplication of the parent's address space when it is not even being written to.
But then, it mentions that if the child process calls exec() right after fork(), the parent's address space and pages DON'T need to be copied and given to the child as a separate copy. That's where I'm lost.
According to the manual, "the exec() family of functions replaces the current process image with a new process image." The manual doesn't say anything about exec() creating a new address space for the new process image. So if the child process is sharing address space with its parent, wouldn't this mean that exec() would load an executable image into the parent's address space (which is shared with the child)?
Since that means the parent's address space would be overwritten, I don't understand how a child process that executes exec() after fork() WOULDN'T need a separate copy of its parent's address space to write to. Is there something I'm missing here?

The copy-on-write mechanism implies that no modification in the child process affects the parent.
Calling exec in the child is no exception: it changes the address space of the child only, not that of the parent.

You can also read about vfork(), which has no copy-on-write mechanism: it shares the address space of the parent process, and the parent process is suspended until the child process exits (or calls exec). It is interesting and makes things much clearer.

Forking in Linux and COW

I know that Linux implements fork() with COW to avoid wasteful copying. But the book says that when the child calls exec() right after fork(), the address space is never copied.
But I think that if the child uses exec(), it means writing new data or code into an address space that has not been copied yet. So when exec() is called, the address space would be copied (copy-on-write), and the new data or code written there.
Am I wrong? Why does exec() never copy the parent's pages?
Or, if the child calls exec(), does the child just make its own mm_struct and write the new data into its own, newly created address space (not copied from the parent)?

exec is a library wrapper around the execve system call. There is going to be some stack activity before execve starts (even if execve is called directly), so at least one stack page will be copied on write before exec kicks in and disconnects the child from the shared address space.
Meanwhile, the parent process may well have triggered plenty of copy-on-write faults of its own before the child disconnects.

If a process mallocs memory and then forks, will the child process have properly malloced memory?

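For the following code:
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void) {
    int *p = malloc(2 * sizeof(int));
    if (fork()) wait(NULL);   /* parent waits for the child */
    else *p = 10;             /* child writes to the block */
    return 0;
}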
I want to know whether, when we fork, the child process also receives the malloced block in its own address space. That is, in the above code, is it safe for the child to execute:
*p = 10 ;

Yes the child will have proper malloc()ed memory.
First, know that there are two memory managers in place:
One is the Linux kernel, which allocates memory pages to processes. This is done through the brk() system call (which sbrk() wraps), or through mmap().
The other is malloc(), which uses sbrk() (and, in glibc, also mmap() for large blocks) to request memory from the kernel and then manages it: it breaks the memory into chunks, remembers how it has been divided, marks chunks as available again when they are free()d, and at times coalesces adjacent free chunks (loosely similar to garbage collection).
That said, what malloc() does with memory is completely transparent to the Linux kernel. It's effectively just a linked list or two, which you could have implemented yourself. What the Linux kernel sees as your memory are the pages assigned to your process and their contents.
When you call fork(), the fork(2) man page says:
The child process is created with a single thread--the one that called fork(). The entire virtual address space of the parent is replicated in the child, including the states of mutexes, condition variables, and other pthreads objects; the use of pthread_atfork(3) may be helpful for dealing with problems that this can cause.
The child inherits copies of the parent's set of open file descriptors. Each file descriptor in the child refers to the same open file description (see open(2)) as the corresponding file descriptor in the parent. This means that the two descriptors share open file status flags, current file offset, and signal-driven I/O attributes (see the description of F_SETOWN and F_SETSIG in fcntl(2)).
The child inherits copies of the parent's set of open message queue descriptors (see mq_overview(7)). Each descriptor in the child refers to the same open message queue description as the corresponding descriptor in the parent. This means that the two descriptors share the same flags (mq_flags).
The child inherits copies of the parent's set of open directory streams (see opendir(3)). POSIX.1-2001 says that the corresponding directory streams in the parent and child may share the directory stream positioning; on Linux/glibc they do not.
So fork() copies not only the entire virtual address space but also mutexes, file descriptors and basically every kind of resource the parent has opened. Part of the virtual address space that gets copied is malloc()'s linked list(s). So after a fork(), the malloc()ed memory of both processes is identical, and so is the bookkeeping malloc() keeps about what is allocated. However, the two now live in separate address spaces, so a write in one process is not visible in the other.
Side information: One might think that fork() is a very expensive operation. However (from the man page):
Under Linux, fork() is implemented using copy-on-write pages, so the only penalty that it incurs is the time and memory required to duplicate the parent's page tables, and to create a unique task structure for the child.
This basically says that on fork(), no actual copying is done; instead, the pages are marked to be copied if either process tries to modify them. Effectively, if the child only reads from that memory, or ignores it completely, there is no copying overhead. This is very important for the common fork()/exec() pattern.

Function Fork in operating systems

Here is my question:
If a process (the father) creates a new process (the child) with fork(), which of these data structures are not shared between the father and the son?
-process ID
-heap
-code
-stack

Relation for Process ID
Upon successful completion, fork() returns a value of 0 to the child process and returns the process ID of the child process to the parent process. Otherwise, a value of -1 is returned to the parent process, no child process is created, and the global variable errno is set to indicate the error.
Relation of heap or memory space
The child gets an exact copy of the parent's address space, which in many cases is likely to be laid out in the same format as the parent's. I have to point out that each process has its own virtual address space for its memory, so each can have the same data at the same address, yet in different address spaces. Also, Linux uses copy-on-write when creating child processes. This means that the parent and child share the parent's pages until one of them writes to a page, at which point that page is physically copied for the writer. This eliminates unneeded copies when execing a new process: since you're just going to overwrite the memory with a new executable, why bother copying it?
Relation for code
There is no object-oriented inheritance in C.
Forking in C basically means that the process is stopped while it is running, an entire copy of it is made in (effectively) a different memory space, and then both processes are told to continue. They both continue from where the parent was paused. The only way you can tell which process you are in is to check the return value of the fork() call.
In such a situation the child doesn't really inherit everything from the parent process, it's more like it gets a complete copy of everything the parent had.
Stack
The child process gets a separate instance of each global variable declared in the parent process.
The point of separate processes is to separate memory, so you can't share variables between the parent and the child process once the fork has occurred.

The difference between fork(), vfork(), exec() and clone()

I was looking to find the difference between these four on Google and I expected there to be a huge amount of information on this, but there really wasn't any solid comparison between the four calls.
I set about trying to compile a basic at-a-glance look at the differences between these system calls, and here's what I got. Is all this information correct, and am I missing anything important?
Fork: The fork call basically makes a duplicate of the current process, identical in almost every way (not everything is copied over, for example resource limits in some implementations, but the idea is to create as close a copy as possible).
The new process (child) gets a different process ID (PID) and has the PID of the old process (parent) as its parent PID (PPID). Because the two processes are now running exactly the same code, they can tell which is which by the return code of fork - the child gets 0, the parent gets the PID of the child. This is all, of course, assuming the fork call works - if not, no child is created and the parent gets an error code.
Vfork: The basic difference between vfork() and fork() is that when a new process is created with vfork(), the parent process is temporarily suspended, and the child process might borrow the parent's address space. This strange state of affairs continues until the child process either exits or calls execve(), at which point the parent process continues.
This means that the child process of a vfork() must be careful to avoid unexpectedly modifying variables of the parent process. In particular, the child process must not return from the function containing the vfork() call, and it must not call exit() (if it needs to exit, it should use _exit(); actually, this is also true for the child of a normal fork()).
Exec: The exec call is a way to replace the entire current process with a new program. It loads the program into the current process space and runs it from its entry point, replacing the current process image with the executable named by the function's arguments. Control never returns to the original program unless exec() fails.
Clone: Like fork(), clone() creates a new process. Unlike fork(), it allows the child process to share parts of its execution context with the calling process, such as the memory space, the table of file descriptors, and the table of signal handlers.
When the child process is created with clone(), it executes the function application fn(arg) (This differs from fork(), where execution continues in the child from the point of the original fork() call.) The fn argument is a pointer to a function that is called by the child process at the beginning of its execution. The arg argument is passed to the fn function.
When the fn(arg) function application returns, the child process terminates. The integer returned by fn is the exit code for the child process. The child process may also terminate explicitly by calling exit(2) or after receiving a fatal signal.
Information gotten from:
Differences between fork and exec
http://www.allinterview.com/showanswers/59616.html
http://www.unixguide.net/unix/programming/1.1.2.shtml
http://linux.about.com/library/cmd/blcmdl2_clone.htm
Thanks for taking the time to read this! :)

vfork() is an obsolete optimization. Before good memory management, fork() made a full copy of the parent's memory, so it was pretty expensive. Since in many cases a fork() was followed by exec(), which discards the current memory map and creates a new one, it was a needless expense. Nowadays, fork() doesn't copy the memory; it's simply set as "copy on write", so fork()+exec() is just as efficient as vfork()+exec().
clone() is the syscall used by fork(). With some parameters it creates a new process; with others, it creates a thread. The difference between them is just which data structures (memory space, processor state, stack, PID, open files, etc.) are shared or not.

execve() replaces the current executable image with another one loaded from an executable file.
fork() creates a child process.
vfork() is a historical optimized version of fork(), meant to be used when execve() is called directly after fork(). It turned out to work well on non-MMU systems (where fork() cannot work efficiently) and when fork()ing processes with a huge memory footprint to run some small program (think Java's Runtime.exec()). POSIX has standardized posix_spawn() to replace these latter two, more modern uses of vfork().
posix_spawn() does the equivalent of a fork()/execve(), and also allows some fd juggling in between. It's supposed to replace fork()/execve(), mainly for non-MMU platforms.
pthread_create() creates a new thread.
clone() is a Linux-specific call, which can be used to implement anything from fork() to pthread_create(). It gives a lot of control. It was inspired by rfork().
rfork() is a Plan 9-specific call. It's supposed to be a generic call, allowing several degrees of sharing, between full processes and threads.

fork() - creates a new child process, which is a complete copy of the parent process. The child and parent processes use different virtual address spaces, which are initially populated with the same memory pages. Then, as both processes execute, the virtual address spaces differ more and more, because the operating system lazily copies the memory pages written by either process and gives each process its own independent copy of the modified pages. This technique is called Copy-On-Write (COW).
vfork() - creates a new child process, which is a "quick" copy of the parent process. In contrast to fork(), the child and parent processes share the same virtual address space. NOTE! Because they use the same virtual address space, the parent and child start with the same stack, stack pointer and instruction pointer, just as with the classic fork()! To prevent unwanted interference between parent and child, which use the same stack, the parent process is frozen until the child calls either exec() (creating a new virtual address space and switching to a different stack) or _exit() (terminating the process). vfork() is an optimization of fork() for the "fork-and-exec" model. It can be 4-5 times faster than fork(), because unlike fork() (even with COW in mind), the implementation of vfork() does not include creating a new address space (allocating and setting up new page directories).
clone() - creates a new child process. The parameters of this system call specify which parts of the parent process must be copied into the child process and which parts are shared between them. As a result, this system call can be used to create all kinds of execution entities, ranging from threads to completely independent processes. In fact, the clone() system call is the basis for the implementation of pthread_create() and the whole fork() family of system calls.
exec() - resets all the memory of the process, loads and parses the specified executable binary, sets up a new stack and passes control to the entry point of the loaded executable. On success, this system call never returns control to the caller; it serves to load a new program into an already existing process. Together with fork(), it forms the classical UNIX process management model called "fork-and-exec".

In older Linux kernels on i386, fork(), vfork() and clone() all call do_fork() to do the real work, just with different parameters:
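asmlinkage int sys_fork(struct pt_regs regs)
{
    /* plain fork: nothing shared, SIGCHLD sent to the parent on exit */
    return do_fork(SIGCHLD, regs.esp, &regs, 0);
}
asmlinkage int sys_clone(struct pt_regs regs)
{
    unsigned long clone_flags;
    unsigned long newsp;

    clone_flags = regs.ebx;   /* flags passed by the caller in %ebx */
    newsp = regs.ecx;         /* new stack pointer passed in %ecx */
    if (!newsp)
        newsp = regs.esp;     /* no stack given: reuse the caller's */
    return do_fork(clone_flags, newsp, &regs, 0);
}
asmlinkage int sys_vfork(struct pt_regs regs)
{
    /* share the address space and block the parent until mm_release */
    return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs.esp, &regs, 0);
}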
#define CLONE_VFORK 0x00004000 /* set if the parent wants the child to wake it up on mm_release */
#define CLONE_VM 0x00000100 /* set if VM shared between processes */
SIGCHLD means the child sends this signal to its father when it exits.
With fork, the child and father have independent page tables, but for efficiency fork does not actually copy any pages; it just marks all the writable pages read-only in both processes. So when either process wants to write to such a page, a page fault occurs and the kernel allocates a new page, cloned from the old one, with write permission. That's called "copy on write".
With vfork, the virtual memory is shared exactly by child and father; precisely because of that, father and child can't run concurrently, since they would interfere with each other. So the father sleeps at the end of do_fork() and wakes up when the child calls exit() or execve(), since at that point the child stops using the shared address space (after execve it has its own new page table). Here is the code in do_fork() where the father sleeps:
if ((clone_flags & CLONE_VFORK) && (retval > 0))
    down(&sem);
return retval;
Here is the code in mm_release() (called by exit() and execve()) that wakes the father:
up(tsk->p_opptr->vfork_sem);
sys_clone() is more flexible, since you can pass it any clone_flags. pthread_create(), for example, calls this system call with many clone_flags:
int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGNAL | CLONE_SETTLS | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID | CLONE_SYSVSEM);
Summary: fork(), vfork() and clone() create child processes with different amounts of resource sharing with the father process. One could also say that vfork() and clone() can create threads (strictly speaking they are processes, since each has its own task_struct), because they share the VM page table with the father process.

With fork(), either the child or the parent may run first, depending on the scheduler.
With vfork(), the child is guaranteed to run first; the parent resumes only after the child terminates (or calls exec()).
