How does copy-on-write in fork() handle multiple fork? - linux

According to wikipedia (which could be wrong)
When a fork() system call is issued, a copy of all the pages corresponding to the parent process is created, loaded into a separate memory location by the OS for the child process. But this is not needed in certain cases. Consider the case when a child executes an "exec" system call (which is used to execute any executable file from within a C program) or exits very soon after the fork(). When the child is needed just to execute a command for the parent process, there is no need for copying the parent process' pages, since exec replaces the address space of the process which invoked it with the command to be executed.
In such cases, a technique called copy-on-write (COW) is used. With this technique, when a fork occurs, the parent process's pages are not copied for the child process. Instead, the pages are shared between the child and the parent process. Whenever a process (parent or child) modifies a page, a separate copy of that particular page alone is made for that process (parent or child) which performed the modification. This process will then use the newly copied page rather than the shared one in all future references. The other process (the one which did not modify the shared page) continues to use the original copy of the page (which is now no longer shared). This technique is called copy-on-write since the page is copied when some process writes to it.
It seems that when either of the process tries to write to the page. A new copy of the page is allocated and assigned to the process that generated the page fault. The original page is marked writable afterwards.
My question is: what happens if the fork is called multiple times before any of the process made an attempt to write to a shared page?

If fork is called multiple times from the original parent process, then each of the children and parent will have their pages marked as read-only. When a child process attempts to write data then the page from the parent process is copied to its address space and the copied page is marked as writeable in the child but not in the parent.
If fork is called from the child process and the grand-child attempts to write, the page from the original parent is copied to the first child, and then to the grand child, and all is marked as writeable.

The original page is only marked writeable if it belongs to a single process, which might not be the case if there were multiple forks. The new page is always marked as writeable because it only belongs to the process which attempted to write it.

Related

How is it that a child process that calls exec() right after forking wouldn't need a separate copy of the parent's address space?

I am reading "Linux Kernel Development, Second Edition" by Robert Love. (Yes, it's a bit outdated). I understand from Chapter 3: Process Management that in COW (copy-on-write), the parent and child processes share the parent's address space until one of the processes writes to the address space. This is to prevent the unnecessary duplication of the parent's address space when it is not even being written to.
But then, it mentions that if the child process calls exec() right after fork(), the parent's address space and pages DON'T need to be copied and given to the child as a separate copy. That's where I'm lost.
According to the manual, "the exec() family of functions replaces the current process image with a new process image." The manual doesn't say anything about exec() creating a new address space for the new process image. So if the child process is sharing address space with its parent, wouldn't this mean that exec() would load an executable image into the parent's address space (which is shared with the child)?
Since that means the parent's address space would be overwritten, I don't understand how a child process that executes exec() after fork() WOULDN'T need a separate copy of its parent's address space to write to. Is there something I'm missing here?
Copy-on-Write mechanism implies, that none modification in child process will affect on parent.
Calling exec by the child is not an exception: it changes address space only for child, not for the parent.
You can even read on vfork() which doesn't have copy on write mechanism. It shares the address space of the parent process and parent process is suspected untill child process exists. It is interesting and makes things much more clearer.

Does fork() in linux copy all the parent's memory pages to the child?

I read about copy on write but I don't understand it. Does this mean that the memory pages are not copied until the child process has written something?
As long as no write operation is performed on a child's memory page, they are identical to a parent's memory page to a user process. Therefore, as long as the page is not written to, it can be used for both the parent and child.
If, however, a write operation is performed the parent's and child's versions differ. At this point, the parent's page is copied and assigned to the child in place of the parent page. This copy is called "copy on write" because the copy is performed when the page is written to.
Note that "copy on write" is just an optimization of the forking operation. A naive implementation simply duplicates the pages of the parent for the child instantly. By noticing that pages not yet written to do not require that duplication, that copy is postponed until the child actually writes something (the term "laziness" is often used for that delay), which might not happen at all.

Fork creates a new process that is exactly the same as its parent

From the assumption made in the title of my question "Fork create a new process that is exactly the same as its parent". I am wondering how a fork is really made by the operating system.
Considering a heavy process (huge RAM footprint) that fork itself to accomplish a small task (list the files into a directory). From the assumption, I expect that the child process will be as big as the first one. However my common sense tells me that it cannot be the case.
How does it work in the real world?
As others mentioned in the comments, a technique called Copy-On-Write mitigates the heavy cost of copying the entire memory space of the parent process. Copy-on-write means that memory pages are shared read-only between parent and child until either of them decides to write -- at which point the page is copied and each process gets its own private copy. This technique easily prevents a huge amount of copying that in a lot of cases would be a waste of time because the child will exec() or do something simple and exit.
Here's what happens in detail:
When you call fork(2), the only immediate cost you incur is the cost of allocating a new unique process descriptor and the cost of copying the parent's page tables. In Linux, fork(2) is implemented by the clone(2) syscall, which is a more general syscall that allows the caller to control which parts of the new process are shared with the parent. When called from fork(2), a set of flags are passed to indicate that nothing is to be shared (you can choose to share memory, file descriptors, etc - this is how threads are implemented: by calling clone(2) with CLONE_VM, which means "share the memory space").
Under the hood, each process's memory page has a bit flag that is the copy-on-write flag that indicates whether that page should be copied before being written to. fork(2) marks every writeable page in a process with that bit. Each page also maintains a reference count.
So, when a process forks, the kernel sets the copy-on-write bit on every non-private, writeable page of that process and increments the reference count by one. The child process has pointers to these same pages.
Then, every page is marked read-only so that an attempt to write to the page generates a page fault - this is needed to wake up the kernel so that it has a chance of seeing what happened and what needs to be done.
When either of the processes writes to a page that is still being shared, and thus is marked read-only, the kernel wakes up and attempts to figure out why there is a page fault. Assuming the parent / child process is writing to a legit location, the kernel eventually sees that the page fault was generated because the page is marked copy-on-write and there is more than one reference to that page.
The kernel then allocates memory, copies the page into the new location, and the write can proceed.
What is different across forks
You said that fork(2) creates a new process that is exactly the same as its parent. This is not quite true. There are several differences between the parent and the child:
The process ID is different
The parent process ID is different
The child's resource usage (CPU time, etc) are set to 0
File locks owned by the parent are not inherited
The set of pending signals on the child is cleared
Pending alarms are cleared on the child
About vfork
The vfork(2) syscall is very similar to fork(), but it does absolutely no copying - it doesn't even copy the parent's page tables. With the introduction of copy-on-write, it's not as widely used anymore, but historically it was used by processes that would call exec() after forking.
Naturally, attempting to write to memory in the child process after a vfork() results in chaos.
When a process forked, the child process use the same page table as its parent util parent or child write to its space!So

Function Fork in operating systems

Here is my question:
if a process (father) create a new process (child) with fork(),which of these data structure do not share between father and the son??
-process ID
-heap
-code
-stack
Relation for Process ID
Upon successful completion, fork() returns a value of 0 to the child
process and returns the process ID of the child process to the parent
process. Otherwise, a value of -1 is returned to the parent process, no
child process is created, and the global variable errno is set to indi-
cate the error
Relation of heap or memory space
The child gets an exact copy of the parents address space, which in many cases is likely to be laid out in the same format as the parent address space. I have to point out that each one will have it's own virtual address space for it's memory, such that each could have the same data at the same address, yet in different address spaces. Also, linux uses copy on write when creating child processes. This means that the parent and child will share the parent address space until one of them does a write, at which point the memory will be physically copied to the child. This eliminates unneeded copies when execing a new process. Since you're just going to overwrite the memory with a new executable, why bother copying it?
Relation for code
There is no object-oriented inheritence in C.
Fork'ing in C is basically the process being stopped while it is running, and an entire copy of it being made in (effectively) a different memory space, then both processes being told to continue. They will both continue from where the parent was paused. The only way you can tell which process you are in is to check the return value of the fork() call.
In such a situation the child doesn't really inherit everything from the parent process, it's more like it gets a complete copy of everything the parent had.
Stack
child process gets separate instance of global variable declared in parent process".
The point of separate processes is to separate memory. So you can't share variables between the parent and the child process once the fork occured.

How do parent and child share pages after fork() with Copy-On-Write in Linux?

I am interested in the fork() use case.
1) After fork, the kernel creates a new PCB for the child process. It changes the permission of all page table entries in the parent to be read-only. It then copies the page directories and table of the parent in the new child PCB.
Is this correct? Or do the parent and child process share them?
2) How does the kernel know the actual permission of the pages if it changes all them to be read only after copying them?
I think the opinions in the 1) is correct, during fork(), kernel will create a new task_struct for the child process, and copy the mm_struct and page table, it shares resources with parent process which depends on the clone_flags. And it will invalidate the MMU entries which is_cow_mapping() and marks the related page table entries to read only.
The answer to 2): The kernel only change page table entries which is_cow_mapping() to read only, when child or parent want to access these address ranges, kernel will duplicate and marked to read/write.
If exec() is called after fork(), a totally new virtual space will be setup, the older one which is copied from parent is discarded.

Resources