What functions are used when the system call fork() is called - linux

I've been searching through the net to find what functions are used inside the fork.c and in what order, however I can't seem to find the answer. All I see is what fork.c does. I know that fork.c uses _do_fork() but how it gets there I don't know.

When the fork() system call is made, the kernel creates a new process by duplicating the calling process. The new process is called the child process.
Here is a basic overview of the call chain (newer kernels name the last function _do_fork(), as in your question):
fork()->sys_fork()->do_fork()
sys_fork()
{
    1. Validate the arguments.
    2. Invoke do_fork().
    3. Return the pid (the child's pid).
}
do_fork()
{
    1. Allocate a new address space.
    2. Copy the segments of the caller's address space to the new address space.
    3. Allocate a new task_struct instance (the PCB).
    4. Copy the caller's task_struct entries to the new task_struct.
    5. Return.
}
On success, the PID of the child process is returned in the parent, and 0 is returned in the child.
Note: there are a few more calls, but these two are the most important. If you want to know more, look into the kernel source. If you still need help, let me know.

Related

Where does task_struct get initialized in the Linux kernel?

Is there a function that gets called to initialize (at least some) values of task_struct? Or is there any other function that gets called upon task (specifically, user-space process) creation?
Since the only way to create a new process in Linux is through the clone() syscall (or other variants like fork()), there is no real function to "create a new task" from scratch, but there sure is a function to duplicate an existing task, applying the needed modifications. The function used for this is copy_process(), which uses dup_task_struct() to duplicate the associated struct task_struct.
There is however one special exception to this rule: the very first task is not created by duplicating anything. Its task_struct (init_task, which becomes the idle task with PID 0) is statically defined at compile time (see here). The kernel itself then spawns the init process (PID 1) from it, and every other user-space process is created by init or by some child of init through clone() + execve(). You can look at this other answer if you want to know more.

After clone/fork/vfork, parent and child processes have different return addresses

I work on CentOS 6.6 and want to add a post-clone hook for clone. I have changed syscall_table[__NR_clone] to point to my function, which changes the return address on the stack to my post-clone function and then jumps to the actual clone syscall, so that after the actual syscall the program returns to my post-clone function. Since I change the return address on the stack before the actual clone occurs, both parent and child processes are supposed to have the same return address. However, only the parent process returns to my post-clone function, while the child process returns to the actual return address. I hope someone can help me figure out why it behaves like this.
I finally figured out why the parent process and child process have different return addresses after clone/fork/vfork. The clone system call invokes copy_process() in kernel/fork.c. copy_process() duplicates the current task structure into a variable p. Then p is passed to copy_thread(), where the instruction pointer (ip) of p is set to ret_from_fork, which is an assembly routine in entry.S (or entry_32.S or entry_64.S). The 64-bit version differs a little but follows the same idea.
The child process's ip is changed to ret_from_fork, while the parent process's ip remains the same. Therefore, after the system call clone/fork/vfork, parent and child processes will have different return addresses.

The difference between wait_queue_head and wait_queue in linux kernel

I can find many examples regarding wait_queue_head.
It works like a signal: you create a wait_queue_head, and someone
can sleep on it until someone else wakes it up.
But I cannot find a good example of using wait_queue itself, which is supposedly closely related to it.
Could someone give an example, or explain what happens under the hood?
From Linux Device Drivers:
The wait_queue_head_t type is a fairly simple structure, defined in
<linux/wait.h>. It contains only a lock variable and a linked list
of sleeping processes. The individual data items in the list are of
type wait_queue_t, and the list is the generic list defined in
<linux/list.h>.
Normally the wait_queue_t structures are allocated on the stack by
functions like interruptible_sleep_on; the structures end up in the
stack because they are simply declared as automatic variables in the
relevant functions. In general, the programmer need not deal with
them.
Take a look at the section "A Deeper Look at Wait Queues":
Some advanced applications, however, can require dealing with
wait_queue_t variables directly. For these, it's worth a quick look at
what actually goes on inside a function like interruptible_sleep_on.
The following is a simplified version of the implementation of
interruptible_sleep_on to put a process to sleep:
void simplified_sleep_on(wait_queue_head_t *queue)
{
    wait_queue_t wait;

    init_waitqueue_entry(&wait, current);
    current->state = TASK_INTERRUPTIBLE;
    add_wait_queue(queue, &wait);
    schedule();
    remove_wait_queue(queue, &wait);
}
The code here creates a new wait_queue_t variable (wait, which gets
allocated on the stack) and initializes it. The state of the task is
set to TASK_INTERRUPTIBLE, meaning that it is in an interruptible
sleep. The wait queue entry is then added to the queue (the
wait_queue_head_t * argument). Then schedule is called, which
relinquishes the processor to somebody else. schedule returns only
when somebody else has woken up the process and set its state to
TASK_RUNNING. At that point, the wait queue entry is removed from the
queue, and the sleep is done.
The book also includes a figure showing the internals of the data structures involved in wait queues. (The figure is taken from Linux Device Drivers; it is not my own.)
A wait queue is simply a list of processes and a lock.
wait_queue_head_t represents the queue as a whole. It is the head of the waiting queue.
wait_queue_t represents the item of the list - a single process waiting in the queue.

Fork()-ing a new process

Fork()-ing a process will end up calling do_fork() inside the kernel, making an exact copy of the calling process. The books I have read say that the child of fork will call exec to create the new process.
Example:
running the ls command in a shell creates processes this way.
sh(Parent)
|
sh(Child)
|
ls(New Process)
But I am not able to understand how and where exec*() is called,
because all I can see is that the shell (child) is what gets created by fork.
When and where will the new process be created/executed?
You have to exec() if you actually want a new program running in one of the processes (usually the child but not absolutely necessary). In your specific case where the shell executes ls, the shell first forks, then the child process execs. But it's important to realise that this is two distinct operations.
All fork() does is give you two (nearly) identical processes, and you can then use the return value of fork() to decide whether you're the parent (you get the positive PID of the child, or -1 if the fork() failed) or the child (you get 0).
See this answer for a description on how fork() and exec() work together (under your control) and how they can be used without each other.
Similar to do_fork(), the exec family all boils down to calls to do_execve(), located in fs/exec.c.

unix fork() understanding

int main(){
    fork();
}
I know this is a newbie question, but my understanding is that the parent process will now fork a new child process exactly like the parent, which means that the child should also fork a child process and so on... In reality, this only generates one child process. I can't understand what code the child will be executing.
The child process begins executing at the exact point where the parent left off: right after the fork() call. If you wanted to fork forever, you'd have to put it in a while loop.
As everybody mentioned, the child also starts executing after fork() has finished. Thus, it doesn't call fork again.
You could see it clearly in the very common usage like this:
#include <unistd.h>

int main(void)
{
    if (fork())
    {
        // you are in the parent. The return value of fork was the pid of the child
        // (or -1 if fork failed; real code should check for that separately)
        // here you can do stuff and perhaps eventually `wait` on the child
    }
    else
    {
        // you are in the child. The return value of fork was 0
        // you may often see here an `exec*` command
    }
    return 0;
}
You missed a semi-colon.
But the child (and also the parent) continues just after the fork happened. From the point of view of application programming, fork (like all system calls) is "atomic".
The only difference between the two processes (which after the fork have conceptually separate memory spaces) is the result of the fork.
If the child went on to call fork, the child would have two forks (the one that created it and the one that it then made) while the parent would only have one (the one that gave it a child). The nature of fork is that one process calls it and two processes return from it.
