I am looking for a way on Linux to fork a child process that runs exactly the same binary as its parent.
The basic mechanism that I see available is fork() and execv(). The trouble is what parameter to supply to execv() to run exactly the same code?
The nearest solution I have found is to exec /proc/self/exe. The trouble is that this is only a symlink, so if the executable has been modified on disk after my process started, the child could end up running the new version. I am looking for a more robust solution, such that in practice I can depend on the parent and child processes running the same code even if software is being reinstalled in the background.
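One approach worth sketching (an illustration, not a definitive answer): open /proc/self/exe once at startup and keep the file descriptor. The open descriptor pins the original inode, so a reinstall that replaces the file on disk cannot change what the child executes, and fexecve(3) (POSIX.1-2008, available in glibc) then runs exactly that image. The --as-child flag here is hypothetical, only there to stop the re-exec from recursing:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

extern char **environ;

int main(int argc, char *argv[]) {
    /* Hypothetical flag so the re-exec'd child takes a different code path. */
    if (argc > 1 && strcmp(argv[1], "--as-child") == 0) {
        /* ... child-mode work ... */
        return 0;
    }

    /* Open our own image early; the descriptor keeps the original inode
       alive even if the file on disk is later replaced. O_CLOEXEC is fine:
       fexecve() does not need the descriptor after the exec succeeds. */
    int self = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
    if (self == -1) { perror("open /proc/self/exe"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {
        char *child_argv[] = { argv[0], "--as-child", NULL };
        fexecve(self, child_argv, environ);   /* runs the pinned image */
        perror("fexecve");
        _exit(127);
    }
    /* ... parent carries on; wait() for the child as usual ... */
    return 0;
}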
Does the Linux scheduler prefer to run the child process after fork() over the parent process?
Usually, the forked process will execute some kind of exec, so it is better to let the child run before the parent (to avoid unnecessary copy-on-write).
I assume that the child will execute exec as its first operation after being created.
Is my assumption (that the scheduler will prefer the child process) correct? If not, why? If yes, are there more reasons to run the child first?
To quote The Linux Programming Interface (pg. 525) for a general answer:
After a fork(), it is indeterminate which process - the parent or the child - next has access to the CPU. (On a multiprocessor system, they may both simultaneously get access to a CPU.)
The book goes on about the differences in kernel versions and also mentions CFS / Linux 2.6.32:
[...] since Linux 2.6.32, it is once more the parent that is, by default, run first after a fork(). This default can be changed by assigning a nonzero value to the Linux-specific /proc/sys/kernel/sched_child_runs_first file.
This behaviour is still present with CFS although there are some concerns for the future of this feature. Looking at the CFS implementation, it seems to schedule the parent before the child.
The way to go for you would be to set /proc/sys/kernel/sched_child_runs_first to a non-zero value.
Edit: This answer analyzes the default behaviour and compares it to sched_child_runs_first.
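If you want to check the default behaviour empirically, a rough experiment (just a sketch, nothing authoritative) is to fork in a loop and see which side of the fork writes first. Pin it to one CPU (e.g. taskset -c 0 ./a.out), since on a multiprocessor both sides may run at once:

#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    for (int i = 0; i < 10; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            write(STDOUT_FILENO, "C", 1);   /* the child got the CPU first */
            _exit(0);
        }
        write(STDOUT_FILENO, "P", 1);       /* the parent got the CPU first */
        waitpid(pid, NULL, 0);
    }
    write(STDOUT_FILENO, "\n", 1);          /* e.g. "PPPPPPPPPP" when the parent runs first */
    return 0;
}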
For the case where the child calls exec at the first opportunity you can use vfork instead of fork. vfork suspends the parent until the child calls _exit or exec*. However, once it calls exec the child will be suspended if code has to be loaded from disk. In this case, the parent has a good chance to continue before the child does.
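A minimal sketch of that vfork pattern, with /bin/ls standing in for whatever the child actually execs:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = vfork();    /* parent is suspended until the child execs or _exits */
    if (pid == -1) {
        perror("vfork");
        return 1;
    }
    if (pid == 0) {
        /* After vfork, the child must only call exec* or _exit. */
        execl("/bin/ls", "ls", "-l", (char *)NULL);
        _exit(127);         /* reached only if the exec failed */
    }
    /* Parent resumes here, possibly while the child is still loading from disk. */
    waitpid(pid, NULL, 0);
    return 0;
}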
I have a simple shell script in which multiple executables run sequentially. Every time a new executable starts, a new process with a new PID is created. Is it possible to run them all under the same PID? I know that for a shell script we can use "source", but I do not know how to handle executables.
In principle, I believe it should be possible, but in practice it would be very complicated and brittle.
The exec family of system calls in Linux allows a process to replace its program image with an entirely new one while holding on to the same PID. The tricky part would be to somehow "return" from the second program back to the first. When exec is called, the OS loads everything it needs to start running the new program and wipes out every piece of state related to the current image (the one being replaced). And when the new program terminates, the OS releases all resources (including the PID) associated with that process.
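To make that concrete, here is a tiny demonstration (an illustration, not part of the original answer) that the PID survives exec; sh and echo are arbitrary stand-ins:

#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("before exec: pid=%ld\n", (long)getpid());
    fflush(stdout);     /* don't lose buffered output across the exec */
    /* Replace this process image; the PID stays the same. */
    execl("/bin/sh", "sh", "-c", "echo after exec: pid=$$", (char *)NULL);
    perror("execl");    /* reached only if the exec failed */
    return 1;
}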
So if you really wanted to do this, you would have to hijack how processes terminate in order to restart your original process rather than let the OS clean everything up. How can you do this? Well, execle and execvpe functions allow a program to specify the environment of the new process before starting the process. Since every process depends on libc (or equivalent) to bring up/tear down a process, you should be able to provide a custom libc which would re-start execution of your script, or exec another process. The great difficulty would be hacking such a libc. Additionally, you would have to figure out a good way for your master program to keep state even though the OS wipes away any memory it might have been using when it called exec. You can probably accomplish this with temporary files.
So long story short, don't do it. While it's kind of fun for me to sit here and think about the massive hacks that would be required to pull this off, it would be a huge pain and I'm sure there is a much more elegant solution to whatever problem you are actually trying to solve.
The PID is assigned by the OS when the shell creates a new process, and there is no way to tell the OS to use some specific PID. So it's not possible.
I am learning how to program on the Linux platform, and I want to know how to run my app as a background process.
For example, in this scenario: when I execute my app from the shell, it automatically runs as a background process. Note that I don't want to need the "&" in the shell when I run my app. What standard Linux function implements this?
And how can I kill or terminate the backgrounded app from code? What I mean is, I don't want to have to execute the kill shell command to terminate my app running in the background; if the app meets some condition, it should terminate itself.
Many thanks.
You want to daemonize your program. This is generally done by fork() and a few other system calls.
There are more details here
Background applications can be killed by using kill. It is good practice for a daemon to write its process ID (PID) in a well-known file so it can be located easily.
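For reference, a sketch of the classic fork()-based daemonization sequence (error handling trimmed for brevity; daemon(3), discussed below, wraps up much the same steps):

#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

static void daemonize(void) {
    if (fork() > 0) exit(0);    /* parent exits; the shell gets its prompt back */
    setsid();                   /* new session: detach from the controlling terminal */
    if (fork() > 0) exit(0);    /* second fork: we can never reacquire a terminal */
    umask(0);
    chdir("/");                 /* don't keep any filesystem busy */
    int fd = open("/dev/null", O_RDWR);
    dup2(fd, STDIN_FILENO);     /* point stdio at /dev/null */
    dup2(fd, STDOUT_FILENO);
    dup2(fd, STDERR_FILENO);
    if (fd > STDERR_FILENO) close(fd);
}

As for terminating itself: a daemon can simply check its condition and call exit(); the kill command (or a kill() from another process, using the PID from the PID file) is only needed when something else has to stop it.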
While you should learn about fork(), exec(), wait() and kill(), it's sometimes more convenient to just use daemon(3) if it exists.
Caveats:
Not in POSIX.1-2001
Not present in all BSDs (it may be named something else, however)
If portability is not a major concern, it is rather convenient. If portability is a major concern, you can always write your own implementation and use that.
From the manpage:
SYNOPSIS
#include <unistd.h>
int daemon(int nochdir, int noclose);
DESCRIPTION
The daemon() function is for programs wishing to detach themselves from the
controlling terminal and run in the background as system daemons.
If nochdir is zero, daemon() changes the calling process's current working directory
to the root directory ("/"); otherwise, the current working directory is left
unchanged.
If noclose is zero, daemon() redirects standard input, standard output and standard
error to /dev/null; otherwise, no changes are made to these file descriptors.
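Typical usage is a single call near the top of main(); a minimal sketch:

#define _DEFAULT_SOURCE     /* glibc: expose daemon() */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    if (daemon(0, 0) == -1) {   /* chdir to "/", redirect stdio to /dev/null */
        perror("daemon");
        return 1;
    }
    /* ... background work happens here, detached from the terminal ... */
    sleep(60);
    return 0;
}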
fork(2) gives you a new process. In the child you run one of the exec(3) functions to replace it with a new executable. The parent can use one of the wait(2) functions to wait for the child to terminate. kill(2) can be used to send a signal to another process.
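Here is a toy example (an illustration, not the answerer's code) tying all four together, with sleep 30 standing in for an arbitrary child program:

#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        execlp("sleep", "sleep", "30", (char *)NULL);   /* child becomes "sleep 30" */
        _exit(127);                                     /* exec failed */
    }
    sleep(1);
    kill(pid, SIGTERM);          /* ask the child to terminate early */
    int status;
    waitpid(pid, &status, 0);    /* reap it so no zombie is left behind */
    if (WIFSIGNALED(status))
        printf("child %ld killed by signal %d\n", (long)pid, WTERMSIG(status));
    return 0;
}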
I've read about fork, and from what I understand, the process is cloned, but which process? The script itself or the process that launched the script?
For example:
I'm running rTorrent on my machine and when a torrent completes, I have a script run against it. This script fetches data from the web so it takes a few seconds to complete. During this time, my rtorrent process is frozen. So I made the script fork using the following
my $pid = fork();
if ($pid == 0) { blah blah blah; exit 0; }
If I run this script from the CLI, it comes back to the shell within a second while it runs in the background, exactly as I intended. However, when I run it from rTorrent, it seems to be even slower than before. So what exactly was forked? Did the rtorrent process clone itself and my script ran in that, or did my script clone itself? I hope this makes sense.
The fork() function returns TWICE! Once in the parent process, and once in the child process. In general, both processes are IDENTICAL in every way, as if EACH one had just returned from fork(). The only difference is that in one, the return value from fork() is 0, and in the other it is non-zero (the PID of the child process).
So whatever process was running your Perl script (if it is an embedded Perl interpreter inside rTorrent then rTorrent would be the process) would be duplicated at exactly the point that the fork() happened.
I believe I found the problem by looking through rTorrent's source. For some processes, it will read all of the output sent to stdout before continuing. If this is happening to your process, rTorrent will block until stdout is closed. Because you're forking, your child process shares the same stdout as the parent. Your parent process will exit, but the pipe remains open (because your child process is still running). If you ran strace on rTorrent, I'd bet it would be blocked on this read() call while executing your command.
Try closing or redirecting stdout in your Perl script before the fork().
The entire process containing the interpreter forks. Fortunately memory is copy-on-write so it doesn't need to copy all the process memory in order to fork. However, things such as file descriptors remain open. This allows child processes to handle them, but may cause issues if they aren't closed appropriately. In general, fork() should not be used in an embedded interpreter except under extreme duress.
To answer the nominal question, since you commented that the accepted answer fails to do so, fork affects the process in which it is called. In your example of rTorrent spawning a Perl process which then calls fork, it is the Perl process which is duplicated, since it was the Perl process which called fork.
In the general case, there is no way for a process to fork any process other than itself. If it were possible to tell another arbitrary process to go fork itself, that would open up no end of security and performance issues.
My advice would be "don't do that".
If the Perl interpreter is embedded within the rtorrent process, you've almost certainly forked an entire rtorrent process, the effects of which are probably ill-defined at best. It's generally a bad idea to play with process-level stuff in an embedded interpreter regardless of language.
There's an excellent chance that some sort of lock is not being properly released, or that threads within the processes are proceeding in unintended and possibly competing ways.
When we create a process using fork, the child process gets a copy of the parent's address space, so the child can use that address space as well. It can also access the files that the parent has opened. We have control over the child, and to get the child's complete exit status we can use wait.
I have an interesting (at least to me) problem: I can't manage to find a way to reliably and portably get information on grandchildren processes in certain cases. I have an application, AllTray, that I am trying to get to work in certain strange cases where its subprocess spawns a child and then dies. AllTray's job is essentially to dock an application to the task tray, which is (usually) specified as a command line for AllTray to invoke (i.e., alltray xterm would start xterm, and manage it in AllTray).
Most GUI software runs just fine under it. It sets the _NET_WM_PID property on its window (or a widget library does) and all's well, because _NET_WM_PID == fork()ed child. However, in some cases (such as when running oowriter, or software written to run under KDE such as K3b), the child process that AllTray runs is a wrapper, be it a shell script (as in OO.o's case) or a strange program that fork()s and exec()s itself and effectively backgrounds itself, since the parent process dies very early.
I had the idea to not reap my child processes, so as to preserve in the process table the parent process ID for my grandchildren, so that I could link them back to me by traversing the family tree from bottom-to-top. That doesn't work, though: once my child process dies and turns into a zombie, the system considers my grandchild process to be an orphan, and init adopts it. This appears to be the case on at least Linux 2.6 and NetBSD; I'd presume it's probably the norm, and POSIX doesn't seem to specify that to be the case, so I was hoping for the opposite.
Since that approach won't work, I thought about using LD_PRELOAD and intercepting my child process' call to fork(), and passing information back to my parent process. However, I'm concerned that won't be as portable as the ideal solution, because different systems have different rules on how the dynamic linker does things like LD_PRELOAD. It won't work for setuid/setgid GUI applications either without the helper library also being setuid or setgid, at least on Linux systems. Generally, it smells like a bad idea to me, and feels quite hackish.
So, I'm hoping that someone has an idea on how to do this, or if the idea of relying on a mechanism like LD_PRELOAD is really the only option I have short of patching kernels (which is not going to happen).
You could investigate the possibility of using process groups to keep track of, well, groups of processes. A process group is a property (just a number, which you can set with setpgid()) that child processes then inherit automatically across fork().
AllTray can create a new process group for each application started with it. You can then send signals to all members of the process group. I suppose the most useful signals here would be TERM and KILL, in order to kill an application managed in AllTray.
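A sketch of that flow, with xterm as a placeholder for the managed command:

#include <signal.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        setpgid(0, 0);              /* child: become leader of a fresh process group */
        execlp("xterm", "xterm", (char *)NULL);
        _exit(127);
    }
    setpgid(pid, pid);              /* parent sets it too, to close the fork/exec race */
    sleep(10);                      /* stand-in for AllTray's real lifetime management */
    killpg(pid, SIGTERM);           /* signals every process still in the group */
    return 0;
}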
I'm not sure if there is a convenient way to figure out whether all members of the process group have already exited. You may have to resort to going through the entire process list and calling getpgid for each process to see if any are left in the process group.
Note that process groups won't work for applications which create new process groups themselves. But that's relatively rare and you probably don't need to worry about such applications.