How do I reliably track child/grandchild processes on a POSIX system? - linux

I have an interesting (at least to me) problem: I can't manage to find a way to reliably and portably get information on grandchildren processes in certain cases. I have an application, AllTray, that I am trying to get to work in certain strange cases where its subprocess spawns a child and then dies. AllTray's job is essentially to dock an application to the task tray, which is (usually) specified as a command line for AllTray to invoke (i.e., alltray xterm would start xterm, and manage it in AllTray).
Most GUI software runs just fine under it. It sets the _NET_WM_PID property on its window (or a widget library does) and all's well, because _NET_WM_PID == fork()ed child. However, in some cases (such as when running oowriter, or software written to run under KDE such as K3b), the child process that AllTray runs is a wrapper, be it a shell script (as in OO.o's case) or a strange program that fork()s and exec()s itself and effectively backgrounds itself, since the parent process dies very early.
I had the idea to not reap my child processes, so as to preserve in the process table the parent process ID for my grandchildren, so that I could link them back to me by traversing the family tree from bottom-to-top. That doesn't work, though: once my child process dies and turns into a zombie, the system considers my grandchild process to be an orphan, and init adopts it. This appears to be the case on at least Linux 2.6 and NetBSD; I'd presume it's probably the norm, and POSIX doesn't seem to specify that to be the case, so I was hoping for the opposite.
Since that approach won't work, I thought about using LD_PRELOAD and intercepting my child process' call to fork(), and passing information back to my parent process. However, I'm concerned that won't be as portable as the ideal solution, because different systems have different rules on how the dynamic linker does things like LD_PRELOAD. It won't work for setuid/setgid GUI applications either without the helper library also being setuid or setgid, at least on Linux systems. Generally, it smells like a bad idea to me, and feels quite hackish.
So, I'm hoping that someone has an idea on how to do this, or if the idea of relying on a mechanism like LD_PRELOAD is really the only option I have short of patching kernels (which is not going to happen).

You could investigate the possibility of using process groups to keep track of, well, process groups. A process group ID is just a number, set with setpgid(2), which child processes then inherit automatically across fork().
AllTray can create a new process group for each application started with it. You can then send signals to all members of the process group. I suppose the most useful signals here would be TERM and KILL, in order to kill an application managed in AllTray.
I'm not sure if there is a convenient way to figure out whether all members of the process group have already exited. You may have to resort to going through the entire process list and calling getpgid for each process to see if any are left in the process group.
Note that process groups won't work for applications which create new process groups themselves. But that's relatively rare and you probably don't need to worry about such applications.
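Here is a minimal sketch of that approach (my illustration, not part of the original answer): the /proc scan for surviving group members is Linux-specific, and the command, sleep and error handling are placeholders.

    /* Launch a command in its own process group, signal the whole group,
       and check whether any members remain. getpgid() on an arbitrary pid
       works on Linux; POSIX allows it to fail with EPERM for processes
       in another session. */
    #include <dirent.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    static pid_t launch(char *const argv[])
    {
        pid_t pid = fork();
        if (pid == 0) {
            setpgid(0, 0);            /* child: become leader of a new group */
            execvp(argv[0], argv);
            _exit(127);
        }
        if (pid > 0)
            setpgid(pid, pid);        /* set in parent too, to close the race */
        return pid;                   /* the new pgid equals this pid */
    }

    static int group_has_members(pid_t pgid)
    {
        DIR *d = opendir("/proc");    /* Linux-specific: enumerate all pids */
        struct dirent *e;
        int found = 0;
        if (!d)
            return -1;
        while (!found && (e = readdir(d)) != NULL) {
            pid_t pid = (pid_t)atoi(e->d_name);  /* non-numeric entries -> 0 */
            if (pid > 0 && getpgid(pid) == pgid)
                found = 1;
        }
        closedir(d);
        return found;
    }

    int main(void)
    {
        char *argv[] = { "xterm", NULL };        /* illustrative command */
        pid_t pgid = launch(argv);
        sleep(5);
        kill(-pgid, SIGTERM);         /* negative pid: signal the whole group */
        while (waitpid(-pgid, NULL, 0) > 0)      /* reap our direct children */
            ;
        printf("group still has members: %s\n",
               group_has_members(pgid) > 0 ? "yes" : "no");
        return 0;
    }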

Related

How to identify if a long-running process died?

I'm working on a daemon that communicates with several processes. The daemon can't monitor the processes all the time, but it must be able to reliably detect when a process dies, so that it can release the scarce resources it holds for that process.
The processes can communicate with the daemon, giving it some information at the start, but not vice versa. So the daemon can't just ask a process its identity.
The simplest form would be to use just their PID. But eventually another process could be assigned the same PID without my tool noticing.
A better approach would be to use PID plus the time the process started. A new process with the same PID would have a distinct start time. But I couldn't find a POSIX way to get the process start time. Using ps or looking at /proc/<pid>/stat doesn't seem portable enough.
A more complicated idea that seems POSIX-compliant would be:
Each process creates a temporary file.
Locks it using flock
Tells my daemon "my identity is connected with this file".
At any time, the daemon can check the temporary file: if it's locked, the process is alive; if it's not, the process is dead.
But this seems unnecessarily complicated.
Is there a better, or standard way?
Edit: The daemon must be able to resume after a restart, so it's not possible to keep a persistent connection for each process.
But I couldn't find a way how to get the process start time in a POSIX way.
Try the standard "etime" format specifier: LC_ALL=C ps -o etime= -p "$PIDS"
In fairness, I would probably construct my own table of live processes rather than relying on the process table and elapsed time. That's fundamentally your file-locking approach, though I'd probably aggregate all the lockfiles together in a known place and name them by PID, e.g., /var/run/my-app/8819.lock. Indeed, this might even be retrofitted onto the long-running processes, since file locks on file descriptors can be inherited across exec().
(Of course, if the long-running processes I cared about had a common parent, then I'd rather query the common parent, who can be a reliable authority on which processes are running and which are not.)
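A minimal sketch of that probe, assuming the /var/run/my-app/<pid>.lock convention above and that each monitored process takes flock(fd, LOCK_EX) on its own file at startup and keeps the descriptor open (flock(2) is BSD/Linux rather than POSIX; fcntl(F_SETLK) would be the portable equivalent):

    /* Probe a lockfile: the kernel releases the owner's flock() the
       moment the owner dies, even on a crash, so an acquirable lock
       means the process is gone. */
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/file.h>

    /* returns 1 = alive, 0 = dead, -1 = error (e.g. no such lockfile) */
    static int is_alive(const char *lockfile)
    {
        int fd = open(lockfile, O_RDONLY);
        if (fd < 0)
            return -1;
        if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
            close(fd);                 /* we got the lock: the owner is gone */
            return 0;
        }
        int saved = errno;             /* close() may clobber errno */
        close(fd);
        return saved == EWOULDBLOCK ? 1 : -1;   /* still locked: owner alive */
    }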
The standard way is the unnecessarily complicated one. That's life in a POSIX-compliant environment...
Methods other than the file exist, with various benefits and tradeoffs; most of the "standard" IPC mechanisms would work for this as well: a socket, a pipe, a message queue, shared memory... Basically, pick one mechanism that allows your application to announce to the daemon that it has started (and perhaps that it's exiting, for an orderly shutdown). In between, it could send periodic "I'm still here" messages and the daemon could notice when they stop arriving, or the daemon could poll periodically. There are quite a few ways to accomplish what you want, but without knowing more about the exact architecture you're trying to achieve, it's difficult to point at the one best way.
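As one concrete example of such a mechanism (my sketch, not part of the original answer): a per-process FIFO doubles as a liveness probe that survives daemon restarts. The client mkfifo()s a well-known path, opens it O_RDWR (so the open never blocks; that behavior is defined on Linux but unspecified by POSIX for FIFOs), writes nothing, and simply keeps the descriptor open for its lifetime. The daemon can then probe at any time, even after its own restart:

    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* returns 1 = a writer still holds the FIFO open (alive),
       0 = no writer left (dead), -1 = error;
       assumes the client never actually writes any data */
    static int probe(const char *fifo_path)
    {
        char buf[1];
        int fd = open(fifo_path, O_RDONLY | O_NONBLOCK);   /* never blocks */
        if (fd < 0)
            return -1;
        ssize_t n = read(fd, buf, sizeof buf);
        int saved = errno;             /* close() may clobber errno */
        close(fd);
        if (n == 0)
            return 0;    /* EOF: no process has the write end open */
        if (n < 0 && saved == EAGAIN)
            return 1;    /* empty, but some writer still holds it open */
        return 1;        /* data would also imply a writer existed */
    }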

Terminate all child processes in Linux

I am developing a sandbox on Linux, and I am now stuck on how to terminate all of the processes in the sandbox.
My sandbox works as follows:
At first, only one process runs in the sandbox.
It can then create several child processes.
Those child processes will create subprocesses of their own.
A parent process may exit before its children do.
Finally, the sandbox must terminate all of these processes.
I used to do this with killall or pkill -u, with a unique user attached to the sandbox, but that doesn't seem to work against a program that fork()s rapidly.
I then read the source code of pkill and realized that pkill is not atomic.
So how can I achieve my goal?
You could use process groups (setpgid(2)) and sessions (setsid(2)), but I wouldn't qualify what you do as a sandbox (in particular because if one of the processes is setuid, or changes its process group or session itself, you'll lose it; read execve(2) carefully and several times!). Notice that kill(2) with a negative pid kills an entire process group.
Read a good book like Advanced Linux Programming. Consider also using chroot(2).
And explain what you really want to do, and why. Sandboxing is harder than you think. See also capabilities(7), credentials(7) and SELinux.

Is it possible to run multiple executables in a shell script with the same PID?

I have a simple shell script in which multiple executables run sequentially. Every time a new executable starts, a new process with a new PID is created. Is it possible to run them all with the same PID? I know that for a shell script we can use "source", but I do not know how to handle executables.
In principle, I believe it should be possible, but in practice it would be very complicated and brittle.
The exec family of system calls on Linux allows a process to replace its program image with an entirely new one while holding on to the same PID. The tricky part would be to somehow "return" from the second program back to the first. When exec is called, the OS loads everything it needs to start running the new program and wipes out every piece of state belonging to the old one. And when the process eventually terminates, the OS releases all resources (including the PID) associated with it.
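A tiny demonstration of the PID-preserving property (my sketch, not part of the original answer; $$ expands to the shell's own PID):

    /* exec replaces the program image but keeps the PID. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        printf("before exec: pid = %d\n", (int)getpid());
        fflush(stdout);                 /* flush before the image is replaced */
        execlp("sh", "sh", "-c", "echo after exec: pid = $$", (char *)NULL);
        perror("execlp");               /* reached only if exec fails */
        return 1;
    }

Both lines print the same PID.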
So if you really wanted to do this, you would have to hijack how processes terminate in order to restart your original process rather than let the OS clean everything up. How can you do this? Well, execle and execvpe functions allow a program to specify the environment of the new process before starting the process. Since every process depends on libc (or equivalent) to bring up/tear down a process, you should be able to provide a custom libc which would re-start execution of your script, or exec another process. The great difficulty would be hacking such a libc. Additionally, you would have to figure out a good way for your master program to keep state even though the OS wipes away any memory it might have been using when it called exec. You can probably accomplish this with temporary files.
So long story short, don't do it. While it's kind of fun for me to sit here and think about the massive hacks that would be required to pull this off, it would be a huge pain and I'm sure there is a much more elegant solution to whatever problem you are actually trying to solve.
The PID is assigned by the OS when the shell creates a new process. There is no way to tell the OS to use some specific PID, so it's not possible.

Is it possible to completely manage the life cycle of a process and its forks?

Consider a system that manages user-defined programs:
A program can be anything. Its command line is defined by non-privileged users in some configuration file. It could be /bin/ls, it could be /usr/sbin/apache; the user may specify whatever he is permitted to start.
Each program is run as a non-root user.
Any given user can configure any number of programs.
Each program runs for as long as it wants.
Each program may call fork(), exec() etc.
Each program may set itself as a session leader (ie., setsid()).
The system that starts the programs might not run continuously. It starts a program, then quits.
The action "stop all of program P's processes, including children/forks" must be possible.
The action "find all processes belonging to program P" must be possible.
Here's the question: How can one provide such a system within the Linux process model?
The naive method:
Start program with fork(), exec(), setuid(), etc..
Write the child PID (plus its start timestamp, from /proc/<pid>/stat, to uniquely and permanently identify it) to a file.
To stop a single process, send SIGTERM to its PID.
To find all processes, inspect /proc and build the process hierarchy from the recorded PID.
This method has a big hole: any process may fork, let its intermediate parent exit, and thereby be reparented to init, and it may also leave its process group via setpgid() or setsid(). Looking at the process hierarchy is therefore not sufficient: after a program has created new processes this way, it's not possible to trace their origin back to the original program.
A workaround would be to ensure that each program is started with a unique UID. This is not desirable or particularly workable, since a (human) user may define any number of programs; the system would then have to programmatically create new, unique users for each program.
My only idea so far is to inject a special, reserved environment variable into the program's initial process, ie., run the program with env PROGRAM=myprogram <command line>. The system could then mandate that all processes must inherit their parent's environment. At regular intervals, the system could trawl /proc and forcibly kill any process missing the PROGRAM environment variable.
Are there any secrets in the Linux syscall API that I could use?
(1) The action "stop all of program P's processes, including children/forks" must be possible. (2) The action "find all processes belonging to program P" must be possible.
cgroups implement this, and systemd is perhaps the heaviest user to date of (2) as a way to achieve (1). You can break out of process groups, but not out of cgroups.
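A minimal sketch of the cgroup approach (mine, not part of the original answer), assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup, permission to create a child cgroup there, and a kernel with cgroup.kill (Linux 5.14+); the cgroup path and command are illustrative:

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define CG "/sys/fs/cgroup/program-P"        /* illustrative cgroup path */

    static void write_str(const char *path, const char *s)
    {
        int fd = open(path, O_WRONLY);
        if (fd < 0) { perror(path); return; }
        if (write(fd, s, strlen(s)) < 0)
            perror(path);
        close(fd);
    }

    int main(void)
    {
        mkdir(CG, 0755);                         /* create the cgroup */
        if (fork() == 0) {
            write_str(CG "/cgroup.procs", "0");  /* "0" = move myself in */
            execlp("apache2", "apache2", (char *)NULL);
            _exit(127);
        }
        sleep(5);

        /* "Find all processes belonging to P": every descendant is listed
           here, no matter how often it forked or called setsid(). */
        char buf[4096];
        int fd = open(CG "/cgroup.procs", O_RDONLY);
        ssize_t n = fd >= 0 ? read(fd, buf, sizeof buf - 1) : -1;
        if (n > 0) { buf[n] = '\0'; printf("members:\n%s", buf); }
        if (fd >= 0) close(fd);

        /* "Stop all of P's processes": kill the whole cgroup atomically. */
        write_str(CG "/cgroup.kill", "1");
        while (wait(NULL) > 0)
            ;
        return 0;
    }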

Faster forking of large processes on Linux?

What's the fastest, best way on modern Linux of achieving the same effect as a fork-execve combo from a large process?
My problem is that the forking process is ~500 MB, and a simple benchmark achieves only about 50 forks/s from it (cf. ~1600 forks/s from a minimally sized process), which is too slow for the intended application.
Some googling turns up vfork as having been invented as the solution to this problem... but also warnings not to use it. Modern Linux seems to have acquired the related clone and posix_spawn calls; are these likely to help? What's the modern replacement for vfork?
I'm using 64-bit Debian Lenny on an i7 (the project could move to Squeeze if posix_spawn would help).
On Linux, you can use posix_spawn(2) with the POSIX_SPAWN_USEVFORK flag to avoid the overhead of copying page tables when forking from a large process.
See Minimizing Memory Usage for Creating Application Subprocesses for a good summary of posix_spawn(2), its advantages and some examples.
To take advantage of vfork(2), make sure you #define _GNU_SOURCE before #include <spawn.h>, and then simply call posix_spawnattr_setflags(&attr, POSIX_SPAWN_USEVFORK).
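For completeness, a self-contained sketch (the command is illustrative; note that POSIX_SPAWN_USEVFORK is a glibc extension, and on modern glibc, 2.24 and later, posix_spawn uses a vfork-style clone unconditionally, so the flag is a no-op there):

    #define _GNU_SOURCE            /* for POSIX_SPAWN_USEVFORK */
    #include <spawn.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>

    extern char **environ;

    int main(void)
    {
        posix_spawnattr_t attr;
        posix_spawnattr_init(&attr);
        posix_spawnattr_setflags(&attr, POSIX_SPAWN_USEVFORK);

        char *argv[] = { "date", NULL };         /* illustrative command */
        pid_t pid;
        int rc = posix_spawnp(&pid, "date", NULL, &attr, argv, environ);
        if (rc != 0) {
            fprintf(stderr, "posix_spawnp: %s\n", strerror(rc));
            return 1;
        }
        waitpid(pid, NULL, 0);
        posix_spawnattr_destroy(&attr);
        return 0;
    }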
I can confirm that this works on Debian Lenny, and provides a massive speed-up when forking from a large process.
Benchmarking the various spawns over 1000 runs at 100 MB RSS:

                              user      system     total      real
    fspawn (fork/exec):       0.100000  15.460000  40.570000  (41.366389)
    pspawn (posix_spawn):     0.010000   0.010000   0.540000  ( 0.970577)
Outcome: I was going to go down the early-spawned helper subprocess route suggested by other answers here, but then I came across this, regarding using huge page support to improve fork() performance.
Having tried it myself, using libhugetlbfs to make all my app's mallocs allocate huge pages, I'm now getting around 2400 forks/s regardless of the process size (over the range I'm interested in, anyway). Amazing.
Did you actually measure how much time forks take? Quoting the page you linked,
Linux never had this problem; because Linux used copy-on-write semantics internally, Linux only copies pages when they changed (actually, there are still some tables that have to be copied; in most circumstances their overhead is not significant)
So the number of forks per second doesn't really show how big the overhead will be. You should measure the time consumed by forks, and (a piece of generic advice) only by the forks you actually perform, rather than benchmarking for maximum performance.
But if you really find that forking a large process is slow, you may spawn a small ancillary process, connect the master process to its input with a pipe, and send it commands to exec. The small process will fork and exec these commands.
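A sketch of that ancillary-process pattern (mine, with illustrative command handling; the key point is to fork the helper before the master grows large):

    /* Spawn a small helper early; later, send command lines down a pipe
       and let the tiny helper do the cheap fork/exec work. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/wait.h>

    static int spawn_helper(void)        /* returns write end of command pipe */
    {
        int fds[2];
        if (pipe(fds) < 0) { perror("pipe"); exit(1); }
        pid_t pid = fork();              /* do this while the parent is small */
        if (pid == 0) {                  /* helper: stays small forever */
            close(fds[1]);
            FILE *in = fdopen(fds[0], "r");
            char line[4096];
            while (fgets(line, sizeof line, in)) {
                line[strcspn(line, "\n")] = '\0';
                if (fork() == 0) {       /* cheap: the helper's image is tiny */
                    execl("/bin/sh", "sh", "-c", line, (char *)NULL);
                    _exit(127);
                }
                while (waitpid(-1, NULL, WNOHANG) > 0)   /* reap finished jobs */
                    ;
            }
            _exit(0);  /* a real helper would also report PIDs/status back */
        }
        close(fds[0]);
        return fds[1];
    }

    int main(void)
    {
        int cmd_fd = spawn_helper();
        /* ... the master can now grow to 500 MB without slowing its spawns ... */
        dprintf(cmd_fd, "echo hello from the helper\n");  /* illustrative */
        close(cmd_fd);
        wait(NULL);                      /* wait for the helper to exit */
        return 0;
    }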
posix_spawn()
This function, as far as I understand, is implemented via fork/exec on desktop systems. However, on embedded systems (particularly those without an MMU on board), processes are spawned via a syscall whose interface is posix_spawn or a similar function. Quoting the informative section of the POSIX standard describing posix_spawn:
Swapping is generally too slow for a realtime environment.
Dynamic address translation is not available everywhere that POSIX might be useful.
Processes are too useful to simply option out of POSIX whenever it must run without address translation or other MMU services.
Thus, POSIX needs process creation and file execution primitives that can be efficiently implemented without address translation or other MMU services.
I don't think that you will benefit from this function on desktop if your goal is to minimize time consumption.
If you know the number of subprocesses ahead of time, it might be reasonable to pre-fork your application on startup and then distribute the execv information via a pipe. Alternatively, if there is some sort of "lull" in your program, it might be reasonable to fork a subprocess or two ahead of time for quick turnaround later. Neither of these options would directly solve the problem, but if either approach suits your app, it might allow you to side-step the issue.
I've come across this blog post: http://blog.famzah.net/2009/11/20/a-much-faster-popen-and-system-implementation-for-linux/
Excerpt:
The system call clone() comes to the rescue. Using clone() we create a child process which has the following features:
The child runs in the same memory space as the parent. This means that no memory structures are copied when the child process is created. As a result of this, any change to any non-stack variable made by the child is visible by the parent process. This is similar to threads, and therefore completely different from fork(), and also very dangerous – we don’t want the child to mess up the parent.
The child starts from an entry function which is being called right after the child was created. This is like threads, and unlike fork().
The child has a separate stack space which is similar to threads and fork(), but entirely different to vfork().
The most important: This thread-like child process can call exec().
In a nutshell, by calling clone in the following way, we create a child process which is very similar to a thread but still can call exec():

    pid = clone(fn, stack_aligned, CLONE_VM | SIGCHLD, arg);
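Filling in the blanks around that call, a hedged sketch: fn becomes child_fn, stack_aligned the top of a malloc'ed block, and arg the argument vector; the command is illustrative.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    #define STACK_SIZE (64 * 1024)

    static int child_fn(void *arg)
    {
        char **argv = arg;
        execvp(argv[0], argv);   /* on success, the shared image is replaced */
        _exit(127);              /* _exit, not exit: we share the parent's memory */
    }

    int main(void)
    {
        char *argv[] = { "date", NULL };         /* illustrative command */
        char *stack = malloc(STACK_SIZE);        /* malloc is suitably aligned */
        if (!stack)
            return 1;
        /* The stack grows downward on mainstream architectures, so pass
           the *top* of the block as the child's initial stack pointer. */
        pid_t pid = clone(child_fn, stack + STACK_SIZE, CLONE_VM | SIGCHLD, argv);
        if (pid < 0) {
            perror("clone");
            return 1;
        }
        waitpid(pid, NULL, 0);
        free(stack);
        return 0;
    }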
However I think it may still be subject to the setuid problem:
http://ewontfix.com/7/ "setuid and vfork"
Now we get to the worst of it. Threads and vfork allow you to get in a situation where two processes are both sharing memory space and running at the same time. Now, what happens if another thread in the parent calls setuid (or any other privilege-affecting function)? You end up with two processes with different privilege levels running in a shared address space. And this is A Bad Thing.

Consider for example a multi-threaded server daemon, running initially as root, that’s using posix_spawn, implemented naively with vfork, to run an external command. It doesn’t care if this command runs as root or with low privileges, since it’s a fixed command line with fixed environment and can’t do anything harmful. (As a stupid example, let’s say it’s running date as an external command because the programmer couldn’t figure out how to use strftime.)

Since it doesn’t care, it calls setuid in another thread without any synchronization against running the external program, with the intent to drop down to a normal user and execute user-provided code (perhaps a script or dlopen-obtained module) as that user. Unfortunately, it just gave that user permission to mmap new code over top of the running posix_spawn code, or to change the strings posix_spawn is passing to exec in the child. Whoops.
