Processes are terminated for one of three reasons: they've reached end of execution (nominal case), they contain an uncaught exception (synchronous crash), or they've received some signal that they are not handling (asynchronous crash). Within the design of a program, we can instrument ways of at least detecting each of these (e.g., catch statements, signal handlers, etc.).
Suppose I wanted to design a program that can monitor another program's execution in Linux. I can easily tell if the program terminated by noticing its PID disappearing from /proc, but I won't know why. Is there a way to observe the target program to determine the cause of termination?
The main limitation is the amount of detail you need. At the operating system level, you can basically only get the exit code or the signal that killed the process.
Depending on your restrictions, there are multiple options:
wait - allows for quick notification (blocking call or signal), but only works for immediate children (see the sketch after this list).
ptrace, directly (rather tricky) or via the strace command; it has limitations, e.g. a single process can only be ptraced by one process at a time, but it allows you to specify a list of syscalls to monitor, so it does not need to be as slow as a default invocation of strace.
BSD Process Accounting. Typically requires root privileges to access, and definitely requires them to turn it on (it's global). Once it's running, you can effectively watch a file which grows by an entry for every finishing process, including exit code / signal, either programmatically (the ac_exitcode field in the acct struct) or via the lastcomm command (cf. http://man7.org/linux/man-pages/man1/lastcomm.1.html).
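For the wait route, here is a minimal C sketch, assuming the monitored process is a direct child whose pid we already have from an earlier fork(); the helper name report_child_exit is made up for illustration:

    /* Report why a direct child terminated; only works for our own children. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    void report_child_exit(pid_t child_pid)
    {
        int status;
        if (waitpid(child_pid, &status, 0) < 0) {   /* blocks until the child terminates */
            perror("waitpid");
            return;
        }
        if (WIFEXITED(status))
            printf("normal exit, code %d\n", WEXITSTATUS(status));
        else if (WIFSIGNALED(status))
            printf("killed by signal %d\n", WTERMSIG(status));
    }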
Processes are terminated for one of three reasons: (a) they've reached end of execution (nominal case), (b) they contain an uncaught exception (synchronous crash), or (c) they've received some signal that they are not handling (asynchronous crash).
I'm not sure what (a) should mean -- even if a program returns from its main() function, it's still terminating by explicitly calling the _exit(2) (or exit_group(2)) system call (from the C runtime code that called main() in the first place). If it doesn't call _exit(), it will crash.
Also, I don't see what the difference is between (b) and (c): they would receive a signal in both cases -- which they could either catch, block or ignore (except for SIGKILL -- or SIGSTOP, but the latter won't terminate the process).
Suppose I wanted to design a program that can monitor another program's execution in Linux.
Then, you should imitate what strace(1) or gdb(1) are doing: use ptrace(PTRACE_ATTACH), etc. For instance, either of the following will only monitor the exit of a process, not all of its system calls:
strace -e trace=none -p PID
strace -e trace=exit,exit_group -p PID
The PTRACE_O_TRACEEXIT option of ptrace(2) is interesting:
The tracee is stopped early during process exit, when registers are still available, allowing the tracer to see where the exit occurred, whereas the normal exit notification is done after the process is finished exiting.
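As a rough illustration of the ptrace route (a sketch, not production code: it assumes you are allowed to trace the target, e.g. same uid and a permissive /proc/sys/kernel/yama/ptrace_scope, and that your <sys/ptrace.h> exposes the PTRACE_O_* and PTRACE_EVENT_* constants; otherwise pull them from <linux/ptrace.h>):

    /* trace_exit.c: attach to PID and report how it terminates. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <signal.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) { fprintf(stderr, "usage: %s PID\n", argv[0]); return 1; }
        pid_t pid = (pid_t)atoi(argv[1]);
        int status;

        if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) { perror("PTRACE_ATTACH"); return 1; }
        waitpid(pid, &status, 0);                        /* wait for the attach stop */
        ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACEEXIT);
        ptrace(PTRACE_CONT, pid, NULL, NULL);            /* let the tracee run again */

        for (;;) {
            waitpid(pid, &status, 0);
            if (WIFEXITED(status)) {
                printf("exited normally, status %d\n", WEXITSTATUS(status));
                break;
            }
            if (WIFSIGNALED(status)) {
                printf("killed by signal %d\n", WTERMSIG(status));
                break;
            }
            if (WIFSTOPPED(status)) {
                int sig = WSTOPSIG(status);
                if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXIT << 8))) {
                    printf("tracee is about to exit (registers still available)\n");
                    sig = 0;                             /* nothing to deliver */
                } else if (sig == SIGTRAP) {
                    sig = 0;                             /* ptrace-generated trap */
                }
                ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig);  /* forward real signals */
            }
        }
        return 0;
    }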
Linux also has the proc connector netlink interface, which allows you to monitor processes without stopping or affecting them in any way. It only works as root, though. A sample program using the proc connector interface is forkstat(1):
forkstat -e exit # will show all exiting processes
stdbuf -oL forkstat -e exit | grep -m1 PID # will only show when PID exits
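If you prefer to talk to the proc connector yourself instead of going through forkstat, a rough sketch (must run as root, error handling mostly omitted) could look like this:

    /* proc_exit_watch.c: subscribe to proc connector events and print exits. */
    #include <sys/socket.h>
    #include <linux/netlink.h>
    #include <linux/connector.h>
    #include <linux/cn_proc.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int sock = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
        struct sockaddr_nl addr = { .nl_family = AF_NETLINK,
                                    .nl_groups = CN_IDX_PROC,
                                    .nl_pid    = getpid() };
        if (sock < 0 || bind(sock, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("netlink");
            return 1;
        }

        /* Ask the connector to start multicasting process events to us. */
        char req[NLMSG_SPACE(sizeof(struct cn_msg) + sizeof(enum proc_cn_mcast_op))]
            __attribute__((aligned(NLMSG_ALIGNTO)));
        memset(req, 0, sizeof req);
        struct nlmsghdr *nlh = (struct nlmsghdr *)req;
        struct cn_msg *cn = NLMSG_DATA(nlh);
        nlh->nlmsg_len  = NLMSG_LENGTH(sizeof(struct cn_msg) + sizeof(enum proc_cn_mcast_op));
        nlh->nlmsg_type = NLMSG_DONE;
        nlh->nlmsg_pid  = getpid();
        cn->id.idx = CN_IDX_PROC;
        cn->id.val = CN_VAL_PROC;
        cn->len    = sizeof(enum proc_cn_mcast_op);
        *(enum proc_cn_mcast_op *)cn->data = PROC_CN_MCAST_LISTEN;
        send(sock, nlh, nlh->nlmsg_len, 0);

        for (;;) {
            char buf[4096] __attribute__((aligned(NLMSG_ALIGNTO)));
            ssize_t n = recv(sock, buf, sizeof buf, 0);
            if (n <= 0)
                break;
            struct cn_msg *msg = NLMSG_DATA((struct nlmsghdr *)buf);
            struct proc_event *ev = (struct proc_event *)msg->data;
            if (ev->what == PROC_EVENT_EXIT)
                printf("pid %d exited: exit_code=%u exit_signal=%u\n",
                       ev->event_data.exit.process_pid,
                       ev->event_data.exit.exit_code,
                       ev->event_data.exit.exit_signal);
        }
        return 0;
    }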
Related
Process A opened and mmap()ed thousands of files while running. Then kill -9 <pid of process A> is issued. I have a question about the ordering of the two events below.
a) /proc/<pid of process A> cannot be accessed.
b) all files opened by process A are closed.
More background about the question:
Process A is a multi-threaded background service. It is started with the command ./process_A args1 arg2 arg3.
There is also a watchdog process which periodically (every second) checks whether process A is still alive; if process A is dead, it restarts it. The watchdog checks process A as follows:
1) collect all numerical subdirectories under /proc/
2) compare each /proc/<pid>/cmdline with the cmdline of process A. If some /proc/<pid>/cmdline matches, process A is considered alive and nothing is done; otherwise, process A is restarted.
Process A does the following during initialization:
1) open fileA
2) flock fileA
3) mmap fileA into memory
4) close fileA
Process A mmaps thousands of files after initialization.
After several minutes, kill -9 <pid of process A> is issued.
The watchdog detects the death of process A and restarts it. But sometimes the new process A gets stuck at step 2 (flock fileA). After some debugging, we found that fileA is unlocked when process A is killed, but sometimes this happens only after the new process has already executed step 2 (flock fileA).
So we guess that checking whether process A is alive by monitoring /proc/<pid of process A> is not correct.
then kill -9 is issued
This is a bad habit. You'd better send a SIGTERM first, because well-behaved processes and well-designed programs can catch it (and exit nicely and properly when getting a SIGTERM). In some cases, I even recommend: send SIGTERM, wait two or three seconds, send SIGQUIT, wait two more seconds, and only then send a SIGKILL signal (for those bad programs which have not been written properly or are misbehaving). Read signal(7) and signal-safety(7). In multi-threaded, but Linux-specific, programs, you might use signalfd(2) or the pipe(7)-to-self trick (well explained in the Qt documentation, but not Qt-specific).
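A minimal C sketch of that escalation, assuming fixed two-second pauses; the helper name graceful_kill is made up, not a library function:

    #include <errno.h>
    #include <signal.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Returns 0 once the process is gone, -1 if it still refuses to die. */
    int graceful_kill(pid_t pid)
    {
        const int sigs[] = { SIGTERM, SIGQUIT, SIGKILL };
        for (int i = 0; i < 3; i++) {
            if (kill(pid, sigs[i]) == -1 && errno == ESRCH)
                return 0;                 /* already gone */
            sleep(2);                     /* give it time to clean up */
            if (kill(pid, 0) == -1 && errno == ESRCH)
                return 0;                 /* it exited */
        }
        return -1;
    }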
If your Linux system is systemd-based, you could imagine your program A being started with systemd facilities. Then you would use systemd facilities to "communicate" with it. In some ways (I don't know the details), systemd is making signals almost obsolete. Notice that signals are not multi-thread friendly and were designed, in the previous century, for single-threaded processes.
we guess the way to check whether the process is alive by monitoring /proc/ is not correct.
The usual (and faster, and "atomic" enough) way to detect the existence of a process (on which you have enough privileges, e.g. which runs with your uid/gid) is to use kill(2) with a signal number (the second argument to kill) of 0. To quote that manpage:
If sig is 0, then no signal is sent, but existence and permission
checks are still performed; this can be used to check for the
existence of a process ID or process group ID that the caller is
permitted to signal.
Of course, that other process can still terminate before any further interaction with it, because Linux has preemptive scheduling.
Your watchdog process should rather use kill(pid-of-process-A, 0) to check the existence and liveness of that process-A. Using /proc/pid-of-process-A/ is not the correct way to do that.
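A tiny sketch of such a liveness check; the helper name process_alive is hypothetical:

    #include <errno.h>
    #include <signal.h>
    #include <sys/types.h>

    /* 1 if pid currently exists, 0 otherwise. */
    int process_alive(pid_t pid)
    {
        if (kill(pid, 0) == 0)
            return 1;
        if (errno == EPERM)     /* it exists, we just lack permission to signal it */
            return 1;
        return 0;               /* ESRCH: no such process */
    }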
And whatever you code, that process-A could disappear asynchronously (in particular, if it has some bug that gives a segmentation fault). When a process terminates (even with a segmentation fault) the kernel is acting on its file locks (and "releases" them).
Don't scan /proc/PID to find out if a specific process has terminated. There are lots of better ways to do that, such as having your watchdog program actually launch the server program and wait for it to terminate.
Or, have the watchdog listen on a TCP socket, and have the server process connect to that and send its PID. If either end dies, the other can notice that the connection was closed (hint: send a heartbeat packet every so often, to detect a frozen peer). If the watchdog receives a connection from another server while the first is still running, it can decide to allow it or tell one of the instances to shut down (via TCP or kill()).
I am doing strace using time strace -p 54545 -fy 2>&1 | grep "xyz". I am looking for all the system calls happening to file xyz.
I killed strace by repeatedly pressing Ctrl-C. I saw a write system call fail with errno 4 (EINTR) in process 54545.
I am not sending any signal to process 54545. Can strace cause this?
In order to detach from your process, the kernel apparently waits for a point at which the process's in-kernel state is in a valid place to execute the detach.
write is one of the system calls that the kernel can choose to interrupt if it likes. That is, Unix processes are supposed to handle an EINTR from the write system call by retrying it. So, in order to get to a point in kernel execution where it can process the detach request, the kernel chooses to do just that.
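The usual user-space handling looks something like this sketch; the wrapper name write_all is made up for illustration:

    #include <errno.h>
    #include <unistd.h>

    /* Write the whole buffer, retrying when the call is interrupted. */
    ssize_t write_all(int fd, const char *buf, size_t len)
    {
        size_t done = 0;
        while (done < len) {
            ssize_t n = write(fd, buf + done, len - done);
            if (n < 0) {
                if (errno == EINTR)
                    continue;       /* interrupted by a signal: just retry */
                return -1;          /* real error */
            }
            done += (size_t)n;
        }
        return (ssize_t)done;
    }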
Sometimes whenever I write a program in Linux and it crashes due to a bug of some sort, it will become an uninterruptible process and continue running forever until I restart my computer (even if I log out). My questions are:
What causes a process to become uninterruptible?
How do I stop that from happening?
This is probably a dumb question, but is there any way to interrupt it without restarting my computer?
An uninterruptible process is a process which happens to be in a system call (kernel function) that cannot be interrupted by a signal.
To understand what that means, you need to understand the concept of an interruptible system call. The classic example is read(). This is a system call that can take a long time (seconds) since it can potentially involve spinning up a hard drive, or moving heads. During most of this time, the process will be sleeping, blocking on the hardware.
While the process is sleeping in the system call, it can receive a Unix asynchronous signal (say, SIGTERM); then the following happens:
The system call exits prematurely, and is set up to return -EINTR to user space.
The signal handler is executed.
If the process is still running, it gets the return value from the system call, and it can make the same call again.
Returning early from the system call enables the user space code to immediately alter its behavior in response to the signal. For example, terminating cleanly in reaction to SIGINT or SIGTERM.
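As a small sketch of that pattern: a SIGTERM handler is installed with sigaction() without SA_RESTART, so a blocking read() returns -1 with EINTR and the program can shut down cleanly (assuming a toy program that just reads stdin):

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t got_sigterm = 0;

    static void on_sigterm(int sig) { (void)sig; got_sigterm = 1; }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sigterm;    /* SA_RESTART not set: read() will see EINTR */
        sigaction(SIGTERM, &sa, NULL);

        char buf[4096];
        for (;;) {
            ssize_t n = read(STDIN_FILENO, buf, sizeof buf);
            if (n < 0 && errno == EINTR && got_sigterm) {
                fprintf(stderr, "terminating cleanly\n");
                return 0;              /* clean shutdown in response to SIGTERM */
            }
            if (n <= 0)
                return n == 0 ? 0 : 1; /* EOF or a real error */
            /* ... process the n bytes read ... */
        }
    }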
On the other hand, some system calls are not allowed to be interrupted in this way. If the system call stalls for some reason, the process can remain indefinitely in this unkillable state.
LWN ran a nice article that touched on this topic in July.
To answer the original question:
How to prevent this from happening: figure out which driver is causing you trouble, and either stop using it, or become a kernel hacker and fix it.
How to kill an uninterruptible process without rebooting: somehow make the system call terminate. Frequently the most effective way to do this without hitting the power switch is to pull the power cord. You can also become a kernel hacker and make the driver use TASK_KILLABLE, as explained in the LWN article.
When a process is on user mode, it can be interrupted at any time (switching to kernel mode). When the kernel returns to user mode, it checks if there are any signals pending (including the ones which are used to kill the process, such as SIGTERM and SIGKILL). This means a process can be killed only on return to user mode.
The reason a process cannot be killed in kernel mode is that it could potentially corrupt the kernel structures used by all the other processes in the same machine (the same way killing a thread can potentially corrupt data structures used by other threads in the same process).
When the kernel needs to do something which could take a long time (waiting on a pipe written by another process or waiting for the hardware to do something, for instance), it sleeps by marking itself as sleeping and calling the scheduler to switch to another process (if there is no non-sleeping process, it switches to a "dummy" process which tells the cpu to slow down a bit and sits in a loop — the idle loop).
If a signal is sent to a sleeping process, it has to be woken up before it will return to user space and thus process the pending signal. Here we have the difference between the two main types of sleep:
TASK_INTERRUPTIBLE, the interruptible sleep. If a task is marked with this flag, it is sleeping, but can be woken by signals. This means the code which marked the task as sleeping is expecting a possible signal, and after it wakes up will check for it and return from the system call. After the signal is handled, the system call can potentially be automatically restarted (and I won't go into details on how that works).
TASK_UNINTERRUPTIBLE, the uninterruptible sleep. If a task is marked with this flag, it is not expecting to be woken up by anything other than whatever it is waiting for, either because it cannot easily be restarted, or because programs are expecting the system call to be atomic. This can also be used for sleeps known to be very short.
TASK_KILLABLE (mentioned in the LWN article linked to by ddaa's answer) is a new variant.
This answers your first question. As to your second question: you can't avoid uninterruptible sleeps; they are a normal thing (they happen, for instance, every time a process reads from or writes to the disk); however, they should last only a fraction of a second. If they last much longer, it usually means a hardware problem (or a device driver problem, which looks the same to the kernel), where the device driver is waiting for the hardware to do something which will never happen. It can also mean you are using NFS and the NFS server is down (it is waiting for the server to recover; you can also use the "intr" option to avoid the problem).
Finally, the reason you cannot recover is the same reason the kernel waits until return to user mode to deliver a signal or kill the process: it would potentially corrupt the kernel's data structures (code waiting on an interruptible sleep can receive an error which tells it to return to user space, where the process can be killed; code waiting on an uninterruptible sleep is not expecting any error).
Uninterruptible processes are USUALLY waiting for I/O following a page fault.
Consider this:
The thread tries to access a page which is not in core (either an executable which is demand-loaded, a page of anonymous memory which has been swapped out, or a mmap()'d file which is demand loaded, which are much the same thing)
The kernel is now (trying to) load it in
The process can't continue until the page is available.
The process/task cannot be interrupted in this state, because it can't handle any signals; if it did, another page fault would happen and it would be back where it was.
When I say "process", I really mean "task", which under Linux (2.6) roughly translates to "thread" which may or may not have an individual "thread group" entry in /proc
In some cases, it may be waiting for a long time. A typical example of this would be where the executable or mmap'd file is on a network filesystem where the server has failed. If the I/O eventually succeeds, the task will continue. If it eventually fails, the task will generally get a SIGBUS or something.
To your 3rd question:
I think you can kill uninterruptible processes by running
sudo kill -HUP 1
It will restart init without ending the running processes, and after running it, my uninterruptible processes were gone.
If you are talking about a "zombie" process (which is designated as "zombie" in ps output), then this is a harmless record in the process list waiting for someone to collect its return code and it could be safely ignored.
Could you please describe what an "uninterruptible process" is for you? Does it survive "kill -9" and happily chug along? If that is the case, then it's stuck on some syscall, which is stuck in some driver, and you are stuck with this process till reboot (and sometimes it's better to reboot soon) or until the relevant driver is unloaded (which is unlikely to happen). You could try to use "strace" to find out where your process is stuck and avoid it in the future.
I need to watch for a process with a known PID in Linux. Once it is terminated, I want to execute a command with the reason for termination.
Questions
How to subscribe to process health rather than polling (e.g. the watch command)?
Where to inject the event handler in OS' user space?
How to detect the termination/failure reason inside handler?
Note
The process I intend to keep tabs on is not forked as a child process of some parent through which it can be monitored.
The process type is generic (a good number of them are daemons).
The only way to get that kind of control over another process, is to use ptrace(2) to trace the target process. You would use ptrace(PTRACE_ATTACH, pid) to attach to the process, after which you effectively become the target process's parent (and can use wait or more ptrace calls to figure out what the process is doing).
I am writing an SDK in Go that integrators will communicate with via a local socket connection.
From the integrating application I need a way to start the SDK as a process but more importantly, I need to be able to cancel that process when the main application is closing too.
This question is language-agnostic (I think), as the challenge is Linux-related: how to start a program and cancel it at a later stage.
Some possible approaches:
I am thinking that it's a case of starting the program via exec, getting its PID or some ID, then using that to kill it later. Sudo may be required to do this, which is not ideal. Also, it's not good practice, as you will be effectively force-closing the SDK, offering no time for cleanup.
Start the program via any means. Once ready to close, just send a "shutdown" command via the SDK API which will allow the SDK to cleanup, manage state then exit the application.
What is best practice for this please?
Assuming you're using Linux or similar Unix:
You are on the right track. You won't need sudo. The comments thus far are pointing in the right direction, but not spelling out the details.
See section 2 of the manual pages (man 2 ...) for details on the functions mentioned here. They are documented for calling from C. I don't have experience with Go to help determine how to use them there.
The integrator application will be called the "parent" process. The SDK-as-a-process will be called the "child" process. A process creates a child and becomes its parent by calling fork(). The new process is a duplicate of the parent, running the same code, and having for the most part all the same state (data in memory). But fork() returns different values to parent and child, so each can determine its role in the relationship. This includes informing the parent of the process identifier (pid) of the child. Hang on to this value. Then, the child uses exec() to execute a different program within the existing process, i.e. your SDK binary. An alternative to fork-then-exec is posix_spawn(), which has rather involved parameters (but gives greater control if you need it).
Designing the child to shutdown in response to a signal, rather than a command through the API, will allow processes other than the parent to initiate clean shutdown in standard fashion. For example, this might be useful for the administrator or user; it enables sending the shutdown signal from the shell command-line or script.
The child installs a signal handler function, that will be called when the child process receives a signal, by calling signal() (or the more complex sigaction() recommended for its portability). There are different signals that can be sent/received, identified by different integer values (and also given names like SIGTERM). You indicate which you're interested in receiving when calling signal(). When your signal handler function is invoked, you've received the signal, and can initiate clean shutdown.
When the parent wants the child to shut down cleanly, the parent sends a signal to the child using the unfortunately named kill(). Unfortunately named because signals can be used for other purposes. Anyway, you pass to kill() the pid (returned by fork()) and the specific signal (e.g. SIGTERM) you want to send.
The parent can also determine when the child has completely shut down by calling waitpid(), again passing the pid returned by fork(); or alternately by registering to receive signal SIGCHLD. Register to receive SIGCHLD before fork()/exec() or you might miss the signal.
Actually, it's important that you do call waitpid(), optionally after receiving SIGCHLD, in order to deallocate a resource holding the child process's exit status, so the OS can cleanup that last remnant of the process. Failing to do so keeps the child as a "zombie" process, unable to be fully reclaimed. Too many zombies and the OS will be unable to launch new processes.
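Putting those pieces together, a rough C-level sketch of the parent side, assuming a hypothetical child binary ./sdk-helper and with error handling mostly omitted:

    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t child = fork();
        if (child == 0) {
            /* child: replace ourselves with the SDK process */
            execlp("./sdk-helper", "./sdk-helper", (char *)NULL);
            _exit(127);                     /* exec failed */
        }

        /* ... parent does its work, talking to the child over the local socket ... */
        sleep(5);                           /* stand-in for the application's lifetime */

        kill(child, SIGTERM);               /* ask the child to shut down cleanly */

        int status;
        waitpid(child, &status, 0);         /* reap the child so no zombie is left */
        if (WIFEXITED(status))
            printf("child exited with status %d\n", WEXITSTATUS(status));
        else if (WIFSIGNALED(status))
            printf("child was killed by signal %d\n", WTERMSIG(status));
        return 0;
    }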
If a process refuses to shut down cleanly or as quickly as you require, you may force it to quit (without executing its cleanup code) by sending the signal SIGKILL.
There are variants of exec(), waitpid() and posix_spawn(), with different names and behaviors, mentioned in their man pages.