kill -9 and production application - linux

Which problems can kill -9 cause in a production application (in Linux, to be exact)?
I have an application which does some periodic work; stopping it takes a long time, and I don't care if some jobs are aborted - the work can be finished by new processes. So can I use kill -9 just to stop it immediately, or can this cause serious OS problems?
For example, Unicorn uses it as a normal working procedure:
When your application goes awry, a BOFH can just "kill -9" the runaway worker process without worrying about tearing all clients down, just one.
But this article claims:
The -9 (or KILL) argument to kill(1) should never be used on Unix systems
PS: I understand that kill -9 cannot be handled by the application, and I know that for my application it doesn't cause any problems; I'm just interested in whether it can cause problems at the OS level. Things like active shared memory segments and lingering sockets sound dangerous to me.

kill -9 doesn't give an application a chance to shut down cleanly.
Normally an application can catch a SIGINT/SIGTERM and shut down cleanly (close files, save data etc.). An application can't catch a SIGKILL (which occurs with a kill -9) and so it can't do any of this (optional) cleanup.
A better approach is to use a standard kill, and if the application remains unresponsive, then use kill -9.
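As a minimal sketch (generic C, not from any particular application): catch SIGTERM/SIGINT, set a flag, and do the cleanup in the main loop. kill -9 (SIGKILL) bypasses the handler entirely, so the cleanup never runs.

    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static volatile sig_atomic_t stop_requested = 0;

    static void request_stop(int sig)
    {
        (void)sig;
        stop_requested = 1;   /* only set a flag; do real work outside the handler */
    }

    int main(void)
    {
        struct sigaction sa = { .sa_handler = request_stop };
        sigaction(SIGTERM, &sa, NULL);
        sigaction(SIGINT, &sa, NULL);

        while (!stop_requested) {
            /* ... periodic work ... */
            sleep(1);
        }

        /* This cleanup runs for SIGTERM/SIGINT, never for SIGKILL. */
        fprintf(stderr, "shutting down cleanly\n");
        return 0;
    }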

kill -9 won't cause any "serious OS problems". But the process will stop immediately, which means it might leave data in an odd state.

It depends what kind of application it is.
Something like a database may either lose data (if it does not write all its data to a persistent transaction log at once), or take longer to start up next time, or both.
Although crash-only is a good principle, few applications currently conform to it.
For example, the mysql database is not "crash only" and killing it with a kill -9 will result in either significantly longer startup time (than a clean shutdown), data loss, or both, depending on the settings (and to some extent, luck).
On the other hand, Cassandra actually encourages the use of kill -9 as a shutdown mechanism; it supports nothing else.

The KILL signal cannot be caught by the application. If the application is in the middle of writing some complex data structure to disk when you kill it, the structure may be only half-written, resulting in a corrupted data file. It is usually best to implement some other signal, such as SIGUSR1, as the "stop" signal, as this can be caught and allows the application to shut down in a controlled manner.
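A common crash-safety pattern for that half-written-file risk (a general technique, not something the answer above prescribes; file names here are placeholders) is to write the new data to a temporary file, fsync() it, and only then rename() it over the old one, so an abrupt SIGKILL leaves either the old file or the new one, never a mixture:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Atomically replace "state.dat" with len bytes of data. */
    int save_state(const char *data, size_t len)
    {
        const char *tmp = "state.dat.tmp";
        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;

        if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        close(fd);

        /* rename() is atomic on POSIX filesystems: readers see either the
           old contents or the new contents, even if we are killed here. */
        return rename(tmp, "state.dat");
    }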

Related

when will /proc/<pid> be removed?

Process A opens and mmaps thousands of files while running. Then kill -9 <pid of process A> is issued. I have a question about the ordering of the two events below.
a) /proc/<pid of process A> cannot be accessed.
b) all files opened by process A are closed.
More background about the question:
Process A is a multi-threaded background service. It is started with the command ./process_A args1 arg2 arg3.
There is also a watchdog process which periodically (every second) checks whether process A is still alive; if process A is dead, it restarts it. The watchdog checks process A as follows:
1) Collect all numerical subdirectories under /proc/.
2) Compare each /proc/<pid>/cmdline with the cmdline of process A. If some /proc/<pid>/cmdline matches, process A is alive and nothing is done; otherwise, restart process A.
Process A does the following during initialization:
1) open fileA
2) flock fileA
3) mmap fileA into memory
4) close fileA
Process A mmaps thousands of files after initialization.
After several minutes, kill -9 <pid of process A> is issued.
The watchdog detects the death of process A and restarts it. But sometimes the new process A gets stuck at step 2, flock fileA. After some debugging, we found that the lock on fileA is released when process A is killed, but sometimes this release happens only after the new process has already reached step 2, flock fileA.
So we suspect that checking whether process A is alive by monitoring /proc/<pid of process A> is not correct.
then kill -9 is issued
This is a bad habit. You had better send SIGTERM first, because well-behaved processes and well-designed programs can catch it (and exit nicely and properly when they get a SIGTERM). In some cases I even recommend: send SIGTERM; wait two or three seconds; send SIGQUIT; wait two seconds; at last, send SIGKILL (for those programs that have not been written properly or are misbehaving). Read signal(7) and signal-safety(7). In multi-threaded, but Linux-specific, programs, you might use signalfd(2) or the self-pipe trick with pipe(7) (well explained in the Qt documentation, but not Qt-specific).
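A rough sketch of that escalation in C, assuming you already know the target PID (the delays are arbitrary; once the process has exited, the later kill() calls simply fail with ESRCH):

    #include <signal.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Politely, then firmly, stop a process we are allowed to signal. */
    void stop_process(pid_t pid)
    {
        kill(pid, SIGTERM);   /* well-behaved programs catch this and exit cleanly */
        sleep(3);
        kill(pid, SIGQUIT);   /* harsher; typically also produces a core dump */
        sleep(2);
        kill(pid, SIGKILL);   /* cannot be caught; last resort */
    }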
If your Linux system is systemd-based, you could imagine that your program A is started through systemd facilities; then you would use systemd facilities to "communicate" with it. In some ways (I don't know the details), systemd makes signals almost obsolete. Notice that signals are not multi-thread friendly; they were designed, in the previous century, for single-threaded processes.
we suspect that checking whether process A is alive by monitoring /proc/ is not correct.
The usual (and faster, and "atomic" enough) way to detect the existence of a process (on which you have enough privileges, e.g. which runs with your uid/gid) is to use kill(2) with a signal number (the second argument to kill) of 0. To quote that manpage:
If sig is 0, then no signal is sent, but existence and permission
checks are still performed; this can be used to check for the
existence of a process ID or process group ID that the caller is
permitted to signal.
Of course, that other process can still terminate before any further interaction with it, because Linux has preemptive scheduling.
Your watchdog process would do better to use kill(pid-of-process-A, 0) to check the existence and liveness of process A. Using /proc/pid-of-process-A/ is not the correct way to do that.
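A minimal sketch of such a check (the helper name is mine): kill(pid, 0) sends no signal, failing with ESRCH if the process is gone and EPERM if it exists but belongs to a user we may not signal.

    #include <errno.h>
    #include <signal.h>
    #include <stdbool.h>
    #include <sys/types.h>

    /* Is there currently a process with this PID? */
    bool process_exists(pid_t pid)
    {
        if (kill(pid, 0) == 0)
            return true;            /* exists and we may signal it */
        return errno == EPERM;      /* EPERM: exists, owned by someone else;
                                       ESRCH (or other): no such process   */
    }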
And whatever you code, process A could disappear asynchronously (in particular, if it has some bug that causes a segmentation fault). When a process terminates (even from a segmentation fault), the kernel acts on its file locks and "releases" them.
Don't scan /proc/PID to find out if a specific process has terminated. There are lots of better ways to do that, such as having your watchdog program actually launch the server program and wait for it to terminate.
Or, have the watchdog listen on a TCP socket, and have the server process connect to that and send its PID. If either end dies, the other can notice that the connection was closed (hint: send a heartbeat packet every so often to detect a frozen peer). If the watchdog receives a connection from another server while the first is still running, it can decide to allow it or tell one of the instances to shut down (via TCP or kill()).
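A sketch of the "watchdog launches the server and waits for it" approach (the program path and arguments are placeholders):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        for (;;) {
            pid_t pid = fork();
            if (pid == 0) {
                /* child: become process A (placeholder path and arguments) */
                execl("./process_A", "process_A", "arg1", "arg2", (char *)NULL);
                _exit(127);                    /* exec failed */
            }
            if (pid < 0) {
                perror("fork");
                return EXIT_FAILURE;
            }

            int status;
            waitpid(pid, &status, 0);          /* blocks until process A dies */
            fprintf(stderr, "process A exited (status %d); restarting\n", status);
            sleep(1);                          /* avoid a tight restart loop */
        }
    }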

Why can a Linux process with status 'D' be killed? [duplicate]

This question duplicates "What is an uninterruptible process?" below; see that entry for the answers.

How to identify if a long-running process died?

I'm working on a daemon that communicates with several processes. The daemon can't monitor the processes all the time, but it must be able to properly identify when a process dies, to release the scarce resources it holds for it.
The processes can communicate with the daemon, giving it some information at the start, but not vice versa. So the daemon can't just ask a process its identity.
The simplest form would be to use just their PID. But eventually another process could be assigned the same PID without my tool noticing.
A better approach would be to use the PID plus the time the process started; a new process with the same PID would have a distinct start time. But I couldn't find a way to get the process start time in a POSIX way. Using ps or looking at /proc/<pid>/stat does not seem portable enough.
A more complicated idea that seems POSIX-compliant would be:
Each process creates a temporary file,
locks it using flock, and
tells my daemon "my identity is connected with this file".
At any time, the daemon can check the temporary file: if it's locked, the process is alive; if it's not, the process is dead.
But this seems unnecessarily complicated.
Is there a better, or standard way?
Edit: The daemon must be able to resume after a restart, so it's not possible to keep a persistent connection for each process.
But I couldn't find a way to get the process start time in a POSIX way.
Try the standard "etime" format specifier: LC_ALL=C ps -o etime= -p "$PIDS"
In fairness, I would probably construct my own table of live processes rather than relying on the process table and elapsed time. That's fundamentally your file-locking approach, though I'd probably aggregate all the lockfiles together in a known place and name them by PID, e.g., /var/run/my-app/8819.lock. Indeed, this might even be retrofitted onto the long-running processes, since file locks on file descriptors can be inherited across exec().
(Of course, if the long-running processes I cared about had a common parent, then I'd rather query the common parent, who can be a reliable authority on which processes are running and which are not.)
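A sketch of that lockfile scheme with both sides in one place (the /var/run/my-app layout comes from the answer above; everything else is assumption). The monitored process takes an exclusive flock() on its lockfile and simply keeps the descriptor open; the daemon probes liveness with a non-blocking lock attempt, since the kernel drops the lock the moment the owner dies, however it dies.

    #include <errno.h>
    #include <fcntl.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/file.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Monitored process: create the lockfile and hold the lock until we die. */
    int hold_liveness_lock(void)
    {
        char path[128];
        snprintf(path, sizeof path, "/var/run/my-app/%d.lock", (int)getpid());
        int fd = open(path, O_RDWR | O_CREAT, 0644);
        if (fd < 0 || flock(fd, LOCK_EX) != 0)
            return -1;
        return fd;   /* keep it open; the kernel releases the lock on exit */
    }

    /* Daemon: if we can grab the lock, the owner is no longer alive. */
    bool process_is_alive(pid_t pid)
    {
        char path[128];
        snprintf(path, sizeof path, "/var/run/my-app/%d.lock", (int)pid);
        int fd = open(path, O_RDWR);
        if (fd < 0)
            return false;                      /* no lockfile: treat as dead */
        bool alive = (flock(fd, LOCK_EX | LOCK_NB) != 0 && errno == EWOULDBLOCK);
        close(fd);                             /* also drops the lock if we got it */
        return alive;
    }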
The standard way is the unnecessarily complicated one. That's life in a POSIX-compliant environment...
Other methods than the file exist and have various benefits and trade-offs: most of the "standard" IPC mechanisms would work for this as well - a socket, a pipe, a message queue, shared memory... Basically, pick one mechanism that allows your application to announce to the daemon that it has started (and maybe that it's exiting, for an orderly shutdown). In between, it could send periodic "I'm still here" messages that the daemon notices when they stop arriving, or the daemon could poll periodically. There are quite a few ways to accomplish what you want, but without knowing more about the exact architecture you're trying to achieve, it's difficult to point at the one best way.
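For instance, the "periodic I'm-still-here message" variant could look roughly like this on the announcing side (socket path and message format are made up; error handling omitted). The daemon would bind a datagram socket at the same path and flag any PID that stays silent too long.

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        strncpy(addr.sun_path, "/tmp/watchdog.sock", sizeof addr.sun_path - 1);

        char msg[64];
        for (;;) {
            int len = snprintf(msg, sizeof msg, "alive %d", (int)getpid());
            sendto(fd, msg, len, 0, (struct sockaddr *)&addr, sizeof addr);
            sleep(1);   /* the daemon declares us dead after a few missed beats */
        }
    }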

Health check for application

I wish to know what methods exist to check the health of a process. Consider a system on which 10,000 processes are running: you have to make sure that, if any of these processes goes down, it is brought back up.
Use the process ID (PID) and periodically poll whether the process is still alive or dead; if it's dead, revive it.
However, if you have 10,000 processes, you will probably hit the OS's process limit first. I suggest redesigning your program so you don't need that many processes in the first place.
Re-spawning processes that go down is usually handled by having a dedicated launcher program exec() the program and wait for SIGCHLD to indicate that the child process has ended.
For boot-time applications (servers, etc.), init daemons like upstart can do this for you automatically.
While others are pointing out that such applications already exist (and you really should use one unless you have a clear reason not to), I'll throw out a random idea for a custom solution.
If you control all N processes, give them one shared memory area N bits large (so 10,000 processes ~ 1 KB, not bad). When starting each process, give it a number i ranging from 0 to N-1. Every T seconds, each process sets bit i in the shared memory to 1. A monitoring process can check every k*T seconds that all N bits are 1, resetting them all to 0 in the process.
This is still O(n), which you won't avoid, but the primitives are all really fast and should scale fine up to the OS thread limit.
An alternative for obtaining i would be to just use the PID, but then the shared memory has to be larger (it will probably still be OK, though; the default Linux PID range is small, for example).
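A rough sketch of that shared-memory heartbeat table, using one byte per slot instead of one bit to keep the code short (names, sizes, and the shm path are mine; on older glibc, link with -lrt for shm_open):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define NPROC 10000

    /* Create or attach the shared heartbeat table (one byte per process slot). */
    unsigned char *heartbeat_table(void)
    {
        int fd = shm_open("/heartbeats", O_RDWR | O_CREAT, 0644);
        ftruncate(fd, NPROC);
        return mmap(NULL, NPROC, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }

    /* Worker i calls this every T seconds. */
    void i_am_alive(unsigned char *table, int i)
    {
        table[i] = 1;
    }

    /* Monitor calls this every k*T seconds: report silent slots, then reset. */
    void check_and_reset(unsigned char *table)
    {
        for (int i = 0; i < NPROC; i++)
            if (table[i] == 0)
                fprintf(stderr, "slot %d missed its heartbeat\n", i);
        memset(table, 0, NPROC);
    }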
There is a utility called monit which does what you are looking for, but it is meant for certain important processes on Linux... and all 10,000 processes are important!!!

What is an uninterruptible process?

Sometimes when I write a program in Linux and it crashes due to a bug of some sort, it becomes an uninterruptible process and continues running forever until I restart my computer (even if I log out). My questions are:
What causes a process to become uninterruptible?
How do I stop that from happening?
This is probably a dumb question, but is there any way to interrupt it without restarting my computer?
An uninterruptible process is a process which happens to be in a system call (kernel function) that cannot be interrupted by a signal.
To understand what that means, you need to understand the concept of an interruptible system call. The classic example is read(). This is a system call that can take a long time (seconds) since it can potentially involve spinning up a hard drive, or moving heads. During most of this time, the process will be sleeping, blocking on the hardware.
If the process receives a Unix asynchronous signal (say, SIGTERM) while it is sleeping in the system call, the following happens:
The system call exits prematurely, and is set up to return -EINTR to user space.
The signal handler is executed.
If the process is still running, it gets the return value from the system call, and it can make the same call again.
Returning early from the system call enables the user space code to immediately alter its behavior in response to the signal. For example, terminating cleanly in reaction to SIGINT or SIGTERM.
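To make the sequence concrete, here is a rough user-space sketch (not from the answer; error handling trimmed): a blocking read() is interrupted by SIGTERM, returns -1 with errno set to EINTR, and that early return is what gives the program the chance to exit cleanly.

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static volatile sig_atomic_t got_sigterm = 0;
    static void on_sigterm(int sig) { (void)sig; got_sigterm = 1; }

    int main(void)
    {
        struct sigaction sa = { .sa_handler = on_sigterm };   /* no SA_RESTART */
        sigaction(SIGTERM, &sa, NULL);

        char buf[4096];
        for (;;) {
            ssize_t n = read(STDIN_FILENO, buf, sizeof buf);  /* may block a long time */
            if (n < 0 && errno == EINTR) {
                if (got_sigterm) {
                    fprintf(stderr, "terminating cleanly\n");
                    return 0;
                }
                continue;          /* some other signal: retry the call */
            }
            if (n <= 0)
                return 0;          /* EOF or real error */
            /* ... process n bytes ... */
        }
    }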
On the other hand, some system calls are not allowed to be interrupted in this way. If such a system call stalls for some reason, the process can remain in this unkillable state indefinitely.
LWN ran a nice article that touched on this topic in July.
To answer the original question:
How to prevent this from happening: figure out which driver is causing you trouble, and either stop using it, or become a kernel hacker and fix it.
How to kill an uninterruptible process without rebooting: somehow make the system call terminate. Frequently the most effective manner to do this without hitting the power switch is to pull the power cord. You can also become a kernel hacker and make the driver use TASK_KILLABLE, as explained in the LWN article.
When a process is in user mode, it can be interrupted at any time (switching to kernel mode). When the kernel returns to user mode, it checks whether there are any signals pending (including the ones used to kill the process, such as SIGTERM and SIGKILL). This means a process can be killed only on return to user mode.
The reason a process cannot be killed in kernel mode is that it could potentially corrupt the kernel structures used by all the other processes in the same machine (the same way killing a thread can potentially corrupt data structures used by other threads in the same process).
When the kernel needs to do something which could take a long time (waiting on a pipe written by another process or waiting for the hardware to do something, for instance), it sleeps by marking itself as sleeping and calling the scheduler to switch to another process (if there is no non-sleeping process, it switches to a "dummy" process which tells the cpu to slow down a bit and sits in a loop — the idle loop).
If a signal is sent to a sleeping process, it has to be woken up before it will return to user space and thus process the pending signal. Here we have the difference between the two main types of sleep:
TASK_INTERRUPTIBLE, the interruptible sleep. If a task is marked with this flag, it is sleeping, but can be woken by signals. This means the code which marked the task as sleeping is expecting a possible signal, and after it wakes up will check for it and return from the system call. After the signal is handled, the system call can potentially be automatically restarted (and I won't go into details on how that works).
TASK_UNINTERRUPTIBLE, the uninterruptible sleep. If a task is marked with this flag, it is not expecting to be woken up by anything other than whatever it is waiting for, either because it cannot easily be restarted, or because programs are expecting the system call to be atomic. This can also be used for sleeps known to be very short.
TASK_KILLABLE (mentioned in the LWN article linked to by ddaa's answer) is a new variant.
This answers your first question. As to your second question: you can't avoid uninterruptible sleeps; they are a normal thing (one happens, for instance, every time a process reads from or writes to the disk), but they should last only a fraction of a second. If they last much longer, it usually means a hardware problem (or a device driver problem, which looks the same to the kernel), where the device driver is waiting for the hardware to do something which will never happen. It can also mean you are using NFS and the NFS server is down (it is waiting for the server to recover; you can also use the "intr" mount option to avoid the problem).
Finally, the reason you cannot recover is the same reason the kernel waits until return to user mode to deliver a signal or kill the process: it would potentially corrupt the kernel's data structures (code waiting on an interruptible sleep can receive an error which tells it to return to user space, where the process can be killed; code waiting on an uninterruptible sleep is not expecting any error).
Uninterruptible processes are USUALLY waiting for I/O following a page fault.
Consider this:
The thread tries to access a page which is not in core (either an executable which is demand-loaded, a page of anonymous memory which has been swapped out, or a mmap()'d file which is demand loaded, which are much the same thing)
The kernel is now (trying to) load it in
The process can't continue until the page is available.
The process/task cannot be interrupted in this state, because it can't handle any signals; if it did, another page fault would happen and it would be back where it was.
When I say "process", I really mean "task", which under Linux (2.6) roughly translates to "thread", which may or may not have an individual "thread group" entry in /proc.
In some cases, it may be waiting for a long time. A typical example of this would be where the executable or mmap'd file is on a network filesystem where the server has failed. If the I/O eventually succeeds, the task will continue. If it eventually fails, the task will generally get a SIGBUS or something.
To your 3rd question:
I think you can kill uninterruptible processes by running
sudo kill -HUP 1
It will restart init without ending the running processes, and after running it, my uninterruptible processes were gone.
If you are talking about a "zombie" process (which is designated as "zombie" in ps output), then it is a harmless record in the process list waiting for someone to collect its return code, and it can be safely ignored.
Could you please describe what an "uninterruptible process" is for you? Does it survive kill -9 and happily chug along? If that is the case, then it's stuck on some syscall, which is stuck in some driver, and you are stuck with this process until reboot (and sometimes it's better to reboot soon) or until the relevant driver is unloaded (which is unlikely to happen). You could try using strace to find out where your process is stuck, and avoid it in the future.

Resources