Problems with killing jobs


I would like to kill a serial job, which runs several calculations one after another. With the command 'kill PID', where PID is the process ID, the currently running calculation is cancelled, but the job itself is not stopped: the next calculation starts instead. I would like to kill the entire job, the entire process.

kill -9 <pid> should do the job. Unfortunately, this might not always work: SIGKILL cannot be caught or ignored, but a process stuck in an uninterruptible kernel call (for example, waiting on hung I/O) will not die until that call returns.
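The symptom in the question (the next calculation starts) is what you would expect when the signal reaches only the calculation that is currently running, while the driver that launches the steps keeps going. A minimal sketch, assuming the job is a shell script (or similar driver) that runs each calculation as a child process; `kill_serial_job` is a hypothetical helper name, and it uses `ps -o ppid=` to find the driver:

```bash
#!/usr/bin/env bash
# Hypothetical helper: the argument is the PID of the calculation running right now.
# Stop its parent (the driver that starts one calculation after another) first,
# so that no next step can be launched, then stop the calculation itself.
kill_serial_job() {
  local calc_pid=$1 sig=${2:-TERM} driver_pid
  driver_pid=$(ps -o ppid= -p "$calc_pid" | tr -d ' ')   # ps pads the value with spaces
  [ -n "$driver_pid" ] || return 1                        # no such process
  if [ "$driver_pid" -gt 1 ]; then                        # never signal init
    kill "-$sig" "$driver_pid" 2>/dev/null || true
  fi
  kill "-$sig" "$calc_pid" 2>/dev/null || true
}
```

For example, `kill_serial_job 12345 KILL` (12345 being a hypothetical PID) would send SIGKILL to both processes. Only use it when the parent really is the job's driver, not your interactive shell.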

Related

Bash: Is it possible to stop a PID from being reused?

For example, if I run a job myjob in the background with myjob & and get its PID using PID=$!, is it possible to prevent the Linux system from reusing that PID until I have checked that the PID no longer exists (i.e., the process has finished)?
In other words I want to do something like:
myjob &
PID=$!
do_not_use_this_pid $PID
wait $PID
allow_use_of_this_pid $PID
The reasons for wanting to do this do not make much sense in the example given above, but consider launching multiple background jobs in series and then waiting for them all to finish.
Some programmer dude rightly points out that no two processes may share the same PID. That is correct, but not what I am asking here. I am asking for a method of preventing a PID from being reused after a process has been launched with it, and then a method of re-enabling its use once I have finished checking whether my original process has finished.
Since it has been asked for, here is a use case:
launch multiple background jobs
get the PIDs of the background jobs
prevent those PIDs from being reused by another process after a background job terminates
check the PIDs of the background jobs, i.e., to ensure the background jobs finish
[note: if PID reuse were disabled for the background jobs' PIDs, those PIDs could not be taken by a new process launched after a background job terminated]*
re-enable reuse of the background jobs' PIDs
repeat
*Further explanation:
Assume 10 jobs launched
Job 5 exits
New process started by another user, for example, they login to a tty
New process has same PID as Job 5!
Now our script checks for Job 5 termination, but sees PID in use by tty!

You can't "block" a PID from being reused by the kernel. However, I am inclined to think this isn't really a problem for you.
but consider launching multiple background jobs in series and then waiting for them all to finish.
A simple wait (without arguments) waits for all child processes to complete, so you don't need to worry about the PIDs being reused.
When you launch several background processes, it's indeed possible that PIDs may be reused by other processes.
But it's not a problem because you can't wait on a process unless it's your child process.
Otherwise, checking whether one of the background jobs you started has completed by any means other than wait is always going to be unreliable.
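A minimal sketch of the approach described above, in bash: keep each `$!` and `wait` on it, which reports that job's own exit status. `run_all` is a hypothetical name, and treating each argument as a command string for `sh -c` is an assumption made to keep the example short.

```bash
#!/usr/bin/env bash
# Start every command in the background, then wait for each one by PID.
# wait only accepts children of this shell, and bash keeps the exit status of
# the background children it started, so a PID recycled by an unrelated
# process is not a concern here.
run_all() {
  local cmd pid failures=0
  local -a pids=()
  for cmd in "$@"; do
    sh -c "$cmd" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failures=$((failures + 1))
  done
  echo "$failures"          # number of jobs that exited non-zero
  [ "$failures" -eq 0 ]     # function status: success only if every job succeeded
}
```

For instance, `run_all 'exit 0' 'exit 3' 'sleep 0.2'` prints 1 and returns a non-zero status, because exactly one job failed.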

Until you have retrieved the exit status of a child process, it continues to exist in the kernel (as a zombie once it has exited). That also means its PID stays bound to it and cannot be reused during that time.

A further suggestion to work around this: if you suspect that a PID assigned to one of your background jobs has been reassigned, check it with ps to see whether it is still your process, running your executable, and whether its PPID (parent PID) is still your script's PID.
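A minimal sketch of that check in bash. `is_still_my_job` is a hypothetical helper; it compares the process's PPID with the PID of the current shell (`$BASHPID`, which equals `$$` outside subshells) and its command name with the one you expect, using the `ppid` and `comm` output fields of `ps`:

```bash
#!/usr/bin/env bash
# Succeeds only if PID still belongs to a child of this shell running the expected command.
is_still_my_job() {
  local pid=$1 expected=$2 info ppid comm
  info=$(ps -o ppid= -o comm= -p "$pid") || return 1   # ps fails if the PID no longer exists
  read -r ppid comm <<< "$info"                         # read also trims the padding ps adds
  [ "$ppid" = "${BASHPID:-$$}" ] && [ "$comm" = "$expected" ]
}
```

Usage: after `sleep 30 &` and `job=$!`, `is_still_my_job "$job" sleep` succeeds while that sleep is running and fails once it has exited and been reaped.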

If you are afraid of PIDs being reused, which won't happen if you wait as the other answers explain, you can (as root) raise the PID limit with
echo 4194303 > /proc/sys/kernel/pid_max
to decrease your fear ;-)

Cannot kill a process with kill -9?

I try to kill a process with the command "kill -9 pid", but it does not succeed. Does anybody know how I could kill such a process, and why I can't kill it?

The process could be a zombie. It's good to check the process state with the ps command as well, if you have permission.

If your process is in an uninterruptible sleep (state D) because it is hanging in some hardware access, you indeed cannot terminate that process until it leaves that state.
Here is another explanation.
Personally, I have seen such D states, for example, when accessing files on an SD card or USB stick that had a hardware problem. But there are many other scenarios in which such a state can occur.
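A minimal sketch of the check both answers recommend: read the process state letter from `ps` (`Z` = zombie, `D` = uninterruptible sleep) before concluding that `kill -9` failed. `proc_state` and `explain_unkillable` are hypothetical helpers, and the messages they print summarize the two answers above rather than any tool's output.

```bash
#!/usr/bin/env bash
# Print the one-letter state of a PID (R, S, D, Z, T, ...); fail if it does not exist.
proc_state() {
  local stat
  stat=$(ps -o stat= -p "$1") || return 1
  stat=${stat//[[:space:]]/}          # drop the padding ps adds
  printf '%s\n' "${stat:0:1}"         # first letter is the state; the rest are modifiers such as s, +, <
}

explain_unkillable() {
  local pid=$1 state
  state=$(proc_state "$pid") || { echo "$pid: no such process"; return 1; }
  case $state in
    Z) echo "$pid: zombie; it has already exited and its parent (PPID $(ps -o ppid= -p "$pid" | tr -d ' ')) must reap it" ;;
    D) echo "$pid: uninterruptible sleep; SIGKILL usually takes effect only once it leaves this state" ;;
    *) echo "$pid: state $state; kill -9 should normally terminate it" ;;
  esac
}
```

A zombie cannot be killed again because it is already dead; it disappears once its parent collects its exit status, or when the parent itself exits and init reaps it.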

What special precautions must I take for Docker apps running as PID 1?

From what I gather, programs that run as PID 1 may need to take special precautions, such as catching certain signals.
It's not altogether clear how to correctly write a program that runs as PID 1. I'd rather not use runit or supervisor in my case. For example, supervisor is written in Python, and installing it results in a much larger container. I'm not a fan of runit.
Looking at the source code for runit is interesting, but as usual, comments are virtually non-existent and don't explain what is being done or why.

There is a good discussion here:
When the process with PID 1 dies for any reason, all other processes (in its PID namespace) are killed with the KILL signal.
When any process that has children dies for any reason, its children are reparented to the process with PID 1.
Many signals whose default action is Term have no default action for PID 1: unless PID 1 installs a handler, they are ignored.
The relevant part for your question:
you can't stop the process by sending it SIGTERM or SIGINT if it has not installed a handler for that signal.
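A minimal sketch of an entrypoint that does install handlers, assuming bash is available in the image and the real program is passed as arguments (`run_as_init` is a hypothetical name). Because the trap is an installed handler, SIGTERM and SIGINT are no longer ignored when this shell runs as PID 1; the handler forwards SIGTERM to the main child, and the shell then returns that child's exit status.

```bash
#!/usr/bin/env bash
# Run "$@" as a child, forward SIGTERM/SIGINT to it as SIGTERM, and return its exit status.
run_as_init() {
  local child='' status
  trap 'if [ -n "${child:-}" ]; then kill -TERM "$child" 2>/dev/null || true; fi' TERM INT
  "$@" &
  child=$!
  while :; do
    status=0
    wait "$child" || status=$?
    # A trapped signal makes wait return early (status 128+n); if the child
    # still exists, go back to waiting for its real exit status.
    kill -0 "$child" 2>/dev/null || break
  done
  trap - TERM INT
  return "$status"
}

# In a container, the entrypoint script would end with: run_as_init "$@"
```

This only covers signal forwarding, not every duty of an init process.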

Why is the child process still alive after the parent process was killed in Linux?

Someone told me that when you kill a parent process in Linux, the child dies too.
But I doubt it, so I wrote two bash scripts, where father.sh invokes child.sh.
Here is my script:
Now I run bash father.sh; you can check it with ps -alf.
Then I killed the father.sh by kill -9 24588, and I guessed the child process should be terminated but unfortunately I was wrong.
Could anyone explain why?
thx

No, when you kill a process alone, it will not kill the children.
You have to send the signal to the process group if you want all processes in a given group to receive it.
For example, if your parent process has PID 1234 and is the leader of its process group, prefix its PID with a minus sign so that kill addresses the whole group:
kill -9 -1234
Otherwise, the orphans are reparented to init, as your third screenshot shows (the child's PPID has become 1).
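A minimal sketch of signalling a whole process group from a script. It assumes the util-linux `setsid` command, which starts the command in a new session and therefore in a new process group. When `setsid` is launched in the background from a non-interactive script, as here, it does not need to fork, so `$!` is also the group ID (in an interactive shell with job control it may fork, and then `$!` is not the group ID). The `bash -c` command is a stand-in for `father.sh`:

```bash
#!/usr/bin/env bash
# Stand-in for father.sh: a parent shell with two children, in its own process group.
setsid bash -c 'sleep 300 & sleep 300 & wait' &
leader=$!                  # the group leader's PID, which is also the PGID (see the assumption above)
sleep 0.5                  # give the children time to start
kill -TERM -- "-$leader"   # the leading minus addresses every process in the group
```

The `--` stops `kill` from parsing `-<pgid>` as an option, which matters for some `kill` implementations.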

-bash: kill: (-123) - No such process
In an interactive Terminal.app session, the foreground and background process group ID numbers differ by design when job control (monitor mode) is enabled. In other words, if you background a command in a job-control-enabled Terminal.app session, the $! PID of the backgrounded process is in fact a new process group ID number (PGID).
In a script without job control, however, this may not be the case! The PID of the backgrounded process may not be a new PGID but an ordinary PID. That is what causes the error message -bash: kill: (-123) - No such process: kill is asked to signal a process group, but is given an ordinary PID instead of a PGID.
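# The following code works in Terminal.app because there $! == $pgid
# (with job control on, each background job gets its own process group).
{
sleep 100 &
bgpid=$!
# Look up the job's process group; IFS=" " strips the padding ps adds.
IFS=" " read -r pgid <<EOF
$(ps -p "$bgpid" -o pgid=)
EOF
echo "$$ $bgpid $pgid"   # shell PID, job PID, job PGID
sleep 10
kill -HUP -- "-$bgpid"    # signals the group whose ID equals $!
#kill -HUP -- "-$pgid"   # use in a script; note that this group also contains the script itself
}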

pkill -TERM -P <ProcessID>
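This sends SIGTERM to the direct children of <ProcessID> (the processes whose parent it is). The parent itself is not signalled, so follow it with kill -TERM <ProcessID> to stop the parent as well.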
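`pkill -P` only reaches direct children. A minimal sketch that signals the parent and all its descendants, assuming procps `pgrep` (its `-P` option selects processes by parent PID); `kill_tree` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Signal PID, then (recursively) every descendant it had at the time of the call.
kill_tree() {
  local pid=$1 sig=${2:-TERM} children child
  # Collect the children first: once the parent is dead they are reparented
  # and pgrep -P would no longer find them.
  children=$(pgrep -P "$pid" || true)
  # Signal the parent before its children, so it cannot start replacements.
  kill "-$sig" "$pid" 2>/dev/null || true
  for child in $children; do
    kill_tree "$child" "$sig"
  done
  # Caveat: a process that forks between pgrep and kill can leave a new child behind.
}
```

Usage: `kill_tree 1234` sends SIGTERM to process 1234 and its descendants; `kill_tree 1234 KILL` uses SIGKILL instead.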

Generally killing the parent also kills the child.
The reason you are seeing the child still alive after killing the father is that the child will only die after it "chooses" (the kernel chooses) to handle the SIGKILL event. It doesn't have to handle it right away. Your script is running a sleep() command (i.e. in the kernel), which will not wake up to handle any events whatsoever until the sleep has completed.
Why is the PPID 1? The parent has died and is no longer in the process table. child.sh isn't mysteriously linked to init now; it simply has no running parent. Saying it is linked to init creates the impression that if we somehow leave init, init has control over shutting down the process. It also creates the impression that killing a parent makes the grandparent the owner of a child. Neither is true. The child process still exists in the process table and is running, but no new events based on its process ID will be handled until it handles SIGKILL. This means the child is a pre-zombie, walking dead, in danger of being labeled <defunct>.
Killing by process group is different: it is used to kill the parent together with the other processes in its group (such as its children) via the process group number. It's probably also important to note that "killing a process" is not killing per se, in the human sense, where you expect the process to be destroyed and all memory returned as though it never was. It just sends a particular event, among many, to the process for it to handle. If the process does not handle it properly, then after a while the OS will come along and "clean it up" forcibly.
It (killing) doesn't happen right away because the child (or even the parent) could have written something to disk and be waiting for I/O to complete or doing some other critical task that could compromise system stability or file integrity.

Are there suspend/resume signals in Linux?

My application needs to react to hibernation, so that it can take some action on suspend and other actions on resume. I've found some distribution-specific ways to achieve this (UPower + DBus) but didn't find anything universal. Is there a way to do it?
Thanks!

A simple solution to this is to use a self-pipe. Open a pipe and periodically write timestamps to it. Use select on this pipe to read the timestamps, and compare them to the current time. A big gap means you have just woken up from system suspension or hibernation.
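The timestamp comparison described above can be sketched in bash. As a simplification, this variant compares successive wall-clock readings taken in a loop instead of passing them through a self-pipe. It assumes the wall clock (`date +%s`) keeps advancing while the machine is suspended; `is_resume_gap`, `watch_for_resume`, the 5-second interval and the 10-second slack are illustrative choices. A manual clock change would also show up as a gap.

```bash
#!/usr/bin/env bash
# True if the time between two samples is much longer than the sampling interval.
is_resume_gap() {
  local prev=$1 now=$2 interval=$3 slack=$4
  (( now - prev > interval + slack ))
}

# Sample the wall clock every INTERVAL seconds and report suspiciously long gaps.
watch_for_resume() {
  local interval=${1:-5} slack=${2:-10} prev now
  prev=$(date +%s)
  while sleep "$interval"; do
    now=$(date +%s)
    if is_resume_gap "$prev" "$now" "$interval" "$slack"; then
      echo "resumed after a $((now - prev))s gap"   # run resume actions here
    fi
    prev=$now
  done
}
```

The slack keeps ordinary scheduling delays from being reported; it only needs to be small compared with a realistic suspend.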
As for the other direction, there is not much time between closing the lid and the system flipping the switch.
If you really need to act on suspend, you will need to set up powersave hooks in pm-utils, like those described at https://help.ubuntu.com/community/PowerManagement/ReducedPower. A hook could be as simple as
kill -HUP "$(cat mypid)"; sleep 1
Your process would then trap SIGHUP and do what needs to be done to prepare for suspension. The sleep delays the suspend long enough for your program to react to the signal.
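A minimal sketch of the application side, assuming the hook reads the PID from a file named `mypid` as in the line above; the `PIDFILE` variable, `app_main` and `prepare_for_suspend` are hypothetical names. The handler is installed before the PID is published, so the hook never sends SIGHUP to a process that would simply die from it.

```bash
#!/usr/bin/env bash
prepare_for_suspend() {
  # Placeholder for real work: flush buffers, close connections, save state, ...
  echo "preparing for suspend at $(date +%T)"
}

# Main loop: run the given number of 1-second iterations, or forever if none is given.
app_main() {
  local max=${1:-0} i=0
  trap prepare_for_suspend HUP
  echo "${BASHPID:-$$}" > "${PIDFILE:-./mypid}"   # BASHPID is this process, even in a subshell
  while (( max == 0 || i++ < max )); do
    # Background sleep + wait: bash runs the trap as soon as the signal arrives,
    # instead of after a foreground sleep finishes. wait then returns >128, which is expected.
    sleep 1 & wait $! || true
  done
}
```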

I believe you are looking for SIGSTOP and SIGCONT signals. You can send these to a running process like so:
kill -STOP pid
sleep 60
kill -CONT pid
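SIGSTOP itself cannot be caught, so a stopped process gets no chance to prepare, but it can install a handler for SIGCONT and run code when it is resumed. A minimal sketch in bash (`run_with_resume_hook` is a hypothetical name):

```bash
#!/usr/bin/env bash
# Run the given number of 1-second iterations (0 = forever) and
# report every time the process is continued after a stop.
run_with_resume_hook() {
  local max=${1:-0} i=0
  trap 'echo "resumed at $(date +%T)"' CONT
  while (( max == 0 || i++ < max )); do
    sleep 1 & wait $! || true   # interruptible wait, so the trap runs promptly
  done
}
```

Sending `kill -STOP` and later `kill -CONT` to such a process pauses it and then prints one "resumed" line.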
