Bash script dd and fsck not killed - linux

I need to remove an LV that is in use by a script which runs dd, fsck and some other tasks on a peer node. So I just kill the script and then try to remove the LV from that peer, but it fails with an error about open LVs.
If I kill dd itself instead, removing the LV works fine.
Maybe this is a silly question, but I would like to know why.

If you kill your script sending SIGTERM or SIGKILL to its PID, normally you would send the signal only to the parent process. dd is running in a child process which won't receive the signal. Once the parent is killed, the child is inherited by init and it will keep on running.
To send the signal to the whole process group use:
kill -- -PID
or
kill -9 -PID
where PID is the PID of your script. Note the minus sign prepended to PID.
From man kill
-n where n is larger than 1. All processes in process group n are signaled.
Example
I am running dd from a shell script:
PID TGID TID PGID PPID COMMAND
5828 5828 5828 5828 20127 sh
5829 5829 5829 5828 5828 dd
The Process Group ID (PGID) is the PID of the parent (5828).
If I run the following command:
kill -9 5828
I obtain the following situation:
PID TGID TID PGID PPID COMMAND
5829 5829 5829 5828 1 dd
dd is still running and it has been inherited by init (PPID is 1).
If instead I run:
kill -9 -5828
both the script and dd are killed.
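The same behaviour can be reproduced with a small sketch. It assumes a Linux system where setsid (from util-linux) is available; the sh -c command is only a stand-in for the real script, and sleep stands in for dd.

```shell
#!/bin/sh
# Stand-in for the real script: a shell that backgrounds a long-running child.
# setsid makes it a session and process-group leader, so its PGID equals its
# PID when started from a non-interactive script like this one.
setsid sh -c 'sleep 300 & wait' &
leader=$!
sleep 1                        # let the stand-in start its child

kill -s TERM -- "-$leader"     # negative PID: signal the whole process group
wait "$leader"
status=$?                      # 128 + 15 (SIGTERM) when the leader was killed
echo "leader $leader exited with status $status"
```

The -s TERM -- form is used because it is parsed the same way by bash and dash; kill -- -PID from the answer works as well.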
EDIT: The process you want to kill is launched remotely via ssh.
Having ssh launch dd/fsck remotely changes everything. A simple but not recommended solution could be the following.
A script launches dd/fsck remotely via ssh in the background and gets the PID.
remote_pid=$(ssh user@host 'dd if=/dev/urandom of=/dev/zero & echo $!')
The script must then stay alive and trap your signal. Its handler opens a new ssh connection and signals remote_pid.
Cleaning up in a second ssh connection is not recommended, as it could fail and leave a lot of mess behind.
For a more advanced solution please see https://unix.stackexchange.com/questions/40023/get-ssh-to-forward-signals

Related

kill -s SIGTERM kills parent process and one level child process only

I've been experimenting with writing a command to kill a parent and all its children recursively. I have the scripts below.
parent.sh:
#!/bin/bash
/home/oracle/child.sh &
sleep infinity
child.sh:
#!/bin/bash
sleep infinity
Started command using
su oracle -c parent.sh &
I see a process tree like below
[root@source ~]# ps -ef | grep "/home/oracle"
root 14129 1171 0 12:39 pts/1 00:00:00 su oracle -c /home/oracle/parent.sh
oracle 14130 14129 0 12:39 ? 00:00:00 /bin/bash /home/oracle/parent.sh
oracle 14131 14130 0 12:39 ? 00:00:00 /bin/bash /home/oracle/child.sh
When I send SIGTERM to 14129 using kill -s SIGTERM 14129, it kills 14129 and 14130 goes down immediately as well, but 14131 stays up for a very long time. The last-level child appears to have been reparented to init and is now an orphan.
oracle 14131 1 0 12:39 ? 00:00:00 /bin/bash /home/oracle/child.sh
If kill doesn't terminate any child processes, why did 14130 get killed when I sent SIGTERM to 14129?
If kill can kill child processes, why does it go only one level down? Is this behavior guaranteed?
The relevant part of what pilcrow provided is this:
SIGNALS
Upon receiving either SIGINT, SIGQUIT or SIGTERM, su terminates
its child and afterwards terminates itself with the received
signal.
>> The child is terminated by SIGTERM,
>> (then) after unsuccessful attempt (to kill with SIGTERM) and
>> (after) 2 seconds of delay (,) the child is (then) killed
>> by SIGKILL [a second, harsher method].
That harsher method, SIGKILL, prevents that child process from attempting to kill its own children, which is why the grandchild is left running as an orphan.
I haven't used it myself, but it seems that something like
killall --process-group parent.sh
would kill all processes tied to the process group associated with the "parent.sh" script. BUT ... not sure if "--wait" will serve you well, if the method used in the attempt to terminate is not being accepted.
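If you can edit parent.sh, another option is to have it forward SIGTERM to its own child instead of relying on what su does. The following is a minimal sketch of that idea, not a tested replacement for the scripts above: sleep stands in for child.sh, and the temporary directory and PID file exist only so the demo can check the result.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Stand-in for parent.sh: start a child, then forward SIGTERM to it.
cat > "$workdir/parent.sh" <<'EOF'
#!/bin/sh
sleep 300 &                     # stand-in for child.sh
child=$!
echo "$child" > "$1"            # record the child's PID for the demo
trap 'kill -s TERM "$child" 2>/dev/null; wait "$child"; exit 143' TERM
wait "$child"
EOF

sh "$workdir/parent.sh" "$workdir/child.pid" &
parent=$!
sleep 1                         # give parent.sh time to start its child

kill -s TERM "$parent"          # signal only the parent
wait "$parent"
parent_status=$?

child_pid=$(cat "$workdir/child.pid")
if kill -0 "$child_pid" 2>/dev/null; then
    child_state=running
else
    child_state=gone
fi
echo "parent status: $parent_status, child: $child_state"
rm -rf "$workdir"
```

The trap only helps for signals that can be caught. If the parent is killed with SIGKILL, the handler never runs and the child is orphaned as before.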

How to close a dd process that was started in a dash script with -SIGINT?

When I start a dd process directly from the Terminal with
dd if=/dev/zero of=/dev/null &
command and then send it SIGINT with
kill -SIGINT <pid>
command, it closes successfully.
But when I start the process from a script
#!/bin/sh
dd if=/dev/zero of=/dev/null &
Then do
kill -SIGINT <pid>
it doesn't affect the process.
I wonder why this is so.
I didn't find any related information on the internet.
POSIX says:
If job control is disabled (see the description of set -m) when the shell executes an asynchronous list, the commands in the list shall inherit from the shell a signal action of ignored (SIG_IGN) for the SIGINT and SIGQUIT signals.
This is likely because Ctrl+C sends SIGINT to every process in the foreground process group, so without this behavior any backgrounded processes would be killed unexpectedly when you try to interrupt the main script.

Linux cannot kill a PID: invalid signal

On Ubuntu Amazon EC2 instances with root access,
when I run
ps -e
The process shows up with a valid PID and process name. The database table also suggests the process is still ongoing.
PID TTY TIME CMD
32194 ? 00:00:00 test
32253 ? 00:00:00 mysql
However, none of the following commands kills the process; each returns nothing or "invalid signal".
top
kill
Type in PID
y
returns "invalid signal"
or
kill -9 PID
kill -s PID
etc.
Could any guru enlighten how to deal with the "ghost jobs"?
Did you use the correct rights to kill the process? With root you should be able to kill the process using either:
$ su -
# kill -9 PID
or
$ sudo kill -9 PID
You have the id of the process, say 32194, I suggest you run:
pgrep -l a | grep 32194
If the process name contains an 'a', the output will show the PID and the process name. If it does not contain an 'a', try another letter.
When the process appears, just kill it with:
pkill <process name>
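As for the "invalid signal" message itself: kill -s expects a signal name or number as its next argument, so in kill -s PID the PID is read as the signal and no target is left. The sketch below shows the difference, with sleep as a stand-in for the stuck process; the exact error text varies between shells.

```shell
#!/bin/sh
sleep 300 &
pid=$!

# Wrong: the PID is parsed as the signal and no process is named.
if kill -s "$pid" 2>/dev/null; then
    bad_form=accepted
else
    bad_form=rejected
fi

# Right: signal name first, then the PID.
kill -s TERM "$pid"
wait "$pid"
status=$?                      # 128 + 15 (SIGTERM)
echo "kill -s PID was $bad_form; exit status after kill -s TERM PID: $status"
```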

In unix I used kill command by providing a ppid then it close the terminal . why? kill -9 ppid

sleep 5000
In one terminal and in second terminal I'm running:
ps -ef | grep sleep
Then I kill the process from the second terminal using the PPID. This closes the first terminal, where I ran the sleep command, instead of leaving sleep behind as an orphan.
$ ps -ef | grep sleep
trainee 4887 4864 0 17:05 pts/0 00:00:00 sleep 5000
trainee 4889 4264 0 17:05 pts/1 00:00:00 grep --color=auto sleep
kill -9 4864
Why?
Presumably the parent of the sleep is your shell. When you kill that your login is terminated and your terminal closes.
The Wikipedia article on Orphan process reads (in part),
An orphan process is a computer process whose parent process has finished or terminated, though it remains running itself.
and
A process can be orphaned unintentionally, such as when the parent process terminates or crashes. The process group mechanism in most Unix-like operating systems can be used to help protect against accidental orphaning, where in coordination with the user's shell will try to terminate all the child processes with the SIGHUP process signal, rather than letting them continue to run as orphans.

How to get the process ID to kill a nohup process?

I'm running a nohup process on the server. When I try to kill it my putty console closes instead.
this is how I try to find the process ID:
ps -ef |grep nohup
this is the command to kill
kill -9 1787 787
When using nohup and you put the task in the background, the background operator (&) will give you the PID at the command prompt. If your plan is to manually manage the process, you can save that PID and use it later to kill the process if needed, via kill PID or kill -9 PID (if you need to force kill). Alternatively, you can find the PID later on by ps -ef | grep "command name" and locate the PID from there. Note that nohup keyword/command itself does not appear in the ps output for the command in question.
If you use a script, you could do something like this in the script:
nohup my_command > my.log 2>&1 &
echo $! > save_pid.txt
This runs my_command and saves all of its output in my.log ($! expands to the PID of the most recent background command, here my_command). The 2 is the file descriptor for standard error (stderr), and 2>&1 tells the shell to route standard error to standard output (file descriptor 1). The & in &1 is required so that the shell treats 1 as a file descriptor rather than a file named 1. The 2>&1 is needed so that error messages, which are normally written to standard error, also end up in my.log, where standard output is already going. See I/O Redirection for more details on handling I/O redirection with the shell.
If the command sends output on a regular basis, you can check the output occasionally with tail my.log, or if you want to follow it "live" you can use tail -f my.log. Finally, if you need to kill the process, you can do it via:
kill -9 `cat save_pid.txt`
rm save_pid.txt
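A slightly more defensive variant of the same idea is sketched below. It checks that the saved PID is still alive before signalling it and tries a plain SIGTERM instead of -9. The temporary directory, the stop_job function name and the use of sleep in place of my_command are assumptions made for the demo.

```shell
#!/bin/sh
workdir=$(mktemp -d)
pidfile=$workdir/save_pid.txt

# Start the job as in the answer above; sleep stands in for my_command.
nohup sleep 300 > "$workdir/my.log" 2>&1 &
echo $! > "$pidfile"

stop_job() {
    pid=$(cat "$pidfile") || return 1
    if kill -0 "$pid" 2>/dev/null; then   # is the process still there?
        kill "$pid"                       # polite SIGTERM first
        wait "$pid" 2>/dev/null           # reaps it; only works for our own child
    fi
    rm -f "$pidfile"
}

stop_job
echo "stopped job with PID $pid"
```

When the job was started from a different shell session, wait cannot be used; in that case poll with kill -0 for a few seconds before falling back to kill -9.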
I am using red hat linux on a VPS server (and via SSH - putty), for me the following worked:
First, you list all the running processes:
ps -ef
Then in the first column you find your user name; I found it the following three times:
One was the SSH connection
The second was an FTP connection
The last one was the nohup process
Then in the second column you can find the PID of the nohup process and you only type:
kill PID
(replacing the PID with the nohup process's PID of course)
And that is it!
I hope this answer will be useful for someone. I'm also very new to bash and SSH, but found 95% of the knowledge I need here :)
Suppose I am running a Ruby script in the background with the command below:
nohup ruby script.rb &
Then I can get the PID of that background process by searching for the command name. In my case the command is ruby.
ps -ef | grep ruby
output
ubuntu 25938 25742 0 05:16 pts/0 00:00:00 ruby test.rb
Now you can easily kill the process by using kill command
kill 25938
jobs -l should give you the pid for the list of nohup processes.
kill (-9) them gently.
;)
You could try
kill -9 `pgrep [command name]`
Suppose you are executing a Java program with nohup. You can get the Java process ID with
ps aux | grep java
output
xxxxx 9643 0.0 0.0 14232 968 pts/2
then you can kill the process by typing
sudo kill 9643
or, if you need to kill all the java processes, just use
sudo killall java
This command kills all the java processes. You can use it with any process; just give the process name at the end of the command:
sudo killall {processName}
If your application always uses the same port, you can kill all the processes in that port like this.
kill -9 $(lsof -t -i:8080)
This works in Ubuntu
Type this to find out the PID
ps aux | grep java
All the running process regarding to java will be shown
In my case is
johnjoe 3315 9.1 4.0 1465240 335728 ? Sl 09:42 3:19 java -jar batch.jar
Now kill it:
kill -9 3315
The zombie process finally stopped.
When you start a job with nohup in the background, the shell prints its process ID:
nohup sh test.sh &
the output will show you the process ID like
25013
you can kill it then :
kill 25013
I started django server with the following command.
nohup manage.py runserver <localhost:port>
This works on CentOS:
:~ ns$netstat -ntlp
:~ ns$kill -9 PID
This works fine for me on a Mac:
kill -9 `ps -ef | awk '/nohup/{ print \$2 }'`
I often do this way. Try this way :
ps aux | grep script_Name
Here, script_Name could be any script/file run by nohup.
This command gets you a process ID. Then use this command below to kill the script running on nohup.
kill -9 1787 787
Here, 1787 and 787 are Process ID as mentioned in the question as an example.
This should do what was intended in the question.
If you do not know the PID, first find it using the top command:
top -U userid
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
You will get the PID using top, then perform the kill operation.
$ kill -9 <PID>
Today I met the same problem. And since it was a long time ago, I totally forgot which command I used and when. I tried three methods:
The first is the STIME shown by ps -ef. This is the time you started your process, and it is very likely that you ran your nohup command just before you closed ssh (depending on your habits). Unfortunately I don't think the latest command is the one I ran using nohup, so this didn't work for me.
The second is the PPID, also shown by ps -ef. It is the parent process ID, the ID of the process that created the process. On Ubuntu the PPID is 1 for a process that was started with nohup. You can then use ps --ppid "1" to get the list, and check TIME (the total CPU time your process has used) or CMD to find the PID of the process.
The third is lsof -i:port, if the process occupies a port; this gives you the command. Then, as in the answer above, use ps -ef | grep command to get the PID.
Once you have found the PID of the process, you can use kill pid to terminate it.
About losing your putty: often the ps ... | awk/grep/perl/... process gets matched, too! So the old school trick is like this
ps -ef | grep -i [n]ohup
That way the regex search doesn't match the regex search process!
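To see why the bracket trick works without depending on a live process list, the sketch below feeds grep two made-up lines shaped like ps output. The pattern [n]ohup matches the text nohup, but the grep command line contains the literal characters [n]ohup, which do not contain nohup.

```shell
#!/bin/sh
# Two hypothetical ps-style lines: the real job and the grep looking for it.
matches=$(printf '%s\n' \
    'user 101 1 nohup ./job.sh' \
    'user 202 101 grep -i [n]ohup' | grep -i '[n]ohup')
echo "$matches"
```

Only the first line is printed. The pattern is quoted here so that the shell cannot expand it as a filename glob.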
If you are on a remote server, check memory usage with top and find your process and its ID. After that, just execute kill [your process ID].

Resources