I want to output "<PID> Killed ~" to a logfile when running kill -9 <PID> - linux

I want this message to go to the log file: /usr/local/ex1.sh: line xxx: <PID> Killed ex2.sh >> $LOG_FILE 2>&1
However, when I execute ex1.sh in a console, it prints that message to the console instead.
What I want is for "ex1.sh" to write the message to the file, not to the console.
This is the source of "ex1.sh":
ex2.sh >> $LOG_FILE 2>&1 &
PID=`ps -ef | grep ex2.sh | grep -v grep | gawk '{print $2}'`
/bin/kill -9 $PID >> $LOG_FILE 2>&1 &
Why does "ex1.sh" output this message to console?

The reason is that the message '/usr/local/ex1.sh: line xxx: <PID> Killed ex2.sh >> $LOG_FILE 2>&1' is printed by the bash shell, not by the kill command.
So redirecting the kill command's output to a file does not put the message in the file.
If you run the script as ./ex1.sh >> $LOG_FILE 2>&1, the message will be in the log file, because ./ex1.sh starts a new bash process, and that bash process is the one that prints the message.

The output is in fact not written by the kill command or ex2.sh. It is written by the shell executing the background process ex2.sh.
The shell executing the script started the script ex2.sh in the background as a child process and is monitoring it. When the script is killed, the shell acts on this by printing the message.
In your special case the shell knows more about the killed process and the process executing kill. So it prints a rather verbose message.
If you start ex2.sh (without '&') in terminal 1 and kill it from terminal 2, the shell in terminal 1 will just print "Killed".
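Since the shell itself writes the message, one way to capture it from inside the script is to redirect the script's own stderr. The following is a minimal sketch under stated assumptions: "sleep 30" stands in for ex2.sh, and LOG_FILE is a throwaway temporary file created only for illustration.

```shell
#!/bin/bash
# Sketch only: "sleep 30" stands in for ex2.sh; LOG_FILE is a temp file.
LOG_FILE=$(mktemp)

# Save the original stderr on fd 3, then point the shell's own stderr at the log.
exec 3>&2 2>>"$LOG_FILE"

sleep 30 >>"$LOG_FILE" 2>&1 &
pid=$!

kill -9 "$pid"
wait "$pid"        # the shell reaps the job here and reports how it died
status=$?

# Restore stderr and close the spare descriptor.
exec 2>&3 3>&-

echo "wait status: $status"
cat "$LOG_FILE"
```

The status returned by wait is 137 (128 + 9), which is how the shell reports a child terminated by SIGKILL. Any message the shell prints about the killed job goes to the log because it is written to the shell's stderr, not to the stderr of kill.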

Related

`nohup` issue with submitting `SLURM` job

I have Python code, main.py, that runs a bash script; the bash script in turn submits a job, job.bash, and obtains its JOBID using echo $JOBID | awk {'print $4'}. If I run Python in the terminal, the bash script works and I am able to obtain and echo the JOBID as follows:
#!/bin/bash
JOBID=`sbatch ~/job.bash | tee output.log`
JOBID=`echo $JOBID | awk {'print $4'}`
echo $JOBID
Running above as part of python works in terminal python main.py, but doing nohup python main.py &, the echo does not print or store JOBID.
Any reason for this?
I am submitting a slurm job hence the JOBID is the pid from slurm
(Update Jul 17) It looks like the issue is with the command sbatch ~/job.bash | tee output.log: under nohup the job doesn't get submitted, so JOBID never gets stored or echoed.
(Update Jul 18) As per the comments from @pynexj, adding set -x to the script results in:
nohup: ignoring input and redirecting stderr to stdout
+ date
Mon Jul 18 21:46:35 +03 2022
++ sbatch ~/job.bash
++ tee output.log
+ JOBID=
++ echo
++ awk '{print $4}'
+ JOBID=
+ echo
The issue still persists. It appears that nohup is incompatible with sbatch.
Question: Why should nohup prevent submission of slurm job? Its objective is merely to capture terminate signal?
If this problem only happens with nohup present, you can get the benefits of nohup without actually using it with:
yourscript </dev/null >file.log 2>&1 & disown -h "$!"
This does the following:
Redirects stdin from /dev/null with </dev/null
Redirects stdout and stderr to a log file with >file.log 2>&1
Tells the shell not to forward HUP signals to the background process with disown -h "$!"
...which is everything nohup does.
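The one-liner above can be tried out with a placeholder process. This is only a sketch: "sleep 30" stands in for yourscript, and the log file is a temporary file created for illustration.

```shell
#!/bin/bash
# Sketch only: "sleep 30" is a placeholder for the long-running script.
log=$(mktemp)

# Detach stdin, send stdout and stderr to the log, and run in the background.
sleep 30 </dev/null >"$log" 2>&1 &
bg_pid=$!

# Mark the job so the shell does not forward SIGHUP to it.
disown -h "$bg_pid"

# Confirm the process is still running after being disowned.
alive=no
kill -0 "$bg_pid" && alive=yes
echo "still running: $alive (PID $bg_pid)"

# Clean up the placeholder process.
kill "$bg_pid"
```

Note that disown is a bash builtin, so this form requires the script to run under bash rather than a plain POSIX sh.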

How to kill a process by reading from pid file using bash script in Jenkins?

Inside Jenkins, I have to run 2 separate scripts: start.sh and stop.sh. These scripts are part of my application, which is fetched from an SCM. They are in the same directory.
The start.sh script runs a process in the background using nohup, and writes the processId to save_pid.pid. This script works fine. It successfully starts my application.
Then inside stop.sh, I am trying to read the processId from save_pid.pid to kill the process. But I am unable to kill it, and the application keeps running until I kill the process manually using: sudo kill {processId}.
Here are the approaches that I have tried so far inside stop.sh but none of these work:
kill $(cat /path/to/save_pid.pid)
kill `cat /path/to/save_pid.pid`
kill -9 $(cat /path/to/save_pid.pid)
kill -9 `cat /path/to/save_pid.pid`
pkill -F /path/to/save_pid.pid
I have also tried all of these steps with sudo as well. But, it just doesn't work. I have kept an echo statement inside stop.sh, which prints and then there is nothing.
What am I doing wrong here ?
UPDATE:
The nohup command that I am using inside start.sh is something like this:
nohup deploy_script > $WORKSPACE/app.log 2>&1 & echo $! > $WORKSPACE/save_pid.pid
Please Note:
In my case, the value written inside save_pid.pid is surprisingly
always less by 1 than the value of actual processId. !!!
I think the reason why this happens is because you are not getting the PID of the process that you are interested in, but the PID of the shell executing your command.
Look:
$ echo "/bin/sleep 10" > /tmp/foo
$ chmod +x /tmp/foo
$ nohup /tmp/foo & echo $!
[1] 26787
26787
nohup: ignoring input and appending output to 'nohup.out'
$ pgrep sleep
26789
So 'nohup' will exec the 'shell', the 'shell' will fork a second 'shell' to exec 'sleep' in, however I can only count two processes here, so I am unable to account for one created PID.
Note that, if you put the nohup and the pgrep on one line, then pgrep will apparently be started faster than the shell that 'exec's 'sleep' and thus pgrep will yield nothing, which somewhat confirms my theory:
$ nohup /tmp/foo & echo $! ; pgrep sleep
[2] 26899
nohup: ignoring input and appending output to 'nohup.out'
$
If you launch your process directly, then nohup will "exec" your process and thus keep the same PID for the process as nohup itself had (see http://sources.debian.net/src/coreutils/8.23-4/src/nohup.c/#L225):
$ nohup /bin/sleep 10 & echo "$!"; pgrep sleep
[1] 27130
27130
nohup: ignoring input and appending output to 'nohup.out'
27130
Also, if you 'exec' 'sleep' inside the script, then there's only one process that's created (as expected):
$ echo "exec /bin/sleep 10" > /tmp/foo
$ nohup /tmp/foo & echo "$!"; pgrep sleep
[1] 27309
27309
nohup: ignoring input and appending output to 'nohup.out'
27309
Thus, according to my theory, if you'd 'exec' your process inside the script, then you'd be getting the correct PID.
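To check that theory without touching the real deploy script, here is a small sketch. The wrapper script is a temporary file and "sleep 30" is a stand-in for the real program; both are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch only: the wrapper is a temp file and "sleep 30" stands in for the real program.
wrapper=$(mktemp)
printf '#!/bin/sh\nexec sleep 30\n' > "$wrapper"
chmod +x "$wrapper"

nohup "$wrapper" >/dev/null 2>&1 &
pid=$!

# Bounded poll until the wrapper shell has exec'ed sleep under the same PID.
name=""
for _ in 1 2 3 4 5 6 7 8 9 10; do
    name=$(ps -o comm= -p "$pid")
    [ "$name" = "sleep" ] && break
    sleep 0.2
done

echo "PID $pid is now running: $name"

# Clean up the placeholder process and the wrapper.
kill "$pid"
rm -f "$wrapper"
```

Because both nohup and the wrapper use exec, the PID stored from $! ends up belonging to the final program, so writing $! to a pid file gives a value that kill can actually use.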

Bash: Killing all processes in subprocess

In bash I can get the process ID (pid) of the last subprocess through the $! variable. I can then kill this subprocess before it finishes:
(sleep 5) & pid=$!
kill -9 $pid
This works as advertised. If I now extend the subprocess with more commands after the sleep, the sleep command keeps running after the subprocess is killed, even though the other commands never get executed.
As an example, consider the following, which spins up a subprocess and monitors its assassination using ps:
# Start subprocess and get its pid
(sleep 5; echo done) & pid=$!
# grep for subprocess
echo "grep before kill:"
ps aux | grep "$pid\|sleep 5"
# Kill the subprocess
echo
echo "Killing process $pid"
kill -9 $pid
# grep for subprocess
echo
echo "grep after kill:"
ps aux | grep "$pid\|sleep 5"
# Wait for sleep to finish
sleep 6
# grep for subprocess
echo
echo "grep after sleep is finished:"
ps aux | grep "$pid\|sleep 5"
If I save this to a file named filename and run it, I get this printout:
grep before kill:
username 7464 <...> bash filename
username 7466 <...> sleep 5
username 7467 <...> grep 7464\|sleep 5
Killing process 7464
grep after kill:
username 7466 <...> sleep 5
username 7469 <...> grep 7464\|sleep 5
grep after sleep is finished:
username 7472 <...> grep 7464\|sleep 5
where unimportant information from the ps command is replaced with <...>. It looks like the kill has killed the overall bash execution of filename, while leaving sleep running.
How can I correctly kill the entire subprocess?
You can set a trap in the subshell to kill any active jobs before exiting:
(trap 'kill $(jobs -p)' EXIT; sleep 5; echo done ) & pid=$!
I don't know exactly why that sleep process gets orphaned. In any case, instead of kill you can use pkill with the -P flag, which signals the children of the given process:
pkill -TERM -P $pid
EDIT:
That means that, to kill a process and all its children, you should instead use
CPIDS=`pgrep -P $pid` # gets pids of child processes
kill -9 $pid
for cpid in $CPIDS ; do kill -9 $cpid ; done
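The same idea as a self-contained sketch, with "sleep 30" standing in for the real work. It collects the child PIDs while the parent is still alive, then signals parent and children together. The bounded polling loop is an addition for illustration, so the example does not race against the subshell starting up.

```shell
#!/bin/bash
# Sketch only: "sleep 30" stands in for the real work.
(sleep 30; echo done) & pid=$!

# Bounded poll until the subshell has forked its sleep child.
children=""
for _ in 1 2 3 4 5 6 7 8 9 10; do
    children=$(pgrep -P "$pid") && break
    sleep 0.1
done

# The child PIDs were collected above, while the parent still existed.
# Now signal the parent and its children in one go.
kill -9 "$pid" $children
wait "$pid"
status=$?

echo "subshell exit status: $status; killed children: $children"
```

The order matters: once the parent is gone, its children are re-parented and pgrep -P "$pid" no longer finds them. This only covers direct children; deeper descendants would need the same lookup applied to each child.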
You can have a look at rkill that seems to meet your requirements :
http://www.unix.com/man-page/debian/1/rkill/
rkill [-SIG] pid/name...
When invoked as rkill, this utility does not display information about the processes, but
sends them all a signal instead. If not specified on the command line, a terminate
(SIGTERM) signal is sent.

How to send signal to a bash script from another shell

I start the following script in the foreground of a bash shell (let's say shell1), and from another shell (shell2) I send it SIGUSR1 with kill, using the PID of the script. Nothing happens. What am I doing wrong? I tried other signals (SIGQUIT etc.) but the result is the same.
test_trap.sh
function iAmDone { echo "Trapped Signal"; exit 0; }
trap iAmDone SIGUSR1
echo "Running... "
tail -f /dev/null # Do nothing
In shell1
./test_trap.sh
In shell2
kill -SIGUSR1 `ps aux | grep [t]est_trap | awk '{print $2}'`
The trap is not executed until tail finishes. But tail never finishes. Try:
tail -f /dev/null &
wait
The trap will execute without waiting for tail to complete, but if you exit the tail will be left running. So you'll probably want a kill $! in the trap.
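Putting those two points together, a minimal sketch might look like this. To stay self-contained, the script sends SIGUSR1 to itself from a background subshell, which stands in for shell2; the names on_usr1, tail_pid and self are made up for illustration.

```shell
#!/bin/bash
# Sketch only: the script signals itself so the example is self-contained.
self=${BASHPID:-$$}
got_signal=no

on_usr1() {
    got_signal=yes
    kill "$tail_pid"      # do not leave tail running behind us
}
trap on_usr1 USR1

echo "Running..."
tail -f /dev/null &
tail_pid=$!

# Stand-in for "shell2": send USR1 to this script after one second.
( sleep 1; kill -USR1 "$self" ) &

wait "$tail_pid"          # returns as soon as the trapped signal arrives
echo "Trapped signal: $got_signal"
```

The PID of tail is stored in its own variable rather than read from $! inside the handler, because $! changes whenever another background command is started.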

Can I wait for the termination of a process that is not a child of the current shell?

I have a script that has to kill, a certain number of times, a resource managed by high-availability middleware. It checks whether the resource is running and then kills it; I need the timestamp of when the process is really killed. So I wrote this code:
#!/bin/bash
echo "$(date +"%T,%N") :New measures Run" > /home/hassan/logs/measures.log
for i in {1..50}
do
echo "Iteration: $i"
PID=`ps -ef | grep "/home/hassan/Desktop/pcmAppBin pacemaker_app/MainController"|grep -v "grep" | awk {'print$2'}`
if [ -n "$PID" ]; then
echo "$(date +"%T,%N") :Killing $PID" >> /home/hassan/logs/measures.log
ps -ef | grep "/home/hassan/Desktop/pcmAppBin pacemaker_app/MainController"|grep -v "grep" | awk {'print "kill -9 " $2'} | sh
wait $PID
else
PID=`ps -ef | grep "/home/hassan/Desktop/pcmAppBin pacemaker_app/MainController"|grep -v "grep" | awk {'print$2'}`
until [ -n "$PID" ]; do
sleep 2
PID=`ps -ef | grep "/home/hassan/Desktop/pcmAppBin pacemaker_app/MainController"|grep -v "grep" | awk {'print$2'}`
done
fi
done
But with my wait command i get the following error message: wait: pid xxxx is not a child of this shell
I assume that you started the child processes from bash and then started this script to wait for them. The problem is that those processes are not children of the bash instance running the script, but children of its parent!
If you want to run a script inside the current bash, you should start it with . (the dot command).
An example: you start vim and then stop it by pressing ^Z (later you can use fg to get back to vim). Then you can get the list of jobs by using the jobs command.
$ jobs
[1]+ Stopped vim myfile
Then You can create a script called test.sh containing just one command, called jobs. Add execute right (e.g. chmod 700 test.sh), then start it:
$ cat test.sh
jobs
~/dev/fi [3:1]$ ./test.sh
~/dev/fi [3:1]$ . ./test.sh
[1]+ Stopped vim myfile
Because the first version creates a new bash process, no jobs are listed. With ., the script runs in the current bash session, which has exactly one child process (namely vim). So launch the script above using . so that no child bash is created.
Be aware that defining variables or changing directory (and a lot more) will then affect your environment! E.g. PID will be visible in the calling bash!
Comments:
Do not use ...|grep ...|grep -v ... |awk --- pipe snakes! Use ...|awk... instead!
In most Linux-es you can use something like this ps -o pid= -C pcmAppBin to get just the pid, so the complete pipe can be avoided.
To call an external program from awk, you could try the built-in system("mycmd") function.
I hope this helps a bit!
